hIPPYlib-MUQ: A Bayesian Inference Software Framework for Integration of Data with Complex Predictive Models under Uncertainty

KI-TAE KIM, University of California, Merced, USA
UMBERTO VILLA, Washington University in St. Louis, USA
MATTHEW PARNO, Dartmouth College, USA
YOUSSEF MARZOUK, Massachusetts Institute of Technology, USA
OMAR GHATTAS, The University of Texas at Austin, USA
NOEMI PETRA, University of California, Merced, USA

Bayesian inference provides a systematic framework for integration of data with mathematical models to quantify the uncertainty in the solution of the inverse problem. However, solution of Bayesian inverse problems governed by complex forward models described by partial differential equations (PDEs) remains prohibitive with black-box Markov chain Monte Carlo (MCMC) methods. We present hIPPYlib-MUQ, an extensible and scalable software framework that contains implementations of state-of-the-art algorithms aimed at overcoming the challenges of high-dimensional, PDE-constrained Bayesian inverse problems. These algorithms accelerate MCMC sampling by exploiting the geometry and intrinsic low-dimensionality of parameter space via derivative information and low-rank approximation. The software integrates two complementary open-source software packages, hIPPYlib and MUQ. hIPPYlib solves PDE-constrained inverse problems using automatically generated adjoint-based derivatives, but it lacks full Bayesian capabilities. MUQ provides a spectrum of powerful Bayesian inversion models and algorithms, but expects forward models to come equipped with gradients and Hessians to permit large-scale solution. By combining these two complementary libraries, we created a robust, scalable, and efficient software framework that realizes the benefits of each and allows us to tackle complex large-scale Bayesian inverse problems across a broad spectrum of scientific and engineering disciplines. To illustrate the capabilities of hIPPYlib-MUQ, we present a comparison of a number of MCMC methods available in the integrated software on several high-dimensional Bayesian inverse problems. These include problems characterized by both linear and nonlinear PDEs, low and high levels of data noise, and different parameter dimensions. The results demonstrate that large (∼50×) speedups over conventional black-box and gradient-based MCMC algorithms can be obtained by exploiting Hessian information (from the log-posterior), underscoring the power of the integrated hIPPYlib-MUQ framework.

CCS Concepts: • Mathematics of computing → Bayesian computation; Mathematical optimization; Partial differential equations; Computations on matrices; Discretization; Solvers; • Computing methodologies → Uncertainty quantification; • Applied computing → Physical sciences and engineering.

Authors' addresses: Ki-Tae Kim, University of California, Merced, Applied Mathematics, School of Natural Sciences, Merced, CA, USA, [email protected]; Umberto Villa, Washington University in St. Louis, Electrical & Systems Engineering, St. Louis, MO, USA, [email protected]; Matthew Parno, Dartmouth College, Department of Mathematics, Hanover, NH, USA, [email protected]; Youssef Marzouk, Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, Boston, MA, USA, [email protected]; Omar Ghattas, The University of Texas at Austin, Oden Institute for Computational Engineering & Sciences, Department of Mechanical Engineering, Department of Geological Sciences, Austin, TX, USA, [email protected]; Noemi Petra, University of California, Merced, Applied Mathematics, School of Natural Sciences, Merced, CA, USA, [email protected].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
© 2021 Association for Computing Machinery.
0098-3500/2021/12-ART $15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn


Additional Key Words and Phrases: Infinite-dimensional inverse problems, adjoint-based methods, inexact Newton-CG method, low-rank approximation, Bayesian inference, uncertainty quantification, sampling, generic PDE toolkit

ACM Reference Format:
Ki-Tae Kim, Umberto Villa, Matthew Parno, Youssef Marzouk, Omar Ghattas, and Noemi Petra. 2021. hIPPYlib-MUQ: A Bayesian Inference Software Framework for Integration of Data with Complex Predictive Models under Uncertainty. ACM Trans. Math. Softw. 1, 1 (December 2021), 32 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION

With the rapid explosion of observational and experimental data, a prominent challenge is how to derive knowledge and insight from these data to make better predictions and high-consequence decisions. This question arises in all areas of science, engineering, technology, and medicine, and in many cases there are mathematical models available that represent the underlying physical systems from which the data are observed or measured. These models are often subject to considerable uncertainties stemming from unknown or uncertain input model parameters (e.g., coefficient fields, constitutive laws, source terms, geometries, initial and/or boundary conditions) as well as from noisy and limited observations. The goal is to infer these unknown model parameters from observations of model outputs through the corresponding partial differential equation (PDE) models, and to quantify the uncertainty in the solution of such inverse problems.

Bayesian inversion provides a systematic framework for integration of data with complex PDE-based models to quantify uncertainties in model parameter inference [34, 59]. In the Bayesian framework, noisy data and possibly uncertain mathematical models are combined with prior information, yielding a posterior probability distribution of the model parameters. The Markov chain Monte Carlo (MCMC) method is a common way to explore the posterior distribution by means of sampling. However, Bayesian inversion with complex forward models via conventional MCMC methods faces several computational challenges. First, characterizing the posterior distribution of the model parameters, or of subsequent predictions, often requires repeated evaluations of expensive-to-solve large-scale PDE models. Second, the posterior distribution often has a complex structure stemming from the nonlinear mapping from model parameters to observed quantities. Third, the parameters are often fields, which, after discretization, lead to very high-dimensional posteriors. These difficulties make the solution of Bayesian inverse problems with complex large-scale PDE forward models computationally intractable.

Extensive research efforts have been devoted to overcoming the prohibitive cost of Bayesian inverse problems governed by large-scale PDEs. With rapid progress in high-performance computing and advances in scalable PDE solvers, repeated evaluations of forward PDE models for different input parameters [5, 61] are becoming tractable. Furthermore, structure-exploiting MCMC methods have effectively facilitated the exploration of complex posterior distributions [9, 13, 18, 48]. Finally, dimension reduction methods have been shown to significantly reduce the computational cost of MCMC simulations [20, 69]. Applying and combining these advanced techniques can be extremely challenging; a computational tool that helps the computational and scientific community apply, extend, and tailor these methods is therefore highly beneficial.

In this paper, we present a software framework to tackle large-scale Bayesian inverse problems with PDE-based forward models, which have applications across a wide range of science and engineering fields. The software integrates two open-source software packages, an Inverse Problems Python library (hIPPYlib) [65] and the MIT Uncertainty Quantification Library (MUQ) [46], exploiting their complementary capabilities.

hIPPYlib is an extensible software framework for the solution of deterministic and linearized Bayesian inverse problems constrained by complex PDE models. Based on FEniCS [37] for the finite element approximation of PDEs and on PETSc [6] for high-performance linear algebra operations and solvers, hIPPYlib allows users to describe (and solve) the underlying PDE-based forward model (required by the inverse problem solver) in a relatively straightforward way. hIPPYlib also contains implementations of efficient numerical methods for the solution of deterministic and linearized Bayesian inverse problems. These include globalized inexact Newton-conjugate gradient [1, 10], adjoint-based computation of gradients and Hessian actions [62], randomized linear algebra [30], and scalable sampling from large-scale Gaussian fields. The state-of-the-art algorithms implemented in hIPPYlib deliver the solution of the linearized Bayesian inverse problem at a cost that is independent of the parameter dimension. hIPPYlib is, however, mainly designed for deterministic and linearized Bayesian inverse problems, and lacks full Bayesian inversion capabilities.

MUQ complements hIPPYlib's capabilities with more support for the formulation and solution of Bayesian inference problems. MUQ is a modular software framework designed to address uncertainty quantification problems involving complex models. The software provides an abstract modeling interface for combining physical (e.g., PDEs) and statistical components (e.g., additive error models, Gaussian process priors) to define Bayesian posterior distributions in a flexible and semi-intrusive way. MUQ also contains a suite of powerful uncertainty quantification algorithms including Markov chain Monte Carlo (MCMC) methods [47], transport maps [40], likelihood-informed subspaces, sparse adaptive generalized polynomial chaos (gPC) expansions [17], Karhunen-Loève expansions, Gaussian process modeling [31, 51], and prediction tools enabling global sensitivity analysis and optimal experimental design. To effectively apply these tools to Bayesian inverse problems, however, MUQ needs to be equipped with the type of gradient and/or Hessian information that hIPPYlib can provide.

By interfacing these two software libraries, we aim to create a robust, scalable, efficient, flexible, and easy-to-use software framework that overcomes the computational challenges inherent in complex large-scale Bayesian inverse problems. Representative features of the software are summarized as follows:
• The software combines the benefits of the two packages, hIPPYlib and MUQ, to enable scalable solution of Bayesian inverse problems governed by large-scale PDEs.
• Various advanced MCMC methods are available that can exploit problem structure (e.g., the derivative/Hessian information of the log-posterior).
• The software makes use of sparsity, low-dimensionality, and geometric structure of the log-posterior to achieve scalable and efficient MCMC methods.
• Convergence diagnostics are implemented to assess the quality of MCMC samples.

In the following sections, we first briefly review the Bayesian formulation of inverse problems governed by PDEs, both in infinite-dimensional and in finite-dimensional spaces (Section 2). We then describe the MCMC methods used to characterize the posterior (Section 3) and summarize the convergence diagnostics available in the software (Section 3.2). Next, we present the design of hIPPYlib-MUQ (Section 4). Finally, we present several benchmark problems and a step-by-step implementation guide to illustrate the key aspects of the present software (Section 5). Section 6 provides concluding remarks.

2 THE BAYESIAN INFERENCE FRAMEWORK

In this section, we present a brief discussion of the Bayesian inference approach to solving inverse problems governed by PDEs. We begin by providing an overview of the framework for infinite-dimensional Bayesian inverse problems following [14, 58, 65]. Then we present a brief discussion of the finite-dimensional approximations of the prior and the posterior distributions; a lengthier discussion can be found in [14]. Finally, we present the Laplace approximation to the posterior distribution, which requires the solution of a PDE-constrained optimization problem for the computation of the maximum a posteriori (MAP) point.

2.1 Infinite-dimensional Bayesian inverse problems

The objective of the inverse problem is to determine an unknown input parameter field m that would give rise to given observational (or experimental) data d by means of a (physics-based) mathematical model. In other words, given d ∈ R^q, we seek to infer m ∈ M (here, M is an infinite-dimensional Hilbert space of functions defined on a domain D ⊂ R^d) such that

$$ \boldsymbol{d} \approx \mathcal{F}(m), \tag{1} $$

where F : M → R^q is the parameter-to-observable map that predicts observations from a given parameter m through a forward mathematical model. Note that the evaluation of this map involves solving the forward PDE model given m, followed by extracting the observations from the solution of the forward problem.

In the Bayesian approach, the inverse problem is framed as a statistical inference problem. The uncertain parameter m and the observational data d are treated as random variables, and the solution is a conditional probability distribution that represents the level of confidence in the estimate of the parameter given the data. The approach combines a prior model, reflecting our knowledge of the parameter before the data are acquired, and a likelihood model, measuring how likely it is that a given parameter field would give rise to the observed data.

Using the Radon-Nikodym derivative [66] of the posterior measure μ_post with respect to the prior measure μ_prior, Bayes' theorem in infinite dimensions is stated as

$$ \frac{d\mu_{\text{post}}}{d\mu_{\text{prior}}} \propto \pi_{\text{like}}(\boldsymbol{d} \,|\, m), \tag{2} $$

where π_like denotes the likelihood function. For detailed conditions under which the posterior measure is well defined, we refer the reader to Stuart [58].

For the construction of the likelihood function, we restrict our attention to additive noise models. Noise may stem from measurement uncertainties and/or modeling errors. In this work, we assume that the noise is mutually independent of the parameter m and can be modeled as a Gaussian random variable η ∈ R^q with zero mean and covariance matrix Γ_noise ∈ R^{q×q}, i.e.,

$$ \boldsymbol{d} = \mathcal{F}(m) + \boldsymbol{\eta}, \qquad \boldsymbol{\eta} \sim \mathcal{N}(\boldsymbol{0}, \Gamma_{\text{noise}}). \tag{3} $$

This allows us to express the probability density function of the likelihood as

$$ \pi_{\text{like}}(\boldsymbol{d} \,|\, m) \propto \exp\{-\Phi(m)\}, \tag{4} $$

where $\Phi(m) = \tfrac{1}{2}\|\mathcal{F}(m) - \boldsymbol{d}\|^2_{\Gamma_{\text{noise}}^{-1}}$ is referred to as the negative log-likelihood.

We take the prior to be a Gaussian measure, i.e., μ_prior = N(m_pr, C_prior), and assume that samples from the prior distribution are square-integrable functions in the domain D, i.e., belong to L²(D). The prior covariance operator C_prior is constructed to be a trace-class operator to guarantee bounded variance of samples from the prior distribution and well-posedness of the Bayesian inverse problem; see Bui-Thanh et al. [14], Stuart [58], Villa et al. [65] for a detailed explanation. Specifically, we take $\mathcal{C}_{\text{prior}} := \mathcal{A}^{-v} = (-\gamma\Delta + \delta I)^{-v}$ with $v > d/2$, where γ and δ > 0 control the correlation length and the pointwise variance of the prior operator; see Lindgren et al. [35], Villa et al. [65].
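
To make the roles of γ, δ, and the exponent v concrete, the following minimal sketch draws a sample from a discretized prior of this form with v = 2 on a hypothetical 1D grid using finite differences; it is not hIPPYlib's finite-element construction, only an illustration of the idea that a draw from N(m_pr, A^{-2}) can be obtained as m_pr + A^{-1} z with z ~ N(0, I) when A is symmetric.

import numpy as np

# Hypothetical 1D finite-difference discretization of A = -gamma*Laplacian + delta*I
n, gamma, delta = 200, 0.1, 0.5
h = 1.0 / (n - 1)
lap = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
       + np.diag(np.ones(n - 1), -1)) / h**2
A = -gamma * lap + delta * np.eye(n)

# For C_prior = A^{-2} (v = 2), a prior sample is m_pr + A^{-1} z with z ~ N(0, I),
# since Cov(A^{-1} z) = A^{-1} A^{-T} = A^{-2} for symmetric A.
m_pr = np.zeros(n)
z = np.random.randn(n)
m_sample = m_pr + np.linalg.solve(A, z)

Larger γ/δ ratios yield longer correlation lengths, while increasing both γ and δ reduces the pointwise variance of the samples.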


2.2 Discretization of the Bayesian formulation

Here, we briefly present the finite-dimensional approximation of the Bayesian formulation described in the previous section; we refer the reader to Bui-Thanh et al. [14] for a lengthier discussion. We consider a finite-dimensional subspace M_h of M, defined by the span of a set of globally continuous basis functions $\{\phi_j\}_{j=1}^{n}$. For example, for the finite element method, these basis functions are piecewise polynomial on each element of a mesh discretization of the domain D [8, 57]. The parameter field m is then approximated as $m \approx m_h = \sum_{j=1}^{n} m_j \phi_j$, and, in what follows, $\boldsymbol{m} = (m_1, \ldots, m_n)^T \in \mathbb{R}^n$ denotes the vector of the finite element coefficients of m_h.

The finite-dimensional approximation of the prior measure μ_prior is now specified by the density

$$ \pi_{\text{prior}}(\boldsymbol{m}) \propto \exp\Big(-\tfrac{1}{2}\|\boldsymbol{m} - \boldsymbol{m}_{\text{pr}}\|^2_{\Gamma_{\text{prior}}^{-1}}\Big), \tag{5} $$

where m_pr ∈ R^n and Γ_prior ∈ R^{n×n} are the mean vector and the covariance matrix that arise upon discretization of m_pr and C_prior, respectively. We refer the reader to Bui-Thanh et al. [14], Villa et al. [65] for the explicit expression of the prior covariance matrix Γ_prior.

Then Bayes' theorem for the density of the finite-dimensional approximation of the posterior measure μ_post is given by

$$ \pi_{\text{post}}(\boldsymbol{m}) := \pi_{\text{post}}(\boldsymbol{m} \,|\, \boldsymbol{d}) \propto \pi_{\text{like}}(\boldsymbol{d} \,|\, \boldsymbol{m}) \, \pi_{\text{prior}}(\boldsymbol{m}). \tag{6} $$

The finite-dimensional posterior probability density function can be expressed explicitly as

$$ \pi_{\text{post}}(\boldsymbol{m}) \propto \exp\Big(-\tfrac{1}{2}\|\mathbf{F}(\boldsymbol{m}) - \boldsymbol{d}\|^2_{\Gamma_{\text{noise}}^{-1}} - \tfrac{1}{2}\|\boldsymbol{m} - \boldsymbol{m}_{\text{pr}}\|^2_{\Gamma_{\text{prior}}^{-1}}\Big), \tag{7} $$

where F(m) refers to the parameter-to-observable map obtained from the finite element discretization of the forward model.
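
For intuition, the unnormalized negative log-posterior implied by (7) is simply the sum of a data-misfit term and a prior term. A minimal NumPy sketch, assuming a generic user-supplied forward map forward(m) and dense covariance matrices (placeholders, not the hIPPYlib/MUQ interfaces), is:

import numpy as np

def neg_log_posterior(m, d, forward, Gamma_noise, Gamma_prior, m_pr):
    """Unnormalized negative log-posterior from (7): data misfit plus prior term."""
    r = forward(m) - d                                 # residual F(m) - d
    misfit = 0.5 * r @ np.linalg.solve(Gamma_noise, r)
    dm = m - m_pr
    reg = 0.5 * dm @ np.linalg.solve(Gamma_prior, dm)
    return misfit + reg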

2.3 The Laplace approximation of the posterior distribution

In general, the posterior probability distribution (7) is not Gaussian due to the nonlinearity of the parameter-to-observable map. In this section, we discuss the solution of the so-called linearized Bayesian inverse problem by use of the Laplace approximation. The Laplace approximation amounts to constructing a Gaussian distribution centered at the maximum a posteriori (MAP) point. The MAP point represents the most probable value of the parameter vector over the posterior, i.e.,

$$ \boldsymbol{m}_{\text{MAP}} := \arg\min_{\boldsymbol{m}} \big(-\log \pi_{\text{post}}(\boldsymbol{m})\big) = \arg\min_{\boldsymbol{m}} \; \tfrac{1}{2}\|\mathbf{F}(\boldsymbol{m}) - \boldsymbol{d}\|^2_{\Gamma_{\text{noise}}^{-1}} + \tfrac{1}{2}\|\boldsymbol{m} - \boldsymbol{m}_{\text{pr}}\|^2_{\Gamma_{\text{prior}}^{-1}}. \tag{8} $$

The covariance matrix of the Laplace approximation is the inverse of the Hessian of the negative log-posterior evaluated at the MAP point. Then, under the Laplace approximation, the solution of the linearized Bayesian inverse problem is given by

$$ \pi_{\text{post}}(\boldsymbol{m}) \sim \mathcal{N}(\boldsymbol{m}_{\text{MAP}}, \Gamma_{\text{post}}), \tag{9} $$

with

$$ \Gamma_{\text{post}} = \mathbf{H}^{-1}(\boldsymbol{m}_{\text{MAP}}) = \big(\mathbf{H}_{\text{misfit}}(\boldsymbol{m}_{\text{MAP}}) + \Gamma_{\text{prior}}^{-1}\big)^{-1}, \tag{10} $$

where H(m_MAP) and H_misfit(m_MAP) denote the Hessian matrices of, respectively, the negative log-posterior and the negative log-likelihood evaluated at the MAP point.

The quality of the Gaussian approximation of the posterior depends on the degree of nonlinearity in the parameter-to-observable map, the noise covariance matrix, and the number of observations [14, 24, 26, 33, 50, 56, 59, 60, 68]. When the parameter-to-observable map is linear and the additive noise and prior models are both Gaussian, the Laplace approximation is exact. Even if the parameter-to-observable map is significantly nonlinear, the Laplace approximation is a crucial ingredient for achieving scalable, efficient, and accurate posterior sampling with MCMC methods, as we will discuss in the following section.

Note that the Laplace approximation involves the Hessian of the negative log-likelihood (the data misfit part of the Hessian), which cannot be explicitly constructed when the parameter dimension is large. However, the data typically provide only limited information about the parameter field, and thus the eigenspectrum of the Hessian matrix often decays very rapidly. We exploit this compact nature of the Hessian to overcome its prohibitive computational cost, and construct a low-rank approximation of the data misfit Hessian matrix using a matrix-free method (such as the randomized subspace iteration [30]).

Concretely, we construct a low-rank approximation of the data misfit Hessian, i.e., $\mathbf{H}_{\text{misfit}} \approx \Gamma_{\text{prior}}^{-1} \mathbf{V}_r \mathbf{\Lambda}_r \mathbf{V}_r^T \Gamma_{\text{prior}}^{-1}$, where $\mathbf{\Lambda}_r = \operatorname{diag}(\lambda_1, \ldots, \lambda_r) \in \mathbb{R}^{r \times r}$ and $\mathbf{V}_r = [\boldsymbol{v}_1, \ldots, \boldsymbol{v}_r] \in \mathbb{R}^{n \times r}$ contain the r largest eigenvalues and corresponding eigenvectors, respectively, of the generalized symmetric eigenvalue problem

$$ \mathbf{H}_{\text{misfit}} \boldsymbol{v}_i = \lambda_i \Gamma_{\text{prior}}^{-1} \boldsymbol{v}_i, \qquad i = 1, \ldots, n. \tag{11} $$

Note that the eigenvectors $\boldsymbol{v}_i$ are orthonormal with respect to $\Gamma_{\text{prior}}^{-1}$, that is, $\boldsymbol{v}_i^T \Gamma_{\text{prior}}^{-1} \boldsymbol{v}_j = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta. With this low-rank approximation, using the Sherman-Morrison-Woodbury formula [28], we obtain, for the inverse of the Hessian in (10),

$$ \mathbf{H}^{-1} = \big(\mathbf{H}_{\text{misfit}} + \Gamma_{\text{prior}}^{-1}\big)^{-1} = \Gamma_{\text{prior}} - \mathbf{V}_r \mathbf{D}_r \mathbf{V}_r^T + \mathcal{O}\Big(\sum_{i=r+1}^{n} \frac{\lambda_i}{1 + \lambda_i}\Big), \tag{12} $$

where $\mathbf{D}_r = \operatorname{diag}\big(\lambda_1/(\lambda_1 + 1), \ldots, \lambda_r/(\lambda_r + 1)\big) \in \mathbb{R}^{r \times r}$. We can see from the remainder term in (12) that to obtain an accurate low-rank approximation of H^{-1}, we must retain the eigenvectors corresponding to eigenvalues that are greater than 1. This approximation is used to efficiently perform various operations involving the Hessian, for example, applying the square-root inverse of the Hessian to a vector, which is needed to draw samples from the Gaussian approximation discussed in this section; see Villa et al. [65] for details.
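
As an illustration of how (12) is used in practice, the following sketch (plain NumPy with dense matrices; hIPPYlib performs the same operations matrix-free) applies the low-rank approximation of H^{-1} to a vector and draws an approximate sample from the Laplace approximation (9). It assumes the generalized eigenpairs (λ_i, v_i) of (11) have already been computed; the sampling map used here is one standard way to exploit the Γ_prior^{-1}-orthonormality of V_r, not necessarily the exact hIPPYlib formula.

import numpy as np

def apply_Hinv(v, Gamma_prior, Vr, lam):
    """Apply the low-rank approximation (12): H^{-1} v ≈ Γ_prior v - V_r D_r V_r^T v."""
    Dr = lam / (1.0 + lam)                          # entries λ_i / (1 + λ_i)
    return Gamma_prior @ v - Vr @ (Dr * (Vr.T @ v))

def sample_laplace(m_map, m_pr, prior_sample, Gamma_prior, Vr, lam):
    """Draw an approximate sample from N(m_MAP, Γ_post) given a prior sample.

    With V_r orthonormal w.r.t. Γ_prior^{-1}, the map
    S = I + V_r diag(1/sqrt(1+λ_i) - 1) V_r^T Γ_prior^{-1}
    satisfies S Γ_prior S^T = Γ_prior - V_r D_r V_r^T, so m_MAP + S (prior fluctuation)
    has the Laplace covariance (10).
    """
    x = prior_sample - m_pr                         # zero-mean prior fluctuation
    coeff = Vr.T @ np.linalg.solve(Gamma_prior, x)
    return m_map + x + Vr @ ((1.0 / np.sqrt(1.0 + lam) - 1.0) * coeff)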

3 MCMC SAMPLING

As mentioned above, when the parameter-to-observable map is nonlinear, the Laplace approximation may be a poor approximation of the posterior. In this case, one needs to apply a sampling-based method to explore the full posterior. In this section, we focus on several advanced Markov chain Monte Carlo (MCMC) methods available in the present software. We outline the general structure of MCMC methods with a brief discussion of their key features. We then present various diagnostics to assess the convergence of MCMC simulations.

3.1 Markov chain Monte Carlo

MCMC provides a flexible framework for exploring the posterior distribution. It generates samples from the posterior distribution that can be employed in Monte Carlo approximations of posterior expectations. For example, the posterior expectation of a quantity of interest G(m) can be approximated by

$$ \int \mathcal{G}(m) \, d\mu_{\text{post}} \approx \frac{1}{N} \sum_{i=1}^{N} \mathcal{G}(m_i), \tag{13} $$

where each $m_i \sim \mu_{\text{post}}$ is a sample from the posterior distribution.


MCMC techniques construct ergodic Markov chains for which the posterior distribution is the unique stationary distribution of the chain [52]. Asymptotically, the states of the Markov chain are therefore exact samples of the posterior distribution and can be used in (13). Markov chains are defined in terms of a transition kernel, which is a position-dependent probability distribution K(·|m_i) over the next state m_{i+1} in the chain given the previous state m_i, i.e., m_{i+1} ∼ K(·|m_i). Note that chains of finite length must be employed in practice, and the statistical accuracy of the Monte Carlo estimator is therefore highly dependent on the ability of the transition kernel to efficiently explore the parameter space.

There are several frameworks for constructing transition kernels that are appropriate for MCMC, including the well-known Metropolis-Hastings (MH) rule [32, 42], the Gibbs sampler (e.g., [16]), and delayed rejection (DR) [43]. MUQ provides implementations of these frameworks, as well as the generalized Metropolis-Hastings (gMH) kernel [15] and the multilevel MCMC framework of [23]. Most of these frameworks start by drawing samples from one or more proposal distributions q_1(·|m_i), ..., q_K(·|m_i) that are easy to sample from (e.g., Gaussian) and then "correct" the proposed samples to obtain exact, but correlated, posterior samples. In the MH and DR kernels, corrections take the form of accepting or rejecting the proposed point. In the gMH kernel, the correction involves analytically sampling a finite state Markov chain over multiple proposed points. Intuitively, proposal distributions that capture the shape of the posterior, either locally around m_i or globally over the parameter space, tend to require fewer "corrections" and yield more efficient algorithms.

Algorithm 1: Drawing a sample from the Metropolis-Hastings kernel
Input: Current state m_i, posterior density π_post(m), proposal q(·|m_i).
Output: Next state m_{i+1}.

/* Computes acceptance probability of proposed sample m′ */
Function AcceptProb(m_i, m′):
    γ ← [π_post(m′) / π_post(m_i)] · [q(m_i | m′) / q(m′ | m_i)]
    α ← min{1, γ}
    return α

/* Draws a sample from the Metropolis-Hastings kernel. */
Function MHKernel(m_i):
    m′ ∼ q(·|m_i)                     /* Sample the proposal. */
    α ← AcceptProb(m_i, m′)           /* Calculate the acceptance probability. */
    u ∼ U[0, 1]                       /* Accept proposed point with probability α. */
    if u < α then
        return m′                     /* Accept the proposed point as the next step in the chain. */
    else
        return m_i                    /* Reject proposed point; return current state m_i as next state. */
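
A direct transcription of Algorithm 1 into generic NumPy code, working with log-densities for numerical stability (a sketch, not the MUQ implementation; log_post, propose, and log_q are assumed user-supplied callables), looks as follows:

import numpy as np

def mh_kernel(m_i, log_post, propose, log_q):
    """One Metropolis-Hastings step (Algorithm 1), using log-densities.

    log_post(m): unnormalized log-posterior; propose(m): draws m' ~ q(.|m);
    log_q(a, b): log proposal density of a given b.
    """
    m_prop = propose(m_i)
    log_gamma = (log_post(m_prop) - log_post(m_i)
                 + log_q(m_i, m_prop) - log_q(m_prop, m_i))
    if np.log(np.random.rand()) < min(0.0, log_gamma):
        return m_prop          # accept
    return m_i                 # reject: keep the current state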

Proposal Distributions. Let q(·|m_i) denote a proposal distribution that is "parameterized" by the current state of the chain m_i. We require that the proposal distribution is easily sampled and that its density can be efficiently evaluated. The MH rule [32, 42] defines a transition kernel K_MH(·|m_i) through a two-step process: first draw a random sample m′ ∼ q(·|m_i) from the proposal distribution, and then accept the proposed sample m′ as the next step in the chain m_{i+1} with a probability α, which is defined in Algorithm 1. If rejected, set m_{i+1} = m_i. Under mild technical conditions on the proposal distribution (see, e.g., Roberts et al. [53]), the MH rule defines a Markov chain that is ergodic and has μ_post as a stationary distribution, thus enabling states in the chain to be used in Monte Carlo estimators. Note that the detailed balance condition (see, e.g., Owen [45]) is commonly employed to verify that a Markov chain has μ_post as a stationary distribution, but this condition alone is not sufficient to guarantee that the chain will converge to the stationary distribution. See Roberts et al. [53] for a detailed discussion of MH convergence and convergence rates.

Fig. 1. The relationship of various MCMC proposal distributions (RW, MALA, pCN, ∞-MALA, H-pCN, H-MALA, H-∞-MALA) with respect to mesh-refinement independence (blue arrow), gradient awareness (green arrow), and curvature awareness (red arrow). [Figure not reproduced; caption retained.]

While the MH rule will yield a valid MCMC kernel for a large class of proposal distributions, the dependence of the proposal on the previous state, combined with possible rejection of the proposed state, results in inter-sample correlations in the Markov chain. Because of these correlations, the error of the Monte Carlo approximation in (13) will be larger when using MCMC than in the classic Monte Carlo setting with independent samples. Markov chains with large correlations will result in larger estimator variance. To reduce correlation in the Markov chain, we seek proposal distributions that can take large steps with a high probability of acceptance. From the acceptance probability in Algorithm 1 we see that this can occur when the proposal density q(m|m_i) is a good approximation to π_post(m), so that γ is close to one.

We now turn to describing the specific proposal distributions used in hIPPYlib-MUQ. First, we describe common proposal mechanisms that exploit gradient and curvature information to accelerate sampling in finite-dimensional spaces. These algorithms comprise the left face of the cube in Figure 1. We then show how these ideas can be extended to construct proposals whose performance is independent of mesh refinement, thus "lifting" the derivative-accelerated proposals to an infinite-dimensional setting. This "lifting" operation transforms proposals on the left face in Figure 1 to their dimension-independent analogs on the right face of the proposal cube.

Exploiting Gradient and Curvature Information. Perhaps the simplest and most common, but not generally efficient, proposal distribution takes the form of a Gaussian distribution centered at the current state in the chain,

$$ q_{\text{RW}}(\boldsymbol{m} \,|\, \boldsymbol{m}_i) = \mathcal{N}\big(\boldsymbol{m}_i, \Gamma_{\text{prop}}\big), \tag{14} $$


where Γ_prop ∈ R^{n×n} is a user-defined covariance matrix. When used with the MH rule, this random walk (RW) proposal yields an MCMC algorithm commonly called the random walk Metropolis algorithm. The adaptive Metropolis (AM) algorithm employs a variant of this proposal in which the covariance Γ_prop is adapted based on previous samples [29]. A proposal covariance Γ_prop that matches the posterior covariance increases efficiency, but the random walk proposal is still a poor approximation of the posterior density.

A slightly more efficient proposal can be obtained through a one-step Euler-Maruyama discretization of the Langevin stochastic differential equation [54]. The resulting Langevin proposal takes the form

$$ q_{\text{MALA}}(\boldsymbol{m} \,|\, \boldsymbol{m}_i) = \mathcal{N}\big(\boldsymbol{m}_i + \tau \Gamma_{\text{prop}} \nabla \log \pi_{\text{post}}(\boldsymbol{m}_i), \; 2\tau \Gamma_{\text{prop}}\big), \tag{15} $$

where τ is the step size parameter. MH samplers with this proposal are called Metropolis-adjusted Langevin algorithms (MALA). Like the AM algorithm, adapting the covariance of the MALA proposal can also improve performance [4, 38].

Both (14) and (15) use a covariance that is constant across the parameter space. Allowing this covariance to adapt to the local correlation structure of the posterior density enables higher order approximations to be obtained, resulting in more efficient MCMC algorithms. In Girolami and Calderhead [27], a differential geometric viewpoint was employed to define a family of proposal mechanisms on a Riemannian manifold. Adapting the MALA proposal in (15) to this manifold setting, and ignoring the manifold's curvature, results in

$$ q_{\text{sMMALA}}(\boldsymbol{m} \,|\, \boldsymbol{m}_i) = \mathcal{N}\big(\boldsymbol{m}_i + \tau \mathbf{G}^{-1}(\boldsymbol{m}_i) \nabla \log \pi_{\text{post}}(\boldsymbol{m}_i), \; 2\tau \mathbf{G}^{-1}(\boldsymbol{m}_i)\big), \tag{16} $$

where G(m) is a position-dependent metric tensor. This is known as the simplified manifold MALA (sMMALA) proposal. Girolami and Calderhead [27] defined the metric tensor G(m) using the expected Fisher information metric, which provides a positive definite approximation of the posterior covariance at the point m. In this work, however, we consider an alternative version of the sMMALA proposal that uses a constant metric built from the low-rank approximation of the log-posterior Hessian at the MAP point (cf. eq. (12)),

$$ q_{\text{H-MALA}}(\boldsymbol{m} \,|\, \boldsymbol{m}_i) = \mathcal{N}\big(\boldsymbol{m}_i + \tau \mathbf{H}^{-1} \nabla \log \pi_{\text{post}}(\boldsymbol{m}_i), \; 2\tau \mathbf{H}^{-1}\big). \tag{17} $$

This metric is similar to the one used by Martin et al. [39] and is equivalent to the preconditioned MALA proposal in (15) using the covariance of the Laplace approximation in (10).
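
To make the structure of (14), (15), and (17) concrete, here is a small sketch of the three proposal draws (generic NumPy code; grad_log_post, apply_Hinv, and sample_Hinv are assumed helpers, e.g., the low-rank H^{-1} application sketched in Section 2.3):

import numpy as np

def rw_proposal(m_i, L_prop):
    """Random walk (14): m' = m_i + L_prop z, with Γ_prop = L_prop L_prop^T."""
    return m_i + L_prop @ np.random.randn(m_i.size)

def mala_proposal(m_i, tau, Gamma_prop, L_prop, grad_log_post):
    """MALA (15): mean m_i + τ Γ_prop ∇log π_post(m_i), covariance 2τ Γ_prop."""
    mean = m_i + tau * (Gamma_prop @ grad_log_post(m_i))
    return mean + np.sqrt(2.0 * tau) * (L_prop @ np.random.randn(m_i.size))

def h_mala_proposal(m_i, tau, apply_Hinv, sample_Hinv, grad_log_post):
    """H-MALA (17): MALA preconditioned by the Laplace Hessian inverse H^{-1}.

    apply_Hinv(v) applies H^{-1}; sample_Hinv() draws z ~ N(0, H^{-1})."""
    mean = m_i + tau * apply_Hinv(grad_log_post(m_i))
    return mean + np.sqrt(2.0 * tau) * sample_Hinv()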

Hamiltonian Monte Carlo techniques define another important class of MCMC proposals. These techniques approximately solve a Hamiltonian system to take large jumps in the parameter space. While efficient in various scenarios (see, e.g., Neal [44]), we have found that solving the Hamiltonian system typically involves an intractable number of posterior gradient evaluations on our problems of interest. The transport map MCMC algorithms of Parno and Marzouk [47] are also not considered here because of the challenge of building high-dimensional transformations.

Dimension-Independent Proposal Distributions. For finite-dimensional parameters, the random walk and MALA proposals defined above can be used with the MH rule for MCMC. However, their performance is not discretization invariant. As the discretization of the function m is refined, the performance of the samplers on the finite-dimensional posterior π_post(m) will worsen. Some modifications to the proposals are necessary to obtain "dimension-independent" performance. The works of Cotter et al. [18], Beskos et al. [9], and Bardsley et al. [7], for example, modify existing finite-dimensional proposals to ensure the algorithm performance is independent of mesh refinement.


The dimension-independent analog of the RW proposal is the preconditioned Crank-Nicolson (pCN) proposal introduced in Cotter et al. [18]. It takes the form

$$ q_{\text{pCN}}(\boldsymbol{m} \,|\, \boldsymbol{m}_i) = \mathcal{N}\Big(\boldsymbol{m}_{\text{pr}} + \sqrt{1 - \beta^2}\,(\boldsymbol{m}_i - \boldsymbol{m}_{\text{pr}}), \; \beta^2 \Gamma_{\text{prior}}\Big). \tag{18} $$

Notice that when β = 1, the pCN proposal is equal to the prior distribution. The MALA proposal was also adapted in Cotter et al. [18] to obtain the infinite-dimensional MALA (∞-MALA) proposal

$$ q_{\infty\text{-MALA}}(\boldsymbol{m} \,|\, \boldsymbol{m}_i) = \mathcal{N}\Big(\sqrt{1 - \beta^2}\,\boldsymbol{m}_i + \beta \frac{\sqrt{h}}{2}\big(\boldsymbol{m}_{\text{pr}} - \Gamma_{\text{prior}} \nabla \Phi(\boldsymbol{m}_i)\big), \; \beta^2 \Gamma_{\text{prior}}\Big), \tag{19} $$

where $\beta = 4\sqrt{h}/(4 + h)$ and h is a parameter that can be tuned. While the pCN and ∞-MALA proposals result in discretization-invariant Metropolis-Hastings algorithms, they suffer from the same deficiencies as their finite-dimensional RW and MALA analogs: they do not capture the posterior geometry.
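
A pCN step is particularly simple to implement because it only requires prior samples, never an application of Γ_prior^{-1}; a minimal sketch (generic NumPy, with sample_prior_fluctuation() standing in for a draw from N(0, Γ_prior)) is:

import numpy as np

def pcn_proposal(m_i, m_pr, beta, sample_prior_fluctuation):
    """pCN proposal (18): m' = m_pr + sqrt(1 - beta^2) (m_i - m_pr) + beta * xi,
    where xi ~ N(0, Γ_prior) is supplied by sample_prior_fluctuation()."""
    return m_pr + np.sqrt(1.0 - beta**2) * (m_i - m_pr) + beta * sample_prior_fluctuation()

A well-known property of pCN with a Gaussian prior is that the prior terms cancel in the MH acceptance ratio, so only the data misfit Φ needs to be evaluated for each proposed point.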

Several efforts have worked to minimize this deficiency; see, for example, Beskos et al. [9], Petra et al. [48], Pinski et al. [49], Rudolph and Sprungk [55]. We consider a generalization of the pCN proposal described in Pinski et al. [49]. It incorporates the MAP point and the posterior curvature information at that point into the pCN proposal; it is denoted by H-pCN and takes the form

$$ q_{\text{H-pCN}}(\boldsymbol{m} \,|\, \boldsymbol{m}_i) = \mathcal{N}\Big(\boldsymbol{m}_{\text{MAP}} + \sqrt{1 - \beta^2}\,(\boldsymbol{m}_i - \boldsymbol{m}_{\text{MAP}}), \; \beta^2 \mathbf{H}^{-1}\Big). \tag{20} $$

Another method that can exploit the posterior geometry is an extension of the ∞-MALA proposal discussed in Beskos et al. [9]:

$$ q_{\infty\text{-sMMALA}}(\boldsymbol{m} \,|\, \boldsymbol{m}_i) = \mathcal{N}\big(\mu'(\boldsymbol{m}_i), \; \Gamma'(\boldsymbol{m}_i)\big), \tag{21} $$

where

$$ \mu'(\boldsymbol{m}_i) = \sqrt{1 - \beta^2}\,\boldsymbol{m}_i + \beta \frac{\sqrt{h}}{2}\Big(\boldsymbol{m}_i - \mathbf{G}^{-1}\Gamma_{\text{prior}}^{-1}(\boldsymbol{m}_i - \boldsymbol{m}_{\text{pr}}) - \mathbf{G}^{-1}\nabla \Phi(\boldsymbol{m}_i)\Big), \tag{22} $$

$$ \Gamma'(\boldsymbol{m}_i) = \beta^2 \mathbf{G}^{-1}(\boldsymbol{m}_i). \tag{23} $$

This ∞-sMMALA proposal simplifies to ∞-MALA when G^{-1}(m_i) = Γ_prior. When G(m) is the Laplace approximation Hessian from (10), the ∞-sMMALA proposal simplifies to

$$ q_{\infty\text{-H-MALA}}(\boldsymbol{m} \,|\, \boldsymbol{m}_i) = \mathcal{N}\Big(\sqrt{1 - \beta^2}\,\boldsymbol{m}_i + \beta \frac{\sqrt{h}}{2}\big(\boldsymbol{m}_i - \mathbf{H}^{-1}\Gamma_{\text{prior}}^{-1}(\boldsymbol{m}_i - \boldsymbol{m}_{\text{pr}}) - \mathbf{H}^{-1}\nabla \Phi(\boldsymbol{m}_i)\big), \; \beta^2 \mathbf{H}^{-1}\Big), \tag{24} $$

which we denote by H-∞-MALA.

Alternative Transition Kernels. The proposal distributions above are classically considered in the context of a Metropolis-Hastings kernel. However, there are alternative transition kernels that also result in ergodic Markov chains. Here we consider transition kernels constructed from the delayed rejection approach of Mira et al. [43] as well as Metropolis-within-Gibbs kernels, which repeatedly use the Metropolis-Hastings rule on different conditional slices of the posterior distribution to construct the Markov chain. In particular, we consider the family of dimension-independent likelihood-informed (DILI) approaches [19, 21], which define a Metropolis-within-Gibbs sampler that inherits dimension-independent properties from an appropriate dimension-independent proposal.

The delayed rejection kernel allows multiple proposals to be attempted in each step of the Markov chain. This can be advantageous when using multiple proposals with complementary properties. For example, it is possible to start with a proposal that attempts to make large, ambitious jumps across the parameter space but may have a low acceptance probability, while falling back on a more conservative proposal that takes smaller steps with a larger probability of acceptance. Similarly, it is possible to start with a proposal that is more computationally efficient (e.g., does not require gradient information) but less likely to be accepted, while employing a more expensive proposal mechanism in a second stage to ensure the chain explores the space. In either case, if the first proposed move is rejected by the Metropolis-Hastings rule, another, more expensive proposal that is more likely to be accepted can be tried with an adjusted acceptance probability. More than two stages can also be employed. The details of delayed rejection are provided in Algorithm 2.

Algorithm 2: Drawing a sample from the delayed rejection kernel
Input: Current state m_i, posterior density π_post(m), proposals q_1(·|m_i), ..., q_J(·|m_i).
Output: Next state m_{i+1}.

/* Computes the probability of accepting the proposed point m′_j from DR stage j, given the
   previous point m_i in the chain and the j−1 points [m′_1, ..., m′_{j−1}] that were rejected
   in previous DR stages. */
Function AcceptProb(m_i, [m′_1, ..., m′_j]):
    γ ← (π_post(m′_j) / π_post(m_i)) · (q_j(m_i | m′_j) / q_j(m′_j | m_i)) ·
        ∏_{k=1}^{j−1} [ (q_k(m′_{j−k} | m′_j) / q_k(m′_k | m_i)) ·
                        (1 − AcceptProb(m′_j, [m′_{j−1}, m′_{j−2}, ..., m′_{j−k}])) /
                        (1 − AcceptProb(m_i, [m′_1, m′_2, ..., m′_k])) ]
    α ← min{1, γ}
    return α

/* Draws a sample from the delayed rejection kernel. */
Function DRKernel(m_i):
    for j ← 1 to J do
        m′_j ∼ q_j(·|m_i)                           /* Sample the j-th proposal. */
        α ← AcceptProb(m_i, [m′_1, ..., m′_j])      /* Calculate the acceptance probability. */
        u ∼ U[0, 1]
        if u < α then
            return m′_j                             /* Accept the current proposed point. */
    return m_i                                      /* All proposed points were rejected; return the current state. */
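
As a concrete special case, the following sketch implements a two-stage (J = 2) delayed-rejection step following the acceptance probability in Algorithm 2, working with log-densities (a generic NumPy sketch, not MUQ's implementation; prop1, prop2, log_q1, and log_q2 are assumed user-supplied helpers):

import numpy as np

def dr_two_stage(m_i, log_post, prop1, log_q1, prop2, log_q2):
    """One two-stage delayed-rejection step (Algorithm 2 with J = 2).

    prop*(m) draws from q_*(.|m); log_q*(a, b) evaluates log q_*(a | b).
    """
    def log_alpha1(x, y):
        return min(0.0, log_post(y) - log_post(x) + log_q1(x, y) - log_q1(y, x))

    # Stage 1: standard Metropolis-Hastings step with proposal q_1.
    m1 = prop1(m_i)
    if np.log(np.random.rand()) < log_alpha1(m_i, m1):
        return m1

    # Stage 2: fall back on proposal q_2 with the DR-adjusted acceptance probability.
    m2 = prop2(m_i)
    log_gamma = (log_post(m2) - log_post(m_i)
                 + log_q2(m_i, m2) - log_q2(m2, m_i)
                 + log_q1(m1, m2) - log_q1(m1, m_i)
                 + np.log1p(-np.exp(log_alpha1(m2, m1)))
                 - np.log1p(-np.exp(log_alpha1(m_i, m1))))
    if np.log(np.random.rand()) < min(0.0, log_gamma):
        return m2
    return m_i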

DILI divides the parameter space into a finite-dimensional subspace, which can be explored with standard proposal mechanisms, and a complementary infinite-dimensional space that can be explored with a dimension-independent approach, such as those described above. The resulting transition kernel is more complicated than the Metropolis-Hastings rule, but inherits the dimension-independent properties of the complementary-space proposal. The likelihood-informed subspace is computed using the generalized eigenvalue problem in (11). If an eigenvalue is larger than one, it indicates that the likelihood function dominates the prior density in the corresponding direction. The same low-rank structure used to approximate the posterior Hessian can therefore be used to decompose the parameter space into a likelihood-informed subspace (LIS) spanned by the columns of V_r and an orthogonal complementary space (CS). As shown in Algorithm 3, within each subspace a standard Metropolis-Hastings kernel is employed. As long as the kernel in the CS uses a dimension-independent proposal (typically pCN), the DILI sampler will remain dimension-independent. Unlike the original implementation described in Cui et al. [19], the MUQ implementation does not use a whitening transform and thus avoids computing any symmetric decomposition of the prior covariance. In general, the Hessian used in (11) can be adapted to capture more correlation structure. However, we did not find this necessary in the numerical experiments below.

Algorithm 3: Drawing a sample from the DILI kernel
Input: Current state m_i, current subspace V_r, subspace kernel K_s(·|r, c), complementary kernel K_c(·|r, c).
Output: Next state m_{i+1}.

/* Use Metropolis-within-Gibbs steps to draw a sample from the DILI kernel. */
Function DILIKernel(m_i):
    /* Split current state into LIS and CS components. */
    W_r ← Γ_prior^{-1} V_r
    r_i ← W_r^T m_i
    c_i ← (I − V_r W_r^T) m_i
    /* Take a step in the likelihood-informed subspace (LIS). */
    r′ ← K_s(·|r_i, c_i)
    /* Take a step in the complementary space (CS). */
    c′ ← K_c(·|r′, c_i)
    /* Compute the new location in the full space. */
    m′ = V_r r′ + c′
    return m′
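
The subspace splitting at the top of Algorithm 3 is just a Γ_prior^{-1}-weighted projection; a minimal sketch (dense NumPy, assuming the eigenvectors V_r from (11) and a dense Γ_prior are available) is:

import numpy as np

def dili_split(m, Vr, Gamma_prior):
    """Split m into LIS coefficients r and the complementary-space part c (Algorithm 3)."""
    Wr = np.linalg.solve(Gamma_prior, Vr)    # W_r = Γ_prior^{-1} V_r
    r = Wr.T @ m                             # coefficients in the likelihood-informed subspace
    c = m - Vr @ (Wr.T @ m)                  # complementary-space component
    return r, c

def dili_recombine(r_new, c_new, Vr):
    """Reassemble the full-space state: m' = V_r r' + c'."""
    return Vr @ r_new + c_new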

Assembling an MCMC Algorithm. It is possible to combine nearly any of the proposals and kernels described above, resulting in myriad possible MCMC algorithms. As suggested in Figure 2, there are three fundamental building blocks to an MCMC algorithm. The chain keeps track of previous points and allows computing Monte Carlo estimates. The kernel defines a mechanism for sampling the next state m_{i+1} given the value of the current state m_i and one or more proposal distributions. The proposal defines a position-specific probability distribution that can be easily sampled and has a density that can be efficiently evaluated. We mimic these abstract interfaces in our software design to define and test a large number of kernel-proposal combinations.

3.2 MCMC diagnostics

Two questions naturally arise when analyzing a length-N Markov chain [m_1, ..., m_N] produced by MCMC. First, has the chain converged to the stationary distribution? Second, what is the statistical efficiency of the chain? Most theoretical guarantees are asymptotic, and it is important to quantitatively answer these questions when employing finite-length MCMC chains. Based on these considerations, this section describes the diagnostics implemented in hIPPYlib-MUQ to check the convergence and statistical efficiency of high-dimensional MCMC chains.

3.2.1 Assessing Convergence. To assess convergence, we compute two different asymptotically unbiased estimators of the posterior covariance: one that is an overestimate for finite N and one that is an underestimate for finite N. As the ratio of these two estimates approaches one, we can be confident that the MCMC chain has converged (see, e.g., Brooks and Gelman [11], Gelman et al. [26], Vehtari et al. [64]).


Fig. 2. The flexible framework of hIPPYlib-MUQ allows many different combinations of transition kernels and proposal distributions to be employed. The components of the transition kernels defined in Algorithms 1-3 are shown (Metropolis-Hastings: chain, MH kernel, proposal; delayed rejection: chain, DR kernel, proposals 1, ..., J; DILI: chain, DILI kernel with an MH kernel and proposal for each of the LIS and CS). Note that each kernel can interact with any proposal distribution, which enables many different MCMC algorithms to be constructed from the same basic components. [Figure not reproduced; caption and block labels retained.]

The estimates are based on running M independent chains starting from randomly chosen points that are more disperse than the posterior distribution μ_post, where we define a "disperse" distribution as one that has a larger covariance than μ_post. Each chain has the same length N. Letting m_{ij} be the i-th MCMC sample in chain j, we define the within-sequence covariance matrix W and the between-sequence covariance matrix B as

$$ \mathbf{W} = \frac{1}{M(N-1)} \sum_{j=1}^{M} \sum_{i=1}^{N} (\mathbf{m}_{ij} - \bar{\mathbf{m}}_{.j})(\mathbf{m}_{ij} - \bar{\mathbf{m}}_{.j})^T, \qquad \bar{\mathbf{m}}_{.j} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{m}_{ij}, \tag{25} $$

$$ \mathbf{B} = \frac{N}{M-1} \sum_{j=1}^{M} (\bar{\mathbf{m}}_{.j} - \bar{\mathbf{m}}_{..})(\bar{\mathbf{m}}_{.j} - \bar{\mathbf{m}}_{..})^T, \qquad \bar{\mathbf{m}}_{..} = \frac{1}{M} \sum_{j=1}^{M} \bar{\mathbf{m}}_{.j}. \tag{26} $$

As pointed out in Brooks and Gelman [11], W and B can be combined to produce an estimate V of the posterior covariance that takes the form

$$ \mathbf{V} = \frac{N-1}{N} \mathbf{W} + \frac{M+1}{MN} \mathbf{B}. \tag{27} $$

The overdispersion of the initial points in each chain causes V to overestimate the posterior covariance for finite N. On the other hand, the average within-chain covariance W will tend to underestimate the covariance because the chains have not explored the entire parameter space. Comparing W and V thus provides a way of assessing convergence.

The $\hat{R}$ statistic of Gelman et al. [26] and Vehtari et al. [64] is a common way of comparing W and V. It uses the ratio of the diagonal components of V and W to construct a componentwise convergence diagnostic. For high-dimensional problems, it is more natural to consider a multivariate convergence diagnostic. We will therefore employ the multivariate potential scale reduction factor (MPSRF) of Brooks and Gelman [11], which is a natural extension of the componentwise $\hat{R}$ statistic. The MPSRF is defined by

$$ \text{MPSRF} = \sqrt{\max_{a} \frac{a^T \mathbf{V} a}{a^T \mathbf{W} a}} = \sqrt{\frac{N-1}{N} + \frac{M+1}{MN} \lambda_{\max}}, \tag{28} $$

where λ_max is the largest eigenvalue satisfying the generalized eigenvalue problem B v = λ W v.


Note that by construction MPSRF ≥ 1. When the MPSRF approaches 1, the variance within each sequence approaches the variance across sequences, thus indicating that each individual chain has converged to the target distribution. Following the recommendations of Vehtari et al. [64], we will consider the chains "converged" if MPSRF < 1.01.
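
The diagnostic in (25)-(28) is straightforward to compute from a stack of chains; a minimal NumPy/SciPy sketch (assuming chains is an array of shape (M, N, n)) is:

import numpy as np
from scipy.linalg import eigh

def mpsrf(chains):
    """Multivariate potential scale reduction factor (28) from M chains of length N."""
    M, N, n = chains.shape
    chain_means = chains.mean(axis=1)                 # per-chain means, eq. (25)
    grand_mean = chain_means.mean(axis=0)             # overall mean, eq. (26)

    # Within-sequence covariance W, eq. (25)
    centered = chains - chain_means[:, None, :]
    W = np.einsum('jik,jil->kl', centered, centered) / (M * (N - 1))

    # Between-sequence covariance B, eq. (26)
    dev = chain_means - grand_mean
    B = N * (dev.T @ dev) / (M - 1)

    # Largest generalized eigenvalue of B v = lambda W v
    lam_max = eigh(B, W, eigvals_only=True)[-1]
    return np.sqrt((N - 1) / N + (M + 1) / (M * N) * lam_max)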

3.2.2 Statistical Efficiency. The samples in an MCMC chain are generally correlated, which increases the variance of Monte Carlo estimators constructed with MCMC samples. For a quantity of interest G(m), the effective sample size (ESS) of a Markov chain is defined as the number of independent samples of the posterior that would be needed to estimate E[G] with the same statistical accuracy as an estimate from the Markov chain. The ESS is therefore a measure of how much information is contained in the MCMC chain. It is common to define the ESS for estimators of the posterior mean, i.e., E[G] = E[m]. Here we derive the ESS under this common assumption, but discuss alternatives that are better suited for high-dimensional parameter spaces in Section 3.2.3.

There are several ways of estimating the ESS. For instance, spectral approaches use the integrated autocorrelation of the MCMC chain to estimate the effective sample size (see, e.g., Gelman et al. [26], Wolff et al. [67]). Other common methods use the statistics of small sample batches (see, e.g., Flegal and Jones [25], Vats et al. [63]). MUQ provides implementations of both spectral and batch methods. Here we focus on the spectral formulation of the ESS, however, because it gives additional insight into the structure of MCMC chains. The ESS for component i of m is defined by

$$ \text{ESS}_i = \frac{MN}{1 + 2\sum_{t=1}^{\infty} \rho_{it}}, \tag{29} $$

where ρ_{it} is the autocorrelation function of component i of the MCMC chain at lag t. Here, the autocorrelation function ρ_{it} is estimated by the following formula [26]:

$$ \rho_{it} \approx \hat{\rho}_{it} = 1 - \frac{v_{it}}{2 V_{ii}}, \tag{30} $$

where V_{ii} is the i-th diagonal component of the posterior covariance estimate defined in (27) and v_{it} is the variogram defined by

$$ v_{it} = \frac{1}{M(N-t)} \sum_{j=1}^{M} \sum_{k=t+1}^{N} \big(m_{kj,i} - m_{(k-t)j,i}\big)^2. \tag{31} $$

In practice, $\hat{\rho}_{it}$ is noisy for large values of t, and we truncate the summation in (29) at some lag t′. Following common practice, we choose t′ ≥ 0 to be the lag for which the sum of successive autocorrelation estimates $\hat{\rho}_{2t'} + \hat{\rho}_{2t'+1}$ is negative [26].
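
Putting (29)-(31) together, a minimal NumPy sketch of the spectral ESS estimate for a single parameter component is given below; it assumes chains has shape (M, N) for that component and that V_ii is the corresponding diagonal entry of (27), and it uses a simplified version of the paired truncation rule.

import numpy as np

def ess_component(chains, V_ii):
    """Spectral ESS estimate (29) for one parameter component from M chains of length N."""
    M, N = chains.shape
    # Variogram-based autocorrelation estimates, eqs. (30)-(31)
    rho = np.empty(N - 1)
    rho[0] = 1.0
    for t in range(1, N - 1):
        v_t = np.mean((chains[:, t:] - chains[:, :-t]) ** 2)   # variogram (31)
        rho[t] = 1.0 - v_t / (2.0 * V_ii)                       # eq. (30)
    # Truncate where the sum of a pair of successive estimates turns negative
    T = N - 2
    for tp in range((N - 2) // 2):
        if rho[2 * tp] + rho[2 * tp + 1] < 0.0:
            T = 2 * tp - 1
            break
    return M * N / (1.0 + 2.0 * rho[1:T + 1].sum())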

3.2.3 Projection along the dominant eigenvectors. We note that the evaluation of the integrated autocorrelation time and ESS for all components of the parameter vector m is computationally intractable when m is high-dimensional. Moreover, a huge amount of disk storage is required to save all the samples before the ESS evaluation. To alleviate these issues for large-scale problems, we consider only the subspace spanned by the r dominant eigenvectors of the generalized eigensystem in (11). Specifically, we compute the autocorrelation time and ESS based on the coefficient vector c ∈ R^r defined by

$$ \mathbf{c} = \mathbf{V}_r^T \Gamma_{\text{prior}}^{-1} \mathbf{m}. \tag{32} $$

4 SOFTWARE FRAMEWORK

hIPPYlib-MUQ is a Python interface that integrates these two open-source software libraries into a unique software framework, allowing the user to implement state-of-the-art Bayesian inversion algorithms in a seamless way. In this framework, hIPPYlib is used to define the forward model, the prior, and the likelihood, to compute the maximum a posteriori (MAP) point, and to construct a Gaussian (Laplace) approximation of the posterior distribution based on approximations of the posterior covariance as a low-rank update of the prior [14]. MUQ is employed to exploit advanced MCMC methods to fully characterize the posterior distribution in non-Gaussian/nonlinear settings. hIPPYlib-MUQ offers a set of wrappers that encapsulate the functionality of hIPPYlib in such a way that various features of hIPPYlib can be accessed by MUQ. A key aspect of hIPPYlib-MUQ is that it enables the use of curvature-informed MCMC methods, which is crucial for efficient and scalable exploration of the posterior distribution for large-scale Bayesian inverse problems. We summarize in Figure 3 the main functionalities of hIPPYlib and MUQ and the integration of their complementary components.

Fig. 3. Description of the functionalities of hIPPYlib and MUQ and their interface. Orange and red boxes represent hIPPYlib and MUQ functionalities, respectively (hIPPYlib Model, hIPPYlib Algorithms, Model Evaluation & Sensitivities, Laplace Approximation; MUQ Modeling, MUQ Algorithms, MCMC Proposals & Kernels, ModPieces). Green boxes indicate external software libraries, FEniCS and PETSc, that provide parallel implementations of finite element discretizations and solvers. Arrows represent one-way or reciprocal interactions. [Figure not reproduced; caption and block titles retained.]

Figure 4 provides an overview of the Python classes implemented by the hIPPYlib-MUQ interface. Inherited from MUQ classes, the interface classes wrap the hIPPYlib functionalities needed to achieve curvature-informed MCMC sampling methods. These include:

(1) a prior Gaussian interface (e.g., the LaplaceGaussian class) to enable the use of hIPPYlib prior models (e.g., the LaplacianPrior class) in MUQ probability distribution models (e.g., the GaussianBase class);

(2) a likelihood interface (the Param2LogLikelihood class) to incorporate hIPPYlib likelihood models (the Model class) into the MUQ Bayesian modeling framework, so that MUQ can exploit the model evaluation (the parameter-to-observable map) and optionally its gradient and Hessian actions;

(3) a Laplace approximation interface (the LAPosteriorGaussian class) to provide access to the Laplace approximation of the posterior distribution generated by hIPPYlib (the GaussianLRPosterior class) from the MUQ modeling component (the ModPiece class).


[Figure 4 shows the class hierarchy: the interface classes LaplaceGaussian, BiLaplaceGaussian, LAPosteriorGaussian, and Param2LogLikelihood wrap the hIPPYlib classes LaplacianPrior, BiLaplacianPrior, GaussianLRPosterior, and Model, respectively, and inherit from the MUQ classes GaussianBase, Density, Distribution, and ModPiece.]

Fig. 4. Class hierarchy for the hIPPYlib-MUQ framework. Classes of hIPPYlib, MUQ, and the interface are colored in orange, red, and blue, respectively. Dashed arrows represent the inheritance relationship between two classes: the arrowhead attaches to the super-class and the other end attaches to the sub-class.

# Example code snippet
import muq.Modeling as mm
import hippylib2muq as hm

# ... Use hIPPYlib to define prior and model variables

# Convert hIPPYlib components to MUQ components
prior_density = hm.BiLaplaceGaussian(prior).AsDensity()
likelihood = hm.Param2LogLikelihood(model)

# Add all of the components to the graph
graph = mm.WorkGraph()
graph.AddNode(mm.IdentityOperator(dim), 'Parameter')
graph.AddNode(prior_density, 'Prior')
graph.AddNode(likelihood, 'Likelihood')
graph.AddNode(mm.DensityProduct(2), 'Posterior')

# Define right branch: Parameter -> Prior -> Posterior
graph.AddEdge('Parameter', 0, 'Prior', 0)
graph.AddEdge('Prior', 0, 'Posterior', 0)

# Define left branch: Parameter -> Likelihood -> Posterior
graph.AddEdge('Parameter', 0, 'Likelihood', 0)
graph.AddEdge('Likelihood', 0, 'Posterior', 1)

[Figure 5 (left) depicts the posterior as a graph with nodes Parameter (IdentityOperator), Prior (e.g., LaplaceGaussian), Likelihood (Param2LogLikelihood), and Posterior (DensityProduct): the input parameter node feeds both the prior and likelihood nodes, whose outputs feed the posterior node.]

Fig. 5. Graphical description of Bayesian posterior modeling using the hIPPYlib-MUQ software framework (left) and an example code snippet (right). In the left figure, class names of MUQ and the interface are colored in red and blue, respectively. The MUQ WorkGraph class provides a way to combine all the Bayesian posterior model components via its member functions AddNode and AddEdge. The MUQ IdentityOperator class identifies the input parameters, and its input argument dim represents the parameter dimension. The MUQ DensityProduct class defines the product of the prior and likelihood densities, and its input argument 2 is the number of input densities.

These interface classes can then be used to form a Bayesian posterior model governed by PDEs using the MUQ graphical modeling interface (WorkGraph), as shown in Figure 5, as well as to construct MCMC proposals.
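To illustrate how such a graph is subsequently used for sampling, the following hedged sketch sets up a Metropolis-Hastings chain with a Laplace-approximation-based pCN proposal. The option names and constructor arguments are assumptions based on our reading of MUQ's SamplingAlgorithms module and the hIPPYlib-MUQ tutorial, not a verbatim excerpt of the library API.

import muq.SamplingAlgorithms as ms

# Hedged sketch (names are assumptions): `graph` is the WorkGraph from Figure 5,
# `la_posterior` an LAPosteriorGaussian wrapping the hIPPYlib Laplace approximation,
# and `m0` a starting vector, e.g. a sample from the Laplace approximation.
problem = ms.SamplingProblem(graph.CreateModPiece("Posterior"))

opts = {"NumSamples": 25000, "BurnIn": 0, "Beta": 0.4, "PrintLevel": 0}
proposal = ms.CrankNicolsonProposal(opts, problem, la_posterior)  # H-pCN-type proposal
kernel = ms.MHKernel(opts, problem, proposal)                     # Metropolis-Hastings kernel
sampler = ms.SingleChainMCMC(opts, [kernel])

samples = sampler.Run([m0])   # returns a sample collection for post-processing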


hIPPYlib-MUQ also implements the MCMC convergence diagnostics described in Section 3.2. These include the potential scale reduction factor and its extension to multivariate parameter cases [11], the autocorrelation function, and the effective sample size. A detailed description of all classes and functionalities of hIPPYlib-MUQ can also be found at https://hippylib2muq.readthedocs.io/en/latest/modules.html.

5 NUMERICAL ILLUSTRATION

The objective of this section is to showcase applications of the integrated software framework discussed in the previous sections via a step-by-step implementation procedure. We focus on comparing the performance of several MCMC methods available in the software framework. For the illustration, we first revisit the model problem considered in Villa et al. [65], an inverse problem of reconstructing the log-diffusion coefficient field in a two-dimensional elliptic partial differential equation. We then consider a nonlinear 𝑝-Poisson problem in three dimensions for which the forcing term of a natural boundary condition is inferred. In this section, we summarize the Bayesian formulation of the example problems and present numerical results obtained using the proposed software framework. The accompanying Jupyter notebook provides a detailed description of the hIPPYlib-MUQ implementations; see https://hippylib2muq.readthedocs.io/en/latest/tutorial.html.

5.1 Inferring the coefficient field in a two-dimensional Poisson PDE

We first consider the coefficient field inversion in a Poisson partial differential equation given pointwise noisy state measurements. We begin by describing the forward model setup and the quantity of interest (the log flux through the bottom surface), followed by the definition of the prior and the likelihood distributions. We next present the Laplace approximation of the posterior and apply several MCMC methods to characterize the posterior distribution, as well as the predictive posterior distribution of the scalar quantity of interest. The scalability of the proposed methods with respect to the parameter dimension is then assessed in a mesh refinement study. Finally, a comparison between curvature-informed and classical MCMC methods is shown for a different noise level and number of observation points.

5.1.1 Forward model. Let Ξ© βŠ‚ R𝑑 (𝑑 = 2, 3) be an open bounded domain with boundary πœ•Ξ© = πœ•Ξ©π· βˆͺ πœ•Ξ©π‘, πœ•Ξ©π· ∩ πœ•Ξ©π‘ = βˆ…. Given a realization of the uncertain parameter field π‘š, the state variable 𝑒 is governed by

βˆ’βˆ‡ Β· (𝑒^π‘š βˆ‡π‘’) = 𝑓   in Ξ©,
𝑒 = 𝑔   on πœ•Ξ©π·,    (33)
𝑒^π‘š βˆ‡π‘’ Β· n = β„Ž   on πœ•Ξ©π‘,

where 𝑓 is a volume source term, 𝑔 and β„Ž are the prescribed Dirichlet and Neumann boundary data, respectively, and n is the outward unit normal vector.

The weak form of (33) reads as follows: Find 𝑒 ∈ V𝑔 such that

βŸ¨π‘’^π‘š βˆ‡π‘’, βˆ‡π‘βŸ© = βŸ¨π‘“, π‘βŸ© + βŸ¨β„Ž, π‘βŸ©πœ•Ξ©π‘   βˆ€π‘ ∈ V0,    (34)

where

V𝑔 = {𝑣 ∈ 𝐻1(Ξ©) | 𝑣 = 𝑔 on πœ•Ξ©π·},   V0 = {𝑣 ∈ 𝐻1(Ξ©) | 𝑣 = 0 on πœ•Ξ©π·}.    (35)

Above, we denote the 𝐿2-inner product over Ξ© by ⟨·, ·⟩ and that over πœ•Ξ©π‘ by ⟨·, Β·βŸ©πœ•Ξ©π‘.


Fig. 6. Prior mean (leftmost) and three sample fields drawn from the prior distribution for the Poisson problem.

As a quantity of interest, the log of the normal flux through the bottom boundary πœ•Ξ©π‘ βŠ‚ πœ•Ξ©π· is considered. Specifically, we define the quantity of interest G(π‘š) as

G(π‘š) = ln { βˆ’ βˆ«πœ•Ξ©π‘ 𝑒^π‘š βˆ‡π‘’ Β· n 𝑑𝑠 }.    (36)

In this example we consider a unit square domain in RΒ² with no source term (𝑓 = 0), no normal flux (β„Ž = 0) on the left and right boundaries, and the Dirichlet condition imposed on the top boundary (𝑔 = 1) and the bottom boundary (𝑔 = 0).

For the spatial discretization, we use quadratic finite elements for the state variable (and also for the adjoint variable) and linear finite elements for the parameter variable. For the numerical results presented in Sections 5.1.5 and 5.1.7, the computational domain is discretized using a regular mesh with 2,048 triangular elements. This leads to 4,225 and 1,089 degrees of freedom for the state and parameter variables, respectively. For the scalability results presented in Section 5.1.6, the mesh is refined with up to four levels of uniform refinement, leading to 263,169 and 66,049 degrees of freedom for the state and parameter variables, respectively, on the finest level.
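The following is a minimal FEniCS/dolfin sketch of this setup, assuming the log-coefficient enters exponentially as in the weak form (34) and assuming the convention of expressing the forward problem through a variational form handle; the mesh size, variable names, and the handle name pde_varf are illustrative, and the hIPPYlib object that would consume the form is not shown.

import dolfin as dl

# Illustrative discretization of the Poisson forward model (33)-(34)
mesh = dl.UnitSquareMesh(32, 32)
Vh_u = dl.FunctionSpace(mesh, "Lagrange", 2)   # quadratic elements: state/adjoint
Vh_m = dl.FunctionSpace(mesh, "Lagrange", 1)   # linear elements: parameter

f = dl.Constant(0.0)                           # no volume source

def pde_varf(u, m, p):
    # weak form (34) with h = 0: <e^m grad u, grad p> - <f, p>
    return dl.exp(m) * dl.inner(dl.grad(u), dl.grad(p)) * dl.dx - f * p * dl.dx

# Dirichlet data: g = 1 on the top boundary, g = 0 on the bottom boundary
bc_top = dl.DirichletBC(Vh_u, dl.Constant(1.0), "on_boundary && near(x[1], 1.0)")
bc_bot = dl.DirichletBC(Vh_u, dl.Constant(0.0), "on_boundary && near(x[1], 0.0)")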

5.1.2 Prior model. As discussed in Section 2, we choose the prior to be a Gaussian distribution N(π‘špr, Cprior) with Cprior = A⁻², where A is a Laplacian-like operator given as

Aπ‘š = βˆ’π›Ύ βˆ‡ Β· (Θ βˆ‡π‘š) + π›Ώπ‘š   in Ξ©,
      Θ βˆ‡π‘š Β· n + π›½π‘š          on πœ•Ξ©.    (37)

Here, 𝛽 βˆβˆšπ›Ύπ›Ώ is the optimal Robin coefficient introduced to alleviate undesirable boundary

effects [22], and an anisotropic tensor Θ is of the form

Θ =

[\1 sin2 (𝛼) (\1 βˆ’ \2) sin(𝛼) cos𝛼

(\1 βˆ’ \2) sin(𝛼) cos𝛼 \2 cos2 (𝛼)

]. (38)

For this example we take 𝛾 = 0.1, 𝛿 = 0.5, 𝛽 =βˆšπ›Ύπ›Ώ/1.42, \1 = 2.0, \2 = 0.5 and 𝛼 = πœ‹/4. Figure 6

shows the prior meanπ‘špr and three samples from the prior distribution.

5.1.3 Observations with noise and the likelihood. We generate state observations at 𝑙 random locations uniformly distributed over [0.05, 0.95]Β² by solving the forward problem on the finest mesh with the true parameter field π‘štrue (here a sample from the prior is used) and then adding random Gaussian noise to the resulting state values; see Figure 7. The number of observations 𝑙 is set to 300 for the experiments in Sections 5.1.5 and 5.1.6, while 𝑙 = 60 for the comparison in Section 5.1.7. The vector of synthetic observations is given by

𝒅 = B𝑒 + 𝜼, (39)


Fig. 7. True parameter field (left) and the corresponding state field (right) for the Poisson problem. The locations of the observation points are marked as white squares in the right figure.

where B is a linear observation operator restricting the state solution to the 𝑙 observation points. The additive noise vector 𝜼 has mutually independent components that are normally distributed with zero mean and standard deviation 𝜎 = 0.005 (Sections 5.1.5 and 5.1.6) or 𝜎 = 0.1 (Section 5.1.7). The likelihood function is then given by

πœ‹like (𝒅 |π‘š) ∝ exp(βˆ’1

2 βˆ₯B 𝑒 (π‘š) βˆ’ 𝒅obsβˆ₯2πšͺβˆ’1noise

), (40)

where πšͺnoise = 𝜎2I.

5.1.4 Laplace approximation of the posterior. We next construct the Laplace approximation of the posterior, a Gaussian distribution N(π‘šMAP, H(π‘šMAP)⁻¹) with mean equal to the MAP point and covariance given by the inverse of the Hessian of the negative log-posterior evaluated at the MAP point. The MAP point is obtained by minimizing the negative log-posterior, i.e.,

min_{π‘š ∈ M} J(π‘š) := (1/2) β€–B𝑒(π‘š) βˆ’ 𝒅obsβ€–Β²_{πšͺnoise⁻¹} + (1/2) β€–π‘š βˆ’ π‘šprβ€–Β²_{Cprior⁻¹}.    (41)

We employ the inexact Newton-CG algorithm implemented in hIPPYlib to solve the above optimization problem. We refer the reader to Villa et al. [65] for a detailed description of the algorithm and the expressions for the gradient and Hessian actions of the negative log-posterior J(π‘š). As pointed out in Section 2, explicitly computing the Hessian is prohibitive for large-scale problems, as this entails solving two forward-like PDEs as many times as the number of parameters. To make the operations with the Hessian scalable with respect to the parameter dimension, we invoke a low-rank approximation of the data misfit part of the Hessian, retaining only the π‘Ÿ eigenvectors corresponding to the directions most significantly informed by the data [65].

Figure 8 shows the eigenspectrum of the prior-preconditioned data misfit Hessian. The double pass randomized algorithm provided by hIPPYlib, with an oversampling factor of 20, is used to accurately compute the dominant eigenpairs. We see that the eigenvalues fall below 1 after around the 60th eigenvalue, indicating that keeping 60 eigenpairs is sufficient for the low-rank approximation. Figure 8 also shows four eigenvectors, which, as expected, illustrate that eigenvectors corresponding to smaller eigenvalues display more fluctuations.
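To make the double pass idea concrete, the sketch below gives a generic randomized eigensolver for a symmetric operator applied matrix-free (cf. Halko et al. [30]). It is illustrative only: hIPPYlib's implementation solves the generalized eigenproblem and works with the prior-weighted inner product, and all names here are placeholders.

import numpy as np

def double_pass_randomized_eig(apply_H, n, k, p=20, rng=None):
    """Hedged sketch of a double pass randomized eigensolver.

    apply_H : callable returning H @ X for a symmetric n-by-n operator H
              (here, the prior-preconditioned data misfit Hessian).
    k, p    : number of requested eigenpairs and oversampling factor.
    """
    rng = np.random.default_rng() if rng is None else rng
    Omega = rng.standard_normal((n, k + p))
    Q, _ = np.linalg.qr(apply_H(Omega))      # first pass: approximate the range of H
    T = Q.T @ apply_H(Q)                     # second pass: small projected operator
    lam, S = np.linalg.eigh(T)
    idx = np.argsort(lam)[::-1][:k]          # keep the k largest eigenvalues
    return lam[idx], Q @ S[:, idx]           # approximate eigenvalues and eigenvectors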

In Figure 9, we depict the MAP point and three samples drawn from the Laplace approximation of the posterior.


Fig. 8. Logarithmic plot of the π‘Ÿ = 100 dominant eigenvalues of the prior-preconditioned data misfit Hessian and the eigenvectors corresponding to the 1st, 4th, 16th, and 64th largest eigenvalues for the Poisson problem.

Fig. 9. The MAP point (leftmost) and three sample fields drawn from the Laplace approximation of the posterior distribution for the Poisson problem.

5.1.5 Exploring the posterior using MCMC methods. In this section, we apply the advanced MCMC algorithms discussed in Section 3 to explore the posterior and compare their performance.

In particular, we consider pCN, MALA, ∞-MALA, DR, DILI, and their Hessian-informed counterparts. For each method, we simulate 20 independent MCMC chains, each with 25,000 samples, and hence draw a total of 500,000 samples from the posterior. A sample from the Laplace approximation of the posterior is chosen as the starting point for the chains.

Table 1 shows the convergence diagnostics and computational efficiency of the MCMC samples. MPSRF and ESS are computed from projections of the parameter samples along the first 25 dominant eigenvectors of the prior-preconditioned data misfit Hessian at the MAP point. Table 1 reports the minimum, maximum, and average ESS over all 25 projections.

The last column in Table 1 reports the number of forward and/or adjoint PDE solves required to draw a single independent sample (the average ESS is used). This quantity can be used to measure the sampling efficiency and to rank the methods in terms of computational efficiency. Under this metric, DILI-MAP is the most efficient method and requires only 202 PDE solves per effective sample. DR with H-∞-MALA (213 NPS/ES), DR with H-MALA (215 NPS/ES), and H-pCN (216 NPS/ES) are close seconds.


Table 1. Comparison of the performance of several MCMC methods for the Poisson problem: pCN, MALA, ∞-MALA, DR, DILI, and their Hessian-informed versions. Acceptance rate (AR), multivariate potential scale reduction factor (MPSRF), and effective sample size (ESS) are reported for convergence diagnostics. MPSRF and ESS are computed from projections of the parameter samples along the first 25 dominant eigenvectors of the prior-preconditioned data misfit Hessian at the MAP point. Two values of AR are listed for the DR and DILI methods, which are for the first and the second proposal moves, respectively. We also provide the number of forward and/or adjoint PDE solves per effective sample (NPS/ES) for sampling efficiency. We use 20 MCMC chains, each with 25,000 iterations (500,000 samples in total). The numbers in parentheses in each method name are the parameter values used (𝛽 for pCN, 𝜏 for MALA, β„Ž for ∞-MALA, and 𝛽 and 𝜏 for DILI). The number in parentheses next to the minimum and maximum ESS indicates the corresponding eigenvector index.

Method | AR (%) | MPSRF | Min. ESS (index) | Max. ESS (index) | Avg. ESS | NPS/ES
pCN (5.0E-3) | 24 | 2.629 | 25 (24) | 225 (8) | 84 | 5,952
MALA (6.0E-6) | 48 | 2.642 | 26 (22) | 874 (5) | 148 | 10,135
∞-MALA (1.0E-5) | 57 | 2.943 | 25 (23) | 1,102 (5) | 160 | 9,375
H-pCN (4.0E-1) | 27 | 1.192 | 64 (1) | 3,598 (15) | 2,314 | 216
H-MALA (6.0E-2) | 60 | 1.014 | 545 (1) | 8,868 (19) | 6,459 | 232
H-∞-MALA (1.0E-1) | 71 | 1.016 | 582 (1) | 8,417 (18) | 5,905 | 254
DR (H-pCN (1.0E0), H-MALA (6.0E-2)) | (4, 61) | 1.013 | 641 (1) | 12,522 (17) | 9,222 | 215
DR (H-pCN (1.0E0), H-∞-MALA (2.0E-1)) | (4, 48) | 1.011 | 613 (1) | 12,812 (17) | 9,141 | 213
DILI-PRIOR (0.8, 0.1) | (60, 33) | 1.064 | 314 (1) | 4,667 (13) | 3,216 | 548
DILI-LA (0.8, 0.1) | (83, 36) | 1.017 | 562 (1) | 10,882 (17) | 7,192 | 245
DILI-MAP (0.8, 0.1) | (77, 22) | 1.006 | 1,675 (1) | 10,271 (20) | 8,692 | 202

Fig. 10. Autocorrelation function estimate (30) of the quantity of interest G (36) for several MCMC methods (pCN, MALA, ∞-MALA, H-pCN, H-MALA, H-∞-MALA, DR with H-pCN/H-MALA, DR with H-pCN/H-∞-MALA, DILI-PRIOR, DILI-LA, and DILI-MAP).

We next assess the convergence of the MCMC samples of the quantity of interest G(π‘š) in (36) to the predictive posterior distribution of G(π‘š): the autocorrelation function estimates of the quantity of interest G (36) are shown in Figure 10 (here, we use formula (30) to account for the use of multiple chains), the trace plots from three independent MCMC chains are depicted in Figure 11, and histograms of all the MCMC samples (with the number of counts normalized) are shown in Figure 12. Lastly, we compare estimates of moments of the quantity of interest for the different sampling strategies.


Fig. 11. Trace plots of the quantity of interest G (36) from three MCMC chains (out of 20 independent chains). Different colors (here blue, green, and red) represent the traces from each chain.

Fig. 12. Probability density function estimate of the quantity of interest G (36) computed from several MCMC methods (one panel per method: pCN, MALA, ∞-MALA, H-pCN, H-MALA, H-∞-MALA, DR with H-pCN/H-MALA, DR with H-pCN/H-∞-MALA, DILI-PRIOR, DILI-LA, and DILI-MAP). All 500,000 samples, 20 chains with 25,000 samples each, are pooled together in the histogram. The number of counts is normalized so that the plot represents a probability density function.

For each MCMC chain, the π‘˜th (π‘˜ = 1, 2, 3) moment of the quantity of interest computed from the parameter samples m𝑖 (𝑖 = 1, 2, . . . , 𝑁; 𝑁 = 25,000) is computed as

Gπ‘˜ = (1/𝑁) βˆ‘_{𝑖=1}^{𝑁} Gπ‘˜(m𝑖).    (42)


Fig. 13. Box plots of the first, second, and third moment estimates (Gπ‘˜, π‘˜ = 1, 2, 3) of the quantity of interest (42) computed by using several MCMC methods. The central mark is the median; the lower and upper quartiles represent the 25th and 75th percentiles, respectively. Whiskers extend to the extreme data points that fall within the distance from the lower or upper quartiles to 1.5 times the interquartile range (the distance between the upper and lower quartiles); all other data points are plotted as outliers. The number of data points for each method is 20, the number of independent MCMC chains.

The results are reported in Figure 13 as box-and-whisker plots. From the results presented in this section, we draw the following conclusions:

β€’ The Hessian information at the MAP point plays an important role in enhancing the sampling performance of the MCMC methods. In fact, MCMC chains without the Hessian information did not converge over the entire length of the chain and remained localized around the starting point. Convergence was achieved only when the MCMC proposal exploited the Laplace approximation of the posterior, which incorporates the Hessian information.
β€’ DILI-MAP shows the best sampling efficiency in terms of the number of forward and/or adjoint PDE solves per effective sample. Note that the parameter values used in the MCMC methods (e.g., 𝛽 and/or 𝜏) were not optimal, and a different result may be obtained with different parameter values.


Fig. 14. Logarithmic plot of the π‘Ÿ = 100 dominant eigenvalues of the prior-preconditioned data misfit Hessian computed using four different meshes. The mesh is uniformly refined from the coarsest (mesh 1) to the finest (mesh 4).

Table 2. Acceptance rate (AR), multivariate potential scale reduction factor (MPSRF), and effective sample size (ESS) of the posterior samples generated by using the H-pCN method for different parameter dimensions. We use 𝛽 = 0.4 for the H-pCN method and draw in total 500,000 samples (20 MCMC chains, each with 25,000 iterations). MPSRF and ESS are computed from the projection of the samples along the first 25 dominant eigenvectors of the prior-preconditioned data misfit Hessian at the MAP point. The number in parentheses next to the minimum and maximum ESS indicates the corresponding eigenvector index.

Dimension (state, parameter) | AR (%) | MPSRF | Min. ESS (index) | Max. ESS (index) | Avg. ESS
(4,225, 1,089) | 27 | 1.192 | 64 (1) | 3,598 (15) | 2,314
(16,641, 4,225) | 24 | 1.333 | 63 (1) | 3,221 (18) | 1,830
(66,049, 16,641) | 23 | 1.075 | 209 (1) | 3,073 (11) | 1,940
(263,169, 66,049) | 22 | 1.117 | 102 (2) | 3,276 (15) | 1,767

We further study the performance of MCMC methods under different problem settings to provide more insight into the practical use of the hIPPYlib-MUQ framework.

5.1.6 Scalability of Hessian-informed pCN. Here we investigate the effect of the mesh resolution on the sampling performance. A curvature-aware MCMC method, H-pCN, is selected with 𝛽 = 0.4 for this test. The dimensions of the parameter and the state variables from the coarsest mesh (mesh 1) to the finest mesh (mesh 4) are (1,089, 4,225), (4,225, 16,641), (16,641, 66,049), and (66,049, 263,169), respectively.

We follow the same problem setting as before and use the same synthetic observations (obtained from the true parameter field generated on the finest mesh) for all levels. Figure 14 shows the π‘Ÿ = 100 dominant eigenvalues of the prior-preconditioned data misfit Hessian. One observes that the eigenspectrum is virtually independent of the mesh refinement.

To assess the convergence of the MCMC methods, in Table 2 we report the acceptance rate, MPSRF, and ESS of the posterior samples. The MPSRF and ESS are computed from projections of the parameter samples along the first 25 dominant eigenvectors of the prior-preconditioned data misfit Hessian at the MAP point, as discussed in Section 3.2.3. In Figure 15, we present the autocorrelation function estimates (30) and show histograms for the quantity of interest G (36). The results show


Fig. 15. Left: Autocorrelation function estimate (30) of the quantity of interest G (36). Right: Probability density function estimate of the quantity of interest G (36); all the samples, 20 chains with 25,000 samples each (500,000 in total), are pooled together in the histogram; the number of counts is normalized so that the plot represents a probability density function. We use the H-pCN method (𝛽 = 0.4) to draw samples. We consider four different meshes which are increasingly refined from the coarsest (mesh 1) to the finest (mesh 4).

Table 3. Acceptance rate (AR), multivariate potential scale reduction factor (MPSRF), and effective sample size (ESS) of the posterior samples generated by using the pCN and H-pCN methods for the larger noise case. MPSRF and ESS are computed from the projection of the samples along the first 5 dominant eigenvectors of the prior-preconditioned data misfit Hessian at the MAP point. We use 𝛽 = 0.2 for the pCN method and 𝛽 = 0.9 for the H-pCN method, respectively, and draw in total 500,000 samples (20 MCMC chains, each with 25,000 iterations). The number in parentheses next to the minimum and maximum ESS indicates the corresponding eigenvector index.

Method | AR (%) | MPSRF | Min. ESS (index) | Max. ESS (index) | Avg. ESS
pCN | 35 | 1.004 | 3,014 (5) | 16,100 (2) | 8,684
H-pCN | 61 | 1.006 | 890 (1) | 37,691 (5) | 11,401

that while the ESS decreases as the dimension increases, the convergence of the samples, as measured by the MPSRF and the autocorrelation function, is almost independent of the mesh resolution.

5.1.7 MCMC results with larger uncertainty. So far, we have considered inverse problems with a large number of observations (𝑙 = 300) and small noise (𝜎 = 0.005). In some cases, however, only a limited number of measurements is available, with larger noise, and one may expect the posterior to be less concentrated. In this section we extend our study to such a problem and report a comparison of two MCMC methods, the pCN and the H-pCN.

For consistency, for this study we use the same setup as in the first example, but with fewer observations (𝑙 = 60) and larger noise (𝜎 = 0.1). We summarize in Table 3 and Figure 16 the convergence diagnostics of the pCN and the H-pCN methods. The results reveal that increasing the uncertainty in the observations leads to an improved performance of both the pCN and the H-pCN methods. As expected, the Hessian-informed H-pCN still largely outperforms pCN both in terms


Fig. 16. Left: Autocorrelation function estimate (30) of the quantity of interest G (36). Right (top): Trace plots of the quantity of interest G (36) computed from the parameter samples of three independent chains. Right (bottom): Probability density function estimate of the quantity of interest G (36); all 500,000 samples (20 chains with 25,000 samples each) are pooled together in the histogram. We use the pCN (𝛽 = 0.2) and the H-pCN (𝛽 = 0.9) MCMC methods. These results are for the larger noise case.

of MPSRF and ESS; however, it is worth noting that, in this case, pCN is still able to adequately sample the posterior distribution.

5.2 Boundary condition inversion in a three-dimensional 𝑝-Poisson nonlinear PDE

In this second example, we consider a nonlinear PDE in three space dimensions for which we seek to infer unknown boundary data from pointwise uncertain state observations. Specifically, the forward governing equations are given by

βˆ’βˆ‡ Β·(|βˆ‡π‘’ |π‘βˆ’2

πœ– βˆ‡π‘’)= 𝑓 in Ξ©,

𝑒 = 𝑔 on πœ•Ξ©π· , (43)

|βˆ‡π‘’ |π‘βˆ’2πœ– βˆ‡π‘’ Β· n =π‘š on πœ•Ξ© \ πœ•Ξ©π· ,

with 1 ≀ 𝑝 ≀ ∞. Note that the 𝑝-Laplacian, βˆ‡ Β· (|βˆ‡π‘’|^(π‘βˆ’2) βˆ‡π‘’), is singular when 𝑝 < 2 and degenerate when 𝑝 > 2 at points where βˆ‡π‘’ = 0 [12, 36], so a regularization term πœ– (here we take πœ– = 1.0 Γ— 10⁻⁸) is introduced in the above equation via |βˆ‡π‘’|πœ– = √(|βˆ‡π‘’|Β² + πœ–). The 𝑝-Laplacian is a nonlinear counterpart of the Laplacian operator and appears in many nonlinear diffusion problems (e.g., non-Newtonian fluids), where the nonlinear diffusion is modeled by a power law.
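As a minimal illustration, the regularized weak form of (43) can be written in FEniCS/dolfin as sketched below, assuming 𝑝 = 3 and πœ– = 10⁻⁸ as in this example; the names (grad_norm_eps, pde_varf, ds_bottom) are illustrative, and the boundary measure over the bottom surface is assumed to be defined elsewhere.

import dolfin as dl

p_exp = 3.0        # p in the p-Laplacian (this example)
eps = 1.0e-8       # regularization parameter

def grad_norm_eps(u):
    # regularized gradient norm |grad u|_eps = sqrt(|grad u|^2 + eps)
    return dl.sqrt(dl.inner(dl.grad(u), dl.grad(u)) + dl.Constant(eps))

def pde_varf(u, m, q, ds_bottom):
    # weak form of (43) with f = 0: the Neumann datum m enters through the
    # bottom-surface boundary measure ds_bottom (assumed defined elsewhere)
    return (grad_norm_eps(u) ** (p_exp - 2.0) * dl.inner(dl.grad(u), dl.grad(q)) * dl.dx
            - m * q * ds_bottom)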

In this example, we consider a thin brick domain Ξ© = [0, 1]Β² Γ— [0, 0.05] with zero volume source term (𝑓 = 0) and we assume 𝑝 = 3. Homogeneous Dirichlet boundary conditions (𝑔 = 0) are prescribed on the lateral boundaries and no normal flux is applied on the top boundary surface. We aim to estimate the normal flux π‘š on the bottom boundary surface from state observations measured on the top boundary surface.

We discretize Ξ© using a regular tetrahedral grid and use linear finite elements for all of the state, adjoint, and parameter variables. The dimension of each variable after discretization is 233,289.


Fig. 17. Left: True parameter field (true normal flux on the bottom surface) of the 𝑝-Poisson problem. Middle: Corresponding state field and 𝑙 = 300 observation points (white square marks) on the top surface. Right: The MAP point.

Table 4. Convergence diagnostics for the 𝑝-Poisson problem: acceptance rate (AR), multivariate potential scale reduction factor (MPSRF), and effective sample size (ESS) of the projection of the parameter samples along the first 25 eigenvectors of the prior-preconditioned data misfit Hessian at the MAP point. We use the H-pCN method (𝛽 = 0.9) with 20 chains, each with 25,000 iterations (500,000 samples in total).

AR (%) | MPSRF | Min. ESS | Max. ESS | Avg. ESS
50 | 1.000 | 27,075 | 62,005 | 49,469

The prior is taken as a Gaussian with zero mean and Cprior = (βˆ’π›ΎΞ” + 𝛿𝐼)⁻² with the Robin boundary condition π›Ύβˆ‡π‘š Β· n + π›½π‘š imposed on πœ•Ξ©. Here we take 𝛾 = 1, 𝛿 = 1, and 𝛽 = 0.7. In particular, the value of 𝛽 was chosen following [22] to mitigate boundary artifacts in the prior marginal variance. Synthetic state observations are created at 𝑙 = 300 random locations uniformly distributed on the top surface by solving the forward problem with the true parameter field π‘štrue, generated by sampling the prior, and then adding Gaussian noise (here we take 𝜎 = 0.005 for the noise vector). Figure 17 illustrates the true parameter field on the bottom boundary, the locations of the observations on the top surface, and the MAP point obtained by solving the optimization problem of minimizing the negative log-posterior. The Laplace approximation of the posterior is then constructed based on the low-rank factorization of the data misfit Hessian at the MAP point. The spectrum of the prior-preconditioned data misfit Hessian indicates that the number of dominant eigenvalues (larger than 1) is about 50.

5.2.1 MCMC results for characterizing the posterior. We present MCMC sampling results for the uncertain boundary condition. In this example we only consider the H-pCN method with 𝛽 = 0.9 and run 20 independent MCMC chains, each with 25,000 iterations (500,000 samples are generated in total). For each MCMC run, a sample from the Laplace approximation of the posterior is taken as the starting point.

As before, we consider the quantity of interest defined by

G = βˆ«πœ•Ξ©π‘™ |βˆ‡π‘’|^(π‘βˆ’2) βˆ‡π‘’ Β· n 𝑑𝒙,    (44)

where πœ•Ξ©π‘™ is the lateral boundary surfaces. Note that the above quantity of interest is evaluatedfrom the state field 𝑒 which is the solution of the nonlinear forward problem (43) given a realizationof the parameter fieldπ‘š (the boundary condition on the bottom surface).Table 4 lists convergence diagnostics of the MCMC simulation. The parameter samples are

projected along the first 25 eigenvectors of the prior-preconditioned data misfit Hessian at the


Fig. 18. Left: Autocorrelation function estimate (30) of the quantity of interest G (44). Middle: Trace plots of the quantity of interest G (44) computed using parameter samples from three independent MCMC chains (colored in blue, green, and red). Right: Probability density function estimate of the quantity of interest G (44); all 500,000 samples are pooled together in the histogram. We use the H-pCN method (𝛽 = 0.9).

MAP point, and the MPSRF and the ESS are evaluated based on this projection. We also estimate the quantity of interest G (44) using the parameter samples and illustrate its autocorrelation function, trace plots (three independent MCMC chains), and histograms (all samples) in Figure 18, where it is observed that the MCMC chains mix well and reach stationarity.

6 CONCLUSION

We have presented a robust and scalable software framework for the solution of large-scale Bayesian inverse problems governed by PDEs. The software integrates two complementary open-source software libraries, hIPPYlib and MUQ, resulting in a unique software framework that addresses the prohibitive nature of the Bayesian solution of inverse problems governed by PDEs. The main objectives of the proposed software framework are to (1) provide to domain scientists a suite of sophisticated and computationally efficient MCMC methods that exploit Bayesian inverse problem structure; and (2) allow researchers to easily implement new methods and compare against the state of the art.

The integration of the two libraries allows advanced MCMC methods to exploit the geometry and intrinsic low-dimensionality of parameter space, leading to efficient and scalable exploration of the posterior distribution. In particular, the Laplace approximation of the posterior is employed to generate high-quality MCMC proposals. This approximation is based on the inverse of the Hessian of the log-posterior, made tractable via a low-rank approximation of the Hessian of the log-likelihood. Numerical experiments on linear and nonlinear PDE-based Bayesian inverse problems illustrate the ability of Laplace-based proposals to accelerate MCMC sampling by factors of ∼ 50Γ—.

Despite the fast and dimension-independent convergence of these advanced structure-exploiting MCMC methods, many Bayesian inverse problems governed by expensive-to-solve PDEs remain out of reach. For example, the results of Section 5.1.5 for the Poisson coefficient inverse problem indicate that O(10⁢) PDE solves may still be required even with the most efficient MCMC methods. In such cases, hIPPYlib-MUQ can be used as a prototyping environment to study new methods that further exploit problem structure, for example through the use of various reduced models (e.g., [20]) or via advanced Hessian approximations that go beyond low rank [2, 3].

Future versions of hIPPYlib-MUQ will feature parallel implementations of MCMC methods. The resulting multilevel parallelism (within PDE solves, and across MCMC chains) will allow the solution of even more complex PDE-based Bayesian inverse problems with higher-dimensional parameter spaces.


Software Availability. hIPPYlib-MUQ is distributed under the GNU General Public License version 3 (GPL3). The hIPPYlib-MUQ project is hosted on GitHub (https://github.com/hippylib/hippylib2muq) and uses Travis-CI for continuous integration. hIPPYlib-MUQ uses semantic versioning. The results presented in this work were obtained with hIPPYlib-MUQ version 0.2.0, hIPPYlib version 3.0.0, and MUQ version 0.3.5. A Docker image [41] containing the pre-installed software and examples is available at https://hub.docker.com/r/ktkimyu/hippylib2muq. hIPPYlib-MUQ documentation is hosted on ReadTheDocs (https://hippylib2muq.readthedocs.io). Users are encouraged to join the hIPPYlib and MUQ workspaces on Slack to connect with other users, get help, and discuss new features; see https://hippylib.github.io/#slack-channel and https://mituq.bitbucket.io for more information on how to join.

ACKNOWLEDGMENTS

This work was supported by the U.S. National Science Foundation, Software Infrastructure for Sustained Innovation (SI2: SSE & SSI) Program under grants ACI-1550593, ACI-1550547, and ACI-1550487, and the Division of Mathematical Sciences under CAREER grant 1654311. MP and YM were also supported in part by Office of Naval Research MURI grant N00014-20-1-2595. OG was also supported in part by Department of Energy Advanced Scientific Computing Research grants DE-SC0021239 and DE-SC0019303. The authors gratefully acknowledge computing time on the Multi-Environment Computer for Exploration and Discovery (MERCED) cluster at UC Merced, which was funded by National Science Foundation Grant No. ACI-1429783.

REFERENCES
[1] Volkan Akçelik, George Biros, Omar Ghattas, Judith Hill, David Keyes, and Bart van Bloeman Waanders. 2006. Parallel PDE-constrained optimization. In Parallel Processing for Scientific Computing, M. Heroux, P. Raghaven, and H. Simon (Eds.). SIAM.
[2] N. Alger, V. Rao, A. Meyers, T. Bui-Thanh, and O. Ghattas. 2019. Scalable matrix-free adaptive product-convolution approximation for locally translation-invariant operators. SIAM Journal on Scientific Computing 41, 4 (2019), A2296–A2328. https://arxiv.org/abs/1805.06018
[3] Ilona Ambartsumyan, Wajih Boukaram, Tan Bui-Thanh, Omar Ghattas, David Keyes, Georg Stadler, George Turkiyyah, and Stefano Zampini. 2020. Hierarchical Matrix Approximations of Hessians Arising in Inverse Problems Governed by PDEs. SIAM Journal on Scientific Computing 42, 5 (2020), A3397–A3426.
[4] Yves F. AtchadΓ©. 2006. An adaptive version for the Metropolis adjusted Langevin algorithm with a truncated drift. Methodology and Computing in Applied Probability 8 (2006), 235–254.
[5] Satish Balay, Shrirang Abhyankar, Mark F. Adams, Jed Brown, Peter Brune, Kris Buschelman, Lisandro Dalcin, Alp Dener, Victor Eijkhout, William D. Gropp, Dinesh Kaushik, Matthew G. Knepley, Dave A. May, Lois Curfman McInnes, Richard Tran Mills, Todd Munson, Karl Rupp, Patrick Sanan, Barry F. Smith, Stefano Zampini, and Hong Zhang. 2018. PETSc Web page. http://www.mcs.anl.gov/petsc
[6] Satish Balay, Shrirang Abhyankar, Mark F. Adams, Jed Brown, Peter Brune, Kris Buschelman, Victor Eijkhout, William D. Gropp, Dinesh Kaushik, Matthew G. Knepley, Lois Curfman McInnes, Karl Rupp, Barry F. Smith, and Hong Zhang. 2014. PETSc Web page. http://www.mcs.anl.gov/petsc
[7] Johnathan M Bardsley, Tiangang Cui, Youssef M Marzouk, and Zheng Wang. 2020. Scalable optimization-based sampling on function space. SIAM Journal on Scientific Computing 42, 2 (2020), A1317–A1347.
[8] E. B. Becker, G. F. Carey, and J. T. Oden. 1981. Finite Elements: An Introduction, Vol I. Prentice Hall, Englewood Cliffs, New Jersey.
[9] Alexandros Beskos, Mark Girolami, Shiwei Lan, Patrick E Farrell, and Andrew M Stuart. 2017. Geometric MCMC for infinite-dimensional inverse problems. J. Comput. Phys. 335 (2017), 327–351.
[10] Alfio Borzì and Volker Schulz. 2012. Computational Optimization of Systems Governed by Partial Differential Equations. SIAM.
[11] Stephen P Brooks and Andrew Gelman. 1998. General Methods for Monitoring Convergence of Iterative Simulations. Journal of Computational and Graphical Statistics 7, 4 (Dec. 1998), 434–455. https://doi.org/10.1080/10618600.1998.10474787
[12] Jed Brown. 2010. Efficient Nonlinear Solvers for Nodal High-Order Finite Elements in 3D. Journal of Scientific Computing 45, 1 (2010), 48–63. https://doi.org/10.1007/s10915-010-9396-8


[13] Tan Bui-Thanh, Carsten Burstedde, Omar Ghattas, James Martin, Georg Stadler, and Lucas C. Wilcox. 2012. Extreme-scale UQ for Bayesian inverse problems governed by PDEs. In SC12: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Gordon Bell Prize finalist.
[14] T. Bui-Thanh, O. Ghattas, J. Martin, and G. Stadler. 2013. A computational framework for infinite-dimensional Bayesian inverse problems Part I: The linearized case, with application to global seismic inversion. SIAM Journal on Scientific Computing 35, 6 (2013), A2494–A2523.
[15] Ben Calderhead. 2014. A general construction for parallelizing Metropolis-Hastings algorithms. Proceedings of the National Academy of Sciences 111, 49 (2014), 17408–17413.
[16] George Casella and Edward I. George. 1992. Explaining the Gibbs sampler. The American Statistician 46, 3 (1992), 167–174.
[17] Patrick R Conrad and Youssef M Marzouk. 2013. Adaptive Smolyak pseudospectral approximations. SIAM Journal on Scientific Computing 35, 6 (2013), A2643–A2670. https://doi.org/10.1137/120890715
[18] S. L. Cotter, G. O. Roberts, A. M. Stuart, and D. White. 2012. MCMC methods for functions: modifying old algorithms to make them faster. (2012). Submitted.
[19] T. Cui, K.J.H. Law, and Y.M. Marzouk. 2016. Dimension-independent likelihood-informed MCMC. J. Comput. Phys. 304 (2016), 109–137.
[20] Tiangang Cui, Youssef Marzouk, and Karen Willcox. 2016. Scalable posterior approximations for large-scale Bayesian inverse problems via likelihood-informed parameter and state reduction. J. Comput. Phys. 315 (2016), 363–387.
[21] Tiangang Cui and Olivier Zahm. 2021. Data-free likelihood-informed dimension reduction of Bayesian inverse problems. Inverse Problems 37, 4 (2021), 045009.
[22] Yair Daon and Georg Stadler. 2018. Mitigating the Influence of Boundary Conditions on Covariance Operators Derived from Elliptic PDEs. Inverse Problems and Imaging 12, 5 (2018), 1083–1102. arXiv:1610.05280
[23] Tim J. Dodwell, Christian Ketelsen, Robert Scheichl, and Aretha L. Teckentrup. 2019. Multilevel Markov chain Monte Carlo. SIAM Rev. 61, 3 (2019), 509–545.
[24] M. Evans and T. Swartz. 2000. Approximating integrals via Monte Carlo and deterministic methods. Vol. 20. OUP Oxford.
[25] James M Flegal and Galin L Jones. 2010. Batch means and spectral variance estimators in Markov chain Monte Carlo. The Annals of Statistics 38, 2 (2010), 1034–1070.
[26] Andrew Gelman, John B Carlin, Hal S Stern, and Donald B Rubin. 2004. Bayesian data analysis.
[27] Mark Girolami and Ben Calderhead. 2011. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73, 2 (2011), 123–214.
[28] Gene H. Golub and Charles F. Van Loan. 1996. Matrix Computations (third ed.). Johns Hopkins University Press, Baltimore, MD.
[29] Heikki Haario, Eero Saksman, and Johanna Tamminen. 2001. An Adaptive Metropolis Algorithm. Bernoulli 7, 2 (Sep. 2001), 223–242. https://doi.org/10.2307/3318737
[30] Nathan Halko, Per Gunnar Martinsson, and Joel A. Tropp. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53, 2 (2011), 217–288.
[31] Jouni Hartikainen and Simo SΓ€rkkΓ€. 2010. Kalman filtering and smoothing solutions to temporal Gaussian process regression models. In 2010 IEEE International Workshop on Machine Learning for Signal Processing. IEEE, 379–384. https://doi.org/10.1109/MLSP.2010.5589113
[32] W. Keith Hastings. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 1 (1970), 97–109.
[33] Tobin Isaac. 2015. Scalable, Adaptive Methods for Forward and Inverse Problems in Continental-Scale Ice Sheet Modeling. Ph.D. Dissertation. The University of Texas at Austin.
[34] Jari Kaipio and Erkki Somersalo. 2005. Statistical and Computational Inverse Problems. Applied Mathematical Sciences, Vol. 160. Springer-Verlag New York. https://doi.org/10.1007/b138659
[35] Finn Lindgren, HΓ₯vard Rue, and Johan LindstrΓΆm. 2011. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73, 4 (2011), 423–498. https://doi.org/10.1111/j.1467-9868.2011.00777.x
[36] Peter Lindqvist. 2017. Notes on the p-Laplace equation. Number 161. University of JyvΓ€skylΓ€.
[37] Anders Logg, Kent-Andre Mardal, and Garth N. Wells (Eds.). 2012. Automated Solution of Differential Equations by the Finite Element Method. Lecture Notes in Computational Science and Engineering, Vol. 84. Springer. https://doi.org/10.1007/978-3-642-23099-8
[38] Tristan Marshall and Gareth Roberts. 2012. An Adaptive Approach to Langevin MCMC. Statistics and Computing 22, 5 (Sept. 2012), 1041–1057. https://doi.org/10.1007/s11222-011-9276-6
[39] James Martin, Lucas C Wilcox, Carsten Burstedde, and Omar Ghattas. 2012. A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion. SIAM Journal on Scientific Computing 34, 3 (2012), A1460–A1487.


[40] Youssef Marzouk, Tarek Moselhy, Matthew Parno, and Alessio Spantini. 2016. Sampling via Measure Transport: An Introduction. Springer International Publishing, 1–41. https://doi.org/10.1007/978-3-319-11259-6_23-1
[41] Dirk Merkel. 2014. Docker: Lightweight Linux Containers for Consistent Development and Deployment. Linux J. 2014, 239, Article 2 (2014). http://dl.acm.org/citation.cfm?id=2600239.2600241
[42] Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. 1953. Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics 21, 6 (1953), 1087–1092. https://doi.org/10.1063/1.1699114
[43] Antonietta Mira et al. 2001. On Metropolis-Hastings algorithms with delayed rejection. Metron 59, 3-4 (2001), 231–241.
[44] R. M. Neal. 2010. Handbook of Markov Chain Monte Carlo. Chapman & Hall / CRC Press, Chapter: MCMC using Hamiltonian dynamics.
[45] Art B Owen. 2013. Monte Carlo theory, methods and examples. (2013).
[46] Matthew Parno, Andrew Davis, Patrick Conrad, and Y. M. Marzouk. 2014. MIT Uncertainty Quantification (MUQ) Library. https://muq.mit.edu
[47] Matthew D Parno and Youssef M Marzouk. 2018. Transport map accelerated Markov chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification 6, 2 (2018), 645–682. https://doi.org/10.1137/17M1134640
[48] Noemi Petra, James Martin, Georg Stadler, and Omar Ghattas. 2014. A computational framework for infinite-dimensional Bayesian inverse problems: Part II. Stochastic Newton MCMC with application to ice sheet inverse problems. SIAM Journal on Scientific Computing 36, 4 (2014), A1525–A1555.
[49] Frank J Pinski, Gideon Simpson, Andrew M Stuart, and Hendrik Weber. 2015. Algorithms for Kullback–Leibler approximation of probability measures in infinite dimensions. SIAM Journal on Scientific Computing 37, 6 (2015), A2733–A2757.
[50] S. J. Press. 2003. Subjective and Objective Bayesian Statistics: Principles, Methods and Applications. Wiley, New York.
[51] Carl Edward Rasmussen and Christopher K. I. Williams. 2005. Gaussian Processes for Machine Learning. The MIT Press.
[52] Christian P. Robert and George Casella. 2005. Monte Carlo Statistical Methods (Springer Texts in Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA.
[53] Gareth O Roberts, Jeffrey S Rosenthal, et al. 2004. General state space Markov chains and MCMC algorithms. Probability Surveys 1 (2004), 20–71.
[54] Gareth O. Roberts and Osnat Stramer. 2003. Langevin Diffusions and Metropolis-Hastings Algorithms. Methodology and Computing in Applied Probability 4 (2003), 337–357.
[55] D. Rudolph and B. Sprungk. 2018. On a Generalization of the Preconditioned Crank-Nicolson Metropolis Algorithm. Foundations of Computational Mathematics 18, 2 (2018), 309–343.
[56] S. M. Stigler. 1986. Laplace's 1774 Memoir on Inverse Probability. Statist. Sci. 1, 3 (1986), 359–363. https://doi.org/10.1214/ss/1177013620
[57] G. Strang and G. J. Fix. 1988. An Analysis of the Finite Element Method. Wellesley-Cambridge Press, Wellesley, MA.
[58] Andrew M. Stuart. 2010. Inverse problems: A Bayesian perspective. Acta Numerica 19 (2010), 451–559. https://doi.org/10.1017/S0962492910000061
[59] Albert Tarantola. 2005. Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM, Philadelphia, PA. xii+342 pages.
[60] L. Tierney and J. B. Kadane. 1986. Accurate Approximations for Posterior Moments and Marginal Densities. J. Amer. Statist. Assoc. 81, 393 (1986), 82–86. https://doi.org/10.1080/01621459.1986.10478240
[61] The Trilinos Project Team. 2020 (accessed May 22, 2020). The Trilinos Project Website. https://trilinos.github.io
[62] Fredi TrΓΆltzsch. 2010. Optimal Control of Partial Differential Equations: Theory, Methods and Applications. Graduate Studies in Mathematics, Vol. 112. American Mathematical Society.
[63] Dootika Vats, James M Flegal, and Galin L Jones. 2019. Multivariate output analysis for Markov chain Monte Carlo. Biometrika 106, 2 (2019), 321–337.
[64] Aki Vehtari, Andrew Gelman, Daniel Simpson, Bob Carpenter, and Paul-Christian BΓΌrkner. 2020. Rank-Normalization, Folding, and Localization: An Improved RΜ‚ for Assessing Convergence of MCMC (with Discussion). Bayesian Analysis 16, 2 (2020), 1–26. https://doi.org/10.1214/20-ba1221 arXiv:1903.08008v5
[65] Umberto Villa, Noemi Petra, and Omar Ghattas. 2021. hIPPYlib: An Extensible Software Framework for Large-Scale Inverse Problems Governed by PDEs: Part I: Deterministic Inversion and Linearized Bayesian Inference. ACM Trans. Math. Softw. 47, 2, Article 16 (April 2021), 34 pages. https://doi.org/10.1145/3428447
[66] David Williams. 1991. Probability with Martingales. Cambridge University Press.
[67] Ulli Wolff, Alpha Collaboration, et al. 2004. Monte Carlo errors with less errors. Computer Physics Communications 156, 2 (2004), 143–153.
[68] R. Wong. 2001. Asymptotic Approximations of Integrals. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9780898719260


[69] Olivier Zahm, Tiangang Cui, Kody Law, Alessio Spantini, and Youssef Marzouk. 2018. Certified dimension reduction in nonlinear Bayesian inverse problems. Preprint (2018).
