TRANSCRIPT
Introduction to Bayesian Methods and
Uncertainty Quantification
Sarah Wesolowski
Salisbury University
APS Division of Nuclear Physics meeting, 2019
1 / 20
What is "Bayesian" inference?
P. Gregory, "Bayesian Logical Data Analysis for the Physical Sciences"
2 / 20
Comparison of credible and confidence intervals
Bayesian probability:
- probabilities treated as degree of plausibility
- more natural interpretation for quantities like model parameters

Frequentist probability:
- probability is long-run frequency
- relies on the idea of identical repeats

- p% credible interval: there is p% probability that the true, unknown value lies in the interval
- p% confidence interval: will cover the true value of the quantity over p% of experiments
See references for more nuance, philosophy, and debates.
3 / 20
Uncertainty Quantification in Nuclear Physics
To produce meaningful experimental measurements and theoretical predictions, it is essential to quantify uncertainties!

$$y_{\rm th} + \delta y_{\rm th} = y_{\rm exp} + \delta y_{\rm exp}$$

Theory discrepancy $\delta y_{\rm th}$, made up of the following:
- missing physics
- numerical/method errors
- fitting to uncertain data

Notes:
- likely to be "systematic"
- not usually fully quantified
- often assumed to be normal

Experimental discrepancy $\delta y_{\rm exp}$, made up of the following:
- counting statistics
- background and selection effects
- systematic uncertainties

Notes:
- systematic errors may not be well understood, or may be inflated
- often assumed to be normal
4 / 20
Practical details
Probability of $A$ being true given that $B$ is true: $P(A|B)$

- "Given information" $I$: inclusion of prior information (physics!)
- Bayesian pdfs follow all the same rules of probability:

$$P(A|B) + P(\bar{A}|B) = 1$$

$$P(A, B|I) = P(A|I)\,P(B|A,I) = P(B|I)\,P(A|B,I)$$

Bayes' theorem is a simple rearrangement of the product rule:

$$P(A|B,I) = \frac{P(B|A,I)\,P(A|I)}{P(B|I)}$$

- Can develop prescriptions for combining sources of uncertainty
- Many frequentist procedures have a clear Bayesian interpretation.
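As a quick numerical check of these rules (an illustration added here, not from the slides; the numbers are invented), consider a discrete proposition $A$ ("signal present") and datum $B$ ("detector fires"):

```python
# Minimal sketch with made-up numbers: Bayes' theorem for discrete propositions.
p_A = 0.01                 # prior P(A|I)
p_B_given_A = 0.95         # likelihood P(B|A,I)
p_B_given_notA = 0.05      # false-positive rate P(B|Abar,I)

# Evidence from the sum and product rules:
# P(B|I) = P(B|A,I) P(A|I) + P(B|Abar,I) P(Abar|I)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' theorem: P(A|B,I) = P(B|A,I) P(A|I) / P(B|I)
p_A_given_B = p_B_given_A * p_A / p_B
print(f"P(A|B) = {p_A_given_B:.3f}")    # ~0.16: the prior matters!

# Normalization rule: P(A|B,I) + P(Abar|B,I) = 1
p_notA_given_B = p_B_given_notA * (1 - p_A) / p_B
assert abs(p_A_given_B + p_notA_given_B - 1) < 1e-12
```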
5 / 20
Uncertainty quantification issues
Main problem: given the available information, what is the probability distribution (pdf) of uncertainties?
Entangled problems of UQ
And more... Design of experiments, sensitivity analysis, etc.
6 / 20
Bayesian parameter estimation
Consider a model or theory with parameters $\vec{a} = \{a_0, a_1, \ldots, a_{k-1}\}$ which we wish to constrain with measured data $D = \{d_1, d_2, \ldots, d_N\}$, given background information $I$.

Goal: estimate the pdf $pr(\vec{a}\,|\,D, I)$

Bayes' theorem allows us to actually compute this pdf:

$$pr(\vec{a}\,|\,D, I) = \frac{pr(D\,|\,\vec{a}, I)\; pr(\vec{a}\,|\,I)}{pr(D\,|\,I)}$$

Names for each of these terms:
- Posterior: $pr(\vec{a}\,|\,D, I)$
- Likelihood: $pr(D\,|\,\vec{a}, I)$
- Prior: $pr(\vec{a}\,|\,I)$
- Evidence / marginal likelihood (normalization factor): $pr(D\,|\,I) = \int d\vec{a}\; pr(D\,|\,\vec{a}, I)\, pr(\vec{a}\,|\,I)$

7 / 20
Example parameter estimation problem
The problem (developed to mock up an effective field theory expansion):

- Generate synthetic data with indep. Gaussian noise from "real-world" $g(x)$
- Given data and series expansion, estimate coefficients of Taylor series (about $x = 0$) up to some order. The truncated polynomial is the theory $y_{\rm th}$, with $k+1$ parameters $\vec{a}$:

$$y_{\rm th}(x) = \sum_{i=0}^{k} a_i x^i$$

- This is linear optimization, unlike most problems in nuclear physics.
- Problem is simple but helpful for intuition about many statistical issues.

The "real-world" function:

$$g(x) = \left(\frac{1}{2} + \tan\left(\frac{\pi}{2}\, x\right)\right)^2 = 0.25 + 1.57x + 2.47x^2 + 1.29x^3 + \cdots$$

Schindler and Phillips, Annals Phys. 324, 3 (2009)
SW et al., JPG 43, 074001 (2016)
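A minimal sketch of how such synthetic data could be generated (the grid of x values, the 5% noise level, and the seed are assumptions for illustration, not taken from the talk):

```python
import numpy as np

def g(x):
    """The "real-world" function of Schindler and Phillips."""
    return (0.5 + np.tan(np.pi * x / 2)) ** 2

# Assumed setup: N = 10 points with 5% relative Gaussian noise.
rng = np.random.default_rng(42)
x_data = np.linspace(0.05, 0.5, 10)
sigma = 0.05 * g(x_data)                        # standard deviations sigma_i
d_data = g(x_data) + rng.normal(0.0, sigma)     # data d_i = g(x_i) + noise
```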
8 / 20
Example parameter estimation problem
Given data $D$, estimate and plot the posterior pdf of the parameters, $pr(\vec{a}\,|\,D, I)$.

The resulting pdf can be used to propagate one piece of uncertainty to the final prediction $y = y_{\rm th} + \delta y_{\rm th}$. Here

$$\delta y_{\rm th} = (\delta y_{\rm th})_{\rm params} + (\delta y_{\rm th})_{\rm trunc}$$

Also have model discrepancy due to truncation of the Taylor polynomial!

[Figure: the data $D$ and the "real-world" function $g(x)$]
9 / 20
Example parameter estimation problem
$$pr(\vec{a}\,|\,D, I) = \frac{pr(D\,|\,\vec{a}, I)\; pr(\vec{a}\,|\,I)}{pr(D\,|\,I)}$$

becomes, with normal, independent data $D = \{d_i\}_{i=1}^{N}$ with standard deviations $\{\sigma_i\}_{i=1}^{N}$ and a uniform (bounded) prior $pr(\vec{a}\,|\,I) \propto 1$,

$$pr(\vec{a}\,|\,D, I) \propto e^{-\chi^2/2}, \qquad \chi^2(\vec{a}) = \sum_{i=1}^{N} \frac{\left(d_i - y_{\rm th}(x_i; \vec{a})\right)^2}{\sigma_i^2}$$

This is the standard least-squares optimization result. For this example, it can be solved analytically using standard results.
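For this linear model the posterior under the uniform prior is Gaussian, with the usual least-squares solution as its mean. A sketch of the standard analytic result (reusing x_data, d_data, and sigma from the earlier sketch; the order k is an illustrative choice):

```python
# Analytic least-squares posterior for y_th(x; a) = sum_i a_i x^i.
k = 3                                            # assumed polynomial order
A = np.vander(x_data, k + 1, increasing=True)    # design matrix, A[i, j] = x_i**j
W = np.diag(1.0 / sigma**2)                      # inverse data covariance

cov = np.linalg.inv(A.T @ W @ A)                 # posterior covariance of a
a_hat = cov @ A.T @ W @ d_data                   # posterior mean = chi^2 minimizer
print("posterior mean:", a_hat)
print("1-sigma errors:", np.sqrt(np.diag(cov)))
```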
10 / 20
Simple parameter estimation problem
What happens to this problem when you use least-squares,

$$pr(\vec{a}\,|\,D, I) \propto e^{-\chi^2/2}\,?$$

- Underfitting: $y_{\rm th} = a_0$
- Overfitting: $y_{\rm th} = \sum_{i=0}^{5} a_i x^i$
11 / 20
Simple parameter estimation problem
Use information that Taylor series coefficients are "natural" (EFT principle).

Regular least-squares vs. Gaussian prior with width $\bar{a} = 5$:

$$pr(\vec{a}\,|\,I) \propto e^{-\vec{a}^{\,2}/(2\bar{a}^2)}$$

$$pr(\vec{a}\,|\,D, I) \propto e^{-\chi^2/2 \,-\, \vec{a}^{\,2}/(2\bar{a}^2)}$$

(see also: regularized least-squares)
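In code, the naturalness prior adds a single term to the normal equations; this is exactly ridge-regularized least squares. A sketch continuing the example above, with the prior width from the slide:

```python
# Gaussian "naturalness" prior pr(a|I) ~ exp(-a.a / (2 abar^2)).
abar = 5.0                                   # prior width used in the talk
prior_prec = np.eye(k + 1) / abar**2         # prior precision matrix

cov_reg = np.linalg.inv(A.T @ W @ A + prior_prec)   # posterior covariance
a_hat_reg = cov_reg @ A.T @ W @ d_data              # posterior mean, shrunk toward zero
```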
12 / 20
More issues in this problem
- Check for robustness to the prior pdf $pr(\vec{a}\,|\,I)$
- Include impact of higher-order terms (theory discrepancy):

$$g(x) = \sum_{i=0}^{k} a_i x^i + \sum_{i=k+1}^{\infty} a_i x^i$$

- Simple to do in this linear problem, but nonlinear problems:
  - are not analytic
  - sampling the objective function $\chi^2(\vec{a})$ can be costly
  - have to resort to sampling: Markov chain Monte Carlo (MCMC)
  - non-normality, multimodality
- Validation
- UQ: theory discrepancy and parameter uncertainty (and everything else)
13 / 20
Marginalization
Marginalization "integrates out" nuisance parameters by summing over acomplete set of possibilities:
Previous example: include effects of unconstrained higher-order terms!
Useful for introducing auxiliary parameters, e.g.:
where is the natural width of the prior. This can be used to avoid too tightlyspecifying a single value for such a parameter.
But we must now specify a prior on the hyperparameter .
Previous plots
Marginalization is used to plot and study pdfs as well, for example:
pr(#) = ∫ +$%(#, $) = ∫ +$%(#|$)%($)
pr( |,) = ∫ + pr( | , ,) pr( |,)( ⃗ (̄ ( ⃗ (̄ (̄
(̄
(̄
pr( |,) = !( − 5)(̄ (̄
pr( |*, ,) = ∫ + ⋯ + pr( |*, ,)(0 (1 (' ( ⃗14 / 20
Approximating a pdf as a histogram
Sampling $pr(\vec{a}\,|\,D, I)$ may be expensive for nonlinear problems and intensive observables.

We use MCMC sampling to evaluate the pdf at many values of $\vec{a}$, and histogram the samples obtained.

- Many flavors of MCMC are on the market in a variety of languages. Some Python packages: emcee [used here], pymc3, pySTAN. Also nested sampling: pyMultiNest
- Sampling can still be infeasible. Get around the problem with emulation:
  - Bayesian optimization: Ekström et al., JPhysG 46, 9 (2019)
  - Eigenvector continuation: Frame et al., PRL 121, 032501 (2018)
- Conjugate priors can simplify things because the posterior is analytic: see Melendez, SW, et al., PRC 100 (2019)
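A minimal emcee sketch for the running polynomial example (the talk used emcee; the specific setup below, such as walker count, chain length, and starting ball, is an illustrative assumption, and it reuses names from the earlier sketches):

```python
import emcee

def log_posterior(a, x, d, sigma, abar=5.0):
    """log pr(a|D,I) up to a constant: -chi^2/2 - a.a/(2 abar^2)."""
    resid = (d - np.polynomial.polynomial.polyval(x, a)) / sigma
    return -0.5 * np.sum(resid**2) - 0.5 * np.sum(a**2) / abar**2

ndim, nwalkers = k + 1, 32
start = a_hat_reg + 1e-3 * rng.normal(size=(nwalkers, ndim))  # ball around the mode
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_posterior,
                                args=(x_data, d_data, sigma))
sampler.run_mcmc(start, 5000)
samples = sampler.get_chain(discard=1000, flat=True)  # flattened posterior draws of a
```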
15 / 20
Approximating a pdf as a histogram
Marginalization is trivial over various parameters: just use the samples in the parameters you want:
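With flat MCMC samples in hand, a marginal is just a column selection; for instance (a sketch using the chain from the previous slide):

```python
import matplotlib.pyplot as plt

# The 1D marginal posterior pr(a_0|D,I) is simply a histogram of column 0;
# all other parameters are "integrated out" automatically by the sampling.
plt.hist(samples[:, 0], bins=50, density=True)
plt.xlabel("a_0")
plt.show()
# The corner package (corner.corner(samples)) shows all 1D/2D marginals at once.
```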
16 / 20
Quantitative model comparison with Bayes
Very generic formulation: model 1 ($M_1$) and model 2 ($M_2$). Examples:

- totally different theories for a phenomenon
- hierarchical models (one order vs. the next)
- anything else that can be used to compute data

Using Bayes' theorem and assuming the models are a priori equally likely:

$$\frac{pr(M_1\,|\,D, I)}{pr(M_2\,|\,D, I)} = \frac{\int d\vec{a}_1\; pr(D\,|\,\vec{a}_1, M_1, I)\, pr(\vec{a}_1\,|\,M_1, I)}{\int d\vec{a}_2\; pr(D\,|\,\vec{a}_2, M_2, I)\, pr(\vec{a}_2\,|\,M_2, I)}$$

"Evidence ratio" or "odds factor"

- Expensive to compute if integrals are not analytic
- MCMC samples of the posterior don't help (need the normalization!)
- Tricky to interpret

In our simple example, though, it is computable and has a clear interpretation (see the sketch below).
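For the linear model with Gaussian noise and a Gaussian prior, the evidence is analytic: marginally the data are Gaussian with covariance $\bar{a}^2 A A^{\rm T} + \Sigma$. A sketch of the odds factor between two truncation orders (the function name and order choices are illustrative, continuing the running example):

```python
from scipy.stats import multivariate_normal

def log_evidence(order, x, d, sigma, abar=5.0):
    """Analytic log pr(D|M_k,I): marginally d ~ N(0, abar^2 A A^T + diag(sigma^2))."""
    A = np.vander(x, order + 1, increasing=True)
    cov_d = abar**2 * (A @ A.T) + np.diag(sigma**2)
    return multivariate_normal(mean=np.zeros(len(d)), cov=cov_d).logpdf(d)

# Odds factor between truncation orders k=2 and k=3 (models equally likely a priori):
odds_23 = np.exp(log_evidence(2, x_data, d_data, sigma)
                 - log_evidence(3, x_data, d_data, sigma))
```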
17 / 20
Evidence calculation for Taylor series model
- Compute evidence at ascending orders $k = 0, 1, 2, \ldots$ in the theory
- Compare least-squares (brown squares) with Gaussian prior (blue diamonds)
- Natural result of Occam's razor for the LS result. Saturates for the Gaussian prior.
18 / 20
Our work using Bayes for EFT
The BUQEYE (Bayesian Uncertainty Quantification: Errors in Your EFT) collaboration works with low-energy nuclear EFTs:

- parameter estimation of EFT low-energy constants
- using Gaussian processes to model EFT truncation error
- diagnostics to validate uncertainty estimates
19 / 20
Summary
- Bayesian statistics is ideal for UQ problems in physics
- Explicit, quantitative incorporation of prior information
- Like traditional methods, it can be expensive to fully implement
- But taking the time to understand and sample pdfs can yield dividends:
  - proper inclusion of theory errors
  - are pdfs Gaussian, and are covariance approximations justified?
- Many approximation methods (like MCMC sampling) and emulation schemes are possible

Bayesian methods and statistical analysis give a new window into nuclear theory, allowing diagnostics and validation of theory expectations from a data-driven perspective!
Visit the BUQEYE collaboration online at buqeye.github.io
A very nice place to learn more about Bayes for nuclear physicists: nucleartalent.github.io/Bayes2019/
20 / 20
References
P. Gregory, "Bayesian Logical Data Analysis for the Physical Sciences"Schindler and Phillips Annals Phys. 324, 3 (2009)Wesolowski et al., JPG 43, 074001 (2016): "Bayesian parameter estimationfor effective field theories"Ekström et al. JPhysG 46, 9 (2019): "Bayesian optimization in ab initionuclear physics"Frame et al., PRL 121 032501 (2018): "Eigenvector Continuation withSubspace Learning"Melendez et al. PRC 100 (2019): "Quantifying Correlated Truncation Errorsin Effective Field Theory"
General Bayes/Machine Learning references
Bayesian statistics
- D. S. Sivia and J. Skilling, "Data Analysis: A Bayesian Tutorial"
- P. Gregory, "Bayesian Logical Data Analysis for the Physical Sciences"
- Gelman et al., "Bayesian Data Analysis" (3rd edition)
- R. Trotta, "Bayes in the sky: Bayesian inference and model selection in cosmology"
Other stuff (like Gaussian processes)

- Rasmussen and Williams, "Gaussian Processes for Machine Learning"
- D. J. C. MacKay, "Introduction to Gaussian processes"
- D. J. C. MacKay, "Information Theory, Inference, and Learning Algorithms"