TRANSCRIPT
Introduction to Bayesian Methods and
Uncertainty Quantification
Sarah Wesolowski
Salisbury University
APS Division of Nuclear Physics meeting, 2019
1 / 20
What is "Bayesian" inference?
P. Gregory, "Bayesian Logical Data Analysis for the Physical Sciences"
2 / 20
Comparison of credible and confidence intervals
Bayesian probability:
- probabilities treated as degree of plausibility
- more natural interpretation for quantities like model parameters

Frequentist probability:
- probability is long-run frequency
- relies on the idea of identical repeats

- p% credible interval: there is p% probability that the true, unknown value lies in the interval
- p% confidence interval: will cover the true value of the quantity over p% of experiments
See references for more nuance, philosophy, and debates.
3 / 20
Uncertainty Quantification in Nuclear Physics
To produce meaningful experimental measurements and theoretical predictions, it is essential to quantify uncertainties!

$$y_{\rm th} + \delta y_{\rm th} = y_{\rm exp} + \delta y_{\rm exp}$$

Theory discrepancy $\delta y_{\rm th}$, made up of the following:
- missing physics
- numerical/method errors
- fitting to uncertain data

Notes:
- likely to be "systematic"
- not usually fully quantified
- often assumed to be normal

Experimental discrepancy $\delta y_{\rm exp}$, made up of the following:
- counting statistics
- background and selection effects
- systematic uncertainties

Notes:
- systematic errors may not be well understood, or may be inflated
- often assumed to be normal
4 / 20
Practical details
Probability of $A$ being true given that $B$ is true: $P(A|B)$

- "Given information" $I$: inclusion of prior information (physics!)
- Bayesian pdfs follow all the same rules of probability:

$$P(A|B) + P(\bar{A}|B) = 1$$

$$P(A, B|I) = P(A|I)\,P(B|A,I) = P(B|I)\,P(A|B,I)$$

Bayes' theorem is a simple rearrangement of the product rule:

$$P(A|B,I) = \frac{P(B|A,I)\,P(A|I)}{P(B|I)}$$

- Can develop prescriptions for combining sources of uncertainty
- Many frequentist procedures have a clear Bayesian interpretation.
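As a quick numerical check of these rules (an illustration added here, not from the slides; the numbers are invented), consider a discrete proposition $A$ ("signal present") and datum $B$ ("detector fires"):

```python
# Minimal sketch with made-up numbers: Bayes' theorem for discrete propositions.
p_A = 0.01                 # prior P(A|I)
p_B_given_A = 0.95         # likelihood P(B|A,I)
p_B_given_notA = 0.05      # false-positive rate P(B|Abar,I)

# Evidence from the sum and product rules:
# P(B|I) = P(B|A,I) P(A|I) + P(B|Abar,I) P(Abar|I)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' theorem: P(A|B,I) = P(B|A,I) P(A|I) / P(B|I)
p_A_given_B = p_B_given_A * p_A / p_B
print(f"P(A|B) = {p_A_given_B:.3f}")    # ~0.16: the prior matters!

# Normalization rule: P(A|B,I) + P(Abar|B,I) = 1
p_notA_given_B = p_B_given_notA * (1 - p_A) / p_B
assert abs(p_A_given_B + p_notA_given_B - 1) < 1e-12
```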
5 / 20
Uncertainty quantification issues
Main problem: given the available information, what is the probability distribution (pdf) of uncertainties?
Entangled problems of UQ
And more... Design of experiments, sensitivity analysis, etc.
6 / 20
Bayesian parameter estimation
Consider a model or theory with parameters $\vec{a} = \{a_0, a_1, \ldots, a_{k-1}\}$ which we wish to constrain with measured data $D = \{d_1, d_2, \ldots, d_N\}$, given background information $I$.

Goal: estimate the pdf $pr(\vec{a}\,|\,D, I)$

Bayes' theorem allows us to actually compute this pdf:

$$pr(\vec{a}\,|\,D, I) = \frac{pr(D\,|\,\vec{a}, I)\; pr(\vec{a}\,|\,I)}{pr(D\,|\,I)}$$

Names for each of these terms:
- Posterior: $pr(\vec{a}\,|\,D, I)$
- Likelihood: $pr(D\,|\,\vec{a}, I)$
- Prior: $pr(\vec{a}\,|\,I)$
- Evidence / marginal likelihood (normalization factor): $pr(D\,|\,I) = \int d\vec{a}\; pr(D\,|\,\vec{a}, I)\, pr(\vec{a}\,|\,I)$

7 / 20
Example parameter estimation problem
The problem (developed to mock up an effective field theory expansion):

- Generate synthetic data with indep. Gaussian noise from "real-world" $g(x)$
- Given data and series expansion, estimate coefficients of Taylor series (about $x = 0$) up to some order. The truncated polynomial is the theory $y_{\rm th}$, with $k+1$ parameters $\vec{a}$:

$$y_{\rm th}(x) = \sum_{i=0}^{k} a_i x^i$$

- This is linear optimization, unlike most problems in nuclear physics.
- Problem is simple but helpful for intuition about many statistical issues.

The "real-world" function:

$$g(x) = \left(\frac{1}{2} + \tan\left(\frac{\pi}{2}\, x\right)\right)^2 = 0.25 + 1.57x + 2.47x^2 + 1.29x^3 + \cdots$$

Schindler and Phillips, Annals Phys. 324, 3 (2009)
SW et al., JPG 43, 074001 (2016)
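A minimal sketch of how such synthetic data could be generated (the grid of x values, the 5% noise level, and the seed are assumptions for illustration, not taken from the talk):

```python
import numpy as np

def g(x):
    """The "real-world" function of Schindler and Phillips."""
    return (0.5 + np.tan(np.pi * x / 2)) ** 2

# Assumed setup: N = 10 points with 5% relative Gaussian noise.
rng = np.random.default_rng(42)
x_data = np.linspace(0.05, 0.5, 10)
sigma = 0.05 * g(x_data)                        # standard deviations sigma_i
d_data = g(x_data) + rng.normal(0.0, sigma)     # data d_i = g(x_i) + noise
```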
8 / 20
Example parameter estimation problem
Given data $D$, estimate and plot the posterior pdf of the parameters, $pr(\vec{a}\,|\,D, I)$.

The resulting pdf can be used to propagate one piece of uncertainty to the final prediction $y = y_{\rm th} + \delta y_{\rm th}$. Here

$$\delta y_{\rm th} = (\delta y_{\rm th})_{\rm params} + (\delta y_{\rm th})_{\rm trunc}$$

Also have model discrepancy due to truncation of the Taylor polynomial!

[Figure: the data $D$ and the "real-world" function $g(x)$]
9 / 20
Example parameter estimation problem
$$pr(\vec{a}\,|\,D, I) = \frac{pr(D\,|\,\vec{a}, I)\; pr(\vec{a}\,|\,I)}{pr(D\,|\,I)}$$

becomes, with normal, independent data $D = \{d_i\}_{i=1}^{N}$ with standard deviations $\{\sigma_i\}_{i=1}^{N}$ and a uniform (bounded) prior $pr(\vec{a}\,|\,I) \propto 1$,

$$pr(\vec{a}\,|\,D, I) \propto e^{-\chi^2/2}, \qquad \chi^2(\vec{a}) = \sum_{i=1}^{N} \frac{\left(d_i - y_{\rm th}(x_i; \vec{a})\right)^2}{\sigma_i^2}$$

This is the standard least-squares optimization result. For this example, it can be solved analytically using standard results.
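For this linear model the posterior under the uniform prior is Gaussian, with the usual least-squares solution as its mean. A sketch of the standard analytic result (reusing x_data, d_data, and sigma from the earlier sketch; the order k is an illustrative choice):

```python
# Analytic least-squares posterior for y_th(x; a) = sum_i a_i x^i.
k = 3                                            # assumed polynomial order
A = np.vander(x_data, k + 1, increasing=True)    # design matrix, A[i, j] = x_i**j
W = np.diag(1.0 / sigma**2)                      # inverse data covariance

cov = np.linalg.inv(A.T @ W @ A)                 # posterior covariance of a
a_hat = cov @ A.T @ W @ d_data                   # posterior mean = chi^2 minimizer
print("posterior mean:", a_hat)
print("1-sigma errors:", np.sqrt(np.diag(cov)))
```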
10 / 20
Simple parameter estimation problem
What happens to this problem when you use least-squares,

$$pr(\vec{a}\,|\,D, I) \propto e^{-\chi^2/2}\,?$$

- Underfitting: $y_{\rm th} = a_0$
- Overfitting: $y_{\rm th} = \sum_{i=0}^{5} a_i x^i$
11 / 20
Simple parameter estimation problem
Use information that Taylor series coefficients are "natural" (EFT principle).

Regular least-squares vs. Gaussian prior with width $\bar{a} = 5$:

$$pr(\vec{a}\,|\,I) \propto e^{-\vec{a}^{\,2}/(2\bar{a}^2)}$$

$$pr(\vec{a}\,|\,D, I) \propto e^{-\chi^2/2 \,-\, \vec{a}^{\,2}/(2\bar{a}^2)}$$

(see also: regularized least-squares)
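In code, the naturalness prior adds a single term to the normal equations; this is exactly ridge-regularized least squares. A sketch continuing the example above, with the prior width from the slide:

```python
# Gaussian "naturalness" prior pr(a|I) ~ exp(-a.a / (2 abar^2)).
abar = 5.0                                   # prior width used in the talk
prior_prec = np.eye(k + 1) / abar**2         # prior precision matrix

cov_reg = np.linalg.inv(A.T @ W @ A + prior_prec)   # posterior covariance
a_hat_reg = cov_reg @ A.T @ W @ d_data              # posterior mean, shrunk toward zero
```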
12 / 20
More issues in this problem
- Check for robustness to the prior pdf $pr(\vec{a}\,|\,I)$
- Include impact of higher-order terms (theory discrepancy):

$$g(x) = \sum_{i=0}^{k} a_i x^i + \sum_{i=k+1}^{\infty} a_i x^i$$

- Simple to do in this linear problem, but nonlinear problems:
  - are not analytic
  - sampling the objective function $\chi^2(\vec{a})$ can be costly
  - have to resort to sampling: Markov chain Monte Carlo (MCMC)
  - non-normality, multimodality
- Validation
- UQ: theory discrepancy and parameter uncertainty (and everything else)
13 / 20
Marginalization
Marginalization "integrates out" nuisance parameters by summing over acomplete set of possibilities:
Previous example: include effects of unconstrained higher-order terms!
Useful for introducing auxiliary parameters, e.g.:
where is the natural width of the prior. This can be used to avoid too tightlyspecifying a single value for such a parameter.
But we must now specify a prior on the hyperparameter .
Previous plots
Marginalization is used to plot and study pdfs as well, for example:
pr(#) = ∫ +$%(#, $) = ∫ +$%(#|$)%($)
pr( |,) = ∫ + pr( | , ,) pr( |,)( ⃗ (̄ ( ⃗ (̄ (̄
(̄
(̄
pr( |,) = !( − 5)(̄ (̄
pr( |*, ,) = ∫ + ⋯ + pr( |*, ,)(0 (1 (' ( ⃗14 / 20
Approximating a pdf as a histogram
Sampling $pr(\vec{a}\,|\,D, I)$ may be expensive for nonlinear problems and intensive observables.

We use MCMC sampling to evaluate the pdf at many values of $\vec{a}$, and histogram the samples obtained.

- Many flavors of MCMC are on the market in a variety of languages. Some Python packages: emcee [used here], pymc3, pySTAN. Also nested sampling: pyMultiNest
- Sampling can still be infeasible. Get around the problem with emulation:
  - Bayesian optimization: Ekström et al., JPhysG 46, 9 (2019)
  - Eigenvector continuation: Frame et al., PRL 121, 032501 (2018)
- Conjugate priors can simplify things because the posterior is analytic: see Melendez, SW, et al., PRC 100 (2019)
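A minimal emcee sketch for the running polynomial example (the talk used emcee; the specific setup below, such as walker count, chain length, and starting ball, is an illustrative assumption, and it reuses names from the earlier sketches):

```python
import emcee

def log_posterior(a, x, d, sigma, abar=5.0):
    """log pr(a|D,I) up to a constant: -chi^2/2 - a.a/(2 abar^2)."""
    resid = (d - np.polynomial.polynomial.polyval(x, a)) / sigma
    return -0.5 * np.sum(resid**2) - 0.5 * np.sum(a**2) / abar**2

ndim, nwalkers = k + 1, 32
start = a_hat_reg + 1e-3 * rng.normal(size=(nwalkers, ndim))  # ball around the mode
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_posterior,
                                args=(x_data, d_data, sigma))
sampler.run_mcmc(start, 5000)
samples = sampler.get_chain(discard=1000, flat=True)  # flattened posterior draws of a
```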
15 / 20
Approximating a pdf as a histogram
Marginalization is trivial over various parameters: just use the samples in the parameters you want:
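With flat MCMC samples in hand, a marginal is just a column selection; for instance (a sketch using the chain from the previous slide):

```python
import matplotlib.pyplot as plt

# The 1D marginal posterior pr(a_0|D,I) is simply a histogram of column 0;
# all other parameters are "integrated out" automatically by the sampling.
plt.hist(samples[:, 0], bins=50, density=True)
plt.xlabel("a_0")
plt.show()
# The corner package (corner.corner(samples)) shows all 1D/2D marginals at once.
```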
16 / 20
Quantitative model comparison with Bayes
Very generic formulation: model 1 ($M_1$) and model 2 ($M_2$). Examples:

- totally different theories for a phenomenon
- hierarchical models (one order vs. the next)
- anything else that can be used to compute data

Using Bayes' theorem and assuming the models are a priori equally likely:

$$\frac{pr(M_1\,|\,D, I)}{pr(M_2\,|\,D, I)} = \frac{\int d\vec{a}_1\; pr(D\,|\,\vec{a}_1, M_1, I)\, pr(\vec{a}_1\,|\,M_1, I)}{\int d\vec{a}_2\; pr(D\,|\,\vec{a}_2, M_2, I)\, pr(\vec{a}_2\,|\,M_2, I)}$$

"Evidence ratio" or "odds factor"

- Expensive to compute if integrals are not analytic
- MCMC samples of the posterior don't help (need the normalization!)
- Tricky to interpret

In our simple example, though, it is computable and has a clear interpretation (see the sketch below).
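For the linear model with Gaussian noise and a Gaussian prior, the evidence is analytic: marginally the data are Gaussian with covariance $\bar{a}^2 A A^{\rm T} + \Sigma$. A sketch of the odds factor between two truncation orders (the function name and order choices are illustrative, continuing the running example):

```python
from scipy.stats import multivariate_normal

def log_evidence(order, x, d, sigma, abar=5.0):
    """Analytic log pr(D|M_k,I): marginally d ~ N(0, abar^2 A A^T + diag(sigma^2))."""
    A = np.vander(x, order + 1, increasing=True)
    cov_d = abar**2 * (A @ A.T) + np.diag(sigma**2)
    return multivariate_normal(mean=np.zeros(len(d)), cov=cov_d).logpdf(d)

# Odds factor between truncation orders k=2 and k=3 (models equally likely a priori):
odds_23 = np.exp(log_evidence(2, x_data, d_data, sigma)
                 - log_evidence(3, x_data, d_data, sigma))
```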
17 / 20
Evidence calculation for Taylor series model
- Compute evidence at ascending orders $k = 0, 1, 2, \ldots$ in the theory
- Compare least-squares (brown squares) with Gaussian prior (blue diamonds)
- Natural result of Occam's razor for the LS result. Saturates for the Gaussian prior.
18 / 20
Our work using Bayes for EFT
The BUQEYE (Bayesian Uncertainty Quantification: Errors in Your EFT) collaboration works with low-energy nuclear EFTs:

- parameter estimation of EFT low-energy constants
- using Gaussian processes to model EFT truncation error
- diagnostics to validate uncertainty estimates
19 / 20
Summary
- Bayesian statistics is ideal for UQ problems in physics
- Explicit, quantitative incorporation of prior information
- Like traditional methods, it can be expensive to fully implement
- But taking the time to understand and sample pdfs can yield dividends:
  - proper inclusion of theory errors
  - are pdfs Gaussian, and are covariance approximations justified?
- Many approximation methods (like MCMC sampling) and emulation schemes are possible

Bayesian methods and statistical analysis give a new window into nuclear theory, allowing diagnostics and validation of theory expectations from a data-driven perspective!
Visit the BUQEYE collaboration online at buqeye.github.io
A very nice place to learn more about Bayes for nuclear physicists: nucleartalent.github.io/Bayes2019/
20 / 20
References
P. Gregory, "Bayesian Logical Data Analysis for the Physical Sciences"Schindler and Phillips Annals Phys. 324, 3 (2009)Wesolowski et al., JPG 43, 074001 (2016): "Bayesian parameter estimationfor effective field theories"Ekström et al. JPhysG 46, 9 (2019): "Bayesian optimization in ab initionuclear physics"Frame et al., PRL 121 032501 (2018): "Eigenvector Continuation withSubspace Learning"Melendez et al. PRC 100 (2019): "Quantifying Correlated Truncation Errorsin Effective Field Theory"
General Bayes/Machine Learning references
Bayesian statistics
- D. S. Sivia and J. Skilling, "Data Analysis: A Bayesian Tutorial"
- P. Gregory, "Bayesian Logical Data Analysis for the Physical Sciences"
- Gelman et al., "Bayesian Data Analysis" (3rd edition)
- R. Trotta, "Bayes in the sky: Bayesian inference and model selection in cosmology"
Other stuff (like Gaussian processes)

- Rasmussen and Williams, "Gaussian Processes for Machine Learning"
- D. J. C. MacKay, "Introduction to Gaussian processes"
- D. J. C. MacKay, "Information Theory, Inference, and Learning Algorithms"