
Page 1: Bayesian statistics and Information Theory

Florent Leclercq

www.florent-leclercq.eu

Imperial Centre for Inference and Cosmology
Imperial College London

Bayesian statistics and Information Theory

Lecture 1: Aspects of probability theory… a.k.a. why am I not allowed to “change the prior” or “cut the data”?

May 14th, 2019

Page 2

The GitHub repository

• https://github.com/florent-leclercq/Bayes_InfoTheory

• Course website: http://florent-leclercq.eu/teaching.php

git clone https://github.com/florent-leclercq/Bayes_InfoTheory.git (or with SSH)

Page 3

Introduction: why proper statistics matter
An historical example: the Gibbs paradox

• Gibbs's canonical and grand canonical ensembles, derived from the maximum entropy principle, … of real physical systems.

• The predicted entropies are always larger than the observed ones… there must exist …:

• Discreteness of energy levels: radiation: Planck (1900); solids: Einstein (1907), Debye (1912), Ising (1925); individual atoms: Bohr (1913)…

• …Quantum mechanics: Heisenberg, Schrödinger (1927)


J. Willard Gibbs (1839-1903)

Page 4

Outline: Lecture 1

• Probability theory and Bayesian statistics: reminders

• Ignorance priors and the maximum entropy principle

• Gaussian random fields (and a digression on non-Gaussianity)

• Bayesian signal processing and reconstruction:

• Bayesian de-noising

• Bayesian de-blending

• Bayesian decision theory and Bayesian experimental design

• Bayesian networks, Bayesian hierarchical models and Empirical Bayes

• (time permitting) Hypothesis testing beyond the Bayes factor:

• Model selection as a decision analysis

• Model averaging

• Model selection with insufficient summary statistics


Page 5

Jaynes's "probability theory": an extension of ordinary logic


Page 6

Reminders

• A tribute to my PhD supervisor (Benjamin Wandelt):

Ben’s summary of Bayesian statistics:

• Product rule: p(x, y) = p(x | y) p(y)

• Sum rule: p(x) = Σ_y p(x, y)

• Bayes's formula: p(θ | d) = p(d | θ) p(θ) / p(d), i.e. posterior = likelihood × prior / evidence

• Bayesian model comparison: p(M | d) ∝ p(d | M) p(M), where the evidence is p(d | M) = ∫ p(d | θ, M) p(θ | M) dθ
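As a quick numerical sanity check, the rules above can be verified on a small discrete joint distribution (the numbers below are made up purely for illustration):

```python
# A made-up discrete joint distribution p(theta, d); the sum rule,
# product rule, and Bayes's formula can all be checked against it.
p_joint = {
    ("theta0", "d0"): 0.10, ("theta0", "d1"): 0.30,
    ("theta1", "d0"): 0.25, ("theta1", "d1"): 0.35,
}

def marginal_theta(t):
    # Sum rule: p(theta) = sum_d p(theta, d)
    return sum(p for (tt, _), p in p_joint.items() if tt == t)

def marginal_d(d):
    # Sum rule again, marginalizing the other way
    return sum(p for (_, dd), p in p_joint.items() if dd == d)

def likelihood(d, t):
    # Product rule rearranged: p(d | theta) = p(theta, d) / p(theta)
    return p_joint[(t, d)] / marginal_theta(t)

def posterior(t, d):
    # Bayes's formula: posterior = likelihood * prior / evidence
    return likelihood(d, t) * marginal_theta(t) / marginal_d(d)

# Consistency: Bayes's formula agrees with conditioning the joint directly
direct = p_joint[("theta1", "d0")] / marginal_d("d0")
assert abs(posterior("theta1", "d0") - direct) < 1e-12
print(round(posterior("theta1", "d0"), 4))   # → 0.7143
```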

Page 7

Ignorance priors and the maximum entropy principle


Notebook 1: https://github.com/florent-leclercq/Bayes_InfoTheory/blob/master/LighthouseProblem.ipynb
Notebook 2: https://github.com/florent-leclercq/Bayes_InfoTheory/blob/master/MaximumEntropy.ipynb
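A minimal worked example of the maximum entropy principle, independent of the notebooks above, is the classic Brandeis dice problem: among all distributions on the faces {1..6} with a fixed mean, the maximum-entropy one has the exponential form p_i ∝ exp(−λ i). The sketch below finds the Lagrange multiplier λ by bisection:

```python
import math

# Brandeis dice: maximize entropy on faces {1..6} subject to <i> = 4.5.
# The solution is p_i = exp(-lam * i) / Z; lam is fixed by the mean.
FACES = range(1, 7)

def mean_for(lam):
    w = [math.exp(-lam * i) for i in FACES]
    return sum(i * wi for i, wi in zip(FACES, w)) / sum(w)

def solve_lambda(target_mean, lo=-5.0, hi=5.0):
    # mean_for is decreasing in lam, so plain bisection converges
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_for(mid) > target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = solve_lambda(4.5)
Z = sum(math.exp(-lam * i) for i in FACES)
p = [math.exp(-lam * i) / Z for i in FACES]
print([round(pi, 3) for pi in p])   # monotonically favors high faces
```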

Page 8

Gaussian random fields

Bayesian signal processing and reconstruction


Notebook 3: https://github.com/florent-leclercq/Bayes_InfoTheory/blob/master/GRF_and_fNL.ipynb

Notebook 4: https://github.com/florent-leclercq/Bayes_InfoTheory/blob/master/WienerFilter_denoising.ipynb
Notebook 4bis: https://github.com/florent-leclercq/Bayes_InfoTheory/blob/master/WienerFilter_denoising_CMB.ipynb
Notebook 5: https://github.com/florent-leclercq/Bayes_InfoTheory/blob/master/WienerFilter_deblending.ipynb
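The two topics can be sketched together in a few lines (this is an illustrative toy, not the notebooks' setup: the power spectrum, noise level, and 1D geometry are assumptions). A Gaussian random field is generated by coloring white noise in Fourier space; for a stationary field the Wiener filter is diagonal in Fourier space, and the FFT normalization cancels in the signal-to-noise ratio:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256

# 1D Gaussian random field with power spectrum P(k) ~ 1/(1 + k^2)
# (an illustrative choice).  P is symmetric in k, so coloring a real
# white-noise field in Fourier space keeps the result real.
k = np.fft.fftfreq(n) * n
P = 1.0 / (1.0 + k**2)
s = np.fft.ifft(np.fft.fft(rng.standard_normal(n)) * np.sqrt(P)).real

# Noisy data: d = s + white Gaussian noise
sigma = 0.2
d = s + sigma * rng.standard_normal(n)

# Wiener filter, diagonal in Fourier space for a stationary field:
# s_WF(k) = P(k) / (P(k) + sigma^2) * d(k).  The FFT normalization
# enters signal and noise power identically, so it cancels in the ratio.
s_wf = np.fft.ifft(P / (P + sigma**2) * np.fft.fft(d)).real

# The filtered field is closer to the truth than the raw data
print(np.mean((d - s)**2), np.mean((s_wf - s)**2))
```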

Page 9

Bayesian decision theory

Bayesian experimental design (more about that in lecture 3)


Notebook 6: https://github.com/florent-leclercq/Bayes_InfoTheory/blob/master/DecisionTheory.ipynb
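The core of Bayesian decision theory fits in a few lines: given a posterior over states of the world and a utility for each (action, state) pair, choose the action with the largest posterior expected utility. The states and utility values below are purely illustrative:

```python
# Posterior over states and a utility table U[action][state];
# all numbers are illustrative.
posterior = {"rain": 0.3, "sun": 0.7}
utility = {
    "take umbrella": {"rain": 0.0, "sun": -1.0},
    "no umbrella":   {"rain": -10.0, "sun": 0.0},
}

def expected_utility(action):
    # <U>(a) = sum_s p(s | data) U(a, s)
    return sum(posterior[s] * utility[action][s] for s in posterior)

best = max(utility, key=expected_utility)
print(best)   # → take umbrella
```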

Page 10


Structures in the cosmic web

FL, Jasche & Wandelt 2015a, arXiv:1502.02690

Page 11

A decision rule for structure classification

• Space of “input features”:

• Space of “actions”:

• A problem of Bayesian decision theory: one should take the action that maximizes the expected utility

• How to write down the gain functions?

FL, Jasche & Wandelt 2015b, arXiv:1503.00730

Page 12

• One proposal:

• Without data, the expected utility is

• With …, it's a fair game: always play

“ ” of the LSS

• Values represent an aversion for risk

increasingly “ ” of the LSS

Gambling with the Universe


[Slide graphic: a gain table with outcomes "Winning", "Losing", "Not playing" for the actions "Playing the game" and "Not playing the game"; the quoted winning gains (T-web, redshift zero) are voids 1.74, sheets 7.08, filaments 3.83, clusters 41.67.]

FL, Jasche & Wandelt 2015b, arXiv:1503.00730
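A hedged sketch of this decision rule. The gain structure below is an inference from the quoted numbers, not a statement of the paper's exact gain function: correctly claiming structure type j is assumed to gain 1/p̄_j − 1, a wrong claim to lose 1, and not playing to gain 0. The prior volume fractions p̄_j are chosen so that the winning gains approximately reproduce the quoted values:

```python
# Assumed gain structure (see lead-in): winning on type j gains
# 1/pbar_j - 1, losing costs 1, not playing gains 0.  The prior volume
# fractions pbar_j reproduce the quoted gains (~1.74, 7.08, 3.83, 41.7
# for the T-web at redshift zero).
pbar = {"voids": 0.3650, "sheets": 0.1238, "filaments": 0.2070, "clusters": 0.0234}
win_gain = {j: 1.0 / p - 1.0 for j, p in pbar.items()}

def decide(posterior, risk_aversion=1.0):
    """Pick the structure type maximizing expected utility, or
    'undecided' if not playing (utility 0) beats every claim.
    risk_aversion < 1 scales down the winning gains, making the
    rule increasingly conservative."""
    def eu(j):
        return posterior[j] * risk_aversion * win_gain[j] - (1.0 - posterior[j])
    best = max(pbar, key=eu)
    return best if eu(best) > 0.0 else "undecided"

print(decide({"voids": 0.9, "sheets": 0.05, "filaments": 0.04, "clusters": 0.01}))
print(decide({"voids": 0.3, "sheets": 0.3, "filaments": 0.3, "clusters": 0.1}))
```

Note how the second, nearly flat posterior still yields a bet on the rarest (highest-gain) structure type; shrinking `risk_aversion` instead pushes such ambiguous voxels to "undecided".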

Page 13

Playing the game…


[Slide graphic: map of the structure classification; legend: voids, sheets, filaments, clusters, undecided.]

FL, Jasche & Wandelt 2015b, arXiv:1503.00730

Page 14

Bayesian networks

Bayesian hierarchical models

and Empirical Bayes


Page 15

Bayesian networks

Bayesian networks are probabilistic graphical models consisting of:

• A directed acyclic graph (DAG), whose nodes carry random variables and whose edges encode conditional dependencies

• At each node, a probability distribution for the variable conditioned on its parents in the graph

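A minimal concrete instance of this definition is the classic cloudy/sprinkler/rain toy network (the network and its conditional probabilities are a textbook illustration, not from the slides): the DAG encodes a factorization of the joint distribution, and inference reduces to marginalizing that factorized joint.

```python
from itertools import product

# DAG: Cloudy -> {Sprinkler, Rain};  {Sprinkler, Rain} -> WetGrass.
# One conditional distribution per node (illustrative numbers).
p_c = {True: 0.5, False: 0.5}
p_s_given_c = {True: {True: 0.1, False: 0.9}, False: {True: 0.5, False: 0.5}}
p_r_given_c = {True: {True: 0.8, False: 0.2}, False: {True: 0.2, False: 0.8}}
p_w_given_sr = {(True, True): {True: 0.99, False: 0.01},
                (True, False): {True: 0.90, False: 0.10},
                (False, True): {True: 0.90, False: 0.10},
                (False, False): {True: 0.00, False: 1.00}}

def joint(c, s, r, w):
    # The DAG encodes: p(c, s, r, w) = p(c) p(s|c) p(r|c) p(w|s, r)
    return p_c[c] * p_s_given_c[c][s] * p_r_given_c[c][r] * p_w_given_sr[(s, r)][w]

# The factorized joint is automatically normalized
total = sum(joint(*x) for x in product([True, False], repeat=4))
assert abs(total - 1.0) < 1e-12

# Inference by brute-force marginalization: p(Rain = True | WetGrass = True)
num = sum(joint(c, s, True, True) for c, s in product([True, False], repeat=2))
den = sum(joint(c, s, r, True) for c, s, r in product([True, False], repeat=3))
print(round(num / den, 3))   # → 0.708
```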

Page 16

Bayesian networks


Page 17

Bayesian networks: inference and prediction

• Inference:

• Prediction:


Page 18

• So we have both:

• This is the "explaining away" or "collider" phenomenon: two causes collide to explain the same effect.

• Particular case: “ ” or “ ”

Bayesian networks: the "explaining away" phenomenon

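Explaining away can be demonstrated numerically with a toy collider A → E ← B (all numbers illustrative): observing the effect E raises the probability of cause A, but additionally observing the alternative cause B drops it back down.

```python
from itertools import product

# Collider A -> E <- B with independent causes A, B (numbers illustrative):
# the effect E fires with high probability if either cause is active.
p_a, p_b = 0.1, 0.1

def joint(a, b, e):
    p_e = 0.95 if (a or b) else 0.01
    return ((p_a if a else 1.0 - p_a) * (p_b if b else 1.0 - p_b)
            * (p_e if e else 1.0 - p_e))

def conditional_a(evidence):
    """p(A=1 | evidence), where evidence fixes some of {'b', 'e'}."""
    def ok(b, e):
        return all({"b": b, "e": e}[k] == v for k, v in evidence.items())
    num = sum(joint(1, b, e) for b, e in product([0, 1], repeat=2) if ok(b, e))
    den = sum(joint(a, b, e) for a, b, e in product([0, 1], repeat=3) if ok(b, e))
    return num / den

# Observing E makes A likely; additionally observing B "explains it away"
print(round(conditional_a({"e": 1}), 3))          # → 0.504
print(round(conditional_a({"e": 1, "b": 1}), 3))  # → 0.1
```

Note that once B is known, A falls back exactly to its prior 0.1: given B, the effect E carries no further information about A.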

Page 19

Malmquist bias

• Malmquist (1925) bias: in magnitude-limited surveys, far objects are preferentially detected if they are intrinsically bright.

[Slide graphics: a diagram relating "detected", "bright", and "close"; and a log(luminosity) vs. log(distance) plot in which a flux limit separates "observed" from "unobserved" objects.]
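The selection effect is easy to reproduce in a toy simulation (all numbers illustrative): objects share one intrinsic luminosity distribution, but only those above a flux limit are detected, so the detected far objects are intrinsically brighter on average than the detected near ones.

```python
import math, random

random.seed(1)

# Toy Malmquist simulation (illustrative numbers): objects uniform in a
# sphere of radius 50, log-luminosities ~ N(0, 0.5), detected only if
# flux = L / (4 pi d^2) exceeds f_lim.
f_lim = 1e-3
near, far = [], []
for _ in range(200000):
    d = 50.0 * random.random() ** (1.0 / 3.0)   # uniform in volume
    logL = random.gauss(0.0, 0.5)               # intrinsic luminosity function
    if 10.0 ** logL / (4.0 * math.pi * d**2) > f_lim:   # selection cut
        (near if d < 20.0 else far).append(logL)

mean_near = sum(near) / len(near)
mean_far = sum(far) / len(far)
print(mean_near, mean_far)   # detected far objects are brighter on average
```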

Page 20

• Simple inference: p(θ | d) ∝ p(d | θ) p(θ), with a fixed prior p(θ).

• Adaptive prior: p(θ, α | d) ∝ p(d | θ) p(θ | α) p(α), with a hyperprior p(α) on the prior's parameters…

• … or a full hierarchy of hyperpriors.

• Examples:

• Cosmic microwave background:

• Large-scale structure:

Bayesian hierarchical models


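In the fully Gaussian case a two-level hierarchy can be written down in closed form. The sketch below (illustrative numbers) shows the standard conjugate posterior for each θ_i, which shrinks the raw datum toward the prior mean set by the upper level:

```python
# Two-level Gaussian hierarchy (illustrative numbers):
#   prior:      theta_i ~ N(mu, tau^2)   (mu, tau set by the upper level)
#   likelihood: d_i ~ N(theta_i, sigma^2)
mu, tau, sigma = 0.0, 1.0, 0.5

def posterior_theta(d_i):
    """Conjugate posterior p(theta_i | d_i): precisions add, and the
    mean is the precision-weighted average of prior mean and datum."""
    prec = 1.0 / tau**2 + 1.0 / sigma**2
    m = (mu / tau**2 + d_i / sigma**2) / prec
    return m, prec ** -0.5

# Each estimate is shrunk from the raw datum toward the prior mean mu:
for d_i in (2.0, -1.0):
    m, s = posterior_theta(d_i)
    print(d_i, "->", round(m, 3), "+/-", round(s, 3))
```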

Page 21

BHM example: supernovae (BAHAMAS)

Shariff et al. 2015, arXiv:1510.05954

Page 22

BHM example: weak lensing

Alsing et al. 2015, arXiv:1505.07840

Page 23

Empirical Bayes: an alternative to maximum entropy for choosing priors

• Iterative scheme (“Gibbs” sampler)

Empirical Bayes is a truncation of this scheme after a few steps (often just one).

• Particular case:

• the expectation-maximization (EM) algorithm (machine learning, data mining).


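A sketch of Empirical Bayes in the conjugate normal-normal case (all numbers illustrative): instead of integrating over a hyperprior for the prior width τ, estimate τ once from the data by maximizing the marginal likelihood, then plug it back into the prior for the individual parameters:

```python
import math, random

random.seed(42)

# Normal-normal empirical Bayes (illustrative numbers).  True model:
#   theta_i ~ N(0, tau^2),  d_i ~ N(theta_i, sigma^2)
# Marginally d_i ~ N(0, tau^2 + sigma^2), so maximizing the marginal
# likelihood over tau gives tau_hat^2 = max(0, mean(d_i^2) - sigma^2).
sigma, tau_true = 0.5, 2.0
data = [random.gauss(0.0, tau_true) + random.gauss(0.0, sigma)
        for _ in range(5000)]

tau2_hat = max(0.0, sum(x * x for x in data) / len(data) - sigma**2)

def posterior_mean(d_i):
    # Shrinkage under the estimated (empirical-Bayes) prior N(0, tau2_hat)
    return d_i * tau2_hat / (tau2_hat + sigma**2)

print(round(math.sqrt(tau2_hat), 2))   # should be close to tau_true = 2.0
print(round(posterior_mean(1.0), 3))
```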

Page 24

Hypothesis testing beyond the Bayes factor
