
Page 1: Developments in Bayesian Priors

Developments in Bayesian Priors

Roger Barlow
Manchester IoP meeting

November 16th 2005

Page 2: Developments in Bayesian Priors


Plan

• Probability
  – Frequentist
  – Bayesian

• Bayes Theorem
  – Priors

• Prior pitfalls (1): Le Diberder
• Prior pitfalls (2): Heinrich
• Jeffreys’ Prior
  – Fisher Information

• Reference Priors: Demortier

Page 3: Developments in Bayesian Priors


Probability

Probability as the limit of frequency: P(A) = lim N_A / N_total (as N_total → ∞)

Usual definition taught to students. Makes sense. Works well most of the time –

But not all
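As a minimal sketch of this frequency definition (the die example and sample sizes below are purely illustrative, not from the talk), the relative frequency N_A/N_total settles towards P(A) as N_total grows:

```python
import random

# Estimate P(A) as the relative frequency N_A / N_total.
# Here A = "a fair die shows a six", so the limiting value is 1/6 ≈ 0.1667.
def relative_frequency(n_total, seed=1):
    rng = random.Random(seed)
    n_a = sum(1 for _ in range(n_total) if rng.randint(1, 6) == 6)
    return n_a / n_total

for n in (100, 10_000, 1_000_000):
    print(f"N_total = {n:>9}: N_A/N_total = {relative_frequency(n):.4f}")
```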

Page 4: Developments in Bayesian Priors


Frequentist probability

A frequentist cannot assign a probability to a one-off statement or to a fixed (if unknown) parameter, so statements like

“It will probably rain tomorrow.”
“Mt = 174.3 ± 5.1 GeV means the top quark mass lies between 169.2 and 179.4 GeV, with 68% probability.”

have to be recast as statements about confidence:

“The statement ‘It will rain tomorrow’ is probably true.”
“Mt = 174.3 ± 5.1 GeV means: the top quark mass lies between 169.2 and 179.4 GeV, at 68% confidence.”

Page 5: Developments in Bayesian Priors


Bayesian Probability

P(A) expresses my belief that A is true

Limits: 0 (impossible) and 1 (certain)

Calibrated off clear-cut instances (coins, dice, urns)

Page 6: Developments in Bayesian Priors


Frequentist versus Bayesian?

Two sorts of probability – totally different. (Bayesian probability also known as Inverse Probability.)

Rivals? Religious differences? Particle physicists tend to be frequentists; cosmologists tend to be Bayesians.

No. Two different tools for practitioners. Important to:
• Be aware of the limits and pitfalls of both
• Always be aware which you’re using

Page 7: Developments in Bayesian Priors


Bayes Theorem (1763)

P(A|B) P(B) = P(A and B) = P(B|A) P(A)

⇒ P(A|B) = P(B|A) P(A) / P(B)

Frequentist use, e.g. Čerenkov counter:
P(π | signal) = P(signal | π) P(π) / P(signal)

Bayesian use:
P(theory | data) = P(data | theory) P(theory) / P(data)
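As a numerical illustration of the frequentist use above (the pion fraction and signal probabilities here are invented, not taken from any real detector):

```python
# Hypothetical numbers for a Čerenkov-style identification problem.
p_pi = 0.7                # P(pi): assumed fraction of pions in the beam
p_k = 1.0 - p_pi          # P(K): fraction of kaons (the only other species here)
p_sig_given_pi = 0.95     # P(signal | pi), assumed
p_sig_given_k = 0.05      # P(signal | K), assumed

# Total probability of a signal, then Bayes' theorem for P(pi | signal).
p_signal = p_sig_given_pi * p_pi + p_sig_given_k * p_k
p_pi_given_signal = p_sig_given_pi * p_pi / p_signal
print(f"P(pi | signal) = {p_pi_given_signal:.3f}")   # ≈ 0.978
```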

Page 8: Developments in Bayesian Priors


Bayesian Prior

P(theory) is the Prior.
Expresses prior belief that the theory is true.
Can be a function of a parameter:

P(Mtop), P(MH), P(α,β,γ)

Bayes’ Theorem describes the way prior belief is modified by experimental data.

But what do you take as initial prior?

Page 9: Developments in Bayesian Priors


Uniform Prior

General usage: choose P(a) uniform in a (principle of insufficient reason).

Often ‘improper’: ∫P(a) da = ∞. Though the posterior P(a|x) comes out sensible.

BUT! If P(a) is uniform, P(a²), P(ln a), P(√a) … are not.
Insufficient reason is not valid (unless a is ‘most fundamental’ – whatever that means).
Statisticians handle this: check results for ‘robustness’ under different priors.
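A quick sketch of the ‘BUT!’ point above: sampling a prior uniform in a and histogramming a² shows the induced prior is far from uniform (the range and binning are arbitrary choices):

```python
import numpy as np

# P(a) uniform on (0, 1); look at the distribution this induces for u = a**2.
rng = np.random.default_rng(0)
a = rng.uniform(0.0, 1.0, size=1_000_000)

# The density of u = a**2 piles up near zero (analytically it is 1/(2*sqrt(u))),
# so "insufficient reason" in a is a strong prejudice in a**2.
hist, edges = np.histogram(a**2, bins=10, range=(0.0, 1.0), density=True)
for lo, hi, h in zip(edges[:-1], edges[1:], hist):
    print(f"a^2 in [{lo:.1f}, {hi:.1f}): density ≈ {h:.2f}")
```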

Page 10: Developments in Bayesian Priors


Example – Le Diberder

Sad story: fitting the CKM angle α from B decays. 6 observables; 3 amplitudes: 6 unknown parameters (magnitudes, phases). α is the fundamentally interesting one.

Page 11: Developments in Bayesian Priors


Results

Frequentist

Bayesian: set one phase to zero; uniform priors in the other two phases and the 3 magnitudes

Page 12: Developments in Bayesian Priors


More Results

Bayesian: parametrise Tree and Penguin amplitudes.

[Amplitude equations lost in extraction: each amplitude A is written in terms of tree (T, T_C) and penguin (P) amplitudes with complex phase factors.]

Bayesian: 3 amplitudes, i.e. 3 real parts and 3 imaginary parts.

Page 13: Developments in Bayesian Priors


Interpretation

• B shows same (mis)behaviour

• Removing all experimental info gives similar P(α)

• The curse of high dimensions is at work

Uniformity in x, y, z makes P(r) peak at large r.

This result is not robust under changes of prior.
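A toy sketch of the dimensionality effect just quoted (assuming nothing about the real amplitude fit): three independent uniform priors already make the radial density rise steeply with r, before any data are used.

```python
import numpy as np

# Uniform priors on x, y, z in a cube induce a radial density P(r) that grows
# roughly like r**2 at small r, i.e. large r is favoured purely by the prior.
rng = np.random.default_rng(1)
xyz = rng.uniform(-1.0, 1.0, size=(1_000_000, 3))
r = np.linalg.norm(xyz, axis=1)

hist, edges = np.histogram(r, bins=8, range=(0.0, 1.0), density=True)
for lo, hi, h in zip(edges[:-1], edges[1:], hist):
    print(f"r in [{lo:.2f}, {hi:.2f}): density ≈ {h:.2f}")
```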

Page 14: Developments in Bayesian Priors


Example - Heinrich

CDF statistics group looking at problem of estimating signal cross section S in presence of background and efficiency.

N = εS + b

Efficiency and background from separate calibration experiments (sidebands or MC). Scaling factors κ, ω are known.

Everything done using Bayesian methods with uniform priors and Poisson statistics formula. Calibration experiments use uniform prior for ε and for b, yielding posteriors used for S

P(N|S) = (1/N!) ∫∫ e^−(εS+b) (εS+b)^N P(ε) P(b) dε db

Check coverage – all fine.
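A minimal sketch of this single-channel setup under the stated uniform priors; the counts N, k, m and the scale factors κ, ω below are invented for illustration, and the double integral over P(ε) P(b) is done by Monte Carlo rather than analytically:

```python
import numpy as np
from scipy.stats import gamma, poisson

# Toy single-channel model (all numbers invented):
#   N ~ Poisson(eps*S + b), with calibration counts k ~ Poisson(kappa*eps)
#   and m ~ Poisson(omega*b).  With uniform priors, the calibration posteriors
#   are Gamma(k+1, scale=1/kappa) and Gamma(m+1, scale=1/omega); S gets a
#   uniform prior as on the slide.
N, k, kappa, m, omega = 6, 25, 100.0, 3, 4.0

rng = np.random.default_rng(2)
n_samp = 5_000
eps = gamma.rvs(k + 1, scale=1.0 / kappa, size=n_samp, random_state=rng)
b = gamma.rvs(m + 1, scale=1.0 / omega, size=n_samp, random_state=rng)

# Posterior for S on a grid: average the Poisson likelihood over the (eps, b)
# samples, which performs the double integral over P(eps) P(b) by Monte Carlo.
s_grid = np.linspace(0.0, 80.0, 801)
mu = eps[:, None] * s_grid[None, :] + b[:, None]
post = poisson.pmf(N, mu).mean(axis=0)
post /= post.sum() * (s_grid[1] - s_grid[0])

# 90% credible upper limit from the cumulative posterior.
cdf = np.cumsum(post) * (s_grid[1] - s_grid[0])
upper = s_grid[np.searchsorted(cdf, 0.90)]
print(f"90% upper limit on S ≈ {upper:.1f}")
```

The coverage check mentioned on the slide would wrap this in a loop over pseudo-experiments generated at fixed true S, counting how often the quoted limit lies above that true value.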

Page 15: Developments in Bayesian Priors


But it all goes pear shaped..

If the particle decays in several channels: H→γγ, H→τ⁺τ⁻, H→bb

Each channel with different b and ε: total 2N+1 parameters, 2N+1 experiments

Heavy undercoverage! E.g. with 4 channels, all ε = 25 ± 10%, b = 0.75 ± 0.25: for s ≈ 10 the quoted ‘90% upper limit’ lies above s in only 80% of cases.

[Plot: coverage versus S over roughly 10–20, with reference lines at 90% and 100%.]

Page 16: Developments in Bayesian Priors


The curse strikes again

Uniform prior in ε: fine.
Uniform prior in ε1, ε2, …, εN ⇒ ε^(N−1) prior in the total ε.
Prejudice in favour of high efficiency: signal size downgraded.
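A sketch of this effect (channel count, sample size and binning are arbitrary): sampling independent uniform priors for ε1 … εN and histogramming the total shows the induced prior rising steeply at low total efficiency, like ε^(N−1).

```python
import numpy as np

# Independent uniform priors on eps_1..eps_N give the total efficiency an
# effective prior rising like eps_tot**(N-1) near zero, so low total efficiency
# (and hence a large signal for a given count) is strongly disfavoured a priori.
rng = np.random.default_rng(3)
n_chan = 4
eps = rng.uniform(0.0, 1.0, size=(1_000_000, n_chan))
eps_tot = eps.sum(axis=1)

# Look at the low-efficiency region eps_tot < 1: the density grows roughly
# like eps_tot**3 here (N - 1 = 3), which is the prejudice described above.
hist, edges = np.histogram(eps_tot, bins=8, range=(0.0, 1.0), density=True)
for lo, hi, h in zip(edges[:-1], edges[1:], hist):
    print(f"eps_tot in [{lo:.3f}, {hi:.3f}): density ≈ {h:.4f}")
```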

Page 17: Developments in Bayesian Priors


Happy ending

Effect avoided by using Jeffreys’ Priors instead of uniform priors for ε and b

Not uniform but like 1/ε, 1/b

Not entirely realistic but interesting.

Uniform prior in S is not a problem – but maybe should consider 1/√S?

Coverage (a very frequentist concept) is a useful tool for Bayesians.

Page 18: Developments in Bayesian Priors


Fisher Information

An informative experiment is one for which a measurement of x will give precise information about the parameter a.

Quantify: I(a) = −⟨∂² ln L / ∂a²⟩

(Second derivative – curvature)

P(x,a): everything

P(x|a), as a function of x for fixed a, is the pdf

P(x|a), as a function of a for fixed x, is the likelihood L(a)
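A small numerical check of this definition for the simplest case, a single Gaussian measurement x ~ N(a, σ) with σ known (the numbers are arbitrary); the finite-difference estimate of −⟨∂² ln L/∂a²⟩ should reproduce the exact answer 1/σ²:

```python
import numpy as np

# Fisher information I(a) = -< d^2 ln L / da^2 >, estimated numerically for a
# single Gaussian measurement x ~ N(a, sigma).  Exact answer: 1/sigma**2.
def log_like(x, a, sigma):
    return -0.5 * ((x - a) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

a_true, sigma, h = 5.0, 2.0, 1e-3
rng = np.random.default_rng(4)
x = rng.normal(a_true, sigma, size=200_000)

# Second derivative by central finite differences, averaged over the sample.
d2 = (log_like(x, a_true + h, sigma) - 2.0 * log_like(x, a_true, sigma)
      + log_like(x, a_true - h, sigma)) / h**2
print(f"estimated I(a) = {-d2.mean():.4f}, exact 1/sigma^2 = {1.0/sigma**2:.4f}")
```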

Page 19: Developments in Bayesian Priors


Jeffreys’ Prior

A prior may be uniform in a – but if I(a) depends on a it’s still not ‘flat’: special values of a give better measurements.

Transform a → a′ such that I(a′) is constant. Then choose a prior uniform in a′:
• location parameter – uniform prior OK
• scale parameter – a′ is ln a: prior 1/a
• Poisson mean – prior 1/√a
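As a worked example of this rule, the Poisson-mean case in the last bullet follows directly from the Fisher information:

```latex
% Jeffreys prior for a Poisson mean a.  For n ~ Poisson(a):
%   ln L(a) = n ln a - a - ln n!
\begin{align*}
  \frac{\partial^2 \ln L}{\partial a^2} = -\frac{n}{a^2},
  \qquad
  I(a) = -\Big\langle \frac{\partial^2 \ln L}{\partial a^2} \Big\rangle
       = \frac{\langle n\rangle}{a^2} = \frac{1}{a},
  \qquad
  p(a) \propto \sqrt{I(a)} = \frac{1}{\sqrt{a}} .
\end{align*}
```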

Page 20: Developments in Bayesian Priors


Objective Prior?

Jeffreys called this an ‘objective’ prior as opposed to ‘subjective’ or straight guesswork, but not everyone was convinced

For statisticians ‘flat prior’ means Jeffreys prior. For physicists it means uniform prior

Prior depends on likelihood. Your ‘prior belief’ P(MH) (or whatever) depends on the analysis

Equivalent to a prior proportional to √I

Page 21: Developments in Bayesian Priors


Reference Priors (Demortier)

4 steps

1) Intrinsic Discrepancy between two PDFs:

δ{P1(z), P2(z)} = Min{ ∫P1(z) ln(P1(z)/P2(z)) dz , ∫P2(z) ln(P2(z)/P1(z)) dz }

A sensible measure of difference: δ = 0 iff P1(z) and P2(z) are the same, else positive. Invariant under all transformations of z.
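A small numerical sketch of this quantity for two example PDFs (two Gaussians chosen arbitrarily), computing both KL directions on a grid and taking the smaller:

```python
import numpy as np

# Intrinsic discrepancy delta{P1, P2} = min( KL(P1||P2), KL(P2||P1) ),
# evaluated on a grid for P1 = N(0, 1) and P2 = N(1, 2) as an example.
def kl(p, q, dz):
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) * dz

z = np.linspace(-10.0, 10.0, 4001)
dz = z[1] - z[0]
p1 = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
p2 = np.exp(-0.5 * ((z - 1.0) / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))

delta = min(kl(p1, p2, dz), kl(p2, p1, dz))
print(f"intrinsic discrepancy ≈ {delta:.3f} nats")   # ≈ 0.44 here
```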

Page 22: Developments in Bayesian Priors


Reference Priors (2)

2) Expected Intrinsic Information

Measurement M: x is sampled from p(x|a). Parameter a has a prior p(a).
Joint distribution p(x,a) = p(x|a) p(a); marginal distribution p(x) = ∫p(x|a) p(a) da.

I(p(a), M) = δ{p(x,a), p(x)p(a)}

Depends on (i) the x–a relationship and (ii) the breadth of p(a). This is the Expected Intrinsic (Shannon) Information from measurement M about the parameter a.
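A grid-based sketch of this quantity for a toy measurement, x Gaussian about a with a Gaussian prior on a (both widths chosen arbitrarily): the discrepancy between the joint p(x,a) and the product p(x) p(a) is evaluated in both KL directions and the smaller is taken.

```python
import numpy as np

# I(p(a), M) = delta{ p(x,a), p(x) p(a) } for x ~ N(a, sigma_x), a ~ N(0, sigma_a).
# A broader prior (more left to learn) gives a larger expected information.
sigma_x, sigma_a = 1.0, 2.0
a = np.linspace(-12.0, 12.0, 481)
x = np.linspace(-15.0, 15.0, 601)
da, dx = a[1] - a[0], x[1] - x[0]

prior = np.exp(-0.5 * (a / sigma_a)**2) / (sigma_a * np.sqrt(2 * np.pi))
like = (np.exp(-0.5 * ((x[None, :] - a[:, None]) / sigma_x)**2)
        / (sigma_x * np.sqrt(2 * np.pi)))

joint = like * prior[:, None]               # p(x, a) = p(x|a) p(a)
marg_x = (joint * da).sum(axis=0)           # p(x) = integral of p(x, a) over a
product = prior[:, None] * marg_x[None, :]  # p(a) p(x)

def kl(p, q):
    p = np.clip(p, 1e-300, None)            # guard against log(0)
    q = np.clip(q, 1e-300, None)
    return np.sum(p * np.log(p / q)) * da * dx

info = min(kl(joint, product), kl(product, joint))
print(f"I(p(a), M) ≈ {info:.3f} nats")
```

For this Gaussian example the smaller direction is the familiar Shannon mutual information, ½ ln(1 + σ_a²/σ_x²) ≈ 0.8 nats.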

Page 23: Developments in Bayesian Priors


Reference Priors (3)

3) Missing Information

Measurement Mk – k samples of x. Enough measurements fix a completely.
The limit k→∞ of I(p(a), Mk) is the difference between the knowledge encapsulated in the prior p(a) and complete knowledge of a. Hence the Missing Information given p(a).

Page 24: Developments in Bayesian Priors


Reference Priors (4)

4) Family of priors P (e.g. Fourier series, polynomials, histogram); p(a) ∈ P.

Ignorance principle: choose the least informative (dumbest) prior in the family: the one for which the missing information lim(k→∞) I(p(a), Mk) is largest.

Technical difficulties in taking the k→∞ limit and integrating over the infinite range of a.

Page 25: Developments in Bayesian Priors


Family of Priors (Google)

Page 26: Developments in Bayesian Priors


Reference Priors

Do not represent subjective belief – in fact the opposite (like a jury selection). Allow the most input to come from the data: a formal consensus practitioners can use to arrive at a sensible posterior.

Depend on the measurement p(x|a) – cf. Jeffreys. Also require the family P of possible priors.
May be improper, but this doesn’t matter (they do not represent belief).
For 1 parameter (if the measurement is asymptotically Gaussian, which the CLT usually secures) they give the Jeffreys prior.

But can also (unlike Jeffreys) work for several parameters

Page 27: Developments in Bayesian Priors


Summary

• Probability
  – Frequentist
  – Bayesian

• Bayes Theorem
  – Priors

• Prior pitfalls (1): Le Diberder
• Prior pitfalls (2): Heinrich
• Jeffreys’ Prior
  – Fisher Information

• Reference Priors: Demortier