
Bayesian Case Studies, week 1

Robin J. Ryder

7 January 2013


About this course

Two aims:

1 Implement computational algorithms

2 Analyse real datasets

6 × 3 hours. E-mail: [email protected]. Office B627. Evaluation: a written-up analysis of a dataset, to be handed in by the end of March. The project topic will be given in February.


Exponential family

A family of distributions (= a model) is an exponential family if the density can be written as

fX(x|θ) = h(x) exp[η(θ) · T(x) − A(θ)]

where h, η, T and A are known functions.

Then T(x) is a sufficient statistic. For iid x1, …, xn, ∑T(xi) is a sufficient statistic for the sample: it encapsulates all the information about the parameters contained in the data. The posterior depends on the sample only through the sufficient statistic.
η(θ) is called the natural parameter.
A(θ) is the log-partition, the log of the normalizing factor.
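For instance (a standard example, not on the slide), the Poisson(λ) model is an exponential family: its density can be written fX(x|λ) = (1/x!) exp[x log λ − λ], so h(x) = 1/x!, η(λ) = log λ, T(x) = x and A(λ) = λ. For a sample x1, …, xn, the sufficient statistic is ∑xi.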


Conjugate prior

A family of distributions is a conjugate prior for a given model if the posterior belongs to the same family of distributions. This is mostly a computational advantage. If the model is an exponential family, then a conjugate prior exists.
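For illustration (a standard example, not from the slides): the Gamma family is conjugate for the Poisson model. A minimal sketch in Python, assuming iid Poisson observations and a Gamma(a, b) prior with shape a and rate b:

import numpy as np

def gamma_poisson_update(x, a, b):
    # Conjugate update for iid Poisson(lam) data with a Gamma(a, b) prior
    # (shape a, rate b): the posterior is Gamma(a + sum(x), b + n).
    x = np.asarray(x)
    return a + x.sum(), b + len(x)

x = [2, 0, 3, 1, 4, 2, 2, 1, 0, 3]   # hypothetical counts
a_post, b_post = gamma_poisson_update(x, a=1.0, b=1.0)
print("Posterior: Gamma(shape=%s, rate=%s)" % (a_post, b_post))
print("Posterior mean of lambda:", a_post / b_post)

The posterior stays in the Gamma family, so no numerical integration is needed.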


Jeffreys’ prior

Jeffreys’ prior, also called the uninformative prior, is invariant under reparameterization. In the one-dimensional case, it is defined as

π(θ) ∝ √I(θ)

where I(θ) is the Fisher information, defined in terms of the log-likelihood ℓ:

I(θ) = EX[(∂ℓ/∂θ)² | θ] = −EX[∂²ℓ/∂θ² | θ]

(under certain regularity conditions)
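For instance (a standard computation, not on the slide): for the Poisson(λ) model, ℓ(λ) = x log λ − λ − log x!, so ∂²ℓ/∂λ² = −x/λ², hence I(λ) = EX[X|λ]/λ² = 1/λ and Jeffreys’ prior is π(λ) ∝ λ^(−1/2).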


Jeffreys’ prior (contd)

Jeffreys’ prior may be improper, which means that it integrates to infinity. This is not an issue as long as the corresponding posterior is proper. This point should always be checked.
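Continuing the Poisson example above: π(λ) ∝ λ^(−1/2) is improper, but for n iid observations the posterior π(λ|x) ∝ λ^(∑xi − 1/2) e^(−nλ) is a Gamma(∑xi + 1/2, n) density, which is proper for any n ≥ 1.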


Data: Ship accidents

The dataset ShipAccidents includes data on accidents for 40 classes of ships. Each row corresponds to one class. Each class of ship is defined by 3 attributes: type of ship (5 levels), period of construction (4 levels), period of operation (2 levels).

For each class of ship, we are given the cumulative number of months in operation and the cumulative number of incidents, which we expect to follow a Poisson distribution.
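As a preview of the kind of analysis this suggests (a sketch only; the names and numbers below are hypothetical, not the actual ShipAccidents format): modelling incidents with a common Poisson rate per month of operation, a Gamma prior again gives a closed-form posterior.

import numpy as np

months = np.array([127.0, 63.0, 1095.0])   # hypothetical exposures (months in operation)
incidents = np.array([0, 0, 3])            # hypothetical accident counts

# Model: incidents_i ~ Poisson(lam * months_i), common rate lam per month.
# With a Gamma(a, b) prior on lam, the posterior is
# Gamma(a + sum(incidents), b + sum(months)).
a, b = 0.5, 0.0   # an improper Jeffreys-type choice, pi(lam) prop. to lam^(-1/2)
a_post = a + incidents.sum()
b_post = b + months.sum()
print("Posterior: Gamma(shape=%s, rate=%s)" % (a_post, b_post))
print("Posterior mean rate per month:", a_post / b_post)

The prior here is improper, but the posterior is proper as soon as the total exposure is positive.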


ABC

Approximate Bayesian Computation is a computational method to draw approximate samples from a posterior distribution in cases where the likelihood is intractable, but where it is easy to simulate new datasets. Given observed data Dobs and a prior π(θ), we wish to sample θ from the posterior, which is proportional to π(θ)L(Dobs|θ). The non-approximate version of the algorithm is as follows (a code sketch is given after the algorithm):

1 Simulate θ from the prior π.

2 Simulate a new dataset Dsim from the model, with parameter θ.

3 If Dobs = Dsim, then accept θ; else reject θ.

4 Repeat until we get a large enough sample of θ’s.
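Below is a minimal sketch of this exact-match version on a toy Poisson model (all choices here, prior included, are illustrative and not from the course):

import numpy as np

rng = np.random.default_rng(0)

def abc_exact(d_obs, n_keep):
    # Exact-match ABC rejection for iid Poisson data with an
    # Exponential(1) prior on the rate theta. For iid data the likelihood
    # depends only on the multiset of values, so we compare sorted datasets.
    accepted = []
    while len(accepted) < n_keep:                    # 4. repeat until enough
        theta = rng.exponential(1.0)                 # 1. theta ~ prior
        d_sim = rng.poisson(theta, size=len(d_obs))  # 2. simulate a dataset
        if np.array_equal(np.sort(d_sim), np.sort(d_obs)):  # 3. accept/reject
            accepted.append(theta)
    return np.array(accepted)

d_obs = np.array([1, 0, 2])
sample = abc_exact(d_obs, n_keep=200)
print("Approximate posterior mean:", sample.mean())

Even on this tiny dataset the acceptance probability is only a few percent; it collapses rapidly as the dataset grows, which motivates the approximate version below.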


ABC (contd)

It is clear that this algorithm gives samples which follow exactly the posterior distribution, but the acceptance probability at step 3 is very small, making the algorithm very slow. Instead, an approximate version is used, obtained by introducing a distance d on datasets and a tolerance parameter ε (a code sketch again follows the algorithm):

1 Simulate θ from the prior π.

2 Simulate a new dataset Dsim from the model, with parameter θ.

3 If d(Dobs, Dsim) < ε, then accept θ; else reject θ.

4 Repeat until we get a large enough sample of θ’s.
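A sketch of the tolerance-based version, on the same toy Poisson model (the distance, Euclidean on sorted datasets, is one choice among many):

import numpy as np

rng = np.random.default_rng(1)

def abc_tolerance(d_obs, n_keep, eps):
    # ABC rejection with a distance on datasets and tolerance eps.
    accepted = []
    while len(accepted) < n_keep:
        theta = rng.exponential(1.0)
        d_sim = rng.poisson(theta, size=len(d_obs))
        dist = np.linalg.norm(np.sort(d_sim) - np.sort(d_obs))
        if dist < eps:
            accepted.append(theta)
    return np.array(accepted)

d_obs = np.array([1, 0, 2])
sample = abc_tolerance(d_obs, n_keep=500, eps=1.5)
print("Approximate posterior mean:", sample.mean())

Larger ε gives faster acceptance but a cruder approximation of the posterior.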


ABC (contd)

In the limit ε → 0, this algorithm is exact. In practice, the distance is usually computed on a summary statistic of the data. Ideally, the summary statistic is sufficient, thus incurring no loss of information.
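In the Poisson sketches above, ∑xi is sufficient, so a natural choice (an illustration, not from the slides) is d(Dobs, Dsim) = |sum(Dsim) − sum(Dobs)|; in the tolerance-based sketch this amounts to replacing the distance line with

dist = abs(d_sim.sum() - d_obs.sum())  # distance on a sufficient summary statistic

and the approximation then loses no information as ε → 0.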
