Download - Bayesian Experimental Design for Models with Intractable ... · Introduction Bayes Design Intractable Likelihoods Design Optimisation Example Ending Bayesian Experimental Design for

Introduction Bayes Design Intractable Likelihoods Design Optimisation Example Ending

Bayesian Experimental Design for Models with

Intractable Likelihoods Using Indirect Inference

Dr Chris DrovandiQueensland University of Technology

[email protected]

Collaborators: Tony Pettitt, Caitriona RyanAcknowledgements: Australian Research Council Discovery

Grant and organisers of DEMA

December 14, 2015

Chris Drovandi DEMA 2015, Sydney


Bayesian Experimental Design

Set-up: Have prior distribution p(θ) for parameter ofstatistical model, with likelihood p(y |θ,d ).Wish to design for next n observations. Design variable:d = (d1, . . . , dn).

Define (general) utility function u(d , y ,θ). y is future data.

u(d ) = Eθ,y [u(d , y , θ)] = ∫y ∫θ u(d , y , θ)p(y |d , θ)p(θ)dθdy ,Objective d∗ = argmaxd∈D u(d ).Too difficult to do directly



Bayesian utility functions

Parameter Estimation Design

Kullback-Leibler divergence u(d , y ) = KL(p(θ|y ,d )||p(θ)).Mutual info between θ and y .Concentration of Posterior distribution

Set u(d , y) as entropy of p(θ|y ,d)Posterior precision u(d , y) = 1/det(var(θ|y ,d))

These utility functions assume that the likelihood function is easilycomputable!!!



Dealing with Intractable Likelihood - Parameter Estimation

To estimate u(d , y ) need to obtain posterior distribution

p(θ|y ,d ) ∝ p(y |θ,d )p(θ).But what to do if p(y |θ,d ) is intractable?

Approximate Bayesian Computation (ABC)Bayesian Indirect Inference (II)



Approximate Bayesian Computation

simulation based method that does not involve likelihoodevaluations

involves a joint ‘approximate’ posterior distribution

p(θ, x |y , ǫ) ∝ g(y |x , ǫ)p(x |θ)p(θ)where g(y |x , ǫ) is a weighting functionpopular choice g(y |x , ǫ) = 1(ρ(y , x) ≤ ǫ)

how to choose ρ(y , x)?usually based on small set of summary statistics

choice of ǫ trade-off between accuracy and efficiency



Rejection ABC

1 generate θi ∼ p(θ) for i = 1, . . . ,N

2 simulate x i ∼ p(y |θi ,d ) for i = 1, . . . ,N

3 compute discrepancies ρi = ρ(y , x i ) for i = 1, . . . ,N, creatingparticles {θi , ρi}Ni=1

4 sort particle set via discrepancy ρ

5 calculate ǫ = ρ⌊αN⌋ (where ⌊·⌋ denotes the floor function)

ABC posterior samples = {θi |ρi ≤ ǫ}αNi=1

Steps 1 and 2 are independent of data and {θi , x i }Ni=1 can bestored.Discrepancy function

ρ(y , x) = D∑

i=1

|yi − xi |

stdp(θ)(xi )(1)



ABC Rejection in Design (Drovandi and Pettitt 2013)

Discretise design space (e.g. one (time) dimensiontmin, tmax , tinc )

Store the prior simulations at each design point in the designspace

Search over discretised design space for optimal design

For each proposed design d ∗ can use the relevant stored priorsimulations to sample from p(θ|y ,d ∗, ǫ)

Drawbacks:

Must search over discretised design space

Storage intensive (does not scale to increase in designdimension)

Choosing ǫ (here we end up having different ǫ for eachposterior approximation)

Require larger ǫ for larger number of observations

We use Bayesian indirect inference to overcome these drawbacksChris Drovandi DEMA 2015, Sydney


Bayesian Indirect Inference

Assume that there exists an auxiliary model with a fully tractablelikelihood pa(y |φ,d ), so that approximately

p(y |θ,d ) ∝ pa(y |φ = g(θ),d ).Unfortunately the mapping function g(θ) is unknown but can beestimated via simulation, g(θ). Then we obtain the followingapproximate posterior:

pa(θ|y ,d ) ∝ pa(y |φ = g(θ), d)p(θ).

See Smith (1993) for a classical version of this approach.



Estimating the Mapping Function

An estimate of the auxiliary parameter φ is found by

φ = g(θ) = argmaxφ∈Φ

pa(x |φ,dT ), x ∼ p(x |θ,dT ),Free to choose the training design dT (technically also adesign problem!).

To bring down variance of φ = g(θ) can consider simulatingm independent datasets at θ, x1:m = (x1, . . . , xm).

Do this for a collection of n samples across the prior space to

obtain the estimated mapping {θi , φi}ni=1. Store the mapping.



Approximating the Posterior

Consider a dataset y . Can obtain an importance samplingapproximation of the II posterior with weights

W i ∝ pa(y |φi,d ) for i = 1, . . . , n.

{W i ,θi}ni=1 can be used to estimate u(d , y ).Chris Drovandi DEMA 2015, Sydney


Key Points of Indirect Inference Approach

Advantages:

Free to optimise over continuous design space

Requires storage of the mapping {θi , φi}ni=1 only. Shouldscale better with design dimension.

No need to choose ǫ.

Scale better with number of observations (see later)

Drawbacks:

User must select auxiliary model. Not an obvious choice in allapplications.



Design Optimisation

Once a strategy has been devised to estimate u(d , y ) anyoptmisation approach can be used to solve argmax u(d )where u(d ) = ∫y u(d , y )p(y |d )dy .Here we used the Muller (1999) algorithm. Involves MCMCsimulation from augmented probability model with marginalu(d )J then calculation of multivariate mode (very hard!)



Design Optimisation - Improvement to Muller

A large value of J is required to substantially sharpen theutility surface (computationally intensive)

Consider the modified utility

u(d , y ) = max{(u(d , y )− sup), 0}, (2)

where up is the utility calculated from the prior distribution;

up = 1/det(Var(θ)). (3)

Scaling factor s helps to ensure utility is positive.

Intuitively: how much utility do we obtain above and beyondthe prior.

u(d ) has same mode as u(d ) but u has greater curvature.



Sharpening the Utility Surface with u

0 100 200 3000

1

2

3

4

5

x 1011

Design (days)

u

0 100 200 3000

5

10

x 1010

Design (days)

u



Motivating Example - Macroparasite Immunity

estimate parameters of Markov process model explainingmacroparasite population development with host immunity

212 hosts (cats) i = 1, . . . , 212.Each cat injected with li juvenile Brugia pahangi larvae

at time ti host is sacrificed and number of mature parasitesrecorded

three variable problemM(t) matures, L(t) juveniles, I (t) immunity

only L(0) and M(ti ) observed for each host

[Durham et al (1972)]



0 200 400 600 800 1000 12000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Time

Pro

porto

n of

Mat

ures



Trivariate Markov Process of Riley et al (2003)

M(t) L(t)

I(t)

Mature Parasites Juvenile Parasites

Immunity

Maturation

Gain of immunity Loss of immunity

Death due to im

munity

Natural death

Natural death

Invisible

Invisible

Invisible

γL(t)

νL(t) µI I (t)

βI(t)L

(t)

µLL(t)

µMM

(t)



Design Problem

How to choose the sacrifice times and initial juvenile injections togain most information about parameters (ν and µL)?



Auxiliary Beta-Binomial model

The data show too much variation for Binomial

A Beta-Binomial model has an extra parameter to capturedispersion

p(mi |αi , βi ) =

(

li

mi

)

B(mi + αi , li −mi + βi )

B(αi , βi ),

Useful reparameterisation pi = αi/(αi + βi ) andξi = 1/(αi + βi )

Relate the proportion and over dispersion parameters to time,ti , and initial larvae, li , covariates

logit(pi ) = fp(ti , li) = β0 + β1 log(ti) + β2 log(ti)2,

log(ξi ) = η,

Four auxiliary parameters φ = (β0, β1, β2, η)



II design choices (Design just for sacrifice times)

For dT we consider 1000 randomly selected sacrifice times in (30,300) days where the initial number of larvae is set to 100 for everyhost in the training design. m = 1 and n = 10, 000

0.5 1 1.5 2 2.5

x 10−3

−1.5

−1

−0.5

0

ν

β 0

0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045−1.5

−1

−0.5

0

µL

β 0



Results - Sacrifice Times Only

method design (days) II utility ABC utility

II design 77.4 5.00 5.95ABC design 99 4.97 6.08

II design (69.7, 97.5) 5.38 7.20ABC design (71, 127) 5.35 7.18

II design (56.5, 82.8, 120.3) 5.73 7.39ABC design (95, 105, 231) 5.54 7.28

II design (50.5, 75.2, 98.2, 143.8) 6.10 7.28ABC design (79, 121, 231, 273) 5.61 7.07



Extend to Sacrifice Times and Initial Juveniles

Extend the auxiliary model (only one extra parameter)

logit(pi ) = fp(ti , li ) = β0 + β1 log(ti ) + β2 log(ti )2,

log(ξi ) = fξ(ti , li) = η0 + η1li ,dT : 1000 hosts in the training design, where the sacrifice times aresampled randomly in (30, 300) days and the initial larvae countsare randomly taken from the set of integers between 100 and 200inclusive.



Results with Initial Juveniles

One host: 69.7 days, 111 juveniles

Two hosts: 1st host 60.9 days, 108 juveniles. 2nd host 135.3 daysand 105 juveniles

Validation: consider 1 host 70 days, estimated utility with ABC for100, 150, 200 juvenilesThe expected utilities for 100, 150 and 200 initial larvae areroughly 6.05 × 1011, 5.32 × 1011 and 4.93 ×1011. Indeed 100juveniles appears optimal.



Future Work

Obtain training design dT optimally.

Smooth out the estimated mapping to obtain more precisemapping

Develop a Laplace approximation of II posterior (allow forhigh-dimensional design for which importance sampling is notsuitable)

Other motivating applications???



Key References

Ryan, Drovandi and Pettitt (2015). Optimal Bayesianexperimental design for models with intractable likelihoodsusing indirect inference applied to biological process models.Bayesian Analysis (to appear)

Drovandi and Pettitt (2013). Bayesian experimental design formodels with intractable likelihoods. Biometrics

Drovandi, Pettitt and Lee (2015). Bayesian indirect inferenceusing a parametric auxiliary model. Statistical Science.

Gallant and McCulloch (2009). On the determination ofgeneral scientific models with application to asset pricing.Journal of the American Statistical Association

Publication page:http://eprints.qut.edu.au/view/person/Drovandi,_Christopher


http://eprints.qut.edu.au/view/person/Drovandi,_Christopher.html

Download - Bayesian Experimental Design for Models with Intractable ... · Introduction Bayes Design Intractable Likelihoods Design Optimisation Example Ending Bayesian Experimental Design for

Top Related