computing bayesian posterior with empirical likelihood in population genetics

46
Computing Bayesian posterior with empirical likelihood in population genetics Pierre Pudlo INRA & U. Montpellier 2 SSB, 18 sept. 2012 Pudlo (INRA & U. Montpellier 2) ABC el GenPop SSB 09/2012 1 / 29

Upload: pierre-pudlo

Post on 24-Jun-2015

136 views

Category:

Science


3 download

DESCRIPTION

A talk I gave at the SSB seminar, see http://www.ssbgroup.fr

TRANSCRIPT

Page 1: Computing Bayesian posterior with empirical likelihood in population genetics

Computing Bayesian posterior with empiricallikelihood in population genetics

Pierre Pudlo

INRA & U. Montpellier 2

SSB, 18 sept. 2012

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 1 / 29

Page 2: Computing Bayesian posterior with empirical likelihood in population genetics

Table of contents

1 Likelihood free methods

2 Empirical likelihood

3 Population genetics

4 Numerical experiments

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 2 / 29

Page 3: Computing Bayesian posterior with empirical likelihood in population genetics

Table of contents

1 Likelihood free methods

2 Empirical likelihood

3 Population genetics

4 Numerical experiments

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 3 / 29

Page 4: Computing Bayesian posterior with empirical likelihood in population genetics

When the likelihood is not completely known

Likelihood intractable: `(y|φ) =

∫U`(y,u|φ)du

where u is a latent process

I Hidden Markov and other dynamic models: latent processwhich is not observed

↪→ Classical answer: Markov chain Monte Carlo,. . .

I Population genetics: the whole gene genealogy is unobservedLikelihood is an integral over

I all possible gene genealogiesI all possible mutations along the genealogies

↪→ Classical answer: Approximate Bayesian computation (ABC)

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 4 / 29

Page 5: Computing Bayesian posterior with empirical likelihood in population genetics

When the likelihood is not completely known

Likelihood intractable: `(y|φ) =

∫U`(y,u|φ)du

where u is a latent processI Hidden Markov and other dynamic models: latent process

which is not observed

↪→ Classical answer: Markov chain Monte Carlo,. . .

I Population genetics: the whole gene genealogy is unobservedLikelihood is an integral over

I all possible gene genealogiesI all possible mutations along the genealogies

↪→ Classical answer: Approximate Bayesian computation (ABC)

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 4 / 29

Page 6: Computing Bayesian posterior with empirical likelihood in population genetics

When the likelihood is not completely known

Likelihood intractable: `(y|φ) =

∫U`(y,u|φ)du

where u is a latent processI Hidden Markov and other dynamic models: latent process

which is not observed

↪→ Classical answer: Markov chain Monte Carlo,. . .

I Population genetics: the whole gene genealogy is unobservedLikelihood is an integral over

I all possible gene genealogiesI all possible mutations along the genealogies

↪→ Classical answer: Approximate Bayesian computation (ABC)

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 4 / 29

Page 7: Computing Bayesian posterior with empirical likelihood in population genetics

ABC in a nutshell

Posterior distribution is the conditional distribution ofπ(φ)`(x|φ) (∗)

knowing that x = xobs

MethodologyDraw a (large) set of particles (φi, xi) from (∗) and use anonparametric estimate of the conditional density

π(φ|xobs) ∝ π(φ)`(xobs|φ)

Seminal papersI Tavare, Balding, Griffith and Donnelly (1997, Genetics)I Pritchard, Seielstad, Perez-Lezuan, Feldman (1999, Molecular

Biology and Evolution)

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 5 / 29

Page 8: Computing Bayesian posterior with empirical likelihood in population genetics

ABC in a nutshellPosterior distribution is the conditional distribution of

π(φ)`(x|φ) (∗)knowing that x = xobs

MethodologyDraw a (large) set of particles (φi, xi) from (∗) and use anonparametric estimate of the conditional density

π(φ|xobs) ∝ π(φ)`(xobs|φ)

Shortcomings.I time consuming – If simulation of the latent process is not

straightforward

I curse of dimensionality vs. lost of information –I If x lies in a high dimensional space X (often), we are unable to

estimate of the conditional densityI Hence, we project the (observed and simulated) datasets on a

space with smaller dimension (trough summary statistics)η : X → Rd (summary statistics)

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 5 / 29

Page 9: Computing Bayesian posterior with empirical likelihood in population genetics

ABC in a nutshellPosterior distribution is the conditional distribution of

π(φ)`(x|φ) (∗)knowing that x = xobs

MethodologyDraw a (large) set of particles (φi, xi) from (∗) and use anonparametric estimate of the conditional density

π(φ|xobs) ∝ π(φ)`(xobs|φ)

Shortcomings.I time consuming – If simulation of the latent process is not

straightforwardI curse of dimensionality vs. lost of information –

I If x lies in a high dimensional space X (often), we are unable toestimate of the conditional density

I Hence, we project the (observed and simulated) datasets on aspace with smaller dimension (trough summary statistics)η : X → Rd (summary statistics)

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 5 / 29

Page 10: Computing Bayesian posterior with empirical likelihood in population genetics

You went to school to learn, girl (. . . )Why 2 plus 2 makes fourNow, now, now, I’m gonna teach you (. . . )

All you gotta do is repeat after me!A, B, C!It’s easy as 1, 2, 3!Or simple as Do, re, mi! (. . . )

SurveysI Beaumont (2010, Annual Review of Ecology, Evolution, and

Systematics)I Lopes, Beaumont (2010, Genetics and Evolution)I Csillery, Blum, Gaggiotti, Francois (2010, Trends in Ecology and

Evolution)I Marin, Pudlo, Robert, Ryder (to appear, Statis. and Comput.)

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 6 / 29

Page 11: Computing Bayesian posterior with empirical likelihood in population genetics

Table of contents

1 Likelihood free methods

2 Empirical likelihood

3 Population genetics

4 Numerical experiments

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 7 / 29

Page 12: Computing Bayesian posterior with empirical likelihood in population genetics

Empirical likelihood (EL)

Owen (1988, Biometrika),Owen (2001, Chapman &Hall)

EmpiricalLikelihood

Data

Parameters' definition in term of a moment constraint

x = (x1, ..., xK)

xProfiled likelihood

xi

Black boxInput:I Independent data

x = (x1, . . . , xK)I Definition of the parameters

E(h(xi,φ)) = 0Output:I A profiled “likelihood function”

Lel(φ|x)

EL do not require a fully defined andoften complex (hence debatable)parametric model

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 8 / 29

Page 13: Computing Bayesian posterior with empirical likelihood in population genetics

Empirical likelihood (EL)

Owen (1988, Biometrika), Owen (2001, Chapman & Hall)

Assume that the dataset x is composed of n independent replicatesx = (x1, . . . , xn) of some X ∼ F

Generalized moment condition modelThe law F of X satisfy

EF[h(X,φ)

]= 0,

where h is a known function, and φ an unknown parameter

Empirical likelihood

Lel(φ|x) = maxp

n∏i=1

pi (∗)

for all p such that 0 6 pi 6 1,∑pi = 1,

∑i pih(xi,φ) = 0.

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 9 / 29

Page 14: Computing Bayesian posterior with empirical likelihood in population genetics

Empirical likelihood (EL)

Empirical likelihood

Lel(φ|x) = maxp

n∏i=1

pi (∗)

for all p such that 0 6 pi 6 1,∑pi = 1,

∑i pih(xi,φ) = 0.

Why (∗)?

(1) L(p) :=n∏i=1

pi is the likehood of

discrete distribution p =∑i piδxi

(2)∑i

pih(xi,φ) = 0 implies that p

fulfils the moment condition

How to compute?

Newton-Lagrange scheme.

The constraint is linear (in p)

See Owen (2001) chap. 12& package emplik in R

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 10 / 29

Page 15: Computing Bayesian posterior with empirical likelihood in population genetics

EL examples

EmpiricalLikelihood

Data

Parameters' definition in term of a moment constraint

x = (x1, ..., xK)

xProfiled likelihood

xi

Simple case: φ = meanh(xi,φ) = xi − φ

I Lel(φ|x) is maximal atφ = 1

K(x1 + · · ·+ xK)

I if enough data (K large),& if data from a true parametricmodel,then Lel ≈ true likelihood

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 11 / 29

Page 16: Computing Bayesian posterior with empirical likelihood in population genetics

EL examples

EmpiricalLikelihood

Data

Parameters' definition in term of a moment constraint

x = (x1, ..., xK)

xProfiled likelihood

xi

h(xi,φ) = xi − φred = param. likelihoodblack = Lel

Example 1: Gaussian modelK = 20

1.0 1.5 2.0 2.5 3.0

0.0

0.2

0.4

0.6

0.8

1.0

theta

el.n

orm

al

K = 200

1.0 1.5 2.0 2.5 3.0

0.0

0.2

0.4

0.6

0.8

1.0

theta

el.n

orm

al

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 12 / 29

Page 17: Computing Bayesian posterior with empirical likelihood in population genetics

EL examples

EmpiricalLikelihood

Data

Parameters' definition in term of a moment constraint

x = (x1, ..., xK)

xProfiled likelihood

xi

red = true likelihoodblue = Gaussian likelihoodblack = Lel

Example 2: non Gaussian modelK = 200

0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0

0.0

0.2

0.4

0.6

0.8

1.0

theta

el.n

orm

al

I Lel is better than a false model.

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 13 / 29

Page 18: Computing Bayesian posterior with empirical likelihood in population genetics

EL examples

EmpiricalLikelihood

Data

Parameters' definition in term of a moment constraint

x = (x1, ..., xK)

xProfiled likelihood

xi

red = true likelihoodblue = Gaussian likelihoodblack = Lel

Example 2: non Gaussian modelK = 200

0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0

0.0

0.2

0.4

0.6

0.8

1.0

theta

el.n

orm

al

I Lel is better than a false model.

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 13 / 29

Page 19: Computing Bayesian posterior with empirical likelihood in population genetics

Table of contents

1 Likelihood free methods

2 Empirical likelihood

3 Population genetics

4 Numerical experiments

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 14 / 29

Page 20: Computing Bayesian posterior with empirical likelihood in population genetics

Neutral model at a given microsatellite locus, in aclosed panmictic population at equilibrium

Sample of 8 genes

Mutations according tothe Simple stepwiseMutation Model (SMM)• date of the mutations ∼

Poisson process withintensity θ/2 over thebranches• MRCA = 100• independent mutations:±1 with pr. 1/2

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 15 / 29

Page 21: Computing Bayesian posterior with empirical likelihood in population genetics

Neutral model at a given microsatellite locus, in aclosed panmictic population at equilibrium

Kingman’s genealogyWhen time axis isnormalized,T(k) ∼ Exp(k(k− 1)/2)

Mutations according tothe Simple stepwiseMutation Model (SMM)• date of the mutations ∼

Poisson process withintensity θ/2 over thebranches• MRCA = 100• independent mutations:±1 with pr. 1/2

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 15 / 29

Page 22: Computing Bayesian posterior with empirical likelihood in population genetics

Neutral model at a given microsatellite locus, in aclosed panmictic population at equilibrium

Kingman’s genealogyWhen time axis isnormalized,T(k) ∼ Exp(k(k− 1)/2)

Mutations according tothe Simple stepwiseMutation Model (SMM)• date of the mutations ∼

Poisson process withintensity θ/2 over thebranches

• MRCA = 100• independent mutations:±1 with pr. 1/2

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 15 / 29

Page 23: Computing Bayesian posterior with empirical likelihood in population genetics

Neutral model at a given microsatellite locus, in aclosed panmictic population at equilibrium

Observations: leafs of the treeθ = ?

Kingman’s genealogyWhen time axis isnormalized,T(k) ∼ Exp(k(k− 1)/2)

Mutations according tothe Simple stepwiseMutation Model (SMM)• date of the mutations ∼

Poisson process withintensity θ/2 over thebranches• MRCA = 100• independent mutations:±1 with pr. 1/2

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 15 / 29

Page 24: Computing Bayesian posterior with empirical likelihood in population genetics

Much more interesting models. . .

I several independent locusIndependent gene genealogies and mutations

I different populationslinked by an evolutionary scenario made of divergences,admixtures, migrations between populations, etc.

I larger sample sizeusually between 50 and 100 genes

A typical evolutionary scenario:

MRCA

POP 0 POP 1 POP 2

τ1

τ2

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 16 / 29

Page 25: Computing Bayesian posterior with empirical likelihood in population genetics

Moment condition in population genetics?

Main difficultyDerive a constraint

EF[h(X,φ)

]= 0,

on the parameters of interest φ when X is the genotypes of the sampleof individuals at a given locus

E.g., in phylogeography, φ is composed ofI dates of divergence between populations,I ratio of population sizes,I mutation rates, etc.

None of them are moments of the distribution of the allelic states of thesample

↪→ h = pairwise composite scores whose zero is the pairwisemaximum likelihood estimator

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 17 / 29

Page 26: Computing Bayesian posterior with empirical likelihood in population genetics

Moment condition in population genetics?

Main difficultyDerive a constraint

EF[h(X,φ)

]= 0,

on the parameters of interest φ when X is the genotypes of the sampleof individuals at a given locus

E.g., in phylogeography, φ is composed ofI dates of divergence between populations,I ratio of population sizes,I mutation rates, etc.

None of them are moments of the distribution of the allelic states of thesample

↪→ h = pairwise composite scores whose zero is the pairwisemaximum likelihood estimator

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 17 / 29

Page 27: Computing Bayesian posterior with empirical likelihood in population genetics

Pairwise composite likelihood?

The intra-locus pairwise likelihood

`2(xk|φ) =∏i<j

`2(xik, xjk|φ)

with x1k, . . . , xnk : allelic states of the gene sample at the k-th locus

Sample of size 2⇒ coalescent tree is a single line joining xik to xjk⇒ closed form of `2(xik, xjk|φ)

The pairwise score function

∇φ log `2(xk|φ) =∑i<j

∇φ log `2(xik, xjk|φ)

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 18 / 29

Page 28: Computing Bayesian posterior with empirical likelihood in population genetics

Composite lik , approximation of lik� Composite likelihoods are often much more narrow than the

distribution of the model (See, e.g., Cooley et al., 2012)

Hence,I produces too narrow posterior distributionsI confidence intervals or credibility intervals are not reliable

But,

Eφ(∇φ log `2(x|φ)) =∫∇φ

[log `2(x|φ)

]`(x|φ)dx

=∑i<j

∫∇φ

[log `2(xi, xj|φ)

]`2(xi, xj|φ)dxidxj

=∑i<j

∫ ∇φ`2(xi, xj|φ)`2(xi, xj|φ)

`2(xi, xj|φ)dxidxj

=∑i<j

∇φ[ ∫`2(xi, xj|φ)dxidxj

]= 0

Safe with EL because we only use position of its modePudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 19 / 29

Page 29: Computing Bayesian posterior with empirical likelihood in population genetics

Pairwise likelihood: a simple case

Assumptions

I sample ⊂ closed, panmicticpopulation at equilibrium

I marker: microsatelliteI mutation rate: θ/2

if xik et xjk are two genes of thesample,

`2(xik, xjk|θ) depends only on

δ = xik − xjk

`2(δ|θ) =1√1+ 2θ

ρ (θ)|δ|

withρ(θ) =

θ

1+ θ+√1+ 2θ

Pairwise score function∂θ log `2(δ|θ) =

−1

1+ 2θ+

|δ|

θ√1+ 2θ

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 20 / 29

Page 30: Computing Bayesian posterior with empirical likelihood in population genetics

Pairwise likelihood: a simple case

Assumptions

I sample ⊂ closed, panmicticpopulation at equilibrium

I marker: microsatelliteI mutation rate: θ/2

if xik et xjk are two genes of thesample,

`2(xik, xjk|θ) depends only on

δ = xik − xjk

`2(δ|θ) =1√1+ 2θ

ρ (θ)|δ|

withρ(θ) =

θ

1+ θ+√1+ 2θ

Pairwise score function∂θ log `2(δ|θ) =

−1

1+ 2θ+

|δ|

θ√1+ 2θ

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 20 / 29

Page 31: Computing Bayesian posterior with empirical likelihood in population genetics

Pairwise likelihood: a simple case

Assumptions

I sample ⊂ closed, panmicticpopulation at equilibrium

I marker: microsatelliteI mutation rate: θ/2

if xik et xjk are two genes of thesample,

`2(xik, xjk|θ) depends only on

δ = xik − xjk

`2(δ|θ) =1√1+ 2θ

ρ (θ)|δ|

withρ(θ) =

θ

1+ θ+√1+ 2θ

Pairwise score function∂θ log `2(δ|θ) =

−1

1+ 2θ+

|δ|

θ√1+ 2θ

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 20 / 29

Page 32: Computing Bayesian posterior with empirical likelihood in population genetics

Pairwise likelihood: 2 diverging populations

MRCA

POP a POP b

τ

AssumptionsI τ: divergence date of pop.a and b

I θ/2: mutation rateLet xik and xjk be two genescoming resp. from pop. a and bSet δ = xik − x

jk.

Then `2(δ|θ, τ) =e−τθ√1+ 2θ

+∞∑k=−∞ ρ(θ)

|k|Iδ−k(τθ).

whereIn(z) nth-order modified Besselfunction of the first kind

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 21 / 29

Page 33: Computing Bayesian posterior with empirical likelihood in population genetics

Pairwise likelihood: 2 diverging populations

MRCA

POP a POP b

τ

AssumptionsI τ: divergence date of pop.a and b

I θ/2: mutation rateLet xik and xjk be two genescoming resp. from pop. a and bSet δ = xik − x

jk.

A 2-dim score function∂τ log `2(δ|θ, τ) =

−θ+θ

2

`2(δ− 1|θ, τ) + `2(δ+ 1|θ, τ)`2(δ|θ, τ)

∂θ log `2(δ|θ, τ) =

−τ−1

1+ 2θ+

τ

2

`2(δ− 1|θ, τ) + `2(δ+ 1|θ, τ)`2(δ|θ, τ)

+

q(δ|θ, τ)`2(δ|θ, τ)

whereq(δ|θ, τ) :=

e−τθ√1+ 2θ

ρ ′(θ)

ρ(θ)

∞∑k=−∞ |k|ρ(θ)|k|Iδ−k(τθ)

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 21 / 29

Page 34: Computing Bayesian posterior with empirical likelihood in population genetics

Table of contents

1 Likelihood free methods

2 Empirical likelihood

3 Population genetics

4 Numerical experiments

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 22 / 29

Page 35: Computing Bayesian posterior with empirical likelihood in population genetics

A first experimentEvolutionary scenario:

MRCA

POP 0 POP 1

τ

Dataset:I 50 genes per populations,I 100 microsat. loci

Assumptions:I Ne identical over all

populationsI φ = (log10 θ, log10 τ)I uniform prior over(−1., 1.5)× (−1., 1.)

Comparison of the originalABC with ABCel

ESS=7034

log(theta)

Den

sity

0.00 0.05 0.10 0.15 0.20 0.25

05

1015

log(tau1)

Den

sity

−0.3 −0.2 −0.1 0.0 0.1 0.2 0.3

01

23

45

67

histogram = ABCelcurve = original ABCvertical line = “true” parameter

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 23 / 29

Page 36: Computing Bayesian posterior with empirical likelihood in population genetics

A first experimentEvolutionary scenario:

MRCA

POP 0 POP 1

τ

Dataset:I 50 genes per populations,I 100 microsat. loci

Assumptions:I Ne identical over all

populationsI φ = (log10 θ, log10 τ)I uniform prior over(−1., 1.5)× (−1., 1.)

Comparison of the originalABC with ABCel

ESS=7034

log(theta)

Den

sity

0.00 0.05 0.10 0.15 0.20 0.25

05

1015

log(tau1)

Den

sity

−0.3 −0.2 −0.1 0.0 0.1 0.2 0.3

01

23

45

67

histogram = ABCelcurve = original ABCvertical line = “true” parameter

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 23 / 29

Page 37: Computing Bayesian posterior with empirical likelihood in population genetics

ABC vs. ABCel on 100 replicates of the 1st experiment

Accuracy:log10 θ log10 τ

ABC ABCel ABC ABCel(1) 0.097 0.094 0.315 0.117(2) 0.071 0.059 0.272 0.077(3) 0.68 0.81 1.0 0.80

(1) Root Mean Square Error of the posterior mean(2) Median Absolute Deviation of the posterior median(3) Coverage of the credibility interval of probability 0.8

Computation time: on a recent 6-core computer (C++/OpenMP)I ABC ≈ 4 hoursI ABCel≈ 2 minutes

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 24 / 29

Page 38: Computing Bayesian posterior with empirical likelihood in population genetics

Second experimentEvolutionary scenario:

MRCA

POP 0 POP 1 POP 2

τ1

τ2

Dataset:I 50 genes per populations,I 100 microsat. loci

Assumptions:I Ne identical over all

populationsI φ = log10(θ, τ1, τ2)I non-informative uniform prior

Comparison of the original ABCwith ABCelhistogram = ABCelcurve = original ABCvertical line = “true” parameter

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 25 / 29

Page 39: Computing Bayesian posterior with empirical likelihood in population genetics

Second experimentEvolutionary scenario:

MRCA

POP 0 POP 1 POP 2

τ1

τ2

Dataset:I 50 genes per populations,I 100 microsat. loci

Assumptions:I Ne identical over all

populationsI φ = log10(θ, τ1, τ2)I non-informative uniform prior

Comparison of the original ABCwith ABCel

histogram = ABCelcurve = original ABCvertical line = “true” parameter

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 25 / 29

Page 40: Computing Bayesian posterior with empirical likelihood in population genetics

Second experimentEvolutionary scenario:

MRCA

POP 0 POP 1 POP 2

τ1

τ2

Dataset:I 50 genes per populations,I 100 microsat. loci

Assumptions:I Ne identical over all

populationsI φ = log10(θ, τ1, τ2)I non-informative uniform prior

Comparison of the original ABCwith ABCel

histogram = ABCelcurve = original ABCvertical line = “true” parameter

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 25 / 29

Page 41: Computing Bayesian posterior with empirical likelihood in population genetics

Second experimentEvolutionary scenario:

MRCA

POP 0 POP 1 POP 2

τ1

τ2

Dataset:I 50 genes per populations,I 100 microsat. loci

Assumptions:I Ne identical over all

populationsI φ = log10(θ, τ1, τ2)I non-informative uniform prior

Comparison of the original ABCwith ABCel

histogram = ABCelcurve = original ABCvertical line = “true” parameter

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 25 / 29

Page 42: Computing Bayesian posterior with empirical likelihood in population genetics

ABC vs. ABCel on 100 replicates of the 2ndexperimentAccuracy:

log10 θ log10 τ1 log10 τ2ABC ABCel ABC ABCel ABC ABCel

(1) 0.0059 0.0794 0.472 0.483 29.3 4.76(2) 0.048 0.053 0.32 0.28 4.13 3.36(3) 0.79 0.76 0.88 0.76 0.89 0.79

(1) Root Mean Square Error of the posterior mean(2) Median Absolute Deviation of the posterior median(3) Coverage of the credibility interval of probability 0.8

Computation time: on a recent 6-core computer (C++/OpenMP)I ABC ≈ 6 hoursI ABCel≈ 8 minutes

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 26 / 29

Page 43: Computing Bayesian posterior with empirical likelihood in population genetics

Why?

On large datasets, ABCel gives more accurate results than ABC

ABC simplifies the dataset through summary statisticsDue to the large dimension of x, the original ABC algorithm estimates

π(θ∣∣∣η(xobs)

),

where η(xobs) is some (non-linear) projection of the observed dataseton a space with smaller dimension↪→ Some information is lost

ABCelsimplifies the model through a generalized moment conditionmodel.↪→ Here, the moment condition model is based on pairwisecomposition likelihood

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 27 / 29

Page 44: Computing Bayesian posterior with empirical likelihood in population genetics

Joint work withI Christian P. Robert (U. Dauphine & IUF)I Kerrie Mengersen (QUT, Australia)I Raphael Leblois (INRA CBGP, Montpellier)

Grant from ANR throughProject “Emile”

First preprint on arXivApproximate Bayesian computation viaempirical likelihood

Coming soon: population genetic modelswhich are too slow to simulate

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 28 / 29

Page 45: Computing Bayesian posterior with empirical likelihood in population genetics

Joint work withI Christian P. Robert (U. Dauphine & IUF)I Kerrie Mengersen (QUT, Australia)I Raphael Leblois (INRA CBGP, Montpellier)

Grant from ANR throughProject “Emile”

First preprint on arXivApproximate Bayesian computation viaempirical likelihood

Coming soon: population genetic modelswhich are too slow to simulate

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 28 / 29

Page 46: Computing Bayesian posterior with empirical likelihood in population genetics

Empirical likelihood

Wilk’s theorem to EL likelihood ratio.

Theorem (Owen, 1991)Let X,X1, . . . ,Xn be iid random vector with common distribution F0.For φ ∈ Θ, x ∈X , let h(x,φ) ∈ Rs.Let φ0 ∈ Θ be such that Var(h(X, θ0)) is finite and has rank q > 0.

If φ0 satisfies E(h(X,φ0)) = 0, then −2 log(Lel(φ0|X1:n)

n−n

) → χ2(q)

Note thatI n−n is the maximum value of Lel

I no condition to ensure a unique value for θ0I no condition to make φel = argmaxLel a good estimate of φ0I provides tests and confidence intervals

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 29 / 29