peter beerli, scientific computing, florida state university

53
Biogeography, Models, Migrate, oh my Peter Beerli, Scientific Computing, Florida State University Twitter: #evoPB @peterbeerli

Upload: others

Post on 23-Apr-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Peter Beerli, Scientific Computing, Florida State University

Biogeography, Models, Migrate, oh myPeter Beerli, Scientific Computing, Florida State University

Twitter: #evoPB @peterbeerli

Page 2: Peter Beerli, Scientific Computing, Florida State University

Roadmap

2/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Phylogenetics, Biogeography, Phylogeography, Population genetics

Modeling a complex world

Model comparison

Bayes factors, marginal likelihoodParallel evaluation of many unlinked loci

Model-based assignment of individuals

Page 3: Peter Beerli, Scientific Computing, Florida State University

{Phylo, Population, Bio}{genetics, geography}

3/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Phylogenetics Population genetics

Page 4: Peter Beerli, Scientific Computing, Florida State University

{Phylo, Population, Bio}{genetics, geography}

4/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Phylogenetics Population genetics

Biogeography

Page 5: Peter Beerli, Scientific Computing, Florida State University

{Phylo, Population, Bio}{genetics, geography}

5/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Phylogenetics Population genetics

Biogeography Phylogeography

Page 6: Peter Beerli, Scientific Computing, Florida State University

{Phylo, Population, Bio}{genetics, geography}

6/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Phylogenetics Population genetics

Biogeography Phylogeography

Page 7: Peter Beerli, Scientific Computing, Florida State University

{Phylo, Population, Bio}{genetics, geography}

7/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Phylogenetics Population genetics

Biogeography Phylogeography

Page 8: Peter Beerli, Scientific Computing, Florida State University

Population models

8/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Species

Page 9: Peter Beerli, Scientific Computing, Florida State University

Population models

9/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Species

Population

Page 10: Peter Beerli, Scientific Computing, Florida State University

Population models

10/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Species

Population

Loss of variability

[genetic drift]

Page 11: Peter Beerli, Scientific Computing, Florida State University

Population models

11/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Species

Population

Mutation

Loss of variability

[genetic drift]

introduces variability

Page 12: Peter Beerli, Scientific Computing, Florida State University

Population models

12/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Species

Population

Migration

Mutation

Loss of variability[genetic drift]

Page 13: Peter Beerli, Scientific Computing, Florida State University

Population structureModels available in MIGRATE

13/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Page 14: Peter Beerli, Scientific Computing, Florida State University

Population structureModels available in MIGRATE

14/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

**

**

*mmmm

m*mmm

mm*mm

mmm*m

mmmm*

**00 *000 0000 0000

***0 0*00 0000 0000

0*** 00*0 0000 0000

00** 000* 0000 0000

*000 **00 *000 0000

0*00 ***0 0*00 0000

...

even more complex

Page 15: Peter Beerli, Scientific Computing, Florida State University

Population through time

15/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Tim

e

Page 16: Peter Beerli, Scientific Computing, Florida State University

Populations through timekey assumption of MIGRATE

16/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

migration rate is constant over time

population sizes are constant over time

[soon taking into account population splitting]

Page 17: Peter Beerli, Scientific Computing, Florida State University

Inference of parameters

17/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Model of prime interest:

Geographic structure, colonization, recurrent gene flow, past population

splitting, ...

Page 18: Peter Beerli, Scientific Computing, Florida State University

Inference of parameters

18/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Model of prime interest:

Geographic structure, colonization, recurrent gene flow, past population

splitting, ...

But our data is usually not a detailed historical record, so we depend on geneticdata. This is problematic because we only see differences in the sequencesthus we need some more models.

Page 19: Peter Beerli, Scientific Computing, Florida State University

Inference of parameters

19/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Model of prime interest:

Geographic structure, colonization, recurrent gene flow, past population

splitting, ...

But our data is usually not a detailed historical record, so we depend on geneticdata. This is problematic because we only see differences in the sequencesthus we need some more models.

Nuisances (we are not really interested in estimating these)

Mutation model, genealogies of individuals

Page 20: Peter Beerli, Scientific Computing, Florida State University

Geographic structure

20/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

The program STRUCTURE is used commonly to find the number of populations.Thus judging whether the data is from a structured population or not.

This seems an abuse of a fine program, because the main goal of the programassigning individuals to populations is not really used. Other programs, such asstructurama coestimate assignment and number of populations.

The model used in STRUCTURE, does not take into account different populationsizes and potential asymmetries in gene flow and thus will not really be able togive a complete picture of historical events.

Page 21: Peter Beerli, Scientific Computing, Florida State University

Geographic structure

21/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Page 22: Peter Beerli, Scientific Computing, Florida State University

Geographic structure

22/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Page 23: Peter Beerli, Scientific Computing, Florida State University

Geographic structure

23/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Page 24: Peter Beerli, Scientific Computing, Florida State University

Model comparisonModels available in MIGRATE

24/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

4-parameters

3-parameters 2-parameters 1-parameter

All simple “two-population” population models that can be use in my softwareMIGRATE to estimate population parameters using Bayesian inference.

[The size difference of the disks marks independent estimation of population sizes.]

Page 25: Peter Beerli, Scientific Computing, Florida State University

The nitty gritty detail

25/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Page 26: Peter Beerli, Scientific Computing, Florida State University

The nitty gritty detail

26/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

infer the posterior probability of parameters of a population model

P (θ|D) =P (θ)P (D|θ)

P (D)=

P (θ)∫GP (G|θ)P (D|G,µ)dG∫

P(θ)∫GP (G|θ)P (D|G,µ)dGdθ

Page 27: Peter Beerli, Scientific Computing, Florida State University

The nitty gritty detail

27/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

infer the posterior probability of parameters of a population model, usually

using Markov Chain Monte Carlo

report the posteriors and highlight some differences of the parameter, done!?

Page 28: Peter Beerli, Scientific Computing, Florida State University

The nitty gritty detail

28/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

infer the posterior probability of parameters of a population model, usually

using Markov Chain Monte Carlo

report the posteriors and highlight some differences of the parameter, done!

We can do better than that and stastistically compare different models.

Page 29: Peter Beerli, Scientific Computing, Florida State University

Bayesian Odds Ratios

29/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Using Bayes’ theorem:

p(M1|X) =p(M1)p(X|M1)

p(X)

we can express support of one model over another as a ratio:

Bayes FactorPosterior Odds Prior Odds

p(M1|X)

p(M2|X)=

p(M1)p(X|M1)p(X)

p(M1)p(X|M1)p(X)

p(M1|X)

p(M2|X)=

p(M1)

p(M2)× p(X|M1)

p(X|M2)

Page 30: Peter Beerli, Scientific Computing, Florida State University

Bayes factor

30/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

We can use the posterior odds ratio or equivalently the Bayes factorsfor model comparison:

BF =p(X|M1)

p(X|M2)LBF = 2 lnBF = 2 ln

(p(X|M1)

p(X|M2)

)

The magnitude of BF gives us evidence how different the models are

LBF = 2 lnBF = z

0 < |z| < 2 No real difference2 < |z| < 6 Positive6 < |z| < 10 Strong|z| > 10 Very strong

Page 31: Peter Beerli, Scientific Computing, Florida State University

Marginal likelihood calculation

31/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

In MCMC application it is often complicated to calculate marginal likelihoods.Several approaches were put forward, of which the easiest, the harmonic meanestimator, has turned out to be unreliable and sometimes wrong.

Several other methods give accurate marginal likelihoods:

Thermodynamic integration [MIGRATE uses this]

Stepping-stone integration

Inflated Density Ratio

Page 32: Peter Beerli, Scientific Computing, Florida State University

A simple example

32/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

We want to establish a direction of geneflow between n populations.

Page 33: Peter Beerli, Scientific Computing, Florida State University

A simple example

33/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

We want to establish a direction of geneflow between 2 populations.

Page 34: Peter Beerli, Scientific Computing, Florida State University

A simple exampleTutorial on MIGRATE website

34/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

We want to establish a direction of geneflow between 2 populations.

We generate 4 hypotheses

&%'$

-� &%

'$&%'$

-

&%'$

&%'$

� &%'$

&%'$

We collect data from individuals in the two populations

Analyze the data in MIGRATE

Page 35: Peter Beerli, Scientific Computing, Florida State University

A simple exampleTutorial on MIGRATE website

35/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Recipe: starting with the finished dish

Log Marginal likelihoods [lmL] of the 4 hypotheses:

lmL&%'$

-� &%

'$

-4856.2&%'$

-

&%'$

-4822.5&%'$

� &%'$

-4832.6&%'$

-4837.8

Data was simulated using the second model (2) from the left.

Page 36: Peter Beerli, Scientific Computing, Florida State University

A simple exampleTutorial on MIGRATE website

36/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Recipe: starting with the finished dish

Log Marginal likelihoods [lmL] of the 4 hypotheses:

lmL&%'$

-� &%

'$

-4856.2&%'$

-

&%'$

-4822.5&%'$

� &%'$

-4832.6&%'$

-4837.8

The best model (highest lmL) is the model second from left (model 2).We can calculate the log Bayes factor for two leftmost models as

LBF12 = 2(lmL1 − lmL2) = 2(−4856.2−−4822.5) = −67.4

The value suggests that we should strongly prefer model 2 over model 1.

Data was simulated using the second model from the left (model 2).

Page 37: Peter Beerli, Scientific Computing, Florida State University

A simple exampleTutorial on MIGRATE website

37/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Recipe:

1. Pick the hypothesis with largest number of parameters

2. Set priors and run parameters (use heated chains) so that you arecomfortable with the result (converged, etc)

3. Record the log marginal likelihood from the output.

4. Pick next hypothesis, adjust migration model, and run and record the logmarginal likelihood.

5. Repeat (4) until all log marginal likelihoods are calculated

6. Compare the log marginal likelihoods, for example order the hypothesisaccordingly, or calculate the model probability

Page 38: Peter Beerli, Scientific Computing, Florida State University

A simple exampleTutorial on MIGRATE website

38/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Ordered models

lmLP(model)

&%'$

-

&%'$

-4822.50.99

&%'$

� &%'$

-4832.60.01

&%'$

-4837.80.0

&%'$

-� &%

'$

-4856.20.0

Model probability (Burnham and Anderson 2002) calculation:

P(Mi) =exp(lmLi)∑j exp(lmLj)

=mLi∑jmLj

Page 39: Peter Beerli, Scientific Computing, Florida State University

Marginal likelihood for lots of data

39/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Running complex models with many genetic loci using MCMC are often verytime consuming (hours, days, weeks computing time). In genetics we have oftenindependent genes (loci) that allow us to treat the analysis as if we evaluatemultiple independent replicates.

P (D|M1) = P (D1, ..., Dn|M1)

Unfortunately live is an inter-dependent mess and we cannot do

P (D1, ..., Dn|M1) 6=n∏i

P (Di|M1)

First we thought that we are doomed to run all the independent data blocks insync to calculate the combine marginal likelihood.

Page 40: Peter Beerli, Scientific Computing, Florida State University

Marginal likelihood for lots of data

40/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Theorem: The combined marginal likelihoods over all independent data blockscan be calculated as a product of independently calculated marginal likelihoodsfor each data block and a constant. (Proof in Beerli and Palczewski 2010)

P (D1, ..., Dn|M1) = K

n∏i

P (Di|M1)

K =

∫θ

n∏i

P (θ|Di,M1)P (θ|M1)1−ndθ.

Page 41: Peter Beerli, Scientific Computing, Florida State University

Marginal likelihood for lots of data

41/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Theorem: The combined marginal likelihoods over all independent data blockscan be calculated as a product of independently calculated marginal likelihoodsfor each data block and a constant. (Proof in Beerli and Palczewski 2010)

P (D1, ..., Dn|M1) = K

n∏i

P (Di|M1)

K =

∫θ

n∏i

P (θ|Di,M1)P (θ|M1)1−ndθ.

This allows to run independent data blocksin parallel on a computer cluster!

Page 42: Peter Beerli, Scientific Computing, Florida State University

Marginal likelihood for lots of dataExample

42 Reanalysis of data from Rosenberg et al. Science 2001/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

70 individuals from 7 populations analyzed for 377 microsatellite loci:Brownian motion approximation to the single-step mutation model

Page 43: Peter Beerli, Scientific Computing, Florida State University

Pick a model

43 Reanalysis of data from Rosenberg et al. Science 2001/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Page 44: Peter Beerli, Scientific Computing, Florida State University

Pick a model

44 Reanalysis of data from Rosenberg et al. Science 2001/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Page 45: Peter Beerli, Scientific Computing, Florida State University

Pick a model

45 Reanalysis of data from Rosenberg et al. Science 2001/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

4

Somewhat less

Page 46: Peter Beerli, Scientific Computing, Florida State University

Pick a model

46 Reanalysis of data from Rosenberg et al. Science 2001/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Page 47: Peter Beerli, Scientific Computing, Florida State University

Pick a model

47 Reanalysis of data from Rosenberg et al. Science 2001/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Page 48: Peter Beerli, Scientific Computing, Florida State University

Pick a model

48 Reanalysis of data from Rosenberg et al. Science 2001/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Page 49: Peter Beerli, Scientific Computing, Florida State University

Pick a model

49 Reanalysis of data from Rosenberg et al. Science 2001/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Page 50: Peter Beerli, Scientific Computing, Florida State University

Pick a model

50 Reanalysis of data from Rosenberg et al. Science 2001/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

1.

2.

3.

4.

5.

6.

7.4

Somewhat less

Model order and probability using Bayes factors

all other models: 0.0Minimal model 1.0

Page 51: Peter Beerli, Scientific Computing, Florida State University

Summary

51/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Caveats:

MIGRATE supports a large list of models but that may not be sufficient for your

hypothesis.

MIGRATE assumes a simple population model that may not fit your data.

Plus:

MIGRATE supports a large list of models.

MIGRATE can run in parallel allowing to analyze large numbers of loci in

decent time.

Bayesian model selection allows comparison of non-nested models.

Page 52: Peter Beerli, Scientific Computing, Florida State University

Questions?Twitter: #evoPB @peterbeerli

52/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

MIGRATE website:http://popgen.sc.fsu.edu

Page 53: Peter Beerli, Scientific Computing, Florida State University

Assignment of individuals

53/?? c©2013 Peter Beerli Twitter: #evoPB @peterbeerli

Migrate (this summer) will be able to assign individuals to populations, forexample we collect individuals from 3 locations and wonder whether theseindividuals at the center location are local or recognizable immigrants.

Migration rate M Average assignment probability1 2 3 4

- 3 2 2 2low migration 1.0 1.0 0.86 0.81medium 0.98 0.99 0.64 0.53high 0.33 0.20 0.68 0.36