bayesian divergence time estimation – workshop lecture

Bayesian Divergence Time Estimation. Tracy Heath (Integrative Biology, University of California, Berkeley; Ecology & Evolutionary Biology, University of Kansas; Ecology, Evolution, & Organismal Biology, Iowa State University (in 2015)). 2014 Workshop on Molecular Evolution, Woods Hole, MA, USA

Upload: tracy-heath

Post on 29-Aug-2014

DESCRIPTION

A lecture on Bayesian divergence-time estimation by Tracy A. Heath (http://phylo.bio.ku.edu/content/tracy-heath). This lecture was given at the following workshops:

• #molevol2014 – Workshop on Molecular Evolution at Woods Hole (https://molevol.mbl.edu/wiki/index.php/Main_Page) **current version (8/2/2014)**

• #bodega14 – Bodega Bay Applied Phylogenetics Workshop, hosted by the University of California, Davis (http://treethinkers.org/2014-workshop/) [replaced (given on 3/10/2014)]

This lecture precedes a tutorial on the software BEAST v2 (http://treethinkers.org/divergence-time-estimation-using-beast/)

TRANSCRIPT

Page 1: Bayesian Divergence Time Estimation – Workshop Lecture

Bayesian Divergence Time Estimation

Tracy Heath

Integrative Biology, University of California, Berkeley

Ecology & Evolutionary Biology, University of Kansas

Ecology, Evolution, & Organismal Biology, Iowa State University (in 2015)

2014 Workshop on Molecular Evolution, Woods Hole, MA, USA

Page 2: Bayesian Divergence Time Estimation – Workshop Lecture

Outline

Overview of divergence time estimation

• Relaxed clock models – accounting for variation in substitution rates among lineages

• Tree priors and fossil calibration

break

BEAST v2.1.3 Tutorial: http://treethinkers.org/divergence-time-estimation-using-beast/

• Walk through: set up a BEAST input file in BEAUti and execute a BEAST MCMC analysis

• On your own: complete analysis & summarize output

dinner

Page 3: Bayesian Divergence Time Estimation – Workshop Lecture

A Time-Scale for Evolution

Phylogenetic trees can provide both topological information and temporal information

[Figure: mammal phylogeny inferred from sequence data, with branch lengths in expected substitutions/site (scale bar: 0.2), shown alongside a fossil-calibrated time tree. Time axis: "Time (Millions of years)", 100 to 0, spanning the Cretaceous, Paleogene, Neogene, and Quaternary. Labeled clades include Simiiformes and Microcebus.]

• Did Simiiformes experience accelerated rates of molecular evolution?

• What is the age of the MRCA of mouse lemurs (Microcebus)?

Understanding Evolutionary Processes (Yang & Yoder Syst. Biol. 2003)

Page 4: Bayesian Divergence Time Estimation – Workshop Lecture

A Time-Scale for Evolution

Phylogenetic divergence-time estimation

• What was the spatial and climatic environment of ancient angiosperms?

• Did the uplift of the Patagonian Andes drive the diversity of Peruvian lilies?

• How has mammalian body size changed over time?

• Is diversification in Caribbean anoles correlated with ecological opportunity?

• How has the rate of molecular evolution changed across the Tree of Life?

(Antonelli & Sanmartin. Syst. Biol. 2011)

(Lartillot & Delsuc. Evolution 2012) (Mahler, Revell, Glor, & Losos. Evolution 2010)

(Nabholz, Glemin, Galtier. MBE 2008)

Historical biogeography • Molecular evolution • Trait evolution • Diversification

Anolis fowleri (image by L. Mahler)

Understanding Evolutionary Processes

Page 5: Bayesian Divergence Time Estimation – Workshop Lecture

Divergence Time Estimation

Goal: Estimate the ages of interior nodes to understand the timing and rates of evolutionary processes

Model how rates are distributed across the tree

Describe the distribution of speciation events over time

External calibration information for estimates of absolute node times

[Figure: time-calibrated mammal phylogeny with a calibrated node marked. Time axis: "Time (Millions of years)", 100 to 0, spanning the Cretaceous, Paleogene, Neogene, and Quaternary. Labeled clades: Simiiformes and Microcebus.]

Page 6: Bayesian Divergence Time Estimation – Workshop Lecture

U H B

“From East Gondwana to Central America: historical biogeography of the Alstroemeriaceae”

(Chacón et al., J. Biogeography 2012)

Page 7: Bayesian Divergence Time Estimation – Workshop Lecture

Divergence Time Estimation

Historical biogeography requires external calibration

Model how rates are distributed across the tree

Describe the distribution of speciation events over time

External calibration information for estimates of absolute node times

(Chacón et al., J. Biogeography 2012)

Page 8: Bayesian Divergence Time Estimation – Workshop Lecture

Divergence Time Estimation

What about when the fossil record (or other types of calibration information) is poor or absent?

Example: Despite the rich diversity of Anolis, there are few fossils

There are some amber fossils, but these fossils fall within a narrow time range

Amber Anolis fossil (http://www.anoleannals.org/2012/03/06/the-hi-tech-world-of-anole-paleontology/)

Page 9: Bayesian Divergence Time Estimation – Workshop Lecture

Divergence Time Estimation

What about when the fossil record is poor or absent?

Model how rates are distributed across the tree

Describe the distribution of speciation events over time

Estimation of relative divergence times

Anolis hendersoni (Image courtesy of L. Mahler)

Page 10: Bayesian Divergence Time Estimation – Workshop Lecture

R T D

“Ecological opportunity and the rate of morphological evolution in the diversification of Greater Antillean Anoles”

Anolis fowleri (image courtesy of L. Mahler)

(Mahler, Revell, Glor, & Losos. Evolution 2010)

Page 11: Bayesian Divergence Time Estimation – Workshop Lecture

T D I D

Divergence time estimation of rapidly evolving pathogens provides information about the spatial and temporal dynamics of infectious diseases

Sequences sampled at different time horizons impose a temporal structure on the tree by providing ages for non-contemporaneous tips

(Pybus & Rambaut. 2009. Nature Reviews Genetics.)

Page 12: Bayesian Divergence Time Estimation – Workshop Lecture

A Time-Scale for Evolution

Phylogenetic trees can provide both topological information and temporal information

[Figure: time-calibrated mammal phylogeny. Time axis: "Time (Millions of years)", 100 to 0, spanning the Cretaceous, Paleogene, Neogene, and Quaternary. Labeled clades: Simiiformes and Microcebus.]

Understanding Evolutionary Processes (Yang & Yoder Syst. Biol. 2003; Heath et al. MBE 2012)

Page 13: Bayesian Divergence Time Estimation – Workshop Lecture

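The Global Molecular Clock

Assume that the rate of evolutionary change is constant over time

(branch lengths equal percent sequence divergence)

[Figure: three-taxon tree of A, B, and C. The MRCA of B and C is 200 My old and the root is 400 My old. Branch lengths: 20% for A, 10% each for B and C, and 10% for the internal branch.]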

(Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)

Page 14: Bayesian Divergence Time Estimation – Workshop Lecture

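The Global Molecular Clock

We can date the tree if we know the rate of change is 1% divergence per 10 My

[Figure: three-taxon tree of A, B, and C with node N, the MRCA of B and C, at 200 My and the root at 400 My. Branch lengths: 20% for A, 10% each for B and C, and 10% for the internal branch.]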

(Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)

Page 15: Bayesian Divergence Time Estimation – Workshop Lecture

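The Global Molecular Clock

If we found a fossil of the MRCA of B and C, we can use it to calculate the rate of change & date the root of the tree

[Figure: three-taxon tree of A, B, and C with node N, the MRCA of B and C, at 200 My and the root at 400 My. Branch lengths: 20% for A, 10% each for B and C, and 10% for the internal branch.]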

(Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)
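The arithmetic behind this slide can be sketched in a few lines of Python. This is only an illustration of the strict-clock reasoning, using the branch lengths and the 200 My fossil age from the figure; the variable names are made up for the example.

```python
# Strict-clock dating of the three-taxon example (A, (B, C)).
# Branch lengths are in substitutions per site (10% = 0.10).
fossil_age_bc = 200.0     # age of the MRCA of B and C in My, taken from the fossil
branch_b = 0.10           # branch from the MRCA of B and C to tip B
internal_branch = 0.10    # branch from the root to the MRCA of B and C

# Under a global clock, a single rate applies to every branch.
rate = branch_b / fossil_age_bc          # substitutions/site per My, per lineage

# The root-to-tip path through B has length 0.10 + 0.10 = 0.20.
root_age = (internal_branch + branch_b) / rate

# Pairwise divergence between B and C accumulates along both lineages.
pairwise_rate = 2 * rate

print(f"rate per lineage: {rate:.4f} substitutions/site/My")
print(f"root age: {root_age:.0f} My")
print(f"pairwise divergence: {pairwise_rate * 10:.0%} per 10 My")
```

The per-lineage rate of 0.05% per My corresponds to the "1% divergence per 10 My" quoted on the previous slide, because pairwise divergence counts change along both branches since the split.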

Page 16: Bayesian Divergence Time Estimation – Workshop Lecture

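Rejecting the Global Molecular Clock

Rates of evolution vary across lineages and over time

Mutation rate: variation in

• metabolic rate

• generation time

• DNA repair

Fixation rate: variation in

• strength and targets of selection

• population sizes

[Figure: three-taxon tree of A, B, and C (root at 400 My, MRCA of B and C at 200 My). Branch lengths: 20% for A, 10% each for B and C, and 10% for the internal branch.]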

Page 17: Bayesian Divergence Time Estimation – Workshop Lecture

Unconstrained Analysis

Sequence data provide information about branch lengths

In units of the expected # of substitutions per site

branch length = rate × time

[Figure: phylogenetic relationships inferred from sequence data; scale bar: 0.2 expected substitutions/site.]

Page 18: Bayesian Divergence Time Estimation – Workshop Lecture

Phylogenetic Likelihood

$f(D \mid V, \theta_s, \Psi)$

$V$: vector of branch lengths

$\theta_s$: sequence model parameters

$D$: sequence data

$\Psi$: tree topology

Page 19: Bayesian Divergence Time Estimation – Workshop Lecture

Rates and Time

The expected # of substitutions/site occurring along a branch is the product of the substitution rate and time

[Figure: three panels of the same tree, labeled "length = rate × time", "length = rate", and "length = time".]

Methods for dating species divergences estimate the substitution rate and time separately

Page 20: Bayesian Divergence Time Estimation – Workshop Lecture

Substitution Rate

Substitution rate: the rate at which mutations are fixed in a population

Depends on: mutation rate, selection, population size, drift

[Figure: tree with length = subst. rate.]

Mutation rate measures the rate at which mutations occur over time and is affected by metabolic rate, generation time, and DNA repair efficiency

Page 21: Bayesian Divergence Time Estimation – Workshop Lecture

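Rates and Time

The sequence data provide information about branch length

For any possible rate, there’s a time that fits the branch length perfectly

[Figure: branch rate (y-axis, 0 to 5) against branch time (x-axis, 0 to 5), showing the combinations of rate and time that give branch length = 0.5. The point time = 0.8, rate = 0.625 is marked.]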

(based on Thorne & Kishino, 2005)
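A short Python sketch makes the confounding concrete. It uses the branch length of 0.5 from the figure; the function name and the list of candidate times are illustrative choices, not part of the lecture.

```python
branch_length = 0.5   # expected substitutions/site, as in the figure

def rate_for(time, length=branch_length):
    """Return the rate that reproduces `length` exactly for a branch of this duration."""
    return length / time

# Every candidate time has a matching rate, so the data alone cannot choose among them.
for time in (0.25, 0.5, 0.8, 1.0, 2.0, 5.0):
    rate = rate_for(time)
    print(f"time = {time:<5} rate = {rate:<7.3f} rate x time = {rate * time:.3f}")
```

The pair marked in the figure (time = 0.8, rate = 0.625) is one row of this list; separating rate from time requires the priors introduced on the following slides.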

Page 22: Bayesian Divergence Time Estimation – Workshop Lecture

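Bayesian Divergence Time Estimation

[Figure: two panels of the same tree, labeled "length = rate" and "length = time".]

$R = (r_1, r_2, r_3, \ldots, r_{2N-2})$

$A = (a_1, a_2, a_3, \ldots, a_{N-1})$

$N$ = number of tips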

Page 24: Bayesian Divergence Time Estimation – Workshop Lecture

Bayesian Divergence Time Estimation

Posterior probability

$f(R, A, \theta_R, \theta_A, \theta_s \mid D, \Psi)$

$R$: vector of rates on branches

$A$: vector of internal node ages

$\theta_R, \theta_A, \theta_s$: model parameters

$D$: sequence data

$\Psi$: tree topology

Page 25: Bayesian Divergence Time Estimation – Workshop Lecture

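Bayesian Divergence Time Estimation

$$f(R, A, \theta_R, \theta_A, \theta_s \mid D) = \frac{f(D \mid R, A, \theta_s)\, f(R \mid \theta_R)\, f(A \mid \theta_A)\, f(\theta_s)}{f(D)}$$

$f(D \mid R, A, \theta_s)$: likelihood

$f(R \mid \theta_R)$: prior on rates

$f(A \mid \theta_A)$: prior on node ages

$f(\theta_s)$: prior on substitution parameters

$f(D)$: marginal probability of the data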
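To show how the terms of the numerator combine, here is a deliberately simplified Python sketch for a single branch. Everything specific in it is an assumption made for illustration: a Poisson count of substitutions stands in for the sequence likelihood, the substitution-model parameters are omitted, and the exponential priors with rates `theta_r` and `theta_a` are hypothetical.

```python
import math

def log_poisson(k, mean):
    """Log probability of observing k events under a Poisson distribution."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)

def log_exponential(x, rate):
    """Log density of an exponential distribution with the given rate."""
    return math.log(rate) - rate * x

def log_likelihood(r, a, k=50, sites=1000):
    """Toy likelihood: k substitutions observed over `sites` sites; depends only on r * a."""
    return log_poisson(k, r * a * sites)

def log_unnormalized_posterior(r, a, theta_r=10.0, theta_a=0.1):
    """log f(D | r, a) + log f(r | theta_R) + log f(a | theta_A) for one branch."""
    return log_likelihood(r, a) + log_exponential(r, theta_r) + log_exponential(a, theta_a)

# Two (rate, age) pairs with the same product r * a = 0.05.
for r, a in ((0.05, 1.0), (0.005, 10.0)):
    print(r, a, log_likelihood(r, a), log_unnormalized_posterior(r, a))
```

Both pairs have the same likelihood, so any difference in the unnormalized posterior comes from the priors on rates and ages. The marginal probability $f(D)$ is never computed here, which is also how MCMC samplers work in practice.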

Page 26: Bayesian Divergence Time Estimation – Workshop Lecture

Bayesian Divergence Time Estimation

Estimating divergence times relies on 2 main elements:

• Branch-specific rates: $f(R \mid \theta_R)$

• Node ages: $f(A \mid \theta_A, C)$

Page 27: Bayesian Divergence Time Estimation – Workshop Lecture

Modeling Rate Variation

Some models describing lineage-specific substitution rate variation:

• Global molecular clock (Zuckerkandl & Pauling, 1962)

• Local molecular clocks (Hasegawa, Kishino & Yano 1989; Kishino & Hasegawa 1990; Yoder & Yang 2000; Yang & Yoder 2003; Drummond and Suchard 2010)

• Punctuated rate change model (Huelsenbeck, Larget and Swofford 2000)

• Log-normally distributed autocorrelated rates (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001; Thorne & Kishino 2002)

• Uncorrelated/independent rates models (Drummond et al. 2006; Rannala & Yang 2007; Lepage et al. 2007)

• Mixture models on branch rates (Heath, Holder, Huelsenbeck 2012)

Models of Lineage-specific Rate Variation

Page 28: Bayesian Divergence Time Estimation – Workshop Lecture

Global Molecular Clock

The substitution rate is constant over time

All lineages share the same rate

[Figure: tree with branch length = substitution rate; color scale from low to high.]

Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)

Page 29: Bayesian Divergence Time Estimation – Workshop Lecture

Global Molecular Clock

[Figure: rate prior distribution (density against rate), with a sampled rate r marked.]

Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)

Page 30: Bayesian Divergence Time Estimation – Workshop Lecture

Global Molecular Clock

The sampled rate is applied to every branch in the tree

[Figure: rate prior distribution (density against rate), with a sampled rate r marked.]

Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)

Page 31: Bayesian Divergence Time Estimation – Workshop Lecture

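Rejecting the Global Molecular Clock

Rates of evolution vary across lineages and over time

Mutation rate: variation in

• metabolic rate

• generation time

• DNA repair

Fixation rate: variation in

• strength and targets of selection

• population sizes

[Figure: three-taxon tree of A, B, and C (root at 400 My, MRCA of B and C at 200 My). Branch lengths: 20% for A, 10% each for B and C, and 10% for the internal branch.]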

Page 32: Bayesian Divergence Time Estimation – Workshop Lecture

Relaxed-Clock Models

To accommodate variation in substitution rates, ‘relaxed-clock’ models estimate lineage-specific substitution rates

• Local molecular clocks

• Punctuated rate change model

• Log-normally distributed autocorrelated rates

• Uncorrelated/independent rates models

• Mixture models on branch rates

Page 33: Bayesian Divergence Time Estimation – Workshop Lecture

Local Molecular Clocks

Rate shifts occur infrequently over the tree

Closely related lineages have equivalent rates (clustered by sub-clades)

[Figure: tree with branch length = substitution rate; color scale from low to high.]

Models of Lineage-specific Rate Variation (Yang & Yoder 2003, Drummond and Suchard 2010)

Page 34: Bayesian Divergence Time Estimation – Workshop Lecture

Local Molecular Clocks

Most methods for estimating local clocks require specifying the number and locations of rate changes a priori

Drummond and Suchard (2010) introduced a Bayesian method that samples over a broad range of possible random local clocks

[Figure: tree with branch length = substitution rate; color scale from low to high.]

Models of Lineage-specific Rate Variation (Yang & Yoder 2003, Drummond and Suchard 2010)

Page 35: Bayesian Divergence Time Estimation – Workshop Lecture

Autocorrelated Rates

Substitution rates evolve gradually over time – closely related lineages have similar rates

The rate at a node is drawn from a lognormal distribution with a mean equal to the parent rate

[Figure: tree with branch length = substitution rate; color scale from low to high.]

Models of Lineage-specific Rate Variation (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001)
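The draw described on this slide can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in any particular program; the fixed `sigma` is an assumption (in the cited models the variance grows with the time separating the two nodes).

```python
import math
import random

def draw_child_rate(parent_rate, sigma, rng):
    """Draw a rate from a lognormal distribution whose mean equals the parent rate."""
    # A lognormal(mu, sigma) has mean exp(mu + sigma**2 / 2), so mu is shifted
    # down by sigma**2 / 2 to make that mean equal to parent_rate.
    mu = math.log(parent_rate) - sigma ** 2 / 2
    return rng.lognormvariate(mu, sigma)

rng = random.Random(1)
parent_rate = 0.01
draws = [draw_child_rate(parent_rate, sigma=0.3, rng=rng) for _ in range(100_000)]
print(sum(draws) / len(draws))   # close to parent_rate
```

Smaller values of `sigma` keep descendant rates close to the ancestral rate, which is what makes rates on neighboring branches correlated.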

Page 36: Bayesian Divergence Time Estimation – Workshop Lecture

Autocorrelated Rates

Models of Lineage-specific Rate Variation (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001)

Page 37: Bayesian Divergence Time Estimation – Workshop Lecture

Punctuated Rate Change

Rate changes occur along lineages according to a point process

At rate-change events, the new rate is a product of the parent’s rate and a Γ-distributed multiplier

[Figure: tree with branch length = substitution rate; color scale from low to high.]

Models of Lineage-specific Rate Variation (Huelsenbeck, Larget and Swofford 2000)

Page 38: Bayesian Divergence Time Estimation – Workshop Lecture

Independent/Uncorrelated Rates

Lineage-specific rates are uncorrelated when the rate assigned to each branch is independently drawn from an underlying distribution

[Figure: tree with branch length = substitution rate; color scale from low to high.]

Models of Lineage-specific Rate Variation (Drummond et al. 2006)

Page 39: Bayesian Divergence Time Estimation – Workshop Lecture

Independent/Uncorrelated Rates

In BEAST, the rates for the branches are drawn from a discretized lognormal distribution

[Figure: branch rates under the uncorrelated, discretized LN model. Density against branch rate (r), 0 to 2.0.]

Models of Lineage-specific Rate Variation (Drummond et al. 2006)
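One way to picture a discretized lognormal is to replace the continuous density with a fixed set of rate categories taken at evenly spaced quantiles. The Python sketch below does this with a midpoint-quantile rule; the rule, the number of categories, and the value of `sigma` are assumptions chosen for illustration and should not be read as a description of BEAST's internals.

```python
import math
from statistics import NormalDist

def discretized_lognormal_rates(n_categories, sigma, mean=1.0):
    """Return rates at the midpoint quantiles of a lognormal with the given mean."""
    mu = math.log(mean) - sigma ** 2 / 2          # so that the lognormal mean is `mean`
    standard_normal = NormalDist()
    rates = []
    for i in range(n_categories):
        quantile = (i + 0.5) / n_categories       # midpoint of the i-th probability bin
        rates.append(math.exp(mu + sigma * standard_normal.inv_cdf(quantile)))
    return rates

rates = discretized_lognormal_rates(n_categories=18, sigma=0.5)
for rate in rates:
    print(f"{rate:.3f}")
```

Under this scheme each branch is assigned one of the categories, so the sampler moves branches between a finite set of rates instead of proposing arbitrary real values.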

Page 40: Bayesian Divergence Time Estimation – Workshop Lecture

Infinite Mixture Model

Dirichlet process prior: Branches are partitioned into distinct rate categories

[Figure: tree with branch length = substitution rate; branches colored by substitution rate class, c1 to c5.]

Models of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE)

Page 41: Bayesian Divergence Time Estimation – Workshop Lecture

The Dirichlet Process Prior (DPP)

A stochastic process that models data as a mixture of distributions and can identify latent classes present in the data

Branches are assumed to form distinct substitution rate clusters

Efficient Markov chain Monte Carlo (MCMC) implementations allow for inference under this model

[Figure: tree with branch length = substitution rate; branches colored by substitution rate class, c1 to c5.]

DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)

Page 42: Bayesian Divergence Time Estimation – Workshop Lecture

The Dirichlet Process Prior (DPP)

A stochastic process that models data as a mixture of distributions and can identify latent classes present in the data

Random variables under the DPP informed by the data:

• the number of rate classes

• the assignment of branches to classes

• the rate value for each class

[Figure: tree with branch length = substitution rate; branches colored by substitution rate class, c1 to c5.]

DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)

Page 43: Bayesian Divergence Time Estimation – Workshop Lecture

The Dirichlet Process Prior (DPP)

A stochastic process that models data as a mixture of distributions and can identify latent classes present in the data

Random variables under the DPP informed by the data:

• the number of rate classes

• the assignment of branches to classes

• the rate value for each class

DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)

Page 44: Bayesian Divergence Time Estimation – Workshop Lecture

DPP C P

Global molecular clock

Rate classes: all 18 branches in c1

[Figure: tree with branch length = substitution rate; branches colored by rate class.]

DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)

Page 45: Bayesian Divergence Time Estimation – Workshop Lecture

DPP C P

Independent rates

Rate classes: 1 branch in each of c1, c2, c3, ..., c18

[Figure: tree with branch length = substitution rate; branches colored by rate class.]

DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)

Page 46: Bayesian Divergence Time Estimation – Workshop Lecture

DPP C P

Local molecular clock

Rate classes: 8 branches in c1, 5 in c2, 5 in c3

[Figure: tree with branch length = substitution rate; branches colored by rate class.]

(262,142 different local molecular clocks)

DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)

Page 47: Bayesian Divergence Time Estimation – Workshop Lecture

DPP C P

Global molecular clock

Rate classes: 8 branches in c1, 6 in c2, 4 in c3

[Figure: tree with branch length = substitution rate; branches colored by rate class.]

Each of the 682,076,806,159 configurations has a prior weight

DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
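The count quoted on this slide matches the number of ways to partition 18 branches into non-empty classes (the Bell number of 18), which the Python sketch below verifies. The second function is an assumption about what "prior weight" refers to: it evaluates the standard Chinese-restaurant-process probability of one specific partition given a concentration parameter `alpha`, a name introduced here for illustration.

```python
from math import factorial

def bell_number(n):
    """Count the partitions of n items into non-empty classes, using the Bell triangle."""
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]                 # each row starts with the previous row's last entry
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]

def crp_prior(class_sizes, alpha):
    """Probability of one specific partition with these class sizes under a CRP."""
    n = sum(class_sizes)
    numerator = alpha ** len(class_sizes)
    for size in class_sizes:
        numerator *= factorial(size - 1)
    denominator = 1.0
    for i in range(n):
        denominator *= alpha + i
    return numerator / denominator

print(bell_number(18))                      # 682076806159
print(crp_prior([18], alpha=1.0))           # all branches in one class (global clock)
print(crp_prior([8, 6, 4], alpha=1.0))      # one particular three-class configuration
```

Small values of `alpha` favor configurations with few classes, while large values favor many classes, so the concentration parameter controls how strongly the prior leans toward clock-like rates.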

Page 48: Bayesian Divergence Time Estimation – Workshop Lecture

Modeling Rate Variation

These are only a subset of the available models for branch-rate variation

• Global molecular clock

• Local molecular clocks

• Punctuated rate change model

• Log-normally distributed autocorrelated rates

• Uncorrelated/independent rates models

• Dirichlet process prior

Models of Lineage-specific Rate Variation

Page 49: Bayesian Divergence Time Estimation – Workshop Lecture

Modeling Rate Variation

Are our models appropriate across all data sets?

[Figure: dated phylogeny of extant and extinct bears with the harbor seal as outgroup, showing mean node ages (Ma) with 95% CIs and node support values (MP, MLu, MLp, Bayesian) against the Eocene to Holocene epochs.]

Krause et al., 2008. Mitochondrial genomes reveal an explosive radiation of extinct and extant bears near the Miocene-Pliocene boundary. BMC Evol. Biol. 8.

[Figure: time-calibrated phylogeny of ray-finned fishes (300 to 0 MYA) with the number of taxa per clade on a log scale (1 to 20000). Labeled groups include Teleostei, Ostariophysi, and Acanthomorpha.]

Clade   r       ε        ΔAIC
1       0.041   0.0017   25.3
2       0.081   *        25.5
3       0.067   0.37     45.1
4       0       *        3.1
Bg.     0.011   0.0011

Santini et al., 2009. Did genome duplication drive the origin of teleosts? A comparative study of diversification in ray-finned fishes. BMC Evol. Biol. 9.

Page 50: Bayesian Divergence Time Estimation – Workshop Lecture

Modeling Rate Variation

These are only a subset of the available models for branch-rate variation

• Global molecular clock

• Local molecular clocks

• Punctuated rate change model

• Log-normally distributed autocorrelated rates

• Uncorrelated/independent rates models

• Dirichlet process prior

Model selection and model uncertainty are very important for Bayesian divergence time analysis

Models of Lineage-specific Rate Variation

Page 51: Bayesian Divergence Time Estimation – Workshop Lecture

Bayesian Divergence Time Estimation

Estimating divergence times relies on 2 main elements:

• Branch-specific rates: $f(R \mid \theta_R)$

• Node ages: $f(A \mid \theta_A, C)$

http://bayesiancook.blogspot.com/2013/12/two-sides-of-same-coin.html

Page 52: Bayesian Divergence Time Estimation – Workshop Lecture

Priors on Node Times

Relaxed clock Bayesian analyses require a prior distribution on node times

$f(A \mid \theta_A)$

Different node-age priors make different assumptions about the timing of divergence events

Node Age Priors

Page 53: Bayesian Divergence Time Estimation – Workshop Lecture

Generic Node Time Priors

Assumed to be vague or uninformative by not making assumptions about biological processes

Uniform prior: the time at a given node has equal probability across the interval between the time of the parent node and the time of the oldest daughter node

(conditioned on root age)

Node Age Priors
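As a minimal Python sketch of the uniform prior for a single node, assuming the ages of the parent and daughter nodes are already known (the function name and the example ages are hypothetical):

```python
import random

def draw_uniform_node_age(parent_age, daughter_ages, rng):
    """Draw a node age uniformly between its oldest daughter and its parent."""
    lower = max(daughter_ages)      # a node cannot be younger than its oldest daughter
    return rng.uniform(lower, parent_age)

rng = random.Random(0)
age = draw_uniform_node_age(parent_age=30.0, daughter_ages=(12.0, 20.0), rng=rng)
print(age)   # somewhere between 20 and 30
```

Applying a draw like this to every interior node, with the root age held fixed, gives one realization of node ages under the prior.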

Page 54: Bayesian Divergence Time Estimation – Workshop Lecture

Generic Node Time Priors

Assumed to be vague or uninformative by not making assumptions about biological processes

Dirichlet prior: ages of the interior nodes on a single path spanning the age of the root node to one of the tip nodes are sampled from a flat Dirichlet distribution

(conditioned on root age)

Node Age Priors

Page 55: Bayesian Divergence Time Estimation – Workshop Lecture

Stochastic Branching Processes

Node-age priors based on stochastic models of lineage diversification

Yule process: assumes a constant rate of speciation across lineages

A pure birth process: every node leaves extant descendants (no extinction)

Node Age Priors
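A pure-birth process is easy to simulate, which helps build intuition for the node ages it favors. The Python sketch below is one possible forward simulation; stopping the tree at the moment the next speciation would occur is an assumption made to keep the example short, and the function name and parameter values are illustrative.

```python
import random

def simulate_yule_node_ages(n_tips, birth_rate, rng):
    """Simulate a pure-birth tree forward in time and return its internal node ages."""
    elapsed = 0.0
    speciation_times = [0.0]              # the root split, which creates 2 lineages
    for k in range(2, n_tips):
        # With k lineages alive, the wait to the next speciation is Exp(k * birth_rate).
        elapsed += rng.expovariate(k * birth_rate)
        speciation_times.append(elapsed)
    # Let the n_tips lineages persist until the next speciation would happen.
    elapsed += rng.expovariate(n_tips * birth_rate)
    # Convert forward times to ages measured back from the present.
    return [elapsed - t for t in speciation_times]

rng = random.Random(42)
ages = simulate_yule_node_ages(n_tips=10, birth_rate=0.1, rng=rng)
print([round(age, 2) for age in ages])   # root age first, youngest node last
```

Because waiting times shrink as lineages accumulate, the simulated nodes tend to cluster toward the present, which is the assumption a Yule prior builds into a dating analysis.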

Page 57: Bayesian Divergence Time Estimation – Workshop Lecture

Stochastic Branching Processes

Node-age priors based on stochastic models of lineage diversification

Constant-rate birth-death process: at any point in time a lineage can speciate at rate λ or go extinct at rate μ

Node Age Priors

Page 59: Bayesian Divergence Time Estimation – Workshop Lecture

Stochastic Branching Processes

Different values of λ and μ lead to different trees

Bayesian inference under these models can be very sensitive to the values of these parameters

Using hyperpriors on λ and μ accounts for uncertainty in these parameters

Node Age Priors

Page 60: Bayesian Divergence Time Estimation – Workshop Lecture

Stochastic Branching Processes

Node-age priors based on stochastic models of lineage diversification

Birth-death-sampling process: an extension of the constant-rate birth-death model that accounts for random sampling of tips

Conditions on a probability of sampling a tip, ρ

Node Age Priors

Page 62: Bayesian Divergence Time Estimation – Workshop Lecture

Priors on Node Times

Sequence data are only informative on relative rates & times

Node-time priors cannot give precise estimates of absolute node ages

We need external information (like fossils) to calibrate or scale the tree to absolute time

$f(A \mid \theta_A, C)$

Node Age Priors

Page 63: Bayesian Divergence Time Estimation – Workshop Lecture

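Calibrating Divergence Times

Fossils (or other data) are necessary to estimate absolute node ages

There is no information in the sequence data for absolute time

Uncertainty in the placement of fossils

[Figure: three-taxon tree of A, B, and C with node N, the MRCA of B and C, at 200 My and the root at 400 My. Branch lengths: 20% for A, 10% each for B and C, and 10% for the internal branch.]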

Page 64: Bayesian Divergence Time Estimation – Workshop Lecture

Calibration Densities

Bayesian inference is well suited to accommodating uncertainty in the age of the calibration node

Divergence times are calibrated by placing parametric densities on internal nodes, offset by age estimates from the fossil record

[Figure: tree of A, B, and C with a prior density (density against age) placed on node N, offset at 200 My.]

Page 65: Bayesian Divergence Time Estimation – Workshop Lecture

Fossil Calibration

Fossil and geological data can be used to estimate the absolute ages of ancient divergences

[Figure: tree on a time axis, Time (My).]

Calibrating Divergence Times

Page 66: Bayesian Divergence Time Estimation – Workshop Lecture

Fossil Calibration

The ages of extant taxa are known

[Figure: tree on a time axis, Time (My).]

Calibrating Divergence Times

Page 67: Bayesian Divergence Time Estimation – Workshop Lecture

Fossil Calibration

Fossil taxa are assigned to monophyletic clades

[Figure: tree on a time axis, Time (My), with the minimum age marked.]

Calibrating Divergence Times Notogoneus osculus (Grande & Grande J. Paleont. 2008)

Page 68: Bayesian Divergence Time Estimation – Workshop Lecture

Fossil Calibration

Fossil taxa are assigned to monophyletic clades and constrain the age of the MRCA

[Figure: tree on a time axis, Time (My), with the minimum age marked.]

Calibrating Divergence Times

Page 69: Bayesian Divergence Time Estimation – Workshop Lecture

Assigning Fossils to Clades

Misplaced fossils can affect node age estimates throughout the tree – if the fossil is older than its presumed MRCA

Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)

Page 70: Bayesian Divergence Time Estimation – Workshop Lecture

Assigning Fossils to Clades

Crown clade: all living species and their most-recent common ancestor (MRCA)

Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)

Page 71: Bayesian Divergence Time Estimation – Workshop Lecture

Assigning Fossils to Clades

Stem lineages: purely fossil forms that are closer to their descendant crown clade than any other crown clade

Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)

Page 72: Bayesian Divergence Time Estimation – Workshop Lecture

Assigning Fossils to Clades

Fossiliferous horizons: the sources in the rock record for relevant fossils

Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)

Page 73: Bayesian Divergence Time Estimation – Workshop Lecture

Fossil Calibration

Age estimates from fossils can provide minimum time constraints for internal nodes

Reliable maximum bounds are typically unavailable

[Figure: tree on a time axis, Time (My), with the minimum age marked.]

Calibrating Divergence Times

Page 74: Bayesian Divergence Time Estimation – Workshop Lecture

Prior Densities on Calibrated Nodes

Parametric distributions are typically offset by the age of the oldest fossil assigned to a clade

These prior densities do not (necessarily) require specification of maximum bounds

Uniform (min, max)

Exponential (λ)

Gamma (α, β)

Log Normal (µ, σ²)

[Figure: calibration densities on a time axis, Time (My), offset at the minimum age.]

Calibrating Divergence Times

Page 75: Bayesian Divergence Time Estimation – Workshop Lecture

Prior Densities on Calibrated Nodes

Describe the waiting time between the divergence event and the age of the oldest fossil

[Figure: calibration density on a time axis, Time (My), offset at the minimum age.]

Calibrating Divergence Times

Page 76: Bayesian Divergence Time Estimation – Workshop Lecture

Prior Densities on Calibrated Nodes

Overly informative priors can bias node age estimates to be too young

[Figure: Exponential (λ) density on a time axis, Time (My), offset at the minimum age.]

Calibrating Divergence Times

Page 77: Bayesian Divergence Time Estimation – Workshop Lecture

Prior Densities on Calibrated Nodes

Uncertainty in the age of the MRCA of the clade relative to the age of the fossil may be better captured by vague prior densities

[Figure: Exponential (λ) density on a time axis, Time (My), offset at the minimum age.]

Calibrating Divergence Times

Page 78: Bayesian Divergence Time Estimation – Workshop Lecture

Prior Densities on Calibrated Nodes

[Figure: lognormal prior density. x-axis: node age − fossil age (0 to 30); the minimum age (fossil) and the expected node age (scale from 30 to 45) are marked.]

[Figure: exponential prior density for λ = 5⁻¹, 20⁻¹, and 60⁻¹. x-axis: node age − fossil age (0 to 100); the minimum age (fossil) and the expected node ages (scale from 60 to 140) are marked.]

Calibrating Divergence Times
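For an offset exponential density, the expected node age and an upper quantile follow directly from the rate λ. The Python sketch below works through the three rates shown in the figure; the fossil age of 60 My is an assumption read from the axis labels, and the function name is made up for the example.

```python
import math

def offset_exponential_summary(fossil_age, rate, upper=0.975):
    """Return the expected node age and an upper quantile for an offset exponential prior."""
    expected_age = fossil_age + 1.0 / rate                  # offset plus mean waiting time
    upper_age = fossil_age - math.log(1.0 - upper) / rate   # offset plus quantile of Exp(rate)
    return expected_age, upper_age

fossil_age = 60.0   # hypothetical minimum age in My
for mean_wait in (5.0, 20.0, 60.0):
    expected_age, upper_age = offset_exponential_summary(fossil_age, rate=1.0 / mean_wait)
    print(f"lambda = 1/{mean_wait:.0f}: expected node age = {expected_age:.1f} My, "
          f"97.5% quantile = {upper_age:.1f} My")
```

A small mean waiting time pins the node close to the fossil, which is how an overly informative density can pull estimates toward ages that are too young.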

Page 79: Bayesian Divergence Time Estimation – Workshop Lecture

P P

Hyperprior: place a higher-order prior on the parameter of a prior distribution

Sample the time from the MRCA to the fossil from a mixture of different exponential distributions

Account for uncertainty in values of λ

[Figure: a hyperprior (density against hyperparameter) feeding into a prior (density against parameter).]
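Sampling from such a mixture takes two steps: draw λ from the hyperprior, then draw the waiting time from an exponential with that λ. The Python sketch below uses a gamma hyperprior purely as an assumption for illustration; the slide does not say which hyperprior is used, and the shape, scale, and fossil age are hypothetical values.

```python
import random

def draw_node_age(fossil_age, shape, scale, rng):
    """Draw a calibrated node age from an exponential whose rate has a gamma hyperprior."""
    lam = rng.gammavariate(shape, scale)        # step 1: draw lambda from the hyperprior
    waiting_time = rng.expovariate(lam)         # step 2: draw MRCA-to-fossil time from Exp(lambda)
    return fossil_age + waiting_time

rng = random.Random(7)
node_ages = [draw_node_age(fossil_age=60.0, shape=2.0, scale=0.05, rng=rng)
             for _ in range(10_000)]
print(min(node_ages), max(node_ages))
```

Compared with a single fixed λ, the mixture has a heavier tail, so it allows for the possibility that the node is much older than the fossil.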

Page 80: Bayesian Divergence Time Estimation – Workshop Lecture

Prior Densities on Calibrated Nodes

Common practice in Bayesian divergence-time estimation:

Estimates of absolute node ages are driven primarily by the calibration density

Specifying appropriate densities is a challenge for most molecular biologists

Uniform (min, max)

Exponential (λ)

Gamma (α, β)

Log Normal (µ, σ²)

[Figure: calibration densities on a time axis, Time (My), offset at the minimum age.]

Calibration Density Approach

Page 81: Bayesian Divergence Time Estimation – Workshop Lecture

Improving Fossil Calibration

We would prefer to eliminate the need for ad hoc calibration prior densities

Calibration densities do not account for diversification of fossils

[Figure: tree of fossil and extant bears. Extant taxa: domestic dog, spotted seal, giant panda, spectacled bear, sun bear, Am. black bear, Asian black bear, brown bear, polar bear, sloth bear. Fossil taxa: Zaragocyon daamsi, Ballusia elmensis, Ursavus brevihinus, Ailurarctos lufengensis, Ursavus primaevus, Agriarctos spp., Kretzoiarctos beatrix, Indarctos vireti, Indarctos arctoides, Indarctos punjabiensis, giant short-faced bear, cave bear.]

Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)

Page 82: Bayesian Divergence Time Estimation – Workshop Lecture

Improving Fossil Calibration

We want to use all of the available fossils

Example: Bears

12 fossils are reduced to 4 calibration ages with calibration density methods

[Figure: tree of fossil and extant bears.]

Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)

Page 84: Bayesian Divergence Time Estimation – Workshop Lecture

Improving Fossil Calibration

Because fossils are part of the diversification process, we can combine fossil calibration with birth-death models

[Figure: tree of fossil and extant bears.]

Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)

Page 85: Bayesian Divergence Time Estimation – Workshop Lecture

I F C

This relies on a branching model that accounts for speciation, extinction, and rates of fossilization, preservation, and recovery

[Figure: 10 extant species (domestic dog, spotted seal, and 8 bears) and 12 fossil bear taxa]

Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)

Page 86: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

Improving statistical inference of absolute node ages

Eliminates the need to specify arbitrary calibration densities

Better capture our statistical uncertainty in species divergence dates

All reliable fossils associated with a clade are used

[Figure: x-axis: Time, 150 to 0]

Heath, Huelsenbeck, & Stadler. 2014. The fossilized birth-death process for coherent calibration of divergence time estimates. PNAS 111(29):E2957–E2966.

http://www.pnas.org/content/111/29/E2957.abstract

Page 87: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

Recovered fossil specimens provide historical observations of the diversification process that generated the tree of extant species

[Figure: x-axis: Time, 150 to 0]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 88: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

The probability of the tree and fossil observations under a birth-death model with rate parameters:

λ = speciation

μ = extinction

ψ = fossilization/recovery

[Figure: x-axis: Time, 150 to 0]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
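To make the three rate parameters concrete, here is a minimal forward-simulation sketch. It is not the likelihood calculation from Heath et al. (2014) and not code from any of the packages named in this lecture. It assumes constant rates λ, μ, and ψ per lineage, a single lineage at the origin, and it tracks only the number of living lineages and the ages of sampled fossils, not the tree itself. The function name `simulate_fbd` and all argument names are hypothetical.

```python
import random


def simulate_fbd(lam, mu, psi, origin_age, seed=0, max_lineages=10_000):
    """Forward-simulate a birth-death process with fossil sampling.

    Each living lineage speciates at rate lam, goes extinct at rate mu,
    and leaves a sampled fossil at rate psi. Time is tracked as age
    (time before present), running from origin_age down to 0.
    Returns (number of lineages alive at the present, fossil ages).
    """
    per_lineage_rate = lam + mu + psi
    if per_lineage_rate <= 0:
        raise ValueError("at least one rate must be positive")
    rng = random.Random(seed)
    n_alive = 1
    age = origin_age
    fossil_ages = []
    while 0 < n_alive < max_lineages:
        # Waiting time to the next event across all living lineages.
        age -= rng.expovariate(n_alive * per_lineage_rate)
        if age <= 0:
            break  # reached the present
        # Choose the event type in proportion to its rate.
        u = rng.random() * per_lineage_rate
        if u < lam:
            n_alive += 1             # speciation
        elif u < lam + mu:
            n_alive -= 1             # extinction
        else:
            fossil_ages.append(age)  # fossil sampled; lineage continues
    return n_alive, fossil_ages


if __name__ == "__main__":
    n, fossils = simulate_fbd(lam=0.05, mu=0.02, psi=0.01, origin_age=100.0, seed=1)
    print(n, len(fossils))
```

Raising ψ relative to λ and μ yields more sampled fossils per lineage, which is the sense in which fossils are observations of the same diversification process.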

Page 89: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

We assume that the fossil is a descendant of a specified calibrated node

The time of the fossil: A indicates an observation of the birth-death process after the age of the node N

[Figure: x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 90: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

The fossil must attach to the tree at some time: ⋆

[Figure: tree with calibrated node N; x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 91: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

If it is the descendant of an unobserved lineage, then there is a speciation event at time ⋆ on one of the 2 branches

[Figure: tree with calibrated node N; x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 93: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

If ⋆ = A, the fossil is an observation of a lineage ancestral to the extant species

[Figure: tree with calibrated node N; x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 95: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

The probability of this realization of the diversification process is conditional on:

λ, μ, and ψ

[Figure: tree with calibrated node N; x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 96: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

Using MCMC, we can sample the age of the calibrated node (•) while conditioning on

λ, μ, and ψ

other node ages

A and ⋆

[Figure: tree with calibrated node N; x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 97: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

MCMC allows us to consider all possible values of ⋆ (marginalization)

[Figure: tree with calibrated node N; x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
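Written out schematically, marginalization means integrating the attachment time out of the joint posterior. The notation below is an assumption made for illustration and is not taken from the slides: t_N is the age of the calibrated node, D stands for all of the data, the superscript (i) indexes MCMC draws, and all other parameters are suppressed.

```latex
% Schematic only: t_N is the calibrated node age, \star the fossil
% attachment time, D the data; other parameters are suppressed.
f(t_N \mid D) = \int f(t_N, \star \mid D)\, d\star
\qquad\text{and}\qquad
\mathbb{E}\left[ g(t_N) \mid D \right] \approx \frac{1}{n} \sum_{i=1}^{n} g\!\left(t_N^{(i)}\right)
```

In practice nothing is integrated analytically: each MCMC draw carries a value of ⋆ alongside a value of t_N, and summarizing the sampled t_N values while ignoring the paired ⋆ values is what performs the marginalization.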

Page 101: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

The posterior samples of the calibrated node age are informed by the fossil attachment times

[Figure: x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 102: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

The FBD model allows multiple fossils to calibrate a single node

[Figure: tree with calibrated node N; x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 103: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

Given ⋆ and A, the new fossil can attach to the tree via speciation along either branch in the extant tree

[Figure: tree with calibrated node N; x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 106: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

Or the unobserved branch leading to the other calibrating fossil

[Figure: tree with calibrated node N; x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 107: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

If ⋆ = A, then the new fossil lies directly on a branch in the extant tree

[Figure: tree with calibrated node N; x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 109: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

Or it is an ancestor of the other calibrating fossil

[Figure: tree with calibrated node N; x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 110: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

The probability of this realization of the diversification process is conditional on:

λ, μ, and ψ

[Figure: tree with calibrated node N; x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 111: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

Using MCMC, we can sample the age of the calibrated node while conditioning on

λ, μ, and ψ

other node ages

A and ⋆ (one pair for each calibrating fossil)

[Figure: tree with calibrated node N; x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 112: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

MCMC allows us to consider all possible values of ⋆ (marginalization)

[Figure: x-axis: Time (My), 0 to 250]

Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 113: Bayesian Divergence Time Estimation – Workshop Lecture

B I U FBD

Implemented in:

• DPPDiv: github.com/trayc7/FDPPDIV

• BEAST2: beast2.cs.auckland.ac.nz/

Available soon:

• RevBayes: sourceforge.net/projects/revbayes

[Figures: x-axes: Time, 150 to 0; Time (My), 0 to 250]

FBD Implementation (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 114: Bayesian Divergence Time Estimation – Workshop Lecture

Lineages Through Time

The FBD model provides an estimate of the number of lineages over time

[Figure: Number of lineages (y-axis) against Time (My) (x-axis, 0 to 200)]

Page 115: Bayesian Divergence Time Estimation – Workshop Lecture

Lineages Through Time

Lineage diversity over time with fossils

MCMC samples the times of lineages in the reconstructed tree and the times of the fossil lineages

(for 1 simulation replicate with 10% random sample of fossils)

Simulations: Results
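A lineages-through-time curve is a count of how many lineages are alive at each age. The sketch below shows that bookkeeping for a single set of sampled times, assuming each lineage is summarized as a (start_age, end_age) pair with end_age = 0 for lineages that reach the present. The function names and the toy values are hypothetical and are not output from the simulations on this slide.

```python
def lineages_at(lineages, age):
    """Count lineages alive at a given age (time before present).

    A lineage (start_age, end_age) is alive when end_age <= age < start_age.
    """
    return sum(1 for start, end in lineages if end <= age < start)


def lineages_through_time(lineages, ages):
    """Return (age, count) pairs for each query age."""
    return [(age, lineages_at(lineages, age)) for age in ages]


if __name__ == "__main__":
    # Hypothetical lineages: one from the origin to the present, one that
    # arises at 60 and goes extinct at 20, one that arises at 30.
    toy = [(100.0, 0.0), (60.0, 20.0), (30.0, 0.0)]
    for age, count in lineages_through_time(toy, [80, 50, 25, 10, 0]):
        print(age, count)
```

Repeating this count for every MCMC sample and summarizing across samples would give an estimate of lineage diversity over time with its uncertainty.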

Page 116: Bayesian Divergence Time Estimation – Workshop Lecture

Lineages Through Time

Lineage diversity over time with fossils

Visualize extant and sampled fossil lineage diversification when using all available fossils (21 total)

(for 1 simulation replicate with 10% random sample of fossils)

Simulations: Results

Page 117: Bayesian Divergence Time Estimation – Workshop Lecture

Lineages Through Time

Lineage diversity over time with fossils

Choosing only the oldest (calibration) fossils reduced the set of sampled fossils from 21 to 12, giving less information about diversification over time

(for 1 simulation replicate with 10% random sample of fossils)

Simulations: Results

Page 118: Bayesian Divergence Time Estimation – Workshop Lecture

B: D T

Sequence data for extant species:

• 8 Ursidae

• 1 Canidae (gray wolf)

• 1 Phocidae (spotted seal)

Fossil ages:

• 12 Ursidae

• 5 Canidae

• 5 Pinnipedimorpha

DPP relaxed clock model

(Heath, Holder, Huelsenbeck MBE 2012)

Fixed tree topology

[Figure panels: Sequence Data; Fossil Data; Phylogenetic Relationships. Fossil groups shown: fossil canids, fossil pinnipeds, stem fossil ursids, fossil Ailuropodinae, giant short-faced bear, cave bear]

Empirical Analysis (Wang 1994, 1999; Krause et al. 2008; Fulton & Strobeck 2010; Abella et al. 2012)

Page 119: Bayesian Divergence Time Estimation – Workshop Lecture

B: D T

[Figure: taxa: Gray wolf, Spotted seal, Giant panda, Spectacled bear, Sun bear, Am. black bear, Asian black bear, Brown bear, Polar bear, Sloth bear; numbered fossil groups as labeled on the slide: 1. fossil canids; 2. fossil pinnipeds; 2. stem fossil ursids; 3. fossil Ailuropodinae; 4. giant short-faced bear; 5. cave bear]

Empirical Analysis (Heath et al. PNAS 2014; silhouette images: http://phylopic.org/)

Page 120: Bayesian Divergence Time Estimation – Workshop Lecture

B: D T

[Figure: dated tree of Gray wolf, Spotted seal, Giant panda, Spectacled bear, Sun bear, Am. black bear, Asian black bear, Brown bear, Polar bear, and Sloth bear, with Ursidae labeled and 95% CI shown; x-axis: Time (My), 60 to 0, spanning the Paleocene, Eocene, Oligocene, Miocene, Pliocene, and Pleistocene]

Empirical Analysis (Heath et al. PNAS 2014; silhouette images: http://phylopic.org/)

Page 121: Bayesian Divergence Time Estimation – Workshop Lecture

B: D T

[Figure: dated tree of Gray wolf, Spotted seal, Giant panda, Spectacled bear, Sun bear, Am. black bear, Asian black bear, Brown bear, Polar bear, and Sloth bear, with Ursidae labeled and fossil groups added: fossil Ailuropodinae, fossil Arctodus, fossil Ursus, fossil pinnipeds, stem fossil Ursidae, fossil canids; x-axis: Time (My), 60 to 0, spanning the Paleocene, Eocene, Oligocene, Miocene, Pliocene, and Pleistocene]

Empirical Analysis (Heath et al. PNAS 2014; silhouette images: http://phylopic.org/)

Page 122: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

Improved statistical inference of absolute node ages

Biologically motivated models can better capture statistical uncertainty

[Figure: dated tree of the 10 extant species (gray wolf, spotted seal, and 8 bears) with fossil groups: fossil Ailuropodinae, fossil Arctodus, fossil Ursus, fossil pinnipeds, stem fossil Ursidae, fossil canids; x-axis: Time (My), 60 to 0, Paleocene to Pleistocene]

Modeling Diversification Processes (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 123: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

Improved statistical inference of absolute node ages

Use all available fossils

Eliminates arbitrary choice of calibration priors

[Figure: dated tree of the 10 extant species (gray wolf, spotted seal, and 8 bears) with fossil groups: fossil Ailuropodinae, fossil Arctodus, fossil Ursus, fossil pinnipeds, stem fossil Ursidae, fossil canids; x-axis: Time (My), 60 to 0, Paleocene to Pleistocene]

Modeling Diversification Processes (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 124: Bayesian Divergence Time Estimation – Workshop Lecture

The Fossilized Birth-Death Process (FBD)

Improved statistical inference of absolute node ages

Extensions of the FBD can account for stratigraphic sampling of fossils and shifts in rates of speciation and extinction

[Figure: x-axis: Time, 175 to 0; legend: Preservation Rate]

Modeling Diversification Processes (Heath, Huelsenbeck, Stadler. PNAS 2014)

Page 125: Bayesian Divergence Time Estimation – Workshop Lecture

F T E D

Ideally, we would like to include all of the available data

Account for uncertainty in the placement of fossil lineages

Keep all fossil data, not just the oldest descendant for a given node

Time
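To see what is lost when only the oldest fossil per node is kept, the sketch below applies that filter to a toy fossil table. Everything in it is hypothetical: the fossil names, node labels, and ages are invented for illustration, and the function is not part of any package named in this lecture.

```python
def oldest_fossil_per_node(fossils):
    """Keep only the oldest fossil assigned to each calibrated node.

    fossils: iterable of (name, node, age) tuples.
    Returns a dict mapping node -> (name, age) of its oldest fossil.
    """
    oldest = {}
    for name, node, age in fossils:
        if node not in oldest or age > oldest[node][1]:
            oldest[node] = (name, age)
    return oldest


if __name__ == "__main__":
    # Hypothetical fossils assigned to two hypothetical nodes.
    toy = [
        ("fossil_a", "node_1", 12.0),
        ("fossil_b", "node_1", 30.5),
        ("fossil_c", "node_1", 7.2),
        ("fossil_d", "node_2", 18.0),
        ("fossil_e", "node_2", 4.1),
    ]
    kept = oldest_fossil_per_node(toy)
    print(kept)
    print("discarded:", len(toy) - len(kept))
```

In this toy case five fossils collapse to two calibration ages; the three discarded ages are the information that the approaches on this slide aim to retain.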

Page 126: Bayesian Divergence Time Estimation – Workshop Lecture

F T E D

Combining extant and fossil species

[Figure: dated tree of Hymenoptera and outgroups with extant and fossil terminals, combining Fossil Age Data, Sequence Data, and Morphological Data (specimen images © AntWeb.org); node support values shown; legend: % of morphological characters scored for each terminal (0% to 100%); x-axis: 350 to 0 million years before present]

(Ronquist, Klopfstein, et al. Syst. Biol. 2012.)

Page 127: Bayesian Divergence Time Estimation – Workshop Lecture

O

Overview of divergence time estimation

• Relaxed clock models – accounting for variation in substitution rates among lineages

• Tree priors and fossil calibration

break

BEAST v2.1.3 Tutorial
http://treethinkers.org/divergence-time-estimation-using-beast/

• Walk through: set up BEAST input file in BEAUti and execute BEAST MCMC analysis

• On your own: complete analysis & summarize output

dinner

Page 128: Bayesian Divergence Time Estimation – Workshop Lecture

Divergence Time Estimation Software

Program        Models/Method
r8s            Strict clock, local clocks, NPRS, PL
ape (R)        NPRS, PL
multidivtime   log-n autocorrelated (plus some others)
PhyBayes       OU, log-n autocorrelated (plus some others)
PhyloBayes     CIR, white noise (uncorrelated) (plus some others)
BEAST          Uncorrelated (log-n & exp), local clocks (plus others)
TreeTime       Dirichlet model, CPP, uncorrelated
MrBayes 3.2    CPP, strict clock, autocorrelated, uncorrelated
DPPDiv         DPP, strict clock, uncorrelated
RevBayes       “the limit is the sky”

Page 129: Bayesian Divergence Time Estimation – Workshop Lecture

BEAST

Bayesian Evolutionary Analysis Sampling Trees

• population size

• growth/decline in population

• bottlenecks/transition points

• gene trees/species trees

• virus transmission dynamics

• recombination

• migration

• founder effects

• epidemiological tracking

• phylogeography

• trait evolution

• dates of MRCAs

• lineage rates

• ancestral character state reconstruction

• times of bottlenecks/transitions

http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)

Page 130: Bayesian Divergence Time Estimation – Workshop Lecture

BEAST

Bayesian Evolutionary Analysis Sampling Trees

• free, open-source, cross-platform software package for estimating evolutionary parameters on rooted time trees

• includes several utility programs for creating input files and summarizing output

• relies on a verbose XML syntax for executing analysis

http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)

Page 131: Bayesian Divergence Time Estimation – Workshop Lecture

BEAST

BEAUti

• a GUI utility program for generating properly formatted BEAST XML input files

http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)

Page 132: Bayesian Divergence Time Estimation – Workshop Lecture

BEAST

BEAST 2 XML: The eXtensible Markup Language

• specifies the sequences, node calibrations, models, priors, output file names, etc.

• dataset-specific issues can arise and some understanding of the BEAST-specific XML format is essential for troubleshooting

• there are a number of interesting models and analyses available in BEAST that cannot be specified using the BEAUti utility

• XML syntax help: http://www.beast2.org/wiki/index.php/BEAST_2.0.x_XML

http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)

Page 133: Bayesian Divergence Time Estimation – Workshop Lecture

BEAST

BEAST (analysis program)

• reads the commands in the XML input file

• performs MCMC and generates output files

• <file_stem>.trees (contains the trees and branch rates sampled every n generations)

• <file_stem>.log (contains the parameter samples recorded every n generations)

• the main BEAST binary only runs a single chain (for MCMCMC use BEASTMC3, only available in BEAST v1.8)

http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)
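As a rough illustration of working with the parameter log outside of the BEAST utilities, the sketch below reads a log and averages each column after discarding burn-in. It assumes the log is tab-delimited, that comment lines start with `#`, and that the first column is the sample index; check these against your own file. The function name `read_parameter_log`, the 10% default burn-in, and the column names in the toy input are hypothetical.

```python
import csv
import io
import statistics


def read_parameter_log(handle, burnin_fraction=0.1):
    """Read a tab-delimited parameter log into {column: [float, ...]}.

    Lines starting with '#' and blank lines are skipped. The first
    column (the sample index) is dropped, and the first
    burnin_fraction of the samples is discarded.
    """
    lines = [ln for ln in handle if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    records = list(reader)
    n_burnin = int(len(records) * burnin_fraction)
    kept = records[n_burnin:]
    fieldnames = reader.fieldnames or []
    columns = [name for name in fieldnames[1:] if name]
    return {name: [float(row[name]) for row in kept] for name in columns}


if __name__ == "__main__":
    # Toy log with hypothetical column names and values.
    toy_log = io.StringIO(
        "# toy example\n"
        "Sample\tposterior\tTreeHeight\n"
        "0\t-100.0\t150.0\n"
        "1000\t-90.0\t140.0\n"
        "2000\t-80.0\t130.0\n"
        "3000\t-70.0\t120.0\n"
    )
    samples = read_parameter_log(toy_log, burnin_fraction=0.25)
    for name, values in samples.items():
        print(name, statistics.mean(values))
```

A column mean is only a starting point; convergence and effective sample size still need to be assessed before trusting any summary.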

Page 134: Bayesian Divergence Time Estimation – Workshop Lecture

BEAST

LogCombiner

• combines the log or trees files from multiple, independent MCMC runs

http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)

Page 135: Bayesian Divergence Time Estimation – Workshop Lecture

BEAST

TreeAnnotator

• reads in the trees file and summarizes the topology, branch times, and rates

• places annotations on the tree that can easily be viewed in FigTree

http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)

Page 136: Bayesian Divergence Time Estimation – Workshop Lecture

BEAST Tutorial

• Walkthrough using BEAUti and executing the analysis in BEAST

• Independently complete the tutorial and summarize the BEAST output

http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)

Page 137: Bayesian Divergence Time Estimation – Workshop Lecture

BEAST Tutorial

This tutorial uses a simulated dataset of 10 taxa with 4 calibration points

[Figure: ML phylogeny and time tree of taxa T1 to T10 with calibrated nodes R, 1, 2, and 3 and fossils marked; x-axis: Time, 200 to 0. Calibration densities shown: Log-normal (ln(20), 0.75); Exponential (30^-1); Normal (140, 10); Uniform (8, 32). Ages marked on the axis: 60, 40, 8, 32. Annotated summaries: mean = 140, mean = 90, median = 60, mean = 20]
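The annotated means and median can be reproduced from the four densities if two of them are offset by a fossil age. The sketch below assumes, as an inference from the numbers on the slide rather than something the slide states, that the exponential density is offset by 60 and the log-normal density by 40. The function names are hypothetical.

```python
import math


def offset_exponential_mean(offset, rate):
    """Mean of an exponential density with the given rate, shifted by offset."""
    return offset + 1.0 / rate


def offset_lognormal_median(offset, mu):
    """Median of a log-normal density with log-mean mu, shifted by offset."""
    return offset + math.exp(mu)


def uniform_mean(lower, upper):
    """Mean of a uniform density on [lower, upper]."""
    return (lower + upper) / 2.0


if __name__ == "__main__":
    # Normal(140, 10): the mean is the first parameter.
    print("Normal mean:", 140)
    # Exponential(30^-1) with an assumed offset of 60.
    print("Exponential mean:", offset_exponential_mean(60, 1 / 30))
    # Log-normal(ln(20), 0.75) with an assumed offset of 40.
    print("Log-normal median:", offset_lognormal_median(40, math.log(20)))
    # Uniform(8, 32).
    print("Uniform mean:", uniform_mean(8, 32))
```

Under those assumed offsets the four summaries come out at 140, 90, 60, and 20, matching the annotations on the slide.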