Michel Veuille Ecole pratique des Hautes Etudes Director of the Systematics and Evolution dept Muséum National d’Histoire Naturelle Paris Scientific Advisory Board of the CBOL Data Analysis Working Group

Post on 19-Jan-2016

Page 1

Michel Veuille
Ecole pratique des Hautes Etudes

Director of the Systematics and Evolution dept
Muséum National d’Histoire Naturelle

Paris

Scientific Advisory Board of the CBOL

Data Analysis Working Group

Page 2

What is the molecular signature of speciation events?

There is no molecular signature of speciation events

What are the other signatures of speciation events?

There is no universal signature of speciation events

But there are local signatures of speciation events, and one kind of signature (e.g. morphological) can be present when the other (e.g. genetic) is absent

Page 3

In 1998, the common European earwig was shown to consist of two sympatric and reproductively isolated species differing only in the number of annual broods (one or two broods per year).

Wirth, Le Guellec, Vancassel, & Veuille. 1998. Evolution 52: 260-265.
Wirth, Le Guellec, & Veuille. 1999. MBE 16: 1645-1653.

A case of two mtDNA species with no morphological difference

The two species differ strikingly in COII sequence

But since they present no apparent morphological difference, the two species remain unnamed

Two examples : 1st / 2

European earwig Forficula auricularia

This is because the GC% of these species evolves at a very high rate

[Figure: GC% at COII in Hexapoda, comparing earwigs with other Hexapoda]

Page 4

Drosophila santomea lives in the highlands of São Tomé, above 1100 m.
Drosophila yakuba lives in the lowlands, below 1100 m.

After Lachaise et al., Proc. Roy. Soc. London, 2000

A case of two morphological species with no mtDNA difference

Two examples : 2nd / 2

[Figure: Drosophila santomea and Drosophila yakuba; map of São Tomé]

They hybridize at 1100 m, and nevertheless remain genetically distinct

They share the same mitochondria, but can be easily identified through the colour pattern of the abdomen

Page 5

[Figure: phylogeny of nine species, with year of description and distribution]

Species           Described   Distribution
D. melanogaster   1830        Tropical Africa + worldwide
D. simulans       1919        Tropical Africa + worldwide
D. yakuba         1954        Tropical Africa
D. teissieri      1971        Tropical Africa
D. erecta         1974        Tropical Africa
D. mauritiana     1974        Mauritius island
D. orena          1978        Cameroon
D. sechellia      1981        Seychelles islands
D. santomea       2000        São Tomé island

D. santomea and D. yakuba share the same mitochondrion through common descent

They belong to the Drosophila melanogaster ("black abdomen") subgroup

Page 6

There are many definitions of species

The species concept is hotly debated

The condition of the barcoder is challenging

« Species » make sense to everybody.

For example, 12% of the nouns in the French vocabulary* correspond to taxa that make sense to a taxonomist (species, families, varieties)

* From the Robert, a classic French dictionary

A solution is to let people use whatever species concept they prefer, and to limit the barcoder’s activity to the domain where he/she can be helpful

Page 7

?0,000,000 species; Black box; Data & tools

« This is species A or B »

« This is a new species »

Data analysis consists of providing data to taxonomists, so that they can make decisions about the status of specimens and taxa.

(taxonomist) (barcoder)

Barcoding and taxonomic decisions are logically distinct, even though they can be performed by the same person.

What data analysis is about

Page 8

[Figure: a query sequence placed in the tree of life, showing the local barcode, the sister group, the closest COI validated node, and the closest validated node using additional information]

If we want to be confident of the assignment of a taxon, we must look at the nodes below the closest node that excludes a sister group with probability p < 0.01.

Below this point, a series of statistical and classificatory approaches allows us to estimate the probability that the query sequence does or does not belong to an already described species, based on the available information.

Alternatively, additional information from other genes, or from an enlarged dataset, can increase our understanding of the taxonomic status of the query.

What data analysis is about (contd)

Page 9

The population genetics background behind data analysis

Page 10

Principle: in each generation, two sequences from the same population find their last common ancestor with a constant probability p = 1/N. It is a « death process », very different from a normal distribution.

The most probable coalescence time: t = 1

The expectation: t = N

P = 0.05 for t = 3N, i.e. only about 5% of pairs have not yet coalesced after 3N generations

Past (generations)
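The three figures on this slide can be checked numerically. The sketch below assumes the simplest reading of the principle: the coalescence time T of two sequences is geometric with per-generation probability 1/N. The value N = 1000 and the function names are illustrative choices, not taken from the talk.

```python
def coalescence_pmf(t, N):
    """P(T = t): the two lineages first meet exactly t generations back."""
    p = 1.0 / N
    return p * (1.0 - p) ** (t - 1)


def coalescence_tail(t, N):
    """P(T > t): the two lineages are still distinct after t generations."""
    return (1.0 - 1.0 / N) ** t


N = 1000  # hypothetical number of gene copies in the population

# Most probable coalescence time: the pmf decreases with t, so the mode is t = 1.
mode = max(range(1, 10 * N), key=lambda t: coalescence_pmf(t, N))

# Expectation, summed far enough into the tail for the remainder to be negligible.
mean = sum(t * coalescence_pmf(t, N) for t in range(1, 50 * N + 1))

# Probability that the pair has not coalesced after 3N generations (close to e^-3).
tail_3N = coalescence_tail(3 * N, N)

print(mode, round(mean, 3), round(tail_3N, 4))
```

The long right tail is what makes the distribution so unlike a normal one: the mode sits at one generation while the mean sits at N.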

Page 11

[Figure: probability p (0 to 1) plotted against sample size n; inset: genealogy of a sample n1 traced back to its MRCA]

Probability p that the MRCA of a sample of size n is also the MRCA of the species, assuming a standard Wright-Fisher model.

p increases very rapidly. The probability is p = 0.6667 for n = 5, and p = 0.8 for n = 9. Increasing the sample size beyond this adds little.

In a very large population p = (n-1)/(n+1)
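A short sketch of the large-population formula p = (n-1)/(n+1), to show how quickly the gain per added sequence shrinks. The function name and the list of sample sizes are illustrative.

```python
def prob_sample_mrca_is_species_mrca(n):
    """p = (n - 1) / (n + 1) for a sample of n sequences (large-population limit)."""
    if n < 2:
        raise ValueError("a sample needs at least two sequences")
    return (n - 1) / (n + 1)


for n in (2, 5, 9, 19, 39):
    print(n, round(prob_sample_mrca_is_species_mrca(n), 4))
```

Going from n = 5 to n = 9 raises p from about 0.67 to 0.8, whereas doubling the sample again to n = 19 only brings it to 0.9.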

Page 12

[Figure: genealogy of a sample n1 traced back to its MRCA; labels: N generations, 2N(1 - 1/n) generations]

Typically, under a standard equilibrium Wright-Fisher model (*), the expected time to the last common ancestor of the tree (MRCA) is at most twice the expected time to the common ancestor of two randomly sampled sequences.

(*) assuming:
- neutrality
- constant population size
- no structuring
- mutation-drift equilibrium
- N = effective number of genes
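The factor of two can be recovered by adding up the expected waiting times of the genealogy. The sketch below assumes the standard coalescent approximation with N gene copies, in which the expected time spent with k distinct lineages is 2N / (k(k-1)) generations; the function names and the value N = 1000 are illustrative.

```python
def expected_tmrca(n, N):
    """Expected generations back to the MRCA of n sequences: sum of the
    expected waiting times while k = n, n-1, ..., 2 lineages remain."""
    return sum(2.0 * N / (k * (k - 1)) for k in range(2, n + 1))


def closed_form_tmrca(n, N):
    """The closed form quoted on the slide: 2N(1 - 1/n)."""
    return 2.0 * N * (1.0 - 1.0 / n)


N = 1000  # hypothetical effective number of genes
for n in (2, 5, 10, 100):
    ratio = expected_tmrca(n, N) / N  # relative to the pairwise expectation N
    print(n, round(expected_tmrca(n, N), 1), round(ratio, 3))
```

For n = 2 the sum reduces to N, the pairwise expectation, and the ratio approaches 2 without ever reaching it as n grows.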

Page 13

[Figure: genealogies of a sample n1 and of a larger sample n2 > n1, each traced back to its MRCA]

« The older nodes of a genealogy tend to be revealed in a small sample, whereas more recent portions are, on average, only revealed as the sample size per locus grows large. » 

Kliman et al. 2000.

[Figure labels: N generations; 2N(1 - 1/n) generations]

Using a larger dataset does not increase the information very much at this level

Page 14

After AG Clark 1997

A long time after they have split, two species still share some neutral polymorphisms.

Polymorphisms can persist very far back into the past of a species, and reach the ancestral population shared with a sister species

Page 15

Exploring shallow nodes

Page 17

1. Matz and Nielsen’s MCMC method

Derived from Nielsen and Hey’s (2001) IM method, based on MCMC (Markov chain Monte Carlo).

That method estimates 5 parameters, and therefore involves very long computation times.

Matz and Nielsen (2005) reduce it to two parameters:
- the population size
- the time to speciation

They estimate the probability that the query sequence does or does not belong to the same species as the reference sample

Page 18

The classification methods partition the dataset using a few characters

The distance methods work well with a small dataset, provided there are enough mutations

2. Evaluating classification and phylogenetic methods: Austerlitz et al.

They compare two classification methods:
- CART
- random forest

And two phylogenetic methods:
- Neighbour-joining
- PhyML

They simulate n + 1 individuals in each species.

n individuals are a reference sample

the last individual is the query.

Repeated simulations allow them to record the rate of correct assignment of the query to its species
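The protocol above (n reference individuals per species, one query, repeated simulations, success rate) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the sequences are binary strings, each species is a cluster of random mutations around a species centre, and the query is assigned to the species of its nearest reference by Hamming distance. It is a stand-in for the four methods compared by Austerlitz et al., not their coalescent simulation, and all names and parameter values are hypothetical.

```python
import random


def mutate(seq, n_mut, rng):
    """Return a copy of seq with n_mut distinct random sites flipped (0 <-> 1)."""
    out = list(seq)
    for site in rng.sample(range(len(out)), n_mut):
        out[site] ^= 1
    return out


def hamming(a, b):
    """Number of sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def simulate_once(n, divergence, within, length, rng):
    """Two species, n references each, one query from a random species.
    Returns True if nearest-neighbour assignment recovers the query's species."""
    root = [0] * length
    centres = [mutate(root, divergence, rng) for _ in range(2)]
    refs = [(sp, mutate(centres[sp], within, rng))
            for sp in range(2) for _ in range(n)]
    rng.shuffle(refs)  # so that ties are not broken in favour of one species
    true_species = rng.randrange(2)
    query = mutate(centres[true_species], within, rng)
    assigned, _ = min(refs, key=lambda ref: hamming(ref[1], query))
    return assigned == true_species


def success_rate(n, divergence, within, length=200, reps=500, seed=1):
    """Fraction of repeated simulations in which the query is correctly assigned."""
    rng = random.Random(seed)
    hits = sum(simulate_once(n, divergence, within, length, rng)
               for _ in range(reps))
    return hits / reps


print(success_rate(n=10, divergence=20, within=3))  # well separated species
print(success_rate(n=10, divergence=0, within=3))   # no divergence at all
```

With no divergence between the two species the assignment is a coin toss, and with strong divergence it is almost always right; the interesting region, as in the comparisons that follow, lies between these extremes.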

Page 19

Comparison of the methods for a low θ

(2 populations, reference sample size = 10, low θ)

[Figure: success rate (50% to 100%) against separation time (100, 1000, 10000) for ML, CART and RF]

Classification methods perform better when variation is low

Page 20

Comparison of the methods for a high θ

(2 populations, reference sample size = 10, θ = 30)

[Figure: success rate (50% to 100%) against separation time (100, 1000, 10000) for ML, CART and RF]

Phylogenetic methods perform better for a highly variable population

Page 21

Conclusion :

the appropriate method varies with the properties of the dataset

Page 22

Comparing methods using realistic datasets

Page 23

1. Litoria nannotis

2. Astraptes fulgerator

[Figure: success rate (80% to 100%) against sample size (0 to 30) for ML, CART and Random Forest]

[Figure: good assignment rate (90% to 100%) against reference sample size (3 to 10) for phylo and CART]

4 species; average sample size: 43.7; average θ = 1.54

12 species; average sample size: 38.8; average θ = 23.5

Page 24

3. Cowries

[Figure: good assignment rate (80% to 100%) against sample size (0 to 30) for ML, CART and Random Forest]

Page 25

Other solutions:

Can we replace COI?
Can we complement it with other genes?

Page 26

Properties of bilaterian mtDNA      Other systems
Large number of copies per cell     rDNA has a high copy number
High mutation rate                  Microsatellites also
Low variation / divergence ratio    Centromeres, telomeres (documented in Drosophila)
No recombination                    Centromeres, telomeres (documented in Drosophila)
Asexual                             The Y is asexual; the other chromosomes recombine
Haploid                             X chromosome, Y chromosome
Maternally inherited

The main disadvantage of asexuality is that mitochondria do not follow Mendel’s 2nd law: mtDNA carries no information on genetic barriers.

The main disadvantage of maternal inheritance is that mitochondria can be transferred horizontally along with Wolbachia endosymbiotic bacteria. Examples: Protocalliphora and Drosophila.

Variation in mtDNA is lowered by selective sweeps, according to Bazin et al. (2006).
Variation is also lowered in some nuclear regions, due to background selection.

Page 27

Phylogeny of the fly Protocalliphora based on AFLP (nuclear markers), according to Whitworth et al. (2007).

Symbols represent different Wolbachia strains

Maternally transmitted endosymbiotic bacteria : hitchhiking by Wolbachia

Phylogeny of Protocalliphora based on COI+COII. The authors claim that the assignment of unknown individuals to species is impossible in 60% of the species.

After Whitworth et al., Proc. Roy. Soc. B, in press

nuclear

mtDNA

Page 28

MRCA

Phylogenetic tree of mtDNA Phylogram of nuclear DNA

A phyletic tree in mtDNA represents true phyletic relationships. Mutations are in linkage disequilibrium because they do not recombine. Having two divergent clades is trivial under a standard Wright-Fisher model.

Whereas the phylogram of a recombining gene represents distances between haplotypes, where mutations can seem to « appear » repeatedly on several terminal branches.

They thus inform us about the existence of barriers to gene flow

Page 29

Conclusions

1. There is no mitochondrial signature of speciation. There is no room for a barcode species concept, nor for anything like a « barcodon ».

2. Even a moderate sample can provide a wealth of information on the history of a species.

3. Additional information can be obtained in difficult cases, either by increasing the population sample, or by using additional markers.

Page 30

The END