Molecular Evolution and Phylogeny (2)
Sebastián E. Ramos-Onsins
Centre of Research in Agricultural Genomics (CRAG)
MSc in Bioinformatics
Module 2: Core Bioinformatics
Course 2014-15
Phylogeny:
Representation of the genealogical relationships among species, genes, populations or even individuals.
Ziheng Yang (2006)
A tree is a graphical representation of the relationships between lineages, drawn as a structure of nodes and branches.
Rooted vs Unrooted Trees:
[Figure: a rooted tree and an unrooted tree, each with tips 1 to 6]
Cladogram vs Phylogram Trees:
[Figure: a cladogram (qualitative) and a phylogram (branch lengths are represented), each with tips 1 to 6]
Unresolved vs Resolved Trees:
[Figure: three panels showing a star tree, a partially resolved tree and a resolved tree]
Species vs Gene Trees:
Species tree: based on multiple sources of information about the species.
Gene tree: based on a single or a few regions of (e.g.) DNA of the species.
[Figure: a species tree and a gene tree, each with tips 1 to 6]
Ultrametric and Additive Trees (not mutually exclusive):
Ultrametric: for any three tips, the two largest of the three pairwise distances are equal.
Ex: d45 <= d43 = d53
Additive: the distance between any two tips equals the sum of the lengths of the branches connecting them.
Ex: d15 = d1i + dij + djk + dk5 (i, j and k are internal nodes)
[Figure: two example trees with tips 1 to 6]
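The two properties above can be checked directly on a distance matrix. The following is a minimal Python sketch, not part of the original slides: the ultrametric test uses the three-point condition and the additive test uses the four-point condition (which the slides do not state explicitly). The helper `make_matrix` and both example matrices are hypothetical, chosen only for illustration.

```python
from itertools import combinations, product

def make_matrix(pairs):
    """Build a symmetric nested dict from {(a, b): distance} pairs."""
    d = {}
    for (a, b), dist in pairs.items():
        d.setdefault(a, {a: 0.0})[b] = dist
        d.setdefault(b, {b: 0.0})[a] = dist
    return d

def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple the two largest distances are equal."""
    for a, b, c in combinations(sorted(d), 3):
        _, middle, largest = sorted([d[a][b], d[a][c], d[b][c]])
        if largest - middle > tol:
            return False
    return True

def is_additive(d, tol=1e-9):
    """Four-point condition, checked over all (not necessarily distinct) quadruples."""
    for w, x, y, z in product(sorted(d), repeat=4):
        bound = max(d[w][y] + d[x][z], d[w][z] + d[x][y])
        if d[w][x] + d[y][z] > bound + tol:
            return False
    return True

# Hypothetical clock-like tree: A and B join first, then C, then D.
ultra = make_matrix({("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
                     ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6})
# Hypothetical unrooted tree with tip branches A:1, B:2, C:3, D:4 and an internal branch of 1.
addi = make_matrix({("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
                    ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 7})

print(is_ultrametric(ultra), is_additive(ultra))
print(is_ultrametric(addi), is_additive(addi))
```

The first matrix satisfies both conditions and the second only the additive one, which illustrates why the two properties are not mutually exclusive.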
Let’s create a tree history using R:
Tree-reconstruction Methods
- Distance Methods
- Maximum Parsimony
- Maximum Likelihood
- Bayesian Inference
- Distance Methods
Two steps:
- Calculate the distance matrix.
- Reconstruct the phylogenetic tree from the matrix.
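The first step can be sketched in Python as follows. This is a minimal illustration that assumes the simplest possible distance, the uncorrected p-distance (proportion of differing sites), which the slides do not prescribe; a real analysis would usually apply a correction for multiple substitutions. The five sequences are the toy alignment used in the bootstrap example of these slides.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return differences / len(seq_a)

def distance_matrix(alignment):
    """Pairwise p-distances as a symmetric nested dict {name: {name: distance}}."""
    names = list(alignment)
    return {a: {b: p_distance(alignment[a], alignment[b]) for b in names}
            for a in names}

alignment = {
    "1": "ATCTTCT",
    "2": "GTCTTCT",
    "3": "ATGATCC",
    "4": "ATGAACC",
    "5": "AGGAACC",
}

D = distance_matrix(alignment)
for name, row in D.items():
    print(name, " ".join(f"{row[other]:.3f}" for other in D))
```

The resulting matrix is the input to the second step, for example UPGMA or neighbour-joining.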
UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
     1    2    3    4
1    0
2    1    0
3    2    4    0
4    3    5    6    0
     3    4    5
3    0
4    6    0
5    3    4    0
     4    6
4    0
6    4.67 0
     node1  node2  go.to.node  Div
1    1      -      5           0.5
2    2      -      5           0.5
3    3      -      6           1.5
4    4      -      7           2.33
5    2      1      6           1.0
6    5      3      7           0.83
7    6      4      -           -
[Figure: UPGMA tree with tips 1 to 4 and internal nodes 5, 6 and 7 at heights 0.5, 1.5 and 2.33]
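The worked example above can be reproduced with a short Python sketch of UPGMA. This is an illustrative implementation, not the code used in the course: the nested-dict matrix layout, the function name `upgma` and the integer node labels are assumptions made here so that the new clusters are numbered 5, 6 and 7 as in the table.

```python
def upgma(d):
    """Cluster a symmetric nested-dict distance matrix with UPGMA.

    Labels must be integers. Returns a list of
    (new_node, child_a, child_b, height) tuples in joining order.
    """
    d = {a: dict(row) for a, row in d.items()}   # work on a copy
    size = {a: 1 for a in d}                      # number of tips in each cluster
    next_label = max(d) + 1
    joins = []
    while len(d) > 1:
        # Pick the closest pair of clusters.
        a, b = min(((x, y) for x in d for y in d if x < y),
                   key=lambda pair: d[pair[0]][pair[1]])
        new = next_label
        next_label += 1
        joins.append((new, a, b, d[a][b] / 2))
        # Size-weighted mean, i.e. the arithmetic mean over all tip pairs.
        row = {k: (size[a] * d[a][k] + size[b] * d[b][k]) / (size[a] + size[b])
               for k in d if k not in (a, b)}
        for k in (a, b):
            del d[k]
        for k in d:
            del d[k][a], d[k][b]
            d[k][new] = row[k]
        d[new] = row
        size[new] = size.pop(a) + size.pop(b)
    return joins

# Distance matrix from the slide (tips 1 to 4).
pairs = {(1, 2): 1, (1, 3): 2, (2, 3): 4, (1, 4): 3, (2, 4): 5, (3, 4): 6}
matrix = {i: {} for i in range(1, 5)}
for (i, j), value in pairs.items():
    matrix[i][j] = matrix[j][i] = value

joins = upgma(matrix)
for new, a, b, height in joins:
    print(f"node {new} joins {a} and {b} at height {height:.2f}")
```

The branch lengths in the Div column follow from the heights: a branch is the height of the parent node minus the height of the child (for example 1.5 - 0.5 = 1.0 for node 5).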
NJ (Neighbour-Joining): uses a minimum-evolution criterion, preferring the tree with the smallest sum of branch lengths.
Starting from a star tree, join the two nodes that give the smallest total tree length, and repeat the process until the tree is resolved.
(Figure from Yang 2006)
The distances are assumed to be additive.
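One iteration of this procedure can be sketched in Python as follows. The Q criterion and the branch-length formulas are the standard neighbour-joining formulation, which the slides do not spell out, and the five-taxon matrix is an illustrative additive example rather than data from the course. The function name `nj_step` is hypothetical.

```python
def nj_step(d):
    """One neighbour-joining iteration on a symmetric nested-dict matrix.

    Returns (i, j, length_i, length_j): the pair to join and the branch
    lengths from each member of the pair to the new internal node.
    """
    taxa = list(d)
    n = len(taxa)
    if n < 3:
        raise ValueError("need at least three taxa")
    # Sum of distances from each taxon to all the others.
    r = {a: sum(d[a][b] for b in taxa if b != a) for a in taxa}
    # Q criterion: the pair with the smallest Q minimises the total tree length.
    candidates = ((a, b) for idx, a in enumerate(taxa) for b in taxa[idx + 1:])
    i, j = min(candidates,
               key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
    length_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    length_j = d[i][j] - length_i
    return i, j, length_i, length_j

names = "abcde"
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
matrix = {a: dict(zip(names, row)) for a, row in zip(names, rows)}

i, j, length_i, length_j = nj_step(matrix)
print(f"join {i} and {j}: branch lengths {length_i} and {length_j}")
```

A full reconstruction would replace the joined pair by the new node, recompute its distances to the remaining taxa as (d(i,k) + d(j,k) - d(i,j)) / 2, and repeat until three nodes remain.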
-Maximum Parsimony:
-Criterion based on minimum evolution.
-The best tree is the one that requires the minimum number of changes.
-Evaluate all possible trees: assign states to the internal nodes and score each tree by the number of changes it requires.
-Heuristic methods are necessary for large samples.
-Long Branch Attraction (LBA) is especially problematic in MP: lineages with long branches tend to be wrongly joined together in the reconstructed tree.
[Figure: example trees with nucleotide states A and G at the nodes, and two four-taxon trees with tips a, b, c and d]
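Scoring a tree by its number of changes can be sketched in Python with the Fitch algorithm. The slides do not name an algorithm, so this choice is an assumption, as are the nested-tuple tree encoding and the single-site character states used in the example.

```python
def fitch_score(tree, states):
    """Minimum number of changes for one character on a rooted binary tree.

    tree: nested 2-tuples with tip names at the leaves.
    states: {tip name: character state}.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if not isinstance(node, tuple):
            return {states[node]}            # a tip: its observed state
        left, right = visit(node[0]), visit(node[1])
        common = left & right
        if common:
            return common                    # no change needed at this node
        changes += 1                         # the children disagree: one change
        return left | right

    visit(tree)
    return changes

# Hypothetical site pattern: tips a and b carry A, tips c and d carry G.
states = {"a": "A", "b": "A", "c": "G", "d": "G"}
topologies = [
    (("a", "b"), ("c", "d")),
    (("a", "c"), ("b", "d")),
    (("a", "d"), ("b", "c")),
]
scores = [fitch_score(tree, states) for tree in topologies]
best = topologies[scores.index(min(scores))]
print(scores, best)
```

For a whole alignment the score of a tree is the sum of the per-site scores, and the most parsimonious tree is the one with the smallest total.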
- Maximum Likelihood:
-The criterion is maximum likelihood: the best tree is the one that maximises the probability of the data.
-Calculate the probability of the data given a tree and a given evolutionary model.
-Obtaining the ML tree is computationally expensive.
-Good statistical properties. A popular method that gives reasonable results.
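The idea can be illustrated in Python on the smallest possible case, two sequences joined by a single branch. The sketch assumes the Jukes-Cantor (JC69) model, which the slides do not specify, and finds the branch length that maximises the likelihood by a simple ternary search; real ML programs also search over topologies and over all branch lengths at once. The counts used (7 sites, 3 differences) correspond to sequences 1 and 3 of the toy alignment in the bootstrap example.

```python
import math

def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of a distance d (substitutions per site) between two
    aligned sequences under the Jukes-Cantor (JC69) model."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e     # probability that a base is unchanged
    p_diff = 0.25 - 0.25 * e     # probability of one specific different base
    # Each site pattern has probability (base frequency 1/4) x (transition probability).
    return ((n_sites - n_diff) * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))

def ml_distance(n_sites, n_diff, lo=1e-6, hi=10.0, iterations=200):
    """Ternary search for the d that maximises the log-likelihood."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if (jc69_log_likelihood(m1, n_sites, n_diff)
                < jc69_log_likelihood(m2, n_sites, n_diff)):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

n_sites, n_diff = 7, 3
d_numeric = ml_distance(n_sites, n_diff)
# Closed-form JC69 estimate, used here only as a cross-check.
d_analytic = -0.75 * math.log(1 - 4 * (n_diff / n_sites) / 3)
print(round(d_numeric, 4), round(d_analytic, 4))
```

The numerical optimum agrees with the closed-form JC69 distance, which is available only because this example has a single parameter.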
- Bayesian Inference
Seeks a distribution of compatible trees with the highest posterior probabilities, given a model and a prior distribution for the parameters it includes.
The main criticism concerns the choice of the prior distributions.
The method is also popular and gives reasonable results.
Based on the Bayes theorem (inverse probability theorem):
$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)} = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(\bar{A})\,P(B \mid \bar{A})}$$
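Applied to trees, A is a candidate tree and B is the data. The following Python sketch computes posterior probabilities for three rooted topologies of three taxa; the flat prior and the likelihood values are hypothetical numbers chosen only to show the arithmetic, and the denominator generalises the two-term sum above to all candidate trees.

```python
def posterior(prior, likelihood):
    """Posterior P(tree | data) from P(tree) and P(data | tree) via Bayes' theorem."""
    evidence = sum(prior[t] * likelihood[t] for t in prior)   # P(data)
    return {t: prior[t] * likelihood[t] / evidence for t in prior}

# Flat prior over the three possible rooted topologies (an assumption).
prior = {"((a,b),c)": 1 / 3, "((a,c),b)": 1 / 3, "((b,c),a)": 1 / 3}
# Hypothetical likelihoods P(data | tree), not computed from real data.
likelihood = {"((a,b),c)": 0.008, "((a,c),b)": 0.001, "((b,c),a)": 0.001}

post = posterior(prior, likelihood)
for tree, probability in post.items():
    print(tree, round(probability, 3))
```

In practice the number of trees is far too large to enumerate, so Bayesian programs approximate this distribution by sampling.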
Let’s do a simple tree reconstruction using R:
Support of the Phylogenetic Trees Obtained
Different methods to assess the support of phylogenetic trees:
-Depending on the method of reconstruction (Bremer support in MP)
-Non-parametric resampling methods (no model is assumed)
-Parametric methods (assuming a model)
Non-parametric resampling methods (no model is assumed)
Jackknife
Bootstrap
-Draw a resampled data set from the original data (a subset of the sites for the jackknife; sites sampled with replacement for the bootstrap).
-Use the resampled data to infer the tree again.
-The support for the original tree is given by the number of times the same clusters (nodes) are recovered in the pseudoreplicates.
Assumptions:
-The data set is large, so the error is estimated accurately.
-Each position (column in the alignment) is independent of the others.
Results:
The resulting values are not directly probabilities but support values that indicate the reliability of the obtained tree.
Bootstrap
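Original alignment (sites 1 to 7, sequences 1 to 5):
1234567
ATCTTCT
GTCTTCT
ATGATCC
ATGAACC
AGGAACC
Resampling (sites drawn with replacement):
1137721
AACTTTA
GGCTTTG
AAGCCTA
AAGCCTA
AAGCCGA
Do Tree:
[Figure: the original tree and the tree inferred from the resampled alignment, each with tips 1 to 5. Each clade of the original tree scores +1 if it is recovered in the resampled tree and +0 if it is not; here the scores are +1, +1, +0.]
… and repeat again n times!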
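The resampling and counting steps of the example above can be sketched in Python. The tree-inference step is deliberately left out, since any of the reconstruction methods could be plugged in; clades are represented as frozensets of tip labels, which is an assumption of this sketch, and all function names are hypothetical.

```python
import random

def resample_columns(alignment, columns):
    """Build a pseudoreplicate from the given 0-based column indices."""
    return ["".join(seq[c] for c in columns) for seq in alignment]

def bootstrap_replicate(alignment, rng=random):
    """Draw as many columns as the alignment has, with replacement."""
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return columns, resample_columns(alignment, columns)

def clade_support(reference_clades, replicate_clade_sets):
    """Fraction of pseudoreplicate trees in which each reference clade reappears."""
    n = len(replicate_clade_sets)
    return {clade: sum(clade in rep for rep in replicate_clade_sets) / n
            for clade in reference_clades}

alignment = ["ATCTTCT", "GTCTTCT", "ATGATCC", "ATGAACC", "AGGAACC"]

# The column draw shown on the slide (1137721), converted to 0-based indices.
slide_columns = [c - 1 for c in (1, 1, 3, 7, 7, 2, 1)]
for row in resample_columns(alignment, slide_columns):
    print(row)

# A fresh random pseudoreplicate with a fixed seed for reproducibility.
columns, replicate = bootstrap_replicate(alignment, random.Random(1))
```

Calling `clade_support` with the clades of the original tree and the clade sets of the n pseudoreplicate trees gives the support value attached to each node.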
Let’s do a Bootstrap analysis using R:
-Parametric methods (assuming a model)
- Parametric bootstrapping
Repeat the phylogenetic inference on data sets simulated under a given model.
-Bayesian inference
-Bayesian inference itself collects a set of compatible trees, so the uncertainty in the tree is taken into account.
Phylogenomics: An approach to obtain the Species Tree
When speciation events are close together in time, a gene tree can give an erroneous topology:
Incomplete Lineage Sorting
Anomalous Region
-Having a large number of regions (or information from different sources) can help to resolve the incongruence.
-Heuristic methods based on a supermatrix (all regions concatenated into one alignment) or on a supertree (a single tree built from the individual trees) are used.
-Likelihood-based methods are computationally expensive but statistically well founded.
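The supermatrix idea can be sketched in Python as a simple concatenation of per-region alignments. The function name, the toy sequences and the choice to pad taxa missing from a region with gap characters are all assumptions made for this illustration.

```python
def supermatrix(gene_alignments, missing="-"):
    """Concatenate per-region alignments ({taxon: sequence}) into one alignment.

    Taxa absent from a region are padded with `missing` (an assumption of this sketch).
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    concatenated = {taxon: "" for taxon in taxa}
    for aln in gene_alignments:
        length = len(next(iter(aln.values())))
        if any(len(seq) != length for seq in aln.values()):
            raise ValueError("sequences within a region must have equal length")
        for taxon in taxa:
            concatenated[taxon] += aln.get(taxon, missing * length)
    return concatenated

# Hypothetical toy regions; sp2 is missing from the second one.
region_1 = {"sp1": "ATG", "sp2": "ATA", "sp3": "TTG"}
region_2 = {"sp1": "CC", "sp3": "CA"}

combined = supermatrix([region_1, region_2])
for taxon, seq in combined.items():
    print(taxon, seq)
```

The concatenated alignment is then analysed as if it were a single region, whereas a supertree approach would first infer one tree per region and combine the trees.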
Let’s try to obtain the species Tree using the library phybase in R:
Use of phylogenies for different objectives:
- Ancestral sequence reconstruction
- Dating ancestral events
- Detection of selection (synonymous vs non-synonymous positions)
- Correlation of the phylogenetic signal with phenotypic traits