from ernst haeckel, 1891 the tree of life. classical approach considers morphological features ...
Post on 19-Dec-2015
![Page 1: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/1.jpg)
From Ernst Haeckel, 1891
The Tree of Life
![Page 2: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/2.jpg)
Classical approach considers morphological features: number of legs, lengths of legs, etc.
Modern approach considers molecular features: gene sequences, protein sequences
Use of molecular data provides objective criteria for constructing phylogenetic trees
Phylogenetic Analysis
![Page 3: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/3.jpg)
Phylogenetic analysis is based on homologous sequences in different species (e.g., globins)
Sequences can be homologous for different reasons:
orthologs -- sequences diverged after a speciation event
paralogs -- sequences diverged after a duplication event
xenologs -- sequences diverged after horizontal transfer (e.g., by a virus)
Phylogenetic Analysis
![Page 4: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/4.jpg)
A tree is a collection of nodes and edges with no cycles (i.e. there is no path from a node to itself)
Tree topology refers to the “shape” of the tree
Tree Terminology
[Figure: example graphs labeled “tree” and “not a tree”, and trees labeled “topologically equivalent”]
![Page 5: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/5.jpg)
A tree is a collection of nodes and edges with no cycles (i.e. there is no path from a node to itself)
Classification of nodes (in the context of phylogenetic trees):
root – (a single distinguished node) represents the common ancestor
internal nodes – represent intermediate ancestors in the course of evolution
leaves – (the non-branching nodes) represent the species for which the tree is built
Tree Terminology
[Figure: example graphs labeled “tree” and “not a tree”]
![Page 6: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/6.jpg)
Rooted trees:
internal nodes have 3 edges (1 for parent, 2 for children)
a special node (the root) has 2 edges
the leaves (the given taxa) have one edge
Unrooted trees – same as above but do not have root node
Tree Terminology
![Page 7: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/7.jpg)
Classification of nodes (in the context of phylogenetic trees):
root – (a single distinguished node) represents the common ancestor
internal nodes – represent ancestors in the course of evolution
leaves – (the non-branching nodes) represent the species for which the tree is built
When the root node is not specified the tree is unrooted
Tree Terminology
![Page 8: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/8.jpg)
Three Leaf Nodes
Only one unrooted tree is possible
Four Leaf Nodes
Three different unrooted trees are possible
[Figure: the single unrooted tree on leaves A, B, C and the three unrooted trees on leaves A, B, C, D]
Counting Trees
How many trees are there that have n leaf nodes (or taxa)?
![Page 9: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/9.jpg)
How many trees are there that have n leaf nodes (or taxa)?
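$N_R$ = number of possible rooted trees
$$N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!} = 1 \cdot 3 \cdot 5 \cdot 7 \cdots (2n-3)$$
$N_U$ = number of possible unrooted trees
$$N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!} = 1 \cdot 3 \cdot 5 \cdot 7 \cdots (2n-5)$$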
Counting Trees
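The two counting formulas can be evaluated directly as products of odd numbers. The following Python sketch is an illustration, not part of the original slides; the function names are made up for this example, and the trees counted are the binary (bifurcating) trees described above.

```python
def odd_product(m: int) -> int:
    """Return 1 * 3 * 5 * ... * m for odd m (1 when m <= 1)."""
    result = 1
    for k in range(3, m + 1, 2):
        result *= k
    return result


def num_rooted_trees(n: int) -> int:
    """Number of rooted binary trees on n >= 2 labeled leaves: 1*3*...*(2n-3)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return odd_product(2 * n - 3)


def num_unrooted_trees(n: int) -> int:
    """Number of unrooted binary trees on n >= 3 labeled leaves: 1*3*...*(2n-5)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return odd_product(2 * n - 5)


if __name__ == "__main__":
    for n in range(3, 13):
        print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Note that the number of unrooted trees on n leaves equals the number of rooted trees on n - 1 leaves, which is why the two columns of a tree-count table are shifted copies of each other.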
![Page 10: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/10.jpg)
n Unrooted Rooted
3 1 3
4 3 15
5 15 105
6 105 945
7 945 10395
8 10395 135135
9 135135 2027025
10 2027025 34459425
11 34459425 654729075
12 654729075 1.375*10^10
Tree Explosion
![Page 11: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/11.jpg)
The number of possible rooted trees for 15 different taxa is 213,458,046,676,875
Assuming a computer can create a tree in 10^-9 seconds, it would take 2.47 days of computation time to create them all.
For 20 taxa, there are 8,200,794,532,637,891,559,375 possible rooted trees, and the same computer would take 259,867 years to generate this many trees!
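The time estimates follow from multiplying the rooted-tree count by the time per tree. A small Python sketch of that arithmetic is given below; it is an illustration only, and it assumes one tree per 10^-9 seconds (as stated above) and a year of 365.25 days (an assumption made here, not stated in the slides).

```python
from math import prod

SECONDS_PER_TREE = 1e-9            # generation speed assumed in the text
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumption: Julian year


def num_rooted_trees(n: int) -> int:
    """1 * 3 * 5 * ... * (2n - 3), the number of rooted binary trees on n taxa."""
    return prod(range(1, 2 * n - 2, 2))


def enumeration_seconds(n: int) -> float:
    """Seconds needed to generate every rooted tree on n taxa."""
    return num_rooted_trees(n) * SECONDS_PER_TREE


days_for_15 = enumeration_seconds(15) / SECONDS_PER_DAY
years_for_20 = enumeration_seconds(20) / SECONDS_PER_YEAR

if __name__ == "__main__":
    print(f"15 taxa: {num_rooted_trees(15):,} trees, {days_for_15:.2f} days")
    print(f"20 taxa: {num_rooted_trees(20):,} trees, {years_for_20:,.0f} years")
```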
Tree Explosion
![Page 12: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/12.jpg)
Distance-based:
UPGMA – Unweighted Pair-Group Method with Arithmetic Means
Fitch-Margoliash (FM)
Neighbor-Joining (NJ)
Character-based:
Maximum parsimony (MP) algorithm
Algorithms
![Page 13: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/13.jpg)
Distance-based algorithms expect as input a matrix of distances (dij) between each pair of sequences
Distance data can be generated from the available sequences and models of base substitution
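Jukes-Cantor model
p – fraction of mismatches
$$d_{ij} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$$
Kimura model
P – fraction of transitions
Q – fraction of transversions
$$d_{ij} = \frac{1}{2}\ln\left(\frac{1}{1-2P-Q}\right) + \frac{1}{4}\ln\left(\frac{1}{1-2Q}\right)$$
Distance Data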
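As an illustration of how distance data might be derived from aligned sequences, here is a minimal Python sketch of the standard Jukes-Cantor and Kimura two-parameter distance formulas. The function names and the helper that counts transitions and transversions are assumptions made for this example; the sketch assumes gap-free, equal-length DNA sequences.

```python
import math

# Transitions are A<->G (purines) and C<->T (pyrimidines); other mismatches are transversions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance from the fraction of mismatches p (requires p < 3/4)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura(P: float, Q: float) -> float:
    """Kimura distance from the fractions of transitions P and transversions Q."""
    a, b = 1 - 2 * P - Q, 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for the Kimura model")
    return 0.5 * math.log(1 / a) + 0.25 * math.log(1 / b)


def transition_transversion_fractions(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (P, Q) for two aligned sequences of equal, non-zero length."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if frozenset((a, b)) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    return transitions / len(seq1), transversions / len(seq1)


if __name__ == "__main__":
    P, Q = transition_transversion_fractions("AGCT", "GGTA")
    print(P, Q)
    print(jukes_cantor(0.1), kimura(0.1 / 3, 0.2 / 3))
```

When one third of the mismatches are transitions (P = p/3, Q = 2p/3), the Kimura distance reduces to the Jukes-Cantor distance, which is a convenient sanity check.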
![Page 14: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/14.jpg)
UPGMA Algorithm
![Page 15: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/15.jpg)
Main idea: Group the taxa into clusters and repeatedly merge the closest two clusters until one cluster remains
Algorithm:
Add a leaf to the tree for each taxon
Initially make each taxon be its own cluster
Find the closest clusters and connect them with a new node in the tree (place the new node at equal distance from the clusters)
Repeat the previous step until all clusters are connected
UPGMA Algorithm
[Figure: taxa x1 to x5 shown as points, and the resulting rooted tree with leaves x3, x5, x1, x2, x4]
![Page 16: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/16.jpg)
The algorithm needs to compute distance between clusters
The distance between clusters Ci and Cj is defined to be the average distance between all pairs of taxa in Ci and Cj
UPGMA Clustering
$$d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i,\; q \in C_j} d(p, q)$$
![Page 17: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/17.jpg)
The algorithm needs to compute distance between clusters
The distance between clusters Ci and Cj is defined to be the average distance between all pairs of taxa in Ci and Cj
Shortcut when combining Ci and Cj to form new cluster Ck
UPGMA Clustering
$$d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i,\; q \in C_j} d(p, q)$$
$$d(C_k, C_l) = \frac{|C_i|\,d(C_i, C_l) + |C_j|\,d(C_j, C_l)}{|C_i| + |C_j|}$$
![Page 18: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/18.jpg)
UPGMA Example
![Page 19: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/19.jpg)
Assume the following distance matrix
x1 x2 x3 x4 x5
x1 - 16 6 16 6
x2 16 - 16 8 16
x3 6 16 - 16 2
x4 16 8 16 - 16
x5 6 16 2 16 -
Closest pair is {x3, x5}, so cluster them: C1 = {x3, x5}
Compute the distance from C1 to the rest
d(C1,x1) = 1/2 (d(x3,x1) + d(x5,x1) ) = 6
d(C1,x2) = 1/2 (d(x3,x2) + d(x5,x2) ) = 16
d(C1,x4) = 1/2 (d(x3,x4) + d(x5,x4) ) = 16
Add new node for x3, x5 at height d(x3,x5) / 2 = 1
[Figure: partial tree with x3 and x5 joined by branches of length 1 each]
UPGMA
![Page 20: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/20.jpg)
x1 x2 x4 C1
x1 - 16 16 6
x2 16 - 8 16
x4 16 8 - 16
C1 6 16 16 -
Closest Pair is {x1, C1} so cluster them, C2 = {x1,C1}
Compute the distances from C2 to the rest
d(C2,x2) = 1/3 (d(x1,x2) + d(x3,x2) +d(x5,x2) ) = 16
d(C2,x4) = 1/3 (d(x1,x4) + d(x3,x4) +d(x5,x4) ) = 16
Add new node for x1, C1 at height d(x1,C1) / 2 = 3
The updated distance matrix – C1 replaced x3, x5
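[Figure: partial tree with x3 and x5 joined at height 1 (branch lengths 1 and 1), and x1 joined to that node at height 3 (branch lengths 3 and 2)]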
UPGMA
![Page 21: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/21.jpg)
Closest Pair is {x2, x4} so cluster them, C3 = {x2,x4}
Compute the distances from C3 to the rest
d(C3,C2) = 1/6 (d(x2,x1) + d(x2,x3) +d(x2,x5) +
d(x4,x1) + d(x4,x3) +d(x4,x5)) = 16
Add new node for x2, x4 at height d(x2,x4) / 2 = 4
The updated distance matrix – C2 replaced x1, C1
x2 x4 C2
x2 - 8 16
x4 8 - 16
C2 16 16 -
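[Figure: two partial trees: x3 and x5 joined at height 1 with x1 joining them at height 3 (branch lengths 3 and 2); x2 and x4 joined at height 4 (branch lengths 4 and 4)]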
UPGMA
![Page 22: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/22.jpg)
Closest Pair is {C2, C3} so cluster them, C4 = {C2,C3}
Add new node for C2, C3 at height d(C2,C3) / 2 = 8
The updated distance matrix – C3 replaced x2, x4
C2 C3
C2 - 16
C3 16 -
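[Figure: final rooted tree: x3 and x5 joined at height 1; x1 joins them at height 3 (branch lengths 3 and 2); x2 and x4 joined at height 4 (branch lengths 4 and 4); the root at height 8 connects the two subtrees with branch lengths 5 and 4]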
UPGMA
Done!
Double-check if original distances between taxa are preserved (not guaranteed)
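The worked example can be replayed with a short program. The following is a minimal Python sketch of UPGMA, not the slides' own code: the function name, the nested-parenthesis cluster labels, and the return format are illustrative choices, and ties between equally close pairs are broken arbitrarily (first pair found). Cluster distances are updated with the size-weighted shortcut formula.

```python
from itertools import combinations


def upgma(names, matrix):
    """Run UPGMA; return a list of merges (cluster_a, cluster_b, height of new node)."""
    size = {name: 1 for name in names}  # current clusters and their sizes
    # Pairwise distances keyed by an unordered pair of cluster labels.
    d = {frozenset((a, b)): matrix[i][j]
         for (i, a), (j, b) in combinations(enumerate(names), 2)}
    merges = []
    while len(size) > 1:
        # Closest pair of current clusters.
        a, b = min(combinations(size, 2), key=lambda pair: d[frozenset(pair)])
        new = f"({a},{b})"
        height = d[frozenset((a, b))] / 2  # new node sits halfway between the clusters
        na, nb = size.pop(a), size.pop(b)
        for c in size:
            # Shortcut: size-weighted average of the two old cluster distances.
            d[frozenset((new, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        size[new] = na + nb
        merges.append((a, b, height))
    return merges


names = ["x1", "x2", "x3", "x4", "x5"]
matrix = [
    [0, 16, 6, 16, 6],
    [16, 0, 16, 8, 16],
    [6, 16, 0, 16, 2],
    [16, 8, 16, 0, 16],
    [6, 16, 2, 16, 0],
]
merges = upgma(names, matrix)

if __name__ == "__main__":
    for a, b, height in merges:
        print(f"join {a} and {b} at height {height}")
```

On this matrix the merges occur at heights 1, 3, 4 and 8, matching the steps of the example.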
![Page 23: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/23.jpg)
UPGMA Summary
Distance-based algorithm that produces rooted trees
Assumes that all species evolve at the same rate (molecular clock hypothesis)
An implication of the molecular clock hypothesis is that the distance from the root to any taxon is the same
Final tree may not preserve the original distances between the taxa
[Figure: the final UPGMA tree from the example, with the root at height 8]
![Page 24: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/24.jpg)
Fitch-Margoliash (FM) Algorithm
![Page 25: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/25.jpg)
FM Algorithm
Similar to UPGMA but removes molecular clock assumption
(i.e. distance from an internal node to leaves differs)
Produces unrooted trees
Algorithm (similar to UPGMA):
Add a leaf to the tree for each taxon
Initially make each taxon be its own cluster
Find the closest clusters and connect them with a new node in the tree (place the new node at the distance given by the 3-point formula, rather than at equal distance from the clusters)
Repeat the previous step until all clusters are connected
![Page 26: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/26.jpg)
Given three taxa i, j, k with distances d(i, j), d(i, k), d(j, k)
where should the interior node m be placed to connect the
taxa and preserve the distances?
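[Figure: taxa i, j, k, each connected to the interior node m]
For the distances to be preserved, m must satisfy
$$d(i,m) + d(m,j) = d(i,j)$$
$$d(i,m) = d(i,k) - d(m,k)$$
$$d(m,j) = d(j,k) - d(m,k)$$
Substituting the last two equations into the first gives
$$d(i,k) - d(m,k) + d(j,k) - d(m,k) = d(i,j)$$
and therefore
$$d(m,k) = \frac{1}{2}\left(d(i,k) + d(j,k) - d(i,j)\right)$$
FM and 3-point formula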
![Page 27: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/27.jpg)
Given three taxa i, j, k with distances d(i, j), d(i, k), d(j, k)
where should the interior node m be placed to connect the
taxa and preserve the distances?
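[Figure: taxa i, j, k, each connected to the interior node m]
FM and 3-point formula
$$d(m,k) = \frac{1}{2}\left(d(i,k) + d(j,k) - d(i,j)\right)$$
$$d(m,i) = \frac{1}{2}\left(d(k,i) + d(j,i) - d(k,j)\right)$$
$$d(m,j) = \frac{1}{2}\left(d(i,j) + d(k,j) - d(i,k)\right)$$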
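The 3-point formula is easy to check numerically. Below is a minimal Python sketch; the function name and the sample distances (5, 7, 8) are made up for illustration and do not come from the slides.

```python
def three_point(d_ij: float, d_ik: float, d_jk: float) -> tuple[float, float, float]:
    """Return (d(m,i), d(m,j), d(m,k)) for the interior node m joining i, j, k."""
    d_mi = 0.5 * (d_ij + d_ik - d_jk)
    d_mj = 0.5 * (d_ij + d_jk - d_ik)
    d_mk = 0.5 * (d_ik + d_jk - d_ij)
    return d_mi, d_mj, d_mk


# Hypothetical pairwise distances, chosen only for illustration.
d_mi, d_mj, d_mk = three_point(d_ij=5.0, d_ik=7.0, d_jk=8.0)

if __name__ == "__main__":
    print(d_mi, d_mj, d_mk)
    # The branch lengths reproduce the three pairwise distances exactly.
    print(d_mi + d_mj, d_mi + d_mk, d_mj + d_mk)
```

For three taxa the placement is always exact: the three branch lengths sum pairwise to the three given distances. A branch length can come out negative when the distances violate the triangle inequality.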
![Page 28: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/28.jpg)
Algorithm (similar to UPGMA):
Add a leaf to the tree for each taxon
Initially make each taxon be its own cluster
Find the closest clusters and connect them with a new node in the tree (place the new node at the distance given by the 3-point formula, where the points are clusters of taxa and we use the distance between clusters)
Repeat the previous step until all clusters are connected
FM Algorithm
[Figure: taxa x1 to x5 shown as points, and the resulting unrooted tree with leaves x3, x5, x1, x2, x4]
![Page 29: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/29.jpg)
Apply the FM algorithm to the following distance matrix:
B C D E
A .31 1.01 .75 1.03
B - 1.00 .69 .90
C - - .61 .42
D - - - .37
A and B are closest; temporarily group C-D-E and compute d(A, B), d(A, C-D-E), d(B, C-D-E) to apply 3-point formula
d(A,C-D-E) = 1/3(1.01+.75+1.03) = .93
d(B,C-D-E) = 1/3(1.00+.69+.90) = .863
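d(A, B) = .31
(the temporary group C-D-E is only used to help us group A, B)
By 3-point formula:
d(C-D-E, X) = 1/2 (d(C-D-E,A) + d(C-D-E,B) – d(A,B)) = .7415
d(B, X) = 1/2 (d(B,A) + d(B,C-D-E) – d(A,C-D-E)) = .1215
d(A, X) = 1/2 (d(A,B) + d(A,C-D-E) – d(B,C-D-E)) = .1885
[Figure: A, B and C-D-E joined at the new node X, with branch lengths .1885 (A), .1215 (B) and .7415 (C-D-E)]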
![Page 30: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/30.jpg)
A and B are combined in a cluster for the rest of the algorithm, so need to recompute the distances from A-B to other clusters:
d(A-B,C) = 1/2(1.01 + 1.00) = 1.005
d(A-B,D) = 1/2(.75 +.69) = .72
d(A-B, E) = 1/2(1.03 + .90) = .965
The updated table is:
C D E
A-B 1.005 .72 .965
C - .61 .42
D - - .37
The partial tree so far is:
[Figure: A and B joined at node X, with branch lengths .1885 (A) and .1215 (B)]
![Page 31: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/31.jpg)
Based on the updated table
C D E
A-B 1.005 .72 .965
C - .61 .42
D - - .37
D and E are closest; temporarily group A-B-C and compute d(D, E), d(D, A-B-C), d(E, A-B-C) to apply 3-point formula
d(D,A-B-C) = 1/3(.75+.69+.61) = .683
d(E,A-B-C) = 1/3(1.03+.90+.42) = .783
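d(D, E) = .37
(the temporary group A-B-C is only used to help us group D, E)
By 3-point formula:
d(A-B-C, Y) = 1/2 (d(A-B-C,D) + d(A-B-C,E) – d(D,E)) = .548
d(D, Y) = 1/2 (d(D,E) + d(D,A-B-C) – d(E,A-B-C)) = .135
d(E, Y) = 1/2 (d(E,D) + d(E,A-B-C) – d(D,A-B-C)) = .235
[Figure: D, E and A-B-C joined at the new node Y, with branch lengths .135 (D), .235 (E) and .548 (A-B-C)]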
![Page 32: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/32.jpg)
D and E are combined in a cluster for the rest of the algorithm, so we need to recompute the distances from D-E to the other clusters:
d(A-B,D-E) = 1/4 (.75 + 1.03 + .69 + .90) = .8425
d(A-B,C) = 1/2 (1.01 + 1.00) = 1.005
d(C,D-E) = 1/2 (.61 + .42) = .515
The updated table is now:
C D-E
A-B 1.005 .8425
C - .515
The partial tree so far is:
[Figure: two partial subtrees: A and B joined at X with branch lengths .1885 (A) and .1215 (B); D and E joined at Y with branch lengths .135 (D) and .235 (E)]
![Page 33: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/33.jpg)
Based on the updated table
C D-E
A-B 1.005 .8425
C - .515
There are only three clusters, so just apply the 3-point formula
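d(A-B, Z) = 1/2 (d(A-B,D-E) + d(A-B,C) – d(D-E,C)) = .66625
d(D-E, Z) = 1/2 (d(D-E,A-B) + d(D-E,C) – d(A-B,C)) = .17625
d(C, Z) = 1/2 (d(C,A-B) + d(C,D-E) – d(A-B,D-E)) = .33875
[Figure: A-B, C and D-E joined at the new node Z, with branch lengths .66625 (A-B), .33875 (C) and .17625 (D-E)]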
![Page 34: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/34.jpg)
Now we need to expand the clusters A-B and D-E
[Figure: the three-cluster tree around Z (A-B at .66625, C at .33875, D-E at .17625) expanded so that the A-B subtree hangs from Z on a branch of length a and the D-E subtree hangs from Z on a branch of length b]
We also need to compute the values for a and b:
d(A-B, Z) = 1/2 (d(A,Z) + d(B,Z)) = 1/2 (.1885 + a + .1215 + a) = .66625
a = .51125
d(D-E, Z) = 1/2 (d(D,Z) + d(E,Z)) = 1/2 (.135 + b + .235 + b) = .17625
b = -.00875
The negative value for b is a cause for concern about the quality of the data. If we are confident in our data, then since -.00875 is close to 0, b would be set to 0.
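The whole FM example can be replayed in code. The following Python sketch is one possible reading of the procedure walked through above (closest pair by average distance, temporary grouping of all remaining taxa, 3-point formula, then subtracting the mean leaf distance already accounted for inside a cluster). The function names and data layout are illustrative assumptions, and the sketch requires at least three taxa.

```python
from itertools import combinations


def avg_dist(c1, c2, d):
    """Average distance between all pairs of taxa drawn from clusters c1 and c2."""
    return sum(d[frozenset((p, q))] for p in c1 for q in c2) / (len(c1) * len(c2))


def three_point(d_ab, d_ac, d_bc):
    """Distances from a, b and c to the interior node that joins them."""
    return (0.5 * (d_ab + d_ac - d_bc),
            0.5 * (d_ab + d_bc - d_ac),
            0.5 * (d_ac + d_bc - d_ab))


def fitch_margoliash(taxa, d):
    """Return {cluster: length of the branch directly above that cluster}."""
    clusters = [(t,) for t in taxa]
    depth = {c: 0.0 for c in clusters}  # mean distance from a cluster's node to its leaves
    branch = {}
    while len(clusters) > 3:
        a, b = min(combinations(clusters, 2), key=lambda pr: avg_dist(pr[0], pr[1], d))
        # Temporarily group every other taxon into one cluster.
        rest = tuple(t for c in clusters if c not in (a, b) for t in c)
        da, db, _ = three_point(avg_dist(a, b, d), avg_dist(a, rest, d), avg_dist(b, rest, d))
        branch[a], branch[b] = da - depth[a], db - depth[b]
        merged = a + b
        depth[merged] = (len(a) * da + len(b) * db) / len(merged)
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    x, y, z = clusters  # exactly three clusters remain: apply the 3-point formula once
    dx, dy, dz = three_point(avg_dist(x, y, d), avg_dist(x, z, d), avg_dist(y, z, d))
    for c, dist in ((x, dx), (y, dy), (z, dz)):
        branch[c] = dist - depth[c]
    return branch


upper = {
    ("A", "B"): 0.31, ("A", "C"): 1.01, ("A", "D"): 0.75, ("A", "E"): 1.03,
    ("B", "C"): 1.00, ("B", "D"): 0.69, ("B", "E"): 0.90,
    ("C", "D"): 0.61, ("C", "E"): 0.42,
    ("D", "E"): 0.37,
}
d = {frozenset(pair): value for pair, value in upper.items()}
branch = fitch_margoliash("ABCDE", d)

if __name__ == "__main__":
    for cluster, length in branch.items():
        print("-".join(cluster), round(length, 5))
```

Because the sketch does not round intermediate averages, the branches for A and B come out near .1883 and .1217 instead of the .1885 and .1215 obtained above with d(B, C-D-E) rounded to .863; the values of a and b are unaffected.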
![Page 35: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/35.jpg)
FM Summary
Distance-based algorithm that produces unrooted trees
Removes the assumption of molecular clock, but does not give information about the root (common ancestor)
To detect the root, one could introduce an extra taxon (an outgroup) that is more distantly related to the given taxa than they are to each other
![Page 36: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/36.jpg)
Neighbor-Joining (NJ) Algorithm
![Page 37: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/37.jpg)
NJ Algorithm
Similar to FM (also removes molecular clock assumption)
but more sophisticated in how it selects clusters to join
Produces unrooted trees
Algorithm (similar to FM):
Add a leaf to the tree for each taxon
Initially make each taxon be its own cluster
Find the closest clusters (using a more sophisticated criterion) and connect them with a new node in the tree (place the new node at the distance given by a variant of the 3-point formula)
Repeat the previous step until all clusters are connected
![Page 38: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/38.jpg)
Suppose that you are given n taxa x1, x2, x3, …, xn, and suppose that you have some tree that fits the distance data
NJ “closeness” Criterion
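Observation: if x1 and x2 are neighbors, then for any two other taxa xi and xj
$$d(x_1,x_2) + d(x_i,x_j) < d(x_1,x_i) + d(x_2,x_j)$$
(the right side includes the edge yz twice, the left side does not)
[Figure: tree with leaves x1 to x6 in which x1 and x2 are neighbors; y and z are internal nodes]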
![Page 39: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/39.jpg)
From previous slide
$$d(x_1,x_2) + d(x_i,x_j) < d(x_1,x_i) + d(x_2,x_j)$$
For a fixed i, say i = 3:
$$d(x_1,x_2) + d(x_3,x_4) < d(x_1,x_3) + d(x_2,x_4)$$
$$d(x_1,x_2) + d(x_3,x_5) < d(x_1,x_3) + d(x_2,x_5)$$
$$d(x_1,x_2) + d(x_3,x_6) < d(x_1,x_3) + d(x_2,x_6)$$
… … …
$$d(x_1,x_2) + d(x_3,x_n) < d(x_1,x_3) + d(x_2,x_n)$$
Summing these n – 3 inequalities:
$$(n-3)\,d(x_1,x_2) + \sum_{k=4}^{n} d(x_3,x_k) < (n-3)\,d(x_1,x_3) + \sum_{k=4}^{n} d(x_2,x_k)$$
Add d(x3,x1), d(x3,x2), d(x3,x3), d(x2,x1), d(x2,x2) to both sides
$$(n-2)\,d(x_1,x_2) + \sum_{k=1}^{n} d(x_3,x_k) < (n-2)\,d(x_1,x_3) + \sum_{k=1}^{n} d(x_2,x_k)$$
Writing $S_i = \sum_{k=1}^{n} d(x_i,x_k)$, for general i:
$$(n-2)\,d(x_1,x_2) + S_i < (n-2)\,d(x_1,x_i) + S_2$$
$$(n-2)\,d(x_1,x_2) - S_2 < (n-2)\,d(x_1,x_i) - S_i$$
$$(n-2)\,d(x_1,x_2) - S_1 - S_2 < (n-2)\,d(x_1,x_i) - S_1 - S_i$$
NJ “closeness” Criterion
![Page 40: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/40.jpg)
From previous slide, if x1 and x2 are neighbors
$$(n-2)\,d(x_1,x_2) - S_1 - S_2 < (n-2)\,d(x_1,x_i) - S_1 - S_i$$
Let
$$M(k,l) = (n-2)\,d(x_k,x_l) - S_k - S_l$$
Then in general, if xk and xl are neighbors
$$M(k,l) < M(k,m)$$
NJ uses this observation to determine “closeness” and computes the smallest value M(k, l) to determine a cluster
Unlike UPGMA and FM, NJ has a more global view of “closeness” when selecting neighbors
NJ “closeness” Criterion
![Page 41: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/41.jpg)
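If x1 and x2 are neighbors, where should the new node y be placed?
[Figure: tree with leaves x1 to x5 in which x1 and x2 are joined at the new node y]
By the 3-point formula:
$$d(y,x_1) = \frac{1}{2}\left(d(x_1,x_2) + d(x_1,x_3) - d(x_2,x_3)\right)$$
$$d(y,x_1) = \frac{1}{2}\left(d(x_1,x_2) + d(x_1,x_4) - d(x_2,x_4)\right)$$
$$d(y,x_1) = \frac{1}{2}\left(d(x_1,x_2) + d(x_1,x_5) - d(x_2,x_5)\right)$$
… … …
$$d(y,x_1) = \frac{1}{2}\left(d(x_1,x_2) + d(x_1,x_n) - d(x_2,x_n)\right)$$
Summing these n – 2 equations:
$$(n-2)\,d(y,x_1) = \frac{1}{2}\left((n-2)\,d(x_1,x_2) + \sum_{k=3}^{n} d(x_1,x_k) - \sum_{k=3}^{n} d(x_2,x_k)\right)$$
Add d(x1,x1) + d(x1,x2) - d(x2,x1) - d(x2,x2), which equals 0, inside the parentheses on the right side
$$(n-2)\,d(y,x_1) = \frac{1}{2}\left((n-2)\,d(x_1,x_2) + S_1 - S_2\right)$$
$$d(y,x_1) = \frac{1}{2}\left(d(x_1,x_2) + \frac{S_1}{n-2} - \frac{S_2}{n-2}\right)$$
NJ new node Placement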
![Page 42: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/42.jpg)
For each pair of nodes xk and xl compute the quantity
$$M(k,l) = (n-2)\,d(x_k,x_l) - S_k - S_l$$
Actually, could compute
$$M(k,l) = d(x_k,x_l) - \frac{S_k}{n-2} - \frac{S_l}{n-2}$$
When xk and xl are replaced by new node y, place y at
$$d(y,x_k) = \frac{1}{2}\left(d(x_k,x_l) + \frac{S_k}{n-2} - \frac{S_l}{n-2}\right)$$
$$d(y,x_l) = \frac{1}{2}\left(d(x_k,x_l) + \frac{S_l}{n-2} - \frac{S_k}{n-2}\right)$$
From now on Si will always be divided implicitly by (n-2)
NJ mini summary
![Page 43: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/43.jpg)
NJ Algorithm
From the distance matrix compute the criterion matrix
$$M(k,l) = d(x_k,x_l) - S_k - S_l$$
Find the smallest value in M(i, j) – cluster the corresponding pair
Connect taxa xi and xj with a new node y placed at distance
$$d(y,x_i) = \frac{1}{2}\left(d(x_i,x_j) + S_i - S_j\right)$$
$$d(y,x_j) = \frac{1}{2}\left(d(x_i,x_j) + S_j - S_i\right)$$
Remove xi and xj and replace with y; update the distance matrix using the 3-point formula
$$d(y,x_k) = \frac{1}{2}\left(d(x_i,x_k) + d(x_j,x_k) - d(x_i,x_j)\right)$$
Repeat from beginning
![Page 44: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/44.jpg)
Apply the NJ algorithm to the given distance matrix:
x1 x2 x3 x4 x5 x6
x1 - 8 3 14 10 12
x2 8 - 9 10 6 8
x3 3 9 - 15 11 13
x4 14 10 15 - 10 8
x5 10 6 11 10 - 8
x6 12 8 13 8 8 -
First compute Si = sum-of-row / (n-2)
S1 = 11.75 S2 = 10.25 S3 = 12.75
S4 = 14.25 S5 = 11.25 S6 = 12.25
Compute
$$M(k,l) = d(x_k,x_l) - S_k - S_l$$
M(1,2) = d(1,2) – S1 – S2 = 8 – 22 = -14
M(1,3) = d(1,3) – S1 – S3 = 3 – 24.5 = -21.5
M(1,4) = d(1,4) – S1 – S4 = 14 – 26 = -12
M(1,5) = d(1,5) – S1 – S5 = 10 – 23 = -13
M(1,6) = d(1,6) – S1 – S6 = 12 – 24 = -12
and so on …
Criterion matrix M (upper triangle):
x1 x2 x3 x4 x5 x6
x1 - -14 -21.5 -12 -13 -12
x2 - -14 -14.5 -15.5 -14.5
x3 - -12 -13 -12
x4 - -15.5 -18.5
x5 - -15.5
x6 -
Find min value, i.e. the pair to cluster
![Page 45: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/45.jpg)
From previous slide we need to cluster x1 and x3
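Add a new taxon x7 and place it at distance
$$d(x_7,x_1) = \frac{1}{2}\left(d(x_1,x_3) + S_1 - S_3\right) = \frac{1}{2}(3 + 11.75 - 12.75) = 1$$
$$d(x_7,x_3) = \frac{1}{2}\left(d(x_1,x_3) + S_3 - S_1\right) = \frac{1}{2}(3 + 12.75 - 11.75) = 2$$
[Figure: x1 and x3 joined at the new node x7, with branch lengths 1 (x1) and 2 (x3)]
Recompute distances from x7 to all others using the 3-point formula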
d(7,2) = ½(d(1,2) + d(3,2) – d(1,3)) = 7
d(7,4) = ½(d(1,4) + d(3,4) – d(1,3)) = 13
d(7,5) = ½(d(1,5) + d(3,5) – d(1,3)) = 9
d(7,6) = ½(d(1,6) + d(3,6) – d(1,3)) = 11
x2 x4 x5 x6 x7
x2 - 10 6 8 7
x4 10 - 10 8 13
x5 6 10 - 8 9
x6 8 8 8 - 11
x7 7 13 9 11 -
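One iteration of NJ can be written compactly. The Python sketch below is an illustration under stated assumptions: the function name `nj_step`, its return format, and the choice to append the new taxon as the last row and column are made up for this example, and the diagonal is stored as 0.

```python
def nj_step(names, D, new_name):
    """Perform one NJ iteration on the symmetric distance matrix D (list of lists).

    Returns the joined pair, their branch lengths to the new node,
    the updated names, and the updated distance matrix.
    """
    n = len(names)
    S = [sum(row) / (n - 2) for row in D]  # row sums, already divided by (n - 2)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    # Criterion M(i, j) = d(i, j) - S_i - S_j; the smallest value selects the pair.
    i, j = min(pairs, key=lambda p: D[p[0]][p[1]] - S[p[0]] - S[p[1]])
    d_yi = 0.5 * (D[i][j] + S[i] - S[j])
    d_yj = 0.5 * (D[i][j] + S[j] - S[i])
    keep = [k for k in range(n) if k not in (i, j)]
    # 3-point formula for the distance from the new node to every remaining taxon.
    to_new = [0.5 * (D[i][k] + D[j][k] - D[i][j]) for k in keep]
    new_D = [[D[a][b] for b in keep] + [to_new[r]] for r, a in enumerate(keep)]
    new_D.append(to_new + [0.0])
    new_names = [names[k] for k in keep] + [new_name]
    return (names[i], names[j]), (d_yi, d_yj), new_names, new_D


names = ["x1", "x2", "x3", "x4", "x5", "x6"]
D = [
    [0, 8, 3, 14, 10, 12],
    [8, 0, 9, 10, 6, 8],
    [3, 9, 0, 15, 11, 13],
    [14, 10, 15, 0, 10, 8],
    [10, 6, 11, 10, 0, 8],
    [12, 8, 13, 8, 8, 0],
]
pair, lengths, new_names, new_D = nj_step(names, D, "x7")

if __name__ == "__main__":
    print("join", pair, "with branch lengths", lengths)
    for name, row in zip(new_names, new_D):
        print(name, row)
```

Calling `nj_step` again on `new_names` and `new_D` carries out the next iteration; repeating until only a few taxa remain yields the full unrooted tree.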
![Page 46: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/46.jpg)
Apply the NJ algorithm to the
new distance matrix:
First compute Si=sum-of-row / (n-2)
Compute
M(2,4) = d(2,4) – S2 – S4 =
M(2,5) = d(2,5) – S2 – S5 =
M(2,6) = d(2,6) – S2 – S6 =
M(2,7) = d(2,7) – S2 – S7 =
and so on …
S2= S4= S5= S6= S7=
$$M(k,l) = d(x_k,x_l) - S_k - S_l$$
x2 x4 x5 x6 x7
x2 - 10 6 8 7
x4 10 - 10 8 13
x5 6 10 - 8 9
x6 8 8 8 - 11
x7 7 13 9 11 -
x2 x4 x5 x6 x7
x2 -
x4 - -
x5 - - -
x6 - - - -
x7 - - - - -
Find min value, i.e. the pair to cluster
![Page 47: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/47.jpg)
From previous slide we need to cluster ? and ??
Add a new taxon x8 and place it at distance
Recompute distances from x8 to all
others using the 3-point formula
x?
??
x??
x8
d(x8, x??) = 1/2 (d(x?, x??) + S?? – S?)
d(x8, x?) = 1/2 (d(x?, x??) + S? – S??)
x? x? x? x8
x? -
x? -
x? -
x6 -
![Page 48: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/48.jpg)
NJ Summary
Distance-based algorithm that produces unrooted trees
Removes the assumption of molecular clock, but does not give information about the root (common ancestor)
Typically performs better than UPGMA and FM – uses a more global criterion to select pairs to cluster
To detect the root, one could introduce an extra taxon (an outgroup) that is more distantly related to the given taxa than they are to each other
![Page 49: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/49.jpg)
![Page 50: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/50.jpg)
Maximum Parsimony (MP) Algorithm
![Page 51: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/51.jpg)
MP Algorithm
Character-based algorithm – does not use distances, but utilizes the character information in sequences
A criticism of distance-based methods is that they do not exploit the structure of the sequences (collapse them to a number – the distance)
Main philosophy is “economy of substitutions” – find the tree that requires the fewest mutations (maximum parsimony)
![Page 52: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/52.jpg)
MP Algorithm
The strategy:
explore a number of possible trees
report the tree with the smallest score (most parsimonious)
Need to be able to solve two problems:
small parsimony problem -- given a candidate tree, compute its parsimony score
large parsimony problem -- efficiently generate viable candidate trees (cannot generate all – tree explosion)
![Page 53: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/53.jpg)
Small Parsimony Problem
Given a candidate tree, compute its parsimony score
Consider a candidate tree for one-site sequences
s1 = A s2 = T s3 = T s4 = G s5 = A
[Figure: candidate tree with leaves A, T, T, G, A and internal-node labels AT, T, AG and AGT]
Final Score = 3
![Page 54: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/54.jpg)
Solving Small Parsimony Problem
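explore the tree bottom-up (from leaves to interior)
for each internal node one level up:
if the “labels” at the two child nodes have no symbols in common, assign as the label at this node the sum (union) of both labels and penalize the tree one unit
if the “labels” at the two child nodes do have symbols in common, label this node with the common portion; no penalty
[Figure: two examples: children labeled AG and C give the parent label AGC (one unit of penalty); children labeled AG and GT give the parent label G (no penalty)]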
![Page 55: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/55.jpg)
Solving Small Parsimony Problem
For n-site sequences run the algorithm in parallel for each site and add up the parsimony scores for all sites
Consider a candidate tree for the following sequences
s1 = ATC s2 = ACC s3 = GTA s4 = GCA
[Figure: candidate tree with leaves ATC, ACC, GTA, GCA; each internal node carries one label per site]
Final Score = 4
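The bottom-up labeling rule can be sketched in a few lines of Python. This is an illustration, not the slides' own code: trees are written as nested tuples of leaf names, the function names are made up, and the two tree topologies below are assumptions chosen to be consistent with the leaf sequences and final scores given in the examples (the original figures are not recoverable from the transcript).

```python
def label_site(tree, states):
    """Return (label set, penalty) for one site of the subtree `tree`.

    `tree` is either a leaf name (str) or a pair (left, right);
    `states` maps each leaf name to its symbol at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_label, left_cost = label_site(tree[0], states)
    right_label, right_cost = label_site(tree[1], states)
    common = left_label & right_label
    if common:
        # Children share symbols: keep the common portion, no penalty.
        return common, left_cost + right_cost
    # No symbols in common: take the union and penalize one unit.
    return left_label | right_label, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Sum the per-site penalties over all sites of equal-length sequences."""
    length = len(next(iter(seqs.values())))
    return sum(
        label_site(tree, {name: s[site] for name, s in seqs.items()})[1]
        for site in range(length)
    )


# One-site example; the topology is an assumption consistent with a score of 3.
tree5 = ((("s1", "s2"), "s3"), ("s4", "s5"))
seqs5 = {"s1": "A", "s2": "T", "s3": "T", "s4": "G", "s5": "A"}
score5 = parsimony_score(tree5, seqs5)

# Three-site example; the topology is an assumption consistent with a score of 4.
tree4 = (("s1", "s2"), ("s3", "s4"))
seqs4 = {"s1": "ATC", "s2": "ACC", "s3": "GTA", "s4": "GCA"}
score4 = parsimony_score(tree4, seqs4)

if __name__ == "__main__":
    print(score5, score4)
```

Scoring a different candidate topology only requires changing the nested tuple, which is what a search over candidate trees would do repeatedly.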
![Page 56: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/56.jpg)
Solving Large Parsimony Problem
Generate efficiently viable candidate trees (cannot try all)
Branch-and-bound approach:
create a possible tree by some method; calculate its score
start building a tree from scratch, discarding trees that cost more than the current best
![Page 57: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/57.jpg)
Solving Large Parsimony Problem
Branch-and-bound approach
http://artedi.ebc.uu.se/course/X3-2004/Phylogeny/Phylogeny-TreeSearch/Phylogeny-Search.html
![Page 58: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/58.jpg)
MP Summary
Character-based algorithm – uses the sequence data
Produces unrooted trees
Economy of substitution – best tree is one that requires fewest number of substitutions
Examines a number of possible trees in search for best one
![Page 59: From Ernst Haeckel, 1891 The Tree of Life. Classical approach considers morphological features number of legs, lengths of legs, etc. Modern approach](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d2f5503460f94a06d9f/html5/thumbnails/59.jpg)