distance-based approaches to inferring phylogenetic treessemrich/bc17/notes/lecture17.pdf ·...
TRANSCRIPT
![Page 1: Distance-based approaches to inferring phylogenetic treessemrich/bc17/notes/lecture17.pdf · 2017-10-26 · – A phylogenetic tree that accurately characterizes the respective lineages](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f203ccf7a797230cc06172c/html5/thumbnails/1.jpg)
Distance-based approaches to inferring phylogenetic trees
Review
• Input: – Data from a set of genes/species
• Output: – A phylogenetic tree that accurately
characterizes the respective lineages
Inference
• We infer trees because we don't really know all the species, esp. ancestors represented by internal nodes.
• Today, we'll discuss simple approaches for phylogenetic tree inference based on distance.
Other species trees
Darwin's Finches
Primates
http://members.aol.com/darwinpage/trees.htm
Example gene tree
Lodish et al. (2000)
A sample tree
Reed et al. (2004)
Basic construction approaches
• Distance
  – Tree accounts for evolutionary distances estimated from data
• Parsimony
  – Tree that requires the minimum amount of change to explain the data
• Maximum likelihood
  – Tree that maximizes the likelihood of the data
Details about ideal distance metrics
• D(xi, xj) >= 0
  – Distances must be non-negative
• D(xi, xi) = 0
• D(xi, xj) = D(xj, xi)
  – Symmetric
• D(xi, xj) <= D(xi, xa) + D(xa, xj)
  – Triangle inequality
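The four properties listed above can be checked mechanically. The following is a minimal sketch, assuming the distances are stored as a square list of lists; the function name `is_metric` and the tolerance `tol` are illustrative choices, not part of the lecture.

```python
from itertools import permutations

def is_metric(d, tol=1e-9):
    """Return True if the square matrix d satisfies the four listed properties."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:                 # D(x, x) = 0
            return False
        for j in range(n):
            if d[i][j] < -tol:                 # non-negativity
                return False
            if abs(d[i][j] - d[j][i]) > tol:   # symmetry
                return False
    # Last property: a detour through a third point never shortens the trip.
    for i, a, j in permutations(range(n), 3):
        if d[i][j] > d[i][a] + d[a][j] + tol:
            return False
    return True
```

For instance, `is_metric([[0, 2, 3], [2, 0, 4], [3, 4, 0]])` holds, while a matrix with an entry of 5 between two points that are each at distance 1 from a third point fails the last check.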
Goal of distance approach
• Given: an m x m matrix in which each entry is the distance between two sequences.
• Build a tree such that the path length between any two leaves i and j is consistent with the corresponding matrix entry.
UPGMA Method
• Unweighted Pair Group Method using Arithmetic Averages
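• Distance is defined between two clusters Ci and Cj as the mean of the pairwise distances between their members:

$$d_{ij} = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d_{pq}$$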
Basic idea
• Dij is the average distance between pairs of taxa, one drawn from each cluster
• Algorithm:
  – Start with one taxon per cluster
  – Iteratively pick two clusters and merge them
  – Create a new node in the tree for the merged cluster
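The average distance between two clusters can be written in a few lines. This is a sketch under the assumption that clusters are lists of row/column indices into the distance matrix; `cluster_distance` is a hypothetical helper name.

```python
def cluster_distance(d, ci, cj):
    """Mean of d[p][q] over every pair with p in cluster ci and q in cluster cj."""
    total = sum(d[p][q] for p in ci for q in cj)
    return total / (len(ci) * len(cj))
```

With single-taxon clusters this reduces to the original matrix entry, which is why the algorithm can start from the raw distances.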
More specifics
• Place each taxon at height 0 in the tree
• While more than two clusters remain:
  – Find the pair of clusters Ci and Cj with the smallest dij
  – Merge them into a new cluster Ck
  – Make a new node k at height dij/2
  – Replace Ci and Cj with Ck
  – Recompute the distance from Ck to every other cluster
• Join the two remaining clusters at the root, with its height calculated as above.
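The loop above can be sketched in Python as follows. This is an illustrative implementation, not the lecture's own code: the function name `upgma`, the dictionary bookkeeping, and the choice to emit a Newick string with branch lengths are all assumptions. It runs the merge loop down to a single cluster, which handles the final "join at the root" step with the same height rule, and it uses the size-weighted average for the distance update.

```python
from itertools import combinations

def upgma(names, d):
    """Sketch of UPGMA. Returns (newick_string, root_height).

    names: list of taxon labels; d: symmetric matrix as a list of lists.
    """
    # Each cluster id maps to (newick text, number of leaves, node height).
    clusters = {i: (names[i], 1, 0.0) for i in range(len(names))}
    # Distances between current clusters, keyed by a sorted id pair.
    dist = {(i, j): float(d[i][j])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)

    def get(a, b):
        return dist[(a, b) if a < b else (b, a)]

    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)        # closest pair of clusters
        (ti, ni, hi), (tj, nj, hj) = clusters[i], clusters[j]
        height = get(i, j) / 2                # new node sits at d_ij / 2
        # Branch length = parent height minus child height.
        text = f"({ti}:{height - hi:g},{tj}:{height - hj:g})"
        # Size-weighted average gives distances from the merged cluster.
        new_dist = {}
        for l in clusters:
            if l not in (i, j):
                new_dist[(l, next_id)] = (
                    (ni * get(i, l) + nj * get(j, l)) / (ni + nj))
        # Drop every entry that mentions i or j, then add the new ones.
        dist = {k: v for k, v in dist.items() if i not in k and j not in k}
        dist.update(new_dist)
        del clusters[i], clusters[j]
        clusters[next_id] = (text, ni + nj, height)
        next_id += 1

    text, _, height = next(iter(clusters.values()))
    return text + ";", height
```

Calling `upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])` returns a rooted tree in which A and B join at height 1 and C joins them at height 3. Ties between equally close pairs are broken by dictionary order here, which is an arbitrary choice.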
Updating distances
• Distance between Ck and Cl defined as:
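The slide's formula is not preserved in this transcript. The usual UPGMA update, assuming Ck is the union of Ci and Cj and that cluster distance is the average over all cross-cluster pairs, is a size-weighted mean. The short derivation below is a sketch added for illustration:

```latex
\begin{aligned}
d_{kl} &= \frac{1}{|C_k|\,|C_l|} \sum_{p \in C_k} \sum_{q \in C_l} d_{pq} \\
       &= \frac{1}{(|C_i| + |C_j|)\,|C_l|}
          \left( \sum_{p \in C_i} \sum_{q \in C_l} d_{pq}
               + \sum_{p \in C_j} \sum_{q \in C_l} d_{pq} \right) \\
       &= \frac{|C_i|\, d_{il} + |C_j|\, d_{jl}}{|C_i| + |C_j|}
\end{aligned}
```

The practical consequence is that the update needs only the two old rows and the cluster sizes, never the original taxon-level distances.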
In-class example
• Consider the following symmetric matrix:
     A   B   C   D   E
A    0   5   3   8  10
B    5   0   5   8  10
C    3   5   0   8  10
D    8   8   8   0   1
E   10  10  10   1   0
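As a worked sketch (arithmetic added here for illustration, not taken from the slides), running UPGMA on this matrix with size-weighted averaging gives the following merges, each new node placed at half the distance between the clusters it joins:

```latex
\begin{aligned}
&\text{Step 1: } d_{DE} = 1 \text{ is smallest} \Rightarrow \text{merge } \{D,E\} \text{ at height } 0.5;
  \quad d_{\{DE\},A} = \tfrac{8 + 10}{2} = 9 \ \ (\text{likewise for } B \text{ and } C) \\
&\text{Step 2: } d_{AC} = 3 \text{ is smallest} \Rightarrow \text{merge } \{A,C\} \text{ at height } 1.5;
  \quad d_{\{AC\},B} = \tfrac{5 + 5}{2} = 5, \quad d_{\{AC\},\{DE\}} = \tfrac{9 + 9}{2} = 9 \\
&\text{Step 3: } d_{\{AC\},B} = 5 \text{ is smallest} \Rightarrow \text{merge } \{A,C,B\} \text{ at height } 2.5;
  \quad d_{\{ACB\},\{DE\}} = \tfrac{2 \cdot 9 + 1 \cdot 9}{3} = 9 \\
&\text{Step 4: join } \{A,C,B\} \text{ and } \{D,E\} \text{ at the root, height } \tfrac{9}{2} = 4.5
\end{aligned}
```

The resulting rooted topology is (((A,C),B),(D,E)).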
UPGMA visually
Molecular clocks
• The molecular clock assumption is that sequences diverge at a uniform rate, equal across all branches of the tree
• Seldom (never?) true in practice
• When it does hold, the distances are called ultrametric.
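One way to test whether a distance matrix is ultrametric is the three-point condition: in every triple of taxa, the two largest of the three pairwise distances must be equal. The sketch below assumes a square list-of-lists matrix; `is_ultrametric` and `tol` are illustrative names.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple the two largest distances tie."""
    for i, j, k in combinations(range(len(d)), 3):
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True
```

A matrix that fails this check is one on which UPGMA's equal-height assumption does not hold, so its tree should be read with caution.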
Neighbor joining
• Does not assume a molecular clock, but does assume additivity
  – Distance between a pair of leaves is the sum of the edge lengths on the path between them
• Builds the tree iteratively, as UPGMA does, but the result is unrooted
• Two differences from UPGMA:
  – How subtrees are selected
  – How distances are updated
• A root can be added by including an "outgroup"
Basics
• NJ is a greedy algorithm that starts with a star tree (all taxa connected to a single central node)
• The merging criterion is key: it identifies topological neighbors, and it is correct for every additive distance matrix.
• Once merged, the two taxa are treated as a single taxon.
Reconstructing a 3-leaved tree
Caveats
• The matrix is updated after each merge.
• Produces unrooted trees; an outgroup is needed for a rooted version
• Always recovers the true tree if the distances are additive (it may not when the distances are noisy)
Four-point condition
• Pairwise distances are additive if and only if, for every set of four leaves i, j, k, l, two of the following three sums are equal and greater than or equal to the third:
  – Dij + Dkl
  – Dik + Djl
  – Dil + Djk
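The condition can be checked directly by enumerating quartets. The sketch below assumes a square list-of-lists matrix and only examines quartets of four distinct leaves; `satisfies_four_point` and `tol` are illustrative names.

```python
from itertools import combinations

def satisfies_four_point(d, tol=1e-9):
    """True if, in every quartet, the two largest of the three sums are equal."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted((d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]))
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True
```

As an example, the tree ((A,B),(C,D)) with leaf edges 1, 2, 3, 4 and an internal edge of 5 gives the sums 10, 20, 20, which pass; changing the B-D distance from 11 to 12 gives 10, 21, 20, which fail.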
Example
[Figure: tree on leaves A, B, C, D with edge lengths 0.1, 0.1, 0.1, 0.4, 0.4]
• Neighbor-joining will find the correct tree here
One more thing
• As with DNA sequence data, there is an assortment of file formats for trees.
• The Newick format described in the text is one of the more common ones (as FASTA is for sequences)
  – Ex: ((A,B),(C,D))
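To make the notation concrete, here is a small sketch that renders a nested Python tuple as a Newick string without branch lengths. The helper name `to_newick` and the tuple representation are assumptions for illustration. Newick strings conventionally end with a semicolon, which the slide's example omits.

```python
def to_newick(node):
    """Render a nested tuple of leaf names as a Newick string."""
    def render(n):
        if isinstance(n, str):          # a leaf is just its label
            return n
        # an internal node is its children, comma-separated, in parentheses
        return "(" + ",".join(render(child) for child in n) + ")"
    return render(node) + ";"

print(to_newick((("A", "B"), ("C", "D"))))   # prints ((A,B),(C,D));
```

Branch lengths, when present, follow each node after a colon, as in `(A:0.1,B:0.2);`.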
Why compute more distances?
• The most crucial piece of NJ is computing a new matrix using the previously mentioned equations.
• This lets us greedily choose the pair with the smallest entry; the "4-point condition" is what justifies that choice.
• UPGMA is a simpler relative of NJ that is correct when the distance from the root to every taxon is the same.
  – This seems plausible but rarely holds up in observed DNA sequence data
Calculating distances
• Saitou and Nei (1987)
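The slide's equations are not preserved in this transcript. For reference, a commonly cited statement of the neighbor-joining quantities is sketched below; the symbols n (current number of taxa), r_i (row sum), Q, u (the new internal node), and δ (branch length) are notation chosen here and may differ from the slide.

```latex
\begin{aligned}
r_i &= \sum_{k=1}^{n} d_{ik} \\
Q_{ij} &= (n - 2)\, d_{ij} - r_i - r_j
  && \text{join the pair } (i, j) \text{ with the smallest } Q_{ij} \\
\delta_{iu} &= \tfrac{1}{2}\, d_{ij} + \frac{r_i - r_j}{2\,(n - 2)},
  \qquad \delta_{ju} = d_{ij} - \delta_{iu}
  && \text{branch lengths to the new node } u \\
d_{uk} &= \tfrac{1}{2} \left( d_{ik} + d_{jk} - d_{ij} \right)
  && \text{distance from } u \text{ to every other taxon } k
\end{aligned}
```

Subtracting the row sums is what separates this from UPGMA: a pair is favored when its members are close to each other and also far from everything else, so a long branch does not by itself prevent two true neighbors from being joined.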
SARS
• The genome of SARS was sequenced by a Canadian group in April 2003
• 29,751 bp, single-stranded RNA genome
• Has 5-6 genes in the typical structure of a coronavirus
  – Coronaviruses are one of the causes of the common cold
Where did SARS come from?
Himalayan palm civet
Neighbor joining can also be used to study epidemiology
Date of origin
Finishing up
• Build an NJ tree for the matrix earlier:
     A   B   C   D   E
A    0   5   3   8  10
B    5   0   5   8  10
C    3   5   0   8  10
D    8   8   8   0   1
E   10  10  10   1   0
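A hand-built answer can be checked against a short program. The sketch below is one possible implementation, assuming the standard Q-criterion form of neighbor joining (minimize (n - 2) d_ij - r_i - r_j, where r is the row sum); the function name `neighbor_joining`, the list-based bookkeeping, and the Newick output are illustrative choices, and the input needs at least three taxa.

```python
def neighbor_joining(names, d):
    """Sketch of neighbor joining. Returns an unrooted tree as a Newick string."""
    nodes = list(names)                       # Newick text for each active node
    d = [list(map(float, row)) for row in d]  # work on a float copy
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]           # row sums
        # Pick the pair minimising Q(i, j) = (n - 2) d_ij - r_i - r_j.
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to the new internal node u.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from u to every remaining node, in the order of keep.
        du = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        merged = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        # Rebuild the matrix: surviving rows first, the new node last.
        d = [[d[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)]
        d.append(du + [0.0])
        nodes = [nodes[k] for k in keep] + [merged]
    # Three nodes left: attach all of them to one central node.
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"

M = [[0, 5, 3, 8, 10],
     [5, 0, 5, 8, 10],
     [3, 5, 0, 8, 10],
     [8, 8, 8, 0, 1],
     [10, 10, 10, 1, 0]]
print(neighbor_joining(list("ABCDE"), M))
# prints (B:2.5,(D:-0.5,E:1.5):6,(A:1.5,C:1.5):1);
```

The path lengths in this tree reproduce every entry of the matrix, but the branch to D comes out negative. That is a consequence of the data rather than of the sketch: d(A,E) = 10 exceeds d(A,D) + d(D,E) = 9, so the matrix violates the triangle inequality. Ties in the selection criterion are broken by scan order here, which is an arbitrary choice.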