7/25/11 CAP 5510 / CGS 5166 1
BSC 4934: QʼBIC Capstone Workshop
Giri Narasimhan ECS 254A; Phone: x3748
http://www.cs.fiu.edu/~giri/teach/BSC4934_Su11.html
July 2011
Introduction
Page 215
Darwin: Evolution & Natural Selection
Charles Darwin’s 1859 book (On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life) introduced the Theory of Evolution.
The struggle for existence induces natural selection. Offspring are dissimilar from their parents (that is, variability exists), and individuals that are more fit for a given environment are selected for. In this way, over long periods of time, species evolve. Groups of organisms change over time so that descendants differ structurally and functionally from their ancestors.
Slide by Pevsner
Dominant View of Evolution
All existing organisms are derived from a common ancestor, and new species arise by the splitting of a population into subpopulations that do not cross-breed.
Organization: a directed rooted tree
• Existing species: leaves
• Common ancestor species (divergence events): internal nodes
• Length of an edge: time
[Figure: Haeckel's tree of life, with branches labeled monera, protists, protozoa, fungi, plants, animals, invertebrates, vertebrates, and mammals]
Five kingdom system
(Haeckel, 1879)
Page 516
Slide by Pevsner
Evolution & Phylogeny
At the molecular level, evolution is a process of mutation with selection. Molecular evolution is the study of changes in genes and proteins throughout different branches of the tree of life.
Phylogeny is the inference of evolutionary relationships. Traditionally, phylogeny relied on the comparison of morphological features between organisms. Today, molecular sequence data are also used for phylogenetic analyses.
Slide by Pevsner
Questions for Phylogenetic Analysis
• How many genes are related to my favorite gene?
• How related are whales, dolphins & porpoises to cows?
• Where and when did HIV or other viruses originate?
• What is the history of life on earth?
• Was the extinct quagga more like a zebra or a horse?
Slide by Pevsner
Phylogenetic Trees
Molecular phylogeny uses trees to depict evolutionary relationships among organisms. These trees are based upon DNA and protein sequence data.
[Figure: rooted tree on taxa A to E with internal nodes F, G, H, I, drawn along a time axis with labeled branch lengths]
Slide by Pevsner
Tree nomenclature
[Figure: the tree on OTUs A to E drawn twice, with unscaled branches along a time axis and with scaled branches (scale bar: one unit); each tip is labeled "taxon"]
Fig. 7.8 Page 232
Slide by Pevsner
Tree nomenclature
[Figure: the tree on OTUs A to E drawn with unscaled and with scaled branches]
Taxon: an operational taxonomic unit (OTU), such as a protein sequence
Fig. 7.8 Page 232
Slide by Pevsner
Tree nomenclature
[Figure: the tree on OTUs A to E drawn with unscaled and with scaled branches]
Branch (edge)
Node: the intersection or terminating point of two or more branches
Fig. 7.8 Page 232
Slide by Pevsner
Tree nomenclature
[Figure: the tree on OTUs A to E drawn with unscaled and with scaled branches]
Branches are unscaled: OTUs are neatly aligned, and nodes reflect time.
Branches are scaled: branch lengths are proportional to the number of amino acid changes.
Fig. 7.8 Page 232
Slide by Pevsner
Tree nomenclature
[Figure: the tree on OTUs A to E with all internal nodes bifurcating, and a version in which one internal node is multifurcating]
Bifurcating internal node
Multifurcating internal node
Fig. 7.9 Page 233
Slide by Pevsner
Examples of multifurcation: failure to resolve the branching order of some metazoans and protostomes
Rokas A. et al., Animal Evolution and the Molecular Signature of Radiations Compressed in Time, Science 310:1933 (2005), Fig. 1.
Slide by Pevsner
Tree nomenclature: clades
[Figure: the tree on OTUs A to E along a time axis, highlighting A, B, and their common ancestor F]
Clade ABF (monophyletic group)
Fig. 7.8 Page 232
Slide by Pevsner
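A clade is a node together with all of its descendants, so asking whether a set of OTUs is monophyletic amounts to asking whether it is exactly the leaf set below some node. The Python sketch below assumes a hypothetical encoding of the Fig. 7.8 topology read off the clade names (F joins A and B, H joins C and D, G joins F and H, and the root I joins G and E); `TREE`, `leaves_below`, and `is_monophyletic` are illustrative names, not from the slides.

```python
# Hypothetical encoding of the Fig. 7.8 topology: internal node -> children.
TREE = {
    "I": ["G", "E"],  # root
    "G": ["F", "H"],
    "F": ["A", "B"],
    "H": ["C", "D"],
}


def leaves_below(node: str) -> frozenset:
    """Return the set of leaves (OTUs) descended from `node`."""
    children = TREE.get(node, [])
    if not children:  # no children: the node is itself a leaf
        return frozenset([node])
    leaves = frozenset()
    for child in children:
        leaves |= leaves_below(child)
    return leaves


def is_monophyletic(taxa) -> bool:
    """True if `taxa` is exactly the leaf set below some node, i.e. forms a clade."""
    target = frozenset(taxa)
    nodes = set(TREE) | {c for kids in TREE.values() for c in kids}
    return any(leaves_below(n) == target for n in nodes)


print(is_monophyletic({"A", "B"}))            # clade ABF
print(is_monophyletic({"A", "B", "C", "D"}))  # clade ABF/CDH/G
print(is_monophyletic({"B", "C"}))            # not a clade
```

Only the OTU sets are compared; the internal-node letters in names such as "ABF" identify the ancestor that defines each clade.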
Tree nomenclature
[Figure: the tree on OTUs A to E along a time axis, highlighting C, D, and their common ancestor H]
Clade CDH
Fig. 7.8 Page 232
Slide by Pevsner
Tree nomenclature
[Figure: the tree on OTUs A to E along a time axis, highlighting clades ABF and CDH and their common ancestor G]
Clade ABF/CDH/G
Fig. 7.8 Page 232
Slide by Pevsner
Examples of clades
Lindblad-Toh et al., Nature 438: 803 (2005), fig. 10
Slide by Pevsner
Tree nomenclature: roots
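[Figure: a rooted tree with nodes numbered 1 to 9, drawn from past (root) to present, and the corresponding unrooted tree with nodes numbered 1 to 8]
Rooted tree (specifies evolutionary path)
Unrooted tree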
Fig. 7.10 Page 234
Slide by Pevsner
Tree nomenclature: outgroup rooting
[Figure: a rooted tree drawn from past to present, and a tree in which an outgroup (used to place the root) marks the position of the root]
Fig. 7.10 Page 234
Slide by Pevsner
Constructing Evolutionary/Phylogenetic Trees
Two broad categories:
• Distance-based methods
  • Ultrametric: UPGMA
  • Additive: Transformed Distance, Neighbor-Joining
• Character-based methods: Maximum Parsimony, Maximum Likelihood, Bayesian Methods
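Ultrametric
An ultrametric tree:
• Internal node labels decrease from the root toward the leaves
• The distance between two leaves is the label of their least common ancestor
An ultrametric distance matrix:
• A symmetric matrix such that, for every i, j, k, the maximum of D(i,j), D(j,k), D(i,k) is tied (attained at least twice)
[Figure: leaves i, j, k; j and k meet at a node labeled D(j,k), which joins i at a higher node labeled D(i,j) = D(i,k)]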
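The three-point condition above is easy to test directly. A minimal Python sketch, assuming the matrix is a list of lists with a zero diagonal; `is_ultrametric` and the tolerance `tol` are illustrative names. The second example reuses the matrix from the Additive-Distance Trees slide, which is additive but not ultrametric.

```python
from itertools import combinations


def is_ultrametric(D, tol=1e-9):
    """Three-point test for a distance matrix given as a list of lists.

    Requires a zero diagonal and symmetry; then, for every triple i, j, k,
    the largest of D[i][j], D[j][k], D[i][k] must be attained at least twice.
    """
    n = len(D)
    for i in range(n):
        if abs(D[i][i]) > tol:
            return False
        for j in range(i + 1, n):
            if abs(D[i][j] - D[j][i]) > tol:
                return False
    for i, j, k in combinations(range(n), 3):
        _, middle, largest = sorted((D[i][j], D[j][k], D[i][k]))
        if largest - middle > tol:  # the maximum is not tied
            return False
    return True


clock_like = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
additive_only = [[0, 3, 7, 9], [3, 0, 6, 8], [7, 6, 0, 6], [9, 8, 6, 0]]
print(is_ultrametric(clock_like), is_ultrametric(additive_only))  # True False
```

In `additive_only`, the triple A, B, C has distances 3, 6, 7, so the maximum 7 is not tied.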
Ultrametric: Assumptions
Molecular Clock Hypothesis (Zuckerkandl & Pauling, 1962): accepted point mutations in the amino acid sequence of a protein occur at a constant rate.
• The rate varies from protein to protein
• The rate varies from one part of a protein to another
Ultrametric Data Sources
• Lab-based methods: hybridization
  • Take denatured DNA of the two taxa and let it hybridize; then measure the energy needed to separate the strands.
• Sequence-based methods: distance
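As a minimal sketch of a sequence-based distance, the code below computes the proportion of aligned sites at which two sequences differ (the p-distance). It assumes gap-free, equal-length alignments; model-based corrections such as Jukes-Cantor are often applied in practice and are not shown. The four sequences are the ones used in the Maximum Parsimony example later in this lecture.

```python
def p_distance(s1: str, s2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    mismatches = sum(1 for a, b in zip(s1, s2) if a != b)
    return mismatches / len(s1)


def distance_matrix(seqs):
    """Pairwise p-distance matrix (list of lists) for aligned sequences."""
    return [[p_distance(a, b) for b in seqs] for a in seqs]


SEQS = ["AAGAGTTCA", "AGCCGTTCT", "AGATATCCA", "AGAGATCCT"]
for row in distance_matrix(SEQS):
    print(["%.3f" % d for d in row])
```

Sequences 1 and 2 differ at 4 of 9 sites, while sequences 3 and 4 differ at only 2.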
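Ultrametric: Example

|   | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| A | 0 | 4 | 3 | 4 | 5 | 4 | 3 | 4 |
| B |   |   |   |   |   |   |   |   |
| C |   |   |   |   |   |   |   |   |
| D |   |   |   |   |   |   |   |   |
| E |   |   |   |   |   |   |   |   |
| F |   |   |   |   |   |   |   |   |
| G |   |   |   |   |   |   |   |   |
| H |   |   |   |   |   |   |   |   |

[Figure: partial ultrametric tree built from row A: A joins C and G at a node labeled 3, that group joins B, D, F, H at a node labeled 4, and E joins at the root labeled 5]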
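Ultrametric: Example

|   | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| A | 0 | 4 | 3 | 4 | 5 | 4 | 3 | 4 |
| B |   | 0 | 4 | 2 | 5 | 1 | 4 | 4 |
| C |   |   |   |   |   |   |   |   |
| D |   |   |   |   |   |   |   |   |
| E |   |   |   |   |   |   |   |   |
| F |   |   |   |   |   |   |   |   |
| G |   |   |   |   |   |   |   |   |
| H |   |   |   |   |   |   |   |   |

[Figure: the tree refined using row B: B and F join at a node labeled 1, D joins them at a node labeled 2, and H attaches below the node labeled 4; A with C, G at the node labeled 3, and E at the root labeled 5]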
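Ultrametric: Distances Computed

|   | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| A | 0 | 4 | 3 | 4 | 5 | 4 | 3 | 4 |
| B |   | 0 | 4 | 2 | 5 | 1 | 4 | 4 |
| C |   |   |   |   |   |   |   |   |
| D |   |   |   |   |   |   |   |   |
| E |   |   |   |   |   |   |   |   |
| F |   |   |   |   |   |   |   |   |
| G |   |   |   |   |   |   |   |   |
| H |   |   |   |   |   |   |   |   |

[Row C on this slide shows one computed entry, 2; its column is not recoverable from the transcript.]
[Figure: the tree refined using row B: B and F join at a node labeled 1, D joins them at a node labeled 2, and H attaches below the node labeled 4; A with C, G at the node labeled 3, and E at the root labeled 5]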
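Additive-Distance Trees

|   | A | B | C | D |
|---|---|---|---|---|
| A | 0 | 3 | 7 | 9 |
| B |   | 0 | 6 | 8 |
| C |   |   | 0 | 6 |
| D |   |   |   | 0 |

[Figure: unrooted tree with leaf edges A = 2, B = 1, C = 2, D = 4 and an internal edge of length 3 separating {A, B} from {C, D}]
Additive distance trees are edge-weighted trees in which the distance between any two leaves is exactly equal to the length of the path between them.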
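Additivity can be checked by summing edge weights along each leaf-to-leaf path. A minimal Python sketch, assuming the tree above is encoded as an edge list with hypothetical internal-node names `u` and `v`; these edge lengths reproduce every entry of the matrix.

```python
# Hypothetical tree consistent with the matrix: leaves A-D, internal nodes u and v.
EDGES = [("A", "u", 2), ("B", "u", 1), ("u", "v", 3), ("C", "v", 2), ("D", "v", 4)]
D = {("A", "B"): 3, ("A", "C"): 7, ("A", "D"): 9,
     ("B", "C"): 6, ("B", "D"): 8, ("C", "D"): 6}


def path_length(edges, start, goal):
    """Length of the unique path between two nodes of a tree (depth-first search)."""
    adjacency = {}
    for a, b, w in edges:
        adjacency.setdefault(a, []).append((b, w))
        adjacency.setdefault(b, []).append((a, w))
    stack = [(start, None, 0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for neighbor, w in adjacency[node]:
            if neighbor != parent:  # a tree has no cycles, so this suffices
                stack.append((neighbor, node, dist + w))
    raise ValueError(f"{goal!r} is not reachable from {start!r}")


def fits_additively(edges, distances):
    """True if every leaf-to-leaf path length equals the matrix entry."""
    return all(path_length(edges, i, j) == d for (i, j), d in distances.items())


print(fits_additively(EDGES, D))
```

For example, the path from A to D is 2 + 3 + 4 = 9, matching the matrix entry.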
Unrooted Trees on 4 Taxa
[Figure: the three possible unrooted topologies on taxa A, B, C, D: AB|CD, AC|BD, and AD|BC]
Four-Point Condition
If the true tree is as shown below, then
1. dAB + dCD < dAC + dBD, and
2. dAB + dCD < dAD + dBC
[Figure: unrooted tree whose internal edge separates {A, B} from {C, D}]
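Because the true split gives the smallest of the three pairwise sums, the condition can be used to choose among the three unrooted topologies on four taxa. A minimal sketch using the matrix from the Additive-Distance Trees slide; `four_point_split` is a hypothetical helper, and ties are broken arbitrarily. On an additive tree the two larger sums are equal.

```python
def four_point_split(d, w, x, y, z):
    """Pick the quartet split favored by the four-point condition.

    d maps frozenset({p, q}) -> distance. For the true split, the sum of the
    two within-pair distances is the smallest of the three pairings.
    """
    def dist(p, q):
        return d[frozenset((p, q))]

    sums = {
        ((w, x), (y, z)): dist(w, x) + dist(y, z),
        ((w, y), (x, z)): dist(w, y) + dist(x, z),
        ((w, z), (x, y)): dist(w, z) + dist(x, y),
    }
    return min(sums, key=sums.get)


PAIRS = {("A", "B"): 3, ("A", "C"): 7, ("A", "D"): 9,
         ("B", "C"): 6, ("B", "D"): 8, ("C", "D"): 6}
d = {frozenset(p): v for p, v in PAIRS.items()}
print(four_point_split(d, "A", "B", "C", "D"))
```

Here dAB + dCD = 9, while the other two sums are both 15, so the split AB|CD is chosen.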
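Unweighted Pair-Group Method with Arithmetic Means (UPGMA)

|   | A | B | C |
|---|---|---|---|
| B | $d_{AB}$ |   |   |
| C | $d_{AC}$ | $d_{BC}$ |   |
| D | $d_{AD}$ | $d_{BD}$ | $d_{CD}$ |

Join the closest pair (here A and B) at height $d_{AB}/2$, then update the matrix:

|   | AB | C |
|---|---|---|
| C | $d_{(AB)C}$ |   |
| D | $d_{(AB)D}$ | $d_{CD}$ |

where $d_{(AB)C} = (d_{AC} + d_{BC})/2$.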
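Repeating the merge step until one cluster remains gives the full method. A minimal Python sketch, assuming a symmetric matrix and always merging the closest pair; it uses the usual size-weighted average (each cluster weighted by its number of taxa), which reduces to the slide's formula when A and B are single taxa. The example matrix is made up for illustration.

```python
def upgma(names, D):
    """UPGMA clustering of a symmetric distance matrix D (list of lists).

    Returns a nested tuple (left, right, height) for each internal node;
    leaves are the taxon names.
    """
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    dist = {frozenset((i, j)): D[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        i, j = sorted(pair)
        d_ij = dist.pop(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        for k in clusters:  # size-weighted mean distance to the merged cluster
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j, d_ij / 2), n_i + n_j)  # height d_ij / 2
        next_id += 1
    ((tree, _),) = clusters.values()
    return tree


M = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]]
print(upgma(["A", "B", "C", "D"], M))
```

Each merged node sits at half the distance between the clusters it joins, as with $d_{AB}/2$ on the slide; within each tuple the two merged clusters may appear in either order.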
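Transformed Distance Method
UPGMA makes errors when rate constancy among lineages does not hold.
Remedy: introduce an outgroup O and correct the distances:
$$D'_{ij} = \frac{D_{ij} - D_{iO} - D_{jO}}{2} + \frac{1}{n}\sum_{k=1}^{n} D_{kO}$$
where n is the number of ingroup taxa. Now apply UPGMA to the transformed distances $D'$.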
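A minimal sketch of the transformation, assuming a hypothetical ingroup tree that violates the clock: A (branch 1) and B (branch 5) join at u, u (branch 2) and C (branch 1) join at r, and the outgroup O attaches to r with branch 3. The matrix below is computed from those branch lengths; `transformed_distances` is an illustrative name.

```python
def transformed_distances(D, outgroup):
    """Apply D'_ij = (D_ij - D_iO - D_jO) / 2 + mean_k D_kO over the ingroup.

    D is a symmetric dict of dicts keyed by taxon name and must include the
    outgroup. The mean runs over the n ingroup taxa.
    """
    ingroup = [t for t in D if t != outgroup]
    mean_o = sum(D[k][outgroup] for k in ingroup) / len(ingroup)
    return {
        i: {j: 0.0 if i == j else
               (D[i][j] - D[i][outgroup] - D[j][outgroup]) / 2 + mean_o
            for j in ingroup}
        for i in ingroup
    }


D = {
    "A": {"A": 0, "B": 6, "C": 4, "O": 6},
    "B": {"A": 6, "B": 0, "C": 8, "O": 10},
    "C": {"A": 4, "B": 8, "C": 0, "O": 4},
    "O": {"A": 6, "B": 10, "C": 4, "O": 0},
}
T = transformed_distances(D, "O")
print(T["A"]["B"], T["A"]["C"], T["B"]["C"])
```

On the raw distances the closest pair is A and C ($D_{AC} = 4$), so UPGMA would group them even though A and B are sisters. After the transformation A and B are closest, and A and B are equidistant from C, as an ultrametric requires.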
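Saitou & Nei: Neighbor-Joining Method
Start with a star topology. Find the pair of taxa to separate from the star such that the total length of the tree is minimized. The pair is then replaced by a single OTU whose distances to the others are the arithmetic means of the pair's distances, and the process is repeated. When taxa 1 and 2 are separated, the total tree length is
$$S_{12} = \frac{1}{2(n-2)}\sum_{k=3}^{n}\left(D_{1k} + D_{2k}\right) + \frac{1}{2}D_{12} + \frac{1}{n-2}\sum_{3 \le i < j \le n} D_{ij}$$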
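A minimal sketch of the selection loop as described here: compute $S_{ij}$ for every pair, join the pair with the smallest value, and replace it by one OTU with arithmetic-mean distances. It reuses the matrix from the Additive-Distance Trees slide; the helper names are hypothetical, branch lengths are not computed, and ties go to the first pair found. Many current implementations use an equivalent pair-selection criterion (the Studier-Keppler Q-matrix) with a different distance update; this sketch follows the mean-based replacement on the slide.

```python
from itertools import combinations


def dist(D, x, y):
    """Look up a symmetric distance stored under frozenset({x, y})."""
    return D[frozenset((x, y))]


def s_value(D, taxa, i, j):
    """Saitou-Nei S_ij: total tree length when i and j are split off the star."""
    n = len(taxa)
    others = [k for k in taxa if k not in (i, j)]
    term1 = sum(dist(D, i, k) + dist(D, j, k) for k in others) / (2 * (n - 2))
    term2 = dist(D, i, j) / 2
    term3 = sum(dist(D, k, l) for k, l in combinations(others, 2)) / (n - 2)
    return term1 + term2 + term3


def neighbor_joining_order(D, taxa):
    """Return the pairs joined, replacing each pair by one OTU whose distances
    are the arithmetic means (D_ik + D_jk) / 2, until three OTUs remain."""
    D, taxa, joins = dict(D), list(taxa), []
    while len(taxa) > 3:
        i, j = min(combinations(taxa, 2), key=lambda p: s_value(D, taxa, *p))
        joins.append((i, j))
        merged = f"({i},{j})"
        for k in taxa:
            if k not in (i, j):
                D[frozenset((merged, k))] = (dist(D, i, k) + dist(D, j, k)) / 2
        taxa = [k for k in taxa if k not in (i, j)] + [merged]
    joins.append(tuple(taxa))  # the last three OTUs meet at one node
    return joins


PAIRS = {("A", "B"): 3, ("A", "C"): 7, ("A", "D"): 9,
         ("B", "C"): 6, ("B", "D"): 8, ("C", "D"): 6}
D = {frozenset(p): v for p, v in PAIRS.items()}
TAXA = ["A", "B", "C", "D"]
print({p: s_value(D, TAXA, *p) for p in combinations(TAXA, 2)})
print(neighbor_joining_order(D, TAXA))
```

For this matrix $S_{AB} = S_{CD} = 12$, which equals the total length of the additive tree (2 + 1 + 3 + 2 + 4), while each of the other four pairs gives 13.5.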
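Neighbor-Joining
[Figure: a star tree with taxa 1, 2, ..., n around one central node, and the tree obtained by separating taxa 1 and 2 from the star]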
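Character-based Methods
• Input: characters (morphological features, sequences, etc.)
• Output: a phylogenetic tree that provides the history of which features changed
[Perfect Phylogeny Problem] One leaf per object and one edge per character; the path from the root to a leaf ⇔ the traits that changed.

|   | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| A | 1 | 1 | 0 | 0 | 0 |
| B | 0 | 0 | 1 | 0 | 0 |
| C | 1 | 1 | 0 | 0 | 1 |
| D | 0 | 0 | 1 | 1 | 0 |
| E | 0 | 1 | 0 | 0 | 0 |

[Figure: perfect phylogeny for this matrix, with each edge labeled by the character that changes on it: edge 3 leads to B and then edge 4 to D; edge 2 leads to E, then edge 1 to A, then edge 5 to C]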
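Example
Perfect phylogeny does not always exist. No perfect phylogeny exists for this matrix:

|   | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| A | 1 | 1 | 0 | 0 | 0 |
| B | 0 | 0 | 1 | 0 | 1 |
| C | 1 | 1 | 0 | 0 | 1 |
| D | 0 | 0 | 1 | 1 | 0 |
| E | 0 | 1 | 0 | 0 | 1 |

whereas this matrix, which differs only in that B and E lack character 5, does have one:

|   | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| A | 1 | 1 | 0 | 0 | 0 |
| B | 0 | 0 | 1 | 0 | 0 |
| C | 1 | 1 | 0 | 0 | 1 |
| D | 0 | 0 | 1 | 1 | 0 |
| E | 0 | 1 | 0 | 0 | 0 |

[Figure: perfect phylogeny for the second matrix: edge 3 leads to B and then edge 4 to D; edge 2 leads to E, then edge 1 to A, then edge 5 to C]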
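For binary characters whose ancestral state is 0, a standard result gives a simple test: a rooted perfect phylogeny exists exactly when, for every pair of characters, the sets of objects with state 1 are either disjoint or nested. A minimal sketch of that pairwise test on the two matrices above; the function name is hypothetical, and the test only decides existence without building the tree.

```python
from itertools import combinations


def has_perfect_phylogeny(matrix):
    """Pairwise test for a rooted perfect phylogeny on binary characters.

    matrix maps each object to a string of 0/1 states (one per character),
    with every character assumed to be 0 in the common ancestor.
    """
    objects = list(matrix)
    n_chars = len(next(iter(matrix.values())))
    # For each character, the set of objects that carry state 1.
    carriers = [{o for o in objects if matrix[o][c] == "1"} for c in range(n_chars)]
    for s, t in combinations(carriers, 2):
        if s & t and not (s <= t or t <= s):  # overlapping but not nested
            return False
    return True


HAS_TREE = {"A": "11000", "B": "00100", "C": "11001", "D": "00110", "E": "01000"}
NO_TREE = {"A": "11000", "B": "00101", "C": "11001", "D": "00110", "E": "01001"}
print(has_perfect_phylogeny(HAS_TREE), has_perfect_phylogeny(NO_TREE))  # True False
```

In `NO_TREE`, character 1 is carried by {A, C} and character 5 by {B, C, E}: the sets overlap in C but neither contains the other.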
Maximum Parsimony
Minimize the total number of mutations implied by the evolutionary history.
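Examples of Character Data
Characters/Sites

| Sequences | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | A | A | G | A | G | T | T | C | A |
| 2 | A | G | C | C | G | T | T | C | T |
| 3 | A | G | A | T | A | T | C | C | A |
| 4 | A | G | A | G | A | T | C | C | T |

|   | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| A | 1 | 1 | 0 | 0 | 0 |
| B | 0 | 0 | 1 | 0 | 1 |
| C | 1 | 1 | 0 | 0 | 1 |
| D | 0 | 0 | 1 | 1 | 0 |
| E | 0 | 1 | 0 | 0 | 1 |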
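Maximum Parsimony Method: Example
Characters/Sites

| Sequences | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | A | A | G | A | G | T | T | C | A |
| 2 | A | G | C | C | G | T | T | C | T |
| 3 | A | G | A | T | A | T | C | C | A |
| 4 | A | G | A | G | A | T | C | C | T |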
Unrooted Trees on 4 Taxa
[Figure: the three possible unrooted topologies on four taxa: AB|CD, AC|BD, and AD|BC]
| Sequences | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | A | A | G | A | G | T | T | C | A |
| 2 | A | G | C | C | G | T | T | C | T |
| 3 | A | G | A | T | A | T | C | C | A |
| 4 | A | G | A | G | A | T | C | C | T |

[Figure: the alignment repeated four times alongside the candidate trees]
Inferring nucleotides on internal nodes
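Fitch's algorithm is one standard way to infer candidate nucleotides at internal nodes and count the minimum number of changes per site on a fixed tree. A minimal sketch of its bottom-up pass, assuming rooted binary trees written as nested pairs of sequence indices (rooting does not change the Fitch score); a top-down pass that picks one nucleotide per internal node is not shown. It scores the three unrooted topologies for the four sequences in the example above.

```python
def fitch(tree, site):
    """Fitch's small-parsimony pass for one site on a rooted binary tree.

    tree is a leaf index (int) or a pair (left, right); site maps each leaf
    index to its nucleotide. Returns (candidate state set, number of changes).
    """
    if isinstance(tree, int):
        return {site[tree]}, 0
    left_states, left_cost = fitch(tree[0], site)
    right_states, right_cost = fitch(tree[1], site)
    common = left_states & right_states
    if common:  # children can agree: no change needed here
        return common, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Sum of Fitch change counts over all aligned sites."""
    return sum(fitch(tree, {i: s[pos] for i, s in enumerate(seqs)})[1]
               for pos in range(len(seqs[0])))


SEQS = ["AAGAGTTCA", "AGCCGTTCT", "AGATATCCA", "AGAGATCCT"]  # sequences 1-4
TOPOLOGIES = {  # indices 0-3 stand for sequences 1-4
    "((1,2),(3,4))": ((0, 1), (2, 3)),
    "((1,3),(2,4))": ((0, 2), (1, 3)),
    "((1,4),(2,3))": ((0, 3), (1, 2)),
}
for name, tree in TOPOLOGIES.items():
    print(name, parsimony_score(tree, SEQS))
```

For these sequences the scores are 10, 11, and 12, so ((1,2),(3,4)) is the most parsimonious of the three trees; only sites 5, 7, and 9 score differently across topologies.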
Searching for the Maximum Parsimony Tree: Exhaustive Search
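Exhaustive search must score every candidate tree, and the number of trees grows super-exponentially with the number of taxa. A short sketch of the standard counts for bifurcating trees on n labeled taxa, $(2n-5)!!$ unrooted and $(2n-3)!!$ rooted; the function names are illustrative.

```python
def odd_double_factorial(m: int) -> int:
    """m!! = m * (m - 2) * ... * 1 for odd m; returns 1 when m <= 0."""
    result = 1
    for k in range(m, 0, -2):
        result *= k
    return result


def num_unrooted_trees(n: int) -> int:
    """Unrooted bifurcating trees on n >= 3 labeled taxa: (2n - 5)!!."""
    return odd_double_factorial(2 * n - 5)


def num_rooted_trees(n: int) -> int:
    """Rooted bifurcating trees on n >= 2 labeled taxa: (2n - 3)!!."""
    return odd_double_factorial(2 * n - 3)


for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Four taxa give only the 3 unrooted trees shown earlier, but 10 taxa already give 2,027,025, which is why branch-and-bound and heuristic searches are used for larger data sets.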
Searching for the Maximum Parsimony Tree: Branch-&-Bound
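Probabilistic Models of Evolution
Assume a model of substitution that specifies $\Pr\{S_i(t+\Delta) = Y \mid S_i(t) = X\}$, the probability that site i, in state X at time t, is in state Y after a further time Δ. Using this formula, it is possible to compute the likelihood that data D are generated by a given phylogenetic tree T under the model of substitution. Now find the tree with the maximum likelihood.
[Figure: an edge from a node in state X to a node in state Y]
• Time elapsed? Δ
• Probability of change along an edge? $\Pr\{S_i(t+\Delta) = Y \mid S_i(t) = X\}$
• Probability of the data? The product of the probabilities over all edges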
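A minimal sketch of these three ingredients, assuming the Jukes-Cantor model (a swapped-in choice: the slides do not fix a substitution model) and a hypothetical three-taxon rooted tree with made-up branch lengths in expected substitutions per site. The product over edges gives the probability of one complete labeling; because internal states are unobserved, the sketch sums that product over every assignment of internal states. Felsenstein's pruning algorithm computes the same sum efficiently; this brute-force version is only for tiny trees.

```python
import math
from itertools import product

BASES = "ACGT"

# Hypothetical rooted tree: child -> (parent, branch length). r is the root.
EDGES = {"u": ("r", 0.1), "C": ("r", 0.3), "A": ("u", 0.2), "B": ("u", 0.2)}
INTERNAL = ["r", "u"]


def jc_prob(x: str, y: str, d: float) -> float:
    """Jukes-Cantor Pr{S(t + delta) = y | S(t) = x} for a branch of length d."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def labeled_likelihood(states: dict) -> float:
    """Probability of one complete labeling: root frequency times the
    product of the change probabilities over all edges."""
    p = 0.25  # equal base frequencies at the root under Jukes-Cantor
    for child, (parent, d) in EDGES.items():
        p *= jc_prob(states[parent], states[child], d)
    return p


def site_likelihood(leaf_states: dict) -> float:
    """Likelihood of one alignment column: sum over unobserved internal states."""
    total = 0.0
    for combo in product(BASES, repeat=len(INTERNAL)):
        total += labeled_likelihood({**leaf_states, **dict(zip(INTERNAL, combo))})
    return total


print(site_likelihood({"A": "G", "B": "G", "C": "G"}))
print(site_likelihood({"A": "G", "B": "G", "C": "T"}))
```

Assuming independent sites, the likelihood of a whole alignment is the product of the site likelihoods; maximum likelihood then searches over topologies and branch lengths for the largest value.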
Computing Maximum Likelihood Tree
Basic Population Genetics
• Allele: one of two or more forms of the DNA sequence of a particular gene
  • The word "allele" is a short form of allelomorph ("other form")
• Diploid: organisms with two sets of chromosomes
  • Homozygous: both copies of the allele are the same
  • Heterozygous: the two copies differ
• Alleles may be
  • Dominant: the allele whose trait is expressed in heterozygous individuals
  • Recessive
• Genotype: the set of alleles in an individual, i.e., its genetic composition
Genetic Characters
• Characters can be
  • Mendelian, i.e., single-gene effects, OR
  • Polygenic, i.e., caused by the combined effect of multiple genetic factors, OR
  • Environmental
• Characters can be
  • discrete (e.g., disease), or
  • continuous (e.g., height)
• Gene loci involved in continuous characters are called Quantitative Trait Loci (QTLs)
Hardy-Weinberg Principle
G.H. Hardy & Wilhelm Weinberg (1908)
• Allele and genotype frequencies in a population remain constant from generation to generation.
• Assumptions:
  • Diploid organisms; sexual reproduction; non-overlapping generations
  • Biallelic loci; allele frequencies independent of sex
  • Mating is random
  • Population size is infinite
  • Mutations can be ignored
  • Migration is negligible
  • Natural selection does not affect the allele in question
• Equilibrium is attained in one generation of random mating.

|   | Females: A (p) | Females: a (q) |
|---|---|---|
| Males: A (p) | AA (p²) | Aa (pq) |
| Males: a (q) | Aa (pq) | aa (q²) |

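The Punnett square gives the equilibrium genotype frequencies $p^2$, $2pq$ (the two Aa cells), and $q^2$. A minimal Python sketch, assuming a single biallelic locus and the idealized assumptions listed above; the starting population is made up.

```python
def hardy_weinberg(p: float) -> dict:
    """Expected genotype frequencies under Hardy-Weinberg for allele A at frequency p."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def allele_freq(genotypes: dict) -> float:
    """Frequency of allele A implied by genotype frequencies."""
    return genotypes["AA"] + genotypes["Aa"] / 2


start = {"AA": 0.5, "Aa": 0.0, "aa": 0.5}  # far from equilibrium, p = 0.5
after_one = hardy_weinberg(allele_freq(start))  # one round of random mating
print(after_one)
print(allele_freq(after_one))
```

Starting from AA and aa homozygotes only, one round of random mating already yields 0.25 : 0.5 : 0.25, while the allele frequency stays at 0.5.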
Genetic Linkage
• Meiosis: cell division necessary for sexual reproduction
  • Produces gametes, such as sperm and egg cells
• Meiosis starts with one diploid cell, with two copies of each chromosome, and produces four haploid cells, each with one copy of each chromosome. Each chromosome is recombined from the two copies.
  • At the start of meiosis, the chromosome pairs recombine and exchange sections; then they separate into two chromosomes.
• Recombination: alleles on the same chromosome may end up in different daughter cells
  • If two alleles are far apart, a crossover between them, putting them on different chromosomes, is more likely.
• Genetically linked traits are caused by alleles sufficiently close to each other. Linkage is used to produce genetic maps (linkage maps).
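Linkage Disequilibrium (D)
D = the difference between the observed and expected (under independence) haplotype frequencies. Given two bi-allelic loci A and B:
$D = x_{11} - p_1 q_1$

| Haplotype | Frequency |
|---|---|
| AB | $x_{11}$ |
| Ab | $x_{12}$ |
| aB | $x_{21}$ |
| ab | $x_{22}$ |

| Allele | Frequency |
|---|---|
| A | $p_1 = x_{11} + x_{12}$ |
| a | $p_2 = x_{21} + x_{22}$ |
| B | $q_1 = x_{11} + x_{21}$ |
| b | $q_2 = x_{12} + x_{22}$ |

|   | A | a | Total |
|---|---|---|---|
| B | $x_{11} = p_1 q_1 + D$ | $x_{21} = p_2 q_1 - D$ | $q_1$ |
| b | $x_{12} = p_1 q_2 - D$ | $x_{22} = p_2 q_2 + D$ | $q_2$ |
| Total | $p_1$ | $p_2$ | 1 |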
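The definitions above translate directly into code. A minimal sketch, assuming the four haplotype frequencies are known (in practice they are estimated); the example frequencies are hypothetical.

```python
def linkage_disequilibrium(x11, x12, x21, x22):
    """Return (p1, q1, D) from haplotype frequencies of AB, Ab, aB, ab."""
    if abs(x11 + x12 + x21 + x22 - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p1 = x11 + x12  # frequency of allele A
    q1 = x11 + x21  # frequency of allele B
    return p1, q1, x11 - p1 * q1


p1, q1, D = linkage_disequilibrium(0.5, 0.1, 0.1, 0.3)
print(p1, q1, D)
# Independence (no LD): each haplotype frequency is a product of allele frequencies.
print(linkage_disequilibrium(0.24, 0.36, 0.16, 0.24)[2])
```

With frequencies 0.5, 0.1, 0.1, 0.3 the allele frequencies are $p_1 = q_1 = 0.6$ and $D = 0.5 - 0.36 = 0.14$; the second call gives $D \approx 0$.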
Linkage Disequilibrium
• Linkage (dis)equilibrium: when the genotypes at two loci are (not) independent
• Assumptions of basic population genetics:
  • Transmission of alleles (across generations) at two loci is independent
  • Fitness of genotypes at different loci is independent
• Neither assumption holds in general
• There exist non-random associations of alleles at different loci
• The extent of these associations is measured by linkage disequilibrium
SNPs
• SNP: single nucleotide polymorphism
  • A mutation at a single nucleotide position
  • Occurred once in human history
  • Passed on through heredity
  • ~10M SNPs in the human genome
  • About 1 SNP every 300 bp, most with a frequency of 10-50%
• Most variation within a population is characterized by SNPs
• Goal: correlate SNPs with human disease
• Genotype
  • Gives the bases at each SNP for both copies of the chromosome, but loses the information about which chromosome each base appears on. NO LABEL!
• Haplotype
  • Gives the bases at each SNP for each chromosome. LABELED!
Genotype vs Haplotype
• If the first locus is bi-allelic with two possible alleles (say, A and G):
  • Genotypes: AA, GG, AG
• If a second bi-allelic locus has alleles G and C:
  • Genotypes: GG, CC, GC
• Genotypes and haplotypes for the two loci are:

| First locus \ Second locus | GG | GC | CC |
|---|---|---|---|
| AA | AG, AG | AG, AC | AC, AC |
| AG | AG, GG | AG, GC or AC, GG | AC, GC |
| GG | GG, GG | GG, GC | GC, GC |

• Interesting problem: given genotypes, resolve the haplotypes.
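Resolving haplotypes means deciding, at each heterozygous locus, which allele sits on which chromosome. A minimal sketch that enumerates every resolution consistent with one genotype, assuming genotypes are given as allele pairs per locus; a genotype with h heterozygous loci has $2^{h-1}$ resolutions (one when h is 0). Practical phasing methods choose among these using population data, which this sketch does not attempt.

```python
from itertools import product


def resolve_haplotypes(genotype):
    """List the haplotype pairs consistent with an unphased genotype.

    genotype is a list of (allele, allele) pairs, one per locus. The phase of
    the first heterozygous locus is fixed so that each pair is listed once.
    """
    het = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = []
    for flips in product((False, True), repeat=max(len(het) - 1, 0)):
        flip_at = dict(zip(het[1:], flips))
        hap1, hap2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a  # put the alleles on the opposite chromosomes
            hap1.append(a)
            hap2.append(b)
        pairs.append(("".join(hap1), "".join(hap2)))
    return pairs


# Double heterozygote: AG at the first locus, GC at the second.
print(resolve_haplotypes([("A", "G"), ("G", "C")]))
```

The double heterozygote yields the two alternatives in the table, AG/GC and AC/GG; every other cell has a single resolution.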
Genome-wide Association Studies (GWAS)
• To identify patterns of polymorphisms that vary systematically between individuals with different disease states
  • To identify risk-enhancing or risk-decreasing alleles
• Examples of GWAS (900 studies; 3500 associations)
  • Prostate cancer: Nature Genetics, 1 Apr 2007
  • Type 2 diabetes: Science Express, 26 Apr 2007
  • Heart disease: Science Express, 3 May 2007
  • Breast cancer: Nature & Nature Genetics, 27 May 2007
  • …
  • See: http://www.genome.gov/Pages/About/OD/ReportsPublications/GWASUpdateSlides-9-19-07.pdf
• Since variation is inherited in blocks (groups), it is enough to study a sample of the population instead of looking at the whole population.
• GWA databases at NIH: dbGaP, caBIG, and CGEMS
GWAS Process
Analysis
• Summary statistics for quality control
  • Allele and genotype frequencies, missing genotype rates, inbreeding statistics, non-Mendelian transmission in family data, sex checks based on X chromosome SNPs
• Population stratification detection
  • Complete linkage hierarchical clustering
  • Multidimensional scaling analysis to visualise substructure
  • Significance test for whether two individuals belong to the same population
• Association testing:
  • Case vs. control: standard allelic test, Fisher’s exact test, Cochran-Armitage trend test, Mantel-Haenszel and Breslow-Day tests for stratified samples, dominant/recessive and general models, model comparison tests
  • Family-based associations
  • QTLs
• …
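As an illustration of the "standard allelic test", here is a minimal sketch of one common form: a Pearson chi-square test (1 degree of freedom, no continuity correction) on a 2x2 table of allele counts in cases and controls. The counts are made up, and this is not PLINK's implementation; counting two alleles per individual implicitly treats alleles as independent.

```python
def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square statistic (1 df) for a 2x2 table of allele counts.

    Each argument is (count of allele 1, count of allele 2), with two alleles
    counted per diploid individual. No continuity correction is applied.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical allele counts for 50 cases and 50 controls (100 alleles each).
print(round(allelic_chi_square((30, 70), (20, 80)), 3))  # 2.667
```

The statistic is compared with a chi-square distribution with 1 degree of freedom; at the 0.05 level the critical value is about 3.84, so this made-up example would not be significant.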
Software
• PLINK: analysis of genotype and phenotype data
• EIGENSOFT: population structure analysis
• IMPUTE, SNPTEST, MACH, ProbABEL, BimBam, QUICKTEST