measuring genetic change level 3 molecular evolution and bioinformatics jim provan page and holmes:...
![Page 1: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/1.jpg)
Measuring genetic change

Level 3 Molecular Evolution and Bioinformatics
Jim Provan
Page and Holmes: Section 5.2
![Page 2: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/2.jpg)
Types of substitution

| Type | Substitutions from ancestral base A | Changes | Differences |
|---|---|---|---|
| Single | A → C in one lineage | 1 | 1 |
| Multiple | A → C → T in one lineage | 2 | 1 |
| Coincidental | A → C in one lineage, A → G in the other | 2 | 1 |
| Parallel | A → C in both lineages | 2 | 0 |
| Convergent | A → C → T in one lineage, A → T in the other | 3 | 0 |
| Back | A → C → A in one lineage | 2 | 0 |
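The six cases above can be checked with a short sketch. The function name `changes_and_differences` and the encoding of each lineage as a string (ancestral base first, then every base the site passes through) are assumptions made for illustration, not part of the slides.

```python
def changes_and_differences(lineage_1, lineage_2):
    """Count actual substitutions and the observed difference at one site.

    Each lineage is a string of the bases the site passes through,
    starting with the shared ancestral base.
    """
    if lineage_1[0] != lineage_2[0]:
        raise ValueError("both lineages must start from the same ancestral base")
    # Every step between consecutive bases along a lineage is one substitution.
    changes = sum(a != b for a, b in zip(lineage_1, lineage_1[1:]))
    changes += sum(a != b for a, b in zip(lineage_2, lineage_2[1:]))
    # Only the two present-day bases can be compared in real data.
    differences = int(lineage_1[-1] != lineage_2[-1])
    return changes, differences


# The six types of substitution, all starting from ancestral base A.
EXAMPLES = {
    "single": ("AC", "A"),
    "multiple": ("ACT", "A"),
    "coincidental": ("AC", "AG"),
    "parallel": ("AC", "AC"),
    "convergent": ("ACT", "AT"),
    "back": ("ACA", "A"),
}

for name, (lineage_1, lineage_2) in EXAMPLES.items():
    print(name, changes_and_differences(lineage_1, lineage_2))
```

In every case except the single substitution, the number of observed differences is smaller than the number of changes that actually occurred.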
![Page 3: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/3.jpg)
Types of substitution (continued)

Multiple substitutions can greatly obscure actual evolutionary history, particularly in cases where there have been many mutations, i.e. over long evolutionary time scales
Final three examples (parallel, convergent and back substitutions) have serious implications for inference of evolutionary history:
Similarity inherited from an ancestor is called homology
Independently acquired similarity is called homoplasy
All tree-building methods rely on sufficient levels of homology
![Page 4: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/4.jpg)
Types of substitution (continued)

Substitutions that exchange a purine for another purine (A ↔ G) or a pyrimidine for another pyrimidine (C ↔ T) are called transitions
Substitutions that exchange a purine for a pyrimidine or vice-versa are called transversions
[Figure: the four bases A, G, C and T]
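The two definitions translate directly into a small classifier. This is a minimal sketch; the function name `classify_substitution` is an illustrative assumption.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(old, new):
    """Return 'transition' or 'transversion' for a change old -> new."""
    old, new = old.upper(), new.upper()
    valid = PURINES | PYRIMIDINES
    if old not in valid or new not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if old == new:
        raise ValueError("no substitution: the two bases are identical")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) is a transition.
    same_class = (old in PURINES) == (new in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))
print(classify_substitution("A", "C"))
```

Each base has one possible transition and two possible transversions, so if all changes were equally likely, transversions would be twice as frequent as transitions.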
![Page 5: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/5.jpg)
Measuring evolutionary change

Simplest measure is to count number of different sites
Poor measure:
Some sites may undergo repeated substitutions
As sequences diverge, measure becomes less accurate
Saturation occurs - most sites changing have changed before
[Figure: base pair differences (axis 0 to 120) plotted against time since divergence in Myr (axis 0 to 25)]
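The simplest measure described above, counting the sites that differ, can be sketched as follows. The function names and the example sequences are illustrative assumptions; the proportion of differing sites is often called the p-distance.

```python
def count_differences(seq_a, seq_b):
    """Number of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ (uncorrected distance)."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    return count_differences(seq_a, seq_b) / len(seq_a)


# Hypothetical aligned sequences, made up for this example.
seq_1 = "ACGTACGT"
seq_2 = "ACGTTCGA"
print(count_differences(seq_1, seq_2), p_distance(seq_1, seq_2))
```

Because a site that has changed twice still counts at most once, this measure underestimates the true amount of change as sequences diverge.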
![Page 6: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/6.jpg)
Correction of observed sequence differences

[Figure: sequence difference plotted against time, with labels Observed difference, Expected difference and 'Correction']
![Page 7: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/7.jpg)
A general framework of sequence evolution models

$$
P_t = \begin{pmatrix}
p_{AA} & p_{AC} & p_{AG} & p_{AT} \\
p_{CA} & p_{CC} & p_{CG} & p_{CT} \\
p_{GA} & p_{GC} & p_{GG} & p_{GT} \\
p_{TA} & p_{TC} & p_{TG} & p_{TT}
\end{pmatrix}
$$

$$p_{ii} = 1 - \sum_{j \neq i} p_{ij}$$

$$f = [f_A \;\; f_C \;\; f_G \;\; f_T]$$
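The rule for the diagonal, that each row of the matrix sums to 1, can be sketched in a few lines. The function name `complete_diagonal`, the dictionary layout and the value 0.01 are assumptions chosen for illustration.

```python
BASES = "ACGT"


def complete_diagonal(off_diagonal):
    """Build the full 4 x 4 matrix from its off-diagonal entries.

    off_diagonal maps (i, j) with i != j to p_ij; each diagonal entry
    is set to p_ii = 1 - sum of p_ij over all j != i.
    """
    matrix = {}
    for i in BASES:
        row = {j: off_diagonal[(i, j)] for j in BASES if j != i}
        row[i] = 1.0 - sum(row.values())
        if row[i] < 0:
            raise ValueError("off-diagonal probabilities in a row exceed 1")
        matrix[i] = row
    return matrix


# Hypothetical example: every substitution has probability 0.01.
off = {(i, j): 0.01 for i in BASES for j in BASES if i != j}
P = complete_diagonal(off)
print(P["A"]["A"])
```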
![Page 8: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/8.jpg)
The Jukes-Cantor (JC) model

Assumes that all four bases have equal frequencies and that all substitutions are equally likely

$$
P_t = \begin{pmatrix}
- & \alpha & \alpha & \alpha \\
\alpha & - & \alpha & \alpha \\
\alpha & \alpha & - & \alpha \\
\alpha & \alpha & \alpha & -
\end{pmatrix}
$$

$$f = [\tfrac{1}{4} \;\; \tfrac{1}{4} \;\; \tfrac{1}{4} \;\; \tfrac{1}{4}]$$
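Under these assumptions the observed proportion of differing sites, p, can be converted into an estimate of the number of substitutions per site. The slide does not state the formula; the sketch below uses the standard Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), and the function name `jc_distance` is an illustrative assumption.

```python
import math


def jc_distance(p):
    """Jukes-Cantor corrected distance (substitutions per site).

    p is the observed proportion of sites that differ. The correction
    is undefined at p >= 0.75, the level expected for unrelated
    sequences with equal base frequencies.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.05, 0.30, 0.60):
    print(p, round(jc_distance(p), 4))
```

The corrected distance is close to p for similar sequences and grows much faster than p as the sequences approach saturation.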
![Page 9: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/9.jpg)
Kimura's 2-parameter model (K2P)

Takes into account different frequencies of transitions vs. transversions

$$
P_t = \begin{pmatrix}
- & \beta & \alpha & \beta \\
\beta & - & \beta & \alpha \\
\alpha & \beta & - & \beta \\
\beta & \alpha & \beta & -
\end{pmatrix}
$$

$$f = [\tfrac{1}{4} \;\; \tfrac{1}{4} \;\; \tfrac{1}{4} \;\; \tfrac{1}{4}]$$

[Figure: plot with series labelled Transitions ($\alpha$) and Transversions ($\beta$); vertical axis 0 to 100, horizontal axis 0 to 25]
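Treating transitions and transversions separately gives a two-parameter distance correction. The slide does not give the formula; the sketch below uses the standard Kimura two-parameter distance, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of sites differing by a transition and by a transversion. The function and argument names are illustrative assumptions.

```python
import math


def k2p_distance(transitions, transversions):
    """Kimura two-parameter distance (substitutions per site).

    transitions   -- proportion of sites differing by a transition (P)
    transversions -- proportion of sites differing by a transversion (Q)
    """
    p, q = transitions, transversions
    if p < 0 or q < 0:
        raise ValueError("proportions must be non-negative")
    term_1 = 1 - 2 * p - q
    term_2 = 1 - 2 * q
    if term_1 <= 0 or term_2 <= 0:
        raise ValueError("sequences are too divergent for the K2P correction")
    return -0.5 * math.log(term_1) - 0.25 * math.log(term_2)


print(round(k2p_distance(0.10, 0.05), 4))
```

When transversions are exactly twice as common as transitions (P = p/3, Q = 2p/3), both logarithms have the same argument and the result reduces to the Jukes-Cantor distance for the same p.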
![Page 10: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/10.jpg)
Felsenstein (1981) (F81)

Takes into account differences in base composition
Percentage (G + C) can range from 25% to 75%
F81 model allows the frequencies of the four nucleotides to be different
Does not allow for variation between genes/species

$$f = [\pi_A \;\; \pi_C \;\; \pi_G \;\; \pi_T]$$

$$
P_t = \begin{pmatrix}
- & \pi_C\alpha & \pi_G\alpha & \pi_T\alpha \\
\pi_A\alpha & - & \pi_G\alpha & \pi_T\alpha \\
\pi_A\alpha & \pi_C\alpha & - & \pi_T\alpha \\
\pi_A\alpha & \pi_C\alpha & \pi_G\alpha & -
\end{pmatrix}
$$
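The base frequencies that F81 takes as input can be estimated by simple counting. A minimal sketch, assuming the frequencies are taken directly from one observed sequence; the function names and the example sequence are illustrative.

```python
from collections import Counter

BASES = "ACGT"


def base_frequencies(seq):
    """Relative frequency of each of A, C, G, T (other symbols are ignored)."""
    counts = Counter(base for base in seq.upper() if base in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in BASES}


def gc_percent(seq):
    """Percentage of G + C among the counted bases."""
    freqs = base_frequencies(seq)
    return 100 * (freqs["G"] + freqs["C"])


# Hypothetical sequence, made up for this example.
print(base_frequencies("GGGCAT"), gc_percent("GGGCAT"))
```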
![Page 11: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/11.jpg)
Hasegawa, Kishino and Yano (1985) (HKY85)

Essentially merges the K2P and F81 models to allow transitions and transversions to occur at different rates as well as allowing base frequencies to vary

$$f = [\pi_A \;\; \pi_C \;\; \pi_G \;\; \pi_T]$$

$$
P_t = \begin{pmatrix}
- & \pi_C\beta & \pi_G\alpha & \pi_T\beta \\
\pi_A\beta & - & \pi_G\beta & \pi_T\alpha \\
\pi_A\alpha & \pi_C\beta & - & \pi_T\beta \\
\pi_A\beta & \pi_C\alpha & \pi_G\beta & -
\end{pmatrix}
$$
![Page 12: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/12.jpg)
General reversible model (REV)

Most general model - each substitution has its own probability

$$f = [\pi_A \;\; \pi_C \;\; \pi_G \;\; \pi_T]$$

$$
P_t = \begin{pmatrix}
- & \pi_C a & \pi_G b & \pi_T c \\
\pi_A a & - & \pi_G d & \pi_T e \\
\pi_A b & \pi_C d & - & \pi_T f \\
\pi_A c & \pi_C e & \pi_G f & -
\end{pmatrix}
$$

By constraining a-f it is possible to generate all the other models
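How constraining a-f recovers the simpler models can be sketched as follows. The base order A, C, G, T and the pairing of a-f with base pairs follow the matrix layout on the slide; the function names and the use of `None` for the diagonal (shown as "-" on the slide) are assumptions made for illustration.

```python
BASES = "ACGT"
# One exchange parameter per unordered pair of bases.
PAIR_PARAM = {"AC": "a", "AG": "b", "AT": "c", "CG": "d", "CT": "e", "GT": "f"}
EQUAL_FREQS = {base: 0.25 for base in BASES}


def rev_matrix(freqs, params):
    """Off-diagonal entry for i -> j is freqs[j] times the parameter of pair {i, j}."""
    matrix = {}
    for i in BASES:
        matrix[i] = {}
        for j in BASES:
            if i == j:
                matrix[i][j] = None  # diagonal is left unspecified
            else:
                pair = "".join(sorted(i + j))
                matrix[i][j] = freqs[j] * params[PAIR_PARAM[pair]]
    return matrix


def constrained(alpha, beta):
    """Transitions (A<->G is b, C<->T is e) share alpha; transversions share beta."""
    return {"a": beta, "b": alpha, "c": beta, "d": beta, "e": alpha, "f": beta}


def jc(alpha):
    return rev_matrix(EQUAL_FREQS, constrained(alpha, alpha))


def k2p(alpha, beta):
    return rev_matrix(EQUAL_FREQS, constrained(alpha, beta))


def f81(freqs, alpha):
    return rev_matrix(freqs, constrained(alpha, alpha))


def hky85(freqs, alpha, beta):
    return rev_matrix(freqs, constrained(alpha, beta))


# Hypothetical base frequencies and rates, made up for this example.
example = hky85({"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}, alpha=2.0, beta=1.0)
print(example["A"])
```

With equal base frequencies every entry carries a constant factor of ¼, so the JC and K2P matrices produced here match the simpler forms only up to that scaling.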
![Page 13: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/13.jpg)
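Comparing the models

| Model | Base frequencies | Substitution parameters |
|---|---|---|
| JC | $\pi_A = \pi_C = \pi_G = \pi_T$ | $\alpha = \beta$ |
| K2P | $\pi_A = \pi_C = \pi_G = \pi_T$ | $\alpha \neq \beta$ |
| F81 | $\pi_A \neq \pi_C \neq \pi_G \neq \pi_T$ | $\alpha = \beta$ |
| HKY85 | $\pi_A \neq \pi_C \neq \pi_G \neq \pi_T$ | $\alpha \neq \beta$ |
| REV | $\pi_A \neq \pi_C \neq \pi_G \neq \pi_T$ | a, b, c, d, e, f |

Allow transition/transversion bias: JC → K2P and F81 → HKY85
Allow base frequencies to vary: JC → F81 and K2P → HKY85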
![Page 14: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/14.jpg)
Comparing the models (continued)

[Figure: four 4 × 4 substitution matrices with rows and columns A, C, G, T, labelled Observed, JC, K2P and HKY85]
![Page 15: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/15.jpg)
Assumptions: independence

Assumes that change at one site has no effect on other sites
Good example is in RNA stem-loop structures
Substitution may result in mismatched bases and decreased stem stability
Compensatory change may occur to restore Watson-Crick base pairing
[Figure: RNA stem-loop in three states, as read from the slide. Original: A C C C C U U G C A over U G G G G G A A. After substitution: A C C C C U U G C A over U G G G G C A A. After compensatory change: A C C C G U U G C A over U G G G G C A A.]
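The sequence of events in the stem can be followed for a single base pair. This is a minimal sketch that checks only strict Watson-Crick pairs (G-U wobble pairs are deliberately ignored); the function name `is_paired` is an illustrative assumption.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def is_paired(base_1, base_2):
    """True if two RNA bases form a Watson-Crick pair."""
    return (base_1.upper(), base_2.upper()) in WATSON_CRICK


# One stem position followed through the three states on the slide.
states = [
    ("original", "C", "G"),
    ("after substitution G -> C on one strand", "C", "C"),
    ("after compensatory change C -> G on the other strand", "G", "C"),
]
for label, base_1, base_2 in states:
    print(label, is_paired(base_1, base_2))
```

The second change is only favoured because of the first, which is exactly the kind of dependence between sites that the models above assume away.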
![Page 16: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/16.jpg)
Assumptions: base composition

Assumption that base composition is at equilibrium and that it is similar across all taxa studied
In example opposite, trees inferred using models which do not allow for this will not group Thermus and Deinococcus

| Taxon | % G + C |
|---|---|
| Aquifex | 64.0 |
| Thermotoga | 63.7 |
| Thermus | 63.2 |
| Deinococcus | 55.5 |
| Others | 53.9 |
![Page 17: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/17.jpg)
Assumptions: variation in substitution rate across sites

Not all sites are equally likely to undergo a substitution
Functional constraints:
Pseudogenes have lost all function and can evolve freely
Substitutions at fourfold degenerate sites do not change the amino acid composition of proteins
Non-degenerate sites are highly constrained
[Figure: bar chart of substitution rate (substitutions / site / 10^9 years, axis 0 to 4) for 5' flanking region, 5' untranslated region, non-degenerate sites, twofold degenerate sites, fourfold degenerate sites, introns, 3' untranslated region, 3' flanking region and pseudogenes]
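The degeneracy classes named above can be worked out from the genetic code. The sketch below assumes the standard genetic code and counts, for one codon position, how many of the four bases leave the encoded amino acid unchanged (4 = fourfold degenerate, 2 = twofold, 1 = non-degenerate). The function name `site_degeneracy` is an illustrative assumption.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {}
for i, b1 in enumerate(BASES):
    for j, b2 in enumerate(BASES):
        for k, b3 in enumerate(BASES):
            CODON_TABLE[b1 + b2 + b3] = AMINO_ACIDS[16 * i + 4 * j + k]


def site_degeneracy(codon, position):
    """Number of bases at `position` (0, 1 or 2) that encode the same amino acid.

    The count includes the base already present, so it ranges from 1 to 4.
    """
    codon = codon.upper().replace("U", "T")
    if codon not in CODON_TABLE:
        raise ValueError("codon must be three bases from A, C, G, T/U")
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1 or 2")
    amino_acid = CODON_TABLE[codon]
    return sum(
        CODON_TABLE[codon[:position] + base + codon[position + 1:]] == amino_acid
        for base in BASES
    )


print(site_degeneracy("GGT", 2))  # third position of a glycine codon
print(site_degeneracy("GAT", 2))  # third position of an aspartate codon
print(site_degeneracy("GAT", 1))  # second position of an aspartate codon
```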
![Page 18: Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2](https://reader035.vdocuments.site/reader035/viewer/2022081808/5a4d1b277f8b9ab059997735/html5/thumbnails/18.jpg)
Assumptions: variation in substitution rate across sites (continued)

More rapidly evolving sequence shows most divergence initially but soon saturates
At longer divergence times, the more slowly evolving sequence A actually appears to be more rapidly evolving
[Figure: DNA divergence (axis 0 to 0.7) plotted against divergence time in Myr (axis 0 to 250) for two sequences. A: 0.5% / Myr + 20% constraint. B: 2% / Myr + 50% constraint.]
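The crossing of the two curves can be reproduced with a simple model. The slide does not say how its curves were generated, so everything here is an assumption: "constraint" is read as the fraction of sites that can never change, the rate is applied to the remaining sites, and those sites saturate in the Jukes-Cantor manner (at most ¾ of them differ). The function name `observed_divergence` is illustrative.

```python
import math


def observed_divergence(time_myr, rate_per_myr, constrained_fraction):
    """Expected observed proportion of differing sites after time_myr.

    Assumed model: constrained sites never change; the remaining sites
    accumulate rate_per_myr substitutions per site per Myr and saturate
    at 3/4, as under the Jukes-Cantor model.
    """
    free_fraction = 1.0 - constrained_fraction
    saturation = 1.0 - math.exp(-4.0 / 3.0 * rate_per_myr * time_myr)
    return free_fraction * 0.75 * saturation


SEQ_A = (0.005, 0.20)  # 0.5% / Myr, 20% of sites constrained
SEQ_B = (0.02, 0.50)   # 2% / Myr, 50% of sites constrained

for t in (10, 50, 100, 250):
    a = observed_divergence(t, *SEQ_A)
    b = observed_divergence(t, *SEQ_B)
    print(t, round(a, 3), round(b, 3))
```

Under these assumptions B diverges faster at first but levels off near 0.375 (¾ of its unconstrained sites), while A keeps rising towards 0.6 and overtakes it.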