Models of Molecular Evolution II Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections 7.3 – 7.4

Upload: betty-gordon

Post on 31-Dec-2015


TRANSCRIPT

Page 1: Models of Molecular Evolution II Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections 7.3 – 7.4

Models of Molecular Evolution II

Level 3 Molecular Evolution and Bioinformatics

Jim Provan

Page and Holmes: Sections 7.3 – 7.4


Isochore structure of vertebrate genomes

Why do patterns of base composition (the frequencies of the four bases and of the codons used to specify amino acids) differ between genomes?

Mean G + C content in bacteria ranges from 25% to 75%, but there is little intragenome variation

Genomes of vertebrates have a much greater range of G + C values:

- Caused by continuous sections (> 300 kb), each of which has a uniform G + C content (isochores)
- G + C content of isochores also varies between species
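One way to look for regional differences in base composition is to measure G + C content in consecutive windows along a sequence. The following is a minimal Python sketch of that idea, not a method from the lecture: the function names, the toy sequence and the 10-base window are illustrative assumptions. Real isochore analyses use windows on the scale of hundreds of kilobases.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    gc = sum(seq.count(b) for b in "GC")
    total = gc + sum(seq.count(b) for b in "AT")
    return gc / total if total else 0.0


def windowed_gc(seq, window):
    """G + C fraction of consecutive, non-overlapping windows of length `window`."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


# Invented toy sequence: an A + T rich stretch followed by a G + C rich stretch.
toy = "ATATATTAAT" + "GCGCGGCCGC"
print(gc_content(toy))        # whole-sequence G + C fraction
print(windowed_gc(toy, 10))   # per-window G + C fractions
```

The whole-sequence value hides the contrast between the two halves, which is why a windowed measure is the more natural description of isochore-like structure.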


Properties of vertebrate isochores

G + C rich isochores

- Correlate with reverse Giemsa (R) bands
- Early replicating
- High density of genes
- SINEs present
- CpG islands in genes
- High G + C content at third codon position
- High frequency of retroviral sequences
- High frequency of chiasmata

A + T rich isochores

- Correlate with Giemsa (G) bands
- Late replicating
- Low density of genes (only tissue specific)
- LINEs present
- No CpG islands
- High A + T content at third codon position
- Low frequency of retroviral sequences
- Low frequency of chiasmata


Theories on the existence of isochores

Selectionist hypothesis of Bernardi et al. suggests that GC-rich isochores, predominantly found in warm-blooded vertebrates, are an adaptation to higher body temperature:

- Extra hydrogen bond in G-C pair may lessen possibility of thermal damage to DNA
- Desert plants also have higher GC contents

Evidence for independent occurrence of isochores, since birds and mammals do not share an immediate ancestor

However, some thermophilic bacteria are AT-rich


Theories on the existence of isochores

Neutralist explanation for the existence of isochores is that they simply reflect variation in the process of mutation across the genome

Studies on argininosuccinate synthetase processed pseudogenes from anthropoid primates:

- Pseudogenes were derived from the same functional ancestral gene but then inserted into different parts of the genome
- Despite their common ancestry, they now differ in base composition
- Because pseudogenes are not subject to selection, differences in base composition must have been due to regional variation in mutation patterns


Why should mutation patterns vary across genomes?

Replication hypothesis suggests that genes which replicate earlier in the cell cycle are more GC-rich than those which replicate later:

- Believed to be due to the fact that G and C precursor pools of dNTPs are larger at this time, so errors are more likely to incorporate G or C

Repair hypothesis is based on the assumption that efficiency of DNA repair varies across the genome:

- May be an outcome of transcriptionally active areas being repaired more efficiently
- CpG islands are maintained by a special repair system; efficiency of DNA replication may be dependent on location


Why should mutation patterns vary across genomes?

Recombination hypothesis claims that the isochore structure of vertebrate genomes is the outcome of differences in the pattern and frequency of recombination:

- Low GC localities will be associated with regions of reduced recombination:
  - Genes with low rates of recombination have low GC values
  - The large, non-recombining region of the Y-chromosome has a low GC composition
- Fact that recombination plays such a large part in the structuring of eukaryote genomes makes this an attractive hypothesis

Although the relative contributions of these hypotheses are still unclear, the neutralist interpretation seems more likely


Codon usage

[Figure: bar charts of codon usage in E. coli and human, each panel scaled 0 to 60, for the six arginine (ARG) codons CGA, CGC, CGG, CGU, AGA, AGG and the six leucine (LEU) codons CUA, CUC, CUG, CUU, UUA, UUG]


What determines codon usage?

Degeneracy of genetic code:

- Null hypothesis is that all codons for a particular amino acid are used with equal frequency
- Refuted when nucleotide sequences became available for a wide range of organisms

Selectionist argument:

- Highly expressed genes show most codon bias because they require more translational efficiency: coevolution of tRNAs and codons
- Also supports the neutralist prediction of a relationship between functional constraint and substitution rate
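The null hypothesis of equal codon use can be checked directly by comparing observed codon counts with the counts expected if every synonymous codon were used equally. The sketch below uses relative synonymous codon usage (observed count divided by expected count), a standard summary that the slides themselves do not name. The `FAMILIES` table, the function name and the toy sequence are illustrative assumptions.

```python
from collections import Counter

# Synonymous codon families from the standard genetic code (RNA alphabet).
FAMILIES = {
    "Arg": ["CGA", "CGC", "CGG", "CGU", "AGA", "AGG"],
    "Leu": ["CUA", "CUC", "CUG", "CUU", "UUA", "UUG"],
}


def rscu(coding_seq, family):
    """Observed count of each codon divided by the count expected under equal usage."""
    seq = coding_seq.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if c in family)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in family}
    expected = total / len(family)
    return {c: counts[c] / expected for c in family}


# Invented toy coding sequence: CUG four times, UUA once, CUC once.
toy = "CUGCUGCUGCUGUUACUC"
leu_usage = rscu(toy, FAMILIES["Leu"])
print(leu_usage)
```

Under the null hypothesis every value would be close to 1; values well above 1 mark preferred codons and values near 0 mark avoided ones.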


Gene expression and codon bias

Highly expressed genes -> Strong selection for translational efficiency -> Restricted tRNAs used -> Strong codon bias -> Low rate of synonymous substitution (few neutral mutations)

Lowly expressed genes -> Weak selection for translational efficiency -> More tRNAs used -> Weak codon bias -> High rate of synonymous substitution (many neutral mutations)


The molecular clock

Idea of a molecular clock is central to the neutralist theory, since it demonstrates the constancy of the underlying neutral mutation rate

Previous example of -globin

Does not imply that all genes and proteins evolve at the same rate:

- Great variation between proteins (fibrinopeptides vs. histones)
- Variation in rate among genes and proteins is compatible with the neutral theory if the underlying cause is changes in selective constraint
- Key question concerning the validity of a molecular clock is whether rates of substitution are constant within genes across evolutionary time


Neutral theory and the molecular clock

Rate of nucleotide substitution (fixation) at any site per year, $k$, in a diploid population of $N$ individuals ($2N$ gene copies) is equal to the number of new mutations (neutral, deleterious or advantageous) arising per year, $2N\mu$, where $\mu$ is the mutation rate per site per year, multiplied by their probability of fixation, $u$:

$k = 2N\mu u$

For a neutral mutation, probability of fixation is the reciprocal of the number of gene copies in the population:

$u = 1/(2N)$

So substitution rate for a neutral mutation is:

$k = (2N\mu)(1/(2N))$


Neutral theory and the molecular clock (continued)

Parameters for population size ($2N$) cancel out, leaving:

$k = \mu$

One of the most important formulae in molecular evolution: it means that the rate of substitution of neutral mutations depends only on the underlying mutation rate $\mu$ and is independent of other factors such as population size

Also holds for mutants with a very weak selective advantage, e.g. $s < 1/(2N_e)$
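The cancellation of population size can be checked numerically. This is a small Python sketch under stated assumptions: the function names and the value of `mu` are hypothetical and chosen only for illustration.

```python
def substitution_rate(n_individuals, mu, fixation_prob):
    """k = (new mutations per year, 2N * mu) * (probability that each is fixed)."""
    return 2 * n_individuals * mu * fixation_prob


def neutral_fixation_prob(n_individuals):
    """A new neutral mutation starts as one copy among 2N, so u = 1 / (2N)."""
    return 1 / (2 * n_individuals)


mu = 1e-9  # hypothetical neutral mutation rate per site per year
for n in (100, 10_000, 1_000_000):
    k = substitution_rate(n, mu, neutral_fixation_prob(n))
    print(n, k)  # k matches mu (up to floating-point rounding) for every N
```

A larger population produces more new mutations each year, but each one is proportionally less likely to drift to fixation, so the two effects cancel exactly.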


Substitution of selectively advantageous mutations

Probability of fixation is roughly twice the selection coefficient:

$u = 2sN_e/N$

Substituting this into the original equation ($k = 2N\mu u$), we get:

$k = 4N_e s\mu$

In this case, substitution rate for an advantageous mutation also depends on population size and magnitude of selective advantage

For natural selection to produce a molecular clock, it is necessary for $N_e$, $s$ and $\mu$ (a combination of ecological, mutational and selective events) to be the same across evolutionary time, which is highly unlikely!
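To see how the advantageous case differs from the neutral one, the following Python sketch evaluates $k = 4N_e s\mu$ for several effective population sizes. The function name and every numerical value are hypothetical; they are there only to show the dependence on $N_e$.

```python
def advantageous_substitution_rate(n_e, s, mu):
    """k = 2N*mu * (2*s*N_e/N) = 4 * N_e * s * mu; the census size N cancels."""
    return 4 * n_e * s * mu


mu = 1e-9   # hypothetical rate of advantageous mutation per site per year
s = 0.01    # hypothetical selective advantage
for n_e in (1_000, 10_000, 100_000):
    print(n_e, advantageous_substitution_rate(n_e, s, mu))
# The rate grows in proportion to N_e, unlike the neutral case where k = mu.
```

Because the rate scales with both $N_e$ and $s$, any change in population size or in the strength of selection along a lineage would change the rate of substitution.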


Constancy of the molecular clock

Neutral theory predicted a molecular clock and first protein sequence data appeared to confirm this: led Kimura to cite this as the best evidence for neutrality

As more comparative sequence data became available, particularly from mammals, examples of rate variation began to appear

Debate arose concerning the constancy of the molecular clock


Testing the molecular clock

Dispersion index $R(t)$: tests whether there is more rate variation between lineages than expected under a Poisson process:

- If the data fit a Poisson process, variance in number of substitutions between lineages should be no greater than the mean number
- If the data fit a Poisson process then $R(t) = 1.0$; if not, then $R(t) > 1.0$ and the clock is said to be overdispersed

A star phylogeny should be used, since any phylogenetic structure will complicate the calculations (e.g. placental mammals)
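Since a Poisson process has variance equal to its mean, the simplest form of the dispersion index is the ratio of the variance to the mean of the substitution counts across lineages. The Python sketch below computes that plain ratio. It is a simplification, as published estimates of $R(t)$ apply further corrections, and the function name and counts are invented for illustration.

```python
from statistics import mean, variance


def dispersion_index(substitutions):
    """Ratio of the sample variance to the mean of per-lineage substitution counts."""
    return variance(substitutions) / mean(substitutions)


# Invented counts of substitutions along the lineages of a star phylogeny.
clock_like = [10, 9, 11, 10, 12, 8]
erratic = [2, 25, 7, 14, 1, 11]
print(dispersion_index(clock_like))
print(dispersion_index(erratic))
```

A ratio well above 1, as in the second list, is what the slides call an overdispersed clock.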


Testing the molecular clock

Mammalian protein data presented a serious problem for neutralists

Problems most likely due to inaccuracies in phylogenies:

- "Outlier" in data was guinea pig
- Guinea pig is much more divergent than previously thought

Protein          Species (n)   Amino acids   R(t)
Haemoglobin α    6             141           1.17
Haemoglobin β    6             146           3.04
Myoglobin        6             153           1.60
Cytochrome c     4             104           3.22
Ribonuclease     4             123           2.15
-Crystallin      6             175           2.71


The relative rate test

The relative rate test compares the numbers of substitutions separating each of two closely related taxa from a third, more distantly related outgroup

If A and B have evolved according to a molecular clock, both should be equidistant from C:

$d_{AC} = d_{BC}$

A and B must be closest relatives and C must not be too far removed

[Figure: tree relating taxa A, B and C, with internal node X]


The relative rate test

Region (length)                                     $d_{12}$   $d_{13} - d_{23}$
Synonymous sites in nine nuclear genes (3520 bp)    6.7        2.3 ± 0.6
-globin pseudogene (1827 bp)                        7.9        1.5 ± 0.4
Three introns (3376 bp)                             6.9        1.0 ± 0.5
Two flanking regions (936 bp)                       7.9        3.1 ± 1.1

Species: 1 = Old World monkey, 2 = human, 3 = New World monkey
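Under a clock, the difference $d_{13} - d_{23}$ should be zero, so one rough way to read these figures is to ask how many standard errors each difference lies from zero. The Python sketch below does this for the four values on the slide. Treating the ± figure as a standard error, and using 1.96 as a two-sided 5% cut-off, are assumptions made here rather than statements from the lecture, and the names are illustrative.

```python
# Values copied from the slide: (d13 - d23, its reported +/- value).
# Treating the +/- value as a standard error is an assumption made here.
DATA = {
    "synonymous sites, nine nuclear genes": (2.3, 0.6),
    "pseudogene": (1.5, 0.4),
    "three introns": (1.0, 0.5),
    "two flanking regions": (3.1, 1.1),
}


def z_score(difference, std_error):
    """How many standard errors the observed difference lies from zero."""
    return difference / std_error


for region, (diff, se) in DATA.items():
    z = z_score(diff, se)
    verdict = "rates differ" if abs(z) > 1.96 else "consistent with a clock"
    print(f"{region}: z = {z:.2f} ({verdict})")
```

A positive difference means lineage 1 (Old World monkey) has accumulated more change than lineage 2 (human) since they diverged.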


Lineage effects and the molecular clock

Substitution rate varies with underlying neutral mutation rate: $k = \mu$

Three ways for rates to vary between species:

- Differences in generation time
- Differences in metabolic rate
- Differences in efficiency of DNA repair

These are known as lineage effects: neutralists believe that lineage effects alone can account for all variation in the molecular clock

Selectionists believe that genes also show rate variation due to other, selection-driven factors (residue effects)


Generation time and the molecular clock

[Figure: axis labelled "Time"]


Generation time and the molecular clock

At the molecular level, generation time ($g$) can be defined as the time it takes for germ-line DNA to replicate, i.e. from one gamete to the next

Since most mutations occur at this point, rate of substitution under neutral theory is a function of both mutation rate and generation time (with $\mu$ measured per generation):

$k = \mu/g$

General conclusion from molecular data is that the clock is generation time dependent at silent sites and in non-coding DNA:

- Silent rates in orang-utan, gorilla and chimp are 1.3-, 2.2- and 1.2-fold faster than in humans, which matches differences in generation times
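The effect of generation time on the yearly rate follows directly from $k = \mu/g$. The Python sketch below is a minimal illustration: the function name, the per-generation mutation rate and the two generation times are hypothetical values, not data from the lecture.

```python
def substitution_rate_per_year(mu_per_generation, generation_time_years):
    """k = mu / g: substitutions per site per year."""
    return mu_per_generation / generation_time_years


mu = 2e-8  # hypothetical mutation rate per site per generation
for label, g in (("short generation", 0.5), ("long generation", 20.0)):
    print(label, substitution_rate_per_year(mu, g))
```

With the same per-generation mutation rate, the lineage with the shorter generation time accumulates substitutions faster per year, which is the pattern reported for silent sites.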


The metabolic rate hypothesis

In sharks, rate of silent change is five- to sevenfold lower than in primates and ungulates, which have similar generation times:

- Led to the hypothesis that differences in metabolic rate are a better explanation for differences in mutation rates than differences in generation time (metabolic rate hypothesis)
- States that organisms with high metabolic rates have higher levels of DNA synthesis

Two pieces of mitochondrial DNA evidence support this:

- Small-bodied animals, which have higher metabolic rates, tend to have higher mutation rates
- Warm-blooded animals also have higher mutation rates than cold-blooded animals


Relationship between body mass and sequence evolution

[Figure: log-log plot of % sequence divergence per Myr (0.1 to 10) against body mass in kg (0.01 to 100,000); labelled groups: rodents, geese, dogs, primates, horses, bears, whales, newts, frogs, tortoises, salmon, sea turtles, sharks]


DNA repair and mutation

[Figure: flow diagram with nodes DNA, Direct damage, Replication errors, Repair, Correctly repaired, Incorrectly repaired, Mutation]


DNA repair and mutation

Repair mechanisms are extremely complex and there are many repair pathways

There is some evidence supporting the hypothesis that DNA repair influences mutation rate:

- Evidence that highly transcribed genes are more efficiently repaired
- Base composition and substitution rates at silent sites in mammalian genes tend to be gene- rather than species-specific: suggests that homologous genes are transcribed and repaired in a similar manner

Conversely, closely related species such as hominoid primates, which share very similar repair mechanisms, can exhibit greatly differing substitution rates