1
Patterns of Substitution and Replacement
2
          To A     To T     To C     To G
From A    -        A to T   A to C   A to G
From T    T to A   -        T to C   T to G
From C    C to A   C to T   -        C to G
From G    G to A   G to T   G to C   -
3
4
5
6
Pattern of Substitution* in Pseudogenes

                To A           To T           To C           To G           Row totals
From A          -              3.4 ± 0.7      4.5 ± 0.8      12.5 ± 1.1     20.3
                               (3.6 ± 0.7)    (4.8 ± 0.9)    (13.3 ± 1.1)   (21.6)
From T          3.3 ± 0.6      -              13.8 ± 1.9     3.3 ± 0.6      20.4
                (3.5 ± 0.6)                   (14.7 ± 2.0)   (3.5 ± 0.6)    (21.7)
From C          4.2 ± 0.5      20.7 ± 1.3     -              4.6 ± 0.6      29.5
                (4.2 ± 0.5)    (16.4 ± 1.3)                  (4.4 ± 0.6)    (25.1)
From G          20.4 ± 1.4     4.4 ± 0.6      4.9 ± 0.7      -              29.7
                (21.9 ± 1.5)   (4.6 ± 0.6)    (5.2 ± 0.8)                   (31.6)
Column totals   27.9           28.5           23.2           20.5
                (29.5)         (24.6)         (23.2)         (21.3)

*Based on a sample of 105 mammalian retropseudogenes.
7
The sum of the relative frequencies of transitions in pseudogenes is ~68%.
If all mutations occurred with equal frequencies, the expectation would be 33%.
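The two percentages can be checked against the pseudogene table. The sketch below is illustrative only: the names `PSEUDOGENE`, `is_transition`, and `transition_share` are introduced here, and the values are the unparenthesized table entries. With those entries the observed share comes out at roughly two thirds, close to the figure quoted on the slide; the small gap presumably reflects rounding in the table.

```python
# Relative substitution frequencies (%) from the pseudogene table
# (unparenthesized values). Keys are (from, to) pairs.
PSEUDOGENE = {
    ("A", "T"): 3.4, ("A", "C"): 4.5, ("A", "G"): 12.5,
    ("T", "A"): 3.3, ("T", "C"): 13.8, ("T", "G"): 3.3,
    ("C", "A"): 4.2, ("C", "T"): 20.7, ("C", "G"): 4.6,
    ("G", "A"): 20.4, ("G", "T"): 4.4, ("G", "C"): 4.9,
}

PURINES = {"A", "G"}


def is_transition(src, dst):
    """A transition stays within purines (A, G) or within pyrimidines (C, T)."""
    return (src in PURINES) == (dst in PURINES)


def transition_share(matrix):
    """Percentage of all substitutions in `matrix` that are transitions."""
    total = sum(matrix.values())
    ts = sum(v for (s, d), v in matrix.items() if is_transition(s, d))
    return 100.0 * ts / total


# Expectation if all 12 substitution types were equally frequent:
# 4 of the 12 types are transitions.
n_transition_types = sum(is_transition(s, d) for s, d in PSEUDOGENE)
expected = 100.0 * n_transition_types / len(PSEUDOGENE)

observed = transition_share(PSEUDOGENE)
print(f"observed transitions: {observed:.1f}%")
print(f"expected if uniform:  {expected:.1f}%")
```

The same function can be applied to any other 12-entry substitution table laid out the same way.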
8
In the absence of selection, DNA will tend to become AT-rich.
Compared with the 50% expectation, 59.2% of all substitutions are from G and C, and 56.4% of all substitutions are to A and T.
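Both percentages follow from the row and column totals of the pseudogene table. A minimal sketch, assuming the unparenthesized totals; the dictionary names are introduced here for illustration:

```python
# Row and column totals (%) from the pseudogene table (unparenthesized values).
ROW_TOTALS = {"A": 20.3, "T": 20.4, "C": 29.5, "G": 29.7}  # substitutions FROM each base
COL_TOTALS = {"A": 27.9, "T": 28.5, "C": 23.2, "G": 20.5}  # substitutions TO each base

from_gc = ROW_TOTALS["G"] + ROW_TOTALS["C"]  # substitutions that remove a G or C
to_at = COL_TOTALS["A"] + COL_TOTALS["T"]    # substitutions that create an A or T
to_gc = COL_TOTALS["G"] + COL_TOTALS["C"]    # substitutions that create a G or C

print(f"from G and C: {from_gc:.1f}%")
print(f"to A and T:   {to_at:.1f}%")
print(f"to G and C:   {to_gc:.1f}%")
```

Substitutions away from G and C outnumber those that create G and C, which is the imbalance behind the AT-enrichment statement.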
9
Values in parentheses in the pseudogene table were obtained with CG dinucleotides excluded.
10
11
12
13
Pattern of Substitution* in mtDNA

                To A    To T    To C    To G    Row totals
From A          -       0.4     1.1     14.1    15.6
From T          0.3     -       33.8    0.3     34.4
From C          1.1     25.8    -       0.5     27.4
From G          20.0    1.1     1.6     -       22.7
Column totals   21.4    27.3    36.5    14.9

*Based on 95 sequences from human and chimpanzee.
14
The sum of the relative frequencies of transitions in mtDNA is ~94%.
If all mutations occurred with equal frequencies, the expectation would be 33%.
15
Mutations: Strand (Leading and Lagging) Effects
16
Possible inequalities between strands
A change from G to A actually means that a G:C pair is replaced by an A:T pair.
This can occur as a result of either a G mutating to A in one strand or a C mutating to T in the complementary strand.
Similarly, a change from C to T can occur as a result of either a C mutating to T in one strand or a G mutating to A in the other.
17
Detection of Strand Inequalities in Mutation Rates
• If G→A on the leading strand, then C→T on the lagging strand
• If G→A on the lagging strand, then C→T on the leading strand
• If G→A on leading = G→A on lagging, then G→A = C→T
19
If there are no differences in the mutation pattern between the two strands, then the frequency of each substitution equals that of its complement (e.g., G→A = C→T).
20
In mtDNA, the transitional rate between pyrimidines (C, T) is much higher than that between purines (G, A), suggesting different patterns and rates of mutation between the two strands.
Is G→A = C→T?
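The question can be made concrete by pairing every substitution in the mtDNA table with the one that would be observed if the same mutation had occurred on the complementary strand. The sketch below is illustrative only: `MTDNA`, `mirror`, and `strand_pairs` are names introduced here, and a raw difference between two table entries is not a statistical test.

```python
# Relative substitution frequencies (%) from the mtDNA table.
MTDNA = {
    ("A", "T"): 0.4, ("A", "C"): 1.1, ("A", "G"): 14.1,
    ("T", "A"): 0.3, ("T", "C"): 33.8, ("T", "G"): 0.3,
    ("C", "A"): 1.1, ("C", "T"): 25.8, ("C", "G"): 0.5,
    ("G", "A"): 20.0, ("G", "T"): 1.1, ("G", "C"): 1.6,
}

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mirror(src, dst):
    """The same change as read on the complementary strand (G->A becomes C->T)."""
    return COMPLEMENT[src], COMPLEMENT[dst]


def strand_pairs(matrix):
    """Yield each substitution with its complementary counterpart, once per pair."""
    seen = set()
    for sub in matrix:
        twin = mirror(*sub)
        if sub in seen or twin in seen:
            continue
        seen.update((sub, twin))
        yield sub, twin, matrix[sub], matrix[twin]


# Under strand symmetry the two frequencies in each row would be equal.
for (s, d), (ms, md), f, mf in strand_pairs(MTDNA):
    print(f"{s}->{d}: {f:5.1f}   {ms}->{md}: {mf:5.1f}   difference: {f - mf:+.1f}")
```

In this table the two transition pairs (A→G with T→C, and C→T with G→A) show the largest differences.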
21
Pattern of amino-acid replacement
22
Physicochemical distances = measures for quantifying the dissimilarity between two amino acids.
23
24
25
Grantham’s physicochemical distances between pairs of amino acids
26
The most similar amino acid pairs are leucine and isoleucine (Grantham's distance = 5) and leucine and methionine
(Grantham's distance = 15).
27
The most dissimilar amino acid pairs (Grantham's distances = 215, 205, and 202)
28
A replacement of an amino acid by a similar one (e.g., leucine to isoleucine) is called a conservative replacement.
A replacement of an amino acid by a dissimilar one (e.g., glycine to tryptophan) is called a radical replacement.
29
Empirical findings:
During evolution, amino acids are mostly replaced by similar ones.
30
Similar amino acids
Dissimilar amino acids
A little
A lot
31
Similar Dissimilar
32
Kimura 1985
33
Exchanges between similar structures occur frequently. Exchanges between dissimilar structures occur rarely.
Nothing happens, but if it does, it doesn’t matter.
34
Amino-acid exchangeability
Numbers in parentheses denote codon family for amino acids encoded by two codon families
60-90% of the amino-acid replacements involve the nearest or second nearest neighbors in the ring
Argyle’s exchangeability ring
35
What protein properties are conserved in evolution?
Protein-specific constraints: The evolution of each protein-coding gene is constrained by the specific functional requirements of the protein it produces.
General constraints: Are there general properties that are constrained during evolution in all proteins?
36
degree of conservation (axis: low to high)
bulkiness (volume)
37
degree of conservation (axis: low to high)
hydrophobicity
38
degree of conservation (axis: low to high)
polarity
39
degree of conservation (axis: low to high)
optical rotation
40
degree of conservation (axis: low to high)
charge
optical rotation
surprise!
41
42
43
44
45
46
47
Amino-acid composition may be an important factor in determining rates of nucleotide substitution.
48
Most conserved amino acids:
Glycine is irreplaceable because of its small size.
Lysine is irreplaceable because of its involvement in amidine bonds that crosslink polypeptide chains.
Cysteine is irreplaceable because of its involvement in cystine bonds that crosslink polypeptide chains.
Proline is irreplaceable because of its contribution to the contortion of proteins.
49
Does the frequency of amino acids in proteins reflect “functional need” or “availability”?
50
The frequencies of nucleotides in vertebrate mRNA are 22.0% uracil, 30.3% adenine, 21.7% cytosine, and 26.1% guanine.
51
The expected frequency of a particular codon can be calculated by multiplying the frequencies of each of the nucleotides comprising the codon.
52
The expected frequency of the amino acid can be calculated by adding the frequencies of each codon that codes for that amino acid.
53
For example, the codons for tyrosine are UAU and UAC, so the random expectation for its frequency is:
1.057 × [(0.220)(0.303)(0.220) + (0.220)(0.303)(0.217)] ≈ 0.0308
The factor 1.057 is a correction for the stop codons: since 3 of the 64 codons are stop codons, the expected frequency of each amino acid is multiplied by it.
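The tyrosine calculation generalizes to every amino acid. The sketch below is illustrative only and rests on two assumptions not stated on the slides: that the standard genetic code applies, and that the correction factor reflects the expected share of stop codons under random base composition. All names (`BASE_FREQ`, `CODON_TABLE`, `expected_frequencies`) are introduced here. With the quoted nucleotide frequencies, the tyrosine value comes out at about 0.031, and 1 / (1 − stop share) lands close to the quoted 1.057.

```python
from itertools import product

# Nucleotide frequencies in vertebrate mRNA, as quoted on the slides.
BASE_FREQ = {"U": 0.220, "A": 0.303, "C": 0.217, "G": 0.261}

# Standard genetic code (an assumption here), with codons ordered U, C, A, G
# at each position; "*" marks the three stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SLIDE_CORRECTION = 1.057  # correction factor quoted on the slides


def codon_freq(codon):
    """Expected codon frequency: product of its three nucleotide frequencies."""
    p = 1.0
    for base in codon:
        p *= BASE_FREQ[base]
    return p


def expected_frequencies(correction=SLIDE_CORRECTION):
    """Sum codon frequencies per amino acid, then apply the stop-codon correction."""
    raw = {}
    for codon, aa in CODON_TABLE.items():
        raw[aa] = raw.get(aa, 0.0) + codon_freq(codon)
    stop_share = raw.pop("*")  # expected share of UAA, UAG, and UGA
    return {aa: correction * p for aa, p in raw.items()}, stop_share


freqs, stop_share = expected_frequencies()
print(f"tyrosine (Y): {freqs['Y']:.4f}")
print(f"expected stop-codon share: {stop_share:.4f}")
print(f"1 / (1 - stop share): {1 / (1 - stop_share):.3f}")
```

Amino acids are keyed by one-letter code, so `freqs["R"]` gives the expected frequency of arginine from its six codons.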
54
By plotting the expected frequency against the observed frequency, we can see if some amino acids are occurring more or less often than expected by chance. If the observed and expected frequencies are close to equal, we would expect a regression line with a slope = 1.
55
Excluding arginine, the correlation between observed and expected frequencies was highly significant (r = 0.9). Arginine frequency seems to be affected by selection acting on one or more of its codons.
56
Conclusions (?)
• Amino acid frequencies are not determined by functional requirements.
• Amino acid frequencies are determined by nucleotide composition and the number of codons for each amino acid.