Patterns of Substitution and Replacement

Upload: daniel-hunter

Post on 26-Dec-2015


TRANSCRIPT

Page 1: 1 Patterns of Substitution and Replacement. 2 3



          To A      To T      To C      To G
From A    -         A to T    A to C    A to G
From T    T to A    -         T to C    T to G
From C    C to A    C to T    -         C to G
From G    G to A    G to T    G to C    -


Pattern of Substitution* in Pseudogenes

                 To A            To T            To C            To G            Row totals
From A           -               3.4 ± 0.7       4.5 ± 0.8       12.5 ± 1.1      20.3
                                 (3.6 ± 0.7)     (4.8 ± 0.9)     (13.3 ± 1.1)    (21.6)
From T           3.3 ± 0.6       -               13.8 ± 1.9      3.3 ± 0.6       20.4
                 (3.5 ± 0.6)                     (14.7 ± 2.0)    (3.5 ± 0.6)     (21.7)
From C           4.2 ± 0.5       20.7 ± 1.3      -               4.6 ± 0.6       29.5
                 (4.2 ± 0.5)     (16.4 ± 1.3)                    (4.4 ± 0.6)     (25.1)
From G           20.4 ± 1.4      4.4 ± 0.6       4.9 ± 0.7       -               29.7
                 (21.9 ± 1.5)    (4.6 ± 0.6)     (5.2 ± 0.8)                     (31.6)
Column totals    27.9            28.5            23.2            20.5
                 (29.5)          (24.6)          (23.2)          (21.3)

*Based on a sample of 105 mammalian retropseudogenes.


The sum of the relative frequencies of transitions is ~68%. If all mutations occurred with equal frequency, the expectation would be 33%.
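The transition share can be checked against the pseudogene table. The following is a minimal Python sketch, assuming the main (non-parenthesized) table values; the names `PSEUDOGENE`, `is_transition` and `transition_share` are illustrative and do not come from the slides.

```python
# Relative substitution frequencies (%) in pseudogenes, keyed by (from, to).
# Values are the main entries of the table, without the parenthesized ones.
PSEUDOGENE = {
    ("A", "T"): 3.4, ("A", "C"): 4.5, ("A", "G"): 12.5,
    ("T", "A"): 3.3, ("T", "C"): 13.8, ("T", "G"): 3.3,
    ("C", "A"): 4.2, ("C", "T"): 20.7, ("C", "G"): 4.6,
    ("G", "A"): 20.4, ("G", "T"): 4.4, ("G", "C"): 4.9,
}

PURINES = {"A", "G"}


def is_transition(src, dst):
    """A transition stays within purines (A, G) or within pyrimidines (C, T)."""
    return (src in PURINES) == (dst in PURINES)


def transition_share(matrix):
    """Percentage of all substitutions in `matrix` that are transitions."""
    total = sum(matrix.values())
    transitions = sum(f for (s, d), f in matrix.items() if is_transition(s, d))
    return 100.0 * transitions / total


observed_share = transition_share(PSEUDOGENE)

# Under equal frequencies, 4 of the 12 possible changes are transitions.
expected_share = 100.0 * sum(is_transition(s, d) for s, d in PSEUDOGENE) / len(PSEUDOGENE)
```

With the rounded table entries the observed share comes out near 67.4%, in line with the ~68% quoted above, and the equal-frequency expectation is 4/12, or about 33%.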


In the absence of selection, DNA will tend to become AT-rich.

Compared with the 50% expectation, 59.2% of all substitutions are from G and C, and 56.4% of all substitutions are to A and T.
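Both percentages are sums of row and column totals of the pseudogene table. This is a short Python sketch under the assumption that the main table values are used; `share_from` and `share_to` are hypothetical helper names.

```python
# Relative substitution frequencies (%) in pseudogenes, keyed by (from, to).
PSEUDOGENE = {
    ("A", "T"): 3.4, ("A", "C"): 4.5, ("A", "G"): 12.5,
    ("T", "A"): 3.3, ("T", "C"): 13.8, ("T", "G"): 3.3,
    ("C", "A"): 4.2, ("C", "T"): 20.7, ("C", "G"): 4.6,
    ("G", "A"): 20.4, ("G", "T"): 4.4, ("G", "C"): 4.9,
}


def share_from(matrix, bases):
    """Percentage of substitutions whose original base is in `bases`."""
    total = sum(matrix.values())
    return 100.0 * sum(f for (s, _), f in matrix.items() if s in bases) / total


def share_to(matrix, bases):
    """Percentage of substitutions whose new base is in `bases`."""
    total = sum(matrix.values())
    return 100.0 * sum(f for (_, d), f in matrix.items() if d in bases) / total


from_gc = share_from(PSEUDOGENE, {"G", "C"})  # substitutions that remove a G or C
to_at = share_to(PSEUDOGENE, {"A", "T"})      # substitutions that create an A or T
```

Because more substitutions leave G and C than arrive at them, the net flow is toward A and T, which is the sense in which unselected DNA drifts toward AT-richness.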


Values in parentheses: CG dinucleotides excluded.


Pattern of Substitution* in mtDNA

                 To A    To T    To C    To G    Row totals
From A           -       0.4     1.1     14.1    15.6
From T           0.3     -       33.8    0.3     34.4
From C           1.1     25.8    -       0.5     27.4
From G           20.0    1.1     1.6     -       22.7
Column totals    21.4    27.3    36.5    14.9

*Based on 95 sequences from human and chimpanzee.


In mtDNA, the sum of the relative frequencies of transitions is ~94%. If all mutations occurred with equal frequency, the expectation would be 33%.


Mutations: Strand (Leading and Lagging) Effects


Possible inequalities between strands

A change from G to A actually means that a G:C pair is replaced by an A:T pair.

This can occur as a result of either a G mutating to A in one strand or a C mutating to T in the complementary strand.

Similarly, a change from C to T can occur as a result of either a C mutating to T in one strand or a G mutating to A in the other.


Detection of Strand Inequalities in Mutation Rates

• If G → A on the leading strand, then C → T on the lagging strand

• If G → A on the lagging strand, then C → T on the leading strand

• If G → A on leading = G → A on lagging, then G → A = C → T


If there are no differences in the mutation pattern between the two strands, then each change is expected to be as frequent as its complementary change (for example, G → A = C → T).


In mtDNA, the transition rate between pyrimidines (C, T) is much higher than that between purines (G, A), suggesting different patterns and rates of mutation between the two strands.

Is G → A = C → T?
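The question can be put to the mtDNA table by pairing every change with its complementary change. This is a minimal Python sketch of that comparison, with the matrix copied from the mtDNA table; `COMPLEMENT` and `strand_pairs` are illustrative names, and the sketch makes no claim about statistical significance.

```python
# Relative substitution frequencies (%) in mtDNA, keyed by (from, to).
MTDNA = {
    ("A", "T"): 0.4, ("A", "C"): 1.1, ("A", "G"): 14.1,
    ("T", "A"): 0.3, ("T", "C"): 33.8, ("T", "G"): 0.3,
    ("C", "A"): 1.1, ("C", "T"): 25.8, ("C", "G"): 0.5,
    ("G", "A"): 20.0, ("G", "T"): 1.1, ("G", "C"): 1.6,
}

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def strand_pairs(matrix):
    """Pair each change X->Y with the complementary change comp(X)->comp(Y).

    Returns a list of (change, frequency, complementary change, frequency),
    listing every complementary pair once.
    """
    pairs = []
    seen = set()
    for (src, dst), freq in matrix.items():
        if (src, dst) in seen:
            continue
        comp = (COMPLEMENT[src], COMPLEMENT[dst])
        seen.update({(src, dst), comp})
        pairs.append(((src, dst), freq, comp, matrix[comp]))
    return pairs


pairs = strand_pairs(MTDNA)
for change, freq, comp, comp_freq in pairs:
    print(f"{change[0]}->{change[1]}: {freq:5.1f}   {comp[0]}->{comp[1]}: {comp_freq:5.1f}")
```

Under strand symmetry the two frequencies in each pair would be roughly equal. In this table C → T (25.8) exceeds G → A (20.0), and T → C (33.8) exceeds A → G (14.1).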

Pattern of amino-acid replacement


Physicochemical distances = measures for quantifying the dissimilarity between two amino acids.


Grantham’s physicochemical distances between pairs of amino acids


The most similar amino acid pairs are leucine and isoleucine (Grantham's distance = 5) and leucine and methionine (Grantham's distance = 15).

The most dissimilar amino acid pairs (Grantham's distances of 215, 205, and 202)


A replacement of an amino acid by a similar one (e.g., leucine to isoleucine) is called a conservative replacement.

A replacement of an amino acid by a dissimilar one (e.g., glycine to tryptophan) is called a radical replacement.
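The conservative/radical distinction can be sketched as a threshold on Grantham's distance. The Python sketch below rests on two assumptions that are not on the slides: the cutoff of 100 is a hypothetical choice, and the glycine/tryptophan distance of 184 is the value commonly cited from Grantham's matrix and should be verified against it. The other two distances are the ones quoted earlier.

```python
# Grantham's distances for a few amino-acid pairs (unordered, hence frozenset).
GRANTHAM = {
    frozenset({"Leu", "Ile"}): 5,    # quoted on the slides
    frozenset({"Leu", "Met"}): 15,   # quoted on the slides
    frozenset({"Gly", "Trp"}): 184,  # assumption: not on the slides, verify before use
}

# Hypothetical cutoff; the slides do not define where "radical" begins.
RADICAL_CUTOFF = 100


def classify(aa1, aa2, cutoff=RADICAL_CUTOFF):
    """Label a replacement as conservative or radical by Grantham's distance."""
    distance = GRANTHAM[frozenset({aa1, aa2})]
    return "conservative" if distance <= cutoff else "radical"


labels = {
    "Leu/Ile": classify("Leu", "Ile"),
    "Gly/Trp": classify("Gly", "Trp"),
}
```

A single cutoff is the simplest possible rule; finer gradations between conservative and radical would need more than one threshold.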


Empirical findings:

During evolution, amino acids are mostly replaced by similar ones.

[Figure: similar amino acids vs. dissimilar amino acids; labels "A little" and "A lot"]


Kimura 1985


Exchanges between similar structures occur frequently. Exchanges between dissimilar structures occur rarely.

Nothing happens, but if it does, it doesn’t matter.


Amino-acid exchangeability

Numbers in parentheses denote codon family for amino acids encoded by two codon families

60-90% of the amino-acid replacements involve the nearest or second nearest neighbors in the ring

Argyle’s exchangeability ring


What protein properties are conserved in evolution?

Protein-specific constraints: The evolution of each protein-coding gene is constrained by the specific functional requirements of the protein it produces.

General constraints: Are there general properties that are constrained during evolution in all proteins?

[Figure: degree of conservation of amino-acid properties, on a scale from low to high. Properties added in successive slides: bulkiness (volume), hydrophobicity, polarity, optical rotation, and charge (surprise!)]


Amino-acid composition may be an important factor in determining rates of nucleotide substitution.


Most conserved amino acids:

Glycine is irreplaceable because of its small size.

Lysine is irreplaceable because of its involvement in amidine bonds that crosslink polypeptide chains.

Cysteine is irreplaceable because of its involvement in cystine bonds that crosslink polypeptide chains.

Proline is irreplaceable because of its contribution to the contortion of proteins.


Does the frequency of amino acids in proteins reflect “functional need” or “availability”?


The frequencies of nucleotides in vertebrate mRNA are 22.0% uracil, 30.3% adenine, 21.7% cytosine, and 26.1% guanine.


The expected frequency of a particular codon can be calculated by multiplying the frequencies of each of the nucleotides comprising the codon.


The expected frequency of the amino acid can be calculated by adding the frequencies of each codon that codes for that amino acid.


For example, the codons for tyrosine are UAU and UAC, so the random expectation for its frequency is:

1.057[(0.220)(0.303)(0.220) + (0.220)(0.303)(0.217)] = 0.0309

Since 3 of the 64 codons are stop codons, this frequency for each amino acid is multiplied by a correction factor of 1.057.
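The tyrosine calculation can be reproduced in a few lines. This is a minimal Python sketch using the vertebrate mRNA nucleotide frequencies and the 1.057 correction factor given above; the function names `codon_freq` and `expected_aa_freq` are illustrative, not part of the original material.

```python
from math import prod

# Nucleotide frequencies in vertebrate mRNA, as quoted on the slides.
BASE_FREQ = {"U": 0.220, "A": 0.303, "C": 0.217, "G": 0.261}

# Correction factor for the three stop codons, as quoted on the slides.
STOP_CORRECTION = 1.057


def codon_freq(codon):
    """Expected codon frequency: product of its three nucleotide frequencies."""
    return prod(BASE_FREQ[base] for base in codon)


def expected_aa_freq(codons, correction=STOP_CORRECTION):
    """Expected amino-acid frequency: corrected sum over its codons."""
    return correction * sum(codon_freq(c) for c in codons)


tyrosine = expected_aa_freq(["UAU", "UAC"])

# Combined expected frequency of the three stop codons.
stop_total = sum(codon_freq(c) for c in ("UAA", "UAG", "UGA"))
```

With these rounded inputs the sketch gives about 0.0308 for tyrosine, marginally below the 0.0309 shown above, presumably because of rounding in the quoted frequencies. The stop codons sum to about 0.055, and 1/(1 − 0.055) ≈ 1.058 is close to the 1.057 factor; reading the factor as this renormalization is an interpretation, not something the slides state.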


By plotting the expected frequency against the observed frequency, we can see if some amino acids are occurring more or less often than expected by chance. If the observed and expected frequencies are close to equal, we would expect a regression line with a slope = 1.


Excluding arginine, the correlation between observed and expected frequencies was highly significant (r = 0.9). Arginine frequency seems to be affected by selection acting on one or more of its codons.


Conclusions (?)

• Amino acid frequencies are not determined by functional requirements.

• Amino acid frequencies are determined by nucleotide composition and the number of codons for each amino acid.