
Posted on 26-Dec-2015


TRANSCRIPT

1

Patterns of Substitution and Replacement

2

          To A      To T      To C      To G
From A    -         A to T    A to C    A to G
From T    T to A    -         T to C    T to G
From C    C to A    C to T    -         C to G
From G    G to A    G to T    G to C    -

6

Pattern of Substitution* in Pseudogenes (relative frequencies, %)

              To A                     To T                     To C                     To G                     Row totals
From A        -                        3.4 ± 0.7 (3.6 ± 0.7)    4.5 ± 0.8 (4.8 ± 0.9)    12.5 ± 1.1 (13.3 ± 1.1)  20.3 (21.6)
From T        3.3 ± 0.6 (3.5 ± 0.6)    -                        13.8 ± 1.9 (14.7 ± 2.0)  3.3 ± 0.6 (3.5 ± 0.6)    20.4 (21.7)
From C        4.2 ± 0.5 (4.2 ± 0.5)    20.7 ± 1.3 (16.4 ± 1.3)  -                        4.6 ± 0.6 (4.4 ± 0.6)    29.5 (25.1)
From G        20.4 ± 1.4 (21.9 ± 1.5)  4.4 ± 0.6 (4.6 ± 0.6)    4.9 ± 0.7 (5.2 ± 0.8)    -                        29.7 (31.6)
Column totals 27.9 (29.5)              28.5 (24.6)              23.2 (23.2)              20.5 (21.3)

*Based on a sample of 105 mammalian retropseudogenes.

7

The sum of the relative frequencies of transitions is ~68%. If all mutations occurred with equal frequencies, the expectation would be 33%, because 4 of the 12 possible substitution types are transitions.
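As a quick check of this figure, the sketch below enters the pseudogene table as a Python dictionary (using the values outside the parentheses) and sums the four transition cells (A → G, G → A, C → T, T → C). The names `PSEUDOGENE`, `is_transition`, and `transition_share` are illustrative, not from the source. Applying the same function to equal weights reproduces the 33% expectation.

```python
# Relative substitution frequencies (%) in mammalian retropseudogenes,
# keyed by (from, to); values outside the parentheses in the table.
PSEUDOGENE = {
    ("A", "T"): 3.4, ("A", "C"): 4.5, ("A", "G"): 12.5,
    ("T", "A"): 3.3, ("T", "C"): 13.8, ("T", "G"): 3.3,
    ("C", "A"): 4.2, ("C", "T"): 20.7, ("C", "G"): 4.6,
    ("G", "A"): 20.4, ("G", "T"): 4.4, ("G", "C"): 4.9,
}

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(src: str, dst: str) -> bool:
    """True for purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes."""
    pair = {src, dst}
    return pair <= PURINES or pair <= PYRIMIDINES


def transition_share(freqs: dict) -> float:
    """Percentage of all substitutions (table cells) that are transitions."""
    total = sum(freqs.values())
    transitions = sum(f for (s, d), f in freqs.items() if is_transition(s, d))
    return 100.0 * transitions / total


# Equal rates: all 12 substitution types get the same weight; 4 are transitions.
equal = {pair: 1.0 for pair in PSEUDOGENE}

print(f"observed: {transition_share(PSEUDOGENE):.1f}%")  # observed: 67.4%
print(f"expected: {transition_share(equal):.1f}%")       # expected: 33.3%
```

With the rounded tabulated values the observed share comes out at about 67%, in line with the slide's approximate figure.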

8

In the absence of selection, DNA will tend to become AT-rich.

In comparison to the 50% expectation, 59.2% of all substitutions are from G and C, and 56.4% of all substitutions are to A and T.
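Both percentages are sums of table margins: the row totals for C and G, and the column totals for A and T. The sketch below recomputes them from the individual cells, again assuming the non-parenthesized pseudogene values. It also adds a tally that is not on the slide, comparing substitutions that replace a G:C pair with an A:T pair against the reverse, which is one way to see the bias toward AT. The helper name `percent` is hypothetical.

```python
# Retropseudogene substitution frequencies (%), keyed by (from, to).
PSEUDOGENE = {
    ("A", "T"): 3.4, ("A", "C"): 4.5, ("A", "G"): 12.5,
    ("T", "A"): 3.3, ("T", "C"): 13.8, ("T", "G"): 3.3,
    ("C", "A"): 4.2, ("C", "T"): 20.7, ("C", "G"): 4.6,
    ("G", "A"): 20.4, ("G", "T"): 4.4, ("G", "C"): 4.9,
}


def percent(freqs: dict, keep) -> float:
    """Share (%) of the table total contributed by cells where keep(src, dst) is true."""
    total = sum(freqs.values())
    return 100.0 * sum(f for (s, d), f in freqs.items() if keep(s, d)) / total


from_gc = percent(PSEUDOGENE, lambda s, d: s in "GC")  # rows "From C" + "From G"
to_at = percent(PSEUDOGENE, lambda s, d: d in "AT")    # columns "To A" + "To T"

# Extra tally (not on the slide): G:C -> A:T substitutions versus A:T -> G:C.
gc_to_at = sum(f for (s, d), f in PSEUDOGENE.items() if s in "GC" and d in "AT")
at_to_gc = sum(f for (s, d), f in PSEUDOGENE.items() if s in "AT" and d in "GC")

print(f"from G or C: {from_gc:.1f}%")  # from G or C: 59.2%
print(f"to A or T:   {to_at:.1f}%")    # to A or T:   56.4%
print(f"G:C -> A:T {gc_to_at:.1f} vs A:T -> G:C {at_to_gc:.1f}")
```

The G:C → A:T cells add up to about 49.7%, against about 34.1% for A:T → G:C.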

9

(Values in parentheses: CG dinucleotides excluded.)

13

Pattern of Substitution* in mtDNA (relative frequencies, %)

              To A    To T    To C    To G    Row totals
From A        -       0.4     1.1     14.1    15.6
From T        0.3     -       33.8    0.3     34.4
From C        1.1     25.8    -       0.5     27.4
From G        20.0    1.1     1.6     -       22.7
Column totals 21.4    27.3    36.5    14.9

*Based on 95 sequences from human and chimpanzee.

14

The sum of the relative frequencies of transitions is ~94%. If all mutations occurred with equal frequencies, the expectation would be 33%.
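The mtDNA figure can be checked the same way. This sketch assumes the mtDNA table entered as a 4 × 4 list of lists in A, T, C, G order with `None` on the diagonal, a layout chosen here for illustration; `transition_percent` is a hypothetical helper that sums the transition cells and divides by the table total.

```python
BASES = "ATCG"  # row/column order used in the mtDNA table

# mtDNA substitution frequencies (%): rows = from, columns = to; None = diagonal.
MTDNA = [
    [None, 0.4, 1.1, 14.1],
    [0.3, None, 33.8, 0.3],
    [1.1, 25.8, None, 0.5],
    [20.0, 1.1, 1.6, None],
]
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def transition_percent(matrix) -> float:
    """Transitions as a percentage of the sum of all off-diagonal cells."""
    total = transitions = 0.0
    for i, src in enumerate(BASES):
        for j, dst in enumerate(BASES):
            value = matrix[i][j]
            if value is None:
                continue  # no substitution from a base to itself
            total += value
            if (src, dst) in TRANSITIONS:
                transitions += value
    return 100.0 * transitions / total


print(f"transitions: {transition_percent(MTDNA):.1f}%")  # transitions: 93.6%
```

The cells sum to 100.1 because of rounding, so the computed share is about 93.6%, which rounds to the ~94% on the slide.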

15

Mutations: Strand (Leading and Lagging) Effects

16

Possible inequalities between strands

A change from G to A actually means that a G:C pair is replaced by an A:T pair.

This can occur as a result of either a G mutating to A in one strand or a C to T mutation in the complementary strand.

Similarly, a change from C to T can occur as a result of either a C mutating to T in one strand or a G mutating to A in the other.

17

Detection of Strand Inequalities in Mutation Rates

• If G → A on the leading strand, then C → T on the lagging strand

• If G → A on the lagging strand, then C → T on the leading strand

• If G → A on the leading strand = G → A on the lagging strand, then G → A = C → T

19

If there are no differences in the mutation pattern between the two strands, then

20

The transition rate between pyrimidines (C, T) is much higher than that between purines (G, A), suggesting different patterns and rates of mutation between the two strands.

Is G → A = C → T?
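Under the no-asymmetry hypothesis, every substitution X → Y should occur as often as its complementary-strand counterpart, by the same logic that pairs G → A with C → T (for example, A → G should match T → C). The sketch below pairs the mtDNA cells this way and prints each pair side by side; the function name `complementary_pairs` is illustrative. It compares point estimates only: a formal test would need the underlying counts, which the table does not give.

```python
# mtDNA substitution frequencies (%), keyed by (from, to).
MTDNA = {
    ("A", "T"): 0.4, ("A", "C"): 1.1, ("A", "G"): 14.1,
    ("T", "A"): 0.3, ("T", "C"): 33.8, ("T", "G"): 0.3,
    ("C", "A"): 1.1, ("C", "T"): 25.8, ("C", "G"): 0.5,
    ("G", "A"): 20.0, ("G", "T"): 1.1, ("G", "C"): 1.6,
}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_pairs(freqs: dict):
    """Yield (change, freq, complementary change, freq) once per pair.

    A change X -> Y on one strand appears as comp(X) -> comp(Y) on the other,
    so with no strand asymmetry the two frequencies should match.
    """
    seen = set()
    for (src, dst), f in freqs.items():
        partner = (COMPLEMENT[src], COMPLEMENT[dst])
        if partner in seen:
            continue  # already reported from the partner's side
        seen.add((src, dst))
        yield (src, dst), f, partner, freqs[partner]


for (s, d), f, (ps, pd), pf in complementary_pairs(MTDNA):
    print(f"{s}->{d} {f:5.1f}   vs   {ps}->{pd} {pf:5.1f}")

pyrimidine_ts = MTDNA[("C", "T")] + MTDNA[("T", "C")]  # C<->T transitions
purine_ts = MTDNA[("A", "G")] + MTDNA[("G", "A")]      # A<->G transitions
print(f"pyrimidine transitions {pyrimidine_ts:.1f}, purine transitions {purine_ts:.1f}")
```

For the question on the slide, the relevant row of the output is C → T (25.8) against G → A (20.0).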

21

Pattern of amino-acid replacement

22

Physicochemical distances = measures for quantifying the dissimilarity between two amino acids.

25

Grantham’s physicochemical distances between pairs of amino acids

26

The most similar amino acid pairs are leucine and isoleucine (Grantham's distance = 5) and leucine and methionine (Grantham's distance = 15).

27

The most dissimilar amino acid pairs

Grantham's distances: 215, 205, 202

28

A replacement of an amino acid by a similar one (e.g., leucine to isoleucine) is called a conservative replacement.

A replacement of an amino acid by a dissimilar one (e.g., glycine to tryptophan) is called a radical replacement.

29

Empirical findings:

During evolution, amino acids are mostly replaced by similar ones.

30

Similar amino acids

Dissimilar amino acids

A little

A lot

31

Similar    Dissimilar

32

Kimura 1985

33

Exchanges between similar structures occur frequently. Exchanges between dissimilar structures occur rarely.

Nothing happens, but if it does, it doesn’t matter.

34

Amino-acid exchangeability

Numbers in parentheses denote the codon family for amino acids encoded by two codon families.

60-90% of the amino-acid replacements involve the nearest or second-nearest neighbors in the ring.

Argyle’s exchangeability ring

35

What protein properties are conserved in evolution?

Protein-specific constraints: The evolution of each protein-coding gene is constrained by the specific functional requirements of the protein it produces.

General constraints: Are there general properties that are constrained during evolution in all proteins?

36

Degree of conservation: low → high

Bulkiness (volume)

37

Degree of conservation: low → high

Hydrophobicity

38

Degree of conservation: low → high

Polarity

39

Degree of conservation: low → high

Optical rotation

40

Degree of conservation: low → high

Charge; optical rotation

Surprise!

47

Amino-acid composition may be an important factor in determining rates of nucleotide substitution.

48

Most conserved amino acids:

Glycine is irreplaceable because of its small size.

Lysine is irreplaceable because of its involvement in amidine bonds that crosslink polypeptide chains.

Cysteine is irreplaceable because of its involvement in cystine bonds that crosslink polypeptide chains.

Proline is irreplaceable because of its contribution to the contortion of proteins.

49

Does the frequency of amino acids in proteins reflect “functional need” or “availability”?

50

The frequencies of nucleotides in vertebrate mRNA are 22.0% uracil, 30.3% adenine, 21.7% cytosine, and 26.1% guanine.

51

The expected frequency of a particular codon can be calculated by multiplying the frequencies of each of the nucleotides comprising the codon.

52

The expected frequency of the amino acid can be calculated by adding the frequencies of each codon that codes for that amino acid.

53

For example, the codons for tyrosine are UAU and UAC, so the random expectation for its frequency is:

1.057[(0.220)(0.303)(0.220) + (0.220)(0.303)(0.217)] = 0.0308

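Because 3 of the 64 codons (UAA, UAG, and UGA) are stop codons, the expected frequencies of the 61 sense codons sum to less than 1. The frequency of each amino acid is therefore multiplied by a correction factor of 1.057, approximately 1/(1 − expected frequency of stop codons).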
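The whole procedure (codon frequency as a product of nucleotide frequencies, amino-acid frequency as a sum over its codons, then a stop-codon correction) can be sketched as follows. It assumes the standard genetic code, written as the usual 64-character string in UCAG order, and the vertebrate mRNA frequencies given above. Treating the correction as 1/(1 − expected stop-codon frequency) is an assumption made here; with these nucleotide frequencies it gives about 1.058, close to the quoted 1.057, so the example passes the slide's factor explicitly. All function names are hypothetical.

```python
from itertools import product

# Nucleotide frequencies in vertebrate mRNA, as quoted above.
NUC_FREQ = {"U": 0.220, "A": 0.303, "C": 0.217, "G": 0.261}

# Standard genetic code; codons enumerated in UCAG order, "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_frequency(codon: str) -> float:
    """Expected codon frequency: product of its three nucleotide frequencies."""
    freq = 1.0
    for base in codon:
        freq *= NUC_FREQ[base]
    return freq


def stop_correction() -> float:
    """Assumed form of the correction: 1 / (1 - expected stop-codon frequency)."""
    stop = sum(codon_frequency(c) for c, aa in GENETIC_CODE.items() if aa == "*")
    return 1.0 / (1.0 - stop)


def expected_aa_frequencies(correction: float) -> dict:
    """Sum codon frequencies per amino acid, then rescale by the correction factor."""
    totals = {}
    for codon, aa in GENETIC_CODE.items():
        if aa != "*":
            totals[aa] = totals.get(aa, 0.0) + codon_frequency(codon)
    return {aa: correction * f for aa, f in totals.items()}


freqs = expected_aa_frequencies(correction=1.057)  # factor quoted on the slide
print(f"Tyr: {freqs['Y']:.4f}")                         # Tyr: 0.0308
print(f"computed correction: {stop_correction():.3f}")  # computed correction: 1.058
```

Tyrosine ("Y") picks up only UAU and UAC, so its entry matches the hand calculation on the slide.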

54

By plotting the expected frequency against the observed frequency, we can see whether some amino acids occur more or less often than expected by chance. If the observed and expected frequencies are close to equal, we would expect a regression line with a slope of 1.

55

Excluding arginine, the correlation between observed and expected frequencies was highly significant (r = 0.9). Arginine frequency seems to be affected by selection acting on one or more of its codons.
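The comparison can be sketched as an ordinary least-squares fit of observed on expected frequencies plus a Pearson correlation, with an option to drop amino acids before fitting (as arginine was dropped). The function name and the toy numbers below are illustrative only; they are not the data behind r = 0.9.

```python
import math


def fit_observed_vs_expected(expected: dict, observed: dict, exclude=()):
    """Least-squares fit observed = slope * expected + intercept, plus Pearson r.

    Both dicts map one-letter amino-acid codes to frequencies; codes listed in
    `exclude` (e.g. "R" for arginine) are left out before fitting.
    """
    keys = [k for k in expected if k in observed and k not in exclude]
    x = [expected[k] for k in keys]
    y = [observed[k] for k in keys]
    n = len(keys)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Toy numbers for illustration only (not real amino-acid frequencies):
# every residue matches its expectation except "R", which deviates.
expected = {"A": 0.05, "G": 0.07, "L": 0.08, "R": 0.06}
observed = {"A": 0.05, "G": 0.07, "L": 0.08, "R": 0.03}

print(fit_observed_vs_expected(expected, observed))                 # r well below 1
print(fit_observed_vs_expected(expected, observed, exclude={"R"}))  # slope 1, r 1
```

Returning slope and r together makes both criteria on the slides (slope near 1, high correlation) visible in one call.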

56

Conclusions (?)

• Amino acid frequencies are not determined by functional requirements.

• Amino acid frequencies are determined by nucleotide composition and the number of codons for each amino acid.
