1 number of substitutions between two protein- coding genes dan graur

32
1 Number of substitutions between two protein-coding genes Dan Graur

Upload: toni-benton

Post on 01-Apr-2015

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 1 Number of substitutions between two protein- coding genes Dan Graur

1

Number of substitutions between two

protein-coding genes

Dan Graur Dan Graur

Page 2: 1 Number of substitutions between two protein- coding genes Dan Graur

2

Handout

Page 3: 1 Number of substitutions between two protein- coding genes Dan Graur

3

Computing the number of substitutions between two protein-coding sequences is more complicated, because a distinction should be made between synonymous and nonsynonymous substitutions.

Page 4: 1 Number of substitutions between two protein- coding genes Dan Graur

4

Number of synonymous substitutionsNumber of synonymous sites

Number of nonsynonymous substitutionsNumber of nonsynonymous sites

Page 5: 1 Number of substitutions between two protein- coding genes Dan Graur

5

Page 6: 1 Number of substitutions between two protein- coding genes Dan Graur

6

Aims:

1. Compute two numerators: The numbers of synonymous and nonsynonymous substitutions. 2. Compute two denominators: The numbers of synonymous and nonsynonymous sites.

Page 7: 1 Number of substitutions between two protein- coding genes Dan Graur

7

1. The classification of a site changes with time: For example, the third position of CGG (Arg) is synonymous. However, if the first position changes to T, then the third position of the resulting codon, TGG (Trp), becomes nonsynonymous.

Difficulties with denominator:

T Trp

Nonsynonymous

Page 8: 1 Number of substitutions between two protein- coding genes Dan Graur

8

2. Many sites are neither completely synonymous nor completely nonsynonymous. For example, a transition in the third position of GAT (Asp) will be synonymous, while a transversion to GAG or GAA will alter the amino acid.

Difficulties with denominator:

Page 9: 1 Number of substitutions between two protein- coding genes Dan Graur

9

Difficulties with numerator:

1. The classification of the change depends on the order in which the substitutions had occurred.

Page 10: 1 Number of substitutions between two protein- coding genes Dan Graur

10

Difficulties with numerator:1. When two homologous codons differ from each other by two substitutions or more the order of the substitutions must be known in order to classify substitutions into synonymous and nonsynonymous.

Example: CCC in sequence 1 and CAA in Example: CCC in sequence 1 and CAA in sequence 2.sequence 2.Pathway I:

CCC (Pro) CCA (Pro) CAA (Gln)1 synonymous and 1 nonsynonymous

Pathway II:

CCC (Pro) CAC (His) CAA (Gln)2 nonsynonymous

Page 11: 1 Number of substitutions between two protein- coding genes Dan Graur

11

Difficulties with numerator:

2. Transitions occur with different frequencies than transversions.

3. The type of substitution depends on the mutation. Transitions result more frequently in synonymous substitutions than transversions.

Page 12: 1 Number of substitutions between two protein- coding genes Dan Graur

12

1. Classification of sites. Consider a particular position in a codon. Let i be the number of possible synonymous changes at this site. Then this site is counted as i/3 synonymous and (3 – i)/3 nonsynonymous.

Methods: Miyata & Yasunaga (1980) and Nei & Gojobori(1986)

Page 13: 1 Number of substitutions between two protein- coding genes Dan Graur

13

In TTT (Phe), the first two positions are nonsynonymous, because no synonymous change can occur in them, and the third position is 1/3 synonymous and 2/3 nonsynonymous because one of the three possible changes is synonymous.

Page 14: 1 Number of substitutions between two protein- coding genes Dan Graur

14

2. Count the number of synonymous and nonsynonymous sites in each sequence and compute the averages between the two sequences. The average number of synonymous sites is NS and that of nonsynonymous sites is NA.

Page 15: 1 Number of substitutions between two protein- coding genes Dan Graur

15

3. Classify nucleotide differences into synonymous and nonsynonymous differences.

Page 16: 1 Number of substitutions between two protein- coding genes Dan Graur

16

•For two codons that differ by only one nucleotide, the difference is easily inferred. For example, the difference between the two codons GTC (Val) and GTT (Val) is synonymous, while the difference between the two codons GTC (Val) and GCC (Ala) is nonsynonymous.

Page 17: 1 Number of substitutions between two protein- coding genes Dan Graur

17

Page 18: 1 Number of substitutions between two protein- coding genes Dan Graur

18

• For two codons that differ by two or more nucleotides, the estimation problem is more complicated, because we need to determine the order in which the substitutions occurred.

Page 19: 1 Number of substitutions between two protein- coding genes Dan Graur

19

Pathway (1) requires one synonymous and one nonsynonymous change, whereas pathway (2) requires two nonsynonymous changes.

Page 20: 1 Number of substitutions between two protein- coding genes Dan Graur

20

There are two approaches to deal with multiple substitutions at a codon:

Page 21: 1 Number of substitutions between two protein- coding genes Dan Graur

21

The unweighted method: Average the numbers of the different types of substitutions for all the possible scenarios. For example, if we assume that the two pathways are equally likely, then the number of nonsynonymous differences is (1 + 2)/2 = 1.5, and the number of synonymous differences is (1 + 0)/2 = 0.5.

Page 22: 1 Number of substitutions between two protein- coding genes Dan Graur

22

The weighted method. Employ a priori criteria to assign the probability of each pathway. For instance, if the weight of pathway 1 is 0.9, and the weight for pathway 2 is 0.1, then the number of nonsynonymous differences between the two codons is (0.9 1) + (0.1 2) = 1.1, and the number of synonymous differences is 0.9.

Page 23: 1 Number of substitutions between two protein- coding genes Dan Graur

23

Page 24: 1 Number of substitutions between two protein- coding genes Dan Graur

24

4. The numbers of synonymous and nonsynonymous differences between the two protein-coding sequences are MS and MA, respectively.

Page 25: 1 Number of substitutions between two protein- coding genes Dan Graur

25

The number of synonymous differences per synonymous

site is

pS = MS/NS

The number of nonsynonymous differences per

nonsynonymous site is

pA = MA/NA

Page 26: 1 Number of substitutions between two protein- coding genes Dan Graur

26

If we take into account the effect of multiple hits at the same site, we can make corrections by using Jukes and Cantor's formula:

Page 27: 1 Number of substitutions between two protein- coding genes Dan Graur

27

Page 28: 1 Number of substitutions between two protein- coding genes Dan Graur

28

Page 29: 1 Number of substitutions between two protein- coding genes Dan Graur

29

Number of Amino-Acid Replacements between Two

Proteins• The observed proportion of different amino acids between the two sequences (p) isp = n /L

• n = number of amino acid differences between the two sequences

• L = length of the aligned sequences.

Page 30: 1 Number of substitutions between two protein- coding genes Dan Graur

30

Page 31: 1 Number of substitutions between two protein- coding genes Dan Graur

31

Number of Amino-Acid Replacements between Two

ProteinsThe Poisson model is used to convert p into the number of amino replacements between two sequences (d ):

d = – ln(1 – p)

The variance of d is estimated as

V(d) = p/L (1 – p)

Page 32: 1 Number of substitutions between two protein- coding genes Dan Graur

Saturation