maynard smith - mbe 2002 - re combination in animal mitocondria

Upload: subhartho-bhattacharya

Post on 09-Apr-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/8/2019 MAYNARD SMITH - MBE 2002 - Re Combination in Animal Mitocondria

    1/3

    2330

    Mol. Biol. Evol. 19(12):23302332. 2002

    2002 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

    Letter to the Editor

    Recombination in Animal Mitochondrial DNA

    J. Maynard Smith and N. H. Smith

    Centre for the Study of Evolution, University of Sussex

    Table 1Rana Sequences

    SequenceAccession

    Number Species

    1 Rb7 . . . . .2 Rn8 . . . . .3 Rc6 . . . . .4 Ra1 . . . . .5 Rd5 . . . . .6 Rd4 . . . . .7 Rr2 . . . . .8 Rr3 . . . . .

    AF205088AF205087AF205089AF205094AF205090AF205091AF205093AF205092

    R. plancyiR. nigromaculata

    R. catesbeiana R. amurensis R. dybowskii

    R. dybowskii R. rugosa R. rugosa

    Previous attempts to demonstrate, or refute, the oc-currence of recombination in mitochondria have used

    tests designed to measure relatively frequent recombi-nation between rather similar sequences: for example,the homoplasy test (Maynard Smith and Smith 1998)and the regression of linkage disequilibrium betweenpairs of loci with distance along the chromosome (Awa-dalla, Eyre-Walker, and Maynard Smith 1999). Recently,Ladoukakis and Zouros (2001; subsequently LZ) haveattempted to demonstrate rare recombination events be-tween sequences differing at 10% or more of nucleo-tides. The aim of this note is, first, to evaluate the sta-tistical support for their conclusion and, second, brieflyto discuss its significance.

    LZ analyze three published data sets consisting of1,140-bp sequences of cytochrome b from eight individ-

    uals of Rana and 10 of Apodemus, and 366 bp of cy-tochrome oxidase I from 10 individuals of Gammarus:the Rana sequences come from different species andtheir origin is given in table 1. For each set, they firstexamined a matrix of the variable, informative aminoacids (2023 per set) and identified, by visual inspec-tion, one potential crossover event, involving two par-ents (a, c) and a recombinant (b), and two break pointsidentifying a central recombinant piece, in which b re-sembles c, and two flanking regions in which b resem-bles a (see fig. 1). Thus the proposed event is the trans-fer of a short central region from a donor, c, to a recip-ient, a, yielding sequence b.

    These putative events are not themselves statisti-cally significant. They are best regarded as hypotheses,whose statistical significance can be tested by examiningthe full nucleotide sequences. As a measure of recom-bination, LZ use the postrecombination divergence,prd (see fig. 1): this is a number which would be zeroif the two parents and a recombinant were sequencedbefore any further variation had arisen and which wouldthen increase by mutation. When all trace of a recom-binant has disappeared, the observed value of prd willbe no greater than the value calculated if the sequenceof sites is randomized. LZ performed 1,000 random per-mutations, retaining the frequencies of each nucleotideat each site but randomizing their sequence. They reportthat in no case was the value of prd for the randomsequence as low as that for the actual sequence.

    However, it is not clear from their paper just howthis calculation was performed. In particular, did their

    Key words: recombination, mitochondria, Rana, Apodemus,Gammarus.

    Address for correspondence and reprints: J. Maynard Smith, Cen-tre for the Study of Evolution, University of Sussex, BN1 9QL, UnitedKingdom. E-mail: [email protected].

    permutations involve all nucleotides, including thosecoding for amino acid variation and therefore used in

    formulating the hypothesis they are testing? If these nu-cleotides were included, then the appropriate signifi-cance test would be to randomize a matrix of all thesequences and show that no choice of three sequencesand two break points would yield a value of prd asstatistically significant as that observed; such a testwould be difficult to perform. We have therefore re-peated their permutations, accepting their identificationof the parental and recombinant sequences and proposedbreak points, but using only sites responsible for syn-onymous variation in the data set and not used in for-mulating the hypothesis being tested. The results areshown in table 2. There is convincing evidence of re-combination in Rana and a strong suggestion of recom-bination in Apodemus but no reason for suspecting re-combination in Gammarus.

    In view of this discrepancy, we decided to use amore direct test of recombination, not dependent on vi-sual inspection of the data. The test used is a modifi-cation of the maximum chi-square method (MaynardSmith 1992). We look for crossovers involving only asingle break within the gene (see fig. 2) and use datafrom variable synonymous sites only, to reduce the like-lihood that any observed patterns were caused by selec-tion. For each data set, we compared all pairs of se-quences and for each pair all possible crossover pointsand found the pair of sequences, a and b, and the break

    point that maximized the value of chi-square calculatedas in figure 2; this value is denoted maxch.To decide whether the event so identified is statis-

    tically significant, we generated 1,000 new matrices,maintaining the nucleotide frequency of each site butrandomizing their allocation to individuals: thus, we re-tained the observed polymorphism in the data set buteliminated any linkage disequilibrium. For each matrixwe calculated maxch as above. The results are given intable 3. Statistical significance depends on whether theobserved value of maxch is greater than the simulatedvalues.

  • 8/8/2019 MAYNARD SMITH - MBE 2002 - Re Combination in Animal Mitocondria

    2/3

    Letter to the Editor 2331

    FIG. 1.Two parental sequences, a and c, and recombinant, b,formed by the insertion of a central region from c into a. Numbers ofnucleotide differences are denoted by x and y. The postrecombina-tional divergence, prd x y. If the recombinant event is veryrecent, x y 0.

    FIG. 2.A single crossover between sequences a and c, producingb. For a versus b, or b versus c, the strength of evidence for a crossoveris given by 2 (eh gf)2s/mn(e f)(g h). The expected valuesof 2 are obtained by randomizing the sequences, as explained in thetext.

    Table 2Test of the Laoukakis and Zouros Hypothesis

    Gammarus Apodemus Rana

    Number of sequences . . . . . . . . . . . . . . . . . . . . . . . . . . .Polymorphic synonymous sites. . . . . . . . . . . . . . . . . . . .

    1083

    10293

    8260

    Ends of middle piece

    All sites. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Polymorphic synonymous sites. . . . . . . . . . . . . . . . . .

    19288472

    6881105182289

    636863165208

    Parental sequencesa . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Recombinant sequencea . . . . . . . . . . . . . . . . . . . . . . . . . .Trials, out of 1000, with prd observed value. . . . . .

    Gf17, Gf1Gf15201

    Ap5, Ag3As26

    11

    Rr2, Rc6Rr3

    0

    a Nomenclature of Laoukakis and Zouros (2001).

    There is no evidence of recombination in eitherGammarus or Apodemus, but there is strong evidencefor recombination in Rana (P K 0.01). This supportsthe conclusion that emerged from our reanalysis of LZshypothesis, although the break points and recombinantsequences are not the same as those detected using themethod of LZ. The evidence for recombination is illus-trated in figure 3. It is not clear which of sequences a

    (Ra1) and b (Rc6) is the parent and which the re-combinant: there is no sequence in the data set that issimilar to a (or to b) before the break but different fromboth a and b after it, although there are sequences, e.g.,sequence c (Rr2), that are very different from both aand b before and after the break.

    Both the test suggested by LZ and the maximumchi-square test indicate that, in the Rana data, whencomparing a pair of sequences, there are regions of sim-ilarity and regions of difference along the gene. Such apattern suggests recombination. As a final confirmationof this conclusion, we applied a modified version of theruns test suggested by Sawyer (1989) to a matrix ofsynonymous differences for the three data sets. As be-

    fore, we found highly significant evidence of runs inthe Rana data set but no sign of runs in either Gam-marus or Apodemus.

    A pattern of differences of the kind shown in figure3 strongly suggests a recombination event affecting se-quences a and b (Ra1 and Rc6). Such a pattern couldnot be generated by clusters of hypervariable sites, orof constrained sites, unless each sequence was subjectto a unique set of constraints at synonymous sites. Itwould require that in sequences a and b, but not c, thesites after the break have a reduced rate of change. Thisseems implausible, particularly for synonymous sites.

    Thus for Rana, but not the other genera, there isoverwhelming evidence for regions of similarity and dif-ference between sequences for synonymous sites. Thisis difficult to explain except by recombination. However,

    there are real difficulties with recombination as an ex-planation. It is not just that it requires recombinationbetween different species: relatively few recombina-tion events are required to produce the observed pat-terns. The real difficulty is as follows. The maximumchi-square test reveals seven statistically significantcrossovers, each involving two of the sequences num-bered 1 to 5 in table 1 and a similar break point inthe range 165 to 181. An examination of the geneticdistances between the eight sequences, before and aftersite 165, reveals that the five sequences are similar toone another after the break point (six of 10 pairwisecomparisons differ at less than 20 sites) but very dif-ferent before the break (nine of 10 pairwise comparisons

    differ at more than 75 sites). This suggests that, rela-tively recently, a region of DNA roughly from poly-morphic site 165 (site 636) to the end of the availablesequence was introduced into each of the five sequences.This seems to require five separate events (or four if oneof the sequences was the donor of the DNA). This couldperhaps be explained by the spread, by recombinationaffecting several species, of a selectively favored re-gion of DNA. The synonymous sites analyzed, althoughnot themselves selected, could have hitch-hiked with theselectively favored amino acid substitutions. It is rele-vant that there are, in this region, 12 polymorphic amino

  • 8/8/2019 MAYNARD SMITH - MBE 2002 - Re Combination in Animal Mitocondria

    3/3

    2332 Maynard Smith and Smith

    Table 3Maximum Chi-Square Test Using Variable Synonymous Sites Only

    Gammarus Apodemus Rana

    Sequencesa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Break pointb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .maxch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Trials, out of 100, giving greater maxch . . . . . . . .Greatest trial maxch. . . . . . . . . . . . . . . . . . . . . . . . .

    Gf11, Gf1270/28811.23315.6

    As6, Ap590/29312.7

    714.1

    Ra1, Rc6167/63642.1

    019.8

    a Nomenclature of Laoukakis and Zouros (2001).b Position in sequence of variable synonymous sites and of all sites.

    FIG. 3.A potential recombinant in sequences from Rana. Se-quences a and b give a highly significant value of 2, for differencesbefore and after polymorphic site 167 (equivalent to nucleotide 636 inthe full sequence), as judged by the maximum chi-square test. How-

    ever, sequence c (chosen because it gives the largest chi-square valuecompared with either a or b) is not a plausible second parent, whichshould resemble either a or b before the break.

    acids present in all five sequences but that these are rarein the other three. Members of the genus Rana have anumber of characteristics that may facilitate the inter-species spread of a selectively favorable region of mi-tochondrial DNA. These include external fertilization,weak premating isolation and hybrid amphispermy in R.esculenta (Graf and Pelaz 1989). However, both in thelaboratory and the wild, the progeny of interspecies mat-ings are usually inviable (T. Beebee, personalcommunication).

    To summarize, we can find no evidence for recom-bination in Gammarus and only weak evidence in Apo-

    demus, but there is overwhelming evidence for a patternof similarity and difference at synonymous sites inRana. Although there are difficulties with recombinationas an explanation, it is hard to think of any other.

    LITERATURE CITED

    AWADALLA, P., A. EYRE-WALKER, and J. MAYNARD SMITH.1999. Linkage disequilibrium and recombination in homi-nid mitochondria DNA. Science 286:25242525.

    GRAF, J.-D., and M. P. PELAZ. 1989. Evolutionary genetics of

    the Rana esculenta complex. Pp. 289301 inR. M. DAWLEYand J. P. BOGART, eds. Evolution and ecology of unisexualvertebrates. Bulletin 466. New York State Museum, NewYork.

    LADOUKAKIS, E. D., and E. ZOUROS. 2001. Recombination inanimal mitochondrial DNA: evidence from published se-quences. Mol. Biol. Evol. 18:21272131.

    MAYNARD SMITH, J. 1992. Analyzing the mosaic structure ofgenes. J. Mol. Evol. 35:126129.

    MAYNARD SMITH, J., and N. H. SMITH. 1998. Detecting recom-bination from gene trees. Mol. Biol. Evol. 15:590599.

    SAWYER, S. 1989. Statistical tests for detecting gene conver-sion. Mol. Biol. Evol. 6:526538.

    DIETHARD TAUTZ, reviewing editor

    Accepted August 6, 2002