synteny conservation and chromosome rearrangements … · synteny conservation and chromosome...

8
Copyright 6 1997 by the Genetics Society of America Synteny Conservation and Chromosome Rearrangements During Mammalian Evolution Jason Ehrlich, * J David Sankof f and Joseph H. Nadeau * *Jackson LaboratoT, Bar Harbor, Maine 04609 and Centre derecherches mathdmatiques, Universiti de Montrdal, Montrial, Qukbec, H3C 3J7 Canada Manuscript received December 13, 1996 Accepted for publication June 4, 1997 ABSTRACT An important problem in comparative genome analysis has been defining reliable measures of synteny conservation. The published analytical measures of synteny conservation have limitations. Nonindepen- dence of comparisons, conserved and disrupted syntenies that are as yet unidentified, and redundant rearrangements lead to systematic errors that tend to overestimate the degree of conservation. We recently derived methods to estimate the total number of conserved syntenieswithin the genome, counting both those that have already been described and those that remain to be discovered. With this method, we show that -65% of the conserved syntenies have already been identified for humans and mice, that rates of synteny disruption vary -25-fold among mammalian lineages, and that despite strong selection against reciprocal translocations, inter-chromosome rearrangements occurred approxi- mately fourfold more often than inversions and other intra-chromosome rearrangements, at least for lineages leading to humans and mice. G ENOMES evolve through changes in DNA se- quence as a result of nucleotide substitution and through changes in DNA organization because of chro- mosome rearrangement. The types, rates, and patterns of nucleotide substitution have been documented for many genes and a sound theoretical basis has been developed for analyzing these changes ( SANKOFF and KRUSKAL 1984; NEI 1987; LI and GRAUR 1991; WATER- MAN 1995). Relatively less evidence is available for chro- mosome rearrangements and the analyticalbasis for interpreting these changes is modest (SANKOFF 1993; HANNENHALI et al. 1995). An important method to de- scribe changes in genome organization involves com- paring chromosome banding patterns between differ- ent but closely related species (JAUCH 1992; REID et al. 1993; SCHERTHAN et al. 1994; WTTENBERGER et al. 1995) . However, the number of bands that can be dis- criminated is inherently limited and assessing similari- ties in banding patterns becomes increasingly ambigu- ous for distantly related species. Comparative genetic maps are a powerful alternative. By comparing the chro- mosomal location of homologous genes in different species, the extent of chromosome conservation and rearrangement can be described even for distantly re- lated species ( OHNO 1970; LUNDIN 1979; NADEAU and TAYLOR 1984; O’BRIEN et al. 1993; ANDERSSON et al. Corresponding author: Joseph Nadeau, Genetics Department, BRB- 630, Case Western Reserve University School of Medicine, 10900 Eu- clid Ave., Cleveland, OH 44106-4955. E-mail: [email protected] versity, Princeton, NJ 08544. Present address: Department of Molecular Biology, Princeton Uni- ‘Resent address: Genetics Department, Case Western Reserve Uni- versity School of Medicine, 10900 Euclid Ave., Cleveland, OH 44106. Genetics 147: 289-296 (September, 1997) 1996). Homologous genes are readily identified through cross-hybridization and DNA sequence analysis (ANDERSON et al. 1996). Moreover the number of ho- mologous genes that are being identified is large (tens of thousands) relative to the number of chromosome rearrangements that arelikely to have occurred during mammalian evolution (hundreds) ( NADEAU and TAY- LOR 1984; NADEAU et al. 1995). An important problem in comparative genome analy- sis has been reliably measuring the extent of chromo- some rearrangement duringevolution. These measures could be used to calculate rates of chromosome rear- rangements, to determine whether some lineages are pronetorearrangement, and to test whether some chromosome segments are preferentially conserved. NADEAU and TAYLOR ( 1984) developed a measure of linkage conservation based on the genetic length of each conserved segment and the number of homolo- gous genes that have been mapped to each segment. With this measure, it was estimated that since diver- gence of lineages leading to humans and mice, the rate of lineage disruption has been approximately one event per million years. However, this method is applicable only to species with genetic maps, as are most measures of genome distance ( SANKOFF et al. 1992; HANNENHALLI and PEVZNER 1995 ) . For most species, genes have been assigned to chromosomes but the location of most genes on the chromosome has not been determined, i.e., synteny but not linkage has been established. As a result, methods are needed to exploit these data for understanding questions of genome organization and evolution.

Upload: lenguyet

Post on 03-Jul-2018

227 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Synteny Conservation and Chromosome Rearrangements … · Synteny Conservation and Chromosome Rearrangements During Mammalian Evolution ... From the gene list for each species,

Copyright 6 1997 by the Genetics Society of America

Synteny Conservation and Chromosome Rearrangements During Mammalian Evolution

Jason Ehrlich, * J David Sankof f and Joseph H. Nadeau * *Jackson LaboratoT, Bar Harbor, Maine 04609 and Centre de recherches mathdmatiques,

Universiti de Montrdal, Montrial, Qukbec, H3C 3J7 Canada

Manuscript received December 13, 1996 Accepted for publication June 4, 1997

ABSTRACT An important problem in comparative genome analysis has been defining reliable measures of synteny

conservation. The published analytical measures of synteny conservation have limitations. Nonindepen- dence of comparisons, conserved and disrupted syntenies that are as yet unidentified, and redundant rearrangements lead to systematic errors that tend to overestimate the degree of conservation. We recently derived methods to estimate the total number of conserved syntenies within the genome, counting both those that have already been described and those that remain to be discovered. With this method, we show that -65% of the conserved syntenies have already been identified for humans and mice, that rates of synteny disruption vary -25-fold among mammalian lineages, and that despite strong selection against reciprocal translocations, inter-chromosome rearrangements occurred approxi- mately fourfold more often than inversions and other intra-chromosome rearrangements, at least for lineages leading to humans and mice.

G ENOMES evolve through changes in DNA se- quence as a result of nucleotide substitution and

through changes in DNA organization because of chro- mosome rearrangement. The types, rates, and patterns of nucleotide substitution have been documented for many genes and a sound theoretical basis has been developed for analyzing these changes ( SANKOFF and KRUSKAL 1984; NEI 1987; LI and GRAUR 1991; WATER- MAN 1995). Relatively less evidence is available for chro- mosome rearrangements and the analytical basis for interpreting these changes is modest (SANKOFF 1993; HANNENHALI et al. 1995). An important method to de- scribe changes in genome organization involves com- paring chromosome banding patterns between differ- ent but closely related species (JAUCH 1992; REID et al. 1993; SCHERTHAN et al. 1994; WTTENBERGER et al. 1995) . However, the number of bands that can be dis- criminated is inherently limited and assessing similari- ties in banding patterns becomes increasingly ambigu- ous for distantly related species. Comparative genetic maps are a powerful alternative. By comparing the chro- mosomal location of homologous genes in different species, the extent of chromosome conservation and rearrangement can be described even for distantly re- lated species ( OHNO 1970; LUNDIN 1979; NADEAU and TAYLOR 1984; O’BRIEN et al. 1993; ANDERSSON et al.

Corresponding author: Joseph Nadeau, Genetics Department, BRB- 630, Case Western Reserve University School of Medicine, 10900 Eu- clid Ave., Cleveland, OH 44106-4955. E-mail: [email protected]

versity, Princeton, NJ 08544. Present address: Department of Molecular Biology, Princeton Uni-

‘Resent address: Genetics Department, Case Western Reserve Uni- versity School of Medicine, 10900 Euclid Ave., Cleveland, OH 44106.

Genetics 147: 289-296 (September, 1997)

1996). Homologous genes are readily identified through cross-hybridization and DNA sequence analysis (ANDERSON et al. 1996). Moreover the number of ho- mologous genes that are being identified is large (tens of thousands) relative to the number of chromosome rearrangements that are likely to have occurred during mammalian evolution (hundreds) ( NADEAU and TAY- LOR 1984; NADEAU et al. 1995). An important problem in comparative genome analy-

sis has been reliably measuring the extent of chromo- some rearrangement during evolution. These measures could be used to calculate rates of chromosome rear- rangements, to determine whether some lineages are prone to rearrangement, and to test whether some chromosome segments are preferentially conserved. NADEAU and TAYLOR ( 1984) developed a measure of linkage conservation based on the genetic length of each conserved segment and the number of homolo- gous genes that have been mapped to each segment. With this measure, it was estimated that since diver- gence of lineages leading to humans and mice, the rate of lineage disruption has been approximately one event per million years. However, this method is applicable only to species with genetic maps, as are most measures of genome distance ( SANKOFF et al. 1992; HANNENHALLI and PEVZNER 1995 ) . For most species, genes have been assigned to chromosomes but the location of most genes on the chromosome has not been determined, i.e., synteny but not linkage has been established. As a result, methods are needed to exploit these data for understanding questions of genome organization and evolution.

Page 2: Synteny Conservation and Chromosome Rearrangements … · Synteny Conservation and Chromosome Rearrangements During Mammalian Evolution ... From the gene list for each species,

290 J. Ehrlich, D. Sankoff and J. H. Nadeau

baboon Capuchin

monkey cat chimp Chinese

hamster

deermouse dog gibbon gorilla African green

human lemur marmoset mink mouse owl monkey orangutan

rabbit rat rhesus

cow

monkey

Pig

48

4 6 8 7

10 0 6 5 4 5

14 5 7 6

11 4 4 2 1 6 5 A

19

26 6 7 3

4 0 6 5 3 3

7 5 4 4 4 5 5 2 2 2 4 A

26 32

17 22 60 32 8 63 5 5

7 10 1 0 4 9 5 7 5 8 4 7

14 12 6 7 5 6 7 8

10 9 5 5 6 8 2 3 5 5 7 6 6 7

25 34 2 23

13 19 2 21 24 38 4 24 26 43 4 28 64 32 2 19

8 239 5 28 0 0 2 3 1 4 4 0 3 9 4 9 0 7 3 5 0 5 4 6 0 4

15 37 1 8 5 6 0 7 5 3 0 1 8 7 0 7

12 42 3 11 6 1 1 0 5 4 6 0 6 3 1 3 0 3 3 4 1 5

10 19 2 7

24

18 25 34 19

32 1

22 41 5 6

9 6 3 4 6 5 5 3 3 5

24 23

18 13 25 22 38 30 20 21

32 31 3 2

24 15 26 26 42 24 4 35

7 8 5 4 3 3 5 5 7 7 4 5 6 4 2 4 3 2 5 5

47

26 51 62 55

196 11 39 41 41 35

3437 10 9

12 91 12 8

17 5

40

21 22

17 14 23 22 30 19 22 17

26 20 2 1 22 18 30 13 22 16 24 13

36 25 36 15 5 25 7 5 7 7 6 3 6 3 4 2 2 2 5 5

29 43 20 21 14 12 30

18 20 16 18 10 9 16 30 50 22 21 17 18 34 32 52 26 32 18 20 38 29 50 22 17 15 15 35

36 181 39 27 49 23 75 3 15 3 3 5 5 1 2

27 34 19 21 15 16 23 25 36 20 20 16 16 30 27 37 23 30 18 16 29 23 34 18 19 17 12 26

54 1152 55 34 82 30 226 24 31 20 20 15 13 27 22 25 13 15 10 7 20 60 53 19 21 18 20 39 11 4191 52 32 70 31 219 4 11 55 20 17 13 38 5 8 5 35 14 11 24 3 15 4 3 99 13 37 4 5 3 2 1 5 8 2 3 7 29 5 6 8 5 329

20

18 21 30 16

27 2

23 23 30 20

36 20 13 20 30 22 22 13 15 22

23

14 24 27 22

52 4

22 22 24 20

60 20 14 25 57 19 22 26 14 33

2 5 0 7 6 6 4 7 4 3 5 6 5 5 4 3 3 3 7 1 9 sheep . 5 7 5 1 0 0 3 6 5 5 1 2 5 1 4 1 2 5 6 4 3 6 6 6 0

FIGURE 1.-Numbers of genes examined in each species (on diagonal), number of homologous genes between species (above diagonal), number of conserved syntenies (below diagonal). Data were derived from Mouse Genome Database (MGD) as posted in March 1995. Guidelines for identifying homologous loci have been established by the Human Genome Database (GDB) and HUGO Comparative Mapping Committees (ANDERSON et al. 1996). The MGD contains information about homolo- gous genes for nearly 50 mammalian species. From the gene list for each species, mitochondrial genes were excluded, as were genes on the X and Y chromosomes. Exchanges of chromosome segments among mitochondrial DNA, sex chromosomes and autosomes are rare ( OHNO 1970) though some have been attested ( DISTECHE et al. 1992; PALMER et al. 1995; RUGARLI et al. 1995). Including these data would inflate measures of synteny conservation.

Several measures of synteny conservation have been proposed ( ZAKHAROV and VALEEV 1988; ZAKHAROV et al. 1992,1995; BENGTSSON et al. 1993; ZAKHAROV 1993), but they have limitations. Disproportionate weight is given to segments in which many genes have been mapped, thereby overestimating the degree of conser- vation. More importantly, these measures are based only on the observed number of conserved and dis- rupted syntenies, and do not fully exploit implicit as- sumptions of uniform randomness both of genomic re- arrangement events and of gene discovery patterns. SANKOFF and NADEAU (1996) derived methods to esti- mate the total number of conserved syntenies within the genome, counting both those that have already been described and those that remain to be discovered. In this article, we apply this method to comparative maps for a variety of mammalian species and use these esti- mates to calculate lineage-specific rates of chromosome rearrangement.

Conserved syntenies among mammalian species: To ensure adequate sample size for rigorous analysis, we selected nine of -50 species in MGD (NADEAU et al. 1995) with which to quantify the extent of synteny con-

servation (Figure 1 ) . These species were baboon, cat, chimpanzee, Chinese hamster, cow, human, mink, mouse and rat. At least 20 homologous genes have been mapped in each pair-wise species comparison. This group represented two major orders (primates and ro- dents), while also including selected carnivores and artiodactyls.

Synteny refers to the occurrence of two or more genes on the same chromosome, whereas conserved synteny re- fers to two or more homologous genes that are syntenic in two or more species, regardless of gene order on each chromosome, i.e., synteny but not necessarily gene order is conserved (Figure 2; see also NADEAU 1989) . Conserved linkage pertains to the conservation of both synteny and order of homologous genes between spe- cies (Figure 2; see also NADEAU 1989). A dismpted syn- teny refers to circumstances where a pair of genes are located on the same chromosome in one species but their homologues are located on different chromo- somes in another species, i.e., the genes are syntenic in only one of the two species. Syntenic genes can be identified by examining published genetic maps and conserved segments can be identified by comparing

Page 3: Synteny Conservation and Chromosome Rearrangements … · Synteny Conservation and Chromosome Rearrangements During Mammalian Evolution ... From the gene list for each species,

Synteny Conservation 29 1

maps that show homologous genes in different species ( LUNDIN 1979; NADEAU and TAYLOR 1984; O’BRIEN et al. 1993; LYONS et al. 1997) .

Measures of synteny conservation and disrup tion: The BENGTSSON (BENGTSSON et al. 1993) and ZAKHAROV ( ZAKHAROV et al. 1992, 1993) measures of synteny distance focus on the observed syntenies with- out considering how the statistical regularities in these observations are generated by randomness both in the process of genome rearrangement and the discovery and attribution of genes to chromosomes. Rather than just normalizing the observed number of conserved syn- tenies by the number of syntenies within the two species being compared, it is preferable to estimate the total number of conserved syntenies given the observed num- ber of conserved syntenies and how many genes they contain. The estimated number of conserved segments should then be a direct measure of the total number of disruptions, without recourse to normalization.

As a probabilistic model according to which we con- structed our estimator, we assumed that evolutionary disruptions resulted in n + c conserved segments in the genome of a species, separated by n breakpoints scattered at random ( i.e., according to the uniform dis- tribution ) along the lengths of the c chromosomes, and that the m mapped genes were also distributed at ran- dom, independently of the location of the breakpoints. Observed lengths of conserved linkages between hu- man and mice support these assumptions ( NADEAU and TAYLOR 1984; NADEAU 1989).

It is a property of the uniform distribution on the unit interval that if we draw k independent samples, then the probability density of the length x of the inter- val between any two adjacent sample points is

p ( x) = k ( 1 - x) k - l .

Setting k = n + c - 1, the probability density of segment length in our model is then

p ( x ) = ( n + c - 1 ) ( 1 - x ) n + C - Z ,

where x is the proportion of total chromosomal length in the genome represented by a segment, n is the num- ber of breakpoints resulting from chromosome rear- rangements, and c - 1 is the number of “artificial” breakpoints introduced by concatenating the c chromo- some for analytical purposes. Analysis of chromosome lengths in many species shows that treating the concate- nation points as also drawn independently from the uniform distribution is a reasonable assumption ( SAN- KOFF et d . 1996) .

Now, for a segment of length x, the probability that a gene randomly placed on the genome will be found in that segment is just x, because we consider the total genome to be of length 1, and the probability that r randomly and independently located genes out of m

observed genes will fall in the segment is just the bino- mial probability

B ( m,x;r) = (:) x‘(1 -

Then the probability P( r ) of finding r genes in any segment is found by integrating this binomial probabil- ity times the density function p over all possible segment lengths. This gives

P ( r ) = ( n + c- l )m! ( n + c + m - r - 2 ) !

( n + c - 1 + m ) ! ( m - r ) !

for r = 0, 1, . . . , m. Given this theoretical distribution, our knowledge of m, c, and the observed frequency distribution of r for nonzero values of r, we could then approximate the maximum likelihood estimator of n ( SANKOFF and NADEAU 1996).

To illustrate this method, we evaluated data on 1152 homologous autosomal genes in humans and mice, ob- served to be distributed among 91 conserved syntenies. As a first approximation, we equate one conserved seg- ment to one conserved synteny, although this is some- what of an underestimate. Based on the formula for P( r ) and applying maximum likelihood methods ( SAN- KOFF and NADEAu 1996), we estimate n = 122 breakpoints. The observed frequency distribution for the number of genes per conserved synteny is com- pared in Figure 3 with the theoretical distribution P based n = 122 breakpoints and c = 19 autosomes. The estimated total number of conserved syntenies is 141. Given that 91 have already been identified, we estimate an additional 31 remain to be found, a result suggesting that -65% ( =91/ 141 ) of the conserved syntenies have already been identified for humans and mice.

Phylogenies: Phylogenetic analysis was used to evalu- ate data quality. If homologies and genetic map localiza- tions are correctly determined, the phylogenetic tree based on synteny distance should be similar to the known evolutionary relations among these nine species. A phylogenetic tree based on estimates of n was con- structed using the neighbor-joining (NJ) method as implemented in PHYLIP version 3 . 5 4 ~ (Figure 4) . Other tree-building algorithms such as UPGMA and Fitch-Margoliash / Least-Squares, as well as other mea- sures of synteny distance ( ZAKHAROV et al. 1992, 1995; BENGTSSON et al. 1993; J. EHRLICH, D. SANKOFF and J. H. NADEAU, unpublished results), were also tested and the results were highly consistent (not shown).

It should be noted that our method underestimates the number of conserved syntenies yet to be found for species in which few genes have been mapped. As a result, our model must be modified to take into account that at early stages of mapping, experimentalists tend to report more genes that are linked in two or more species than would be predicted from our assumptions of randomness.

Page 4: Synteny Conservation and Chromosome Rearrangements … · Synteny Conservation and Chromosome Rearrangements During Mammalian Evolution ... From the gene list for each species,

292 J. Ehrlich, D. Sankoff and J. H. Nadeau

Rates of synteny disruption: We next calculated rates of synteny disruption for selected lineages and exam- ined these rates for homogeneity. The analysis parti- tions the total synteny distance separating two species into the distance since divergence for each lineage ( SARICH and WILSON 1967). At least three pair-wise distances are needed to estimate the three lineage-spe- cific rates. This analysis does not require knowledge of divergence times, which are generally not known accurately ( ROMER 1966).

We define the following relations and terms:

&f + Rh = NreJh,

R e f + R, = NmJ,,

Rh -k &n = Nh,rn,

where Nref;h, NTdm and Nh,,, represent the synteny dis- tance between pairs of species, and Rq, & and R, desig- nate the lineage-specific number of synteny disruptions for reference species (cat, cow or mink), human and mouse, respectively.

We assumed that the NJ tree (Figure 4) was correct and used it, rather than the raw estimates of number of conserved syntenies between pairs of species, as a source of distance estimates of N; the tree structure imposes a degree of consistency on the distances and reduces statistical fluctuations. To calculate synteny dis- tance for a species pair, the estimated number of syn- teny disruptions was summed from the branch tip for one species to the branch tip for the other. The cat lineage had the lowest rate of disruption; chimp, cow and mink had intermediate rates, and humans and mice showed had the highest rates (Table 1 ) , Rates for lineages leading to human, mouse and rat did not differ substantially. It is important to emphasize that these comparisons are not independent, cannot be tested for statistical significance, and should be considered as qualitative generalizations only.

To test whether the variable number of genes used in the comparisons contributed to the heterogeneous rates, we replaced mouse with rat as the representative rodent and human with chimp as the representative primate (Table 1 ) . The cat lineage still showed the lowest rate of synteny disruption; chimp, cow and mink lineages showed intermediate rates, and the rat lineage showed a high rate. Thus sample size did not bias rate estimates.

Finally, we examined rates of disruption for the three primate species and separately for the three rodent spe- cies (Table 1 ) . Baboon and chimp lineages showed comparable rates of disruption; the rate for the human lineage was approximately twice the rate for these other primates. Among rodents, the rat and mouse lineages showed comparable rates, whereas the hamster showed a low rate of disruption that was similar to the rate for

the cat lineage. Thus rates varied considerably even within the primate and rodent lineages.

DISCUSSION

A concern in the use of comparative maps for ge- nome analysis is the modest number of genes that have been mapped in most species and the disproportionate influence that errors might consequently have. A possi- ble example of this problem is the apparent excess of synteny segments marked by a single gene: -16 are expected and 35 are observed (Figure 3 ) . Explanations for this excess include incorrect homology determina- tion, mistaken synteny assignment, or questionable as- sumptions in the estimation procedures ( SANKOFF and N ~ E A U 1996). To further illustrate the nature of the problem, consider the following. In April 1996, only a single gene marked 28 of the 100 conserved syntenies in the mouse and human comparative map (MGD, http: / / www.informaticsljax.org ) . By August 1996, five of these genes had been removed from the mouse or human genetic map, four genes had been reassigned to another chromosome in one or both species, two segments were confirmed by the addition of a new gene, and six new single gene segments had been identified. These data support the experience of the mapping community that apparent conserved syntenies marked by a single gene are suspect; some may be subsequently confirmed with additional data, and some may be wrong. The apparent excess of single gene segments (Figure 3) is therefore not surprising. More compelling is the excellent fit between the observed and predicted number of segments marked by two or more genes.

Despite the inherent ambiguities in some gene map- ping data, three arguments suggest that most mapping and homology data are largely correct and that, while improvements might be made, estimation procedures are robust to a modest gene mapping error rate and to small sample sizes. The first is that the phylogenetic trees (Figure 4 and not shown ) based on various mea- sures of synteny distance gave similar results, each of which reflected expectations. If synteny distances were based on too small sample sizes, or if many errors ex- isted in the database, the resulting trees should have been unreliable. The second argument is that replacing “map rich” species (humans and mice) with “map poor” species (chimps and rats) did not substantially influence estimates of lineage-specific rates of synteny disruption (Table 1 ) . Finally, experience with con- served linkages suggests that measures of genome rear- rangement may be remarkably good even with small sample sizes. The original estimate for the average length of all autosomal linkages that have been con- served in the genomes of mouse and human was 8.1 ? 1.6 cM ( NADEAU and TAYLOR 1984) . This estimate was based on 83 genes and 13 conserved linkages. Nine

Page 5: Synteny Conservation and Chromosome Rearrangements … · Synteny Conservation and Chromosome Rearrangements During Mammalian Evolution ... From the gene list for each species,

Synteny Conservation

A. Genetic map in reference species

293

,A ,B IC ,D ,E ,F I I I I I I

B. No aenetic map in comparison species

Conserved synteny

Gene arrangement: Chr i (A,B,C,D,E,F)

Definition: Genes are located on the homologous chromosome; order of genes is not known or is ignored for analytical purposes.

Count: One conserved synteny involving genes A,B,C,D,E,F.

Disrupted synteny

Definition: Genes that are on the same chromosome in one species are on different chromosomes in another species.

Count: One disrupted synteny involving genes A,B,C,D vs genes E,F; one conserved synteny involving genes A,B,C,D; one conserved synteny involving genes E,F.

Possible cause: No inter-chromosomal rearrangement.

Possible cause:

No evidence for intra-chromosomal rearrangement. An inter-chromosomal rearrangement, e.g. reciprocal translocation. No evidence for intra-chromosomal rearrangement.

C. Genetic map in comparison species

Conserved synteny and linkage Conserved synteny, conserved linkage, disrupted linkage

Gene arrangement: Gene arrangement:

A ,B ,C ,E ,A I

Definition: Same gene order and similar genetic distances.

Count:

one conserved synteny. Involving genes A,B,C,E,F. One conserved linkage involving genes,

Possible cause: No inter-chromosomal rearrangement. No Intra-chromosomal rearrangement.

Conserved synteny, disrupted synteny, conserved linkage, disrupted linkage

Gene arrangement: A B C

Chr i D E F

-Chr j

One conserved linkage involving genes A,B! Count:

One conservedlinkage Involving genes D,E, One dlsrupted llnkage involving genes A,B,( One conserved synteny involving genes A,E One conservedsynteny involving genes D,E One disrupted synteny involving genes A,B,

,C; .F. 5 vs D,E,F . I$. i,F. C vs D,E,F.

Count: One conserved linkage Involving genes B,C,D; One conserved linkage involving genes E,F. One disrupted linkage involving genes B,C,D vs E,F vs A. One conserved synteny involving genes A,B,C,D,E,F.

Possible causes: An intra-chromosomal rearrangement,

such as a paracentric inversion.

FIGURE 2.-Definition of conserved and disrupted synten- ies and linkages. Examples are given for a chromosome marked by genes A, B, C, D, E and F. Spacing reflects approxi- mate recombination distances. In the comparative map (parts A-C) , one of the species is referred to as the reference spe- cies, the other as the comparison species. In the comparison species, two chromosomes are illustrated, Chr i and Chr j . Conserved linkages are marked by heavy black lines. Count: the number of conserved or disrupted syntenies and linkages for each example.

Possible causes: An interchromosomal rearrangement,

such as a reciprocal translocation.

Page 6: Synteny Conservation and Chromosome Rearrangements … · Synteny Conservation and Chromosome Rearrangements During Mammalian Evolution ... From the gene list for each species,

294 J. Ehrlich, D. Sankoff and J. H. Nadeau

147 “,I 10 i

0 10 20 30 40 50 60 Number of genes in conserved segment

FIGURE 3.-Distribution of the number of genes per con- served synteny. Two data points for conserved syntenies with 62 and 64 genes are not on scale. There is no data point for conserved syntenies with zero genes, hut the random disrup tion model predicts 15 or 29, depending on whether the total number of conserved syntenies is 141 (heavy line), the maximum likelihood estimate, or 200 (thin line), an arhi- trarilv chosen number of the same order of magnitude, illus- trating the resolution of the method. Note the apparent em- pirical excess of conserved syntenies with just one gene. This excess could result from incorrect homology determinations or mistaken linkage assignments. Note also the excellent fit between the observed and predicted numbers of segments marked by nvo or more genes.

years later, and with -11-fold more data, the average length was 8.8 cM ( COPELAND et al. 1993). These corre- spond to 178 2 39 and 144 chromosome rearrange- ments, respectively. Thus modest genetic maps and pos- sible mapping errors do not seem to influence results dramatically.

Given rates of linkage and synteny disruption, occur- rence of inter- us. intra-chromosome rearrangements can be characterized. Both intra-chromosomal rear- rangements, e . g , inversions, and interchromosomal re- arrangements, e+, reciprocal translocations, disrupt linkage, whereas only interchromosomal rearrange- ments disrupt synteny. We estimated that -140 synteny disruptions and -180 linkage disruptions ( NADEAU and TAYLOR 1984; NADEAU 1989; COPELAND et al. 1993) have occurred since divergence of lineages leading to hu- mans and mice. The difference between these two num- bers is -40 intra-chromosome rearrangements. For these two species, inter-chromosome rearrangements have been fixed during evolution approximately four- fold (-140 vs. -40) more often than intra-chromo- some rearrangements. It will be interesting to deter- mine whether a similar bias is found for other mamma- lian lineages.

The ability to use synteny maps for estimating rates of chromosome rearrangements for many mammalian lineages provides an opportunity to study lineage-spe- cific rates of chromosome rearrangement. An im-

cow 4’7.r 25.11z1 - chimp

3S6 human

tE 14.7 -baboon

23.7 4-, 42s -mouse

41 -Chinese hamster

rat

17.6 - mink

4.2-cat

FIGURE 4.-Neighborjoining tree based on n, the esti- mated number of conserved syntenies.

portant limitation of the method proposed by NADEAU and TAYLOR ( 1984) is that i t depends on the availability of genetic maps. Although rapid progress is being made in developing genetic maps for several mammalian spe- cies, most maps still have more synteny assignments than linkage assignments (ANDERSON et al. 1996). An important consequence is that conserved linkages can- not be used to calculate lineagespecific rearrangement rates. Synteny data partially resolve this problem be- cause distances can be calculated for many pairs of spe- cies (Figure 1 ) .

We found that rates of synteny disruption differ sub- stantially among mammalian lineages, with cats and hamster showing the slowest rates (range 3.1 -4.2) , hu- mans, mice and rats showing the highest rates (range 48.8-79.9), and baboons, chimps, cow, and mink show- ing intermediate rates (range 14.7-25.1 ) . These rates correspond to -0.05,0.90, and 0.30 synteny disruptions per million years, respectively. ZAKAROV (1993) also found substantial variation in rates of synteny disrup tion among mammalian lineages. The measures of syn- teny conservation that were used have statistical limita- tions, however ( SANKOFF and NADEAU 1996). Various cytogenetic methods have previously shown variability in chromosome rearrangements among lineages ( WIL SON et al. 1974, 1977; BUSH et al. 1977). The advantage of comparative maps for this purpose is that the data have high resolution, the analysis is based on a mathe- matical model, and the results are quantitative. With these rate estimates, it will now be possible to construct and evaluate more detailed models of the patterns of chromosome rearrangement during mammalian evolu- tion.

Many important problems remain to be resolved in using comparative maps for genome analysis. Examples include using comparative mapping information for in- ferring the organization of the “primordial” mamma- lian genome ( FEIUZITI et al. 1996), defining segment identities ( SANKOFF et al. 1997) and developing meth- ods for quantitating comparative mapping information resulting from chromosome painting (JAUCH 1992; REID et al. 1993; SCHERTHAN et al. 1994; RETTENBERGER et al. 1995) .

Page 7: Synteny Conservation and Chromosome Rearrangements … · Synteny Conservation and Chromosome Rearrangements During Mammalian Evolution ... From the gene list for each species,

Synteny Conservation

TABLE 1

Rates of synteny disruption among representative mammalian species

295

Synteny distance Lineage-specific rate Relative rate

Test for choice of reference species: cat, cow or mink us human and mouse

cat-human-mouse cat-human: 57.7 cat-mouse: 79.4 human-mouse: 128.7

cow-human-mouse cow-human: 73.9 cow-mouse: 105.0 human-mouse: 128.7

human-mink-mouse human-mink 76.9 human-mouse: 128.7 mink-mouse: 87.0

cat-human-rat cat-human: cat-rat: human-rat:

cowhuman-rat cow-human: human-rat: cow-rat:

mink-human-rat human-mink: human-rat: mink-rat:

cat-chimp-mouse cat-chimp: cat-mouse: chimpmouse:

cow-chimp-mouse chimpcow: chimpmouse: cow-mouse:

mink-chimpmouse chimp-mink: chimpmouse: mink-mouse:

cat: 4.2 human: 53.5 mouse: 75.2

cow: 25.1 human: 48.8 mouse: 79.9

human: 59.3 mink: 17.6 mouse: 69.4

mouse:cat human:cat mouse:human

mouse:cow human:cow mouse:human

mouse:mink human:mink mouse:human

Test for sample size effect: replacing rat for mouse and chimp for human

57.7 69.1

118.4

78.6 118.4 94.7

76.9 118.4 76.7

27.6 76.3 95.5

48.5 95.5

101.9

46.8 95.5 83.9

baboon-chimphuman baboon-chimp: 32.7 baboon-human: 51 .O chimphuman: 36.3

hamster-mouse-rat hamster-mouse: 45.7 hamster-rat: 38.5 mouse-rat: 78.0

cat: human: rat:

human: cow: rat:

human: mink: rat:

cat: chimp: mouse:

chimp:

mouse: cow:

chimp: mink: mouse:

4.2 53.5 64.9

51.2 27.5 67.3

59.3 17.6 59.1

4.2 23.4 72.1

21.1 27.5 74.5

29.2 17.6 66.3

Variation within mammalian orders: primates and rodents

baboon: 14.7 chimp: 18.0 human: 36.3

hamster: 3.1 mouse: 42.6 rat: 35.4

rat:cat human:cat rat:human

rat:cow human:cow human:rat

rat:mink human:mink rat:human

mouse:cat chimp:cat mouse:chimp

mouse:chimp mouse:cow cow:chimp

mouse:mink mouse:chimp chimp:mink

human:baboon human:chimp chimp:baboon

mouse:hamster rat:hamster mouse:rat

17.9 12.7 1.4

3.2 1.9 1.6

3.9 3.4 1.2

15.5 12.7 1.2

2.5 1.9 1.3

3.4 3.4 1 .o

17.2 5.6 3.1

3.5 2.7 1.3

3.8 2.3 1.6

2.5 2.0 1.2

13.7 11.4 1.2

Page 8: Synteny Conservation and Chromosome Rearrangements … · Synteny Conservation and Chromosome Rearrangements During Mammalian Evolution ... From the gene list for each species,

296 J. Ehrlich, D. Sankoff and J. H. Nadeau

We thank JOE FELSENSTEIN for providing the PHYLIP software pack- age and ALEX SMITH for help in designing some of the computer programs used here. Part of this work was completed at the JACKSON

Laboratory. This work was supported by National Institutes of Health grant HG00330, grant BIR/00379 from the National Science Foun- dation Research Experiences for Undergraduates Program, a grant from the Burroughs Wellcome Fund to the JACKSON Laboratory, grants from the Natural Sciences and Engineering Research Council of Canada and the Canadian Genome Analysis and Technology pro- gram. D.S. is a fellow of the Canadian Institute for Advanced Re- search.

LITERATURE CITED

ANDERSSON, L., A. ARCHIBALD, M. ASHBURNER, S. AUDUN, W. BAR- ENDSE et al., 1996 Comparative genome organization of verte- brates. Mammal. Genome 7: 717-734.

BENGTSSON, B. O., K. K. LEVAN and G. LEVAN 1993 Measuring ge-

6 4 198-200. nome organization from synteny data. Cytogenet. Cell Genet.

BUSH, G. L., S. M. CASE, A. C. WILSON and J. L. PATTON 1977 Rapid speciation and chromosomal evolution in mammals. Proc. Natl. Acad. Sci. USA 7 4 3942-3946.

COPELAND, N. G., N. A. JENKINS, D. J. GILBERT, J. T. EPPIG, L. J. MAL TNS et al., 1993 A genetic linkage map of the mouse: current applications and future prospects. Science 262: 57-66.

DISTECHE, C. M, C. I. BRANNAN,A. LARSEN, D. A. ADI.ER, D. F. SCHORL- ERET et al., 1992 The human pseudoautosomal GM-CSF recep- tor alpha subunit gene is autosomal in the mouse. Nature Genet. 1: 333-336.

FERRETTI, V., J. H. NADEAU and D. SANKOFF, 1996 Original synteny, pp. 159-167, in Combinatorial Pattern Matching, 7th Annual Symprr sium, Lecture Notes in Computer Science, edited by D. HIRSCHBERG and G. MYERS. Springer-Verlag, New York.

HANNENHALLI, S., and P. A. PEVZNER 1995 Transforming men into mice (polynomial algorithm for genomic distance problem). Proc. 27th Ann. ACM-SIAM Symp. Found. Computer Sci. pp.

HANNENHALLI, S., C. CHAPPEY, E. V. KOONIN and P. A. PEVZNER 1995 Genome sequence comparison and scenarios for gene rearrange- ments: a test case. Genomics 30: 299-311.

JAUCH, A,, 1992 Reconstruction of genomic rearrangements in great apes and gibbons by chromosome pints. Proc. Natl. Acad. Sci.

LI, W.-H., and D. GRAUR 1991 Fundamentals of Molecular Evolution. Sinauer Associates, Sunderland, M A .

LUNDIN, L-G. 1979 Evolutionary conservation of large chromosomal segments reflected in mammalian gene maps. Clin. Genet. 16:

LYONS, L. A,, T. F. LAUGHLIN, N. G. COPELAND, N. A. JENKINS, J. E. WOMACK et al., 1997 Comparative anchor tagged sequences (CATS) for integrative mapping of mammalian genomes. Nat. Genet. 15: 47-56.

NADEAU, J. H., 1989 Maps of linkage and synteny homologies be- tween mouse and man. Trends Genet. 5 82-86.

NADEAU, J. H., and B. A. TAYLOR, 1984 Lengths of chromosomal segments conserved since divergence of man and mouse. Proc. Natl. Acad. Sci. USA 81: 814-818.

NADEAU, J. H., P. L. GRANT, S . MANKALA, A. H. REINER, J. E. RICHARD- SON et al., 1995 A Rosetta Stone for mammalian genetics. Na- ture 373: 363-365.

NEI, M., 1987 Molecular Evolutionaly Genetics. Columbia University Press, New York.

O'BRIEN, S . J., J. E. WOMACK, L. A. LYONS, K. J. MOORE, N. A. JENKINS

178-189.

USA 89: 8611-8613.

72-81.

et al., 1993 Anchored reference loci for comparative genome mapping in mammals. Nature Genet. 3: 103-112.

OHNO, S. 1970 Evolution ly Gene Duplication. Springer-Verlag, New York.

PALMER, S., J. PERRY and A. ASHWORTH 1995 A contravention of Ohno's law in mice. Nature Genet. 10: 472-476.

REID, T., N. ARNOLD, D. C. WARO and J. WEINBERG 1993 Compara- tive high-resolution mapping of human and primate chromo- somes by fluorescence in situ hybridization. Genomics 18: 381- 386.

RE'ITENBERGER, G., C. KLETT, U. ZECHNER, J. KUNZ, W. VOCE[. et al., 1995 Visualization of the conservation of synteny between humans and pigs by heterologous chromosome painting. Geno- mics 26: 372-378.

ROMER, A. S., 1966 Vertebrate Paleontology. University of Chicago Press, Chicago.

RUGARIJ, E. L., D. A. ADLER, G. BORSANI, K. TSUCHIYA, B. FRANCO et al., 1995 Different chromosomal localization of the Clcn4 gene in Mzls spetus and C57BL/6J. Nat. Genet. 10: 466-471.

SANKOFF, D., 1993 Analytical approaches to genomic evolution. Bio- chemie 75: 409-413.

SANKOFF, D., V. FERRETTI and J. H. NADEAU 1997 Conserved seg- ment identification. RECOMB 97. Proceedings of the First An- nual International Conference on Computational Molecular Bi- ology, ACM Press, New York, pp. 252-256.

SANKOFF, D., and J. B. KRUSKAI. (Editors), 1984 Time Warps, String Edits and Macromolecules. Addison Wesley, Reading, MA.

SANKOFF, D., G. LEDUC, N. ANTOINE, B. PAQUIN, B. F. LAN(; et al., 1992 Gene order comparisons for phylogenetic inference: evolution of

6575-6579. the mitochondrial genome. Proc. Natl. Acad. Sci. USA 89:

SANKOFF, D., and J. H. NADEAU 1996 Conserved synteny as a mea- sure of genome rearrangement. Discrete Appl. Math. 71: 247- 257.

SARICH, V. M., and A. C. WILSON 1967 Immunological time scale for homonid evolution. Science 158 1200-1203.

SCHERTHAN, H., T. CREMER, U. ARNASON, H. U. WEIER, A. LIMA-DE- FARIA et al., 1994 Comparative chromosome painting discloses homologous segments in distantly related mammals. Nature Genet. 6 342-347.

WATERMAN, M. 1995 Introduction to Computational Biology. Chapman Hall, New York.

WILSON, A. C., V. M. SARICH and L. R. MAXSON 1974 The impor- tance of gene rearrangement in evolution: evidence from studies on rates of chromosomal, protein and anatomical evolution. Proc. Natl. Acad. Sci. USA 71: 3028-3030.

WILSON, A. C., T. J. WHITE, S. C. CARLSON and L. M. CHERRY 1977 Molecular evolution and cytogenetic evolution, pp. 375-393 in Molecular Human Cytogenetics, ICN-UCLA Symposia on Molecular and Cellular Biology, Vol. VII, edited by R. S. SPARKES, D. E. COMINGS and C. F. FOX. Academic Press, San Francisco.

ZAKHAROV, I. A,, 1993 Measurements of similarity of synteny groups and an analysis of genome rearrangements in the evolution of mammals, pp. 107- 113 in Bioinfmatics, Supercomputingand Com- plex Genome Analysis, edited by H. LIM, J. FICKET and C. CANTOR. World Scientific, New York.

ZAKHAROV, I. A,, and A. K. VALEEV 1988 Quantitative analysis of the mammal genome evolution via the comparison of genetic maps. Proc. (Doklady) Acad. Sci. USSR 301: 1213-1218.

ZAKHARov, I . A., V. I. NIKIFOROV and E. V. STEPANYUK 1992 Homol- ogy and evolution of gene orders: combination measurement of synteny group similarity and simulation of the evolutionary

ZAKHAROV, I. A., V. I . NIKIFOROV and E. V. STEPANYUK 1995 Interval process. Genetika 28: 77-81.

estimates of combinatorial measures of similarity for orders of homologous genes. Genetika 31: 1163-1167.

Communicating editor: A. G. CLARK