DEVELOPMENT OF SPECIES- AND GENOME-SPECIFIC GENETIC
MARKERS BY REPRESENTATIONAL DIFFERENCE ANALYSIS:
APPLICATION IN SYSTEMATIC AND EVOLUTIONARY RESEARCH
by
ANTON NEKRUTENKO, M.S.
A DISSERTATION
IN
BIOLOGY
Submitted to the Graduate Faculty of Texas Tech University in
Partial Fulfillment of the Requirements for
the Degree of
DOCTOR OF PHILOSOPHY
Approved
Accepted
August, 1999
ACKNOWLEDGMENTS
This dissertation was conceived and completed because I was given a
once-in-a-lifetime opportunity. I was at the right place at the right time, and it
certainly was one of the greatest things that have ever happened to me. I was
immersed in an environment that allowed me to grow and become confident in
who I will be. The person who is entirely responsible for granting these endless
opportunities to me is Dr. Robert Baker who brought me to Texas Tech. His
scientific guidance has had a great impact on how I view science and scientific
conduct now. His everyday support changed, in many ways, my understanding
of how to function in society and how to interact with other people. He is
and will be my teacher.
Members of my doctoral committee (Dr. Randy Allen, Dr. Robert
Bradley, Dr. Ronald Chesser and Dr. Marilyn Houck) have always been
supportive and put a significant amount of time and effort into making this
dissertation happen both linguistically and scientifically. I am grateful to Dr.
John Patton for his constant flow of ideas and support throughout all this time.
Dr. David Hillis helped me with analyses of data and writing of my first
manuscript. I thank Drs. Jim Bull and Holly Wichman for reviewing parts of this
dissertation and for their suggestions. For favors great and small I want to
thank members of Dr. Baker's laboratory and the Natural Sciences Research
Laboratory, graduate students of the Department of Biology, and my friends.
I thank my wife, Kateryna Makova, who supported and survived my work
with all its victories and failures. She also happened to be my co-worker who
designed all primers presented in this dissertation. She is a great and
incredibly strong person who invested so much in my success. I am grateful to
my parents Yuri Nekrutenko and Irina Deriugina for their strong belief in my
abilities and constant support. My father, also a biologist, set for me an
example of a scientist whose only reason to be a scientist is pure curiosity
and nothing else. My mother, on the other hand, has always been there for me
to remind me that the real world does exist and requires my attention. I also want
to thank my grandparents and especially my grandmother Alexandra Deriugina,
whose endless optimism and belief in the good sides of life always recharge
me in bad times. My older sister, Olga Razgonova, helped and supported me in
all aspects of everyday life.
Finally, but not least, I want to thank the person who taught me practical
molecular biology, Alex Palamarchuk. He is an organic chemist who told me
that it is impossible to understand complex processes without knowing the
basics.
This study was supported by contract DE-FC09-96SR18546 between the
U.S. Department of Energy and the University of Georgia and by funds from
Texas Tech University.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS AND ACRONYMS
CHAPTER
I. INTRODUCTION
    Literature Cited
II. REPRESENTATIONAL DIFFERENCE ANALYSIS TO DISTINGUISH CRYPTIC SPECIES
    Abstract
    Introduction
    Materials and Methods
        Specimen Collection, Identification and Isolation of Genomic DNA
        RDA and Analysis of Difference Products
    Results and Discussion
        Experiment A
        Experiment B
    Literature Cited
III. ISOLATION OF BINARY SPECIES-SPECIFIC PCR-BASED MARKERS AND THEIR VALUE FOR DIAGNOSTIC APPLICATIONS
    Abstract
    Introduction
    Materials and Methods
    Results and Discussion
    Literature Cited
IV. REPRESENTATIVE DIFFERENCE ANALYSIS IN THE STUDY OF ALLOPOLYPLOIDS: SUBGENOME-SPECIFIC MARKERS AND FURTHER EVIDENCE OF BIASED CONCERTED EVOLUTION IN ALLOTETRAPLOID COTTON Gossypium hirsutum
    Abstract
    Introduction
    Materials and Methods
    Results
        Isolation of DNA Fragments Specific to A and D Genomes
        Analysis of the Difference Products
        Sequencing Analysis of Individual Difference Products
    Discussion
    Literature Cited
V. SUMMARY
    Features of Interspecific RDA
    Major Contributions from the Dissertation
LIST OF TABLES
2.1 Diagnostic primers designed from RDA products
3.1 Diagnostic primers designed from RDA products
4.1 Amplification primers for A and D genome-specific difference products
4.2 Amino acid sequences of the reverse transcriptase conserved motif from representatives of plant, yeast and animal retrotransposons
LIST OF FIGURES
2.1 Schematic representation of the RDA procedure
2.2 Results of RDA experiments and amplification with resulting primers
3.1 Results of two reciprocal RDA experiments and test of primers on four species of voles
4.1 Cotton genome-specific RDA difference products
4.2 PARF D1 sequences are homogenized to a D-type in the allotetraploid cotton G. hirsutum
LIST OF ABBREVIATIONS AND ACRONYMS
bp base pairs
DNA deoxyribonucleic acid
kb kilobases/kilobase pairs
dNTP deoxynucleotide triphosphate
PARF polymorphic amplifiable restriction fragment
PCR polymerase chain reaction
RDA representational difference analysis
CHAPTER I
INTRODUCTION
Genomic DNA is an enormous databank containing information about
function, development, reproduction, and evolution of every living organism.
This overwhelming "information space" can be used in evolutionary and
systematic research to learn which properties of the genome define various
taxonomic groups and how the genome itself changes during the process of
evolution. One of the problems associated with the exploration of the genome
is its size. For example, the human genome and genomes of other mammalian
species are highly complex and composed of billions of base pairs. They
contain sequences that are highly similar between all major branches of life
such as ribosomal genes (Woese and Fox 1977; Lake 1988), protein synthesis
machinery genes (Rivera and Lake 1992; Brown and Doolittle 1995) or certain
housekeeping genes (Nekrutenko et al. 1998). On the other hand, other
sequences can be unique to a particular taxonomic group (Baker et al. 1997),
to one of two closely related species (Nekrutenko et al. 1999), or even vary
between individuals of the same population.
In order to develop reliable diagnostic markers for individuals,
populations, subspecies and closely related species, and to record evolutionary
processes in situ, it is necessary to gain access to rapidly evolving regions of
the genome, because only such regions provide the required resolution. This is
especially important for closely related species recently separated by a
speciation event. While such taxa have highly similar genomes (possibly up to
99.9% in overall sequence similarity), they nevertheless contain genetic
fingerprints that are the benchmark for establishing the two as separate species.
Given these observations, the questions this dissertation primarily addresses are
how to effectively isolate and analyze differences between closely related
genomes and what types of sequences these differences represent.
Historically, the most effective way to isolate differences
between two similar pools of DNA sequences was subtractive hybridization. In
a subtractive hybridization experiment, the two compared pools of DNA are cut into
fragments, denatured either by heat or by sodium hydroxide, and mixed
together to allow reannealing. Typically a small amount of one of these two
DNA samples (tester) is mixed with an excess of the other sample (driver) so
that tester fragments predominantly reanneal to driver fragments. If the
driver fragments are labeled with a hapten, tester/tester homoduplexes can be
separated from tester/driver heteroduplexes by affinity chromatography.
This approach was successfully used to isolate gross differences such as
Y chromosome-specific sequences (Lamar and Palmer 1984), or large
deletions on the X chromosome related to Duchenne muscular dystrophy and
choroideremia (Kunkel et al. 1985; Nussbaum et al. 1987). However, in
all of these examples, DNA differences were large enough to be detected even when the
degree of enrichment is only 10^-10^ times, which is typical for subtractive
hybridization. The sequence complexity of mammalian genomes is so high that
finer differences, such as small deletions/additions and nucleotide
polymorphisms, cannot be detected with this method because of inadequate
enrichment.
Lisitsyn et al. (1993) proposed an approach, representational difference
analysis (RDA), for isolating differences between closely related mammalian
genomes that combines the methodology of subtractive hybridization with kinetic
enrichment. Kinetic enrichment is another tool for DNA difference enrichment
that is based on the second-order kinetics of self-reassociation (Wieland et al.
1990). Importantly, to overcome the problem created by genome
complexity, the first stage of RDA, representation, reduces the initial complexity of
genomic DNA to 2%-15%. This is achieved by digesting tester and driver
genomes with a restriction enzyme (typically one with a 6 bp recognition site),
ligating an oligonucleotide adapter, and amplifying by PCR with the same
adapter used as a primer. Amplification conditions are adjusted so that
only fragments with an average size of about 0.6 kb are effectively amplified. This
new, synthetic pool of amplification products, or amplicons, is prepared
separately for tester and driver and represents, as mentioned above, only
2%-15% of the initial genomic DNA. Subsequently, the oligonucleotide adapters
are cut off from the tester and driver amplicons, and a new adapter set (with a different
sequence) is ligated to the tester amplicons only. A small amount of tester
amplicons (carrying the new adapters at their ends) is then mixed with a large
excess of driver amplicons, denatured and allowed to reassociate. Because the
driver amplicon fragments constitute the majority of this mix, tester sequences
that are also found in the driver amplicons preferentially form heteroduplexes in
which the adapter is attached only to the tester strand. On the other
hand, sequences that are unique to the tester have no partners in the driver
pool to reassociate with and therefore form homoduplexes with the adapter
attached to both strands. After reassociation is completed, the mix is used as a
template in a PCR amplification with the adapter oligonucleotide as a primer. Only
tester homoduplexes that have adapters attached to both strands can be
exponentially amplified and therefore greatly enriched relative to other
sequences. This hybridization/amplification step, difference enrichment, is
repeated 3 to 4 times, resulting in a final enrichment of over 10^-fold. This degree
of enrichment makes it possible to isolate polymorphisms even between
individuals of the same population (reviewed in Lisitsyn 1995).
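The representation step can be illustrated with a minimal in silico sketch. It assumes a random DNA sequence with uniform base composition, BglII (A^GATCT, the enzyme used in Chapter II) as the 6 bp cutter, and an illustrative amplifiable window of 200 to 1200 bp. The window bounds and all function names are hypothetical, and a hard size cut-off is a simplification of the gradual loss of PCR efficiency with increasing fragment length.

```python
import random
import re

BGLII_SITE = "AGATCT"  # BglII recognition site; the enzyme cuts A^GATCT


def digest(seq: str, site: str = BGLII_SITE, cut_offset: int = 1) -> list[str]:
    """Split seq at every occurrence of site, cut_offset bases into the site."""
    cuts = [m.start() + cut_offset for m in re.finditer(site, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def representation(seq: str, min_len: int = 200, max_len: int = 1200) -> list[str]:
    """Fragments cut on both ends (adapter-ligatable) that fall in the assumed size window."""
    internal = digest(seq)[1:-1]  # the two terminal pieces have only one cut end
    return [f for f in internal if min_len <= len(f) <= max_len]


def retained_fraction(seq: str, **window) -> float:
    """Share of the input sequence that ends up in the amplicon pool."""
    return sum(len(f) for f in representation(seq, **window)) / len(seq)


if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choices("ACGT", k=1_000_000))  # random stand-in for genomic DNA
    print(f"fraction retained: {retained_fraction(genome):.3f}")
```

For a random sequence a 6 bp site occurs about once every 4 kb, so only the minority of fragments that happen to be short enough survive; under these assumptions the retained fraction comes out at a few percent, in line with the 2%-15% range quoted above.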
RDA allows the isolation of differences that can be divided into two classes:
binary (presence/absence) differences and restriction site polymorphisms.
The presence/absence type refers to sequences that are present in one genome
(tester) and absent from the other (driver). Restriction site polymorphisms, also
called polymorphic amplifiable restriction endonuclease fragments (PARFs;
Lisitsyn et al. 1993), represent differences in the position of restriction sites in
the tester and driver genomes. In the tester genome, the restriction sites flanking a
particular PARF are close enough together that the sequence between them
can be amplified during the representation step, whereas in the driver these
sites are too far apart, preventing the sequence between them from being
amplified.
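The distinction between the two classes can be made concrete with a toy classifier. This is a hypothetical sketch, not part of the published RDA protocol: it assumes that the tester and driver sequences are known, uses exact string matching, and labels a tester amplicon that is missing from the driver representation as a PARF when its interior (the sequence between the two restriction-site remnants) still occurs somewhere in the driver genome, and as a binary difference otherwise.

```python
import re

SITE, CUT = "AGATCT", 1  # BglII, cuts A^GATCT (assumed enzyme)


def representation(seq, min_len=200, max_len=1200):
    """Internal restriction fragments within an illustrative amplifiable window."""
    cuts = [m.start() + CUT for m in re.finditer(SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    frags = [seq[a:b] for a, b in zip(bounds, bounds[1:])][1:-1]
    return [f for f in frags if min_len <= len(f) <= max_len]


def classify(tester: str, driver: str) -> list[tuple[int, str]]:
    """Label each tester amplicon as shared, PARF, or binary (toy model)."""
    driver_amplicons = set(representation(driver))
    calls = []
    for frag in representation(tester):
        interior = frag[len(SITE) - CUT : -CUT]  # drop the site remnants at both ends
        if frag in driver_amplicons:
            label = "shared"
        elif interior in driver:
            label = "PARF"  # sequence present, but not in the driver's amplicon pool
        else:
            label = "binary"  # sequence absent from the driver genome
        calls.append((len(frag), label))
    return calls


if __name__ == "__main__":
    site, lost = "AGATCT", "AGTTCT"  # the driver carries a mutated middle site
    tester = "T" * 100 + site + "C" * 700 + site + "G" * 700 + site + "CA" * 200 + site + "T" * 100
    driver = "T" * 100 + site + "C" * 700 + lost + "G" * 700 + site + "T" * 100
    print(classify(tester, driver))
```

In this constructed example the driver lacks the middle BglII site, so the two 706 bp tester amplicons merge into a single 1412 bp driver fragment that falls outside the window and both are reported as PARFs, while the 406 bp CA-repeat amplicon has no counterpart in the driver at all and is reported as binary.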
Originally, RDA was developed for isolation of genetic lesions in cancer
where losses and amplifications of genome regions can be detected by
comparing genomic DNA from cancerous and normal cells of the same
individual (Lisitsyn et al. 1995). Since that time, the technique has been
successfully used in various applications, including the isolation of Y
chromosome-specific sequences (Donnison et al. 1996), identification of
differentially expressed transcripts (Hubank and Schatz 1994), isolation of
probes detecting DNA loss and amplification in tumors (Lisitsyn et al. 1995) and
others (reviewed in Baldocchi and Flaherty 1997).
The Chernobyl Research Team directed by Drs. Ronald Chesser and
Robert Baker (a collaboration between Texas Tech University and
the Savannah River Ecology Laboratory) has been conducting several studies
of the effects of the Chernobyl nuclear power plant accident on the genetics of
animals (reviewed in Baker et al. 1996). One of the problems the team has
faced is the reliable identification of some of the rodent species that inhabit
radioactively contaminated areas and could serve as model systems for the
study of the ecological consequences of the accident. For example, four
species of voles (genus Microtus) collected at the accident site are
very difficult to distinguish at juvenile stages. Moreover, two of these species
(M. arvalis and M. rossiaemeridionalis) represent an impressive example of
cryptic species that are truly sympatric over extensive portions of their ranges
and cannot be identified based on morphological characters. Although these
taxa can be distinguished by karyotyping or DNA sequencing, these approaches are
too labor- and cost-intensive to be used for identifying large numbers of
individuals. Thus, an initial objective of my dissertation was to employ RDA for
the first time to isolate genetic markers capable of unambiguously identifying
closely related species (interspecific RDA) and to develop a PCR-based
diagnostic assay for these markers. The results of this project are
provided in Chapters II and III.
In the process of developing species-specific RDA-derived
markers, I discovered that these markers can be used not only in diagnostic
assays but also in studies describing patterns of genome evolution. Frequently,
interspecific RDA yields families of repetitive sequences that are present in one
of the compared genomes and absent from the other. In some cases, however, a
repetitive DNA family is isolated not because of its total absence from the driver
genome but because in the driver it is either differently flanked by restriction sites
(PARFs, see Chapter IV) or present in a significantly lower number of copies,
and/or its sequence is not 100% identical. Such families, present in both
the tester and driver genomes but differing in copy number per genome or overall
sequence similarity, provide an exciting opportunity to document and study
cases of concerted evolution and possibly to understand changes that
accompany speciation events. For example, these repetitive elements can be
used to describe interactions between subgenomes in allopolyploid organisms
such as upland cotton, Gossypium hirsutum. Additionally, diagnostic markers
developed between diploid species can be used to establish the origin of
allopolyploids, because they identify which original diploid genomes were
united in a given allopolyploid taxon. The second objective of my dissertation
was to develop subgenome-specific markers for the A and D genomes of
G. hirsutum and to study the changes that occur to them in the
allotetraploid plant relative to those observed in its diploid progenitors.
Chapter II of this dissertation describes the development of markers
discriminating two cryptic species of the genus Microtus (M. arvalis and M.
rossiaemeridionalis). In Chapter III, I outline a strategy for developing
diagnostic markers for more than two species separated by unequal genetic
distances and discuss some of the major features of interspecific RDA.
Chapter IV demonstrates how RDA-derived markers can be used for the analysis of
allopolyploids and for inferences about concerted evolution of selected repeat
families, using diploid and polyploid cotton species as an example. Chapter V
summarizes my findings and highlights the major contributions of this dissertation. In
this last part of my dissertation, I outline the results, explain their significance,
and discuss future directions.
Literature Cited
Baker RJ, Hamilton MJ, Van Den Bussche RA et al. (1996). Small mammals from the most radioactive sites near the Chernobyl nuclear power plant. Journal of Mammalogy 77:155-170.
Baker RJ, Longmire JL, Maltbie M, Hamilton MJ, and Van Den Bussche RA (1997). DNA synapomorphies for a variety of taxonomic levels from a cosmid library from the New World bat Macrotus waterhousii. Systematic Biology 46:579-589.
Baldocchi RA, and Flaherty L (1997). Isolation of genomic fragments from polymorphic regions by representational difference analysis. Methods 13:337-346.
Brown JR, and Doolittle WF (1995). Root of the universal tree of life based on ancient aminoacyl-tRNA synthetase gene duplications. Proceedings of the National Academy of Sciences of the USA 92:2441-2445.
Donnison IS, Siroky J, Vyskot B, Saedler H, and Grant SR (1996). Isolation of Y chromosome-specific sequences from Silene latifolia and mapping of male sex-determining genes using representational difference analysis. Genetics 144:1839-1901.
Hubank M, and Schatz DG (1994). Identifying differences in mRNA expression by representational difference analysis of cDNA. Nucleic Acids Research 22:5640-5648.
Kunkel LM, Monaco AP, Middlesworth W, et al. (1985). Specific cloning of DNA fragments absent from the DNA of a male patient with an X chromosome deletion. Proceedings of the National Academy of Sciences of the USA 82:4778-4782.
Lake JA (1988). Origin of the eukaryotic nucleus determined by rate-invariant analysis of rRNA sequences. Nature 331:184-186.
Lamar EE, and Palmer E (1984). Y-encoded, species-specific DNA in mice: evidence that the Y chromosome exists in two polymorphic forms in inbred strains. Cell 37:171-177.
Lisitsyn NA (1995). Representational difference analysis: finding the differences between genomes. Trends in Genetics 11:303-307.
Lisitsyn NA, Lisitsyn N, and Wigler M (1993). Cloning the differences between two complex genomes. Science 259:946-951.
Lisitsyn NA, Lisitsyna NM, Dalbagni G, et al. (1995). Comparative genomic analysis of tumors: detection of DNA losses and amplifications. Proceedings of the National Academy of Sciences of the USA 92:151-155.
Lisitsyn NA, and Wigler M (1995). Representational difference analysis in detection of genetic lesions in cancer. Methods in Enzymology 254:291-304.
Nekrutenko A, Hillis DM, Patton JC, et al. (1998). Cytosolic isocitrate dehydrogenase in humans, mice and voles and phylogenetic analysis of the enzyme family. Molecular Biology and Evolution 15:1674-1684.
Nekrutenko A, Makova KD, Chesser RK, and Baker RJ (1999). Representational difference analysis to distinguish cryptic species. Molecular Ecology, in press.
Nussbaum RL, Lesko JG, Lewis RA, et al. (1987). Isolation of anonymous DNA sequences from within a submicroscopic X chromosomal deletion in a patient with choroideremia, deafness, and mental retardation. Proceedings of the National Academy of Sciences of the USA 84:6521-6525.
Rivera MC, and Lake JA (1992). Evidence that eukaryotes and eocyte prokaryotes are immediate relatives. Science 257:74-76.
Wieland I, Bolger G, Asouline G, and Wigler M (1990). A method for difference cloning: gene amplification following subtractive hybridization. Proceedings of the National Academy of Sciences of the USA 87:2720-2724.
Woese CR, and Fox GE (1977). Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proceedings of the National Academy of Sciences of the USA 74:5088-5090.
CHAPTER II
REPRESENTATIONAL DIFFERENCE ANALYSIS
TO DISTINGUISH CRYPTIC SPECIES
Abstract
In the study of biodiversity, it is important to have a reliable system for
identification of various genetically distinct units (species, subspecies, etc.).
One of the most efficient tools available today is the polymerase chain reaction
(PCR) with diagnostic primers that yield a detectable product for one taxon but
not for other taxa. Critical to this method is the identification of diagnostic DNA
fragments from which primers can be designed. Representational difference
analysis (RDA) can reliably isolate DNA fragments that are unique to a specific
taxon. In this report, we demonstrate the utility of the technique by development
of binary markers that distinguish between two cryptic species of voles (genus
Microtus).
Introduction
Although it is possible to distinguish cryptic species using common
methods such as sequencing or karyotyping, a fast yes/no type of assay is
highly desirable. Such an assay is especially valuable in large-scale studies
in which significant numbers of individuals need to be
unambiguously identified. The real challenge, however, is isolating markers that
can be used in such an assay, because closely related species might have highly
similar genomes (possibly up to 99.9% in overall sequence identity).
Historically, subtractive hybridization was used to isolate differences between
highly similar pools of DNA. For example, it was used to identify small deletions
in the bacteriophage T4 genome (Bautz and Reilly 1966). Numerous
modifications of this procedure have been developed over the years; however,
all of them suffer from insufficient enrichment of desired sequences when
applied to highly complex genomes such as those of mammals. Lisitsyn et al.
(1993) described a technique, representational difference analysis (RDA), which
is specifically directed toward the isolation of differences between two complex
DNA samples. RDA employs a subtractive hybridization approach, but greatly
facilitates the purification of unique fragments by kinetic enrichment. This
procedure compares two genomes and subtracts sequences that are shared
between them, while amplifying fragments that are unique to
one of the genomes (tester genome, Fig 2.1). A reciprocal study also can be
performed, in which the other genome is used as the tester (Fig 2.1). As a
result, PCR primers can be designed that yield a diagnostic amplification
product from one genomic DNA but do not produce any amplification from the
other genomic DNA. This approach does not require any additional analyses
such as pattern recognition (RFLP and RAPD), repeat number scoring (mini-
and microsatellites), or sequence analysis and alignment (sequence data).
In RDA, the initial sequence complexity of genomic DNA is reduced by
PCR that can amplify only relatively short fragments, often called amplicons
(average size ~0.6 kb), representing only a subsample (2-15%) of the original
genome (see Fig. 2.1). Prior to the amplification, genomic DNA is digested with
a restriction enzyme (generating fragments with an average size of 2-5 kb) and
oligonucleotide adapters are ligated to the ends of the resulting restriction
fragments. These same adapters are used as the amplification primer in the
PCR. During a single RDA procedure two genomes are compared: one
designated tester, the other designated driver. Sequences that are
unique to the tester and are not present in the driver (commonly
referred to as targets) represent the differences between the two
compared genomes. As a result of the entire RDA procedure, involving three to
four rounds of enrichment, these targets are purified more than 10^-fold and can
be easily cloned and analyzed (modified from Baldocchi and Flaherty 1997).
Additional descriptions of the RDA procedure can be found in reviews by Lisitsyn
(1995) and Baldocchi and Flaherty (1997).
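The effect of repeating the hybridization/amplification step can be illustrated with a deliberately simplified model. It assumes random pairing of complementary strands in proportion to their concentrations (ignoring the kinetic component of enrichment), that only tester/tester duplexes are carried into the next round, and a hypothetical driver excess k chosen for illustration rather than taken from the protocol. Under these assumptions each round multiplies the ratio of target to common sequences by (1 + k); the function name is hypothetical.

```python
def enrichment_rounds(target_fraction: float, driver_excess: list[float]) -> list[float]:
    """Target fraction of the tester-derived product after each round (toy model).

    target_fraction: initial share of target (tester-only) sequences among tester amplicons.
    driver_excess: driver:tester ratio k assumed for each successive round.
    """
    f = target_fraction
    history = []
    for k in driver_excess:
        # A target strand has no driver partner, so it always re-forms a tester/tester duplex.
        target_homoduplex = f
        # A common tester strand finds another tester strand with probability 1 / (1 + k);
        # otherwise it pairs with a driver strand and is not amplified exponentially.
        common_homoduplex = (1.0 - f) / (1.0 + k)
        f = target_homoduplex / (target_homoduplex + common_homoduplex)
        history.append(f)
    return history


if __name__ == "__main__":
    # Hypothetical starting point: one target in a million amplicon sequences.
    for round_no, f in enumerate(enrichment_rounds(1e-6, [100, 100, 100]), start=1):
        print(f"round {round_no}: target fraction = {f:.3g}")
```

With these illustrative numbers the target rises from one part per million to roughly half of the product after the third round, which shows why a few rounds suffice when each round contributes a multiplicative rather than additive gain.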
RDA has been successfully utilized in a variety of experiments, such as
isolation of probes that detect DNA loss and amplification in tumors (Lisitsyn et
al. 1995), isolation of genes responsible for pathogenicity expression in bacteria
(genus Neisseria; Tinsley and Nassif 1996), identification of Y chromosome-
specific sequences (Donnison et al. 1996), as well as others (see Baldocchi
and Flaherty 1997). However, its utility in biodiversity studies has never been
explored. To empirically test the utility of RDA for taxonomy, we employed it for
the development of genetic markers capable of distinguishing two closely
related species of voles (Microtus arvalis and M. rossiaemeridionalis). These
two taxa are indistinguishable under field conditions and are truly sympatric over
expansive portions of their ranges (Zagorodnyuk 1991), but can be reliably
identified based on karyotypes (M. arvalis 2n = 46, FN = 58-90; M.
rossiaemeridionalis 2n = 54, FN = 54). Knowing the karyotypes of animals, it is
easy to test the efficacy of developed markers. As a result of our experiments
we obtained two primer pairs: one specific for M. arvalis, another for M.
rossiaemeridionalis (Table 2.1 and Fig 2.2).
Materials and Methods
Specimen Collection, Identification and Isolation of Genomic DNA. All
specimens, including those used in the original RDA experiments and in marker
testing, were collected in northern Ukraine from several localities around
Chernobyl and karyotyped as described in Baker et al. (1996). High molecular
weight genomic DNA was isolated from frozen liver samples.
RDA and Analysis of Difference Products. RDA was carried out as
described by Lisitsyn and Wigler (1995) with slight modifications concerning
purification of difference products between rounds of enrichment. We used the BglII
restriction enzyme and oligonucleotide adapters RBgl12 and RBgl24 (Lisitsyn et
al. 1993) to prepare amplicons from M. arvalis and M. rossiaemeridionalis
genomic DNAs. Two reciprocal experiments were then performed using (1) M.
arvalis amplicons as the tester (M. rossiaemeridionalis as the driver) and (2) M.
rossiaemeridionalis as the tester (M. arvalis as the driver). Difference products
obtained after three rounds of RDA were analyzed on an agarose gel, and the most
prominent bands were excised and cloned into the pGEM-T vector (Promega).
Fifteen clones per plate per cloned band were amplified using vector-situated
primers. Amplification products were then digested with the restriction
endonuclease Sau3AI (Promega) and run on an agarose gel. Products
displaying identical restriction fragment patterns were considered a "family" of
sequences representing the same RDA product. To identify diagnostic RDA
products, a member of each "family" was then labeled with [α-32P]dCTP (NEN
Laboratories) using a Random Primed DNA labeling kit (Boehringer Mannheim).
Radioactively labeled RDA products were hybridized to blots prepared using
genomic DNAs of M. arvalis and M. rossiaemeridionalis digested with BglII
endonuclease (Promega). Products that were identified as diagnostic based on
hybridization (either presence of hybridization to tester DNA and
absence from driver DNA, or different patterns of hybridization between tester
and driver; Fig 2.2) were sequenced using the dRhodamine Terminator Ready
kit (Perkin Elmer) and an ABI 310 autosequencer. Primers to individual
difference products were designed using the Oligo 4.05 program and tested on
genomic DNA of previously karyotyped animals.
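The grouping of clones into families by their Sau3AI restriction patterns can be expressed as a simple in silico sketch. It assumes that the clone insert sequences are known (in the study the patterns were read from agarose gels), treats two patterns as identical only when all fragment lengths match exactly, and uses hypothetical clone names, sequences and function names.

```python
import re
from collections import defaultdict

SAU3AI_SITE = "GATC"  # Sau3AI cuts immediately 5' of GATC (^GATC)


def sau3ai_pattern(seq: str) -> tuple[int, ...]:
    """Sorted fragment lengths from an in silico Sau3AI digest of seq."""
    cuts = {m.start() for m in re.finditer(SAU3AI_SITE, seq)}
    bounds = sorted(cuts | {0, len(seq)})
    return tuple(sorted(b - a for a, b in zip(bounds, bounds[1:])))


def group_into_families(clones: dict[str, str]) -> list[list[str]]:
    """Group clone names whose digests give identical fragment patterns."""
    families: dict[tuple[int, ...], list[str]] = defaultdict(list)
    for name, seq in clones.items():
        families[sau3ai_pattern(seq)].append(name)
    return list(families.values())


if __name__ == "__main__":
    clones = {  # hypothetical clone inserts
        "clone1": "AAAGATCAAAAA",
        "clone2": "AAAGATCAAAAA",
        "clone3": "AAAAAAAAGATCAA",
    }
    families = group_into_families(clones)
    probes = [members[0] for members in families]  # one representative per family
    print(families, probes)
```

Here clone1 and clone2 fall into one family, so only clone1 and clone3 would go forward as hybridization probes. Real gel patterns would need a size tolerance rather than exact equality.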
Results and Discussion
Our goal was to employ RDA to isolate DNA fragments unique to
each of two cryptic species of voles (Microtus arvalis and M.
rossiaemeridionalis). This would allow us to design PCR primers specific to each
species that generate a diagnostic amplification product unique to each of them.
To do so, we performed two reciprocal RDA experiments, using DNA from M.
arvalis as the tester and DNA from M. rossiaemeridionalis as the driver in
experiment A (Fig 2.2A), and the reverse in experiment B (Fig 2.2B).
Experiment A. The RDA using DNA from M. arvalis as the tester yielded
a single most abundant product. It was cloned (clone Mar14) and used as a
probe in a Southern hybridization experiment with digested genomic DNA from
M. arvalis and M. rossiaemeridionalis. The hybridization pattern (Fig 2.2A) was
different for tester and driver DNA, suggesting that Mar14 is indeed a
diagnostic difference product. The nucleotide sequence of Mar14 was determined
(accession number AF093582) and compared to the ENTREZ database
(http://www.ncbi.nlm.nih.gov/Entrez). No matching entries were found. Primers
complementary to the sequence of Mar14 were then designed and tested on
genomic DNA isolated from individuals of M. arvalis (N = 10) and M.
rossiaemeridionalis (N = 10) that were identified by karyotyping. As shown in
Fig 2.2A, these primers (Mar14F and Mar14R; Table 2.1) produced
amplification products only from genomic DNA of M. arvalis; therefore, within
this experimental design they are diagnostic for this species.
Experiment B. Three difference products were obtained in this
experiment. All three displayed different patterns of hybridization to the tester
and driver DNAs. One of these products, designated Mro16, was analyzed in
detail. Southern hybridization of labeled Mro16 to the tester and driver DNA is
shown in Fig 2.2B. Based on the hybridization pattern, it is easy to conclude
that Mro16 represents a repetitive element unique to M. rossiaemeridionalis.
Mro16 was sequenced and its sequence (accession number AF093583) was
compared to the ENTREZ database. This analysis revealed a highly similar
sequence representing a M. rossiaemeridionalis B1-like element (accession
number U36930; Mayorov et al. 1996). Primers complementary to the sequence
of Mro16 (Mro16F and Mro16R; Table 2.1) were designed and tested as
described above. Subsequent amplification generated products only from the
genomic DNA of M. rossiaemeridionalis, and within our sample primers Mro16F
and Mro16R are diagnostic for M. rossiaemeridionalis.
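Because experiments A and B yielded one marker for each species, an individual can be scored with both primer pairs. A minimal sketch of such a decision rule is shown below; the functions and the "inconclusive" category are hypothetical illustrations, not part of the published protocol. Requiring exactly one positive marker turns a failed reaction (no band with either pair) into an inconclusive call rather than a misidentification.

```python
def call_species(mar14_band: bool, mro16_band: bool) -> str:
    """Species call from presence/absence of the Mar14F/R and Mro16F/R products."""
    if mar14_band and not mro16_band:
        return "M. arvalis"
    if mro16_band and not mar14_band:
        return "M. rossiaemeridionalis"
    # Neither band (e.g. a failed PCR) or both bands (e.g. a mixed or contaminated sample).
    return "inconclusive"


def concordance(samples: list[tuple[str, bool, bool]]) -> float:
    """Fraction of samples whose PCR call matches the karyotype-based identification.

    Each sample is (karyotype_species, mar14_band, mro16_band).
    """
    matches = sum(call_species(mar, mro) == karyotype for karyotype, mar, mro in samples)
    return matches / len(samples)
```

The concordance check mirrors the validation step described above, in which primers were tested on previously karyotyped animals.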
Our results demonstrate that RDA is a powerful and reliable method for
isolating genetic markers suitable for biodiversity studies. By analyzing a
single RDA product in experiment A and three products in experiment B (Fig
2.2), we were able to design primers diagnostic for M. arvalis and M.
rossiaemeridionalis. Notably, a difference product unique to M.
rossiaemeridionalis (Fig 2.2B) is a repetitive B1-like element (belonging to the
class of short interspersed elements, SINEs). RDA preferentially isolates
repetitive sequences if they differ between the two compared DNAs (Navin
et al. 1996), as was also the case in library screening (Baker et al. 1997). This
feature of the method is important for this particular study because the
presence/absence of families of repetitive elements is a robust phylogenetic
character that is free of homoplasy (Verneau et al. 1998). Another advantage of
repetitive elements as diagnostic characters is that when they are used in a
PCR-based assay there is only a minute probability of a null allele,
because multiple loci are amplified simultaneously. In conclusion, recent
technical advances allow the analysis of complete genomes for
evolutionary and biodiversity studies, rather than of selected loci that represent only
a negligible part of total genomic DNA. Here we have demonstrated that RDA, which
compares entire genomes, may be the preferred method for isolating genetic
markers between closely related species.
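The argument about null alleles can be made quantitative under a strong simplifying assumption that was not tested in this study: that each of n amplified copies independently fails to amplify (for example, through a mutation in a primer site) with the same probability p. The chance that an individual shows no product at all is then p to the power n. Because copies of a repeat family share sequence, real failures may be correlated, so this should be read as an optimistic bound. The hypothetical functions below check the closed form against a simulation.

```python
import random


def p_null_closed_form(p: float, n: int) -> float:
    """Probability that all n independent copies fail to amplify."""
    return p ** n


def p_null_simulated(p: float, n: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    failures = sum(all(rng.random() < p for _ in range(n)) for _ in range(trials))
    return failures / trials


if __name__ == "__main__":
    p = 0.1  # hypothetical per-copy failure probability
    for n in (1, 5, 20):
        print(n, p_null_closed_form(p, n))
    print(p_null_simulated(0.5, 3))  # close to 0.5 ** 3 = 0.125
```

Even with a pessimistic per-copy failure rate, the probability of a complete null falls geometrically with copy number, which is the intuition behind the statement above.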
Literature Cited
Baker RJ, Hamilton MJ, Van Den Bussche RA, et al. (1996). Small mammals from the most radioactive sites near the Chernobyl nuclear power plant. Journal of Mammalogy 77:155-170.
Baker RJ, Longmire JL, Maltbie M, Hamilton MJ, and Van Den Bussche RA (1997). DNA synapomorphies for a variety of taxonomic levels from a cosmid library from the New World bat Macrotus waterhousii. Systematic Biology 46:579-589.
Baldocchi RA, and Flaherty L (1997). Isolation of genomic fragments from polymorphic regions by representational difference analysis. Methods 13:337-346.
Bautz EKF, and Reilly E (1966). Gene-specific messenger RNA: isolation by the deletion method. Science 151:328-330.
Donnison IS, Siroky J, Vyskot B, Saedler H, and Grant SR (1996). Isolation of Y chromosome-specific sequences from Silene latifolia and mapping of male sex-determining genes using representational difference analysis. Genef/cs 744:1839-1901.
Lisitsyn NA, Lisitsyn N, and Wigler M (1993). Cloning the differences between two complex genomes. Science 259:946-951.
Lisitsyn NA (1995). Representational difference analysis: finding the differences between genomes. Trends in Genetics f f :303-307.
Lisitsyn NA, Lisitsyna NM, Dalbagni G, et al. (1995). Comparative genomic analysis of tumors: detection of DNA losses and amplifications. Proceedings of the National Academy of Sciences of the USA 92:151 -155.
Lisitsyn NA, and Wigler M (1995). Representational difference analysis in detection of genetic lesions in cancer. Methods in Enzymology 254:291-304.
Mayorov Vi, Adkinson LR, Vorobyeva NV, et al. (1996). Organization and chromosomal localization of a Bl-like containing repeat of Microtus suban/alis. Mammalian Genome 7:593-597.
18
Navin A, Prekeris R, Lisitsyn NA, et al. (1996). Mouse Y-specific repeats isolated by whole chromosome representational difference analysis. Genomics 36:349-353.
Straus D, and Ausubel FM (1990). Genomic subtraction for cloning DNA corresponding to deletion mutations. Proceedings ofthe National Academy of Sciences of the USA 87:1889-1893
Tinsley CR, and Nassif X (1996). Analysis of the genetic differences between Neisseria meningitidis and Neisseria gonorrhoeae: two closely related bacteria expressing two different pathogenicities. Proceedings ofthe National Academy of Sciences of the USA 93:11109-11114.
Verneau O. Catzeflis F, and Furano AV (1998). Determining and dating recent rodent speciation events using L1 (LiNE-1) retrotransposons. Proceedings of the National Academy of Sciences of the USA 95:11284-11289.
Zagorodnyuk IV (1991) Systematic position of Microtus brevirostris (Rodentiformes): materials toward the taxonomy and diagnostics of the "arvalis" group. Vestnik Zoologii 3:26-35.
Figure 2.1 Schematic representation of the RDA procedure. [Flow diagram: genomic DNA of the tester and of the driver is digested, ligated to adapters and PCR-amplified to produce tester and driver amplicons; driver amplicons are digested, new adapters are ligated to the tester amplicons, and the tester is hybridized with an excess of driver; the hybridization products are PCR-amplified, yielding enriched targets for cloning and analysis. A reciprocal experiment can be done by exchanging tester and driver.]
[Figure 2.2: image not recoverable from the source scan; legible labels include M. arvalis and M. rossiaemeridionalis.]
CHAPTER III
ISOLATION OF BINARY SPECIES-SPECIFIC
PCR-BASED MARKERS AND THEIR VALUE FOR
DIAGNOSTIC APPLICATIONS
Abstract
Representational difference analysis (RDA), a technique directed toward
isolation of differences between highly similar complex genomes, was employed
for isolation of species-specific markers. These markers can be easily adapted
for a high-throughput, PCR-based assay in which multiple specimens can be
simultaneously identified based on the presence/absence of amplification
products. One of the important features of RDA performed on genomes of
different species (interspecific RDA) is its ability to preferentially isolate families
of repetitive sequences that are unique to one of the compared genomes and
not present in the other. Such families of repetitive DNA are homoplasy-free
characters that can be used for cost-efficient mass identification of specimens
in a variety of situations ranging from mark-recapture studies to screenings of
egg or larval stages.
Introduction
At least from a theoretical standpoint, specific differences at the DNA
level can easily be used to identify taxa. In reality, however, it is difficult and
expensive to find taxon-specific DNA fragments in genomes composed of
billions of base pairs, especially when the taxa under investigation are closely
related. Methods currently employed to identify taxon-specific markers, such as
sequencing, restriction fragment length polymorphisms, randomly amplified
polymorphic DNA, and mini- and microsatellites, are rather costly and labor-intensive,
especially in cases where multiple samples need to be identified rapidly and
reliably (yes/no). These methods are hit-or-miss because they are not
designed to enhance the probability of isolating a desired marker. Two
alternative methods have been described to increase the probability of
identifying taxon-specific markers. First, library screening (Baker et al.
1997) has been employed to identify markers specific to a variety of taxonomic
levels. However, these authors did not develop primers to test the efficacy of the
markers in a PCR-based presence/absence assay. Library
screening requires significant amounts of DNA and is a labor-intensive
procedure. Second, subtractive hybridization has also been used to identify
unique fragments between closely related genomes (Straus and Ausubel 1990).
This method, however, does not provide adequate subtractive
efficiency when complex genomes (for example, mammalian genomes) are
compared (Lisitsyn 1995).
Lisitsyn and co-workers (1993) developed an approach designed
specifically for isolation of differences between complex genomes:
representational difference analysis (RDA). Although the principle of RDA
resembles that of subtractive hybridization, RDA includes a step that makes it
efficient on complex genomes, namely representation. During
representation, the initial sequence complexity of the genomes subjected to RDA is
reduced to 2%-15% (Lisitsyn 1995) by digesting genomic DNA with a restriction
enzyme, ligating an oligonucleotide adapter to the ends of the restriction fragments,
and amplifying the ligation products by PCR optimized for fragments with an average
length of only 0.6 kb (the products of this amplification are called amplicons and are
prepared separately for both genomes to be compared). To sample the entire
"sequence space" of a genome, several representations can be made with
different restriction enzymes. Representation is followed by a subtraction step
in which a denatured representation of one genome (tester) is hybridized with an
excess of a similar denatured representation of the other genome (driver). Prior
to this hybridization, a second set of oligonucleotide adapters is ligated to the tester
DNA only. During hybridization, DNA strands can reassociate in
one of three ways: forming driver homoduplexes, tester
homoduplexes, or driver/tester heteroduplexes. Because driver is present in great excess,
sequences shared by tester and driver are much more likely to reassociate
as driver/tester heteroduplexes than as tester homoduplexes. Therefore,
tester homoduplexes form predominantly from sequences unique to the tester. Because these
carry adapters at both ends, they can be amplified exponentially (enriched) after
hybridization is completed, whereas heteroduplexes, which carry the adapter on one strand
only, are amplified only linearly. Enriched sequences undergo additional rounds of
hybridization (typically three to four), with an increasing driver-to-tester ratio at
each round, to remove residual common sequences. The final result of the entire
procedure is isolation of tester-specific sequences (Lisitsyn et al. 1995).
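A deliberately idealized calculation can make the subtraction logic concrete. The sketch rests on three assumptions:

* Complementary strands pair at random. A tester strand of a sequence shared with the driver therefore finds a tester partner with probability 1/(1 + r), where r is the driver-to-tester ratio.
* Tester-specific strands always form tester homoduplexes.
* All tester homoduplexes amplify with equal efficiency.

The starting abundance and the three driver-to-tester ratios are hypothetical values for illustration, not the conditions used in this study.

```python
from typing import List, Sequence


def subtraction_rounds(start_fraction: float, driver_ratios: Sequence[float]) -> List[float]:
    """Fraction of a tester-specific sequence among amplifiable products after each round.

    Idealized model (assumptions, for illustration only):
    - random pairing: a shared tester strand pairs with another tester strand
      with probability 1 / (1 + r), where r is the driver:tester ratio;
    - tester-specific strands always form tester homoduplexes;
    - only tester homoduplexes are amplified exponentially, all equally well,
      so the products can simply be renormalized after each round.
    """
    fractions = [start_fraction]
    f = start_fraction
    for r in driver_ratios:
        specific = f                    # tester-specific homoduplexes
        shared = (1.0 - f) / (1.0 + r)  # shared sequences escaping subtraction
        f = specific / (specific + shared)
        fractions.append(f)
    return fractions


if __name__ == "__main__":
    # Hypothetical: the target is one millionth of the tester, and the driver
    # excess rises at each round. Round 0 is the starting tester.
    for round_no, frac in enumerate(subtraction_rounds(1e-6, [100, 800, 100_000])):
        print(f"round {round_no}: {frac:.2e}")
```

In this model, each round multiplies the odds of the target against shared sequences by exactly 1 + r. That is why raising the driver excess at successive rounds can bring a rare difference product close to purity within a few rounds.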
The use of RDA-derived markers for diagnostic purposes is a new application,
as the procedure was originally designed to detect and clone genetic lesions in
cancer (Lisitsyn and Wigler 1995). Only recently have Ushijima et al. (1998)
employed RDA to develop a series of markers based on the B1 repetitive element
(a short interspersed element, SINE) that allow high-throughput genotyping of inbred
rat strains. This approach, however, is not PCR-based and is suitable only for
intraspecific identification (identification of different strains within the same
species). Additionally, it cannot be applied to species for which very little
or no information is available on genome organization.
Our goal was to develop a reliable approach for identification of four
closely related species of voles (Microtus arvalis, M. rossiaemeridionalis, M.
oeconomus, M. agrestis) that inhabit overlapping areas in northern Ukraine
(Baker et al. 1996) and are used in the study of the environmental consequences
of the Chernobyl power plant meltdown. Previously, our group (Nekrutenko et al.
1999) applied RDA to isolate genetic markers capable of distinguishing two of
these taxa, which represent one of the most striking examples of cryptic species
(Microtus arvalis and M. rossiaemeridionalis). As a result, we designed two
primer pairs (M. arvalis-specific and M. rossiaemeridionalis-specific) that permit
rapid identification of these otherwise indistinguishable taxa. This success
prompted us to develop genetic markers that allow identification of individuals of
the two other vole species that occur in our collecting area, M. agrestis and M.
oeconomus, which are separated by a greater genetic distance. Adults of these two
species can be distinguished by morphological characters, whereas
identification of juveniles is difficult if not impossible. Therefore, it is
highly desirable to be able to identify large numbers of specimens
simultaneously, quickly, and reliably, without sacrificing the animals. In this
report, we summarize the features of interspecific RDA and discuss some of its
potential scientific and economic applications.
Materials and Methods
Animals were collected in northern Ukraine from several localities around
Chernobyl and identified by karyotyping as described in Baker et al. (1996).
High-molecular-weight DNA was isolated from frozen liver samples following the
method of Longmire et al. (1997). For RDA experiments, we chose two
male individuals per species. RDA was performed as described in Lisitsyn and
Wigler (1995). We used the BglII restriction enzyme for representation. Two
reciprocal experiments were performed using (I) M. agrestis amplicons as tester
(M. oeconomus as driver) and (II) M. oeconomus amplicons as tester (M.
agrestis as driver). After three rounds of RDA, the most prominent bands were
excised from the agarose gel, purified using a Qiagen gel purification kit (Qiagen Inc.,
Valencia, CA) and cloned into the pGEM®-T vector (Promega Corp., Madison, WI).
After transformation into E. coli JM109 cells, fifteen colonies per excised band
were screened by PCR using the J-adapter (JBam24) as a primer (Lisitsyn and
Wigler 1995). Products of this amplification were then digested with the Sau3A
restriction enzyme (Promega Corp.) to detect the most common sequences. One
clone from a series displaying a common pattern was labeled with [α-32P]-dCTP
(NEN™ Life Sciences Products, Inc., Boston, MA) and used as a probe in a
Southern blot experiment with BamHI-digested genomic DNA of M. agrestis and
M. oeconomus. Probes that hybridized to the genomic DNA of one
species but gave no detectable hybridization with the genomic DNA of
the other were considered to represent "true" difference products.
Prehybridization and hybridization were performed at 42°C using the following
solution: 5xSSC, 1% SDS, 0.005% non-fat dry milk, 5x Denhardt's solution, 50%
formamide. "True" products were then sequenced using the dRhodamine dye
terminator kit (Applied Biosystems, a division of Perkin Elmer Corp., Foster City,
CA) and analyzed on an ABI™ 310 automated sequencer (Applied Biosystems).
Oligonucleotide primers were designed to each sequenced difference product
using the Oligo® 4.05 computer program and tested on genomic DNA isolated from
individuals of M. oeconomus and M. agrestis.
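The screening step compared Sau3A digests of cloned inserts to find the most common difference product. That comparison amounts to grouping clones by restriction pattern. A hedged sketch of the bookkeeping follows. The clone names, fragment sizes, and 10 bp sizing resolution are hypothetical, and real gel scoring involves judgement that the rounding step only approximates.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pattern = Tuple[int, ...]


def pattern_key(sizes: List[float], resolution: int = 10) -> Pattern:
    """Round fragment sizes (bp) to the gel's sizing resolution and sort them."""
    return tuple(sorted(int(round(s / resolution)) * resolution for s in sizes))


def most_common_pattern(
    clones: Dict[str, List[float]], resolution: int = 10
) -> Tuple[Pattern, int, str]:
    """Return the most frequent Sau3A pattern, its clone count, and one representative clone."""
    keys = {clone: pattern_key(sizes, resolution) for clone, sizes in clones.items()}
    pattern, count = Counter(keys.values()).most_common(1)[0]
    # Pick a deterministic representative (alphabetically first) to use as a probe.
    representative = min(clone for clone, key in keys.items() if key == pattern)
    return pattern, count, representative


if __name__ == "__main__":
    # Hypothetical fragment sizes scored from a gel for four clones of one excised band.
    scored = {
        "clone01": [150, 240],
        "clone02": [148, 243],
        "clone03": [390],
        "clone04": [152, 238],
    }
    print(most_common_pattern(scored))  # ((150, 240), 3, 'clone01')
```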
Results and Discussion
We performed two reciprocal RDA experiments that we expected to yield
markers distinguishing M. agrestis from M. oeconomus. In the first experiment,
M. agrestis genomic DNA was used as the tester and compared against M.
oeconomus DNA used as the driver. In the second experiment, tester and driver
were switched. We analyzed only a small subset of the difference products
generated in each of the two experiments. Hybridization patterns shown in Fig 3.1
indicate that both isolated markers are repetitive elements (multiple bands are
present). Both difference products (Moe1, M. oeconomus-specific, and Mag3,
M. agrestis-specific) were sequenced and deposited in GenBank under the
following accession numbers: XXXXXX, XXXXXX. Using these sequences, we
were able to design species-specific PCR primers that produce an amplification
product from the genomic DNA of one species and do not yield any visible
amplification from the other species (Fig 3.1). These primers therefore provide a
convenient tool for simultaneous identification of multiple samples (both primer
pairs have identical PCR profiles, Table 3.1), where the only limitation is the
capacity of the thermal cycler used. As stated above, our goal was not only to find
markers discriminating M. agrestis from M. oeconomus but also to discriminate these
two from the M. arvalis/M. rossiaemeridionalis group, as juvenile individuals of these
four species look virtually identical and cannot be distinguished morphologically.
We tested each primer pair on genomic DNA of each of these four species (Fig 3.1).
Note that primers originally designed to discriminate species within the M. arvalis/M.
rossiaemeridionalis and M. agrestis/M. oeconomus pairs do not generate
amplification when tested on genomic DNA of species belonging to the other
pair. Primer sequences and PCR conditions are given in Table 3.1. A similar
strategy can be used to develop RDA-derived markers for other applications.
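The cross-testing described above implies a simple presence/absence key: each of the four species-specific primer pairs is expected to amplify only from its own species. The sketch below shows how scored PCR results for one specimen might be turned into an identification. The marker labels are hypothetical names for the four primer pairs. The interpretations offered for unexpected profiles (no product, or more than one product) are assumptions, not results reported here.

```python
from typing import Dict

# Hypothetical labels: each species-specific primer pair is named after its target species.
MARKERS = ("arvalis", "rossiaemeridionalis", "agrestis", "oeconomus")


def identify(observed: Dict[str, bool]) -> str:
    """Map a presence/absence profile (True = diagnostic band present) to a species.

    A specimen is identified only when exactly one primer pair amplifies; other
    profiles are flagged rather than forced into a species (assumed handling).
    Markers missing from `observed` are treated as absent.
    """
    unknown = set(observed) - set(MARKERS)
    if unknown:
        raise ValueError(f"unexpected marker names: {sorted(unknown)}")
    positives = [m for m in MARKERS if observed.get(m, False)]
    if len(positives) == 1:
        return f"Microtus {positives[0]}"
    if not positives:
        return "unresolved: no product (failed PCR or species outside the key)"
    return "unresolved: several products (possible contamination or mixed sample)"


if __name__ == "__main__":
    specimen = {"arvalis": False, "rossiaemeridionalis": False, "agrestis": True, "oeconomus": False}
    print(identify(specimen))  # Microtus agrestis
```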
The results described above permit us to summarize and discuss features of
RDA performed on genomes of different species, or interspecific RDA. In most
cases, difference products isolated by interspecific RDA are repetitive
sequences of the tester genome that are not present in the driver. Because of
the properties of RDA (the kinetic enrichment step), it preferentially isolates
repetitive sequences when they constitute differences between tester and driver
(Navin et al. 1996). As we showed previously (Nekrutenko et al. 1999),
repetitive sequences that are present in one species and absent in the other can
be successfully isolated by RDA even when the species under investigation are
cryptic and have highly similar complex genomes. It is apparent that families of
repetitive sequences are homoplasy-free characters, as the probability of
evolving two identical sequence families in separately evolving lineages is virtually zero
(Verneau et al. 1998). Additionally, differences found between two genomes
can be easily converted into PCR-based markers that discriminate
between taxa based on the presence/absence of amplification products. Because
repetitive sequences are represented in the genome by more than one locus,
and in a PCR-based application multiple loci are amplified simultaneously, it is
unlikely that an intense diagnostic amplification product will fail to appear.
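The argument that multilocus amplification protects against null alleles can be quantified under one strong simplifying assumption: each of n amplified copies fails independently with the same probability p, for example because of a primer-site mutation. The diagnostic band is then missing only if every copy fails, which happens with probability p^n. The values of p and n below are hypothetical. Copies that share a mutation by descent would violate the independence assumption.

```python
def prob_no_diagnostic_band(p_copy_failure: float, n_copies: int) -> float:
    """Probability that all n independently amplifying copies fail (p ** n).

    Assumes independent failures with identical probability; this is an
    idealization, not an estimate for any particular marker.
    """
    if not 0.0 <= p_copy_failure <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if n_copies < 1:
        raise ValueError("need at least one copy")
    return p_copy_failure ** n_copies


if __name__ == "__main__":
    p = 0.1  # hypothetical per-copy failure probability
    for n in (1, 5, 20):
        print(f"{n:>2} copies: {prob_no_diagnostic_band(p, n):.1e}")
```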
In this study we have created a diagnostic key to identify individuals of
Microtus species. The need for this key results from the fact that
a wide variety of researchers will be capturing and releasing individuals in
mark-recapture studies of survival and dose monitoring. Two species, M. arvalis and
M. rossiaemeridionalis, cannot be identified to species without either voucher
specimens or molecular analysis. Additionally, not all of the researchers who
work here are adequately trained to identify immature Microtus under field
conditions. By saving a drop of blood, an ear clip or a toe, it will be possible to
isolate sufficient DNA to use this PCR-based method to positively identify all
individuals of this genus. Beyond this problem, other possible uses
might include identification of egg or larval stages where these forms are
particularly problematic. Finally, we think that RDA has great potential for
developing diagnostic markers for forensic, law enforcement and conservation
issues. For example, in the caviar market there is considerable concern that
caviar may be collected from endangered species. Using RDA, it should be
possible to develop a series of species-specific markers that could be used to
document that a sample came from a legal species, as well as to document
which endangered species was used in cases of illegal trade. Additionally,
only a single ovum would be required to perform the necessary test on each
sample, and with the use of PCR the cost per sample should be significantly lower
than for sequencing or Southern blot hybridization.
Literature Cited
Baker RJ, Hamilton MJ, Van Den Bussche RA et al. (1996). Small mammals from the most radioactive sites near the Chornobyl nuclear power plant. Journal of Mammalogy 77:155-170.
Baker RJ, Longmire JL, Maltbie M et al. (1997). DNA synapomorphies for a variety of taxonomic levels from a cosmid library from the New World bat Macrotus waterhousii. Systematic Biology 46:579-589.
Lisitsyn NA, Lisitsyn N, Wigler M (1993). Cloning the differences between two complex genomes. Science 259:946-951.
Lisitsyn NA (1995). Representational difference analysis: finding the differences between genomes. Trends in Genetics 11:303-307.
Lisitsyn NA, Wigler M (1995). Representational difference analysis in detection of genetic lesions in cancer. Methods in Enzymology 254:291-304.
Longmire JL, Maltbie M, Baker RJ (1997). Use of "lysis buffer" in DNA isolation and its implication for museum collections. Occasional Papers of the Museum of Texas Tech University 163:1-3.
Navin A, Prekeris R, Lisitsyn NA et al. (1996). Mouse Y-specific repeats isolated by whole chromosome representational difference analysis. Genomics 36:349-353.
Nekrutenko A, Makova KD, Chesser RK, Baker RJ (1999). Representational difference analysis to distinguish cryptic species. Molecular Ecology (in press).
Straus D, Ausubel FM (1990). Genomic subtraction for cloning DNA corresponding to deletion mutations. Proceedings of the National Academy of Sciences of the USA 87:1889-1893.
Ushijima T, Nomoto T, Sugimura T et al. (1998). Isolation of 48 genetic markers appropriate for high throughput genotyping of inbred rat strains by B1 repetitive sequence-representational difference analysis. Mammalian Genome 9:1008-1012.
Verneau O, Catzeflis F, Furano AV (1998). Determining and dating recent rodent speciation events using L1 (LINE-1) retrotransposons. Proceedings of the National Academy of Sciences of the USA 95:11284-11289.
Table 3.1 Primer sequences and PCR conditions. [Table content not recoverable from the source scan.]
[Figure 3.1: image not recoverable from the source scan; legible labels include M. agrestis and M. oeconomus.]
CHAPTER IV
REPRESENTATIONAL DIFFERENCE ANALYSIS
IN THE STUDY OF ALLOPOLYPLOIDS:
SUBGENOME-SPECIFIC MARKERS AND
FURTHER EVIDENCE OF BIASED CONCERTED EVOLUTION
IN ALLOTETRAPLOID COTTON Gossypium hirsutum.
Abstract
Here I report on the utility of representational difference analysis (RDA)
in the study of the composition of allopolyploids, as well as on the ability of
RDA-derived markers to correctly depict patterns of molecular evolution within
allopolyploid nuclei. We developed a series of genetic markers specific to the A
and D genomes of diploid cottons. Polymerase chain reaction (PCR) primers
designed to these markers yield an amplification product only when the
corresponding genomic DNA is used as a template. Moreover, these same
A- and D-genome-specific primers produce amplification products with genomic
DNA from the allotetraploid cotton Gossypium hirsutum. Therefore, this approach
can be used to study allopolyploids of unknown composition by developing a
series of binary markers specific to suspected diploid progenitors. One of the
RDA-derived markers reported here represents a polymorphic amplifiable
restriction fragment (PARF), a sequence found in both the A and D genomes but
differentially flanked by restriction sites. Sequence analysis of the PARF
sequences from the diploid cottons and from the allotetraploid G. hirsutum
confirms the previously reported observation of interlocus concerted evolution
(Wendel et al. 1995ab) among homeologous sequences within the polyploid
nucleus: sequences corresponding to the A and D genomes are concerted to the
D-genome type, eliminating the A-genome type. Because this conclusion is drawn
from a presumably neutral marker, it indicates that interlocus concerted evolution is
a common event in allopolyploid genomes and that repeats undergoing this
process need not be arranged in tandem.
Introduction
Polyploidy has played, and perhaps still plays, an important role in the
evolution of angiosperms, in which an estimated 70% of all extant species have
gone through a polyploidization event (Masterson 1994). Once the original
diploid genomes are united in the allopolyploid nucleus, an entirely new
genetic environment is created in which evolutionary changes might take new
directions relative to the changes occurring in the corresponding diploid taxa
(Reinisch et al. 1994; Cronn et al. 1996; Wendel et al. 1995ab; Jiang et al.
1998). An ideal model system for studying such instances would be an
allopolyploid taxon whose diploid progenitors are known. For example,
cottons (genus Gossypium) include both diploid and tetraploid species. All
diploid species are separated into seven genome classes (A, B, C, D, E, F, and
G) based on observations of chromosomal pairing (Endrizzi et al. 1984). This
genus also includes five allotetraploid species (2n=4x=52) that originated
as a result of hybridization between an A-genome and a D-genome species. It is
believed that the hybridization event that gave rise to allotetraploid cottons
occurred between an A-genome diploid taxon closely resembling the present-day
species G. arboreum or G. herbaceum and a D-genome donor likely similar
to today's G. raimondii or G. gossypioides. Among allotetraploid
species, two are important fiber and oilseed crops: G. hirsutum ("upland" cotton)
and G. barbadense ("pima" cotton).
How does the sequence composition of the original diploid genomes change
after allopolyploidy is established? Wendel et al. (1995ab) analyzed highly
repeated arrays of the internal transcribed spacer regions and 5.8S rRNA gene
sequences from selected species of diploid cotton and five allotetraploid
species. In all diploid and tetraploid species of cotton examined, these
sequence arrays were homogenized by concerted evolution. Interestingly, in all
five tetraploid species, which should theoretically contain arrays of both types (A
and D), the examined sequences were concerted to either the D type (four species,
including G. hirsutum and G. barbadense) or the A type (G. mustelinum). This
example is a compelling illustration of intergenomic concerted evolution in
allopolyploid taxa. The A genome is physically larger than the D genome, yet
both have identical recombinational lengths (Reinisch et al. 1994), implying that
the D genome is more recombinationally active. While some diploid A-genome
species were domesticated, D-genome diploid species do not produce
spinnable fibers and have never been cultivated (Jiang et al. 1998). However,
in the commercially important allotetraploids G. hirsutum and G. barbadense,
quantitative trait loci that affect fiber quality and yield were mapped to the D
subgenome, suggesting new evolutionary pathways created by tetraploidization
(Jiang et al. 1998). Similar patterns of concerted evolution were suggested in
less comprehensive studies of other allopolyploid plants such as
tobacco (Volkov et al. 1999), wheat (Nagaki et al. 1998), and synthetic
polyploids of Brassica (Song et al. 1995), as well as of some animal species
(Hillis et al. 1991).
Given these data, evolutionary processes occurring within
allopolyploid nuclei are an interesting and largely unexplored area, with our
present understanding based on data from a limited number of molecular
markers. Being able to understand and predict changes occurring in polyploid
genomes is important theoretically and practically because it can permit
prediction of the fate of synthetic polyploids and new cultivars, for which the
results obtained quite frequently deviate from those desired. In this report I propose a new
tool for isolation of genome-specific markers in allopolyploid organisms:
representational difference analysis. This method allowed us to (1) develop
markers specific to each of the A and D genomes of cotton, demonstrating how
a similar strategy can be applied to allopolyploids of unknown composition, and
(2) infer patterns of evolution of homeologous sequences using data from a
polymorphic amplifiable restriction fragment, a type of marker frequently
isolated by RDA.
Materials and Methods
Genomic DNA was isolated from young leaves of G. arboreum, G.
herbaceum, G. hirsutum, G. thurberi, and G. raimondii using a QIAGEN
plant DNA maxi kit (QIAGEN). RDA was carried out as described in Lisitsyn and
Wigler (1995). We used the BamHI restriction endonuclease to cut genomic DNA
and R-Bam adapters (Lisitsyn et al. 1993) to prepare tester and driver
amplicons. After three rounds of RDA, the most prominent difference
products were excised from an agarose gel, purified using a QIAGEN gel
purification kit (QIAGEN) and ligated into the pGEM-T vector (Promega). After
transformation, we screened 15 clones per excised product using the J-Bam24
adapter (Lisitsyn and Wigler 1995) as an amplification primer. In most cases all
15 clones were identified as having the correct insert based on amplification
product size. These same products were cut with the Sau3A restriction enzyme
(Promega) and run on a 2% agarose gel to detect the most abundant difference
products based on restriction fragment patterns. The most abundant products were
reamplified and used as probes in a Southern blot experiment performed to
confirm that these products are true difference products (present in the genomic
DNA of the tester but absent in the driver). We used [α-32P]-dCTP (NEN laboratories)
and a random primed labeling kit (Boehringer Mannheim) to label the probes.
Probes were then hybridized to positively charged nylon filters (Boehringer
Mannheim) carrying BamHI-digested genomic DNA of tester and driver at 42°C
in a hybridization solution containing 5xSSC, 1% SDS, 0.005% non-fat dry milk,
5x Denhardt's solution, 50% formamide. Products that were identified as true
difference products were then sequenced using the dRhodamine Dye Terminator
kit (Perkin Elmer) and analyzed on an ABI 310 automated sequencer (Applied
Biosystems). The nucleotide sequences obtained were used to design difference
product-specific primers with the help of the Oligo 4.05 software. Primer
sequences and amplification conditions are given in Table 4.1. Nucleotide
divergence analysis was performed using the program DnaSP 3.0 (Rozas and
Rozas 1999).
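DnaSP computes many polymorphism and divergence statistics. The sketch below illustrates two of the simplest quantities behind statements such as "all nucleotide sites are fixed". It counts segregating sites and computes the mean pairwise proportion of differing sites (uncorrected p-distance) for a set of aligned sequences. This is an independent toy implementation, not DnaSP's code. It skips gapped positions, and the sequences are invented.

```python
from itertools import combinations
from typing import Dict, List


def _check_aligned(aln: Dict[str, str]) -> List[str]:
    """Return upper-cased sequences after checking they form an alignment."""
    seqs = [s.upper() for s in aln.values()]
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two sequences of equal (aligned) length")
    return seqs


def segregating_sites(aln: Dict[str, str]) -> int:
    """Number of gap-free alignment columns containing more than one nucleotide."""
    seqs = _check_aligned(aln)
    return sum(1 for col in zip(*seqs) if "-" not in col and len(set(col)) > 1)


def mean_p_distance(aln: Dict[str, str]) -> float:
    """Average over sequence pairs of differing sites / compared sites (gaps excluded pairwise)."""
    seqs = _check_aligned(aln)
    dists = []
    for a, b in combinations(seqs, 2):
        compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if not compared:
            raise ValueError("a sequence pair shares no gap-free sites")
        dists.append(sum(x != y for x, y in compared) / len(compared))
    return sum(dists) / len(dists)


if __name__ == "__main__":
    toy = {"A_diploid": "ACGTACGTAC", "D_diploid": "ACGTTCGTAC", "AD_tetraploid": "ACGTTCGTAC"}
    print(segregating_sites(toy), round(mean_p_distance(toy), 4))  # 1 0.0667
```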
Results
Isolation of DNA Fragments Specific to A and D Genomes. Two
reciprocal RDA experiments were performed in order to isolate A- and
D-genome-specific sequences. In the first experiment we used DNA from G.
arboreum (A genome) as the tester and DNA from G. raimondii (D genome) as
the driver. In the second experiment tester and driver were reversed. Each
RDA experiment yields a large number of difference products enriched to
different extents. We analyzed only a small subset of such difference products
in each experiment. The first experiment (A genome as the tester) yielded
two difference products (A20, A36) that were confirmed as presence/absence
differences based on Southern blot hybridization (Fig 4.1A). Labeled probe A36
produced a strong hybridization signal, implying that this is a highly repetitive
element, whereas probe A20 appears to be a low- or perhaps single-copy
element based on the hybridization intensity. The second experiment (D genome
as the tester) yielded two presence/absence products (D10 and D13) and one
so-called polymorphic amplifiable restriction fragment (D1, Fig. 4.1A). Note that
hybridization intensity for D-specific probes is significantly lower than
for A-specific probes. Polymorphic amplifiable restriction fragments
(PARFs; see Lisitsyn et al. 1993) are sequences that are present in both tester
and driver genomes but are differentially flanked by restriction sites. As
shown in Fig. 4.1A, the labeled D1 probe hybridizes to different fragments of A and
D genomic DNAs. Thus, the initial result was two DNA fragments specific to each
of G. arboreum (A genome) and G. raimondii (D genome), and one fragment
present in both species but differentially flanked by restriction sites.
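An in silico representation can illustrate the PARF category. A sequence shared by two genomes yields an amplicon only in the genome in which it is bounded by two nearby restriction sites. The minimal sketch below makes several assumptions:

* It uses BamHI (G^GATCC), the enzyme used in these experiments.
* The sequences are toy data.
* The 1 kb amplifiability cutoff is hypothetical. The text gives an average amplicon length of about 0.6 kb.

A probe from the shared region would still hybridize to both genomes, but to fragments of different sizes.

```python
import re
from typing import List

BAMHI_SITE = "GGATCC"  # BamHI recognition sequence; cuts between G and GATCC


def amplifiable_fragments(seq: str, max_len: int = 1000) -> List[int]:
    """Lengths of BamHI fragments short enough to be represented in an amplicon.

    Only fragments bounded by BamHI sites at both ends can carry adapters on
    both ends, so the terminal pieces of the input sequence are ignored.
    max_len is a hypothetical amplifiability cutoff in bp.
    """
    # GGATCC cannot overlap itself, so non-overlapping matches find every site.
    cuts = [m.start() + 1 for m in re.finditer(BAMHI_SITE, seq.upper())]
    lengths = [right - left for left, right in zip(cuts, cuts[1:])]
    return [n for n in lengths if n <= max_len]


if __name__ == "__main__":
    core = "ACGT" * 100  # toy 400 bp region shared by both genomes
    # Genome A: the shared region is bounded by two BamHI sites.
    genome_a = "T" * 50 + "GGATCC" + core + "GGATCC" + "T" * 50
    # Genome D: one flanking site is mutated (GGTTCC), so the next site is ~2 kb away.
    genome_d = "T" * 50 + "GGATCC" + core + "GGTTCC" + "T" * 2000 + "GGATCC" + "T" * 50
    print("A:", amplifiable_fragments(genome_a))  # A: [406]
    print("D:", amplifiable_fragments(genome_d))  # D: []
```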
To test whether these fragments can be used not only to distinguish the
two species of diploid cotton but also to distinguish the A genome from the D
genome, we designed amplification primers for each of the four difference
products (A20, A36, D10, and D13) and for the PARF (D1) described above. Using
these primers (sequences are given in Table 4.1), we amplified genomic DNA of
the two original species (G. arboreum and G. raimondii), genomic DNA of two other
diploid cottons (G. herbaceum [A genome] and G. thurberi [D genome]), and
genomic DNA of the allotetraploid cotton G. hirsutum. We selected G.
thurberi from among the D genome cottons because it is relatively distant from
G. raimondii in the phylogeny published by Wendel et al. (1995b). Results of
this amplification are shown in Fig. 4.1B. We conclude that our markers are
diagnostic for the A and D genomes within this sample and that they also
generate products of the same size from the tetraploid species. D1-specific
primers produced amplification products from all five plant samples.
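Once each primer pair is known to amplify only one genome, reading off the genomic composition of a sample is a simple lookup. The sketch below assumes a binary (product/no product) score per primer pair; the marker-to-genome assignments follow the text (A20 and A36 for A, D10 and D13 for D), and D1 is excluded because, as a PARF, it amplifies both genomes. The function name and the "all markers must amplify" rule are illustrative choices, not part of the original analysis.

```python
# Diagnostic markers from the text; D1 (a PARF) amplifies both genomes and is left out.
MARKER_GENOME = {"A20": "A", "A36": "A", "D10": "D", "D13": "D"}

def infer_genomes(amplified: set[str], require_all: bool = True) -> set[str]:
    """Return the genomes whose diagnostic markers amplified.

    With require_all=True a genome is called only if every one of its markers
    produced a product; otherwise a single marker is enough.
    """
    by_genome: dict[str, list[str]] = {}
    for marker, genome in MARKER_GENOME.items():
        by_genome.setdefault(genome, []).append(marker)
    rule = all if require_all else any
    return {g for g, markers in by_genome.items()
            if rule(m in amplified for m in markers)}

# Amplification patterns of the kind described in the text.
print(sorted(infer_genomes({"A20", "A36", "D1"})))                # A genome diploid
print(sorted(infer_genomes({"D10", "D13", "D1"})))                # D genome diploid
print(sorted(infer_genomes({"A20", "A36", "D10", "D13", "D1"})))  # allotetraploid
```

Requiring all markers of a genome is the conservative choice; relaxing it with `require_all=False` trades specificity for tolerance of a failed reaction.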
Analysis of the Difference Products. We first sequenced the amplification
products shown in Fig. 4.1B directly (without cloning). Based on hybridization,
some of our probes are repetitive elements. Primers designed to these probes
amplify multiple loci simultaneously, which is a powerful feature for
diagnostic applications because the probability of encountering a null allele
is low. In theory, however, repeated elements can be polymorphic, especially
when they represent non-coding regions, which poses a problem for direct
sequencing. We obtained clean chromatograms from PCR products generated with
A20, D10, and D1 amplification primers. Products of primer pairs designed to
A36 and D13 required cloning prior to sequencing. The sequences were deposited
in GenBank under accession numbers xxxxxxx-xxxxxxx.
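Why a mixture of polymorphic copies defeats direct sequencing is easy to see: at sites where copies differ, the chromatogram shows overlapping peaks, which read as an ambiguity rather than a single base. The sketch below mimics this by computing an IUPAC-coded consensus of pre-aligned, equal-length copies. It ignores peak heights (so a rare variant counts as much as a common one), assumes uppercase A/C/G/T with no gaps, and uses made-up sequences.

```python
# IUPAC nucleotide ambiguity codes keyed by the set of bases observed at a site.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

def iupac_consensus(aligned):
    """Column-by-column consensus of aligned copies, as a mixed-template read would report it."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return "".join(IUPAC[frozenset(column)] for column in zip(*aligned))

def ambiguous_sites(consensus):
    """Number of positions that are not a single unambiguous base."""
    return sum(base not in "ACGT" for base in consensus)

# Hypothetical copies of a repeat amplified together by one primer pair.
copies = ["ACGTTAGC", "ACGTCAGC", "ACATTAGC"]
consensus = iupac_consensus(copies)
print(consensus, ambiguous_sites(consensus))
```

Every polymorphic site becomes an ambiguity code, which is why products with many variable copies (here, A36 and D13) had to be cloned before sequencing.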
Sequencing Analysis of the Individual Difference Products. PCR
products generated by A20-specific primers were obtained from the diploids G.
arboreum and G. herbaceum and from the allotetraploid G. hirsutum.
Interestingly, there is no variation among these sequences: all 547 nucleotide
sites are identical among the three species. When the A20 sequence is compared
to the non-redundant Entrez database using the BLAST program
(http://www.ncbi.nlm.nih.gov/BLAST; we used the blastx algorithm, which compares
all six possible reading frames of a nucleotide sequence against the protein
database), it matches a short region of the Lilium henryi del1-46
retrotransposon (accession X13886) reported by Smyth et al. (1989). Even though
the BLAST score is low (37), the matching regions are 94% similar (Table 4.2).
Zhao et al. (1998) reported several G. barbadense repetitive DNA clones with
stronger matches to the Lilium henryi del1-46 retrotransposon; however, the A20
sequence differs from those reported by these authors.
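The first step of a blastx search, translating the nucleotide query in all six reading frames, can be sketched in a few lines; the same translation underlies a scan for continuous open reading frames such as the one reported for D1. This is an illustrative sketch using the standard genetic code, not the BLAST implementation. "Longest stop-free stretch" is a simplified ORF criterion (it does not require a start codon), and the example sequence is made up.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons; codons containing non-ACGT characters become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frames(seq):
    """Protein translations of frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

def longest_stop_free_stretch(seq):
    """Length, in codons, of the longest run without a stop codon in any frame."""
    return max(len(run) for protein in six_frames(seq).values()
               for run in protein.split("*"))

for frame, protein in six_frames("ATGGCCTAA").items():
    print(frame, protein)
print(longest_stop_free_stretch("ATGGCCTAA"))
```

For a real query of several hundred bases, a maximum stretch far shorter than the sequence in every frame is the kind of evidence behind the statement that no continuous ORF was detected.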
Amplification products obtained with A36-specific primers from the DNA of
the two A genome diploids and the allotetraploid were cloned, and two clones
per species were sequenced. In BLAST searches these sequences showed similarity
to the repetitive DNA clone pXP077 from G. barbadense (accession AF060598)
reported by Zhao et al. (1998). All sequenced clones were polymorphic (no two
clones were identical), with nucleotide diversity (π; Nei and Li 1979) ranging
from 0.075 for G. arboreum clones to 0.129 for G. hirsutum clones.
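Nucleotide diversity π (Nei and Li 1979) is the average proportion of differing nucleotide sites over all pairs of sequences. The sketch below computes it for pre-aligned clones of equal length; it assumes no gaps or ambiguous bases, applies no multiple-hit correction, and uses made-up sequences. Dedicated programs such as DnaSP (Rozas and Rozas 1999) also handle gaps, missing data, and sampling variances.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def nucleotide_diversity(seqs):
    """Mean pairwise p-distance (pi) over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical aligned clones (not the A36 or D13 data).
clones = ["ACGTACGTAC", "ACGTACGTTC", "ACCTACGTAC"]
print(round(nucleotide_diversity(clones), 4))
```

With only two clones per species, as for A36 and D13, there is a single pair, so π reduces to the proportion of sites at which the two clones differ.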
Sequences of PCR products obtained with D10-specific primers did not
produce any significant matches with Entrez entries. Products of D13-specific
primers were cloned, and two clones per product were sequenced for
each of the three cottons G. thurberi, G. raimondii, and G. hirsutum. Again, no
two clones were identical. Calculated π values range from 0.200 for G.
thurberi clones to 0.239 for G. hirsutum clones.
Sequences corresponding to PARF D1 were obtained for all five
analyzed species of cotton. A BLAST search with these sequences against the
Entrez database did not identify any significant matches, and we were unable to
detect any continuous open reading frames (ORFs); therefore, this region may be
non-coding and possibly selectively neutral. Because these sequences are
homeologous, they offer a unique opportunity to test whether both sequence
types (A genome-derived and D genome-derived) still co-exist in the
allotetraploid G. hirsutum or have been homogenized by concerted evolution. We
directly sequenced PCR products from all five species (two A genome cottons,
two D genome cottons, and the allopolyploid) and performed a phylogenetic
analysis. Fig. 4.2 shows the resulting Neighbor-Joining tree. The G. hirsutum
sequence clusters with the G. raimondii sequence, supporting the observation
previously made by Wendel et al. (1995) that in G. hirsutum repeated ribosomal
DNA (rDNA) sequences are homogenized to the D type. Because direct sequencing
conceals variation and reveals only the most abundant sequence type, we cloned
PCR products obtained for G. hirsutum and sequenced five clones. Although all
five clones had different sequences, in the phylogenetic analysis all of them
again clustered with the G. raimondii sequence, confirming our initial
observation. Notably, nucleotide variation among these five G. hirsutum
clones was low (π = 0.002).
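The question asked of each G. hirsutum sequence, whether it is closer to the A genome or to the D genome diploid sequence, can be sketched with a simple nearest-parent rule on p-distances. This is a simplified stand-in for the Neighbor-Joining analysis described above, not a reimplementation of it; the function name and the sequences are hypothetical, and equal distances are reported as unresolved.

```python
from collections import Counter

def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def assign_parental_type(clone: str, a_ref: str, d_ref: str) -> str:
    """Nearest-parent rule: label a clone by the closer diploid reference."""
    d_a, d_d = p_distance(clone, a_ref), p_distance(clone, d_ref)
    if d_a == d_d:
        return "unresolved"
    return "A-type" if d_a < d_d else "D-type"

# Hypothetical aligned sequences (not the D1 data).
a_ref = "ACGTACGTACGTACGT"
d_ref = "ACTTACGAACGTTCGT"
hirsutum_clones = ["ACTTACGAACGTTCGA", "ACTTACGAACCTTCGT", "ACTTACGTACGTTCGT"]

labels = [assign_parental_type(c, a_ref, d_ref) for c in hirsutum_clones]
print(Counter(labels))
```

If both homeologous types had persisted in the allotetraploid, a sample of clones would be expected to contain both labels; a sample that is uniformly one type is the pattern interpreted here as homogenization.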
Discussion
RDA allows the study of allopolyploid genomes in an effective and
precise fashion without the need to construct libraries and screen for
non-cross-hybridizing clones. This approach can be used in two types of
applications. First, to determine the composition of a given allopolyploid, a
set of diagnostic markers can be developed specific to each of its candidate
diploid progenitors. When these markers are applied to the allopolyploid, it is
possible to determine which of the candidate diploids contributed their genomes
to the allopolyploid nucleus. This approach is illustrated in Fig. 4.1B, where
all A- and D-specific primer pairs generate amplification products when the
allotetraploid's DNA is used as a template. Second, sequence analysis of
difference products may reveal changes occurring to homeologous sequences in
allotetraploid nuclei. When RDA is performed on closely related species, it is
possible to isolate differences representing sequences that are in fact present
in both tester and driver DNAs but either vary in copy number or are
differentially flanked by restriction sites (PARFs). In our example, PARF D1
sequences are present in both A and D genome diploids. Analysis of sequence
data for this genomic fragment led us to conclude that in the allotetraploid
species A- and D-type sequences were homogenized to the D type by interlocus
concerted evolution, as reported previously for rDNA by Wendel et al. (1995a, b).
Additionally, sequence data from A36 and D13 clones indicate that, on average,
nucleotide variability of D genome sequences is higher than that of A genome
sequences (Table 4.2). This observation is compatible with data reported by
Reinisch et al. (1994) and Wendel et al. (1996). Reinisch et al. (1994) reported
nearly identical recombinational lengths of the A and D genomes based on RFLP
maps, implying that the physically smaller D genome is more recombinationally
active. The latter group sequenced multiple clones representing the 5S rRNA
gene and intergenic spacer region. For these genome fragments, nucleotide
variability (π) was higher for D genome plants than for A genome species (0.058
vs. 0.039). However, a greater number of difference products and corresponding
clones must be analyzed to test this observation statistically.
The Lilium henryi retrotransposon del-like sequence (A20) reported here is an
interesting finding. Recently, Zhao et al. (1998) analyzed a large sample of
cotton (G. barbadense) repetitive DNA clones and identified three A
genome-specific clones (pXP030, pXP067, and pXP1-58) corresponding to the
reverse transcriptase and integrase regions of the retrotransposon. Our
sequence does not match any of those reported by these authors, possibly
because it corresponds to a different region of the reverse transcriptase gene
(Table 4.2; see also Smyth et al. 1989). It is surprising that this
retrotransposon-like sequence is restricted to the A genome, because similar
sequences are found in a variety of organisms ranging from yeast to fruit flies
(Table 4.2).
Based on Southern hybridization patterns (Fig. 4.1A), we have no
reason to believe that any of the sequences reported here represent tandemly
repeated elements. The arrangement of repeats is an important factor
affecting concerted evolution: tandemly repeated sequences typically have a
higher probability of being homogenized by unequal crossing-over or gene
conversion (Li 1997). Nevertheless, the homogenization observed in the PARF
sequences is strikingly similar to the pattern described by Wendel et al.
(1995) for rDNA arrays in cotton. Thus, there are two reports of biased (in
this case D genome-biased) homogenization of homeologous repeats in an
allopolyploid, which may reflect a general pattern. It would therefore be
interesting to investigate changes to homeologous sequences in another
allotetraploid cotton, G. mustelinum, in which rDNA repeats have concerted to
the A type (Wendel et al. 1995a). Although the precise mechanisms of concerted
evolution are yet to be determined, allotetraploids offer a model system for
studying this important process.
As noted above, a single RDA experiment yields hundreds of difference
products. In the present report, we analyzed only two products from one
experiment and three from the other. Potentially, a library of difference
products can be constructed and screened for tester-specific clones and PARFs.
Markers identified this way can be used in many applications, ranging from
construction of genetic maps (see, for example, Toyota et al. 1996) to
large-scale analysis of polymorphisms.
Literature Cited
Cronn RC, Zhao X-P, Paterson AH, and Wendel JF (1996). Polymorphism and concerted evolution in a tandemly repeated gene family: 5S ribosomal DNA in diploid and allopolyploid cottons. Journal of Molecular Evolution 42:685-705.
Endrizzi JE, Turcotte EL, and Kohel RJ (1984). Qualitative genetics, cytology, and cytogenetics. In Cotton (ed. Kohel RJ and Lewis CF). ASA/CSSA/SSSA Publishers, Madison, Wisconsin.
Hillis DM, Moritz C, Porter CA, and Baker RJ (1991). Evidence for biased gene conversion in concerted evolution of ribosomal DNA. Science 251:308-310.
Jiang C-X, Wright RJ, El-Zik KM, and Paterson AH (1998). Polyploid formation created unique avenues for response to selection in Gossypium (cotton). Proceedings of the National Academy of Sciences of the USA 95:4419-4424.
Li W-H (1997). Molecular Evolution. Sinauer Associates, Sunderland, Massachusetts.
Lisitsyn NA, Lisitsyn N, and Wigler M (1993). Cloning the differences between two complex genomes. Science 259:946-951.
Lisitsyn NA and Wigler M (1995). Representational difference analysis in detection of genetic lesions in cancer. Methods in Enzymology 254:291-304.
Masterson J (1994). Stomatal size in fossil plants: evidence for polyploidy in majority of angiosperms. Science 264:421-424.
Nagaki K, Tsujimoto H, and Sasakuma T (1998). Dynamics of tandem repetitive Afa-family sequences in Triticeae, wheat-related species. Journal of Molecular Evolution 47:183-189.
Nekrutenko A, Makova KD, Chesser RK, and Baker RJ (1999). Representational difference analysis to distinguish cryptic species. Molecular Ecology, in press.
Nei M and Li W-H (1979). Mathematical model for studying genetic variation in terms of restriction endonucleases. Proceedings of the National Academy of Sciences of the USA 76:5269-5273.
Reinisch AJ, Dong J-M, Brubaker CL, et al. (1994). A detailed RFLP map of cotton, Gossypium hirsutum x Gossypium barbadense: chromosome organization and evolution in a disomic polyploid genome. Genetics 138:829-847.
Rozas J and Rozas R (1999). DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.
Smyth DR, Kalitsis P, Joseph JL, and Sentry JW (1989). Plant retrotransposon from Lilium henryi is related to Ty3 of yeast and the gypsy group of Drosophila. Proceedings of the National Academy of Sciences of the USA 86:5015-5019.
Song K, Lu P, Tang K, and Osborn TC (1995). Rapid genome change in synthetic polyploids of Brassica and its implication for polyploid evolution. Proceedings of the National Academy of Sciences of the USA 92:7719-7723.
Toyota M, Canzian F, Ushijima T, et al. (1996). A rat genetic map constructed by representational difference analysis markers with suitability for large-scale typing. Proceedings of the National Academy of Sciences of the USA 93:3914-3919.
Volkov RA, Borisjuk NV, Panchuk II, et al. (1999). Elimination and rearrangement of parental rDNA in the allotetraploid Nicotiana tabacum. Molecular Biology and Evolution 16:311-320.
Wendel JF, Schnabel A, and Seelanan T (1995a). An unusual ribosomal DNA sequence from Gossypium gossypioides reveals ancient, cryptic, intergenomic introgression. Molecular Phylogenetics and Evolution 4:298-313.
Wendel JF, Schnabel A, and Seelanan T (1995b). Bidirectional interlocus concerted evolution following allopolyploid speciation in cotton (Gossypium). Proceedings of the National Academy of Sciences of the USA 92:280-284.
Zhao X-P, Si Y, Hanson RE, et al. (1998). Dispersed repetitive DNA has spread to new genomes since polyploid formation in cotton. Genome Research 8:479-492.
Table 4.1. Amplification primers for A and D genome-specific difference products. The amplification profile was 35 cycles of 1 min at 94°C (2 min in the first cycle), 1 min at 52°C, and 1 min at 73°C. Reaction composition (per 25 µl): 2.5 µl of 10X buffer (Promega), 3.75 µl of each 2 µM primer, 2 µl of 2.5 mM dNTPs (Perkin Elmer), 1.5 µl of 25 mM MgCl2 (Promega), 2.5 units of Taq polymerase (Promega, buffer A), and 1-5 ng of template DNA (typically 2.5 µl of a 1/100 dilution of the original extraction).
Primer      Sequence
A20F        ATA GCC CAG ATG GAG ATA GAA TGT GG
A20R        ACT CTA AGG CTG AAG ACT GAA TAG AAA GG
A36F        CCG GAG TCG AAC ACA AGG TGC AT
A36R        CCG ACT TTG GAA ATT CAT TGT AAA TTA ACC
D1_PARF_F   TTA CTG GGA TTT GCC ATG AAA CC
D1_PARF_R   CCC CTT ATC TTC CAG TTG TGA CG
D10F        TGA GGC GAC ATG GAA TCT GTA GG
D10R        TTC TGA ACT GAG GCG AAC TAT TTG G
D13F        GGG CAA CCA GTT GTG TCC AGG
D13R        TGT GGG AAA GAT TGG TGG TGT AGG
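The final concentrations implied by the recipe in the Table 4.1 legend follow from C_final = C_stock × V_added / V_total. The sketch below does that arithmetic for the components given with a stock concentration. It assumes the 10X buffer is meant to end at 1X, treats the 2.5 mM dNTP stock as a per-nucleotide concentration (an assumption; the legend does not say), uses the typical 2.5 µl template volume, and leaves the unstated Taq volume inside the remaining volume.

```python
TOTAL_UL = 25.0

# (component, stock concentration, unit, volume added in µl), from the Table 4.1 legend.
COMPONENTS = [
    ("buffer", 10.0, "X", 2.5),
    ("forward primer", 2.0, "uM", 3.75),
    ("reverse primer", 2.0, "uM", 3.75),
    ("dNTPs", 2.5, "mM", 2.0),
    ("MgCl2", 25.0, "mM", 1.5),
]
TEMPLATE_UL = 2.5  # typical template volume given in the legend

def final_concentration(stock, volume_ul, total_ul=TOTAL_UL):
    """C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_ul

for name, stock, unit, vol in COMPONENTS:
    print(f"{name}: {final_concentration(stock, vol):g} {unit}")

used = sum(vol for *_, vol in COMPONENTS) + TEMPLATE_UL
print(f"volume left for Taq and water: {TOTAL_UL - used:g} ul")
```

The same function can be used to check how a change in stock or volume (for example, a different template dilution) shifts the final primer or Mg2+ concentration.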
Table 4.2. [Table not recoverable from the scanned original. As cited in the text, it reports BLAST similarities of difference-product sequences, including matches in organisms ranging from yeast to fruit flies, and nucleotide variability of A36 and D13 clones.]
[Figure 4.1. (A) Southern blot hybridization of A and D genome-specific difference products; (B) PCR amplification with the primers listed in Table 4.1. Image not recoverable from the scanned original.]
[Figure 4.2. Unrooted Neighbor-Joining tree of PARF D1 sequences from five cotton species. Image not recoverable from the scanned original.]
CHAPTER V
SUMMARY
From a theoretical perspective, aspects of genome change and
organization are thought to be critical to evolutionary success. Therefore,
identification of genome changes is basic to the development and testing of
hypotheses and theories concerning evolutionary processes. From an applied
point of view, identification of unique aspects of closely related genomes has
great financial value, with applications ranging from law enforcement to patent
protection. An efficient method to locate and study genome changes will serve
society in a variety of ways, from theoretical advances to economic
development. My dissertation explores representational difference analysis
(RDA) as a means to meet these needs.
Theoretical applications of RDA include identification of genetic
differences between closely related species and development of arrays of
markers that can be used to interpret evolutionary events at the molecular
level. Species-specific markers isolated with RDA allow identification of
multiple samples quickly and effectively. These markers can be used to
discriminate among closely related taxa, varieties, cultivars, or even
individuals. Because RDA-derived markers are PCR-based (a set of diagnostic
primers is developed for each marker), only a minute amount of DNA is consumed
during the identification procedure. For example, DNA isolated from an ear
clip, a toe, or a single ovum would be sufficient. These features make RDA
ideally suited for high-throughput applied procedures. For example, in law
enforcement, conservation, and commercial use it would be possible to
precisely identify the source of a specimen or food product.
Features of Interspecific RDA
In most cases, even when RDA is performed on closely related taxa
(cryptic species, for example; see Chapter II), the markers obtained represent
families of repetitive elements that were introduced, or dramatically
increased in copy number, after the split of the two taxa. Diagnostic primers
designed to repetitive elements amplify multiple loci (corresponding to
individual copies) simultaneously, virtually eliminating the possibility of
encountering a null allele that would lead to an incorrect (no) identification.
When markers are desired for more than two species, it is not necessary to
perform reciprocal RDAs between each possible pair (see Chapter III). Each
RDA experiment generates a large number of difference products, and it is
highly unlikely (although possible) that sequences that differ between two taxa
in a group would also be characteristic of other species. For example, in
Chapter III our goal was to obtain markers unique to each of four species of
voles. To accomplish this, we performed reciprocal RDA only between two pairs
of species that were most closely related based on a cytochrome b phylogeny
(Nekrutenko 1999, unpublished data). This was sufficient to yield the desired
markers (Fig. 3.1).
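The claim that multi-copy markers virtually eliminate null alleles can be made concrete under one simplifying assumption: if each of n copies independently fails to amplify with probability p, the marker gives no product only when all copies fail, which happens with probability p^n. Independence is an idealization (a mutation in a primer site shared by a recently amplified family could affect many copies at once), and the values of p and n below are illustrative, not estimates from the data.

```python
def prob_no_product(p_copy_failure: float, n_copies: int) -> float:
    """P(all n copies fail to amplify) = p**n, assuming independent failures."""
    if not 0.0 <= p_copy_failure <= 1.0 or n_copies < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return p_copy_failure ** n_copies

# Illustrative values: a single-copy marker versus multi-copy markers.
for n in (1, 5, 20):
    print(n, prob_no_product(0.1, n))
```

Even with a pessimistic per-copy failure rate, the probability of a complete null falls geometrically with copy number, which is the intuition behind preferring repetitive elements for diagnostic primers.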
As mentioned above, an RDA experiment generates a large array of
difference products. This array can be viewed as a set of randomly
sampled sequences that are either absent from one genome and present in the
other or different between the two. Because the two genomes belong to
different species, the differences between them contain a wealth of
information on which aspects of the genome change after the two become
different taxa. Being able to analyze large numbers of differences would be
very helpful in studies of molecular evolution but would also require more
powerful technical approaches, such as high-density arrays. Additionally, most
of the differences between highly similar genomes lie in non-coding regions
(presumably neutral, unless subject to genetic hitchhiking). Analysis of
nucleotide polymorphisms in such regions may provide baseline information
about the degree of variability for a particular organism. For example,
analysis of random clones obtained for A and D genome-specific markers in
cotton (Chapter IV) suggests that the D genome has a higher level of
nucleotide variability than the A genome.
Absence/presence differences are not the only type of marker RDA
isolates. In some cases, the isolated sequences are present in both compared
genomes but are differentially flanked by restriction sites (polymorphic
amplifiable restriction fragments, PARFs). This type of RDA-derived marker
cannot be used in diagnostic applications because the corresponding primers
would amplify both genomes. On the other hand, these markers represent
sequences homologous between the two genomes. In Chapter IV we used one such
marker (PARF D1) to show that within the allotetraploid cotton G. hirsutum,
sequences originally donated by the A and D genome diploid progenitors during
hybridization have been homogenized to the D type by concerted evolution. As
with absence/presence markers, an RDA experiment yields multiple PARFs that can
be isolated and analyzed in large numbers, providing, for example, a reliable
estimate of the pattern of concerted evolution within allopolyploids.
Major Contributions from the Dissertation
Like early papers on the application of protein electrophoresis,
restriction enzymes, or sequencing to the analysis of polymorphisms, this work
introduces a new and powerful approach for isolating diagnostic genetic
markers. The major contributions of the dissertation are listed below:
1. RDA has been applied to the isolation of species-specific markers. As
shown in Chapters II and III, such markers have been successfully isolated and
tested (Figs. 2.2 and 3.1). The identification procedure is PCR-based, so it
requires minimal amounts of DNA and can be completed within a short period of
time. The same strategy can be used to develop binary markers capable of
distinguishing other closely related species and suited to large-scale
genotyping.
2. By developing markers specific to the A and D genomes of cotton, I
demonstrated how RDA-derived markers can be used to identify the composition of
allopolyploid nuclei (Fig. 4.1).
3. I demonstrated that polymorphic amplifiable restriction fragments
(PARFs) provide a unique opportunity for random sampling of sequences
homologous between genomes and for uncovering interactions among them in
allopolyploid systems (Fig. 4.2).