
DEVELOPMENT OF SPECIES- AND GENOME-SPECIFIC GENETIC

MARKERS BY REPRESENTATIONAL DIFFERENCE ANALYSIS:

APPLICATION IN SYSTEMATIC AND EVOLUTIONARY RESEARCH

by

ANTON NEKRUTENKO, M.S.

A DISSERTATION

IN

BIOLOGY

Submitted to the Graduate Faculty of Texas Tech University in

Partial Fulfillment of the Requirements for

the Degree of

DOCTOR OF PHILOSOPHY

Approved

Accepted

August, 1999

ACKNOWLEDGMENTS

This dissertation was conceived and completed because I was given a

once-in-a-lifetime opportunity. I was at the right place at the right time, and it

certainly was one of the greatest things that have ever happened to me. I was

immersed in an environment that allowed me to grow and become confident in

who I will be. The person who is entirely responsible for granting these endless

opportunities to me is Dr. Robert Baker who brought me to Texas Tech. His

scientific guidance has had a great impact on how I view science and scientific

conduct now. His everyday support changed, in many ways, my understanding

of how to function in society and how to interact with other people. He is

and will be my teacher.

Members of my doctoral committee—Dr. Randy Allen, Dr. Robert

Bradley, Dr. Ronald Chesser and Dr. Marilyn Houck—have always been

supportive and put a significant amount of time and effort into making this

dissertation happen both linguistically and scientifically. I am grateful to Dr.

John Patton for his constant flow of ideas and support throughout all this time.

Dr. David Hillis helped me with analyses of data and writing of my first

manuscript. I thank Drs. Jim Bull and Holly Wichman for reviewing parts of this

dissertation and for their suggestions. For favors great and small I want to

thank members of Dr. Baker's laboratory and the Natural Sciences Research

Laboratory, graduate students of the Department of Biology, and my friends.

I thank my wife, Kateryna Makova, who supported and survived my work

with all its victories and failures. She also happened to be my co-worker who

designed all primers presented in this dissertation. She is a great and

incredibly strong person who invested so much in my success. I am grateful to

my parents Yuri Nekrutenko and Irina Deriugina for their strong belief in my

abilities and constant support. My father, also a biologist, set for me an

example of a scientist whose only reason to be a scientist is pure curiosity

and nothing else. My mother, on the other hand, has always been there for me

to remind me that the real world does exist and requires my attention. I also want

to thank my grandparents and especially my grandmother Alexandra Deriugina

whose endless optimism and belief in the good sides of life always recharge

me in bad times. My older sister, Olga Razgonova, helped and supported me in

all aspects of everyday life.

Last but not least, I want to thank the person who taught me practical

molecular biology, Alex Palamarchuk. He is an organic chemist who told me

that it is impossible to understand complex processes without knowing the

basics.

This study was supported by contract DE-FC09-96SR18546 between the

U.S. Department of Energy and the University of Georgia and by funds from

Texas Tech University.


TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF ABBREVIATIONS AND ACRONYMS

CHAPTER

I. INTRODUCTION
    Literature Cited

II. REPRESENTATIONAL DIFFERENCE ANALYSIS TO DISTINGUISH CRYPTIC SPECIES
    Abstract
    Introduction
    Materials and Methods
        Specimen Collection, Identification and Isolation of Genomic DNA
        RDA and Analysis of Difference Products
    Results and Discussion
        Experiment A
        Experiment B
    Literature Cited

III. ISOLATION OF BINARY SPECIES-SPECIFIC PCR-BASED MARKERS AND THEIR VALUE FOR DIAGNOSTIC APPLICATIONS
    Abstract
    Introduction
    Materials and Methods
    Results and Discussion
    Literature Cited

IV. REPRESENTATIVE DIFFERENCE ANALYSIS IN THE STUDY OF ALLOPOLYPLOIDS: SUBGENOME-SPECIFIC MARKERS AND FURTHER EVIDENCE OF BIASED CONCERTED EVOLUTION IN ALLOTETRAPLOID COTTON Gossypium hirsutum
    Abstract
    Introduction
    Materials and Methods
    Results
        Isolation of DNA Fragments Specific to A and D Genomes
        Analysis of the Difference Products
        Sequencing Analysis of Individual Difference Products
    Discussion
    Literature Cited

V. SUMMARY
    Features of Interspecific RDA
    Major Contributions from the Dissertation

LIST OF TABLES

2.1 Diagnostic primers designed from RDA products

3.1 Diagnostic primers designed from RDA products

4.1 Amplification primers for A and D genome-specific difference products

4.2 Amino acid sequences of the reverse transcriptase conserved motif from representatives of plant, yeast and animal retrotransposons

LIST OF FIGURES

2.1 Schematic representation of the RDA procedure

2.2 Results of RDA experiments and amplification with resulting primers

3.1 Results of two reciprocal RDA experiments and test of primers on four species of voles

4.1 Cotton genome-specific RDA difference products

4.2 PARF D1 sequences are homogenized to a D-type in the allotetraploid cotton G. hirsutum

LIST OF ABBREVIATIONS AND ACRONYMS

bp base pairs

DNA deoxyribonucleic acid

kb kilobases/kilobase pairs

dNTP deoxynucleotide triphosphate

PARF polymorphic amplifiable restriction fragment

PCR polymerase chain reaction

RDA representational difference analysis

CHAPTER I

INTRODUCTION

Genomic DNA is an enormous databank containing information about

function, development, reproduction, and evolution of every living organism.

This overwhelming "information space" can be used in evolutionary and

systematic research to learn which properties of the genome define various

taxonomic groups and how the genome itself changes during the process of

evolution. One of the problems associated with the exploration of the genome

is its size. For example, the human genome and genomes of other mammalian

species are highly complex and composed of billions of base pairs. They

contain sequences that are highly similar between all major branches of life

such as ribosomal genes (Woese and Fox 1977; Lake 1988), protein synthesis

machinery genes (Rivera and Lake 1992; Brown and Doolittle 1995) or certain

housekeeping genes (Nekrutenko et al. 1998). On the other hand, other

sequences can be unique to a particular taxonomic group (Baker et al. 1997),

one of two closely related species (Nekrutenko et al. 1999), or even vary

between individuals of the same population.

In order to develop reliable diagnostic markers for individuals,

populations, subspecies and closely related species and to record evolutionary

processes in situ, it is necessary to gain access to rapidly evolving regions of

the genome, because only such regions can provide the desired

resolution. This resolution is especially valuable for closely related species

recently separated by a speciation event. While such taxa have highly similar

genomes (possibly up to 99.9% in overall sequence similarity), they

nevertheless contain genetic fingerprints that are the benchmark to establish

the two as separate species. Given these observations, the questions this

dissertation primarily addresses are how to effectively isolate and analyze

differences between closely related genomes and what types of sequences

these differences represent.

Historically, the most effective way to isolate differences

between two similar pools of DNA sequences was subtractive hybridization. In

a subtractive hybridization experiment two compared pools of DNA are cut into

fragments, denatured either by heat or sodium hydroxide solution and mixed

together to allow reannealing. Typically, a small amount of one of these two

DNA samples (tester) is mixed with an excess of the other sample (driver) so

that the tester fragments predominantly reanneal to the driver fragments. If the

tester fragments are labeled with a hapten, then the tester/tester homoduplexes

and tester/driver heteroduplexes can be isolated using affinity chromatography.

This approach was successfully used for isolation of gross differences such as

Y chromosome-specific sequences (Lamar and Palmer 1984), or large

deletions on the X chromosome related to Duchenne muscular dystrophy and

choroideremia disorders (Kunkel et al. 1985; Nussbaum et al. 1987). However, in

all of these examples, DNA differences were large enough to be detected when the

degree of enrichment is only 10^-10^ times - typical for subtractive

hybridization. Sequence complexity of mammalian genomes is so high that it

does not allow finer differences, such as small deletions/additions and

nucleotide polymorphisms, to be detected with this method due to inadequate

enrichment.

Lisitsyn et al. (1993) proposed an approach, representational difference

analysis (RDA), for isolation of differences between closely related mammalian

genomes that combines methodology of subtractive hybridization with kinetic

enrichment. Kinetic enrichment is another tool for DNA difference enrichment

that is based on the second-order kinetics of self-reassociation (Wieland et al.

1990). Importantly, to overcome the problem created by the genome

complexity, the first stage of RDA, representation, reduces the initial complexity of

genomic DNA to 2%-15%. This is achieved by digesting tester and driver

genomes with a restriction enzyme (typically one with a 6-bp recognition site), ligating

an oligonucleotide adapter, and then amplifying the fragments by PCR with the same

adapter used as a primer. Amplification conditions are adjusted in a way that

only fragments with an average size of 0.6 kb can be effectively amplified. This

new, synthetic pool of amplification products, or amplicons, is prepared

separately for tester and driver and represents, as was mentioned above, only

2%-15% of the initial genomic DNA. Subsequently, oligonucleotide adapters

are cut off from tester and driver amplicons and a new set (with a different

sequence) is ligated to the tester amplicons only. A small amount of tester

amplicons (having new oligonucleotides at the ends) is then mixed with a large

excess of driver amplicons, denatured and allowed to reassociate. Because the

driver amplicon fragments constitute the majority in this mix, tester sequences

that are also found in the driver amplicons preferentially form heteroduplexes in

which the oligonucleotide is attached only to the tester strand. On the other

hand, sequences that are unique to the tester do not have twins in the driver

pool to reassociate with and form homoduplexes with the oligonucleotide

attached to both strands. After reassociation is completed, the mix is used as a

template in a PCR amplification with the oligonucleotide as a primer. Only

tester homoduplexes that have adapters attached to both strands can be

exponentially amplified and therefore greatly enriched relative to other

sequences. This hybridization/amplification step, difference enrichment, is

repeated 3 to 4 times resulting in final enrichment over a 10^-fold. This degree

of enrichment makes it possible to isolate polymorphisms even between

individuals of the same population (reviewed in Lisitsyn 1995).
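
The effect of repeating the difference enrichment step can be illustrated with a deliberately simplified calculation. The sketch below is not taken from Lisitsyn et al. (1993); it assumes that strands reanneal completely and pair at random in proportion to their abundance, that only tester homoduplexes are amplified, and that the linear amplification of heteroduplexes and all kinetic effects can be ignored. The function names and the 100-fold driver excess are illustrative assumptions.

```python
def homoduplex_fraction(driver_excess: float, shared_with_driver: bool) -> float:
    """Fraction of tester strands expected to reanneal with a tester partner.

    Toy assumption: strands pair at random in proportion to abundance, so a
    tester strand whose sequence also occurs in the driver (present at
    `driver_excess` times the tester amount) finds a tester partner with
    probability 1 / (1 + driver_excess). A tester-unique strand has no
    driver partner and always forms a homoduplex.
    """
    if not shared_with_driver:
        return 1.0
    return 1.0 / (1.0 + driver_excess)


def relative_enrichment(driver_excess: float, rounds: int) -> float:
    """Enrichment of tester-unique over shared sequences after `rounds` rounds.

    Only homoduplexes are counted as amplifiable; linear amplification of
    heteroduplexes and kinetic effects are ignored in this sketch.
    """
    per_round = (homoduplex_fraction(driver_excess, False)
                 / homoduplex_fraction(driver_excess, True))
    return per_round ** rounds


if __name__ == "__main__":
    # Hypothetical 100-fold driver excess, one to three rounds.
    for rounds in (1, 2, 3):
        print(rounds, f"{relative_enrichment(100, rounds):.3g}")
```

Under these assumptions each round multiplies the relative abundance of tester-unique sequences by about the driver excess plus one, which is why a few rounds compound to a very large enrichment. Real experiments will deviate from this idealized figure.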

RDA allows isolation of differences that can be divided into two classes:

binary (absence/presence) differences and restriction site polymorphisms.

The absence/presence type relates to sequences that are present in one genome

(tester) and absent in the other (driver). Restriction site polymorphisms, also

called polymorphic amplifiable restriction endonuclease fragments (PARFs;

Lisitsyn et al. 1993), represent differences in the position of restriction sites in

the tester and driver genomes. In the tester genome, the restriction sites flanking a

particular PARF are close enough together that the sequence between them

can be amplified during the representation step, whereas in the driver these

sites are too far apart, preventing the sequence between them from being

amplified.
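
The distinction between the two classes can be made concrete with a small in-silico sketch. It is an illustration only, not the laboratory procedure: genomes are short toy strings, the representation step is reduced to "keep internal BglII fragments no longer than `max_len`", and a tester-specific fragment is called a PARF if its sequence still occurs somewhere in the driver genome and binary otherwise. The function names, the toy sequences, and the `max_len` threshold are all assumptions made for this example.

```python
import re
from typing import Dict, List, Set

BGLII_SITE = "AGATCT"  # BglII recognition site; the enzyme cuts after the first A


def digest(genome: str, site: str = BGLII_SITE, cut_offset: int = 1) -> List[str]:
    """Cut a linear toy genome at every occurrence of the recognition site."""
    cuts = [m.start() + cut_offset for m in re.finditer("(?=" + site + ")", genome)]
    bounds = [0] + cuts + [len(genome)]
    return [genome[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def representation(genome: str, max_len: int) -> Set[str]:
    """Toy amplicon: internal fragments (a site at both ends) short enough to amplify."""
    fragments = digest(genome)
    return {f for f in fragments[1:-1] if len(f) <= max_len}


def classify_targets(tester: str, driver: str, max_len: int = 12) -> Dict[str, str]:
    """Label each tester-specific amplicon fragment as 'binary' or 'PARF'."""
    targets = representation(tester, max_len) - representation(driver, max_len)
    return {f: ("PARF" if f in driver else "binary") for f in targets}


# Toy genomes (hypothetical sequences, for illustration only).
# The tester carries three short fragments flanked by BglII sites.
TESTER = "AA" + "AGATCT" + "CCCC" + "AGATCT" + "TTGGTT" + "AGATCT" + "GCGCGC" + "AGATCT" + "AA"
# The driver shares the first fragment, lacks the second entirely, and has a
# mutated site (AGATGT) after the third, so that fragment becomes too long.
DRIVER = "AA" + "AGATCT" + "CCCC" + "AGATCT" + "GCGCGC" + "AGATGT" + "G" * 20 + "AGATCT" + "AA"

if __name__ == "__main__":
    for fragment, kind in sorted(classify_targets(TESTER, DRIVER).items()):
        print(kind, fragment)
```

In this toy case the fragment missing from the driver is reported as binary, while the fragment whose flanking site is mutated in the driver is reported as a PARF, because its sequence is still present but falls outside the amplifiable size range.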

Originally, RDA was developed for isolation of genetic lesions in cancer

where losses and amplifications of genome regions can be detected by

comparing genomic DNA from cancerous and normal cells of the same

individual (Lisitsyn et al. 1995). Since that time, the technique has been

successfully used in various applications including the isolation of Y

chromosome-specific sequences (Donnison et al. 1996), identification of

differentially expressed transcripts (Hubank and Schatz 1994), isolation of

probes detecting DNA loss and amplification in tumors (Lisitsyn et al. 1995) and

others (reviewed in Baldocchi and Flaherty 1997).

The Chernobyl Research Team directed by Drs. Ronald Chesser and

Robert Baker—a result of collaboration between the Texas Tech University and

the Savannah River Ecology Laboratory—has been conducting several studies

of the effects of the Chernobyl power plant meltdown accident on genetics of

animals (reviewed in Baker et al. 1996). One of the problems the team has

faced is the reliable identification of some rodent species that inhabit

radioactively contaminated areas that could serve as a model system for the

study of the ecological consequences of the accident. For example, four

species of voles (genus Microtus) that were collected at the accident site are

very difficult to distinguish at juvenile stages. Moreover, two of these species

(M. arvalis and M. rossiaemeridionalis) represent an impressive example of

cryptic species that are truly sympatric over extensive portions of their ranges

and cannot be identified based on morphological characters. Although these

taxa can be distinguished by karyotyping or DNA sequencing, these approaches are

too labor- and cost-intensive to be used for identifying large numbers of

individuals. Thus, an initial objective of my dissertation was to employ RDA for

the first time to isolate genetic markers capable of unambiguous identification of

closely related species (interspecific RDA) and to develop a PCR-based

diagnostic assay for these markers. Results of my efforts on this project are

provided in Chapters II and III.

In the process of developing species-specific RDA-derived

markers, I discovered that these markers can be used not only in diagnostic

assays but also in studies that describe patterns of genome evolution. Frequently,

interspecific RDA yields families of repetitive sequences that are present in one

of the compared genomes and absent in the other. In some cases, however, a

repetitive DNA family is isolated not because it is totally absent from the driver

genome but because, in the driver, it is flanked differently by restriction sites

(PARFs, see Chapter IV), is present in a significantly lower number of copies,

and/or is less than 100% identical in sequence. Such families that are present in both

the tester and driver genomes but differ in copy number per genome or overall

sequence similarity provide an exciting opportunity to document and study

cases of concerted evolution and possibly to understand changes that

accompany speciation events. For example, these repetitive elements can be

used to describe interactions between subgenomes in allopolyploid organisms

such as upland cotton Gossypium hirsutum. Additionally, diagnostic markers

developed between diploid species can be used to establish the origin of

allopolyploids because these markers can be used to identify specifically which

original diploid genomes were united in a given allopolyploid taxon. The second

objective of my dissertation was to develop subgenome-specific markers for A

and D genomes in G. hirsutum and to study changes that occur to them in the

allotetraploid plant relative to their state in the diploid progenitors.

Chapter II of this dissertation describes the development of markers

discriminating two cryptic species of the genus Microtus (M. arvalis and M.

rossiaemeridionalis). In Chapter III, I outline the rationale for developing

diagnostic markers for more than two species separated by unequal genetic

distances and discuss some of the major features of interspecific RDA.

Chapter IV demonstrates how RDA-derived markers can be used for the analysis of

allopolyploids and for inferences about concerted evolution of selected repeat

families, using diploid and polyploid cotton species as an example. Chapter V

summarizes my findings and highlights the major contributions of this dissertation. In

this last part of my dissertation, I outline the results and explain their significance as

well as discuss future directions.

Literature Cited

Baker RJ, Hamilton MJ, Van Den Bussche RA et al. (1996). Small mammals from the most radioactive sites near the Chernobyl nuclear power plant. Journal of Mammalogy 77:155-170.

Baker RJ, Longmire JL, Maltbie M, Hamilton MJ, and Van Den Bussche RA (1997). DNA synapomorphies for a variety of taxonomic levels from a cosmid library from the New World bat Macrotus waterhousii. Systematic Biology 46:579-589.

Baldocchi RA, and Flaherty L (1997). Isolation of genomic fragments from polymorphic regions by representational difference analysis. Methods 13:337-346.

Brown JR, and Doolittle WF (1995). Root of the universal tree of life based on ancient aminoacyl-tRNA synthetase gene duplications. Proceedings of the National Academy of Sciences of the USA 92:2441-2445.

Donnison IS, Siroky J, Vyskot B, Saedler H, and Grant SR (1996). Isolation of Y chromosome-specific sequences from Silene latifolia and mapping of male sex-determining genes using representational difference analysis. Genetics 144:1839-1901.

Hubank M, and Schatz DG (1994). Identifying differences in mRNA expression by representational difference analysis of cDNA. Nucleic Acids Research 22:5640-5648.

Kunkel LM, Monaco AP, Middlesworth W, et al. (1985). Specific cloning of DNA fragments absent from the DNA of a male patient with an X chromosome deletion. Proceedings of the National Academy of Sciences of the USA 82:4778-4782.

Lake JA (1988). Origin of the eukaryotic nucleus determined by rate-invariant analysis of rRNA sequences. Nature 331:184-186.

Lamar EE, and Palmer E (1984). Y-encoded, species-specific DNA in mice: evidence that the Y chromosome exists in two polymorphic forms in inbred strains. Cell 37:171-177.

Lisitsyn NA (1995). Representational difference analysis: finding the differences between genomes. Trends in Genetics 11:303-307.


Lisitsyn NA, Lisitsyn N, and Wigler M (1993). Cloning the differences between two complex genomes. Science 259:946-951.

Lisitsyn NA, Lisitsyna NM, Dalbagni G, et al. (1995). Comparative genomic analysis of tumors: detection of DNA losses and amplifications. Proceedings of the National Academy of Sciences of the USA 92:151-155.

Lisitsyn NA, and Wigler M (1995). Representational difference analysis in detection of genetic lesions in cancer. Methods in Enzymology 254:291-304.

Nekrutenko A, Hillis DM, Patton JC, et al. (1998). Cytosolic isocitrate dehydrogenase in humans, mice and voles and phylogenetic analysis of the enzyme family. Molecular Biology and Evolution 15:1674-1684.

Nekrutenko A, Makova KD, Chesser RK, and Baker RJ (1999). Representational difference analysis to distinguish cryptic species. Molecular Ecology in press.

Nussbaum RL, Lesko JG, Lewis RA, et al. (1987). Isolation of anonymous DNA sequences from within a submicroscopic X chromosomal deletion in a patient with choroideremia, deafness, and mental retardation. Proceedings of the National Academy of Sciences of the USA 84:6521-6525.

Rivera MC, and Lake JA (1992). Evidence that eukaryotes and eocyte prokaryotes are immediate relatives. Science 257:74-76.

Wieland I, Bolger G, Asouline G, and Wigler M (1990). A method for difference cloning: gene amplification following subtractive hybridization. Proceedings of the National Academy of Sciences of the USA 87:2720-2724.

Woese CR, and Fox GE (1977). Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proceedings of the National Academy of Sciences of the USA 74:5088-5090.

CHAPTER II

REPRESENTATIONAL DIFFERENCE ANALYSIS

TO DISTINGUISH CRYPTIC SPECIES

Abstract

In the study of biodiversity, it is important to have a reliable system for

identification of various genetically distinct units (species, subspecies, etc).

One of the most efficient tools available today is the polymerase chain reaction

(PCR) with diagnostic primers that yield a detectable product for one taxon but

not for other taxa. Critical to this method is the identification of diagnostic DNA

fragments from which primers can be designed. Representational difference

analysis (RDA) can reliably isolate DNA fragments that are unique to a specific

taxon. In this report, we demonstrate the utility of the technique by development

of binary markers that distinguish between two cryptic species of voles (genus

Microtus).

Introduction

Although it is possible to distinguish cryptic species using common

methods such as sequencing or karyotyping, a fast yes/no-type of assay is

highly desirable. Such an assay is especially valuable when large-scale studies

are conducted in which significant numbers of individuals need to be

unambiguously identified. The real challenge, however, is isolating markers that


can be used in this assay because closely related species might have highly

similar genomes (possibly up to 99.9% in overall sequence identity).

Historically, subtractive hybridization was used to isolate differences between

highly similar pools of DNA. For example, it was used to identify small deletions

in the bacteriophage T4 genome (Bautz and Reilly 1966). Numerous

modifications of this procedure have been developed over the years; however,

all of them suffer from insufficient enrichment of desired sequences when

applied to highly complex genomes such as those of mammals. Lisitsyn et al.

(1993) described a technique, representational difference analysis (RDA), which

is specifically directed toward the isolation of differences between two complex

DNA samples. RDA employs a subtractive hybridization approach, but greatly

facilitates the purification of unique fragments by kinetic enrichment. This

procedure compares two genomes and subtracts sequences that are similar

between them while, on the other hand, it amplifies fragments that are unique to

one of the genomes (tester genome, Fig 2.1). A reciprocal study also can be

performed, in which the other genome is used as the tester (Fig 2.1). As a

result, PCR primers can be designed that yield a diagnostic amplification

product from one genomic DNA but do not produce any amplification from the

other genomic DNA. This approach does not require any additional analyses

such as pattern recognition (RFLP and RAPD), repeat number scoring (mini-

and microsatellites), or sequence analysis and alignment (sequence data).
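
The yes/no logic of such an assay can be sketched in silico. The example below is a minimal illustration, assuming exact primer matches, a single strand orientation, and toy sequences; it does not use the primers reported in Table 2.1, and the names `product_size`, `identify`, and the species labels are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def product_size(template: str, fwd: str, rev: str,
                 max_size: int = 3000) -> Optional[int]:
    """Size of the smallest product the primer pair would give, or None.

    Simplifications: exact matches only, and only the orientation in which
    the forward primer matches the given strand is checked.
    """
    rev_site = reverse_complement(rev)
    best = None
    start = template.find(fwd)
    while start != -1:
        end = template.find(rev_site, start + len(fwd))
        if end != -1:
            size = end + len(rev_site) - start
            if size <= max_size and (best is None or size < best):
                best = size
        start = template.find(fwd, start + 1)
    return best


def identify(template: str, assays: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the labels of all assays that yield a product on this template."""
    return [label for label, (fwd, rev) in assays.items()
            if product_size(template, fwd, rev) is not None]


# Hypothetical primer pairs and template, for illustration only.
ASSAYS = {
    "species_A": ("ACGTAC", "CCTTGA"),
    "species_B": ("GACTGA", "AACCGG"),
}
TEMPLATE_A = "TTT" + "ACGTAC" + "GGGGG" + reverse_complement("CCTTGA") + "TTT"

if __name__ == "__main__":
    print(identify(TEMPLATE_A, ASSAYS))
```

Scoring is reduced to the presence or absence of a product, which is what makes a binary marker attractive for large sample sizes: no band pattern, repeat count, or alignment has to be interpreted.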


In RDA, the initial sequence complexity of genomic DNA is reduced by

PCR that can amplify only relatively short fragments, often called amplicons

(average size ~0.6 kb), representing only a subsample (2-15%) of the original

genome (see Fig. 2.1). Prior to the amplification, genomic DNA is digested with

a restriction enzyme (generating fragments with average size 2-5 kb) and

oligonucleotide adapters are ligated to the ends of the generated restriction

fragments. These same adapters are used as the amplification primer in the

PCR. During a single RDA procedure two genomes can be compared: one

designated tester, the other designated driver. Sequences that are

unique to the tester and are not present in the driver (these sequences are

commonly referred to as targets) represent the differences between the two

compared genomes. As a result of the entire RDA procedure, involving three to

four rounds of enrichment, these targets are purified more than 10^-fold and can

be easily cloned and analyzed. (Modified from Baldocchi and Flaherty 1997).

Additional description of the RDA procedure can be found in reviews by Lisitsyn

(1995) and Baldocchi and Flaherty (1997).

RDA has been successfully utilized in a variety of experiments, such as

isolation of probes that detect DNA loss and amplification in tumors (Lisitsyn et

al. 1995), isolation of genes responsible for pathogenicity expression in bacteria

(genus Neisseria; Tinsley and Nassif 1996), identification of Y chromosome-

specific sequences (Donnison et al. 1996), as well as in others (see Baldocchi

and Flaherty 1997). However, its utility in biodiversity studies has never been


explored. To empirically test the utility of RDA for taxonomy, we employed it for

the development of genetic markers capable of distinguishing two closely

related species of voles (Microtus arvalis and M. rossiaemeridionalis). These

two taxa are indistinguishable under field conditions and are truly sympatric over

expansive portions of their ranges (Zagorodnyuk 1991), but can be reliably

identified based on karyotypes (M. arvalis 2n = 46, FN = 58-90; M.

rossiaemeridionalis 2n = 54, FN = 54). Knowing the karyotypes of animals, it is

easy to test the efficacy of developed markers. As a result of our experiments

we obtained two primer pairs: one specific for M. arvalis, another for M.

rossiaemeridionalis (Table 2.1 and Fig 2.2).

Materials and Methods

Specimen Collection, Identification and Isolation of Genomic DNA. All

specimens, including ones used in the original RDA experiments and in marker

testing, were collected in northern Ukraine from several localities around

Chernobyl and karyotyped as described in Baker et al. 1996. High molecular

weight genomic DNA was isolated from frozen liver samples.

RDA and Analysis of Difference Products. RDA was carried out as

described by Lisitsyn and Wigler (1995) with slight modifications to the

purification of difference products between rounds of enrichment. We used BglII

restriction enzyme and oligonucleotide adapters RBgl12 and RBgl24 (Lisitsyn et

al. 1993) to prepare amplicons from M. arvalis and M. rossiaemeridionalis


genomic DNAs. Two reciprocal experiments were then performed using (1) M.

arvalis amplicons as the tester (M. rossiaemeridionalis as the driver) and (2) M.

rossiaemeridionalis as the tester (M. arvalis as the driver). Difference products

obtained after three rounds of RDA were analyzed on an agarose gel and the most

prominent bands were excised and cloned into pGEM-T vector (Promega).

Fifteen clones per plate per cloned band were amplified using vector-situated

primers. Amplification products were then digested with the restriction

endonuclease Sau3AI (Promega) and run on an agarose gel. Products

displaying identical restriction fragment patterns were considered a "family" of

sequences representing the same RDA product. To identify diagnostic RDA

products, a member of each "family" was then labeled with [α-32P]dCTP (NEN

laboratories) using the Random Primed DNA labeling kit (Boehringer Mannheim).

Radioactively labeled RDA products were hybridized to blots prepared using

genomic DNAs of M. arvalis and M. rossiaemeridionalis digested with BglII

endonuclease (Promega). Products that were identified as diagnostic based on

hybridization (either presence of hybridization in case of tester DNA and

absence in case of driver or different patterns of hybridization between tester

and driver; Fig 2.2) were sequenced using the dRhodamine Terminator Ready

kit (Perkin Elmer) and an ABI 310 autosequencer. Primers to individual

difference products were designed using the Oligo 4.05 program and tested on

genomic DNA of previously karyotyped animals.
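
The grouping of clones into "families" by restriction pattern can be sketched as follows. In the laboratory the patterns were compared on a gel, before any clone was sequenced. The example below stands in for that comparison by computing Sau3AI fragment sizes from hypothetical insert sequences and grouping clones with identical size profiles. Gel resolution and flanking vector sequence are not modelled, and the clone names and sequences are invented for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

SAU3AI_SITE = "GATC"  # Sau3AI cuts immediately 5' of its GATC recognition site


def fragment_sizes(insert: str, site: str = SAU3AI_SITE) -> Tuple[int, ...]:
    """Sorted fragment sizes expected after complete digestion of an insert."""
    cuts = [m.start() for m in re.finditer("(?=" + site + ")", insert)]
    bounds = [0] + cuts + [len(insert)]
    return tuple(sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a))


def group_into_families(clones: Dict[str, str]) -> Dict[Tuple[int, ...], List[str]]:
    """Group clone names by identical restriction fragment size profile."""
    families = defaultdict(list)
    for name, insert in clones.items():
        families[fragment_sizes(insert)].append(name)
    return dict(families)


# Hypothetical clone inserts, for illustration only.
CLONES = {
    "clone_1": "AAAGATCTTTT",
    "clone_2": "CCCGATCGGGG",
    "clone_3": "AAAAAGATCTT",
}

if __name__ == "__main__":
    for profile, names in group_into_families(CLONES).items():
        print(profile, names)
```

Note that clone_1 and clone_2 differ in sequence yet share a size profile, so they fall into one family. This is the inherent limitation of screening by restriction pattern, and one reason why a representative of each family is subsequently tested by hybridization and sequencing.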


Results and Discussion

Our goal was to employ RDA for isolation of DNA fragments unique to

each of two cryptic species of voles (Microtus arvalis and M.

rossiaemeridionalis). This would allow us to design PCR primers specific to each

species that generate a diagnostic amplification product unique to each of them.

To do so we performed two reciprocal RDA experiments using DNA from M.

arvalis as the tester and DNA from M. rossiaemeridionalis as the driver in

experiment A (Fig 2.2A) and reciprocal experiment B (Fig 2.2B).

Experiment A. The RDA using DNA from M. arvalis as the tester yielded

a single most abundant product. It was cloned (clone Mar14) and used as a

probe in a Southern hybridization experiment with digested genomic DNA from

M. arvalis and M. rossiaemeridionalis. The hybridization pattern (Fig 2.2A) was

different for tester and driver DNA, suggesting that Mar14 is, indeed, a

diagnostic difference product. The nucleotide sequence of Mar14 was determined

(accession number AF093582) and compared to the ENTREZ database

(http://www.ncbi.nlm.nih.gov/Entrez). No matching entries were found. Primers

complementary to the sequence of Mar14 were then designed and tested on

genomic DNA isolated from individuals of M. arvalis (N = 10) and M.

rossiaemeridionalis (N = 10) that were identified by karyotyping. As shown in

Fig 2.2A, these primers (Mar14F and Mar14R; Table 2.1) produced

amplification products only on genomic DNA from M. arvalis; therefore, within

this experimental design they are diagnostic for this species.


Experiment B. Three difference products were obtained in this

experiment. All three displayed different patterns of hybridization to the tester

and driver DNAs. One of these products, designated Mro16, was analyzed in

detail. Southern hybridization of labeled Mro16 to the tester and driver DNA is

shown in Fig 2.2B. Based on the hybridization pattern it is easy to conclude

that Mro16 represents a repetitive element unique to M. rossiaemeridionalis.

Mro16 was sequenced and its sequence (accession number AF093583) was

compared to the ENTREZ database. This analysis revealed a highly similar

sequence representing a M. rossiaemeridionalis B1-like element (accession

number U36930; Mayorov et al. 1996). Primers complementary to the sequence

of Mro16 (Mro16F and Mro16R; Table 2.1) were designed and tested as

described above. Subsequent amplification generated products only from the

genomic DNA of M. rossiaemeridionalis; thus, within our sample, primers Mro16F

and Mro16R are diagnostic for M. rossiaemeridionalis.

Our results demonstrate that RDA is a powerful and reliable method for

isolation of genetic markers suitable for biodiversity studies. By analyzing a

single RDA product in experiment A and three products in experiment B (Fig

2.2), we were able to design primers diagnostic for M. arvalis and M.

rossiaemeridionalis. Notably, a difference product unique to M.

rossiaemeridionalis (Fig 2.2B) is a repetitive B1-like element (belonging to the

class of short interspersed elements, SINEs). RDA preferentially isolates

repetitive sequences if they are different between two compared DNAs (Navin


et al. 1996), as was the case in library screening (Baker et al. 1997). This

feature of the method is important for this particular study because

presence/absence of families of repetitive elements is a robust phylogenetic

character that is free of homoplasy (Verneau et al. 1998). Another advantage of

repetitive elements as diagnostic characters is that when they are used in a

PCR-based assay there is only a minute probability of having a null allele

because multiple loci are amplified simultaneously. In conclusion, recent

technical inventions allow the analysis of complete genomes of organisms for

evolutionary and biodiversity studies rather than selected loci that represent only

a negligible part of total genomic DNA. Here we demonstrated that RDA, which

compares entire genomes, may be the preferred method for isolation of genetic

markers between closely related species.
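
The argument made above, that a repetitive marker is unlikely to yield a null allele, can be expressed as simple arithmetic. The sketch below assumes that each copy of the element fails to amplify independently and with the same probability; both the independence assumption and the numbers used are illustrative and are not estimates from our data.

```python
def false_negative_probability(per_locus_failure: float, n_loci: int) -> float:
    """Probability that every one of n independent target loci fails to amplify."""
    if not 0.0 <= per_locus_failure <= 1.0:
        raise ValueError("per_locus_failure must be a probability")
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    return per_locus_failure ** n_loci


if __name__ == "__main__":
    # Hypothetical per-locus failure rate of 10% at 1, 3 and 10 loci.
    for n in (1, 3, 10):
        print(n, false_negative_probability(0.1, n))
```

With a single-copy marker the assay fails whenever that one locus fails, whereas with a multi-copy element the chance that all copies fail simultaneously shrinks geometrically with the number of copies.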


Literature Cited

Baker RJ, Hamilton MJ, Van Den Bussche RA, et al. (1996). Small mammals from the most radioactive sites near the Chernobyl nuclear power plant. Journal of Mammalogy 77:155-170.

Baker RJ, Longmire JL, Maltbie M, Hamilton MJ, and Van Den Bussche RA (1997). DNA synapomorphies for a variety of taxonomic levels from a cosmid library from the New World bat Macrotus waterhousii. Systematic Biology 46:579-589.

Baldocchi RA, and Flaherty L (1997). Isolation of genomic fragments from polymorphic regions by representational difference analysis. Methods 13:337-346.

Bautz EKF, and Reilly E (1966). Gene-specific messenger RNA: isolation by the deletion method. Science 151:328-330.

Donnison IS, Siroky J, Vyskot B, Saedler H, and Grant SR (1996). Isolation of Y chromosome-specific sequences from Silene latifolia and mapping of male sex-determining genes using representational difference analysis. Genef/cs 744:1839-1901.


Lisitsyn NA (1995). Representational difference analysis: finding the differences between genomes. Trends in Genetics f f :303-307.

Lisitsyn NA, Lisitsyna NM, Dalbagni G, et al. (1995). Comparative genomic analysis of tumors: detection of DNA losses and amplifications. Proceedings of the National Academy of Sciences of the USA 92:151 -155.


Mayorov Vi, Adkinson LR, Vorobyeva NV, et al. (1996). Organization and chromosomal localization of a Bl-like containing repeat of Microtus suban/alis. Mammalian Genome 7:593-597.

18

Navin A, Prekeris R, Lisitsyn NA, et al. (1996). Mouse Y-specific repeats isolated by whole chromosome representational difference analysis. Genomics 36:349-353.

Straus D, and Ausubel FM (1990). Genomic subtraction for cloning DNA corresponding to deletion mutations. Proceedings ofthe National Academy of Sciences of the USA 87:1889-1893

Tinsley CR, and Nassif X (1996). Analysis of the genetic differences between Neisseria meningitidis and Neisseria gonorrhoeae: two closely related bacteria expressing two different pathogenicities. Proceedings ofthe National Academy of Sciences of the USA 93:11109-11114.

Verneau O. Catzeflis F, and Furano AV (1998). Determining and dating recent rodent speciation events using L1 (LiNE-1) retrotransposons. Proceedings of the National Academy of Sciences of the USA 95:11284-11289.

Zagorodnyuk IV (1991) Systematic position of Microtus brevirostris (Rodentiformes): materials toward the taxonomy and diagnostics of the "arvalis" group. Vestnik Zoologii 3:26-35.

[Table 2.1. Primer sequences (including Mro16F and Mro16R); table contents not recoverable from the source.]

Figure 2.1 Schematic representation of the RDA procedure. [Diagram not recoverable from the source; legible labels: genomic DNA (tester) and genomic DNA (driver) are each digested, ligated to adapters, and PCR amplified to give tester and driver amplicons; tester amplicons are digested and ligated to new adapters, hybridized with an excess of driver, and PCR amplified; enriched targets go to cloning and analysis. A reciprocal experiment can be done.]

[Figure 2.2: image and caption not recoverable from the source; legible lane labels: M. arvalis, M. rossiaemeridionalis.]

CHAPTER III

ISOLATION OF BINARY SPECIES-SPECIFIC

PCR-BASED MARKERS AND THEIR VALUE FOR

DIAGNOSTIC APPLICATIONS

Abstract

Representational difference analysis (RDA), a technique directed toward

isolation of differences between highly similar complex genomes, was employed

for isolation of species-specific markers. These markers can be easily adapted

for a high throughput, PCR-based assay in which multiple specimens can be

simultaneously identified based on the presence/absence of amplification

products. One of the important features of RDA performed on genomes of

different species, interspecific RDA, is its ability to preferentially isolate families

of repetitive sequences that are unique to one of the compared genomes and

not present in the other. Such families of repetitive DNA are homoplasy-free

characters that can be used for cost efficient, mass identification of specimens

in a variety of situations ranging from mark-recapture studies to screenings of

egg or larval stages.

Introduction

At least from a theoretical standpoint specific differences at the DNA

level can be easily used to identify taxa. In reality however, it is difficult and

expensive to find taxon-specific DNA fragments in genomes composed of

billions of base pairs, especially when taxa under investigation are closely

related. Methods currently employed to identify taxon-specific markers, such as

sequencing, restriction fragment length polymorphisms, randomly amplified

polymorphic DNA, and mini- and microsatellites, are rather costly and labor-intensive,

especially in cases where multiple samples need to be rapidly and reliably

(yes/no) identified. These methods are hit or miss because they are not

designed to enhance the probability of isolation of a desired marker. Two

alternative methods have been described to increase the probability of

identification of taxon-specific markers. First, library screening (Baker et al.

1997) has been employed to identify markers specific to a variety of taxonomic

levels. However, these authors did not develop primers to test for efficacy of the

markers in a PCR-based assay to yield a presence/absence test. Library

screening requires significant amounts of DNA and is a labor-intensive

procedure. Second, subtractive hybridization has also been utilized to identify

unique fragments between closely related genomes (Straus and Ausubel 1990).

This method, on the other hand, does not provide adequate subtractive

efficiency when complex genomes (for example, mammalian genomes) are

compared (Lisitsyn 1995).

Lisitsyn and co-workers (1993) developed an approach designed

specifically for isolation of differences between complex genomes:

representational difference analysis (RDA). Although the ideology of RDA might

resemble the subtractive hybridization approach, it includes a step that allows

RDA to be efficient on complex genomes, i.e., representation. During

representation the initial sequence complexity of genomes subjected to RDA is

reduced to 2%-15% (Lisitsyn 1995) by digesting genomic DNA with a restriction

enzyme and ligating an oligonucleotide adapter to the ends of the restriction fragments,

followed by PCR optimized to effectively amplify fragments with an average

length of only 0.6 kb (products of this amplification are called amplicons and are

prepared separately for both genomes to be compared). To sample the entire

"sequence space" of a genome several representations can be done with

different restriction enzymes. Representation is followed by a subtraction step

in which a denatured representation of one genome (tester) is hybridized with an

excess of a similar denatured representation of the other genome (driver). Prior

to this hybridization another set of oligonucleotide adapters is ligated to the tester

DNA only. During the hybridization, reassociation of DNA strands can follow

one of three possible ways: formation of driver homoduplexes, tester

homoduplexes, or driver/tester heteroduplexes. Because a great excess of driver is

used, sequences shared by tester and driver are more likely to reassociate

as driver/tester heteroduplexes than as tester homoduplexes. Therefore,

mainly sequences unique to tester form tester homoduplexes, and because these

have oligonucleotide adapters at both ends, tester homoduplexes can be

exponentially amplified (enriched) after hybridization is completed. Enriched

sequences undergo additional rounds of hybridization to ensure purity from common

sequences (typically three to four cycles are performed) with an increase of the driver-

to-tester ratio at each cycle, and the final result of the entire procedure is

isolation of tester-specific sequences (Lisitsyn et al. 1995).

Use of RDA-derived markers for diagnostic purposes is a new application

as the procedure was originally designed to detect and clone genetic lesions in

cancer (Lisitsyn and Wigler 1995). Only recently, Ushijima et al. (1998) have

employed RDA to develop a series of B1-repetitive element (short interspersed

element, SINE) based markers that allow high throughput genotyping of inbred

rat strains. This approach, however, is not PCR-based and is suitable only for

intraspecific identification (identification of different strains within the same

species). Additionally, it cannot be applied to species for which there is very little

or no information available on genome organization.

Our goal was to develop a reliable approach for identification of four

closely related species of voles (Microtus arvalis, M. rossiaemeridionalis, M.

oeconomus, M. agrestis) that inhabit overlapping areas in the northern Ukraine

(Baker et al. 1996) and are utilized in the study of environmental consequences

of the Chernobyl power plant meltdown. Previously our group (Nekrutenko et al.

1999) applied RDA to isolate genetic markers capable of distinguishing two of

these taxa representing one of the most striking examples of cryptic species

(Microtus arvalis and M. rossiaemeridionalis). As a result we have designed two

primer pairs (M. arvalis-specific and M. rossiaemeridionalis-specific) that permit

rapid identification of these otherwise indistinguishable taxa. This success

prompted us to develop genetic markers that allow identification of individuals of

two other vole species that occur in our collecting area: M. agrestis and M.

oeconomus, separated by greater genetic distance. Adult animals of these two

species can be distinguished based on morphological characters whereas

identification of juvenile individuals is difficult if not impossible. Therefore, it is

highly desirable to be able to simultaneously identify large numbers of

specimens in a fast and reliable manner without sacrificing individuals. In this

report, we summarize features of interspecific RDA and discuss some of its

potential scientific and economic applications.


Materials and Methods

Animals were collected in northern Ukraine from several localities around

Chernobyl and identified by karyotyping as described in Baker et al. (1996).

High molecular weight DNA was isolated from frozen liver samples following the

method of Longmire et al. (1997). For RDA experiments, we chose two

male individuals per species. RDA was performed as described in Lisitsyn and

Wigler (1995). We used BglII restriction enzyme for representation. Two

reciprocal experiments were performed using (I) M. agrestis amplicons as tester

(M. oeconomus as the driver) and (II) M. oeconomus amplicons as tester (M.

agrestis as the driver). After three rounds of RDA the most prominent bands were

excised from an agarose gel, purified using the Qiagen gel purification kit (Qiagen Inc,

Valencia, CA) and cloned into the pGEM®-T vector (Promega Corp., Madison, WI).

After transformation into JM109 E. coli cells, fifteen colonies per excised band

were screened by PCR using J-adapter (JBam24) as a primer (Lisitsyn and

Wigler 1995). Products of this amplification were then digested with Sau3A

restriction enzyme (Promega Corp.) to detect the most common sequences. One

clone from a series displaying a common pattern was labeled with [α-32P]-dCTP

(NEN™ Life Sciences Products, Inc, Boston, MA) and used as a probe in a

Southern blot experiment with BamHI-digested genomic DNA of M. agrestis and

M. oeconomus. Probes that produced hybridization to the genomic DNA of one

species but did not give any detectable hybridization with the genomic DNA of

the other were considered to represent "true" difference products.

Prehybridization and hybridization were performed at 42°C using the following

solution: 5xSSC, 1% SDS, 0.005% non-fat dry milk, 5xDenhardt solution, 50%

formamide. "True" products were further sequenced using the dRhodamine dye

terminator kit (Applied Biosystems, a division of Perkin Elmer Corp., Foster City,

CA) and analyzed on an ABI™ 310 autosequencer (Applied Biosystems).

Oligonucleotide primers were designed to each sequenced difference product

using the Oligo® 4.05 computer program and tested on genomic DNA isolated from

individuals of M. oeconomus and M. agrestis.
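
The clone-screening step (Sau3A digestion to find the most common insert) was done on gels, but the underlying logic can be sketched in silico. The fragment below is only an illustration under stated assumptions: it presumes the insert sequences are already known, treats the digest as complete, and uses hypothetical function and clone names. The only enzyme fact it relies on is that Sau3A recognizes GATC and cuts immediately 5' of the G.

```python
from collections import Counter

SAU3A_SITE = "GATC"  # Sau3A recognition sequence; cut falls just before the G


def fragment_lengths(insert: str, site: str = SAU3A_SITE) -> tuple[int, ...]:
    """Sorted fragment lengths from a complete digest of a linear insert."""
    seq = insert.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)  # cut position is the first base of the site
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces that arise when a site starts the insert.
    return tuple(sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a))


def most_common_pattern(clones: dict[str, str]) -> tuple[tuple[int, ...], list[str]]:
    """Return the most frequent restriction pattern and the clones showing it."""
    patterns = {name: fragment_lengths(seq) for name, seq in clones.items()}
    top, _count = Counter(patterns.values()).most_common(1)[0]
    return top, sorted(name for name, p in patterns.items() if p == top)
```

One clone from the returned group would then be the candidate probe; on a real gel, fragments of similar length are not resolved, so exact length matching is stricter than the bench procedure.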

Results and Discussion

We performed two reciprocal RDA experiments that we expected to yield

markers distinguishing M. agrestis from M. oeconomus. In the first experiment

M. agrestis genomic DNA was used as the tester and compared against M.

oeconomus DNA used as driver. In the second experiment tester and driver

were switched. We analyzed only a small subset of difference products

generated in each of the two experiments. Hybridization patterns shown in Fig 3.1

indicate that both isolated markers are repetitive elements (multiple bands are

present). Both difference products (Moe1, M. oeconomus-specific, and Mag3,

M. agrestis-specific) were sequenced and deposited in GenBank under the

following accession numbers: XXXXXX, XXXXXX. Using these sequences we

were able to design species-specific PCR primers that produce an amplification

product from the genomic DNA of one species and do not yield any visible

amplification for the other species (Fig 3.1). This is therefore a convenient tool for

simultaneous identification (both primer pairs have identical PCR profiles, Table

3.1) of multiple samples where the only limitation is the capacity of the thermal

cycler used. As stated above, our goal was not only to find markers

discriminating M. agrestis from M. oeconomus but also to distinguish these two from the M.

arvalis/M. rossiaemeridionalis group, as juvenile individuals of these four species

look virtually identical and cannot be distinguished morphologically. We tested

each primer pair on genomic DNA of each of these four species (Fig 3.1). Note

that primers originally designed to discriminate species within the M. arvalis/M.

rossiaemeridionalis and M. agrestis/M. oeconomus pairs do not generate

amplification when tested on genomic DNA of species belonging to the other

pair. Primer sequences and PCR conditions are given in Table 3.1. Similar

ideology can be used to develop RDA-derived markers for other applications.
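
The presence/absence logic of the four primer pairs amounts to a lookup key, which can be sketched as follows. This is a hypothetical illustration rather than software used in the study: the marker label "arvalis_marker" is a placeholder for the M. arvalis-specific primer pair, and the handling of ambiguous results is an assumption.

```python
# Marker labels mapped to the species they are diagnostic for.
# "arvalis_marker" is a placeholder label, not a name taken from the study.
SPECIES_BY_MARKER = {
    "arvalis_marker": "M. arvalis",
    "Mro16": "M. rossiaemeridionalis",
    "Moe1": "M. oeconomus",
    "Mag3": "M. agrestis",
}


def identify(amplified: dict[str, bool]) -> str:
    """Assign a species from presence/absence of each diagnostic PCR product."""
    missing = set(SPECIES_BY_MARKER) - set(amplified)
    if missing:
        raise ValueError(f"no result recorded for: {sorted(missing)}")
    hits = [sp for marker, sp in SPECIES_BY_MARKER.items() if amplified[marker]]
    if len(hits) == 1:
        return hits[0]
    if not hits:
        return "unresolved: no marker amplified"
    return "unresolved: more than one marker amplified"
```

A specimen whose DNA amplifies only with the Moe1 primers would be scored as M. oeconomus; any other combination is flagged for re-testing rather than guessed.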

Results described above permit us to summarize and discuss features of

RDA performed on genomes of different species, or interspecific RDA. In most

cases difference products isolated by interspecific RDA are repetitive

sequences of the tester genome that are not present in the driver. Because of

the properties of RDA (kinetic enrichment step) it preferentially isolates

repetitive sequences when they constitute differences between tester and driver

(Navin et al. 1996). As we showed previously (Nekrutenko et al. 1999),

repetitive sequences that are present in one species and absent in the other can

be successfully isolated by RDA even when species under investigation are

cryptic and have highly similar complex genomes. It is apparent that families of

repetitive sequences are homoplasy-free characters, as the probability of

two identical sequence families arising in separately evolving lineages is virtually zero

(Verneau et al. 1998). Additionally, differences found between two genomes

can be easily converted into PCR-based markers that discriminate

between taxa based on presence/absence of amplification products. Because

repetitive sequences are represented in the genome by more than one locus

and in a PCR-based application multiple loci are amplified simultaneously,

failure to obtain an intense diagnostic amplification product is improbable.
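
The argument about multiple loci can be made concrete with a short calculation. The sketch below assumes, for illustration only, that each copy of the repeat fails to amplify independently and with the same probability; neither assumption is tested in this study, and the function name and numbers are hypothetical.

```python
def prob_no_product(per_locus_failure: float, n_loci: int) -> float:
    """Probability that every one of n_loci copies fails to amplify.

    Assumes independent, identically distributed failures per locus.
    """
    if not 0.0 <= per_locus_failure <= 1.0:
        raise ValueError("per_locus_failure must be a probability")
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    return per_locus_failure ** n_loci
```

With a per-locus failure probability of 0.1, a single-copy marker gives no product one time in ten, whereas a three-copy repeat does so about one time in a thousand under the same assumptions.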

In this study we have created a diagnostic key to identify individuals of

the genus Microtus to species. The need for this key results from the fact that

a wide variety of researchers will be capturing and releasing individuals in mark-

recapture studies of survival and dose monitoring. Two species, M. arvalis and

M. rossiaemeridionalis, cannot be identified to species without either voucher

specimens or molecular analysis. Additionally, not all of the researchers who

work here are adequately trained to identify immature Microtus under field

conditions, and by saving a drop of blood, an ear clip, or a toe it will be possible to

isolate sufficient DNA to use this PCR-based method to positively identify all

individuals of this genus. In addition to the above problem, other possible uses

might include identification of egg or larval stages where these forms are

particularly problematic. Finally, we think that RDA has great potential for

developing diagnostic markers for forensic, law enforcement, and conservation

issues. For example, in the caviar market there is considerable concern that

caviar can be collected from endangered species. Using RDA it should be

possible to develop a series of species-specific markers that could be used to

document that a sample came from a legal species as well as to document

which endangered species was used in examples of illegal trade. Additionally,

only a single ovum would be required to perform the necessary test from each

sample, and with the use of PCR the cost per sample should be significantly lower

than in the case of sequencing or Southern blot hybridization.

Literature Cited

Baker RJ, Hamilton MJ, Van Den Bussche RA et al. (1996). Small mammals from the most radioactive sites near the Chornobyl nuclear power plant. Journal of Mammalogy 77:155-170.

Baker RJ, Longmire JL, Maltbie M et al. (1997). DNA synapomorphies for a variety of taxonomic levels from a cosmid library from the new world bat Macrotus waterhousii. Systematic Biology 46:579-589.

Lisitsyn NA, Lisitsyn N, Wigler M (1993). Cloning the differences between two complex genomes. Science 259:946-951.

Lisitsyn NA (1995). Representational difference analysis: finding the differences between genomes. Trends in Genetics 11:303-307.

Lisitsyn NA, Wigler M (1995). Representational difference analysis in detection of genetic lesions in cancer. Methods in Enzymology 254:291-304.

Longmire JL, Maltbie M, Baker RJ (1997). Use of "lysis buffer" in DNA isolation and its implication for museum collections. Occasional Papers of the Museum of Texas Tech University 163:1-3.

Navin A, Prekeris R, Lisitsyn NA et al. (1996). Mouse Y-specific repeats isolated by whole chromosome representational difference analysis. Genomics 36:349-353.

Nekrutenko A, Makova KD, Chesser RK, Baker RJ (1999). Representational difference analysis to distinguish cryptic species. Molecular Ecology (in press).

Straus D, Ausubel FM (1990). Genomic subtraction for cloning DNA corresponding to deletion mutations. Proceedings of the National Academy of Sciences of the USA 87:1889-1893.

Ushijima T, Nomoto T, Sugimura T et al. (1998). Isolation of 48 genetic markers appropriate for high throughput genotyping of inbred rat strains by B1 repetitive sequence-representational difference analysis. Mammalian Genome 9:1008-1012.

Verneau O, Catzeflis F, Furano AV (1998). Determining and dating recent rodent speciation events using L1 (LINE-1) retrotransposons. Proceedings of the National Academy of Sciences of the USA 95:11284-11289.

[Table 3.1: primer sequences and PCR conditions; table contents not recoverable from the source.]

[Figure 3.1: hybridization patterns and PCR amplification results; image and caption not recoverable from the source; legible lane labels: M. agrestis, M. oeconomus.]

CHAPTER IV

REPRESENTATIONAL DIFFERENCE ANALYSIS

IN THE STUDY OF ALLOPOLYPLOIDS:

SUBGENOME-SPECIFIC MARKERS AND

FURTHER EVIDENCE OF BIASED CONCERTED EVOLUTION

IN ALLOTETRAPLOID COTTON Gossypium hirsutum.

Abstract

Here I report on the utility of representational difference analysis (RDA)

in the study of the composition of allopolyploids as well as on the ability of RDA-

derived markers to correctly depict patterns of molecular evolution within

allopolyploid nuclei. We developed a series of genetic markers specific to the A

and D genomes of diploid cottons. Polymerase chain reaction (PCR) primers

designed to these markers yield an amplification product only when

corresponding genomic DNA is used as a template. Moreover these same A

and D genome specific primers produce amplification products with genomic

DNA from allotetraploid cotton Gossypium hirsutum. Therefore this approach

can be used to study allopolyploids of unknown composition by developing a

series of binary markers specific to suspected diploid progenitors. One of the

RDA-derived markers reported here represents a polymorphic amplifiable

restriction fragment (PARF), a sequence found in both A and D genomes but

differentially flanked by restriction sites. Sequence analysis of the PARF

sequences from the diploid cottons and from the allotetraploid G. hirsutum

confirms the previously reported observation of interlocus concerted evolution

(Wendel et al. 1995ab) among homeologous sequences within the polyploid

nucleus: sequences corresponding to A and D genomes are concerted to a D

genome type, eliminating the A genome type. This conclusion is drawn from a

presumably neutral marker, indicating that interlocus concerted evolution is

a common event for allopolyploid genomes and that repeats undergoing this

process need not be arranged tandemly.

Introduction

Polyploidy has played and perhaps is playing an important role in the

evolution of angiosperms, where an estimated 70% of all extant species have

gone through a polyploidization event (Masterson 1994). Once the original

diploid genomes are united into the allopolyploid nucleus an entirely new

genetic environment is created in which evolutionary changes might take new

directions relative to the changes occurring in corresponding diploid taxa

(Reinisch et al. 1994; Cronn et al. 1996; Wendel et al. 1995ab; Jiang et al.

1998). An ideal model system for studying such instances would be an

allopolyploid taxon for which its diploid progenitors are known. For example,

cottons (genus Gossypium) include both diploid and tetraploid species. All

diploid species are separated into seven genome classes (A, B, C, D, E, F, and

G) based on observations of chromosomal pairing (Endrizzi et al. 1984). This

genus also includes five allotetraploid species (2n=4x=52) that originated

as a result of hybridization between an A genome and a D genome species. It is

thought that the hybridization event that gave rise to allotetraploid cottons

occurred between an A genome diploid taxon closely resembling present day

species of G. arboreum or G. herbaceum, whereas D-genome donors were

likely similar to today's G. raimondii or G. gossypioides. Among allotetraploid

species two are important fiber and oilseed crops: G. hirsutum ("upland cotton")

and G. barbadense ("pima" cotton).

How does sequence composition of the original diploid genomes change

after allopolyploidy is established? Wendel et al. (1995ab) analyzed highly

repeated arrays of the internal transcribed spacer regions and 5.8S rRNA gene

sequences from selected species of diploid cotton and five allotetraploid

species. In all diploid and tetraploid species of cotton examined these

sequence arrays were homogenized by concerted evolution. Interestingly, in all

five tetraploid species that should theoretically contain arrays of both types (A

and D), the examined sequences were concerted to either the D-type (four species

including G. hirsutum and G. barbadense) or the A-type (G. mustelinum). This

example is a compelling illustration of intergenomic concerted evolution in

allopolyploid taxa. The A genome is physically larger than the D genome yet

both have identical recombinational lengths (Reinisch et al. 1994) implying that

the D genome is more recombinationally active. While some diploid A genome

species were domesticated, D genome diploid species do not produce

spinnable fibers and have never been cultivated (Jiang et al. 1998). However,

in the commercially important allotetraploids G. hirsutum and G. barbadense

quantitative trait loci that affect fiber quality and yield were mapped to the D

subgenome suggesting new evolutionary pathways created by tetraploidization

(Jiang et al. 1998). Similar patterns of concerted evolution were suggested in

less comprehensive studies conducted on other allopolyploid plants such as

tobacco (Volkov et al. 1999), wheat (Nagaki et al. 1998), and synthetic

polyploids of Brassica (Song et al. 1995), as well as on some animal species

(Hillis et al. 1991).

Given these data, evolutionary processes occurring within

allopolyploid nuclei are an interesting and largely unexplored area, with our

present understanding based on data from a limited number of molecular

markers. Being able to understand and predict changes occurring in polyploid

genomes is important theoretically and practically because it can permit

prediction of the fate of synthetic polyploids and new cultivars, where quite

frequently the results obtained deviate from those desired. In this report I propose a new

tool for isolation of genome-specific markers in allopolyploid organisms:

representational difference analysis. This method allowed us to (1) develop

markers specific to each of the A and D genomes of cotton, demonstrating how a

similar ideology can be applied to allopolyploids of unknown composition, and

(2) infer patterns of evolution of homeologous sequences using data from a

polymorphic amplifiable restriction fragment, a type of marker frequently

isolated by RDA.


Materials and Methods

Genomic DNA was isolated from young leaves of G. arboreum, G.

herbaceum, G. hirsutum, G. thurberi, and G. raimondii using the Qiagen DNA

plant maxi kit (Qiagen). RDA was carried out as described in Lisitsyn and

Wigler (1995). We used BamHI restriction endonuclease to cut genomic DNA

and R-Bam adapters (Lisitsyn et al. 1993) to prepare tester and driver

amplicons. After performing three rounds of RDA, the most prominent difference

products were excised from an agarose gel, purified using the Qiagen gel

purification kit (Qiagen) and ligated into the pGEM-T vector (Promega). After

transformation, we screened 15 clones per excised product using adapter J-

Bam24 (Lisitsyn and Wigler 1995) as an amplification primer. In most cases all

15 clones were identified as having the correct insert based on amplification

product size, and these same products were cut with Sau3A restriction enzyme

(Promega) and run on a 2% agarose gel to detect the most abundant difference

products based on restriction fragment patterns. The most abundant products were

reamplified and used as probes in a Southern blot experiment performed to

confirm that these products are true difference products (present in genomic

DNA of tester, but absent in driver). We used [α-32P]-dCTP (NEN laboratories)

and a random primed labeling kit (Boehringer Mannheim) to label the probes.

Probes were then hybridized to positively charged nylon filters (Boehringer

Mannheim) carrying BamHI-digested genomic DNA of tester and driver at 42°C

in a hybridization solution containing 5xSSC, 1% SDS, 0.005% non-fat dry milk,

5xDenhardt solution, 50% formamide. Products that were identified as true

difference products were further sequenced using the dRhodamine Dye Terminator

kit (Perkin Elmer) and analyzed on an ABI 310 automated sequencer (Applied

Biosystems). The nucleotide sequences obtained were used to design difference

product-specific primers with the help of Oligo 4.05 software. Primer

sequences and amplification conditions are given in Table 4.1. Nucleotide

divergence analysis was performed using the program DnaSP 3.0 (Rozas and

Rozas 1999).

Results

Isolation of DNA Fragments Specific to A and D Genomes. Two

reciprocal RDA experiments were performed in order to isolate A and D

genome specific sequences. In the first experiment we used DNA from G.

arboreum (A genome) as the tester and DNA from G. raimondii (D genome) as

the driver. In the second experiment tester and driver were reversed. Each

RDA experiment yields a large number of difference products enriched to a

different extend. We analyzed only a small subset of such difference products

in each of our experiment. First experiment (A genome as the tester) yielded

two difference products (A20, A36) that were confirmed as absence/presence

41

differences based on southern blot hybridization (Fig 4.1A). Labeled probe A36

produced strong hybridization signal implying that this is a highly repetitive

element whereas probe A20 appears to be a low or perhaps single copy

element based on the hybridization intensity. Second experiment (D genome

as the tester) yielded two absence/presence products (D10 and D13) and one

so called polymorphic amplifiable restriction fragment (D1, Fig. 4.1A). Note that

hybridization intensity for D-specific probes is significantly lower when

compared to A-specific probes. Polymorphic amplifiable restriction fragments

(PARFs, see Lisitsyn et al. 1993) are sequences that are present in both tester

and driver genomes but are differentially flanked by restriction sites. As

shown in Fig. 4.1A, the labeled D1 probe hybridizes to different fragments of A and

D genomic DNAs. Thus, as the initial result we have two DNA fragments

specific to each of G. arboreum (A genome) and G. raimondii (D genome) and one

fragment common to both species but differentially flanked by restriction

sites.

To test whether these fragments can be used not only to distinguish the

two species of diploid cottons but also to sort A genomes from D genomes, we

designed a series of amplification primers to each of four difference products

(A20, A36, D10, and D13) and the PARF (D1) described above. Using these

primers (sequences are given in Table 4.1) we amplified genomic DNA of two

original species (G. arboreum and G. raimondii), genomic DNA of two other

diploid cottons (G. herbaceum [A-genome] and G. thurberi [D-genome]) as well


as the genomic DNA of the allotetraploid cotton G. hirsutum. We selected G.

thurberi from a number of D genome cottons because it is relatively distant from

G. raimondii based on the phylogeny published by Wendel et al. (1995b). Results of

this amplification are given in Fig. 4.1B. Based on this amplification we

conclude that our markers are diagnostic for A and D genomes within this

sample and also generate products of the same size from the tetraploid

species. D1-specific primers produced amplification products from all five plant

samples.
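The scoring logic behind this conclusion can be sketched in a few lines of Python. The sketch is illustrative only: the function name and the example amplification pattern are assumptions that encode the outcome described in the text (A markers amplify from A-genome diploids, D markers from D-genome diploids, and all four from the allotetraploid), not gel readings from Fig. 4.1B. D1 is ignored because, as a PARF, it amplifies from every sample.

```python
# Genome assignment of the four absence/presence markers named in the text.
MARKER_GENOME = {"A20": "A", "A36": "A", "D10": "D", "D13": "D"}


def genomes_detected(amplified):
    """Return the set of genomes with at least one amplified diagnostic marker.

    `amplified` maps a marker name to True (PCR product seen) or False.
    Markers absent from MARKER_GENOME (for example the PARF D1) are ignored.
    """
    return {
        MARKER_GENOME[marker]
        for marker, has_product in amplified.items()
        if has_product and marker in MARKER_GENOME
    }


# Hypothetical scoring that mirrors the pattern reported in the text.
samples = {
    "G. arboreum": {"A20": True, "A36": True, "D10": False, "D13": False, "D1": True},
    "G. raimondii": {"A20": False, "A36": False, "D10": True, "D13": True, "D1": True},
    "G. hirsutum": {"A20": True, "A36": True, "D10": True, "D13": True, "D1": True},
}

for species, scores in samples.items():
    print(species, sorted(genomes_detected(scores)))
```

A sample scored as carrying both A and D markers would be interpreted as containing both genomes, which is the pattern expected for an AD allotetraploid.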

Analysis of the Difference Products. First we sequenced amplification

products shown in Fig. 4.2 directly (without cloning). Based on hybridization,

some of our probes are repetitive elements. Primers designed to these probes

amplify multiple loci simultaneously, which is a powerful feature for

diagnostic applications (the probability of encountering a null allele is low). In

theory, however, repeated elements can be polymorphic, especially when they

represent non-coding regions, which poses a problem for the direct sequencing

approach. We succeeded in obtaining clean chromatograms from PCR

products generated with A20, D10 and D1 amplification primers. Products of

primer pairs designed to A36 and D13 required cloning prior to sequencing.

Obtained sequences were deposited in GenBank under accession numbers

xxxxxxx-xxxxxxx.

Sequencing Analysis of the Individual Difference Products. PCR

products generated by A20-specific primers were obtained from diploids G.


arboreum, G. herbaceum and an allotetraploid G. hirsutum. Interestingly, there

is no variation among these sequences: all nucleotide sites (547) are fixed

among these three species. When the A20 sequence is compared to the

non-redundant Entrez database using the BLAST program

(http://www.ncbi.nlm.nih.gov/BLAST; we used the blastx algorithm, which compares

all six possible reading frames of a nucleotide sequence against the protein

database), it matches a short region of the Lilium henryi del1-46 retrotransposon

(accession X13886) reported by Smyth et al. (1989). Even though the BLAST

score is low (37), the matching regions are 94% similar (Table 4.2). Zhao et al.

(1998) reported several G. barbadense repetitive DNA clones that had higher

matches with the Lilium henryi del1-46 retrotransposon; however, the A20

sequence is different from the ones reported by these authors.
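For readers unfamiliar with a six-frame search, the following Python sketch shows what translating a nucleotide sequence in all six reading frames involves. It is a minimal illustration and not the blastx implementation; the function names are assumptions, the standard genetic code is used, and the input is the A20F primer from Table 4.1, chosen only because it is a short sequence available in this chapter.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return translations for frames +1..+3 and -1..-3, keyed by frame label."""
    frames = {}
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


# A20F primer from Table 4.1, used here only as a short example input.
for frame, peptide in six_frames("ATAGCCCAGATGGAGATAGAATGTGG").items():
    print(frame, peptide)
```

A protein-level search of this kind can detect similarity between coding regions even when the nucleotide sequences themselves have diverged.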

Amplification products obtained with A36-specific primers on the DNA of

the two A genome diploids and the allotetraploid were cloned and two clones

per species were sequenced. When these sequences were used in BLAST

searches they appeared to have similarity with the repetitive DNA clone

pXP077 from G. barbadense (accession AF060598) reported by Zhao et al.

(1998). All sequenced clones were polymorphic (no identical clones were

found), with the nucleotide diversity estimator (π, Nei and Li 1979) ranging from

0.075 for G. arboreum clones to 0.129 for G. hirsutum clones.
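As a minimal sketch of how such a value is obtained, π can be computed as the mean proportion of differing sites over all pairs of aligned sequences. The Python below assumes pre-aligned sequences of equal length and skips gapped sites; the function names and the toy alignment are hypothetical, and DnaSP may treat gaps and ambiguous bases differently.

```python
from itertools import combinations


def pairwise_difference(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def nucleotide_diversity(sequences):
    """Mean pairwise difference per site over all pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


# Hypothetical two-clone alignment: one difference in ten sites.
clones = ["ACGTACGTAC", "ACGTTCGTAC"]
print(nucleotide_diversity(clones))
```

With only two clones per species, as here, the estimate reduces to the proportion of sites at which the two clones differ.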

Sequences of PCR products obtained with D10-specific primers did not

produce any significant matches with the Entrez entries. Products of

D13-specific primers were cloned and two clones per product were sequenced for

each of the three cottons G. thurberi, G. raimondii and G. hirsutum. Again, no

identical clones were found. Calculated π values range from 0.200 in G.

thurberi clones to 0.239 in G. hirsutum clones.

Sequences corresponding to PARF D1 were obtained for all five

analyzed species of cotton. BLAST search with these sequences against the

Entrez database did not identify any significant matches, and we were unable to

detect the presence of any continuous open reading frames (ORFs); therefore, it

may be that this region is non-coding and possibly selectively neutral. Having

these homeologous sequences, we possess a unique opportunity to test whether

in allotetraploid G. hirsutum both types of sequences (A genome-derived type

and D genome-derived type) still co-exist or have been homogenized by

concerted evolution. We directly sequenced PCR products from all five species

(two A cottons, two D cottons and the allopolyploid) and performed a

phylogenetic analysis. Fig. 4.2 shows the resulting neighbor-joining tree. As

can be seen, the G. hirsutum sequence clusters together with the G. raimondii

fragment, supporting the observation previously made by Wendel et al. (1995)

that in G. hirsutum, ribosomal DNA repeats (rDNA) are homogenized to the D

type sequences. As direct sequencing conceals variation and indicates only

the most abundant sequence type, we cloned PCR products obtained for G.

hirsutum and sequenced five of them. Although all clones had different

sequences, when phylogenetic analysis was performed all clones were again


clustered together with the G. raimondii sequence, confirming our initial

observation. Notably, the nucleotide variation among these five G. hirsutum

clones was low (π = 0.002).
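The clustering result can be illustrated with a simpler stand-in for the tree: assign each sequence to the diploid reference from which it differs least. This Python sketch uses uncorrected p-distance rather than the neighbor-joining analysis performed here, and the sequences and labels are invented placeholders, not the PARF D1 data.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def closest_reference(query, references):
    """Return the name of the reference sequence with the smallest p-distance."""
    return min(references, key=lambda name: p_distance(query, references[name]))


# Placeholder sequences standing in for A-type and D-type homeologs.
references = {
    "A-type": "ACGTACGTAC",
    "D-type": "ACGAACGTTC",
}
tetraploid_clone = "ACGAACGTTA"  # hypothetical clone sequence

print(closest_reference(tetraploid_clone, references))
```

If every clone from the allotetraploid is closest to the D-type reference, as in the tree, the result is consistent with homogenization toward the D type; a mixture of assignments would instead suggest that both types persist.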

Discussion

RDA allows the study of allopolyploid genomes in an effective and

precise fashion without the need to construct libraries and screen for

non-cross-hybridizing clones. This approach can be used in two types of

applications. First, to determine the composition of a given allopolyploid a set

of diagnostic markers can be developed specific to each of its diploid

progenitors. When these markers are applied to the allopolyploid, it is

possible to determine which of the candidate diploids contributed their genomes

to the allopolyploid nucleus. This approach is illustrated in Fig. 4.1B, where all

A and D specific primer pairs generate amplification products when the

allotetraploid's DNA is used as a template. Second, sequence analysis of

difference products may provide valuable data for highlighting changes

occurring to homeologous sequences in allotetraploid nuclei. When RDA is

performed on closely related species, it is possible to isolate differences

representing sequences that in reality are present in both tester and driver

DNAs but either vary in copy number or are differentially flanked by restriction sites

(PARFs). In our example PARF D1 sequences are present in both A and D

genome diploids. Analysis of sequence data for this genomic fragment enabled


us to draw the conclusion that in the allotetraploid species A- and D-type

sequences were homogenized to the D-type by the process of interlocus

concerted evolution reported previously by Wendel et al. (1995a, b).

Additionally, sequence data from A36 and D13 clones indicate that on average

the nucleotide variability of D-genome sequences is higher than that of

A-genome sequences (Table 4.2). This observation is consistent with data reported by

Reinisch et al. (1994) and Wendel et al. (1996). These authors reported nearly

identical recombinational lengths of A and D genomes based on RFLP maps

implying that the physically smaller D genome is more recombinationally active.

The latter group sequenced multiple clones representing the 5S rRNA gene

and intergenic spacer region. For these genome fragments, nucleotide

variability (π) was higher for D genome plants than for A genome species (0.058

vs. 0.039). However, to support our observations statistically, a greater number of

difference products and corresponding clones needs to be analyzed.

The Lilium henryi retroposon del-like sequence (A20) reported here is an

interesting finding. Recently, Zhao et al. (1998) analyzed a large sample of

cotton (G. barbadense) repetitive DNA clones and identified three A genome-

specific clones (pXP030, pXP067 and pXP1-58) corresponding to reverse

transcriptase and integrase regions of the retroposon. Our sequence does not

match any reported by these authors, possibly because it corresponds to a

different region of the reverse transcriptase gene (Table 4.2; see also Smyth et

al. 1989). It is surprising that this retroposon-like sequence is restricted to the A

genome only, because similar sequences are found in a variety of organisms

ranging from yeast to fruit flies (Table 4.2).

Based on Southern hybridization patterns (Fig. 4.1A), we do not have a

reason to believe that any of the sequences reported here represent tandemly

repeated elements. Arrangement of repeats is one of the important factors

affecting concerted evolution: tandemly repeated sequences typically have a

higher probability of being homogenized by unequal crossing over or gene

conversion events (Li 1997). Nevertheless, homogenization occurred in the

PARF sequences in a way that is strikingly similar to the pattern described by

Wendel et al. (1995) from rDNA arrays in cotton. Thus there are two reports of

biased (D genome biased in this example) homogenization of homeologous

repeats in an allopolyploid that may reflect a general pattern. In this sense, it

would be interesting to investigate changes to homeologous sequences in

another allotetraploid cotton, G. mustelinum, in which rDNA repeats have

concerted to an A type (Wendel et al. 1995a). Although precise mechanisms of

concerted evolution are yet to be determined, allotetraploids offer a model

system for studying this important process.

As noted above, a single RDA experiment yields hundreds of difference

products. In the present report, we only analyzed two products from one

experiment and three from the other. Potentially, a library of difference

products can be constructed and screened for the presence of tester-specific

clones and PARFs. Markers identified this way can be used in many


applications ranging from construction of genetic maps (see, for example,

Toyota et al. 1996) to the large-scale analysis of polymorphisms.


Literature Cited

Cronn RC, Zhao X-P, Paterson AH, and Wendel JF (1996). Polymorphism and concerted evolution in a tandemly repeated gene family: 5S ribosomal DNA in diploid and allopolyploid cottons. Journal of Molecular Evolution 42:685-705.

Endrizzi JE, Turcotte EL, and Kohel RJ (1984). Qualitative genetics, cytology, and cytogenetics. In Cotton (ed. Kohel RJ and Lewis CF) ASA/CSSA/SSSA Publishers, Madison, Wisconsin.

Hillis DM, Moritz C, Porter CA, and Baker RJ (1991). Evidence for biased gene conversion in concerted evolution of ribosomal DNA. Science 251:308-310.

Jiang C-X, Wright RJ, El-Zik KM, and Paterson AH (1998). Polyploid formation created unique avenues for response to selection in Gossypium (cotton). Proceedings of the National Academy of Sciences of the USA, 95:4419-4424.

Li W-H (1997). Molecular Evolution. Sinauer Associates, Sunderland, Massachusetts.



Masterson J (1994). Stomatal size in fossil plants: Evidence for polyploidy in majority of angiosperms. Science 264:421-424.

Nagaki K, Tsujimoto H, and Sasakuma T (1998). Dynamics of tandem repetitive Afa-family sequences in Triticeae, wheat-related species. Journal of Molecular Evolution 47:183-189.

Nekrutenko A, Makova KD, Chesser RK, and Baker RJ (1999). Representational difference analysis to distinguish cryptic species. Molecular Ecology, in press.

Nei M, and Li W-H (1979). Mathematical model for studying genetic variation in terms of restriction endonucleases. Proceedings of the National Academy of Sciences of the USA, 76:5269-5273.


Reinisch AJ, Dong J-M, Brubaker CL, et al. (1994). A detailed RFLP map of cotton, Gossypium hirsutum x Gossypium barbadense: Chromosome organization and evolution in a disomic polyploid genome. Genetics 138:829-847.

Rozas J, and Rozas R (1999). DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.

Smyth DR, Kalitsis P, Joseph JL, and Sentry JW (1989). Plant retrotransposon from Lilium henryi is related to Ty3 of yeast and the gypsy group of Drosophila. Proceedings of the National Academy of Sciences of the USA, 86:5015-5019.

Song K, Lu P, Tang K, and Osborn TC (1995). Rapid genome change in synthetic polyploids of Brassica and its implication for polyploid evolution. Proceedings of the National Academy of Sciences of the USA, 92:7719-7723.

Toyota M, Canzian F, Ushijima T, et al. (1996). A rat genetic map constructed by representational difference analysis markers with suitability for large-scale typing. Proceedings of the National Academy of Sciences of the USA, 93:3914-3919.

Volkov RA, Borisjuk NV, Panchuk II, et al. (1999). Elimination and rearrangement of parental rDNA in the allotetraploid Nicotiana tabacum. Molecular Biology and Evolution 16:311-320.

Wendel JF, Schnabel A, and Seelanan T (1995a). An unusual ribosomal DNA sequence from Gossypium gossypioides reveals ancient, cryptic, intergenomic introgression. Molecular Phylogenetics and Evolution 4:298-313.

Wendel JF, Schnabel A, and Seelanan T (1995b). Bidirectional interlocus concerted evolution following allopolyploid speciation in cotton (Gossypium). Proceedings of the National Academy of Sciences of the USA, 92:280-284.

Zhao X-P, Si Y, Hanson RE, et al. (1998). Dispersed repetitive DNA has spread to new genomes since polyploid formation in cotton. Genome Research 8:479-492.


Table 4.1. Amplification primers for A and D genome-specific difference products. The amplification profile was 1 min at 94°C (2 min for the first cycle), 1 min at 52°C and 1 min at 73°C for 35 cycles. The reaction composition was as follows (per 25 µl): 2.5 µl of 10X buffer (Promega), 3.75 µl of each 2 µM primer, 2 µl of 2.5 mM dNTPs (Perkin Elmer), 1.5 µl of 25 mM MgCl2 (Promega), 2.5 units of Taq polymerase (Promega buffer A), and 1-5 ng of the template DNA (typically 2.5 µl of a 1/100 dilution of the original extraction).

Primer Sequence

A20F ATA GCC CAG ATG GAG ATA GAA TGT GG

A20R ACT CTA AGG CTG AAG ACT GAA TAG AAA GG

A36F CCG GAG TCG AAC ACA AGG TGC AT

A36R CCG ACT TTG GAA ATT CAT TGT AAA TTA ACC

D1_PARF_F TTA CTG GGA TTT GCC ATG AAA CC

D1_PARF_R CCC CTT ATC TTC CAG TTG TGA CG

D10F TGA GGC GAC ATG GAA TCT GTA GG

D10R TTC TGA ACT GAG GCG AAC TAT TTG G

D13F GGG CAA CCA GTT GTG TCC AGG

D13R TGT GGG AAA GAT TGG TGG TGT AGG
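The final concentrations implied by the reaction recipe in the caption follow from the dilution relation C_final = C_stock x V_added / V_total. The short Python sketch below works them through, reading the volumes in the caption as microliters in a 25 µl reaction; the helper name is an assumption, and the caption does not say whether 2.5 mM refers to each dNTP or to the total.

```python
def final_concentration(stock, volume_ul, total_ul=25.0):
    """Concentration after diluting `volume_ul` of stock into `total_ul`."""
    return stock * volume_ul / total_ul


# (component, stock concentration, volume added in µl, unit) from the caption.
components = [
    ("buffer", 10.0, 2.5, "X"),
    ("each primer", 2.0, 3.75, "µM"),
    ("dNTPs", 2.5, 2.0, "mM"),
    ("MgCl2", 25.0, 1.5, "mM"),
]

for name, stock, volume, unit in components:
    print(f"{name}: {final_concentration(stock, volume):.2f} {unit}")
```

Under this reading, the reaction contains 1X buffer, 0.3 µM of each primer, 0.2 mM dNTPs and 1.5 mM MgCl2.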

Table 4.2. [Table content not recoverable. The text cites this table for the similarity between the A20 sequence and retrotransposon sequences from other organisms.]

Figure 4.1. [Figure not recoverable. The text cites panel A for Southern blot hybridization of the difference products and panel B for PCR amplification with the primers listed in Table 4.1.]

Figure 4.2. [Figure not recoverable. The text cites this figure for the neighbor-joining tree of PARF D1 sequences.]

CHAPTER V

SUMMARY

From a theoretical perspective, aspects of genome change and

organization are thought to be critical to evolutionary success. Therefore,

identification of genome changes is basic to the development and testing of

hypotheses and theories concerning evolutionary processes. From an applied

standpoint, identification of unique aspects of closely related genomes has great

financial value, with applications ranging from legal matters of law enforcement

to patent protection. An efficient method to locate and study genome changes

will serve society in a variety of ways ranging from theoretical growth to

economic development. My dissertation explores representational

difference analysis (RDA) as a means to meet these needs.

Theoretical applications of RDA include identification of genetic

differences between closely related species and development of arrays of

markers that can be used for interpretation of evolutionary events at the molecular

level. Species-specific markers isolated with RDA allow identification of

multiple samples in a fast and effective way. These markers can be used for

discrimination of closely related taxa, varieties, cultivars or even individuals.

Because RDA-derived markers are PCR-based (a set of diagnostic primers is

developed for each marker) a minute amount of DNA is sacrificed during the

identification procedure. For example, the DNA isolated from an ear clip, toe, or

a single ovum would be sufficient. These features make RDA ideally suited for

high-throughput applied procedures. For example, in law enforcement,

conservation, and commercial use it would be possible to precisely identify the

source of a certain specimen or food product.

Features of Interspecific RDA

In most cases, even when RDA is performed on closely related taxa

(cryptic species, for example; see Chapter II), obtained markers represent

families of repetitive elements that were introduced or have dramatically

increased in copy number after the split of the two taxa. Diagnostic primers

designed to repetitive elements amplify multiple loci (corresponding to individual

copies) simultaneously, virtually eliminating the possibility of encountering a null

allele, which would lead to an incorrect (or no) identification. In cases when it is

desired to develop markers for more than two species, it is not necessary to

perform reciprocal RDAs between each possible pair (see Chapter III). Each

RDA experiment generates a large number of difference products and it is

highly unlikely (although possible) that sequences that differ between two taxa

in a group would also be characteristic for other species. For example, in

Chapter III our goal was to obtain markers unique to each of four species of

voles. To accomplish this we performed reciprocal RDA only between two pairs

of species that were most closely related based on cytochrome b phylogeny


(Nekrutenko 1999, unpublished data). This was sufficient to yield desired

markers (Fig. 3.1).
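Two pieces of arithmetic underlie this paragraph, sketched here in Python under stated assumptions. First, exhaustive reciprocal comparison of n species would involve n(n-1)/2 species pairs, which is six for the four voles, whereas two pairs sufficed. Second, if each of k amplifiable copies of a repetitive element failed to amplify independently with probability q (an idealization; the values used below are hypothetical), the chance that all copies fail at once would be q^k. The function names are illustrative.

```python
from math import comb


def species_pairs(n_species):
    """Number of distinct species pairs available for reciprocal RDA."""
    return comb(n_species, 2)


def prob_no_amplification(q, copies):
    """Probability that every copy fails, assuming independent failures."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability between 0 and 1")
    if copies < 1:
        raise ValueError("at least one copy is required")
    return q ** copies


print(species_pairs(4))  # pairs needed for an exhaustive design with four species

# Hypothetical per-copy failure probability of 0.1.
for copies in (1, 5, 10):
    print(copies, prob_no_amplification(0.1, copies))
```

Even a modest copy number makes simultaneous failure of all copies very unlikely under this idealization, which is the sense in which a multi-locus marker guards against null alleles.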

As mentioned above, an RDA experiment generates a large array of

difference products. It is possible to visualize this array as a set of randomly

sampled sequences that are either absent from one genome and present in the

other or different between the two. As these two genomes belong to different

species, the differences between them contain a wealth of information on which

aspects of the genome change after the two become different taxa. Being able to

analyze a large number of differences would be very helpful in studies of

molecular evolution but would also require application of more powerful

technical approaches such as the use of high-density arrays. Additionally, most of

the differences between highly similar genomes fall in non-coding

(presumably neutral, unless subject to genetic hitchhiking) regions. Analysis

of nucleotide polymorphisms in such regions may provide baseline

information about the degree of variability for a particular organism. For

example, analysis of random clones obtained for A and D genome-specific

markers in cotton (Chapter IV) suggests that the D genome has a higher level of

nucleotide variability compared to the A genome.

Absence/presence differences are not the only type of marker RDA

isolates. In some cases, isolated sequences are present in both compared

genomes but are differentially flanked by restriction sites (polymorphic amplifiable

restriction fragments, PARFs). This type of RDA-derived marker cannot be

used in diagnostic applications because the corresponding primers would amplify

both genomes. On the other hand these markers represent sequences

homologous between the two genomes. In Chapter IV we used one such

marker (PARF D1) to show that within the allotetraploid cotton G. hirsutum,

sequences originally donated by A and D genome diploid progenitors during the

hybridization are homogenized to the D type by concerted evolution. As with

absence/presence markers, an RDA experiment yields multiple PARFs that can

be isolated and analyzed in large numbers providing, for example, a reliable

estimate of the pattern of concerted evolution within allopolyploids.

Major Contributions from the Dissertation

Like early papers on the application of protein electrophoresis,

restriction enzymes or sequencing to the analysis of polymorphisms, this work

introduces a new and powerful approach for isolation of diagnostic genetic

markers. Below I list the major contributions of the dissertation:

1. RDA has been applied to the isolation of species-specific markers. As

shown in Chapters II and III, such markers have been successfully isolated and

tested (Figs. 2.2 and 3.1). The identification procedure is PCR-based, implying that

it requires minimal amounts of DNA and can be done within a short period of

time. A similar approach can be used for development of binary markers capable

of distinguishing other closely related species with suitability for large-scale

genotyping.


2. By developing markers specific to A and D genomes in cotton, I

demonstrated how RDA-derived markers can be used to identify the composition of

allopolyploid nuclei (Fig. 4.1).

3. I demonstrated that polymorphic amplifiable restriction fragments

(PARFs) provide a unique opportunity for random sampling of homologous

sequences between genomes and uncovering interactions among them in

allopolyploid systems (Fig. 4.2).
