
J. Korean Soc. Appl. Biol. Chem. 52(3), 213-220 (2009) Article

Construction of Coding Region Enriched Genomic Library by S1 Nuclease Treatment of Partially Denatured Rice Genomic DNA

Yeon-Ki Kim1, Jeong-Sook Kim2, Pil-Joong Cheong2, Min-Jeong Kim2, Tae-Ho Lee2, Joungsu Joo2,

Ju-Kon Kim2, Baek-Hie Nahm1,2, and Sang Ik Song2*

1Green Gene Bio Tech Inc., Myongji University, Youngin 449-728, Republic of Korea

2Department of Molecular Genetics and Bioinformatics, Myongji University, Youngin, 449-728, Republic of Korea

Received March 17, 2009; Accepted April 30, 2009

Designing high density DNA arrays representing all the genes of an organism could be limited

when the entire genome sequence of the organism is not available or if only a limited number of

ESTs are available. In an effort to prepare coding sequences as DNA sources for a microarray, we

found that coding region enriched genomic DNA libraries can be produced by S1 nuclease

treatment of partially denatured genomic DNA. Sequence analysis of about 1,000 clones using

BLASTN and BLASTX searches showed that 44% of the clones in the library have regions which

share significant similarity with sequences deposited in the rice EST and nr databases. These data

suggested that clones produced in this library could be directly used as PCR templates, where the

resulting PCR products could be spotted on slides for microarray analysis. This technique might

be applicable in designing a high density DNA array for an organism whose entire genome

sequence is not available.

Key words: coding sequence, microarray, S1 nuclease

Gene expression profiling on a genomic scale using

high-density arrays of nucleic acids on a special matrix

has emerged as a leading technology in the analysis of

various biological phenomena [DeRisi et al., 1997;

Zirlinger et al., 2001]. With DNA microarray analysis,

differential and temporal changes in gene expression can

be monitored under specified conditions by hybridizing

labeled reverse transcripts to prefabricated DNA spots on

chips or glass. At present, commercially manufactured

oligonucleotide chips, in which tens of thousands of

oligonucleotides representing all the genes of a genome

are spotted in a small area, are available for several

organisms including Saccharomyces and Drosophila

[Jelinsky and Samson, 1999; McDonald and Rosbash,

2001]. When complete genome sequence data is not

available, a cDNA microarray prepared with PCR products

amplified from various template sources, including an

EST library, can be used, albeit with somewhat limited results

[Kawasaki et al., 2001; Negishi et al., 2002].

Although it might be ideal to use chips or slides

covering all the genes of a genome for the array analysis,

several difficulties are encountered in designing these.

For example, it is essential that one has an annotated

genome sequence in order to construct the oligonucleotide

chips. However, genome sequencing, especially for higher

eukaryotic organisms such as plants and mammals, might

be hampered due to a large genome size and the presence

of excessive intergenic and intron regions [Pennisi,

2001]. A customized microarray based on PCR products

is restricted not only by ORF identification essential for

primer design, but also by the source of the template for

the PCR. Although EST clones are a popular

template source for PCR, the use of cDNA microarrays is

restricted by the limited representation of all genes

(less than 50%) and the high redundancy present in

cDNA libraries. For example, less than 50% of the putative

genes in plants such as Arabidopsis and rice were

represented in a recent cDNA based microarray analysis

[Harmer et al., 2000; Kawasaki et al., 2001].

*Corresponding author. Phone: +31-330-6276; Fax: +31-321-6355; E-mail: [email protected]

Abbreviations: BLAST, basic local alignment search tool; EST, expressed sequence tag; LTR, long terminal repeat; SNP, single nucleotide polymorphism

doi:10.3839/jksabc.2009.039

The genome sequence of the flowering plant,

Arabidopsis, has been completed and used as a model

system for identifying genes and determining their

functions. Furthermore, due to its economic importance

and relatively small genome size, rice (Oryza sativa)

has become a model organism for the Gramineae

family, which includes major agricultural crop species such

as maize, wheat and barley [Yuan et al., 2001]. Draft

sequences of indica and japonica type rice, spanning a 420

to 466 megabase genome, were reported by Yu et al.

[2002] and Goff et al. [2002], respectively.

The analysis of genome sequences of rice revealed that

the coding regions represented at most 20% of the genome,

while intergenic and intron regions are predominant

within the genome just as in other eukaryotic genome

sequences. We investigated whether the protein coding

regions of rice can be enriched by making use of the

differential GC content within the genome. Here, we

report the construction of a rice plasmid library in which

the coding regions were highly enriched using S1 nuclease

treatment of partially denatured rice genomic DNA. In

our experiments, most of the cloned sequences obtained

were nonredundant. Additionally, a homology search of

these clones using BLASTN and BLASTX with genes

and ESTs deposited in GenBank showed that 44% of the

clones from the rice plasmid library contain

coding regions identical to, or sharing significant homology with, other

known sequences. These data suggested that the coding

regions were significantly enriched in this library, given that

at least 35% and 12% of the rice genome consist of

transposons and exons, respectively

(International Rice Genome Sequencing Project, 2005).

Materials and Methods

Materials. Rice (Oryza sativa var Nipponbare) was

obtained from the STAFF Institute, Ibaraki, Japan. S1

nuclease, T4 DNA polymerase and ligase were purchased

from TaKaRa (Otsu, Japan). Chemicals and reagents

were purchased from Sigma.

Preparation of rice genomic DNA. Genomic DNA

from young rice seedlings was prepared as previously

described by Shure et al. [1983] with minor modifications.

Construction of coding region enriched plasmid

library. The coding sequence enriched plasmid library

was constructed as shown in Fig. 1. Briefly, two

micrograms of genomic DNA were partially denatured

and digested with S1 nuclease (150 units, TaKaRa) by incubation at 68°C for

10 min. The DNA was electrophoresed

through a low melting point (LMP) agarose gel. Both

ends of the DNA fragments were filled in with T4 DNA

polymerase (TaKaRa) in the presence of dNTPs and

subsequently ligated to pBluescript vector prepared by

digestion with SmaI and dephosphorylation with calf

intestine phosphatase (TaKaRa). The ligation mixture

was purified by extraction with phenol:chloroform:isoamyl alcohol

(25:24:1) and the DNA was precipitated

by the addition of ethanol in the presence of 2 M

ammonium acetate. Electrocompetent cells (DH10B,

GibcoBRL, Gaithersburg, MD) were transformed with

DNA (3 ng) and spread on ampicillin-containing plates.

Plasmid DNA was purified using an automated plasmid

prep machine equipped with 96 wells (MWG Biotech AG,

Ebersberg, Germany) and then sequenced using an ABI

3700 DNA sequencer (Perkin Elmer, Boston, MA).

Strategy to identify exon-containing clones in rice.

Sequences were automatically processed with Phred to trim

low-quality bases (quality value = 15), to screen out vector sequence and to select

high-quality portions; sequences of

more than 200 bp with QV>20 were analyzed using a

local BLAST program running BLASTN. As annotated genome

information is not available for rice, we compared the

sequences to GenBank nonredundant (nr) data and EST

libraries from GenBank and Chinese Rice GD databases.

BLASTX and BLASTN were used to locate regions that shared significant

sequence identity (score >100 and expectation value

<10⁻²⁰) with sequences deposited in the nr and rice

EST databases, respectively [Altschul et al., 1997].
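The hit-selection rule described above can be sketched as a small filter. This is only an illustration and not the authors' pipeline: it assumes the BLAST results are available in the standard 12-column tabular layout, and it treats the column named bitscore as the "score" in the text, which the paper does not specify. The function name and the example rows are hypothetical.

```python
from typing import Iterable, Set

SCORE_MIN = 100.0    # score threshold quoted in the text
EVALUE_MAX = 1e-20   # expectation-value threshold quoted in the text


def significant_queries(tabular_lines: Iterable[str]) -> Set[str]:
    """Return IDs of queries with at least one hit passing both thresholds.

    Assumed column order (standard BLAST tabular output):
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore
    """
    passing = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        evalue = float(fields[10])
        score = float(fields[11])
        if score > SCORE_MIN and evalue < EVALUE_MAX:
            passing.add(fields[0])
    return passing


# Hypothetical rows: one strong hit and one weak hit.
example_rows = [
    "cloneA\testX\t98.5\t420\t6\t0\t1\t420\t10\t429\t1e-180\t750",
    "cloneB\testY\t85.0\t60\t9\t0\t1\t60\t5\t64\t2e-05\t48.2",
]
print(sorted(significant_queries(example_rows)))
```

Requiring both conditions mirrors the wording "score >100 and expectation value <10⁻²⁰"; a clone is counted once however many subject sequences it matches.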

Fig. 1. Schematic representation of the construction of the coding region enriched plasmid library from rice genomic DNA.


Results

The G+C content of the coding regions of the

rice genome is 54.2%, much higher

than that of the intergenic and intron regions (42.9 and

38.3%, respectively) (International Rice Genome Sequencing

Project, 2005). We reasoned that partial denaturation of

rice genomic DNA would occur preferentially in the

intergenic regions rather than the coding regions and that

S1 nuclease could easily access these partially denatured

regions, thus facilitating the construction of a coding

region enriched plasmid library. To test this, rice genomic

DNA was partially denatured by incubation at 50-70°C

for 10 min in the presence of S1 nuclease (Fig. 2). When

the resulting reaction mixture was fractionated through a

1.5% LMP gel, we found that the genomic fragments

ranging from 0.7 to 1.5 kb had accumulated in the

samples treated at approximately 68°C. This fraction of

genomic DNA was cloned into pBluescript vector and

subsequently sequenced. In total, 1005 sequences were

analyzed using the GenBank local BLAST algorithm.

As shown in Fig. 3A, sequences with a read-length

between 400 and 500 bp were predominant in the S1

nuclease treated rice genomic library. The variation in GC

content of the sequences is illustrated in Fig. 3B. The

average GC content of the clones was 45.8% and

approximately 65% of the clones in the library had more

than 40% GC content. To test whether ligation bias was

involved in the cloning process, the sequence redundancy

of the library was checked using the blastclust algorithm

[Wheeler et al., 2000]. As shown in Fig. 3C, 1,005 paired

sequences were clustered into 991 cliques, giving a redundancy of less than

1.1.
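The two summary statistics in this paragraph can be illustrated with a short sketch. It assumes that GC content is the percentage of G and C among unambiguous bases and that redundancy is the number of sequences divided by the number of clusters; the text does not define either explicitly, and the helper names and the example sequence are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / total if total else 0.0


def redundancy(n_sequences: int, n_clusters: int) -> float:
    """Average number of sequences per cluster (assumed definition)."""
    return n_sequences / n_clusters


# Toy sequence with 6 G/C bases out of 10.
print(round(gc_content("ATGCGCGCTA"), 1))

# Figures from the text: 1,005 sequences grouped into 991 cliques.
print(round(redundancy(1005, 991), 3))
```

Under this definition the reported counts give a value of about 1.014, consistent with the statement that redundancy was below 1.1.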

Repetitive elements have been found in most eukaryotic

genomes that have been analyzed. These elements are

found in multiple copies, in some cases thousands of

copies. The two major classes of repetitive elements are

interspersed elements and tandem arrays. The rice

genome is also populated by representatives from all

known transposon superfamilies (International Rice

Genome Sequencing Project, 2005). The transposon

content of the O. sativa ssp. japonica genome is at least

35%. Repeats were analyzed with RepeatMasker (http://repeatmasker.genome.washington.edu/cgi-bin/RepeatMasker).

Repeats including retroelements and DNA elements, as well as simple and

low-complexity repeats, were found in the library. Among these, the

long terminal repeat (LTR) frequently found in monocot

crops was predominant and constituted 14.2% of the total

base pairs sequenced in this analysis (Table 1). DNA

elements found in monocots were around 0.1%. The

proportion of simple and low complexity repeats was less

than 1%. The total number of base pairs consisting of

these repeats was around 15% of the total bases read.
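As a quick check on the "around 15%" figure, the top-level categories of Table 1 can be summed. The sketch below simply transcribes the percentages printed in Table 1; the dictionary name is an illustrative choice, and sub-categories (for example LTR/Rice) are deliberately left out to avoid double counting.

```python
# Top-level repeat categories and their share of sequenced bases (%),
# transcribed from Table 1.
repeat_percent = {
    "LTR elements": 14.2,
    "DNA elements": 0.10,
    "Satellites": 0.10,
    "Simple repeats": 0.07,
    "Low complexity": 0.06,
}

total_repeat_percent = sum(repeat_percent.values())
print(f"Total repeats: {total_repeat_percent:.2f}% of bases read")
```

The sum is dominated by the LTR elements, which is why the overall figure is only slightly above the 14.2% quoted for LTRs alone.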

Fig. 2. Partial digestion of rice genomic DNA by S1 nuclease treatment at various temperatures. Two micrograms of each genomic DNA was used and 20% of the digested sample was electrophoresed through a 1.5% agarose gel.

Fig. 3. Several features of the S1 nuclease treated rice genomic library. Distribution of read lengths (A), GC content (B), and redundancy (C) of the clones. Redundancy was tested with the blastclust algorithm.

We then asked how many clones contained regions that

were highly homologous to sequences deposited in public

databases such as the rice EST and GenBank nr. BLASTN and

BLASTX analyses were performed and the results are shown

in Fig. 4A. There were 970 chromosomal, 33 chloroplast

and 2 mitochondrial sequences among the 1005 clones

examined (data not shown). The threshold score used to

find a clone that matched one deposited in the nr or EST

databases was 100, which is equivalent to an E value of

around 10⁻²⁰. Among the sequences analyzed in this way,

334 sequences matched EST subject sequences (Fig. 4B).

The top 40 of these are listed in Table 2.

To determine how many clones matched sequences in

the nr databases, we performed a BLASTX search using

the criteria described above. Under this condition, 180

sequences had regions which shared significant homology

to protein sequences in the databases (Fig. 4B). The

top 40 of these are listed in Table 3. Seventy

sequences were found in both BLASTN and BLASTX

searches (Fig. 4B). In total, 444 sequences (44%) contain

regions that are significantly homologous to sequences

deposited in the databases examined.
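The total of 444 follows from inclusion-exclusion over the two searches, since the 70 clones found by both BLASTN and BLASTX would otherwise be counted twice. A minimal sketch using only the counts given above (the variable names are illustrative):

```python
est_hits = 334       # clones matching rice ESTs (BLASTN)
nr_hits = 180        # clones matching nr protein sequences (BLASTX)
both = 70            # clones found in both searches
total_clones = 1005  # clones analyzed

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
with_any_hit = est_hits + nr_hits - both
percent_with_hit = 100.0 * with_any_hit / total_clones

print(with_any_hit, f"{percent_with_hit:.1f}%")
```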

Discussion

In cDNA or oligonucleotide array analysis, the global

pattern of gene expression can be monitored by hybridizing

labeled probes to spotted cDNAs or nucleotide oligomers

representing genes of a given genome. The number of

spots in these arrays is limited due to the template source

used for reverse transcription (for cDNA) or due to the

difficulties of gene annotation caused by bulky intergenic

and interrupting (intron) regions of a genome. In an effort

to improve monitoring power and efficiency and to test

global gene expression using DNA arrays, we investigated

whether genomic DNA can be directly spotted on the

slides. For this purpose, rice genomic DNA was partially

denatured and digested with S1 nuclease. Sequences

represented by 1005 clones from the rice library were

used in the analysis. Sequence redundancy estimated with the

blastclust algorithm was almost 1 for the library,

suggesting that no cloning bias was involved. BLASTN

and BLASTX searches against rice EST and nr databases

showed that 444 (44%) among the 1005 sequences

contained regions that showed significant homology

under the criteria used in the analysis. Although the exact

proportion of coding regions in the rice genome is

unknown, the rice genome draft suggested that it might be

less than 20%. An arithmetical assessment indicated that

the exon enrichment for the rice library was higher than

2-fold (44/20 = 2.2).

Table 1. Analysis of repeat-containing sequences of the S1 nuclease treated rice genomic library

Total sequences: 1005

| Repeat elements | Number of elements | Percentage of sequence |
|---|---|---|
| LTR elements | 126 | 14.2 |
| LTR/Rice | 101 | 10.0 |
| LTR/Maize | 10 | 0.01 |
| LTR/Barley | 9 | 0.01 |
| Gypsy-type | 4 | 0.00 |
| Copia-type | 2 | 0.00 |
| DNA elements | 52 | 0.10 |
| DNA/Rice | 42 | 0.04 |
| DNA/Sorghum | 9 | 0.01 |
| DNA/Maize | 1 | 0.00 |
| Satellites | 7 | 0.10 |
| Simple repeats | 71 | 0.07 |
| Low complexity | 65 | 0.06 |

Repeats were analyzed with RepeatMasker (http://repeatmasker.genome.washington.edu/cgi-bin/RepeatMasker).

Fig. 4. Strategy to identify clones (A) and their numbers (B) containing regions highly homologous to sequences deposited in the rice EST or nr databases. Sequences longer than 200 bp were processed as described in Methods. Chromosomal, chloroplast, and mitochondrial sequences were assigned by BLASTN analysis against a rice genome database. BLASTX and BLASTN were used to find regions that shared significant sequence identity (score >100 and expectation value <10⁻²⁰) with sequences deposited in the nr and rice EST databases, respectively.

These data suggest that S1 nuclease

treatment following partial denaturation of plant genomic

DNA can remove a significant proportion of intergenic

and intron regions.
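The enrichment estimate is a simple ratio of the observed fraction of clones with a significant hit to the expected coding fraction of the genome. The sketch below reproduces the 44/20 calculation and, purely as an illustration, repeats it with the 12% exon fraction cited from the International Rice Genome Sequencing Project (2005); the function name is hypothetical. Note that the numerator counts clones while the baselines describe base pairs, so the ratio is a rough indicator rather than an exact enrichment factor.

```python
def fold_enrichment(observed_percent: float, baseline_percent: float) -> float:
    """Ratio of the observed percentage to the genome-wide baseline."""
    return observed_percent / baseline_percent


observed = 44.0  # % of clones with a significant database match

# Baseline used in the text: coding regions at most 20% of the genome.
print(round(fold_enrichment(observed, 20.0), 1))

# Alternative baseline: exons as 12% of the genome (IRGSP, 2005).
print(round(fold_enrichment(observed, 12.0), 1))
```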

Generally, the denaturation of DNA fragments is

determined by the melting temperature (Tm), which

increases with the GC content. Genome

sequencing projects have shown that the GC content of

the coding regions in rice and Arabidopsis is higher than

that of the intergenic and intron regions (The

Arabidopsis Genome Initiative, 2000; TIGR rice genome

index). Compared with the GC content of rice introns

(37%) and exons (56%), the average GC

content of our clones was relatively high (46%). It

could be that intergenic and intron regions are denatured

faster than coding regions during the denaturation

process, and that S1 nuclease can act by digesting the

single stranded DNA in these regions.
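To give a feel for the size of the effect, the dependence of Tm on GC content can be sketched with a widely used empirical approximation for long duplex DNA, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 675/N. This formula is not taken from the paper, and the salt concentration and fragment length below are hypothetical placeholders; S1 nuclease reaction conditions would shift the absolute values, so only the difference between regions is meaningful here.

```python
import math


def melting_temp_c(gc_percent: float, length_bp: int, na_molar: float = 0.1) -> float:
    """Empirical Tm estimate (deg C) for long duplex DNA.

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N
    na_molar and length_bp are illustrative assumptions, not values
    reported in the paper.
    """
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / length_bp


# GC contents quoted in the text for rice exons and introns.
tm_exon = melting_temp_c(56.0, 1000)
tm_intron = melting_temp_c(37.0, 1000)

print(f"Estimated Tm difference (exon - intron): {tm_exon - tm_intron:.2f} C")
```

In this approximation each percentage point of GC content adds 0.41 °C, so a 19-point difference between exons and introns corresponds to several degrees, which is of the same order as the temperature window explored in Fig. 2.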

In addition, DNA denaturation could be affected by the

frequency of CpG islands, stretches of unmethylated

DNA with a higher frequency of CpG dinucleotides than

found in the rest of the genome. It has been reported that the

frequency of CpG islands in the first exon of genes is

much higher than that found in other regions, including

other exons [Ashikawa, 2001; Venter et al., 2001]. In an

oligonucleotide model, consecutive GC base pairs exert a

stabilizing effect while an isolated GC base pair destabilizes

the parallel-stranded DNA duplex [Shchyolkina et al.,

2000]. The stabilizing effect of consecutive GC base pairs

in CpG islands might be greater than that of randomly

positioned GC base pairs having the same GC content.

The rice genome is populated by representatives from

all known transposon superfamilies (International Rice

Genome Sequencing Project, 2005). Transposable elements

such as retroelements and DNA elements were found in the

library. Among these transposable elements, LTR elements

frequently found in monocot crops were predominant in

the library. Although copia, gypsy, long interspersed

nuclear elements and short interspersed nuclear elements

are relatively rare, it is noteworthy that these repeats,

including simple repeats, constituted around 15% of the

total base pairs analyzed in this study. This is a

very small portion when compared to the entire rice

genome sequence, of which 60% is composed of complex

and simple repeats [Yu et al., 2002]. It is not clear

whether these repeat elements are removed by S1

nuclease treatment or during the cloning process. It has

been reported that some repeated DNAs are very unstable

during E. coli transformation [Hashem et al., 2002]. The

instability of (CAG)·(CTG) repeats and of a 106 bp perfect

inverted repeat was dramatically elevated upon E. coli

transformation. Since the GC content of many simple and

low-complexity repeats is quite high, the removal of those

repeats in this library does not appear to be controlled by

the accessibility of S1 nuclease during the melting

process, which depends on the GC content. To

address this issue, it might be necessary to examine at the

molecular level the effect of plant repeat elements on

E. coli transformation.

Table 2. List of sequences showing identity to ESTs in the rice EST databases (partial list)

| Sequence | Source sequence in EST DB | E value | Sequence | Source sequence in EST DB | E value |
|---|---|---|---|---|---|
| JS07RO11 | rsicem_0890 | 0 | JS07RH16 | sice_464 | 0 |
| WS0711G06 | BI305595 | 0 | JS07RA23 | BE230570 | 0 |
| JS01RN18 | rsicee_4333 | 0 | YNU01J22 | rsiceb_9958 | 0 |
| YNU01B14 | rsicef_7685 | 0 | JS07RF01 | AU076106 | 0 |
| JS07RO13 | BI799271 | 0 | YNU01D22 | BI804855 | 0 |
| YNU01M20 | rsiced_12003 | 0 | JS07RN14 | siceh_0584 | 0 |
| YNU01I16 | rsicen_3284 | 0 | YNU01K23 | C73082 | 0 |
| JS06RK12 | rsicef_1587 | 0 | JS07RI23 | C97991 | 5.04E-180 |
| JS06RJ10 | D22350 | 0 | WS0711G07 | C72648 | 1.87E-176 |
| JS07RK20 | BI808953 | 0 | YNU01I18 | rsiceg_5726 | 2.91E-175 |
| WS0711J09 | rsiceh_20018 | 0 | JS01RN14 | siceh_0523 | 3.54E-174 |
| JS07RK14 | sicek_0122 | 0 | YNU01G13 | C27024 | 1.77E-173 |
| JS01RI18 | AU197460 | 0 | JS07RK12 | AU161766 | 9.93E-172 |
| WS0711N09 | rsicem_14191 | 0 | WS0711J05 | rsiceh_7018 | 1.09E-170 |
| YNU01D15 | BF430690 | 0 | JS07RM02 | BI799060 | 1.07E-168 |
| JS01RG14 | rsicem_2617 | 0 | JS06RC19 | C28952 | 1.02E-167 |
| JS01RK17 | rsicem_25747 | 0 | JS06RK21 | AU088708 | 7.83E-166 |
| JS01RC18 | AU197502 | 0 | YNU01J15 | rsiceb_9958 | 1.00E-165 |
| JS07RD17 | AU165526 | 0 | YNU01K13 | AU184733 | 4.38E-164 |
| JS06RD19 | rsicek_13321 | 0 | JS07RO22 | C28607 | 5.66E-163 |

Table 3. List of clones showing identity with sequences in the nr databases (partial list)

| Sequence | Accession number | Putative identification | E value |
|---|---|---|---|
| JS01RA13 | NP_039391 | rbcL; RuBisCO large subunit [Oryza sativa] | 1.90E-23 |
| JS01RB16 | AAK92604 | Putative retroelement [Oryza sativa] | 2.20E-49 |
| JS01RB18 | AAD27554 | Unknown [Oryza sativa subsp. indica] | 2.61E-34 |
| JS01RC18 | NP_445524 | Hypothetical protein [Chlamydophila pneumoniae AR39] | 1.87E-21 |
| JS01RD18 | AAK53831 | Unknown protein [Oryza sativa] | 2.15E-70 |
| JS01RH16 | AAK92611 | Putative transposable element [Oryza sativa] | 2.28E-79 |
| JS01RH18 | AAK00419 | Putative Tam1 transposon protein TNP2 [Oryza sativa] | 6.21E-78 |
| JS01RJ16 | BAA96641 | Membrane-associated salt-inducible protein [Oryza sativa] | 4.22E-79 |
| JS01RJ18 | AAK13115 | Hypothetical protein [Oryza sativa] | 1.43E-32 |
| JS01RK13 | NP_437521 | ABC transporter periplasmic solute-binding protein precursor [Sinorhizobium meliloti] | 1.59E-44 |
| JS01RK15 | NP_200487 | Serine acetyltransferase [Arabidopsis thaliana] | 1.42E-67 |
| JS01RK16 | BAA78744 | Splicing factor Prp8 [Oryza sativa] | 3.75E-25 |
| JS01RL16 | BAA88542 | Polyprotein [Oryza sativa] | 2.32E-88 |
| JS01RL17 | AAL31096 | Hypothetical protein [Oryza sativa] | 2.49E-34 |
| JS01RM15 | AAL34970 | Putative polyprotein [Oryza sativa] | 1.84E-72 |
| JS01RN14 | AAK55777 | Putative polyprotein [Oryza sativa] | 6.62E-46 |
| JS01RO13 | AAK13121 | Similar to Sorghum bicolor 22 kDa kafirin cluster [Oryza sativa] | 1.75E-65 |
| JS01RO15 | AAK55480 | Putative transposase related protein [Oryza sativa] | 2.45E-76 |
| JS01RO17 | BAB44089 | Putative retrotransposable elements TNP2 [Oryza sativa] | 1.34E-91 |
| JS01RP17 | AAK14415 | Putative glutamine synthetase [Oryza sativa] | 3.20E-24 |
| JS06RB15 | BAA90506 | Rice gypsy-type retrotransposon RIRE2 [Oryza sativa] | 5.07E-42 |
| JS06RB23 | BAA90349 | Similar to maize transposon MuDR mudrA protein. [Oryza sativa] | 1.04E-46 |
| JS06RC16 | BAA96622 | Hypothetical protein [Oryza sativa] | 6.68E-70 |
| JS06RC18 | AAK13118 | Polyprotein [Oryza sativa] | 2.70E-45 |
| JS06RC19 | AAG13514 | Mutator-like transposase [Oryza sativa] | 1.98E-54 |
| JS06RD11 | AAD22153 | Polyprotein [Sorghum bicolor] | 7.87E-28 |
| JS06RD19 | NP_039463 | Ribosomal protein L2 [Oryza sativa] | 2.78E-34 |
| JS06RD20 | NP_039385 | Ribosomal protein S4 [Oryza sativa] | 1.12E-73 |
| JS06RE09 | AAK58693 | Sinapyl alcohol dehydrogenase [Populus tremuloides] | 5.98E-47 |
| JS06RE15 | AAK51574 | Putative retroelement [Oryza sativa] | 2.70E-76 |
| JS06RF08 | AAK43497 | Gag-pol precursor [Oryza sativa] | 1.93E-41 |
| JS06RF10 | BAA89558 | Hypothetical protein [Oryza sativa] | 7.89E-23 |
| JS06RF18 | AAK71544 | Putative polyprotein [Oryza sativa] | 3.74E-35 |
| JS06RF20 | AAK52121 | Putative retroelement [Oryza sativa] | 1.34E-67 |
| JS06RF24 | AAK72287 | Putative retrotransposon protein [Oryza sativa] | 1.70E-58 |
| JS06RH02 | BAB44128 | Hypothetical protein [Oryza sativa] | 1.13E-26 |
| JS06RH03 | NP_039374 | RNA polymerase beta'1 [Oryza sativa] | 5.68E-58 |
| JS06RH22 | NP_039431 | Hypothetical 29K protein rice chloroplast | 9.48E-54 |
| JS06RI09 | AAL34929 | Putative mutator-like transposase [Oryza sativa] | 3.60E-43 |
| JS06RI14 | AAK51583 | Putative retroelement [Oryza sativa] | 3.90E-36 |

DNA fractionation based upon the relative repetition of sequences in the

genome, or upon physical size after digestion with an endonuclease or

mung bean nuclease (EC 3.1.30.1), has been used as a

means of overcoming difficulties due to the low sequence

complexity of large genomes [Britten and Kohne, 1968;

Reddy et al., 1993; Altshuler et al., 2000; Peterson et al.,

2002]. Peterson et al. [2002] used hydroxyapatite chromatography

to fractionate genomic DNA, based on fragment kinetic

behavior, into highly repetitive, moderately repetitive and

single/low-copy (SL) sequence components that were

subsequently cloned to produce genomic libraries. It was

found that the SL library was enriched in gene sequences

and “nonrepetitive ESTs”. This raised the possibility that

clones from each library could be sequenced in

proportion to the kinetic complexity of the component

from which they were derived. Thus, the authors argued

that when this DNA fractionation technique is combined

with a shotgun sequencing strategy, high-throughput

sequencing of a large genome with low complexity can

be achieved more efficiently. In another example,

Altshuler et al. [2000] utilized genomic DNA fractionation

to produce a single nucleotide polymorphism (SNP) map

by constructing and sequencing libraries from specific

subsets of the genome, an approach called reduced representation

shotgun sequencing. DNA from several individuals was digested

with enzymes and size fractionated by gel

electrophoresis. This method could facilitate the rapid,

inexpensive construction of SNP maps in biomedically

and agriculturally important species. Mung bean nuclease

has been used to clone intact genes or gene fragments in

numerous protozoans including Plasmodium spp.

[McCutchan et al., 1984; Reddy et al., 1993]. When the

enzyme was applied to genomic DNA of Plasmodium

spp., a significant proportion of the clones displayed

sequence similarity to sequences in the nr databases.

These and our experimental data suggest that DNA

fractionation could prove a more effective and flexible

approach (for finding SNPs or genes) than the whole-genome

shotgun method when the genome is especially large

but its complexity is low, as is the case in higher plants

and animals.

When the library clone sequences were analyzed using

the rice EST and nr databases, score and E value thresholds of 100 and

10⁻²⁰ were used, respectively. Manual inspection of the

BLASTN results suggested that, on average, 88 nt of a 100 nt

stretch matched between query and subject

sequences in the current rice EST databases. The

sequencing of the Arabidopsis, rice and human genomes has revealed that

A. thaliana, rice and human exons are 254, 250 bps

on average [The Arabidopsis Genome Initiative,

2000; Venter et al., 2001; Yu et al., 2002]. Although the

relationship between the number of perfect and

mismatched nucleotides for hybridization remains to be

tested, the clones prepared in this study might be used for

detecting transcripts derived from cognate or homologous

genes. Thousands of genomic fragments containing exon

regions might be amplified and potentially used to

manufacture slides for microarray analysis. This

approach could be useful when examining an organism

whose entire genome sequence is not available, where the

use of chips or slides might be somewhat limited.

Acknowledgments. We thank SongHwa Chae for

technical assistance in the sequencing of the libraries, and

Yun-Cheol Shin for data analysis. This work was supported

in part by the Ministry of Science and Technology

through the Crop Functional Genomics Center, by the

Biogreen21 Program, and by the Ministry of Education’s

Brain Korea 21 Project.

References

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, and Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402.

Altshuler D, Pollara VJ, Cowles CR, Van Etten WJ, Baldwin J, Linton L, and Lander ES (2000) An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407, 513-516.

Ashikawa I (2001) Gene-associated CpG islands in plants as revealed by analyses of genomic sequences. Plant J 26, 617-625.

Britten RJ and Kohne DE (1968) Repeated sequences in DNA. Science 161, 529-540.

DeRisi JL, Iyer VR, and Brown PO (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680-686.

Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, Paszkowski U, Zhang S, Colbert M, Sun WL, Chen L, Cooper B, Park S, Wood TC, Mao L, Quail P, Wing R, Dean R, Yu Y, Zharkikh A, Shen R, Sahasrabudhe S, Thomas A, Cannings R, Gutin A, Pruss D, Reid J, Tavtigian S, Mitchell J, Eldredge G, Scholl T, Miller RM, Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson R, Feldhaus J, Macalma T, Oliphant A, and Briggs S (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296, 92-100.

Harmer SL, Hogenesch JB, Straume M, Chang HS, Han B, Zhu T, Wang X, Kreps JA, and Kay SA (2000) Orchestrated transcription of key pathways in Arabidopsis by the circadian clock. Science 290, 2110-2113.

Hashem VI, Klysik EA, Rosche WA, and Sinden RR (2002) Instability of repeated DNAs during transformation in Escherichia coli. Mutat Res 502, 39-46.

International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436, 793-800.

Jelinsky SA and Samson LD (1999) Global response of Saccharomyces cerevisiae to an alkylating agent. Proc Natl Acad Sci USA 96, 1486-1491.

Kawasaki S, Borchert C, Deyholos M, Wang H, Brazille S, Kawai K, Galbraith D, and Bohnert HJ (2001) Gene expression profiles during the initial phase of salt stress in rice. Plant Cell 13, 889-905.

McCutchan TF, Hansen JL, Dame JB, and Mullins JA (1984) Mung bean nuclease cleaves Plasmodium genomic DNA at sites before and after genes. Science 225, 625-628.

McDonald MJ and Rosbash M (2001) Microarray analysis and organization of circadian gene expression in Drosophila. Cell 107, 567-578.

Negishi T, Nakanishi H, Yazaki J, Kishimoto N, Fujii F, Shimbo K, Yamamoto K, Sakata K, Sasaki T, Kikuchi S, Mori S, and Nishizawa NK (2002) cDNA microarray analysis of gene expression during Fe-deficiency stress in barley suggests that polar transport of vesicles is implicated in phytosiderophore secretion in Fe-deficient barley roots. Plant J 30, 83-94.

Pennisi E (2001) The human genome. Science 291, 1177-1180.

Peterson DG, Schulze SR, Sciara EB, Lee SA, Bowers JE, Nagel A, Jiang N, Tibbitts DC, Wessler SR, and Paterson AH (2002) Integration of Cot analysis, DNA cloning, and high-throughput sequencing facilitates genome characterization and gene discovery. Genome Res 12, 795-807.

Reddy GR, Chakrabarti D, Schuster SM, Ferl RJ, Almira EC, and Dame JB (1993) Gene sequence tags from Plasmodium falciparum genomic DNA fragments prepared by the “genease” activity of mung bean nuclease. Proc Natl Acad Sci USA 90, 9867-9871.

Shchyolkina AK, Borisova OF, Livshits MA, Pozmogova GE, Chernov BK, Klement R, and Jovin TM (2000) Parallel-stranded DNA with mixed AT/GC composition: role of trans G.C base pairs in sequence dependent helical stability. Biochem 39, 10034-10044.

Shure M, Wessler S, and Fedoroff N (1983) Molecular identification and isolation of the waxy locus in maize. Cell 35, 225-233.

The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796-815.

Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, et al. (2001) The sequence of the human genome. Science 291, 1304-1350.

Wheeler DL, Chappey C, Lash AE, Leipe DD, Madden TL, Schuler GD, Tatusova TA, and Rapp BA (2000) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 28, 10-14.

Yu J, Hu S, Wang J, Wong GKS, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79-92.

Yuan Q, Quackenbush J, Sultana R, Pertea M, Salzberg SL, and Buell CR (2001) Rice bioinformatics. Analysis of rice sequence data and leveraging the data to other plant species. Plant Physiol 125, 1166-1174.

Zirlinger M, Kreiman G, and Anderson DJ (2001) Amygdala-enriched genes identified by microarray technology are restricted to specific amygdaloid subnuclei. Proc Natl Acad Sci USA 98, 5270-5275.