J. Korean Soc. Appl. Biol. Chem. 52(3), 213-220 (2009) Article
Construction of Coding Region Enriched Genomic Library by S1 Nuclease Treatment of Partially Denatured Rice Genomic DNA
Yeon-Ki Kim1, Jeong-Sook Kim2, Pil-Joong Cheong2, Min-Jeong Kim2, Tae-Ho Lee2, Joungsu Joo2,
Ju-Kon Kim2, Baek-Hie Nahm1,2, and Sang Ik Song2*
1Green Gene Bio Tech Inc., Myongji University, Youngin 449-728, Republic of Korea
2Department of Molecular Genetics and Bioinformatics, Myongji University, Youngin, 449-728, Republic of Korea
Received March 17, 2009; Accepted April 30, 2009
Designing high density DNA arrays representing all the genes of an organism could be limited
when the entire genome sequence of the organism is not available or if only a limited number of
ESTs are available. In an effort to prepare coding sequences as DNA sources for a microarray, we
found that coding region enriched genomic DNA libraries can be produced by S1 nuclease
treatment of partially denatured genomic DNA. Sequence analysis of about 1,000 clones using
BLASTN and BLASTX searches showed that 44% of the clones in the library have regions which
share significant similarity with sequences deposited in the rice EST and nr databases. These data
suggested that clones produced in this library could be directly used as PCR templates, where the
resulting PCR products could be spotted on slides for microarray analysis. This technique might
be applicable in designing a high density DNA array for an organism whose entire genome
sequence is not available.
Key words: coding sequence, microarray, S1 nuclease
Gene expression profiling on a genomic scale using
high-density arrays of nucleic acids on a special matrix
has emerged as a leading technology in the analysis of
various biological phenomena [DeRisi et al., 1997;
Zirlinger et al., 2001]. With DNA microarray analysis,
differential and temporal changes in gene expression can
be monitored under specified conditions by hybridizing
labeled reverse transcripts to prefabricated DNA spots on
chips or glass. At present, commercially manufactured
oligonucleotide chips, in which tens of thousands of
oligonucleotides representing all the genes of a genome
are spotted in a small area, are available for several
organisms including Saccharomyces and Drosophila
[Jelinsky and Samson, 1999; McDonald and Rosbash,
2001]. When complete genome sequence data are not
available, a cDNA microarray prepared with PCR products
amplified from various template sources, including an
EST library, can be used, although with somewhat limited results
[Kawasaki et al., 2001; Negishi et al., 2002].
Although it might be ideal to use chips or slides
covering all the genes of a genome for the array analysis,
several difficulties are encountered in designing these.
For example, it is essential that one has an annotated
genome sequence in order to construct the oligonucleotide
chips. However, genome sequencing, especially for higher
eukaryotic organisms such as plants and mammals, might
be hampered due to a large genome size and the presence
of excessive intergenic and intron regions [Pennisi,
2001]. A customized microarray based on PCR products
is restricted not only by ORF identification essential for
primer design, but also by the source of the template for
the PCR. Although EST clones are a popular
template source for PCR, the use of cDNA microarrays is
restricted by the limited representation of whole genes
and the high redundancy present in
cDNA libraries: less than 50% of the putative
genes in plants such as Arabidopsis and rice were
represented in recent cDNA-based microarray analyses
[Harmer et al., 2000; Kawasaki et al., 2001].
*Corresponding author
Phone: +31-330-6276; Fax: +31-321-6355
E-mail: [email protected]
Abbreviations: BLAST, basic local alignment search tool; EST, expressed sequence tag; LTR, long terminal repeat; SNP, single nucleotide polymorphism
doi:10.3839/jksabc.2009.039
The genome sequence of the flowering plant,
Arabidopsis, has been completed and used as a model
system for identifying genes and determining their
functions. Furthermore, due to its economic importance
and relatively small genome size, rice (Oryza sativa)
has become a model organism for the Gramineae
family, which includes major agricultural crop species such
as maize, wheat and barley [Yuan et al., 2001]. Draft
sequences of indica and japonica type rice spanning a 420
to 466 megabase genome were reported by Yu et al.
[2002] and Goff et al. [2002], respectively.
The analysis of the rice genome sequences revealed that
the coding regions represent at most 20% of the genome,
while intergenic and intron regions predominate,
just as in other eukaryotic
genomes. We investigated whether the protein coding
regions of rice can be enriched by making use of the
differential GC content within the genomes. Here, we
report the construction of a rice plasmid library in which
the coding regions were highly enriched by S1 nuclease
treatment of partially denatured rice genomic DNA. In
our experiments, most of the cloned sequences obtained
were nonredundant. Additionally, a homology search of
these clones using BLASTN and BLASTX against genes
and ESTs deposited in GenBank showed that 44% of the
clones from the rice plasmid library have regions that
match or share significant homology with other
known sequences. Given that at least 35% and
12% of the genome are transposon and exon sequences, respectively
(International Rice Genome Sequencing Project, 2005), these
data suggested that the coding regions were significantly
enriched in this library.
Materials and Methods
Materials. Rice (Oryza sativa var. Nipponbare) was
obtained from the STAFF Institute, Ibaraki, Japan. S1
nuclease, T4 DNA polymerase and ligase were purchased
from TaKaRa (Otsu, Japan). Chemicals and reagents
were purchased from Sigma.
Preparation of rice genomic DNA. Genomic DNA
from young rice seedlings was prepared as
described by Shure et al. [1983] with minor modifications.
Construction of coding region enriched plasmid
library. The coding sequence enriched plasmid library
was constructed as shown in Fig. 1. Briefly, two
micrograms of genomic DNA were partially denatured
and digested with S1 nuclease (150 units, TaKaRa) by incubation at 68°C for
10 min. The DNA was then electrophoresed
through a Low Melting Point (LMP) agarose gel. Both
ends of the DNA fragments were filled in with T4 DNA
polymerase (TaKaRa) in the presence of dNTPs and
subsequently ligated to pBluescript vector prepared by
digestion with SmaI and dephosphorylation with calf
intestine phosphatase (TaKaRa). The ligation mixture
was purified by extraction with phenol:chloroform:
isoamyl alcohol (25:24:1) and the DNA was precipitated
by the addition of ethanol in the presence of 2 M
ammonium acetate. Electrocompetent cells (DH10B,
GibcoBRL, Gaithersburg, MD) were transformed with
the DNA (3 ng) and spread on ampicillin-containing plates.
Plasmid DNA was purified using an automated 96-well plasmid
prep machine (MWG Biotech AG,
Ebersberg, Germany) and then sequenced using an ABI
3700 DNA sequencer (Perkin Elmer, Boston, MA).
Strategy to identify exon containing clones in rice.
Sequences were automatically processed to trim low-quality
bases (quality value=15), screen out vector sequence and select
high-quality portions using Phred; sequences of
more than 200 bp with QV>20 were analyzed with a
local BLAST program using BLASTN. As annotated genome
information is not available for rice, we compared the
sequences to GenBank nonredundant (nr) data and EST
libraries from GenBank and Chinese Rice GD databases.
BLASTX was used to locate regions that shared significant
sequence identity (score >100 and expectation value
<10⁻²⁰) with protein sequences deposited in the nr and rice
EST databases [Altschul et al., 1997].
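The score and E value cutoffs described above can be applied mechanically to BLAST output. The following is a minimal sketch rather than the pipeline used in this study: it assumes that hits have been exported in the 12-column tab-delimited layout of BLAST+ (-outfmt 6), and the function name significant_hits is hypothetical.

```python
import csv
import io

# Cutoffs stated in the Methods: score > 100 and E value < 1e-20.
MIN_SCORE = 100.0
MAX_EVALUE = 1e-20


def significant_hits(tabular_text, min_score=MIN_SCORE, max_evalue=MAX_EVALUE):
    """Return {query: (subject, evalue, score)} with the best hit passing both cutoffs.

    Assumes the 12-column BLAST tabular layout: column 1 is the query,
    column 2 the subject, column 11 the E value and column 12 the bit score.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        evalue, score = float(row[10]), float(row[11])
        if score > min_score and evalue < max_evalue:
            # Keep only the highest-scoring passing hit for each query.
            if query not in best or score > best[query][2]:
                best[query] = (subject, evalue, score)
    return best
```

A clone is counted as matched when it appears as a key in the returned dictionary; running the function once on BLASTN output and once on BLASTX output gives the two clone sets that are compared in the Results.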
Fig. 1. Schematic representation of the construction of the coding region enriched plasmid library from rice genomic DNA.
Results
The G+C content of the coding regions of the
rice genome is 54.2%, much higher
than that of the intergenic or intron regions (42.9 and
38.3%, respectively) (International Rice Genome Sequencing
Project, 2005). We reasoned that partial denaturation of
rice genomic DNA would occur preferentially in the
intergenic regions rather than the coding regions and that
S1 nuclease could easily access these partially denatured
regions, thus facilitating the construction of a coding
region enriched plasmid library. To test this, rice genomic
DNA was partially denatured by incubation at 50-70°C
for 10 min in the presence of S1 nuclease (Fig. 2). When
the resulting reaction mixture was fractionated through a
1.5% LMP gel, we found that genomic fragments
ranging from 0.7 to 1.5 kb had accumulated in the
samples treated at approximately 68°C. This fraction of
genomic DNA was cloned into pBluescript vector and
subsequently sequenced. In total, 1005 sequences were
analyzed using the GenBank local BLAST algorithm.
As shown in Fig. 3A, sequences with a read-length
between 400 and 500 bp were predominant in the S1
nuclease treated rice genomic library. The variation in GC
content of the sequences is illustrated in Fig. 3B. The
average GC content of the clones was 45.8% and
approximately 65% of the clones in the library had more
than 40% GC content. To test whether ligation bias was
involved in the cloning process, the sequence redundancy
of the library was checked using the blastclust algorithm
[Wheeler et al., 2000]. As shown in Fig. 3C, 1,005 paired
sequences were clustered into 991 cliques, giving a redundancy of less than
1.1.
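The two library statistics reported above, GC content per clone and redundancy, are simple to compute. The sketch below assumes that redundancy means the average number of sequences per cluster, which is our reading of the text and not a definition given by blastclust; the function names are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0  # nothing but ambiguous bases
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def redundancy(n_sequences, n_clusters):
    """Average number of sequences per cluster; 1.0 means no two reads clustered together."""
    return n_sequences / n_clusters


# Counts reported in the text: 1,005 reads grouped into 991 clusters.
print(round(redundancy(1005, 991), 3))  # 1.014
```

Under this reading, a value close to 1 indicates that almost every clone is unique.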
Repetitive elements have been found in most eukaryotic
genomes that have been analyzed. These elements are
found in multiple copies, in some cases thousands of
copies. The two major classes of repetitive elements are
interspersed elements and tandem arrays. The rice
genome is also populated by representatives from all
known transposon superfamilies (International Rice
Genome Sequencing Project, 2005). The transposon
content of the O. sativa ssp. japonica genome is at least
35%. Repeats were analyzed with RepeatMasker (http://repeatmasker.genome.washington.edu/cgibin/RepeatMasker).
Repeats including retroelements and DNA elements, simple repeats and
low complexity sequences were found in the library. Among these, the
long terminal repeat (LTR) elements frequently found in monocot
crops were predominant and constituted 14.2% of the total
base pairs sequenced in this analysis (Table 1). DNA
elements found in monocots were around 0.1%. The
proportion of simple and low complexity repeats was less
than 1%. Together, these repeats accounted for
around 15% of the total bases read.
Fig. 2. Partial digestion of rice genomic DNA by S1 nuclease treatment at various temperatures. Two micrograms of genomic DNA was used for each sample and 20% of the digested sample was electrophoresed through a 1.5% agarose gel.
Fig. 3. Several features of the S1 nuclease treated rice genomic library. Distribution of read lengths (A), GC content (B), and redundancy (C) of the clones. Redundancy was tested with the blastclust algorithm.
We then asked how many clones contained regions that
were highly homologous to sequences deposited in public
databases such as the rice EST and GenBank nr. BLASTN and
BLASTX analyses were performed and the results are shown
in Fig. 4A. There were 970 chromosomal, 33 chloroplast
and 2 mitochondrial sequences among the 1005 clones
examined (data not shown). The threshold score used to
find a clone that matched one deposited in the nr or EST
databases was 100, which is equivalent to an E value of
around 10⁻²⁰. Among the sequences analyzed in this way,
334 sequences matched EST subject sequences (Fig. 4B).
The top 40 of these are listed in Table 2.
To determine how many clones matched sequences in
the nr databases, we performed a BLASTX search using
the criteria described above. Under this condition, 180
sequences had regions which shared significant homology
to protein sequences in the databases (Fig. 4B). The
top 40 of these are listed in Table 3. Seventy
sequences were found in both BLASTN and BLASTX
searches (Fig. 4B). In total, 444 sequences (44%) contain
regions that are significantly homologous to sequences
deposited in the databases examined.
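The total of 444 follows from the three counts above by inclusion-exclusion, since the 70 clones found by both searches would otherwise be counted twice. A short check, using only numbers given in the text:

```python
TOTAL_CLONES = 1005
est_hits = 334      # clones matching rice ESTs (BLASTN)
protein_hits = 180  # clones matching nr proteins (BLASTX)
both = 70           # clones found by both searches

# Inclusion-exclusion: clones matched by at least one search.
either = est_hits + protein_hits - both
fraction = either / TOTAL_CLONES
print(either, f"{fraction:.1%}")  # 444 44.2%
```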
Table 1. Analysis of repeat containing sequences of the S1 nuclease treated rice genomic library
Total sequences: 1005
Repeat elements     Number of elements   Percentage of sequence
LTR elements        126                  14.2
  LTR/Rice          101                  10.0
  LTR/Maize         10                   0.01
  LTR/Barley        9                    0.01
  Gypsy type        4                    0.00
  Copia type        2                    0.00
DNA elements        52                   0.10
  DNA/Rice          42                   0.04
  DNA/Sorghum       9                    0.01
  DNA/Maize         1                    0.00
Satellites          7                    0.10
Simple repeats      71                   0.07
Low complexity      65                   0.06
Repeats were analyzed with RepeatMasker (http://repeatmasker.genome.washington.edu/cgibin/RepeatMasker).
Fig. 4. Strategy to identify clones (A) and their numbers (B) containing regions highly homologous to sequences deposited in the rice EST or nr databases. Sequences longer than 200 bp were processed as described in Methods. Chromosomal, chloroplast, and mitochondrial sequences were assigned by BLASTN analysis against a rice genome database. BLASTX and BLASTN were used to find regions that shared significant sequence identity (score >100 and expectation value <10⁻²⁰) with protein sequences deposited in the nr and rice EST databases, respectively.
Discussion
In cDNA or oligonucleotide array analysis, the global
pattern of gene expression can be monitored by hybridizing
labeled probes to spotted cDNAs or nucleotide oligomers
representing genes of a given genome. The number of
spots in these arrays is limited by the template source
used for reverse transcription (for cDNA) or by the
difficulties of gene annotation caused by bulky intergenic
and interrupting (intron) regions of a genome. In an effort
to improve monitoring power and efficiency and to test
global gene expression using DNA arrays, we investigated
whether genomic DNA can be directly spotted on the
slides. For this purpose, rice genomic DNA was partially
denatured and digested with S1 nuclease. Sequences
represented by 1005 clones from the rice library were
used in the analysis. Sequence redundancy determined with the
blastclust algorithm was almost 1 for the library,
suggesting that no cloning bias was involved. BLASTN
and BLASTX searches against rice EST and nr databases
showed that 444 (44%) of the 1005 sequences
contained regions that showed significant homology
under the criteria used in the analysis. Although the exact
proportion of coding regions in the rice genome is
unknown, the rice genome draft suggested that it might be
less than 20%. An arithmetical assessment indicated that
the exon enrichment for the rice library was higher than 2
fold (44/20=2.2). These data suggest that S1 nuclease
treatment following partial denaturation of plant genomic
DNA can remove a significant proportion of intergenic
and intron regions.
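The arithmetical assessment can be written out explicitly. The sketch below is illustrative only: it treats the fraction of clones with a database match as a stand-in for the coding fraction of the library, which is an approximation, and it repeats the calculation with the 12% exon fraction quoted in the Introduction. The function name fold_enrichment is hypothetical.

```python
def fold_enrichment(library_fraction, genome_fraction):
    """Ratio of the matched fraction in the library to the coding fraction of the genome."""
    return library_fraction / genome_fraction


library = 444 / 1005  # clones with significant database matches

# 20% is the upper bound on the coding fraction taken from the draft genome,
# so this ratio is a lower bound on the enrichment.
print(round(fold_enrichment(library, 0.20), 1))  # 2.2

# 12% is the exon fraction quoted from the map-based sequence (2005).
print(round(fold_enrichment(library, 0.12), 1))  # 3.7
```

The estimate therefore depends strongly on which genome-wide coding fraction is taken as the baseline.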
Generally, the denaturation of DNA fragments is
determined by the melting temperature (Tm), which
increases with the GC content. Genome
sequencing projects have shown that the GC content of
the coding regions in rice and Arabidopsis is higher than
that of the intergenic and intron regions (The
Arabidopsis Genome Initiative, 2000; TIGR rice genome
index). In comparison to the GC content of rice introns
(37%) and exons (56%), we found that the average GC
content of our clones was relatively high (46%). It
could be that intergenic and intron regions are denatured
faster than coding regions during the denaturation
process, and that S1 nuclease acts by digesting the
single stranded DNA in these regions.
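To give a rough sense of scale for this argument, the dependence of Tm on GC content can be illustrated with the empirical Marmur-Doty relation for long duplex DNA in standard saline citrate, Tm = 69.3 + 0.41 × (%GC). This relation is an assumption introduced here for illustration and was not used in this study; the absolute temperatures do not apply to the S1 nuclease reaction buffer, so only the differences between regions are meaningful.

```python
def marmur_doty_tm(gc_percent):
    """Approximate Tm in degrees C for long duplex DNA in standard saline citrate.

    Empirical Marmur-Doty relation, used here as an illustrative assumption;
    it does not model the S1 nuclease buffer conditions.
    """
    return 69.3 + 0.41 * gc_percent


# G+C contents quoted in the Results for rice.
regions = {"coding": 54.2, "intergenic": 42.9, "intron": 38.3}

for name, gc in regions.items():
    print(f"{name}: {marmur_doty_tm(gc):.1f}")

# Predicted gap between coding and intron regions, about 6.5 degrees C.
delta = marmur_doty_tm(regions["coding"]) - marmur_doty_tm(regions["intron"])
print(f"{delta:.1f}")
```

A difference of several degrees between coding and noncoding DNA is consistent with the idea that a narrow temperature window could denature AT-rich regions preferentially, although local sequence context will also contribute.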
In addition, DNA denaturation could be affected by the
frequency of CpG islands, stretches of unmethylated
DNA with a higher frequency of CpG dinucleotides than
found elsewhere in the genome. It has been reported that the
frequency of CpG islands in the first exon of genes is
much higher than that found in other regions, including
other exons [Ashikawa, 2001; Venter et al., 2001]. In an
oligonucleotide model, consecutive GC base pairs exert a
stabilizing effect while an isolated GC base pair destabilizes
the parallel-stranded DNA duplex [Shchyolkina et al.,
2000]. The stabilizing effect of consecutive GC base pairs
in CpG islands might be greater than that of randomly
positioned GC base pairs having the same GC content.
The rice genome is populated by representatives from
all known transposon superfamilies (International Rice
Genome Sequencing Project, 2005). Transposable elements
such as retroelements and DNA elements were found in the
library. Among these transposable elements, LTR elements
frequently found in monocot crops were predominant in
the library. Although copia, gypsy, long interspersed
nuclear elements and short interspersed nuclear elements
were relatively rare, it is noteworthy that these repeats,
including simple repeats, constituted around 15% of the
total base pairs analyzed in this study. This is a
very small proportion when compared to the entire rice
genome sequence, of which 60% is composed of complex
and simple repeats [Yu et al., 2002]. It is not clear
whether these repeat elements are removed by S1
nuclease treatment or during the cloning process. It has
been reported that some repeated DNAs are very unstable
during E. coli transformation [Hashem et al., 2002]. The
instability of (CAG), (CTG) repeats and a 106 bp perfect
inverted repeat was dramatically elevated upon E. coli
transformation. Since the GC content of many simple and
low complexity repeats is quite high, the removal of those
repeats in this library does not appear to be controlled by
the accessibility of S1 nuclease during the melting
process that is reliant on the GC content. To
address this issue, it might be necessary to examine at the
molecular level the effect of plant repeat elements on
E. coli transformation.
Table 2. List of sequences showing identity to ESTs in the rice EST databases (partial list)
Sequence   Source sequence in EST DB  E value  Sequence   Source sequence in EST DB  E value
JS07RO11   rsicem_0890                0        JS07RH16   sice_464                   0
WS0711G06  BI305595                   0        JS07RA23   BE230570                   0
JS01RN18   rsicee_4333                0        YNU01J22   rsiceb_9958                0
YNU01B14   rsicef_7685                0        JS07RF01   AU076106                   0
JS07RO13   BI799271                   0        YNU01D22   BI804855                   0
YNU01M20   rsiced_12003               0        JS07RN14   siceh_0584                 0
YNU01I16   rsicen_3284                0        YNU01K23   C73082                     0
JS06RK12   rsicef_1587                0        JS07RI23   C97991                     5.04E-180
JS06RJ10   D22350                     0        WS0711G07  C72648                     1.87E-176
JS07RK20   BI808953                   0        YNU01I18   rsiceg_5726                2.91E-175
WS0711J09  rsiceh_20018               0        JS01RN14   siceh_0523                 3.54E-174
JS07RK14   sicek_0122                 0        YNU01G13   C27024                     1.77E-173
JS01RI18   AU197460                   0        JS07RK12   AU161766                   9.93E-172
WS0711N09  rsicem_14191               0        WS0711J05  rsiceh_7018                1.09E-170
YNU01D15   BF430690                   0        JS07RM02   BI799060                   1.07E-168
JS01RG14   rsicem_2617                0        JS06RC19   C28952                     1.02E-167
JS01RK17   rsicem_25747               0        JS06RK21   AU088708                   7.83E-166
JS01RC18   AU197502                   0        YNU01J15   rsiceb_9958                1.00E-165
JS07RD17   AU165526                   0        YNU01K13   AU184733                   4.38E-164
JS06RD19   rsicek_13321               0        JS07RO22   C28607                     5.66E-163
Table 3. List of clones showing identity with sequences in the nr databases (partial list)
Sequence  Accession number  Putative identification  E value
JS01RA13  NP_039391  rbcL; RuBisCO large subunit [Oryza sativa]  1.90E-23
JS01RB16  AAK92604  Putative retroelement [Oryza sativa]  2.20E-49
JS01RB18  AAD27554  Unknown [Oryza sativa subsp. Indica]  2.61E-34
JS01RC18  NP_445524  Hypothetical protein [Chlamydophila pneumoniae AR39]  1.87E-21
JS01RD18  AAK53831  Unknown protein [Oryza sativa]  2.15E-70
JS01RH16  AAK92611  Putative transposable element [Oryza sativa]  2.28E-79
JS01RH18  AAK00419  Putative Tam1 transposon protein TNP2 [Oryza sativa]  6.21E-78
JS01RJ16  BAA96641  Membrane associated salt-inducible protein [Oryza sativa]  4.22E-79
JS01RJ18  AAK13115  Hypothetical protein [Oryza sativa]  1.43E-32
JS01RK13  NP_437521  ABC transporter periplasmic solute-binding protein precursor [Sinorhizobium meliloti]  1.59E-44
JS01RK15  NP_200487  Serine acetyltransferase [Arabidopsis thaliana]  1.42E-67
JS01RK16  BAA78744  Splicing factor Prp8 [Oryza sativa]  3.75E-25
JS01RL16  BAA88542  Polyprotein [Oryza sativa]  2.32E-88
JS01RL17  AAL31096  Hypothetical protein [Oryza sativa]  2.49E-34
JS01RM15  AAL34970  Putative polyprotein [Oryza sativa]  1.84E-72
JS01RN14  AAK55777  Putative polyprotein [Oryza sativa]  6.62E-46
JS01RO13  AAK13121  Similar to Sorghum bicolor 22 kDa kafirin cluster [Oryza sativa]  1.75E-65
JS01RO15  AAK55480  Putative transposase related protein [Oryza sativa]  2.45E-76
JS01RO17  BAB44089  Putative retrotransposable elements TNP2 [Oryza sativa]  1.34E-91
JS01RP17  AAK14415  Putative glutamine synthetase [Oryza sativa]  3.20E-24
JS06RB15  BAA90506  Rice gypsy-type retrotransposon RIRE2 [Oryza sativa]  5.07E-42
JS06RB23  BAA90349  Similar to maize transposon MuDR mudrA protein. [Oryza sativa]  1.04E-46
JS06RC16  BAA96622  Hypothetical protein [Oryza sativa]  6.68E-70
JS06RC18  AAK13118  Polyprotein [Oryza sativa]  2.70E-45
JS06RC19  AAG13514  Mutator-like transposase [Oryza sativa]  1.98E-54
JS06RD11  AAD22153  Polyprotein [Sorghum bicolor]  7.87E-28
JS06RD19  NP_039463  Ribosomal protein L2 [Oryza sativa]  2.78E-34
JS06RD20  NP_039385  Ribosomal protein S4 [Oryza sativa]  1.12E-73
JS06RE09  AAK58693  Sinapyl alcohol dehydrogenase [Populus tremuloides]  5.98E-47
JS06RE15  AAK51574  Putative retroelement [Oryza sativa]  2.70E-76
JS06RF08  AAK43497  Gag-pol precursor [Oryza sativa]  1.93E-41
JS06RF10  BAA89558  Hypothetical protein [Oryza sativa]  7.89E-23
JS06RF18  AAK71544  Putative polyprotein [Oryza sativa]  3.74E-35
JS06RF20  AAK52121  Putative retroelement [Oryza sativa]  1.34E-67
JS06RF24  AAK72287  Putative retrotransposon protein [Oryza sativa]  1.70E-58
JS06RH02  BAB44128  Hypothetical protein [Oryza sativa]  1.13E-26
JS06RH03  NP_039374  RNA polymerase beta'1 [Oryza sativa]  5.68E-58
JS06RH22  NP_039431  Hypothetical 29K protein rice chloroplast  9.48E-54
JS06RI09  AAL34929  Putative mutator-like transposase [Oryza sativa]  3.60E-43
JS06RI14  AAK51583  Putative retroelement [Oryza sativa]  3.90E-36
DNA fractionation based upon relative iteration in the
genome or physical size restricted by endonuclease or
mung bean nuclease (EC 3.1.30.1) has been used as a
means of overcoming difficulties due to the low sequence
complexity of large genomes [Britten and Kohne, 1968;
Reddy et al., 1993; Altshuler et al., 2000; Peterson et al.,
2002]. Peterson et al. [2002] used hydroxyapatite chromatography
to fractionate genomic DNA, based on fragment kinetic
behavior, into highly repetitive, moderately repetitive and
single/low-copy sequence components that were
subsequently cloned to produce genomic libraries. It was
found that the single/low-copy (SL) library was enriched in gene sequences
and “nonrepetitive ESTs”. This raised the possibility that
clones from each library could be sequenced in
proportion to the kinetic complexity of the component
from which they were derived. Thus, the authors argued
that when this DNA fractionation technique is combined
with a shotgun sequencing strategy, high-throughput
sequencing of a large genome with low complexity can
be achieved more efficiently. In another example,
Altshuler et al. [2000] utilized genomic DNA fractionation
to produce a single nucleotide polymorphism (SNP) map
by constructing and sequencing libraries from specific
subsets of the genome, called reduced representation
shotgun. DNA from several individuals was digested
with enzymes and size fractionated following gel
electrophoresis. This method could facilitate the rapid,
inexpensive construction of SNP maps in biomedically
and agriculturally important species. Mung bean nuclease
has been used to clone intact genes or gene fragments in
numerous protozoans including Plasmodium spp
[McCutchan et al., 1984; Reddy et al., 1993]. When the
enzyme was applied to genomic DNA of Plasmodium
spp., a significant proportion of the clones displayed
sequence similarity to sequences in the nr databases.
These and our experimental data suggest that DNA
fractionation could prove a more effective and flexible
approach (to find SNPs or genes) than the whole genome
shotgun method when the genome size is especially large
but its complexity is low, as is the case in higher plants
and animals.
When the library clone sequences were analyzed using
rice EST and nr databases, score and E value thresholds of 100 and
10⁻²⁰ were used, respectively. Manual inspection using
BLASTN suggested that matches of about 88 nt within a 100 nt
stretch were sufficient for a hit between query and subject
sequences using the current rice EST databases. The
sequencing of rice and human genomes has revealed that
the average A. thaliana, rice and human exons are 254,
250 bps in average [The Arabidopsis Genome Initiative,
2000; Venter et al., 2001; Yu et al., 2002]. Although the
relationship between the number of perfectly matched and
mismatched nucleotides for hybridization remains to be
tested, the clones prepared in this study might be used for
detecting transcripts derived from cognate or homologous
genes. Thousands of genomic fragments containing exon
regions might be amplified and potentially used to
manufacture slides for microarray analysis. This
approach could be useful when examining an organism
whose entire genome sequence is not available, where the
use of chips or slides might be somewhat limited.
Acknowledgments. We thank SongHwa Chae for
technical assistance in the sequencing of the libraries, and
Yun-Cheol Shin for data analysis. This work was supported
in part by the Ministry of Science and Technology
through the Crop Functional Genomics Center, by the
Biogreen21 Program, and by the Ministry of Education’s
Brain Korea 21 Project.
References
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z,
Miller W, and Lipman DJ (1997) Gapped BLAST and
PSI-BLAST: a new generation of protein database search
programs. Nucleic Acids Res 25, 3389-3402.
Altshuler D, Pollara VJ, Cowles CR, Van Etten WJ, Baldwin
J, Linton L, and Lander ES (2000) An SNP map of
the human genome generated by reduced representation
shotgun sequencing. Nature 407, 513-516.
Ashikawa I (2001) Gene-associated CpG islands in plants as
revealed by analyses of genomic sequences. Plant J 26,
617-625.
Britten RJ and Kohne DE (1968) Repeated sequences in
DNA. Science 161, 529-540.
DeRisi JL, Iyer VR, and Brown PO (1997) Exploring the
metabolic and genetic control of gene expression on a
genomic scale. Science 278, 680-686.
Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M,
Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D,
Hutchison D, Martin C, Katagiri F, Lange BM,
Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T,
Paszkowski U, Zhang S, Colbert M, Sun WL, Chen L,
Cooper B, Park S, Wood TC, Mao L, Quail P, Wing R,
Dean R, Yu Y, Zharkikh A, Shen R, Sahasrabudhe S,
Thomas A, Cannings R, Gutin A, Pruss D, Reid J, Tavtigian
S, Mitchell J, Eldredge G, Scholl T, Miller RM,
Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson
R, Feldhaus J, Macalma T, Oliphant A, and Briggs S (2002)
A draft sequence of the rice genome (Oryza sativa L.
ssp. japonica). Science 296, 92-100.
Harmer SL, Hogenesch JB, Straume M, Chang HS, Han B,
Zhu T, Wang X, Kreps JA, and Kay SA (2000) Orchestrated
transcription of key pathways in Arabidopsis by
the circadian clock. Science 290, 2110-2113.
Hashem VI, Klysik EA, Rosche WA, and Sinden RR (2002)
Instability of repeated DNAs during transformation in
Escherichia coli. Mutat Res 502, 39-46.
International Rice Genome Sequencing Project (2005) The
map-based sequence of the rice genome. Nature 436,
793-800.
Jelinsky SA and Samson LD (1999) Global response of
Saccharomyces cerevisiae to an alkylating agent. Proc
Natl Acad Sci USA 96, 1486-1491.
Kawasaki S, Borchert C, Deyholos M, Wang H, Brazille S,
Kawai K, Galbraith D, and Bohnert HJ (2001) Gene
expression profiles during the initial phase of salt stress
in rice. Plant Cell 13, 889-905.
McCutchan TF, Hansen JL, Dame JB, and Mullins JA
(1984) Mung bean nuclease cleaves Plasmodium
genomic DNA at sites before and after genes. Science
225, 625-628.
McDonald MJ and Rosbash M (2001) Microarray analysis
and organization of circadian gene expression in Drosophila.
Cell 107, 567-578.
Negishi T, Nakanishi H, Yazaki J, Kishimoto N, Fujii F,
Shimbo K, Yamamoto K, Sakata K, Sasaki T, Kikuchi
S, Mori S, and Nishizawa NK (2002) cDNA microarray
analysis of gene expression during Fe-deficiency stress
in barley suggests that polar transport of vesicles is
implicated in phytosiderophore secretion in Fe-deficient
barley roots. Plant J 30, 83-94.
Pennisi E (2001) The human genome. Science 291, 1177-1180.
Peterson DG, Schulze SR, Sciara EB, Lee SA, Bowers JE,
Nagel A, Jiang N, Tibbitts DC, Wessler SR, and Paterson
AH (2002) Integration of Cot analysis, DNA cloning,
and high-throughput sequencing facilitates genome
characterization and gene discovery. Genome Res 12,
795-807.
Reddy GR, Chakrabarti D, Schuster SM, Ferl RJ, Almira
EC, and Dame JB (1993) Gene sequence tags from
Plasmodium falciparum genomic DNA fragments prepared
by the “genease” activity of mung bean nuclease.
Proc Natl Acad Sci USA 90, 9867-9871.
Shchyolkina AK, Borisova OF, Livshits MA, Pozmogova
GE, Chernov BK, Klement R, and Jovin TM (2000)
Parallel-stranded DNA with mixed AT/GC composition:
role of trans G.C base pairs in sequence dependent helical
stability. Biochem 39, 10034-10044.
Shure M, Wessler S, and Fedoroff N (1983) Molecular identification
and isolation of the waxy locus in maize. Cell
35, 225-233.
The Arabidopsis Genome Initiative (2000) Analysis of the
genome sequence of the flowering plant Arabidopsis
thaliana. Nature 408, 796-815.
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton
GG, Smith HO, Yandell M, Evans CA, Holt RA, et
al. (2001) The sequence of the human genome. Science
291, 1304-1350.
Wheeler DL, Chappey C, Lash AE, Leipe DD, Madden TL,
Schuler GD, Tatusova TA, and Rapp BA (2000) Database
resources of the national center for biotechnology
information. Nucleic Acids Res 28, 10-14.
Yu J, Hu S, Wang J, Wong GKS, Li S, Liu B, Deng Y, Dai
L, Zhou Y, Zhang X, et al. (2002) A draft sequence of
the rice genome (Oryza sativa L. ssp. indica). Science
296, 79-92.
Yuan Q, Quackenbush J, Sultana R, Pertea M, Salzberg SL,
and Buell CR (2001) Rice bioinformatics. Analysis of
rice sequence data and leveraging the data to other plant
species. Plant Physiol 125, 1166-1174.
Zirlinger M, Kreiman G, and Anderson DJ (2001)
Amygdala-enriched genes identified by microarray technology
are restricted to specific amygdaloid subnuclei.
Proc Natl Acad Sci USA 98, 5270-5275.