RESEARCH ARTICLE
Characterization of the global transcriptome and microsatellite marker information for spotted halibut Verasper variegatus
Jianlong Ge1 • Siqing Chen1 • Changlin Liu1 • Li Bian1 • Huiling Sun1 • Jie Tan1
Received: 7 September 2016 / Accepted: 12 November 2016
© The Genetics Society of Korea and Springer Science+Business Media 2016
Abstract The spotted halibut Verasper variegatus is an economically important flatfish species distributed in Japan, Korea and China. However, genomic resources for this species are scarce, which has hindered our understanding of the genetics and biological mechanisms of spotted halibut. In this study, we examined the global transcriptome of six major tissues of spotted halibut. Approximately 40 million high-quality reads were generated using Illumina paired-end sequencing technology. More than 9 Gbp of data were generated and de novo assembled into 59,235 unigenes, with an average length of 938 bp and an N50 of 1735 bp. Based on sequence similarity searches against known protein databases, 34,084 (57.5%) unigenes showed significant similarity to known proteins in the Nr database, and 28,875 (48.7%) had BLAST hits in the Swiss-Prot database. In addition, 19,562 and 23,037 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively, and 9138 unigenes were mapped to 211 KEGG pathways. For functional marker development, 13,322 candidate simple sequence repeats were identified in the transcriptome and 7235 primer pairs were successfully designed. Among 72 primer pairs selected for validation, 67 (93.1%) were successful in PCR amplification and 14 (19.4%) exhibited clear repeat length polymorphisms in a cultured spotted halibut population. The transcriptomic data and microsatellite markers will provide valuable resources for future functional gene analyses, genetic map construction, and quantitative trait loci mapping in V. variegatus.
Keywords Verasper variegatus · Illumina sequencing · Transcriptome analysis · SSR markers
Introduction
Spotted halibut Verasper variegatus belongs to the family Pleuronectidae and is distributed in the coastal waters of northeastern Asia (Sekino et al. 2010). This flatfish has been recognized as a promising marine species for stock enhancement and aquaculture in Japan, Korea and China because of its high market price, fast growth and the limited availability of natural resources (Wada et al. 2011). In recent years, the V. variegatus industry has developed rapidly following breakthroughs in large-scale artificial breeding technology in China. However, owing to loss of genetic diversity and intensive culture, the industry is facing challenges from germplasm degeneration and poor disease resistance. It is therefore necessary to apply molecular genetic tools to protect germplasm resources and to promote genetic selection for improved growth rate and disease resistance.
Transcriptome sequences and functional markers are highly valuable resources for understanding molecular genetic mechanisms and for performing marker-assisted selection in spotted halibut. The advent and rapid development of next-generation sequencing (NGS) technologies have provided a reliable and cost-effective approach for transcriptome sequencing of non-model species, including aquaculture organisms (Long et al. 2013).
Electronic supplementary material The online version of this article (doi:10.1007/s13258-016-0496-1) contains supplementary material, which is available to authorized users.
Corresponding author: Siqing Chen
1 Key Laboratory for Sustainable Utilization of Marine
Fisheries Resources, Ministry of Agriculture, Yellow Sea
Fisheries Research Institute, Chinese Academy of Fishery
Sciences, Qingdao 266071, Shandong, China
Genes Genom (Print ISSN 1976-9571, Online ISSN 2092-9293)
DOI 10.1007/s13258-016-0496-1
RNA-seq has now been employed in various aquaculture studies, such as transcriptome surveys (Li et al. 2015b; Lim et al. 2015; Zhong et al. 2016), molecular marker development (Tran et al. 2015; Zheng et al. 2014) and differential gene expression analysis (Lv et al. 2014; Zhu et al. 2016). Transcriptome sequences provide a rich resource for developing simple sequence repeat (SSR) markers, which play an important role in genetic research such as linkage map construction, population genetic studies and marker-assisted breeding (Kucuktas et al. 2009; Yi et al. 2015; Zheng et al. 2014). Compared with genomic SSRs, transcriptome-derived SSRs (EST-SSRs) are more conserved (Ellis and Burke 2007) and can help to map candidate functional genes and increase the efficiency of marker-assisted selection (Kucuktas et al. 2009).
Although V. variegatus is one of the most economically important marine fish species, research on it has mainly focused on dietary nutrition (Lv et al. 2015a, b), reproductive biology (Chen et al. 2006; Xu et al. 2011) and breeding technology (Shimizu et al. 2012), while molecular genetic studies are seriously lacking. In total, 533 expressed sequence tag (EST) sequences of spotted halibut have been deposited in the NCBI GenBank database (http://www.ncbi.nlm.nih.gov/; as of Aug. 2016). To date, a few functional genes have been described by homology cloning (Li et al. 2011, 2012) and dozens of genomic SSR markers have been developed (Ma and Chen 2009; Ma et al. 2009; Sekino et al. 2007).
In the present study, we sequenced the pooled transcriptomes of brain, liver, spleen, kidney, muscle and skin of V. variegatus using Illumina paired-end sequencing technology to generate a large-scale EST database and to develop a large number of SSR markers. To the best of our knowledge, this is the first comprehensive transcriptome analysis for this species. The results of this study will provide a valuable resource for further genetic and molecular studies of V. variegatus.
Materials and methods
Sample collection
Six 8-month-old juvenile V. variegatus with an average body weight of 58.5 ± 2.8 g were obtained from a cultured population in Rushan, Shandong Province, China. All fish were anesthetized and sacrificed by decapitation. Tissue samples of brain, liver, spleen, kidney, skin and skeletal muscle were collected immediately and stored in RNA-protector (TaKaRa). In addition, thirty 8-month-old individuals from the same cultured population were randomly collected for validation of transcriptome-derived SSR markers. Fin tissue was cut and stored in 70% alcohol at −30 °C until DNA extraction.
RNA extraction, cDNA library construction and sequencing
Total RNA was extracted from the six tissues of each individual using Trizol Reagent (Invitrogen, USA) according to the manufacturer’s instructions. RNA concentration and integrity were measured using a UV/visible spectrophotometer and gel electrophoresis. Equal amounts of high-quality RNA from each sample were pooled for RNA-seq library construction.
A cDNA library was constructed following the manufacturer’s instructions (Illumina, San Diego, USA). Briefly, mRNA was enriched from total RNA using oligo(dT) beads. The enriched mRNA was then fragmented into short fragments using fragmentation buffer and reverse transcribed into cDNA with random primers. Second-strand cDNA was synthesized using buffer, dNTPs, RNase H and DNA polymerase I. The cDNA fragments were then purified with a QIAquick PCR extraction kit (Qiagen, Germany), end repaired, poly(A) added, and ligated to Illumina sequencing adapters. The ligation products were size selected (200 ± 25 bp) by agarose gel electrophoresis, PCR amplified, and sequenced on an Illumina HiSeq™ 2000 platform in 125 bp paired-end mode.
Quality control and de novo transcriptome assembly
The raw reads were filtered to obtain high-quality clean reads prior to assembly. This was done by removing low-quality reads in which more than 10% of bases had a Q-value < 20, ambiguous reads containing more than 10% unknown bases, and reads containing adaptor sequences.
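The read-filtering criteria above can be illustrated with a short Python sketch. It is not the sequencing provider's pipeline: it assumes Phred+33 quality encoding and treats "containing adaptor sequence" as an exact substring match against ADAPTER, an illustrative Illumina adapter prefix chosen for this example rather than taken from the article.

```python
# Minimal read-filter sketch; the thresholds follow the text, everything else is assumed.
ADAPTER = "AGATCGGAAGAGC"  # illustrative adapter prefix (assumption, not from the article)


def phred_scores(qual, offset=33):
    """Decode an ASCII quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in qual]


def passes_filter(seq, qual, adapter=ADAPTER, min_q=20,
                  max_low_q_frac=0.10, max_n_frac=0.10):
    """Return True if a read survives the three filters described above."""
    length = len(seq)
    if length == 0 or len(qual) != length:
        return False  # empty or malformed record
    low_q = sum(1 for q in phred_scores(qual) if q < min_q)
    if low_q / length > max_low_q_frac:
        return False  # more than 10% of bases with Q < 20
    if seq.upper().count("N") / length > max_n_frac:
        return False  # more than 10% unknown bases
    if adapter and adapter in seq.upper():
        return False  # contains adaptor sequence (exact match only)
    return True
```

For paired-end data one would typically discard a pair if either mate fails; the article does not state how mates were handled.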
The clean reads were assembled using the short-read assembly program Trinity (Grabherr et al. 2011). Trinity is a modular method and software package that combines three components: Inchworm, Chrysalis and Butterfly. First, Inchworm assembles reads using a greedy k-mer-based approach, resulting in a collection of linear contigs. Next, Chrysalis clusters related contigs that correspond to portions of alternatively spliced transcripts or otherwise unique portions of paralogous genes, and then builds a de Bruijn graph for each cluster of related contigs. Finally, Butterfly analyzes the paths taken by reads and read pairings in the context of the corresponding de Bruijn graph, and outputs one linear sequence for each alternatively spliced isoform and for transcripts derived from paralogous genes. All the assembled sequences were defined as unigenes.
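To make Inchworm's greedy k-mer extension concrete, the toy Python sketch below seeds a contig with the most abundant unused k-mer and extends it in both directions, each time adding the most abundant unused k-mer that overlaps the current end by k − 1 bases. It is a simplified illustration, not Trinity's code: it counts forward-strand k-mers only, applies no error pruning, and breaks count ties by lexicographic order.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def _extend(contig, counts, used, k, right):
    """Greedily extend one end of a contig with the most abundant unused k-mer."""
    while True:
        if right:
            anchor = contig[-(k - 1):]
            options = [anchor + base for base in "ACGT"]
        else:
            anchor = contig[:k - 1]
            options = [base + anchor for base in "ACGT"]
        candidates = [(counts[o], o) for o in options if o in counts and o not in used]
        if not candidates:
            return contig
        _, best = max(candidates)  # highest count; ties broken lexicographically
        used.add(best)
        contig = contig + best[-1] if right else best[0] + contig


def greedy_contigs(reads, k=25):
    """Toy Inchworm-like assembly; assumes k >= 2."""
    counts = count_kmers(reads, k)
    used = set()
    contigs = []
    for seed, _ in counts.most_common():  # most abundant seeds first
        if seed in used:
            continue
        used.add(seed)
        contig = _extend(seed, counts, used, k, right=True)
        contig = _extend(contig, counts, used, k, right=False)
        contigs.append(contig)
    return contigs
```

With three overlapping toy reads and k = 4, for example, the sketch rebuilds the single sequence they were drawn from.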
Functional annotation of unigenes
All unigenes were searched against the NCBI non-redundant (Nr) protein database (http://www.ncbi.nlm.nih.gov/) and the Swiss-Prot protein database (http://www.expasy.ch/sprot) using BLASTx with an E-value less than 1.0 × 10⁻⁵. The database protein with the highest similarity score was used as the functional annotation of the corresponding unigene. Based on the Nr annotation, the BLAST2GO program (http://www.BLAST2go.org/) was used to obtain GO annotations of unigenes (Conesa et al. 2005), and GO functional classification of all unigenes was then performed using WEGO software (http://wego.genomics.org.cn/cgi-bin/wego/index.pl) (Ye et al. 2006). Functional annotation of unigenes was also carried out by BLASTx searches against the clusters of orthologous groups (COG) database (http://www.ncbi.nlm.nih.gov/COG/) (Tatusov et al. 2000). Meanwhile, the unigenes were aligned to the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (http://www.genome.jp/kegg) to annotate possible metabolic pathways (Kanehisa et al. 2008).
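The best-hit rule can be sketched as follows, assuming BLAST+ tabular output (-outfmt 6, whose default 12 columns put the E-value in column 11 and the bit score in column 12). The article does not specify the output format or what "highest similarity score" means; here it is taken to be the highest bit score among hits passing the E-value cutoff.

```python
import csv
import io

E_VALUE_CUTOFF = 1.0e-5  # cutoff stated in the text


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Map each query (unigene) to its best hit as (subject_id, evalue, bitscore).

    Assumes BLAST+ -outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, comment or malformed lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue  # not significant under the E-value threshold
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Unigenes with no hit below the cutoff simply do not appear in the result, which corresponds to the unannotated fraction discussed later.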
Identification and validation of transcriptome-derived SSRs
To explore the distribution of potential SSR core motifs in the assembled transcriptome, all unigenes were scanned for SSRs with the MIcroSAtellite identification tool version 1.0 (MISA, http://pgrc.ipk-gatersleben.de/misa/) (Thiel et al. 2003). The minimum numbers of contiguous repeat units were set to 6 for dinucleotide, 5 for trinucleotide, and 4 for tetra-, penta- and hexanucleotide motifs. Primer pairs for microsatellite loci were designed from the unique flanking regions of each locus using Primer3 (Rozen and Skaletsky 2000). Polymorphism was tested in the 30 randomly collected individuals. Genomic DNA was extracted from the fin samples using a marine animal DNA extraction kit (Tiangen Biotech, Beijing, China). PCR was performed on a thermal cycler (Takara) in a total volume of 10 µL containing 0.4 µM of each primer, 10× PCR buffer, and 50 ng DNA. Cycling conditions consisted of initial denaturation at 94 °C for 5 min; 35 cycles of 30 s at 94 °C, 30 s at the annealing temperature and 30 s at 72 °C; and a final extension of 5 min at 72 °C, after which products were held at 4 °C. The amplified products were separated on 8% polyacrylamide gels and visualized by silver staining. Expected and observed heterozygosity were calculated using Popgene32 software. Polymorphism information content (PIC) was calculated using PIC-CALC 0.6 software.
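The heterozygosity and PIC values described above can be computed from genotype data roughly as below. This sketch assumes the unbiased expected heterozygosity, He = 2n/(2n − 1) × (1 − Σp_i²), and the PIC formula of Botstein and colleagues, PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j²; the article names only the programs (Popgene32 and PIC-CALC 0.6), not their formulas, and the (allele1, allele2) genotype format is hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """genotypes: list of (allele1, allele2) tuples, one per diploid individual."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


def observed_het(genotypes):
    """Fraction of individuals carrying two different alleles (Ho)."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_het(genotypes):
    """Unbiased expected heterozygosity (He) = 2n/(2n-1) * (1 - sum p_i^2)."""
    n = len(genotypes)
    freqs = allele_frequencies(genotypes).values()
    return (2 * n / (2 * n - 1)) * (1 - sum(p * p for p in freqs))


def pic(freqs):
    """Polymorphism information content: 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2."""
    p = list(freqs)
    return (1 - sum(x * x for x in p)
            - sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2)))


if __name__ == "__main__":
    # Hypothetical genotypes for 30 fish, reconstructed to be consistent with ShE01.
    sample = [("A", "A")] * 23 + [("B", "B")] * 6 + [("A", "B")]
    freqs = allele_frequencies(sample).values()
    print(round(observed_het(sample), 3), round(expected_het(sample), 3),
          round(pic(freqs), 3))  # 0.033 0.345 0.282
```

As a consistency check, this hypothetical set of 30 genotypes (23 AA, 6 BB, 1 AB) reproduces the Ho, He and PIC reported for locus ShE01 in Table 3, which suggests, but does not prove, that these are the formulas the software applied; the genotype counts themselves are reconstructed, not reported.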
Results
Transcriptome sequencing and de novo assembly
To obtain broad transcriptome coverage, a pooled cDNA library of multiple tissues including brain, liver, spleen, kidney, skin and skeletal muscle was constructed and sequenced on the Illumina HiSeq™ 2000 platform. A total of 41,099,174 raw reads with a length of 125 bp were produced. The raw read files have been deposited in the NCBI Sequence Read Archive (accession number: SRA455839). After removing adaptor sequences and discarding low-quality reads, 39,760,916 clean reads were obtained (Table 1). The remaining high-quality reads were assembled into 59,235 unigenes with an average length of 938 bp and an N50 of 1735 bp. The length distribution showed that 30,548 (51.57%) unigenes were 200–499 bp long, 12,347 (20.84%) were 500–999 bp long, 8934 (15.08%) were 1000–1999 bp long, 3975 (6.71%) were 2000–2999 bp long, and 3431 (5.79%) were longer than 3000 bp (Fig. 1). The final assembled sequences and detailed gene annotations are presented in Files S1 and S2, respectively.
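N50 is the length of the unigene at which the cumulative length, counting from the longest unigene down, first reaches half of the total assembled length. A minimal sketch of this calculation, together with the length binning used in Fig. 1 (bin edges taken from the text; sequences of 3000 bp or more pooled into the last bin), is shown below; any lengths fed to it in an example are toy values.

```python
def n50(lengths):
    """Length at which the cumulative sum (longest first) reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Bin edges as reported in the Results; >= 3000 bp pooled into one bin.
BINS = [(200, 499), (500, 999), (1000, 1999), (2000, 2999)]


def length_bins(lengths):
    """Count sequences per length bin (sequences shorter than 200 bp are ignored)."""
    counts = {f"{lo}-{hi}": 0 for lo, hi in BINS}
    counts[">=3000"] = 0
    for length in lengths:
        if length >= 3000:
            counts[">=3000"] += 1
            continue
        for lo, hi in BINS:
            if lo <= length <= hi:
                counts[f"{lo}-{hi}"] += 1
                break
    return counts
```

Because N50 weights each unigene by its length, it is typically larger than the mean length, as with the 1735 bp N50 versus the 938 bp average reported here.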
Functional annotation
Functional annotation of the non-redundant unigenes was carried out by searching the transcripts against the public Nr and Swiss-Prot databases. As a result, 34,084 (57.5%) and 28,875 (48.7%) unigenes, respectively, showed significant similarity (E-value < 1.0 × 10⁻⁵) to known sequences in these databases (Table 1). Based on the BLASTx similarity analysis, the unigenes matched sequences from a range of fish species (Fig. 2). The highest number of hits was to Larimichthys crocea (10,352, 30.31%), followed by Stegastes partitus (8310, 24.33%), Notothenia coriiceps (2206, 6.46%) and Oreochromis niloticus (2015, 5.90%). In total, about 6.6% of the annotated unigenes shared similar sequences with flatfish species, among which Cynoglossus semilaevis accounted for the most hits (1820, 5.32%), and 17 unigenes were annotated to V. variegatus.
GO, COG and KEGG classification
The potential functions of the unigenes were determined using the gene ontology (GO) database, and 19,562 unigenes were categorized by GO analysis. Second-level GO terms were used to classify the unigenes into three main categories (cellular component, molecular function and biological process), and each unigene could be assigned to one or more GO terms. In this study, 7833 unigenes were assigned to the cellular component category, among which ‘cell’ (6562, 20.24%) and ‘cell part’ (6562, 20.24%) were the most abundant terms, followed by ‘membrane’ (4908, 15.14%) and ‘organelle’ (4832, 14.90%) (Fig. 3). Further, 11,528 unigenes were assigned to the molecular function category, with a large proportion assigned to ‘binding’ (10,383 unigenes) and ‘catalytic activity’ (6909 unigenes). Additionally, 11,343 unigenes were assigned to various biological process categories, of which the dominant subcategories were ‘cellular process’ (9398 unigenes), ‘single-organism process’ (8422 unigenes), ‘metabolic process’ (6949 unigenes) and ‘biological regulation’ (5941 unigenes).
Fig. 1 Length distribution of all unigenes of V. variegatus
Fig. 2 Top 20 hit species distribution based on BLASTp
Table 1 Summary of Illumina transcriptome, assembly and annotation for V. variegatus
Raw results (after trimming)   Assembly results                  Annotation results
Clean bases (G)   9.26         Unigenes             59,235       Nr            34,084
Read pairs        39,760,916   Average length (bp)  938.71       Swiss-Prot    28,875
Read length (bp)  125          Min–max length (bp)  201–16,457   COG           23,037
                               N50 (bp)             1735         KEGG          9138
For functional prediction and classification, all unigenes were aligned to the COG database. In total, 23,037 unigenes were grouped into 25 COG categories (Fig. 4). More than half of the unigenes fell into three main groups: (R) general function prediction only (21.81%), (T) signal transduction mechanisms (18.14%), and (O) posttranslational modification, protein turnover, chaperones (10.52%). In contrast, each of the remaining categories contained less than 10% of the unigenes, for example (Y) nuclear structure (0.45%) and (N) cell motility (0.31%).
Pathway analysis can help us better understand the biological functions of genes. In this study, 9138 unigenes were assigned to five major categories: metabolism, cellular processes, genetic information processing, environmental information processing and organismal systems. The detailed subcategories and their distribution within each major category are shown in Fig. 5. In total, 211 pathways were identified, and the number of unigenes per pathway ranged from 1 to 610. The pathway with the most unigenes was “Endocytosis”, which contained 610 unigenes (6.68% of all KEGG-annotated unigenes). Other major pathways were “Focal adhesion” (533, 5.83%), “MAPK signaling pathway” (531, 5.81%), “Calcium signaling pathway” (531, 5.81%) and “Neuroactive ligand–receptor interaction” (467, 5.11%).
Frequency and distribution of SSRs
To further assess the assembly quality and to develop molecular markers, all unigenes were mined for potential SSRs. A total of 13,322 potential SSR loci were identified in 9658 unigenes, of which 2487 contained more than one SSR. The SSR frequency (number of SSRs per unigene) was 22.5%, and the average distance between SSRs was 4.1 kb. Dinucleotide repeats were the most common repeat unit (57.09%), followed by tri- (35.3%), tetra- (5.5%), hexa- (1.2%) and pentanucleotide (0.9%) repeats (Table 2). The number of repeat units of dinucleotide motifs ranged mainly from 6 to 10, while tri-, tetra-, penta- and hexanucleotide motifs mainly contained 5–7, 4–5, 4 and 4 repeat units, respectively (Table 2). The dominant repeat motif was AC/GT (40.3%), followed by AG/CT (11.8%), AGG/CCT (10.6%), AGC/CTG (9.8%), AT/AT (4.9%) and AAG/CTT (4.5%) (Fig. 6).
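A regex-based scan illustrates how the MISA thresholds given in the Methods (at least 6 repeat units for dinucleotide, 5 for trinucleotide, and 4 for tetra- to hexanucleotide motifs) translate into SSR calls, and how motifs are pooled into the classes reported above (each motif grouped with its rotations and reverse complement, so AC, CA, GT and TG all count as AC/GT). This is a simplified sketch, not MISA: it reports perfect repeats only, ignores mononucleotide and compound SSRs, and resolves overlaps by scanning shorter motif lengths first and keeping whichever region was claimed first.

```python
import re

# Minimum number of repeat units per motif length, as set for MISA in the Methods.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def rotations(motif):
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. ACAC, AA)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def motif_class(motif):
    """Pool a motif with its rotations and reverse complement, e.g. TG -> 'AC/GT'."""
    first = min(rotations(motif) + rotations(revcomp(motif)))
    return f"{first}/{min(rotations(revcomp(first)))}"


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_units, motif_class) for perfect SSRs."""
    seq = seq.upper()
    taken = [False] * len(seq)  # positions already assigned to an SSR
    hits = []
    for k, min_rep in sorted(MIN_REPEATS.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif) or any(taken[m.start():m.end()]):
                continue
            for i in range(m.start(), m.end()):
                taken[i] = True
            hits.append((m.start(), motif, (m.end() - m.start()) // k,
                         motif_class(motif)))
    return sorted(hits)
```

Applied to every unigene, the per-sequence hits give the summary statistics above; for instance, 13,322 SSRs over 59,235 unigenes is about 0.225 SSRs per unigene, the 22.5% frequency reported.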
SSR marker validation
A total of 7235 primer pairs were successfully designed from 6085 SSR-containing sequences (Table S3), and 72 primer pairs were randomly selected for validation. Of these 72 primer pairs, 67 (93.1%) successfully amplified genomic DNA of spotted halibut. Among the 67 successful primer pairs, 59 (88.1%) yielded PCR products of the expected size, while the other 8 (11.9%) yielded products either shorter or longer than expected. After separation of the products on polyacrylamide gels, 14 (19.4%) of the microsatellite loci showed polymorphisms. The number of alleles at these SSR loci varied from 2 to 4, and PIC values ranged from 0.141 to 0.692 (Table 3).
Fig. 3 Gene ontology (GO) analysis and functional classification of the V. variegatus transcriptome
Discussion
In the present study, we carried out de novo assembly and comprehensive characterization of the spotted halibut transcriptome and developed a large number of SSR markers based on the transcriptome information obtained. To the best of our knowledge, this is the first exploration of the transcriptome of this species through large-scale transcript sequencing. In total, 59,235 unigenes were assembled with an average length of 938 bp, which is comparable to recent Illumina sequencing reports in marine medaka (984 bp) (Kim et al. 2015) and blunt snout bream (998 bp) (Tran et al. 2015). The length distribution, in which about 28% of the unigenes exceeded 1000 bp, was similar to those of Trachinotus ovatus (Xie et al. 2014) and common carp Cyprinus carpio (Li et al. 2015b). These results suggest that the sequence data from spotted halibut were effectively assembled.
To predict the functions of the transcriptome sequences, all unigenes were annotated by searches against public databases. As a result, 34,084 unigenes (57.5%) were annotated in the Nr database. The largest number of Nr hits was to L. crocea (10,352, 30.31%), and among flatfishes to C. semilaevis. The draft genomes of both species were published in 2014 (Chen et al. 2014; Wu et al. 2014). As expected, unigenes of the spotted halibut transcriptome matched well to proteins of other fish species, especially species with sequenced genomes. Still, a considerable proportion of unigenes failed to find hits in any of the databases. Previous transcriptome studies indicated that unannotated sequences mainly represent transcripts spanning only untranslated mRNA regions, chimeric sequences derived from assembly errors (Wang et al. 2004), or sequences containing non-conserved protein regions (Mittapalli et al. 2010). Some may also be components of novel genes specific to this species, which are likely to be annotated once genome sequences become available.
Fig. 4 Clusters of orthologous group (COG) functional classification of the V. variegatus transcriptome
In this study, a large number of unigenes were assigned to a wide range of GO categories and COG classifications, indicating that our transcriptome data represent a broad diversity of transcripts in spotted halibut. Furthermore, the distribution and composition of the assigned GO terms were very similar to those reported in other fish species, such as T. ovatus (Xie et al. 2014), Scophthalmus maximus (Ma et al. 2016) and Paramisgurnus dabryanus (Li et al. 2015a), suggesting that conserved genes have similar functional distributions across these species. In addition, a large percentage of the unigenes were mapped to KEGG pathways associated with amino acid metabolism, lipid metabolism, the immune system and the endocrine system. These annotations provide a valuable resource for investigating specific processes, functions and pathways in future V. variegatus research.
Although SSR markers play an important role in genetic research, very limited marker information has been available for spotted halibut. In the present study, 13,322 potential SSR loci were identified from 59,235 unigenes, an average of 0.22 SSRs per EST sequence, similar to P. dabryanus (0.21) (Li et al. 2015a) and Haliotis midae (0.34) (Franchini et al. 2011). The distribution density was one SSR per 4.1 kb, higher than reported for other fish species including Megalobrama amblycephala (9.53 kb) (Gao et al. 2012) and P. dabryanus (6.99 kb) (Li et al. 2015a), but lower than in C. carpio (3.9 kb) (Li et al. 2015b) and H. midae (0.756 kb) (Franchini et al. 2011). The distribution density of SSRs is influenced by several factors, including genome structure or composition (Toth et al. 2000), SSR detection criteria, dataset size, database-mining tools and the parameters used for microsatellite exploration (Wei et al. 2008). Among the five repeat types, dinucleotide repeats were the most common (57.09%) and the AC/GT motif accounted for the largest share of SSRs (40.3%), consistent with results in several fish species (Ma et al. 2016; Tran et al. 2015; Xie et al. 2014) and with Toth’s survey of vertebrates (Toth et al. 2000).
Fig. 5 Identified KEGG pathways of assembled unigenes
Table 2 Distribution of identified SSRs in the V. variegatus transcriptome (columns 4 to >15 give the number of repeat units; “–”, below the minimum repeat number set for that motif type)
Type    4     5     6     7     8     9     10    11–15  >15   Total   %
Di-     –     –     2236  1311  984   806   696   1110   463   7606    57.1
Tri-    –     2215  1150  652   167   160   215   118    24    4701    35.3
Tetra-  464   138   42    49    10    5     5     18     1     732     5.5
Penta-  86    9     14    2     2     1     1     3      0     118     0.9
Hexa-   102   42    6     6     3     6     0     0      0     165     1.2
Total   652   2404  3448  2020  1166  978   917   1249   488   13,322  100
%       4.89  18.05 25.88 15.16 8.75  7.34  6.88  9.38   3.66  100
Fig. 6 Frequency of classified repeat types of SSRs in the V. variegatus transcriptome
Table 3 Characteristics of 14 polymorphic microsatellite loci
Locus  Core motif      Forward (5′–3′)         Reverse (5′–3′)         Tm (°C)  Na  PIC    Ho     He
ShE01  (CTC)6          GCTGGATTCATCTCTCAGCC    CCTCTGCTTCTTCTGCTGCT    61       2   0.282  0.033  0.345
ShE05  (TG)10          GAACCCGCTTCAACTACGAC    CTTGGAAACCAAAGAGCGAG    60       3   0.460  0.533  0.524
ShE21  (TG)10          TCTGACTGGATGGTGTTGGA    GACTTTAGGCCGAGAAGGCT    60       4   0.632  0.567  0.693
ShE25  (GAG)5          GAACTCACGACTACGACGCA    CGGAACACACAAAGAGCAGA    60       2   0.141  0.167  0.155
ShE27  (TGC)5…(TGT)5   GTTGGGCCATCTGAGACAAG    CTCTACCGTCGGCAGCTCT     60       2   0.222  0.233  0.259
ShE33  (TG)8           AACCCAGCAGTTGTCATTGAT   CTCCTCGATGCTTTTCATCC    60       4   0.692  0.600  0.753
ShE38  (TG)12          CAGAAGTGGTCTCGCGTGTA    TTTTCATGCAACAAAGGCAA    60       2   0.370  0.857  0.499
ShE50  (CAG)6…(CAG)8   AACCAGGACCTCAGTCATCG    ATCATTCGTGGGTGGTTCAT    62       3   0.523  0.733  0.610
ShE54  (GGTTCA)4       TTTCATGATGGTCTGCTGGA    CTTGAATCCGAAGAGAACCG    62       2   0.185  0.233  0.210
ShE58  (GCA)14         GATGATGAACGCTGCTCAAA    CACCTCCTTGCATAGCCATT    62       4   0.662  0.500  0.725
ShE62  (AAC)6          GGTAAAACTCCTTCGCTCCC    CAGGTGTTGTTTGTGGATGC    62       2   0.372  0.296  0.503
ShE68  (CAG)7…(CAG)7   GTTGCAACAGCAGCAACAGT    TGTTGAGATTTCCCATCGGT    62       3   0.505  0.600  0.599
ShE71  (AC)11          ACCATGAAAGTGTCTGCGTG    CACAGGCAATAAAGCGATGA    62       3   0.461  0.367  0.569
ShE72  (GAT)6          GATCTTGCAGTCCTCCTTGC    TTGTCAAGCTCATCGTCGTC    62       4   0.329  0.400  0.353
Tm, specific annealing temperature; Na, number of alleles; PIC, polymorphism information content; Ho, observed heterozygosity; He, expected heterozygosity
Transcriptome sequencing provided numerous EST sequences for developing EST-SSR markers. In the present study, 7235 SSR primer pairs were successfully designed from the transcriptome. Validation of 72 primer pairs showed that 93.1% of the microsatellite loci were successfully amplified by PCR and 19.4% were polymorphic. These results further suggest that the assembled transcripts were of high quality and that the SSRs identified in our dataset will be useful in future genetic studies. The proportion of polymorphic microsatellites in this study was lower than in M. amblycephala (44.4%) (Gao et al. 2012), P. dabryanus (79.9%) (Li et al. 2015a) and Paphia textile (53.75%) (Chen et al. 2016), perhaps because the tested individuals came from the same fish farm. More polymorphic microsatellites may be identified if more geographically distant populations are examined.
Conclusion
Here we report the first transcriptome study of spotted halibut V. variegatus, an economically important flatfish in Northeast Asia. The large number of sequences generated (59,235 putative unigenes) will enrich the genomic resources for spotted halibut and thereby improve the sequence databases available for gene discovery. In addition, many SSR loci were discovered in the transcriptome, and the candidate markers identified in this study will be useful for linkage map construction, population genetic studies and other applications.
Acknowledgements We thank Guangzhou Gene Denovo Biotechnology Co., Ltd for help with the Illumina sequencing of the cDNA library and bioinformatic analysis. This study was supported by a grant from the Special Scientific Research Funds for Central Non-profit Institutes, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences (20603022016005).
Compliance with ethical standards
Conflict of interest Jianlong Ge, Siqing Chen, Changlin Liu, Li Bian, Huiling Sun and Jie Tan declare that they have no conflict of interest.
Human and animal rights The animals used in the present study were artificially cultivated, and all experimental treatments were performed according to the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. The study protocol was approved by the Experimental Animal Ethics Committee, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, China.
References
Chen S, Gao T, Wang C, Zhang Y, Zhang X, Chen Y (2006) Study on developmental characters in early stage of spotted halibut Verasper variegatus. Period Ocean Univ China 36:281–286 (in Chinese)
Chen S, Zhang G, Shao C et al (2014) Whole-genome sequence of a flatfish provides insights into ZW sex chromosome evolution and adaptation to a benthic lifestyle. Nat Genet 46:253–260
Chen X, Li J, Xiao S, Liu X (2016) De novo assembly and characterization of foot transcriptome and microsatellite marker development for Paphia textile. Gene 576:537–543
Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21:3674–3676
Ellis JR, Burke JM (2007) EST-SSRs as a resource for population genetic analyses. Heredity 99:125–132
Franchini P, Van der Merwe M, Roodt-Wilding R (2011) Transcriptome characterization of the South African abalone Haliotis midae using sequencing-by-synthesis. BMC Res Notes 4:1–11
Gao Z, Luo W, Liu H, Zeng C, Liu X, Yi S, Wang W (2012) Transcriptome analysis and SSR/SNP markers information of the blunt snout bream (Megalobrama amblycephala). PLoS ONE 7:e42637
Grabherr MG, Haas BJ, Yassour M et al (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652
Kanehisa M, Araki M, Goto S et al (2008) KEGG for linking genomes to life and the environment. Nucleic Acids Res 36:D480–D484
Kim BM, Choi BS, Kim HS, Rhee JS, Au DW, Wu RS, Choi IY, Lee JS (2015) Transcriptome profiling of larvae of the marine medaka Oryzias melastigma by Illumina RNA-seq. Mar Genom 24:255–258
Kucuktas H, Wang S, Li P et al (2009) Construction of genetic linkage maps and comparative genome analysis of catfish using gene-associated markers. Genetics 181:1649–1660
Li H, Jiang L, Han J, Su H, Yang Q, He C (2011) Major histocompatibility complex class IIA and IIB genes of the spotted halibut Verasper variegatus: genomic structure, molecular polymorphism, and expression analysis. Fish Physiol Biochem 37:767–780
Li H, Fan J, Liu S, Yang Q, Mu G, He C (2012) Characterization of a myostatin gene (MSTN1) from spotted halibut (Verasper variegatus) and association between its promoter polymorphism and individual growth performance. Comp Biochem Physiol B 161:315–322
Li C, Ling Q, Ge C, Ye Z, Han X (2015a) Transcriptome characterization and SSR discovery in large-scale loach Paramisgurnus dabryanus (Cobitidae, Cypriniformes). Gene 557:201–208
Li G, Zhao Y, Liu Z, Gao C, Yan F, Liu B, Feng J (2015b) De novo assembly and characterization of the spleen transcriptome of common carp (Cyprinus carpio) using Illumina paired-end sequencing. Fish Shellfish Immunol 44:420–429
Lim H-J, Lim J-S, Lee J-S, Choi B-S, Kim D-I, Kim H-W, Rhee J-S, Choi I-Y (2015) Transcriptome profiling of the Pacific oyster Crassostrea gigas by Illumina RNA-seq. Genes Genom 38:359–365
Long Y, Li Q, Zhou B, Song G, Li T, Cui Z (2013) De novo assembly of mud loach (Misgurnus anguillicaudatus) skin transcriptome to identify putative genes involved in immunity and epidermal mucus secretion. PLoS ONE 8:e56998
Lv J, Liu P, Gao B, Wang Y, Wang Z, Chen P, Li J (2014) Transcriptome analysis of the Portunus trituberculatus: de novo assembly, growth-related gene identification and marker discovery. PLoS ONE 9:e94055
Lv Y, Chang Q, Chen S, Yu C, Qin B, Wang Z (2015a) Effect of dietary protein and lipid levels on growth and body composition of spotted halibut, Verasper variegatus. J World Aquac Soc 46:311–318
Lv Y, Chen S, Yu C, Chang Q, Qin B, Wang Z (2015b) The effects of ratio of dietary protein to lipid on the growth, digestive enzyme activities and blood biochemical parameters in spotted halibut, Verasper variegatus. Prog Fish Sci 36:118–124 (in Chinese)
Ma H, Chen S (2009) Isolation and characterization of 31 polymorphic microsatellite markers in barfin flounder (Verasper moseri) and the cross-species amplification in spotted halibut (Verasper variegatus). Conserv Genet 10:1591–1595
Ma HY, Bi JZ, Shao CW, Chen Y, Miao GD, Chen SL (2009) Development of 40 microsatellite markers in spotted halibut (Verasper variegatus) and the cross-species amplification in barfin flounder (Verasper moseri). Anim Genet 40:576–578
Ma D, Ma A, Huang Z, Wang G, Wang T, Xia D, Ma B (2016) Transcriptome analysis for identification of genes related to gonad differentiation, growth, immune response and marker discovery in the turbot (Scophthalmus maximus). PLoS ONE 11:e0149414
Mittapalli O, Bai X, Mamidala P, Rajarapu SP, Bonello P, Herms DA (2010) Tissue-specific transcriptomics of the exotic invasive insect pest emerald ash borer (Agrilus planipennis). PLoS ONE 5:e13708
Rozen S, Skaletsky H (2000) Primer3 on the WWW for general users and for biologist programmers. Bioinform Methods Protoc 132:365–386
Sekino M, Saitoh K, Aritaki M (2007) Microsatellite markers for a rare species of right-eye flounder Verasper variegatus (Pleuronectiformes, Pleuronectidae). Conserv Genet 9:761–765
Sekino M, Saitoh K, Shimizu D, Wada T, Kamiyama K, Gambe S, Chen S, Aritaki M (2010) Genetic structure in species with shallow evolutionary lineages: a case study of the rare flatfish Verasper variegatus. Conserv Genet 12:139–159
Shimizu D, Fujinami Y, Sawaguchi S, Matsubara T (2012) Egg collection from hatchery-reared broodstock of spotted halibut Verasper variegatus treated with LHRH analog. Fish Sci 78:1245–1252
Tatusov RL, Galperin MY, Natale DA, Koonin EV (2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28:33–36
Thiel T, Michalek W, Varshney R, Graner A (2003) Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor Appl Genet 106:411–422
Toth G, Gaspari Z, Jurka J (2000) Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res 10:967–981
Tran NT, Gao ZX, Zhao HH, Yi SK, Chen BX, Zhao YH, Lin L, Liu XQ, Wang WM (2015) Transcriptome analysis and microsatellite discovery in the blunt snout bream (Megalobrama amblycephala) after challenge with Aeromonas hydrophila. Fish Shellfish Immunol 45:72–82
Wada T, Kamiyama K, Shimamura S, Matsumoto I, Mizuno T, Nemoto Y (2011) Habitat utilization, feeding, and growth of wild spotted halibut Verasper variegatus in a shallow brackish lagoon: Matsukawa-ura, northeastern Japan. Fish Sci 77:785–793
Wang JP, Lindsay BG, Leebens-Mack J, Cui L, Wall K, Miller WC, de Pamphilis CW (2004) EST clustering error evaluation and correction. Bioinformatics 20:2973–2984
Wei L, Zhang H, Zheng Y, Wangzhen G, Zhang T (2008) Developing EST-derived microsatellites in sesame (Sesamum indicum L.). Acta Agron Sin 34:2077–2084
Wu C, Zhang D, Kan M et al (2014) The draft genome of the large yellow croaker reveals well-developed innate immunity. Nat Commun 5:5227
Xie Z, Xiao L, Wang D, Fang C, Liu Q, Li Z, Liu X, Yong Z, Shuisheng L, Haoran L (2014) Transcriptome analysis of the Trachinotus ovatus: identification of reproduction, growth and immune-related genes and microsatellite markers. PLoS ONE 9:e109419
Xu Y, Liu X, Wang Q, Zhao M, Qu J (2011) Annual gonadal maturation cycle of captive spotted halibut, Verasper variegatus: correlation with serum sex steroids and photothermal regulation. J Fish Sci China 18:836–846 (in Chinese)
Ye J, Fang L, Zheng H et al (2006) WEGO: a web tool for plotting GO annotations. Nucleic Acids Res 34:W293–W297
Yi TL, Guo WJ, Liang XF, Yang M, Lv LY, Tian CX, Song Y, Zhao C, Sun J (2015) Microsatellite analysis of genetic diversity and genetic structure in five consecutive breeding generations of mandarin fish Siniperca chuatsi (Basilewsky). Genet Mol Res 14:2600–2607
Zheng X, Kuang Y, Lu W, Cao D, Sun X (2014) Transcriptome-derived EST–SSR markers and their correlations with growth traits in crucian carp Carassius auratus. Fish Sci 80:977–984
Zhong H, Li J, Zhou Y, Li H, Tang Y, Yu J, Yu F (2016) A transcriptome resource for common carp after growth hormone stimulation. Mar Genom 25:25–27
Zhu W, Wang L, Dong Z, Chen X, Song F, Liu N, Yang H, Fu J (2016) Comparative transcriptome analysis identifies candidate genes related to skin color differentiation in red tilapia. Sci Rep 6:31347