
RESEARCH ARTICLE

Characterization of the global transcriptome and microsatellite marker information for spotted halibut Verasper variegatus

Jianlong Ge1 • Siqing Chen1 • Changlin Liu1 • Li Bian1 • Huiling Sun1 •

Jie Tan1

Received: 7 September 2016 / Accepted: 12 November 2016

© The Genetics Society of Korea and Springer-Science and Media 2016

Abstract The spotted halibut Verasper variegatus is an

economically important flatfish species distributed in

Japan, Korea and China. However, the genomic resources

regarding this species are scarce, which has hindered our

understanding of the genetics and biological mechanisms

in spotted halibut. In this study, we examined the global

transcriptome from six major tissues of spotted halibut.

Approximately 40 million high-quality reads were

generated using Illumina paired-end sequencing technol-

ogy. More than 9 Gbp data were generated, and de novo

assembled into 59,235 unigenes, with an average length of 938 bp and an N50 of 1735 bp.

Based on sequence similarity searches against known protein

databases, 34,084 unigenes (57.5%) showed significant similarity to

known proteins in the Nr database, and 28,875 (48.7%) had

BLAST hits in the Swiss-Prot database. In total, 19,562 and 23,037

unigenes were assigned into gene ontology categories and

clusters of orthologous group, respectively. 9138 unigenes

were mapped to 211 KEGG pathways. For functional

marker development, 13,322 candidate simple sequence

repeats were identified in the transcriptome and 7235 pri-

mer pairs were successfully designed. Among 72 primer

pairs selected for validation, 67 (93.1%) were successful in

PCR amplification and 14 (19.4%) exhibited obvious

repeat length polymorphisms in a cultured spotted halibut

population. The transcriptomic data and microsatellite

markers will provide valuable resources for future func-

tional gene analyses, genetic map construction, and quan-

titative trait loci mapping in V. variegatus.

Keywords Verasper variegatus • Illumina sequencing • Transcriptome analysis • SSR markers

Introduction

Spotted halibut Verasper variegatus belongs to the family

Pleuronectidae, distributed around northeastern Asian

coastal waters (Sekino et al. 2010). This flatfish has been

recognized as a promising marine fish species for resource

enhancement and aquaculture in Japan, Korea and China,

because of its high market price, high growth performance

and limited availability of natural resources (Wada et al.

2011). In recent years, the V. variegatus industry has

developed quickly with the breakthrough of its large-scale

artificial breeding technology in China. However, because

of the loss of genetic diversity and intensive culture, the V.

variegatus industry is facing challenges from germplasm

degeneration and poor disease resistance. Therefore, it

will be necessary to apply molecular genetics tools to

protect germplasm resources and promote genetic selection

for improvement of growth rate and disease resistance.

Transcriptome sequences and functional markers are

highly valuable resources for us to understand molecular

genetic mechanisms and to perform marker-assisted

selection of spotted halibut. The advent and rapid

development of next-generation sequencing (NGS)

technologies have provided a reliable and cost-effective

approach for transcriptome sequencing of non-model

species, including aquaculture organisms (Long et al. 2013).

Electronic supplementary material The online version of this article (doi:10.1007/s13258-016-0496-1) contains supplementary material, which is available to authorized users.

✉ Siqing Chen

[email protected]

1 Key Laboratory for Sustainable Utilization of Marine

Fisheries Resources, Ministry of Agriculture, Yellow Sea

Fisheries Research Institute, Chinese Academy of Fishery

Sciences, Qingdao 266071, Shandong, China


Genes Genom Online ISSN 2092-9293

DOI 10.1007/s13258-016-0496-1 Print ISSN 1976-9571


Nowadays, RNA-seq has been employed in various aqua-

culture studies, such as transcriptome survey (Li et al.

2015b; Lim et al. 2015; Zhong et al. 2016), molecular

marker development (Tran et al. 2015; Zheng et al. 2014)

and differential gene expression analysis (Lv et al. 2014;

Zhu et al. 2016). Transcriptome sequences also provide

a rich resource for simple sequence repeat (SSR) marker

development, and SSRs play an important role in genetic

research such as linkage map construction, population

genetic studies and marker-assisted breeding

(Kucuktas et al. 2009; Yi et al. 2015; Zheng et al. 2014).

Compared to genomic SSRs, transcriptome derived SSRs

(EST-SSRs) are more conserved (Ellis and Burke 2007)

and can help to map candidate functional genes and

increase the efficiency of marker assisted selection (Ku-

cuktas et al. 2009).

Although V. variegatus is one of the most economically important

marine fish species, research has mainly focused on dietary

nutrition (Lv et al. 2015a, b), reproductive biology (Chen et al.

2006; Xu et al. 2011) and breeding technology (Shimizu

et al. 2012), while molecular genetic studies of this species

are seriously lacking. In total, 533 expressed

sequence tag (EST) sequences from spotted halibut have been

deposited in the NCBI GenBank database

(http://www.ncbi.nlm.nih.gov/; as of Aug. 2016). A few functional genes

have been described by homology cloning (Li et al. 2011, 2012)

and dozens of genomic SSR markers (Ma and Chen 2009;

Ma et al. 2009; Sekino et al. 2007) have been developed to

date.

In the present study, we sequenced the pooled transcriptomes

of brain, liver, spleen, kidney, muscle and skin of V.

variegatus using the Illumina paired-end sequencing tech-

nology to generate a large-scale EST database and develop

tens of thousands of SSRs. To the best of our knowledge,

this is the first comprehensive transcriptome analysis for

this species. The results of this study will provide a valu-

able resource for further genetic and molecular studies of

V. variegatus.

Materials and methods

Sample collection

Six 8-month-old juvenile V. variegatus with average body

weight of 58.5 ± 2.8 g were obtained from a cultured

population in Rushan, Shandong Province, China. All the

fish were anesthetized and sacrificed by decapitation. Tissue

samples of brain, liver, spleen, kidney, skin and

skeletal muscle were collected immediately and stored in

RNA-protector (TaKaRa). Meanwhile, thirty 8-month-old

individuals from the same culture population were

randomly collected for transcriptome derived SSR marker

validation. Fin tissue was cut and stored in 70% alcohol at

−30 °C until DNA extraction.

RNA extraction, cDNA library construction

and sequencing

Total RNA was extracted from the six tissues of each individual

using Trizol Reagent (Invitrogen, USA) according

to the manufacturer’s instructions. RNA concentration and

integrity were measured using a UV/visible spectrophotometer

and gel electrophoresis. Equal amounts of high-

quality RNA from each sample were pooled for RNA-seq

library construction.

A cDNA library was constructed following the manu-

facturer’s instructions (Illumina, San Diego, USA). Briefly,

mRNA was enriched from total RNA using Oligo-(dT)

beads. Then the enriched mRNA was fragmented into short

fragments using fragmentation buffer and reverse transcribed

into cDNA with random primers. Second-strand

cDNA was synthesized using buffer, dNTPs, RNase H and

DNA polymerase I. The cDNA fragments were then purified

with a QiaQuick PCR extraction kit (Qiagen, Germany),

end repaired, poly(A) added, and ligated to Illumina

sequencing adapters. The ligation products were size

selected (200 ± 25 bp) by agarose gel electrophoresis,

PCR amplified, and sequenced on an Illumina HiSeq™

2000 platform in 125 bp paired-end mode.

Quality control and de novo transcriptome assembly

The raw reads were filtered to obtain high-quality clean

reads prior to assembly. This was performed by removing

low-quality reads in which more than 10% of bases had a

Q-value < 20, ambiguous reads containing more than 10%

unknown bases, and reads containing adaptor sequences.
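The three filtering criteria above can be sketched as a single predicate. This is only a minimal illustration, not the pipeline actually used in this study: the function name keep_read, the Phred+33 quality encoding and the adaptor string supplied by the caller are all assumptions.

```python
def keep_read(seq, qual, adaptor, min_q=20, max_frac=0.10, phred_offset=33):
    """Return True if a read passes the three filters described in the text.

    seq     : read sequence
    qual    : FASTQ quality string (Phred+33 encoding is assumed)
    adaptor : adaptor sequence to screen for (hypothetical, caller-supplied)
    """
    n = len(seq)
    if n == 0:
        return False
    # Filter 1: discard reads with more than 10% of bases below Q20
    low_q = sum(1 for c in qual if ord(c) - phred_offset < min_q)
    if low_q / n > max_frac:
        return False
    # Filter 2: discard reads with more than 10% unknown (N) bases
    if seq.upper().count("N") / n > max_frac:
        return False
    # Filter 3: discard reads that contain the adaptor sequence
    if adaptor and adaptor in seq:
        return False
    return True
```

A real filter would usually also tolerate partial or mismatched adaptor matches; the exact substring test here is a simplification.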

The clean reads were assembled using the short read

assembling program Trinity (Grabherr et al. 2011). Trinity

is a modular method and software package which combines

three components: Inchworm, Chrysalis and Butterfly.

First, Inchworm assembles reads by a greedy k-mer-based

approach, resulting in a collection of linear contigs. Next,

Chrysalis clusters related contigs that correspond to portions

of alternatively spliced transcripts or otherwise

unique portions of paralogous genes, and then builds a de

Bruijn graph for each cluster of related contigs. Finally,

Butterfly analyzes the paths taken by reads and read pairings

in the context of the corresponding de Bruijn graph,

and outputs one linear sequence for each alternatively

spliced isoform and transcripts derived from paralogous

genes. All the assembled sequences were defined as

unigenes.
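To make the idea of greedy k-mer-based extension more concrete, the toy sketch below builds a single contig from a handful of reads. It is not Trinity's Inchworm code: the function name greedy_contig is hypothetical, k is assumed to be at least 2, and reverse complements, sequencing-error pruning and repeated seeding for further contigs are all ignored.

```python
from collections import Counter

def greedy_contig(reads, k):
    """Build one contig by greedy k-mer extension from the most abundant seed."""
    # Count every k-mer occurring in the reads
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    if not counts:
        return ""
    # Seed with the most abundant k-mer (ties broken alphabetically)
    seed = max(sorted(counts), key=counts.get)
    used = {seed}
    contig = seed

    def best_next(candidates):
        # Most abundant candidate k-mer that exists and has not been used yet
        options = [c for c in candidates if c in counts and c not in used]
        return max(options, key=counts.get) if options else None

    # Extend to the right, one base at a time
    while True:
        suffix = contig[-(k - 1):]
        nxt = best_next([suffix + b for b in "ACGT"])
        if nxt is None:
            break
        used.add(nxt)
        contig += nxt[-1]
    # Extend to the left, one base at a time
    while True:
        prefix = contig[:k - 1]
        nxt = best_next([b + prefix for b in "ACGT"])
        if nxt is None:
            break
        used.add(nxt)
        contig = nxt[0] + contig
    return contig
```

Marking k-mers as used keeps the extension from looping on itself; the graph-based steps described above are what resolve the branching that such a greedy walk cannot represent.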



Functional annotation of unigenes

All the unigenes were searched against the NCBI non-redundant

(Nr) protein database (http://www.ncbi.nlm.nih.gov/)

and the Swiss-Prot protein database (http://www.expasy.ch/sprot)

using BLASTx with an E-value threshold of less than

1.0 × 10⁻⁵. The protein sequences with

the highest similarity scores were used as the functional

annotation for the corresponding unigene. Based on the NCBI

Nr annotation, the BLAST2GO program (http://www.BLAST2go.org/)

was used to obtain GO annotations of the unigenes

(Conesa et al. 2005), and GO functional classification of all

unigenes was then performed using WEGO software

(http://wego.genomics.org.cn/cgi-bin/wego/index.pl) (Ye

et al. 2006). Functional annotation of the unigenes was also carried

out by BLASTx searches against the Clusters of Orthologous

Groups (COG) database (http://www.ncbi.nlm.nih.gov/COG/)

(Tatusov et al. 2000). Meanwhile, the unigenes

were aligned to the Kyoto Encyclopedia of Genes and

Genomes (KEGG) pathway database (http://www.genome.jp/kegg)

to annotate the possible metabolic pathways

(Kanehisa et al. 2008).
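The best-hit selection used for the Nr and Swiss-Prot annotation can be sketched as follows. The sketch assumes that the BLASTx results are available in the standard 12-column tabular format (query, subject, ..., E-value, bit score) and that "highest similarity score" means highest bit score; both points, and the function name best_hits, are assumptions rather than details given in the text.

```python
def best_hits(blast_lines, max_evalue=1e-5):
    """Pick the highest-scoring hit per query from BLAST tabular lines.

    Columns are assumed to follow the default tabular layout, so the
    E-value is field 11 and the bit score is field 12 (0-based 10 and 11).
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue  # keep only hits with E-value < 1.0e-5
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

The returned dictionary maps each unigene identifier to its best subject, so the number of annotated unigenes is simply the number of keys.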

Identification and validation of transcriptome

derived SSRs

To explore the distribution of potential SSR core motifs

in the assembled transcriptome, all the unigenes were

scanned for the presence of SSRs with the

MIcroSAtellite identification tool (MISA) version 1.0

(http://pgrc.ipk-gatersleben.de/misa/) (Thiel et al. 2003). The minimum

numbers of contiguous repeat units were set to six for

dinucleotide, five for trinucleotide, and four for tetra-,

penta- and hexanucleotide motifs. Primer pairs for

microsatellite loci were designed based on the unique

flanking regions of each microsatellite locus using Primer3

(Rozen and Skaletsky 2000). Polymorphism was tested

in the 30 randomly collected individuals. Genomic DNA

was extracted from the fin sample using the extraction kit

for marine animals (Tiangen Biotech, Beijing, China).

PCR was performed on a Thermal Cycler (Takara) in a

total volume of 10 µL containing 0.4 µM of each primer,

10× PCR buffer, and 50 ng DNA. Cycling conditions

consisted of initial denaturation at 94 °C for 5 min; 35

cycles of 30 s at 94 °C, 30 s at the annealing temperature and

30 s at 72 °C; and a final extension of 5 min at 72 °C, followed

by storage at 4 °C. The amplified products were separated on

8% polyacrylamide gels and visualized by silver staining.

The expected and observed heterozygosity were calcu-

lated using Popgene32 software. Polymorphism informa-

tion content (PIC) was calculated using the PIC-CALC

0.6 software.
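The repeat-number thresholds used for SSR detection (six for di-, five for tri-, and four for tetra-, penta- and hexanucleotide motifs) can be illustrated with a regular-expression scan. This is a simplified stand-in for MISA, not a reimplementation of it: only perfect repeats are reported, compound SSRs are not merged, and the names MIN_REPEATS and find_ssrs are hypothetical.

```python
import re

# Minimum number of contiguous repeat units per motif length (from the text)
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    start is a 0-based position in seq.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. ACAC, AAA)
            if any(motif == motif[:d] * (unit_len // d)
                   for d in range(1, unit_len) if unit_len % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)
```

For example, a unigene carrying a (TG)10 tract is reported once as a dinucleotide SSR with ten repeats, whereas a (TG)5 tract falls below the threshold and is ignored.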

Results

Transcriptome sequencing and de novo assembly

To obtain broad transcriptome data, a pooled

cDNA library of multiple tissues including brain, liver,

spleen, kidney, skin and skeletal muscle was constructed

and sequenced using the Illumina HiSeq™ 2000 platform.

A total of 41,099,174 raw reads with a length of 125 bp

were produced. The raw read files have been deposited in

the NCBI Sequence Read Archive (accession number:

SRA455839). After removing adaptor sequences and discarding

low-quality reads, 39,760,916 clean reads were

obtained (Table 1). The remaining high-quality reads were

finally assembled into 59,235 unigenes with average length

of 938 bp and N50 of 1735 bp. The length distribution of

the unigenes showed that 30,548 (51.57%) unigenes were

200–499 bp long, 12,347 (20.84%) were 500–999 bp long,

8934 (15.08%) were 1000–1999 bp long, 3975 (6.71%)

were 2000–2999 bp long, and 3431 (5.79%) were longer

than 3000 bp (Fig. 1). The final assembled sequences and

detailed gene annotations are presented in File S1 and File

S2, respectively.
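For readers unfamiliar with the N50 statistic reported above, the sketch below shows the usual way it is computed from a list of sequence lengths. The function names and the toy input are illustrative assumptions; the unigene lengths of this study are not reproduced here.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input

def mean_length(lengths):
    """Arithmetic mean of the sequence lengths (0.0 for empty input)."""
    return sum(lengths) / len(lengths) if lengths else 0.0

# Toy example with made-up lengths (bp)
toy = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy)
toy_mean = mean_length(toy)
```

Because long sequences dominate the cumulative sum, N50 is typically larger than the mean length, which matches the pattern seen for this assembly (N50 of 1735 bp against a mean of 938 bp).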

Functional annotation

Functional annotation of the non-redundant unigenes was

carried out by searching the transcripts against public Nr

and Swiss-Prot databases. As a result, 34,084 (57.5%) and

28,875 (48.7%) unigenes showed significant similarity (E-value

< 1.0 × 10⁻⁵) to known sequences in these databases,

respectively (Table 1). Based on the BLASTx similarity

analysis, the unigenes matched sequences from a range of

fish species (Fig. 2). The highest number of

hits was to Larimichthys crocea (10,352, 30.31%), followed

by Stegastes partitus (8310, 24.33%), Notothenia

coriiceps (2206, 6.46%) and Oreochromis niloticus (2015,

5.90%). In total, about 6.6% of the annotated unigenes

matched sequences from flatfish species, among

which Cynoglossus semilaevis accounted for the most hits (1820,

5.32%); 17 unigenes were annotated to V. variegatus.

GO, COG and KEGG classification

The potential functions of the unigenes were determined

using gene ontology (GO) databases and 19,562 unigenes

were categorized by GO analysis. Second-level GO terms

were used to classify the unigenes into

three main categories (cellular component, molecular

function and biological process), and each unigene was

assigned to one or more GO terms. In this study, 7833

unigenes were assigned to the cellular component category,


among which, ‘cell’ (6562, 20.24%) and ‘cell part’ (6562,

20.24%) were most abundant, followed by ‘membrane’

(4908, 15.14%) and ‘organelle’ (4832, 14.90%) (Fig. 3).

Further, 11,528 unigenes were assigned to the molecular

function category, most of which were

assigned to ‘binding’ (10,383 unigenes) and

‘catalytic activity’ (6909 unigenes). Additionally, 11,343

unigenes were assigned to various biological process

Fig. 1 Length distribution of

all unigenes of V. variegatus

Fig. 2 Top 20 hit species

distribution based on BLASTx

Table 1 Summary of Illumina transcriptome, assembly and annotation for V. variegatus

Raw results (after trimming)      Assembly results                   Annotation results
Clean bases (G)    9.26           Unigenes              59,235       Nr annotations   34,084
Read pairs         39,760,916     Average length (bp)   938.71       Swissprot        28,875
Read length (bp)   125            Min–max length (bp)   201–16,457   COG              23,037
                                  N50 (bp)              1735         KEGG             9138


categories; the dominant subcategories were ‘cellular process’

(9398 unigenes), ‘single-organism process’ (8422

unigenes), ‘metabolic process’ (6949 unigenes) and ‘bio-

logical regulation’ (5941 unigenes).

For functional prediction and classifications, all unige-

nes were aligned to the COG database. Together, 23,037

unigenes were grouped into 25 COG classifications

(Fig. 4). More than half of the unigenes are distributed in

the three main groups, (R) General function prediction only

(21.81%), (T) signal transduction mechanisms (18.14%),

and (O) posttranslational modification, protein turnover,

chaperones (10.52%). In contrast, each of the remaining

categories contained fewer than 10% of the unigenes;

for example, (Y) nuclear structure and

(N) cell motility accounted for only 0.45% and 0.31%,

respectively.

The pathway analysis is able to help us better understand

the biological functions of genes. In this study, 9138 uni-

genes were assigned into five major categories: metabo-

lism, cellular processes, genetic information processing,

environmental information processing and organismal

systems. The detailed subcategories and distribution in

each major category are shown in Fig. 5. In total, 211

pathways were obtained and the number of unigenes in

these pathways ranged from 1 to 610. The most highly represented

pathway was “Endocytosis”, which contained 610

unigenes (6.68% of the KEGG-annotated unigenes). Other major

pathways were “Focal adhesion” (533, 5.83%), “MAPK

signaling pathway” (531, 5.81%), “Calcium signaling

pathway” (531, 5.81%) and “Neuroactive ligand–receptor

interaction” (467, 5.11%).

Frequency and distribution of SSRs

To further assess the assembly quality and to develop

molecular markers, all unigenes were used to mine

potential SSRs. A total of 13,322 potential SSR loci were

identified in 9658 unigenes and 2487 unigenes contained

more than one SSR. The SSR frequency was 22.5% and the

average distribution distance was 4.1 kb. Detailed analysis

showed that di-nucleotide was the most common repeat

unit (57.09%), followed by tri- (35.3%), tetra- (5.5%),

hexa- (1.2%) and penta-nucleotide (0.9%) repeats

(Table 2). The number of repeat units of the di-nucleotide

motifs was distributed mainly from 6 to 10, and the tri-,

tetra-, penta- and hexa-nucleotide mainly contained 5–7,

4–5, 4 and 4 repeat units, respectively (Table 2). The

dominant repeat motif in SSRs was AC/GT (40.3%), fol-

lowed by AG/CT (11.8%), AGG/CCT (10.6%), AGC/CTG

(9.8%), AT/AT (4.9%) and AAG/CTT (4.5%) (Fig. 6).
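The SSR frequency and density figures quoted above can be checked with simple arithmetic. In the sketch below the total transcriptome length is approximated as the number of unigenes multiplied by the average unigene length from Table 1; that approximation and the function name ssr_summary are assumptions made for illustration.

```python
def ssr_summary(n_ssr, n_unigenes, mean_len_bp, n_ssr_unigenes):
    """Summarize SSR abundance from counts reported in the text and Table 1."""
    # Approximate total assembled length in kb (count x mean length)
    total_kb = n_unigenes * mean_len_bp / 1000.0
    return {
        "ssr_per_unigene": n_ssr / n_unigenes,              # SSR frequency
        "kb_per_ssr": total_kb / n_ssr,                     # average spacing
        "frac_unigenes_with_ssr": n_ssr_unigenes / n_unigenes,
    }

# Values taken from this section and Table 1
stats = ssr_summary(n_ssr=13_322, n_unigenes=59_235,
                    mean_len_bp=938.71, n_ssr_unigenes=9_658)
```

With these inputs the frequency comes to about 0.225 SSRs per unigene and the spacing to roughly 4.17 kb per SSR, in line with the 22.5% and 4.1 kb reported here; about 16% of unigenes carry at least one SSR.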

SSR marker validation

A total of 7235 primer pairs were successfully designed

from 6085 SSR-containing sequences (Table S3) and 72

primer pairs were randomly selected for validation. Among

the 72 primer pairs, 67 (93.1%) were successful in PCR

amplification with the genomic DNA of spotted halibut.

Among the 67 successful primer pairs, 59 (88.1%) yielded PCR

products of the expected size, while the other 8

(11.9%) yielded products that were either shorter or longer than

expected. After the products were separated on polyacrylamide

gels, 14 (19.4%) of the microsatellite loci

Fig. 3 Gene ontology (GO) analysis and functional classification of the V. variegatus transcriptome


showed polymorphisms. The number of alleles at

these SSR loci varied from 2 to 4 and the PIC values ranged

from 0.141 to 0.692 (Table 3).
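As a rough guide to how expected heterozygosity and PIC relate to allele frequencies, the sketch below applies two widely used formulas: Nei's unbiased estimator for expected heterozygosity and the PIC definition of Botstein et al. (1980). Whether Popgene32 and PIC-CALC use exactly these forms is an assumption, and the allele counts in the example are hypothetical rather than taken from the genotyping data.

```python
def diversity_stats(allele_counts):
    """Return (He, PIC) for one locus from a list of allele counts.

    He  : Nei's unbiased expected heterozygosity, n/(n-1) * (1 - sum p_i^2),
          where n is the number of gene copies sampled (2 x individuals).
    PIC : 1 - sum p_i^2 - sum_{i<j} 2 * p_i^2 * p_j^2.
    """
    n = sum(allele_counts)
    freqs = [c / n for c in allele_counts]
    sum_sq = sum(p * p for p in freqs)
    he = (n / (n - 1)) * (1 - sum_sq)
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs))
                for j in range(i + 1, len(freqs)))
    pic = 1 - sum_sq - cross
    return he, pic

# Hypothetical locus: 30 diploid fish, two alleles observed 55 and 5 times
he, pic = diversity_stats([55, 5])
```

A locus dominated by one allele, as in this example, gives low values for both statistics, while loci with several alleles at similar frequencies approach the upper end of the PIC range.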

Discussion

In the present study, we conducted a comprehensive study

on the de novo assembly and characterization of the tran-

scriptome of spotted halibut and developed a large number

of SSR markers based on the transcriptome information

obtained. To the best of our knowledge, this is the first

exploration of the transcriptome of this species through the

analysis of large-scale transcript sequences. In total, 59,235

unigenes were assembled with an average length of

938 bp, which was comparable with recent Illumina

sequencing reports in marine medaka (984 bp) (Kim et al.

2015) and blunt snout bream (998 bp) (Tran et al. 2015).

The length distribution, in which approximately 28% of

the unigenes exceeded 1000 bp, was similar to that of Trachinotus

ovatus (Xie et al. 2014) and common carp Cyprinus carpio

(Li et al. 2015b). These assembly results suggest that the

sequence data from the spotted halibut were effectively

assembled.

To predict the functions of the transcriptome sequences,

all the unigenes were annotated by searches against public

databases. As a result, 34,084 unigenes, accounting for

approximately 57.5% of the total, were annotated against the Nr

database. In the present study, the most homology hits in the

Nr search were to L. crocea (10,352, 30.31%) and, among

flatfishes, to C. semilaevis. The draft genomes of these two

species were published in 2014 (Chen et al. 2014;

Wu et al. 2014). As expected, the unigenes of the spotted halibut

transcriptome matched well to proteins of other fish species,

especially species with published genomes. Still, a considerable

proportion of unigenes failed to find hits in any of the

databases. Previous studies on transcriptome analyses indicated

that unannotated sequences mainly represent transcripts

spanning only untranslated mRNA regions,

chimeric sequences derived from assembly errors (Wang

et al. 2004) and sequences containing non-conserved protein regions

(Mittapalli et al. 2010). Some may also be components of

novel genes specific to this species, which are likely to be

matched to certain genome sequences in the near future.

Fig. 4 Clusters of orthologous group (COG) functional classification of the V. variegatus transcriptome


In this study, a large number of unigenes were assigned

to a wide range of gene ontology categories and COG

classification, which indicated that our transcriptome data

represented a broad diversity of transcripts in spotted hal-

ibut. Further, the distribution and composition of the

assigned GO terms were very similar to those reported in

other fish species, such as T. ovatus (Xie et al. 2014),

Scophthalmus maximus (Ma et al. 2016), and Paramis-

gurnus dabryanus (Li et al. 2015a), indicating the func-

tional distribution of conserved genes. In addition, a large

percentage of the unigenes were mapped into KEGG

pathways associated with amino acid metabolism, lipid

metabolism, immune system and endocrine system. These

annotations provide a valuable resource for investigating

specific processes, functions and pathways in future V.

variegatus research.

Although SSR markers play an important role in

genetic research, very limited marker information was

available for spotted halibut. In the present study, 13,322

potential SSR loci were identified from 59,235 unigenes,

Fig. 5 Identified KEGG pathways of assembled unigenes

Table 2 Distribution of identified SSRs in V. variegatus transcriptome

Type     Repeat number                                                    Total    %
         4      5      6      7      8      9      10     11–15  >15
Di-      –      –      2236   1311   984    806    696    1110   463     7606     57.1
Tri-     –      2215   1150   652    167    160    215    118    24      4701     35.3
Tetra-   464    138    42     49     10     5      5      18     1       732      5.5
Penta-   86     9      14     2      2      1      1      3      0       118      0.9
Hexa-    102    42     6      6      3      6      0      0      0       165      1.2
Total    652    2404   3448   2020   1166   978    917    1249   488     13,322   100
%        4.89   18.05  25.88  15.16  8.75   7.34   6.88   9.38   3.66    100


Fig. 6 Frequency of classified

repeat types of SSRs in V.

variegatus transcriptome

Table 3 Characteristics of 14 polymorphic microsatellite loci

Locus   Core motif       Primer (5′–3′)           Tm (°C)  Na   PIC     Ho      He
ShE01   (CTC)6           GCTGGATTCATCTCTCAGCC     61       2    0.282   0.033   0.345
                         CCTCTGCTTCTTCTGCTGCT
ShE05   (TG)10           GAACCCGCTTCAACTACGAC     60       3    0.460   0.533   0.524
                         CTTGGAAACCAAAGAGCGAG
ShE21   (TG)10           TCTGACTGGATGGTGTTGGA     60       4    0.632   0.567   0.693
                         GACTTTAGGCCGAGAAGGCT
ShE25   (GAG)5           GAACTCACGACTACGACGCA     60       2    0.141   0.167   0.155
                         CGGAACACACAAAGAGCAGA
ShE27   (TGC)5…(TGT)5    GTTGGGCCATCTGAGACAAG     60       2    0.222   0.233   0.259
                         CTCTACCGTCGGCAGCTCT
ShE33   (TG)8            AACCCAGCAGTTGTCATTGAT    60       4    0.692   0.600   0.753
                         CTCCTCGATGCTTTTCATCC
ShE38   (TG)12           CAGAAGTGGTCTCGCGTGTA     60       2    0.370   0.857   0.499
                         TTTTCATGCAACAAAGGCAA
ShE50   (CAG)6…(CAG)8    AACCAGGACCTCAGTCATCG     62       3    0.523   0.733   0.610
                         ATCATTCGTGGGTGGTTCAT
ShE54   (GGTTCA)4        TTTCATGATGGTCTGCTGGA     62       2    0.185   0.233   0.210
                         CTTGAATCCGAAGAGAACCG
ShE58   (GCA)14          GATGATGAACGCTGCTCAAA     62       4    0.662   0.500   0.725
                         CACCTCCTTGCATAGCCATT
ShE62   (AAC)6           GGTAAAACTCCTTCGCTCCC     62       2    0.372   0.296   0.503
                         CAGGTGTTGTTTGTGGATGC
ShE68   (CAG)7…(CAG)7    GTTGCAACAGCAGCAACAGT     62       3    0.505   0.600   0.599
                         TGTTGAGATTTCCCATCGGT
ShE71   (AC)11           ACCATGAAAGTGTCTGCGTG     62       3    0.461   0.367   0.569
                         CACAGGCAATAAAGCGATGA
ShE72   (GAT)6           GATCTTGCAGTCCTCCTTGC     62       4    0.329   0.400   0.353
                         TTGTCAAGCTCATCGTCGTC

Tm, specific annealing temperature; Na, number of alleles; PIC, polymorphism information content; Ho, observed heterozygosity; He, expected heterozygosity


which suggested that each unigene possesses an

average of 0.22 SSRs, similar to P. dabryanus (0.21;

Li et al. 2015a) and Haliotis midae (0.34; Franchini et al.

2011). The distribution density was one SSR per 4.1 kb,

higher than that reported for other fish species including

Megalobrama amblycephala (9.53 kb; Gao et al. 2012) and P.

dabryanus (6.99 kb; Li et al. 2015a), but lower than that of C.

carpio (3.9 kb; Li et al. 2015b) and H. midae (0.756 kb;

Franchini et al. 2011). The distribution density of SSRs is

influenced by several factors, including genome structure

or composition (Toth et al. 2000), SSR detection criteria,

dataset size, database-mining tools and the parameters for

exploration of microsatellites (Wei et al. 2008). Among

five repeat types, di-nucleotide repeats were the most

common type (57.09%) and the AC/GT motif accounted

for the majority of SSRs (40.3%), which was consistent

with the results in several fish species (Ma et al. 2016; Tran

et al. 2015; Xie et al. 2014) and with a survey of

vertebrates (Toth et al. 2000).

The transcriptome sequencing provided numerous EST

sequences for developing EST-SSR markers. In the present

study, 7235 SSR primer pairs were successfully designed

from the transcriptome. The validation of 72 primer pairs

showed that 93.1% of the microsatellite loci were suc-

cessful in PCR amplification and 19.4% showed poly-

morphisms. These results further suggested that the

assembled transcripts were of high quality and SSRs

identified in our dataset are expected to be useful in future

genetic studies. The rate of polymorphic microsatellites

isolated in this study was lower than that in M. ambly-

cephala (44.4%) (Gao et al. 2012), P. dabryanus (79.9%)

(Li et al. 2015a) and Paphia textile (53.75%) (Chen et al.

2016), perhaps because the tested individuals came from

the same fish farm. More polymorphic microsatellites might

be identified if more geographically distant populations

were examined.

Conclusion

Here we report the first transcriptome study in spotted

halibut V. variegatus, an economically important flatfish in

Northeast Asia. The large number of sequences generated

(59,235 putative unigenes) will enrich the genomic resources

for spotted halibut and thereby improve the

sequence databases available for gene discovery. In addition, many

SSR loci were discovered in the transcriptome, and the

candidate markers identified in this study will be useful for

the construction of linkage maps and for population genetic

studies.

Acknowledgements We thank Guangzhou Gene denovo Biotech-

nology Co., Ltd for help with the Illumina sequencing of the cDNA

library and bioinformatic analysis. This study was supported by a grant from the Special

Scientific Research Funds for Central Non-profit Institutes, Yellow

Sea Fisheries Research Institute, Chinese Academy of Fishery Sci-

ences (20603022016005).

Compliance with ethical standards

Conflict of interest Jianlong Ge, Siqing Chen, Changlin Liu, Li

Bian, Huiling Sun and Jie Tan declare that there is no conflict of

interest.

Human and animal rights The animals used in the present study

were artificially cultivated, and all experimental treatments were

performed according to the recommendations in the Guide for the

Care and Use of Laboratory Animals of the National Institutes of

Health. The study protocol was approved by the Experimental Animal

Ethics Committee, Yellow Sea Fisheries Research Institute, Chinese

Academy of Fishery Sciences, China.

References

Chen S, Gao T, Wang C, Zhang Y, Zhang X, Chen Y (2006) Study on

developmental characters in early stage of spotted halibut

Verasper variegatus. Period Ocean Univ China 36:281–286 (in

Chinese)

Chen S, Zhang G, Shao C et al (2014) Whole-genome sequence of a

flatfish provides insights into ZW sex chromosome evolution and

adaptation to a benthic lifestyle. Nat Genet 46:253–260

Chen X, Li J, Xiao S, Liu X (2016) De novo assembly and

characterization of foot transcriptome and microsatellite marker

development for Paphia textile. Gene 576:537–543

Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M

(2005) Blast2GO: a universal tool for annotation, visualization

and analysis in functional genomics research. Bioinformatics

21:3674–3676

Ellis JR, Burke JM (2007) EST-SSRs as a resource for population

genetic analyses. Heredity 99:125–132

Franchini P, Van der Merwe M, Roodt-Wilding R (2011) Transcrip-

tome characterization of the South African abalone Haliotis

midae using sequencing-by-synthesis. BMC Res Notes 4:1–11

Gao Z, Luo W, Liu H, Zeng C, Liu X, Yi S, Wang W (2012)

Transcriptome analysis and SSR/SNP markers information of the

blunt snout bream (Megalobrama amblycephala). PLoS ONE

7:e42637

Grabherr MG, Haas BJ, Yassour M et al (2011) Full-length

transcriptome assembly from RNA-Seq data without a reference

genome. Nat Biotechnol 29:644–652

Kanehisa M, Araki M, Goto S et al (2008) KEGG for linking genomes

to life and the environment. Nucleic Acids Res 36:D480–D484

Kim BM, Choi BS, Kim HS, Rhee JS, Au DW, Wu RS, Choi IY, Lee

JS (2015) Transcriptome profiling of larvae of the marine

medaka Oryzias melastigma by Illumina RNA-seq. Mar Geno-

mics 24:255–258

Kucuktas H, Wang S, Li P et al (2009) Construction of genetic

linkage maps and comparative genome analysis of catfish using

gene-associated markers. Genetics 181:1649–1660

Li H, Jiang L, Han J, Su H, Yang Q, He C (2011) Major

histocompatibility complex class IIA and IIB genes of the

spotted halibut Verasper variegatus: genomic structure, molec-

ular polymorphism, and expression analysis. Fish Physiol

Biochem 37:767–780

Li H, Fan J, Liu S, Yang Q, Mu G, He C (2012) Characterization of a

myostatin gene (MSTN1) from spotted halibut (Verasper


variegatus) and association between its promoter polymorphism

and individual growth performance. Comp Biochem Physiol B

161:315–322

Li C, Ling Q, Ge C, Ye Z, Han X (2015a) Transcriptome

characterization and SSR discovery in large-scale loach

Paramisgurnus dabryanus (Cobitidae, Cypriniformes). Gene

557:201–208

Li G, Zhao Y, Liu Z, Gao C, Yan F, Liu B, Feng J (2015b) De novo

assembly and characterization of the spleen transcriptome of

common carp (Cyprinus carpio) using Illumina paired-end

sequencing. Fish Shellfish Immunol 44:420–429

Lim H-J, Lim J-S, Lee J-S, Choi B-S, Kim D-I, Kim H-W, Rhee J-S,

Choi I-Y (2015) Transcriptome profiling of the Pacific oyster

Crassostrea gigas by Illumina RNA-seq. Genes Genom

38:359–365

Long Y, Li Q, Zhou B, Song G, Li T, Cui Z (2013) De novo assembly

of mud loach (Misgurnus anguillicaudatus) skin transcriptome to

identify putative genes involved in immunity and epidermal

mucus secretion. PLoS ONE 8:e56998

Lv J, Liu P, Gao B, Wang Y, Wang Z, Chen P, Li J (2014)

Transcriptome analysis of the Portunus trituberculatus: de novo

assembly, growth-related gene identification and marker discov-

ery. PLoS ONE 9:e94055

Lv Y, Chang Q, Chen S, Yu C, Qin B, Wang Z (2015a) Effect of

dietary protein and lipid levels on growth and body composition

of spotted halibut, Verasper variegatus. J World Aquac Soc

46:311–318

Lv Y, Chen S, Yu C, Chang Q, Qin B, Wang Z (2015b) The effects of

ratio of dietary protein to lipid on the growth, digestive enzyme

activities and blood biochemical parameters in spotted halibut,

Verasper variegatus. Prog Fish Sci 36:118–124 (in Chinese)

Ma H, Chen S (2009) Isolation and characterization of 31 polymor-

phic microsatellite markers in barfin flounder (Verasper moseri)

and the cross-species amplification in spotted halibut (Verasper

variegatus). Conserv Genet 10:1591–1595

Ma HY, Bi JZ, Shao CW, Chen Y, Miao GD, Chen SL (2009)

Development of 40 microsatellite markers in spotted halibut

(Verasper variegatus) and the cross-species amplification in

barfin flounder (Verasper moseri). Anim Genet 40:576–578

Ma D, Ma A, Huang Z, Wang G, Wang T, Xia D, Ma B (2016)

Transcriptome analysis for identification of genes related to

gonad differentiation, growth, immune response and marker

discovery in the turbot (Scophthalmus maximus). PLoS ONE

11:e0149414

Mittapalli O, Bai X, Mamidala P, Rajarapu SP, Bonello P, Herms DA

(2010) Tissue-specific transcriptomics of the exotic invasive

insect pest emerald ash borer (Agrilus planipennis). PLoS ONE

5:e13708

Rozen S, Skaletsky H (2000) Primer3 on the WWW for general users

and for biologist programmers. Bioinform Methods Protoc

132:365–386

Sekino M, Saitoh K, Aritaki M (2007) Microsatellite markers for a

rare species of right-eye flounder Verasper variegatus (Pleu-

ronectiformes, Pleuronectidae). Conserv Genet 9:761–765

Sekino M, Saitoh K, Shimizu D, Wada T, Kamiyama K, Gambe S,

Chen S, Aritaki M (2010) Genetic structure in species with

shallow evolutionary lineages: a case study of the rare flatfish

Verasper variegatus. Conserv Genet 12:139–159

Shimizu D, Fujinami Y, Sawaguchi S, Matsubara T (2012) Egg

collection from hatchery-reared broodstock of spotted halibut

Verasper variegatus treated with LHRH analog. Fish Sci

78:1245–1252

Tatusov RL, Galperin MY, Natale DA, Koonin EV (2000) The COG

database: a tool for genome-scale analysis of protein functions

and evolution. Nucleic Acids Res 28:33–36

Thiel T, Michalek W, Varshney R, Graner A (2003) Exploiting EST

databases for the development and characterization of gene-

derived SSR-markers in barley (Hordeum vulgare L.). Theor

Appl Genet 106:411–422

Toth G, Gaspari Z, Jurka J (2000) Microsatellites in different

eukaryotic genomes: survey and analysis. Genome Res

10:967–981

Tran NT, Gao ZX, Zhao HH, Yi SK, Chen BX, Zhao YH, Lin L, Liu

XQ, Wang WM (2015) Transcriptome analysis and microsatel-

lite discovery in the blunt snout bream (Megalobrama ambly-

cephala) after challenge with Aeromonas hydrophila. Fish

Shellfish Immunol 45:72–82

Wada T, Kamiyama K, Shimamura S, Matsumoto I, Mizuno T,

Nemoto Y (2011) Habitat utilization, feeding, and growth of

wild spotted halibut Verasper variegatus in a shallow brackish

lagoon: Matsukawa-ura, northeastern Japan. Fish Sci

77:785–793

Wang JP, Lindsay BG, Leebens-Mack J, Cui L, Wall K, Miller WC,

de Pamphilis CW (2004) EST clustering error evaluation and

correction. Bioinformatics 20:2973–2984

Wei L, Zhang H, Zheng Y, Wangzhen G, Zhang T (2008) Developing

EST-derived microsatellites in sesame (Sesamum indicum L.).

Acta Agron Sin 34:2077–2084

Wu C, Zhang D, Kan M et al (2014) The draft genome of the large

yellow croaker reveals well-developed innate immunity. Nat

Commun 5:5227

Xie Z, Xiao L, Wang D, Fang C, Liu Q, Li Z, Liu X, Yong Z,

Shuisheng L, Haoran L (2014) Transcriptome analysis of the

Trachinotus ovatus: identification of reproduction, growth and

immune-related genes and microsatellite markers. PLoS ONE

9:e109419

Xu Y, Liu X, Wang Q, Zhao M, Qu J (2011) Annual gonadal

maturation cycle of captive spotted halibut, Verasper variegatus:

correlation with serum sex steroids and photothermal regulation.

J Fish Sci China 18:836–846 (in Chinese)

Ye J, Fang L, Zheng H et al (2006) WEGO: a web tool for plotting

GO annotations. Nucleic Acids Res 34:W293–W297

Yi TL, Guo WJ, Liang XF, Yang M, Lv LY, Tian CX, Song Y, Zhao

C, Sun J (2015) Microsatellite analysis of genetic diversity and

genetic structure in five consecutive breeding generations of

mandarin fish Siniperca chuatsi (Basilewsky). Genet Mol Res

14:2600–2607

Zheng X, Kuang Y, Lu W, Cao D, Sun X (2014) Transcriptome-

derived EST–SSR markers and their correlations with growth

traits in crucian carp Carassius auratus. Fish Sci 80:977–984

Zhong H, Li J, Zhou Y, Li H, Tang Y, Yu J, Yu F (2016) A

transcriptome resource for common carp after growth hormone

stimulation. Mar Genom 25:25–27

Zhu W, Wang L, Dong Z, Chen X, Song F, Liu N, Yang H, Fu J

(2016) Comparative transcriptome analysis identifies candidate

genes related to skin color differentiation in red tilapia. Sci Rep

6:31347
