de novo in the wing polymorphic salt marsh beetle pogonus ... · in the wing polymorphic salt marsh...

13
OPEN 3 ACCESS Freely available online PLos one De novo Transcriptome Assembly and SNP Discovery in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae) Steven M. Van Belleghem1'2*, Dick Roelofs3, Jeroen Van Houdt4, Frederik Hendrickx1,2 1 Terrestrial Ecology Unit, Biology Department, Ghent University, Gent, Belgium, 2 Department Entomology, Royal Belgian Institute of Natural Sciences, Brussel, Belgium, 3 Department of Ecological Science, VU University Amsterdam, Amsterdam, The Netherlands, 4 Laboratory of Cytogenetics and Genome Research, Leuven, Belgium Abstract Background: The salt marsh beetle Pogonus chalceus represents a unique opportunity to understand and study the origin and evolution of dispersal polymorphisms as remarkable Inter-population divergence In dispersal related traits (e.g. wing development, body size and metabolism) has been shown to persist In face of strong homogenizing gene flow. Sequencing and assembling the transcriptome of P. chalceus Is a first step In developing large scale genetic Information that will allow us to further study the recurrent phenotypic evolution In dispersal traits In these natural populations. Methodology/Results: We used the Illumina HISeq2000 to sequence 37 Gbases of the transcriptome and performed de novo transcriptome assembly with the Trinity short read assembler. This resulted In 65,766 contlgs, clustering Into 39,393 unique transcripts (unlgenes). A subset of 12,987 show similarity (BLAST) to known proteins In the NCBI database and 7,589 are assigned Gene Ontology (GO). Using homology searches we Identified all reported genes Involved In wing development, juvenile- and ecdysterold hormone pathways In Tribolium castaneum. About half (56.7%) of the unique assembled genes are shared among three life stages (third-instar larva, pupa, and ¡mago). We Identified 38,141 single nucleotide polymorphisms (SNPs) In these unlgenes. Of these SNPs, 26,823 (70.3%) were found In a predicted open reading frame (ORF) and 6,998 (18.3%) were nonsynonymous. Conclusions: The assembled transcriptome and SNP data are essential genomic resources for further study of the developmental pathways, genetic mechanisms and metabolic consequences of adaptive divergence In dispersal power In natural populations. Citation: Van Belleghem SM, Roelofs D, Van Houdt J, Hendrickx F (2012) De novo Transcriptome Assembly and SNP Discovery in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae). PLoS ONE 7(8): e42605. doi:10.1371/journal.pone.0042605 Editor: Zhanjiang Liu, Auburn University, United States of America Received May 17, 2012; Accepted July 9, 2012; Published August 1, 2012 Copyright: © 2012 Van Belleghem et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: Funding was received from the FWO-Flanders (PhD grant to Steven Van Belleghem) and the Belgian Science Policy (MO/36/025) and partly conducted within the framework of the Interuniversity Attraction Poles programme IAP (SPEEDY) - Belgian Science Policy. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. * E-mail: [email protected] Introduction A vast number of insect species are characterized by remarkable and often discontinuous morphological variation in traits related to dispersal capacity [1,2]. As variation in such traits determines the ability of populations and species to persist in both patchy and changing landscapes [3,4,5,6], research on the ultimate and proximate causes of dispersal is a central theme in both evolutionary ecology and conservation biology [7,8], Theoretical and empirical research on the ultimate cause of dispersal demonstrated that such dispersal polymorphisms are the result of disruptive selection in heterogeneous landscapes in response to habitat persistence [5,9,10] and fitness homogenization under spatiotemporal population fluctuations [1 l,12,13,14](Hendrickx et al. under rev.). Still, only little is known about the molecular basis of this profound phenotypic variation. For instance, it is unclear whether (i) divergence in dispersal traits is caused by a small set genes that exert large effects or by many genes with moderate to small effect, and in which order they are involved in adaptive differentiation [15,16,17], (ii) whether adaptations and the evolution of distinct dispersal phenotypes are mainly the result of mutations in coding regions of the genome or rather due to differences in gene expression (i.e. regulatory changes) [18,19,20,21], (iii) if the recurrent appearance of this trait is caused by independent mutations or rather by introgression of standing genetic variation [22,23] or the release of cryptic genetic variation by changes in epistatic interactions [24,25], and (iv) how disruptive selection in dispersal traits affects metabolic pathways resulting in genetically correlated changes in other life history traits [26], Such information is particularly crucial to link the proximate and ultimate mechanisms underlying the recurrent intra- and inter specific evolution of dispersal phenotypes. The endangered halobiontic ground beetle Pogonus chalceus (Marsham, 1802) is a most suitable system to study the molecular mechanisms behind adaptive divergence in dispersal traits. The species exhibits a clear wing polymorphism with both short-winged individuals (brachypterous), long-winged individuals (macropter- PLoS ONE I www.plosone.org 1 August 2012 | Volume 7 | Issue 8 | e42605

Upload: dinhnhi

Post on 16-Feb-2019

219 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: De novo in the Wing Polymorphic Salt Marsh Beetle Pogonus ... · in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae) Steven M. Van Belleghem1'2*,

OPEN 3 ACCESS Freely available online PLos one

De novo Transcriptome Assembly and SNP Discovery in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae)Steven M. Van Belleghem1'2*, Dick Roelofs3, Jeroen Van Houdt4, Frederik Hendrickx1,21 Terrestrial Ecology Unit, Biology Department, Ghent University, Gent, Belgium, 2 Department Entomology, Royal Belgian Institute of Natural Sciences, Brussel, Belgium, 3 Department of Ecological Science, VU University Amsterdam, Amsterdam, The Netherlands, 4 Laboratory of Cytogenetics and Genome Research, Leuven, Belgium

Abstract

Background:The salt marsh beetle Pogonus chalceus represents a unique opportun ity to understand and study the origin and evolution o f dispersal polymorphisms as remarkable Inter-population divergence In dispersal related traits (e.g. wing development, body size and metabolism) has been shown to persist In face o f strong homogenizing gene flow. Sequencing and assembling the transcriptome o f P. chalceus Is a first step In developing large scale genetic Information that w ill allow us to further study the recurrent phenotypic evolution In dispersal traits In these natural populations.

Methodology/Results: We used the Illumina HISeq2000 to sequence 37 Gbases o f the transcriptome and performed de novo transcriptome assembly w ith the Trinity short read assembler. This resulted In 65,766 contlgs, clustering Into 39,393 unique transcripts (unlgenes). A subset o f 12,987 show similarity (BLAST) to known proteins In the NCBI database and 7,589 are assigned Gene Ontology (GO). Using hom ology searches we Identified all reported genes Involved In wing development, juvenile- and ecdysterold hormone pathways In Tribolium castaneum. About half (56.7%) o f the unique assembled genes are shared among three life stages (third-instar larva, pupa, and ¡mago). We Identified 38,141 single nucleotide polymorphisms (SNPs) In these unlgenes. Of these SNPs, 26,823 (70.3%) were found In a predicted open reading frame (ORF) and 6,998 (18.3%) were nonsynonymous.

Conclusions: The assembled transcriptome and SNP data are essential genomic resources for further study o f the developmental pathways, genetic mechanisms and metabolic consequences o f adaptive divergence In dispersal power In natural populations.

Citation: Van Belleghem SM, Roelofs D, Van Houdt J, Hendrickx F (2012) De novo Transcriptome Assembly and SNP Discovery in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae). PLoS ONE 7(8): e42605. doi:10.1371/journal.pone.0042605

Editor: Zhanjiang Liu, Auburn University, United States of America

Received May 17, 2012; Accepted July 9, 2012; Published August 1, 2012

Copyright: © 2012 Van Belleghem e t al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: Funding was received from the FWO-Flanders (PhD grant to Steven Van Belleghem) and the Belgian Science Policy (MO/36/025) and partly conducted within the framework of the Interuniversity Attraction Poles programm e IAP (SPEEDY) - Belgian Science Policy. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Com peting Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

Introduction

A vast num ber o f insect species are characterized by rem arkable and often discontinuous m orphological variation in traits related to dispersal capacity [1,2]. As variation in such traits determ ines the ability o f populations and species to persist in bo th patchy and changing landscapes [3,4,5,6], research on the ultim ate and proxim ate causes o f dispersal is a central them e in bo th evolutionary ecology an d conservation biology [7,8], T heoretical and em pirical research on the ultim ate cause o f dispersal dem onstra ted that such dispersal polym orphism s are the result o f disruptive selection in heterogeneous landscapes in response to hab ita t persistence [5,9,10] an d fitness hom ogenization under spatiotem poral population fluctuations [1 l,12,13,14](H endrickx et al. under rev.).

Still, only little is know n about the m olecular basis o f this profound phenotypic variation. For instance, it is unclear w hether (i) divergence in dispersal traits is caused by a small set genes that exert large effects o r by m any genes with m oderate to small effect,

and in w hich o rder they are involved in adaptive differentiation [15,16,17], (ii) w hether adaptations and the evolution o f distinct dispersal phenotypes are m ainly the result o f m utations in coding regions o f the genom e or ra ther due to differences in gene expression (i.e. regulatory changes) [18,19,20,21], (iii) if the recurren t appearance o f this trait is caused by independent m utations o r ra th e r by introgression of standing genetic variation [22,23] or the release o f cryptic genetic variation by changes in epistatic interactions [24,25], and (iv) how disruptive selection in dispersal traits affects m etabolic pathw ays resulting in genetically correlated changes in o ther life history traits [26], Such inform ation is particularly crucial to link the proxim ate and ultim ate m echanism s underlying the recurren t intra- and in ter­specific evolution o f dispersal phenotypes.

T h e endangered halobiontic g round beetle Pogonus chalceus (M arsham , 1802) is a m ost suitable system to study the m olecular m echanism s beh ind adaptive divergence in dispersal traits. T he species exhibits a clear wing polym orphism with bo th short-winged individuals (brachypterous), long-winged individuals (m acropter-

PLoS ONE I w w w .p lo so n e .o rg 1 A u g u s t 2012 | V o lu m e 7 | Issue 8 | e42605

Page 2: De novo in the Wing Polymorphic Salt Marsh Beetle Pogonus ... · in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae) Steven M. Van Belleghem1'2*,

T ranscrip tom ics o f a W ing P o lym o rp h ic Beetle

ous), as well as interm ediate forms [27]. T hese differences in dispersal pow er have been shown to be related to differences in hab ita t stability and persistence, w ith long w inged individuals occurring prim arily in unstable and relatively recent salt m arsh areas. T h e determ ination o f wing size in this species is polygenic as crosses betw een brachy- and m acropterous populations result in the p roduction o f individuals with interm ediate wing sizes [28]. D ivergent selection on w ing size likely results in sim ultaneous selection in o ther life history traits, as suggested by a strong correlation am ong populations betw een average wing size and frequencies o f the m etabolic enzym e isoforms o f the isocitrate dehydrogenase 2 (IDH2) p rotein [6,29]. M oreover, w ithin a salt m arsh situated a t the A dantic coast in the G uérande region in France, individuals o f P. chalceus occur chiefly in two h ab ita t types interlaced a t a very small scale, i.e. ponds and canals [29]. Salt ex traction ponds are m ostly occupied by long w inged individuals with larger body size and the ID H 2-B allozyme. T h e borders o f tidal canals th a t lead sea w ater to these ponds are occupied by smaller short w inged individuals w ith the ID H 2-D allozyme. W hile signals o f strong divergent na tura l selection are observed betw een the ecotypes for the ID H 2 allozymes, dispersal pow er and body size, no differentiation could be detected for neu tra l m arkers, suggesting high levels o f gene flow am ong bo th ecotypes [6,29,30]. T hese findings and the incipient stage o f divergence m ake the salt m arsh beetle P. chalceus a ttractive for genetic studies o f selection, adaptation , and gene flow.

It has been shown th a t portions o f the wing developm ent gene netw ork are largely conserved am ong holom etabolous insect orders [31,32]. A num ber o f genes involved in the patterning, grow th and differentiation o f the w ing in Pfrosophila have been identified [33] and characterized in T. castaneum [34], F u rther­m ore, genes involved in the juvenile horm one (JH) and ecdysteroid (ECD) pathw ay have also been shown to be relevant for the study o f insect polym orphism s, including wing polym orphism s [35,36,37,38,39]. Flowever, little genom ic resources are available to study the genetic architecture o f dispersal polym orphism s in na tura l populations o f g round beetles, in w hich intraspecific dispersal polym orphism s can be found abundantly [40,41,42]. C onsidering ground beetles (Carabidae), N C B I reports 306 ESTs from a study com paring seven coleopteran species [43] an d a m itochondrial genom e of a Calosoma species [44]. O th er genomic resources com prise mostly single bar-coding gene sequences, such as cytochrom e oxidase an d ribosom al R N A , used for phylogenetic studies. T h e only coleopteran species for w hich the genom e has been sequenced is the red flour beetle Tribolium castaneum [34], belonging to the Polyphaga suborder. T h e evolutionary distance o f this suborder to the A dephaga suborder, com prising C arabidae species, is estim ated to be m ore th an 200 M a [45].

S hort read de now transcriptom e analysis has proven to be a valuable first step to study genetic characteristics and allowed researchers to obtain sequence inform ation an d expression levels o f genes involved in developm ental an d m etabolic pathways, insecticide resistance, candidate transcripts for diapauses p rep ara ­tion based on hom ology w ith related organism s and to discover single nucleotide polym orphism (SNP) in all kinds o f m odel and non-m odel organism s [46,47,48,49],

In this study, we used Illum ina short read sequencing for de novo transcriptom e assembly and analysis o f the salt m arsh beetle P. chalceus. W e constructed three libraries covering three life stages, one third-instar larva, one pu p a and one adu lt m ale beetle. W e m atched these sequences in a BLAST search to known proteins o f the N C B I database and aligned the sequences to the genom e o f T. castaneum. M atches include a num ber o f genes relevant to the study o f w ing developm ent an d dispersal polym orphism . Furtherm ore,

we screened the transcriptom e for bo th conservative SNPs and SNPs resulting in am ino acid changes, w hich will allow genom e wide screening o f variation betw een different ecotypes. T he resulting assem bled an d anno ta ted transcriptom e sequences constitute com prehensive genom ic resources, available for further studies and m ay provide a fast approach for identifying genes involved in developm ental pathw ays (i.e. w ing developm ent, JH , and ECD) relevant to adaptive divergence in this species.

Materials and Methods

Tissue material and nucleic acid IsolationT h e geographical distribution o f P. chalceus extends along the

A tlantic coasts from D enm ark up to an d including the m ajor pa rt o f the M ed iterranean coasts [50]. Beetles were cap tured in the G uérande region, France. No specific perm its were required for the described field study. Eggs w ere obtained from the canal ecotype (short-winged) and raised in a com m on environm ent. A larva (third-instar), p u p a and imago (male) resulting from the same m other were frozen in liquid nitrogen and subsequently used for sequencing. T h e sex determ ination is p robably o f the X Y typeP i ] -

T otal R N A was isolated from a com plete larva (third-instar), pu p a and newly em erged m ale imago. R N A was extracted using the SV T otal R N A isolation System (Prom ega, M adison, USA) according to m anufacturer’s instructions an d genom ic D N A was rem oved by on-colum n digest with D N ase I. R N A was quantified by m easuring the absorbance a t 260 nm using a N anoD rop spectrophotom eter (Therm o Fisher Scientific, Inc.). T h e purity o f the R N A samples was assessed a t an absorbance ratio o f O D 26o/ 280 an d O D 26o/230 an d the integrity was confirm ed on an Agilent 2100 B ioanalyzer (Agilent Technologies, Inc.).

Illumina paired-end cDNA library construction and sequencing

T h e cD N A libraries were constructed for the larva, p u p a and imago using the T ru S e q ™ R N A Sam ple P reparation K it (Illumina, Inc.) according to the m anufacturer’s instructions. Poly-A containing m R N A was purified from 2 pg o f total R N A using oligo(dT) m agnetic beads and fragm ented into 200-500 bp pieces using divalent cations a t 94°C for 5 m in. T h e cleaved R N A fragm ents were copied into first strand cD N A using S uperscrip t II reverse transcriptase (Life Technologies, Inc.) an d random prim ers. After second strand cD N A synthesis, fragm ents were end repaired , a-tailed and indexed adapters were ligated. T he products were purified and enriched w ith P C R to create the final cD N A library. T h e tagged cD N A libraries w ere pooled in equal ratios and used for 2 x 1 0 0 bp paired-end sequencing on a single lane o f the Illum ina H iSeq2000 (Genomics C ore, U Z Leuven, Belgium). After sequencing, the samples were dem ultiplexed and the indexed adap ter sequences were trim m ed using the CASAVA v l.8 .2 software (Illumina, Inc.).

De novo transcriptome assemblyT h e transcriptom e reads were de novo assem bled using T rin ity

(release 20111126) [52] on the ST E V IN Supercom puter In fra ­structure a t G hen t U niversity (48 cores, 350 G of memory). T he three samples (i.e. larva, pupa, and imago) were assem bled and analyzed as a pooled dataset. As the T rin ity assem bler discards low coverage Umers, no quality trim m ing o f the reads was perform ed prio r to the assembly. T rin ity was ru n on the pa ired-end sequences with the fixed default k-m er size o f 25, m inim um contig length of 200, pa ired fragm ent length o f 500, 12 CPU s, and a butterfly H eapSpace of 25G (i.e. allocated mem ory). P rior to submission of

PLoS ONE I w w w .p lo so n e .o rg 2 A u g u s t 2012 | V o lu m e 7 | Issue 8 | e42605

Page 3: De novo in the Wing Polymorphic Salt Marsh Beetle Pogonus ... · in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae) Steven M. Van Belleghem1'2*,

T ranscrip tom ics o f a W ing P o lym o rp h ic Beetle

the da ta to the T ranscrip tom e Shotgun Assembly Sequence D atabase (TSA), assem bled transcripts were blasted to N C B I’s U niV ec database (h ttp ://w w w .n cb i.n lm .n ih .g o v /V ecS creen / U niVec.htm l) to identify segments w ith adap ter contam ination and trim m ed w hen significant hits were found. This adap ter contam ination m ay result from sequencing into the 3 ' ligated adap ter o f small fragm ents (< 100 bp). H u m an an d bacterial sequence contam ination was investigated using the web-based version of D econSeq [53], w ith a query coverage and sequence identity threshold o f 90% .

Functional annotationT h e assem bled transcripts were subjected to similarity search

against N C B I’s non-redundan t (nr) database using the BLASTx algorithm [54], w ith a cut-off E-value o f < IO-3 and a H S P (high- scoring segm ent pairs) length cut-off o f 33. T h e publicly available platform independent jav a im plem entation o f the B last2G O software [55] was used for blasting and to retrieve associated gene ontology (GO) term s describing biological processes, m olecular functions, and cellular com ponents [56]. T o p 20 blast hits with a cut-off E-value o f < 1 0 6 an d similarity cut-off o f 55% were considered for G O annotation . Next, to get an idea o f the am oun t o f genes o f the T. castaneum transcriptom e are covered by P. chalceus transcripts, assem bled transcripts w ere aligned to the T ribolium Official G ene Set [34,57] using the P R O m er pipeline o f the M U M m er 3.0 software [58] with default param eters. T he presence o f open reading fram es (ORFs) was investigated using the O R F -pred ic tor server w ith an O R F cut-off length of 200 bp [59].

Genes o f interestT o guide our search for w ing developm ent genes, we used a

previously generated list o f Tribolium castaneum (Table S 13b R ichards et al. 2008 [34]). T o find P. chalceus w ing developm ent orthologs, we used T. castaneum p ro tein sequences in a local BLAST search (tBLASTn) querying the assem bled P. chalceus transcriptom e sequences. H its with an E-value less th an l e - 15 were exam ined. T h e m ost significant h it was considered to be the putative P. chalceus orthologue o f the wing developm ent gene in T. castaneum. SubsequenÜy, the P. chalceus transcrip t sequence was used in a reciprocal b last to the N C B I n r database. I f the BLAST and reciprocal BLAST m atched, we assigned orthology to that sequence. For the apterous gene, we extracted sequences o f D. melanogaster, T. castaneum, A. mellifera and A.pisum from G enB ank and constructed a neighbor-joining tree o f the p rotein sequences with M E G A 5.0 [60], bootstrapped 1000 times. T h e m ethodology used is similar to th a t o f Brisson et al. 2010 [61].

Next, genes involved in the juvenile horm one (JH) [62] and ecdysteroid (ECD) [63] pathw ay in T. castaneum were extracted from the K E G G pathw ay database [64] and the same procedure for orthologue discovery for w ing developm ent genes was followed. T h e assem bled transcriptom e was also investigated for the presence o f the isocitrate dehydrogenase 2 (IDH2) gene, which has been shown to be strongly correlated with dispersal pow er in P. chalceus [6,29]. For this; the T. castaneum pro tein sequence of the gene hom ologues to ID H 2 (XP_970446) was blasted to the P. chalceus transcript.

Mapping reads to reference transcriptomeT o align the reads back to the assem bled reference transcrip­

tom e the Burrows— W heeler Aligner (BWA) program [65] and the Bowtie aligner [66] were used. BW A was used for varian t analysis. R eads were m apped for each sample (i.e. larva, pupa, an d imago) separately to the assem bled transcriptom e based on the pooled read data. T h e BW A default values for m apping w ere used, except

for num ber o f threads (-t) = 8 an d m axim um num ber o f alignm ents (sampe -n) = 40. U n d er these settings, read pairs m apping to m ultiple equally best positions are p laced random ly. Properly pa ired reads with a m apping quality o f a t least 20 (-q = 20) were extracted from the resulting BAM file using SAMtools [67] for further analyses. Properly pa ired is defined as bo th left and right reads m apped in opposite directions on the same transcrip t a t a distance com patible with the expected m ean size o f the fragm ents. T h e high m apping quality ensures reliable (unique) m apping o f the reads, w hich im portan t for varian t calling.

As reads can m ap to m ultiple genes or isoforms and we have no available reference genom e, we used the R S E M software [68] to assign reads to genes and isoforms an d to count transcript abundances. R S E M requires gap-free alignm ents and therefore the Bowtie aligner (older version, no t Bowtie 2) was used and properly paired reads were extracted. R S E M and Bowtie were used as im plem ented in the T rin ity software package [52]. Bowtie m apping param eters were set as follows: m axim um num ber o f m ism atches allowed (-v) = 2, num ber o f valid alignm ents pe r read pa ir (-k) = 40. Setting the - k p a ram ete r allows reads to align against up to 40 different locations. T h e old version o f Bowtie does no t report m apping quality and, hence, does no t enable filtering on this param eter. W e com pared the three developm ental stages for transcrip t com position. U niquely expressed genes for each life stage were counted and investigated for G ene O ntology (GO) composition.

Variant analysisO nly reliable properly paired BW A m apped reads were

considered for Single Nucleotide Polym orphism (SNP) calling. Indels were no t considered because alternative splicing impedes reliable indel discovery. SNPs were called using the SAMtools software package [67]. G enotype likelihoods were com puted using the SAM tools utilities and variable positions in the aligned reads com pared to the reference were called w ith the BCFtools utilities [69]. U sing the varFilter com m and, SNPs w ere called only for positions w ith a m inim al m apping quality (-Q) and coverage (-d) o f 25. T h e m axim um read dep th (-D) was set a t 200. T h e reference is based on all three samples com bined. Therefore, to com pare the variational com position o f the samples, we extracted only heterozygous SN P positions (i.e. M ax-likelihood estim ate o f the site allele frequency~0.5) from each sample for the unigenes. U nique an d shared SNPs were extracted with the V CFtools software [70]. SNPs located in an open reading fram e (ORF) < 2 0 0 bp were extracted. A custom peri script was used to test w hether these SNPs resulted in an am ino acid change in the predicted O R F .

Results and Discussion

Sequencing, transcriptome assembly and validationT h ree developm ental stages (one third-instar larva, pu p a and

m ale adult beetíe) were barcode tagged and sequenced on one lane o f an Illum ina H iSeq2000 sequencer. Sequencing o f cD N A libraries generated a total o f 184,749,261 raw paired end reads with a length of 101 bp, resulting in a total o f 37.32 giga bases. T h e raw sequence reads were o f good quality (< 20 Phred score). A sum m ary o f sequencing, assembly and annotation results for the three samples an d the pooled reads dataset is presented in T able 1. For the pu p a sample, rem arkably less reads were sequenced. R eads were assem bled using the R N A seq de novo assem bler T rin ity[52]. T h e com plete read dataset assem bled into 65,766 contigs, clustering into 39,393 isoform clusters (i.e. unigenes). W e selected the longest transcrip t as the representative for each cluster. T he

PLoS ONE I w w w .p lo so n e .o rg 3 A u g u s t 2012 | V o lu m e 7 | Issue 8 | e42605

Page 4: De novo in the Wing Polymorphic Salt Marsh Beetle Pogonus ... · in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae) Steven M. Van Belleghem1'2*,

T ranscrip tom ics o f a W ing P o lym o rp h ic Beetle

T a b le 1. P. c h a lc e u s t r a n s c r i p t o m e s e q u e n c in g , a s s e m b ly a n d a n n o t a t i o n s u m m a r y .

Stage Larva Pupa Imago ALL

Sequencing Sequencing reads (101 bp paired end) 66,595,267 48,251,298 69,902,696 184,749,261

Bases (Gbp) 13.45 9.75 14.12 37.32

Assembly Trinity assembly (Transcripts) 65,766

Unigenes (Isoform clusters) 39,393

N50 length (bp) (Unigenes)* 1,904

Max length (bp) (Transcripts) 19,606

Max length (bp) (Unigenes) 19,606

Mean length (bp) (Transcripts) 1,044

Mean length (bp) (Unigenes) 868

Median length (bp) (Transcripts) 422

Median length (bp) (Unigenes) 365

Annotation Transcripts with BLAST results 29,358

Unigenes with BLAST results 12,987

Transcripts annotated with GO terms 17,756

Unigenes annotated with GO terms 7,589

Mapping Read mappings (properly paired) 83,539,754 53,814,547 85,597,567

(BWA)** Properly paired reads (%) 92.6 90.4 93.1

Mean coverage (properly paired) 93.7 55.2 111.6

Median coverage (properly paired) 0.93 0.91 2.27

Mapping Read mappings 143,056,584 97,896,830 156,747,118

(Bowtie)** Properly paired reads (%) 86.8 87.2 87.7

Mean coverage (properly paired) 132.98 78.54 150.71

Median coverage (properly paired) 1.95 2.21 4.67

*Contig length for which half o f all bases in the assembled sequences are in a sequence equal or longer than this contig length. **Reads of each sample were mapped to the assembled transcriptome of the pooled data (ALL). doi:10.1371 /journal.pone.0042605.t001

size o f the contigs ranged from 200 (m inim um contig length) up to 19,606 bp, w ith a m ean length o f 1,046 bp and totaling 68,799,644 bp for all contigs (Figure 1) and a m ean length of 869 bp totaling 34,249,556 bp for the unigenes. T h e top longest (> 16 ,000 bp) assem bled sequences were inspected for correctness. O verall these extrem ely long transcripts m atched long gene sequences present in N C B I’s n r database, indicating that these sequences a re not the result o f chim erical assembly errors due to repeat regions in the genes. T h e longest transcript (19,606 bp) also m atches the D. melanogaster dum py gene, a gigantic extracellular p rotein required to m aintain tension a t epiderm al cuticle a ttachm en t sites [71].

Bacterial an d h um an transcriptom e contam ination was negli­gible. Fifty and fifty-seven unigenes w ere identified by D econSeq[53] as bacterial an d h um an contam inant sequences, respectively. Flowever, these sequences were short in length (289 bp (SD = 148) and 251 bp (SD = 60) for bacterial and hu m an contam inants, respectively) an d m ost likely represent conserved protein regions.

All sequencing reads were deposited into the Short R ead Archive (SRA) o f the N ational C enter for Biotechnology Inform ation (NCBI), an d can be accessed under the accession num ber SRA 050429. T he assem bled transcriptom e was subm itted to the T ranscrip tom e Shotgun Assembly Sequence D atabase (TSA) and can be accessed th rough the G enB ank accession num bers JU 404687^JU 470452.

Functional annotationFrom the assem bled unigenes, 12,987 (33.0%) showed signifi­

cant similarity (E v a lu e < le ) to proteins in N C B I’s non- redundan t (nr) database, w ith an average best-hit am ino acid identity o f 70.5% (SD = 14.2). As expected, the m ajority o f the sequences h ad top hits to T. castaneum proteins (54.5%) (Figure 2), the only C oleoptera species for w hich a com plete genom e is available. O th er insects resem bling P. chalceus sequences are divided across different insect orders, the m ost relevant being H ym enoptera (,Nasonia vitripennis (2.85%), Camponotus floridanus (2.41%), Apis mellifera (2.15%), Harpagnathos saltator (1.86%)), Lepidoptera (Danaus plexippus (2.48%)), H em ip tera (Acyrthosiphon pisum (2.24%)), and D ip tera (Aedes aegipty (1.88%)). T h e only non- A rthropoda species with top blast hits w orth m entioning is Hydra magnipapillata (0.53%).In total 7,589 (19.3%) P. chalceus unigenes were assigned G ene O ntology (GO) term s based on BLAST m atches to sequences w ith know n function. T h e functional classification based on biological process, m olecular function and cellular com ponent is depicted in Figure 3. A m ong the biological process term s, a significant percentage of genes were assigned to cellular (22.1%) and m etabolic (18.0%) processes. M olecular functions were for a high percentage assigned to b inding (44.8%) and catalytic activity (36.4%), w hereas m any genes were assigned to cell p a rt (48.2%) and organelle (27.5%) for the functional class cellular com ponent. T hese observations are in accordance with observations o f m etabolic processes in o ther transcriptom ic studies on insects [47 ,48,72,73,74].R edundancy is expected in the

PLoS ONE I w w w .p lo so n e .o rg A u g u s t 2012 | V o lu m e 7 | Issue 8 | e42605

Page 5: De novo in the Wing Polymorphic Salt Marsh Beetle Pogonus ... · in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae) Steven M. Van Belleghem1'2*,

Tran scrip to m ics o f a W in g P o lym o rp h ic Beetle

toŒ)Coo

toJDE

6000

4000

2000

3.0 3.5C ontig length (Iog10(bp))

Figure 1. Contig length distribution of Trinity assembly for P o g o n u s cha lceus. All a sse m b le d c o n tig s w e re inc luded . do i:10 .1371/jo u rn a l.pone.0042605 .g001

assem bled transcriptom e due to the stochastic process o f sequencing and the heuristic natu re o f the assembly process, w hich can result in the fragm ented assembly o f genes. T o assess how m any actual unique genes we have found in ou r data, we aligned the obtained unigenes to the 16,645 official genes reported for T. castaneum. O f these Tribolium genes, 6,883 were covered by P. chalceus transcripts based on the P R O m er alignm ents [58], w ith a m ean percent similarity o f 76.2% (SD = 10.4). Next, m ining the alignm ents shows that 7 64 o f these Tribolium gene hits have m ore th an one hit by unique P. chalceus transcripts (comprising 1,837 unigenes). For the transcripts with a P R O m er alignm ent to a Tribolium gene this corresponds to a m axim al redundancy of 15.6% ((l,837-764)/6,883). How ever, further investigating these m ultiple hits showed that m ost com prise genes that belong to the same gene family (i.e. paralogs). O nly 272 Tribolium genes are m atched by m ultiple non-overlapping P. chalceus contigs (compris­ing 649 unigenes) and align to different portions o f the same gene. This reduces the redundancy to 5.5% ((649-272)/6,882>). H ence, the contig sets that are different portions o f the same gene do inflate the gene counts for P. chalceus to only a m inor extent.

W e calculated the “ ortholog hit ra tio” as described in O ’Neil et al. 2010 [75] by dividing the length o f the putative coding region of a unigene by the length o f the ortholog found for that unigene. For this, each unigene an d its best B LASTx hit were considered orthologs an d the hit region in the unigene is considered to be a conservative estim ator o f the “putative coding region” . In this way, the ortholog hit ratio gives an estim ate on the am ount o f a transcript that is represented by each unigene. R atios greater than 1.0 can indicate insertions in unigenes. Figure 4(A) shows that the completeness o f the assem bled transcripts decreases for very long genes. How ever, for genes with a length < 1 2 ,0 0 0 bp this relationship disappears, w hich shows that the sequencing design and T rin ity assem bler succeed well in assem bling bo th short and long transcripts. T h e distribution of ortholog hit ratios is represented in Figure 4(B). Overall, unigenes w ith B LASTx results have high ratios, indicating high completeness o f these transcripts.

O f the 12,987 transcripts w ith B LA STx results, 4,567 genes have a ratio > 0 .9 and 8,300 have a ratio > 0 .5 .

A high percentage o f unigenes (31,804; 80.7%) could no t be assigned a G O term . E xam ining the length and coverage distribution o f these anno ta ted and unanno ta ted unique tran ­scripts shows th a t m ost reads (68.8%) are, however, m apped to anno ta ted transcripts. Furtherm ore, a m ajor portion o f the u n anno ta ted transcripts consist o f assem bled transcripts with very low coverage values and short length (Figure 5). For instance, 23,497 (59.6% of all unigenes) o f these u n anno ta ted transcripts have a length shorter than 500 bp and only 3.1% o f all reads m ap to these transcripts. These short low coverage transcripts m ay represent chim eric sequences resulting from assembly errors, fragm ented transcripts corresponding to lowly expressed genes, as well as un transla ted regions. T h e rem aining 8,427 unanno ta ted sequences are m ore likely to represent true gene sequences, which m ay represent novel genes o r less conserved genes for w hich no annotation is found. 15,765 (40.0%) o f the unigenes h ad an O R F (open reading frame) > 2 0 0 bp, w ith an average length o f 1,040 bp and a m edian length of 659 bp. 7,203 (45.7%) o f these unique sequences w ith O R Fs were assigned G O annotations. T he rem aining sequences w ith an O R F > 2 0 0 bp th a t lack annotation results m ight represent true gene sequences. From the daphnia genom e sequence it was discovered that significant genom ic regions w ithout assigned open reading fram es are actively transcribed [76], T h e functional significance o f these regions rem ains to be elucidated, bu t such transcripts m ay also be present in the Pogonus transcriptom e, w hich cannot be functionally analyzed. Furtherm ore, high num bers o f u n anno ta ted contigs are frequently found in o ther transcriptom e sequencing projects [72,73,74,77] and m ay give some indication o f the lim itation of inferring the relevant functions o f transcripts assem bled from sequence da ta from species with very lim ited genom ic resources or with long evolutionary distances to m odel species. O n the o ther hand, T rin ity succeeds in assem bling a reasonable set o f anno ta ted genes despite low coverage values (Figure 5).

PLoS ONE I w w w .p lo so n e .o rg A u g u s t 2012 | V o lu m e 7 | Issue 8 | e42605

Page 6: De novo in the Wing Polymorphic Salt Marsh Beetle Pogonus ... · in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae) Steven M. Van Belleghem1'2*,

T ranscrip tom ics o f a W ing P o lym o rp h ic Beetle

15

0.36%

0 52%0 53%

0.84% ^11 33%

1 .86 %

2 15*?

2 2 4 %

2 30 %

■ Tribolium castaneum

m Nasonia vitripennis

mDanausplexippus

■ Camponotus floridanus

m Pediculus humanus

m Acyrthosiphon pisum

mApismellifera

m Aedes aegypti

m Harpegnathos saltator

m Solenopsis invicta

m Bombus impatiens

■ Bombus terrestiis

m Acromyrmex echinatior

m Anopheles gambiae

■ Culex quinquefasciatus

m Bombyx mon

■ Drosophila melanogaster

■ Hydra magnipapillata

Daphnia pulex

Anopheles dariingi

Otheis

Figure 2. Species distribution of top BLASTx results. T he p ie ch a rt sh o w s th e sp ec ie s d is tr ib u tio n o f u n ig e n e s to p BLASTx resu lts a g a in s t th e nr p ro te in d a ta b a s e w ith a c u to ff E v a lu e < 1 e ~ 3. do i:10 .1371/jo u rn a l.p o n e .0042605 .g002

60,00%

50,00%

40,00%

30,00%

20,00%

10 ,00%

0, 1 ■1 n „ . l n li . , 1 n - ■ 1 I - 1

\e-

Biological Process M olecular Function Cellular Com ponent

Figure 3. Gene Ontology (GO) categories of the unigenes. D istribution o f th e GO ca teg o r ie s a ss ig n ed to th e Pogonus chalceus tran sc rip to m e . U nique tran sc rip ts (un igenes) w e re a n n o ta te d in th re e c a teg o rie s : cellu lar c o m p o n e n ts , m o lecu lar fu n c tio n s , b io logical p rocess. do i:10 .1371/jo u rn a l.p o n e .0042605 .g003

8 . PLoS ONE I w w w .p lo so n e .o rg A u g u s t 2012 | V o lu m e 7 | Issue 8 | e42605

Page 7: De novo in the Wing Polymorphic Salt Marsh Beetle Pogonus ... · in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae) Steven M. Van Belleghem1'2*,

T ranscrip tom ics o f a W ing P o lym o rp h ic Beetle

O

!_

cno° 0 .6 -_c=

O0 .4 -

0 .2 -

0.0

2.0 2.5 3.0 3 .5 4 .0

1200

1000

80 04—'C=3O

O 60 0

4 0 0

200

0 .0 0 .2 0 .4 0 .6 0 .8 1.0 1 .2 1 .4

Ortholog length (log10(Amino acids)) Ortholog hit ratio

Figure 4. Relationship between ortholog hit ratio and ortholog length (A) and distribution of ortholog hit ratios (B). O rth o lo g hit ra tios w ere c a lcu la ted fo r co n tig s w ith BLASTx resu lts. A ra tio o f 1.0 in d ica tes th e g e n e is likely fully a ssem b led . do i:10 .1371/jo u rn a l.pone .00 4 2 6 0 5 .g 0 0 4

Genes o f interestAs we are interested in the adaptive divergence o f wing length in

populations o f P. chalceus, we began o u r investigation by searching the assem bled transcriptom e for orthologous genes know n to be involved in wing developm ent in the fruit fly Drosophila melanogaster. In particular, we used a previously generated list o f the wing developm ent genes reported in the genom e of the red flour beetle Tribolium castaneum (Table S13b o f R ichards et al. 2008 [34]), which was based on Drosophila w ing developm ent studies. W e found

orthologous genes for every w ing developm ent gene that we looked for in the assem bled P. chalceus transcriptom e w ith high confidence (Table 2). Engrailed (en) an d inverted (inv) blasted to the same P. chalceus transcrip t and reciprocal blast o f this com ponent re tu rned engrailed. This is not surprising considering their similarity in sequences and function [78], R etrieving orthologous genes for the apterous (ap) gene was problem atic as this gene exhibits a duplication in T. castaneum and Acyrthosiphon pisum [61,79]. Therefore, we aligned the am ino acid sequences o f

Annotated Unannotated

2.5 3.0 3.5 4.0 2.5 3.0 3.5 4.0

Contig length (Iog10(bp))

Figure 5. Contour plot of length and coverage distribution of annotated (left) and unannotated (right) unigenes. T ranscrip ts w ere a n n o ta te d using Blast2GO. R eads w e re m a p p e d using BWA. For th e a n n o ta te d tran scrip ts , m ean le n g th a n d co v e rag e w as 2,139 an d 932, respec tive ly . For th e u n a n n o ta te d tran scrip ts , m e a n le n g th an d c o v e rag e w as 567 a n d 224, respec tive ly . T he co lo r b a r sh o w s th e Iog10 tran sfo rm ed c o u n t values.do i:10 .1371/jo u rn a l.p o n e .0042605 .g005

PLoS ONE I w w w .p lo so n e .o rg A u g u s t 2012 | V o lu m e 7 | Issue 8 | e42605

Page 8: De novo in the Wing Polymorphic Salt Marsh Beetle Pogonus ... · in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae) Steven M. Van Belleghem1'2*,

T ranscrip tom ics o f a W ing P o lym o rp h ic Beetle

apterous genes from D. melanogaster (NP_724428), T. castaneum (apA: N P _ 0 0 1 139341, apB: ACN 43342), Apis mellifera (XP_392622) and A. pisum (apA: X P_001946004, apB: XP_001949543) with those retrieved from BLAST hits to the P. chalceus transcriptom e (Figure 6). T h e apterous gene is a hox transcription factor and contains two conserved dom ains; the hom eo dom ain and the LIM - contain ing region [80]. As we did no t retrieve the hom eo dom ain for apB o f P. chalceus, we only com pared the conserved L IM dom ain region of the apterous genes as reported in [61]. T o root the tree, we added the closely related L IM -contain ing gene tailup (tup) o f A. pisum (XP_001944557) and T. castaneum (XP_001815525). T h e phylogenetic inference indicates that P. chalceus exhibits bo th apterous paralogs that are present in T. castaneum an d A. pisum genom e, w hich were lost in the holom etabolous insects Drosophila

and Apis. T h e relationships are similar as the ones reported by

[61]-Subsequently, we perform ed sim ilar sim ilarity analyses for genes

involved in the Juvenile horm one an d ecdysteroid pathw ay. W e found orthologous candidates w ith h igh certain ty for each gene reported in the K E G G insect horm one biosynthesis pathw ay (Table 3). T h e length o f the O R F o f the P. chalceus m atch, com pared to the O R F length in T. castaneum is also reported.

Finally we identified the full coding sequence o f the isocitrate dehydrogenase 2 (IDH2) gene (Pc_com pl560_c0_seql) based on hom ology to the T. castaneum p ro te in sequence (EFA04299; E- value = 0, b it score = 760). T h e b last result also identified the isocitrate dehydrogenase 1 (IDH1) gene (Pc_com p296_c0_seql), b u t w ith less support (E-value = e-172, b it score = 602).

T a b le 2. L ist o f w in g d e v e l o p m e n t g e n e s f o u n d in P. c h a lc e u s o r t h o l o g o u s to T. c a s ta n e u m .

Am ino acid O rtho log hitFunction Gene Accession P. chalceus iden tity (% ) ratio

Invected (inv) Pc_comp5821 _cO_seq 1 56 1.31

Cubitus interruptus Pc_co m p4 719_c0_seq 1 60

Decapentaplegic (dpp) Pc_comp8429_c0_seq2 64 0.85

Brinker (brk) Pc_co m p8966_c0_seq 1 0.29

Spalt-like protein (sal) Pc_co m p7794_c0_seq 1 0.87

Apterous b (ap B) Pc_co m p 10531 _c0_seq 1 0.69

(Ser)Serrate Pc_co m p6451 _c0_seq 1 80 1.00

Distal-less (Dll) Pc_co m p7089_c0_seq 1 1.08

Rhomboid (rho) Pc_comp9713_c0_seq1 96 0.72

Knot transcription factor (knot) Pc_comp 14479_c0_seq 1 84 0.61

Abrupt (ab) Pc_comp3738_c0_seq3 1.00

Delta (Dl) Pc_co m p8811 _c0_seq 1 0.95

Achaete-scute (ASH) Pc_co m p5966_c0_seq 1 67 1.09

Bodywall/wing Teashirt (tsh) Pc_co m p7294_c0_seq 1 69

Nubbin (nub) Pc_co m p7766_c0_seq 1 0.36

Vestigial (vg) Pc_co m p7899_c0_seq 1 69 0.74

Prothoraxless (ptl) Pc_comp8727_c0_seq1 100 0.31

Daughters against (dad) Pc_comp5722_c0_seq1 1.08

Extra macrochaetae (emc) Pc_co m p778_c0_seq 1 86 1.04

Hox Sex combs reduced Ser (Cx) Pc_co m p5657_c0_seq 1 1.07

Optomotor-blind-like (omb) Pc_comp6103_c0_seq1 0.68

Dorsal/Ventral Apterous a (ap A) Pc_comp9155_c1_seq1 0.76

Notch Pc_co m p3149_c0_seq 1 1.02

Wingless (wg) Pc_co m p9580_c0_seq 1 96 0.74

Hedgehog (hh) Pc_co m p8905_c0_seq 1 0.96

Patched (ptc) Pc_comp7372_c1_seq1 0.62

Vein and sensory Serum response factor (srf) Pc_co m p3744_c0_seq2 96 0.36

Knirps (kni) Pc_comp8029_c0_seq2 0.83

Ventral vein lacking (w l) Pc_co m p4049_c0_seq 1 1.05

Homothorax (hth) Pc_comp2739_c0_seq1 87 1.04

Asense (ase) Pc_co m p 12489_c0_seq 1 54 1.07

Noradrenaline transporter (net) Pc_comp9252_c0_seq1 0.94

liroquois Oro) Pc_comp4855_c0_seq2 1.04

Anterior/Posterior Engrailed (en) Pc_comp5821 _c0_seq 1 1.27

Ultrabithorax (Ubx) Pc_co m p6090_c0_seq 1 84 0.97

doi:10.1371 /journal.pone.0042605.t002

PLoS ONE I w w w .p lo so n e .o rg August 2012 | Volume 7 | Issue 8 | e42605

Page 9: De novo in the Wing Polymorphic Salt Marsh Beetle Pogonus ... · in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae) Steven M. Van Belleghem1'2*,

Tran scrip to m ics o f a W in g P o lym o rp h ic Beetle

1 10 20 30 40

I

Tc_apB CAG CGG RIQ DRY YL LAVDRQWHASCLKC CECKL PLDTELT Pc_a pB CAG CGG RIH DRYYL LAVDRQWHAPCLKC CECKVPLDTELT A p a p B CAGCGRSIDDEFYLSAVDMCWHTGCLQCAECKLPLDTET.V D m a p C SG CG RQ IO DRF YL s AVEKRWH ASC LQ C Y AC RQ RLERES S Ap_apA CCGCGLKILDRYYLFAVDKRWHASCLQCSQCTRTLASEIK Am_ap CAGCG T.RI SDRF YLQ AVDRRWHAACLQ CS HCRQG LDGEVT T c a p A CAGCGMRITDRFYLQAVDRRWHASCLQCCQCRNTLDGEIT Pc_apA CAGCGMRIADRFYLQAVDRRWHAACLQCCQCRNTLDGEIT

50 60 70 80 i ___ ___ ___Tc_apB CF ARDGNIYCKEDY YRMFA-VTRCGROQ AG I SANELVMRA Pc_apB CFARDGNIYCKSDYYRMFA-VTROGROQAG ISANELVMRA Ap_apB CYSRHGNIYCKQDYYRLFS-IKRCAROQGGIGACDLVMRA Dm_ap CYSRDGNIYCKNDYYSFFG-TRRCSRCLASISSNELVM RA Ap_apA CF YRDGNIYCKADYQRLYG-IRRCGRCIIAGISPSELVM RA Am_ap CF SRCG N I YCKKDY'YRMFG SMKRCARCH A A IL AS ELVMRA T c a p A c f s r c g n i y c k k d y y r l f g - m k r c a r o q a t i i s s e l v m r aPc_apA CFSRDGNIYCKKDYYRLFG-MKRCGROQAAILSSELVMRA

90 100 110 120Tc_apB RDSVYHLHCFSCTSCGMPLSKGDHFGMRDGLIYCRPHYEL Pc_apB RDLVYHLHCF TCASCNVPLSKG DHFGMREG LVYCRPHYEI Ap_apB KDLVYHVECFACYAOGAVLCKGDYYGVRDGAVFCRPDYER Dm_ap RNLVFHVNCFCCTVCHTPLTKGD2YGIIDALIYCRTHYSI Ap_apA r d t v f h v p c f s c t v c l a v l t k g d q f g m r d g a v f c q h h y q q Am ap r d l v f h v r c f s c a a c a v p l t k g d h f g m r d g a v l c r l h y e mT c a p A RDLVFHVllCFSCAVCNSPLTKGDUFGMRDGAVLCRLILFEMPc^apA r e l v f h v h c f s c a v c n c p l t k g d h f g m r d g a v l c r l h f e m

9 9

9 9

7 9

42

— Tc_apA

■Pc_apA

Am_ap

58

Dm _ap

- Ap_apA

100

-T c _ a p B

Pc_apB

Ap_apB

100 - Ap_tup

-Tc_tup

0.1

Figure 6. Phylogenetic analysis of the U M dom ain of the a p te ro u s gene. (A) A lignm en t o f p ro te in s e q u e n c e s o f th e UM d o m a in reg ion o f th e apterous (ap) o r th o lo g s a n d p a ra lo g s o f Tribo lium castaneum (Tc), A cirthosyphon p isum (Ap), D rosoph ila m e lanogaste r (Dm), Apis m e llife ra (Am) w ith th e p re su m ed p a ra lo g s fo u n d in th e Pogonus chalceus (Pc_apA an d Pc_apB) tra n sc r ip to m e . (B) N e igbour-jo in ing t re e o f ap p ro te in se q u e n c e s , ro o te d w ith ta ilu p (tup). B oo tstrap s u p p o r t va lu es a re g iven a t e ac h n o d e . do i:10 .1371/jo u rn a l.pone .00 4 2 6 0 5 .g 0 0 6

MappingR eads for each sample (i.e. larva, pupa, adult) were m apped

back to the assem bled reference transcriptom e based on the pooled da ta and properly pa ired reads were extracted (Table 1; Figure 7). Based on the BW A m appings [65], 92.6% , 90.4% and 93.1% o f the m apped reads were aligned properly pa ired when aligning the reads o f the larva, p u p a and adult sample, respectively, to the assem bled reference transcriptom e. T h e m ean coverage dep th (reads covering each base pair) for the larva, p upa and adult sam ple is respectively 93.7, 55.2 an d 111.6. T h e Bowtie aligner resulted in a h igher m ean coverage, owing to reads being

m apped to m ultiple positions. T h e p upa sam ple has less m ean coverage dep th resulting from less sequenced reads.

Som e transcripts were represented by m any reads. M oreover, 50% o f the reads m apped to only 146 transcrip t sequences and 90% m apped to 2,971 transcripts. M apping of the reads shows that read coverage is very high. How ever, the fact that only 149 transcripts consum e 50% o f all reads m ay indicate th a t norm al­ization can be useful for transcriptom e assembling. T h e top twenty o f these were investigated an d are shown in T able 4. Am ongst these transcripts, several are associated with energy m etabolism (cytochrome c oxidase subunit II and III, succinate and N A D H dehydrogenase and A D P /A T P translocase), locom otion (actin and

T a b le 3 . List o f Insect hormone biosynthesis genes.

Function GeneNCBI genelD T.castaneum Accession P. chalceus

Amino acid iden tity (% )

Ortholog hit ratio

Juvenile hormone juvenile-hormone esterase (JHE) 658208 Pc_comp7235_c0_seq 1 62 0.97

juvenile hormone acid methyltransferase

(JHAMT) 662961 Pc_comp8820_c0_seq 1 65 1.01

juvenile hormone epoxide hydrolase

(JHEH) 659305 Pc_comp841 _c0_seq 1 74 0.98

cytochrome P450, family 15 (CYP15A1) 658858 Pc_comp2578_c2_seq2 77 0.95

Molting hormone ecdysteroid 25-hydroxylase (PHM) 656884 Pc_comp6141_c0_seq 1 72 0.98

(ecdysone) ecdysteroid 22-hydroxylase (DIB) 663098 Pc_comp7215_c0_seq2 73 0.70

ecdysteroid 2-hydroxylase (SAD) 658665 Pc_comp5946_c0_seq 1 64 0.75

ecdysone 20-monooxygenase (SHD) 661451 Pc_comp8625_c0_seq2 73 0.69

cytochrome P450, family 307 (Spo/spok) 658081 Pc_comp9046_c0_seq 1 79 0.93

cytochrome P450, family 18 (CYP18A1) 656794 Pc_comp3811_c0_seq1 86 0.52

Note: Genes were extracted from T. castaneum through the KEGG pathway database. doi:10.1371 /journal.pone.0042605.t003

PLoS ONE I w w w .p lo so n e .o rg A u g u s t 2012 | V o lu m e 7 | Issue 8 | e42605

Page 10: De novo in the Wing Polymorphic Salt Marsh Beetle Pogonus ... · in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae) Steven M. Van Belleghem1'2*,

T ranscrip tom ics o f a W ing P o lym o rp h ic Beetle

Larva

Is: 42,711 (64,8%) U: 25,712 (65,0%)

«\=\

&/b ' <&■

'A'

Is: 2,894 (4,4 %} U: 1,504(3,8% )

Is: 30,200 (45.8 %) U: 18,462(56,7% )

Pupa

Is: 41,292 (62 ,6 ' U: 24,653 (62,3 “

O '* .

Is; 6,171 (9,4% ) U: 3,867(9,8% )

Is: 5,795(8,8% ) U: 3.284 (8,3 %)

Is: 10,384(15,7% ) U: 7 ,086(17.9% )

Is: 52,550(79,7% ) U: 32,699(82,7% )

C om parison o f the sam plesR eads w ere m apped w ith Bowtie [66] an d assigned to genes an d

isoforms w ith the R S E M software [68]. Shared an d unique presence o f genes an d isoforms is shown in Figure 6. 30,200 (45.8%) an d 18,462 (56.7%) o f the isoforms a n d unigenes respectively w ere shared am ong life stages. 1,879 (4.8%), 1,403 (3.5%) an d 7,086 (17.9%) o f the unigenes are uniquely expressed in the larva, p u p a a n d adu lt stage, respectively. O f these uniquely expressed unigenes, only 170, 106, an d 243 respectively were assigned G O term s (Figure 8). Overall, the G O term com position o f these uniquely expressed transcripts in each life stage corresponds well to the G O term com position o f the com plete transcriptom e. N o statistical differences in G O term com position w ere found betw een these sets o f uniquely expressed genes (FD R <0.1). T h e h igher am o u n t o f uniquely expressed genes in the ad u lt stage m ost likely resulted from m ore short transcripts being assem bled.

Figure 7. Unique and shared transcript presence of the three developm ental stages. The v en n d ia g ram sh o w s th e u n iq u e a n d sh a red tra n sc r ip t p re s e n c e o f th e th re e d e v e lo p m e n ta l s ta g e s (larva, p u p a a n d ad u lt), b a se d o n RSEM co u n ts . R eads w e re a ss ig n e d to iso form s (Is) o r u n ig e n e s (U). W hen RSEM re p o r te d a c o u n t o f a t least o n e , th e tra n sc r ip t w as re p o r te d a s p re s en t. d o i:1 0 .1 3 7 1 /jo u rna l.pone.0042605 .g007

m yosin light chain), transcrip tion (DNA topoisom erase 1) an d translation (elongation factor 1 an d 2). Ferritin is a p ro tein th a t stores a n d buffers iron [81] an d its h igh abundance m ay resem ble an accom m odation to h igh reduced iron concentrations an d high oxidative stress in salt m arshes [82,83] o r a stress response.

V arian t ca llingF or SN P calling, BW A was used to m ap the reads o f each

sam ple to the reference transcriptom e. In total, SAM tools [67] detected 38,141 different heterozygous SN P position in unique transcrip t sequences using the stringent param eters (i.e. coverage an d m app ing quality o f 25) (Figure 9). T his is ab o u t one SN P per nine hun d red b p o f unique transcrip t sequence (1/898). O f these SNPs, 26,823 (70.3%) w ere found in a pred ic ted open reading fram e (ORF) > 2 0 0 bp an d 6,998 (18.3%) resulted in a am ino acid change (nonsynonym ous SN P (nsSNP)) a n d are found in 2,907 different unigenes. T his results in a percentage o f nonsynonym ous changes in the coding region o f 26.1% , w hich is lower com pared

T a b le 4 . T o p t w e n t y t r a n s c r ip t s w ith m o s t r e a d s a s s ig n e d .

Accession P. chalceus Nr. reads Length (bp) Annotation

Succinate dehydrogenase3Pc_comp5_c0_seq 1 4116337

UnknownPc_comp23_c1 _seq1 2836940 3,453

NADH dehydrogenase subunit 43Pc_comp32_c0_seq 1 1912972 3,409

Alpha-tubulinPc_comp30_c0_seq 1 1823110 1,598

Pc_comp 1 _c0_seq3 Actin31511917 1,714

Pc_com p 14_c0_seq 1 DNA topoisomerase 11505364 6,711

A DP/ATP trans locase3Pc_comp58_c0_seq 1 1419825 1,732

Cytochrome c Oxidase subunit III (coxlll)-Pc_com p 10_c4_seq 1 1217481 1,679

UnknownPc_comp2_c0_seq 1 1128159 634

Pc_comp39_c0_seq 1 1501260 2,011 Unknown

Pc_com p 19_c 1 _seq 1 1124751 821 Cytochrome c Oxidase subunit II (coxii)3

Pc_comp26_c0_seq 1 1178489 3,236 Elongation factor 23

Pc_com p 16_c0_seq 1 1501260 2,186 Muscular protein 20

Pc_com p41 _c0_seq 1 1788846 1,961 Elongation factor 1-alpha3

Pc_com p 13_c0_seq 1 1346169 759 Unknown

Pc_com p 7_c0_seq 1 2585095 1,672 Myosin light chain 23

Pc_com p 18_c0_seq 1 3016196 3,942 Melanization -related protein

Pc_com p4_c3_seq 1 1842608 651 Unknown

Pc_comp0_c1 _seq1 21905861 1,272 Unknown

* Associated with mitochondria, energy metabolism and electron transport chain. **Associated with muscles and movement.***Associated with translation or transcription. doi:10.13717journal.pone.0042605.t004

p . PLoS ONE I w w w .p lo so n e .o rg 10 A u g u s t 2012 | V o lu m e 7 | Issue 8 | e42605

Page 11: De novo in the Wing Polymorphic Salt Marsh Beetle Pogonus ... · in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae) Steven M. Van Belleghem1'2*,

T ranscrip tom ics o f a W ing P o lym o rp h ic Beetle

120■ Larva

n P u p a

■ A dult

I I ,

E d Ey . ^ u l L i u J l J lu. , . .d I 1 . üU ul

Biological Process Molecular Function Cellular Component

Figure 8. Gene O ntology (GO) distribution assigned to unigenes that are found uniquely in each life stage. R eads w ere m a p p e d w ith Bow tie an d ass ig n ed to g e n e s an d iso form s w ith th e RSEM softw are . do i:10 .1371/jo u rn a l.pone .00 4 2 6 0 5 .g 0 0 8

to studies reporting up to 57.3% nsSNPs in coding regions in a single individual o f Japanese native cattle [84] an d 41 to 47% in hu m an individual resequencing studies [85,86], bu t com parable to ratios found in o ther studies [87,88],

Conclusion

In the present study, we sequenced an d characterized the transcriptom e in the w ing polym orphic beetle P. chalceus. T he assem bled sequence data com prising 39,393 unique transcripts provides valuable resources to study wing polym orphism an d the

Larva Pupa

SN P: 16 ,027 (4 2 .0 % ) SN P: 16,578 (43.5% )ORF: 11.511 (30.2% ) ORF: 1 2 .0 4 2 (3 1 .6 % )

nsS N P : 2 .810 (7.4% ) \ n sS N P : 2 .842 (7.5%)SN P: 2,985 (7.8% ) O R F: 2 .199 (5.6% )

nsS N P : 4 6 5 (1 .2 % )

SN P: 1,937 (5.1% ) O RF: 1 ,4 1 3 (3 .7 % )

nsS N P : 307 (0.8% )

SN P: 2 ,0 1 4 (5 .3 % ) ORF: 1,459 (3.8% )

nsS N P : 356 (0.9% )

SN P: 4,279 (11.2% ) ORF: 3,171 (8.3% )

n sS N P : 719 (1.9% )

SN P: 10 ,458 (27.4% ) O R F: 6 ,8 8 2 (1 8 .0 % )

n sS N P : 2,118 (5.6% )

16,027 (42.0 11.511 (30.2% ) 2 ,810

SN PORF:

7.4% ) nsS N P

16,578 12.042

2 .842

AdultSN P: 18 ,688 (49 .0 %) ORF: 12 .925 (33.9% )

nsS N P : 3 .500 (9.2% )

Figure 9. Shared and unique SNPs. Only H e tero zy g o u s SNPs a re c o n sid e red from u n ig e n e s . T he to ta l a m o u n t o f h e te ro z y g o u s SNPs called in th e th re e sam p les is 38,141. 70.3% (26,823) o f th e se SNPs w e re fo u n d in an o p e n read in g fram e (ORF) an d 18.3% (6,998) re su lte d in an a m in o acid c h an g e (nsSNP).do i:10 .1371/jo u rn a l.pone .00 4 2 6 0 5 .g 0 0 9

PLoS ONE I w w w .p lo so n e .o rg 11 A u g u s t 2012 | V o lu m e 7 | Issue 8 | e42605

Page 12: De novo in the Wing Polymorphic Salt Marsh Beetle Pogonus ... · in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae) Steven M. Van Belleghem1'2*,

T ranscrip tom ics o f a W ing P o lym o rp h ic Beetle

adaptive divergence in the face o f strong gene flow found in P. chalceus. W e characterized a large set o f genes relevant to wing developm ent an d dispersal polym orphism w ith high significance, including paralogs, giving an indication o f the integrity and completeness o f the assem bled P. chalceus transcriptom e resulting from short read Illum ina sequencing.

W e found a high num ber o f putative SNPs (37,492). T he com bination of SN P calling with O R F prediction allowed us to infer th a t a large p a r t o f the SNPs located in a coding fragm ent (26,757) result in nonsynonym ous nucleotide substitutions (23.2%).

T h e results show th a t it is possible to com bine transcriptom e assembly and characterization with the discovery o f bo th synonym ous an d nonsynonym ous SNPs, providing a fram ework for further population genom ic studies to indentify the m olecular basis underlying phenotypic variation o f ecologically relevant traits in a non-m odel species.

References1. R o ff D A (1986) T h e E vo lu tio n o f W in g D im o rp h ism in Insects. E v o lu tio n 40:

1 0 0 9 -1 0 2 0 .2. R o ff D A , F a irb a irn D J (2007) T h e evo lu tion a n d genetics o f m ig ra tio n in insects.

B ioscience 57: 15 5 -1 6 4 .3. H e n d ric k x F , M aelfa it J -P , D e se n d e r K , A v iro n S, Bailey D , e t al. (2009)

Pervasive effects o f d ispersal l im ita tio n o n w ith in - a n d a m o n g -co m m u n ity species richness in a g ric u ltu ra l landscapes. G lo b a l Ecology a n d B io g eo g rap h y 18: 607— 616.

4. K okko H , L o p ez-S ep u lcre A (2006) F ro m in d iv id u a l d ispersal to species ranges: P erspectives for a c h an g in g w orld . Science 313: 789—791.

5. D e n n o R E , R o d erick G K , P e terso n M A , H u b e r ty A F , D o b e l H G , e t al. (1996) H a b ita t pers is ten ce un d erlies in traspec ific v a ria tio n in th e d ispersal stra teg ies o f p lan th o p p e rs . E co log ical M o n o g ra p h s 66: 389—f0 8 .

6. D h u y v e tte r H , G a u b lo m m e E , D esen d er K (2004) G en e tic d iffe ren tia tio n a n d lo ca l a d a p ta t io n in th e sa lt-m arsh b e e d e P o g o n u s chalceus: a c o m p ariso n b e tw e en a llozym e a n d m icrosa te llite loci. M o lecu lar Ecology 13: 1065—1074.

7. V a n D yck H , M atth y sen E (1999) H a b ita t f ra g m en ta tio n a n d insect flight: a c h an g in g ‘d esig n ’ in a c h an g in g landscape? T re n d s in Ecology & E vo lu tio n 14: 1 7 2 -1 7 4 .

8. R o n c e O (2007) H o w does it feei to b e like a ro llin g stone? T e n questions a b o u t d ispersal evo lu tion . A n n u a l R ev iew o f Ecology E v o lu tio n a n d System atics 38: 2 3 1 -2 5 3 .

9. d e n B oer PJ (1968) S p read in g o f risk a n d th e s tab iliza tio n o f a n im a l nu m b ers . A c ta B io th eo re tica 18: 1 6 5 -1 9 4 .

10. R o ff D A (1994) H a b ita t P ersistence a n d th e E vo lu tio n o f W in g D im o rp h ism in Insects. A m erican N a tu ra lis t 144: 7 7 2 -7 9 8 .

11. M cP eek M A , H o lt R D (1992) T h e E v o lu d o n o f D isp ersa l in Spatia lly a n d T e m p o ra lly V a ry in g E n v iro n m en ts . A m erican N a tu ra lis t 140: 1010—1027.

12. H o lt R D , M cP eek M A (1996) C h a o tic p o p u la tio n d ynam ics favors th e evo lu tion o f d ispersal. A m e ric a n N a tu ra lis t 148: 709—718.

13. M a th ia s A, K isd i E , O liv ie ri I (2001) D iv e rg en t evo lu tio n o f d ispersal in a h e te ro g en eo u s lan d scap e . E vo lu tio n 55: 246—259.

14. D o eb eli M , R u x to n G D (1997) E v o lu tio n o f d ispersal ra tes in m e ta p o p u la tio n m odels: B ra n c h in g a n d cyclic d y n am ics in p h e n o ty p e space. E vo lu tio n 5 1 :1 7 3 0 — 1741.

15. O r r H A (2005) T h e g en e tic th eo ry o f ad a p ta tio n : A b r ie f h istory. N a tu re R eview s G enetics 6: 11 9 -1 2 7 .

16. M ich e l A P , S im S, P ow ell T H Q , T a y lo r M S , N osil P , e t al. (2010) W id e sp rea d g en o m ic d ivergence d u r in g sy m p a tric spécia tion . P ro ceed in g s o f the N a tio n a l A cad em y o f Sciences o f th e U n ited S tates o f A m erica 107: 9 724—9729.

17. H o e k s tra H E , H irsc h m a n n R J, B u n d ey R A , Insel PA , G ro sslan d J P (2006) A single a m in o ac id m u ta tio n c o n trib u te s to ad ap tiv e b e a ch m ouse co lo r p a tte rn . Science 313: 10 1 -1 0 4 .

18. S te in er C C , W e b er J N , H o ek s tra H E (2007) A d ap tiv e v a ria tio n in b e a c h m ice p ro d u c e d b y tw o in te rac tin g p ig m e n ta tio n genes. P los Biology 5: 1880—1889.

19. W e st-E b e rh a rd M J (2005) D e v e lo p m en ta l p lastic ity a n d th e o rig in o f species d ifferences. P ro ceed in g s o f th e N a tio n a l A cad em y o f S ciences o f th e U n ited States o f A m erica 102: 6 5 4 3 -6 5 4 9 .

20. H o e k s tra H E , C o y n e J A (2007) T h e locus o f evolu tion: E vo devo a n d the genetics o f a d a p ta tio n . E vo lu tio n 61: 99 5 —1016.

21. V a n S tra a len N M , R oelofs D (2006) A n in tro d u c tio n to ecological genom ics. N e w Y ork: O x fo rd U n iversity Press. 307 p.

22. B a rre tt R D H , S ch lu te r D (2008) A d a p ta tio n fro m s tan d in g gene tic varia tio n . T re n d s in Ecology & E v o lu tio n 23: 38—44.

23. A re n d t J , R ezn ick D (2008) C o n v erg en ce a n d pa ra lle lism reco n sid ered : w h a t h ave w e le a rn e d a b o u t th e genetics o f ad a p ta tio n ? T re n d s in Ecology & E v o lu tio n 23: 2 6 -3 2 .

24. L e R o u z ic A, C a r lb o rg O (2008) E v o lu tio n a ry p o ten tia l o f h id d en genetic v a ria tio n . T re n d s in E co logy & E vo lu tio n 23: 33—37.

AcknowledgmentsWe thank Janine Marien, affiliated to the Vrije Universiteit Amsterdam, for her help in preparation of the samples. This work was carried out using the Stevin Supercomputer Infrastructure a t G hent University, funded by G hent University, the Hercules Foundation and the Flemish Government — departm ent EW I and we are grateful to the IC T D epartm ent of G hent University for assistance with our com putations. Sequencing was perform ed by the Genomics Core o f the University Hospital o f Leuven, Belgium.

Author ContributionsConceived and designed the experiments: SVB FH D R JV H . Performed the experiments: SVB FH JV H . Analyzed the data: SVB. Contributed reagents/m aterials/analysis tools: SVB FH D R JV H . W rote the paper: SVB FH.

25. G ib so n G , D w o rk in I (2004) U n c o v e rin g c ry p tic gene tic v a ria tio n . N a tu re R eview s G en etics 5: 6 8 1 -U 611 .

26. S tevens V M , T ro c h e t A, V a n D yck H , C lo b e r t J , B ag u ette M (2011) H o w is d ispersal in te g ra te d in life histories: a q u a n tita tiv e analysis using butterflies. Ecology L ette rs 15: 7 4 -8 6 .

27. D esen d er K (1985) W in g p o ly m o rp h ism a n d rep ro d u c tiv e b io logy in the h a lo b io n t c a ra b id b e e d e Pogonus chalceus (M arsham ) (C o leo p te ra , C arab id ae ). B io l jb D o d o n a e a 53: 89—100.

28. D esen d er K (1987) H eritab ility E stim ates fo r D iffe ren t M o rp h o lo g ica l T ra its R e la te d to W in g D e v e lo p m en t a n d B ody Size in th e H a lo b io n t a n d W in g P o ly m o rp h ic C a ra b id B eede P og o n u s-C h a lceu s M a rsh a m (C o leo p tera , C a ra b i­dae). A cta P h y to p a th o lo g ica E t E n to m o ló g ica H u n g a r ic a 22: 85—101.

29. D h u y v e tte r H , H e n d ric k x F , G a u b lo m m e E , D e se n d e r K (2007) D iffe ren d a tio n b e tw een tw o salt m arsh b e e d e ecotypes: E v id en ce fo r o n g o in g spéciation . E v o lu tio n 61: 184—193.

30. D esen d er K , B ackeljau T , D e lah ay e K , D e M eeste r L (1998) A ge a n d size o f E u ro p e a n sa ltm arshes a n d th e p o p u la tio n gene tic co n seq u en ces for g ro u n d beedes. O eco lo g ia 114: 5 0 3 -5 1 3 .

31. A b o u h e if E , W ra y G A (2002) E v o lu tio n o f th e gene n e tw o rk u n d e rly in g w ing p o ly p h en ism in an ts. Science 297: 249—252.

32. W e a th e rb e e S D , N ijh o u t H F , G ru n e r i L W , H a id e r G , G a la n t R , e t al. (1999) U ltra b ith o ra x fu n c d o n in b u tte rfly w ings a n d th e ev o lu d o n o f in sec t w ing p a tte rn s . C u r re n t B iology 9: 109—115.

33. W eih e U , M ilá n M , C o h e n S M (2005) Drosophila L im b D ev elo p m en t. In : G ilb ert L I, ed ito r. In se c t D ev elo p m en t: M o rp h o g en esis , M o ltin g a n d M etam o rp h o sis, first ed. L o n d o n , U K : Elsevier, p p . 730.

34. R ic h a rd s S, G ibbs R A , W einstock G M , B row n SJ, D en e ll R , e t al. (2008) T h e g en o m e o f th e m o d el b e e d e a n d p e s t T r ib o liu m castan eu m . N a tu re 452: 949— 955.

35. E m le n D J, N ijh o u t H F (1999) H o rm o n a l c o n tro l o f m ale h o rn len g th d im o rp h ism in th e d u n g b e e d e O n th o p h a g u s tau ru s (C o leo p te ra : S ca ra b a e i­dae). J o u rn a l o f In sec t Physio logy 45: 45—53.

36. Ish ikaw a A , O g a w a K , G o to h H , W alsh T K , T a g u D , e t al. (2012) Ju v e n ile h o rm o n e titre a n d re la te d g ene express ion d u r in g th e c h an g e o f rep ro d u c tiv e m o d es in th e p e a ap h id . In sec t M o lecu lar Biology 21: 49—60.

37. Z e ra A J (2003) T h e en d o c rin e reg u la tio n o f w in g p o ly m o rp h ism in insects: S tate o f th e a rt, rec e n t surprises, a n d fu tu re d irecdons. In teg rativ e a n d C o m p a ra tiv e B iology 43: 6 0 7 -6 1 6 .

38. Z e ra A J (2007) E n d o c r in e analysis in e v o lu tio n ary -d ev e lo p m en ta l studies o f in sec t po ly m o rp h ism : h o rm o n e m an ip u la tio n versus d ire c t m ea su rem e n t o f h o rm o n a l regu la to rs. E v o lu tio n & D e v e lo p m e n t 9: 49 9 —513.

39. Z e ra A f D e n n o R F (1997) Physio logy a n d ecology o f d ispersal p o ly m o rp h ism in insects. A n n u a l R ev iew o f E n to m o lo g y 42: 207—230.

40. d e n B oer I J (1970) O n th e significance o f d ispersal p o w e r fo r p o p u la tio n s o f c a rab id -b eed es . O eco lo g ia 4: 1—28.

41. d e n B oer I J (1980) W in g p o ly m o rp h ism a n d d im o rp h ism in g ro u n d b eed es as stages in a n e v o lu tio n ary p rocess (C o leo p te ra , C a rab id ae ). E n to m o l G e n 6: 107— 134.

42. D esen d e r K (1988) F ligh t-M usc le D e v e lo p m en t a n d D ispersa l in th e L ife-Cycle o f C a ra b id B eedes. A n n ales D e L a Société R o y a le Z oo lo g iq u e D e B elg ique 118:7 8 -7 9 .

43. T h e o d o r id e s K , D e R iv a A, G o m e z -Z u rita J , F oster P G , V o g ler A P (2002) C o m p a riso n o f E S T lib ra ries fro m seven b e e d e species: to w ard s a fram ew o rk for phy lo g en o m ics o f th e C o leo p te ra . In se c t M o lecu lar Biology 11: 467—475.

44. S o n g H , Sheffield N G , C a m e ro n SL, M ille r K B , W h itin g M F (2010) W h e n p h y lo g en e tic a ssum ptions a re v io lated : b ase co m p o sitio n a l h e te ro g en e ity a n d am ong-site ra te v a ria tio n in b e e d e m ito c h o n d ria l phy logenom ics. System atic E n to m o lo g y 35: 42 9 —448.

PLoS ONE I w w w .p lo so n e .o rg 12 A u g u s t 2012 | V o lu m e 7 | Issue 8 | e42605

Page 13: De novo in the Wing Polymorphic Salt Marsh Beetle Pogonus ... · in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae) Steven M. Van Belleghem1'2*,

T ranscrip tom ics o f a W ing P o lym o rp h ic Beetle

45. H u n t T , B ergsten J , L evkan icova Z . P a p a d o p o u lo u A, J o h n O S , e t al. (2007) A c o m p reh en siv e ph y lo g en y o f b eed es reveals the e v o lu tio n ary o rig ins o f a su p e rra d ia tio n . S c ience 318: 1913—1916.

46. S loan D B , K e lle r S R , B e ra rd i A E , S a n d e rso n BJ, K a rp o v ic h J F , e t al. (2012) D e novo tra n sc rip to m e assem bly a n d p o ly m o rp h ism d e tec tio n in the flow ering p la n t S ilene vulgaris (C aryophy llaceae). M o lecu lar Eco logy R eso u rces 12: 333—343.

47. X u e J , B ao Y-Y , L i B-l, C h e n g Y-B, P e n g Z -Y , e t al. (2011) T ra n sc r ip to m e A nalysis o f th e B ro w n P la n th o p p e r N ilap a rv a ta lugens. P los O n e 5.

48. M ittap a lli O , Bai X , M a m id a la P , R a ja ra p u SP, B onello P, e t al. (2010) T issue- Specific T ra n sc rip to m ic s o f th e E xo tic Invasive In se c t P est E m e ra ld A sh B orer (Agrilus p lan ipenn is). P los O n e 5.

49. P o e lch au M F , R eyno lds J A , D en lin g e r D L , Elsik C G , A rm b ru s te r P A (2011) A d e novo tra n sc rip to m e o f th e A sian tiger m osq u ito , A edes a lbop ic tus, to identify can d id a te tran sc rip ts fo r d iap au se p rep a ra tio n . B m c G en o m ics 12.

50 . T u r in H (2000) D e N e d e r la n d se lo o p k ev ers , v e rsp re id in g e n o eco log ie (C o leo p tera . C a ra b id a e ), N e d e rla n d se F au n a3 .: N a tio n a a l N a tu u rh is to risc h M u seu m N a tu ra lis , K N N V U itgeverij & E IS N e d e r la n d , L eiden .

51. S e rra n o J (1981) A C h ro m o so m e S tu d y o f S p an ish B em b id iid ae a n d O th e r C a ra b o id e a (C o leo p te ra A dephaga). G e n e tic a 57: 119—129.

52. G ra b h e rr M G , H a a s BJ, Y asso u r M , L e v in JZ , T h o m p s o n D A , e t al. (2011) Full- len g th tra n sc rip to m e assem bly fro m R N A -S eq d a ta w ith o u t a refe ren ce genom e. N a tu re B io techno logy 29: 6 4 4 -U 130 .

53. S ch m ied e r R , E d w ard s R (2011) F ast Id en tifica tio n a n d R em o v a l o f S equence C o n ta m in a tio n fro m G en o m ic a n d M e ta g e n o m ic D atasets. Plos O n e 6.

54. A ltschu l SF, G ish W , M ille r W , M yers E W , L ip m a n D J (1990) Basic L ocal A lig n m en t S ea rch T o o l. J o u rn a l o f M o lecu lar Biology 215: 4 03^410 .

55. G o tz S, G a rc ia -G o m e z J M , T e ro l J , W illiam s T D , N a g a ra j S H , e t al. (2008) H ig h - th ro u g h p u t fu n c tio n a l a n n o ta tio n a n d d a ta m in in g w ith th e B last2G O suite. N u c le ic A cids R e se a rc h 36: 3 4 2 0 -3 4 3 5 .

56. A sh b u rn e r M , Ball C A , Blake J A , B o tste in D , B u d er H , e t al. (2000) G ene O nto lo g y : to o l fo r th e u n ificatio n o f b io logy. N a tu re G enetics 25: 25—29.

57. K im H S , M u rp h y T , X ia J , C a ra g e a D , P a rk Y , e t al. (2010) B eedeB ase in 2010: rev is ions to p ro v id e c o m p re h e n siv e g e n o m ic in fo rm a tio n fo r T r ib o liu m castan eu m . N u cle ic A cids R e se a rc h 38: D 4 3 7 -D 4 4 2 .

58. K u r tz S, P hillippy A, D e lch er A L , S m o o t M , S h u m w ay M , e t al. (2004) V ersatile a n d o p e n softw are fo r c o m p a rin g la rg e genom es. G e n o m e Biology 5.

59. M in X J , B u d e r G , S to rm s R , T sa n g A (2005) O rfP red ic to r: p red ic tin g p ro te in - co d in g reg ions in E S T -d e riv ed sequences. N u c le ic A cids R e se a rc h 33: W 6 7 7 — W 680 .

60. T a m u ra K , P e terso n D , P e te rso n N , S tech er G , N e i M , e t al. (2011) M E G A 5: M o le c u la r E v o lu tio n a ry G e n e tic s A nalysis U s in g M a x im u m L ik e lih o o d , E v o lu tio n a ry D is tan ce , a n d M a x im u m P arsim o n y M eth o d s. M o lecu lar Biology a n d E vo lu tio n 28: 2 7 3 1 -2 7 3 9 .

61. B risson J A , Ish ikaw a A, M iu ra T (2010) W in g d e v e lo p m en t genes o f th e p ea a p h id a n d d iffe ren tia l g ene expression b e tw e en w in g ed a n d u n w in g ed m o rp h s. In se c t M o lecu lar Biology 19: 6 3 -7 3 .

62. Belles X , M a rt in D , P iu lachs M D (2005) T h e m ev a lo n a te p a th w a y a n d the synthesis o f ju v en ile h o rm o n e in insects. A n n u a l R ev iew o f E n to m o lo g y , pp . 1 8 1 -1 9 9 .

63. W a rre n J T , P e try k A , M a rq u e s G , P a rv y JP , S h in o d a T , e t al. (2004) P h a n to m en co d es th e 25-hydroxy lase o f D ro so p h ila m elan o g aste r a n d B om byx m ori: a P 4 5 0 en z y m e c ritica l in ecd y so n e b iosyn thesis . In se c t B io ch em istry a n d M o lecu lar Biology 34: 991—1010.

64. K a n e h isa M , G o to S (2000) K E G G : K y o to E n cy c lo p ed ia o f G enes a n d G en o m es. N u cle ic A cids R esea rch 28: 27—30.

65. L i H , D u rb in R (2009) F ast a n d accu ra te sh o rt re a d a lig n m en t w ith Burrow s- W h eele r tran sfo rm . B io in fo rm atics 25: 1754—1760.

66. L a n g m ea d B, T ra p n e ll G , P o p M , S a lzb erg SL (2009) U ltra fa s t a n d m em o ry - efficien t a lig n m en t o f sh o rt D N A sequences to th e h u m a n genom e. G en o m e Biology 10.

67. L i H , H a n d sa k e r B, W ysoker A , F en n ell T , R u a n J , e t al. (2009) T h e S equence A lig n m e n t/M a p fo rm a t a n d SA M tools. B io in fo rm atics 25: 2 0 7 8 -2 0 7 9 .

68. L i B, D ew ey C N (2011) R S E M : ac cu ra te tra n sc rip t q u a n tif ica tio n fro m R N A - Seq d a ta w ith o r w ith o u t a refe ren ce g en o m e. B m c B io in fo rm atics 12.

69. L i H (2011) A sta tistical fram ew o rk for S N P c alling, m u ta tio n discovery, a ssoc ia tion m a p p in g a n d p o p u la tio n gene tical p a ra m e te r e s tim a tio n fro m seq u en c in g d a ta . B io in fo rm atics 27: 2987—2993.

70. D an ecek P, A u to n A, A becasis G , A lbers C A , B anks E , e t al. (2011) T h e v a ria n t call fo rm a t a n d V C F too ls. B io in fo rm ad cs 27: 2 1 5 6 -2 1 5 8 .

71. W ilk in M B , B ecker M N , M ulvey D , P h a n I, C h a o A, e t al. (2000) D ro so p h ila D u m p y is a g igan tic ex tracellu la r p ro te in re q u ired to m a in ta in ten sio n a t e p id e rm a l-cu d c le a tta c h m e n t sites. C u r re n t B iology 10: 55 9 —567.

72. B ai X , M a m id a la P, R a ja ra p u SP, J o n e s SG , M ittap a lli O (2011) T ra n sc r ip ­tom ics o f th e B ed B ug (C im ex lectularius). P los O n e 6.

73. W a n g X -W , L u a n J -B , LÍ J -M , B ao Y -Y , Z h a n g G -X , e t al. (2010) D e novo c h a ra c te r iza tio n o f a w hitefly tra n sc rip to m e a n d analysis o f its gene expression d u r in g dev e lo p m en t. B m c G en o m ics 11.

74. S h en G -M , D o u W , N iu J -Z , J ia n g H -B , Y a n g W -J, e t al. (2011) T ran sc r ip to m e A nalysis o f th e O r ie n ta l F ru it Fly (B actrocera dorsalis). Plos O n e 6.

75. O ’N e il S T , D zu ris in J D K , G a rm ic h a e l R D , L obo N F , E m ric h SJ, e t al. (2010) P o p u lad o n -lev el tra n sc rip to m e seq u en c in g o f n o n m o d e l o rgan ism s E rynn is p ro p e rd u s a n d P ap ilio zelicaon . B m c G en o m ics 11.

76. C o lb o u r n e J K , P fre n d e r M E , G ilb ert D , T h o m a s W K , T u c k e r A , e t al. (2011) T h e E coresponsive G e n o m e o f D a p h n ia pu lex . Science 331: 55 5 —561.

77. K a ra to lo s N , P a u c h e t Y, W ilk inson P , G h a u h a n R , D e n h o lm I, e t al. (2011) P y ro seq u en c in g th e tra n sc rip to m e o f th e g reen h o u se w hitefly , T ria leu ro d es v a p o ra r io ru m reveals m u ld p le tran sc rip ts e n co d in g insecdc ide ta rg e ts a n d detox ify ing enzym es. B m c G en o m ics 12.

78. G ustav so n E , G o ld sb o ro u g h A S, A li Z , K o rn b e rg T B (1996) T h e D ro so p h ila en g ra iled a n d inv ec ted genes: P a rtn e rs in reg u la tio n , expression a n d fu n cd o n . G en etics 142: 8 9 3 -9 0 6 .

79. S h ig en o b u S, B ickel R D , B risson J A , B utts T , C h a n g G G , e t al. (2010) C o m p reh en s iv e su rvey o f d e v e lo p m en ta l genes in th e p e a ap h id , A c y rth o sip h o n p isum : f re q u e n t lineage-specific du p lica tio n s a n d losses o f d ev e lo p m en ta l genes. In se c t M o lecu lar Biology 19: 4 7 -6 2 .

80. C o h e n B, M c G u fd n M E , Pfeifle G , Segal D , C o h e n S M (1992) A p te ro u s , a G en e R e q u ire d fo r Im a g in a i D isk D e v e lo p m e n t in D ro so p h ila E n co d es a M e m b e r o f th e Tim F am ily o f D e v e lo p m en ta l R eg u la to ry P ro teins. G en es & D e v e lo p m e n t 6: 7 1 5 -7 2 9 .

81. T h e il E C (1987) F e rrid n - S tru c tu re , G e n e -R e g u la d o n , a n d C e llu la r F u n c d o n in A nim als, P lan ts, a n d M icro o rg an ism s. A n n u a l R ev iew o f B iochem istry 56: 289— 315.

82. O rin o K , L e h m an L, T su ji Y , A yaki H , T o r ti SV , e t al. (2001) F e rritin a n d the resp o n se to o x id ad v e stress. B iochem ical J o u rn a l 357: 241—247.

83. O d u m W E (1988) C o m p a ra tiv e Eco logy o f T id a l F re sh w a te r a n d Salt M arshes. A n n u a l R ev iew o f Ecology a n d System atics 19: 147—176.

84. K a w a h ara -M ik i R , T su d a K , S h iw a Y , A ra i-K ich ise Y, M atsu m o to T , e t al. (2011) W h o le -g en o m e reseq u en c in g show s n u m ero u s genes w ith n o n sy n o n ­y m ous S N Ps in th e J a p a n e s e n a tiv e ca td e K u ch in o sh im a-U sh i. B m c G enom ics 1 2 .

85. E ck S H , B enet-P ages A , Flisikowski K , M eitin g e r T , F ries R , e t al. (2009) W hole g en o m e seq u en c in g o f a single Bos tau ru s a n im a l fo r single n u c leo tid e p o ly m o rp h ism discovery. G e n o m e Biology 10.

86. K im J - I , J u Y S, P a rk H , K im S, L ee S, e t al. (2009) A h igh ly a n n o ta te d w hole- g en o m e seq u en ce o f a K o re a n ind iv idua l. N a tu re 460: 1011—U 1096 .

87. B entley D R , B a la su b ra m a n ia n S, S w erd low H P , S m ith G P , M ilto n J , e t al. (2008) A c c u ra te w ho le h u m a n g en o m e seq u en c in g using reversib le te rm in a to r chem istry . N a tu re 456: 5 3 -5 9 .

88. L evy S, S u tto n G , N g P C , Feu k L , H a lp e rn A L , e t al. (2007) T h e d ip lo id genom e seq u en ce o f a n ind iv id u a l h u m an . P los Biology 5: 2113—2144.

PLoS ONE I w w w .p lo so n e .o rg 13 A u g u s t 2012 | V o lu m e 7 | Issue 8 | e42605