cricket genomes: the genomes of future foodjul 07, 2020 · 302 comparing cricket genomes to other...
TRANSCRIPT
Ylla et al. Page 1 of 27
Cricket genomes: the genomes of future food 1
GuillemYlla1*,TaroNakamura1,2,TakehikoItoh3,ReiKajitani3,AtsushiToyoda4,5,Sayuri2Tomonari6,TetsuyaBando7,YoshiyasuIshimaru6,TakahitoWatanabe6,MasaoFuketa8,Yuji3
Matsuoka6,9,SumihareNoji6,TaroMito6*,CassandraG.Extavour1,10*4
5
1. Department of Organismic and Evolutionary Biology, Harvard University,6Cambridge,USA7
2. Currentaddress:NationalInstituteforBasicBiology,Okazaki,Japan83. SchoolofLifeScienceandTechnology,TokyoInstituteofTechnology,Tokyo,Japan94. ComparativeGenomicsLaboratory,NationalInstituteofGenetics,Shizuoka,Japan105. AdvancedGenomicsCenter,NationalInstituteofGenetics,Shizuoka,Japan116. DepartmentofBioscienceandBioindustry,TokushimaUniversity,Tokushima,Japan127. Graduate School of Medicine, Pharmacology and Dentistry, Okayama University,13
Okayama,Japan148. Graduate School of Advanced Technology and Science, Tokushima University,15
Tokushima,Japan169. Current address: Department of Biological Sciences, National University17
ofSingapore,Singapore1810. DepartmentofMolecularandCellularBiology,HarvardUniversity,Cambridge,USA19
20
*[email protected],[email protected]@oeb.harvard.edu22
23
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 2 of 27
Abstract 2425
Cricketsarecurrentlyinfocusasapossiblesourceofanimalproteinforhumanconsumption26
asanalternativetoproteinfromvertebratelivestock.Thispracticecouldeasesomeofthe27
challengesbothofaworldwidegrowingpopulationandofenvironmentalissues.Thetwo-28
spottedMediterraneanfieldcricketGryllusbimaculatushastraditionallybeenconsumedby29
humansindifferentpartsoftheworld.Notonlyisthisconsideredgenerallysafeforhuman30
consumption, several studies also suggest that introducing crickets into one’s diet may31
confer multiple health benefits. Moreover, G. bimaculatus has been widely used as a32
laboratory research model for decades in multiple scientific fields including evolution,33
developmental biology, neurobiology, and regeneration. Here we report the sequencing,34
assemblyandannotationoftheG.bimaculatusgenome,andtheannotationofthegenomeof35
theHawaiiancricketLaupalakohalensis.Thecomparisonofthesetwocricketgenomeswith36
thoseof14additionalinsectssupportsthehypothesisthatarelativelysmallancestralinsect37
genome expanded to large sizes in many hemimetabolous lineages due to transposable38
elementactivity.BasedontheratioofobservedversusexpectedCpGsites(CpGo/e),wefind39
higherconservationandstrongerpurifyingselectionoftypicallymethylatedgenesthanof40
non-methylatedgenes.Finally,ourgenefamilyexpansionanalysisrevealsanexpansionof41
thepickpocketclassVgenefamilyinthelineageleadingtocrickets,whichwespeculatemight42
playarelevantroleincricketcourtshipbehavior,includingtheircharacteristicchirping. 43
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 3 of 27
Introduction 44
Multipleorthopteranspecies,andcricketsinparticular,arecurrentlyinfocusasasourceof45
animalproteinforhumanconsumptionandforvertebratelivestock.Insectconsumption,or46
entomophagy,iscurrentlypracticedinsomepopulations,includingsomecountrieswithin47
Africa,Asia,andSouthAmerica(Kouřimská&Adámková,2016),butisrelativelyrareinmost48
EuropeanandNorthAmericancountries.Theuseof insects forhumanconsumptionand49
animalfeedingcouldhelpbothtodecreasetheemissionofgreenhousegases,andtoreduce50
thelandextensionconsiderednecessarytofeedthegrowingworldwidepopulation.Crickets51
are especially attractive insects as a food source, since they are already found in the52
entomophagousdietsofmanycountries(VanHuisetal.,2013),andpossesshighnutritional53
value.Cricketshaveahighproportionofproteinfortheirbodyweight(>55%),andcontain54
theessentiallinoleicacidastheirmostpredominantfattyacid(Ghosh,Lee,Jung,&Meyer-55
Rochow,2017;Kouřimská&Adámková,2016;VanHuisetal.,2013).56
The two-spotted Mediterranean field cricket Gryllus bimaculatus has traditionally been57
consumed in different parts of theworld. In northeast Thailand,which recorded 20,00058
insect farmers in 2011 (Hanboonsong, Jamjanya, & Durst, 2013), it is one of the most59
marketedandconsumedinsectspecies.Studieshavereportednoevidencefortoxicological60
effectsrelatedtooralconsumptionofG.bimaculatusbyhumans(Ahn,Han,Kim,Hwang,&61
Yun,2011;Ryuetal.,2016),neitherweregenotoxiceffectsdetectedusingthreedifferent62
mutagenicity tests (Mietal.,2005).Ararebutknownhealthriskassociatedwithcricket63
consumption,however, issensitivityandallergytocrickets(Pener,2016;Ribeiro,Cunha,64
Sousa-Pinto, & Fonseca, 2018), especially in people allergic to seafood, shown by cross-65
allergies between G. bimaculatus and Macrobrachium prawns (Srinroch, Srisomsap,66
Chokchaichamnankit,Punyarit,&Phiriyangkul,2015).67
Not only is the cricketG. bimaculatus considered generally safe forhuman consumption,68
several studiesalso suggest that introducing crickets intoone’sdietmay confermultiple69
healthbenefits.WatersolublecompoundsderivedfromethanolextractsofwholeadultG.70
bimaculatus applied to cultured mouse spleen cells were reported to stimulate the71
expressionofmultiplecytokinesassociatedwithimmunecellproliferationandactivation72
(Dong-Hwanetal.,2004).RatstreatedwithethanolextractsofwholeadultG.bimaculatus73
showedsignsof reducedaging, including characteristicaging-associatedgeneexpression74
profiles,andreducedlevelsofmarkersofDNAoxidativedamage,(Ahn,Hwang,Yun,Kim,&75
Park,2015).GlycosaminoglycansderivedfromG.bimaculatuswerereportedtoelicitsome76
anti-inflammatoryeffectsinaratmodelofchronicarthritis(Ahn,Han,Hwang,Yun,&Lee,77
2014). Rats fed a diet including an ethanol extract of G. bimaculatus accumulated less78
abdominalfatandhadlowerserumglucoselevelsthancontrolanimals(Ahn,Kim,Kwon,79
Hwang, & Park, 2015). More recent studies suggest that G. bimaculatus powder has80
antidiabeticeffectsinratmodelsofTypeIdiabetes(Park,Lee,Lee,Hoang,&Chae,2019)81
and protects against acute alcoholic liver damage inmice (Hwang et al., 2019). Beyond82
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 4 of 27
rodentmodels,astudyofhealthyadulthumansubjectsshowedthattheintakeof25g/day83
of the powdered cricket speciesGrylloides sigillatus supported growth of someprobiotic84
microbiota,andcorrelatedwithreducedexpressionofthepro-imflammatorycytokineTNF-85
a(Stulletal.,2018).86
Althoughcricketsarebecomingeconomicallyimportantplayersinthefoodindustry,there87
arecurrentlynopubliclyavailableannotatedcricketgenomesfromanyof thesetypically88
consumed species. Here,we present the 1.66-Gb genome assembly and annotation ofG.89
bimaculatus, commonly known as the two-spotted cricket, a namederived from the two90
yellowspotsfoundonthebaseoftheforewingsofthisspecies(Figure1A).91
G.bimaculatushasbeenwidelyusedasalaboratoryresearchmodelfordecades,inscientific92
fieldsincludingneurobiologyandneuroethology(Fisheretal.,2018;Huber,Moore,&Loher,93
1989), evo-devo (Kainz,Ewen-Campen,Akam,&Extavour,2011),developmentalbiology94
(Donoughe&Extavour,2015),andregeneration(Mito&Noji,2008).Technicaladvantages95
of this cricket species as a researchmodel include the fact thatG. bimaculatus does not96
require cold temperatures or diapause to complete its life cycle, it is easy to rear in97
laboratoriessinceitcanbefedwithgenericinsectorotherpetfoods,itisamenabletoRNA98
interference (RNAi) and targeted genome editing (Kulkarni & Extavour, 2019), stable99
germlinetransgeniclinescanbeestablished(Shinmyoetal.,2004),andithasanextensive100
list of available experimental protocols ranging from behavioral to functional genetic101
analyses(WilsonHorch,Mito,Popadić,Ohuchi,&Noji,2017).102
Wealsoreportthefirstgenomeannotationforasecondcricketspecies,theHawaiiancricket103
Laupala kohalensis, whose genome assembly was recently made public (Blankers, Oh,104
Bombarely,&Shaw,2018).Comparingthesetwocricketgenomeswiththoseof14other105
insect species allowedus to identify three interesting features of these cricket genomes,106
someofwhichmayrelatetotheiruniquebiology.First,thedifferentialtransposableelement107
(TE)compositionbetweenthetwocricketspeciessuggestsabundantTEactivitysincethey108
divergedfromalastcommonancestor,whichourresultssuggestoccurredcirca89.2million109
yearsago (Mya).Second,basedongeneCpGdepletion,an indirectbut robustmethod to110
identifytypicallymethylatedgenes(Bewick,Vogel,Moore,&Schmitz,2016;Bird,1980),we111
find higher conservation of typically methylated genes than of non-methylated genes.112
Finally,ourgenefamilyexpansionanalysisrevealsanexpansionofthepickpocketclassV113
genefamilyinthelineageleadingtocrickets,whichwespeculatemightplayarelevantrole114
incricketcourtshipbehavior,includingtheircharacteristicchirping.115
Results 116
Gryllus bimaculatus genome assembly 117
We sequenced, assembled, andannotated the1.66-Gbhaploidgenomeof thewhite eyed118
mutant strain (Mito&Noji, 2008) of the cricketG. bimaculatus (Figure1A). 50%of the119
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 5 of 27
genome iscontainedwithin the71 longestscaffolds (L50), theshortestof themhavinga120
lengthof6.3Mb(N50),and90%ofthegenomeiscontainedwithin307scaffolds(L90).In121
comparison to other hemimetabolous genomes, and in particular, to polyneopteran122
genomes, our assembly displays high-quality scores by a number of metrics123
(Supplementary Table 1). Notably, the BUSCO scores (Simão, Waterhouse, Ioannidis,124
Kriventseva,&Zdobnov,2015)ofthisgenomeassemblyatthearthropodandinsectlevels125
are98.50%and97.00%respectively,indicatinghighcompletenessofthisgenomeassembly126
(Table 1). The low percentage of duplicated BUSCO genes (1.31%-1.81%) suggests that127
putative artifactual genomic duplication due tomis-assembly of heterozygotic regions is128
unlikely.129
130
Table1:Gryllusbimaculatusgenomeassemblystatistics.131
NumberofScaffolds 47,877
GenomeLength(nt) 1,658,007,496
GenomeLength(Gb) 1.66
Avg.scaffoldsize(Kb) 34.63
N50(Mb) 6.29
N90(Mb) 1.04
L50 71
L90 307
BUSCOScore–Arthropoda 98.50%
BUSCOScore–Insecta 97.00%
132
Annotation of two cricket genomes 133
The publicly available 1.6-Gb genome assembly of the Hawaiian cricket L. kohalensis134
(Blankers,Oh,Bombarely,etal.,2018),althoughhavinglowerassemblystatisticsthanthat135
ofG.bimaculatus(N50=0.58Mb,L90=3,483),scoreshighintermsofcompleteness,with136
BUSCO scores of 99.3% at the arthropod level and 97.80% at the insect level137
(SupplementaryTable1).138
UsingthreeiterationsoftheMAKER2pipeline(Holt&Yandell,2011),inwhichwecombined139
ab-initioandevidence-basedgenemodels,weannotatedtheprotein-codinggenesinboth140
cricketgenomes(SupplementaryFigures1&2).Weidentified17,871codinggenesand141
28,529 predicted transcripts for G. bimaculatus, and 12,767 coding genes and 13,078142
transcriptsforL.kohalensis(Table2).143
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 6 of 27
Toobtainfunctionalinsightsintotheannotatedgenes,weranInterProScan(Jonesetal.,144
2014)forallpredictedproteinsequencesandretrievedtheirInterProID,PFAMdomains,145
andGene-Ontology(GO)terms(Table2).Inaddition,weretrievedthebestsignificant146
BLASTPhit(E-value<1e-6)for70-90%oftheproteins.Takentogether,thesemethods147
predictedfunctionsfor75%and94%oftheproteinsannotatedforG.bimaculatusandL.148
kohalensisrespectively.Wecreatedanovelgraphicinterfacethroughwhichinterested149
readerscanaccess,search,BLASTanddownloadthegenomedataandannotations150
(http://34.71.36.157:3838/).151
152
Table2:GenomeannotationsummaryforthecricketsG.bimaculatusandL.kohalensis153
G.bimaculatus L.kohalensis
AnnotatedProtein-CodingGenes 17,871 12,767
AnnotatedTranscripts 28,529 13,078
%WithInterProID 59.56% 72.52%
%WithGO-terms 38.66% 47.03%
%WithPFAMmotif 62.44% 76.59%
%WithsignificantBLASTPhit 73.64% 93.23%
BUSCO-transcriptomeScore–Insecta 92.30% 87.20%
Repetitivecontent 33.69% 35.51%
TEcontent 28.94% 34.50%
GClevel 39.93% 35.58%
Abundant Repetitive DNA 154
WeusedRepeatMasker(Smit,Hubley,&Grenn,2015)todeterminethedegreeofrepetitive155
contentinthecricketgenomes,usingspecificcustomrepeatlibrariesforeachspecies.This156
approachidentified33.69%oftheG.bimaculatusgenome,and35.51%oftheL.kohalensis157
genome, as repetitive content (Supplementary File 1). InG. bimaculatus the repetitive158
contentdensitywassimilarthroughoutthegenome,withtheexceptionoftwoscaffoldsthat159
contained 1.75x-1.82x the density of repetitive content than themean of the other N90160
scaffolds(Figure1B).Transposableelements(TEs)accountedfor28.94%ofthisrepetitive161
content in theG. bimaculatus genome, and for 34.50%of the repetitive content in theL.162
kohalensisgenome.AlthoughtheoverallproportionofrepetitivecontentmadeupofTEswas163
similar between the two cricket species, the proportion of each specific TE class varied164
greatly (Figure 1C). In L. kohalensis the most abundant TE type was long interspersed165
elements(LINEs),accountingfor20.21%ofthegenome,whileinG.bimaculatusLINEsmade166
uponly8.88%ofthegenome.ThespecificLINEsubtypesLINE1andLINE3appearedata167
similarfrequencyinbothcricketgenomes(<0.5%),whiletheLINE2subtypewasoverfive168
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 7 of 27
timesmorerepresentedinL.kohalensis,covering10%ofthegenome(167Mb).Ontheother169
hand, DNA transposons accounted for 8.61%of theG. bimaculatus genome, but only for170
3.91%oftheL.kohalensisgenome.171
DNA methylation 172
CpGdepletion,calculatedastheratiobetweenobservedversustheexpectedincidenceofa173
cytosine followed by a guanine (CpGo/e), is considered a reliable indicator of DNA174
methylation. This is because spontaneous C to T mutations occur more frequently on175
methylatedCpGsthanunmethylatedCpGs(Bird,1980).Thus,genomicregionsthatundergo176
methylationareeventuallyCpG-depleted.WecalculatedtheCpGo/evalueforeachpredicted177
protein-codinggeneforthetwocricketspecies.Inbothspecies,weobservedaclearbimodal178
distributionofCpGo/evalues(Figure2A).Oneinterpretationofthisdistributionisthatthe179
peakcorrespondingtolowerCpGo/evaluescontainsgenesthataretypicallymethylated,and180
thepeakofhigherCpGo/econtainsgenesthatdonotundergoDNAmethylation.Underthis181
interpretation,somegeneshavenon-randomdifferentialDNAmethylation incrickets.To182
quantifythegenesinthetwoputativemethylationcategories,wesetaCpGo/ethresholdas183
thevalueof thepointof intersectionbetween the twonormaldistributions (Figure2A).184
Afterapplyingthiscutoff,44%ofG.bimaculatusgenesand45%ofL.kohalensisgeneswere185
identifiedasCpG-depleted.186
AGOenrichmentanalysisofthegenesaboveandbelowtheCpGo/ethresholddefinedabove187
revealedcleardifferencesinthepredictedfunctionsofgenesbelongingtoeachofthetwo188
categories.Strikingly,however,genesineachthresholdcategoryhadfunctionalsimilarities189
acrossthetwocricketspecies(Figure2A).GeneswithlowCpGo/evalues,whicharelikely190
thoseundergoingmethylation,wereenrichedforfunctionsrelatedtoDNAreplicationand191
regulation of gene expression (including transcriptional, translational, and epigenetic192
regulation),whilegeneswithhighCpGo/evalues,suggestinglittleornomethylation,tended193
tohavefunctionsrelatedtometabolism,catabolism,andsensorysystems.194
Toassesswhetherthepredicteddistinctfunctionsofhigh-andlow-CpGo/evaluegeneswere195
specifictocrickets,orwereapotentiallymoregeneraltrendofinsectswithDNAmethylation196
systems,weanalyzedthepredictedfunctionsofgeneswithdifferentCpGo/evaluesinthe197
honeybeeApismellifera.ThisbeewasthefirstinsectforwhichevidenceforDNAmethylation198
wasrobustlydescribedandstudied(Elango,Hunt,Goodisman,&Yi,2009;Y.Wangetal.,199
2006).WefoundthatinA.mellifera,CpG-depletedgeneswereenrichedforsimilarfunctions200
asthoseobservedincricketCpG-depletedgenes(26GO-termsweresignificantlyenriched201
inbothhoneybee and crickets;SupplementaryFigure3). In the sameway, highCpGo/e202
genes in both crickets and honeybee were enriched for similar functions (12 GO-terms203
commonlyenriched;SupplementaryFigure3).204
Additionally,weobservedthatgenesbelongingtothelowCpGo/epeakweremorelikelyto205
haveanorthologousgeneinanotherinsectspecies,andthatorthologwasalsomorelikely206
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 8 of 27
tobelongtothelowCpGo/epeak(Figure2BandSupplementaryFigure4).Bycontrast,207
geneswithhighCpGo/e,weremorelikelytobespecies-specific,butiftheyhadanorthologin208
anotherspecies,thisorthologwasalsolikelytohavehighCpGo/e.Thissuggeststhatgenes209
thataretypicallymethylatedtendtobemoreconservedacrossspecies,whichcouldimply210
lowevolutionaryratesandstrongselectivepressure.Totestthishypothesizedrelationship211
betweenlowCpGo/eandlowevolutionaryrates,wecomparedthedN/dSvaluesof1-to-1212
orthologousgenesbelongingtothesameCpGo/epeakbetweenthetwocricketspecies.We213
foundthatCpG-depletedgenesinbothcricketshadsignificantlylowerdN/dSvaluesthan214
non-CpG-depleted genes (p-value<0.05; Figure 2C), consistent with stronger purifying215
selectiononCpG-depletedgenes.216
Phylogenetics and gene family expansions 217
To study the genome evolution of these cricket lineages, we compared the two cricket218
genomeswiththoseof14additionalinsects,includingmembersofallmajorinsectlineages219
withspecialemphasisonhemimetabolousspecies.Foreachofthese16insectgenomes,we220
retrievedthelongestproteinpergeneandgroupedthemintoorthogroups(OGs),whichwe221
called“genefamilies”forthepurposeofthisanalysis.TheOGscontainingasingleprotein222
perinsect,namelysinglecopyorthologs,wereusedtoinferaphylogenetictreeforthese16223
species.Theobtainedspeciestreetopologywasinaccordancewiththecurrentlyunderstood224
insectphylogeny(Misofetal.,2014).Then,weusedtheMisofetal.(2014)datedphylogeny225
to calibrate our tree on four different nodes,which allowed us to estimate that the two226
cricketspeciesdivergedcirca89.2millionyearsago.227
Ourgenefamilyexpansion/contractionanalysisusing59,516OGsidentified18genefamilies228
that were significantly expanded (p-value<0.01) in the lineage leading to crickets. In229
addition,weidentifiedafurther34and33genefamilyexpansionsspecifictoG.bimaculatus230
and L. kohalensis respectively. Functional analysis of these expanded gene families231
(SupplementaryFile2)revealedthatthecricket-specificgenefamilyexpansionsincluded232
pickpocket genes,whichare involved inmechanosensation inDrosophilamelanogaster as233
describedinthefollowingsection.234
235
Expansion of pickpocket genes 236
In D. melanogaster, the complete pickpocket gene repertoire is composed of 6 classes237
containing31genes.Wefoundcricketorthologsofall31pickpocketgenesacrosssevenof238
ourOGs,andeachOGpredominantlycontainedmembersofasinglepickpocketclass.We239
used all the genes belonging to these 7 OGs to build a pickpocket gene tree, using the240
predictedpickpocketorthologsfrom16insectspecies(Figure3;SupplementaryTable2).241
Thisgenetreeallowedustoclassifythedifferentpickpocketgenesineachofthe16species.242
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 9 of 27
Thepickpocketgenefamilyappearedtobeasignificantlyexpandedgenefamilyincrickets.243
FollowingtheclassificationofpickpocketgenesusedinDrosophilaspp.(Zelle,Lu,Pyfrom,&244
Ben-Shahar,2013)wedeterminedthat thespecificgene familyexpanded incricketswas245
pickpocketclassV(Figure3).InD.melanogasterthisclasscontainseightgenes:ppk(ppk1),246
rpk (ppk2),ppk5,ppk8,ppk12,ppk17,ppk26, andppk28 (Zelle et al., 2013).Our analysis247
suggests that the class V gene family contains 15 and 14 genes inG bimaculatus and L.248
kohalensis respectively. In contrast, their closest analyzed relative, the locust Locusta249
migratoria,hasonlyfivesuchgenes.250
251
Thepickpocketgenesincricketstendedtobegroupedingenomicclusters(Figure1B).For252
instance,inG.bimaculatusnineofthe15classVpickpocketgeneswereclusteredwithina253
regionof900Kb,andfourothergenesappearedintwogroupsoftwo.IntheL.kohalensis254
genome, although this genome is more fragmented than that of G. bimaculatus255
(SupplementaryTable1),weobservedfiveclusterscontainingbetweentwoandfivegenes256
each.257
InD.melanogaster,thepickpocketgeneppk1belongstoclassVandisinvolvedinfunctions258
relatedtostimulusperceptionandmechanotransduction(Adamsetal.,1998).Forexample,259
inlarvae,thisgeneisrequiredformechanicalnociception(Zhong,Hwang,&Tracey,2010),260
andforcoordinatingrhythmiclocomotion(Ainsleyetal.,2003).ppkisexpressedinsensory261
neuronsthatalsoexpressthemalesexualbehaviordeterminerfruitless(fru)(Häsemeyer,262
Yapici,Heberlein,&Dickson,2009;Pavlou&Goodwin,2013;Rezávaletal.,2012).263
Todeterminewhetherpickpocketgenesincricketsarealsoexpressedinthenervoussystem,264
wecheckedforevidenceofexpressionofpickpocketgenesinthepubliclyavailabletheRNA-265
seqlibrariesfortheG.bimaculatusprothoracicganglion(Fisheretal.,2018).Thisanalysis266
detectedexpression(>5FPKMs)ofsixpickpocketgenes,fourofthembelongingtoclassV,in267
theG. bimaculatus nervous system. In the same RNA-seq libraries, we also detected the268
expressionoffru(SupplementaryTable3).269
270
Discussion 271
The importance of cricket genomes 272
Mostofthecropsandlivestockthathumanseathavebeendomesticatedandsubjectedto273
strong artificial selection for hundreds or even thousands of years to improve their274
characteristicsmostdesirableforhumans,includingsize,growthrate,stressresistance,and275
organoleptic properties (Y. H. Chen, Gols, & Benrey, 2015; Gepts, 2004; Thrall, Bever, &276
Burdon,2010;Yamasakietal.,2005).Incontrast,toourknowledge,cricketshaveneverbeen277
selectedbasedonanyfood-relatedcharacteristic.278
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 10 of 27
The advent of genetic engineering techniques has accelerated domestication of some279
organisms(K.Chen&Gao,2014).Thesetechniqueshavebeenused,forinstance,toimprove280
thenutritionalvalueofdifferentcrops,ortomakethemtoleranttopestsandclimatestress281
(Qaim,2009;Thralletal.,2010).Cricketsarenaturallynutritionallyrich(Ghoshetal.,2017),282
butinprinciple,theirnutritionalvaluecouldbefurtherimproved,forexamplebyincreasing283
vitamincontentorOmega-3 fattyacidsproportion. Inaddition,other issues thatpresent284
challenges to cricket farming could potentially be addressed by targeted genome285
modification,whichcanbeachievedinG.bimaculatususingZincfingernucleases,TALENs,286
or CRISPR/Cas9 REF. These challenges include sensitivity to common insect viruses,287
aggressivebehavior resulting in cannibalism, complexmating rituals, and relatively slow288
growthrate.289
Anessentialtoolforanykindofgeneticengineeringisahighqualityannotatedreference290
genome,togetherwithadeepunderstandingofthebiologyofthegivenspecies.BecauseG.291
bimaculatushasbeenusedasaresearchmodel inmultipledifferentscientificdisciplines,292
includingrearingforconsumption,issuesrelevanttoitsbiochemicalcomposition(Ghoshet293
al., 2017), human health and safety (Ahn et al., 2011; Ryu et al., 2016), putative health294
benefits(Ahnetal.,2014;Ahn,Hwang,etal.,2015;Ahn,Kim,etal.,2015;Dong-Hwanetal.,295
2004;Hwangetal.,2019;Parketal.,2019),andprocessingtechniques(Dobermann,Field,296
&Michaelson,2019)havebeenextensivelydescribed.Thus,thegenomeofG.bimaculatus297
herein described, adds to this body of biological knowledge by providing invaluable298
information thatwill be required tomaximize the potential of this cricket to become an299
increasinglysignificantpartoftheworldwidedietinthefuture.300
301
Comparing cricket genomes to other insect genomes 302
Theannotationofthesetwocricketgenomeswasdonebycombiningdenovogenemodels,303
homology-basedmethods,andtheavailableRNA-seqandESTs.Thispipelineallowedusto304
predict17,871genesintheG.bimaculatusgenome,similartothenumberofgenesreported305
forotherhemimetabolous insect genomes including the locustL.migratoria (17,307) (X.306
Wangetal.,2014)andthetermitesCryptotermessecundus(18,162)(Harrisonetal.,2018),307
Macrotermes natalensis (16,140) (Poulsen et al., 2014) and Zootermopsis nevadensis,308
(15,459) (Terrapon et al., 2014). The slightly lower number of protein-coding genes309
annotated in L. kohalensis (12,767) may be due to the lesser amount of RNA-seq data310
availableforthisspecies,whichchallengesgeneannotation.Nevertheless,theBUSCOscores311
aresimilarbetweenthetwocrickets,andtheproportionofannotatedproteinswithputative312
orthologousgenesinotherspecies(proteinswithsignificantBLASThits;seemethods)forL.313
kohalensisishigherthanforG.bimaculatus.Thissuggeststhepossibilitythatwemayhave314
successfully annotatedmost conserved genes, but that highly derived or species-specific315
genesmightbemissingfromourannotations.316
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 11 of 27
317
TEs and genome size evolution 318
Approximately35%ofthegenomeofbothcricketscorrespondstorepetitivecontent.This319
issubstantiallylessthanthe60%reportedforthegenomeofL.migratoria(X.Wangetal.,320
2014).Thislocustgenomeisoneofthelargestsequencedinsectgenomestodate(6.5Gb)321
buthasaverysimilarnumberofannotatedgenes(17,307)tothosewereportforcrickets.322
Wehypothesizethatthelargegenomesizedifferencebetweentheseorthopteranspeciesis323
duetotheTEcontent,whichhasalsobeencorrelatedwithgenomesizeinmultipleeukaryote324
species(Chénais,Caruso,Hiard,&Casse,2012;Kidwell,2002).325
Furthermore,wehypothesizethatthedifferencesintheTEcompositionbetweenthetwo326
crickets are the result of abundant and independent TE activity since their divergence327
around89.2Mya.This,togetherwiththeabsenceofevidenceforlargegenomeduplication328
eventsinthislineage, leadsustohypothesizethattheancestralorthopterangenomewas329
shorterthanthoseofthecricketsstudiedhere(1.6GbforG.bimaculatusand1.59GbforL.330
kohalensis) which are in the lowest range of orthopteran genome sizes (Hanrahan &331
Johnston, 2011). In summary, we propose that the wide range of genome sizes within332
Orthoptera,reachingashighas8.55GbinthelocustSchistocercagregaria(Camachoetal.,333
2015),islikelyduetoTEactivitysincethetimeofthelastorthopteranancestor.334
There isacleartendencyofpolyneopterangenomestobemuch longerthanthoseof the335
holometabolousgenomes(Figure4).Twocurrentlycompetinghypothesesarethat(1)the336
ancestralinsectgenomewassmall,andwasexpandedoutsideofHolometabola,and(2)the337
ancestral insectgenomewas large,anditwascompressedintheHolometabola(Gregory,338
2002).Ourobservationsareconsistentwiththefirstofthesehypotheses.339
Largergenomesizecorrelateswithslowerdevelopmentalratesinsomeplantsandanimals,340
whichishypothesizedtobeduetoaslowercelldivisionrate(Gregory,2002).Thus,onemay341
speculatethatthelargeproportionofTE-derivedDNAincricketgenomesmightnegatively342
impactthedevelopmentalrate.IfthisrepetitiveDNAproveslargelynon-functional,thenin343
principle,thedevelopmentalrate,animportantfactorforinsectfarming,mighteventually344
bemodifiablewiththeuseofgeneticengineeringtechniques.345
346
DNA Methylation 347
Most holometabolan species, including well-studied insects like D. melanogaster and348
Triboliumcastaneum,donotperformDNAmethylation,ortheydoitatverylowlevels(Lyko,349
Ramsahoye,& Jaenisch, 2000).ThehoneybeeA.melliferawasoneof the first insects for350
whichfunctionalDNAmethylationwasdescribed(Y.Wangetal.,2006).AlthoughthisDNA351
modification was initially proposed to be associated with the eusociality of these bees352
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 12 of 27
(Elangoetal.,2009),subsequentstudiesshowedthatDNAmethylationiswidespreadand353
presentindifferentinsectlineagesindependentlyofsocialbehavior(Bewicketal.,2016).354
DNAmethylationalsooccursinothernon-insectarthropods(Thomasetal.,2020).355
WhilethepreciseroleofDNAmethylationingeneexpressionregulationremainsunclear,356
ouranalysissuggeststhatcricketCpG-depletedgenes(putativelyhypermethylatedgenes)357
showsignsofpurifyingselection,tendtohaveorthologsinotherinsects,andareinvolvedin358
basicbiologicalfunctionsrelatedtoDNAreplicationandtheregulationofgeneexpression.359
These predicted functions differ from those of the non-CpG depleted genes (putatively360
hypomethylatedgenes),whichappeartobeinvolvedinsignalingpathways,metabolism,and361
catabolism.Thesepredictedfunctionalcategoriesmaybeconservedfromcricketsovercirca362
345millionyearsofevolution,aswealsodetectthesamepatterninthehoneybee.363
Taken together, these observations suggest a potential relationship between DNA364
methylation, sequence conservation, and function for many cricket genes. Nevertheless,365
basedonourdata,wecannotdeterminewhetherthemethylatedgenesarehighlyconserved366
becausetheyaremethylated,orbecausetheyperformbasicfunctionsthatmayberegulated367
byDNAmethylationevents. InthecockroachBlattellagermanica,DNAmethyltransferase368
enzymesandgeneswithlowCpGo/evaluesshowanexpressionpeakduringthematernalto369
zygotictransition(Ylla,Piulachs,&Belles,2018).Theseresultsincockroaches,togetherwith370
our observations, leads us to speculate that at least in Polyneopteran species, DNA371
methylation might contribute to the maternal zygotic transition by regulating essential372
genesinvolvedinDNAreplication,transcription,andtranslation.373
pickpocket gene expansion 374
ThepickpocketgenesbelongtotheDegenerin/epithelialNa+channel(DEG/ENaC)family,375
whichwere first identified inCaenorhabditiselegansas involved inmechanotransduction376
(Adamsetal.,1998).Thesamefamilyofionchannelswaslaterfoundinmanymulticellular377
animals,withadiverserangeoffunctionsrelatedtomechanoreceptionandfluid–electrolyte378
homeostasis(Liu,Johnson,&Welsh,2003).Mostoftheinformationontheirrolesininsects379
comes from studies inD.melanogaster. In this fruit fly,pickpocket genes are involved in380
neural functions including NaCl taste (Lee et al., 2017), pheromone detection (Averhoff,381
Richardson, Starostina, Kinser, & Pikielny, 1976), courtship behavior (Lu, LaMora, Sun,382
Welsh,&Ben-Shahar,2012),andliquidclearanceinthelarvaltrachea(Liuetal.,2003).383
In D. melanogaster adults, the abdominal ganglia mediate courtship and postmating384
behaviors through neurons expressing ppk and fru (Häsemeyer et al., 2009; Pavlou &385
Goodwin,2013;Rezávaletal.,2012).InD.melanogasterlarvae,ppkexpressionindendritic386
neuronsisrequiredtocontrolthecoordinationofrhythmiclocomotion(Ainsleyetal.,2003).387
In crickets, theabdominalgangliaare responsible fordetermining song rhythm(Jacob&388
Hedwig,2016).Moreover,wefindthatinG.bimaculatus,bothppkandfrugeneexpression389
aredetectableintheadultprothoracicganglion.Theseobservationssuggestthepossibility390
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 13 of 27
thatclassVpickpocketgenescouldbe involved insongrhythmdetermination incrickets391
throughtheirexpressioninabdominalganglia.392
This possibility is consistent with the results of multiple quantitative trait locus (QTL)393
studiesdoneincricketspecies fromthegenusLaupala,which identifiedgenomicregions394
associatedwithmatingsongrhythmvariationsandfemaleacousticpreference(Blankers,395
Oh,&Shaw,2018).The179scaffoldsthattheauthorsreportedbeingwithinoneLODofthe396
sevenQTLpeaks,containedfivepickpocketgenes,threeofthemfromclassVandtwofrom397
classIV.OneofthetwoclassIVgenesalsoappearswithinaQTLpeakofasecondexperiment398
(Blankers,Oh,Bombarely,etal.,2018;Shaw&Lesnick,2009).XuandShaw(2019)found399
thatascaffoldinaregionofLODscore1.5ofoneoftheirminorlinkagegroups(LG3)contains400
slowpoke,agenethataffectssonginterpulseinterval inD.melanogaster,andthisscaffold401
alsocontainstwoclassIIIpickpocketgenes(SupplementaryTable4).402
In summary, the roles ofpickpocket genes in controlling rhythmic locomotion, courtship403
behavior,andpheromonedetectioninD.melanogaster,theirappearanceingenomicregions404
associatedwithsongrhythmvariation inLaupala, and theirexpression inG.bimaculatus405
abdominalganglia,leadustospeculatethattheexpandedpickpocketgenefamilyincricket406
genomes could be playing a role in regulating rhythmic wing movements and sound407
perception,bothofwhicharenecessaryformating(WilsonHorchetal.,2017).Wenotethat408
XuandShaw(2019)hypothesizedthatsongproductionincricketsislikelytoberegulated409
byionchannels,andthatlocomotion,neuralmodulation,andmuscledevelopmentareall410
involved in singing (Xu& Shaw, 2019). However, further experiments,which could take411
advantage of the existing RNAi and genome modification protocols for G. bimaculatus412
(Kulkarni&Extavour,2019),willberequiredtotestthishypothesis.413
414
In conclusion, the G. bimaculatus genome assembly and annotation presented here is a415
sourceofinformationandanessentialtoolthatweanticipatewillenhancethestatusofthis416
cricketasamodernfunctionalgeneticsresearchmodel.Thisgenomemayalsoproveuseful417
to the agricultural sector, and could allow improvement of cricket nutritional value,418
productivity,andreductionofallergencontent.Annotatingasecondcricketgenome,thatof419
L. kohalensis, and comparing the two genomes, allowed us to unveil possible420
synapomorphiesofcricketgenomes,andtosuggestpotentiallygeneralevolutionarytrends421
ofinsectgenomes.422
423
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 14 of 27
Materials and Methods 424
DNA isolation 425
TheG.bimaculatuswhite-eyedmutantstrainwasrearedatTokushimaUniversity, at29±1426
˚Cand30-50%humidityundera10-hlight,14-hdarkphotoperiod.Testesofasinglemale427
adultoftheG.bimaculatuswhite-eyedmutantstrainwereusedforDNAisolationandshort-428
readsequencing.WeusedDNAfromtestesofanadditionalsingleindividualtomakealong429
readPacBiosequencinglibrarytoclosegapsinthegenomeassembly.430
Genome Assembly 431
Paired-end librarieswere generatedwith insert sizes of 375 and 500 bp, andmate-pair432
libraryweregeneratedwithinsertsizesof3,5,10,and20kb.Librariesweresequencedusing433
theIlluminaHiSeq2000andHiSeq2500sequencingplatforms.Thisyieldedatotalof127.4434
Gb of short read paired-end data, that was subsequently assembled using the de novo435
assembler Platanus (v. 1.2.1) (Kajitani et al., 2014). Scaffolding and gap closing were436
performedusingtotal138.2Gbofmate-pairdata.Afurthergapclosingstepwasperformed437
usinglongreadsgeneratedbythePacBioRSsystem.The4.3GbofPacBiosubreaddatawere438
usedtofillgapsintheassemblyusingPBjelly(v.15.8.24)(Englishetal.,2012).439
440
Repetitive Content Masking 441
Wegeneratedacustomrepeatlibraryforeachofthetwocricketgenomesbycombiningthe442
outputs from homology-based and de novo repeat identifiers, including the LTRdigest443
together with LTRharvest (Ellinghaus, Kurtz, & Willhoeft, 2008),444
RepeatModeler/RepeatClassifier (www.repeatmasker.org/RepeatModeler), MITE tracker445
(Crescente, Zavallo, Helguera, & Vanzetti, 2018), TransposonPSI446
(http://transposonpsi.sourceforge.net), and the databases SINEBase (Vassetzky &447
Kramerov,2013)andRepBase(Bao,Kojima,&Kohany,2015).Weremovedredundancies448
from the librarybymergingsequences thatweregreater than80%similarwithusearch449
(RobertC.Edgar,2010),andclassifiedthemwithRepeatClassifier.Sequencesclassifiedas450
“unknown”wereBLASTed(BLASTX)against the9,229reviewedproteinsof insects from451
UniProtKB/Swiss-Prot. Those sequences with a BLAST hit (E-value < 1e-10) against a452
proteinnotannotatedasatransposase,transposableelement,copiaprotein,ortransposon453
wereremovedfromthecustomrepeatlibrary.Thecustomrepeatlibrarywasprovidedto454
RepeatMasker version open-4.0.5 to generate the repetitive content reports, and to the455
MAKER2pipelinetomaskthegenome.456
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 15 of 27
Protein-Coding Genes Annotation 457
We performed genome annotations through three iterations of the MAKER2 (v2.31.8)458
pipeline(Holt&Yandell,2011)combiningab-initiogenemodelsandevidence-basedmodels.459
For the G. bimaculatus genome annotation, we provided the MAKER2 pipeline with the460
43,595 G. bimaculatus nucleotide sequences from NCBI, an assembled developmental461
transcriptome(Zengetal.,2013),anassembledprothoracicgangliontranscriptome(Fisher462
etal.,2018),andagenome-guided transcriptomegeneratedwithStringTie (Perteaetal.,463
2015)using84RNA-seqlibraries(accessionnumbers:XXXX)mappedtothegenomewith464
HISAT2(Kim,Langmead,&Salzberg,2015).AsalternativeESTsandproteinsequences,we465
providedMAKER2with14,391nucleotidesequencesfromL.kohalensisavailableatNCBI,466
andaninsectproteindatabaseobtainedfromUniProtKB/Swiss-Prot(UniProt,2019).467
FortheannotationoftheL.kohalensisgenome,werantheMAKER2pipelinewiththe14,391468
L.kohalensisnucleotidesequencesfromNCBI,theassembledG.bimaculatusdevelopmental469
andprothoracicgangliontranscriptomesdescribedabove,andthe43,595NCBInucleotide470
sequences.Asproteindatabases,weprovidedthe insectproteins fromUniProtKB/Swiss-471
ProtplustheproteinsthatweannotatedintheG.bimaculatusgenome.472
For both crickets, we generated ab-initio gene models with GeneMark-ES (Ter-473
Hovhannisyan,Lomsadze,Chernoff,&Borodovsky,2008) in self-trainingmode, andwith474
Augustus(Stanke&Waack,2003)trainedwithBUSCOv3(Simãoetal.,2015).Aftereachof475
the first twoMAKER2iterations,additionalgenemodelswereobtainedwithSNAP(Korf,476
2004)trainedwiththeannotatedgenes.477
Functional annotations were obtained using InterProScan (Jones et al., 2014), which478
retrievedtheInterProDomains,PFAMdomains,andGO-terms.Additionally,weranaseries479
ofBLASTroundstoassignadescriptortoeachtranscriptbasedonthebestBLASThit.The480
firstroundofBLASTwasagainstthereviewedinsectproteinsfromUniProtKB/Swiss-Prot.481
Proteinswith no significant BLAST hits (E-value < 1e-6)were later BLASTed against all482
proteinsfromUniProtKB/TrEMBL,andthosewithoutahitwithE-value<1e-6wereBLASTed483
againstallproteinsfromUniProtKB/Swiss-Prot.484
AdetailedpipelineschemeisavailableinSupplementaryFigures1&2,andthe485
annotationscriptsareavailableonGitHub486
(https://github.com/guillemylla/Crickets_Genome_Annotation).487
488
Quality Assessment 489
Genomeassemblystatisticswereobtainedwithassembly-stats(https://github.com/sanger-490
pathogens/assembly-stats).BUSCO(v3.1.0)(Simãoetal.,2015)wasusedtoassessthelevel491
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 16 of 27
ofcompletenessofthegenomeassemblies(‘-mgeno’)aswellasthatofthegeneannotations492
(‘-mtran’)atbotharthropod(‘arthropoda_odb9’)andinsect(‘insecta_odb9’)levels.493
CpGo/e Analysis 494
Weused the genome assemblies and their gene annotations from this study for the two495
cricketspecies,andretrievedpubliclyavailableannotatedgenomesfromtheother14insect496
species(SupplementaryTable1).Thegeneannotationfiles(ingffformat)wereusedto497
obtain the amino-acid and CDS sequences for each annotated protein-coding gene per498
genomeusinggffread,withoptions“-y”and“-x”respectively.TheCpGo/evaluepergenewas499
computed as the observed frequency of CpGs (fCpG) divided by the product of C and G500
frequencies(fCandfG)fCpG/fC*fGinthelongestCDSpergeneforeachofthe16studiedinsects.501
CpGo/e values larger than zero and smaller than two were retained and represented as502
densityplots(Figures2&4).503
ThedistributionsofgeneCpGo/evaluespergeneof the twocricketsand thehoneybeeA.504
melliferawerefittedwithamixtureofnormaldistributionsusingthemixtoolsRpackage505
(Benaglia,Chauveau,Hunter,&Young,2009).Thisallowedustoobtainthemeanofeach506
distribution,thestandarderrors,andtheinterceptionpointbetweenthetwodistributions,507
whichwasusedtocategorizethegenesintolowCpGo/eandhighCpGo/ebins.Forthesetwo508
bins of genes, we performed a GO-enrichment analysis (based on GO-terms previously509
obtainedusingInterProScan)ofBiologicalProcesstermsusingtheTopGOpackage(Alexa&510
Rahnenfuhrer,2019)withtheweight01algorithmandtheFisherstatistic.GO-termswitha511
p-value<0.05were plotted as word-clouds using the R package ggwordcloud (Pennec &512
Slowikowski,2018)with thesizeof thewordcorrelatedwith theproportionof the term513
withintheset.514
ForeachofthegenesbelongingtolowandhighCpGo/ecategoriesineachofthethreeinsect515
species,weretrievedtheirorthogroupidentifierfromourgenefamilyanalysis,allowingus516
toassignputativemethylationstatustoorthogroupsineachinsect.ThenweusedtheUpSet517
Rpackage(Lex,Gehlenborg,Strobelt,Vuillemot,&Pfister,2014)tocomputeanddisplaythe518
numberoforthogroupsexclusivetoeachcombinationasanUpSetplot.519
dN/dS Analysis 520
We first aligned the longestpredictedproteinproductof the single-copy-orthologsof all521
protein-codinggenesbetweenthetwocrickets(N=5,728)withMUSCLE.Then,theamino-522
acid alignments were transformed into codon-based nucleotide alignments using the523
Pal2Nalsoftware(Suyama,Torrents,&Bork,2006).Theresultingcodon-basednucleotide524
alignmentswere used to calculate the pairwise dN/dS for each gene pairwith the yn00525
algorithmimplementedinthePAMLpackage(Yang,2007).GeneswithdNordS>2were526
discarded from furtheranalysis.TheWilcoxon-Mann-Whitneystatistical testwasused to527
comparethedN/dSvaluesbetweengeneswithhighandlowCpGo/evaluesinbothinsects.528
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 17 of 27
Gene Family Expansions and Contractions 529
Using custom Python scripts (see530
https://github.com/guillemylla/Crickets_Genome_Annotation) we obtained the longest531
predictedproteinproductpergeneineachofthe16studiedinsectspeciesandgroupedthem532
intoorthogroups(whichwealsorefertohereinas“genefamilies”)usingOrthoFinderv2.3.3533
(Emms&Kelly,2019).Theorthogroups(OGs)determinedbyOrthoFinderthatcontaineda534
singlegeneper insect,namelyputativeone-to-oneorthologs,wereused forphylogenetic535
reconstruction.TheproteinswithineachorthogroupwerealignedwithMUSCLE(RobertC536
Edgar,2004)andthealignmentstrimmedwithGBlocks(Castresana,2000).Thetrimmed537
alignments were concatenated into a single meta-alignment that was used to infer the538
speciestreewithFastTree2(Price,Dehal,&Arkin,2010).539
Tocalibratethespeciestree,weusedthe“chronos”functionfromtheRpackageapev5.3540
(Paradis&Schliep,2019),settingthecommonnodebetweenBlattodeaandOrthopteraat541
248millionyears(my),theoriginofHolometabolaat345my,thecommonnodebetween542
Hemiptera and Thysanoptera at 339 my, and the ancestor of hemimetabolous and543
holometabolous insects(rootof thetree)atbetween385and395my.Thesetimepoints544
wereobtainedfromaphylogenypublishedthatwascalibratedwithseveralfossils(Misofet545
al.,2014).546
Thegenefamilyexpansion/contractionanalysiswasdonewiththeCAFEsoftware(DeBie,547
Cristianini,Demuth,&Hahn,2006).WeranCAFEusingthecalibratedspeciestreeandthe548
tablegeneratedbyOrthoFinderwiththenumberofgenesbelongingtoeachorthogroupin549
eachinsect.FollowingtheCAFEmanual,wefirstcalculatedthebirth-deathparameterswith550
theorthogroupshavinglessthan100genes.Wethencorrectedthembyassemblyquality551
andcalculatedthegeneexpansionsandcontractionsforbothlarge(>100genes)andsmall552
(≤100)genefamilies.Thisallowedustoidentifygenefamiliesthatunderwentasignificant553
(p-value<0.01) gene family expansion or contraction on each branch of the tree. We554
proceededtoobtainfunctionalinformationfromthosefamiliesexpandedonourbranches555
of interest (i.e. theoriginofOrthoptera, thebranch leading to crickets, and thebranches556
specifictoeachcricketspecies.).Tofunctionallyannotatetheorthogroupsof interest,we557
firstobtainedtheD.melanogaster identifiersof theproteinswithineachorthogroup,and558
retrievedtheFlyBaseSymbolandtheFlyBasegenesummarypergeneusingtheFlyBaseAPI559
(Thurmond et al., 2019). Additionally, we ran InterProScan on all the proteins of each560
orthogroupandretrievedallPFAMmotifsandtheGOtermstogetherwiththeirdescriptors.561
Allofthisinformationwassummarizedintabulatedfiles(SupplementaryFile2),whichwe562
usedtoidentifygeneexpansionswithpotentiallyrelevantfunctionsforinsectevolution.563
pickpocket gene family expansion 564
Thefunctionalannotationofsignificantlyexpandedgenefamiliesincricketsallowedusto565
identify an orthogroup containing orthologs ofD.melanogasterpickpocket classV genes.566
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 18 of 27
Subsequently, we retrieved the 6 additional orthogroups containing the complete set of567
pickpocket genes inD.melanogaster according to FlyBase. The protein sequences of the568
membersofthe7PickpocketorthogroupswerealignedwithMUSCLE,andthepickpocket569
gene tree obtainedwith FastTree. Following the pickpocket categorization described for570
Drosophilaspp.(Zelleetal.,2013)andtheobtainedpickpocketgenetree,weclassifiedthe571
cricketspickpocketgenesintoclassesfromItoVI.572
Tocheckforevidenceofexpressionpickpocketgenesinthecricketnervoussystem,weused573
the22RNA-seqlibrariesfromprothoracicganglion(Fisheretal.,2018)ofG.bimaculatus574
available at NCBI GEO (PRJNA376023). Reads were mapped against the G. bimaculatus575
genomewithRSEM (Li&Dewey, 2011)using STAR (Dobin et al., 2013) as themapping576
algorithm,andthenumberofexpectedcountsandFPKMswasretrievedforeachgenein577
eachlibrary.TheFPKMsofthepickpocketgenesandfruitlessisshowninSupplementary578
Table3.GeneswithasumofmorethanfiveFPKMsacrossallsampleswereconsideredto579
beexpressedinG.bimaculatusprothoracicganglion.580
Acknowledgments 581
This work was supported by Harvard University and MEXT KAKENHI (No. 221S0002;582
26292176;17H03945).Thecomputationalinfrastructureinthecloudusedforthegenome583
analysiswasfundedbyAWSCloudCreditsforResearch.TheauthorsaregratefultoHiroo584
SaiharaforhissupportinmanagementofagenomedataserveratTokushimaUniversity.585
Author contributions statement 586
GY,SN,TMandCEdesignedexperiments;TIandATconductedsequencingbyHiSeqand587
assemblingshortreadsusingthePlatanusassembler;ST,YI,TW,MFandYMperformedDNA588
isolation,gapclosingof contigsandmanualannotation;GY,TN,STandTBconductedall589
otherexperimentsandanalyses;TMandCEfundedtheproject;GYandCEwrotethepaper590
withinputfromallauthors.591
Data availability 592
ThegenomeassemblyandgeneannotationsforGryllusbimaculatusweresubmittedtoDDBJ593
andtoNCBIundertheaccessionnumber(XXXXXX).Thescriptsusedforgenomeannotation594
and analysis are available at GitHub595
(https://github.com/guillemylla/Crickets_Genome_Annotation).).596
597
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 19 of 27
References 598599
Adams,C.M.,Anderson,M.G.,Motto,D.G.,Price,M.P.,Johnson,W.A.,&Welsh,M.J.(1998).600Rippedpocketandpickpocket,novelDrosophilaDEG/ENaCsubunitsexpressedin601earlydevelopmentandinmechanosensoryneurons.InTheJournalofCellBiology602(Vol.140,pp.143-152).603
Ahn,M.Y.,Han,J.W.,Hwang,J.S.,Yun,E.Y.,&Lee,B.M.(2014).Anti-inflammatoryeffectof604glycosaminoglycanderivedfromGryllusbimaculatus(Atypeofcricket,insect)on605adjuvant-treatedchronicarthritisratmodel.InJournalofToxicologyand606EnvironmentalHealth-PartA:CurrentIssues(Vol.77,pp.1332-1345).607
Ahn,M.Y.,Han,J.W.,Kim,S.J.,Hwang,J.S.,&Yun,E.Y.(2011).Thirteen-weekoraldose608toxicitystudyofG.bimaculatusinsprague-dawleyrats.InToxicologicalResearch609(Vol.27,pp.231-240).610
Ahn,M.Y.,Hwang,J.S.,Yun,E.Y.,Kim,M.J.,&Park,K.K.(2015).Anti-agingeffectandgene611expressionprofilingofagedratstreatedwithG.bimaculatusextract.InToxicological612Research(Vol.31,pp.173-180).613
Ahn,M.Y.,Kim,M.J.,Kwon,R.H.,Hwang,J.S.,&Park,K.K.(2015).Geneexpression614profilingandinhibitionofadiposetissueaccumulationofG.bimaculatusextractin615ratsonhighfatdiet.InLipidsinHealthandDisease(Vol.14,pp.116).616
Ainsley,J.A.,Pettus,J.M.,Bosenko,D.,Gerstein,C.E.,Zinkevich,N.,Anderson,M.G.,...617Johnson,W.A.(2003).EnhancedLocomotionCausedbyLossoftheDrosophila618DEG/ENaCProteinPickpocket1.InCurrentBiology(Vol.13,pp.1557-1563).619
Alexa,A.,&Rahnenfuhrer,J.(2019).topGO:EnrichmentAnalysisforGeneOntology.R620packageversion2.36.0.621
Averhoff,W.W.,Richardson,R.H.,Starostina,E.,Kinser,R.D.,&Pikielny,C.W.(1976).622MultiplepheromonesystemcontrollingmatinginDrosophilamelanogaster.In623ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica(Vol.62473,pp.591-593).625
Bao,W.,Kojima,K.K.,&Kohany,O.(2015).RepbaseUpdate,adatabaseofrepetitive626elementsineukaryoticgenomes.InMobileDNA(Vol.6,pp.11).627
Benaglia,T.,Chauveau,D.,Hunter,D.R.,&Young,D.S.(2009).Mixtools:AnRpackagefor628analyzingfinitemixturemodels.InJournalofStatisticalSoftware(Vol.32,pp.1-29).629
Bewick,A.J.,Vogel,K.J.,Moore,A.J.,&Schmitz,R.J.(2016).EvolutionofDNAMethylation630acrossInsects.InMolecularBiologyandEvolution(Vol.34,pp.654-665).631
Bird,A.P.(1980).DNAmethylationandthefrequencyofCpGinanimalDNA.InNucleic632AcidsResearch(Vol.8,pp.1499-1504).633
Blankers,T.,Oh,K.P.,Bombarely,A.,&Shaw,K.L.(2018).Thegenomicarchitectureofa634rapidIslandradiation:Recombinationratevariation,chromosomestructure,and635genomeassemblyofthehawaiiancricketLaupala.InGenetics(Vol.209,pp.1329-6361344).637
Blankers,T.,Oh,K.P.,&Shaw,K.L.(2018).Thegeneticsofabehavioralspeciation638phenotypeinanIslandsystem.InGenes(Vol.9,pp.346).639
Camacho,J.P.M.M.,Ruiz-Ruano,F.J.,Martín-Blázquez,R.,López-León,M.D.,Cabrero,J.,640Lorite,P.,...Bakkali,M.(2015).Asteptothegiganticgenomeofthedesertlocust:641chromosomesizesandrepeatedDNAs.InChromosoma(Vol.124,pp.263-275).642
Castresana,J.(2000).SelectionofConservedBlocksfromMultipleAlignmentsforTheirUse643inPhylogeneticAnalysis.InMolecularBiologyandEvolution(Vol.17,pp.540-552).644
Chen,K.,&Gao,C.(2014).Targetedgenomemodificationtechnologiesandtheir645applicationsincropimprovements.PlantCellReports,33(4),575-583.646
Chen,Y.H.,Gols,R.,&Benrey,B.(2015).Cropdomesticationanditsimpactonnaturally647selectedtrophicinteractions.AnnuRevEntomol,60,35-58.648
Chénais,B.,Caruso,A.,Hiard,S.,&Casse,N.(2012).Theimpactoftransposableelementson649eukaryoticgenomes:Fromgenomesizeincreasetogeneticadaptationtostressful650environments.InGene(Vol.509,pp.7-15).651
Crescente,J.M.,Zavallo,D.,Helguera,M.,&Vanzetti,L.S.(2018).MITETracker:anaccurate652approachtoidentifyminiatureinverted-repeattransposableelementsinlarge653genomes.InBMCBioinformatics(Vol.19,pp.348).654
DeBie,T.,Cristianini,N.,Demuth,J.P.,&Hahn,M.W.(2006).CAFE:acomputationaltool655forthestudyofgenefamilyevolution.InBioinformatics(Vol.22,pp.1269-1271).656
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 20 of 27
Dobermann,D.,Field,L.M.,&Michaelson,L.V.(2019).Impactofheatprocessingonthe657nutritionalcontentofGryllusbimaculatus(blackcricket).InNutritionBulletin(Vol.65844,pp.116-122).659
Dobin,A.,Davis,C.A.,Schlesinger,F.,Drenkow,J.,Zaleski,C.,Jha,S.,...Gingeras,T.R.660(2013).STAR:ultrafastuniversalRNA-seqaligner.InBioinformatics(Vol.29,pp.15-66121).662
Dong-Hwan,S.,HWANG,S.-Y.,HAN,J.,KOH,S.-K.,KIM,I.,RYU,K.S.,&YUN,C.-Y.(2004).663Immune-EnhancingActivityScreeningonExtractsfromTwoCrickets,Gryllus664bimaculatusandTeleogryllusemma.InEntomologicalResearch(Vol.34,pp.207-665211).666
Donoughe,S.,&Extavour,C.G.(2015).EmbryonicdevelopmentofthecricketGryllus667bimaculatus.InDevelopmentalBiology(Vol.411,pp.140-156).668
Edgar,R.C.(2004).MUSCLE:multiplesequencealignmentwithhighaccuracyandhigh669throughput.InNucleicAcidsResearch(Vol.32,pp.1792-1797).670
Edgar,R.C.(2010).SearchandclusteringordersofmagnitudefasterthanBLAST.In671Bioinformatics(Vol.26,pp.2460-2461).672
Elango,N.,Hunt,B.G.,Goodisman,M.A.D.,&Yi,S.V.(2009).DNAmethylationis673widespreadandassociatedwithdifferentialgeneexpressionincastesofthe674honeybee,Apismellifera.InProceedingsoftheNationalAcademyofSciencesofthe675UnitedStatesofAmerica(Vol.106,pp.11206-11211).676
Ellinghaus,D.,Kurtz,S.,&Willhoeft,U.(2008).LTRharvest,anefficientandflexible677softwarefordenovodetectionofLTRretrotransposons.InBMCBioinformatics(Vol.6789,pp.18).679
Emms,D.M.,&Kelly,S.(2019).OrthoFinder:phylogeneticorthologyinferencefor680comparativegenomics.GenomeBiol,20(1),238.681
English,A.C.,Richards,S.,Han,Y.,Wang,M.,Vee,V.,Qu,J.,...Gibbs,R.A.(2012).Mindthe682gap:upgradinggenomeswithPacificBiosciencesRSlong-readsequencing683technology.PLoSOne,7(11),e47768.684
Fisher,H.P.,Pascual,M.G.,Jimenez,S.I.,Michaelson,D.A.,Joncas,C.T.,Quenzer,E.D.,685Christie,A.E.&Horch,H.W.(2018).Denovoassemblyofatranscriptomeforthe686cricketGryllusbimaculatusprothoracicganglion:Aninvertebratemodelfor687investigatingadultcentralnervoussystemcompensatoryplasticity.PLoSOne(Vol.68813,pp.e0199070).689
Gepts,P.(2004).Cropdomesticationasalong-termselectionexperiment.Plantbreeding690reviews,24(2),1-44.691
Ghosh,S.,Lee,S.-M.,Jung,C.,&Meyer-Rochow,V.(2017).Nutritionalcompositionoffive692commercialedibleinsectsinSouthKorea.JournalofAsia-PacificEntomology,20(2),693686-694.694
Gregory,T.R.(2002).Genomesizeanddevelopmentalcomplexity.Genetica,115(1),131-695146.696
Hanboonsong,Y.,Jamjanya,T.,&Durst,P.B.(2013).Six-leggedlivestock:edibleinsect697farming,collectingandmarketinginThailand.InOffice(pp.69).698
Hanrahan,S.J.,&Johnston,J.S.(2011).Newgenomesizeestimatesof134speciesof699arthropods.InChromosomeresearch:aninternationaljournalonthemolecular,700supramolecularandevolutionaryaspectsofchromosomebiology(Vol.19,pp.809-701823).702
Harrison,M.C.,Jongepier,E.,Robertson,H.M.,Arning,N.,Bitard-Feildel,T.,Chao,H.,...703Bornberg-Bauer,E.(2018).Hemimetabolousgenomesrevealmolecularbasisof704termiteeusociality.Natureecology&evolution.705
Häsemeyer,M.,Yapici,N.,Heberlein,U.,&Dickson,B.J.(2009).SensoryNeuronsinthe706DrosophilaGenitalTractRegulateFemaleReproductiveBehavior.InNeuron(Vol.70761,pp.511-518).708
Holt,C.,&Yandell,M.(2011).MAKER2:anannotationpipelineandgenome-database709managementtoolforsecond-generationgenomeprojects.InBMCBioinformatics710(Vol.12,pp.491).711
Huber,F.,Moore,T.E.T.E.,&Loher,W.(1989).Cricketbehaviorandneurobiology.565.712Hwang,B.B.,Chang,M.H.,Lee,J.H.,Heo,W.,Kim,J.K.,Pan,J.H.,...Kim,J.H.(2019).The713
edibleinsectGryllusbimaculatusprotectsagainstgut-derivedinflammatory714responsesandliverdamageinmiceafteracutealcoholexposure.InNutrients(Vol.71511,pp.857).716
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 21 of 27
Jacob,P.F.,&Hedwig,B.(2016).Acousticsignallingformateattractionincrickets:717Abdominalgangliacontrolthetimingofthecallingsongpattern.InBehavioural718BrainResearch(Vol.309,pp.51-66).719
Jones,P.,Binns,D.,Chang,H.-Y.,Fraser,M.,Li,W.,McAnulla,C.,...Hunter,S.(2014).720InterProScan5:Genome-scaleProteinFunctionClassification.InBioinformatics(pp.7211-5).722
Kainz,F.,Ewen-Campen,B.,Akam,M.,&Extavour,C.G.(2011).Notch/Deltasignallingis723notrequiredforsegmentgenerationinthebasallybranchinginsectGryllus724bimaculatus.Development,138(22),5015-5026.725
Kajitani,R.,Toshimoto,K.,Noguchi,H.,Toyoda,A.,Ogura,Y.,Okuno,M.,...Itoh,T.(2014).726Efficientdenovoassemblyofhighlyheterozygousgenomesfromwhole-genome727shotgunshortreads.GenomeRes,24(8),1384-1395.728
Kidwell,M.G.(2002).Transposableelementsandtheevolutionofgenomesizein729eukaryotes.InGenetica(Vol.115,pp.49-63).730
Kim,D.,Langmead,B.,&Salzberg,S.L.(2015).HISAT:Afastsplicedalignerwithlow731memoryrequirements.NatureMethods.732
Korf,I.(2004).Genefindinginnovelgenomes.InBMCBioinformatics(Vol.5,pp.59).733Kouřimská,L.,&Adámková,A.(2016).Nutritionalandsensoryqualityofedibleinsects.In734
NFSJournal(Vol.4,pp.22-26).735Kulkarni,A.,&Extavour,C.G.(2019).TheCricketGryllusbimaculatus:Techniquesfor736
QuantitativeandFunctionalGeneticAnalysesofCricketBiology.InEvo-Devo:Non-737modelSpeciesinCellandDevelopmentalBiology(Vol.68,pp.183-216):Springer.738
Lee,M.J.,Sung,H.Y.,Jo,H.,Kim,H.-W.,Choi,M.S.,Kwon,J.Y.,&Kang,K.(2017).Ionotropic739Receptor76bIsRequiredforGustatoryAversiontoExcessiveNa+inDrosophila.In740Moleculesandcells(Vol.40,pp.787-795).741
Lex,A.,Gehlenborg,N.,Strobelt,H.,Vuillemot,R.,&Pfister,H.(2014).UpSet:Visualization742ofintersectingsets.InIEEETransactionsonVisualizationandComputerGraphics743(Vol.20,pp.1983-1992).744
Li,B.,&Dewey,C.N.(2011).RSEM:accuratetranscriptquantificationfromRNA-Seqdata745withorwithoutareferencegenome.InBMCBioinformatics(Vol.12,pp.323).746
Liu,L.,Johnson,W.A.,&Welsh,M.J.(2003).DrosophilaDEG/ENaCpickpocketgenesare747expressedinthetrachealsystem,wheretheymaybeinvolvedinliquidclearance.In748ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica(Vol.749100,pp.2128-2133).750
Lu,B.,LaMora,A.,Sun,Y.,Welsh,M.J.,&Ben-Shahar,Y.(2012).ppk23-Dependent751ChemosensoryFunctionsContributetoCourtshipBehaviorinDrosophila752melanogaster.InM.B.Goodman(Ed.),PLoSGenetics(Vol.8,pp.e1002587).753
Lyko,F.,Ramsahoye,B.H.,&Jaenisch,R.(2000).DNAmethylationinDrosophila754melanogaster.Nature,408,538-540.755
Mi,Y.A.,Hye,J.B.,In,S.K.,Eun,J.Y.,Seung,J.K.,Hyung,S.K.,...Byung,M.L.(2005).756Genotoxicevaluationofthebiocomponentsofthecricket,Gryllusbimaculatus,using757threemutagenicitytests.InJournalofToxicologyandEnvironmentalHealth-PartA758(Vol.68,pp.2111-2118).759
Misof,B.,Liu,S.,Meusemann,K.,Peters,R.S.,Donath,A.,Mayer,C.,...Zhou,X.(2014).760Phylogenomicsresolvesthetimingandpatternofinsectevolution.InScience(Vol.761346,pp.763-767).762
Mito,T.,&Noji,S.(2008).TheTwo-SpottedCricketGryllusbimaculatus:AnEmerging763ModelforDevelopmentalandRegenerationStudies.InCSHprotocols(Vol.2008,pp.764pdb.emo110).765
Paradis,E.,&Schliep,K.(2019).Ape5.0:Anenvironmentformodernphylogeneticsand766evolutionaryanalysesinR.InR.Schwartz(Ed.),Bioinformatics(Vol.35,pp.526-767528).768
Park,S.A.,Lee,G.H.,Lee,H.Y.,Hoang,T.H.,&Chae,H.J.(2019).Glucose-loweringeffectof769Gryllusbimaculatuspowderonstreptozotocin-induceddiabetesthroughthe770AKT/mTORpathway.InFoodScienceandNutrition(Vol.8,pp.402-409).771
Pavlou,H.J.,&Goodwin,S.F.(2013).CourtshipbehaviorinDrosophilamelanogaster:772towardsa'courtshipconnectome'.InCurrentopinioninneurobiology(Vol.23,pp.77376-83).774
AllergytoCrickets:AReview,2591-95(2016).775Pennec,E.,&Slowikowski,K.(2018).ggwordcloud:AWordCloudGeomfor'ggplot2'.R776
packageversion0.5.0.URLhttps://cran.r-project.org/package=ggwordcloud.777
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 22 of 27
Pertea,M.,Pertea,G.M.,Antonescu,C.M.,Chang,T.-C.,Mendell,J.T.,&Salzberg,S.L.778(2015).StringTieenablesimprovedreconstructionofatranscriptomefromRNA-779seqreads.InNatureBiotechnology(Vol.33,pp.290-295).780
Poulsen,M.,Hu,H.,Li,C.,Chen,Z.,Xu,L.,Otani,S.,...Zhang,G.(2014).Complementary781symbiontcontributionstoplantdecompositioninafungus-farmingtermite.In782ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica(Vol.783111,pp.14500-14505).784
Price,M.N.,Dehal,P.S.,&Arkin,A.P.(2010).FastTree2–ApproximatelyMaximum-785LikelihoodTreesforLargeAlignments.InA.F.Y.Poon(Ed.),PLoSOne(Vol.5,pp.786e9490).787
Qaim,M.(2009).TheEconomicsofGeneticallyModifiedCrops.AnnualReviewofResource788Economics,1(1),665-694.789
Rezával,C.,Pavlou,H.J.,Dornan,A.J.,Chan,Y.-B.,Kravitz,E.A.,&Goodwin,S.F.(2012).790NeuralcircuitryunderlyingDrosophilafemalepostmatingbehavioralresponses.In791CurrentBiology(Vol.22,pp.1155-1165).792
Allergicrisksofconsumingedibleinsects:Asystematicreview,621700030(2018).793Ryu,H.Y.,Lee,S.,Ahn,K.S.,Kim,H.J.,Lee,S.S.,Ko,H.J.,...Song,K.S.(2016).Oraltoxicity794
studyandskinsensitizationtestofacricket.InToxicologicalResearch(Vol.32,pp.795159-173).796
Shaw,K.L.,&Lesnick,S.C.(2009).Genomiclinkageofmalesongandfemaleacoustic797preferenceQTLunderlyingarapidspeciesradiation.InProceedingsoftheNational798AcademyofSciences(Vol.106,pp.9737-9742).799
Shinmyo,Y.,Mito,T.,Matsushita,T.,Sarashina,I.,Miyawaki,K.,Ohuchi,H.,&Noji,S.(2004).800piggyBac-mediatedsomatictransformationofthetwo-spottedcricket,Gryllus801bimaculatus.InDevelopment,GrowthandDifferentiation(Vol.46,pp.343-349).802
Simão,F.A.,Waterhouse,R.M.,Ioannidis,P.,Kriventseva,E.V.,&Zdobnov,E.M.(2015).803BUSCO:assessinggenomeassemblyandannotationcompletenesswithsingle-copy804orthologs.InBioinformatics(Vol.31,pp.3210-3212).805
Smit,A.,Hubley,R.,&Grenn,P.(2015).RepeatMaskerOpen-4.0.RepeatMaskerOpen-4.0.7.806Srinroch,C.,Srisomsap,C.,Chokchaichamnankit,D.,Punyarit,P.,&Phiriyangkul,P.(2015).807
Identificationofnovelallergeninedibleinsect,Gryllusbimaculatusanditscross-808reactivitywithMacrobrachiumspp.allergens.InFoodChemistry(Vol.184,pp.160-809166).810
Stanke,M.,&Waack,S.(2003).GenepredictionwithahiddenMarkovmodelandanew811intronsubmodel.InBioinformatics(Vol.19,pp.ii215-ii225).812
Stull,V.J.,Finer,E.,Bergmans,R.S.,Febvre,H.P.,Longhurst,C.,Manter,D.K.,...Weir,T.L.813(2018).ImpactofEdibleCricketConsumptiononGutMicrobiotainHealthyAdults,814aDouble-blind,RandomizedCrossoverTrial.InScientificReports(Vol.8,pp.10762).815
Suyama,M.,Torrents,D.,&Bork,P.(2006).PAL2NAL:Robustconversionofprotein816sequencealignmentsintothecorrespondingcodonalignments.InNucleicAcids817Research(Vol.34,pp.W609-W612).818
Ter-Hovhannisyan,V.,Lomsadze,A.,Chernoff,Y.O.,&Borodovsky,M.(2008).Gene819predictioninnovelfungalgenomesusinganabinitioalgorithmwithunsupervised820training.InGenomeResearch(Vol.18,pp.1979-1990).821
Terrapon,N.,Li,C.,Robertson,H.M.,Ji,L.,Meng,X.,Booth,W.,...Liebig,J.(2014).822Moleculartracesofalternativesocialorganizationinatermitegenome.InNat823Commun(Vol.5,pp.3636).824
Thomas,G.W.C.,Dohmen,E.,Hughes,D.S.T.,Murali,S.C.,Poelchau,M.,Glastad,K.,...825Richards,S.(2020).Genecontentevolutioninthearthropods.GenomeBiol,21(1),82615.827
Thrall,P.H.,Bever,J.D.,&Burdon,J.J.(2010).Evolutionarychangeinagriculture:thepast,828presentandfuture.EvolAppl,3(5-6),405-408.829
Thurmond,J.,Goodman,J.L.,Strelets,V.B.,Attrill,H.,Gramates,L.S.,Marygold,S.J.,...830FlyBase,C.(2019).FlyBase2.0:thenextgeneration.NucleicAcidsRes,47(D1),D759-831D765.832
UniProt,C.(2019).UniProt:aworldwidehubofproteinknowledge.NucleicAcidsRes,83347(D1),D506-D515.834
VanHuis,A.,VanItterbeeck,J.,Klunder,H.,Mertens,E.,Halloran,A.,Muir,G.,&Vantomme,835P.(2013).Edibleinsects:futureprospectsforfoodandfeedsecurity:Foodand836agricultureorganizationoftheUnitedNations(FAO).837
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 23 of 27
Vassetzky,N.S.,&Kramerov,D.A.(2013).SINEBase:adatabaseandtoolforSINEanalysis.838InNucleicacidsresearch(Vol.41,pp.D83-89).839
Wang,X.,Fang,X.,Yang,P.,Jiang,X.,Jiang,F.,Zhao,D.,...Kang,L.(2014).Thelocust840genomeprovidesinsightintoswarmformationandlong-distanceflight.InNature841communications(Vol.5,pp.2957).842
Wang,Y.,Jorda,M.,Jones,P.L.,Maleszka,R.,Ling,X.,Robertson,H.M.,...Robinson,G.E.843(2006).FunctionalCpGMethylationSysteminaSocialInsect.InScience(Vol.314,844pp.645-647).845
WilsonHorch,H.,Mito,T.,Popadić,A.,Ohuchi,H.,&Noji,S.(2017).Thecricketasamodel846organism:Development,regeneration,andbehavior.InH.W.Horch,T.Mito,A.847Popadić,H.Ohuchi,&S.Noji(Eds.),TheCricketasaModelOrganism:Development,848Regeneration,andBehavior(pp.1-376).Tokyo.849
Xu,M.,&Shaw,K.L.(2019).Thegeneticsofmatingsongevolutionunderlyingrapid850speciation:Linkingquantitativevariationtocandidategenesforbehavioral851isolation.InGenetics(Vol.211,pp.1089-1104).852
Yamasaki,M.,Tenaillon,M.I.,Bi,I.V.,Schroeder,S.G.,Sanchez-Villeda,H.,Doebley,J.F.,...853McMullen,M.D.(2005).Alarge-scalescreenforartificialselectioninmaize854identifiescandidateagronomiclocifordomesticationandcropimprovement.Plant855Cell,17(11),2859-2872.856
Yang,Z.(2007).PAML4:PhylogeneticAnalysisbyMaximumLikelihood.InMolecular857BiologyandEvolution(Vol.24,pp.1586-1591).858
Ylla,G.,Piulachs,M.-D.,&Belles,X.(2018).Comparativetranscriptomicsintwoextreme859neopteransrevealgeneraltrendsintheevolutionofmoderninsects.IniScience(Vol.8604,pp.164-179).861
Zelle,K.M.,Lu,B.,Pyfrom,S.C.,&Ben-Shahar,Y.(2013).Thegeneticarchitectureof862degenerin/epithelialsodiumchannelsinDrosophila.InG3:Genes,Genomes,Genetics863(Vol.3,pp.441-450).864
Zeng,V.,Ewen-Campen,B.,Horch,H.W.,Roth,S.,Mito,T.,&Extavour,C.G.(2013).865DevelopmentalGeneDiscoveryinaHemimetabolousInsect:DeNovoAssemblyand866AnnotationofaTranscriptomefortheCricketGryllusbimaculatus.InP.K.Dearden867(Ed.),PLoSOne(Vol.8,pp.e61479).868
Zhong,L.,Hwang,R.Y.,&Tracey,W.D.(2010).PickpocketIsaDEG/ENaCProteinRequired869forMechanicalNociceptioninDrosophilaLarvae.InCurrentBiology(Vol.20,pp.870429-434).871
872
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 24 of 27
873
874
Figure1:TheG.bimaculatusgenome.A)ThecricketG.bimaculatus(topandsideviews),commonlycalledthetwo-spottedcricket,owesitsnametothetwoyellow875spotsonthebaseoftheforewings.B)CircularrepresentationoftheN50(pink)andN90(purple)scaffolds,repetitivecontentdensity(green),thehigh-(yellow)and876low-(lightblue)CpGo/evaluegenes,pickpocketgeneclusters(darkblue),andgenedensity(orange).C)Theproportionofthegenomemadeupofdifferentfamiliesof877transposableelementsisdifferentbetweenthecricketspeciesG.bimaculatusandL.kohalensis.878
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 25 of 27
879
Figure2:CpGo/edistributionacrossinsectsandfunctionalanalysis.A)The880distributionofCpGo/evalueswithintheCDSregionsdisplaysabimodaldistributioninboth881cricketsandinthehoneybeeA.mellifera.Wemodeledeachpeakwithanormaldistribution882anddefinedtheirintersection(redline)asathresholdtoseparategenesintohigh-and883low-CpGo/evaluecategories.Thesignificantlyenriched(p-value<0.05)GOtermsfromthe884genesbelongingtoeachgroupisrepresentedasawordcloudwiththefontsizeofeach885termproportionaltotheirfoldchange,andcolor-codedtoindicatefunctionalcategories:886red=RNA/transcription,blue=DNA/repair/histone/methylation,orange=887protein/translation/ribosome,green=metabolism/catabolism/biosynthesis,purple=888sensory/perception/neuro,andgrayforothers.B)UpSetplotshowingthetotalnumberof889orthogroups(OGs)belongingtoeachcategory(horizontalbarchart)andthenumberof890themthatarecommonacrossdifferentcategories(linkeddots)orexclusivetoasingle891category(unlinkeddot).Forallthreeinsects,thespecies-specificOGslargelycorrespondto892highCpGo/evaluegenes,whilelowCpGo/evaluegenesaremorelikelytohaveorthologsin893theotherspecies.Thisplotshowsonlythenon-overlappingcategoriesandthethreemost894commoncombinations;theremainingpossiblecombinationsareshownin895SupplementaryFigure4.C)One-to-oneorthologousgeneswithlowCpGo/evaluesinboth896cricketshavesignificantlylowerdN/dSvaluesthangeneswithhighCpGo/evalues.897
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 26 of 27
898
Figure3:ThepickpocketgenefamilyclassVisexpandedincrickets.pickpocketgene899treewithallthegenesbelongingtothesevenOGsthatcontaintheD.melanogaster900pickpocketgenes.AllOGspredominantlycontainmembersofasingleppkfamily,except901OG0000167,whichcontainsmembersoftwopickpocketclasses,IIandVI.Thepickpocket902classV(circularcladogram)wassignificantlyexpandedincricketsrelativetootherinsects.903
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint
Ylla et al. Page 27 of 27
904
905
Figure4:Cricketgenomesinthecontextofinsectevolution.Aphylogenetictree906including16insectspeciescalibratedatfourdifferenttimepoints(redwatchsymbols)907basedonMisofetal.(2014),suggeststhatG.bimaculatusandL.kohalensisdivergedca.89.2908Mya.Thenumberofexpanded(bluetext)andcontracted(redtext)genefamiliesisshown909foreachinsect,andforthebranchesleadingtocrickets.ThedensityplotsshowtheCpGo/e910distributionforallgenesforeachspecies.ThegenomesizeinGb,wasobtainedfromthe911genomefastafiles(SupplementaryTable1).912
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint