cricket genomes: the genomes of future foodjul 07, 2020  · 302 comparing cricket genomes to other...

27
Ylla et al. Page 1 of 27 Cricket genomes: the genomes of future food 1 Guillem Ylla 1 *, Taro Nakamura 1,2 , Takehiko Itoh 3 , Rei Kajitani 3 , Atsushi Toyoda 4,5 , Sayuri 2 Tomonari 6 , Tetsuya Bando 7 , Yoshiyasu Ishimaru 6 , Takahito Watanabe 6 , Masao Fuketa 8 , Yuji 3 Matsuoka 6,9 , Sumihare Noji 6 , Taro Mito 6 *, Cassandra G. Extavour 1,10 * 4 5 1. Department of Organismic and Evolutionary Biology, Harvard University, 6 Cambridge, USA 7 2. Current address: National Institute for Basic Biology, Okazaki, Japan 8 3. School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan 9 4. Comparative Genomics Laboratory, National Institute of Genetics, Shizuoka, Japan 10 5. Advanced Genomics Center, National Institute of Genetics, Shizuoka, Japan 11 6. Department of Bioscience and Bioindustry, Tokushima University, Tokushima, Japan 12 7. Graduate School of Medicine, Pharmacology and Dentistry, Okayama University, 13 Okayama, Japan 14 8. Graduate School of Advanced Technology and Science, Tokushima University, 15 Tokushima, Japan 16 9. Current address: Department of Biological Sciences, National University 17 of Singapore, Singapore 18 10. Department of Molecular and Cellular Biology, Harvard University, Cambridge, USA 19 20 * correspondence to [email protected], [email protected] and 21 [email protected] 22 23 . CC-BY-NC-ND 4.0 International license was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which this version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841 doi: bioRxiv preprint

Upload: others

Post on 12-Jul-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 1 of 27

Cricket genomes: the genomes of future food 1

GuillemYlla1*,TaroNakamura1,2,TakehikoItoh3,ReiKajitani3,AtsushiToyoda4,5,Sayuri2Tomonari6,TetsuyaBando7,YoshiyasuIshimaru6,TakahitoWatanabe6,MasaoFuketa8,Yuji3

Matsuoka6,9,SumihareNoji6,TaroMito6*,CassandraG.Extavour1,10*4

5

1. Department of Organismic and Evolutionary Biology, Harvard University,6Cambridge,USA7

2. Currentaddress:NationalInstituteforBasicBiology,Okazaki,Japan83. SchoolofLifeScienceandTechnology,TokyoInstituteofTechnology,Tokyo,Japan94. ComparativeGenomicsLaboratory,NationalInstituteofGenetics,Shizuoka,Japan105. AdvancedGenomicsCenter,NationalInstituteofGenetics,Shizuoka,Japan116. DepartmentofBioscienceandBioindustry,TokushimaUniversity,Tokushima,Japan127. Graduate School of Medicine, Pharmacology and Dentistry, Okayama University,13

Okayama,Japan148. Graduate School of Advanced Technology and Science, Tokushima University,15

Tokushima,Japan169. Current address: Department of Biological Sciences, National University17

ofSingapore,Singapore1810. DepartmentofMolecularandCellularBiology,HarvardUniversity,Cambridge,USA19

20

*[email protected],[email protected]@oeb.harvard.edu22

23

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 2: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 2 of 27

Abstract 2425

Cricketsarecurrentlyinfocusasapossiblesourceofanimalproteinforhumanconsumption26

asanalternativetoproteinfromvertebratelivestock.Thispracticecouldeasesomeofthe27

challengesbothofaworldwidegrowingpopulationandofenvironmentalissues.Thetwo-28

spottedMediterraneanfieldcricketGryllusbimaculatushastraditionallybeenconsumedby29

humansindifferentpartsoftheworld.Notonlyisthisconsideredgenerallysafeforhuman30

consumption, several studies also suggest that introducing crickets into one’s diet may31

confer multiple health benefits. Moreover, G. bimaculatus has been widely used as a32

laboratory research model for decades in multiple scientific fields including evolution,33

developmental biology, neurobiology, and regeneration. Here we report the sequencing,34

assemblyandannotationoftheG.bimaculatusgenome,andtheannotationofthegenomeof35

theHawaiiancricketLaupalakohalensis.Thecomparisonofthesetwocricketgenomeswith36

thoseof14additionalinsectssupportsthehypothesisthatarelativelysmallancestralinsect37

genome expanded to large sizes in many hemimetabolous lineages due to transposable38

elementactivity.BasedontheratioofobservedversusexpectedCpGsites(CpGo/e),wefind39

higherconservationandstrongerpurifyingselectionoftypicallymethylatedgenesthanof40

non-methylatedgenes.Finally,ourgenefamilyexpansionanalysisrevealsanexpansionof41

thepickpocketclassVgenefamilyinthelineageleadingtocrickets,whichwespeculatemight42

playarelevantroleincricketcourtshipbehavior,includingtheircharacteristicchirping. 43

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 3: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 3 of 27

Introduction 44

Multipleorthopteranspecies,andcricketsinparticular,arecurrentlyinfocusasasourceof45

animalproteinforhumanconsumptionandforvertebratelivestock.Insectconsumption,or46

entomophagy,iscurrentlypracticedinsomepopulations,includingsomecountrieswithin47

Africa,Asia,andSouthAmerica(Kouřimská&Adámková,2016),butisrelativelyrareinmost48

EuropeanandNorthAmericancountries.Theuseof insects forhumanconsumptionand49

animalfeedingcouldhelpbothtodecreasetheemissionofgreenhousegases,andtoreduce50

thelandextensionconsiderednecessarytofeedthegrowingworldwidepopulation.Crickets51

are especially attractive insects as a food source, since they are already found in the52

entomophagousdietsofmanycountries(VanHuisetal.,2013),andpossesshighnutritional53

value.Cricketshaveahighproportionofproteinfortheirbodyweight(>55%),andcontain54

theessentiallinoleicacidastheirmostpredominantfattyacid(Ghosh,Lee,Jung,&Meyer-55

Rochow,2017;Kouřimská&Adámková,2016;VanHuisetal.,2013).56

The two-spotted Mediterranean field cricket Gryllus bimaculatus has traditionally been57

consumed in different parts of theworld. In northeast Thailand,which recorded 20,00058

insect farmers in 2011 (Hanboonsong, Jamjanya, & Durst, 2013), it is one of the most59

marketedandconsumedinsectspecies.Studieshavereportednoevidencefortoxicological60

effectsrelatedtooralconsumptionofG.bimaculatusbyhumans(Ahn,Han,Kim,Hwang,&61

Yun,2011;Ryuetal.,2016),neitherweregenotoxiceffectsdetectedusingthreedifferent62

mutagenicity tests (Mietal.,2005).Ararebutknownhealthriskassociatedwithcricket63

consumption,however, issensitivityandallergytocrickets(Pener,2016;Ribeiro,Cunha,64

Sousa-Pinto, & Fonseca, 2018), especially in people allergic to seafood, shown by cross-65

allergies between G. bimaculatus and Macrobrachium prawns (Srinroch, Srisomsap,66

Chokchaichamnankit,Punyarit,&Phiriyangkul,2015).67

Not only is the cricketG. bimaculatus considered generally safe forhuman consumption,68

several studiesalso suggest that introducing crickets intoone’sdietmay confermultiple69

healthbenefits.WatersolublecompoundsderivedfromethanolextractsofwholeadultG.70

bimaculatus applied to cultured mouse spleen cells were reported to stimulate the71

expressionofmultiplecytokinesassociatedwithimmunecellproliferationandactivation72

(Dong-Hwanetal.,2004).RatstreatedwithethanolextractsofwholeadultG.bimaculatus73

showedsignsof reducedaging, including characteristicaging-associatedgeneexpression74

profiles,andreducedlevelsofmarkersofDNAoxidativedamage,(Ahn,Hwang,Yun,Kim,&75

Park,2015).GlycosaminoglycansderivedfromG.bimaculatuswerereportedtoelicitsome76

anti-inflammatoryeffectsinaratmodelofchronicarthritis(Ahn,Han,Hwang,Yun,&Lee,77

2014). Rats fed a diet including an ethanol extract of G. bimaculatus accumulated less78

abdominalfatandhadlowerserumglucoselevelsthancontrolanimals(Ahn,Kim,Kwon,79

Hwang, & Park, 2015). More recent studies suggest that G. bimaculatus powder has80

antidiabeticeffectsinratmodelsofTypeIdiabetes(Park,Lee,Lee,Hoang,&Chae,2019)81

and protects against acute alcoholic liver damage inmice (Hwang et al., 2019). Beyond82

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 4: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 4 of 27

rodentmodels,astudyofhealthyadulthumansubjectsshowedthattheintakeof25g/day83

of the powdered cricket speciesGrylloides sigillatus supported growth of someprobiotic84

microbiota,andcorrelatedwithreducedexpressionofthepro-imflammatorycytokineTNF-85

a(Stulletal.,2018).86

Althoughcricketsarebecomingeconomicallyimportantplayersinthefoodindustry,there87

arecurrentlynopubliclyavailableannotatedcricketgenomesfromanyof thesetypically88

consumed species. Here,we present the 1.66-Gb genome assembly and annotation ofG.89

bimaculatus, commonly known as the two-spotted cricket, a namederived from the two90

yellowspotsfoundonthebaseoftheforewingsofthisspecies(Figure1A).91

G.bimaculatushasbeenwidelyusedasalaboratoryresearchmodelfordecades,inscientific92

fieldsincludingneurobiologyandneuroethology(Fisheretal.,2018;Huber,Moore,&Loher,93

1989), evo-devo (Kainz,Ewen-Campen,Akam,&Extavour,2011),developmentalbiology94

(Donoughe&Extavour,2015),andregeneration(Mito&Noji,2008).Technicaladvantages95

of this cricket species as a researchmodel include the fact thatG. bimaculatus does not96

require cold temperatures or diapause to complete its life cycle, it is easy to rear in97

laboratoriessinceitcanbefedwithgenericinsectorotherpetfoods,itisamenabletoRNA98

interference (RNAi) and targeted genome editing (Kulkarni & Extavour, 2019), stable99

germlinetransgeniclinescanbeestablished(Shinmyoetal.,2004),andithasanextensive100

list of available experimental protocols ranging from behavioral to functional genetic101

analyses(WilsonHorch,Mito,Popadić,Ohuchi,&Noji,2017).102

Wealsoreportthefirstgenomeannotationforasecondcricketspecies,theHawaiiancricket103

Laupala kohalensis, whose genome assembly was recently made public (Blankers, Oh,104

Bombarely,&Shaw,2018).Comparingthesetwocricketgenomeswiththoseof14other105

insect species allowedus to identify three interesting features of these cricket genomes,106

someofwhichmayrelatetotheiruniquebiology.First,thedifferentialtransposableelement107

(TE)compositionbetweenthetwocricketspeciessuggestsabundantTEactivitysincethey108

divergedfromalastcommonancestor,whichourresultssuggestoccurredcirca89.2million109

yearsago (Mya).Second,basedongeneCpGdepletion,an indirectbut robustmethod to110

identifytypicallymethylatedgenes(Bewick,Vogel,Moore,&Schmitz,2016;Bird,1980),we111

find higher conservation of typically methylated genes than of non-methylated genes.112

Finally,ourgenefamilyexpansionanalysisrevealsanexpansionofthepickpocketclassV113

genefamilyinthelineageleadingtocrickets,whichwespeculatemightplayarelevantrole114

incricketcourtshipbehavior,includingtheircharacteristicchirping.115

Results 116

Gryllus bimaculatus genome assembly 117

We sequenced, assembled, andannotated the1.66-Gbhaploidgenomeof thewhite eyed118

mutant strain (Mito&Noji, 2008) of the cricketG. bimaculatus (Figure1A). 50%of the119

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 5: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 5 of 27

genome iscontainedwithin the71 longestscaffolds (L50), theshortestof themhavinga120

lengthof6.3Mb(N50),and90%ofthegenomeiscontainedwithin307scaffolds(L90).In121

comparison to other hemimetabolous genomes, and in particular, to polyneopteran122

genomes, our assembly displays high-quality scores by a number of metrics123

(Supplementary Table 1). Notably, the BUSCO scores (Simão, Waterhouse, Ioannidis,124

Kriventseva,&Zdobnov,2015)ofthisgenomeassemblyatthearthropodandinsectlevels125

are98.50%and97.00%respectively,indicatinghighcompletenessofthisgenomeassembly126

(Table 1). The low percentage of duplicated BUSCO genes (1.31%-1.81%) suggests that127

putative artifactual genomic duplication due tomis-assembly of heterozygotic regions is128

unlikely.129

130

Table1:Gryllusbimaculatusgenomeassemblystatistics.131

NumberofScaffolds 47,877

GenomeLength(nt) 1,658,007,496

GenomeLength(Gb) 1.66

Avg.scaffoldsize(Kb) 34.63

N50(Mb) 6.29

N90(Mb) 1.04

L50 71

L90 307

BUSCOScore–Arthropoda 98.50%

BUSCOScore–Insecta 97.00%

132

Annotation of two cricket genomes 133

The publicly available 1.6-Gb genome assembly of the Hawaiian cricket L. kohalensis134

(Blankers,Oh,Bombarely,etal.,2018),althoughhavinglowerassemblystatisticsthanthat135

ofG.bimaculatus(N50=0.58Mb,L90=3,483),scoreshighintermsofcompleteness,with136

BUSCO scores of 99.3% at the arthropod level and 97.80% at the insect level137

(SupplementaryTable1).138

UsingthreeiterationsoftheMAKER2pipeline(Holt&Yandell,2011),inwhichwecombined139

ab-initioandevidence-basedgenemodels,weannotatedtheprotein-codinggenesinboth140

cricketgenomes(SupplementaryFigures1&2).Weidentified17,871codinggenesand141

28,529 predicted transcripts for G. bimaculatus, and 12,767 coding genes and 13,078142

transcriptsforL.kohalensis(Table2).143

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 6: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 6 of 27

Toobtainfunctionalinsightsintotheannotatedgenes,weranInterProScan(Jonesetal.,144

2014)forallpredictedproteinsequencesandretrievedtheirInterProID,PFAMdomains,145

andGene-Ontology(GO)terms(Table2).Inaddition,weretrievedthebestsignificant146

BLASTPhit(E-value<1e-6)for70-90%oftheproteins.Takentogether,thesemethods147

predictedfunctionsfor75%and94%oftheproteinsannotatedforG.bimaculatusandL.148

kohalensisrespectively.Wecreatedanovelgraphicinterfacethroughwhichinterested149

readerscanaccess,search,BLASTanddownloadthegenomedataandannotations150

(http://34.71.36.157:3838/).151

152

Table2:GenomeannotationsummaryforthecricketsG.bimaculatusandL.kohalensis153

G.bimaculatus L.kohalensis

AnnotatedProtein-CodingGenes 17,871 12,767

AnnotatedTranscripts 28,529 13,078

%WithInterProID 59.56% 72.52%

%WithGO-terms 38.66% 47.03%

%WithPFAMmotif 62.44% 76.59%

%WithsignificantBLASTPhit 73.64% 93.23%

BUSCO-transcriptomeScore–Insecta 92.30% 87.20%

Repetitivecontent 33.69% 35.51%

TEcontent 28.94% 34.50%

GClevel 39.93% 35.58%

Abundant Repetitive DNA 154

WeusedRepeatMasker(Smit,Hubley,&Grenn,2015)todeterminethedegreeofrepetitive155

contentinthecricketgenomes,usingspecificcustomrepeatlibrariesforeachspecies.This156

approachidentified33.69%oftheG.bimaculatusgenome,and35.51%oftheL.kohalensis157

genome, as repetitive content (Supplementary File 1). InG. bimaculatus the repetitive158

contentdensitywassimilarthroughoutthegenome,withtheexceptionoftwoscaffoldsthat159

contained 1.75x-1.82x the density of repetitive content than themean of the other N90160

scaffolds(Figure1B).Transposableelements(TEs)accountedfor28.94%ofthisrepetitive161

content in theG. bimaculatus genome, and for 34.50%of the repetitive content in theL.162

kohalensisgenome.AlthoughtheoverallproportionofrepetitivecontentmadeupofTEswas163

similar between the two cricket species, the proportion of each specific TE class varied164

greatly (Figure 1C). In L. kohalensis the most abundant TE type was long interspersed165

elements(LINEs),accountingfor20.21%ofthegenome,whileinG.bimaculatusLINEsmade166

uponly8.88%ofthegenome.ThespecificLINEsubtypesLINE1andLINE3appearedata167

similarfrequencyinbothcricketgenomes(<0.5%),whiletheLINE2subtypewasoverfive168

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 7: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 7 of 27

timesmorerepresentedinL.kohalensis,covering10%ofthegenome(167Mb).Ontheother169

hand, DNA transposons accounted for 8.61%of theG. bimaculatus genome, but only for170

3.91%oftheL.kohalensisgenome.171

DNA methylation 172

CpGdepletion,calculatedastheratiobetweenobservedversustheexpectedincidenceofa173

cytosine followed by a guanine (CpGo/e), is considered a reliable indicator of DNA174

methylation. This is because spontaneous C to T mutations occur more frequently on175

methylatedCpGsthanunmethylatedCpGs(Bird,1980).Thus,genomicregionsthatundergo176

methylationareeventuallyCpG-depleted.WecalculatedtheCpGo/evalueforeachpredicted177

protein-codinggeneforthetwocricketspecies.Inbothspecies,weobservedaclearbimodal178

distributionofCpGo/evalues(Figure2A).Oneinterpretationofthisdistributionisthatthe179

peakcorrespondingtolowerCpGo/evaluescontainsgenesthataretypicallymethylated,and180

thepeakofhigherCpGo/econtainsgenesthatdonotundergoDNAmethylation.Underthis181

interpretation,somegeneshavenon-randomdifferentialDNAmethylation incrickets.To182

quantifythegenesinthetwoputativemethylationcategories,wesetaCpGo/ethresholdas183

thevalueof thepointof intersectionbetween the twonormaldistributions (Figure2A).184

Afterapplyingthiscutoff,44%ofG.bimaculatusgenesand45%ofL.kohalensisgeneswere185

identifiedasCpG-depleted.186

AGOenrichmentanalysisofthegenesaboveandbelowtheCpGo/ethresholddefinedabove187

revealedcleardifferencesinthepredictedfunctionsofgenesbelongingtoeachofthetwo188

categories.Strikingly,however,genesineachthresholdcategoryhadfunctionalsimilarities189

acrossthetwocricketspecies(Figure2A).GeneswithlowCpGo/evalues,whicharelikely190

thoseundergoingmethylation,wereenrichedforfunctionsrelatedtoDNAreplicationand191

regulation of gene expression (including transcriptional, translational, and epigenetic192

regulation),whilegeneswithhighCpGo/evalues,suggestinglittleornomethylation,tended193

tohavefunctionsrelatedtometabolism,catabolism,andsensorysystems.194

Toassesswhetherthepredicteddistinctfunctionsofhigh-andlow-CpGo/evaluegeneswere195

specifictocrickets,orwereapotentiallymoregeneraltrendofinsectswithDNAmethylation196

systems,weanalyzedthepredictedfunctionsofgeneswithdifferentCpGo/evaluesinthe197

honeybeeApismellifera.ThisbeewasthefirstinsectforwhichevidenceforDNAmethylation198

wasrobustlydescribedandstudied(Elango,Hunt,Goodisman,&Yi,2009;Y.Wangetal.,199

2006).WefoundthatinA.mellifera,CpG-depletedgeneswereenrichedforsimilarfunctions200

asthoseobservedincricketCpG-depletedgenes(26GO-termsweresignificantlyenriched201

inbothhoneybee and crickets;SupplementaryFigure3). In the sameway, highCpGo/e202

genes in both crickets and honeybee were enriched for similar functions (12 GO-terms203

commonlyenriched;SupplementaryFigure3).204

Additionally,weobservedthatgenesbelongingtothelowCpGo/epeakweremorelikelyto205

haveanorthologousgeneinanotherinsectspecies,andthatorthologwasalsomorelikely206

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 8: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 8 of 27

tobelongtothelowCpGo/epeak(Figure2BandSupplementaryFigure4).Bycontrast,207

geneswithhighCpGo/e,weremorelikelytobespecies-specific,butiftheyhadanorthologin208

anotherspecies,thisorthologwasalsolikelytohavehighCpGo/e.Thissuggeststhatgenes209

thataretypicallymethylatedtendtobemoreconservedacrossspecies,whichcouldimply210

lowevolutionaryratesandstrongselectivepressure.Totestthishypothesizedrelationship211

betweenlowCpGo/eandlowevolutionaryrates,wecomparedthedN/dSvaluesof1-to-1212

orthologousgenesbelongingtothesameCpGo/epeakbetweenthetwocricketspecies.We213

foundthatCpG-depletedgenesinbothcricketshadsignificantlylowerdN/dSvaluesthan214

non-CpG-depleted genes (p-value<0.05; Figure 2C), consistent with stronger purifying215

selectiononCpG-depletedgenes.216

Phylogenetics and gene family expansions 217

To study the genome evolution of these cricket lineages, we compared the two cricket218

genomeswiththoseof14additionalinsects,includingmembersofallmajorinsectlineages219

withspecialemphasisonhemimetabolousspecies.Foreachofthese16insectgenomes,we220

retrievedthelongestproteinpergeneandgroupedthemintoorthogroups(OGs),whichwe221

called“genefamilies”forthepurposeofthisanalysis.TheOGscontainingasingleprotein222

perinsect,namelysinglecopyorthologs,wereusedtoinferaphylogenetictreeforthese16223

species.Theobtainedspeciestreetopologywasinaccordancewiththecurrentlyunderstood224

insectphylogeny(Misofetal.,2014).Then,weusedtheMisofetal.(2014)datedphylogeny225

to calibrate our tree on four different nodes,which allowed us to estimate that the two226

cricketspeciesdivergedcirca89.2millionyearsago.227

Ourgenefamilyexpansion/contractionanalysisusing59,516OGsidentified18genefamilies228

that were significantly expanded (p-value<0.01) in the lineage leading to crickets. In229

addition,weidentifiedafurther34and33genefamilyexpansionsspecifictoG.bimaculatus230

and L. kohalensis respectively. Functional analysis of these expanded gene families231

(SupplementaryFile2)revealedthatthecricket-specificgenefamilyexpansionsincluded232

pickpocket genes,whichare involved inmechanosensation inDrosophilamelanogaster as233

describedinthefollowingsection.234

235

Expansion of pickpocket genes 236

In D. melanogaster, the complete pickpocket gene repertoire is composed of 6 classes237

containing31genes.Wefoundcricketorthologsofall31pickpocketgenesacrosssevenof238

ourOGs,andeachOGpredominantlycontainedmembersofasinglepickpocketclass.We239

used all the genes belonging to these 7 OGs to build a pickpocket gene tree, using the240

predictedpickpocketorthologsfrom16insectspecies(Figure3;SupplementaryTable2).241

Thisgenetreeallowedustoclassifythedifferentpickpocketgenesineachofthe16species.242

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 9: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 9 of 27

Thepickpocketgenefamilyappearedtobeasignificantlyexpandedgenefamilyincrickets.243

FollowingtheclassificationofpickpocketgenesusedinDrosophilaspp.(Zelle,Lu,Pyfrom,&244

Ben-Shahar,2013)wedeterminedthat thespecificgene familyexpanded incricketswas245

pickpocketclassV(Figure3).InD.melanogasterthisclasscontainseightgenes:ppk(ppk1),246

rpk (ppk2),ppk5,ppk8,ppk12,ppk17,ppk26, andppk28 (Zelle et al., 2013).Our analysis247

suggests that the class V gene family contains 15 and 14 genes inG bimaculatus and L.248

kohalensis respectively. In contrast, their closest analyzed relative, the locust Locusta249

migratoria,hasonlyfivesuchgenes.250

251

Thepickpocketgenesincricketstendedtobegroupedingenomicclusters(Figure1B).For252

instance,inG.bimaculatusnineofthe15classVpickpocketgeneswereclusteredwithina253

regionof900Kb,andfourothergenesappearedintwogroupsoftwo.IntheL.kohalensis254

genome, although this genome is more fragmented than that of G. bimaculatus255

(SupplementaryTable1),weobservedfiveclusterscontainingbetweentwoandfivegenes256

each.257

InD.melanogaster,thepickpocketgeneppk1belongstoclassVandisinvolvedinfunctions258

relatedtostimulusperceptionandmechanotransduction(Adamsetal.,1998).Forexample,259

inlarvae,thisgeneisrequiredformechanicalnociception(Zhong,Hwang,&Tracey,2010),260

andforcoordinatingrhythmiclocomotion(Ainsleyetal.,2003).ppkisexpressedinsensory261

neuronsthatalsoexpressthemalesexualbehaviordeterminerfruitless(fru)(Häsemeyer,262

Yapici,Heberlein,&Dickson,2009;Pavlou&Goodwin,2013;Rezávaletal.,2012).263

Todeterminewhetherpickpocketgenesincricketsarealsoexpressedinthenervoussystem,264

wecheckedforevidenceofexpressionofpickpocketgenesinthepubliclyavailabletheRNA-265

seqlibrariesfortheG.bimaculatusprothoracicganglion(Fisheretal.,2018).Thisanalysis266

detectedexpression(>5FPKMs)ofsixpickpocketgenes,fourofthembelongingtoclassV,in267

theG. bimaculatus nervous system. In the same RNA-seq libraries, we also detected the268

expressionoffru(SupplementaryTable3).269

270

Discussion 271

The importance of cricket genomes 272

Mostofthecropsandlivestockthathumanseathavebeendomesticatedandsubjectedto273

strong artificial selection for hundreds or even thousands of years to improve their274

characteristicsmostdesirableforhumans,includingsize,growthrate,stressresistance,and275

organoleptic properties (Y. H. Chen, Gols, & Benrey, 2015; Gepts, 2004; Thrall, Bever, &276

Burdon,2010;Yamasakietal.,2005).Incontrast,toourknowledge,cricketshaveneverbeen277

selectedbasedonanyfood-relatedcharacteristic.278

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 10: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 10 of 27

The advent of genetic engineering techniques has accelerated domestication of some279

organisms(K.Chen&Gao,2014).Thesetechniqueshavebeenused,forinstance,toimprove280

thenutritionalvalueofdifferentcrops,ortomakethemtoleranttopestsandclimatestress281

(Qaim,2009;Thralletal.,2010).Cricketsarenaturallynutritionallyrich(Ghoshetal.,2017),282

butinprinciple,theirnutritionalvaluecouldbefurtherimproved,forexamplebyincreasing283

vitamincontentorOmega-3 fattyacidsproportion. Inaddition,other issues thatpresent284

challenges to cricket farming could potentially be addressed by targeted genome285

modification,whichcanbeachievedinG.bimaculatususingZincfingernucleases,TALENs,286

or CRISPR/Cas9 REF. These challenges include sensitivity to common insect viruses,287

aggressivebehavior resulting in cannibalism, complexmating rituals, and relatively slow288

growthrate.289

Anessentialtoolforanykindofgeneticengineeringisahighqualityannotatedreference290

genome,togetherwithadeepunderstandingofthebiologyofthegivenspecies.BecauseG.291

bimaculatushasbeenusedasaresearchmodel inmultipledifferentscientificdisciplines,292

includingrearingforconsumption,issuesrelevanttoitsbiochemicalcomposition(Ghoshet293

al., 2017), human health and safety (Ahn et al., 2011; Ryu et al., 2016), putative health294

benefits(Ahnetal.,2014;Ahn,Hwang,etal.,2015;Ahn,Kim,etal.,2015;Dong-Hwanetal.,295

2004;Hwangetal.,2019;Parketal.,2019),andprocessingtechniques(Dobermann,Field,296

&Michaelson,2019)havebeenextensivelydescribed.Thus,thegenomeofG.bimaculatus297

herein described, adds to this body of biological knowledge by providing invaluable298

information thatwill be required tomaximize the potential of this cricket to become an299

increasinglysignificantpartoftheworldwidedietinthefuture.300

301

Comparing cricket genomes to other insect genomes 302

Theannotationofthesetwocricketgenomeswasdonebycombiningdenovogenemodels,303

homology-basedmethods,andtheavailableRNA-seqandESTs.Thispipelineallowedusto304

predict17,871genesintheG.bimaculatusgenome,similartothenumberofgenesreported305

forotherhemimetabolous insect genomes including the locustL.migratoria (17,307) (X.306

Wangetal.,2014)andthetermitesCryptotermessecundus(18,162)(Harrisonetal.,2018),307

Macrotermes natalensis (16,140) (Poulsen et al., 2014) and Zootermopsis nevadensis,308

(15,459) (Terrapon et al., 2014). The slightly lower number of protein-coding genes309

annotated in L. kohalensis (12,767) may be due to the lesser amount of RNA-seq data310

availableforthisspecies,whichchallengesgeneannotation.Nevertheless,theBUSCOscores311

aresimilarbetweenthetwocrickets,andtheproportionofannotatedproteinswithputative312

orthologousgenesinotherspecies(proteinswithsignificantBLASThits;seemethods)forL.313

kohalensisishigherthanforG.bimaculatus.Thissuggeststhepossibilitythatwemayhave314

successfully annotatedmost conserved genes, but that highly derived or species-specific315

genesmightbemissingfromourannotations.316

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 11: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 11 of 27

317

TEs and genome size evolution 318

Approximately35%ofthegenomeofbothcricketscorrespondstorepetitivecontent.This319

issubstantiallylessthanthe60%reportedforthegenomeofL.migratoria(X.Wangetal.,320

2014).Thislocustgenomeisoneofthelargestsequencedinsectgenomestodate(6.5Gb)321

buthasaverysimilarnumberofannotatedgenes(17,307)tothosewereportforcrickets.322

Wehypothesizethatthelargegenomesizedifferencebetweentheseorthopteranspeciesis323

duetotheTEcontent,whichhasalsobeencorrelatedwithgenomesizeinmultipleeukaryote324

species(Chénais,Caruso,Hiard,&Casse,2012;Kidwell,2002).325

Furthermore,wehypothesizethatthedifferencesintheTEcompositionbetweenthetwo326

crickets are the result of abundant and independent TE activity since their divergence327

around89.2Mya.This,togetherwiththeabsenceofevidenceforlargegenomeduplication328

eventsinthislineage, leadsustohypothesizethattheancestralorthopterangenomewas329

shorterthanthoseofthecricketsstudiedhere(1.6GbforG.bimaculatusand1.59GbforL.330

kohalensis) which are in the lowest range of orthopteran genome sizes (Hanrahan &331

Johnston, 2011). In summary, we propose that the wide range of genome sizes within332

Orthoptera,reachingashighas8.55GbinthelocustSchistocercagregaria(Camachoetal.,333

2015),islikelyduetoTEactivitysincethetimeofthelastorthopteranancestor.334

There isacleartendencyofpolyneopterangenomestobemuch longerthanthoseof the335

holometabolousgenomes(Figure4).Twocurrentlycompetinghypothesesarethat(1)the336

ancestralinsectgenomewassmall,andwasexpandedoutsideofHolometabola,and(2)the337

ancestral insectgenomewas large,anditwascompressedintheHolometabola(Gregory,338

2002).Ourobservationsareconsistentwiththefirstofthesehypotheses.339

Largergenomesizecorrelateswithslowerdevelopmentalratesinsomeplantsandanimals,340

whichishypothesizedtobeduetoaslowercelldivisionrate(Gregory,2002).Thus,onemay341

speculatethatthelargeproportionofTE-derivedDNAincricketgenomesmightnegatively342

impactthedevelopmentalrate.IfthisrepetitiveDNAproveslargelynon-functional,thenin343

principle,thedevelopmentalrate,animportantfactorforinsectfarming,mighteventually344

bemodifiablewiththeuseofgeneticengineeringtechniques.345

346

DNA Methylation 347

Most holometabolan species, including well-studied insects like D. melanogaster and348

Triboliumcastaneum,donotperformDNAmethylation,ortheydoitatverylowlevels(Lyko,349

Ramsahoye,& Jaenisch, 2000).ThehoneybeeA.melliferawasoneof the first insects for350

whichfunctionalDNAmethylationwasdescribed(Y.Wangetal.,2006).AlthoughthisDNA351

modification was initially proposed to be associated with the eusociality of these bees352

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 12: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 12 of 27

(Elangoetal.,2009),subsequentstudiesshowedthatDNAmethylationiswidespreadand353

presentindifferentinsectlineagesindependentlyofsocialbehavior(Bewicketal.,2016).354

DNAmethylationalsooccursinothernon-insectarthropods(Thomasetal.,2020).355

WhilethepreciseroleofDNAmethylationingeneexpressionregulationremainsunclear,356

ouranalysissuggeststhatcricketCpG-depletedgenes(putativelyhypermethylatedgenes)357

showsignsofpurifyingselection,tendtohaveorthologsinotherinsects,andareinvolvedin358

basicbiologicalfunctionsrelatedtoDNAreplicationandtheregulationofgeneexpression.359

These predicted functions differ from those of the non-CpG depleted genes (putatively360

hypomethylatedgenes),whichappeartobeinvolvedinsignalingpathways,metabolism,and361

catabolism.Thesepredictedfunctionalcategoriesmaybeconservedfromcricketsovercirca362

345millionyearsofevolution,aswealsodetectthesamepatterninthehoneybee.363

Taken together, these observations suggest a potential relationship between DNA364

methylation, sequence conservation, and function for many cricket genes. Nevertheless,365

basedonourdata,wecannotdeterminewhetherthemethylatedgenesarehighlyconserved366

becausetheyaremethylated,orbecausetheyperformbasicfunctionsthatmayberegulated367

byDNAmethylationevents. InthecockroachBlattellagermanica,DNAmethyltransferase368

enzymesandgeneswithlowCpGo/evaluesshowanexpressionpeakduringthematernalto369

zygotictransition(Ylla,Piulachs,&Belles,2018).Theseresultsincockroaches,togetherwith370

our observations, leads us to speculate that at least in Polyneopteran species, DNA371

methylation might contribute to the maternal zygotic transition by regulating essential372

genesinvolvedinDNAreplication,transcription,andtranslation.373

pickpocket gene expansion 374

ThepickpocketgenesbelongtotheDegenerin/epithelialNa+channel(DEG/ENaC)family,375

whichwere first identified inCaenorhabditiselegansas involved inmechanotransduction376

(Adamsetal.,1998).Thesamefamilyofionchannelswaslaterfoundinmanymulticellular377

animals,withadiverserangeoffunctionsrelatedtomechanoreceptionandfluid–electrolyte378

homeostasis(Liu,Johnson,&Welsh,2003).Mostoftheinformationontheirrolesininsects379

comes from studies inD.melanogaster. In this fruit fly,pickpocket genes are involved in380

neural functions including NaCl taste (Lee et al., 2017), pheromone detection (Averhoff,381

Richardson, Starostina, Kinser, & Pikielny, 1976), courtship behavior (Lu, LaMora, Sun,382

Welsh,&Ben-Shahar,2012),andliquidclearanceinthelarvaltrachea(Liuetal.,2003).383

In D. melanogaster adults, the abdominal ganglia mediate courtship and postmating384

behaviors through neurons expressing ppk and fru (Häsemeyer et al., 2009; Pavlou &385

Goodwin,2013;Rezávaletal.,2012).InD.melanogasterlarvae,ppkexpressionindendritic386

neuronsisrequiredtocontrolthecoordinationofrhythmiclocomotion(Ainsleyetal.,2003).387

In crickets, theabdominalgangliaare responsible fordetermining song rhythm(Jacob&388

Hedwig,2016).Moreover,wefindthatinG.bimaculatus,bothppkandfrugeneexpression389

aredetectableintheadultprothoracicganglion.Theseobservationssuggestthepossibility390

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 13: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 13 of 27

thatclassVpickpocketgenescouldbe involved insongrhythmdetermination incrickets391

throughtheirexpressioninabdominalganglia.392

This possibility is consistent with the results of multiple quantitative trait locus (QTL)393

studiesdoneincricketspecies fromthegenusLaupala,which identifiedgenomicregions394

associatedwithmatingsongrhythmvariationsandfemaleacousticpreference(Blankers,395

Oh,&Shaw,2018).The179scaffoldsthattheauthorsreportedbeingwithinoneLODofthe396

sevenQTLpeaks,containedfivepickpocketgenes,threeofthemfromclassVandtwofrom397

classIV.OneofthetwoclassIVgenesalsoappearswithinaQTLpeakofasecondexperiment398

(Blankers,Oh,Bombarely,etal.,2018;Shaw&Lesnick,2009).XuandShaw(2019)found399

thatascaffoldinaregionofLODscore1.5ofoneoftheirminorlinkagegroups(LG3)contains400

slowpoke,agenethataffectssonginterpulseinterval inD.melanogaster,andthisscaffold401

alsocontainstwoclassIIIpickpocketgenes(SupplementaryTable4).402

In summary, the roles ofpickpocket genes in controlling rhythmic locomotion, courtship403

behavior,andpheromonedetectioninD.melanogaster,theirappearanceingenomicregions404

associatedwithsongrhythmvariation inLaupala, and theirexpression inG.bimaculatus405

abdominalganglia,leadustospeculatethattheexpandedpickpocketgenefamilyincricket406

genomes could be playing a role in regulating rhythmic wing movements and sound407

perception,bothofwhicharenecessaryformating(WilsonHorchetal.,2017).Wenotethat408

XuandShaw(2019)hypothesizedthatsongproductionincricketsislikelytoberegulated409

byionchannels,andthatlocomotion,neuralmodulation,andmuscledevelopmentareall410

involved in singing (Xu& Shaw, 2019). However, further experiments,which could take411

advantage of the existing RNAi and genome modification protocols for G. bimaculatus412

(Kulkarni&Extavour,2019),willberequiredtotestthishypothesis.413

414

In conclusion, the G. bimaculatus genome assembly and annotation presented here is a415

sourceofinformationandanessentialtoolthatweanticipatewillenhancethestatusofthis416

cricketasamodernfunctionalgeneticsresearchmodel.Thisgenomemayalsoproveuseful417

to the agricultural sector, and could allow improvement of cricket nutritional value,418

productivity,andreductionofallergencontent.Annotatingasecondcricketgenome,thatof419

L. kohalensis, and comparing the two genomes, allowed us to unveil possible420

synapomorphiesofcricketgenomes,andtosuggestpotentiallygeneralevolutionarytrends421

ofinsectgenomes.422

423

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 14: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 14 of 27

Materials and Methods 424

DNA isolation 425

TheG.bimaculatuswhite-eyedmutantstrainwasrearedatTokushimaUniversity, at29±1426

˚Cand30-50%humidityundera10-hlight,14-hdarkphotoperiod.Testesofasinglemale427

adultoftheG.bimaculatuswhite-eyedmutantstrainwereusedforDNAisolationandshort-428

readsequencing.WeusedDNAfromtestesofanadditionalsingleindividualtomakealong429

readPacBiosequencinglibrarytoclosegapsinthegenomeassembly.430

Genome Assembly 431

Paired-end librarieswere generatedwith insert sizes of 375 and 500 bp, andmate-pair432

libraryweregeneratedwithinsertsizesof3,5,10,and20kb.Librariesweresequencedusing433

theIlluminaHiSeq2000andHiSeq2500sequencingplatforms.Thisyieldedatotalof127.4434

Gb of short read paired-end data, that was subsequently assembled using the de novo435

assembler Platanus (v. 1.2.1) (Kajitani et al., 2014). Scaffolding and gap closing were436

performedusingtotal138.2Gbofmate-pairdata.Afurthergapclosingstepwasperformed437

usinglongreadsgeneratedbythePacBioRSsystem.The4.3GbofPacBiosubreaddatawere438

usedtofillgapsintheassemblyusingPBjelly(v.15.8.24)(Englishetal.,2012).439

440

Repetitive Content Masking 441

Wegeneratedacustomrepeatlibraryforeachofthetwocricketgenomesbycombiningthe442

outputs from homology-based and de novo repeat identifiers, including the LTRdigest443

together with LTRharvest (Ellinghaus, Kurtz, & Willhoeft, 2008),444

RepeatModeler/RepeatClassifier (www.repeatmasker.org/RepeatModeler), MITE tracker445

(Crescente, Zavallo, Helguera, & Vanzetti, 2018), TransposonPSI446

(http://transposonpsi.sourceforge.net), and the databases SINEBase (Vassetzky &447

Kramerov,2013)andRepBase(Bao,Kojima,&Kohany,2015).Weremovedredundancies448

from the librarybymergingsequences thatweregreater than80%similarwithusearch449

(RobertC.Edgar,2010),andclassifiedthemwithRepeatClassifier.Sequencesclassifiedas450

“unknown”wereBLASTed(BLASTX)against the9,229reviewedproteinsof insects from451

UniProtKB/Swiss-Prot. Those sequences with a BLAST hit (E-value < 1e-10) against a452

proteinnotannotatedasatransposase,transposableelement,copiaprotein,ortransposon453

wereremovedfromthecustomrepeatlibrary.Thecustomrepeatlibrarywasprovidedto454

RepeatMasker version open-4.0.5 to generate the repetitive content reports, and to the455

MAKER2pipelinetomaskthegenome.456

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 15: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 15 of 27

Protein-Coding Genes Annotation 457

We performed genome annotations through three iterations of the MAKER2 (v2.31.8)458

pipeline(Holt&Yandell,2011)combiningab-initiogenemodelsandevidence-basedmodels.459

For the G. bimaculatus genome annotation, we provided the MAKER2 pipeline with the460

43,595 G. bimaculatus nucleotide sequences from NCBI, an assembled developmental461

transcriptome(Zengetal.,2013),anassembledprothoracicgangliontranscriptome(Fisher462

etal.,2018),andagenome-guided transcriptomegeneratedwithStringTie (Perteaetal.,463

2015)using84RNA-seqlibraries(accessionnumbers:XXXX)mappedtothegenomewith464

HISAT2(Kim,Langmead,&Salzberg,2015).AsalternativeESTsandproteinsequences,we465

providedMAKER2with14,391nucleotidesequencesfromL.kohalensisavailableatNCBI,466

andaninsectproteindatabaseobtainedfromUniProtKB/Swiss-Prot(UniProt,2019).467

FortheannotationoftheL.kohalensisgenome,werantheMAKER2pipelinewiththe14,391468

L.kohalensisnucleotidesequencesfromNCBI,theassembledG.bimaculatusdevelopmental469

andprothoracicgangliontranscriptomesdescribedabove,andthe43,595NCBInucleotide470

sequences.Asproteindatabases,weprovidedthe insectproteins fromUniProtKB/Swiss-471

ProtplustheproteinsthatweannotatedintheG.bimaculatusgenome.472

For both crickets, we generated ab-initio gene models with GeneMark-ES (Ter-473

Hovhannisyan,Lomsadze,Chernoff,&Borodovsky,2008) in self-trainingmode, andwith474

Augustus(Stanke&Waack,2003)trainedwithBUSCOv3(Simãoetal.,2015).Aftereachof475

the first twoMAKER2iterations,additionalgenemodelswereobtainedwithSNAP(Korf,476

2004)trainedwiththeannotatedgenes.477

Functional annotations were obtained using InterProScan (Jones et al., 2014), which478

retrievedtheInterProDomains,PFAMdomains,andGO-terms.Additionally,weranaseries479

ofBLASTroundstoassignadescriptortoeachtranscriptbasedonthebestBLASThit.The480

firstroundofBLASTwasagainstthereviewedinsectproteinsfromUniProtKB/Swiss-Prot.481

Proteinswith no significant BLAST hits (E-value < 1e-6)were later BLASTed against all482

proteinsfromUniProtKB/TrEMBL,andthosewithoutahitwithE-value<1e-6wereBLASTed483

againstallproteinsfromUniProtKB/Swiss-Prot.484

AdetailedpipelineschemeisavailableinSupplementaryFigures1&2,andthe485

annotationscriptsareavailableonGitHub486

(https://github.com/guillemylla/Crickets_Genome_Annotation).487

488

Quality Assessment 489

Genomeassemblystatisticswereobtainedwithassembly-stats(https://github.com/sanger-490

pathogens/assembly-stats).BUSCO(v3.1.0)(Simãoetal.,2015)wasusedtoassessthelevel491

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 16: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 16 of 27

ofcompletenessofthegenomeassemblies(‘-mgeno’)aswellasthatofthegeneannotations492

(‘-mtran’)atbotharthropod(‘arthropoda_odb9’)andinsect(‘insecta_odb9’)levels.493

CpGo/e Analysis 494

Weused the genome assemblies and their gene annotations from this study for the two495

cricketspecies,andretrievedpubliclyavailableannotatedgenomesfromtheother14insect496

species(SupplementaryTable1).Thegeneannotationfiles(ingffformat)wereusedto497

obtain the amino-acid and CDS sequences for each annotated protein-coding gene per498

genomeusinggffread,withoptions“-y”and“-x”respectively.TheCpGo/evaluepergenewas499

computed as the observed frequency of CpGs (fCpG) divided by the product of C and G500

frequencies(fCandfG)fCpG/fC*fGinthelongestCDSpergeneforeachofthe16studiedinsects.501

CpGo/e values larger than zero and smaller than two were retained and represented as502

densityplots(Figures2&4).503

ThedistributionsofgeneCpGo/evaluespergeneof the twocricketsand thehoneybeeA.504

melliferawerefittedwithamixtureofnormaldistributionsusingthemixtoolsRpackage505

(Benaglia,Chauveau,Hunter,&Young,2009).Thisallowedustoobtainthemeanofeach506

distribution,thestandarderrors,andtheinterceptionpointbetweenthetwodistributions,507

whichwasusedtocategorizethegenesintolowCpGo/eandhighCpGo/ebins.Forthesetwo508

bins of genes, we performed a GO-enrichment analysis (based on GO-terms previously509

obtainedusingInterProScan)ofBiologicalProcesstermsusingtheTopGOpackage(Alexa&510

Rahnenfuhrer,2019)withtheweight01algorithmandtheFisherstatistic.GO-termswitha511

p-value<0.05were plotted as word-clouds using the R package ggwordcloud (Pennec &512

Slowikowski,2018)with thesizeof thewordcorrelatedwith theproportionof the term513

withintheset.514

ForeachofthegenesbelongingtolowandhighCpGo/ecategoriesineachofthethreeinsect515

species,weretrievedtheirorthogroupidentifierfromourgenefamilyanalysis,allowingus516

toassignputativemethylationstatustoorthogroupsineachinsect.ThenweusedtheUpSet517

Rpackage(Lex,Gehlenborg,Strobelt,Vuillemot,&Pfister,2014)tocomputeanddisplaythe518

numberoforthogroupsexclusivetoeachcombinationasanUpSetplot.519

dN/dS Analysis 520

We first aligned the longestpredictedproteinproductof the single-copy-orthologsof all521

protein-codinggenesbetweenthetwocrickets(N=5,728)withMUSCLE.Then,theamino-522

acid alignments were transformed into codon-based nucleotide alignments using the523

Pal2Nalsoftware(Suyama,Torrents,&Bork,2006).Theresultingcodon-basednucleotide524

alignmentswere used to calculate the pairwise dN/dS for each gene pairwith the yn00525

algorithmimplementedinthePAMLpackage(Yang,2007).GeneswithdNordS>2were526

discarded from furtheranalysis.TheWilcoxon-Mann-Whitneystatistical testwasused to527

comparethedN/dSvaluesbetweengeneswithhighandlowCpGo/evaluesinbothinsects.528

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 17: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 17 of 27

Gene Family Expansions and Contractions 529

Using custom Python scripts (see530

https://github.com/guillemylla/Crickets_Genome_Annotation) we obtained the longest531

predictedproteinproductpergeneineachofthe16studiedinsectspeciesandgroupedthem532

intoorthogroups(whichwealsorefertohereinas“genefamilies”)usingOrthoFinderv2.3.3533

(Emms&Kelly,2019).Theorthogroups(OGs)determinedbyOrthoFinderthatcontaineda534

singlegeneper insect,namelyputativeone-to-oneorthologs,wereused forphylogenetic535

reconstruction.TheproteinswithineachorthogroupwerealignedwithMUSCLE(RobertC536

Edgar,2004)andthealignmentstrimmedwithGBlocks(Castresana,2000).Thetrimmed537

alignments were concatenated into a single meta-alignment that was used to infer the538

speciestreewithFastTree2(Price,Dehal,&Arkin,2010).539

Tocalibratethespeciestree,weusedthe“chronos”functionfromtheRpackageapev5.3540

(Paradis&Schliep,2019),settingthecommonnodebetweenBlattodeaandOrthopteraat541

248millionyears(my),theoriginofHolometabolaat345my,thecommonnodebetween542

Hemiptera and Thysanoptera at 339 my, and the ancestor of hemimetabolous and543

holometabolous insects(rootof thetree)atbetween385and395my.Thesetimepoints544

wereobtainedfromaphylogenypublishedthatwascalibratedwithseveralfossils(Misofet545

al.,2014).546

Thegenefamilyexpansion/contractionanalysiswasdonewiththeCAFEsoftware(DeBie,547

Cristianini,Demuth,&Hahn,2006).WeranCAFEusingthecalibratedspeciestreeandthe548

tablegeneratedbyOrthoFinderwiththenumberofgenesbelongingtoeachorthogroupin549

eachinsect.FollowingtheCAFEmanual,wefirstcalculatedthebirth-deathparameterswith550

theorthogroupshavinglessthan100genes.Wethencorrectedthembyassemblyquality551

andcalculatedthegeneexpansionsandcontractionsforbothlarge(>100genes)andsmall552

(≤100)genefamilies.Thisallowedustoidentifygenefamiliesthatunderwentasignificant553

(p-value<0.01) gene family expansion or contraction on each branch of the tree. We554

proceededtoobtainfunctionalinformationfromthosefamiliesexpandedonourbranches555

of interest (i.e. theoriginofOrthoptera, thebranch leading to crickets, and thebranches556

specifictoeachcricketspecies.).Tofunctionallyannotatetheorthogroupsof interest,we557

firstobtainedtheD.melanogaster identifiersof theproteinswithineachorthogroup,and558

retrievedtheFlyBaseSymbolandtheFlyBasegenesummarypergeneusingtheFlyBaseAPI559

(Thurmond et al., 2019). Additionally, we ran InterProScan on all the proteins of each560

orthogroupandretrievedallPFAMmotifsandtheGOtermstogetherwiththeirdescriptors.561

Allofthisinformationwassummarizedintabulatedfiles(SupplementaryFile2),whichwe562

usedtoidentifygeneexpansionswithpotentiallyrelevantfunctionsforinsectevolution.563

pickpocket gene family expansion 564

Thefunctionalannotationofsignificantlyexpandedgenefamiliesincricketsallowedusto565

identify an orthogroup containing orthologs ofD.melanogasterpickpocket classV genes.566

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 18: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 18 of 27

Subsequently, we retrieved the 6 additional orthogroups containing the complete set of567

pickpocket genes inD.melanogaster according to FlyBase. The protein sequences of the568

membersofthe7PickpocketorthogroupswerealignedwithMUSCLE,andthepickpocket569

gene tree obtainedwith FastTree. Following the pickpocket categorization described for570

Drosophilaspp.(Zelleetal.,2013)andtheobtainedpickpocketgenetree,weclassifiedthe571

cricketspickpocketgenesintoclassesfromItoVI.572

Tocheckforevidenceofexpressionpickpocketgenesinthecricketnervoussystem,weused573

the22RNA-seqlibrariesfromprothoracicganglion(Fisheretal.,2018)ofG.bimaculatus574

available at NCBI GEO (PRJNA376023). Reads were mapped against the G. bimaculatus575

genomewithRSEM (Li&Dewey, 2011)using STAR (Dobin et al., 2013) as themapping576

algorithm,andthenumberofexpectedcountsandFPKMswasretrievedforeachgenein577

eachlibrary.TheFPKMsofthepickpocketgenesandfruitlessisshowninSupplementary578

Table3.GeneswithasumofmorethanfiveFPKMsacrossallsampleswereconsideredto579

beexpressedinG.bimaculatusprothoracicganglion.580

Acknowledgments 581

This work was supported by Harvard University and MEXT KAKENHI (No. 221S0002;582

26292176;17H03945).Thecomputationalinfrastructureinthecloudusedforthegenome583

analysiswasfundedbyAWSCloudCreditsforResearch.TheauthorsaregratefultoHiroo584

SaiharaforhissupportinmanagementofagenomedataserveratTokushimaUniversity.585

Author contributions statement 586

GY,SN,TMandCEdesignedexperiments;TIandATconductedsequencingbyHiSeqand587

assemblingshortreadsusingthePlatanusassembler;ST,YI,TW,MFandYMperformedDNA588

isolation,gapclosingof contigsandmanualannotation;GY,TN,STandTBconductedall589

otherexperimentsandanalyses;TMandCEfundedtheproject;GYandCEwrotethepaper590

withinputfromallauthors.591

Data availability 592

ThegenomeassemblyandgeneannotationsforGryllusbimaculatusweresubmittedtoDDBJ593

andtoNCBIundertheaccessionnumber(XXXXXX).Thescriptsusedforgenomeannotation594

and analysis are available at GitHub595

(https://github.com/guillemylla/Crickets_Genome_Annotation).).596

597

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 19: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 19 of 27

References 598599

Adams,C.M.,Anderson,M.G.,Motto,D.G.,Price,M.P.,Johnson,W.A.,&Welsh,M.J.(1998).600Rippedpocketandpickpocket,novelDrosophilaDEG/ENaCsubunitsexpressedin601earlydevelopmentandinmechanosensoryneurons.InTheJournalofCellBiology602(Vol.140,pp.143-152).603

Ahn,M.Y.,Han,J.W.,Hwang,J.S.,Yun,E.Y.,&Lee,B.M.(2014).Anti-inflammatoryeffectof604glycosaminoglycanderivedfromGryllusbimaculatus(Atypeofcricket,insect)on605adjuvant-treatedchronicarthritisratmodel.InJournalofToxicologyand606EnvironmentalHealth-PartA:CurrentIssues(Vol.77,pp.1332-1345).607

Ahn,M.Y.,Han,J.W.,Kim,S.J.,Hwang,J.S.,&Yun,E.Y.(2011).Thirteen-weekoraldose608toxicitystudyofG.bimaculatusinsprague-dawleyrats.InToxicologicalResearch609(Vol.27,pp.231-240).610

Ahn,M.Y.,Hwang,J.S.,Yun,E.Y.,Kim,M.J.,&Park,K.K.(2015).Anti-agingeffectandgene611expressionprofilingofagedratstreatedwithG.bimaculatusextract.InToxicological612Research(Vol.31,pp.173-180).613

Ahn,M.Y.,Kim,M.J.,Kwon,R.H.,Hwang,J.S.,&Park,K.K.(2015).Geneexpression614profilingandinhibitionofadiposetissueaccumulationofG.bimaculatusextractin615ratsonhighfatdiet.InLipidsinHealthandDisease(Vol.14,pp.116).616

Ainsley,J.A.,Pettus,J.M.,Bosenko,D.,Gerstein,C.E.,Zinkevich,N.,Anderson,M.G.,...617Johnson,W.A.(2003).EnhancedLocomotionCausedbyLossoftheDrosophila618DEG/ENaCProteinPickpocket1.InCurrentBiology(Vol.13,pp.1557-1563).619

Alexa,A.,&Rahnenfuhrer,J.(2019).topGO:EnrichmentAnalysisforGeneOntology.R620packageversion2.36.0.621

Averhoff,W.W.,Richardson,R.H.,Starostina,E.,Kinser,R.D.,&Pikielny,C.W.(1976).622MultiplepheromonesystemcontrollingmatinginDrosophilamelanogaster.In623ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica(Vol.62473,pp.591-593).625

Bao,W.,Kojima,K.K.,&Kohany,O.(2015).RepbaseUpdate,adatabaseofrepetitive626elementsineukaryoticgenomes.InMobileDNA(Vol.6,pp.11).627

Benaglia,T.,Chauveau,D.,Hunter,D.R.,&Young,D.S.(2009).Mixtools:AnRpackagefor628analyzingfinitemixturemodels.InJournalofStatisticalSoftware(Vol.32,pp.1-29).629

Bewick,A.J.,Vogel,K.J.,Moore,A.J.,&Schmitz,R.J.(2016).EvolutionofDNAMethylation630acrossInsects.InMolecularBiologyandEvolution(Vol.34,pp.654-665).631

Bird,A.P.(1980).DNAmethylationandthefrequencyofCpGinanimalDNA.InNucleic632AcidsResearch(Vol.8,pp.1499-1504).633

Blankers,T.,Oh,K.P.,Bombarely,A.,&Shaw,K.L.(2018).Thegenomicarchitectureofa634rapidIslandradiation:Recombinationratevariation,chromosomestructure,and635genomeassemblyofthehawaiiancricketLaupala.InGenetics(Vol.209,pp.1329-6361344).637

Blankers,T.,Oh,K.P.,&Shaw,K.L.(2018).Thegeneticsofabehavioralspeciation638phenotypeinanIslandsystem.InGenes(Vol.9,pp.346).639

Camacho,J.P.M.M.,Ruiz-Ruano,F.J.,Martín-Blázquez,R.,López-León,M.D.,Cabrero,J.,640Lorite,P.,...Bakkali,M.(2015).Asteptothegiganticgenomeofthedesertlocust:641chromosomesizesandrepeatedDNAs.InChromosoma(Vol.124,pp.263-275).642

Castresana,J.(2000).SelectionofConservedBlocksfromMultipleAlignmentsforTheirUse643inPhylogeneticAnalysis.InMolecularBiologyandEvolution(Vol.17,pp.540-552).644

Chen,K.,&Gao,C.(2014).Targetedgenomemodificationtechnologiesandtheir645applicationsincropimprovements.PlantCellReports,33(4),575-583.646

Chen,Y.H.,Gols,R.,&Benrey,B.(2015).Cropdomesticationanditsimpactonnaturally647selectedtrophicinteractions.AnnuRevEntomol,60,35-58.648

Chénais,B.,Caruso,A.,Hiard,S.,&Casse,N.(2012).Theimpactoftransposableelementson649eukaryoticgenomes:Fromgenomesizeincreasetogeneticadaptationtostressful650environments.InGene(Vol.509,pp.7-15).651

Crescente,J.M.,Zavallo,D.,Helguera,M.,&Vanzetti,L.S.(2018).MITETracker:anaccurate652approachtoidentifyminiatureinverted-repeattransposableelementsinlarge653genomes.InBMCBioinformatics(Vol.19,pp.348).654

DeBie,T.,Cristianini,N.,Demuth,J.P.,&Hahn,M.W.(2006).CAFE:acomputationaltool655forthestudyofgenefamilyevolution.InBioinformatics(Vol.22,pp.1269-1271).656

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 20: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 20 of 27

Dobermann,D.,Field,L.M.,&Michaelson,L.V.(2019).Impactofheatprocessingonthe657nutritionalcontentofGryllusbimaculatus(blackcricket).InNutritionBulletin(Vol.65844,pp.116-122).659

Dobin,A.,Davis,C.A.,Schlesinger,F.,Drenkow,J.,Zaleski,C.,Jha,S.,...Gingeras,T.R.660(2013).STAR:ultrafastuniversalRNA-seqaligner.InBioinformatics(Vol.29,pp.15-66121).662

Dong-Hwan,S.,HWANG,S.-Y.,HAN,J.,KOH,S.-K.,KIM,I.,RYU,K.S.,&YUN,C.-Y.(2004).663Immune-EnhancingActivityScreeningonExtractsfromTwoCrickets,Gryllus664bimaculatusandTeleogryllusemma.InEntomologicalResearch(Vol.34,pp.207-665211).666

Donoughe,S.,&Extavour,C.G.(2015).EmbryonicdevelopmentofthecricketGryllus667bimaculatus.InDevelopmentalBiology(Vol.411,pp.140-156).668

Edgar,R.C.(2004).MUSCLE:multiplesequencealignmentwithhighaccuracyandhigh669throughput.InNucleicAcidsResearch(Vol.32,pp.1792-1797).670

Edgar,R.C.(2010).SearchandclusteringordersofmagnitudefasterthanBLAST.In671Bioinformatics(Vol.26,pp.2460-2461).672

Elango,N.,Hunt,B.G.,Goodisman,M.A.D.,&Yi,S.V.(2009).DNAmethylationis673widespreadandassociatedwithdifferentialgeneexpressionincastesofthe674honeybee,Apismellifera.InProceedingsoftheNationalAcademyofSciencesofthe675UnitedStatesofAmerica(Vol.106,pp.11206-11211).676

Ellinghaus,D.,Kurtz,S.,&Willhoeft,U.(2008).LTRharvest,anefficientandflexible677softwarefordenovodetectionofLTRretrotransposons.InBMCBioinformatics(Vol.6789,pp.18).679

Emms,D.M.,&Kelly,S.(2019).OrthoFinder:phylogeneticorthologyinferencefor680comparativegenomics.GenomeBiol,20(1),238.681

English,A.C.,Richards,S.,Han,Y.,Wang,M.,Vee,V.,Qu,J.,...Gibbs,R.A.(2012).Mindthe682gap:upgradinggenomeswithPacificBiosciencesRSlong-readsequencing683technology.PLoSOne,7(11),e47768.684

Fisher,H.P.,Pascual,M.G.,Jimenez,S.I.,Michaelson,D.A.,Joncas,C.T.,Quenzer,E.D.,685Christie,A.E.&Horch,H.W.(2018).Denovoassemblyofatranscriptomeforthe686cricketGryllusbimaculatusprothoracicganglion:Aninvertebratemodelfor687investigatingadultcentralnervoussystemcompensatoryplasticity.PLoSOne(Vol.68813,pp.e0199070).689

Gepts,P.(2004).Cropdomesticationasalong-termselectionexperiment.Plantbreeding690reviews,24(2),1-44.691

Ghosh,S.,Lee,S.-M.,Jung,C.,&Meyer-Rochow,V.(2017).Nutritionalcompositionoffive692commercialedibleinsectsinSouthKorea.JournalofAsia-PacificEntomology,20(2),693686-694.694

Gregory,T.R.(2002).Genomesizeanddevelopmentalcomplexity.Genetica,115(1),131-695146.696

Hanboonsong,Y.,Jamjanya,T.,&Durst,P.B.(2013).Six-leggedlivestock:edibleinsect697farming,collectingandmarketinginThailand.InOffice(pp.69).698

Hanrahan,S.J.,&Johnston,J.S.(2011).Newgenomesizeestimatesof134speciesof699arthropods.InChromosomeresearch:aninternationaljournalonthemolecular,700supramolecularandevolutionaryaspectsofchromosomebiology(Vol.19,pp.809-701823).702

Harrison,M.C.,Jongepier,E.,Robertson,H.M.,Arning,N.,Bitard-Feildel,T.,Chao,H.,...703Bornberg-Bauer,E.(2018).Hemimetabolousgenomesrevealmolecularbasisof704termiteeusociality.Natureecology&evolution.705

Häsemeyer,M.,Yapici,N.,Heberlein,U.,&Dickson,B.J.(2009).SensoryNeuronsinthe706DrosophilaGenitalTractRegulateFemaleReproductiveBehavior.InNeuron(Vol.70761,pp.511-518).708

Holt,C.,&Yandell,M.(2011).MAKER2:anannotationpipelineandgenome-database709managementtoolforsecond-generationgenomeprojects.InBMCBioinformatics710(Vol.12,pp.491).711

Huber,F.,Moore,T.E.T.E.,&Loher,W.(1989).Cricketbehaviorandneurobiology.565.712Hwang,B.B.,Chang,M.H.,Lee,J.H.,Heo,W.,Kim,J.K.,Pan,J.H.,...Kim,J.H.(2019).The713

edibleinsectGryllusbimaculatusprotectsagainstgut-derivedinflammatory714responsesandliverdamageinmiceafteracutealcoholexposure.InNutrients(Vol.71511,pp.857).716

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 21: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 21 of 27

Jacob,P.F.,&Hedwig,B.(2016).Acousticsignallingformateattractionincrickets:717Abdominalgangliacontrolthetimingofthecallingsongpattern.InBehavioural718BrainResearch(Vol.309,pp.51-66).719

Jones,P.,Binns,D.,Chang,H.-Y.,Fraser,M.,Li,W.,McAnulla,C.,...Hunter,S.(2014).720InterProScan5:Genome-scaleProteinFunctionClassification.InBioinformatics(pp.7211-5).722

Kainz,F.,Ewen-Campen,B.,Akam,M.,&Extavour,C.G.(2011).Notch/Deltasignallingis723notrequiredforsegmentgenerationinthebasallybranchinginsectGryllus724bimaculatus.Development,138(22),5015-5026.725

Kajitani,R.,Toshimoto,K.,Noguchi,H.,Toyoda,A.,Ogura,Y.,Okuno,M.,...Itoh,T.(2014).726Efficientdenovoassemblyofhighlyheterozygousgenomesfromwhole-genome727shotgunshortreads.GenomeRes,24(8),1384-1395.728

Kidwell,M.G.(2002).Transposableelementsandtheevolutionofgenomesizein729eukaryotes.InGenetica(Vol.115,pp.49-63).730

Kim,D.,Langmead,B.,&Salzberg,S.L.(2015).HISAT:Afastsplicedalignerwithlow731memoryrequirements.NatureMethods.732

Korf,I.(2004).Genefindinginnovelgenomes.InBMCBioinformatics(Vol.5,pp.59).733Kouřimská,L.,&Adámková,A.(2016).Nutritionalandsensoryqualityofedibleinsects.In734

NFSJournal(Vol.4,pp.22-26).735Kulkarni,A.,&Extavour,C.G.(2019).TheCricketGryllusbimaculatus:Techniquesfor736

QuantitativeandFunctionalGeneticAnalysesofCricketBiology.InEvo-Devo:Non-737modelSpeciesinCellandDevelopmentalBiology(Vol.68,pp.183-216):Springer.738

Lee,M.J.,Sung,H.Y.,Jo,H.,Kim,H.-W.,Choi,M.S.,Kwon,J.Y.,&Kang,K.(2017).Ionotropic739Receptor76bIsRequiredforGustatoryAversiontoExcessiveNa+inDrosophila.In740Moleculesandcells(Vol.40,pp.787-795).741

Lex,A.,Gehlenborg,N.,Strobelt,H.,Vuillemot,R.,&Pfister,H.(2014).UpSet:Visualization742ofintersectingsets.InIEEETransactionsonVisualizationandComputerGraphics743(Vol.20,pp.1983-1992).744

Li,B.,&Dewey,C.N.(2011).RSEM:accuratetranscriptquantificationfromRNA-Seqdata745withorwithoutareferencegenome.InBMCBioinformatics(Vol.12,pp.323).746

Liu,L.,Johnson,W.A.,&Welsh,M.J.(2003).DrosophilaDEG/ENaCpickpocketgenesare747expressedinthetrachealsystem,wheretheymaybeinvolvedinliquidclearance.In748ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica(Vol.749100,pp.2128-2133).750

Lu,B.,LaMora,A.,Sun,Y.,Welsh,M.J.,&Ben-Shahar,Y.(2012).ppk23-Dependent751ChemosensoryFunctionsContributetoCourtshipBehaviorinDrosophila752melanogaster.InM.B.Goodman(Ed.),PLoSGenetics(Vol.8,pp.e1002587).753

Lyko,F.,Ramsahoye,B.H.,&Jaenisch,R.(2000).DNAmethylationinDrosophila754melanogaster.Nature,408,538-540.755

Mi,Y.A.,Hye,J.B.,In,S.K.,Eun,J.Y.,Seung,J.K.,Hyung,S.K.,...Byung,M.L.(2005).756Genotoxicevaluationofthebiocomponentsofthecricket,Gryllusbimaculatus,using757threemutagenicitytests.InJournalofToxicologyandEnvironmentalHealth-PartA758(Vol.68,pp.2111-2118).759

Misof,B.,Liu,S.,Meusemann,K.,Peters,R.S.,Donath,A.,Mayer,C.,...Zhou,X.(2014).760Phylogenomicsresolvesthetimingandpatternofinsectevolution.InScience(Vol.761346,pp.763-767).762

Mito,T.,&Noji,S.(2008).TheTwo-SpottedCricketGryllusbimaculatus:AnEmerging763ModelforDevelopmentalandRegenerationStudies.InCSHprotocols(Vol.2008,pp.764pdb.emo110).765

Paradis,E.,&Schliep,K.(2019).Ape5.0:Anenvironmentformodernphylogeneticsand766evolutionaryanalysesinR.InR.Schwartz(Ed.),Bioinformatics(Vol.35,pp.526-767528).768

Park,S.A.,Lee,G.H.,Lee,H.Y.,Hoang,T.H.,&Chae,H.J.(2019).Glucose-loweringeffectof769Gryllusbimaculatuspowderonstreptozotocin-induceddiabetesthroughthe770AKT/mTORpathway.InFoodScienceandNutrition(Vol.8,pp.402-409).771

Pavlou,H.J.,&Goodwin,S.F.(2013).CourtshipbehaviorinDrosophilamelanogaster:772towardsa'courtshipconnectome'.InCurrentopinioninneurobiology(Vol.23,pp.77376-83).774

AllergytoCrickets:AReview,2591-95(2016).775Pennec,E.,&Slowikowski,K.(2018).ggwordcloud:AWordCloudGeomfor'ggplot2'.R776

packageversion0.5.0.URLhttps://cran.r-project.org/package=ggwordcloud.777

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 22: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 22 of 27

Pertea,M.,Pertea,G.M.,Antonescu,C.M.,Chang,T.-C.,Mendell,J.T.,&Salzberg,S.L.778(2015).StringTieenablesimprovedreconstructionofatranscriptomefromRNA-779seqreads.InNatureBiotechnology(Vol.33,pp.290-295).780

Poulsen,M.,Hu,H.,Li,C.,Chen,Z.,Xu,L.,Otani,S.,...Zhang,G.(2014).Complementary781symbiontcontributionstoplantdecompositioninafungus-farmingtermite.In782ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica(Vol.783111,pp.14500-14505).784

Price,M.N.,Dehal,P.S.,&Arkin,A.P.(2010).FastTree2–ApproximatelyMaximum-785LikelihoodTreesforLargeAlignments.InA.F.Y.Poon(Ed.),PLoSOne(Vol.5,pp.786e9490).787

Qaim,M.(2009).TheEconomicsofGeneticallyModifiedCrops.AnnualReviewofResource788Economics,1(1),665-694.789

Rezával,C.,Pavlou,H.J.,Dornan,A.J.,Chan,Y.-B.,Kravitz,E.A.,&Goodwin,S.F.(2012).790NeuralcircuitryunderlyingDrosophilafemalepostmatingbehavioralresponses.In791CurrentBiology(Vol.22,pp.1155-1165).792

Allergicrisksofconsumingedibleinsects:Asystematicreview,621700030(2018).793Ryu,H.Y.,Lee,S.,Ahn,K.S.,Kim,H.J.,Lee,S.S.,Ko,H.J.,...Song,K.S.(2016).Oraltoxicity794

studyandskinsensitizationtestofacricket.InToxicologicalResearch(Vol.32,pp.795159-173).796

Shaw,K.L.,&Lesnick,S.C.(2009).Genomiclinkageofmalesongandfemaleacoustic797preferenceQTLunderlyingarapidspeciesradiation.InProceedingsoftheNational798AcademyofSciences(Vol.106,pp.9737-9742).799

Shinmyo,Y.,Mito,T.,Matsushita,T.,Sarashina,I.,Miyawaki,K.,Ohuchi,H.,&Noji,S.(2004).800piggyBac-mediatedsomatictransformationofthetwo-spottedcricket,Gryllus801bimaculatus.InDevelopment,GrowthandDifferentiation(Vol.46,pp.343-349).802

Simão,F.A.,Waterhouse,R.M.,Ioannidis,P.,Kriventseva,E.V.,&Zdobnov,E.M.(2015).803BUSCO:assessinggenomeassemblyandannotationcompletenesswithsingle-copy804orthologs.InBioinformatics(Vol.31,pp.3210-3212).805

Smit,A.,Hubley,R.,&Grenn,P.(2015).RepeatMaskerOpen-4.0.RepeatMaskerOpen-4.0.7.806Srinroch,C.,Srisomsap,C.,Chokchaichamnankit,D.,Punyarit,P.,&Phiriyangkul,P.(2015).807

Identificationofnovelallergeninedibleinsect,Gryllusbimaculatusanditscross-808reactivitywithMacrobrachiumspp.allergens.InFoodChemistry(Vol.184,pp.160-809166).810

Stanke,M.,&Waack,S.(2003).GenepredictionwithahiddenMarkovmodelandanew811intronsubmodel.InBioinformatics(Vol.19,pp.ii215-ii225).812

Stull,V.J.,Finer,E.,Bergmans,R.S.,Febvre,H.P.,Longhurst,C.,Manter,D.K.,...Weir,T.L.813(2018).ImpactofEdibleCricketConsumptiononGutMicrobiotainHealthyAdults,814aDouble-blind,RandomizedCrossoverTrial.InScientificReports(Vol.8,pp.10762).815

Suyama,M.,Torrents,D.,&Bork,P.(2006).PAL2NAL:Robustconversionofprotein816sequencealignmentsintothecorrespondingcodonalignments.InNucleicAcids817Research(Vol.34,pp.W609-W612).818

Ter-Hovhannisyan,V.,Lomsadze,A.,Chernoff,Y.O.,&Borodovsky,M.(2008).Gene819predictioninnovelfungalgenomesusinganabinitioalgorithmwithunsupervised820training.InGenomeResearch(Vol.18,pp.1979-1990).821

Terrapon,N.,Li,C.,Robertson,H.M.,Ji,L.,Meng,X.,Booth,W.,...Liebig,J.(2014).822Moleculartracesofalternativesocialorganizationinatermitegenome.InNat823Commun(Vol.5,pp.3636).824

Thomas,G.W.C.,Dohmen,E.,Hughes,D.S.T.,Murali,S.C.,Poelchau,M.,Glastad,K.,...825Richards,S.(2020).Genecontentevolutioninthearthropods.GenomeBiol,21(1),82615.827

Thrall,P.H.,Bever,J.D.,&Burdon,J.J.(2010).Evolutionarychangeinagriculture:thepast,828presentandfuture.EvolAppl,3(5-6),405-408.829

Thurmond,J.,Goodman,J.L.,Strelets,V.B.,Attrill,H.,Gramates,L.S.,Marygold,S.J.,...830FlyBase,C.(2019).FlyBase2.0:thenextgeneration.NucleicAcidsRes,47(D1),D759-831D765.832

UniProt,C.(2019).UniProt:aworldwidehubofproteinknowledge.NucleicAcidsRes,83347(D1),D506-D515.834

VanHuis,A.,VanItterbeeck,J.,Klunder,H.,Mertens,E.,Halloran,A.,Muir,G.,&Vantomme,835P.(2013).Edibleinsects:futureprospectsforfoodandfeedsecurity:Foodand836agricultureorganizationoftheUnitedNations(FAO).837

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 23: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 23 of 27

Vassetzky,N.S.,&Kramerov,D.A.(2013).SINEBase:adatabaseandtoolforSINEanalysis.838InNucleicacidsresearch(Vol.41,pp.D83-89).839

Wang,X.,Fang,X.,Yang,P.,Jiang,X.,Jiang,F.,Zhao,D.,...Kang,L.(2014).Thelocust840genomeprovidesinsightintoswarmformationandlong-distanceflight.InNature841communications(Vol.5,pp.2957).842

Wang,Y.,Jorda,M.,Jones,P.L.,Maleszka,R.,Ling,X.,Robertson,H.M.,...Robinson,G.E.843(2006).FunctionalCpGMethylationSysteminaSocialInsect.InScience(Vol.314,844pp.645-647).845

WilsonHorch,H.,Mito,T.,Popadić,A.,Ohuchi,H.,&Noji,S.(2017).Thecricketasamodel846organism:Development,regeneration,andbehavior.InH.W.Horch,T.Mito,A.847Popadić,H.Ohuchi,&S.Noji(Eds.),TheCricketasaModelOrganism:Development,848Regeneration,andBehavior(pp.1-376).Tokyo.849

Xu,M.,&Shaw,K.L.(2019).Thegeneticsofmatingsongevolutionunderlyingrapid850speciation:Linkingquantitativevariationtocandidategenesforbehavioral851isolation.InGenetics(Vol.211,pp.1089-1104).852

Yamasaki,M.,Tenaillon,M.I.,Bi,I.V.,Schroeder,S.G.,Sanchez-Villeda,H.,Doebley,J.F.,...853McMullen,M.D.(2005).Alarge-scalescreenforartificialselectioninmaize854identifiescandidateagronomiclocifordomesticationandcropimprovement.Plant855Cell,17(11),2859-2872.856

Yang,Z.(2007).PAML4:PhylogeneticAnalysisbyMaximumLikelihood.InMolecular857BiologyandEvolution(Vol.24,pp.1586-1591).858

Ylla,G.,Piulachs,M.-D.,&Belles,X.(2018).Comparativetranscriptomicsintwoextreme859neopteransrevealgeneraltrendsintheevolutionofmoderninsects.IniScience(Vol.8604,pp.164-179).861

Zelle,K.M.,Lu,B.,Pyfrom,S.C.,&Ben-Shahar,Y.(2013).Thegeneticarchitectureof862degenerin/epithelialsodiumchannelsinDrosophila.InG3:Genes,Genomes,Genetics863(Vol.3,pp.441-450).864

Zeng,V.,Ewen-Campen,B.,Horch,H.W.,Roth,S.,Mito,T.,&Extavour,C.G.(2013).865DevelopmentalGeneDiscoveryinaHemimetabolousInsect:DeNovoAssemblyand866AnnotationofaTranscriptomefortheCricketGryllusbimaculatus.InP.K.Dearden867(Ed.),PLoSOne(Vol.8,pp.e61479).868

Zhong,L.,Hwang,R.Y.,&Tracey,W.D.(2010).PickpocketIsaDEG/ENaCProteinRequired869forMechanicalNociceptioninDrosophilaLarvae.InCurrentBiology(Vol.20,pp.870429-434).871

872

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 24: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 24 of 27

873

874

Figure1:TheG.bimaculatusgenome.A)ThecricketG.bimaculatus(topandsideviews),commonlycalledthetwo-spottedcricket,owesitsnametothetwoyellow875spotsonthebaseoftheforewings.B)CircularrepresentationoftheN50(pink)andN90(purple)scaffolds,repetitivecontentdensity(green),thehigh-(yellow)and876low-(lightblue)CpGo/evaluegenes,pickpocketgeneclusters(darkblue),andgenedensity(orange).C)Theproportionofthegenomemadeupofdifferentfamiliesof877transposableelementsisdifferentbetweenthecricketspeciesG.bimaculatusandL.kohalensis.878

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 25: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 25 of 27

879

Figure2:CpGo/edistributionacrossinsectsandfunctionalanalysis.A)The880distributionofCpGo/evalueswithintheCDSregionsdisplaysabimodaldistributioninboth881cricketsandinthehoneybeeA.mellifera.Wemodeledeachpeakwithanormaldistribution882anddefinedtheirintersection(redline)asathresholdtoseparategenesintohigh-and883low-CpGo/evaluecategories.Thesignificantlyenriched(p-value<0.05)GOtermsfromthe884genesbelongingtoeachgroupisrepresentedasawordcloudwiththefontsizeofeach885termproportionaltotheirfoldchange,andcolor-codedtoindicatefunctionalcategories:886red=RNA/transcription,blue=DNA/repair/histone/methylation,orange=887protein/translation/ribosome,green=metabolism/catabolism/biosynthesis,purple=888sensory/perception/neuro,andgrayforothers.B)UpSetplotshowingthetotalnumberof889orthogroups(OGs)belongingtoeachcategory(horizontalbarchart)andthenumberof890themthatarecommonacrossdifferentcategories(linkeddots)orexclusivetoasingle891category(unlinkeddot).Forallthreeinsects,thespecies-specificOGslargelycorrespondto892highCpGo/evaluegenes,whilelowCpGo/evaluegenesaremorelikelytohaveorthologsin893theotherspecies.Thisplotshowsonlythenon-overlappingcategoriesandthethreemost894commoncombinations;theremainingpossiblecombinationsareshownin895SupplementaryFigure4.C)One-to-oneorthologousgeneswithlowCpGo/evaluesinboth896cricketshavesignificantlylowerdN/dSvaluesthangeneswithhighCpGo/evalues.897

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 26: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 26 of 27

898

Figure3:ThepickpocketgenefamilyclassVisexpandedincrickets.pickpocketgene899treewithallthegenesbelongingtothesevenOGsthatcontaintheD.melanogaster900pickpocketgenes.AllOGspredominantlycontainmembersofasingleppkfamily,except901OG0000167,whichcontainsmembersoftwopickpocketclasses,IIandVI.Thepickpocket902classV(circularcladogram)wassignificantlyexpandedincricketsrelativetootherinsects.903

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint

Page 27: Cricket genomes: the genomes of future foodJul 07, 2020  · 302 Comparing cricket genomes to other insect genomes 303 The annotation of these two cricket genomes was done by combining

Ylla et al. Page 27 of 27

904

905

Figure4:Cricketgenomesinthecontextofinsectevolution.Aphylogenetictree906including16insectspeciescalibratedatfourdifferenttimepoints(redwatchsymbols)907basedonMisofetal.(2014),suggeststhatG.bimaculatusandL.kohalensisdivergedca.89.2908Mya.Thenumberofexpanded(bluetext)andcontracted(redtext)genefamiliesisshown909foreachinsect,andforthebranchesleadingtocrickets.ThedensityplotsshowtheCpGo/e910distributionforallgenesforeachspecies.ThegenomesizeinGb,wasobtainedfromthe911genomefastafiles(SupplementaryTable1).912

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted July 7, 2020. . https://doi.org/10.1101/2020.07.07.191841doi: bioRxiv preprint