molecular evolution of glycoside hydrolase genes in the ...bioinformatics.unl.edu › eyun ›...

15
Molecular Evolution of Glycoside Hydrolase Genes in the Western Corn Rootworm (Diabrotica virgifera virgifera) Seong-il Eyun 1 , Haichuan Wang 2 , Yannick Pauchet 5 , Richard H. ffrench-Constant 5 , Andrew K. Benson 4 , Arnubio Valencia-Jime ´ nez 2,6,7 , Etsuko N. Moriyama 1,3 , Blair D. Siegfried 2 * 1 School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America, 2 Department of Entomology, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America, 3 Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America, 4 Food Science and Technology, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America, 5 Department of Entomology, Max Planck Institute for Chemical Ecology, Jena, Germany, 6 Biosciences, University of Exeter, Penryn, United Kingdom, 7 Departamento de Produccio ´ n Agropecuaria, Facultad de Ciencias Agropecuarias, Universidad de Caldas, Manizales, Colombia Abstract Cellulose is an important nutritional resource for a number of insect herbivores. Digestion of cellulose and other polysaccharides in plant-based diets requires several types of enzymes including a number of glycoside hydrolase (GH) families. In a previous study, we showed that a single GH45 gene is present in the midgut tissue of the western corn rootworm, Diabrotica virgifera virgifera (Coleoptera: Chrysomelidae). However, the presence of multiple enzymes was also suggested by the lack of a significant biological response when the expression of the gene was silenced by RNA interference. In order to clarify the repertoire of cellulose-degrading enzymes and related GH family proteins in D. v. virgifera, we performed next-generation sequencing and assembled transcriptomes from the tissue of three different developmental stages (eggs, neonates, and third instar larvae). Results of this study revealed the presence of seventy-eight genes that potentially encode GH enzymes belonging to eight families (GH45, GH48, GH28, GH16, GH31, GH27, GH5, and GH1). The numbers of GH45 and GH28 genes identified in D. v. virgifera are among the largest in insects where these genes have been identified. Three GH family genes (GH45, GH48, and GH28) are found almost exclusively in two coleopteran superfamilies (Chrysomeloidea and Curculionoidea) among insects, indicating the possibility of their acquisitions by horizontal gene transfer rather than simple vertical transmission from ancestral lineages of insects. Acquisition of GH genes by horizontal gene transfers and subsequent lineage-specific GH gene expansion appear to have played important roles for phytophagous beetles in specializing on particular groups of host plants and in the case of D. v. virgifera, its close association with maize. Citation: Eyun S-i, Wang H, Pauchet Y, ffrench-Constant RH, Benson AK, et al. (2014) Molecular Evolution of Glycoside Hydrolase Genes in the Western Corn Rootworm (Diabrotica virgifera virgifera). PLoS ONE 9(4): e94052. doi:10.1371/journal.pone.0094052 Editor: Omprakash Mittapalli, The Ohio State University/OARDC, United States of America Received November 21, 2013; Accepted March 11, 2014; Published April 9, 2014 Copyright: ß 2014 Eyun et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This work was supported by the Consortium for Plant Biotechnology Research (CPBR Agreement GO12026-333) and Pioneer Hi-Bred International. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: Funding for this research was provided by the Consortium for Plant Biotechnology Research (Agreement GO12026-333) with matching support from Pioneer Hi-Bred International. This does not alter the authors’ adherence to PLOS ONE policies on sharing data and materials. * E-mail: [email protected] Introduction Cellulose, which is mostly synthesized by terrestrial plants and marine algae, is the most abundant organic compound on Earth. It is a simple carbohydrate polymer, consisting of repeating glucose units linked by b-1,4-glycosidic bonds. It is comprised of nanometer-thick crystalline microfibrils and highly resistant to enzymatic hydrolysis [1]. Cellulolytic fungi and bacteria have developed complex cellulase systems that efficiently hydrolyze cellulose [2]. These cellulase systems play important roles in a wide range of processes ranging from biosphere maintenance (carbon recycling) to the generation of potentially sustainable energy sources such as glucose, ethanol, hydrogen, and methane [1,3–5]. For many herbivorous, detritivorous, as well as omnivorous insects, cellulose comprises a major nutritional resource. However, endogenous cellulases were long thought to be absent in metazoans including insects. It had been wildly accepted that cellulose digestion in insects was mediated by gut-associated microbes such as mixtures of bacteria and protozoa under anaerobic conditions [6–8]. However, since the first endogenous cellulase gene was identified in the termite Reticulitermes speratus [9], many studies now account for the endogenous origin of cellulases in nematodes, insects, and some other invertebrates [10–12]. In the termite systems, where the metazoan cellulose digestion is most extensively studied, a dual (independent) or synergistic collabora- tion system among host and symbiont-mediated cellulases has been proposed [13–18]. However, understanding of the exact roles of the host and symbiotic microbiota in the complex cellulose degradation process is still emerging. Cellulase is a general term for cellulytic enzymes including three classes of hydrolytic enzymes: endoglucanases (EC 3.2.1.4), exoglucanases (cellobiohydrolases: EC 3.2.1.74 and 3.2.1.91), and b-glucosidases (cellobiases: EC 3.2.1.21). Plant cell wall digestion also requires other enzymes including pectinases and hemicellulases. All these enzymes are grouped into glycoside hydrolase (GH; EC 3.2.1.-) (also known as glycosidase or glycosyl hydrolase) families according to their amino-acid sequence similarities and their folding patterns based on the Carbohy- PLOS ONE | www.plosone.org 1 April 2014 | Volume 9 | Issue 4 | e94052

Upload: others

Post on 27-Jun-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Molecular Evolution of Glycoside Hydrolase Genes in the ...bioinformatics.unl.edu › eyun › publications › 14_PLoS... · Molecular Evolution of Glycoside Hydrolase Genes in the

Molecular Evolution of Glycoside Hydrolase Genes in theWestern Corn Rootworm (Diabrotica virgifera virgifera)Seong-il Eyun1, Haichuan Wang2, Yannick Pauchet5, Richard H. ffrench-Constant5, Andrew K. Benson4,

Arnubio Valencia-Jimenez2,6,7, Etsuko N. Moriyama1,3 , Blair D. Siegfried2*

1 School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America, 2 Department of Entomology, University of Nebraska-Lincoln,

Lincoln, Nebraska, United States of America, 3 Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America, 4 Food

Science and Technology, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America, 5 Department of Entomology, Max Planck Institute for Chemical

Ecology, Jena, Germany, 6 Biosciences, University of Exeter, Penryn, United Kingdom, 7 Departamento de Produccion Agropecuaria, Facultad de Ciencias Agropecuarias,

Universidad de Caldas, Manizales, Colombia

Abstract

Cellulose is an important nutritional resource for a number of insect herbivores. Digestion of cellulose and otherpolysaccharides in plant-based diets requires several types of enzymes including a number of glycoside hydrolase (GH)families. In a previous study, we showed that a single GH45 gene is present in the midgut tissue of the western cornrootworm, Diabrotica virgifera virgifera (Coleoptera: Chrysomelidae). However, the presence of multiple enzymes was alsosuggested by the lack of a significant biological response when the expression of the gene was silenced by RNAinterference. In order to clarify the repertoire of cellulose-degrading enzymes and related GH family proteins in D. v.virgifera, we performed next-generation sequencing and assembled transcriptomes from the tissue of three differentdevelopmental stages (eggs, neonates, and third instar larvae). Results of this study revealed the presence of seventy-eightgenes that potentially encode GH enzymes belonging to eight families (GH45, GH48, GH28, GH16, GH31, GH27, GH5, andGH1). The numbers of GH45 and GH28 genes identified in D. v. virgifera are among the largest in insects where these geneshave been identified. Three GH family genes (GH45, GH48, and GH28) are found almost exclusively in two coleopteransuperfamilies (Chrysomeloidea and Curculionoidea) among insects, indicating the possibility of their acquisitions byhorizontal gene transfer rather than simple vertical transmission from ancestral lineages of insects. Acquisition of GH genesby horizontal gene transfers and subsequent lineage-specific GH gene expansion appear to have played important roles forphytophagous beetles in specializing on particular groups of host plants and in the case of D. v. virgifera, its closeassociation with maize.

Citation: Eyun S-i, Wang H, Pauchet Y, ffrench-Constant RH, Benson AK, et al. (2014) Molecular Evolution of Glycoside Hydrolase Genes in the Western CornRootworm (Diabrotica virgifera virgifera). PLoS ONE 9(4): e94052. doi:10.1371/journal.pone.0094052

Editor: Omprakash Mittapalli, The Ohio State University/OARDC, United States of America

Received November 21, 2013; Accepted March 11, 2014; Published April 9, 2014

Copyright: � 2014 Eyun et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permitsunrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by the Consortium for Plant Biotechnology Research (CPBR Agreement GO12026-333) and Pioneer Hi-Bred International. Thefunders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: Funding for this research was provided by the Consortium for Plant Biotechnology Research (Agreement GO12026-333) with matchingsupport from Pioneer Hi-Bred International. This does not alter the authors’ adherence to PLOS ONE policies on sharing data and materials.

* E-mail: [email protected]

Introduction

Cellulose, which is mostly synthesized by terrestrial plants and

marine algae, is the most abundant organic compound on Earth.

It is a simple carbohydrate polymer, consisting of repeating

glucose units linked by b-1,4-glycosidic bonds. It is comprised of

nanometer-thick crystalline microfibrils and highly resistant to

enzymatic hydrolysis [1]. Cellulolytic fungi and bacteria have

developed complex cellulase systems that efficiently hydrolyze

cellulose [2]. These cellulase systems play important roles in a wide

range of processes ranging from biosphere maintenance (carbon

recycling) to the generation of potentially sustainable energy

sources such as glucose, ethanol, hydrogen, and methane [1,3–5].

For many herbivorous, detritivorous, as well as omnivorous

insects, cellulose comprises a major nutritional resource. However,

endogenous cellulases were long thought to be absent in

metazoans including insects. It had been wildly accepted that

cellulose digestion in insects was mediated by gut-associated

microbes such as mixtures of bacteria and protozoa under

anaerobic conditions [6–8]. However, since the first endogenous

cellulase gene was identified in the termite Reticulitermes speratus [9],

many studies now account for the endogenous origin of cellulases

in nematodes, insects, and some other invertebrates [10–12]. In

the termite systems, where the metazoan cellulose digestion is most

extensively studied, a dual (independent) or synergistic collabora-

tion system among host and symbiont-mediated cellulases has

been proposed [13–18]. However, understanding of the exact roles

of the host and symbiotic microbiota in the complex cellulose

degradation process is still emerging.

Cellulase is a general term for cellulytic enzymes including three

classes of hydrolytic enzymes: endoglucanases (EC 3.2.1.4),

exoglucanases (cellobiohydrolases: EC 3.2.1.74 and 3.2.1.91),

and b-glucosidases (cellobiases: EC 3.2.1.21). Plant cell wall

digestion also requires other enzymes including pectinases and

hemicellulases. All these enzymes are grouped into glycoside

hydrolase (GH; EC 3.2.1.-) (also known as glycosidase or glycosyl

hydrolase) families according to their amino-acid sequence

similarities and their folding patterns based on the Carbohy-

PLOS ONE | www.plosone.org 1 April 2014 | Volume 9 | Issue 4 | e94052

Page 2: Molecular Evolution of Glycoside Hydrolase Genes in the ...bioinformatics.unl.edu › eyun › publications › 14_PLoS... · Molecular Evolution of Glycoside Hydrolase Genes in the

drate-Active enZymes Database (CAZy, http://www.cazy.org)

[19]. b-glucosidases (GH family 1) are found universally in all

domains of organisms [20]. While insects lack endogenous

exoglucanases ([11] but see, e.g., [21]), genes encoding endoglu-

canases and other GH family enzymes have been identified from a

number of phytophagous coleopterans belonging to the superfam-

ilies Chrysomeloidea, which includes long-horned beetles and leaf

beetles, and Curculionoidea (weevils) [11,22–24]. A b-1,4-

endoglucanase gene belonging to the GH family 9 was also

isolated and characterized from the red flour beetle Tribolium

castaneum (Coleoptera: Tenebrionidae) [25].

Recently, we have cloned and characterized a novel b-1,4-

endoglucanase gene (DvvENGaseI, JQ755253) belonging to the GH

family 45 from the western corn rootworm Diabrotica virgifera

virgifera (Coleoptera: Chrysomelidae), an important insect pest of

maize (Zea mays L.) in the United States [26,27]. We showed that

suppression of DvvENGaseI expression by RNA interference

(RNAi) resulted in only slight developmental delays suggesting

that this gene might be a part of the larger system of cellulose

degrading enzymes [26]. The goal of this study is focused on the

exploration of genetic diversity among GH family genes in D. v.

virgifera, especially focusing on its larval stages. In order to identify

the diversity of GH family genes encoding plant cell wall

degrading and related enzymes expressed in D. v. virgifera larvae,

we sequenced the transcriptomes covering three different devel-

opmental stages (eggs, neonates, and midgut from third instar

larvae) using next-generation technologies. We identified eight

types of GH family genes that encode b-1,4-endoglucanases

(GH45, GH48, and GH5) as well as a pectinase (GH28), an endo-

1,3-b-glucanase (GH16), an a-galactosidase (GH27), an a-gluco-

sidase (GH31), and a b-glucosidase (GH1). We found large

numbers of GH45 and GH28 genes from the D. v. virgifera

transcriptomes, among the largest so far known from coleopteran

species studied. Our analyses also suggested multiple horizontal

transfer events during the evolution of GH45, GH48, and GH28

genes from bacteria or fungi to the common ancestor of

chrysomelid and curculionid beetles as well as to other herbivorous

insects. Acquisition and subsequent expansion of GH gene copies

in phytophagous beetle lineages may have been adaptive and have

played important roles for their specialization in feeding on

particular host plants.

Results and Discussion

Sequencing and de novo Assembly of D. v. virgiferaTranscriptomes

Using Illumina paired-end as well as 454 Titanium sequencing

technologies, in total ,700 gigabases were sequenced from cDNA

prepared from eggs (15,162,017 Illumina paired-end reads after

filtering), neonates (721,697,288 Illumina paired-end reads after

filtering), and midguts of third instar larvae (44,852,488 Illumina

paired-end reads and 415,742 Roche 454 reads, both after

filtering) (see Table S1 for details). De novo transcriptome assembly

was performed using Trinity [28] for each of three samples as well

as for the pooled dataset (see Materials and Methods and Tables

S1, S2, and S3 for the comparative analysis of assembly programs

and other details). The D. v. virgifera transcriptome assembled from

the pooled dataset included 163,871 contigs (the average length:

914 bp) (Table 1).

Identification of GH Family Genes from D. v. virgiferaTranscriptomes

A total of seventy eight potential genes belonging to eight GH

families were identified from our D. v. virgifera transcriptome. In

Figure 1, numbers of the genes for these GH families found in D. v.

virgifera are compared with those found in other coleopteran

species. While the enzymes encoded by GH45, GH48, and GH5

family genes are known to have b-1,4-endoglucanase (EC. 3.2.1.4)

activity, GH28 genes encode a pectolytic enzyme, polygalacturo-

nase (EC 3.2.1.15) [19]. GH16 family genes encode a laminar-

inase, b-1,3-glucanase (EC 3.2.1.39). We also found genes

encoding GH27 (a-galactosidase, EC 3.2.1.22), GH31 (a-glucosi-

dase, EC 3.2.1.20), and GH1 (b-glucosidase, EC 3.2.1.21) families.

GH45 FamilyEleven GH45 family gene candidates were identified from the

D. v. virgifera transcriptome with ten of them covering the entire

coding regions (615–741 bp or 204–246 amino acids (AA); Figure

S1A). The partial sequence (GH45-6) was also confirmed in the

draft D. v. virgifera genome. Four of them (GH45-1, GH45-4,

GH45-7, and GH45-10) were highly expressed (.100 reads per

kilobase of per million mapped reads or RPKM in the neonate

and third-instar larval midgut samples but not expressed in the egg

samples (Table S4). We have previously identified GH45-7 as

DvvENGaseI (JQ755253) [26]. This gene exhibits the highest

expression among the eleven GH45 family genes and also the

highest among all GH genes identified in the present study (Table

S4). Note that its gene and protein expressions in D. v. virgifera

larvae were also confirmed in our previous study [26].

GH45 family genes have been described from a number of

coleopteran species belonging to the suborder Polyphaga (e.g.,

[23,29–31]). Similarity searches against the NCBI (National

Center for Biotechnology and Information) non-redundant (NR)

protein database as well as ten complete insect genomes confirmed

that within insects, GH45 family genes are found only in two

polyphagan coleopteran superfamilies, Chrysomeloidea and Cur-

culionoidea. As shown in Figure 1, multiple GH45 genes have

been identified in some species. D. v. virgifera has the largest known

number of GH45 genes (11 genes) among coleopteran species, and

probably among any known invertebrates where this gene exists.

In addition to these coleopteran sequences, a sequence similar

to the GH45 family has been identified from the Antarctic

springtail Cryptopygus antarcticus [12], which belongs to one of the

basal hexapodan orders, Collembola [32]. Another sequence

similar to the GH45 family was reported from the water bear

Hypsibius dujardini (phylum Tardigrada, a sister group of arthro-

pods) [33]. GH45 family genes have also been reported among

various metazoans, from protists symbiotic to wood-feeding

termites and a cockroach [34,35] to plant-parasitic nematodes

and mollusks [10,36–38]. In order to understand the evolutionary

process that has led to the diversity of coleopteran GH45 family

genes, a maximum-likelihood phylogeny was reconstructed

including GH45 family proteins from eleven coleopteran species

as well as other metazoans mentioned above, fungi, and bacteria

(Figure 2). Our phylogenetic analysis suggests that all coleopteran

GH45 family genes are monophyletic although the support was

weak (#66% bootstrap supports). Based on currently available

sequences, several species-specific gene duplications were found in

coleopteran species (shown with blue branches in Figure 2). While

all bacterial GH45 family proteins, except for sequences from

Myxococcus stipitatus and uncultured bacterium (ADV57513.1),

formed a monophyletic group, relationships among fungal and

metazoan sequences were unresolved. Although the exact timings

are not clear, multiple horizontal gene transfer (HGT) events are

likely to have involved in the evolution of metazoan GH45 family

genes.

As shown in [22], except for a clade of curculionid proteins

(Group 1 in Figure 2), the putative catalytic nucleophile and

Western Corn Rootworm Glycoside Hydrolase Genes

PLOS ONE | www.plosone.org 2 April 2014 | Volume 9 | Issue 4 | e94052

Page 3: Molecular Evolution of Glycoside Hydrolase Genes in the ...bioinformatics.unl.edu › eyun › publications › 14_PLoS... · Molecular Evolution of Glycoside Hydrolase Genes in the

proton donor positions of GH45 proteins are highly conserved

with Asp (Gln is found in Group 1 proteins). This is also the case

for all but one D. v. virgifera GH45 proteins (Val is found in GH45-

9; Figure S1A). Other exceptional cases include: Asn in H. dujardini

(Tardigrada), Thr in Leptosphaeria maculans (fungus), Ser in Alternaria

alternate (fungus), and Glu in Myxococcus stipitatus (bacterium). In

addition to possessing the conserved catalytic residues, the

majority of GH45 genes identified in D. v. virgifera showed

significant expression levels in neonate and third-larval midgut

transciptomes (Table S4). These proteins probably share similar

functions and help explain our previous results with RNAi-

Table 1. Summary of the D. v. virgifera transcriptome assembly using the pooled dataset.

Samples Egg, neonate, and third-instar larval midgut

Number of paired-end reads before filtering 1,462.26106 (144,6906106 bp)

Number of paired-end reads after filtering 781.76106 (77,3936106 bp)

Assembly program used Trinity (2013-02-25)

Total number of contigs 163,871

Average contig length (range) 914 bp (201–31,064 bp)

N50 length 1,396 bp

doi:10.1371/journal.pone.0094052.t001

Figure 1. Distribution of glycoside hydrolase family genes among polyphagan coleopterans. Numbers for GH5, GH45, GH48, GH28, andGH11 genes are taken from [22] (marked with *). Exceptions are for D. v. virgifera (this study; numbers in square brackets are for partial sequences), P.cochleariae GH45, GH28 [83], and GH11 [24], G. atrocyanea GH48 [39], D. ponderosae [45], C. sordidus (preliminary results from transcriptomes areshown in parentheses; A. Valencia-Jimenez, personal communication), O. sulcatus GH48 (CAH25542.1), T. castaneum [25,70], and P. chalceus (thisstudy, searched from the transcriptome [69]). Numbers with { indicate that they are based on the search results from the NCBI NR database or fromliteratures. Since neither genomes nor transcriptomes are available for these species, the actual numbers of their GH family genes are not known. ForGH5 genes, their subfamilies are indicated with ‘s’ followed by the number (e.g., s2 for subfamily 2). Accession numbers for all coleopteran GH genesincluded in this study are found in Table S5. The taxonomical relationship is based on [68]. ‘.’: not determined. For other insect groups, only existence(+) or absence (2) is shown.doi:10.1371/journal.pone.0094052.g001

Western Corn Rootworm Glycoside Hydrolase Genes

PLOS ONE | www.plosone.org 3 April 2014 | Volume 9 | Issue 4 | e94052

Page 4: Molecular Evolution of Glycoside Hydrolase Genes in the ...bioinformatics.unl.edu › eyun › publications › 14_PLoS... · Molecular Evolution of Glycoside Hydrolase Genes in the

Western Corn Rootworm Glycoside Hydrolase Genes

PLOS ONE | www.plosone.org 4 April 2014 | Volume 9 | Issue 4 | e94052

Page 5: Molecular Evolution of Glycoside Hydrolase Genes in the ...bioinformatics.unl.edu › eyun › publications › 14_PLoS... · Molecular Evolution of Glycoside Hydrolase Genes in the

suppression of single GH45 gene expression not drastically

affecting the D. v. virgifera larval development.

GH48 FamilyWe identified three GH48 family gene candidates from D. v.

virgifera: two complete (1,926 bp, 641 aa) and one partial (374 bp,

124 aa) (Figure S1B). The partial sequence (GH48-2) was

confirmed in the draft D. v. virgifera genome. Similar to GH45

family genes, GH48 family genes have been identified from many

polyphagan coleopterans especially from the two superfamilies

(Chrysomeloidea and Curculionoidea) [22,39,40] (Figure 1).

Consistent with the results shown in [22], the number of GH48

genes found in coleopterans was in general smaller than those of

GH45 and GH28 family genes.

Two GH48 family genes (active phase-associated proteins,

APAP I and II; shown as Gatr GH48-1 and 22 in Figure 3) were

isolated from a leaf beetle Gastrophysa atrocyanea [39]. While neither

glucanase nor cellobiohydrolase activity was detected with G.

atrocyanea GH48-1, it exhibited chitinase activity. G. atrocyanea

GH48-1 was shown to be necessary for diapause termination in

adults [39]. Based on our phylogenetic analysis, G. atrocyanea

GH48-1 was found to be closer to D. v. virgifera GH48-2 (Figure 3).

However, only a fragment has been identified from the D. v.

virgifera GH48-2 and its expression was not confirmed from our egg

and larval samples (Table S4). While D. v. virgifera GH48-1 also

had very low expression, GH48-3 was found to be expressed more

in larvae than in eggs.

GH48 is one of the most common GH family genes in bacteria

[41]. Apart from their presence in bacteria and in coleopterans,

this family has been reported from three fungal species

(Neocallimastix patriciarum, Piromyces equi, and Piromyces sp.). None of

the ten insect genomes we examined had GH48 family genes. This

disparate and limited distribution of GH48 family genes in two

related coleopteran superfamilies and in three fungal species but

not in any other eukaryotes, clearly indicates at least two

independent HGT events: one from bacteria to the ancestral

coleopteran lineage before the divergence of the two coleopteran

superfamilies and the other from bacteria to the ancestral lineage

before the divergence of the three fungal species. The three fungal

GH48 sequences belong to the family Neocallimastigaceae

(phylum Neocallimastigomycota). These fungi are isolated in the

digestive tracts of ruminant and non-ruminant mammals and

herbivorous reptiles [42]. Although our similarity search and

phylogenetic analysis did not show a clear relationship with any

known bacterial species, rumen fungi have been reported to obtain

catalytic enzymes from bacterial sources by HGT events. For

example, GH5 (endoglucanase, EC 3.2.1.4) and GH11 (xylanase,

EC 3.2.1.8) family genes found in Orpinomyces joyonii and Orpinomyces

sp. (phylum Neocallimastigomycota) are considered to be bacterial

origin [43]. GH5 family genes in a rumen fungus, Neocallimastix

patriciarum, have also been suggested to have originated from

bacteria (Streptococcus equinus and Ruminococcus albus) [44].

GH28 FamilyGH28 family genes encode polygalacturonase (pectinase, EC

3.2.1.15). Ten intact (average 1087 bp, 361 aa) and four partial

GH28 candidate sequences were identified in the D. v. virgifera

transcriptome (Figure S1C). Gene expression, especially in larvae,

was confirmed from the majority of the eleven intact candidates

(Table S4). Although the expression of the three partial sequences

(GH28-8, 10, and 14) was either very low or confirmed neither in

eggs nor in larvae, their partial sequences were found in the draft

genome. Multiple copies of GH28 family genes have been found in

a number of coleopteran species belonging to its two superfamilies

(Chrysomeloidea and Curculionoidea) [22,29]. The largest num-

ber (19 functional genes) was found in the mountain pine beetle

(Dendroctonus ponderosae) [40,45]. D. v. virgifera has the second largest

number, 10 (and 4 partial sequences), of GH28 family genes

(Figure 1). Our phylogenetic analysis based on currently available

sequences confirmed many species-specific duplications of GH28

family genes in coleopterans (Figure 4, blue branches).

Consistent with what was indicated by Pauchet et al. [22], our

phylogenetic analysis showed that GH28 family genes can be

divided into two clades. GH28 enzymes from Callosobruchus

maculatus (bean beetle) form a subgroup (B, Figure 4) and are

more closely related to bacterial GH28 enzymes (all Gram-

negative bacteria) (.83% bootstrap supports), while all other

beetle GH28 enzymes are more closely related to fungal and plant

bug (Hemiptera) enzymes. Although two plant bug species (Lygus

hesperus and Lygus lineolaris, Hemiptera) were reported to have

multiple GH28 family genes [46,47], we failed to identify GH28

candidate sequences in the ten insect genomes including two from

hemipterans Rhodnius prolixus (a blood-sucking bug) and Acyrthosi-

phon pisum (pea aphid). Among insects, except for the two plant bug

species, GH28 family genes were found only in two coleopteran

superfamilies (Chrysomeloidea and Curculionoidea). These insect

GH28 family genes except for those of C. maculatus are

phylogenetically nested within a fungal GH28 cluster (Figure 4).

Therefore, GH28 genes currently found in coleopterans and plant

bugs were most likely acquired by three independent HGT events:

from Gram-negative bacteria to C. maculatus, from a fungus to a

hemiptera, and from a fungus to an ancestral coleopteran before

the divergence of the two superfamilies.

GH16 FamilyWe identified nine GH16 family genes, which encode b-1,3-

glucanases, in the D. v. virgifera transcriptome: five full-length

(average 1124 bp, 374 aa) and four partial coding sequences

(Figure S3A). Their expressions were identified both in larval and

egg samples (Table S4). The most significantly highly expressed

gene (GH16-1) showed larval specific expression. GH16 family

genes are widely found in insects (e.g., [48–50]). Similarity searches

further confirmed a wide distribution of GH16 family genes within

metazoa including insects, mollusks (e.g., [51]), sea urchins (e.g.,

Strongylocentrotus purpuratus), as well as basal chordates (e.g., Ciona

intestinalis) but not in vertebrates. It was also found widely in fungi

Figure 2. The maximum-likelihood phylogeny of GH45 proteins. Forty seven GH45 protein sequences from eleven coleopteran species areincluded. Their species name abbreviations are found in Table S5. Labels for the coleopteran species belonging to the superfamily Curculionoidea areolive-colored and all other coleopteran sequences colored in black belong to the superfamily Chrysomeloidea. D. v. virgifera sequences are shown inred. Other sequences include: two mollusks (purple), Cryptopygus antarcticus (Collembola, black), Hypsibius dujardini (Tardigrada, black), 24 termite-symbiotic protists (dark green), 10 plant-parasitic nematodes (all are from Bursaphelenchus xylophilus, grey), representative fungi (chosen from 138sequences, cyan), and representative bacteria (chosen from 18 sequences, brown). Bacterial sequences were used as outgroups. The numbers atinternal branches show the bootstrap support values (%) for the maximum-likelihood and neighbor-joining phylogenies in this order. Supportingvalues are shown only when higher than 60%. Blue-colored branches indicate the species-specific gene duplications (based on currently availablesequences) within a cluster supported by higher than 70% of bootstrap values. The scale bar represents the number of amino acid substitutions persite. See Figure S2 for more details.doi:10.1371/journal.pone.0094052.g002

Western Corn Rootworm Glycoside Hydrolase Genes

PLOS ONE | www.plosone.org 5 April 2014 | Volume 9 | Issue 4 | e94052

Page 6: Molecular Evolution of Glycoside Hydrolase Genes in the ...bioinformatics.unl.edu › eyun › publications › 14_PLoS... · Molecular Evolution of Glycoside Hydrolase Genes in the

and bacteria. An ortholog identified in the Antarctic springtail C.

antarcticus (CaLam) is believed to have originated from bacteria by

HGT [52].

Figure 5 shows the phylogeny of GH16 family protein

sequences from four coleopteran species (T. castaneum, Tenebrio

molitor, D. ponderosae, and D. v. virgifera) and other insects as well as

some other metazoans, fungi, and bacteria. As reported previously,

insects have a group of pattern recognition proteins that are

originated from a duplicated copy of GH16 family genes [48–50].

They are called Gram-negative bacteria-binding proteins (GNBPs)

Figure 3. The maximum-likelihood phylogeny of GH48 proteins. Twenty two GH48 protein sequences from seven coleopteran species areincluded. Their species name abbreviations are found in Table S5. Labels for the coleopteran species belonging to the superfamily Curculionoidea areolive-colored and all other coleopteran sequences colored in black belong to the superfamily Chrysomeloidea. D. v. virgifera sequences are shown inred. Other sequences include: representative bacteria (chosen from 653 sequences, brown) and 3 fungi (shown in cyan). Bacterial sequences wereused as outgroups. The numbers at internal branches show the bootstrap support values (%) for the maximum-likelihood and neighbor-joiningphylogenies in this order. Supporting values are shown only when higher than 60%. Blue-colored branches indicate the species-specific geneduplications (based on currently available sequences) within a cluster supported by higher than 70% of bootstrap values. The scale bar represents thenumber of amino acid substitutions per site.doi:10.1371/journal.pone.0094052.g003

Western Corn Rootworm Glycoside Hydrolase Genes

PLOS ONE | www.plosone.org 6 April 2014 | Volume 9 | Issue 4 | e94052

Page 7: Molecular Evolution of Glycoside Hydrolase Genes in the ...bioinformatics.unl.edu › eyun › publications › 14_PLoS... · Molecular Evolution of Glycoside Hydrolase Genes in the

Western Corn Rootworm Glycoside Hydrolase Genes

PLOS ONE | www.plosone.org 7 April 2014 | Volume 9 | Issue 4 | e94052

Page 8: Molecular Evolution of Glycoside Hydrolase Genes in the ...bioinformatics.unl.edu › eyun › publications › 14_PLoS... · Molecular Evolution of Glycoside Hydrolase Genes in the

or b-1,3-glucan recognition proteins (bGRPs) and are involved in

innate immune recognition [53–57]. These proteins (indicated as

‘‘GNBP’’ in Figure 5) have lost their original GH16 enzymatic

activity [56,58] and their active sites are not conserved (Figure

S3B; also see [48,49]). These proteins, however, contain a unique

conserved b-1,3-glucan binding domain in their N-terminal

regions [57,59,60]. Both types of genes, GH16 family genes with

conserved active sites as well as GNBP genes that contain the N-

terminal domain but no conserved active sites, have been

identified from the coleopteran species examined so far (T.

castaneum, T. molitor, D. ponderosae). Both types of genes were also

identified in our D. v. virgifera transcriptome. For the nine GH16

family gene candidates, the active site regions show highly

conserved patterns including two Glu residues (Figure S3A). We

also identified three potential GNBP genes from D. v. virgifera.

Consistent with GNBPs found in other insects, their active sites are

not conserved with Glu’s (Figure S3B). All but one GNBP gene

candidates were weakly expressed both in eggs and in larvae

(neonates and third-instar larval midguts) (Table S4), which is

consistent with the pattern found with Drosophila GNBP genes [53].

One gene (GNBP-3) does not have the conserved N-terminal

domain (Figure S3C); this gene may not function as a GNBP. No

expression was detected from this gene in larval samples.

The b-1,3-glucanase activity has been confirmed with GH16

enzymes identified from midguts of several insects including T.

molitor (TLam) [61], Spodoptera frugiperda (Lepidoptera) (SLam) [49],

Helicoverpa armigera (Lepidoptera) [48], Abracris flavolineata (Orthop-

tera) [62], Periplaneta americana (Blattodea) [63], as well as termites

(Isoptera) [58]. These insect, except for lepidopteran, enzymes can

lyse Saccharomyces cerevisea cells. For detritivorous insects such as T.

molitor, P. americana, and termites, their abilities of digesting fungal

cell walls may play roles in antifungal protection as well as in

nutrient acquisition. T. molitor’s midgut content is found to be

almost free from fungi [61]. Blocking one of these proteins in

termites accelerated and increased fungal infection [58]. b-1,3-

glucanase can also hydrolyze callose (b-1,3-glucan). Callose exists

in the cell walls of higher plants and plays important roles during

plant development. Callose deposition also is induced by a variety

of biotic and abiotic stresses such as wounding, pathogen infection,

aluminum, abscisic acid, and raised or lowered temperatures [64].

Therefore, it is plausible that D. v. virgifera larvae use this enzyme

(e.g., encoded by GH16-1, which has significantly high larval

expression) to digest callose.

GH5 FamilyA short sequence similar to part of GH5 family genes was

identified in the D. v. virgifera transcriptome (317 bp, 105 aa)

(Figure S4A). Among the 51 GH5 subfamilies [65], coleopteran

GH5 genes known so far belong to three subfamilies (2, 8, and 10)

(Figure S4B). The subfamily 8 gene found in the coffee berry borer

Hypothenemus hampei (Curculionoidea), however, was shown to be

bacterial origin [66]. The short D. v. virgifera GH5 sequence is

phylogenetically closer to fungal GH5 sequences belonging to the

subfamily 12 (Figure S4B). We should, however, note that we

failed to confirm the corresponding sequence in the draft D. v.

virgifera genome. Furthermore, the expression of this sequence was

not confirmed with confidence (Table S4). Therefore, we consider

the existence of a GH5 gene in D. v. virgifera to be inconclusive.

Absence of GH9 Family in D. v. virgiferaGH9 family genes have been identified in insect orders

Orthoptera, Blattaria, Phthiraptera, Hemiptera, Coleoptera, and

Hymenoptera [11,67]. However, no GH9 candidate sequence was

identified in the D. v. virgifera transcriptome (Figure 1). GH9 family

genes appear to be absent among chrysomelids and curculionids.

[11,67]. Among beetle species, a GH9 family gene is present in T.

castaneum (Tenebrionoidea) [25]. We also found a GH9 family gene

sequence from the transcriptome of the salt marsh beetle Pogonus

chalceus (Caraboidea, Adephaga). Because P. chalceus is placed as

the most basal species in Coleoptera [68] (Figure 1), GH9 family

genes were likely maintained in the common ancestor of

Coleoptera and the lineage leading to the superfamily Tenebrio-

noidea. GH9 family genes must have been subsequently lost in the

common ancestor of Chrysomeloidea and Curculionoidea.

We confirmed that three GH families (GH45, GH48, and

GH28) are absent from the transcriptomes of P. chalceus [69] and

the genome of T. castaneum [70] (Figure 1). The loss of GH9 and

gain of GH45, GH48, and GH28 families, therefore, can be traced

back at least to the common ancestor of chrysomelids and

curculionids. Although GH9 and three enzymes (GH48, GH45,

and GH28) do not share sequence similarities and have different

3D structural features (CAZy classifies GH48 in GH-M and GH28

in GH-N clans; GH9 and GH45 are not classified), Watanabe and

Tokuda [11] suggested, for example, a possible convergent

evolution in terms of enzymatic function based on the same

substrate specificities (e.g., b-1,4 linkages) with GH9 and GH45

enzymes. GH9 and GH28 enzymes utilize the inverting glycosi-

dase mechanism, which only allows polysaccharide hydrolysis

[71]. Thus, their functional similarities may have allowed the

laterally acquired genes to replace the role of the lost GH9

enzymes.

GH1 FamilyThe hydrolysis of cellulose is completed by b-glucosidases,

which hydrolyze cellobiose and oligosaccharides to glucose. The

GH1 family, the largest of GH families that encode b-glucosidase

activities, has been identified widely in insects (e.g., [72–74]). From

the D. v. virgifera transcriptome, we identified twenty-eight GH1

gene candidates (23 intact and 5 partial; Figure S5A). Multiple

copies of GH1 gene candidates were found also in other

coleopteran species (e.g., 19 in D. ponderosae and 15 in T. castaneum)

(Figure 1). While multiple GH1 family genes are regularly found in

eukaryotic organisms, this family is particularly expanded in

coleopteran species. Phylogenetic analysis shows that all insect

GH1 family proteins are monophyletic (although bootstrap

support is marginal; 76% only by the maximum-likelihood

phylogeny) (Figure S5B). Consistent with previous studies, GH1

family proteins form clusters according to each domain of life

Figure 4. The maximum-likelihood phylogeny of GH28 proteins. Eighty four GH28 protein sequences from eight coleopteran species areincluded. Their species abbreviations are found in Table S5. Labels for the coleopteran species belonging to the superfamily Curculionoidea are olive-colored and all other coleopteran sequences colored in black belong to the superfamily Chrysomeloidea. D. v. virgifera sequences are shown in red.Other sequences include: plant bugs (Lygus hesperus and Lygus lineolaris), representative fungi (chosen from 651 sequences, cyan), representativebacteria (chosen from 42 sequences, brown), and representative plants (chosen from 491 sequences, green). Bacterial sequences were used asoutgroups. The numbers at internal branches show the bootstrap support values (%) for the maximum-likelihood and neighbor-joining phylogeniesin this order. Supporting values are shown only when higher than 60%. Blue-colored branches indicate the species-specific gene duplications (basedon currently available sequences) within a cluster supported by higher than 70% of bootstrap values. The scale bar represents the number of aminoacid substitutions per site.doi:10.1371/journal.pone.0094052.g004

Western Corn Rootworm Glycoside Hydrolase Genes

PLOS ONE | www.plosone.org 8 April 2014 | Volume 9 | Issue 4 | e94052

Page 9: Molecular Evolution of Glycoside Hydrolase Genes in the ...bioinformatics.unl.edu › eyun › publications › 14_PLoS... · Molecular Evolution of Glycoside Hydrolase Genes in the

Figure 5. The maximum-likelihood phylogeny of GH16 proteins. Sixteen GH16 protein sequences from four coleopteran species areincluded. Labels for the coleopteran species belonging to the superfamily Curculionoidea, D. v. virgifera, and other beetle sequences are shown inolive, red, and orange, respectively. Their species abbreviations are found in Table S5. Arthropod, other metazoan, fungal (6 chosen from 222sequences), and bacterial (5 chosen from 977 sequences) sequences are indicated by black, purple, cyan, and brown, respectively. Bacterial sequenceswere used as outgroups. The numbers at internal branches show the bootstrap support values (%) for the maximum-likelihood and neighbor-joining

Western Corn Rootworm Glycoside Hydrolase Genes

PLOS ONE | www.plosone.org 9 April 2014 | Volume 9 | Issue 4 | e94052

Page 10: Molecular Evolution of Glycoside Hydrolase Genes in the ...bioinformatics.unl.edu › eyun › publications › 14_PLoS... · Molecular Evolution of Glycoside Hydrolase Genes in the

[75,76]. GH1 family expansion appears to have happened

independently in many different lineages. For example, four b-

glucosidases have been identified in the midgut of the yellow

mealworm T. molitor, and their enzyme specificities and efficiencies

differ slightly [72,77]. The majority of GH1 gene candidates we

identified from D. v. virgifera have high expression in larvae,

especially in the third-instar larval midgut, but not in eggs (Table

S4). Therefore, these GH1 gene products likely participate in

digesting various plant materials.

GH31 and GH27 FamiliesIn addition to cellulases and other plant cell wall digesting

enzymes, we also searched genes encoding GH31 (a-glucosidase)

and GH27 (a-galactosidase) families. These enzymes share a

common (b/a)8 (TIM) barrel catalytic domain and belong to the

same GH-D clan [78]. In mosquitoes, an a-glucosidase has been

shown to function as a receptor of Bin toxin from Bacillus sphaericus

as well as Cry11Ba toxin from B. thuringiensis subsp. jegathesan

[79,80]. Ten GH31 and two GH27 family gene candidates were

identified from the D. v. virgifera transcriptome (Figures S6A and

S7A). All of these genes are highly expressed especially in the

third-instar larval midgut (Table S4). Both families are found in a

wide range of organisms from bacteria to eukaryotes (Figures S6B

and S7B). Wheeler et al. [81] showed two lepidopteran GH31-

related sequences as HGT origin from bacteria

(BGIBMGA013995 and Px016165 in Figure S6B). We found no

such evidence in search of GH31 family genes in D. v. virgifera as

well as in D. ponderosae transcriptomes.

GH Family Gene ExpressionWe compared the expression levels of GH family gene

candidates we identified from the D. v. virgifera transcriptomes

between egg and larval samples. Almost all were expressed

significantly more in larval stages. We found that the majority of

GH45, GH28, GH1, GH31, and GH27 family genes are

expressed more in the third-instar larval midgut samples compared

to egg and neonate samples, with some genes particularly standing

out (GH45-4, GH45-7, GH45-10, GH28-6, GH1-18, GH27-1,

GH31-7) (Table S4). Gene expression and enzyme activity of

polygalacturonase have been reported from the gut of another

corn rootworm species, Diabrotica undecimpunctata howardi (spotted

cucumber beetle) [82]. GH28 and GH45 family genes are

expressed more in the guts of P. cochleariae larvae and adults

[83]. Polygalacturonases are known to loosen the primary cell wall

and make cellulose-hemicellulose network more accessible to

enzymatic digestion [84]. With its high number of GH45, GH28,

and GH1 genes and their high expression in larval midgut tissue,

D. v. virgifera may utilize b-1,4-endoglucanase, polygalacturonase,

as well as b-glucosidase activities in larval midgut to assist in the

digestion of corn root cell walls and in releasing dietary

monosaccharides such as glucose.

Horizontal Gene Transfer of GH Family GenesOur current study indicates that the three GH gene families

(GH45, GH48, and GH28) are unique to the two coleopteran

superfamilies (Chrysomeloidea and Curculionoidea) and generally

absent from other insects except in plant bugs (GH28) and in a

springtail (GH45). These results imply that these genes are likely

not vertically inherited from the ancestral species but acquired by

HGT events from bacteria or fungi to the common ancestor of

chrysomelid and curculionid beetles.

As mentioned before, the GH5 family gene (HhMAN1) identified

from the coffee berry borer H. hampei is thought to be bacterial

origin [66]. This gene was not found in two other related species,

H. obscurus (topical nut borer) and Araecerus fasciculatus (coffee bean

weevil, Anthribidae, Coleoptera). H. obscurus is a pest of

macadamia nuts but not coffee [85]. A. fasciculatus is polyphagous,

a common pest of stored food products including coffee [86]. In

contrast, H. hampei is mainly a coffee pest, although it may not be

strictly monophagous [87,88]. Therefore, acquisition of HhMAN1

from bacteria may have made a rapid adaptation possible for H.

hampei by enabling hydrolysis of galactomannan, the major

nutrient source for this species [66]. Other examples of possible

HGTs of GH family genes include: GH5 and GH11 family genes

in rumen fungi from rumen bacteria Fibrobacter succinogenes [43], a

GH16 family gene in C. antarcticus from bacteria [52], lepidopteran

GH31-like genes from Enterococcus bacteria [81], and GH11 family

genes in P. cochleariae from c-proteobacteria [24]. We also found

evidence of several independent HGT events such as fungal GH48

family genes and plant bug GH28 family genes. Although HGT

events are often detected in prokaryotes [89], GH families seem to

be characterized by frequent HGT events in various animals

especially in insects. Such acquisitions followed by frequent

duplications of these GH genes must have contributed to these

organisms’ ability to adapt to novel niches.

Conclusion

We have identified eight GH family genes from the transcrip-

tomes of D. v. virgifera. Three GH families (GH45, GH48, and

GH28) were likely to have been obtained by HGT events before

the divergence of chrysomelid and curculionid beetles. Rapid

birth-and-death processes have been also observed among these

coleopteran GH family genes. A large number of GH family

enzymes owing to their lineage-specific duplications in D. v. virgifera

could have contributed to the successful adaptation to its niche by

providing more efficient hydrolyzation of corn cell walls.

Materials and Methods

Sample Collection and PreparationEggs. Two thousands freshly hatched non-diapause D. v.

virgifera eggs (ten Petri dishes, ,200 eggs/Petri dish) were

purchased from Crop Characteristics, Inc. (Farmington, Minne-

sota, USA). All ten Petri dishes were wrapped with aluminum foil

and placed in a growth chamber for incubation at 27uC. One Petri

dish was removed from incubator on each day and the eggs were

isolated with a 60-mesh sieve. Briefly, the soil with eggs were

rinsed with tap water until soil was removed completely, and the

isolated eggs were washed with double distilled water before being

transferred into a 1.7 ml centrifuge tube. The water was removed

with pipette and eggs were snap-frozen in liquid nitrogen and

stored in 280uC freezer. All other Petri dishes were processed in

the same way until the day 10.Neonates. A Petri dish containing 10,000 eggs was pur-

chased from Crop Characteristics, Inc. (Farmington, Minnesota,

USA) and placed in a growth chamber at 27uC with LD 16:8

photoperiod until hatching. The eggs were isolated from soil with

methods described above. The clean eggs were rinsed with double

phylogenies in this order. Supporting values are shown only when higher than 60%. Blue-colored branches indicate the species-specific geneduplications (based on currently available sequences) within a cluster supported by higher than 70% of bootstrap values. The scale bar represents thenumber of amino acid substitutions per site.doi:10.1371/journal.pone.0094052.g005

Western Corn Rootworm Glycoside Hydrolase Genes

PLOS ONE | www.plosone.org 10 April 2014 | Volume 9 | Issue 4 | e94052

Page 11: Molecular Evolution of Glycoside Hydrolase Genes in the ...bioinformatics.unl.edu › eyun › publications › 14_PLoS... · Molecular Evolution of Glycoside Hydrolase Genes in the

distilled water three times before transferring to a new egg Petri

dish (60615 mm) with moistened filter paper (42.5 mm). Finally,

the Petri dish was placed back to the same growth chamber for

more neonates to hatch. The freshly hatched (less than 24 hrs old)

neonates were collected, snap-frozen in liquid nitrogen, and stored

at 280uC freezer.

Preparation of midgut from third instar larvae. Fifty

third-instar larvae purchased from Crop Characteristics, Inc.

(Farmington, Minnesota, USA) were dissected for midgut tissue

under dissection microscope. Briefly, the head, thorax, and last

two segments of abdomen were removed with a scalpel and the

midgut was pulled from the carcass with forceps. The fat body and

other liquids were carefully removed by pulling the midgut on a

clean Kimtech Science Precision Wipes Tissue Wipers (Fisher).

The midgut was then opened longitudinally and gut contents were

removed by rinsing 3 times with 1X PBS buffer (pH 7.4). The

clean midgut tissue was snap-frozen in liquid nitrogen and saved at

280uC until RNA extraction. Sixteen midgut tissues were pooled

as one replicate with 3 replicates in total.

RNA extraction. Approximately 35 mg of pooled midgut

tissue was used directly in a single RNA preparation while ,32

pooled neonates were used for single RNA extraction. For egg

RNA preparation, ,35 mg of pooled eggs from day 1 to day

10 were used in single RNA preparation. Three replicates for eggs

and midgut dissection and six replicates for neonates were

prepared. The total RNAs were extracted with RNeasy mini kit

(Qiagen, Cat. 74104) and treated with RNase-free DNase set

(Qiagen, Cat. 79254) for potential genomic DNA contamination

by following the manufactures’ instructions. The quality and

quantity of RNA were evaluated on 1% agarose gel and

NanoDrop 1000 (Thermo) for further analysis.

Next Generation Sequencing and Assembly of D. v.virgifera Transcriptomes

Next generation sequencing. The 454 pyrosequencing

experiments of larval midgut samples were completed using

Roche GS-FLX titanium sequencer at the Core for Applied

Genomics and Ecology, University of Nebraska-Lincoln. The

transcriptome sequencing for the egg and larval midgut samples

with an insert size of 300 bp was done on Illumina Genome

Analyzer II platform at the Center for Biotechnology, University

of Nebraska-Lincoln. The neonate samples were sequenced with

an insert size of 500 bp on Illumina HiSeq2000 system at the

Durham Research Center, University of Nebraska Medical

Center. In total, 16.6 gigabases (Gb) (read length 75 bp) of egg

RNA, 33 Gb (read length 75 bp) of larval midgut RNA, and

662 Gb (read length 101 bp) of neonate RNA were sequenced. All

454 and Illumina reads were deposited into the NCBI Sequence

Read Archive (SRA) under the accession number SRP037561.

Filtering of low quality reads. Because sequencing errors

can cause difficulties for the assembly algorithm, we applied a

stringent quality filter process. For 454 reads, the adapter and

poly(A/T) sequences were trimmed using PRINSEQ [90]. 454

reads that have abnormal read length (,50 bp or .1000 bp) or

where the average quality was less than 20 were removed. The

Illumina paired-end reads that did not have the minimum quality

score (20 per base for egg and midgut samples or 30 per base for

neonate samples) across the whole read were removed using

PRINSEQ [90] and Sickle (ver. 1.2) [91]. Note that the quality

scores of 20 (Q20) and 30 (Q30) correspond to 1% and 0.1%

expected error rates, respectively. We removed all Illumina reads

that have any unknown nucleotide ‘N’.

de novo transcriptome assembly. After the filtering

process, we performed de novo transcriptome assembly for each

of three samples. We used four different short read assemblers:

Newbler (ver. 2.5) (Roche, 454 Life Sciences; used only for

454 read assembly), Mira (ver. 3.4.0) [92], Velvet/Oasis (ver.

1.2.03) [93], and Trinity (release 2013-02-25) [28]. The k-mer size

of 25 was used for all programs. Mira could be used only for

454 read assembly from the third instar larval samples and for the

Illumina read assembly from the egg samples due to the large

memory requirement. The results of these assemblies are

summarized in Table S1. The number of assembled transcripts

varied among the different assemblers, ranging from 37,181 by

Trinity to 165,361 for Velvet/Oasis for 454 reads (larval midgut

sample) and from 56,135 by Velvet/Oasis to 72,638 by Trinity for

Illumina reads (egg sample). The average length and N50 of

contigs were generally longer with the Trinity assembly (Tables S1

and S2). Results of NCBI BLAST similarity search (blastx, ver.

2.2.26+) [94,95] against the UniProt protein database (http://

www.uniprot.org) [96] showed that fractions of contigs that had

highly significant hits (E-value #102100) were larger with the

Trinity (18.9%) and Velvet/Oasis assemblies (,19.4%) than the

Mira assembly (11%) although the difference was not significant

(P.0.5 by t-test between Trinity and Mira) (Figure S8). Note that

Zhao et al. [97] showed the highest accuracy with Trinity among

methods specialized in de novo transcriptome assemblies such as

SOAPdenovo [98], ABySS [99], Velvet/Oasis, and Trinity (they

did not include Mira in their comparison). We also attempted the

hybrid assemblies using two different sequencing platforms (454

and Illumina Genome Analyzer II) for the third instar larval

midgut sample as well as for the pooled egg and third instar larval

midgut samples (Table S3). Furthermore, we performed assembly

using the dataset pooled from egg (produced by Illumina Genome

Analyzer II), third instar larval midgut (produced by Illumina

Genome Analyzer II), and neonate samples (produced by Illumina

HiSeq2000). Among all of these assemblies, the Trinity assembly

using the pooled Illumina dataset had the longest average length of

contigs and N50, even longer than the hybrid assemblies including

454 reads (Table 1). With this assembly, more GH family gene

candidates were also identified. Therefore, we used this Trinity

assembly using the pooled dataset as the most inclusive ‘‘combined

D. v. virgifera transcriptome’’ for all our further studies.

Sequence AnalysisGene expression analysis. To compare the gene expression

levels, the paired-end reads were mapped onto our combined D. v.

virgifera transcriptome using bowtie (ver. 1.0.0) [100] with 0

mismatch. The numerical values of gene expression were

measured by RPKM (reads per kilobase per million mapped

reads) to normalize for the number of sequencing reads and total

read length [101]. RPKM values above 0.3 [102] as well as having

more than 10 reads was used as the threshold for gene expression.

Identification of GH family genes from the D. v. virgifera

transcriptome. Previously reported insect, especially coleop-

teran, GH family gene sequences were obtained for GH45

[22,23,31], for GH48 [22,39], for GH28 [22], for GH9 [25], for

GH5 [66], for GH11 [24], for GH16 [61], for GH1 [72,73], for

GH27 (XP_973339.2; this entry is incorrectly shown to be a

GH31 family enzyme), and for GH31 [81] (see Table S5). Using

these sequences as initial queries, we searched GH family gene

candidates against our combined D. v. virgifera transcriptome using

NCBI BLAST (tblastn, ver. 2.2.26+) [94,95]. The initial E-value

threshold (161026) used was rather lenient and chosen to be

inclusive of all true positives even though some false positives from

non-target genes may have been included. With recurrent

phylogenetic analyses and BLAST (blastp) similarity searches, we

identified protein sequences for each GH family. Reciprocal

Western Corn Rootworm Glycoside Hydrolase Genes

PLOS ONE | www.plosone.org 11 April 2014 | Volume 9 | Issue 4 | e94052

Page 12: Molecular Evolution of Glycoside Hydrolase Genes in the ...bioinformatics.unl.edu › eyun › publications › 14_PLoS... · Molecular Evolution of Glycoside Hydrolase Genes in the

similarity search (using blastp) was further performed against the

NCBI NR protein database to confirm GH family associations. All

D. v. virgifera GH gene candidate sequences were also confirmed by

BLAST (tblastn, E- value #10230) similarity search against the

draft D. v. virgifera genome sequence (Hugh M. Robertson,

personal communication). The criterion we used to identify

alternative spliced isoforms was to have a more than 60 bp of

100% identical region among the candidate sequences. We did not

find any potential alternative spliced isoforms for the GH gene

sequences we identified.

GH family gene search. In order to identify GH family

sequences from a wide range of insects and other organisms, we

performed BLAST similarity searches (blastp, E- value #10230)

using D. v. virgifera and other representative GH gene sequences as

queries against the NCBI NR protein database (http://www.ncbi.

nlm.nih.gov) as well as against ten insect genomes (Drosophila

melanogaster, Anopheles gambiae, Aedes aegypti, Bombyx mori, Apis mellifera,

Nasonia vitripennis, Solenopsis invicta, Ixodes scapularis, Rhodnius prolixus,

and Acyrthosiphon pisum). For all BLAST similarity searches, in order

to obtain comparable E-values, the database size was set to

1.161010 (using the ‘-dbsize’ option), which is based on the

database size equivalent to the NCBI NR database.

Multiple sequence alignments and phylogenetic

analysis. Multiple alignments of GH family protein sequences

were generated using MAFFT (ver. 7.050b) with the L-INS-i

algorithm, which uses a consistency-based objective function and

local pairwise alignment with affine gap costs [103]. Phylogenetic

relationships were reconstructed by the maximum-likelihood

method using RAxML (ver. 7.0.4) [104] with the PROTGAM-

MAJTT substitution model (JTT matrix with gamma-distributed

rate variation). The neighbor-joining phylogenies [105] were

reconstructed by using neighbor of the Phylip package (ver. 3.69)

[106]. The protein distances were estimated using protdist of the

Phylip package with the JTT model. Non-parametric boot-

strapping with 1000 pseudoreplicates [107] was used to estimate

the confidence of branching patterns. FigTree (http://tree.bio.ed.

ac.uk/software/figtree) was used to display the phylogenetic trees.

The nucleotide sequences of all GH family genes identified and

the alignments used for the phylogenetic analysis in this study are

available from our website: http://bioinfolab.unl.edu/emlab/

GH/.

Supporting Information

Figure S1 GH family gene sequences identified from theD. v. virgifera transcriptome. Amino acid sequences of

GH45 (A), GH48 (B), and GH28 (C) are shown in alignments. The

labels for partial sequences are shown in italics. Potential residues

for the catalytic nucleophile and the proton donor are highlighted

with magenta and green, respectively (based on Sakamoto and

Toyohara, 2009, Comp Biochem Physiol B 152: 390; Parsiegla et al.,

2008, J Mol Biol 375: 499; van Santen et al., 1999, J Biol Chem

274: 30474).

(PDF)

Figure S2 The maximum-likelihood phylogeny of GH45family proteins. Labels for the coleopteran species belonging to

the superfamily Curculionoidea are olive-colored and all other

coleopteran sequences colored in black belong to the superfamily

Chrysomeloidea. D. v. virgifera sequences are shown in red. Their

species name abbreviations are found in Table S5. Other

sequences are colored as follows: mollusks (purple), Cryptopygus

antarcticus (Collembola, black), Hypsibius dujardini (Tardigrada,

black), protists (dark green), plant-parasitic nematodes (grey),

fungi (cyan), and bacteria (brown). The scale bar represents the

number of amino acid substitutions per site.

(PDF)

Figure S3 Multiple alignments of D. v. virgifera GH16family protein sequences. A. GH16 family proteins identified

from the D. v. virgifera transcriptome. B. The active site region

sequences. GNBP sequences are boxed. The catalytic nucleophile

and proton donor residues are highlighted with magenta and

green, respectively (based on Viladot et al. 1998, Biochemistry 34:

11332). C. The N-terminal conserved domain sequences of

GNBPs. Residue shown to be within hydrogen-binding distances

and involved in hydrophilic interaction with lamitrihexaoses from

Plodia interpunctella and Bombyx mori proteins are highlighted with

yellow and Arg’s involved in binding of triplex b-glucan are

highlighted in light blue (based on Kanagawa et al., 2011, J Biol

Chem 286: 19158).

(PDF)

Figure S4 Multiple alignment of the potential GH5family protein sequence identified from D. v. virgiferawith four fungal GH5 proteins (A) and the maximum-likelihood phylogeny including other known GH5 familyproteins (B). The potential amino acid residues for the catalytic

nucleophile and catalytic proton donor are highlighted with

magenta and green in the alignment, respectively (based on

Larsson et al. 2006, J Mol Biol 357: 1500). Coleopteran proteins

included in the phylogeny are found in Table S5. The D. v.

virgifera sequence is shown in red. The GH5 protein sequences

are classified into subfamilies according to Aspeborg et al. (2012,

BMC Evol Biol 12: 186). Bacterial, plant, fungal, and nematode

sequences are indicated by brown, green, cyan, and grey. The

numbers at internal branches show the bootstrap support values

(%) for the maximum-likelihood and neighbor-joining phylogenies

in this order. Only bootstrap values higher than 70% are shown.

(PDF)

Figure S5 GH1 family gene sequences identified fromthe D. v. virgifera transcriptome (A) and the maximum-likelihood phylogeny of GH1 family proteins (B). In the

alignment, the labels for partial sequences are shown in italics.

Potential residues for the catalytic nucleophile and the proton are

highlighted with magenta and green, respectively (based on

Marana et al., 2001, Biochim Biophys Acta 1545: 41; Scharf et al.

2010, Insect Biochem Mol Biol 40: 611). In the phylogeny, labels for

the coleopteran species belonging to the superfamily Curculionoi-

dea, D. v. virgifera, and other beetle sequences are shown in olive,

red, and orange, respectively. Their species abbreviations are

found in Table S5. Arthropod, other metazoan, nematode, fungal,

plant, and bacterial sequences are indicated by black, purple, grey,

cyan, green, and brown, respectively. The numbers at internal

branches show the bootstrap support values (%) for the

maximumlikelihood and neighbor-joining phylogenies in this

order. Supporting values are shown only when higher than

60%. The scale bar represents the number of amino acid

substitutions per site.

(PDF)

Figure S6 GH31 family gene sequences identified fromthe D. v. virgifera transcriptome (A) and the maximum-likelihood phylogeny of GH31 family proteins (B). In the

alignment, the labels for partial sequences are shown in italics. In

the phylogeny, labels for the coleopteran species belonging to the

superfamily Curculionoidea, D. v. virgifera, and other beetle

sequences are shown in olive, red, and orange, respectively. Their

species abbreviations are found in Table S5. Arthropod, other

Western Corn Rootworm Glycoside Hydrolase Genes

PLOS ONE | www.plosone.org 12 April 2014 | Volume 9 | Issue 4 | e94052

Page 13: Molecular Evolution of Glycoside Hydrolase Genes in the ...bioinformatics.unl.edu › eyun › publications › 14_PLoS... · Molecular Evolution of Glycoside Hydrolase Genes in the

metazoan, nematode, fungal, plant, and bacterial sequences are

indicated by black, purple, grey, cyan, green, and brown,

respectively. The accession numbers shown in parenthesis are

from NCBI except for: BGIBMGA012077-PA and

BGIBMGA013995 from SilkDB (http://www.silkdb.org),

DPOGS202361 from MonarchBase (http://monarchbase.

umassmed.edu), and Px016165 from Diamondback moth Genome

Database (http://59.79.254.1/DBM/). The numbers at internal

branches show the bootstrap support values (%) for the maximum-

likelihood and neighbor-joining phylogenies in this order.

Supporting values are shown only when higher than 60%. The

scale bar represents the number of amino acid substitutions per

site.

(PDF)

Figure S7 GH27 family gene sequences identified fromthe D. v. virgifera transcriptome (A) and the maximum-likelihood phylogeny including representative GH27family proteins (B). Labels for the coleopteran species

belonging to the superfamily Curculionoidea, D. v. virgifera, and

other beetle sequences are shown in olive, red, and orange,

respectively. Their species abbreviations are found in Table S5.

Arthropod, other metazoan, nematode, fungal, plant, and

bacterial sequences are indicated by black, purple, grey, cyan,

green, and brown, respectively. Bacterial sequences were used as

outgroups. The numbers at internal branches show the bootstrap

support values (%) for the maximum-likelihood and neighbor-

joining phylogenies in this order. Supporting values are shown

only when higher than 60%. The scale bar represents the number

of amino acid substitutions per site.

(PDF)

Figure S8 The distribution of E-values obtained fromblastx similarity search against the UniProt proteindatabase using the assemblies generated by threeprograms using the D. v. virgifera egg samples. The

numbers of contigs are 18,173 in Mira (blue), 11,035 in Trinity

(red), and 9843 in Velvet/Oasis (green). E-values are shown as 2

log10 (E-value) except for E-value = 0. Note that there is no

significant difference between Trinity and Mira (t-test P.0.5 for

both Evalue #102100 and for all E-values).

(PDF)

Table S1 Summary statistics for D. v. virgifera tran-scriptome sequencing and assembly.

(PDF)

Table S2 Summary of D. v. virgifera transcriptomesequencing and assemblies.

(PDF)

Table S3 Summary statistics for hybrid and pooled-data assembly of D. v. virgifera transcriptome.

(PDF)

Table S4 Expression analysis of the D. v. virgifera GHfamily genes identified in this study.

(PDF)

Table S5 Coleopteran GH family gene sequences usedin this study.

(PDF)

Acknowledgments

We would like to thank Drs. Kim Walden and Hugh M. Robertson

(University of Illinois at Urbana-Champaign, USA) for letting us use the

draft genome sequence of D. v. virgifera and Steven M. Van Belleghem for

the transcriptome of P. chalceus.

Author Contributions

Conceived and designed the experiments: BDS SE EM AKB. Performed

the experiments: SE YP HW. Analyzed the data: SE EM HW. Contributed

reagents/materials/analysis tools: YP Rf AV. Wrote the paper: SE EM

BDS. Prepared library for 454 sequencing: YP Rf.

References

1. Wei H, Xu Q, Taylor LE 2nd, Baker JO, Tucker MP, et al. (2009) Natural

paradigms of plant cell wall degradation. Curr Opin Biotechnol 20: 330–338.

2. Tomme P, Warren RAJ, Gilkes NR (1995) Cellulose hydrolysis by bacteria and

fungi. In: Poole RK, editor. Advances in Microbial Physiology: Academic

Press. 1–81.

3. Kamm B, Kamm M (2004) Principles of biorefineries. Appl Microbiol

Biotechnol 64: 137–145.

4. Wyman CE (2003) Potential synergies and challenges in refining cellulosic

biomass to fuels, chemicals, and power. Biotechnol Prog 19: 254–262.

5. Zhang Y-HP, Lynd LR (2005) Cellulose utilization by Clostridium thermocellum:

bioenergetics and hydrolysis product assimilation. Proc Natl Acad Sci USA

102: 7321–7325.

6. Cleveland LR (1924) The physiological and symbiotic relationships between

the intestinal protozoa of termites and their host, with special reference to

Reticulitermes flavipes Kollar. Biol Bull Mar Biol Lab 46: 117–227.

7. Martin M (1991) The evolution of cellulose digestion in insects. Philos

Trans R Soc Lond Ser B 333: 281–288.

8. Breznak JA, Brune A (1994) Role of microorganisms in the digestion of

lignocellulose by termites. Annu Rev Entomol 39: 453–487.

9. Watanabe H, Noda H, Tokuda G, Lo N (1998) A cellulase gene of termite

origin. Nature 394: 330–331.

10. Smant G, Stokkermans JP, Yan Y, de Boer JM, Baum TJ, et al. (1998)

Endogenous cellulases in animals: isolation of beta-1,4-endoglucanase genes

from two species of plant-parasitic cyst nematodes. Proc Natl Acad Sci USA 95:

4906–4911.

11. Watanabe H, Tokuda G (2010) Cellulolytic systems in insects. Annu Rev

Entomol 55: 609–632.

12. Calderon-Cortes N, Quesada M, Watanabe H, Cano-Camacho H, Oyama K

(2012) Endogenous plant cell wall digestion: a key mechanism in insect

evolution. Annu Rev Ecol, Evol Syst 43: 45–71.

13. Nakashima K, Watanabe H, Saitoh H, Tokuda G, Azuma JI (2002) Dual

cellulose-digesting system of the wood-feeding termite, Coptotermes formosanus

Shiraki. Insect Biochem Mol Biol 32: 777–784.

14. Zhou X, Smith JA, Oi FM, Koehler PG, Bennett GW, et al. (2007) Correlation

of cellulase gene expression and cellulolytic activity throughout the gut of the

termite Reticulitermes flavipes. Gene 395: 29–39.

15. Warnecke F, Luginbuhl P, Ivanova N, Ghassemian M, Richardson TH, et al.

(2007) Metagenomic and functional analysis of hindgut microbiota of a wood-

feeding higher termite. Nature 450: 560–565.

16. Tokuda G, Watanabe H (2007) Hidden cellulases in termites: revision of an old

hypothesis. Biol Lett 3: 336–339.

17. Tokuda G, Watanabe H, Lo N (2007) Does correlation of cellulase gene

expression and cellulolytic activity in the gut of termite suggest synergistic

collaboration of cellulases? Gene 401: 131–134.

18. Scharf ME, Karl ZJ, Sethi A, Boucias DG (2011) Multiple levels of synergistic

collaboration in termite lignocellulose digestion. PloS one 6: e21709.

19. Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, et al. (2009)

The Carbohydrate-Active EnZymes database (CAZy): an expert resource for

glycogenomics. Nucleic Acids Res 37: D233–D238.

20. Ketudat Cairns JR, Esen A (2010) b-Glucosidases. Cell Mol Life Sci 67: 3389–

3405.

21. Chang CJ, Wu CP, Lu SC, Chao AL, Ho TH, et al. (2012) A novel exo-

cellulase from white spotted longhorn beetle (Anoplophora malasiaca). Insect

Biochem Mol Biol 42: 629–636.

22. Pauchet Y, Wilkinson P, Chauhan R, ffrench-Constant RH (2010) Diversity of

beetle genes encoding novel plant cell wall degrading enzymes. PloS one 5:

e15635.

23. Calderon-Cortes N, Watanabe H, Cano-Camacho H, Zavala-Paramo G,

Quesada M (2010) cDNA cloning, homology modelling and evolutionary

insights into novel endogenous cellulases of the borer beetle Oncideres

albomarginata chamela (Cerambycidae). Insect Mol Biol 19: 323–336.

24. Pauchet Y, Heckel DG (2013) The genome of the mustard leaf beetle encodes

two active xylanases originally acquired from bacteria through horizontal gene

transfer. Proc R Soc B 280: 20131021.

Western Corn Rootworm Glycoside Hydrolase Genes

PLOS ONE | www.plosone.org 13 April 2014 | Volume 9 | Issue 4 | e94052

Page 14: Molecular Evolution of Glycoside Hydrolase Genes in the ...bioinformatics.unl.edu › eyun › publications › 14_PLoS... · Molecular Evolution of Glycoside Hydrolase Genes in the

25. Willis JD, Oppert B, Oppert C, Klingeman WE, Jurat-Fuentes JL (2011)Identification, cloning, and expression of a GHF9 cellulase from Tribolium

castaneum (Coleoptera: Tenebrionidae). J Insect Physiol 57: 300–306.

26. Valencia A, Alves AP, Siegfried BD (2013) Molecular cloning and functional

characterization of an endogenous endoglucanase belonging to GHF45 fromthe western corn rootworm, Diabrotica virgifera virgifera. Gene 513: 260–267.

27. Siegfried BD, Waterfield N, Ffrench-Constant RH (2005) Expressed sequence

tags from Diabrotica virgifera virgifera midgut identify a coleopteran cadherin and adiversity of cathepsins. Insect Mol Biol 14: 137–143.

28. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, et al. (2011)

Full-length transcriptome assembly from RNA-Seq data without a referencegenome. Nat Biotech 29: 644–652.

29. Girard C, Jouanin L (1999) Molecular cloning of a gut-specific chitinase cDNA

from the beetle Phaedon cochleariae. Insect Biochem Mol Biol 29: 549–556.

30. Eigenheer AL, Keeling CI, Young S, Tittiger C (2003) Comparison of generepresentation in midguts from two phytophagous insects, Bombyx mori and Ips

pini, using expressed sequence tags. Gene 316: 127–136.

31. Lee SJ, Kim SR, Yoon HJ, Kim I, Lee KS, et al. (2004) cDNA cloning,expression, and enzymatic activity of a cellulase from the mulberry longicorn

beetle, Apriona germari. Comp Biochem Physiol B: Biochem Mol Biol 139: 107–116.

32. Gao Y, Bu Y, Luan Y-X (2008) Phylogenetic relationships of basal hexapods

reconstructed from nearly complete 18S and 28S rRNA gene sequences. Zool

Sci 25: 1139–1145.

33. Davison A, Blaxter M (2005) Ancient origin of glycosyl hydrolase family 9

cellulase genes. Mol Biol Evol 22: 1273–1284.

34. Ohtoko K, Ohkuma M, Moriya S, Inoue T, Usami R, et al. (2000) Diversegenes of cellulase homologues of glycosyl hydrolase family 45 from the

symbiotic protists in the hindgut of the termite Reticulitermes speratus.

Extremophiles 4: 343–349.

35. Todaka N, Inoue T, Saita K, Ohkuma M, Nalepa CA, et al. (2010)Phylogenetic analysis of cellulolytic enzyme genes from representative lineages

of termites and a related cockroach. PloS one 5: e8636.

36. Li L, Frohlich J, Pfeiffer P, Konig H (2003) Termite gut symbiotic archaezoaare becoming living metabolic fossils. Eukaryot Cell 2: 1091–1098.

37. Xu B, Janson J-C, Sellos D (2001) Cloning and sequencing of a molluscan

endo-b-1,4-glucanase gene from the blue mussel, Mytilus edulis. Eur J Biochem268: 3718–3727.

38. Harada Y, Hosoiri Y, Kuroda R (2004) Isolation and evaluation of dextral-

specific and dextral-enriched cDNA clones as candidates for the handedness-determining gene in a freshwater gastropod, Lymnaea stagnalis. Dev Genes Evol

214: 159–169.

39. Fujita K, Shimomura K, Yamamoto K, Yamashita T, Suzuki K (2006) Achitinase structurally related to the glycoside hydrolase family 48 is

indispensable for the hormonally induced diapause termination in a beetle.

Biochem Biophys Res Commun 345: 502–507.

40. Keeling CI, Henderson H, Li M, Yuen M, Clark EL, et al. (2012)

Transcriptome and full-length cDNA resources for the mountain pine beetle,

Dendroctonus ponderosae Hopkins, a major insect pest of pine forests. InsectBiochem Mol Biol 42: 525–536.

41. Berger E, Zhang D, Zverlov VV, Schwarz WH (2007) Two noncellulosomal

cellulases of Clostridium thermocellum, Cel9I and Cel48Y, hydrolyse crystallinecellulose synergistically. FEMS Microbiol Lett 268: 194–201.

42. Ljungdahl LG (2008) The cellulase/hemicellulase system of the anaerobic

fungus orpinomyces PC-2 and aspects of its applied use. Ann N Y Acad Sci1125: 308–321.

43. Garcia-Vallve S, Romeu A, Palau J (2000) Horizontal gene transfer of glycosyl

hydrolases of the rumen fungi. Mol Biol Evol 17: 352–361.

44. Hung YL, Chen HJ, Liu JC, Chen YC (2012) Catalytic efficiencydiversification of duplicate beta-1,3–1,4-glucanases from Neocallimastix patri-

ciarum J11. Appl Environ Microbiol 78: 4294–4300.

45. Keeling C, Yuen M, Liao N, Docking T, Chan S, et al. (2013) Draft genome ofthe mountain pine beetle, Dendroctonus ponderosae Hopkins, a major forest pest.

Genome Biology 14: R27.

46. Celorio-Mancera MP, Allen ML, Powell AL, Ahmadi H, Salemi MR, et al.(2008) Polygalacturonase causes lygus-like damage on plants: cloning and

identification of western tarnished plant bug (Lygus hesperus) polygalacturonases

secreted during feeding. Arthropod-Plant Interactions 2: 215–225.

47. Allen ML, Mertens JA (2008) Molecular cloning and expression of threepolygalacturonase cDNAs from the tarnished plant bug, Lygus lineolaris. J Insect

Sci 8: 1–14.

48. Pauchet Y, Freitak D, Heidel-Fischer HM, Heckel DG, Vogel H (2009)Immunity or digestion: glucanase activity in a glucan-binding protein family

from Lepidoptera. J Biol Chem 284: 2214–2224.

49. Bragatto I, Genta FA, Ribeiro AF, Terra WR, Ferreira C (2010)Characterization of a b-1,3-glucanase active in the alkaline midgut of Spodoptera

frugiperda larvae and its relation to b-glucan-binding proteins. Insect BiochemMol Biol 40: 861–872.

50. Hughes AL (2012) Evolution of the betaGRP/GNBP/beta-1,3-glucanase

family of insects. Immunogenetics 64: 549–558.

51. Kovalchuk SN, Bakunina IY, Burtseva YV, Emelyanenko VI, Kim NY, et al.(2009) An endo-(1R3)-b-D-glucanase from the scallop Chlamys albidus: catalytic

properties, cDNA cloning and secondary-structure characterization. Carbohy-

drate Research 344: 191–197.

52. Song JM, Nam K, Sun YU, Kang MH, Kim CG, et al. (2010) Molecular andbiochemical characterizations of a novel arthropod endo-beta-1,3-glucanase

from the Antarctic springtail, Cryptopygus antarcticus, horizontally acquired from

bacteria. Comp Biochem Physiol B Biochem Mol Biol 155: 403–412.

53. Kim Y-S, Ryu J-H, Han S-J, Choi K-H, Nam K-B, et al. (2000) Gram-negativebacteria-binding protein, a pattern recognition receptor for lipopolysaccharide

and b-1,3-glucan that mediates the signaling for the induction of innateimmune genes in Drosophila melanogaster cells. J Biol Chem 275: 32721–32727.

54. Pili-Floury S, Leulier F, Takahashi K, Saigo K, Samain E, et al. (2004) In vivo

RNA interference analysis reveals an unexpected role for GNBP1 in thedefense against Gram-positive bacterial infection in Drosophila adults. J Biol

Chem 279: 12848–12853.

55. Zhang R, Cho HY, Kim HS, Ma YG, Osaki T, et al. (2003) Characterizationand properties of a 1,3-beta-D-glucan pattern recognition protein of Tenebrio

molitor larvae that is specifically degraded by serine protease during

prophenoloxidase activation. J Biol Chem 278: 42072–42079.

56. Ma C, Kanost MR (2000) A beta1,3-glucan recognition protein from an insect,Manduca sexta, agglutinates microorganisms and activates the phenoloxidase

cascade. J Biol Chem 275: 7505–7514.

57. Ochiai M, Ashida M (2000) A pattern-recognition protein for beta-1,3-glucan.The binding domain and the cDNA cloning of beta-1,3-glucan recognition

protein from the silkworm, Bombyx mori. J Biol Chem 275: 4995–5002.

58. Bulmer MS, Bachelet I, Raman R, Rosengaus RB, Sasisekharan R (2009)Targeting an antimicrobial effector function in insect immunity as a pest

control strategy. Proc Natl Acad Sci USA 106: 12652–12657.

59. Kanagawa M, Satoh T, Ikeda A, Adachi Y, Ohno N, et al. (2011) Structuralinsights into recognition of triple-helical beta-glucans by an insect fungal

receptor. J Biol Chem 286: 29158–29165.

60. Mishima Y, Quintin J, Aimanianda V, Kellenberger C, Coste F, et al. (2009)The N-terminal domain of Drosophila Gram-negative binding protein 3

(GNBP3) defines a novel family of fungal pattern recognition receptors. J Biol

Chem 284: 28687–28697.

61. Genta FA, Bragatto I, Terra WR, Ferreira C (2009) Purification, character-

ization and sequencing of the major b-1,3-glucanase from the midgut of

Tenebrio molitor larvae. Insect Biochem Mol Biol 39: 861–874.

62. Genta FA, Dumont AF, Marana SR, Terra WR, Ferreira C (2007) The

interplay of processivity, substrate inhibition and a secondary substrate binding

site of an insect exo-beta-1,3-glucanase. Biochim Biophys Acta 1774: 1079–1091.

63. Genta FA, Terra WR, Ferreira C (2003) Action pattern, specificity, lytic

activities, and physiological role of five digestive beta-glucanases isolated fromPeriplaneta americana. Insect Biochem Mol Biol 33: 1085–1097.

64. Chen XY, Kim JY (2009) Callose synthesis in higher plants. Plant Signal Behav

4: 489–492.

65. Aspeborg H, Coutinho PM, Wang Y, Brumer H, Henrissat B (2012) Evolution,substrate specificity and subfamily classification of glycoside hydrolase family 5

(GH5). BMC Evol Biol 12: 186.

66. Acuna R, Padilla BE, Florez-Ramos CP, Rubio JD, Herrera JC, et al. (2012)Adaptive horizontal transfer of a bacterial gene to an invasive insect pest of

coffee. Proc Natl Acad Sci USA 109: 4197–4202.

67. Fischer R, Ostafe R, Twyman RM (2013) Cellulases from insects. Advances inbiochemical engineering/biotechnology 136: 51–64.

68. Hunt T, Bergsten J, Levkanicova Z, Papadopoulou A, John OS, et al. (2007) A

comprehensive phylogeny of beetles reveals the evolutionary origins of asuperradiation. Science 318: 1913–1916.

69. Van Belleghem SM, Roelofs D, Van Houdt J, Hendrickx F (2012) De novo

transcriptome assembly and SNP discovery in the wing polymorphic salt marsh

beetle Pogonus chalceus (Coleoptera, Carabidae). PloS one 7: e42605.

70. Tribolium Genome Sequencing Consortium (2008) The genome of the model

beetle and pest Tribolium castaneum. Nature 452: 949–955.

71. Sinnott M (1990) Catalytic mechanisms of enzymatic glycosyl transfer. ChemRev 90: 1171–1202.

72. Ferreira AH, Marana SR, Terra WR, Ferreira C (2001) Purification, molecular

cloning, and properties of a beta-glycosidase isolated from midgut lumen of

Tenebrio molitor (Coleoptera) larvae. Insect Biochem Mol Biol 31: 1065–1076.

73. Marana SR, Jacobs-Lorena M, Terra WR, Ferreira C (2001) Amino acid

residues involved in substrate binding and catalysis in an insect digestive beta-

glycosidase. Biochim Biophys Acta 1545: 41–52.

74. Scharf ME, Kovaleva ES, Jadhao S, Campbell JH, Buchman GW, et al. (2010)Functional and translational analyses of a beta-glucosidase gene (glycosyl

hydrolase family 1) isolated from the gut of the lower termite Reticulitermes

flavipes. Insect Biochem Mol Biol 40: 611–620.

75. Mian IS (1998) Sequence, structural, functional, and phylogenetic analyses of

three glycosidase families. Blood cells, molecules & diseases 24: 83–100.

76. Marques AR, Coutinho PM, Videira P, Fialho AM, Sa-Correia I (2003)Sphingomonas paucimobilis beta-glucosidase Bgl1: a member of a new bacterial

subfamily in glycoside hydrolase family 1. The Biochemical journal 370: 793–804.

77. Ferreira AH, Terra WR, Ferreira C (2003) Characterization of a beta-

glycosidase highly active on disaccharides and of a beta-galactosidase fromTenebrio molitor midgut lumen. Insect Biochem Mol Biol 33: 253–265.

78. Ernst HA, Lo Leggio L, Willemoes M, Leonard G, Blum P, et al. (2006)

Structure of the Sulfolobus solfataricus alpha-glucosidase: implications for domain

conservation and substrate recognition in GH31. J Mol Biol 358: 1106–1124.

Western Corn Rootworm Glycoside Hydrolase Genes

PLOS ONE | www.plosone.org 14 April 2014 | Volume 9 | Issue 4 | e94052

Page 15: Molecular Evolution of Glycoside Hydrolase Genes in the ...bioinformatics.unl.edu › eyun › publications › 14_PLoS... · Molecular Evolution of Glycoside Hydrolase Genes in the

79. Opota O, Charles JF, Warot S, Pauron D, Darboux I (2008) Identification and

characterization of the receptor for the Bacillus sphaericus binary toxin in themalaria vector mosquito, Anopheles gambiae. Comp Biochem Physiol B Biochem

Mol Biol 149: 419–427.

80. Zhang Q, Hua G, Bayyareddy K, Adang MJ (2013) Analyses of alpha-amylaseand alpha-glucosidase in the malaria vector mosquito, Anopheles gambiae, as

receptors of Cry11Ba toxin of Bacillus thuringiensis subsp. jegathesan. InsectBiochem Mol Biol 43: 907–915.

81. Wheeler D, Redding AJ, Werren JH (2013) Characterization of an ancient

lepidopteran lateral gene transfer. PloS one 8: e59262.82. Shen Z, Denton M, Mutti N, Pappan K, Kanost MR, et al. (2003)

Polygalacturonase from Sitophilus oryzae: possible horizontal transfer of apectinase gene from fungi to weevils. J Insect Sci 3: 24.

83. Kirsch R, Wielsch N, Vogel H, Svatos A, Heckel D, et al. (2012) Combiningproteomics and transcriptome sequencing to identify active plant-cell-wall-

degrading enzymes in a leaf beetle. BMC Genomics 13: 587.

84. Juge N (2006) Plant protein inhibitors of cell wall degrading enzymes. TrendsPlant Sci 11: 359–367.

85. Waller JM, Bigger M, Hillocks RJ (2007) Postharvest and processing pests andmicrobial problems. In: Waller JM, Bigger M, Hillocks RJ, editors. Coffee

Pests, Diseases and Their Management. CABI, Wallingford, UK. 325–335.

86. Valentine BD (2005) The scientific name of the coffee bean weevil and someadditional bibliography (Coleoptera: Anthribidae: Araecerus Schonherr). Insecta

Mundi 19: 247–253.87. Damon A (2000) A review of the biology and control of the coffee berry borer,

Hypothenemus hampei (Coleoptera: Scolytidae). Bull Entomol Res 90: 453–465.88. Vega FE, Davis AP, Jaramillo J (2012) From forest to plantation? Obscure

articles reveal alternative host plants for the coffee berry borer, Hypothenemus

hampei (Coleoptera: Curculionidae). Biol J Linn Soc 107: 86–94.89. Dunning Hotopp JC (2011) Horizontal gene transfer between bacteria and

animals. Trends Genet 27: 157–163.90. Schmieder R, Edwards R (2011) Quality control and preprocessing of

metagenomic datasets. Bioinformatics 27: 863–864.

91. Joshi N. Sickle - A windowed adaptive trimming for fastq files using quality.Available: https://github.com/najoshi/sickle. Accessed 2013 May 30.

92. Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Muller WEG, et al. (2004)Using the miraEST assembler for reliable and automated mRNA transcript

assembly and SNP detection in sequenced ESTs. Genome Res 14: 1147–1159.

93. Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly

using de Bruijn graphs. Genome Res 18: 821–829.94. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped

BLAST and PSI-BLAST: a new generation of protein database search

programs. Nucleic Acids Res 25: 3389–3402.95. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, et al. (2009)

BLAST+: architecture and applications. BMC Bioinformatics 10: 421.96. The UniProt Consortium (2013) Update on activities at the Universal Protein

Resource (UniProt) in 2013. Nucleic Acids Res 41: D43–D47.

97. Zhao QY, Wang Y, Kong YM, Luo D, Li X, et al. (2011) Optimizing de novo

transcriptome assembly from short-read RNA-Seq data: a comparative study.

BMC Bioinformatics 12 Suppl 14: S2.98. Li R, Yu C, Li Y, Lam TW, Yiu SM, et al. (2009) SOAP2: an improved

ultrafast tool for short read alignment. Bioinformatics 25: 1966–1967.99. Birol I, Jackman SD, Nielsen CB, Qian JQ, Varhol R, et al. (2009) De novo

transcriptome assembly with ABySS. Bioinformatics 25: 2872–2877.

100. Langmead B, Trapnell C, Pop M, Salzberg S (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome

Biology 10: R25.101. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping

and quantifying mammalian transcriptomes by RNA-seq. Nat Methods 5: 621–

628.102. Ramskold D, Wang ET, Burge CB, Sandberg R (2009) An abundance of

ubiquitously expressed genes revealed by tissue transcriptome sequence data.PLoS Comput Biol 5: e1000598.

103. Katoh K, Standley DM (2013) MAFFT multiple sequence alignment softwareversion 7: improvements in performance and usability. Mol Biol Evol 30: 772–

780.

104. Stamatakis A (2006) RAxML-VI-HPC: maximum likelihood-based phyloge-netic analyses with thousands of taxa and mixed models. Bioinformatics 22:

2688–2690.105. Saitou N, Nei M (1987) The neighbor-joining method: a new method for

reconstructing phylogenetic trees. Mol Biol Evol 4: 406–425.

106. Felsenstein J (2013) PHYLIP (Phylogeny Inference Package) version 3.6.Distributed by the author. Department of Genome Sciences, University of

Washington, Seattle.107. Felsenstein J (1985) Confidence limits on phylogenies: an approach using the

bootstrap. Evolution 39: 783–791.

Western Corn Rootworm Glycoside Hydrolase Genes

PLOS ONE | www.plosone.org 15 April 2014 | Volume 9 | Issue 4 | e94052