Tracing Determinants of Dual Substrate Specificity in Glycoside Hydrolase Family 5*□S

Received for publication, March 14, 2012, and in revised form, May 18, 2012 Published, JBC Papers in Press, May 29, 2012, DOI 10.1074/jbc.M112.362640

Zhiwei Chen‡§1, Gregory D. Friedland‡§1, Jose H. Pereira‡¶, Sonia A. Reveco‡¶, Rosa Chan‖, Joshua I. Park‡§2, Michael P. Thelen‡**, Paul D. Adams‡¶‡‡, Adam P. Arkin‡¶‡‡, Jay D. Keasling‡¶‡‡§§, Harvey W. Blanch‡¶§§, Blake A. Simmons‡§, Kenneth L. Sale‡§, Dylan Chivian‡¶3, and Swapnil R. Chhabra‡¶4

From the ‡Joint BioEnergy Institute, Emeryville, California 94608, the §Sandia National Laboratories, Livermore, California 94551, the ¶Lawrence Berkeley National Laboratory, Berkeley, California 94720, the Departments of ‖Nutritional Sciences and Toxicology, ‡‡Bioengineering, and §§Chemical and Biomolecular Engineering, University of California, Berkeley, California 94720, and the **Lawrence Livermore National Laboratory, Livermore, California 94550

Background: Glycoside hydrolase family 5 (GH5) comprises enzymes with a wide range of activities critical for the deconstruction of lignocellulose.
Results: Concurrent glucan and mannan specificity in over 70 members of GH5 can be ascribed to a conserved active site motif.
Conclusion: Single domain multispecific hydrolases are widely prevalent.
Significance: This finding has potential applications in improved enzyme mixture design or microbes engineered for consolidated bioprocessing of lignocellulose.

Enzymes are traditionally viewed as having exquisite substrate specificity; however, recent evidence supports the notion that many enzymes have evolved activities against a range of substrates. The diversity of activities across glycoside hydrolase family 5 (GH5) suggests that this family of enzymes may contain numerous members with activities on multiple substrates. In this study, we combined structure- and sequence-based phylogenetic analysis with biochemical characterization to survey the prevalence of dual specificity for glucan- and mannan-based substrates in the GH5 family. Examination of amino acid profile differences between the subfamilies led to the identification and subsequent experimental confirmation of an active site motif indicative of dual specificity. The motif enabled us to successfully discover several new dually specific members of GH5, and this pattern is present in over 70 other enzymes, strongly suggesting that dual endoglucanase-mannanase activity is widespread in this family. In addition, reinstatement of the conserved motif in a wild type member of GH5 enhanced its catalytic efficiency on glucan and mannan substrates by 175 and 1,600%, respectively. Phylogenetic examination of other GH families further indicates that the prevalence of enzyme multispecificity in GHs may be greater than has been experimentally characterized. Single domain multispecific GHs may be exploited for developing improved enzyme cocktails or facile engineering of microbial hosts for consolidated bioprocessing of lignocellulose.

Enzymes are commonly viewed as highly specific for their natural substrates; however, this view obscures the fact that many have the ability to perform multiple activities (1, 2). This “promiscuity” typically involves the same chemistry applied to different substrates or, alternatively, can use different catalytic machinery within the same active site (3, 4). One well studied example is the serum paraoxonase PON1, which can hydrolyze lactones, thiolactones, carbonates, esters, and phosphotriesters, using one set of active site residues for some functions and other residues for different functions (4, 5). Enzyme multispecificity is essential, in many cases, to organismal survival and has also been argued to be a byproduct of divergent evolution from unspecialized ancestor enzymes, potentially explaining why secondary functions of one enzyme are often primary functions in other members of the same family or superfamily (3, 4). The relative prevalence of enzyme promiscuity is an open question, but it has been suggested to be a fundamental characteristic of enzymes in general (2).

Multispecific enzymes can be found in numerous different protein families including the glycoside hydrolases (GHs),5 a large class of enzymes, which catalyze the hydrolysis of plant polysaccharides (6). GHs are categorized in the Carbohydrate-Active Enzymes (6) (CAZy) database into more than 100 sequence-based families including endo-, exo-, and side chain-acting hydrolases specific to glucose-, xylose-, mannose-, galactose-, and arabinose-containing polysaccharides, among others. An important application of GHs is in the hydrolysis of cellulose and hemicellulose into fermentable sugars for subsequent conversion to biofuels or commodity chemicals. Members of a given CAZy family share structural features and conserved catalytic residues but may or may not exhibit identical substrate specificity. For example, members of the GH1 family are active on a number of sugar types linked through the β-1,4 bond, including β-galactose, β-mannose, and β-glucose. In contrast, all existing members of GH64 are β-1,3-endoglucanases.

* This work was supported by the Office of Science, Office of Biological and Environmental Research, of the United States Department of Energy under Contract DE-AC02-05CH11231.
□S This article contains supplemental Tables S1–S5 and Figs. S1–S3.
1 Both authors contributed equally to this work.
2 Present address: Dept. of Pediatrics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065.
3 To whom correspondence may be addressed: Joint BioEnergy Institute, 5885 Hollis St., Emeryville, CA 94608. Tel.: 510-643-3722; Fax: 510-486-6219; E-mail: [email protected].
4 To whom correspondence may be addressed: Intrexon Corporation, 201 Industrial Rd., San Carlos, CA 94070. E-mail: [email protected].
5 The abbreviations used are: GH, glycoside hydrolase; GH5, glycoside hydrolase family 5; CAZy, Carbohydrate-Active Enzymes database; MSA, multiple sequence alignment; MBTH, 3-methyl-2-benzothiazolinone hydrazone; CMC, carboxymethyl cellulose.

THE JOURNAL OF BIOLOGICAL CHEMISTRY VOL. 287, NO. 30, pp. 25335–25343, July 20, 2012. Published in the U.S.A.

In this study, we probed the extent and mechanisms of multisubstrate specificity in a highly diverse GH family using phylogenetic analysis and biochemical characterization. GH family 5 (GH5) comprises enzymes with a wide range of activities critical for the deconstruction of lignocellulose including endo-β-1,4-glucanase, endo-β-1,4-mannanase, endo-β-1,3-glucanase, endo-β-1,6-galactanase, lichenase, xyloglucan-specific endo-β-1,4-glucanases, and endo-β-1,4-xylanase (6). The number of substrates hydrolyzed by GH5 enzymes suggests that this family may contain single domain enzymes with multiple specificities. To test this hypothesis, we built a phylogenetic tree for this highly diverse family from a multiple sequence alignment (MSA) built using both sequence and structure information. This combined sequence and structure approach allowed the resulting alignment to contain genes with low pairwise sequence identity and a variety of functions, something that would not have been possible with standard sequence-only MSA-building methods. We analyzed sequence patterns from the resulting subfamilies to identify glucan and mannan specificity-determining residues and, through extensive biochemical characterization, validated a conserved motif that enables dual substrate specificity within a single catalytic domain. Subsequently, we applied this motif to enhance the catalytic efficiency of a GH5 enzyme for glucan and mannan hydrolysis by 175 and 1,600%, respectively. The conserved motif allowed us to discover new enzymes active on both glucan and mannan substrates, and its presence in over 70 members of GH5 strongly suggests the widespread prevalence of dual specificity in this family. Finally, extending the aforementioned phylogenetic analyses to other CAZy families, GH1 and GH43, further indicates that single domain multispecific GH enzymes may be more common than is currently characterized. As such, single domain multispecific GHs would be expected to reduce the complexity of designing enzyme mixtures, as well as microbial hosts for consolidated bioprocessing of lignocellulose.

EXPERIMENTAL PROCEDURES

Creation of Structure-based Sequence Alignments—To build a high quality sequence alignment in this diverse protein family, we used a combination of structural and sequence information. First, we performed pairwise structural alignments with 3Dhit (7) of 22 GH5 family structures (chain A of Protein Data Bank (PDB) IDs 2JEP, 3VDH, 1BQC, 2WHL, 7A3H, 2OSX, 1H1N, 1QNR, 1TVN, 1RH9, 1UUQ, 1EDG, 2C0H, 2CKS, 1WKY, 1H4P, 2PC8, 1CEO, 2ZUM, 1VJZ, 1EGZ, and 1ECE) to the Cel5A_Tma structure (PDB ID 3MMW, chain A). These 23 structures were selected based on their resolution and to remove redundancy at 90% sequence identity. For each of these structures, we used BLAST on GH5 sequences (after removing short sequence fragments) from the CAZy database to find sequences between 25 and 90% sequence identity with the sequence of the structure, and the resulting sequences were aligned with MUSCLE (8). These 23 MSAs were then combined into one MSA by aligning equivalent positions in the individual MSAs using the pairwise structural alignments to 3MMW. Redundant sequences were filtered out at 90% sequence identity, preferentially keeping sequences with structures, experimental characterization, and longer lengths, in this order of priority. We did not filter explicitly for active site residue identity; however, 94% of the sequences in the alignment contained both catalytic glutamates, and we do not expect removal of the small number of sequences not containing both glutamates to significantly alter the tree. Filtering for required inclusion of other active site residues is possible but was not performed here as some of these active site residues were of interest in finding the specificity-determining motif.

Creation of the Phylogenetic Tree—Gap positions and their neighbors were trimmed from the above structure-based sequence alignment by removing positions with less than 60% occupancy and two flanking positions. The gap positions removed were at positions 1–9, 24–31, 59–67, 137–141, 200–227, 256–262, 293–299, and 305–309 (Cel5A_Tma numbering). A tree was built from the resulting trimmed alignment using FastTree 2.1.3 (9), and the tree was rerooted such that the root was the midpoint between leaves with the furthest evolutionary distance. To test the sensitivity of the alignment and tree to its method of creation, we built the alignment and tree starting from a different x-ray structure (PDB ID 2WHL from Bacillus agaradhaerens). The resulting tree was nearly identical, displaying essentially the same subfamily separations as the tree built from the Cel5A_Tma structure.

Subfamily Identification—Subfamilies were assigned from the clade divisions in the tree according to evolutionary distance from the root node, the length of their branches, and their bootstrap support (above 80%; see Fig. 1). Specifically, we chose subfamilies by first moving along branches away from the center of the tree until a long branch distance was found from a node that had bootstrap support above 80% (such as exists for subfamilies A1 and A8). This allowed identification of subfamilies and subfamily groups A1, A8, A7/10, A12, A5/6, A2, A11, A9, and A4. A3 did not have a long branch but clustered differently from nearby A4; A3 was thus assigned as a subfamily. A10 was split from A7 by iterating the above procedure a second time because there was a long branch from the common node of these two subfamilies. The subfamily naming used the designations in the literature describing structures in each subfamily, with the exception of the two new subfamilies A11 and A12. A11 and A12 contained the PDB IDs 1VJZ and 2OSX, respectively, neither of which contained a reference to a subfamily in the literature.

Selection of Cel5A_Tma Active Site Residues for Analysis—The ligand in PDB ID 1ECE was used to find active site positions because this ligand represents a four-sugar substrate with units on both sides of the active site, whereas most other co-crystals of homologs contain ligands binding to only one side of the active site. Residues with side chain atoms within 6 Å of the 1ECE ligand were selected with the exception of Ala-24, which is pointing away from the active site. Residues with high sequence entropy (above 1.75) and low occupancy (below 70%) in the A4 subfamily MSA were removed.
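The occupancy and entropy criteria described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the function names and the toy alignment are hypothetical, entropy is assumed to be Shannon entropy in bits over non-gap residues (the text does not state the units), the "two flanking positions" are read as one neighbor on each side of a gappy column, and the entropy and occupancy cutoffs are treated as independent filters.

```python
import math
from collections import Counter

GAP = "-"


def occupancy(column):
    """Fraction of sequences that have a residue (not a gap) in this column."""
    return sum(1 for c in column if c != GAP) / len(column)


def shannon_entropy(column):
    """Shannon entropy in bits over the non-gap residues of a column (assumed units)."""
    residues = [c for c in column if c != GAP]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def columns(msa):
    """Transpose a list of equal-length aligned sequences into column strings."""
    return ["".join(col) for col in zip(*msa)]


def trimmed_indices(msa, min_occupancy=0.60, flank=1):
    """Column indices kept after dropping gappy columns and `flank` neighbors per side."""
    cols = columns(msa)
    drop = set()
    for i, col in enumerate(cols):
        if occupancy(col) < min_occupancy:
            drop.update(range(max(0, i - flank), min(len(cols), i + flank + 1)))
    return [i for i in range(len(cols)) if i not in drop]


def keep_active_site_position(column, max_entropy=1.75, min_occupancy=0.70):
    """True if a column passes both the entropy and the occupancy cutoff."""
    return shannon_entropy(column) <= max_entropy and occupancy(column) >= min_occupancy


# Hypothetical four-sequence alignment, for illustration only.
msa = ["ACDE-K", "ACD--K", "ACE--K", "GCDF-K"]
kept = trimmed_indices(msa)
```

With real data, `msa` would hold the aligned subfamily sequences, and the kept indices would be mapped back to Cel5A_Tma numbering before comparison with the trimmed ranges listed above.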

Determination of Conserved and Nonconserved Active Site Positions—We used the following equation to calculate a BLOSUM-weighted profile difference score for alignment position i

\[
\mathrm{score}_i = \sum_{aa_1} \sum_{aa_2} \mathrm{BLOSUM}_{aa_1,aa_2} \left( p_{aa_1,i} - p_{aa_2,i} \right)^2 \qquad \text{(Eq. 1)}
\]

where $p_{aa,i}$ is the probability of amino acid aa occurring at position i in the alignment and $\mathrm{BLOSUM}_{aa_1,aa_2}$ is the BLOSUM substitution matrix value for amino acids $aa_1$ and $aa_2$. The resulting profile difference scores for the active site profiles of subfamily A4 versus each of the large primarily endoglucanase or mannanase subfamilies (A1, A2, A5/6, A7, and A8) are summarized in supplemental Table S3. Positions with averaged profile difference scores greater than 0.05 are classified as nonconserved (Asn-20, Glu-23, Pro-53, His-95, His-96, Phe-201, and Glu-287), and those below this cutoff are classified as conserved (Trp-30, Asn-135, Glu-136, His-196, Tyr-198, Glu-253, and Trp-286). Higher cutoff values up to 20% of the maximum score would yield identical results.

Structural Modeling—The structural model of Cel5A_Tma in complex with the disaccharide glucan-based substrate has been published previously (10). To create the Cel5A_Tma complex with the disaccharide mannan-based substrate, the glucan-based substrate configuration was altered at OH-C2 by comparison with other mannan-based complex co-crystals. Hydrogens were added using UCSF Chimera (11), and His-95 and Asn-20 dihedrals were optimized for hydrogen bonding with the ligand (resulting in heavy atom root mean square deviation values of 0.43 and 0.32 Å, respectively); other rotatable hydrogen dihedrals were positioned by inspection to assess possible hydrogen bond geometries (supplemental Table S4). The subfamily A7 co-crystal structures described in Results containing an asparagine distant in primary sequence that occupies similar three-dimensional coordinates as Asn-20 are Man5_Tfu (Thermomonospora fusca, PDB ID 3MAN (12)) and Man5A_Bag (B. agaradhaerens, PDB ID 2WHL (13)). The subfamily A8 co-crystal structures containing the aspartate in similar three-dimensional coordinate space as Asn-20 are Man5A_Sly (Solanum lycopersicum, PDB ID 1RH9 (14)) and Man5A_Hje (Hypocrea jecorina, PDB ID 1QNR (15)). The model for Cel5B_Dtu was created with Phyre2 (16).

Chemicals and Reagents—All chemicals and enzymes were analytical grade from Sigma or EMD Chemicals. BugBuster protein extraction reagent, Popculture reagent, rLysozyme solution, Benzonase nuclease HC (purity >90%), and proteinase inhibitor mixture V (EDTA-free) were from Novagen and Calbiochem (EMD Biosciences). The Champion pET101 directional TOPO expression kit was from Invitrogen. Nickel-nitrilotriacetic acid spin columns were from Qiagen. Zeba spin desalting columns (2 ml, 70,000 molecular weight cut off) were from Pierce (Thermo Fisher Scientific). The bicinchoninic acid kit (BCA1-1KT) was from Sigma-Aldrich. Luria-Bertani (LB) medium was from EMD Chemicals, and 2xYT medium was from Sigma-Aldrich.

Gene Synthesis, Cloning, and Mutagenesis—Genes were codon-optimized according to the codon usage in Escherichia coli and synthesized by GenScript USA, Inc. All the genes were amplified and cloned by the pCDF-2 Ek/LIC vector kit (Novagen, EMD Biosciences) except that cel5a_Pbr was cloned into pET101 vector (Invitrogen). Cloning primers are listed in supplemental Table S5a. The construct for Cel5A_Tma, pCDF2-cel5a_Tma, has been described before (10). All the constructs were confirmed by DNA sequencing (Quintara Biosciences). Site-directed mutagenesis was conducted by using the QuikChange Lightning site-directed mutagenesis kit according to the manufacturer's instructions (Agilent Technologies). All mutagenic primers are listed in supplemental Table S5b. The mutant plasmids were extracted by the QIAprep spin miniprep kit (Qiagen) and confirmed by DNA sequencing (Quintara Biosciences).

Protein Expression and Purification—All the constructs were transformed into BL21 (DE3) (Novagen, EMD Biosciences) for protein expression. Single colonies were inoculated into 5 ml of LB autoinduction medium (Overnight Express autoinduction system 1, Novagen, EMD Biosciences) containing appropriate antibiotics (100 µg/ml carbenicillin for pET101 constructs and 100 µg/ml streptomycin for the others) and incubated at 30 °C for 24 h. Induced cultures were harvested and preserved at −80 °C until use. Protein extraction, purification, buffer exchange, and concentration determination were as described before (data not shown).

Reducing Sugar Assays—The dinitrosalicylic acid method in a microplate format (17), without adding phenol and sulfite, was used for most of the enzyme assays, whereas 3-methyl-2-benzothiazolinone hydrazone (MBTH) was used for the kinetic assays. The MBTH method was used as described by Anthon and Barrett (18) with the following modifications: 40 µl of sample was mixed with 80 µl of Reagent A (0.25 M sodium hydroxide, 0.075% (w/v) MBTH and 0.025% (w/v) dithiothreitol) and then heated at 80 °C for 15 min. After cooling the samples down to room temperature, 80 µl of Reagent B (0.5% (w/v) FeNH4(SO4)2·12H2O, 0.5% (w/v) sulfamic acid and 0.25 M hydrochloric acid) was added. These mixtures were incubated at room temperature for 30 min. Samples were assayed for absorbance at 620 nm. The linear range of the MBTH method is 0.05–1 mM of reducing sugars (D-glucose for endoglucanases or D-mannose for mannanases).

Enzyme Assays—Enzyme assays for Cel5A_Tma and its mutants were performed at the respective optimal conditions for the two activities, 70 °C and pH 5.00 for endoglucanase activity and 90 °C and pH 5.50 for mannanase activity (data not shown), both in 50 mM sodium citrate buffer. For the other enzymes and their mutants, mesophilic enzymes were assayed at 37 °C, whereas thermophilic enzymes were assayed at 60 °C. 50 mM sodium citrate buffer (pH 5.50) was used for these enzyme reactions. The enzyme reactions contained 0.5% (w/v) carboxymethyl cellulose (CMC, average molecular mass ~90 kDa, Aldrich) and locust bean gum (Sigma) as substrates for endoglucanase and mannanase activity assays, respectively. D-Glucose and D-mannose (0–5 mM) were used as standards for reducing sugars, as described above, when assaying endoglucanase and mannanase, respectively. The optimal temperatures of Cel5B_Dtu on CMC and locust bean gum were analyzed from 50–100 °C with 5 °C intervals. 50 mM sodium citrate buffers (pH 3.00–6.50 with 0.50-unit intervals) were used to survey the optimal pHs for endoglucanase and mannanase activities of Cel5B_Dtu. One unit of endoglucanase or mannanase activity is defined as the amount of enzyme required for producing 1 µmol of reducing sugars per minute.

Kinetic Assays—All the specific activities and kinetic assays for Cel5B_Dtu were performed under the optimal conditions (pH 5.00 and 70 °C for endoglucanase activity; pH 5.50 and 75 °C for mannanase activity). CMC and carob galactomannan (low viscosity, Megazyme) instead of locust bean gum were used as substrates in the kinetic assays. Initial velocities under a wide range of substrate concentrations ([S], 0.2–40 mg·ml⁻¹) were obtained for the calculation of kcat and Km by the Lineweaver-Burk plot.
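The Lineweaver-Burk calculation named above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis script used here: the function names, the noise-free velocities, and the enzyme concentration are hypothetical, and the fit is an ordinary least-squares line through the double-reciprocal points, 1/v = (Km/Vmax)(1/[S]) + 1/Vmax.

```python
def lineweaver_burk(substrate, velocity):
    """Fit 1/v against 1/[S] by ordinary least squares and return (Vmax, Km)."""
    x = [1.0 / s for s in substrate]
    y = [1.0 / v for v in velocity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # slope = Km / Vmax
    intercept = mean_y - slope * mean_x  # intercept = 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


def kcat(vmax, enzyme_conc):
    """Turnover number, assuming Vmax and enzyme concentration share molar units."""
    return vmax / enzyme_conc


# Hypothetical, noise-free Michaelis-Menten data (Vmax = 10, Km = 2 mg/ml)
# over the 0.2-40 mg/ml substrate range mentioned in the text.
substrate = [0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 40.0]
velocity = [10.0 * s / (2.0 + s) for s in substrate]
vmax, km = lineweaver_burk(substrate, velocity)
```

Because the double-reciprocal transform gives the low-substrate points the largest leverage, measurement error at low [S] tends to dominate the fitted Km; a direct nonlinear fit of the Michaelis-Menten equation is a common cross-check.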

RESULTS

Building a High Quality Phylogenetic Tree for GH5—The CAZy database provides a wealth of data about GHs including lists of genes, activities, and structures within the sequence-based families. However, relationships between members of a given family as revealed by phylogenetic trees are not generally available. To begin our search for single domain multispecific members in GH5, we constructed a phylogenetic tree using available sequence and structure information from this family. Such a tree allows placing the genes into their evolutionary context and identification of subfamilies and sequence patterns between subfamilies with different functions. Phylogenetic tree building relies on the creation first of an MSA containing the sequences of interest. Although there are numerous available tools for building MSAs, their construction for sequence and functionally diverse families is not trivial. Standard MSA tools do not work well when sequence identity between members is below 25%. For example, MSAs have been built with sequence-only approaches (19–23) that covered part of GH5, limiting the overall size and sequence diversity of its constituent genes. Incorporating the complementary information from experimentally determined protein structures can significantly help in the building of alignments and trees for sequence-diverse families (24–27). Given that this combined structure and sequence-based tree building approach has not previously been used on GH families, we chose to draw on the large number of structures in various GH families to build high quality sequence alignments and phylogenetic trees.

Our approach uses the relatively large number of crystal structures in GH5 (more than 30) to combine the low sequence identity parts of the family into a larger MSA. To do this, we created MSAs containing sequences with greater than 25% sequence identity to enzymes with experimentally determined crystal structures and then combined these MSAs using structure alignment methods. We used the resulting GH5 alignment containing 681 sequences to build a phylogenetic tree using FastTree 2 (9) and annotated it with the experimentally characterized activities obtained from CAZy (Fig. 1). In contrast to this phylogenetic analysis, previous studies of GH5 subfamily classifications focused on one subfamily at a time and were limited to sequence identity-based metrics (e.g. Refs. 28 and 29). In this work, we used the tree to classify subfamilies using their distance from the root, the length of the ancestral branch split, and the bootstrap support (Fig. 1).
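The identity window used to seed each per-structure MSA (25 to 90% identity to the structure's sequence, per "Experimental Procedures") can be illustrated with a toy Python filter. This is a sketch, not a re-implementation of BLAST: the function names and sequences are hypothetical, the candidates are assumed to be already aligned to the reference, identity is computed over columns where both sequences have a residue, and the bounds are treated as inclusive.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns where both sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(1 for x, y in pairs if x == y) / len(pairs)


def select_homologs(reference, candidates, low=25.0, high=90.0):
    """Names of candidates whose identity to the reference lies within [low, high]."""
    return [
        name
        for name, seq in candidates.items()
        if low <= percent_identity(reference, seq) <= high
    ]


# Hypothetical aligned sequences, for illustration only.
reference = "ACDEFGHIKL"
candidates = {
    "identical": "ACDEFGHIKL",  # 100%: dropped as redundant
    "half": "ACDEFAAAAA",       # 50%: kept
    "distant": "AAAAAAAAAA",    # 10%: dropped as too divergent
    "gapped": "ACDE--HIKM",     # 7 of 8 shared columns match: kept
}
selected = select_homologs(reference, candidates)
```

The lower bound keeps each per-structure alignment within the range where sequence-only alignment is reliable; the structural superposition then links these alignments across the low-identity gaps between them.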

Comparison of the functional assignments between the sub-families in this tree shows phylogenetic correspondence withthe divisions of different sugar specificities (Fig. 1). Three largesubfamilies appear to contain predominantly �-1,4-glucan-specific enzymes (A1, A2, and A5/6); two are predominantly�-1,4-mannan-specific (A7 and A8); and one is predominantly�-1,3-glucan-specific (A9). In terms of substrate specificity,subfamily A4 appears to be the most diverse in GH5 (supple-mental Fig. S1a) in that it contains a variety of �-1,4-linkedglucan-, mannan-, and xylan-specific enzymes (supplementalTable S1). Notably, several members of subfamily A4 havepreviously been reported to act on more than one substrate.For instance, GH5 proteins from Prevotella ruminicola(AAC36862.1) and Clostridium cellulovorans (AAA23231.1)have been reported to act on glucan as well as xylan substrates,although detailed biochemical characterization or structuralinformation for these enzymes is not available (30, 31). Themost thoroughly characterized GH5 enzyme from subfamilyA4 is the thermostable enzyme, Cel5A_Tma (AAD36816.1),fromThermotogamaritima (32). Cel5A_Tma can degrade bothgalactomannan (71 units/mg) and CMC (616 units/mg) at ratescomparable with those of its single substrate-specific counter-parts Man5_Tma from GH5 (83 units/mg on galactomannan)and Cel74_Tma from family GH74 (121 units/mg on CMC).Functional genomics studies on T. maritima have revealedrecruitment of this enzyme on mannan- and glucan-basedgrowth substrates (33).Discovery of a Specificity-determining Sequence Motif in

Cel5A_Tma—To dissect the determinants of substrate specificity in subfamily A4, we used the comprehensive phylogenetic tree of family GH5 to examine the amino acid profiles of active site residues (see “Experimental Procedures”) in the A4 subfamily, the mainly mannanase subfamilies (A7 and A8), and the predominantly endoglucanase subfamilies (A1, A2, and A5/6) (Fig. 2a). We categorized these positions as either conserved or variable based upon the extent of amino acid diversity (see “Experimental Procedures”) between the subfamily alignments (Fig. 2b, green circles). For example, the catalytic glutamates at positions 136 and 253 (using sequence numbering from Cel5A_Tma) are conserved among all members of GH5; in contrast, position 96 is variable, having mainly histidines in subfamily A4, but relatively few histidines in the other GH5 subfamilies (Fig. 2a).

These analyses resulted in the identification of seven positions (20, 23, 53, 95, 96, 201, and 287; Cel5A_Tma numbering) that varied between the subfamilies, suggesting their involvement in substrate specificity. To evaluate the role of these seven residues in substrate specificity, we generated alanine substitutions at these positions in Cel5A_Tma and assayed the purified enzymes for endoglucanase and mannanase activities (Fig. 2b). Of the seven variable positions, alanine mutations at five positions (N20A, E23A, P53A, H96A, and E287A) resulted in reduced activity on mannan, one mutation (H95A) resulted in reduced activity on glucan, and one mutation (F201A) did not have a large impact on either substrate. For each of the seven positions conserved between the subfamilies (30, 135, 136, 196, 198, 253, and 286), mutation to alanine eliminated activity on both substrates, consistent with the expected role of these positions either in catalysis or in nonspecific sugar substrate binding.

Application of the Motif to Predict Multispecificity—Next, we
examined whether the pattern of amino acids at the six specificity-altering residues found in Cel5A_Tma (positions 20, 23, 53, 95, 96, and 287) could be generalized by assessing the presence of the pattern across various enzymes in the GH5 A4 subfamily. We reasoned that dual mannanase and glucanase activity might broadly occur within the A4 subfamily despite the lack of experimental characterization of mannanase activity in the A4 subfamily other than Cel5A_Tma (6), perhaps because previous studies did not test for mannanase activity in addition to the more typical glucanase assay. To this end, we searched for the six-residue pattern (allowing either aspartate or glutamate at positions 23 and 287) in the 143 genes in subfamily A4. We identified more than 70 sequences containing the motif (supplemental Fig. S1b). Based on the presence of the motif, we predicted that these enzymes may have both endoglucanase and mannanase activities (Fig. 1, pink branches).

To test our prediction of broad multispecificity in subfamily A4, we assayed 10 additional enzymes, selected to broadly cover the phylogenetic diversity in A4 and to either match or differ from the six-residue pattern (Fig. 3a and supplemental Table
S2). Of these enzymes, all exhibited endoglucanase activity, and six also had detectable mannanase activity. Of the six characterized dual specificity enzymes, four had the same pattern at the six residues as Cel5A_Tma, whereas two (Cel5A_Umi and Cel5B_Dtu) did not match the pattern, differing at only a single position. Of the four characterized single specificity enzymes, each differed at one position or more from the motif. We further confirmed the specificity determination of the six-residue pattern in other enzymes from the A4 subfamily by characterizing the endoglucanase and mannanase activities of alanine mutants in two dual specificity enzymes from subfamily A4 with low sequence identity to Cel5A_Tma: Cel5C_Cth (29% sequence identity) and Cel5A_Eec (25% sequence identity). The specificity changes resulting from mutations in both Cel5C_Cth and Cel5A_Eec were consistent with the specificity changes resulting from the corresponding mutations in Cel5A_Tma (Fig. 3b), with the exception of the P72A variant of Cel5C_Cth.

FIGURE 1. Phylogeny of glycoside hydrolase family 5. A phylogenetic tree of the GH5 family constructed from a structure-based sequence alignment is shown. Experimental characterizations of function from the CAZy database are depicted in the outer rings for endoglucanases (EC 3.2.1.4; orange), mannanases (3.2.1.78; blue), 1,3-β-glucosidases (3.2.1.58; purple), and other functions (red). Genes with structures are represented by black boxes. Tree branches of genes predicted in this work to have dual endoglucanase and mannanase activities are colored pink. Subfamilies A1–A12 are labeled. (Created with the interactive Tree of Life (42).)

Using the Motif to Engineer Enhanced Activity—In addition to using the six-residue pattern to predict dual specificity, we applied the pattern to engineer enhanced activity. We postulated that the activity of Cel5B_Dtu could be improved by mutating the aspartate at position 14 in Cel5B_Dtu (corresponding to Asn-20 in Cel5A_Tma) to asparagine to fully match the six-amino acid pattern. Homology modeling of Cel5B_Dtu (data not shown) suggested that D14N could allow three hydrogen bonds to the mannan substrate, whereas an aspartate might be limited to two hydrogen bonds. The D14N mutation in Cel5B_Dtu resulted in enhanced hydrolysis of both substrates; we found an ~70% increase in specific endoglucanase activity and an ~300% increase in specific mannanase activity (Table 1). Kinetic analysis revealed that this single amino acid substitution lowered the Km for galactomannan ~16-fold (from 11.57 to 0.72 mg·ml−1, listed in Table 1 as an ~1,500% improvement), accompanied by a 5.2% increase in kcat; the Km for the glucan substrate was reduced by ~50%, whereas the kcat was increased by ~35%. Notably, the improvements in catalytic efficiency (kcat/Km) attributed to this single mutation for endoglucanase and mannanase activities were ~175 and ~1,600%, respectively.
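The improvement percentages quoted above can be reproduced from the wild-type and D14N values listed in Table 1. The following is a minimal sketch for illustration, not the authors' analysis code: the function and variable names are hypothetical, and the convention used for Km (the decrease expressed relative to the mutant value) is an assumption inferred from the tabulated numbers rather than something stated in the text.

```python
# Illustrative recomputation of the "Improvement (%)" column of Table 1.
# Not the authors' analysis code; names below are hypothetical.

# (wild type, D14N) values copied from Table 1.
TABLE1 = {
    "CMC": {"S.A.": (28.89, 50.03), "kcat": (408.19, 550.66),
            "Km": (24.02, 11.76), "kcat/Km": (17.00, 46.81)},
    "CGM": {"S.A.": (2.11, 8.83), "kcat": (68.25, 71.82),
            "Km": (11.57, 0.72), "kcat/Km": (5.90, 99.89)},
}


def pct_increase(wt, mutant):
    """Percent increase of the mutant value over wild type."""
    return (mutant - wt) / wt * 100.0


def pct_km_improvement(wt, mutant):
    """Percent improvement in Km, where a lower Km is better.

    Assumed convention: the drop in Km is expressed relative to the
    mutant value, which is what reproduces the tabulated figures.
    """
    return (wt - mutant) / mutant * 100.0


def improvements(table):
    """Return {substrate: {parameter: percent improvement}}."""
    out = {}
    for substrate, params in table.items():
        out[substrate] = {}
        for name, (wt, mutant) in params.items():
            fn = pct_km_improvement if name == "Km" else pct_increase
            out[substrate][name] = fn(wt, mutant)
    return out


if __name__ == "__main__":
    for substrate, params in improvements(TABLE1).items():
        for name, pct in params.items():
            print(f"{substrate:4s} {name:8s} {pct:8.2f}%")
```

Under this convention, the galactomannan Km entry (11.57 versus 0.72 mg·ml−1) corresponds to a roughly 16-fold lower Km. Small last-digit differences from Table 1 (for example, for the CGM kcat) would be expected from rounding of the tabulated inputs.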

DISCUSSION

The comprehensive GH5 phylogenetic tree described here led to the identification of an active site motif describing dual specificity for glucan- and mannan-based substrates in the large and diverse A4 subfamily of GH5. However, a sequence motif alone cannot fully determine the substrate specificity of a sequence-distant group of enzymes given the importance of subtle sub-Angstrom level interactions in the active site. It is interesting then that this motif managed to capture the endoglucanase and mannanase specificity pattern for almost all mutations at these sites in three sequence-distant enzymes (Fig. 3b) and helped to successfully identify dual specificity enzymes (Fig. 3a).

To postulate structural explanations for the mechanisms of
specificity changes in the six specificity-altering residues, we modeled (see “Experimental Procedures”) the glucan and mannan disaccharides into the Cel5A_Tma active site using the orientation from the structure of Cel5A_Bag (10) (Fig. 4, a and b). Mannan and glucan sugars differ in the configuration of the hydroxyl group at C2 of the sugar, with mannan units having an axial configuration and glucan units having an equatorial configuration (supplemental Fig. S2) (13). With mannan present in the active site, Asn-20 forms two hydrogen bonds with the axial OH-C2 group at the −2 subsite (Fig. 4b), an interaction that is unlikely to occur with the equatorial OH-C2 configuration present in the glucan-based substrate (Fig. 4a). Examination of the co-crystal structures of four strict mannanases from subfamilies A7 and A8 emphasizes the importance of this position, showing similar interactions between the OH-C2 group at the −2 subsite and an aspartate or asparagine (see “Experimental Procedures” for details). In the model, Glu-23 and Glu-287 make hydrogen bonds with the main chain or side chain atoms of Asn-20, respectively, which may act to stabilize the Asn-20 side chain orientation and support its hydrogen bonding with the OH-C2 group of mannan. Mutation of Pro-53 could break the β2 β-strand, which would produce conformational changes affecting the nearby Asn-20 and Glu-23 residues. The strong effect of the His-95 mutation in reducing glucanase activity can be explained by its interaction with the −1 subsite OH-C2 when the OH is in the equatorial configuration in the glucan substrate, whereas this interaction does not appear to occur for the axial conformation found in the mannan substrate. A recently released structure of Cel5A_Tma in complex with different sugar moieties confirms our model and supports our interpretations (34).

FIGURE 2. Determination of positions affecting specificity in glycoside hydrolase family 5. a, sequence profiles of the active site positions in the GH5 subfamily containing Cel5A_Tma (A4), of two predominantly mannanase subfamilies (A7 and A8), and of three predominantly endoglucanase subfamilies (A1, A2, and A5/6) (created with WebLogo (43)). b, experimental measurements of the relative specific endoglucanase and mannanase activities of alanine mutants at positions in the Cel5A_Tma active site. Residues that are variable on average between A4 and the other subfamilies are labeled with a green circle (see “Experimental Procedures” for details). Data in panel b are means from three independent experiments; error bars show S.D.

FIGURE 3. Characterization of additional GH5 A4 subfamily genes for dual specificity on glucan and mannan. a, experimental characterization of the endoglucanase (orange) and mannanase (blue) activities of Cel5A_Tma and 10 other genes from GH5 subfamily A4. These genes were selected to broadly cover the A4 subfamily tree and to contain diversity at the specificity-determining positions. Sequence identity to Cel5A_Tma of each gene is depicted with a black line on the plot, and the amino acid identities of the six specificity-determining positions are shown at right. b, the pattern of specificity changes in Cel5C_Cth and Cel5A_Eec from subfamily A4 in comparison with the corresponding mutations in Cel5A_Tma of N20A, E23A, P53A, H96A, E287A, and H95A, respectively (Fig. 2b). Cel5C_Cth and Cel5A_Eec are 29 and 25% identical to Cel5A_Tma, respectively, and closely match specificity patterns observed for Cel5A_Tma except the P72A mutation in Cel5C_Cth. Data in a and b are means from three independent experiments; error bars show S.D.

Although there are no reports in the literature for enzymatic
activity enhancement for mannan hydrolysis, the largest improvements for glucan and xylan hydrolysis to date are ~80% (35) and ~300% (36), respectively. That the 300% increase in specific mannanase activity and 1,600% improvement in mannanase kcat/Km observed for the “back-to-motif” mutation in Cel5B_Dtu come from a point mutant is intriguing given the difficulty of enhancing activity in these enzymes through other optimization techniques (37–39). The back-to-motif mutation indicates that this substrate-determining motif could be ancestral to the GH5 family, supporting Jensen’s hypothesis (40) that the spectrum of specificity in the ancestors of an enzyme family can be seen in the descendant families. Similar to back-to-ancestor mutations at nonactive sites, back-to-motif mutations within the active sites may broaden enzyme activity and also make enzymes more evolvable (41).

In addition to endoglucanase and mannanase activity in A4
subfamily enzymes, preliminary work has shown the presence of a third specificity, xylanase, in some enzymes.6 This co-occurrence of single domain multispecificity and multiple specificities in the subfamily raises the interesting question of whether multispecificity is an “inherent” property of some groups of related enzymes, such as the GH5 A4 subfamily. To investigate the extent of this co-occurrence in CAZy, we extended the aforementioned MSA and phylogenetic analysis to two other well characterized GH families: GH1 and GH43. GH1 contains ~3,500 members (with 232 biochemical characterizations), and GH43 contains ~2,000 members (with 85 biochemical characterizations) (6). Similar to our findings in GH5, we observed the presence of single domain multispecific enzymes within subfamilies bearing different sugar specificities (supplemental Fig. S3, a and b), which suggests that these subfamilies could contain numerous multispecific members. Further analysis in other GH and non-GH families is needed to confirm this observation more generally, but these results support the idea that multispecificity could be an inherent property of some groups of enzymes.

6 Z. Chen, unpublished data.

FIGURE 4. Structural models of glucan- and mannan-based disaccharides in the −1 and −2 subsites (nomenclature of Davies et al. (44)) of the Cel5A_Tma crystal structure (PDB ID 3MMW (10)). Glucose and mannose differ in the configuration of the OH-C2 groups, which are labeled in orange. Hydrogen bonds between glucan (a) and mannan (b) substrates and Cel5A_Tma and between residues in the six-residue motif are shown with black dashed lines, and the hydrogen-acceptor distances are labeled; hydrogen bonds between OH-C2 and Cel5A_Tma are labeled in orange for clarity. The orientations of the substrates were modeled based on the orientation of cellotriose in the Cel5A_Bag crystal structure (45). Further details about the hydrogen bonding geometries are provided in supplemental Table S4.

TABLE 1
Specific activity and kinetics of Cel5B_Dtu and mutant D14N
CMC, carboxymethyl cellulose; S.A., specific activity; CGM, carob galactomannan.

Substrate  Activity parameter             Cel5B_Dtu WT    Cel5B_Dtu D14N  Improvement (%)
CMC        S.A. (units · mg−1 protein)    28.89 ± 0.96    50.03 ± 0.97    73.17
           kcat (s−1)                     408.19          550.66          34.90
           Km (mg · ml−1)                 24.02           11.76           104.25
           kcat/Km (ml · mg−1 · s−1)      17.00           46.81           175.35
CGM        S.A. (units · mg−1 protein)    2.11 ± 0.03     8.83 ± 0.36     318.48
           kcat (s−1)                     68.25           71.82           5.24
           Km (mg · ml−1)                 11.57           0.72            1506.94
           kcat/Km (ml · mg−1 · s−1)      5.90            99.89           1593.05

In conclusion, our comprehensive phylogenetic and biochemical analyses of GH5 and subsequent phylogenetic analysis of GH1 and GH43 suggest that multispecific GH enzymes may be more prevalent than has been experimentally characterized. It will be interesting to investigate whether these multiple specificities are utilized in certain conditions by the host organism or whether they are perhaps a latent property of enzymes evolved from a promiscuous ancestor.
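As a starting point for such follow-up surveys, a screen for the six-residue motif of the kind applied to the 143 genes in subfamily A4 could be sketched as follows. This is an illustrative sketch only, not the script used in this work: the input format (residues already mapped to Cel5A_Tma numbering, for example from alignment columns), the function names, and the toy sequences are all assumptions. The allowed residues follow the text: Asn-20, Glu or Asp at position 23, Pro-53, His-95, His-96, and Glu or Asp at position 287.

```python
# Minimal sketch of a six-residue motif screen (Cel5A_Tma numbering).
# Not the script used in this work; input format and names are assumptions.
from typing import Dict, FrozenSet, List

# Allowed residues per position, as described in the text:
# Asn-20, Glu/Asp-23, Pro-53, His-95, His-96, Glu/Asp-287.
MOTIF: Dict[int, FrozenSet[str]] = {
    20: frozenset("N"),
    23: frozenset("DE"),
    53: frozenset("P"),
    95: frozenset("H"),
    96: frozenset("H"),
    287: frozenset("DE"),
}


def mismatches(residues: Dict[int, str]) -> List[int]:
    """Return motif positions where a sequence deviates from the pattern.

    `residues` maps Cel5A_Tma-numbered positions to one-letter amino acid
    codes. Positions that are absent, such as alignment gaps, count as
    mismatches.
    """
    return [pos for pos, allowed in MOTIF.items()
            if residues.get(pos, "-") not in allowed]


def has_motif(residues: Dict[int, str]) -> bool:
    """True when all six positions match the pattern."""
    return not mismatches(residues)


# Hypothetical toy sequences, not real GH5 entries.
TOY = {
    "toy_full_match": {20: "N", 23: "E", 53: "P", 95: "H", 96: "H", 287: "E"},
    "toy_asp_23_287": {20: "N", 23: "D", 53: "P", 95: "H", 96: "H", 287: "D"},
    "toy_asp_20": {20: "D", 23: "E", 53: "P", 95: "H", 96: "H", 287: "E"},
}

if __name__ == "__main__":
    for name, residues in TOY.items():
        print(name, has_motif(residues), mismatches(residues))
```

Reporting the mismatching positions rather than a bare yes/no keeps near-matches visible. A sequence such as the toy entry deviating only at position 20 is analogous to the single-position deviation described for Cel5B_Dtu, which carries an aspartate at the position corresponding to Asn-20.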

Acknowledgments—We thank Morgan Price for writing the scripts used in the sequence alignment trimming and tree rerooting, as well as writing the phylogenetic tree building program FastTree. The codon-optimized Cel5A_Pbr gene was a gift from Dr. Christopher A. Voigt at the University of California, San Francisco. We thank Tanveer Batth and Dr. Christopher J. Petzold in the Technology Division of the Joint BioEnergy Institute for help with mass spectroscopy analysis. Tanja Kortemme provided suggestions after critical reading of the manuscript.

REFERENCES

1. Nobeli, I., Favia, A. D., and Thornton, J. M. (2009) Protein promiscuity and its implications for biotechnology. Nat. Biotechnol. 27, 157–167

2. Khersonsky, O., and Tawfik, D. S. (2010) Enzyme promiscuity: a mechanistic and evolutionary perspective. Annu. Rev. Biochem. 79, 471–505

3. Peisajovich, S. G., and Tawfik, D. S. (2007) Protein engineers turned evolutionists. Nat. Methods 4, 991–994

4. Khersonsky, O., Roodveldt, C., and Tawfik, D. S. (2006) Enzyme promiscuity: evolutionary and mechanistic aspects. Curr. Opin. Chem. Biol. 10, 498–508

5. Khersonsky, O., and Tawfik, D. S. (2006) The histidine 115-histidine 134 dyad mediates the lactonase activity of mammalian serum paraoxonases. J. Biol. Chem. 281, 7649–7656

6. Cantarel, B. L., Coutinho, P. M., Rancurel, C., Bernard, T., Lombard, V., and Henrissat, B. (2009) The Carbohydrate-Active EnZymes database (CAZy): an expert resource for glycogenomics. Nucleic Acids Res. 37, D233–D238

7. Plewczyski, D., Pas, J., von Grotthuss, M., and Rychlewski, L. (2002) 3D-Hit: fast structural comparison of proteins. Appl. Bioinformatics 1, 223–225

8. Edgar, R. C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797

9. Price, M. N., Dehal, P. S., and Arkin, A. P. (2010) FastTree 2: approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490

10. Pereira, J. H., Chen, Z., McAndrew, R. P., Sapra, R., Chhabra, S. R., Sale, K. L., Simmons, B. A., and Adams, P. D. (2010) Biochemical characterization and crystal structure of endoglucanase Cel5A from the hyperthermophilic Thermotoga maritima. J. Struct. Biol. 172, 372–379

11. Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., and Ferrin, T. E. (2004) UCSF Chimera: a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612

12. Hilge, M., Gloor, S. M., Rypniewski, W., Sauer, O., Heightman, T. D., Zimmermann, W., Winterhalter, K., and Piontek, K. (1998) High-resolution native and complex structures of thermostable β-mannanase from Thermomonospora fusca: substrate specificity in glycosyl hydrolase family 5. Structure 6, 1433–1444

13. Tailford, L. E., Ducros, V. M., Flint, J. E., Roberts, S. M., Morland, C., Zechel, D. L., Smith, N., Bjørnvad, M. E., Borchert, T. V., Wilson, K. S., Davies, G. J., and Gilbert, H. J. (2009) Understanding how diverse β-mannanases recognize heterogeneous substrates. Biochemistry 48, 7009–7018

14. Bourgault, R., Oakley, A. J., Bewley, J. D., and Wilce, M. C. (2005) Three-dimensional structure of (1,4)-β-D-mannan mannanohydrolase from tomato fruit. Protein Sci. 14, 1233–1241

15. Sabini, E., Schubert, H., Murshudov, G., Wilson, K. S., Siika-Aho, M., and Penttila, M. (2000) The three-dimensional structure of a Trichoderma reesei β-mannanase from glycoside hydrolase family 5. Acta Crystallogr. D. Biol. Crystallogr. 56, 3–13

16. Kelley, L. A., and Sternberg, M. J. (2009) Protein structure prediction on the Web: a case study using the Phyre server. Nat. Protoc. 4, 363–371

17. Xiao, Z., Storms, R., and Tsang, A. (2005) Microplate-based carboxymethylcellulose assay for endoglucanase activity. Anal. Biochem. 342, 176–178

18. Anthon, G. E., and Barrett, D. M. (2002) Determination of reducing sugars with 3-methyl-2-benzothiazolinonehydrazone. Anal. Biochem. 305, 287–289

19. Wang, Y., Wang, X., Tang, R., Yu, S., Zheng, B., and Feng, Y. (2010) A novel thermostable cellulase from Fervidobacterium nodosum. J. Mol. Catal. B. Enzym. 66, 294–301

20. Vlasenko, E., Schulein, M., Cherry, J., and Xu, F. (2010) Substrate specificity of family 5, 6, 7, 9, 12, and 45 endoglucanases. Bioresour. Technol. 101, 2405–2411

21. Tyler, L., Bragg, J. N., Wu, J., Yang, X., Tuskan, G. A., and Vogel, J. P. (2010) Annotation and comparative analysis of the glycoside hydrolase genes in Brachypodium distachyon. BMC Genomics 11, 600

22. Elifantz, H., Waidner, L. A., Michelou, V. K., Cottrell, M. T., and Kirchman, D. L. (2008) Diversity and abundance of glycosyl hydrolase family 5 in the North Atlantic Ocean. FEMS Microbiol. Ecol. 63, 316–327

23. Danchin, E. G., Rosso, M. N., Vieira, P., de Almeida-Engler, J., Coutinho, P. M., Henrissat, B., and Abad, P. (2010) Multiple lateral gene transfers and duplications have promoted plant parasitism ability in nematodes. Proc. Natl. Acad. Sci. U.S.A. 107, 17651–17656

24. Chivian, D., and Baker, D. (2006) Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection. Nucleic Acids Res. 34, e112

25. Kelley, L. A., MacCallum, R. M., and Sternberg, M. J. (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol. 299, 499–520

26. O’Sullivan, O., Suhre, K., Abergel, C., Higgins, D. G., and Notredame, C. (2004) 3DCoffee: combining protein sequences and structures within multiple sequence alignments. J. Mol. Biol. 340, 385–395

27. Stebbings, L. A., and Mizuguchi, K. (2004) HOMSTRAD: recent developments of the Homologous Protein Structure Alignment Database. Nucleic Acids Res. 32, D203–D207

28. Ducros, V., Czjzek, M., Belaich, A., Gaudin, C., Fierobe, H. P., Belaich, J. P., Davies, G. J., and Haser, R. (1995) Crystal structure of the catalytic domain of a bacterial cellulase belonging to family 5. Structure 3, 939–949

29. Domínguez, R., Souchon, H., Lascombe, M., and Alzari, P. M. (1996) The crystal structure of a family 5 endoglucanase mutant in complexed and uncomplexed forms reveals an induced fit activation mechanism. J. Mol. Biol. 257, 1042–1051

30. Foong, F., Hamamoto, T., Shoseyov, O., and Doi, R. H. (1991) Nucleotide sequence and characteristics of endoglucanase gene engB from Clostridium cellulovorans. J. Gen. Microbiol. 137, 1729–1736

31. Whitehead, T. R. (1993) Analyses of the gene and amino acid sequence of the Prevotella (Bacteroides) ruminicola 23 xylanase reveals unexpected homology with endoglucanases from other genera of bacteria. Curr. Microbiol. 27, 27–33

32. Chhabra, S. R., Shockley, K. R., Ward, D. E., and Kelly, R. M. (2002) Regulation of endo-acting glycosyl hydrolases in the hyperthermophilic bacterium Thermotoga maritima grown on glucan- and mannan-based polysaccharides. Appl. Environ. Microbiol. 68, 545–554

33. Chhabra, S. R., Shockley, K. R., Conners, S. B., Scott, K. L., Wolfinger, R. D., and Kelly, R. M. (2003) Carbohydrate-induced differential gene expression patterns in the hyperthermophilic bacterium Thermotoga maritima. J. Biol. Chem. 278, 7540–7552

34. Wu, T. H., Huang, C. H., Ko, T. P., Lai, H. L., Ma, Y., Chen, C. C., Cheng, Y. S., Liu, J. R., and Guo, R. T. (2011) Diverse substrate recognition mechanism revealed by Thermotoga maritima Cel5A structures in complex with cellotetraose, cellobiose, and mannotriose. Biochim. Biophys. Acta 1814, 1832–1840

35. Liang, C., Fioroni, M., Rodríguez-Ropero, F., Xue, Y., Schwaneberg, U., and Ma, Y. (2011) Directed evolution of a thermophilic endoglucanase (Cel5A) into highly active Cel5A variants with an expanded temperature profile. J. Biotechnol. 154, 46–53

36. Reitinger, S., Yu, Y., Wicki, J., Ludwiczek, M., D’Angelo, I., Baturin, S., Okon, M., Strynadka, N. C., Lutz, S., Withers, S. G., and McIntosh, L. P. (2010) Circular permutation of Bacillus circulans xylanase: a kinetic and structural study. Biochemistry 49, 2464–2474

37. Himmel, M. E., Ding, S. Y., Johnson, D. K., Adney, W. S., Nimlos, M. R., Brady, J. W., and Foust, T. D. (2007) Biomass recalcitrance: engineering plants and enzymes for biofuels production. Science 315, 804–807

38. Wen, F., Nair, N. U., and Zhao, H. (2009) Protein engineering in designing tailored enzymes and microorganisms for biofuels production. Curr. Opin. Biotechnol. 20, 412–419

39. Percival Zhang, Y. H., Himmel, M. E., and Mielenz, J. R. (2006) Outlook for cellulase improvement: screening and selection strategies. Biotechnol. Adv. 24, 452–481

40. Jensen, R. A. (1976) Enzyme recruitment in evolution of new function. Annu. Rev. Microbiol. 30, 409–425

41. Bershtein, S., Goldin, K., and Tawfik, D. S. (2008) Intense neutral drifts yield robust and evolvable consensus proteins. J. Mol. Biol. 379, 1029–1044

42. Letunic, I., and Bork, P. (2011) Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 39, W475–W478

43. Crooks, G. E., Hon, G., Chandonia, J. M., and Brenner, S. E. (2004) WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190

44. Davies, G. J., Wilson, K. S., and Henrissat, B. (1997) Nomenclature for sugar-binding subsites in glycosyl hydrolases. Biochem. J. 321, 557–559

45. Davies, G. J., Mackenzie, L., Varrot, A., Dauter, M., Brzozowski, A. M., Schulein, M., and Withers, S. G. (1998) Snapshots along an enzymatic reaction coordinate: analysis of a retaining β-glycoside hydrolase. Biochemistry 37, 11707–11713


J. Biol. Chem. 2012, 287:25335-25343. doi: 10.1074/jbc.M112.362640, originally published online May 29, 2012.

Supplemental material:

  http://www.jbc.org/content/suppl/2012/05/29/M112.362640.DC1
