Native-like in vivo folding of a circularly permuted jellyroll protein



Proc. Natl. Acad. Sci. USA
Vol. 91, pp. 10417-10421, October 1994
Biochemistry

Native-like in vivo folding of a circularly permuted jellyroll protein shown by crystal structure analysis

(protein folding/protein secretion/x-ray crystallography/circular permutation)

MICHAEL HAHN*, KIRILL PIOTUKH†, RAINER BORRISS†, AND UDO HEINEMANN*‡

*Max-Delbrück-Centrum für Molekulare Medizin, Robert-Rössle-Strasse 10, 13122 Berlin, Germany, and Institut für Kristallographie, Freie Universität, Takustrasse 6, 14195 Berlin, Germany; and †Institut für Genetik und Mikrobiologie, Humboldt-Universität, Warschauer Strasse 43, 10243 Berlin, Germany

Communicated by Diter von Wettstein, May 9, 1994 (received for review February 23, 1994)

ABSTRACT A jellyroll β-sandwich protein, the Bacillus β-glucanase H(A16-M), is used to probe the role of N-terminal peptide regions in protein folding in vivo. A gene encoding H(A16-M) is rearranged to place residues 1-58 of the protein behind a signal peptide and residues 59-214. The rearranged gene is expressed in Escherichia coli. The resultant circularly permuted protein, cpA16M-59, is secreted into the periplasm, correctly processed, and folded into a stable and active enzyme. Crystal structure analysis at 2.0-Å resolution, R = 15.3%, shows cpA16M-59 to have a three-dimensional structure nearly identical with that of the parent β-glucanase. An analogous experiment based on the wild-type Bacillus macerans β-glucanase, giving rise to the circularly permuted variant cpMAC-57, yields the same results. Folding of these proteins, therefore, is not a vectorial process depending on the conformation adopted by their native N-terminal oligopeptides after ribosomal synthesis and translocation through the cytoplasmic membrane.

Many proteins can be refolded to their native state in vitro, demonstrating that their active conformation is completely encoded by the amino acid sequence (1, 2). This appears to hold true for in vivo protein folding as well (3, 4). However, since proteins are synthesized as linear polypeptide chains starting from the N terminus and may be threaded through membranes in the same way, a distinct role in the folding process is often attributed to the N-terminal peptide portion. According to a popular view it is "attractive to envisage the initial amino-terminal fragment folding into a nucleus of fixed conformation, which then directs folding of the remainder of the polypeptide chain as it is assembled" (5). This may be named the vectorial folding hypothesis.

Quite frequently, N and C termini are situated near each other on the surface of globular proteins (6). These proteins may thus be used to test the vectorial folding hypothesis by linking the natural polypeptide chain termini and creating new ends somewhere else on the protein surface. Bovine pancreatic trypsin inhibitor circularly permuted in this way on the protein level could be shown to refold in vitro (7). More recently, the genes encoding several proteins of different folding topologies have been rearranged and expressed to give rise to circularly permuted proteins with native-like physical characteristics and enzymatic activities (8-11, 32). Although formal proof that the resulting proteins adopt a conformation very similar to the conformations of their native counterparts could not be presented, these results are difficult to reconcile with a model attributing an important role in the folding process to the native protein N terminus.

β-Glucanases (1,3-1,4-β-D-glucan 4-glucanohydrolases, EC 3.2.1.73) hydrolyze 1,4-β-glycosidic bonds which are next to 1,3-β linkages in mixed-linked polysaccharides (12). The hybrid H(A16-M) (13) is identical to the β-glucanase from Bacillus macerans (BGLM) except for the 16 N-terminal residues, which are derived from the Bacillus amyloliquefaciens enzyme (BGLA). Crystal structure analysis has shown H(A16-M) to be a jellyroll β-sandwich protein of 214 residues with N and C termini next to each other in antiparallel β-strands on the protein surface (14). Here we describe the rearrangement of genes encoding H(A16-M) and BGLM, their secretion expression yielding the correctly processed, stable, and enzymatically competent circularly permuted variants cpA16M-59 and cpMAC-57, and high-resolution crystal structures proving their native-like folding.§

EXPERIMENTAL PROCEDURES

Plasmid Construction. The cloned genes encoding BGLM (15) and H(A16-M) (13) served as starting points for the construction of cpMAC-57 and cpA16M-59. Genes encoding the circularly permuted β-glucanases were constructed by using splicing by overlap extension (16). Two overlapping gene fragments were synthesized in separate primary amplifications. The fragments were then combined and amplified in a second reaction. Oligonucleotides used for sequencing with Sequenase (Pharmacia) and gene construction were from TIB Molbiol (Berlin). Thermostable Taq DNA polymerase, Replitherm, was from Perkin-Elmer/Cetus.

For the construction of cpMAC-57, primers DoRe 1 and DoRe 3 (direct) and DoRe 2 and DoRe 4 (reverse) were used. DoRe 1 contains an unpaired tail complementary to the 3' part of the BGLM gene, and the unpaired tail of DoRe 2 contains two stop codons and a HindIII recognition site. DoRe 3 contains an unpaired tail coding for part of the BGLM signal peptide, and the unpaired tail of DoRe 4 is complementary to the region of the gene encoding the first amino acids of the protein. For the construction of cpA16M-59, primers DoRe 2 and DoRe 3 were used along with two specific primers, DoRe 5 (direct) with a 3' portion complementary to the start of the BGLA gene and a 5' unpaired tail complementary to the end of the BGLM gene and DoRe 6 (reverse). The 3' portion of DoRe 6 is complementary to the end of the BGLM gene, and the unpaired 5' tail is complementary to the start of the BGLA gene.

Primary amplifications were carried out for a total of 25 cycles on 1 pg of template DNA and with 20 pmol of each

Abbreviations: BGLM, β-glucanase from Bacillus macerans; BGLA, β-glucanase from Bacillus amyloliquefaciens; H(A16-M), hybrid β-glucanase with residues 1-16 derived from BGLA and residues 17-214 derived from BGLM; cpMAC-57, circularly permuted variant of BGLM with residues 1-56 following residues 57-212; cpA16M-59, circularly permuted variant of H(A16-M) with residues 1-58 following residues 59-214.
‡To whom reprint requests should be addressed.
§Atomic coordinates and structure amplitudes of cpA16M-59 and cpMAC-57 have been deposited in the Protein Data Bank, Chemistry Department, Brookhaven National Laboratory, Upton, NY 11973 (references 1CPM and 1CPN).


The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.


FIG. 1. Sequence organization in the β-glucanases H(A16-M) and BGLM and their circularly permuted variants cpA16M-59 and cpMAC-57. (A) Crystal structure of H(A16-M) (14). The N-terminal peptide portion (residues 1-58) shown in black is moved to the C terminus by the circular permutation giving rise to cpA16M-59. This ribbon diagram was drawn with MOLSCRIPT (24). (B) One-dimensional representation of the circular permutations. S-S, disulfide bond. SP, B. macerans signal peptide. Hybrid H(A16-M) is identical with BGLM except for the N-terminal 16 residues (shaded), which are from the B. amyloliquefaciens β-glucanase, BGLA. (C) Oligonucleotide primers used for construction of the permuted genes. DNA and amino acid sequences around the fusion points are indicated. Primers DoRe 1-4 were used for construction of cpMAC-57. Primers DoRe 5 and 6 were used for construction of cpA16M-59. Overlapping fragments obtained by the primary amplification steps were fused with primers DoRe 2 and DoRe 3, yielding the permuted β-glucanases.

[Panels A and B are graphics and are not reproduced. Panel C, as far as it can be recovered from the figure, gives for each primer the residue range, the amino acid sequence, the coding strand (5' to 3'), and the primer.]

Primer DoRe 1 (direct): BGLM residues 208-212 followed by residues 1-7
  K Y T S N G S V F W E P
  AAATATACGAGCAATGGGAGTGTGTTCTGGGAACCA
  5' ACGAGCAATGGGAGTGTGTTCTGGG 3'

Primer DoRe 2 (reverse): BGLM residues 49-56, two stop codons, HindIII site
  L T S S A Y N K -STOP-
  TTAACGAGTTCTGCGTACAACAAATAGTAAGCTTGGCA
  3' TCAAGACGCATGTTGTTTATCATTCGAACCGT 5'

Primer DoRe 3 (direct): B. macerans signal peptide residues -8 to -1 followed by BGLM residues 57-64
  I F S V S A L A F D C A E Y R S
  ATTTTTTCTGTAAGCGCTTTAGCGTTTGACTGCGCGGAGTACCGATCA
  5' CTGTAAGCGCTTTAGCGTTTGACTGCGCGGAGTAC 3'

Primer DoRe 4 (reverse): BGLM residues 205-212 followed by residues 1-7
  D W V K Y T S N G S V F W E P
  GACTGGGTAAAATATACGAGCAATGGGAGTGTGTTCTGGGAACCA
  3' CCATTTTATATGCTCGTTACCCTCACAC 5'

Primer DoRe 5 (direct): H(A16-M) residues 210-214 followed by residues 1-7
  K Y T S N Q T G G S F F
  AAATATACGAGCAATCAAACAGGCGGATCGTTTTTT
  5' ACGAGCAATCAAACAGGCGGATCG 3'

Primer DoRe 6 (reverse): H(A16-M) residues 206-214 followed by residues 1-6
  Y D W V K Y T S N Q T G G S F
  TATGACTGGGTAAAATATACGAGCAATCAAACAGGCGGATCGTTT
  3' CCCATTTTATATGCTCGTTAGTTTGTCCGCC 5'

primer in a total volume of 50 µl. The PCR cycle parameters were denaturation at 94°C for 36 sec, annealing at 50°C for 36 sec, and extension at 72°C for 42 sec. The amplification products were separated by electrophoresis in 1% (wt/vol) agarose, and DNA fragments were recovered from the gel matrix by using Prep-A-Gene (Bio-Rad). The DNA obtained served as template for subsequent amplifications to produce the complete permuted genes. In the second round of PCR, for each gene construction 1 pg each of the overlapping gene fragments were mixed. Five PCR cycles of 94°C for 36 sec, 50°C for 36 sec, and 72°C for 42 sec were carried out to enhance the production of the full-length permuted genes. Thereafter, 20 pmol each of the primers DoRe 2 and DoRe 3 were added and the reactions were continued for 25 cycles.
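The cycling scheme just described can be summarized numerically. The following is a minimal sketch, not part of the original procedure: it encodes the three step durations given in the text and adds up the programmed hold time only, ignoring ramp times and any initial denaturation or final extension, none of which are reported. The names `Step`, `CYCLE`, and `hold_time_seconds` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


# One PCR cycle with the temperatures and durations stated in the text.
CYCLE = (
    Step("denaturation", 94, 36),
    Step("annealing", 50, 36),
    Step("extension", 72, 42),
)


def hold_time_seconds(cycles: int) -> int:
    """Programmed hold time for `cycles` cycles (ramping is ignored)."""
    return cycles * sum(step.seconds for step in CYCLE)


# Primary amplifications: 25 cycles.
primary = hold_time_seconds(25)
# Second round: 5 cycles without primers, then 25 cycles with DoRe 2 and DoRe 3.
secondary = hold_time_seconds(5 + 25)

print(f"primary: {primary / 60:.1f} min, second round: {secondary / 60:.1f} min")
```

Under these assumptions each cycle holds for 114 sec, so the figures are lower bounds on the real run time of a thermocycler.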

Table 1. Enzymatic activity and thermal stability of the β-glucanases H(A16-M) and BGLM and their circularly permuted variants

Property                      H(A16-M)      cpA16M-59     BGLM          cpMAC-57
Km, mg/ml                     0.49 ± 0.1    0.47 ± 0.02   0.48 ± 0.08   0.47 ± 0.01
kcat, s⁻¹                     1860 ± 50     1450 ± 90     1880 ± 70     1700 ± 50
Temp. optimum, °C             64            62            67            65
t1/2 at 70°C and pH 5, min    137           46            30            8
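To make the half-lives in Table 1 easier to compare, the sketch below converts them into inactivation rate constants and into the fraction of activity expected to survive a given incubation. It assumes simple first-order (single-exponential) inactivation, which the paper does not state; the function names are hypothetical and the 30-min time point is chosen only for illustration.

```python
import math

# Half-lives at 70°C and pH 5 in minutes, copied from Table 1.
HALF_LIFE_MIN = {
    "H(A16-M)": 137,
    "cpA16M-59": 46,
    "BGLM": 30,
    "cpMAC-57": 8,
}


def rate_constant(t_half_min: float) -> float:
    """First-order inactivation constant k = ln 2 / t1/2 (assumed kinetics)."""
    return math.log(2) / t_half_min


def residual_activity(t_half_min: float, minutes: float) -> float:
    """Fraction of activity left after `minutes`, assuming first-order decay."""
    return 0.5 ** (minutes / t_half_min)


for name, t_half in HALF_LIFE_MIN.items():
    k = rate_constant(t_half)
    left = residual_activity(t_half, 30)
    print(f"{name}: k = {k:.4f} per min, {100 * left:.0f}% left after 30 min")
```

If inactivation is not first order, only the tabulated half-lives themselves remain meaningful.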


Amplified products were digested with Eco47III and HindIII and subcloned in pTZ19R-M (13). The constructed genes were completely sequenced.
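The two restriction sites used for subcloning are carried by the outer primers, which can be checked directly on the sequences in Fig. 1C. The sketch below is illustrative: the primer strings are transcribed from the figure (DoRe 2 is drawn 3' to 5' there), the recognition sequences AAGCTT (HindIII) and AGCGCT (Eco47III) are the standard ones, and the helper names are hypothetical.

```python
# Primer sequences transcribed from Fig. 1C.
DORE2_3TO5 = "TCAAGACGCATGTTGTTTATCATTCGAACCGT"   # drawn 3'->5' in the figure
DORE3_5TO3 = "CTGTAAGCGCTTTAGCGTTTGACTGCGCGGAGTAC"

# Both recognition sequences are palindromic, so either strand can be searched.
SITES = {"HindIII": "AAGCTT", "Eco47III": "AGCGCT"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq_5to3: str) -> dict:
    """Map each enzyme whose site occurs in the sequence to its 0-based position."""
    return {name: seq_5to3.find(site) for name, site in SITES.items() if site in seq_5to3}


dore2 = DORE2_3TO5[::-1]  # rewrite DoRe 2 in the usual 5'->3' direction

print("DoRe 2 sites:", find_sites(dore2))
print("DoRe 3 sites:", find_sites(DORE3_5TO3))
# Coding-strand sequence specified by DoRe 2; it should end the reading
# frame with the two stop codons TAG and TAA ahead of the HindIII site.
print("DoRe 2 coding strand:", reverse_complement(dore2))
```

This only confirms that the sites are present in the primers; it says nothing about cutting efficiency near fragment ends.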

Protein Expression, Purification, and Characterization. Escherichia coli transformants growing on lichenan-agar plates and secreting active β-glucanase were identified by staining with Congo red (17). The proteins were liberated from the E. coli periplasm by osmotic shock (18) and diafiltration (Amicon) with sodium acetate buffer, pH 5.0/10 mM CaCl2. The sample was applied to a column containing S-Sepharose HP (Pharmacia) and eluted in 50 mM sodium acetate, pH 5.0/5 mM CaCl2 with a linear gradient of 0-400 mM NaCl. Fractions containing β-glucanase were further purified by gel filtration using Superdex 75 (Pharmacia) equilibrated with 20 mM sodium acetate, pH 5.0/5 mM CaCl2 before analysis for homogeneity by SDS/PAGE. Enzymatic activities were determined in triplicate with lichenan as substrate (17) at 50°C in 50 mM sodium acetate buffer, pH 6.0/20 mM CaCl2. The half-life (t1/2) was measured by incubation of the enzyme solution at 70°C in 50 mM sodium acetate buffer, pH 5.0/10 mM CaCl2 and determination of the time required for 50% inactivation.

Crystal Structure Analysis. Crystals were obtained by the hanging-drop vapor diffusion method, using equal volumes of protein and precipitant solutions. cpA16M-59 was stored in 20 mM 4-morpholineethanesulfonic acid, pH 7.0/2 mM EDTA at a concentration of 38 mg/ml and equilibrated against 12% (vol/vol) polyethylene glycol 4000 in 0.1 M sodium acetate, pH 4.6/5 mM CaCl2. cpMAC-57 was stored at 5 mg/ml in 10 mM sodium acetate, pH 4.3/2 mM EDTA/5 mM CaCl2. Small crystals were obtained with 20% polyethylene glycol 8000, 5% (vol/vol) isopropyl alcohol in 0.1 M Tris·HCl, pH 8.0, and enlarged by macroseeding. X-ray data for cpA16M-59 were collected on a FAST television area detector system and data for cpMAC-57 on a 180-mm MAR-Research image plate, both mounted on FR 571 rotating anode x-ray generators producing Cu Kα radiation. In both cases, the start model for molecular replacement with X-PLOR (19) was the crystal structure of H(A16-M) (14). Manual corrections of the models with FRODO (20), mainly to accommodate the circular permutations and sequence modifications in case of cpMAC-57, followed by simulated annealing refinement (21, 22), yielded R values of 22.4% for cpA16M-59 and 27.5% for cpMAC-57. Subsequent refinement until convergence was with TNT (23).

RESULTS

Construction, Stability, and Enzymatic Activity of Circularly Permuted Proteins. The circular permutations of H(A16-M) and BGLM are shown in Fig. 1, both on the gene level and on the level of the folded protein. They are achieved in the most straightforward way without deletion of residues

Table 2. Crystallographic parameters for the circularly permuted proteins cpA16M-59 and cpMAC-57

Parameter                                   cpA16M-59    cpMAC-57
Crystal data
  Space group                               P2₁2₁2₁      P2₁
  Cell parameters
    a, Å                                    39.43        65.47
    b, Å                                    44.30        41.56
    c, Å                                    128.34       39.57
    β, °                                                 111.46
  Nominal resolution, Å                     2.0          1.8
  Independent observations, no.             12,507       16,674
  Completeness, %                           85.2         90.1
  Rmerge, %                                 3.3          4.7
Structure refinement
  Amino acid residues in final model        214          208
  Water molecules in final model            123          116
  rms Δ bond lengths, Å                     0.021        0.015
  rms Δ bond angles, °                      2.78         2.77
  Close nonbonded contacts                  27           23
  rms Δ nonbonded contacts, Å               0.119        0.091
  R, %                                      15.3         16.4
  Max. difference electron density, e/Å³    0.325        0.325
  Mean coordinate error, Å                  <0.2         <0.2

$R_{\mathrm{merge}} = \sum_h \sum_i |I_h - I_{h,i}| / \sum_h \sum_i I_{h,i}$, the sum over i measurements of h independent reflections. The crystallographic residual is $R = \sum |F_o - F_c| / \sum F_o$, where $F_o$ and $F_c$ are observed and calculated structure amplitudes, respectively. rms Δ, root-mean-square deviation from target values. The mean coordinate error was estimated according to Luzzati (25).
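The two residuals defined in the footnote to Table 2 are simple ratios of sums and can be sketched in a few lines. The code below is illustrative only: the function names and the toy numbers are hypothetical, and for the merging residual it assumes that the reference intensity of each reflection is the mean of its repeated measurements, which is the usual convention but is not spelled out in the footnote.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic residual R = sum|Fo - Fc| / sum(Fo) over matched reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated amplitude lists differ in length")
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def r_merge(measurements):
    """Merging residual for {reflection: [repeated intensity measurements]}.

    Assumes the reference intensity of each reflection is the mean of its
    measurements.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in measurements.values():
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(mean - i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Toy data, not taken from the paper.
toy_r = r_factor([100.0, 50.0], [90.0, 55.0])
toy_r_merge = r_merge({"h1": [10.0, 12.0], "h2": [20.0, 20.0]})
print(f"R = {100 * toy_r:.1f}%, Rmerge = {100 * toy_r_merge:.1f}%")
```

Refinement programs apply scale factors and resolution cutoffs before forming these sums; the sketch leaves those steps out.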

or insertion of linker peptides. The promoter and the coding region for the signal peptide of cpMAC-57 were derived from the B. macerans β-glucanase (BGLM) gene. By using the PCR technique, overlapping segments of the gene encoding BGLM were extended to give rise to a permuted gene encoding cpMAC-57 with Phe-57 as the N-terminal amino acid residue and Lys-56 at the C terminus. cpA16M-59 was constructed according to the same scheme but using the hybrid H(A16-M) gene as the template. The N-terminal amino acid residue of the mature enzyme is Phe-59.

Both cpA16M-59 and cpMAC-57 are folded in the periplasmic space to stable proteins and active enzymes. The purified proteins display enzymatic activities similar to those of the parental enzymes (Table 1). There is a slight decrease in the half-life at 70°C for both variants which may reflect the loss of a small number of stabilizing intramolecular interactions upon circular permutation. N-terminal sequencing yielded the expected sequence Phe-Asp-Cys-Ala-Glu-Tyr (not shown), proving that both cpA16M-59 and cpMAC-57 are correctly processed within the E. coli periplasm.
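The rearrangement described here amounts to rotating the residue order without adding or removing anything. As a minimal sketch (the function name `circular_permute` is hypothetical, and residue numbers stand in for the actual amino acids), the new chain termini follow directly from the chain length and the chosen new N terminus:

```python
def circular_permute(residues, new_n_terminus):
    """Rotate a sequence so that parent residue `new_n_terminus` (1-based) comes first.

    Works for lists of residue numbers and for sequence strings alike; no
    residues are deleted and no linker is inserted.
    """
    cut = new_n_terminus - 1
    return residues[cut:] + residues[:cut]


# Parent residue numbering: H(A16-M) has 214 residues, BGLM has 212.
h_a16m = list(range(1, 215))
bglm = list(range(1, 213))

cp_a16m_59 = circular_permute(h_a16m, 59)
cp_mac_57 = circular_permute(bglm, 57)

print("cpA16M-59:", cp_a16m_59[0], "...", cp_a16m_59[-1], "length", len(cp_a16m_59))
print("cpMAC-57: ", cp_mac_57[0], "...", cp_mac_57[-1], "length", len(cp_mac_57))
```

The first and last entries reproduce the termini given in the text: parent residues 59 and 58 for cpA16M-59, and 57 and 56 for cpMAC-57.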

FIG. 2. Stereoview of newly formed peptide bond in cpMAC-57 linking the polypeptide chain termini of BGLM, with Fo - Fc difference electron density contoured at 1σ above the mean density. The map was calculated by using the final model of cpMAC-57 but without contribution from the labeled residues. Hydrogen bonds are shown with dashed lines.


FIG. 3. Stereoviews of least-squares superposition (28) of the calcium (green) and sodium (red) crystal forms of H(A16-M) and the circularly permuted β-glucanase variants cpA16M-59 (blue) and cpMAC-57 (black). (Upper) Cα trace. (Lower) Active-site region with the likely catalytic residues Glu-105 and Glu-109 [H(A16-M) numbering] of the β-glucanases.

[Figure not reproduced. Residues labeled in the lower panel: Trp-103, Glu-105, Asp-107, and Glu-109.]

Crystal Structures of Circularly Permuted Proteins. Crystal structure analyses at 2.0- and 1.8-Å resolution, respectively, yield reliable protein models with good stereochemistry and R values of 15.3% for cpA16M-59 and 16.4% for cpMAC-57 (Table 2). In cpMAC-57 the four C-terminal amino acid residues are not seen in the electron density due to disorder. The newly formed peptide bonds are clearly seen in difference electron density (Fig. 2). In cpMAC-57 the peptide unit is part of a type I β-turn which is further stabilized by a hydrogen bond linking the side chains of Ser-155 and Ser-158. Due to the different sequence with insertion of two residues in this area the novel turn is less regular in cpA16M-59.

The circularly permuted proteins have conformations closely similar to that of H(A16-M), adopting a jellyroll-type folding with two curved β-sheets which form a compact structure and a deep active-site channel. A calcium ion is bound to cpA16M-59 and cpMAC-57 at the same site as in H(A16-M) (14). The two cysteine residues form disulfide bonds in both permuted proteins. However, due to the circular permutation they now span 186 and 184 residues in the linear sequences of cpA16M-59 and cpMAC-57, respectively, rather than 30 residues in H(A16-M) and BGLM. This is expected to have a stabilizing effect on the conformations of the permuted proteins (26) which is offset by slight structural changes leading to the observed decrease in thermal stability.
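The change in disulfide span follows from the renumbering alone, as the sketch below illustrates. It rests on assumptions that are not stated outright in the text: one cysteine is taken as Cys-59 of BGLM (Fig. 1C shows Phe-57, Asp-58, Cys-59), its partner is assumed to be Cys-30 because that reproduces the stated 30-residue span when both ends are counted, and the H(A16-M) positions are assumed to be shifted by two (Cys-32 and Cys-61). The function names are hypothetical.

```python
def permuted_position(parent_pos, new_n_terminus, length):
    """1-based position in the permuted chain of a residue numbered as in the parent."""
    return (parent_pos - new_n_terminus) % length + 1


def inclusive_span(pos_a, pos_b):
    """Number of residues from one position to the other, counting both ends."""
    return abs(pos_a - pos_b) + 1


# (assumed cysteine positions, new N terminus, chain length)
CASES = {
    "cpA16M-59": ((32, 61), 59, 214),
    "cpMAC-57": ((30, 59), 57, 212),
}

for name, ((cys_a, cys_b), start, length) in CASES.items():
    before = inclusive_span(cys_a, cys_b)
    after = inclusive_span(
        permuted_position(cys_a, start, length),
        permuted_position(cys_b, start, length),
    )
    print(f"{name}: disulfide spans {before} residues in the parent, {after} after permutation")
```

With these assumed positions the spans come out as 30 residues in both parents and 186 and 184 residues in cpA16M-59 and cpMAC-57, matching the values in the text.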

Native-like Folding of Circularly Permuted Proteins. Quantitative structure comparisons are complicated by the fact that H(A16-M), cpA16M-59, and cpMAC-57 all crystallize in different crystal environments, which may add to structural differences caused by the circular permutation. In the crystal structure originally reported (14) a calcium ion was bound to H(A16-M). The lattice constants in space group P2₁2₁2₁ of this form, H(A16-M)·Ca²⁺, a = 64.23 Å, b = 78.52 Å, and c = 39.30 Å, are different from those of H(A16-M) with bound sodium, H(A16-M)·Na⁺, with a = 70.22 Å, b = 72.56 Å, and c = 49.97 Å in the same space group (27). This may be used as an internal control for the effect of crystal packing on β-glucanase structure. rms deviations between corresponding α-carbon positions of H(A16-M)·Ca²⁺, H(A16-M)·Na⁺, cpA16M-59, and cpMAC-57 are below 0.39 Å for all pairs of proteins if a small number of residues around the old and new chain termini are excluded from the match. The structural differences thus only slightly exceed the experimental error. The combined effects of circular permutation and crystal packing do not exceed the combined effects of cation binding and crystal packing on the three-dimensional structure of the β-glucanase.

Structural differences between the β-glucanases are confined to loop regions and are most obvious near the old and new polypeptide chain termini (Fig. 3). The small structural deviations observed there are partly due to differences in crystal packing involving C-terminal residues of all crystal forms except for H(A16-M)·Na⁺ and the N-terminal residue in H(A16-M)·Ca²⁺ which interacts with the peptide segment between residues 16 and 18. The β-sheet structure is almost entirely conserved in the different protein variants and crystal forms. Thus, most interatomic interactions stabilizing the H(A16-M) structure and the geometry of the catalytic site involving the side chains of Glu-105 and Glu-109 of H(A16-M) are preserved in the circularly permuted proteins, explaining their described conformational stability and catalytic activity.
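The pairwise comparison described above reduces to an rms deviation over matched α-carbon positions with a few residues left out. The sketch below shows only that last step and assumes the two coordinate sets have already been superposed and share a common residue numbering; the least-squares superposition itself (28) is not implemented. The function name and the toy coordinates are hypothetical and are not taken from the deposited structures.

```python
import math


def ca_rmsd(coords_a, coords_b, exclude=()):
    """rms deviation (in the units of the input) over shared residues.

    coords_a, coords_b: {residue_number: (x, y, z)} for alpha carbons that
    have already been superposed. Residues listed in `exclude` are skipped.
    """
    shared = sorted((set(coords_a) & set(coords_b)) - set(exclude))
    if not shared:
        raise ValueError("no residues left to compare")
    total = sum(math.dist(coords_a[r], coords_b[r]) ** 2 for r in shared)
    return math.sqrt(total / len(shared))


# Toy coordinates in Å; residue 3 mimics a displaced chain terminus.
form_a = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (2.0, 0.0, 0.0)}
form_b = {1: (0.0, 0.0, 0.3), 2: (1.0, 0.4, 0.0), 3: (5.0, 0.0, 0.0)}

print(f"all residues:       {ca_rmsd(form_a, form_b):.2f} Å")
print(f"terminus excluded:  {ca_rmsd(form_a, form_b, exclude={3}):.2f} Å")
```

As in the toy example, a handful of displaced terminal residues can dominate the value, which is why they are left out of the match.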

DISCUSSION

It has been shown earlier that proteins with novel polypeptide termini generated by circular permutation on either the protein or the gene level can fold in vitro (7) and in vivo (8-11, 32) into species with enzymatic activities and spectral characteristics similar to those of the original wild-type proteins. However, three-dimensional structures of these circularly permuted proteins have not been reported. The proteins studied by circular permutation of their genes (8-11, 32) differ in their folding topologies and domain structures. All were produced and folded in E. coli. The jellyroll protein H(A16-M) and its variants adopt yet another fold and acquire their native conformation in the E. coli periplasm rather than the cytoplasm. Nevertheless, although the proteins and the environments in which they fold are drastically different, all studies arrive at the conclusion that in vivo folding of circularly permuted variants is possible. For cpA16M-59 and cpMAC-57, x-ray crystallography proves native-like folding in addition. It should be noted that a circular permutation has occurred in the gene encoding the β-glucanase from Fibrobacter succinogenes (29, 30), the sequence organization of which resembles cpA16M-59, but not the β-glucanases from Bacillus and Clostridium species which are otherwise homologous. This enzyme has served as the template according to which the circular permutations were designed.

The permutation experiments with H(A16-M) and other proteins (8-11) seem to suggest that the vectorial in vivo folding model (5) attributing a role as a folding "template" to the N-terminal polypeptide segment cannot be generally valid. They also argue that in vitro refolding of such proteins is not necessarily different from their in vivo folding. It should be pointed out, however, that some proteins, such as subtilisin, where an N-terminal propeptide acts as an intramolecular chaperone (31), require their intact N terminus for correct folding. It remains an open question at present whether H(A16-M) and cpA16M-59 fold via similar intermediates and where these may be located on the folding pathways. Circularly permuted proteins may prove to be good models to study protein folding and valuable test cases for theoretical structure prediction methods.

We thank C. N. Pace (Texas A&M University) for inspiring the experiment, W. Saenger (Free University of Berlin) for continued support, and both for reading the manuscript. The preparation of parental and permuted proteins by O. Simon is gratefully acknowledged. This study was made possible by grants from the Deutsche Forschungsgemeinschaft to U.H. and to R.B. and by the Fonds der Chemischen Industrie.

1. Anfinsen, C. B. (1973) Science 181, 223-230.
2. Baldwin, R. L. (1993) Curr. Opin. Struct. Biol. 3, 84-91.
3. Ellis, R. J. (1993) Nature (London) 366, 213-214.
4. Martin, J., Mayhew, M., Langer, T. & Hartl, F. U. (1993) Nature (London) 366, 228-233.
5. Creighton, T. E. (1993) Proteins: Structures and Molecular Properties (Freeman, New York), 2nd Ed., p. 323.
6. Thornton, J. M. & Sibanda, B. L. (1983) J. Mol. Biol. 167, 443-460.
7. Goldenberg, D. P. & Creighton, T. E. (1983) J. Mol. Biol. 165, 407-413.
8. Luger, K., Hommel, U., Herold, M., Hofsteenge, J. & Kirschner, K. (1989) Science 243, 206-210.
9. Buchwalder, A., Szadkowski, H. & Kirschner, K. (1992) Biochemistry 31, 1621-1630.
10. Yang, Y. R. & Schachman, H. K. (1993) Proc. Natl. Acad. Sci. USA 90, 11980-11984.
11. Zhang, T., Bertelsen, E., Benvegnu, D. & Alber, T. (1993) Biochemistry 32, 12311-12318.
12. Svensson, B. & Søgaard, M. (1993) J. Biotechnol. 29, 1-37.
13. Olsen, O., Borriss, R., Simon, O. & Thomsen, K. K. (1991) Mol. Gen. Genet. 225, 177-185.
14. Keitel, T., Simon, O., Borriss, R. & Heinemann, U. (1993) Proc. Natl. Acad. Sci. USA 90, 5287-5291.
15. Borriss, R., Buettner, K. & Maentsaelae, P. (1990) Mol. Gen. Genet. 222, 278-283.
16. Horton, R. M., Hunt, H. D., Ho, S. N., Pullen, J. K. & Pease, L. R. (1989) Gene 77, 61-68.
17. Borriss, R., Olsen, O., Thomsen, K. K. & von Wettstein, D. (1989) Carlsberg Res. Commun. 54, 41-54.
18. Cornelis, P., Digneffe, C. & Willemot, K. (1982) Mol. Gen. Genet. 186, 507-511.
19. Brünger, A. T. (1990) Acta Crystallogr. A46, 46-57.
20. Jones, T. A. (1985) Methods Enzymol. 115, 157-171.
21. Brünger, A. T., Kuriyan, J. & Karplus, M. (1987) Science 235, 458-460.
22. Brünger, A. T., Krukowski, A. & Erickson, J. (1990) Acta Crystallogr. A46, 585-593.
23. Tronrud, D. E., Ten Eyck, L. F. & Matthews, B. W. (1987) Acta Crystallogr. A43, 489-501.
24. Kraulis, P. J. (1991) J. Appl. Crystallogr. 24, 946-950.
25. Luzzati, V. (1952) Acta Crystallogr. 5, 802-810.
26. Pace, C. N., Heinemann, U., Hahn, U. & Saenger, W. (1991) Angew. Chem. Int. Ed. Engl. 30, 343-360.
27. Keitel, T., Meldgaard, M. & Heinemann, U. (1994) Eur. J. Biochem. 222, 203-214.
28. Kabsch, W. (1978) Acta Crystallogr. A32, 922-923.
29. Teather, R. M. & Erfle, J. D. (1990) J. Bacteriol. 172, 3837-3841.
30. Schimming, S., Schwarz, W. H. & Staudenbauer, W. L. (1992) Eur. J. Biochem. 204, 13-19.
31. Shinde, U., Li, Y., Chatterjee, S. & Inouye, M. (1993) Proc. Natl. Acad. Sci. USA 90, 6924-6928.
32. Mullins, L. S., Wesseling, K., Kuo, J. M., Garrett, J. B. & Raushel, F. M. (1994) J. Am. Chem. Soc. 116, 5529-5533.
