nucleotide sequence of an escherichia coli chromosomal hemolysin

12
JOURNAL OF BACTERIOLOGY, JUIY 1985, p. 94-105 0021-9193/85/070094-12$02.00/0 Copyright © 1985, American Society for Microbiology Vol. 163, No. 1 Nucleotide Sequence of an Escherichia coli Chromosomal Hemolysin TERESA FELMLEE, SHAHAIREEN PELLETT, AND RODNEY A. WELCH* Department of Medical Microbiology, University of Wisconsin Medical School, Madison, Wisconsin 53706 Received 26 December 1984/Accepted 17 April 1985 We determined the DNA sequence of an 8,211-base-pair region encompassing the chromosomal hemolysin, molecularly cloned from an 04 serotype strain of Escherichia coli. All four hemolysin cistrons (transcriptional order, C, A, B, and D) were encoded on the same DNA strand, and their predicted molecular masses were, respectively, 19.7, 109.8, 79.9, and 54.6 kilodaltons. The identification of pSF4000-encoded polypeptides in E. coli minicells corroborated the assignment of the predicted polypeptides for hlyC, hlyA, and hlyD. However, based on the minicell results, two polypeptides appeared to be encoded on the hlyB region, one similar in size to the predicted molecular mass of 79.9 kilodaltons, anqd the other a smaller 46-kilodalton polypeptide. The four hemolysin gene displayed similar codon usage, which is atypical for E. coli. This reflects the low guanine-plus- cytosine content (40.2%) of the hemolysin DNA sequence and suggests the non-E. coli origin of the hemolysin determinant. In vitro-derived deletions of the hemolysin recombinant plasmid pSF4000 indicated that a region between 433 and 301 base pairs upstream of the putative start of hlyC is necessary for hemolysin synthesis. Based on the DNA sequence, a stem-loop transcription terminator-like structure (a 16-base-pair stem followed by seven uridylates) in the mRNA was predicted distal to the C-terminal end of hlyA. A model for the general transcriptional organization of the E. coli hemolysin determinant is presented. The genetics and biochemistry of the Escherichia coli hemolysin have become an active area of research after recent reports of its significance in the virulence of extrain- testinal E. coli infections (2-4, 7, 12, 43). In addition, factors influencing the level of expression of extracellular hemolysin activity are known to affect the level of homolysin-associ- ated virulence in vivo (44). This evidence suggests that insights into the regulation of the hemolysin may be of clinical as well as academic interest. Our laboratory, along with others, has identified by transposon-mediated mutagen- esis an approximate coding region of seven kilobases (kb) responsible for hemolysin synthesis (29, 30, 38, 45). Goebel and co-workers identified within this region four cistrons, hlyC, hlyA, hlyBa, and hlyBb, necessary for extracellular hemolysin activity (30, 42). There exists a close similarity in DNA sequences among E. coli hemolysins based on DNA- DNA hybridization (1, 19, 45). However, it has also become apparent that there are DNA sequence differences among the E. coli hemolysins and, not surprisingly, that these differences appear to have significant genetic, biochemical, and pathogenic consequences (12, 23, 43, 45). Presented here is the complete DNA sequence of the hemolysin-encoding region of the recombinant plasmid pSF4000. The hemolysin determinant in this case originated from a urinary tract isolate of E. coli (strain J96, 04 serotype). Genetic as well as physical evidence indicates that the hemolysin was originally encoded on the J96 chro- mosome (15, 45). MATERIALS AND METHODS Bacteria and bacteriophage strains. Bacterial strains used in this study include HB101, WAF100 (HB101 transformed with pSF4000) (45) as the source of hemolysin-specific restric- tion endonuclease fragments, and JM101 as the host of M13 bacteriophage subclones (24). The source of hemolysin re- combinant plasmids pSF4000 and pANN202-312 has been * Corresponding author. described (43, 45). DS410 (minA minB) was used as the host background for electrophoretic analysis of [35S]methionine- labeled plasmid-encoded polypeptides present in isolated minicells (22). pUC9 and the replicative forms of the M13 vectors mp8, mplO, and mpll (40) used for the subcloning were acquired from New England Biolabs, Inc., and P-L Biochemicals, Inc. Media and buffers. Recipes for LB broth and YT and 5-bromo-4-chloro-3-indolyl-o-D-galactoside-YT-overlay agar media employed in growing bacteria and bacteriophage were taken from Miller (28). Tris-acetate and Tris-borate buffers utilized for DNA electrophoresis were prepared as described previously (44, 45). Molecular cloning and DNA sequencing. The strategy for determination of the hemolysin DNA sequence by using the chain termination method (35) and M13 vectors (40) was to purify overlapping 1- to 3-kb restriction endonuclease frag- ments by agarose gel electrophoresis and electroelution of the DNA from the isolated gel slice. The isolated fragments were redigested with a second restriction endonuclease that is known to have multiple digestion sites within the fragment and is capable of generating ends that are insertible in the multiple cloning site of the M13 vectors. The redigested fragments were ligated with the appropriately digested M13 vector, and the chimeric phage DNA was transfected into JM101 (26). The protocols for the isolation of recombinant templates and the dideoxy-sequencing reactions were those suggested by the commercial supplier of the dideoxy- nucleotide mixtures, the DNA polymerase large fragment, and the restriction endonucleases (New England Biolabs). [a-32P]dATP (ca. 800 Ci/mmol) was purchased from Amersham Corp. The labeled reaction mixtures were sepa- rated by electrophoresis on urea-10% acrylamide gels, the gels were dried down, and the sequence was read from X-ray film autoradiograms. The methods used in our laboratory for the construction of in vitro derived deletions and subclones of pSF4000 and pANN202-312 have been described elsewhere (44, 45). 94

Upload: lamdang

Post on 30-Dec-2016

223 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Nucleotide sequence of an Escherichia coli chromosomal hemolysin

JOURNAL OF BACTERIOLOGY, JUIY 1985, p. 94-1050021-9193/85/070094-12$02.00/0Copyright © 1985, American Society for Microbiology

Vol. 163, No. 1

Nucleotide Sequence of an Escherichia coliChromosomal Hemolysin

TERESA FELMLEE, SHAHAIREEN PELLETT, AND RODNEY A. WELCH*

Department of Medical Microbiology, University of Wisconsin Medical School, Madison, Wisconsin 53706

Received 26 December 1984/Accepted 17 April 1985

We determined the DNA sequence of an 8,211-base-pair region encompassing the chromosomal hemolysin,molecularly cloned from an 04 serotype strain of Escherichia coli. All four hemolysin cistrons (transcriptionalorder, C, A, B, and D) were encoded on the same DNA strand, and their predicted molecular masses were,respectively, 19.7, 109.8, 79.9, and 54.6 kilodaltons. The identification of pSF4000-encoded polypeptides in E.coli minicells corroborated the assignment of the predicted polypeptides for hlyC, hlyA, and hlyD. However,based on the minicell results, two polypeptides appeared to be encoded on the hlyB region, one similar in sizeto the predicted molecular mass of 79.9 kilodaltons, anqd the other a smaller 46-kilodalton polypeptide. The fourhemolysin gene displayed similar codon usage, which is atypical for E. coli. This reflects the low guanine-plus-cytosine content (40.2%) of the hemolysin DNA sequence and suggests the non-E. coli origin of the hemolysindeterminant. In vitro-derived deletions of the hemolysin recombinant plasmid pSF4000 indicated that a regionbetween 433 and 301 base pairs upstream of the putative start of hlyC is necessary for hemolysin synthesis.Based on the DNA sequence, a stem-loop transcription terminator-like structure (a 16-base-pair stem followedby seven uridylates) in the mRNA was predicted distal to the C-terminal end of hlyA. A model for the generaltranscriptional organization of the E. coli hemolysin determinant is presented.

The genetics and biochemistry of the Escherichia colihemolysin have become an active area of research afterrecent reports of its significance in the virulence of extrain-testinal E. coli infections (2-4, 7, 12, 43). In addition, factorsinfluencing the level of expression of extracellular hemolysinactivity are known to affect the level of homolysin-associ-ated virulence in vivo (44). This evidence suggests thatinsights into the regulation of the hemolysin may be ofclinical as well as academic interest. Our laboratory, alongwith others, has identified by transposon-mediated mutagen-esis an approximate coding region of seven kilobases (kb)responsible for hemolysin synthesis (29, 30, 38, 45). Goebeland co-workers identified within this region four cistrons,hlyC, hlyA, hlyBa, and hlyBb, necessary for extracellularhemolysin activity (30, 42). There exists a close similarity inDNA sequences among E. coli hemolysins based on DNA-DNA hybridization (1, 19, 45). However, it has also becomeapparent that there are DNA sequence differences amongthe E. coli hemolysins and, not surprisingly, that thesedifferences appear to have significant genetic, biochemical,and pathogenic consequences (12, 23, 43, 45).

Presented here is the complete DNA sequence of thehemolysin-encoding region of the recombinant plasmidpSF4000. The hemolysin determinant in this case originatedfrom a urinary tract isolate of E. coli (strain J96, 04serotype). Genetic as well as physical evidence indicatesthat the hemolysin was originally encoded on the J96 chro-mosome (15, 45).

MATERIALS AND METHODSBacteria and bacteriophage strains. Bacterial strains used

in this study include HB101, WAF100 (HB101 transformedwith pSF4000) (45) as the source ofhemolysin-specific restric-tion endonuclease fragments, and JM101 as the host of M13bacteriophage subclones (24). The source of hemolysin re-combinant plasmids pSF4000 and pANN202-312 has been

* Corresponding author.

described (43, 45). DS410 (minA minB) was used as the hostbackground for electrophoretic analysis of [35S]methionine-labeled plasmid-encoded polypeptides present in isolatedminicells (22). pUC9 and the replicative forms of the M13vectors mp8, mplO, and mpll (40) used for the subcloningwere acquired from New England Biolabs, Inc., and P-LBiochemicals, Inc.Media and buffers. Recipes for LB broth and YT and

5-bromo-4-chloro-3-indolyl-o-D-galactoside-YT-overlayagar media employed in growing bacteria and bacteriophagewere taken from Miller (28). Tris-acetate and Tris-boratebuffers utilized for DNA electrophoresis were prepared asdescribed previously (44, 45).

Molecular cloning and DNA sequencing. The strategy fordetermination of the hemolysin DNA sequence by using thechain termination method (35) and M13 vectors (40) was topurify overlapping 1- to 3-kb restriction endonuclease frag-ments by agarose gel electrophoresis and electroelution ofthe DNA from the isolated gel slice. The isolated fragmentswere redigested with a second restriction endonuclease thatis known to have multiple digestion sites within the fragmentand is capable of generating ends that are insertible in themultiple cloning site of the M13 vectors. The redigestedfragments were ligated with the appropriately digested M13vector, and the chimeric phage DNA was transfected intoJM101 (26). The protocols for the isolation of recombinanttemplates and the dideoxy-sequencing reactions were thosesuggested by the commercial supplier of the dideoxy-nucleotide mixtures, the DNA polymerase large fragment,and the restriction endonucleases (New England Biolabs).[a-32P]dATP (ca. 800 Ci/mmol) was purchased fromAmersham Corp. The labeled reaction mixtures were sepa-rated by electrophoresis on urea-10% acrylamide gels, thegels were dried down, and the sequence was read from X-rayfilm autoradiograms.The methods used in our laboratory for the construction of

in vitro derived deletions and subclones of pSF4000 andpANN202-312 have been described elsewhere (44, 45).

94

Page 2: Nucleotide sequence of an Escherichia coli chromosomal hemolysin

E. COLI CHROMOSOMAL HEMOLYSIN 95

PACYC 184

pSF4000 S Hrs P SP BP PYT II-11 TTiII*, a a a ----II

s|s|-By p H P H P H

I IS

4 6 7 9-I I - I

Ah 111 --IAlu I -

Eco RI - - . -Eco RV -Hae III -Hind III -Hpa II - " -'-. -e 1 -'

Pot I - _Pvu II -RsaI -ISau 3A -Sma I -,--FIG. 1. Sequencing strategy of the E. coli hemolysin located on the recombinant plasmid pSF4000. The hemolysin was localized by

Tnl-mediated mutagenesis to the region shown above the restriction enconuclease fragment map at the top. The 8.2-kb region that wassequenced has been expanded in the lower portion of the figure (the numbered vertical lines represent 1 kb). The direction of the sequencedDNA fragments is shown by the arrows, and the identity of the particular restriction endonuclease fragments is listed to the left. Theabbreviations for the restriction endonuclease sites shown on the physical map at the top of the figure are as follows: SA, Sall; H, HindIII;P, PstI; S, SmaI; B, BamHI; and Bg, BglII.

DNA and amino acid sequence analysis. For routine DNAsequence searches we employed the Pustell programs (33) onan IBM PC microcomputer (programs provided courtesy ofInternational Biotechnologies, Inc.). The program for con-struction of codon preference plots has been describedelsewhere (10). A program written to permit the drawing ofhydropathy plots and the predicted polypeptide secondarystructures by using the rules of Kyte and Doolittle (20) andChou and Fasman (5) was developed by M. Gribskov and R.Burgess. These two programs were performed on a DigitalVAX computer, and the plots were produced by a Hewlett-Packard 7221T plotter.

Isolation and radiolabeling of minicells. Minicells wereisolated from DS410 and different DS410 plasmid transform-ants after overnight growth at 37°C in brain heart infusionbroth. The minicells were purified by sedimentation throughsucrose gradients and radiolabeled with [35S]methionine(Amersham) according to the methods of Gill et al. (8).Glycerol (final concentration, 0.2%) was substituted forglucose in the minicell-labeling medium when isopropyl-,-D-thiogalactoside-induced transcription from the pUC9lac promoter was being examined. Isopropyl-p-D-thiogalac-toside (final concentration, 0.66 mM) was added 30 minbefore the addition of [35S]methionine label, and incubationwas continued for 90 min. Radiolabeled proteins present insodium dodecyl sulfate-lysed minicells were separated bypolyacrylamide gel electrophoresis by the method ofLaemm-li (21). The gels were impregnated with En3Hance (NewEngland Nuclear Corp.) according to the manufacturer'sdirections, and the dried gel was used for fluorographyagainst X-Omat R film (Eastman Kodak Co.).

RESULTSDNA sequence of the E. coli hemolysin. Shown in Fig. 1 is

the series of overlapping M13 subclones of the hemolysindeterminant present on pSF4000 used to generate the DNAsequence shown in Fig. 2. The sequence derived from thesecovers a continuous 8,211-base-pair (bp) region of pSF4000within which transposon-mediated mutations and deletionsindicate the hemolysin is located (45). Eighty-three percentof both DNA strands were directly sequenced, and in most

instances in which the sequence for only one DNA strandwas available, overlapping clones of the area helped confirmthe sequence.

Association of ORFs with hemolysin cistrons. Goebel andco-workers previously established the existence of fourhemolysin cistrons by use of subclones and their ability tocomplement a series of transposon-mediated mutations (17,30, 42). There exist four long open-reading frames (ORFs)with ATG starts preceded by sequences resembling ribo-some-binding sites (Shine-Dalgamo sequences) (36) withinthe sequence presented in Fig. 2. They were encodedsequentially by the DNA strand presented in the figure, andno ORFs with the aforementioned features were found onthe reverse complement of the sequence presented in Fig. 2.The predicted molecular masses of the polypeptides encodedby these ORFs were, in left-to-right order (see Fig. 1), 19.7,109.9, 79.9, and 54.6 kilodaltons (kd). Through the use of E.coli minicells, each putative polypeptide corresponded inmass to a similar species identified as being hemolysin-specific and physically encoded in the region presented inFig. 2. Through the use of deletion and transposon mutantsof pSF4000, as well as subcloned fragments of pSF4000 anda recombinant plasmid encoding a nearly homologous hemo-lysin determinant (pANN202-312) in a minicell-producingbackground, we identified where along the hemolysin deter-minant the different polypeptide species were encoded.These results are shown in Fig. 3. The order, size, andlocation of the 19.7- and 109.9-kd species corresponded,respectively, to the hlyC and hlyA cistrons (9). Although weconfirmed the apparent existence of two genes downstreamof hlyA due to the presence of two large ORFs, the order andsize of the 79.9- and 54.6-kd polypeptides were not similar tothose in previous reports (13, 18, 42). Because of thiscomplication and the desire to utilize conventional geneticnomenclature, we propose that the two cistrons, respec-tively, be termed hlyB and hlyD (rather than hlyBa andhlyBb).The minicell analysis of Tnl insertions within the hlyB

region of pSF4000 did not reveal the loss of any polypeptidespecies detected visually in autoradiographs (data notshown). Therefore, we chose to examine the polypeptides

I-1-

L.- .-I II

VOL. 163, 1985

%A

Page 3: Nucleotide sequence of an Escherichia coli chromosomal hemolysin

-iie 4233 4246 4 ~~~~~ ~~~ ~~~~~ ~~~~~~~~6211M 6233 6 246 625 62 66fiAn ft Al. La l Ty N ft Il.k 61 Am.m Am Pro n II .AmBfsI,i La Ala N Ala His AatePh, 11hr O1. La.ftg SIOl Sly Tyr ftalh71 Ik ValiSl

416 427 :63 G2 43113 43110i 623 6303 6331 3

L ImlbS TS laUTUTkITMTUTTA UUEDTTIsBTAlnAa l l AwLMla My Al&Sly ULatlr Sly Sly Sla ArI Sln AUII* Al& Na l " Ala LoulIeIlahrAlaAla~~~~~~~~~jjba~~~PValALyAaSoiLyIfriAlaMeAlaoAlaa our_pLar6- S-LaELoorIy m laSe fpPh....rayaykmaI~DU n

4* an a *

LAllaeVAU ly AlhhSIGfrAaHiseLVTralrhaP NoLya YetLHilLa11aSisAL MyhA"laL alAIpTyI6l.h,* aa494a 43 44 49 4 4 a 6 aa49

11.1lami fUIlui.i.51iJimim~iN ii Iu.iuA ET Clf CUT MU ETC AC3AOWAU ICA GE CIC MTT ATT SiC ATS AU AU 5 AM4473 4413 4439 45331 4513 Ala PHis eah ibr Ptal Lya Am Ala Asp PAg I13 IIa Pal Pat Oie Lya Sly 'y

a a aWU 653 6533 6546 633 1165NU3T TCT TCRTAUATMTO3T 816 TTA TATCTSCAMONATT TTAO CATSC mTI UTUTUUlE tITUA TT TSTlhrt Sa phr EyHi. Lya Ia Sap Tyrily Lau Tyr AlaLa. Ola Iha La. Ala SlaTr 11f oal SIMwWA6MS WSLPLS LaaaarSTC PmgaSI Dir TIC DirT

453 4546 453 4133 4573 IbaVe yyayaatOnBaayNsLsBlnU frM roSeSrUpTyrrTy

4533 4613 4613 6 4633 m L y l mIab s*a a a a a ~~~~~ ~~~~ ~~~~~ ~~~~~661666 6633 664 6653 6663

La. Sly La. 1w hr TVpa La. Ala- Ala Lye hr La. 51, Lee Lye Val Lye Sln Val NU AU SEA US TA ATe BIG TIC ME US TTC EU TS UTC UT AU CTT STE TM' AST465 41 4673 463 4639 473 NotLys 11w Trp Laahrt Sly Ph SW Gsl Pha La. Lia Arg Tyr Lyes Lea Val Tr: Dir*a a a a a ~~~~~ ~~~~~ ~~~~~~~~~66736613 663 673 6713 6723

AUAUMA1TUT63RCTTAUTTATAC 1TCTETMTSEEEMETTASMETSSUWSUAT a G C C ECa RILyseLys Th IIeSoAnUbAmah Ph. Iletr LaoPro AlaLUaaVal Trp kg Slu Sa ACA 1USMAUECWMg3SA=ITATAEWAT MOSTA6BT MA SWCEA FATWAU

4716 473 4733 4741 447 473 Me1 TAr Trp Lya Ile Arg7 Lye 31n Lea Asap Thr Pro Val Ang SI. Ly Aap SIv hr 61aaa a a a a 67~~~~~~~~ ~~~~~~~~~~~33671 67531 6763 677 6713SlyECysaHis PhIle LaaTwrLysValhfr Lyes laAlahrOkgTyr LaaIIePheAa mJTCTAUTSECRT M A TTAAOfTWuMTAUE6SA1EEAUCMSIZSESTETSS9TT4773 4733 4139 413 4113 46 Ph. Lea Pro Ala His Lea Gsa Lea 11e 51. TW Pr. Pal gar Are Arn OrgA La. Pal*a a a a a ~~~~ ~~~~~~~~~~~~~~~67361 6313 633 6 6646

EUAUUTUT=TSTTETCA TETSAS TTSSSS6 TATSTCAUSS a aUaaSl oASlaOrgrProArIgVal LauaSlaSlahfrSOlePhes,luAla LeaTyrSOlaBSy CT WTm I NIBUIIMTMUST TATTCTTICA1TTSM7C TT1TSMAWCCTS41333 464 46 4616 4173 463 Ala Tyr Ph. Ile NARSl eLee Val Ile Ala Ph. 1l. La. hr Yal Lev Sly Sla Pal

aa a a a~~~~~~~~~~~~6361A 6373 631 63 633NisIaI#11leUaIle AlabrAOr garhrVal Thr Sly LyseUe.Ala LyaesinPlaeAspPhS,TSESTETACCCT UMSAUITSAAUCMCSTUM TMEAK AUOMTT W463 433 4516 43 433 4946 SIe Ile Pal Ala Thr AULAmSly Lye La.lb1w Baahr ly Ar,h9r Lye 51.t Ile Lye*C C C C ~~~~~ ~~~~ ~~~~~~~~~~~6516163 653 6546 653 6361

AM SToAT4aT CCT = #offUAUTSEI WAUAATTTMA1TrSMAICCTTSITSTA a a eTMw Tre Ph. Ile Pro Ala 113 Ile Lye Tyr A"g LysIeli leRa Ole911r La. Yal Pal MIT ATT AU MC ICB AU SIT WAU AU ST ATE A NAU WAU1AU MAU6 STE COG WAU

4" 416 493433 413 333 PrM IleIle31. lr Ile Pal Lye liSIle13le Val Lye 61. Sly GSi. VraliN v*C C C * a~~~~~~~~~~~~~~~~0 s653 11 6 373 7316TETSgfT4TTITT mTSAU 'TTAT eTMJIAa =UTETTTTTTTEgoTSITS1BSU

of ;Mf Haul!i a atrPValPwLosGIsaLosPheAla Leaule119 WPrLaPhePh.s,iCnPalPYalPat ko U T5STATAW IrTACEAAUclA USETUTAUITMA C

313 31 333 WA 3 Sly us'a Lm. La. Ly Ua. Th Ala Lot Sly Ala 31. Ala Asp Thr La. Lye Thr Sl1.*C * -~~~~~~~~~~~~~~~~~ 733~~~~~~~ 73 735 737 733

AUgm TTgmS mAuST11TAUUTMTamATSTTATTATSIESCRT TSTCTSTTSa a aLyePVal Lea Val Pie rsy Ph., OlnPro LotaObtLftLaaLaaer HisTyr Le.La. M M16J3E TA 3ISEEASESC ;CA ATT CISU'T AATTETMCECMSTCEAATT

373 51 WS3 5113 5113 5123 hr hr La. La. 51n Ala rg Ua. 51. Gsi Ile T 51. Ile La. her Arghr IIe

USUSUTTUUn3AUETMEUTITSUSETTSEATmSEO6355TIla 7113 7133 711146

Tap Tap EWLow Ang Lu Ilie Lu hor SyLut ORg TAr Tyr IIe Ph. Ala His her Tha TSA FAT AU EC (CT AU TS AU CrT MT GOT WAU CT TAT TTT AU UT STA ETC5133 5141 513 5163 5173 513 Si. Lut hr Lye Lu Pro Si. LU Lye Lu Pra Sap Sla Pro Tyr Ph. Sic hrn Pal hra a a a a a ~ ~~~~~ ~7153 7163 1717 711 713 723

or meATrTS3WSWT1TUSTSSuEccAMEMTTEECOG CT TTAETSSSCTA Ms A" a aCaa a ARhrOkglieSapaIleaUsl6y Ala Lye Lea Mme ArPHis Lou LAla LouProaIIe Wa WASTAce TYRATTCTTSA TA WU AUUSTT1EEAU1SSAA513 521 52 53 5241 la 51. 51. Pal Lur Lu SorhLu Ilie Lye Si. Sic PhehBr Thr Tap Sl Ainaaa a a a 7213 723 ~~~~~~~~~~~~~~~~~~~~~~72337246 725 726

glrrYMCEfCT%ATg6ATUSUS6jUUITTASAAEIay a a Sic Lvs Tyr SIc Lye Si. Lou hr L. Smp Lye Lye Arl Ala G1. Ar Lev Thr Ilie Let

SR6 53S33W36 533 56 Ala Org Ile hr Arg Tyr 51e hr Pal hrOrgPqal 51. Lye hor Org La. Asap hro PA*

5373

n-mm oTT MA M T

5313 C3M541 5413

TT

TTTUTTM6T U3SAISOU3SESATI6SAUam ao am a4 a41 aS Arghw riLua His.p Sin AlaIleAla LyePHisAlaPVal LauSIOlnaSlc Sl.hLysEU = UT UTif SET SM USlTM SIUT iTT SIT AME = ATT ITS 635M UTM CIT SAT 746 4 73 74LU PraisTy Ala Ala Trp htr Pal Ph.0 Wf 10UA M"liehru5WbTATST UW SEA SEA UT WTAU CIAS STT W3 AUA 16 MAU EU CAU ATTSIT AUT

* a a~~~~54 a4 a a TyrPVaila6%Ala AlahAm .lu OrgPe~Yl Tyr Lysehr Sla LosSla Sln IlelsSlarUTAUTITUS UT UT ~ ~~UTA E I IT 15A 16 SEA4E6A? 747 7413 7438 713M

kgpLyeMehrp&IhAnAiaSaphkmSinaRPlwLuPVal SlehrPVal ThrAla IleAT.ITMAWWW UCCTTaOaCaC aACTTTAAARW Taa a a a a~~~~~~~~~~~~~Sl.IleLa hr 1aley 51. Si. Tyrlc Lu Val Thr Oln LuPh. Lye hr Slu lea

MC EAUSEW T AUGSIM 16 SET AU AU ME AUTUSUTM AU CAU TIT 7513 5 73 7546 W 7566hrlI* IlieLys AlaNo aPValihrPoo61aShthhAn Ile Tr ULye Ms UTaN a a aTTinpOKe AU Era AU AU ADA ACA UTC MEC SIT AUS ITA ITSA SET TS WAU ITS GAU AUA

C Pot Ia a Lu km Lvs Lu an Ile T rwTAr 2h Ile 6li7U. Lou Thr Lu Silc Lea Qic LyemU UTf SIT SET SMUT TT AU AU AU A TITO CC ACC SIT AAU TAUMAUUSTW A A W 01 9163PAAT 0

SE 76TITTEAUAUSI A8AAla Sly Tyr al Ala Nias ly Me~LyePWI TAr Pal La. Ala TAr liyenSlyni m WWBT C6T1O6AACANCCC CT aTYMMWOT CAail!aaauAmhGsOl,ei.kg GSinSlAlahgr Val Ile AnSla Pro Pal Shr 6y Lye Pal Bln Sn

IleSo Lu IileGsLysTrVal Pat Ile11.hrnLulTroLauSly AlsPHisLouVal a Cammm aoTam a a a71 * L Lys PalPHis Thr Mely MyVaiPValThr Thr AlaUllaThr LaNFtPYal IlePYal

a1TTS= SU?TAUTSITUToff TTAA1fTSTITTUTA TT6wmfTAmT 763 773 7716 773 773 77466Ml UTUMCAS ETI AU BTT ACT SET EU ffA AU MT MAU MSTUtT ITT SIT

573357465753 5766 577 5763 Pro SI. Sp Sap 11w Lu Sic Pal Thr Sia Lu Val Sic Lys hAn Ilie Sly Ph. 'aaa a a a a 7753~~~~~~~~~~~~~~~~~T 7766 777 77131 77137613

UTTSMSI TAfTTCMETTSMWA 'noTEOSUCIT63SSTTSTTSCT6AB3f am a CPal Ala Pre Vli11g0 Ala Ila Ile Trp Sln GIsSinSiVal Sly Ile hr UT #TE S AU MT1 AU ATE SET AA ST AU SET TIT =I TACCCAU WauTUTAUT

5m all~ ~ 53133 SW4 An Saly Sic h Ala Ile Ilie L2.VailBSla Ph*Pro Tyr Thr ArgTyr T

sm 3119 33 336 Lu Pal Ily Lye Pal kmerle AnLUhAm Ala Ilie Si. Sap Sin Lye La. Sly Leef0 0 ~~~~737 7W3 759 753 7113 713ITAT SMtSWU M3T ATESTIC T UTffWamTEMUTIT eTUTUTAUST a a a CLu raPMrlieohrSl Ile 11whsAMe Amh Ile Or Phe rgjTy Lye Pr Pap TITTARTTarc AlT ITT ICT OITTWaUSMT OAT TIS= c 01MEAUSTAUEATC SIT

5316 533 514 V3al Me An Pal Ile Pa lhr Pal Ic MSi An hAlp hBr 11w Sly hrn Lye Pie IleG C 3 54 53 73791073 51

1ECMSTTTATTT1AUW1UT c AuEETETWTTA NOUSSSAUS ATTTStTa* a Pet!I Ghtr FMe Val Ile LUN h clie AmLao hr Ilie LYSe"sly SI. Pal IIe sly a0 SCMTITS ME 1 T5 AU CET STE SET KAUWAUTSA AU SET WAUTCAU ME STA STE

am 3'" * 6613 663 Pro Lu hr hr Sly hrt Ala Pal TAr Ala SI. !l L 11w SlyVaO2ghr Yai IleI 0 t 79% m me0113753131

3IC I OffCT OrTT 1A MWAMITCATE TWATTA ATT CA CSTTMTAT ATT a G a CPal Sly Wwr Sly hr Sly Lys erhTA Lue l LysLou Ilie Msi Org Ph. Tyr ie W6UTET CIT W UT CEU AU AU TCT PA AUA AU UT ITSA Cl? 3 UBT TAR STE

6mm 6169 6M m~~~661 hwW m Lu hr Pro UOlu Si.SI hrVal T1w 61. her LU His G1s Org-CET A W WU A U tEiffAST13Tg13UTTEIW ITSAUSTSEcc'TCTUaMta a a aaGkVeSISap Sy Slt Pai Lu Ile Sp Sly Ids Asap Luo Ala Lu Ala Asap ro hrn Ti 1AUmm3iA..wnuas,awwu.ra.u b~IN.

W6 6113 6113 6123 6133 6146 6133 3143 613 616 6173 611 31 oMIT UTUCU AU SSIT61 IT AU UT UTWT IT I URTUTBASIT SIT unu.i,u..w .u. IHLaIIA6STTTTELu A Arg Orle Pal SlyPal Pal Lu SIe Sap hr Pal Lu Lu shriOghr Ile Ile 1

615 616 6173 6113 6138 6203 Pid!G G aG EIUI

off UT AIiT TM EU SET MT mET MEATS 1= STE WAU AU SIrST TAT AU e AUSnap rlie hr Lu Ala hr Or. Sly hr hr Vali SI Lye Pal Ile Twr Ala Ala Lye

FIG. 2. Nucleotide sequence of the pSF4000 hemolysin region. The numbers above each line refer to the nucleotide position. Also listedabove the nucleotide sequence are selected restriction endonuclease cleavage sites that are referred to in the text. The predicted amino acidsequences for HlyC (bp 796 to 1,303), HlyA (bp 1,320 to 4,388), HlyB (bp 4,462 to 6,582), and HlyD (bp 6,604 to 8,037) are shown beneaththe DNA sequence. The positions for possible ribosome-binding sequences (Shine-Dalgarno [S.D.] sequences of the mRNA) are shown abovethe DNA sequence preceding each gene.

96

Page 4: Nucleotide sequence of an Escherichia coli chromosomal hemolysin

VOL. 163, 1985 E. COLI CHROMOSOMAL HEMOLYSIN 97

hi a a~~~a3 a 6Im 23 2 m 2

aMiln STSUPMin3MAMM MTUT = TC5TCUCMC~AmTAATT AMICT OCIVTM CCC BPTTTSCATTOCTS6t%TS1CUS GMAATT46T =CTC76 6 56 166 116 126 ThrSrAUAlaA1Ala A1aS 11amIl1laVal1YalThr LUUAlaIIliSeproLK

m6 221 2t 2316 232 233

*3 0 a t 7 n X1 9nrImuMTII cIR

-- Alam6Aic LhTWAa GSl SiT SrA Aimasm mcmGMccf CTmm mr e Cm S amST MAU WISorPhl ,OWSrI~Am Ala ~Lys Phm LysbkgAaA wIll 1 yrSr l

196 266 216 2W 236 246 * *3i *r 23 29

#Pst I a 0 f I

23 mg k2'PtriuoLysmLy Lam Sly T' Asly kiSo Lam Lo Aa Ala P. 245Ls6INTh* 246 24a263 244 23

3T6326336346356361cm TMTC g

UASCT ATT UTICAIEwAaIUATA MC ATSTCTSETCTCA STAICT TEA ST* * *~~~~~~~0Sly AlaIle Amp Ala Sr L Ths AM Ile SmrThr Yal LamAlahrVYal SE'rSrSly

7ATSTTATIC7TMMTA M NCam M TT Te 4 nCmA,T 246 247 2466 243A 236 2516376 TM 31 '41 416' 426ATTTC5EE1CST55SA3SACCCSTSSC

PC IGTOh7 MTA T f W AF -M 79 MTAr M IlaIurAfla Ala AlaTThrh SrSwLmVYal Sly Ala ProYValSrAla LouYalSlyhAla

436 446 4 466 476 43 o 29 24 2m6 2516 257

496 TM 516 so go 54* Yal ThrSly Ile Ile hr SlyIle Lam SlsmAla mr Lys SlitAla fttPh SliaHis Yai256 am6 am6 2616 2626 2636tEll a a a* * * * *

TAT TUABCTEUTICSEBCACMCT TTA TCUM=CITTCITTTCAMNCOKSTATTM CC UTWATfS CCM17gT TT TSAITTOO ASAUASEMSC A ffTACT~T43 566 576~S TM so 666 Ala hor Lys Omt Ala Amp Vil Ile Ala AlasTsp 61s Lys Ly Him Sly LymAma Tyr Ph.*I a 546 2656~~~~~~~~~~~~~a 2666 2676 66 28cmMATAUPT JTS MATS MAC VACTT1TTA TATTICKlMATAMMATA * * m

616 626l 636 646 636 66 MM1TTSSMCCTTTAMSTTUTTACaa a a hAl~~~~~~"MI MFIATMSyARTAlaicCDXTOT O TamTr A A AmAC PM WAlLTATA

TM ATT ITT CIT SfT TAT STC AM TAT TM TARCCMMEUTA TAfSTPT M = SeRo 1 T§AmpAa rHis Al l h"'

n 2le7A s h LIeL

676 666 639 M6 I16 726 a aI ao a m a

Tra aTn CT PATElTICl TEA 61s ocA Ecs 13 wA 6A TATSMT ATT 1M3Tm W MTWHAA C yTm ocecrTm mMA(TTS A

73 746 723 7W6 TX 70 Bl y mLOeTrSrYlB YlLuIeTrSnGHsTpk

MamMTICaAMMAPA YMM M CT TorMaaaT I *aAma7I1CMMIMCA MTMETLissoEaTT1113A NoESET 01AUTMTEA CTMT sTESic6MTM0SEAMCAEC"CPTEUTUTAA71643616 ~~~~~~ 636 ~~~~ ThrlamU 11.1 61% Lam AlaIly V.1 Thr~ Amm Sly' LorBlLya a a ~ ~~~~a a6Y2j YjLysThr~7 SlyLr

MTASrA1EMTTTAT AAEM NCMTECAATTAWASOTCrT011ATS6TA1C6 a a 2a a a amcRratAnMm AnAm Pro LamSlmYalLm Sly VaYil hor T 0tcRTAATEAETIIIIACSMMCTTITE

a a aSa r TyrhIIM~Tyr Tyr O1.11 Sl6y Lym Ari mm Slu Ly Lys Pro A~s la Phi 51*ETC1MSScUATECCMATTAAESSM6NC1MSCAITC1ETTTTTSTMSmEAa MATTaR a ae a1 a aLmmTrp Alahgrbr Pro LamHis An mmTrp Pro.ValhSrLamPh*AlaIhlelmmYal MMSCTSCTSIIETSCECSSMTTC

916 TM 136 ~~~~e I 0 L1GIal Ph* Amp Pro LamLys Sly AmaIIm Amp LamhSr Amp her LyshSr hr Thr

la AlAaAn6TrAa knL Am ar YalTTACTCArACTSEUTATETTTTASCWSMTMTISWTS1TSTAa TT Tff ;TSIa' 6M GASWAAM A MAALaPrAlIlPso9mmlayrla mmmTr SmA1TP m11 Lam Lam Lym Rhm Val Thr Pro Los Lam Thr Pro Sly Slb Slu IleAr SWAriab rlg SEn

TATMT MT6ICUTMTTTAMT TAM M*TMATTM *UTAICTMT TMT OTTTaMa aM a am aTyr Cys her Tsp Ala iAm Lam hir Lam IlmAm 6s11.IlLy Tyr 1.44mAnAmp ValTICIIAMTAAE*ITITSEITTTTIMCT

163 164 13 low 167 166 hr Sly Lys TyrS61mTyrh11mThrSOlmemLmmvLal Lp Slj Yal Amp LoaTrohbr Val

Sr LamYal Pla Slm Amp Tp Thr hor SlykAmpbLysmTrpPhmIleVYal Trp llaAla MISEIEM1TTTTTEESESTMETAC1616 116 116 126 136 114 Ly OlyVYalSla Amp Lys 61h Ya1 Ty Amp Tyr her Ama Lau Ilm Sin Him Ala Ama'

MTMMOTMCSTOCCCTBTIC M MNM CMW319, 3136 3146 3156 3116 3176PrRmSlAp Amm Sly Ala Lom Tyr LI.TrkmblmL.RmPoApSmmm SCSTUSEITTSMAlATAlITA CCSMME95TSA

1156 1166 ~1176 916 1116 M . l mA l y S i i a l m ma 6~~~~~~~~~0YlIaona6aTy ra a Ile IlSr2lit ShrHis Le Sy pSlyAsAp

TIMM cATEATC siyccMATACMTECTATIMTU~W TOEAWAMITIEMI 3116 3191633Ph.tArg Ala Hi Arg YalAm Pro Lys Thr Naq Val SlyLYalVmirGslmPhm Himsl IIIETTTSCE3CTCUMISCTASE

12161231~~~~ 1246 123 1216 m~LYal RM*LaSr Ala Sly S Ala km Ile Tyr Ala SlY Lys Sly His Ama Hal HalST WATT"MT WA AS TTASESMTAITTTTUAnMIN*A TEATCACEsUTTA ATA a a aSlyLyhsAp Lym Sin La Ala km Ly 'Mh y_1*TrHmHm l ahm TT TAT MAT MA AM SE AC UT TAT CT ACC AUT GAT SEC WAA MAACAC MA GM1127612661216 1 1316 Tar~~~~~~~~~TypAmLy TN km ThrSl1y Tyr LimIThr Ii. AmpIly Thr Lys Ala Thr Ala Al#AC*ASAW R 4 ABTMNTM=TAM OR

aMO55 *T 33 33 1 332 333 3346 3356

SThrISEYalSL1PanTTTCMTTTAspTTACAUIIMMISTArTTM *SlyThSmVaLmkLmhApImh.rahSy MSMYTTAC SSSTASAEA TAEACT GTSTUTMbTTMTT TTACAMST123 1336 ~~1346 1is 136 1376CGGAUF

* a a * ~~~~~ ~~~PithI Sly AmasTyr TAr Val ThrAreY'al Lan Sly_Sly Amp Yal LYmal Lat Sin Sbl Yal Val6rATAmcM TTWAM AA SCA X O Mn n 336 3376 3D66 3A6 3416Vat Pro TAr hIle Tr Ala Ala SlAs Ile Lys Sr ThrLU6aSimr Ala L 61Slah Ala MISA;rITI TA IT#T MAMU sAAMIMA SE cmA TAT EASUTTA RA136613161466 1416 1426 1436 5~~~~~~~~L1% SnmGIs Val her Yal Sly Lys An Ths Ala LysITAr Sin Tyr Arnher Tyr AlaSEA MRT WATSB MC MEA CrA MIMSECP AMCTC6WC7A TEA A MAMASEA MA 343 34 345 3 341Ala km Lys Lat His her Ala Sly 11* hor UAr Lys Amp Ala La ys LyInAla Ala DilI a *

1446 1456 1466 1476 1466 49MrnTC T T aTE MT UT A mTaTAcSA6MST MT AETEA TAT TEE T615RTMSTMaaa a a a Rim~~~~~~~~~~~~~~~~~hTA HiNisklmn1SLysLAm La Thr Sv liar Amp km Lou Tyr her Yal Alua 31a

SlaThr AMAsnAla SlyAmAiLa IleLa LahIIProLys Amp Tyr Lys Sly Sla ETAESCSASCCSETTSUMTTC ACTC CAT*aa a a a ~~~~~~~~~~~~~~Lalle Sly TAor T*AniAla Amp Lys Ph. Ph, Sly horLy Ph. Ala %61lo Ph* HamSTC13 SEEl MT SICETITSC USAESCA MINMI EE MI ATI GM SIC cm TA 4 56 31

6ySlyrhSrLakmAmn pLauValgAnrhpAlaAmpSOlemLaSyhle GIsValIBlnTyr a a a a a15661576143 143 1666 1616 S~~~~~~~ECECS MTUTMT SET EATPAMMT SAC 51RT SESCTEA TAT GOT' MATMTMIMIaa * * a Sl~~~~~~~~6y AlaAmp Iy AmP Amp Rim hIle Bim RiAm Amp Sly ma Amp WIg La Tr Sly Amp6 MmT SEE CA ATESrTrT U USA TI SEE CaomAMIM EE AT 461WI 32 34 3656

AmpSlavLys mkSly Thr AlahIleThrLymsl6aVYaIPh.H PhrAlaOleLysmLaIle1626 1636 ~~1646 1656 156 1676 MISTUGECmElUTM E M E3 T A SEMT SAT SaE0 ~~~~~~~~~~~~LvSly kAmsp Thr Lao horSly Sly km Sly Amp Amp Ala La Tyr I1151 AmpSl

I1y LauTrlmAlm I1yVYal Thr IIlsPhsAla ProSil mLaAMLysmLaLaSlaLym1666 1616 rim 1ii 1726 17a SE PT MWTA116 MM MAESMT MRT MT TAC ETSA*SECSE GSAST SE SAT MATaaa a a a k~~~~~~~~~~~~~mAmp LysLa hIle Sly Sl Ala Sly km km Tyr Lan km Sly Sly An Sly Amp AmpTAMMEUMTWc* yMY A TTASNCSEUATSEEIMTkanTASSIMNCETAMM 3m sm 375 3716Tly 14SlykmL LaS1yol,Shr Ala OlekA Ile Sly 1km Bl a a a a

71 y17 I m77.1La Sly U CrT MB ITTCAMIM MT JET CfT CM AUMT gEA TEA yCC SIT MIMAAST MATCASEaa a a a I~~~~~~~~~~~~~~laLam 5l* VY111Sly Aphr La Ala Lm kmn Val Lau hr Sly SlyLys Sly kM

UTM PlFA EEl 13 ES IT MIMITRT EEUFT ACT CA i31 36 371 V- 361. 362 aB3Ly Al'a SlyhSrVYal LUm herThr Phi SIlm RPhmLalSIw TAr Ala LahSrhSrkNt a a a SE'TTMII99TSAET1431616143 1636 1641 143 SE~~~~W MA cEE TIE OK1 UT MI MI ETTBCT MT SEO A 9 R STC

JETUTUTMT SIC UT ~ kp Lys L Sl6y her 61._Sly Ala ~La La Amp 6l Sly Ila Glykm Amp Lam

LIle AspSilm La Ile LysLyms lmLyshrSBlySlyAmVYal horhbrhfrl61mLa a a16761666143 143 1,16~~~~~CTWIIASTM 'TANTNTMMT4T TAIT MT ATCT TC MITAT BBCAT CATATTaaa a a a ~~~~~~~~~~~~~~LaLys Sly Sl1 Tyr Sly kAmhp hIle Tyrfr" Tyr Lah Sl6y Tyr Sly Ham His 1le

CmmIEosUTATxnMICIA f wcA6sme'ATEEITSSEESTSm m*CSETTMTMT 356 3911 3M 3136 343Ala LysmAlahrw IleaSlaLa Ile AnIlaLaVaI Amp IIm'Ala Alahgr L AkmaAnaa14I13s1m 4 1166 1176SACT MTM MIMI USEM MT MAA ET UIT TTS SET MAT aTE MAT TE MS GAT

I a a a IleaAmp m~Sly Sly LysAnp Amp Lys LAw hr La AlaApIe "6TT WC 13 ETlMI1 MIW CEUTCM5 CFO MI UT SEA T16 =E MaT FM MISE39 aAmP maP.b mValiAmhrwPhmhSrIlaIlanLftAnLysmLaSlyhrwValahSrkmTIaLysmHis a

14 19 26 2616 M IISCCTMECU6 CMIWAUTMPTSWCECTRTCATST ATMFASECT WA TMATSOTTCTTaaa a a a V~~~~~~~~~~~~~~alAla Ph. A61uI6miAn Amp LaIl m,Va y y Ala luSly kn Val LauMCISETSEUST ET SITU 11631A MT TE CT MC CET MyIMT SE UmC 4

La kim Sly Val SlymLamLSimkm Laa ProkAnLa pAmpm Ile Sly Ala Sly IISEMMTTATTTIEATEII1STT266 26526 am67 Ms2666N26A1TE6AT= 6 c

' AAAW' GTMaaa a a a h~~~~~~~~~~~~~gr IleSly His Lys mkBlyIslmThrPh.LyAnmTip Ph5 Slm Lys Slu her Sly Amp

TEA MT SCT VA JE ST ATE TEA mcIESATEmESTTrTEE AEcISSETeS 46 46164166411 4126 4136Lam p7*rYalhSwIly 11leLafr Ala Ile hr AlahSrPhs IlesLafr kmARig a a a I a a

2166 2116 2126 2136 221 2156 ACTEIMTCCCSEMISAMIASAMfT TTTTMTAASEAUMCSI AATEfC AAMASATa a PithI a a a Ile hrkmHimsGIlehl.slmlnlehlPhuAmp LysAmp Sly Ar alIle Thr ProAspMTC MT SCc M SET WA SET IA CA UT lET MIAT6 CA ES OMI CT$ SIT 4146 4156 4166 4176 4166 4116pAlaAmp Thr Sly TAr Lr Al& Al&Aly ValSOlemLarTIraLys Val Laly a at a a; t

2116 2176 2166 ig2116all lE CIT MA MO CrA CTT 6 TAT MIA MI UST MIMIR MS CA UST TAT 6TS TAT 95a a a La a sArLaLyULyAI61aTyr616Tylla6rkmkmLysalaSuTyrYal Tyr61yMAT TE M 661ST JT MI ATP AW619m 1 CA CA cM6C TE TYUam Val 61y Lys Sly II Shr 61n Tyr Ile hIs Ala AlnArlAl Al 61n Sly Lu Shr

Page 5: Nucleotide sequence of an Escherichia coli chromosomal hemolysin

98 FELMLEE, PELLETT, AND WELCH

A.1 2 3

.....

Di.I. _~n

4 5

Aw-

4__ -

6 7 8

4Ia

X.s

-(B

urP

_ t <Ba

.Kp

.w,_ U

cat-

cat)o'

ucCcatlo

HEMOLYSIS}.. ... . .. ...... _ ., .. .. _~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.-A(Al.

H58-% P SPS P Pso _ It

34(s)

1 E 13 P H P

X C A B D

FIG. 3. Polypeptides encoded by different plasmids in minicells. (A) Fluorograms of sodium dodecyl sulfate-10% polyacrylamide gelscontaining [35S]methionine-labeled polypeptides encoded by different hemolysin recombinant plasmid derivatives present in purified E. coliminicells. (B) Physical locations of deletions (A), Tnl insertions ( ), and subclones of hemolysin recombinant plasmids. The numbersenclosed within parentheses indicate the lane in (A) which contains the polypeptides encoded by that particular plasmid derivative. In (A)lanes 1 and 4 show polypeptides encoded by pSF4000. The lettered arrows next to these lanes indicate the location of the 110-kd HlyA (A),54-kd HiyD (D), 33- and 32-kd X polypeptides (X), chloramphenicol acetyl transferase (cat), and the 19-kd HlyC polypeptide (C). Lanes 2 and3 show the pSF4000APstI and pSF4000ASmaI deletion derivatives. Lane 5 contains the plasmid pSF4000::Tnl(34). The A* bands representthe truncated form of the 110-kd HlyA protein and its probable proteolytic breakdown products. B indicates the location of the HlyBpolypeptide. Lane 6 contains pUC9. Lanes 7 and 8 are pUC9-based EcoRI subclones of pANN202-312 in which both orientations of the insert(pWAM326 and pWAM327) have been examined. Polypeptides (B and B') of 77 and 46 kd are specifically encoded by these plasmids. p, andP designate the precursor and mature forms of the pUC9 P-lactamase, and the pUC arrows represent pUC9-associated polypeptides. Lanes9 and 10 contain pANN202-312. Lane 11 shows the plasmid pANN202-312ABglII (pWAF185). In the restriction map [Bg] represents a BgIIIsite just outside the hemolysin determinant that is present in pANN202-312 but not pSF4000. Again, the A* represents a truncated HlyAproduct. The horizontal bars below the restriction endonuclease fragment map show the assignment of the encoding regions for HlyC, HlyA,HlyB, and HlyD. X designates the two polypeptides outside of the hemolysin region whose function is unknown, but which probablyrepresent a precursor-product relationship (R. A. Welch, unpublished data).

synthesized by subclones of the hlyB region. It had previ-ously been reported that an EcoRI fragment spanning thehlyB region of pANN202-312 inserted into pUR222(pANN250-222) encoded a 46-kd polypeptide in minicells

(13). We constructed similar plasmids (pWAM326 andpWAM327) and found that not only did they encode a 46-kdpolypeptide but a 77-kd polypeptide as well (Fig. 3, lanes 7and 8). The orientation of the EcoRI fragment in pUC9

9 10 11

<wcat

B.

pSF4000

nACYC 184A (1Ul

LLA±)-~ 40

Ev F H lw3 Sa

J. BACTERIOL.

-KB

Page 6: Nucleotide sequence of an Escherichia coli chromosomal hemolysin

E. COLI CHROMOSOMAL HEMOLYSIN 99

(pWAM326 vs pWAM327) did not appear to influence theamount of either polypeptide relative to the other. It isapparent that isopropyl-p-D-thiogalactoside induction of thelac promoter present upstream of the EcoRI insertion site inpWAM327 enhanced equally the synthesis of the 77- and46-kd polypeptides (Fig. 3, lane 8). The DNA sequenceanalysis of the EcoRI fragment indicated that there wouldnot be a translational fusion created with the pUC9 lacsequences at the EcoRI insert for either the hlyA or hlyDgene. This was further substantiated by the fact that for bothorientations of the EcoRI fragment in pUC9, all of thedetectable polypeptides remained similar in size. Althoughour sequence data was derived from pSF4000, DNA se-quence comparison of pSF4000 and pANN202-312 (hlyC andthe first 120 bp of hlyA) revealed a 97% homology (R. A.Welch and S. Pellett, manuscript in preparation). Additionalevidence for synthesis of the 79.9-kd HlyB polypeptide camefrom examination of the minicell results with the plasmidpSF4000::Tnl(34) (Fig. 3, lane 5). A polypeptide ca. 77 kd insize was observed when the hlyA reading frame was inter-rupted by the transposon insertion. Therefore, because a79.9-kd polypeptide is predicted to be encoded" by thepSF4000 hlyB region and a polypeptide close in size isencoded by the hlyB region of pANN202-312, we think thata correct assignment has been made.The inability to detect the loss of the HlyB polypeptides

with transposon insertions within hlyB appears to be due totwo factors. First, there apparently was low production ofHlyB compared with HlyC, HlyA, and HlyD (Fig. 3, lane 4versus lane 5 and lane 10 versus lane 11). Second, there wasprobable masking of the HlyB species by what are thought tobe proteolytic breakdown products of HlyA that are abun-dant in the minicell background (6).We employed a derivatiye of pANN202-312 to assign the

54-kd polypeptide to the hlyD region. pSF4000 and pANN202-312 shared a BglII site (bp 3,807; Fig. 2). pANN202-312 hadan additional BglII site -5 kb in the downstream direction,which lay ca. 600 bp-distal to the C terminus of HlyD (44). Adeletion between the two BglII sites was isolated in vitro,and this plasmid (pWAF185) was transformed into theDS410 minicell background. In lane 11 of Fig. 3 we see thatthis resulted in the loss of HlyA and the appearance of aprobable truncated form of HlyA. In addition, a 52- to 54-kdpolypeptide species was missing. This species is felt torepresent HlyD because a similar polypeptide, based oncomigration, was seen for pSF4000. The location of a Tnlinsertion near the hlyD region of pSF4000, which does notinterrupt hemolysin production, is indicated in Fig. 3B bythe arrow numbered 40. When this plasmid was examined inminicells, the 54-kd polypeptide was still evident (data notshown). Based on the DNA sequence analysis and theminicell results with the BglII deletion plasmid and theTn](40) insertion, it is likely the assignment of the 54-kdpolypeptide to the region shown on pSF4000 physical map iscorrect. The identification of the 54-kd HlyD polypeptideand its transcriptional order relative to the other cistrons isin disagreement with findings made in the laboratory ofGoebel (13, 18). Our results concerning HlyD have recentlybeen confirmed by N. Mackman and B. Holland (personalcommunication).DNA sequence features. The first general observation to be

made was the relatively low guanine-plus-cytosine content(40.2%) of this DNA sequence when compared with that forthe E. coli genome (50 to 52%) (25). This situation is similarto that seen for the E. coli heat-labile enterotoxin A and Bsubunits (46). The lower guanine-plus-cytosine content in

that instance was associated with a codon usage patterndistinct from that seen for a number of sequenced E. coligenes. Shown in Fig. 4 is a codon preference plot of the fourhemolysin genes based on the codon frequencies observedfor a large collection of sequenced genes from E. coli (10).From these DNA sequences, Gribskov et al. derived anoptimal codon usage pattern for E. coli. This permits thederivation of a codon preference statistic for each position ineach of the three reading frames. A sliding window alongeach reading frame of 25 codons permits a statistical mea-sure of the similarity in occurrence of that set of codons tothat predicted from the optimal codon usage table. The plotsfor the four hemolysin genes revealed that there was arelatively poor match in the hemolysin codon usage with theoptimal codon usage for E. coli genes. For both highly andmoderately expressed E. coli genes, the codon preferenceplot is often well above the random codon preferencestatistic (10). Also shown is the occurrence of codons thatoccur 5% or less of the time for E. coli in each of thesynonymous codon families (Fig. 4). In E. coli these rarecodons occur frequently in the nonencoding regions of DNAsequences. For the four hemolysin genes, the rare codonsoccurred as frequently in the ORFs as elsewhere in thesequence.

Presented in Fig. 5 are the results of a deletion analysis ofthe region upstream of the initial cistron, hlyC. The 132-bpregion between the PstI and BstEII sites was necessary forhemolysin synthesis. There was no sizable ORF extendingacross the BstEII site through to the putative translationalstart of hlyC. Although not shown, TnJ insertions to the leftof the PstI site did not have any detectable effect on thehemolysin phenotype, whereas those just to the right of thissite caused a reduction in the hemolysis zone size (45). Infact, cells harboring the BstEII deletion did show a slightzone of hemolysis surrounding colonies after 48 h of incuba-tion of blood agar plates. In the lower portion of Fig. 5 isshown the 437-bp region beginning at the Pstl site (bp 363)through to the putative translational start of HlyC (bp 6,796).Indicated within this DNA sequence are a number of fea-tures of possible consequence. There are numerous rela-tively small inverted repeats (3 to 7 bp), as well as severaldirectly repeated sequences clustered between the PstI andBstEII sites.A search within the DNA sequence from the PstI site (bp

363) to the beginning of hlyC for subsequences similar tothose described as consensus assignments for E. coli pro-moters showed that there are in fact numerous potentialtranscriptional initiation sites (14, 34). Identified in Fig. 5 arepossible promoter sequences based on this search. Thesubsequences shown are only those meeting the followingcriteria: seven or more matches with the 12 nucleotidesmaking up the -10 (T*A*TAAT*) and -35 (TT*G*ACA)hexanucleotides (with perfect matches always at thenucleotides marked with an asterisk), a -35 to -10 nucleo-tide spacing of 17 + 2 bp, and a purine residue 5 to 7 bpdownstream from the last T of the -10 hexanucleotide(representing the putative intitial nucleotide of the tran-script), Based on these relatively stringent conditions, wemade six different promoter-like sequence assignments up-stream of hlyC. Two of these (- 10, bp 407; -10, bp 468)reside within the PstI-to-BstEII region.The fact that all four ORFs were encoded sequentially on

one DNA strand suggests that all four genes could betranslated from one large transcript initiated upstream ofhlyC. However, some DNA sequence features and minicellresults suggest a more complicated operon configuration. In

VOL. 163, 1985

Page 7: Nucleotide sequence of an Escherichia coli chromosomal hemolysin

100 FELMLEE, PELLETT, AND WELCH

zLu

mC-)2

LiiLLLii

z0a0C.)

J. BACTERIOL.

0 2000 4000 6000 8000Base Pai rs

FIG. 4. Codon preference plot of the E. coli hemolysin. The plot is divided into three panels, with each representing a different readingframe. The horizontal dashed line shows the calculated codon preference statistic for the theoretical random sequence of the same basecomposition as the codon frequency table. The four ORFs (C, A, B, and D) are shown as labeled open boxes beneath the plotted codonpreference statistic. Other ORFs are marked by vertical lines not crossing the horizontal line (AUG codons), followed by an open box, andthe stop codons are shown as vertical lines crossing the horizontal line. The short vertical bars beneath the ORFs indicate the occurrence ofrare E. coli codons (5% or less within a synonomous family).

Fig. 6 we show a possible mRNA structure predicted fromthe DNA sequence just downstream of the apparent Cterminus of hlyA. This structure is very similar in architec-ture to rho-independent transcriptional termination regionsidentified for a number of E. coli genes (34). Often thesestructures are depicted as long-base-paired stems, followedby four to eight unpaired uridylates. We show in Fig. 6 a

21-bp stemn that could be formed if base pairing with theuridylates occurs. Because AU pairs do not contribute asmuch stability to a duplex as GC pairs (39), a shorter 16-bpstem followed by seven unpaired uridylates, still would havea favorable free energy of formation (23.6 kcal [ca. 98.7kJ]/mol). In conjunction with this observation, we also notethat the amount of the HIyB polypeptide relative to HlyCand HlyA polypeptides was lower based on the intensity ofthe autoradiograph signal in gels of minicells (Fig. 3, lane 4versus lane 5). The low signal intensity cannot be explainedon the basis of relative low methionine content because therewere more methionines predicted in HlyB than in HlyA (lane12 versus lane 5).An examination of the DNA sequence surrounding the

possible terminator region did reveal a close match to theconsensus sequences for E. coli promoters. This region isindicated in Fig. 6B. In the absence of induction of the lacpromoter, the orientation of the EcoRI fragment cloned in

pUC9 did not appear to influence the relative amount ofexpression of the B and B' polypeptides based on theintensity of the autoradiographic signal (data not shown).This is evidence suggesting that hlyB has its own promoter.Goebel and co-workers subcloned the 5-kbBgllI pANN202-

312 fragment (pANN205-222) and demonstrated that theorientation of the fragment did not affect the ability tocomplement successfully a TnS insertion mutation in themost distal transport function (42). This suggested that thedistal hemolysin transport-associated function is under thecontrol of its own promoter (42). We observed a moreabundant expression of HlyD relative to HlyB (Fig. 3, lanes4 and 5). Under these circumstances, HlyD expression wasclearly not polar to hlyB and confirmed the strong likelihoodof an HlyD-specific promoter. Three possible matches to theconsensus sequence for E. coli promoters based on thecriteria we have chosen exist for hlyD within the hlyB-en-coding sequence. The -10 regions for these sequencesreside at bp 4,589, 4,824 and 5,211 (see Fig. 2).

Predicted amino acid sequence features. Based on thepredicted amino acid content, HlyC and HlyB would bebasic proteins (isoelectric points of 9.5 and 10.2), whereasHlyA and HlyD would be slightly acidic (isoelectric points of6.1 and 6.3). There are more charged amino acids per unitlength for HlyD (1 per 3.3 amino acids) than for HlyC (1 per

Page 8: Nucleotide sequence of an Escherichia coli chromosomal hemolysin

E. COLI CHROMOSOMAL HEMOLYSIN

Sma l

AA

HLYCG

Pst I BstEI AHpal SmalI I HpaI I

4-I-. . 43

------301--.

Hemolysis

AIi --146--s

ALoV-"o

a ,- b-,- b

-~- P-CTGCAGASSTAGTGCGTAA

L--GCTAAAATTTSTTAA STTTUGA

PstII37a

--&

m= _L __ l

430 p2

TATTA6CATTACMGTGACCA6CTTTTATTC66CCCCTTCTTTC AACGTATTC* BstE II* I_

490 P

C6AAG 6t T6RCAGACACC7RTTR TCTATACGASATA550 P4

T6AATT6TTCTTG1TIWTCATRCATAACACA6dTTTCTcATAAC610 Pa pC

TTAATTCc CT-c-TCTCAATGT3CAGC6TCGTATGCATAT6TTTTTATTTCAAAT670

6AA6CAAG6TGCAGG6AEATAAAAATAACCATTTCTTTATGTGATCTTTTTATCAATGGAG730

ATA6TATCATmRXTAATGAACAAAACAARTCCATTAGAGGTTCTTGGGCATGTATCCTGG* MetAsnArgAsnAsnProLeuGluValLeuGlyHisValSerTrp798 HlyCFIG. 5. Deletion and DNA sequence analysis of the region upstream of HlyC. At the top is a restriction endonuclease fragment map of

the region upstream of HlyC. The four horizontal bars beneath the map indicate the rightward extent of in vitro-derived pSF4000 deletions.The numbered dashed lines indicate the number of bp from the rightward endpoint of the deletion to the initial HlyC ATG codon. The plusand minus designations beneath "Hemolysis" indicate whether these particular deletion derivatives still encoded a hemolytic phenotype onblood agar plates. The DNA sequence listed below the deletion map begins with the PstI site (Fig. 2; bp 363) and ends 45 bp after the beginningof the HlyC sequence. The first 15 predicted amino acids of HlyC are listed below the DNA sequence. The numbered asterisks refer to thebp number of this sequence as listed in Fig. 2. The lettered horizontal arrows indicate direct DNA sequence repeats, whereas the unletteredarrows pointing at one another indicate inverted repeat sequences. The boxes labeled pl, p2, etc., designate hexamers very similar insequence to the consensus sequence for the -35 region ofE. coli promoters. The unlabeled box 17 + 2 bp downstream from the -35 hexamersrepresents an additional hexamer very similar to the -10 consensus sequence for E. coli promoters.

3.8 amino acids), HlyA (1 per 3.8 amino acids), and HlyB (1per 4.3 amino acids). A curious finding involving HlyA wasthe regional feature of basic versus acidic amino acids in itsprimary sequence. Amino acids 1 to 230, 231 to 430, and 431to 1,023 had estimated isoelectric points of 10.2, 6.8, and 5.6,respectively. An interesting feature of the predicted amino

acid content was the absence of cysteine residues in HlyAand HlyD. Otherwise, we did not find an abundance or lackof any particular amino acid or class of amino acids.Shown in Fig. 6 are the successive predicted hydropathy

plots for the sequence of amino acids of each hemolysin genebased on the prediction of Kyte and Doolittle (20). Super-

101VOL. 163, 1985

Page 9: Nucleotide sequence of an Escherichia coli chromosomal hemolysin

102 FELMLEE, PELLETT, AND WELCH J. BACTERIOL.

A.

AG (kcal/mol)at 25C = 30

5' U A A U U

4397B.

(-35)

5' T T A C A

4374

6U UC

6-"U

"AUUCUaA

6au

Aa

uNaA

ua

6C

CNAC

A

6 6 A 6 U C A u A A u 6 3'

4464

UAa6AUU

CU

UuN

UAUUU--

(-10)- 17 BP - T A A T T T 3'

4317

FIG. 6. Predicted mRNA structure and promoter-like DNA sequence distal to the HlyA C terminus. (A) Predicted secondary structure inthe mRNA downstream of hlyA. The numbers beneath the sequence indicate the numbers of the corresponding DNA bp as presented in Fig.2. The 21-bp stem and its calculated free energy of formation are depicted. Shown is the longest predicted stem structure within this regionwhere it would incorporate five of the seven consecutive uridylates. The 3' end of the mRNA sequence shown represents the tentative HlyBinitiation codon. (B) Promoter-like DNA sequence immediately upstream of the stem-loop structure.

imposed on each hydropathy plot is the predicted amino acidsecondary structure based on the rules of Chou and Fasman(5). An examination of the predicted N-terminal portion ofeach protein revealed the absence of a pattern similar to thatassociated with the signal sequence for secreted proteins (27,31, 37, 41). Both positively and negatively charged residueswere present, and there was no core of strongly hydrophobicamino acids in the predicted sequences.The hydropathy plots are useful in identifying potential

transmembrane hydrophobic domains, as well as potentialantigenic sites (20). Large uncharged hydrophobic regions(.15 amino acids) were apparent in HlyC (residues 25 to 40),HlyA (residues 140 to 160, 245 to 260, 295 to 325, and 375 to405), HlyB (residues 155 to 180), and HlyD (residues 55 to80). Charged hydrophilic regions representing potential an-tigenic sites existed at the HlyC C terminus (positivelycharged), the N terminus of HlyA (positively charged), andthe entire C-terminal third of HlyA, HlyB (residues 315 to330 and the C terminus), and HlyD (residues 15 to 40, 190 to215, and 318 to 330).

DISCUSSIONWe present the DNA sequence for an E. coli hemolysin

which was molecularly cloned from the chromosome of an04 serotype, uropathogenic isolate of E. coli. One strand ofthe DNA possessed four successive ORFs which physicallycoincide with four reported hemolysin cistrons (29, 30, 42).We also present evidence for the existence of pSF4000-encoded polypeptides similar in size to each of the predictedproteins. The apparent size of the HlyC and HlyApolypeptides coincides with previous reports (9). However,the evidence we present for the molecular mass of the two

hemolysin transport genes does not coincide. It has beenreported the HlyB (or HlyBa) and HlyD (or HlyBb) proteinsare 46 and 64 kd in molecular mass (12). Albeit that our DNAsequence was derived from a hemolysin determinant ofdifferent origin than that studied in the laboratory of Goebel,our analysis of pANN202-312-encoded polypeptides sup-ports the pSF4000 DNA sequence data that two proteins of79.9 and 54 kd are encoded by the hlyB and hlyD regions,respectively.

It is curious that in minicells we detected both 77- and46-kd polypeptides for the EcoRI fragment covering the hlyBcistron. Neither protein can be the result of translationalfusions between hlyA or hlyD sequences and the lac regionbordering the EcoRI site of pUC9. There were no sizableORFs in any of the other five possible reading frames of theEcoRI fragment. Thus, we are left with two possible expla-nations. The 46-kd B' polypeptide (Fig. 3, lane 8) may be aproteolytic cleavage product of polypeptide B. Alterna-tively, there may be different translational starts within theORF present in this fragment. We noted that at bp 5,317there was an ATG codon preceded by a potential ribosome-binding site (GCGG). The predicted molecular mass of aprotein covering the ORF from bp 5,317 to 6,582 is 46 kd,and the sequence of the first 21 amino acids of that potentialproduct has a signal-like sequence (27, 31, 37, 41). At thistime, we do not know which alternative explanation is true.Also, we do not know whether the hemolysin transportfunction assigned to the HlyB cistron region is carried out byeither or both of the 77- and 46-kd proteins.Although we do not provide direct evidence for the

existence of different hemolysin-specific mRNA species, anumber ofDNA sequence features and minicell results allow

Page 10: Nucleotide sequence of an Escherichia coli chromosomal hemolysin

+ +- -+ - -* - + 4++ +-++ + -+ +-+ + ++4--. -+- -CHARGE

A - U A'V\A .UJJJJ. U A -2 STRUCTURE

50 100 150

4 -It, + t- 4-_- + t .+__ _ _+ + A+ -t+++ __++_- ++ _+- - 4

UJLAJA A~AP =J..1Itt O=r!.W ==r=9,01tLLF A AJ.±igagasal4AAA1

250 300

650 700

350 400

750 800_+ _+ * -_-+ + _

+ -4-+ + + _* +_+ ++___

.~~~~~.A^_N A y 0 B5Q~ °°°°°.1AA .=L A A~ii.-. L9..!.4 -4j iiA...v A .. A R. L*U UJU ItI I i2tlA =

i

900 950 1000

+ ++1R +++++ + + + ++_++ _-+

PtL~Aj<.~0... 0000

50* 4-+44+ - +4- 44+4- 4_+ + + + + -Nf '\'V'' a- -f ............................................................................................... ,,,,,,,, _, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

350 400

700

1 t _ _+ +_ _ __ _ +t+IIe

1 ~~~~~~~~~~AJt U oAs AL SOooooooM pot A

0 50 100 150 20044- 44- + - + + +4- 4-4-t- t1t _ 4-+- +- +4+---- 4- + +t -4_

,~~~~~< , 92-rLU'L

300 350 400 450

FIG. 7. Predicted hydropathy profile and secondary structure of the four hemolysin genes. The four sets of panels represent the combinedhydropathy and secondary structure predictions of the deduced amino acid sequences for HlyC, HlyA, HlyB, and HlyD. The numberedhorizontal axis under each panel represents the amino acid number. The left vertical axis indicates the relative hydrophobicity (positiveordinate) or hydrophilicity (negative ordinate). Plotted is the calculated hydropathy value for a window of nine amino acids as the framemoves consecutively one amino acid at a time toward the C terminus (20). Listed horizontally along the top of each panel are plus or minussigns indicating the occurrence of acidic and basic amino acids, respectively. Shown just above the bottom horizontal axis are the predictedsecondary structures calculated according to Chou and Fasman (5). The structures are alpha helix (Aaa), sheet (--), and turns (A). Theundesignated gaps are presumed to be random coils.

HIyC

HYDROPATHY 0-

-2-

HIyA 2

0-

-2

2

0.

-2.

0 50 100 150 200* _ _ _ --+ _+ + + +* 4+* _++

Ai.A A UJJ.....LJ.~~.... ....l*,..... i l

JLS----~ ~ ~ ~~~--I~~I p

2-

0-

-2-

2-

0-

-2-

_ ++- +_ _ + + _T_T . _: T + _ + + _ __ +. t + _ + _ J, _ _ _ +_-_ _~+ ... + + .+ _ _ _ . .

R.L _9 00t00 v qR-J AAAA M0A p6 A AARR FPAtlI -r----T- "qu *9~~~~~~

850

Hly B 2

0.

-2-

0

- 2 _ + + ++ X + + ++ _" _ _

-2 I&ML-.&

650

Hly D 2-

0-

-2-

2-

0-

-2-

250

E

T

ic

100 150 200

Page 11: Nucleotide sequence of an Escherichia coli chromosomal hemolysin

104 FELMLEE, PELLETT, AND WELCH

for some predictions about the transcriptional organizationof the hemolysin determinants. It is possible that a singlepolycistronic mRNA species may encode all four genes. Thisseems unlikely because we observed in minicells higheramounts of the HlyD protein relative to the HlyB species.The identification of a potential mRNA secondary structurebetween the hlyA and hlyB genes that is similar to structuresencoded in regions of known transcriptional terminationsuggests that occasional readthrough into the hlyB regionmay account for the lower expression of hlyB relative tohlyA. Alternatively, the hlyA transcript may efficiently endat this site, and a second hlyB-specific transcript may beginfrom a weaker promoter. The ability to detect the 77- and46-kd polypeptides in either orientation of the EcoRI frag-ment in subclones covering the hlyB region suggests that apromoter specific for hlyB may exist. The promoter-likesequence shown in Fig. 6B is presented only as a potentialcandidate.The HlyD protein is present in quantities greater than that

of the HlyB protein. This can best be accounted for by thepresence of a strong promoter within the hlyB-encodingsequence. There is insufficient space in the hlyB and hlyDintercistronic region for the needed transcription initiationsignals. The existence of a hlyD-specific promoter has beensupported by complementation data (42).The in vitro deletions of the area upstream of hlyC

indicated that a region more than 300 bp from the putativeHlyC start codon was required for hemolysin synthesis. Ouranalysis of the DNA sequence of this region indicated thatno ORFs existed in this area. The presence of numerousdirect and indirect repeated DNA sequences suggested aregulatory site. The significance of any of these sequences isuntested and awaits the isolation of additional mutants in thisregion. We present six different subsequences as possiblecandidates for hlyC promoters. Because we cannot identify aterminator-like sequence between the hlyC and hlyA genes,we assume that a single mRNA encodes HlyC and HlyA.Overall, the tentative model we propose for the transcrip-tional organization of the hemolysin determinant dictatesthat there are three transcriptional units (hlyC-hlyA, hlyB,and hlyD). Work in our laboratory is presently directed atthe identification of the in vivo transcriptional start sites foreach of the predicted transcripts.DNA-DNA hybridization experiments have indicated that

the E. coli hemolysin determinant is unique to a limitednumber of E. coli isolates (45). In an evolutionary sense, wesuggested that this was evidence that the E. coli hemolysinonly recently came to reside in E. coli. We provide furtherevidence for this conjecture based on the discrepancy inguanine-plus-cytosine content between the E. coli genomeand the hemolysin sequence. In addition, the codon usagepattern for theE. coli hemolysin is unlike that of other E. coligenes (10, 11, 16). In the hemolysin-encoding genes there isfrequent use of rare E. coli codons. Therefore, it seemslikely that the hemolysin came from an organism not closelyrelated to E. coli.

In an accompanying paper, we provide evidence that theE. coli hemolysin is secreted extracellularly (6). Analysis ofthe amino acid content of extracellular proteins from avariety of gram-positive and gram-negative genera led Pol-lock and Richmond (32) to note that there are few if anycysteine residues in this class of proteins. It is interestingthat the predicted absence of cysteine in the hemolysinstructural protein conforms to their observation.The N-terminal amino acid sequence is known only for the

hemolysin structural gene (6). The other hemolysin proteins

(HlyC, HIyB [B'], and HlyD) have not been sufficientlypurified to permit this analysis. There is little physicalevidence concerning their location in the cell, although it hasbeen suggested the HlyC protein is in the cytoplasm andHlyB and HlyD proteins are in the outer membrane (42). Thesignificance of the observation that signal-like sequencesappear to be absent for HlyB and HlyD is unknown andawaits their purification and localization.We note that based on the hydropathy plots, potential

membrane spanning domains (20) occur within each protein.We are using a variety of genetic and biochemical ap-proaches to localize each protein in the cell. In turn, weintend to delineate a structural and functional role for eachhemolysin gene in the apparent hemolysin secretory path-way. In addition, we predict that there may be a complex setof interactions between the hemolysin protein and differenteucaryotic cells. We are examining the possible existence ofdiscrete domains within the hemolysin protein that may beinvolved in this process.

ACKNOWLEDGMENTSWe thank Mike Gribskov and Dick Burgess for help in use of the

protein sequence programs. In addition, we thank Bernie Weisblumfor his visits to our laboratory and Caroline Fritsch and ChristineCrawford for their help in the preparation of this manuscript.

This work was supported by Public Health Service grant AI-20323from the National Institutes of Health. Additional support camefrom the University of Wisconsin Graduate School, the Universityof Wisconsin Medical School, and the Shaw Fund. T.F., apredoctoral student in the Molecular Biology Program, was sup-ported by funds from the University of Wisconsin Graduate School.

LITERATURE CITED1. Berger, H., J. Hacker, A. Juarez, C. Hughes, and W. Goebel.

1982. Cloning of the chromosomal determinants encoding hemo-lysin production and mannose-resistant hemagglutination inEscherichia coli. J. Bacteriol. 152:1241-1247.

2. Cavalieri, S. J., and I. S. Snyder. 1982. Effect of Escherichia colialpha-hemolysin on human peripheral leukocyte viability invitro. Infect. Immun. 36:455-461.

3. Cavalieri, S. J., and I. S. Snyder. 1982. Effect of Escherichia colialpha-hemolysin on human peripheral leukocyte function invitro. Infect. Immun. 37:966-974.

4. Cavalieri, S. J., and I. S. Snyder. 1982. Cytotoxic activity ofpartially purified Escherichia coli alpha hemolysin. J. Med.Microbiol. 15:11-21.

5. Chou, P. Y., and G. D. Fasman. 1974. Prediction of proteinconformation. Biochemistry 13:222-245.

6. Felmlee, T., S. Pellett, E.-Y. Lee, and R. A. Welch. 1985.Escherichia coli hemolysin is released extracellularly withoutcleavage of a signal peptide. J. Bacteriol. 163:88-93.

7. Gadeberg, O., and I. Orskov. 1984. In vitro cytotoxic effect ofa-hemolytic Escherichia coli on human blood granulocytes.Infect. Immun. 45:255-260.

8. Gill, R. E., F. Heffron, and S. Falkow. 1979. Identification of theprotein encoded by the transposable element Tn3 which isrequired for its transposition. Nature (London) 282:797-801.

9. Goebel, W., and J. Hedgpeth. 1982. Cloning and functionalcharacterization of the plasmid-encoded hemolysin determinantof Escherichia coli. J. Bacteriol. 151:1290-1298.

10. Gribskov, M., J. Devereux, and R. R. Burgess. 1984. The codonpreference plot: graphic analysis of protein coding sequencesand prediction of gene expression. Nucleic Acids Res. 12:539-549.

11. Grosjean, H., and W. Fiers. 1982. Preferential codon usage inprocaryotic genes: the optimal codon-anticodon interaction en-ergy and the selective codon usage in efficiently expressedgenes. Gene 18:199-209.

12. Hacker, J., C. Hughes, H. Hof, and W. Goebel. 1983. Clonedhemolysin genes from Escherichia coli that cause urinary tract

J. BACTERIOL.

Page 12: Nucleotide sequence of an Escherichia coli chromosomal hemolysin

V3E. COLI CHROMOSOMAL HEMOLYSIN 105

infection determine different levels of toxicity in mice. Infect.Immun. 42:57-63.

13. Hartlein, M., S. Schiessl, W. Wagner, U. Rdest, J. Kreft, and W.Goebel. 1983. Transport of hemolysin by Escherichia coli. J.Cell. l3iochem. 22:87-97.

14. Hawley, D. K., and W. R. McClure. 1983. Compilation andanalysis of Escherichia coli promoter DNA sequences. NucleicAcids Res. 11:2237-2255.

15. Hull, S. I., R. A. Hull, B. H. Minshew, and S. Falkow. 1982.Genetics of hemolysin of Escherichia coli. J. Bacteriol.151:1006-1012.

16. Ikemura, T. 1981. Correlation between the abundance of Esch-erichia coli transfer RNAs and the occurrence of the respectivecodons in its protein genes: a proposal for a synonymous codonchoice that is optimal for the Escherichia coli translationalsystem. J. Mol. Biol. 151:389-409.

17. Jaurez, A., and W. Goebel. 1984. Chromosomal mutation thataffects excretion of hemolysin in Escherichia coli. J. Bacteriol.159:1083-1085.

18. Jaurez, A., C. 1%ughes, M. Vogel, and W. Goebel. 1984. Expres-sion and regulation of the plasmid-encoded hemolysin determi-nant of Escherichia coli. Mol. Gen. Genet. 197:196-203.

19. Knapp, S., J. Hacker, I. Then, D. Muller, and W. Goebel. 1984.Multiple copies of hemolysin genes and associated sequences inthe chromosomes of uropathogenic Escherichia coli strains. J.Bacteriol. 159:1027-1033.

20. Kyte, J., and R. F. Doolittle. 1982. A simple method fordisplaying the hydropathic character of a protein. J. Mol. Biol.157:105-132.

21. Laemmli, U. K. 1970. Cleavage of structural proteins during theassembly of the head of bacteriophage T4. Nature (London)227:680-685.

22. Levy, S. B. 1974. R factor proteins synthesized in Escherichiacoli minicells: incorporation studies with different R factors anddetection of deoxyribonucleic acid-binding proteins. J. Bacte-riol. 120:1451-1463.

23. Mackman, N., and B. Holland. 1984. Secretion of 107-k daltonpolypeptide into medium from a haemolytic Escherichia coliK12 strain. Mol. Gen. Genet. 193:312-315.

24. Maniatis, T., E. F. Fritsch, and J. Sambrook. 1982. Molecularcloning. Cold Spring Harbor Laboratory, Cold Spring Harbor,N.Y.

25. Marmur, J., and P. Doty. 1962. Determination of the basecomposition of deoxyribonucleic acid from its thermal denatura-tion point. J. Mol. Biol. 3:585-594.

26. Messing, J., R. Crea, and P. H. Seeburg. 1981. A system forshotgun DNA sequencing. Nucleic Acid Res. 9:309-321.

27. Michaelis, S., and J. Beckwith. 1982. Mechanism of incorpora-tion of cell envelope proteins in Escherichia coli. Annu. Rev.Microbiol. 36:435-465.

28. Miller, J. H. 1972. Experiments in molecular genetics. ColdSpring Harbor Laboratory, Cold Spring Harbor, N.Y.

29. Noegel, A., U. Rdest, and W. Goebel. 1981. Determination of thefunctions of hemolytic plasmid pHlylS2 of Escherichia coli. 1.

Bacteriol. 145:233-247.30. Noegel, A., U. Rdest, W. Springer, and W. Goebel. 1979.

Plasmid cistrons controlling synthesis and secretion of theexotoxin a-haemolysin of Escherichia coli. Mol. Gen. Genet.175:343-350.

31. Perlman, D., and H. 0. Halvorson. 1983. A putative signalpeptidase recognition site and sequence in eucaryotic andprocaryotic signal peptides. J. Mol. Biol. 167:391-409.

32. Pollock, M. R., and M. H. Richmond. 1962. Low cyst(e)inecontent of bacterial extracellular proteins: its possible physio-logical significance. Nature (London) 194:446-449.

33. Pustell, J., and F. C. Kafatos. 1984. A convenient and adaptablepackage of computer programs for DNA and protein sequencemanagement, analysis and hotnology determination. NucleicAcids Res. 12:643-655.

34. Rosenberg, M., and D. Court. 1979. Regulatory sequencesinvolved in the promotion and termination of RNA transcrip-tion. Annu. Rev. Genet. 13:319-353.

35. Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA sequenc-ing with chain-terminating inhibitors. Proc. Natl. Acad. Sci.U.S.A. 74:5463-5467.

36. Shine, J., and L. Dalgarno. 1974. The 3'-terminal sequence ofEscherichia coli 16S ribosomal RNA: complementarity to non-sense triplets and ribosome binding sites. Proc. Natl. Acad. Sci.U.S.A. 71:1342-1346.

37. Silhavy, T., S. Benson, and S. Emr. 1983. Mechanisms of proteinlocalization. Microbiol. Rev. 47:313-344.

38. Stark, J. M., and C. W. Shuster. 1982. Analysis of hemolyticdeterminants of plasmid pHlyl85 by Tn5 mutagenesis. J. Bac-teriol. 152:963-967.

39. Tinoco, I., P. Borer, B. Dengler, M. Levine, 0. Uhlenbeck, D.Crothers, and J. Gralla. 1973. Improved estimation of second-ary structure in ribonucleic acids. Nature (London) New Biol.246:40-41.

40. Viera, J., and J. Messing. 1982. The pUC plasmids, an M13mp7-derived system for insertion mutagenesis and sequencingwith synthetic universal primers. Gene 19:259-268.

41. Von Heijne, G. 1983. Patterns of amino acids near signal-sequence cleavage sites. Eur. J. Biochem. 133:17-21.

42. Wagner, W., M. Vogel, and W. Goebel. 1983. Transport ofhemolysin across the outer membrane of Escherichia coli re-quires two functions. J. Bacteriol. 154:200-210.

43. Welch, R. A., E. P. Dellinger, B. Minshew, and S. Falkow. 1981.Haemolysin contributes to virulence of extraintestinal Esch-erichia coli infections. Nature (London) 294:665-667.

44. Welch, R. A., and S. Falkow. 1984. Characterization of Esch-erichia coli hemolysins conferring quantitative differences invirulence. Infect. Immun. 43:156-160.

45. Welch, R. A., R. Hull, and S. Falkow. 1983. Molecular cloningand physical characterization of a chromosomal hemolysin fromEscherichia coli. Infect. Immun. 42:178-186.

46. Yamamoto, T., T. Tamura, and T. Yokota. 1983. Primarystructure of heat-labile enterotoxin produced by Escherichiacoli pathogenic for humans. J. Biol. Chem. 259:5037-5044.

VOL. 163, 1985