

Eur. J. Biochem. 135, 223-229 (1983) © FEBS 1983

Nucleotide sequence of the Escherichia coli pyrE gene and of the DNA in front of the protein-coding region

Peter POULSEN, Kaj Frank JENSEN, Poul VALENTIN-HANSEN, Peter CARLSSON, and Lennart G. LUNDBERG. University Institute of Biological Chemistry B, Copenhagen; Institute of Molecular Biology, Odense; and Department of Biochemistry and Biophysics, Chalmers Institute of Technology, Göteborg

(Received April 20, 1983) - EJB 83 0403

Orotate phosphoribosyltransferase (EC 2.4.2.10) was purified to electrophoretic homogeneity from a strain of Escherichia coli containing the pyrE gene cloned on a multicopy plasmid. The relative molecular masses (Mr) of the native enzyme and its subunit were estimated by means of gel filtration and electrophoresis in the presence of dodecyl sulfate. The amino acid sequences at the N and C termini, as well as the amino acid composition, were determined. The nucleotide sequence of the structural pyrE gene, including 394 nucleotide residues preceding the beginning of the coding frame, was also established. From the results the following conclusions may be drawn.

Orotate phosphoribosyltransferase is a dimeric protein with subunits of Mr 23 326 consisting of 211 amino acid residues. The pyrE gene is transcribed in a counterclockwise direction on the E. coli chromosome as an mRNA with a considerable leader segment in front of the protein-coding region. This leader contains a structure with features characteristic of a (translated?) rho-independent transcriptional terminator, which is preceded by a cluster of uridylate residues. This indicates that the frequency of pyrE transcription is regulated by RNA polymerase (UTP)-modulated attenuation.

Orotate phosphoribosyltransferase (EC 2.4.2.10) catalyses one of the reactions in the de novo synthesis of UMP (Fig. 1). The pyr genes are scattered on the chromosome of enteric bacteria, and the structural gene for orotate phosphoribosyltransferase, pyrE, is located at 81 min on the Escherichia coli linkage map, between dut and spoT [1].

Both in E. coli and in Salmonella typhimurium the expression of pyrB, pyrE, and pyrF is repressed by a uridine nucleotide, either UTP or UDP [2-4], while it is elevated by partial guanine starvation [5, 6]. True regulatory mutants defective in the control of pyr gene activity have not been identified. However, it has been found that some RNA polymerase mutants (rpoBC) display high, constitutive pyrB and pyrE expression, while at the same time they contain high intracellular concentrations of uridine nucleotides [7]. All other identified mutants with enhanced expression of the pyr genes have been shown to harbor defects in nucleotide interconversion pathways (pyrH, guaB) [5, 8]. These observations indicate that RNA polymerase is involved in sensing the regulatory signals in pyr gene regulation.

To characterize the biochemistry behind the control of pyrimidine biosynthesis further, we have initiated a study of the individual pyr genes. This paper describes the nucleotide sequence of the E. coli pyrE gene and of its regulatory region. This gene had previously been cloned onto various multicopy plasmids [9-11], and from a strain containing one of these [10] we have purified the gene product, orotate phosphoribosyltransferase, and characterized its structure in some detail. The same plasmid served as a source of DNA restriction fragments for DNA sequencing. The results indicate that the pyrE gene is transcribed with a considerable leader sequence, which contains a (translated?) rho-independent transcriptional terminator preceded by a uridylate cluster. This suggests that the frequency of pyrE transcription is controlled by RNA-polymerase-modulated attenuation.

Abbreviation. PRib-PP, 5-phospho-α-D-ribosyl 1-diphosphate.
Enzyme. Orotate phosphoribosyltransferase or orotidine-5'-phosphate:pyrophosphate phosphoribosyltransferase (EC 2.4.2.10).

[Fig. 1: pathway diagram not recoverable from the scan. Legible labels: ATP + HCO3⁻ + glutamine, carbamoyl phosphate, citrulline, arginine, aspartate, dihydroorotate, CMP, CDP, CTP, UDP, UTP, cytosine, uracil; gene labels udp, pyrB, argI.]

Fig. 1. Pathways for the biosynthesis of pyrimidine nucleoside triphosphates in E. coli. The enzymes are identified by their corresponding gene designations as follows: argI, ornithine transcarbamylase (EC 2.1.3.3); cdd, cytidine deaminase (EC 3.5.4.5); cod, cytosine deaminase (EC 3.5.4.1); carAB, carbamoylphosphate synthase (EC 2.7.2.5); pyrB, aspartate transcarbamylase (EC 2.1.3.2); pyrC, dihydroorotase (EC 3.5.2.3); pyrD, dihydroorotate oxidase (EC 1.3.3.1); pyrE, orotate phosphoribosyltransferase (EC 2.4.2.10); pyrF, OMP decarboxylase (EC 4.1.1.23); pyrG, CTP synthetase (EC 6.3.4.2); pyrH, UMP kinase (EC 2.7.4.14); udk, uridine kinase (EC 2.7.1.48); udp, uridine phosphorylase (EC 2.4.2.3); upp, uracil phosphoribosyltransferase (EC 2.4.2.9).


MATERIALS AND METHODS

Materials

Bovine serum albumin, chicken ovalbumin, α-chymotrypsinogen, cytochrome c, lysozyme and PRib-PP were obtained from Sigma Chemical Company. Yeast alcohol dehydrogenase and catalase were from Boehringer. E. coli thymidine phosphorylase was prepared according to Schwartz [13]. Yeast carboxypeptidase was a gift from R. Hayashi. Restriction endonucleases were purchased from commercial suppliers (Bethesda Research Laboratories, New England Biolabs, or Boehringer). T4 DNA ligase, bacterial alkaline phosphatase, and T4 polynucleotide kinase were from Bethesda Research Laboratories, while 32P-labelled nucleoside triphosphates were from ICN Corp. or Amersham International, England.

Enzyme assays

Orotate phosphoribosyltransferase activity was determined as previously described [2]: 1 unit converts 1 µmol orotate to OMP/min under the following conditions. The assays contained, in a total volume of 500 µl: 0.10 M Tris/HCl pH 8.8, 6 mM MgCl₂, 0.25 mM orotate, 0.01-0.03 unit enzyme, and 1.0 mM PRib-PP. The reaction was carried out at 37 °C and initiated by adding PRib-PP. The conversion of 1 µmol orotate to OMP results in an absorbance decrease at 295 nm equal to 3.67 (cm light path)⁻¹. Protein concentrations were determined by the procedure of Lowry et al. [14] using bovine serum albumin as standard. Alcohol dehydrogenase and thymidine phosphorylase activities were determined by published procedures [12, 13].
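The unit definition and the A295 conversion factor above translate directly into arithmetic. The Python sketch below is illustrative only: it takes the stated factor (3.67 per µmol orotate converted, per cm light path) at face value for the assay as run, and the function names and the example reading are hypothetical.

```python
# Conversion factor taken from the text: converting 1 µmol orotate to OMP
# lowers A295 by 3.67 per cm light path. Assumption: it applies to the
# assay exactly as run (0.5 ml reaction).
DELTA_A295_PER_UMOL_PER_CM = 3.67


def units_from_rate(delta_a295_per_min: float, path_cm: float = 1.0) -> float:
    """Convert an A295 decrease per minute into enzyme units (µmol orotate/min)."""
    if path_cm <= 0:
        raise ValueError("light path must be positive")
    return delta_a295_per_min / (DELTA_A295_PER_UMOL_PER_CM * path_cm)


def specific_activity(units: float, protein_mg: float) -> float:
    """Units per mg protein (protein measured by the Lowry method in the text)."""
    return units / protein_mg


# Hypothetical reading: an A295 decrease of 0.0734 per min in a 1-cm cuvette.
u = units_from_rate(0.0734)
print(f"{u:.3f} units")  # 0.020 units, inside the 0.01-0.03 unit assay range
```

Keeping the factor as a named constant makes it easy to rescale if the factor turns out to refer to a different reaction volume.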

Polyacrylamide gel electrophoresis

To check the purity after each step of the enzyme purification, polyacrylamide gel electrophoresis of native protein was performed with 8% polyacrylamide at pH 9.3 as described by Davis [15], while electrophoresis of denatured proteins in the presence of sodium dodecyl sulfate was performed by the procedure of Weber and Osborn [16].

Relative molecular mass determinations

Gel filtration. The relative molecular mass of the native orotate phosphoribosyltransferase was estimated by gel filtration [17] on a column (6.5 cm² by 83 cm) of Sephadex G-100 in 25 mM Tris/HCl pH 7.6, 0.5 mM EDTA, 2 mM 2-mercaptoethanol, and 50 mM sodium chloride. The flow rate was 18 ml/h and 2.7-ml fractions were collected. The markers used were yeast alcohol dehydrogenase (Mr = 150 000), E. coli thymidine phosphorylase (Mr = 95 000), bovine serum albumin (Mr = 68 000), chicken ovalbumin (Mr = 43 000-46 000), α-chymotrypsinogen (Mr = 25 700), and cytochrome c (Mr = 12 400), applied three at a time in two runs together with orotate phosphoribosyltransferase.

SDS electrophoresis. This was carried out on denatured proteins in 11-15% polyacrylamide gradient slab gels essentially as described [18] (system II) except that a 5-17.3% sucrose gradient was incorporated in the gel. The following markers were used: bovine serum albumin (Mr = 68 000), catalase (Mr = 60 000), ovalbumin (Mr = 43 000), yeast alcohol dehydrogenase (Mr = 37 000), α-chymotrypsinogen (Mr = 25 700), and lysozyme (Mr = 14 300).
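Both Mr estimates rest on a semi-logarithmic standard curve: log Mr of the markers is taken as linear in elution volume (gel filtration) or in relative mobility (SDS gel), and the unknown is read off the fitted line. A minimal least-squares sketch in Python follows; the marker Mr values come from the text, but the marker positions are synthetic, generated from an assumed straight line, and are not measurements from this study.

```python
import math


def fit_semilog(markers):
    """Least-squares fit of log10(Mr) = a + b*x.

    `markers` is a list of (x, Mr) pairs, where x is the elution volume
    (gel filtration) or the relative mobility (SDS gel) of each marker.
    """
    xs = [x for x, _ in markers]
    ys = [math.log10(mr) for _, mr in markers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_mr(intercept, slope, x):
    """Read an unknown's Mr off the fitted standard curve."""
    return 10 ** (intercept + slope * x)


# SDS-gel marker Mr values from the text. Their positions are NOT measured
# data: they are generated from an assumed line log10(Mr) = 5.5 - 1.2*x
# purely to exercise the fit.
SDS_MARKERS_MR = [68000, 60000, 43000, 37000, 25700, 14300]
markers = [((5.5 - math.log10(mr)) / 1.2, mr) for mr in SDS_MARKERS_MR]
a, b = fit_semilog(markers)

x_unknown = (5.5 - math.log10(23000)) / 1.2   # hypothetical band position
print(round(estimate_mr(a, b, x_unknown)))    # 23000
```

With real data the points scatter around the line, so the estimate is only as good as the linear range of the column or gel.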

Amino acid analysis

Duplicate samples of orotate phosphoribosyltransferase were hydrolysed in 6 M hydrochloric acid containing 0.05% phenol for 24 h, 48 h, or 72 h at 110 °C in evacuated tubes. The amino acid content of the hydrolysates was analysed on a Durrum D-500 amino acid analyser. The values for serine and threonine were estimated by extrapolation to zero-time hydrolysis, whereas the values after a 72-h hydrolysis were used as a measure of the content of valine and isoleucine. The cysteine content was determined after a 24-h hydrolysis of performic-acid-oxidised protein [19], and the tryptophan content after a 24-h hydrolysis in 4 M methanesulfonic acid containing 0.2% 3-(2-aminoethyl)indole [20]. Horse heart cytochrome c, which contains one tryptophan residue, was hydrolysed in parallel as a control.
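The text does not state how the serine and threonine values were extrapolated to zero-time hydrolysis. One common approach, assumed here, treats their destruction during hydrolysis as first order and extrapolates a log-linear fit back to t = 0. The sketch below uses synthetic input values, not data from the paper.

```python
import math


def extrapolate_to_zero_time(times_h, amounts):
    """Estimate the amount at 0 h from values measured after longer hydrolyses.

    Assumes first-order destruction during hydrolysis, i.e.
    ln(amount) = ln(amount_0) - k * t, fitted by least squares.
    """
    ys = [math.log(v) for v in amounts]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    stt = sum((t - mean_t) ** 2 for t in times_h)
    sty = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    slope = sty / stt                  # equals -k
    return math.exp(mean_y - slope * mean_t)


# Hypothetical serine values (residues per subunit) after 24, 48 and 72 h,
# generated from 10.0 * exp(-0.005 t); not data from the paper.
times = [24, 48, 72]
serine = [10.0 * math.exp(-0.005 * t) for t in times]
print(round(extrapolate_to_zero_time(times, serine), 3))  # 10.0
```

A straight linear fit of amount against time is a simpler alternative; with only three time points the two give similar intercepts when losses are small.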

The N-terminal amino acid sequence of orotate phosphoribosyltransferase was determined by manual Edman degradation as described by Klemm et al. [21]. The phenylthiohydantoin derivatives were hydrolysed for 20 h in HI at 127 °C, after which the amino acids were determined on the Durrum D-500 amino acid analyser.

The C-terminal amino acid sequence was established by digestion with carboxypeptidase Y [22, 23] at 20 °C in 0.1 M pyridine acetate pH 7.6 containing 0.1% sodium dodecyl sulfate and norleucine as internal standard. The enzyme/substrate ratio was 1:100 (mol/mol). The reaction was terminated by pipetting 20-µl aliquots into 10 µl 100% acetic acid. The amino acids released were determined as described above.

[Fig. 2: restriction maps of plasmids pKK6 and pPP1; legible site labels include PvuII and ClaI.]

Fig. 2. Plasmids employed for purification of orotate phosphoribosyltransferase and for sequencing the pyrE gene. pKK6 is described elsewhere [9-11]. pPP1 was constructed by inserting the 2 × 10³-base-pair PvuII-PvuII fragment of pKK6, containing the pyrE gene, into the PvuII site of pBR322. The heavy, the light, and the streaky lines represent respectively E. coli DNA, pBR322 DNA, and λ DNA. kb = 10³ base pairs.


Purification of orotate phosphoribosyltransferase from E. coli

The E. coli K12 strain SØ1573 (araD139, Δ(lac)U169, rpsL, thi, ΔpyrE) containing the pyrE plasmid pKK6 (Fig. 2) was grown in a glucose medium [24] supplemented with casamino acids and harvested in the late exponential phase. Frozen cells (222 g wet weight) were suspended in 400 ml of 0.10 M Tris/HCl pH 7.6, 2 mM EDTA and disrupted with a Branson sonifier. The extract was centrifuged at 12 000 × g and the supernatant was collected (fraction A). 2-Mercaptoethanol was added to a final concentration of 2 mM, followed by 0.1 vol. of a 20% solution of streptomycin sulfate. After gentle stirring for an hour, the precipitated material was removed by centrifugation (30 min at 12 000 × g). The supernatant was dialysed against 0.10 M Tris/HCl pH 7.6, 2 mM EDTA, 2 mM 2-mercaptoethanol (fraction B) and incubated for 10 min in a 62 °C water bath with stirring. The heated solution was cleared by centrifugation (15 min at 12 000 × g) and dialysed against 25 mM Tris/HCl pH 7.6 containing 0.5 mM EDTA and 2 mM 2-mercaptoethanol (hereafter called buffer A) (fraction C).

This fraction was pumped onto a column (5.2 cm² by 38 cm) of DEAE-cellulose (Whatman DE-52) previously equilibrated with buffer A. After the column had been washed with 1 vol. buffer A containing 0.075 M sodium chloride, the enzyme was eluted with a linear gradient (total volume 1.0 l) up to 0.30 M sodium chloride in buffer A. The flow rate was 40 ml/h and 14-ml fractions were collected. The fractions containing orotate phosphoribosyltransferase activity (peak at about 0.17 M sodium chloride) were pooled and concentrated by ammonium sulfate fractionation. The solution was first brought to 40% saturation by adding solid ammonium sulfate and then stirred for 30 min. After removal of the precipitate by centrifugation (30 min at 12 000 × g), the ammonium sulfate concentration was increased to 75% saturation by adding the solid salt. After stirring for another 30 min, the precipitated material, containing orotate phosphoribosyltransferase, was collected by centrifugation (30 min at 12 000 × g) and dissolved in buffer A (fraction D).

This fraction was pumped through a column (5.3 cm² by 83 cm) packed with Sephadex G-100. The flow rate was 50 ml/h and 12-ml fractions were collected. The fractions containing the enzyme were pooled (fraction E) and pumped (32 ml/h) onto a column (2.0 cm² by 35 cm) of Matrex Red A (Amicon), which was equilibrated with buffer A containing 0.25 M potassium chloride. The column was eluted with a linear gradient (total volume 1.0 l) up to 0.90 M potassium chloride in buffer A, and 12-ml fractions were collected. The peak (around 0.34 M potassium chloride) was pooled, dialysed against buffer A, and concentrated by application to a column (0.6 cm² by 11 cm) of DEAE-cellulose (DE-52) in buffer A. The enzyme was eluted with buffer A containing 0.30 M sodium chloride (fraction F).

Since the preparation still contained some contaminating protein bands (Fig. 3), fraction F was applied (42 ml/h) to a column (0.5 cm² by 24 cm) of hydroxyapatite, which was equilibrated with 10 mM potassium phosphate pH 6.8 containing 0.5 mM EDTA and 2 mM 2-mercaptoethanol. After the column had been washed with 50 ml of the same buffer, orotate phosphoribosyltransferase was eluted with a 700-ml linear gradient up to 100 mM potassium phosphate with 0.5 mM EDTA and 2 mM 2-mercaptoethanol, while 10-ml fractions were collected. The peak of enzyme activity, appearing around 25 mM phosphate, was concentrated on a column of DEAE-cellulose as described above and dialysed against buffer A containing 55% glycerol (fraction G). Stored at -20 °C for more than a year, this fraction has not lost any activity. Table 1 gives the data of the purification, and Fig. 3 shows that the product appears to be homogeneous when analysed by electrophoresis in polyacrylamide gels. A purification procedure for orotate phosphoribosyltransferase using OMP-Sepharose chromatography was previously described [25], but the specific activity of that product appeared to be considerably lower than that of the product obtained by the procedure described here.

Fig. 3. Coomassie-stained polyacrylamide gels of the purified orotate phosphoribosyltransferase. (A) Non-denaturing gel electrophoresis of fraction F (13 µg protein applied). (B) Non-denaturing gel electrophoresis of fraction G (21 µg protein applied). (C) Sodium dodecyl sulfate gel electrophoresis of fraction G (21 µg protein applied).

Table 1. Purification of E. coli orotate phosphoribosyltransferase
Protein was estimated as described by Lowry [14]. The specific activity of fraction G is 324 units/mg dry weight protein calculated from amino acid determinations; = 5.9

Fraction and treatment               Volume  Total activity  Total protein  Specific activity  Purification  Yield
                                     (ml)    (units)         (mg)           (units/mg)         (fold)        (%)
A. Crude extract                     590     9 310           14 160         0.66               1.0           100
B. Streptomycin sulfate supernatant  670     11 980          12 060         0.99               1.5           129
C. 62 °C                             615     9 217           6 765          1.36               2.1           99
D. DE-52 and (NH₄)₂SO₄ 40-75%        12      8 604           396            21.7               33            92
E. G-100 filtration                  60      10 200          102            100                152           110
F. Red A                             7       5 900           19.2           307                466           63
G. Hydroxyapatite                    12      2 940           7.4            397                601           32
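The derived columns of Table 1 follow from the totals: specific activity is total activity divided by total protein, purification is specific activity relative to the crude extract, and yield is total activity relative to the crude extract. The Python sketch below recomputes them from the tabulated totals; small differences from the printed fold values can arise from how the crude-extract specific activity was rounded.

```python
# (fraction, total activity in units, total protein in mg), from Table 1
STEPS = [
    ("A", 9310, 14160),
    ("B", 11980, 12060),
    ("C", 9217, 6765),
    ("D", 8604, 396),
    ("E", 10200, 102),
    ("F", 5900, 19.2),
    ("G", 2940, 7.4),
]


def purification_summary(steps):
    """Return (fraction, specific activity, fold purification, % yield) rows,
    all relative to the first (crude extract) step."""
    _, act0, prot0 = steps[0]
    sa0 = act0 / prot0
    rows = []
    for name, act, prot in steps:
        sa = act / prot
        rows.append((name, sa, sa / sa0, 100.0 * act / act0))
    return rows


for name, sa, fold, yld in purification_summary(STEPS):
    print(f"{name}: {sa:8.2f} units/mg  {fold:6.1f}-fold  {yld:4.0f}% yield")
```

Yields above 100% (fractions B and E) simply mean more activity was measured than in the crude extract, for example after removal of inhibitors; the arithmetic does not interpret them.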


DNA sequencing techniques

Plasmid pPP1 (Fig. 2) was cut with appropriate restriction endonucleases, dephosphorylated with bacterial alkaline phosphatase, and labeled at the 5' ends using T4 polynucleotide kinase. After subcutting with another restriction enzyme, the desired labelled fragments were isolated by preparative gel electrophoresis and electroeluted into dialysis bags. The DNA sequence determinations on these fragments were carried out by the method of Maxam and Gilbert using cleavages specific for dG, dA/dG, dC/dT, and dC [26, 27]. The methods were modified as follows. Only 0.5 µl dimethylsulfate was added to the dG reactions. For fragments up to 300 base pairs the reaction conditions were 25 min at 37 °C for the dA/dG reaction, 5 min at 20 °C for the dG reaction, and 10 min at 20 °C for the dC/dT and the dC reactions. For fragments larger than 300 base pairs the reaction conditions were 15-20 min at 37 °C for the dA/dG reaction, 5 min at 10 °C for the dG reaction, and 10 min at 10 °C for the dC/dT and the dC reactions.
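Reading a Maxam-Gilbert gel combines the four cleavage lanes: a band in the G lane is a G, a band only in the A+G lane is an A, a band in the C lane is a C, and a band only in the C+T lane is a T. The Python sketch below encodes that standard logic for an idealised, hypothetical ladder; real gels also require judging band intensities, which is not modelled here.

```python
def call_bases(positions):
    """Call bases from a Maxam-Gilbert ladder read with the standard lanes.

    `positions` lists, for each band position from the bottom of the gel
    (shortest fragment, nearest the labelled end) upwards, the set of lanes
    that show a band there: any of "G", "A+G", "C+T", "C".
    """
    calls = []
    for lanes in positions:
        if "G" in lanes:
            calls.append("G")          # G lane (also seen in A+G)
        elif "A+G" in lanes:
            calls.append("A")          # purine band absent from the G lane
        elif "C" in lanes:
            calls.append("C")          # C lane (also seen in C+T)
        elif "C+T" in lanes:
            calls.append("T")          # pyrimidine band absent from the C lane
        else:
            calls.append("N")          # no interpretable band
    return "".join(calls)


# Hypothetical ladder, not read from the paper's gels.
ladder = [{"G", "A+G"}, {"A+G"}, {"C+T", "C"}, {"C+T"}, {"A+G"}, set()]
print(call_bases(ladder))  # GACTAN
```

Sequencing both strands from independently labelled fragments, as described here, is what guards against miscalls that this simple rule cannot detect.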

From plasmid pKK6 (Fig. 2) the 2.0 × 10³-base-pair PvuII-PvuII fragment was isolated by preparative agarose gel electrophoresis followed by electroelution [28]. Subsequently this fragment was cut with various restriction endonucleases. The resulting fragments were labeled at their 5' or 3' ends as described elsewhere [29], and the labeled fragments were isolated by electrophoresis in 5% polyacrylamide gels. Single-end-labelled fragments were isolated either after subcutting with suitable restriction enzymes followed by electrophoresis as above, or by separating the labeled strands by electrophoresis in denaturing gels. From these gels the labeled DNA was eluted by diffusion after grinding the gel piece. The single-end-labelled fragments were subjected to nucleotide sequence determinations according to Maxam and Gilbert [27].

RESULTS

Purification of orotate phosphoribosyltransferase

The purification procedure described (Table 1) yields an electrophoretically homogeneous enzyme (Fig. 3) purified nearly 8000-fold compared with wild-type extracts. The specific activity of the product is 400 units/mg (Lowry) protein, or 324 units/mg dry weight as calculated from the amino acid analysis. The strain used for the purification harbors the multicopy plasmid pKK6 containing the pyrE gene (Fig. 2), and hence it overproduces orotate phosphoribosyltransferase. In crude extracts of wild-type cells the specific orotate phosphoribosyltransferase activity is about 0.05 units/mg protein. Hence, the enzyme comprises roughly 0.01% of the cellular protein.
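The fold and the protein fraction quoted in this paragraph are two views of the same ratio of specific activities. A short Python check, assuming that the purified enzyme is fully active and that both specific activities refer to the same (Lowry) protein measure:

```python
WILD_TYPE_SA = 0.05   # units/mg protein in wild-type crude extracts (from the text)
PURE_SA = 400.0       # units/mg (Lowry) protein for the purified enzyme (from the text)

fold = PURE_SA / WILD_TYPE_SA        # purification factor relative to wild type
fraction = WILD_TYPE_SA / PURE_SA    # enzyme mass / total cellular protein mass

print(f"{fold:.0f}-fold")                    # 8000-fold
print(f"{100 * fraction:.4f}% of protein")   # 0.0125% of protein
```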

Molecular characteristics of orotate phosphoribosyltransferase

By electrophoresis in 11-15% polyacrylamide gradient gels containing sodium dodecyl sulfate, a relative molecular mass of 23 000 was found for the subunit of orotate phosphoribosyltransferase. This is in agreement with results of protein synthesis experiments using the maxicell technique [9, 31]. By gel filtration on a Sephadex G-100 column, orotate phosphoribosyltransferase co-eluted with chicken ovalbumin (Mr = 43 000-46 000). These results indicate a dimeric structure of the native enzyme.

The amino acid composition found for the purified orotate phosphoribosyltransferase is shown in Table 2 together with the amino acid composition predicted from the nucleotide sequence. The first 13 N-terminal amino acids were established by manual Edman degradation, giving the sequence Lys-Pro-Tyr-Gln-Arg-Gln-Phe-Ile-Glu-Phe-Ala-Leu-Ser-. Thereafter the sequential degradation started to lose coherence. The kinetics of amino acid release by digestion with carboxypeptidase Y indicated that the C-terminal sequence of orotate phosphoribosyltransferase is -Glu-Phe-Gly-Val (Fig. 4). These results served as benchmarks for locating the coding region of the structural pyrE gene in the nucleotide sequence.

Table 2. Amino acid composition of E. coli orotate phosphoribosyltransferase

Amino acid          Residues from amino acid analysis   Residues from DNA sequence
Aspartic acid       17.3 (Asp + Asn)                    11
Asparagine                                              6
Threonine           9.2ᵃ                                9
Serine              10.6ᵃ                               9
Glutamic acid       27.3 (Glu + Gln)                    19
Glutamine                                               7
Proline             6.6                                 6
Glycine             20.2                                19
Alanine             23.1                                23
Valine              11.7ᵇ                               12
Methionine          3.9                                 3
Isoleucine          13.6ᵇ                               14
Leucine             23.1                                23
Tyrosine            7.0                                 8
Phenylalanine       11.1                                12
Histidine           4.4                                 4
Lysine              11.5                                12
Arginine            11.7                                12
Tryptophan          0ᶜ                                  0
Cysteine            2.?ᵈ                                2
Number of residues  215                                 211
Mr                                                      23 326

ᵃ Extrapolated to 0-time hydrolysis. ᵇ Value obtained from 72-h hydrolysis. ᶜ Estimated separately by hydrolysis in methanesulfonic acid. ᵈ Determined as cysteic acid.

[Fig. 4: amino acids released versus digestion time (min, 0-180); plot not recoverable from the scan.]

Fig. 4. Digestion of orotate phosphoribosyltransferase with carboxypeptidase Y at 20 °C, pH 7.6, in the presence of 1% sodium dodecyl sulfate. Norleucine was included in the reaction mixture as internal standard.
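Two numbers in this section can be cross-checked from Table 2: the subunit Mr follows from summing residue masses over the DNA-derived composition, and dividing the native Mr from gel filtration by it gives the subunit count. The Python sketch below assumes a standard modern table of average residue masses (not one given in the paper) and takes the midpoint of the ovalbumin range as the native Mr; it should land within a few units of the quoted 23 326 and give two subunits.

```python
# Residues per subunit deduced from the DNA sequence (Table 2).
COMPOSITION = {
    "Asp": 11, "Asn": 6, "Thr": 9, "Ser": 9, "Glu": 19, "Gln": 7,
    "Pro": 6, "Gly": 19, "Ala": 23, "Val": 12, "Met": 3, "Ile": 14,
    "Leu": 23, "Tyr": 8, "Phe": 12, "His": 4, "Lys": 12, "Arg": 12,
    "Trp": 0, "Cys": 2,
}

# Average residue masses in Da (free amino acid minus H2O); a standard
# modern table, assumed here and not taken from the paper.
RESIDUE_MASS = {
    "Ala": 71.0788, "Arg": 156.1875, "Asn": 114.1038, "Asp": 115.0886,
    "Cys": 103.1388, "Gln": 128.1307, "Glu": 129.1155, "Gly": 57.0519,
    "His": 137.1411, "Ile": 113.1594, "Leu": 113.1594, "Lys": 128.1741,
    "Met": 131.1926, "Phe": 147.1766, "Pro": 97.1167, "Ser": 87.0782,
    "Thr": 101.1051, "Trp": 186.2132, "Tyr": 163.1760, "Val": 99.1326,
}
WATER = 18.0153  # one water for the free N and C termini


def chain_mr(composition):
    """Mr of a single polypeptide chain with the given residue counts."""
    return sum(RESIDUE_MASS[aa] * n for aa, n in composition.items()) + WATER


n_residues = sum(COMPOSITION.values())
subunit_mr = chain_mr(COMPOSITION)

# Native Mr taken as the midpoint of the ovalbumin range (43 000-46 000)
# with which the enzyme co-eluted on Sephadex G-100.
native_mr = (43000 + 46000) / 2
subunits = round(native_mr / subunit_mr)

print(n_residues, round(subunit_mr), subunits)
```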


[Fig. 5: restriction map of the sequenced region with coordinates in base pairs; legible site labels include MluI, AvaI, BamHI, StuI, PvuII and BssHII, plus the abbreviated sites A, E, H and T.]

Fig. 5. Restriction endonuclease sites and fragments used for DNA sequencing. The arrows above and below the center line represent the fragments used for sequencing the upper and the lower strand. The heavy and the light lines indicate respectively the sequenced and the non-sequenced part of the fragments. The proximal ends of the arrows represent the labeled ends. The coordinates refer to the number of base pairs. Abbreviations of restriction endonucleases: A = AhaIII, E = EcoRII, H = HinfI, and T = TaqI.

[Fig. 6: annotated nucleotide sequence of the pyrE region with the deduced amino acid sequences; the nucleotide sequence itself is not reliably legible in the scan. The deduced orotate phosphoribosyltransferase sequence, from the ATG at position 394 to the final valine codon ending at position 1029, reads (one-letter code; the initiator Met is absent from the mature enzyme):
MKPYQRQFIE FALSKQVLKF GEFTLKSGRK SPYFFNAGLF NTGRDLALLG RFYAEALVDS GIEFDLLFGP AYKGIPIATT TAVALAEHHD LDLPYCFNRK EAKDHGEGGN LVGSALQGRV MLVDDVITAG TARESMEIIQ ANGATLAGVL ISLDRQERGR GEISAIQEVE RDYNCKVISI ITLKDLIAYL EEKPEMAEHL AAVKAYREEF GV]

Fig. 6. The nucleotide sequence and encoded polypeptides of the pyrE gene. The putative promoters are represented by their -35 regions and Pribnow boxes (P-1 and P-2). The possible Shine-Dalgarno sequences (SD-1, SD-2, and SD-3) are indicated by a line showing the contiguous bases complementary to the 3'-oligonucleotide of E. coli 16S RNA [30]. Dyad symmetries are represented by arrows with the center of symmetry shown by dots. The last nucleotides in the sequence are also presented on the opposite strand in a paper concerning the dut gene [29]. ORF refers to the stop codon of an open reading frame read on the opposite strand [29]. The d's representing deoxy and the hyphens representing phosphodiester links have been omitted.

DNA sequence

The two plasmids shown in Fig. 2 were the starting material for DNA sequencing. Both plasmids contain an intact and normally regulated pyrE gene (results not shown). Since a BamHI site had earlier been reported to be present in the pyrE gene [9], we concentrated on sequencing fragments around this site. Fig. 5 shows the strategy used. Both strands have been sequenced at least once, and all restriction sites have been crossed by sequence determinations from different fragments.

The nucleotide sequence of the pyrE gene, as well as the predicted amino acid sequence of orotate phosphoribosyltransferase, is shown in Fig. 6. The translational start codon (ATG) of the pyrE gene is positioned at base pair 394. The open reading frame extends for 633 base pairs and ends at position 1029 with a TAA stop codon triplet. The resulting polypeptide has a relative molecular mass of 23 326, which corresponds well to the experimentally observed value. The N-terminal amino acids 2-14 deduced from the DNA sequence are in full agreement with the results of the Edman degradation, indicating that the N-terminal methionine is removed from the enzyme after translation. The last four C-terminal amino acids predicted from the DNA sequence are also in complete accordance with the results of the yeast carboxypeptidase digestion (Fig. 4). However, the slow appearance of leucine and alanine in the carboxypeptidase digestion is unexplained. Table 2 shows that the amino acid composition deduced from the nucleotide sequence is in good agreement with the data found by amino acid analysis of the purified orotate phosphoribosyltransferase.
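Using the Edman result as a benchmark amounts to searching the sequence for an ATG whose downstream reading frame, once the initiator Met is removed, reproduces the determined N-terminal peptide. Taking 394 as the first base of that ATG and 1029 as the last base of the final valine codon, positions 397-1029 span 633 base pairs, i.e. 211 codons, matching the mature chain length. The Python sketch below illustrates the search with the standard genetic code on a short hypothetical sequence; the helper names are illustrative, not part of any library.

```python
BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna, start):
    """Translate from `start` until the first stop codon or the end."""
    protein = []
    for pos in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X: ambiguous base
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def find_start_codon(dna, mature_n_terminus):
    """Return the 0-based index of the first ATG whose product, minus the
    initiator Met, begins with the Edman-determined N-terminal peptide."""
    dna = dna.upper()
    pos = dna.find("ATG")
    while pos != -1:
        if translate(dna, pos + 3).startswith(mature_n_terminus):
            return pos
        pos = dna.find("ATG", pos + 1)
    return -1


# Hypothetical test sequence: a decoy ATG, then ATG + codons for K-P-Y-Q-R + TAA.
demo = "CATGGGCTT" + "ATG" + "AAACCATATCAGCGC" + "TAA"
print(find_start_codon(demo, "KPYQR"))  # 9
```

Matching on the mature N terminus, rather than on the first ATG in the frame, is what rejects the decoy start in the example.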

Fig. 6 also shows the nucleotide sequence of the 394 base pairs upstream of the translational start of the pyrE structural gene. The features of this region relevant to pyrE transcription are discussed below.

DISCUSSION

The pyrE gene, encoding orotate phosphoribosyltransferase, is located at about 81 min on the linkage map of the E. coli chromosome, between dut and spoT [1], and was previously cloned on multicopy plasmids [9-11]. Orotate phosphoribosyltransferase was purified from a strain containing one of these plasmids, pKK6 [10], and some structural features of the enzyme were determined, including the molecular masses and the terminal amino acid sequences.

The nucleotide sequence was determined using pKK6 or its derivative, pPP1 (Fig. 2), as sources of DNA. We focused on sequencing DNA fragments around the BamHI cleavage site located in the pyrE gene [9]. Straddling this site is a 633-residue open reading frame, which encodes a 211-residue polypeptide (Fig. 6). This reading frame is the structural gene for orotate phosphoribosyltransferase, since the N-terminal and C-terminal amino acid sequences, the amino acid composition, and the molecular mass predicted for this polypeptide from the nucleotide sequence correspond very well to the data found for the purified enzyme. The codon usage in translation of the pyrE message is similar to that found for weakly expressed genes [32], in accordance with the finding that orotate phosphoribosyltransferase comprises only about 0.01 of the total protein in wild-type cells. The pyrE message reads from spoT towards dut; hence the direction of pyrE transcription is counterclockwise on the E. coli chromosome. Translation ends at position 1029 (the last nucleotide of a valine codon) with a TAA stop codon triplet. The mRNA may terminate in a G+C-rich stem-loop structure (ΔG = -98 kJ/mol, as calculated by the Tinoco rules [33]) found at positions 1040-1063 (Fig. 6). Interestingly, an open reading frame is found on the opposite strand, coding for a 210-amino-acid polypeptide of unknown function [29]. This reading frame runs from dut towards pyrE, and its translation terminates at the same position where pyrE translation stops (Fig. 6) [29].
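Comparisons of codon usage such as the one cited above [32] start from a per-codon tally of the coding sequence. The Python sketch below shows only that tally and the share of each codon within a chosen synonymous family; the function names and the toy sequence are hypothetical, and the statistics actually used by Grosjean and Fiers are not reproduced here.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence (DNA alphabet, read 5'->3')."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def synonymous_shares(counts: Counter, family: list[str]) -> dict[str, float]:
    """Fraction of a synonymous codon family contributed by each member codon."""
    total = sum(counts[c] for c in family)
    return {c: (counts[c] / total if total else 0.0) for c in family}


# Leucine is encoded by six synonymous codons.
LEUCINE = ["CTG", "CTA", "CTT", "CTC", "TTA", "TTG"]

counts = codon_counts("ATGCTGCTGCTATAA")  # made-up sequence, not pyrE
shares = synonymous_shares(counts, LEUCINE)
print(shares)
```

Applied to a real gene, the shares for each family would then be compared with those of reference sets of strongly and weakly expressed genes.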

Our primary interest in the nucleotide sequence of pyrE derives from studies on the control of pyr gene expression.

From the enzyme levels found in mutants defective in nucleotide interconversion, it was concluded that the repressing metabolite for pyrB, pyrE, and pyrF expression is UTP (or UDP, or a compound metabolically derived therefrom) [2-4]. It has also been found that reductions in the pools of guanosine nucleotides greatly enhance expression of the three genes [5, 6]. The receptor that senses these regulatory signals appears to be the elongating RNA polymerase. First, the only mutants identified that overproduce the pyrimidine biosynthetic enzymes while at the same time containing high intracellular pools of repressing nucleotides are rpoBC mutants [7]. Second, the purified RNA polymerase of one such mutant was found to have a 4-6-fold elevated Km for UTP in the elongation of transcripts of T7 DNA and synthetic DNAs (K. F. Jensen, unpublished observations).

The sequence of 394 nucleotide pairs preceding the start of the pyrE coding region is also shown in Fig. 6. The structure nearest to the initiator codon (AUG) that has any similarity to the canonical promoter sequence [34, 35] has its Pribnow box, d(A-A-T-C-G-A-A), centered around nucleotide 287, indicating the start of a transcript at position 298, i.e. 97 nucleotides upstream of the coding frame. Another promoter may be located 150 nucleotide pairs further upstream of this position (Fig. 6). It is even possible, however, that pyrE is cotranscribed with an unknown gene. The entire sequence upstream of position 325 in Fig. 6 constitutes an open reading frame, which was classified as a coding DNA segment by 'testcode' analysis [36] (P = 0.98 and testcode indicator I = 1.08). This open reading frame covers both of the putative promoters mentioned above, and it almost certainly represents the promoter-distal part of an unknown gene. Indeed, protein synthesis in maxicells directed by plasmid pKK6 has shown that an approximately 35-kDa polypeptide is encoded by the DNA in front of the pyrE gene [31]. The same polypeptide is synthesized from plasmid pPP1 (P. Poulsen, unpublished results). Since the cloned PvuII fragment in pPP1 (Fig. 2) contains only approximately 800 unsequenced base pairs before the MluI site where the sequence begins (Fig. 5), the 325-nucleotide open reading frame observed (Fig. 6) must be included in the gene for the more than 300-amino-acid polypeptide of unknown function.
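The 'testcode' classification cited above [36] rests partly on position asymmetry: in coding DNA, each base tends to be unevenly distributed over the three codon positions. A minimal Python sketch of that ingredient follows, computing for each base the ratio max/(min + 1) of its counts at the three positions, as we understand Fickett's definition, together with the plain base composition. The full test converts these parameters into a single indicator through lookup tables and weights given in [36]; those tables are not reproduced, so this sketch cannot yield values such as P = 0.98 or I = 1.08. Function names are hypothetical.

```python
def position_asymmetry(seq: str) -> dict[str, float]:
    """Position parameter per base: max/(min + 1) over the three codon positions.

    seq[frame::3] collects every third base starting at offset `frame`.
    Because only the max and min over the three offsets are used, the
    result does not depend on knowing the true reading frame.
    """
    seq = seq.upper()
    params = {}
    for base in "ACGT":
        counts = [seq[frame::3].count(base) for frame in range(3)]
        params[base] = max(counts) / (min(counts) + 1)
    return params


def composition(seq: str) -> dict[str, float]:
    """Fraction of each base in the sequence."""
    seq = seq.upper()
    return {base: seq.count(base) / len(seq) for base in "ACGT"}


window = "ATGATGATG"  # made-up window; Fickett's test scans much longer stretches
print(position_asymmetry(window), composition(window))
```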

Between the end of the unknown coding frame (position 325 in Fig. 6) and the start of pyrE is a structure with the features of a rho-independent transcriptional terminator, i.e. a G+C-rich stem-loop structure followed by a stretch of uridylate residues (Figs 6 and 7), resembling the attenuator in the trp operon [37]. This suggests that the frequency of pyrE transcription is regulated by attenuation. A similar structure was recently found to precede the pyrBI operon [38, 39] and was shown to function as a transcriptional terminator in vitro [39]. The expression of the two genes is similarly regulated, although pyrBI responds more strongly than pyrE [2, 5, 7].

Fig. 7. The secondary structures of the rho-independent transcription termination site in the leader region of the pyrE gene. The free energies of the mRNA hairpins are ΔG = -44 kJ/mol (positions 298-318) and ΔG = -72 kJ/mol (positions 334-355), when calculated according to Tinoco et al. [33]. The d representing deoxy and the hyphens representing phosphodiester links have been omitted.
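The criteria used above to recognize a rho-independent terminator, a G+C-rich inverted repeat able to fold into a stem-loop and followed by a run of T residues (U in the RNA), can be turned into a simple scan. The Python sketch below looks only for perfect Watson-Crick inverted repeats; it ignores G·U pairs, mismatches, and bulges, and it does not estimate free energies by the Tinoco rules [33]. All thresholds (stem and loop lengths, G+C fraction, length of the T run) and the function name are hypothetical choices for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def find_terminator_like(seq: str, min_stem: int = 5, max_stem: int = 12,
                         min_loop: int = 3, max_loop: int = 9,
                         min_t: int = 4, min_gc: float = 0.6) -> list[tuple[int, int, int]]:
    """Return (stem_start, stem_length, loop_length) for G+C-rich hairpins followed by a T run.

    Positions are 0-based indices into `seq` (DNA alphabet, 5'->3').
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq)):
        for stem in range(min_stem, max_stem + 1):
            left = seq[i:i + stem]            # 5' arm of the stem
            if len(left) < stem:
                break                         # ran off the end of the sequence
            gc = (left.count("G") + left.count("C")) / stem
            if gc < min_gc:
                continue
            for loop in range(min_loop, max_loop + 1):
                j = i + stem + loop           # start of the 3' arm
                right = seq[j:j + stem]
                tail = seq[j + stem:j + stem + min_t]
                if right == revcomp(left) and tail == "T" * min_t:
                    hits.append((i, stem, loop))
    return hits


# Made-up test sequence: stem GCCCGC, loop GAAA, stem GCGGGC, then a T run.
print(find_terminator_like("AAGCCCGCGAAAGCGGGCTTTTTT"))
```

Because a stem can often be extended by A·T pairs at its base, one hairpin may be reported several times with different stem lengths; a fuller analysis would keep the most stable candidate.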

In contrast to the attenuators in the amino acid biosynthetic operons, the attenuators in the pyr genes may be translated. The nucleotide sequence of the pyrBI operon allows for the formation of a 44-amino-acid leader peptide straddling the attenuator [38, 39]. Depending on the position of the pyrE promoter, two hypothetical leader peptides may be formed. If P-1 (Fig. 6) is operative, a 52-amino-acid peptide can be formed, ending at position 325 just before the stem of the attenuator. Alternatively, if pyrE is cotranscribed with another gene, this peptide represents the carboxy-terminal part of a longer protein. Regardless of promoter position, however, an 11-amino-acid oligopeptide may be formed from the region in front of the pyrE gene. This peptide has its ribosome binding site positioned at the beginning of the stem structure and its initiator codon located at the top of the loop of the attenuator (position 344 in Fig. 6).
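Ribosome binding sites such as the one mentioned above, and the possible Shine-Dalgarno sequences SD-1 to SD-3 in Fig. 6, are identified by the longest run of mRNA bases that can pair contiguously with the 3' end of 16S rRNA [30]. A minimal Python sketch of that search follows. The 3'-terminal rRNA sequence in the code is an assumption supplied for illustration (written 5'->3' in the DNA alphabet), only Watson-Crick pairs are allowed, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Assumed 3'-terminal sequence of E. coli 16S rRNA, 5'->3', DNA alphabet.
RRNA_3_END = "GATCACCTCCTTA"


def longest_sd_match(mrna_window: str, rrna_tail: str = RRNA_3_END) -> str:
    """Longest contiguous stretch of the window that can pair with the rRNA tail."""
    # An mRNA segment pairs with the rRNA if it occurs in the rRNA's reverse complement.
    anti = rrna_tail.translate(COMPLEMENT)[::-1]
    window = mrna_window.upper()
    best = ""
    for i in range(len(window)):
        # Only try stretches longer than the current best; substrings of a
        # match are matches too, so the first failure ends the extension.
        for j in range(i + len(best) + 1, len(window) + 1):
            if window[i:j] in anti:
                best = window[i:j]
            else:
                break
    return best


print(longest_sd_match("TTTAAGGAGGTTTATG"))  # made-up leader, not from Fig. 6
```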

Immediately preceding the G+C-rich stem of the pyrE attenuator is a block of five thymidylate residues (uridylates in the mRNA). It therefore seems likely that the frequency of pyrE transcription is regulated by the coupling between the elongating RNA polymerase and the translating (leading) ribosome. This coupling could be modulated by variations in the rate of RNA chain elongation in thymidylate-rich blocks, induced by variations in the cellular UTP pool. Thus high UTP pools should result in decoupling and a high frequency of termination, while low UTP pools may result in tight coupling and little termination at the pyrE attenuator. Such a model has recently been proposed to account for the regulation of pyrBI expression [39], and it is in accordance with the results from studies on the effects of RNA polymerase mutations on pyr gene expression [7].
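The qualitative argument above, that the UTP pool sets the elongation rate through the U-rich block and thereby the chance that the ribosome stays coupled to RNA polymerase, can be made concrete with a toy calculation in Python. Everything quantitative here is a hypothetical assumption: Michaelis-Menten kinetics for UMP addition, all parameter values, and the form dwell/(dwell + lag) used as a readthrough probability. It is meant only to show the direction of the predicted effects, not to model pyrE measurements.

```python
def elongation_rate(utp_mM: float, vmax_nt_per_s: float, km_mM: float) -> float:
    """Michaelis-Menten rate (nt/s) of UMP addition; a hypothetical simplification."""
    return vmax_nt_per_s * utp_mM / (km_mM + utp_mM)


def readthrough_probability(
    utp_mM: float,
    n_uridines: int = 5,          # block of five U residues before the attenuator stem
    vmax_nt_per_s: float = 50.0,  # hypothetical
    km_mM: float = 0.1,           # hypothetical
    ribosome_lag_s: float = 0.2,  # hypothetical time by which the ribosome trails the polymerase
) -> float:
    """Toy probability that the ribosome catches up (coupling) and prevents termination."""
    dwell_s = n_uridines / elongation_rate(utp_mM, vmax_nt_per_s, km_mM)
    return dwell_s / (dwell_s + ribosome_lag_s)


for utp in (0.05, 0.2, 1.0):
    print(f"UTP {utp} mM: readthrough {readthrough_probability(utp):.2f}")
```

Raising `km_mM`, as reported for the RNA polymerase of an rpoBC mutant (4-6-fold elevated Km for UTP), lowers the elongation rate at a given UTP concentration and so raises the predicted readthrough, the same direction as the derepression seen in such mutants.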

Peter Poulsen is grateful to Vibeke Barkholt Pedersen for excellent guidance in performing the protein sequence determinations. Department of the Carlsberg Laboratory is thanked for carrying out amino acid analyses. We are indebted to Olle Karlström for performing computer analysis and for helpful discussions. The work was supported by grants from the Danish National Science Council and from Sven och Ebba-Christina Hagbergs Stiftelse.

REFERENCES

1. Bachmann, B. J. & Brooks Low, K. (1980) Microbiol. Rev. 44, 1-56.
2. Schwartz, M. & Neuhard, J. (1975) J. Bacteriol. 121, 814-822.
3. Kelln, R. A., Kinahan, J. J., Foltermann, K. F. & O'Donovan, G. A. (1975) J. Bacteriol. 124, 764-774.
4. Pierard, A. N., Glansdorff, N., Gigot, D., Crabeel, M., Halleux, P. & Thiry, L. (1976) J. Bacteriol. 127, 291-301.
5. Jensen, K. F. (1979) J. Bacteriol. 138, 731-738.
6. Levine, R. A. & Taylor, M. W. (1982) J. Bacteriol. 149, 1041-1049.
7. Jensen, K. F., Neuhard, J. & Schack, L. (1982) EMBO J. 1, 69-74.
8. Justesen, J. & Neuhard, J. (1975) J. Bacteriol. 123, 851-854.
9. An, G., Justesen, J., Watson, R. J. & Friesen, J. (1979) J. Bacteriol. 137, 1100-1110.
10. Lundberg, L. G., Nyman, P. O. & Karlström, O. (1979) Acta Chem. Scand. B33, 597-598.
11. Lee, J. S., An, G., Friesen, J. D. & Isono, K. (1981) Mol. Gen. Genet. 184, 218-223.
12. Bühner, M. & Sund, H. (1969) Eur. J. Biochem. 11, 73-79.
13. Schwartz, M. (1971) Eur. J. Biochem. 21, 191-198.
14. Lowry, O. H., Rosebrough, N. J., Farr, A. L. & Randall, R. J. (1951) J. Biol. Chem. 193, 265-275.
15. Davis, B. J. (1964) Ann. N. Y. Acad. Sci. 121, 404-427.
16. Weber, K. & Osborn, M. (1969) J. Biol. Chem. 244, 4406-4412.
17. Andrews, P. (1965) Biochem. J. 96, 595-606.
18. Machold, O., Simpson, D. J. & Møller, B. L. (1979) Carlsberg Res. Commun. 44, 235-254.
19. Stanford, M. (1962) J. Biol. Chem. 238, 235-237.
20. Liu, T. Y. & Chang, Y. H. (1971) J. Biol. Chem. 246, 2842-2848.
21. Klemm, P., Poulsen, F., Harboe, M. K. & Foltmann, B. (1976) Acta Chem. Scand. B30, 979-984.
22. Hayashi, R., Moore, S. & Stein, W. H. (1973) J. Biol. Chem. 248, 2296-2302.
23. Gaastra, W., Klemm, P., Walker, J. M. & De Graaf, F. K. (1979) FEMS Microbiol. Lett. 6, 16-18.
24. Clark, D. J. & Maaløe, O. (1967) J. Mol. Biol. 23, 99-112.
25. Dodin, G. (1981) FEBS Lett. 134, 20-24.
26. Maxam, A. & Gilbert, W. (1977) Proc. Natl Acad. Sci. USA 74, 560-564.
27. Maxam, A. & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.
28. McDonell, M. W., Simon, M. N. & Studier, F. W. (1977) J. Mol. Biol. 110, 119-146.
29. Lundberg, L. G., Thoresen, H. O., Karlström, O. H. & Nyman, P. O. (1983) EMBO J. 2, 967-971.
30. Steitz, J. A. & Jakes, K. (1975) Proc. Natl Acad. Sci. USA 72, 4734-4738.
31. Lundberg, L. G., Karlström, O. H. & Nyman, P. O. (1983) Gene 22, 127-131.
32. Grosjean, H. & Fiers, W. (1982) Gene 18, 199-209.
33. Tinoco, I., Jr, Borer, P. N., Dengler, B., Levine, M. D., Uhlenbeck, O. C., Crothers, D. M. & Gralla, J. (1973) Nature New Biol. (Lond.) 246, 40-41.
34. Rosenberg, M. & Court, D. (1979) Annu. Rev. Genet. 13, 319-353.
35. Siebenlist, U., Simpson, R. B. & Gilbert, W. (1980) Cell 20, 269-281.
36. Fickett, J. W. (1982) Nucleic Acids Res. 10, 5303-5318.
37. Yanofsky, C. (1981) Nature (Lond.) 289, 751-758.
38. Roof, W. D., Foltermann, K. F. & Wild, J. K. (1982) Mol. Gen. Genet. 187, 391-400.
39. Turnbough, C. L., Jr, Hicks, K. L. & Donahue, J. P. (1983) Proc. Natl Acad. Sci. USA 80, 368-372.

P. Poulsen and K. F. Jensen, Institut for Biologisk Kemi B, Københavns Universitet, Sølvgade 83, DK-1307 København K, Denmark
P. Valentin-Hansen, Institut for Molekylær Biologi, Odense Universitet, Campusvej 55, DK-5230 Odense M, Denmark
P. Carlsson and L. G. Lundberg, Institutionen för Biokemi och Biofysik, Chalmers Tekniska Högskola, Fack, S-402 20 Göteborg, Sweden