REFLECTIONS

Protein sequencing and the making of molecular genetics

TIBS 24 – MAY 1999

0968-0004/99/$ – See front matter © 1999, Elsevier Science. All rights reserved. PII: S0968-0004(99)01360-2

In the early history of molecular genetics, proteins – not nucleic acids – were at the centre of research. This was the case long after Watson and Crick had proposed their double-helical model of DNA. The reason was simple: proteins could be sequenced; nucleic acids could not.

Here, I present a brief history of protein sequencing, charting its development from a tool in protein chemistry to its uses in the study of gene function and the way it informed initial attempts at nucleic acid sequencing. I focus on Sanger’s sequencing work in the Biochemistry Department in Cambridge (UK) and the use of his techniques by molecular geneticists in the nearby Physics Department. Sanger’s work won him a double Nobel Prize, and his close interaction with molecular biologists culminated in common plans for a new Laboratory of Molecular Biology on the outskirts of Cambridge.

A new tool

Protein sequencing started as part of studies of protein structure and function. From the late 19th century onwards, proteins were associated with the basic functions of life, including heredity, and were the object of intense study and debate. Knowledge of protein structure was seen as the key to studies of protein function and as a step towards the synthetic production of proteins. Various theories regarding protein structure were proposed, and discarded, on the basis of sedimentation studies, amino acid analysis and X-ray studies. They invariably assumed that proteins exhibited a high degree of regularity.

Sanger’s sequencing work developed from research into a new method for end-group determination in proteins, which he undertook as a postdoctoral student under Charles Chibnall in the Biochemistry Department in Cambridge during the war. End-group determination was an important tool for the estimation of the number and length of polypeptide chains in proteins, yielding basic information on protein structure. It could be used to identify proteins as well as to test their purity. Many different methods for end-group determination were described in the literature – a review article published in 1945 listed more than 20 (Ref. 1) – but none yielded reliable results.

Acting on Chibnall’s suggestion, Sanger tried fluorodinitrobenzene as a reagent. Organic chemists had tended to steer clear of fluoro derivatives because of the latter’s toxicity, but during the war the chemicals were synthesized for research into chemical warfare. Sanger found that fluorodinitrobenzene reacted under much milder conditions than did the generally used chlorine compounds. Furthermore, the dinitrophenyl amino acids were stable to the acid hydrolysis used to break down proteins and were bright yellow in solution. This made them amenable to the new method of partition chromatography.

Using his new technique, Sanger established that insulin consisted of two chains – not 18 chains, as Chibnall had postulated on the basis of its high free-amino-group content. Exploiting the same technique, Sanger identified short sequences of 4–5 residues at the N-termini of the two chains of the molecule (Ref. 2). Extending the approach to peptides derived by partial acid hydrolysis, and later by enzymatic hydrolysis, Sanger and his co-workers (Refs 3,4), in several years of painstaking work, were able to establish the complete sequence of the two insulin chains.

Sanger has suggested that end-group determination had already marked a change in the course of protein chemistry from an interest in amino acid analysis to one in the arrangement of amino acids in chains. Moving from there to sequencing did not require a great intellectual leap (Ref. 5). The decisive breakthrough, according to Sanger, was the development of new fractionation techniques by Richard Synge and Archer Martin in the context of research into the composition and characteristics of wool – research that was financed by the International Wool Secretariat (Ref. 6). Sanger’s thesis finds confirmation in the fact that Synge and Martin themselves successfully applied their new fractionation techniques to determination of the structure of the pentapeptide gramicidin S. Their results were published before Sanger presented his first bit of sequence (Ref. 7).

Other researchers were active in the field. Pehr Edman (Ref. 8) at the University of Lund in Sweden developed an elegant procedure that was based on the use of phenylisothiocyanate as a reagent and that allowed stepwise degradation of the protein. With the development of more reliable fraction collectors and of sensitive methods for detection of the colourless reaction products, Edman’s method completely superseded Sanger’s sequencing method. In the late 1950s, William Stein and Stanford Moore at the Rockefeller Institute in New York devised an automatic amino acid analyser that yielded quantitative results and facilitated the analytical work. Edman’s procedure, in conjunction with the automatic analyser, allowed researchers to tackle larger proteins.

Sanger was interested in sequencing because he felt that it would provide insight into the mechanism of action of insulin. The expectation was that, once the mechanism of action of one protein was known, it would give clues to the functioning of protein hormones and enzymes more generally. When the full sequence of insulin, including the position of the three disulphide bridges, did not give any clue to the protein’s function, Sanger explored new ways of using sequencing to achieve his aim. One avenue he pursued was to identify the ‘active centre’ of insulin by determining and comparing the sequences of homologous proteins from different species. Another approach he followed was to label the active centre and to determine the sequence around it. In the course of their work, Sanger and his collaborators developed sensitive autoradiographic methods and ‘fingerprinting’ techniques that allowed them to deduce the sequences of peptides without carrying out a complete amino acid analysis. Insulin’s mechanism of action proved resilient to all these attacks, but the new methods became important analytical tools for structure determination (Fig. 1).

In a review article published in the late 1960s, Brian Hartley, who worked in Sanger’s group, documented the exponential growth in both the speed and volume of sequencing work (Ref. 9). Despite the problems encountered with insulin, researchers expected that protein sequencing would yield insight into the structure and function of proteins. In combination with X-ray analysis, protein sequencing led to the first atomic model of a globular protein (myoglobin) and, some years later, to the first atomic model of an enzyme (lysozyme) (Refs 10,11); a reaction mechanism for lysozyme was soon proposed. Sequencing also gave rise to evolutionary studies of proteins as a completely new area of research. Finally, protein sequencing was keenly seized upon by researchers concerned with the molecular mechanism of gene function.

The sequence hypothesis

Sanger’s first sequencing results suggested to Crick – even before 1953 – that genes determined the amino acid sequence of proteins (Ref. 12, pp. 34–36). Later, Crick expanded this insight in the sequence hypothesis, which stated that ‘the specificity of a piece of nucleic acid is expressed solely by the sequence of its bases, and that this sequence is a (simple) code for the amino acid sequence of a particular protein’ (Ref. 13).

The sequence hypothesis was a decisive step in early speculation on the genetic code. It boldly assumed that the amino acid sequence determined the folding of a protein. A few years before, Linus Pauling (Ref. 14) had postulated the existence of a gene ‘responsible for the folding of polypeptide chains’ to explain the different electrochemical charges of sickle-cell and normal haemoglobin, which appeared to have the same amino acid composition. In 1957, Crick still considered some possible exceptions to the rule, especially γ-globulins and adaptive enzymes. Heuristic reasons, however, made the sequence hypothesis attractive. In his celebrated lecture on protein synthesis, which he delivered before the Society for Experimental Biology, Crick conceded frankly: ‘Our basic handicap at the moment is that we have no easy and precise technique with which to study how proteins are folded, whereas we can at least make some experimental approach to amino acid sequences. For this reason, if for no other, I shall ignore folding in what follows and concentrate on the determination of sequences’ (Ref. 13). Thus, Sanger’s analysis not only allowed the formulation of the hypothesis but also provided the experimental tool to test it.

It is worth noticing that, in the same lecture in which Crick for the first time explicitly stated the ‘central dogma’ of molecular biology and defended an ‘informational’ versus a biochemical view of the problem of protein synthesis, he stressed the central and unique importance of proteins in biology. Crick expected that, in contrast to the multiple and complex functions of proteins, nucleic acids acted in a ‘uniform and rather simple’ way (Ref. 13).

Before testing the sequence hypothesis, it was necessary to show that an inherited defect was in fact laid down in the amino acid sequence of a protein. As is well known, Vernon Ingram’s experiments on sickle-cell haemoglobin, which were performed in the Cavendish Laboratory, provided such proof. Refining Sanger’s fingerprinting techniques, Ingram succeeded in tracking down the difference between normal and sickle-cell haemoglobin to a single amino acid residue (Fig. 2; Ref. 15).

Building on this first success, Crick, Brenner and Ingram, together with Seymour Benzer and George Streisinger, who gathered in Cambridge in 1957, tried to show that the order of mutations in a gene lined up with the order of changes in the amino acid sequence of the corresponding protein. The original plan was to use Benzer’s finely mapped mutants of the rII region of bacteriophage T4 as a test case. When Benzer failed to isolate the corresponding protein, the group resorted to Streisinger’s bacteriophage-T2 mutants, in which the tips of the tail fibers were affected. The key technique was again fingerprinting, which the group combined with the radioactive marking technique Sanger had pioneered.

Radiographic techniques were much more sensitive than other chromatographic techniques. Sanger himself was experimenting with slices of oviducts, which he incubated with radioactive phosphate to get labelled ovalbumin. This was quite a lengthy and laborious procedure. As early as October 1956, Crick wrote to Brenner, who was still in Johannesburg: ‘I stressed to Fred [Sanger] how extremely favorable the phage system might be for this method… He seemed very interested’ (Ref. 16).

The experiments on the tail-fiber mutants did not yield conclusive results and were abandoned. Apparently, the group had not succeeded in isolating the right protein. However, Brenner continued to use the same techniques in work on the amber mutants of the phage T4. These mutants, which grew only on the Escherichia coli B strain, produced only fragments of the head protein – the protein that the affected gene encoded. By examining the fingerprints of the different mutants, Brenner and his collaborators were able to establish that the length of a fragment corresponded to the position of the mutation on the genetic map, thus proving collinearity. A few months earlier, Charles Yanofsky and his collaborators at Stanford University, using similar techniques, had proved the same point in studies on tryptophan synthetase mutants of E. coli (Refs 17,18).

Besides proving collinearity, sequencing data were also used to establish some general features of the genetic code. On the basis of the few sequences then available, Crick disproved all possible versions of the first genetic code proposed by George Gamow (Ref. 12, p. 94). From published data on protein sequences and neighbor analysis, Brenner later deduced that an overlapping code was impossible (Ref. 19). A further handle on the problem of the genetic code came from analysis of the effects of chemical mutagens in combination with genetic and protein sequence analysis (Refs 20,21).

The code itself was established by entirely different in vitro translation techniques, but the early experiments in which protein sequencing played a central role were nonetheless crucial for the formulation of the problem. Protein sequencing also remained an important tool for checking the validity of the code. Amino acid substitutions in haemoglobin variants established by fingerprinting, for instance, proved that the genetic code that had been established for bacteria and viruses was also valid for humans (Ref. 22).

Figure 1. Fred Sanger, holding an autoradiogramme, photographed in his laboratory in the Biochemistry Department in Cambridge in the late 1950s. (Photograph courtesy of F. Sanger.)

Interestingly, Crick and Brenner not only used Sanger’s sequencing work as a conceptual and practical tool for their own work in the newly defined field of molecular genetics, but also actively tried to interest Sanger in their work. They first approached him in the early 1950s, trying to convince him to move from the Biochemistry Department to the Cavendish Laboratory. Nothing came of this plan at the time. However, in 1957, the Cavendish group and Sanger joined forces, and together negotiated the creation of a new Laboratory of Molecular Biology (Ref. 23). To my knowledge, it was the first institution to carry that name. The combination of (both two- and three-dimensional) structural and genetic approaches became central to the definition of molecular biology at Cambridge. The creation of the new laboratory also had repercussions on the research agendas of those involved in the new venture.

From protein to nucleic acid sequencing

When trying to account for his ‘conversion’ from protein to nucleic acid sequencing, Sanger referred to ‘the atmosphere’ in the Laboratory of Molecular Biology and to the influence of his new colleagues. ‘With people like Francis Crick around,’ he reckoned, ‘it was difficult to ignore nucleic acids or to fail to realize the importance of sequencing them’ (Ref. 23). Originally, the main objective of nucleic acid sequencing was to try to ‘break the genetic code’ (Ref. 24). However, nucleic acid sequencing got going seriously only after the code was broken.

Initially, nucleic acid sequencing seemed an even more daunting undertaking than protein sequencing had been. This was due to the lack of pure small substrates and to the composition of nucleic acids. Because nucleic acids possessed only four monomers, researchers expected the interpretation of results to be much more difficult. This expectation was based on the existing approaches to studying protein sequences, which required analysis of degradation products and the subsequent rearrangement of the pieces. New developments in sequencing techniques reversed the picture.

The first nucleic acid to be sequenced was alanine tRNA, the first small RNA to be isolated. The methods used were similar to those established for protein sequencing: enzymatic degradation followed by fractionation, analysis and interpretation of the degradation products (Ref. 25). These methods were too laborious to be applied to larger RNA or DNA molecules; however, the procedures for more rapid and reliable sequencing subsequently developed by Sanger and others continued to rely on methods pioneered with proteins or on information derived from protein sequencing. This is especially true of autoradiography and labelling techniques, which allowed one to ‘read off’ the sequence from the autoradiogramme directly and therefore did not require complex interpretative procedures. This latter method became much more powerful in nucleic acid sequencing than it ever had been with proteins. A key development in nucleic acid sequencing was the introduction of copying techniques (instead of sequencing by degradation). But, again, the first primers employed to get the polymerases started were synthesized by using information provided by amino acid sequencing. Protein sequencing also served as an important check for the still unreliable DNA-sequencing data (Ref. 26) (see also recent articles on DNA sequencing; Refs 27,28).

Figure 2. Fingerprints of normal and sickle-cell haemoglobin. Note the difference in peptide 4. Figure reproduced, with permission, from Ref. 15.

Conclusions

In the debate on the role of biochemists in the history of molecular biology, which has been conducted heatedly since the late 1960s (Refs 29–31), Sanger represents an interesting case. Despite joining the Laboratory of Molecular Biology in Cambridge, he never gave up his identity as a biochemist – or, more precisely, he never saw the necessity to draw a distinction between the two fields. Interestingly, too, protein sequencing is never mentioned among the techniques that biochemists introduced into molecular biology. My intention here, however, is not to fuel an old debate. The aim of my brief historical excursion is rather to show the important role of protein sequencing in the early history of molecular genetics. Today, ever faster and cheaper nucleic acid sequencing methods have overshadowed more cumbersome methods of protein sequencing. However, this is only a fairly recent development. Long before nucleic acid sequencing techniques were at all conceivable, protein sequencing was at the forefront of research and offered a powerful tool for formulating and testing hypotheses about the functions of genes.

Acknowledgement

I thank Sydney Brenner, Francis Crick, John Kendrew and Fred Sanger for extensive discussions, and Denis Thieffry for constructive comments on an earlier version of this paper.

SORAYA DE CHADAREVIAN

Dept of History and Philosophy of Science, University of Cambridge, Free School Lane, Cambridge, UK CB2 3RH.

References

1 Fox, S. W. (1945) Adv. Protein Chem. 2, 155–177
2 Sanger, F. (1949) Biochem. J. 45, 563–574
3 Sanger, F. and Tuppy, H. (1951) Biochem. J. 49, 481–490
4 Sanger, F. and Thompson, E. O. P. (1953) Biochem. J. 53, 353–374
5 Sanger, F. (1985) Curr. Contents 28, 23
6 Dowling, L. M. and Sparrow, L. G. (1991) Trends Biochem. Sci. 16, 115–119
7 Consden, R., Gordon, A. H., Martin, A. J. P. and Synge, R. L. M. (1947) Biochem. J. 41, 596–602
8 Edman, P. (1950) Acta Chem. Scand. 4, 283–293
9 Hartley, B. S. (1970) in British Biochemistry Past and Present. Biochemical Society Symposium No. 30 (Goodwin, T. W., ed.), pp. 29–41, Academic Press
10 Kendrew, J. C. et al. (1960) Nature 185, 422–427
11 Blake, C. C. F. et al. (1965) Nature 206, 757–761
12 Crick, F. (1990) in What Mad Pursuit: A Personal View of Scientific Discovery, Penguin
13 Crick, F. (1958) in The Biological Replication of Macromolecules. Symposia of the Society of Experimental Biology XII, pp. 138–163, Cambridge University Press
14 Pauling, L. (1952) Proc. Am. Philos. Soc. 96, 556–565
15 Ingram, V. M. (1958) Biochim. Biophys. Acta 28, 539–545
16 Judson, H. (1979) Eighth Day of Creation. The Makers of the Revolution in Biology, p. 331, Jonathan Cape
17 Sarabhai, A. S., Stretton, A. O. W., Brenner, S. and Bolle, A. (1964) Nature 201, 13–17
18 Yanofsky, C. et al. (1964) Proc. Natl. Acad. Sci. U. S. A. 51, 266–272
19 Brenner, S. (1957) Proc. Natl. Acad. Sci. U. S. A. 43, 687–694
20 Crick, F. H. C., Barnett, L., Brenner, S. and Watts-Tobin, R. (1961) Nature 192, 1227–1232
21 Kay, L. Who Wrote the Book of Life? A History of the Genetic Code, Stanford University Press (in press)
22 Beale, D. and Lehmann, H. (1965) Nature 207, 259–262
23 de Chadarevian, S. (1996) J. Hist. Biol. 29, 361–386
24 Sanger, F. (1988) Annu. Rev. Biochem. 57, 1–28
25 Holley, R. W. et al. (1965) Science 147, 1462–1465
26 Sanger, F. (1988) Annu. Rev. Biochem. 57, 1–28
27 Wu, R. (1994) Trends Biochem. Sci. 19, 429–433
28 Sutcliffe, J. G. (1995) Trends Biochem. Sci. 20, 87–90
29 Cohen, S. C. (1984) Trends Biochem. Sci. 9, 334–336
30 Abir-Am, P. G. (1992) Osiris 7, 210–237
31 The Tools of the Discipline: Biochemists and Molecular Biologists (1996) [special issue] J. Hist. Biol. 29, 327–462

BOOK REVIEW

It’s all in the title…

Oncogenes and Tumour Suppressors (Frontiers in Molecular Biology, No. 19)

edited by Gordon Peters and Karen H. Vousden, Oxford Science Publications, 1997. £29.95 (xix + 328 pages) ISBN 0 19 963594 3

In his landmark papers of 1910 and 1911, Peyton Rous described a spontaneously arising fibrosarcoma in a Plymouth Rock hen. The characterization of this tumour, which proved to be transplantable and transmissible as a cell-free filtrate, ultimately led to the discovery of acutely transforming retroviruses. The significance of Rous’s work, however, was recognized only some fifty years later: he was awarded the Nobel Prize in 1966. In fact, it was not until 1980, a decade after Rous’s death, that the transforming gene of the avian-sarcoma virus that bears his name was finally sequenced and shown to have a normal cellular counterpart. This definitive proof that non-transformed cells harbour genes that have the potential to become oncogenic (proto-oncogenes) earnt Harold Varmus and J. Michael Bishop the Nobel Prize in 1989 and ushered in the era of modern molecular oncology.

The concept of tumour-suppressor genes arguably could be accredited to the perspicacity of Boveri, who suggested, in 1914, that normal cells possess ‘definite chromosomes which inhibit division’ and that their elimination would result in unlimited growth in tumour cells. Much later, in 1971, Alfred Knudson’s ‘two-hit’ hypothesis explained the incomplete penetrance of inherited cancers and accurately anticipated the molecular lesions that would be discovered in the retinoblastoma gene (RB1) and other growth-inhibitory genes or anti-oncogenes.

This is the historical backdrop for Oncogenes and Tumour Suppressors. The editors, given the brief of summarizing the major developments in oncogenesis that have occurred over the past two decades, have assembled a strong cast of authors to produce a slim volume comprising eleven chapters. A search of the literature highlights the enormity of their task. There are >61 000 references in Medline dealing with oncogenes or tumour-suppressor genes and, if one includes papers that deal with the cell cycle, the number rises to ≫200 000! To make matters worse, over one third of the references are dated 1996 or later. Given that most of the chapters in this book were written in 1996, the book could have been considered well out of date by the time it hit the shelves! So, was this project a futile exercise doomed to failure from the start, or a highly commendable attempt to grapple with odds that would have made Hercules shirk?

Before passing judgement, let us examine how the editors set about their task. The book comprises two sections: the first devoted to oncogenes, which are considered as part of a signalling cascade; and the second to tumour-suppressor genes and their intimate relationship with cell-cycle control. Part I commences with an introductory chapter on mechanisms of oncogene perturbation that touches on viral oncogenesis, chromosomal
