
THE AMINO ACID SEQUENCE OF HUMAN MACROGLOBULINS *

F. W. Putnam, A. Shimizu, C. Paul, T. Shinoda and H. Kohler†

Indiana University Bloomington, Ind. 47401

Because IgM immunoglobulin is the predominant antibody in the newborn animal and in the primary immune response, and because of its role in various autoimmune diseases, we have undertaken to determine the complete amino acid sequence of a human IgM macroglobulin with a covalent molecular weight of approximately one million. This protein (Ou), from a patient with macroglobulinemia, serves as a model for structural study of IgM antibodies, just as Bence-Jones proteins have for light chains and myeloma globulins have for IgG immunoglobulins.1 Structural study of such proteins has shown that antibodies are composed of a pair of heavy and a pair of light polypeptide chains disulfide-bonded together to form a tetrachain immunoglobulin molecule with two specific combining sites (FIGURE 1). In the three major classes of immunoglobulins, IgG, IgM, and IgA (or γG, γM, and γA), the heavy chains are γ, μ, and α, respectively, and the light chains may be either κ or λ. IgM is unique in having a pentameric structure with a sedimentation coefficient of 19S (corrected value); even higher polymers of about 29S and 38S may be present in small amounts.

Because of the myriad antigens that exist, the structure of the antibody combining site must be capable of great variability. Amino acid sequence analysis of pathological proteins has shown that this is provided for by a variable-sequence region of some 110-120 amino acid residues at the NH2 terminus of each chain.2 Both the heavy and the light chains are divided into regions of variable and constant sequence. The variable (V) regions of the heavy and light chains differ in sequence and together determine the structure of the combining site. The constant (C) regions determine the class of the chain and certain associated biological properties. Whereas both the variable and constant regions of light chains are characteristic of the light chain type (i.e., are either both κ or both λ),3 the class character of heavy chains is not expressed in the variable region (VH) but only in the constant region (CH). Thus, a comparison4 of the NH2-terminal region of several μ, γ, and α chains has indicated that four variable-sequence subgroups are common to heavy chains (VHI-VHIV). Although the complete amino acid sequence of the γ chains of at least five human IgG myeloma globulins is now known, and much sequence information is available on the VH portions of γ and μ chains,4,10 only fragmentary portions of the sequence of the CH region of μ chains have been published. Comparative sequence data on human μ and γ chains would be of great value for testing the hypothesis that immunoglobulins evolved by gene duplication after early divergence of V and

*This work was supported by NIH grant CA-08497 from the National Cancer Institute and by grant GB-18483 from the National Science Foundation.

†Present address: Department of Pathology, The University of Chicago, Chicago, Ill. 60637.



84 Annals New York Academy of Sciences

FIGURE 1. Schematic diagram of the pentameric structure of human IgM immunoglobulin. Each monomeric unit consists of a pair of μ heavy chains and a pair of light chains joined together by disulfide bonds. Each heavy chain has four interchain disulfide bonds: two between the pair of μ chains in the monomer, one to the light chain, and one intersubunit bridge between the monomers. The NH2-terminal end of each polypeptide chain is depicted by a zigzag line to indicate the variable region, and the COOH terminus by a straight line to represent the constant region. The light chains are denoted by the shorter lines and may be of the κ or λ type, but all light chains in any given pentameric molecule are of the same type. The antigen-combining site of IgM antibodies is determined by the variable regions of the light and heavy chains. As an antibody, the IgM pentamer is assumed to be decavalent, but it often functions as if it were pentavalent, presumably because of steric factors.2 (For a more detailed diagram, see FIGURE 2.)

C genes and that each polypeptide chain is specified by two such genes. We report here the amino acid sequence of 257 residues in the NH2-terminal portion of the μ chain of the IgM macroglobulin Ou, and also part of the COOH-terminal sequence. Together with our previous sequence for the κ light chain of this molecule, this accounts for the entire primary structure of the Fabμ portion and the hinge region of this molecule, and thus provides a prototype for the combining site of IgM antibodies.

EXPERIMENTAL

Materials and Methods

The Fabμ and Fcμ fragments (FIGURE 2) were prepared12 by digestion of IgM Ou with trypsin for one hour at 60°, and were separated on Sephadex


Putnam et al. : Human Macroglobulins 85

G-200. Their homogeneity was checked by immunoelectrophoresis, immunodiffusion, and ultracentrifugation. The Fabμ fragment was cleaved with CNBr in 70% formic acid, and the CNBr fragments were separated on Sephadex G-100 in 30% acetic acid.10,11 The elution was monitored by measuring the optical absorbance at 280 nm, by the ninhydrin reaction, and by carbohydrate analysis with the phenol-sulfuric acid method. The first peak contained fragments bound together by disulfide bridges. Three CNBr fragments were eluted in the next peak; these were the fragments (F1, F2, and F3) that correspond to the first 105 residues from the NH2 terminus of the μ chain, for which the sequence has been published.10 Small peptides were isolated from the latter part of the Sephadex eluate. One of these (F4) was a tyrosine-rich peptide (Ala,Gly,Tyr5,Met) that was isolated in good yield. Other small peptides resulting from secondary cleavage by the strong acid used in the CNBr reaction were isolated in low yield. These included a tetrapeptide from


FIGURE 2. Schematic diagram of the polypeptide chain structure of a monomeric unit of the human IgM macroglobulin Ou. The κ light chains and the μ heavy chains are each divided into a variable region (zigzag lines) at the NH2 terminus and a constant region (straight line). Trypsin cleaves the IgM pentamer into monomeric Fabμ fragments and a decameric Fcμ portion of the heavy chain. Tryptic cleavage occurs in the hinge region just before the first μ-μ interchain disulfide bridge. Oligosaccharides designated C1 to C5 are located on each μ chain. (For the location of C1, C2, and C5 in the amino acid sequence, see FIGURE 3.) C3 and C4 are both in the Fcμ region, but their relative position is undetermined. Complex oligosaccharides (C1, C2, and C3) are indicated by solid circles and simple oligosaccharides (C4 and C5) by stippled circles. Fabμ consists of the NH2-terminal portion of the μ chain (designated Fdμ) and the whole κ light chain. Fdμ extends from the NH2 terminus of the μ chain to the break in the solid line just before C2. Fcμ constitutes the remainder of the μ chain to the COOH terminus. (From A. Shimizu, C. Paul, H. Kohler, T. Shinoda & F. W. Putnam.13 Reproduced by permission of Nature New Biology.)


the NH2 terminus of the κ light chain, which was obtained in low yield because of the slow cleavage of the Met-Thr bond by CNBr.

The material in the first peak was pooled, dried, reduced in 6 M guanidine, and alkylated with ethylenimine to break the disulfide bonds and convert the half-cystines to S-aminoethylcysteine (AE-cysteine); it was then reapplied to Sephadex G-100 under the same conditions. Three peaks were obtained. The first proved to be the light chain on the basis of its amino acid composition and size. The second peak was the only one with a high carbohydrate content; the amino acid composition of this component fits the sequence of fragment F5. The sequence of the NH2-terminal 25 residues of F5 was determined with the Beckman sequencer Model 890. F5 and the third peak, F6, were digested with chymotrypsin.

In addition, the whole μ chain was cleaved with trypsin and with thermolysin in separate experiments.10,11 The whole μ chain and the Fcμ fragment were also cleaved with CNBr, then reduced in 6 M guanidine and alkylated with ethylenimine. The CNBr fragments were separated as for Fabμ and were then digested with trypsin.
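The fragmentation strategy above rests on the specificities of the cleaving reagents: CNBr cuts on the COOH-terminal side of methionine, and trypsin cuts after lysine and arginine (and, once the half-cystines have been aminoethylated with ethylenimine, after AE-cysteine as well), generally not before proline. The following is a minimal in-silico sketch under those simplified rules. The residue code `Aec` for AE-cysteine and the example sequence are hypothetical, and side reactions such as the slow Met-Thr cleavage noted above are only flagged, not modeled.

```python
"""Simplified in-silico CNBr and trypsin digests (illustrative sketch only)."""


def digest(residues, cleave_after, blocked_before=frozenset()):
    """Split a list of three-letter residue codes after every residue in
    `cleave_after`, unless the following residue is in `blocked_before`."""
    fragments, current = [], []
    for i, res in enumerate(residues):
        current.append(res)
        nxt = residues[i + 1] if i + 1 < len(residues) else None
        if res in cleave_after and nxt is not None and nxt not in blocked_before:
            fragments.append(current)
            current = []
    if current:
        fragments.append(current)
    return fragments


def cnbr(residues):
    # CNBr cleaves on the COOH-terminal side of Met (the Met becomes
    # homoserine/homoserine lactone, which this sketch does not model).
    return digest(residues, {"Met"})


def trypsin(residues):
    # Trypsin cleaves after Lys and Arg; "Aec" is a hypothetical code for
    # S-aminoethylcysteine, which trypsin treats like lysine. No cut before Pro.
    return digest(residues, {"Lys", "Arg", "Aec"}, blocked_before={"Pro"})


def slow_met_thr_bonds(residues):
    # Indices of Met residues followed by Thr: bonds the text reports CNBr cleaves slowly.
    return [i for i in range(len(residues) - 1)
            if residues[i] == "Met" and residues[i + 1] == "Thr"]


# Hypothetical example sequence, not taken from the Ou chains.
seq = ["Ala", "Lys", "Gly", "Met", "Thr", "Aec", "Ser", "Arg", "Pro", "Val"]
print(cnbr(seq))
print(trypsin(seq))
print(slow_met_thr_bonds(seq))
```

Here CNBr yields two fragments, trypsin yields three (the Arg-Pro bond is left intact), and the single Met-Thr bond at index 3 is flagged as slow.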

By the above procedures, chymotryptic peptides were obtained from F5 and F6, tryptic peptides from the whole μ chain and from the CNBr fragments of the whole μ chain and of Fcμ, and thermolysin peptides from the whole μ chain. These were in addition to the tryptic and chymotryptic peptides sequenced earlier10 from F1, F2, and F3 and the three small CNBr fragments, i.e., F4, the heptadecapeptide, and the COOH-terminal octapeptide (F11). All the peptides obtained by enzymatic cleavage were purified by a combination of gel filtration on Sephadex G-25 in N acetic acid, ion-exchange chromatography on Chromobeads or Dowex 1-X2 with the Technicon peptide autoanalyzer, paper chromatography, and paper electrophoresis. Conventional methods described previously were used to determine the sequence of the small peptides, and the Beckman sequencer was used to determine the sequence for 15 to 25 steps of various unblocked CNBr fragments and of Fcμ.

RESULTS AND DISCUSSION

The Variable Region

FIGURE 3 gives the sequence of the Fdμ portion of the μ chain (residues 1-213), continues through the hinge region after a gap, and also includes 31 residues at the COOH terminus of the μ chain. The 257-residue sequence includes CNBr fragments F1 through F6 and the beginning of F7. Much information has been obtained on the sequence of the CNBr fragments from F7 to the COOH terminus, but the data are still incomplete. The missing region extends for about 200 residues and includes two glycopeptides and probably one interchain disulfide bridge (the intersubunit bridge), for which the sequence has been completed. In the sequence shown, proof of the overlap is missing for the bond between Lys-213 and the valine following it; from zero to 60 additional residues may be present between the lysine and the valine. The peptides in this region were obtained in low yield because the COOH-terminal end of Fabμ was heterogeneous owing to secondary cleavage by trypsin. However, the NH2-terminal end of Fcμ isolated by this method was quite


[FIGURE 3 sequence panel: not legible in this transcript. Surviving labels mark CNBr fragments F1 to F10, the Fd/Fc junction, and disulfide bridges (S-S).]

FIGURE 3. Amino acid sequence of the Fdμ portion,14 the hinge region,12 and part of the COOH terminus13 of the μ heavy chain of the IgM immunoglobulin Ou. The CNBr fragments are denoted F1, F2, etc. The locations of three interchain disulfide bridges and of two intrachain disulfide bridges are shown; the positions of all of these are established, except for the second intrachain bridge, which is based on homology to other heavy chains and to light chains. Carbohydrate (CHO) is identified at three sites in the μ chain and is present at two other sites in the incomplete portion of the sequence represented as F7, F8, and F9. The overlap of F3 and F4 is based on homology to other heavy chains; that after Lys-213 is not yet established.

homogeneous by physicochemical and immunochemical criteria and showed a single sequence with the sequencer.
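Ordering peptides rests on overlaps: a peptide from one cleavage that spans the junction between two peptides from another cleavage fixes their order, and it is exactly such a spanning peptide that is lacking for the Lys-213/Val bond. The sketch below joins two peptides, for example a tryptic and a chymotryptic one, on the longest suffix of the first that equals a prefix of the second. The peptides and the `min_overlap` threshold are hypothetical, and an actual assignment would also weigh composition and yield.

```python
def merge_by_overlap(left, right, min_overlap=3):
    """Join two peptides (lists of residue codes) on the longest suffix of `left`
    that equals a prefix of `right`; return None if no overlap of at least
    `min_overlap` residues exists."""
    min_overlap = max(min_overlap, 1)  # an empty overlap proves nothing
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


# Hypothetical peptides from a tryptic and a chymotryptic digest of one chain.
tryptic_1 = ["Gly", "Leu", "Tyr", "Thr", "Ser", "Lys"]
tryptic_2 = ["Val", "Asp", "Phe"]
chymotryptic = ["Thr", "Ser", "Lys", "Val", "Asp", "Phe"]

print(merge_by_overlap(tryptic_1, chymotryptic))  # the chymotryptic peptide spans the Lys-Val junction
print(merge_by_overlap(tryptic_1, tryptic_2))     # two tryptic peptides alone do not overlap
```

The second call returns `None`: without a peptide spanning the junction, the order of the two tryptic peptides remains unproved.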

The significant features of the structure shown are (1) the variable sequence from the NH2 terminus through about residue 123, (2) the disulfide bridge between the light and heavy chains at Cys-140, (3) the first part of the constant region (Cμ1), which contains the second intrachain disulfide loop ending at Cys-212, (4) the hinge region, including the junction of Fdμ and Fcμ and the first heavy-heavy interchain disulfide bridge, and (5) the 31-residue COOH-terminal sequence, including an oligosaccharide and another heavy-heavy interchain disulfide bridge.


The Subgroup-Specific Region

The variable-sequence region of the Ou μ chain belongs to the VHII subgroup4,10,11 and has a homology of 61-75% with the first 100 residues of three human γ1 heavy chains6,8 of the same variable-sequence subgroup, i.e., Cor, Daw, and He. In the same region, however, the Ou μ chain has a homology of 47% with the Nie γ1 chain,9 which is of subgroup VHIII, and of only 31% with the Eu γ1 chain,7 which is of subgroup VHI. This high degree of homology in the NH2 terminus of human μ and γ1 heavy chains of the same variable-sequence subgroup contrasts with the low degree of homology in the constant (C) regions of the two different classes of heavy chains. In their constant regions, all four γ1 chains have an identical sequence except for one (possibly two) of the 332 residues, but they have only about 30% homology with the constant region of the μ chain.

The relationships among these proteins are illustrated in FIGURE 4, which shows the amino acid sequences of the Ou μ chain and of the γ1 chains Cor, Daw, He, and Eu. TABLE 1 gives a matrix of the number of identities in the first 100 residues of all of these proteins. To obtain the maximum identity in sequence among the chains compared in FIGURE 4, the half-cystines

FIGURE 4. Comparison of the NH2-terminal sequence of the first 105 residues of the human μ chain Ou with corresponding sections of the γ1 heavy chains Cor, Daw, He, and Eu. Residues identical in two or more of the proteins are enclosed in boxes. A few gaps have been introduced in the sequences to secure the maximum number of identities. In all the proteins, the half-cystine residues corresponding to Cys-22 and Cys-97 in the Ou μ chain were placed in register because they are linked together in an invariant disulfide bond. Data for the γ1 heavy chains are from various sources.6-8


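TABLE 1

Identities (number of identical residues in the first 100 positions)

Protein    Ou     Cor    Daw    He     Nie    Eu
Subgroup   VHII   VHII   VHII   VHII   VHIII  VHI

Nie        47     47     45     40     100
Eu         31     34     30     28     53     100

(The rows for Ou, Cor, Daw, and He are not legible in this transcript.)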

that form the first intrachain bridge were placed in register (see Cys-22 and Cys-97 in the μ chain), and a few gaps were put in the sequences to get the best fit. It is interesting that in some of the proteins these gaps coincide; this suggests that the gaps reflect true mutational events. The fact that large blocks of sequence coincide in the V regions of the VHII subgroup proteins indicates that heavy chains of different classes, such as μ and γ1, may be more closely related in their variable regions than are heavy chains of the same class. All of these heavy chains, however, share a common ancestry, for even in their variable regions all five have 23 amino acid residues in common. Indeed, most of these residues are also shared by light chain V regions, indicating that all variable-region genes derive from a common ancestor.
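The entries of TABLE 1 and the count of residues shared by all five chains are simple tallies over a gapped alignment. A minimal sketch, assuming each chain is given as an equal-length list of three-letter codes with `-` marking an inserted gap. The gap symbol and the second, altered chain are hypothetical; the first chain is the quasi-invariant Ou segment Asp-Thr-Ala-Thr-Tyr-Tyr-Cys-Ala-Arg quoted later in the paper.

```python
GAP = "-"


def identities(a, b):
    """Number of aligned positions at which two chains carry the same residue.
    Gap positions never count as identities."""
    if len(a) != len(b):
        raise ValueError("aligned chains must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y and x != GAP)


def shared_positions(alignment):
    """Indices at which every chain in the alignment has the same (non-gap) residue."""
    return [i for i, column in enumerate(zip(*alignment))
            if column[0] != GAP and all(r == column[0] for r in column)]


ou = ["Asp", "Thr", "Ala", "Thr", "Tyr", "Tyr", "Cys", "Ala", "Arg"]
variant = ["Asp", "Thr", "Ala", GAP, "Tyr", "Tyr", "Cys", "Ala", "Lys"]  # hypothetical chain
n = identities(ou, variant)
print(n, f"{100 * n / len(ou):.0f}% identity")
print(shared_positions([ou, variant]))
```

Over exactly 100 aligned positions, as in TABLE 1, the identity count and the percentage coincide; with gaps, the choice of denominator (aligned length or residues compared) is a convention that should be stated.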

The Four Variable-Region Subgroups of Heavy Chains

Although FIGURE 4 compares the NH2-terminal sequences for only two variable-region subgroups of heavy chains, four subgroups of homologous sequence can be recognized by comparing as few as the first 20 residues. This is shown in FIGURE 5, which compares the NH2-terminal sequences of a number of μ and α chains determined in our laboratory with similar data on γ1 and γ4 chains studied by other workers.16-20 In the first group (VHI), the μ chain Di from our laboratory is aligned with γ1 and γ3 chains reported elsewhere;16,17 about 75% of the residues are identical. The second group (VHII) compares the first 20 residues of the Ou μ chain and three of the γ1 chains6,8 for which more extensive sequences are given in FIGURE 4. The identity in sequence for the first 20 residues of the VHII subgroup varies from 66 to 75%. As shown in FIGURE 4 and TABLE 1, the same correspondence in sequence continues throughout the first 100 residues of these four members of the VHII subgroup. Furthermore, the distance in sequence between Eu and any of these proteins in the first 20 residues is about the same as it is throughout the first 100. This shows that the amino acid sequence characteristic of


[FIGURE 5. Variable Subgroups of α, γ and μ Chains: aligned NH2-terminal sequences (residues 1-20) for subgroups VHI, VHII, VHIII, and VHIV. The sequence panel is not legible in this transcript; legible row labels include Eu, Di, Ou, Cor, He, Re, and the rabbit allotypes Aa1 and Aa3.]

FIGURE 5. Comparison of the four variable subgroups from the γ, α, and μ chains of human myeloma globulins and Waldenström macroglobulins and from normal rabbit γ globulin allotypes. Underlined residues differ from the sequence in the first line of each subgroup. Sequences determined in this work are for Di, Ou, Wo, Na, Ha, and Re.4 The other proteins were sequenced by other workers.16-20 (From H. Kohler, A. Shimizu, C. Paul, V. Moore & F. W. Putnam.4 Reproduced by permission of Nature.)

subgroups VHI and VHII can be identified from the first 20 residues of heavy chains, just as they can for the subgroups of light chains.
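Assigning a chain to a subgroup from its first 20 residues amounts to finding the subgroup prototype it matches best. The sketch below scores a query against each prototype by positional identity over a fixed window. The prototypes and query are short hypothetical fragments, not the sequences of FIGURE 5, and the window is shortened to six residues to keep the example small.

```python
def identity_in_window(query, prototype, window):
    # Count positions within the first `window` residues where the two sequences agree.
    return sum(1 for q, p in zip(query[:window], prototype[:window]) if q == p)


def assign_subgroup(query, prototypes, window=20):
    """Return (subgroup, identities) for the prototype sharing the most residues
    with `query` over the first `window` positions (ties go to the first listed)."""
    best = max(prototypes,
               key=lambda name: identity_in_window(query, prototypes[name], window))
    return best, identity_in_window(query, prototypes[best], window)


# Hypothetical six-residue prototypes; not the FIGURE 5 data.
prototypes = {
    "subgroup A": ["PCA", "Val", "Gln", "Leu", "Val", "Gln"],
    "subgroup B": ["Glu", "Val", "Gln", "Leu", "Val", "Glu"],
}
query = ["Glu", "Val", "Gln", "Leu", "Leu", "Glu"]
print(assign_subgroup(query, prototypes, window=6))
```

A blocked (PCA) versus unblocked NH2 terminus, which the text uses to separate VHIII from VHI and VHII, could be added as a further feature; as the Nie example shows, such features are suggestive rather than decisive.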

The third subgroup that we have identified is designated VHIII; it includes all three major classes of heavy chains, γ, α, and μ (including the subclasses γ4 and α1). Recently, an IgE myeloma protein has been reported to be a member of this subgroup.21 A major characteristic of the VHIII subgroup is the presence of an unblocked NH2-terminal residue, generally glutamic acid. This enables direct determination of the NH2-terminal sequence with the automatic amino acid sequencer rather than through indirect procedures requiring prior isolation of a blocked NH2-terminal peptide, as is the case for the VHI and VHII subgroups. As a result, since our first report of this new subgroup,22 a great deal of sequence data on VHIII myeloma globulins of the IgG and IgA classes and on Waldenström macroglobulins of the IgM class has been amassed with the sequenator. This approach has been extended with great success by Capra and others to the structural study of monoclonal autoantibodies,23,24 as reported in this monograph and elsewhere.

The variable region of only one heavy chain assigned to the VHIII subgroup has been fully sequenced.9 This is a γ1 chain from the myeloma globulin Nie.


Surprisingly, this heavy chain has a blocked NH2 terminus; it begins with pyrrolidonecarboxylic acid (PCA), just like the members of subgroups VHI and VHII. Although it is assigned to subgroup VHIII, Nie is homologous to the VHI chain Eu in their first 30 residues. This indicates a need for complete sequence information on more heavy chains before a permanent subgroup classification can be established.

The fourth subgroup we have proposed (VHIV) is miscellaneous, for it comprises the predominant sequence of two γ chain allotypes from the rabbit and a number of short μ chain sequences in man, including the μ chain Re from our laboratory. This heterogeneous subgroup, which may have to be subdivided in the future, has a serine residue at position 2 as its most characteristic feature. Unfortunately, no human heavy chains of this subgroup have yet been sequenced, and the data from the rabbit represent a “majority sequence” because of the heterogeneity in primary structure of normal immunoglobulins.

As has been pointed out elsewhere,11 the class character of heavy chains is not expressed in the variable region (VH) but only in the constant region (CH). This is in contrast to light chains, where both the variable and constant regions carry the signature of the light chain type, κ or λ. Formulae for immunoglobulins were recently proposed by a WHO committee25 in which the variable regions of IgG, IgA, and IgM were provisionally designated Vγ, Vα, and Vμ. The results shown in FIGURES 4 and 5 and elsewhere in this monograph show that the variable regions of γ, α, and μ heavy chains do not fall into groups that can be classified by the nature of the C region of the heavy chain, but rather into subgroups that are independent of the C regions. On the other hand, as shown later in the text, the C regions of the human heavy chains are characteristic of the class of the chain and differ widely from each other. As a result, the C regions of γ, α, and μ chains may be designated Cγ, Cα, and Cμ, respectively.

Phylogenetic Tree of Variable-Region Subgroups of Heavy Chains

Using the data in TABLE 1, one can construct a diagram that depicts the structural homology of the variable regions of these μ and γ1 heavy chains in the form of a phylogenetic tree. Each branch in the tree represents one protein. The length of the branch from a common node represents the mutation distance from a presumed common ancestral sequence of the two proteins branching from that node. By summing the lengths of the branches, including the lengths of the arms separating the intervening nodes, one can estimate the mutation distance between any pair of proteins. Strictly, the data plotted should be nucleotide differences (corrected for mutations to synonymous codons, etc.), but the figures shown in FIGURE 6 are amino acid sequence differences, uncorrected for the insertion of gaps to obtain the homology shown in FIGURE 4. We are undertaking a computer analysis of these and other data to construct a better phylogenetic tree; however, FIGURE 6 expresses graphically the structural relationships among the VH regions of these proteins. Ou, Daw, and Cor represent a subfamily of the VHII subgroup that is somewhat separated from He. Nie, representing the VHIII subgroup, is intermediate in homology, and thus in mutation distance, from the VHII subgroup, and Eu, which is in the VHI subgroup, is the furthest removed from He of all the proteins compared.
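The path-sum rule for reading mutation distances off the tree can be stated in a few lines. In this sketch, the tree is a hypothetical mapping from each node to its parent and the length of the branch joining them; the node names and branch lengths are illustrative, not the values of FIGURE 6. In the authors' tree, such path sums are meant to approximate the pairwise differences, i.e., 100 minus the identities of TABLE 1.

```python
def distances_to_ancestors(node, parent):
    """Map `node` and each of its ancestors to the summed branch length from `node`."""
    dist, total = {node: 0.0}, 0.0
    while node in parent:
        up, length = parent[node]
        total += length
        dist[up] = total
        node = up
    return dist


def tree_distance(a, b, parent):
    """Sum of branch lengths on the path between leaves `a` and `b`."""
    from_a = distances_to_ancestors(a, parent)
    node, total = b, 0.0
    while node not in from_a:  # climb from b until reaching an ancestor of a
        up, length = parent[node]
        total += length
        node = up
    return total + from_a[node]


# Hypothetical tree: leaves P and Q join at node n1, which joins leaf R at the root.
parent = {"P": ("n1", 10), "Q": ("n1", 12), "n1": ("root", 8), "R": ("root", 25)}
print(tree_distance("P", "Q", parent))  # 22.0
print(tree_distance("P", "R", parent))  # 43.0
```

Fitting the branch lengths so that every pairwise path sum approximates the observed differences is the adjustment the FIGURE 6 caption describes; that fitting step is not attempted here.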


FIGURE 6. Schematic phylogenetic tree for the variable regions of human μ and γ1 heavy chains. The figure illustrates the differences in amino acid sequence for the first 100 residues at the NH2 terminus of the μ chain Ou and the five γ1 chains Cor, Daw, He, Nie, and Eu. The length of the branch from a common node represents the number of amino acid substitutions from an ancestral sequence assumed to be common to the two proteins branching from that node. The sum of the lengths of the branches between any two proteins, including the lengths of the arms separating intervening nodes, gives an estimate of the difference in sequence in the first 100 residues of the variable regions for that pair of proteins. The figure is based on the matrix of identities (TABLE 1), adjusted slightly to get the best fit of the tree for all pairwise combinations of the six proteins. Data for the γ1 heavy chains are from various sources.6-9

Similar phylogenetic trees have been constructed from computer comparisons of the cytochromes c and the hemoglobins from many different species.26,27 They bear a remarkable resemblance to the tree shown in FIGURE 6 for the variable regions of heavy chains from a single species. The average mutation rate based on the evolution of proteins such as the cytochromes and globins has been calculated to be equal to the fixation of one amino acid substitution per 140 residues per 10^7 years.26,27 If the structural divergence of the variable regions of human heavy chains is assumed to have occurred through evolutionary mutation at the same rate, the genes for the VH regions of the Ou μ chain and the Eu γ1 chain would have begun to diverge several hundred million years ago. The VH regions of even closely related γ1 chains such as Daw and Cor differ by more than 20 residues; this is greater than the difference in sequence20 between human and frog cytochrome c. Clearly, the differentiation in structure of the variable regions of human heavy chains has occurred over eons of evolution, or at a rate far faster than for any other known protein, or by some somatic mechanism of hypermutation that is unique to immunoglobulin genes. The same mechanism must have given rise to the variability in the V regions of light chains, for, based on earlier data from our laboratory and others,27,28 similar phylogenetic trees have been drawn from computer comparison of the variable-sequence regions of human light chains.
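The "several hundred million years" estimate follows from simple arithmetic, t = d / (2 × n × r), where d is the number of differences among n compared residues and r is the substitution rate per residue per year. A minimal sketch, assuming (1) the quoted rate of one fixed substitution per 140 residues per 10^7 years applies independently along each of the two diverging lineages, so pairwise differences accumulate at twice that rate, and (2) no correction for multiple substitutions at one site or for gaps. From TABLE 1, Ou and Eu share 31 of their first 100 residues and so differ at 69; for Daw and Cor, the text's "more than 20 residues" is used as a lower bound.

```python
def divergence_time_years(differences, residues=100,
                          residues_per_substitution=140, years_per_unit=1e7,
                          lineages=2):
    """Years since two sequences diverged, assuming a constant substitution rate.

    Rate: one substitution per `residues_per_substitution` residues per
    `years_per_unit` years on each lineage (assumption: the quoted rate is
    per lineage, hence the factor `lineages`). No multiple-hit correction.
    """
    rate_per_residue_per_year = 1.0 / (residues_per_substitution * years_per_unit)
    return differences / (residues * rate_per_residue_per_year * lineages)


ou_vs_eu = divergence_time_years(100 - 31)   # 69 differences in 100 residues
daw_vs_cor = divergence_time_years(20)       # lower bound: "more than 20 residues"
print(f"Ou/Eu:   about {ou_vs_eu / 1e6:.0f} million years")
print(f"Daw/Cor: more than {daw_vs_cor / 1e6:.0f} million years")
```

This gives roughly 483 million years for Ou/Eu and more than 140 million years for Daw/Cor. Reading the rate as a pairwise rate instead (`lineages=1`) doubles both figures; either way the Ou/Eu estimate falls in the hundreds of millions of years, in line with the text.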


The Hypervariable Deletion Region

Special interest attaches to the comparative structure of μ and γ chains in the region from residues 91 to 123 (FIGURE 7), for this region contains three features that may specify the antigen-combining site: (1) a nearly constant sequence before Cys-97, including quasi-invariant tyrosine residues that have been implicated in the antibody active site by affinity-labeling experiments; (2) a highly variable segment that may contain a series of deletions at the very point where some long deletions begin in several “heavy chain disease” proteins; and (3) a sequence around residue 115 that is shared by subgroup VHII chains just before the onset of the C region. All these factors suggest that the essential features of the antigen-combining site are probably similar in all antibodies and that specificity is largely conferred by short regions of hypervariability near the two half-cystines, which are brought into close proximity by the intrachain disulfide bridge of the V region (FIGURE 8).

The end of the 100-residue subgroup-specific region is marked by a very conservative sequence, including the half-cystine at residue 97 that terminates

[FIGURE 7 sequence panel: not legible in this transcript. Rows are Ou (μ, VHII), Cor, Daw, and He (γ1, VHII), Eu (γ1, VHI), Nie (γ1, VHIII), rabbit γ, and mouse MOPC-173; the segment Gly-Tyr-Tyr-Tyr-Tyr-Tyr-Met appears near position 110, and some positions are marked “Not determined.”]

FIGURE 7. Amino acid sequence of the hypervariable deletion region of human μ and γ1 chains and of rabbit and mouse γ chains. (For sources of data, see text.) Gaps have been inserted to maximize the homology. Identical residues at the same position are enclosed in boxes with solid lines.


FIGURE 8. Photograph of part of a molecular model of the human μ chain Ou depicting the two hypervariable regions. The NH2 terminus of the μ chain is at the lower left. The disulfide bridge between Cys-22 and Cys-97 in the variable region joins the chain into a loop at the top. The first hypervariable region, around position 30, is at the top left. The hypervariable deletion region containing the five tyrosine residues is at the lower right. Since the three-dimensional structure is not known, the μ chain is shown in a planar conformation.

the intrachain disulfide bridge of the variable region. This conservative section is believed to be located close to the antibody combining site. The quasi-invariant sequence Asp-Thr-Ala-Thr-Tyr-Tyr-Cys-Ala-Arg-, which concludes the first 100 residues of the VH region of Ou, was first reported for the μ chain10 and later determined at the corresponding position in the γ1 chains Daw and Cor.6 In this conservative area, even the rabbit41 and mouse γ chains43 are more similar to these three human proteins of subgroup VHII than is Eu. A Thr-Tyr dipeptide, which was labeled in the heavy chain of rabbit anti-DNP antibodies by the affinity-labeling technique, is thought by Thorpe and Singer20 to be the Thr-Tyr sequence located at this site.

Immediately following the disulfide loop there is a hypervariable region that appears to reflect many deletions, because from one to ten gaps must be placed here in different heavy chains to bring them into register both at Cys-97 and at the beginning of the constant region. As seen in FIGURE 7, there is essentially no homology among the human heavy chains in the hypervariable region, even when a number of gaps have been inserted to align the sequences. In the six human heavy chains, five different amino acid residues are present at positions 100 and 102, and six at position 101. This hypervariability is not restricted to the human chains, for no satisfactory sequence has yet been found for positions 100-110 in the normal rabbit γ chain, although a majority sequence could be determined for most of the rest of the chain. Porter has speculated: “We may have IgG molecules with [illegible] to 10^20 different


sequences in this very unstable region, able to combine with sufficiently high affinity as to be recognizable as antibodies for an equivalent number of different antigen structures.”
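The counts of five different residues at positions 100 and 102, and six at position 101, are tallies of distinct amino acids in each column of the alignment. A minimal sketch, assuming chains are aligned as equal-length lists with `-` for gaps; the three-chain alignment below is hypothetical, not the FIGURE 7 data.

```python
def distinct_residues_per_position(alignment, gap="-"):
    """For each aligned position, count the different amino acids present, ignoring gaps."""
    return [len({r for r in column if r != gap}) for column in zip(*alignment)]


# Hypothetical three-chain alignment over three positions.
alignment = [
    ["Gly", "Tyr", "Asp"],
    ["Ser", "Tyr", "-"],
    ["Arg", "Tyr", "Asn"],
]
print(distinct_residues_per_position(alignment))  # [3, 1, 2]
```

A column in which every chain differs scores the number of chains, as position 101 does for the six human chains, while an invariant column scores 1.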

Capra24 has called attention to the presence of a hypervariable region around position 30 in human μ and γ heavy chains, just inside the intrachain disulfide loop beginning with Cys-22. Evidence for such a hypervariable region with deletions is seen around positions 31 and 32 in FIGURE 4. As shown in FIGURE 8, which gives a molecular model of part of the Ou μ chain, the two hypervariable regions are drawn together by the disulfide bond. This is analogous to the situation in light chains. As we illustrated earlier with molecular models,3,31 the hypervariable regions in human light chains are brought into close proximity by the intrachain disulfide bridge between Cys-23 and Cys-88 in κ chains and between Cys-21 and Cys-86 in λ chains. The hypervariable regions of light chains also have deletions like those illustrated in FIGURE 7 for the heavy chains. In the λ chain, these occur around positions 27 and 95 and are thought to be characteristic of subgroups of λ chains.32,33 Furthermore, in both light and heavy chains these are the areas most reactive with affinity-labeling reagents for the antibody-combining site.29,30 Thus, except for the fact that the intrachain loop of the variable region is about ten residues longer in heavy chains than in light chains, the two chains are closely homologous in structure and complementary in function. This suggests that they had a common evolutionary origin during the development of primitive immunoglobulin genes.

A unique feature of the μ chain hypervariable region is the presence of the pentatyrosine sequence (FIGURE 8). This has not been found in any other immunoglobulin or other protein, but the Cor γ1 heavy chain does have an Ala-Gly-Tyr-Met sequence in the deletion region.

We have referred to this part of the VH sequence as the deletion region because from four to ten gaps must be placed in the γ1 chain sequences relative to the Ou μ chain in order to bring the characteristic Val-Thr sequence into register just before the onset of the constant region. At least one heavy-chain disease protein has a deletion in this region.³⁴ Smithies and coworkers³⁴ have pointed to such deletions in immunoglobulin polypeptide chains as evidence for breakage and repair of DNA, and suggest that this may be one mechanism through which antibody variability is generated.
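Placing gaps so that conserved landmarks such as Cys-97 or the Val-Thr sequence fall into register is, in modern terms, a global alignment problem. The sketch below uses the Needleman-Wunsch dynamic-programming algorithm, swapped in here purely for illustration; the paper does not describe how its alignments were computed. The scoring values (+1 identity, 0 mismatch, -1 per gap), the function name, and the toy sequences are all assumptions.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = 0,
                 gap: int = -1) -> tuple[int, str, str]:
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap placed in b
                score[i][j - 1] + gap,             # gap placed in a
            )

    # Trace back from the bottom-right cell, preferring the diagonal on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy example: the second sequence lacks one residue, so a single gap keeps
# the downstream residues (ending in V-T-V) in register.
print(global_align("ACDWGQVTV", "ACWGQVTV"))  # (7, 'ACDWGQVTV', 'AC-WGQVTV')
```

With a linear gap penalty, one long deletion costs the same as several scattered gaps of the same total length; an affine penalty (separate opening and extension costs) is a common refinement when deletions are expected to be contiguous.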

After the deletion region and just prior to the constant region, Subgroup VHII heavy chains share a sequence closely homologous to the Asp-Val-Trp-Gly . . . Val-Thr- sequence of the Ou μ chain. This terminates the variable region. The constant region of γ1 chains begins with the sequence Val-Ser-Ser-, corresponding to position 115 in the Eu numbering system (FIGURE 9). The same starting sequence appears to begin the constant region at position 124 in the Ou μ chain.

THE CONSTANT REGION

We do not yet have complete evidence for the beginning of the C region of the μ chain, but we propose that it begins at or about position 124 with the sequence Val-Ser-Ser-. This is based on data from various IgM proteins that we and others have studied, for example, the fact that the sequence around the light-heavy interchain disulfide bridge appears to be identical in several of these proteins.¹¹,³⁶⁻³⁸ Also, regardless of the class of the chain, there is a very strong homology in sequence among all four proteins of the VHII variable-sequence subgroup just prior to the onset of the C region of the γ1 chain. Of course, complete sequence analysis of several μ chains, which we are doing, will be required to establish the exact point of division between the VH and CH regions of human μ chains.

The beginning of the C region in κ and λ light chains has a short, almost identical sequence (Ala-Ala-Pro-Ser-Val), which is thought to reflect a recognition point, i.e., a sequence of codons for the union of V and C genes during translocation. Although this sequence is lacking in heavy chains, it can be reconstructed by joining together portions of the sequences of human μ and γ1 chains at the beginning of the C region (FIGURE 9). This vestigial homology is yet another indication of a common evolutionary origin of the genes for light and heavy chains.

The most unexpected finding from sequence analysis of the Ou μ chain is the surprisingly low homology of the constant portions of the human μ and γ1 chains. This weak homology, illustrated in FIGURE 9, made the sequence analysis of the μ chain a completely independent project, unlike the case of the human γ1 chain, which showed 65% homology to the Fc portion of the rabbit γ chain determined earlier by Hill and associates.³⁹

The Light-Heavy Interchain Disulfide Bridge

Although there is some similarity at the beginning of the C regions of the μ and γ1 chains in the Val-Ser-Ser sequence (FIGURE 9), the two chains differ significantly in the location of the disulfide bridge to the light chain. In the γ1 chain this bridge lies some 90 residues further along, at Cys-220, whereas in the μ chain, as in the human γ2, γ3, and γ4 chains,⁴⁰ the rabbit γ chain,²¹ the guinea pig γ2 chain,⁴² and the mouse γ2a and γ2b chains, the light-heavy disulfide bridge is made through a half-cystine homologous with Cys-140 in the human μ chain (FIGURE 9). As FIGURE 9 also shows, all of these chains are strongly homologous from the beginning of the C region up to the next intrachain disulfide loop (Cys-144 in the γ chains). Thereafter, the homology diminishes rapidly.

The First Intrachain Loop of the Constant Region

As shown in FIGURE 3, the first intrachain loop of the C region of the μ chain extends from Cys-153 to Cys-212, thus enclosing 58 residues, compared to 55 in the human γ1 chain and only 50 in the rabbit γ chain. In both light and heavy chains the average length of the loops is about 55-60 residues. As illustrated previously,³,³¹ these loops exert a profound influence on the conformation of immunoglobulins and undoubtedly confer some of the affinity of light and heavy chains for each other.
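The loop sizes follow from simple arithmetic on the half-cystine positions: a bridge between Cys-i and Cys-j encloses j - i - 1 residues. The sketch below checks the 58-residue figure and, taking Cys-22 and Cys-97 as the ends of the Ou VH loop (as the discussion of the variable region implies; treated here as an assumption), compares it with the κ and λ VL loops, consistent with the statement that the heavy-chain V-region loop is about ten residues longer. The function and variable names are hypothetical.

```python
def residues_enclosed(cys_first: int, cys_second: int) -> int:
    """Residues lying strictly between two bridged half-cystines (same numbering scheme)."""
    return abs(cys_second - cys_first) - 1


# Positions quoted in the text; each chain uses its own numbering.
mu_ch1_loop = residues_enclosed(153, 212)     # first C-region loop of the Ou mu chain
mu_vh_loop = residues_enclosed(22, 97)        # assumed ends of the Ou VH loop
kappa_vl_loop = residues_enclosed(23, 88)     # kappa variable-region loop
lambda_vl_loop = residues_enclosed(21, 86)    # lambda variable-region loop

print(mu_ch1_loop)                                # 58
print(mu_vh_loop, kappa_vl_loop, lambda_vl_loop)  # 74 64 64
print(mu_vh_loop - kappa_vl_loop)                 # 10
```

This subtraction is only valid when the numbering counts actual residues; an alignment-based numbering that reserves positions for gaps would make the difference in position numbers differ from the true loop length.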

Within the first disulfide loop of the C region only a few identical residues can be aligned, even when frequent gaps are placed in the μ and γ chains to achieve maximum homology. Thus, only 15 residues of the human μ and γ1 chains are identical within the loop, less than half the number that coincide in the human and rabbit γ chains. Another major difference is the presence of a complex oligosaccharide on the μ chain at Asx-170. Because of its hydro-


FIGURE 10. Amino acid sequence of the hinge region of the human μ and γ1 chains,⁷,¹² the rabbit γ chain,⁴¹ and the guinea pig γ2 chain.⁴² [Aligned sequences not recoverable from the extracted text.]

μ chain. In this 44-residue segment, the μ chain has only three proline residues, in contrast to nine in both the human γ1 and the rabbit γ chains. In the guinea pig γ2 chain there are 12 prolines in the hinge region.⁴² Another characteristic of the μ chain hinge region is the large complex polysaccharide containing glucosamine that is present just before the heavy-heavy interchain bridge (see also FIGURE 3). Carbohydrate is not present in the hinge region of the human γ1 chain, but an oligosaccharide containing galactosamine occurs just before the interchain bridge in some rabbit γ chains.⁴¹ Thus, the hinge region is a singular differentiating characteristic of the μ and γ chains and must make an important contribution to both their physical and biological properties.
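Counts such as three prolines in the 44-residue μ hinge segment versus nine in the γ chains are tallies over sequences written in the three-letter notation used throughout this paper. A minimal sketch, assuming a hyphen-separated three-letter string as input and assuming the same 44-residue segment length for each chain being compared; the function names and the fragment below are illustrative only, and the fragment is not the aligned FIGURE 10 segment.

```python
def parse_three_letter(sequence: str) -> list[str]:
    """Split a hyphen-separated three-letter-code sequence such as 'Val-Ser-Ser'."""
    return [residue.strip() for residue in sequence.split("-") if residue.strip()]


def residue_fraction(sequence: str, residue: str = "Pro") -> tuple[int, float]:
    """Return (count, fraction) of one residue type in the sequence."""
    residues = parse_three_letter(sequence)
    if not residues:
        raise ValueError("empty sequence")
    count = sum(1 for r in residues if r == residue)
    return count, count / len(residues)


# Illustrative fragment in three-letter code (not an aligned hinge segment).
fragment = "Glu-Leu-Leu-Gly-Gly-Pro-Ser-Val-Phe-Leu-Phe-Pro-Pro-Lys-Pro-Lys"
print(residue_fraction(fragment))  # (4, 0.25)

# The counts quoted in the text as fractions of a 44-residue segment.
print(round(3 / 44, 3), round(9 / 44, 3))  # 0.068 0.205
```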

All of these factors, i.e., the proline content, the number of disulfide bonds, and the carbohydrate content, undoubtedly contribute to a conformational hiatus in the heavy chains.¹² This discontinuity, whether it is effected by the rigid distortion due to the proline peptide bonds, by the disulfide bridges, or by the stereochemical interference of the large oligosaccharide, results in a looseness or extended chain conformation. This, in turn, imparts an apparent flexibility to the structure and exposes susceptible bonds to the action of proteolytic enzymes. Although the distortion due to proline is absent in the μ chain, the large hydrophilic oligosaccharide requires that the hinge region be at the surface of the IgM pentamer. Heating IgM at 60° must produce a conformational change that makes the Arg-Gly bond accessible for proteolytic cleavage, because incubation with trypsin or thermolysin at 60°, but not at 37°, produces well-defined Fabμ and Fcμ fragments in good yield, with predominant cleavage at the site indicated in FIGURE 3.

Homology in Sequence of the Constant Portions of the Human μ and γ1 Chains

After the unexpected finding of the great similarity in sequence of the variable regions of the Ou μ chain and of certain γ1 chains now assigned to the VHII subgroup, it was all the more surprising to discover the low degree of homology of the constant portions of human μ and γ1 chains. Overall, the identity in sequence of the Cμ and Cγ regions is only about 30%, even when many gaps are placed to maximize the homology. This is shown in FIGURE 9. In the comparison of some 120 positions in the two chains, only 33 identities are found, although 18 gaps were placed in one chain or the other to maximize the coincidence of identical residues. This is about the same as the homology of the constant regions of either the human κ or λ light chain with either the human μ or γ1 heavy chain, and thus betokens an evolutionary relationship from common ancestral genes. For example, the identity in sequence of homologous sections of the constant regions of the human μ and λ chains (29%)¹⁴ is about the same as that of the μ and γ1 chains (31%, as shown in FIGURE 9). In comparison, the identity in primary structure of the constant regions of human κ and λ chains is 37%, and that for the constant regions of human γ1 and rabbit γ chains is about 65%.⁷ The figure does not change much if the guinea pig γ2 chain is compared, for about five-sixths of the residues common to the human μ and γ1 chains in the section shown in FIGURE 9 are also common to the two animal chains. These represent the sequences that have been preserved throughout the course of evolution. Of course, the percentages given here are not precise, because they do not allow for the gaps and because part of the μ chain sequence is incomplete, but they clearly illustrate the great difference in the primary structure of μ and γ chains despite the residual homology owing to their common origin and similarity in function.
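The percentages above depend on what is counted in the denominator, and the authors note that their figures are approximate partly because they do not allow for gaps. The sketch below makes that choice explicit: by default it computes percent identity only over positions where neither aligned sequence has a gap, and optionally counts every column. This convention, the function name, and the toy sequences are assumptions; the sketch does not reproduce the FIGURE 9 values.

```python
def percent_identity(a: str, b: str, gap: str = "-", count_gaps: bool = False) -> float:
    """Percent identity of two aligned, equal-length sequences.

    With count_gaps=False, columns containing a gap in either sequence are skipped;
    with count_gaps=True, every column counts toward the denominator.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if count_gaps or (x != gap and y != gap)]
    if not columns:
        raise ValueError("no comparable positions")
    identical = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * identical / len(columns)


# Toy aligned pair (hypothetical): one gap and one substitution.
print(percent_identity("AC-WGQVTV", "ACDWGKVTV"))  # 87.5
print(percent_identity("AC-WGQVTV", "ACDWGKVTV", count_gaps=True))
```

The same pair scores lower (7 of 9 rather than 7 of 8) when gap columns count as non-identities, which is why a stated identity figure is only comparable to another when both use the same convention.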

The Carboxyl-Terminal Sequence of the μ Chain

We have extensive data for the region of undetermined sequence represented by CNBr fragments F7, F8, and F9 in FIGURE 3. Although we do not yet have all the overlaps needed to assemble a complete sequence for the several hundred residues in this region, the degree of homology with the γ1 chain is no greater than that shown for the COOH-terminal sequence of the Fdμ portion in FIGURE 9. Had the homology been greater, it would have been much easier to put together at least a tentative sequence based on homology, as is so often done in structural studies of immunoglobulins.

The low degree of homology in the COOH-terminal sections of the μ and γ1 chains is illustrated by the last portion of the μ chain sequence shown in FIGURE 3. This contains the final 31 residues, including the octapeptide F11, earlier isolated as a CNBr fragment,¹⁰ and an interchain disulfide bridge for which only the composition was reported by Beale and coworkers.³⁵,³⁶ Despite the penultimate half-cystine, which gives the μ chain some resemblance to the κ and λ light chains, there is no strong homology in this region between the μ chain and either the κ or λ light chains or the γ1 heavy chain. Indeed, the homology of other immunoglobulin chains with the μ chain is generally greater within the disulfide loops than in the interchain regions.

Amino Acid Sequence of the Light Chain of IgM Ou and Complete Structure of the Fabμ Fragment

The sequence of the Fd portion of the μ chain shown in FIGURE 3, together with the sequence of the κ light chain published earlier, completes the determination of the Fabμ portion of IgM Ou, the portion that, in IgM antibodies, contains the specific antigen-combining site. The amino acid sequence of the light chain (FIGURE 11) is of the κI subgroup, for which the Bence-Jones protein Ag is the reference;⁴⁸ only 18 positions differ in the variable regions of these two κ light chains. The Ou light chain is even more closely related to the Bence-Jones protein Hau, as shown by the phylogenetic tree for κ light chains constructed by Hilschmann and coworkers.⁴⁹

This IgM protein is only the second immunoglobulin for which the complete amino acid sequence of the Fab portion has been determined, and the first among the IgM immunoglobulins. Hence, an opportunity was presented to compare the sequences of the variable regions of the light and heavy chains of the same molecule for two different classes of immunoglobulins. When this was done, the variable regions of the light and heavy chains of the macroglobulin Ou were found to be no more related in amino acid sequence than are the variable regions of the light and heavy chains of different immunoglobulin molecules. Thus, the light chains of the IgM globulin Ou and of the IgG1 myeloma globulin Eu are both of the κI subgroup, although the heavy chains are μ and γ1, respectively. Although the variable regions of the two light chains are closely homologous in sequence, the variable regions of the two heavy chains have only about 30% identity, as shown above. Thus, in these two immunoglobulins of different classes there is no closer similarity in sequence between the light and heavy chains of the same molecule than there is between the light and heavy chains of different molecules.

[FIGURE 11: Amino Acid Sequence of the Variable Part of IgM Ou Light Chain (κ); sequence not recoverable from the extracted text.]


Three Variable-Gene Pools Common to IgM, IgG, and IgA Immunoglobulins

The above results, as well as much evidence based on partial sequence analysis, show that the same kind of light chain can combine with heavy chains of different class and subgroup. Furthermore, different kinds of light chains can combine with heavy chains of the same class and subgroup. Any light-chain variable subgroup can apparently combine with any heavy-chain variable subgroup, leading to many possible permutations. The only apparent restriction is that hybrid molecules are not formed; that is, within any immunoglobulin molecule the pair of light chains is identical and the pair of heavy chains is identical. As a result, we have proposed⁴ the existence of three variable-gene pools. The first, the heavy-chain variable-gene pool, is common to all γ, α, and μ chains and comprises the four subgroups VHI, VHII, VHIII, and VHIV. The second variable-gene pool codes for κ chains only and consists of the three subgroups κI, κII, and κIII, and the third codes for λ chains and may have four or five subgroups. This proposal conforms with the hypothesis, advanced several times by us and others and frequently discussed in this monograph, that two genes, a V gene and a C gene, are required to code for each kind of polypeptide chain.
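Because any light-chain variable subgroup can apparently pair with any heavy-chain variable subgroup, the number of subgroup-level combinations is a simple product. The sketch below enumerates them using the subgroup counts given in the text (four VH, three κ, and four or five λ subgroups) and the three heavy-chain classes γ, α, and μ that share the VH pool. The λ subgroup labels are placeholders, the function names are hypothetical, and the result counts subgroups only, not individual sequences.

```python
from itertools import product

HEAVY_CLASSES = ["gamma", "alpha", "mu"]
VH_SUBGROUPS = ["VHI", "VHII", "VHIII", "VHIV"]
KAPPA_SUBGROUPS = ["kI", "kII", "kIII"]


def light_subgroups(n_lambda: int) -> list[str]:
    """kappa subgroups plus n_lambda placeholder lambda subgroups."""
    return KAPPA_SUBGROUPS + [f"lambda-{i + 1}" for i in range(n_lambda)]


def v_pairings(n_lambda: int) -> list[tuple[str, str]]:
    """All (heavy V subgroup, light V subgroup) pairs, assuming free combination."""
    return list(product(VH_SUBGROUPS, light_subgroups(n_lambda)))


def class_v_combinations(n_lambda: int) -> int:
    """V-subgroup pairings multiplied by the three heavy-chain classes."""
    return len(list(product(HEAVY_CLASSES, v_pairings(n_lambda))))


for n_lambda in (4, 5):
    print(n_lambda, len(v_pairings(n_lambda)), class_v_combinations(n_lambda))
# 4 28 84
# 5 32 96
```

Each molecule carries just one such combination, since hybrid molecules with two different light or two different heavy chains are not formed.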

The existence of a common variable-gene pool for μ, α, and γ chain biosynthesis would obviously reduce the number of genes needed for the immunoglobulins and thus would accord with a principle of economy in nature. The proposed variable-gene pool common to μ and γ chains could also explain the biological and serological relations between IgM and IgG immunoglobulins, i.e., the time sequence of IgM and IgG immunoglobulin biosynthesis and the cross-reactivity of the two classes.

The demonstration of a unique sequence for the variable regions of the light and heavy chains of the same IgM molecule, without any evidence for sequence ambiguity, excludes a duality in the primary structure of macroglobulins. Such a duality had been suggested by the finding of two categories of binding sites in the IgM pentamer, five weak and five strong. This heterogeneity of binding sites is thus probably attributable to stereochemical masking of sites in such multivalent molecules.

REFERENCES

1. PUTNAM, F. W. 1969. Science 163: 633.
2. METZGER, H. 1970. Advan. Immunol. 12: 57.
3. PUTNAM, F. W., K. TITANI, M. WIKLER & T. SHINODA. 1967. Cold Spring Harbor Symp. Quant. Biol. 32: 9.
4. KOHLER, H., A. SHIMIZU, C. PAUL, V. MOORE & F. W. PUTNAM. 1970. Nature 227: 1318.
5. WANG, A. C., J. R. L. PINK, H. H. FUDENBERG & J. OHMS. 1970. Proc. Nat. Acad. Sci. U. S. A. 66: 657.
6. PRESS, E. M. & N. M. HOGG. 1970. Biochem. J. 117: 641.
7. EDELMAN, G. M., B. A. CUNNINGHAM, W. E. GALL, P. D. GOTTLIEB, U. RUTISHAUSER & M. J. WAXDAL. 1969. Proc. Nat. Acad. Sci. U. S. A. 63: 78.
8. CUNNINGHAM, B. A., M. N. PFLUMM, U. RUTISHAUSER & G. M. EDELMAN. 1969. Proc. Nat. Acad. Sci. U. S. A. 64: 997.
9. PONSTINGL, H., J. SCHWARZ, W. REICHEL & N. HILSCHMANN. 1970. Hoppe-Seylers Z. Physiol. Chem. 351: 1591.
10. WIKLER, M., H. KOHLER, T. SHINODA & F. W. PUTNAM. 1969. Science 163: 75.
11. KOHLER, H., A. SHIMIZU, C. PAUL & F. W. PUTNAM. 1970. Science 169: 56.
12. PAUL, C., A. SHIMIZU, H. KOHLER & F. W. PUTNAM. 1971. Science 172: 69.
13. SHIMIZU, A., F. W. PUTNAM, C. PAUL, J. R. CLAMP & I. JOHNSON. 1971. Nature New Biol. 231: 73.
14. SHIMIZU, A., C. PAUL, H. KOHLER, T. SHINODA & F. W. PUTNAM. 1971. Science 173: 629.
15. [name illegible], S. I. & H. F. DEUTSCH. 1970. J. Biol. Chem. 245: 5534.
16. FISHER, C. E., W. H. PALM & E. M. PRESS. 1969. FEBS Lett. 5: 20.
17. FRANGIONE, B. & C. MILSTEIN. 1969. Nature 224: 597.
18. PINK, J. R. L. & C. MILSTEIN. 1969. FEBS Symp. 15: 177.
19. WILKINSON, J. M. 1969. Biochem. J. 112: 173.
20. BENNETT, J. C. 1968. Biochemistry 7: 3340.
21. TERRY, W. D., M. OGAWA & S. KOCHWA. 1970. J. Immunol. 105: 783.
22. KOHLER, H., A. SHIMIZU, C. PAUL, A. VAN DALEN & F. W. PUTNAM. 1970. Fed. Proc. 29: 275 (Abs.).
23. CAPRA, J. D. & H. G. KUNKEL. 1970. Proc. Nat. Acad. Sci. U. S. A. 67: 87.
24. CAPRA, J. D. 1971. Nature New Biol. 230: 62.
25. WORLD HEALTH ORGANIZATION. 1969. An extension of the nomenclature for immunoglobulins. Bull. World Health Org. 41: 975.
26. FITCH, W. M. & E. MARGOLIASH. Science 155: 279.
27. DAYHOFF, M. O. 1969. Atlas of Protein Sequence and Structure 1969. Nat. Biomed. Res. Found. Silver Spring, Md.
28. HOOD, L. & D. W. TALMAGE. 1970. Science 168: 325.
29. THORPE, N. O. & S. J. SINGER. 1969. Biochemistry 8: 4523.
30. PORTER, R. R. 1970. In Homologies in Enzymes and Metabolic Pathways and Metabolic Alterations in Cancer: 352-360. North-Holland. Amsterdam.
31. PUTNAM, F. W. 1967. Nobel Symp. 3: 45.
32. HILSCHMANN, N. 1969. Naturwiss. 56: 195.
33. PUTNAM, F. W. & H. KOHLER. Naturwiss. 56: 439.
34. SMITHIES, O., D. M. GIBSON, E. M. FANNING, M. E. PERCY, D. M. PARR & G. E. CONNELL. 1971. Science 172: 574.
35. BEALE, D. & N. BUTTRESS. 1969. Biochim. Biophys. Acta 181: 250.
36. BEALE, D. & A. FEINSTEIN. 1969. Biochem. J. 112: 187.
37. PINK, J. R. L. & C. MILSTEIN. 1967. Nature 214: 94.
38. FRANGIONE, B. 1971. This monograph.
39. HILL, R. L., R. DELANEY, R. E. FELLOWS, JR. & H. E. LEBOVITZ. 1966. Proc. Nat. Acad. Sci. U. S. A. 56: 1762.
40. FRANGIONE, B., C. MILSTEIN & E. C. FRANKLIN. 1968. Biochem. J. 106: 15.
41. FRUCHTER, R. G., S. A. JACKSON, L. E. MOLE & R. R. PORTER. Biochem. J. 116: 249.
42. BIRSHTEIN, B. K., Q. Z. HUSSAIN & J. J. CEBRA. 1971. Biochemistry 10: 18.
43. BOURGOIS, A. & M. FOUGEREAU. 1970. FEBS Lett. 8: 265.
44. DE PREVAL, C., J. R. L. PINK & C. MILSTEIN. 1970. Nature 228: 930.
45. DORRINGTON, K. J. & C. TANFORD. 1970. Advan. Immunol. 12: 333.
46. TURNER, K. J. & J. J. CEBRA. 1971. Biochemistry 10: 9.
47. SMYTH, D. S. & S. UTSUMI. 1967. Nature 216: 332.
48. PUTNAM, F. W., K. TITANI & E. WHITLEY, JR. 1966. Proc. Roy. Soc. (London) B166: 124.
49. WATANABE, S. & N. HILSCHMANN. 1970. Hoppe-Seylers Z. Physiol. Chem. 351: 1929.

DISCUSSION

DR. HOOD: Do you have any additional evidence that there is a fourth sub-group in heavy chains similar to the proteins that Bennett described a couple of years ago? I noticed you had it on your last slide.


DR. PUTNAM: Only the fact that we do have two proteins that begin with the same sequence that he has, and we judge upon that basis that there may be a fourth sub-group.

However, I think that that’s open to question and I agree with your suggestion that the more we look at these sub-groups the more we see them begin to divide into sub-sub-groups.

DR. HOOD: I think Wilkinson in looking at the data on the alpha chains in rabbits noted that there was an N terminal peptide he found in alpha chains not present in the gamma chains. In fact it was similar in sequence to this fourth sub-group and the interesting point that one can make is that all four or five of the proteins described in this putative sub-group are all in the μ class. I’m wondering if you have any comments about this apparent restriction of this fourth sub-group?

DR. PUTNAM: No, I have no further information.

DR. COHN: May I ask one question and ask Dr. Putnam, Dr. Hood and Dr. Edelman to answer it in that order. If I gave you three sequences in the variable regions, how would you decide whether to put them in the same sub-group or not?

DR. PUTNAM: If we’re beginning to draw sub-groups on a provisional basis I would say that the sequences would have to be at least 75% alike before I would begin to place them in the same sub-group. Now I myself have doubts as to whether there are very clear sub-groups, particularly in lambda chains.

DR. HOOD: That’s a good question. I think one has to preface the answer to this by stating that it depends very much on your view of how diversity is generated. My own feeling is that diversity is generated by separate and discrete germ line genes and that the major sub-groups that were defined initially according to Dr. Putnam’s criteria point out very clearly that there are major classes of evolutionary branches. Now it’s quite easy to divide the three proteins that you might give us into one or two or three major branches if they are as clear for example as the kappa chain branches are one with another. But the critical question that you’re really getting at is what precisely is the definition of a sub-group or a sub-sub group and if we go into one of the kappa chain classes how are you going to fractionate this down? I think the answer to your question is a statistical one, namely, we’re going to have to do a lot of sequences and we’re going to have to have each of us ask the question, how many parallel mutations are we willing to live with? That is, how many parallel mutations are required before we’ll say separate and discrete germ line genes seems to be an easier way out of this parallel mutation. My own feeling is if we go on and get additional data and look at these trees we’re going to find further and further sub-fractionation or sub-group classification. I don’t think I can give a very precise answer to the question. I don’t know how far we can logically drive the limits of this type of analysis. I think the number is going to be very much greater than ten, and it will be a number that I think most people will agree upon.
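Dr. Putnam’s provisional criterion, that sequences be at least 75% alike before being placed in the same sub-group, can be turned into a rough grouping procedure. The sketch below is one possible reading, not a method described in the discussion: it assumes ungapped sequences of equal length, links any pair at or above the threshold, and merges linked sequences transitively (single linkage). The function names and toy sequences are hypothetical.

```python
def fraction_identical(a: str, b: str) -> float:
    """Fraction of identical positions in two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def provisional_subgroups(seqs: dict[str, str], threshold: float = 0.75) -> list[set[str]]:
    """Group sequences by single linkage at the given identity threshold."""
    parent = {name: name for name in seqs}

    def find(name: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    names = list(seqs)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if fraction_identical(seqs[first], seqs[second]) >= threshold:
                parent[find(first)] = find(second)

    groups: dict[str, set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)


# Toy sequences (hypothetical): P1 and P2 are 90% identical; P3 is 50% identical to each.
toy_sequences = {"P1": "ACDEFGHIKL", "P2": "ACDEFGHIKM", "P3": "WCDQFYHRKT"}
print(provisional_subgroups(toy_sequences))
```

With the default threshold this groups P1 with P2 and leaves P3 on its own. Because merging is transitive, two sequences that are each 75% alike to a third can land in one group without being 75% alike to each other, which echoes the difficulty of defining sub-group boundaries raised in the discussion.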