Proc. Natl Acad. Sci. USA
Vol. 79, pp. 132-136, January 1982
Evolution

Mouse immunoglobulin coding sequences for the heavy-chain variable region arose as repeats of the two short building blocks

(repetitious origin/hypervariable region/framework region/noncoding region)

SUSUMU OHNO, KUNIKI KATO, TOYOHARU HOZUMI, AND TAKESHI MATSUNAGA

City of Hope Research Institute, Duarte, California 91010

Contributed by Susumu Ohno, September 3, 1981

ABSTRACT The coding sequence for the 97-amino-acid-residue-long immunoglobulin heavy-chain variable (VH) regions of the mouse apparently arose as repeats of two short building blocks. Three recognizable copies of the 21-base-long prototype sequence A-C-T-G-G-A-T-A-T-G-A-C-C-T-G-G-A-G-T-G-G are invariably found at fixed positions within the 5' half of each VH coding sequence. Interestingly, the first and third copies specify the relatively invariant regions represented by the 7th to 13th and 41st to 47th amino acid residues (the first and second framework regions), whereas the second copy specifies the first hypervariable region (31st to 35th amino acid residues). These copies maintain at least 57.2% (12 out of 21) base sequence homology to the above-noted prototype building block. Base sequences of the other, 14- to 15-base-long prototype building block differ from each other by as much as 60% between individual VHs. Yet one of its copies invariably occupies the terminal region of each VH coding sequence, thus specifying the very invariant third framework region. Other copies occupy unfixed positions in the VH and its attendant hydrophobic leader coding sequence as well as in adjacent noncoding sequences. The homology thus revealed between the VH coding sequence and its adjacent noncoding sequences suggests their concordant evolution.

In the previous paper (1), it was suggested that the evolutionary sequence conservation by the serum albumin family of proteins is due not to vigilant surveillance by natural selection, but rather to the originally repetitious nature of their coding sequences. It should be recalled that analbuminemia is an apparently harmless trait in both man and the rat (2, 3). The 18-base-long sequence T-T-C-A-C-A-G-A-G-G-A-G-C-A-G-C-T-G specifying Phe-Thr-Glu-Glu-Gln-Leu was suggested as the prototype building block of the coding sequence for the original domain of this family of proteins.

Variable (V) regions of immunoglobulin light (L) as well as heavy (H) chains that confer antigen-binding sites to individual immunoglobulins apparently constitute another family of polypeptides that are largely ignored by natural selection. It is granted that antigen-binding sites directed against the common pathogens of the contemporary world are surely surveyed by natural selection. Yet even here, the loss of a few VL or VH genes should prove harmless to individuals, because the immune response against any antigen is, as a rule, multiclonal. Furthermore, the uniqueness of this biological system is found in its ability to generate antigen-binding sites directed against antigens not yet in existence (4, 5). Individual VLs or VHs that are generated in anticipation of a future need should be ignored by natural selection for the present. Indeed, the attrition rate of individual VL and VH genes appears to be inordinately high, as evidenced by frequent reports of pseudo- or silent genes (6-8).

The examination of nearly all the published and in press sequences of mouse germ-line DNA segments containing immunoglobulin VH coding sequences led to the conclusion that this family of coding sequences also arose as repeats of the two short prototype building blocks.
While recognizable copies of the identical 21-base-long prototype building block were found in all the mouse VH coding sequences examined, the other 14- to 15-base-long building blocks were idiosyncratic to each subgroup of VHs and even to each VH. Furthermore, copies of the latter prototype building block were also found in adjacent noncoding sequences. In this paper, the above-noted features are described in detail, using two very dissimilar mouse immunoglobulin VH genes as examples. The VH amino acid sequence specified by the pCH 108A gene (8) differs from that specified by the MOPC141 gene (9) by 1 insertion and 53 substitutions; there is only 44.9% amino acid sequence homology between the two.

Copies of the Identical Prototype Building Block Sequence A-C-T-G-G-A-T-A-T-G-A-C-C-T-G-G-A-G-T-G-G Occupy the Three Fixed Positions Within Divergent VH Coding Sequences. As shown in Figs. 1-4, the three recognizable copies demonstrating at least 57.2% (12 out of 21) base sequence homology to the above-noted prototype are invariably found in precisely fixed positions within the 5' half of all VH coding sequences thus far examined. The first of the three copies specifies the 7th to the 13th amino acid residues. Because this region remains relatively invariant from VH to VH, it is regarded as the first framework region. Indeed, 5 of the 7 amino acid residues of this region are identical between pCH 108A and MOPC141 (Figs. 2 and 4). By contrast, the second copy of the above-noted prototype specifies the extremely variable 30th to 36th amino acid residues known as the first hypervariable region. Indeed, only 3 of the 7 amino acid residues of this region are identical between pCH 108A and MOPC141 (Figs. 2 and 4). Yet these second copies of the two VHs retain 66.7% and 61.9% base sequence homology with the prototype.
The third copy of this 21-base-long prototype building block invariably specifies the 41st amino acid, in addition to the 6-amino-acid-residue-long second framework region (42nd to 47th amino acid residues). This region remains quite invariant from VH to VH; there is 71.4% (5 out of 7) amino acid sequence homology between this region of pCH 108A and that of MOPC141 (Figs. 2 and 4). Accordingly, the third copy of each VH coding sequence, as a rule, demonstrates the greatest base sequence homology to the 21-base-long prototype building block. While the above-noted three copies of the prototype are invariably found in the fixed positions within the 5' half of every mouse VH coding sequence examined, additional copies may be found in the 3' half of some, but not all, of the VH coding sequences. For example, the fourth copy specifies the 58th to 64th amino acid residues of pCH 108A

Abbreviations: H, heavy; L, light; V, variable.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
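
The base sequence homology percentages quoted for the copies of the 21-base-long prototype are position-by-position identity counts. The following is a minimal sketch of that arithmetic, not part of the original analysis: the function name `percent_identity` is hypothetical, the comparison assumes an ungapped alignment of equal-length sequences, and the two second-copy sequences are transcribed from Figs. 2 and 4.

```python
# 21-base-long prototype building block quoted in the text.
PROTOTYPE_21 = "ACTGGATATGACCTGGAGTGG"


def percent_identity(copy: str, prototype: str) -> float:
    """Percentage of positions at which copy and prototype carry the same base.

    Hypothetical helper: assumes both sequences are already aligned without gaps.
    """
    if len(copy) != len(prototype):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(copy, prototype))
    return 100.0 * matches / len(prototype)


# Second copies (first hypervariable region), as tabulated in Figs. 2 and 4.
second_copy_pch108a = "ACTGACTACAACATGCACTGG"
second_copy_mopc141 = "ACCGGCTATGGTGTAAACTGG"

print(round(percent_identity(second_copy_pch108a, PROTOTYPE_21), 1))  # 66.7
print(round(percent_identity(second_copy_mopc141, PROTOTYPE_21), 1))  # 61.9
```

The same arithmetic appears to underlie the 44.9% amino acid homology quoted for the two VHs: assuming 98 aligned positions (97 residues plus the one insertion), 54 differences leave 44 identical positions, and 44/98 is 44.9%.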




[Fig. 1 sequence. The boxes, bars, asterisks, residue numbers, and amino acid labels of the printed figure are not reproduced. "[...]" marks bases unreadable in the source; "/" marks a coding-noncoding junction.]

CA AAGTCCCGCTCATG AATATGCAAATTACCT AAGTGTATGGT AGT TAAAAACAGGGATAT
CAACACGCTGAAAACAACATATGTCCAATGTCCTCTCCACAGACACTGAACACACTGACTCTTAACC
ATG GGA TGG [...] ATC TTT CTC TTC CTC CTG TCA GGA ACT GCA G/G TAAGG
GGCTCACCATTTCCAAATCTGAAGAAAAGAAATGGCTTGTGATGTCACTG ACATCCACTCTGTCT TTCTC
TCCTCAG/GC GTC CAC TCT GAG GTC CAG CTT CAG CAG TCA GGA CCT GAG CTG GTG
AAA CCT GGG GCC TCA GTG AAG ATA TCC TGC AAG GCT TCT GGA TAC ACA TTC
ACT GAC TAC AAC ATG CAC TGG [...]TG AAG CAG AGC CAT GGA AAG AGC CTT GAG TGG
ATT GGA TAT ATT TAT CCT TAC AAT GGT GGT ACT GGC TAC AAC CAG AAG TTC AAG
AGC AAG GCC ACA TTG ACT GTA GAC AAT TCC TCC AGC ACA GCC TAC ATG GAG CTC
AGC AGC CTG ACA TCT GAG GAC TCT GCA GTC TAT TAC TGT GCA AGA/CACAGTGTTA
C AAACACATCCTGAGT GTGTCAGAAACCCTGAGGTGCAGCAAGCTTCCTTGGGACTGACAAGAGTTAGAG
AATAGTCGCTTGCAGAC

FIG. 1. The 661-base-long germ-line DNA segment of the mouse that contains the coding sequence for pCH 108A immunoglobulin VH and its attendant hydrophobic leader coding sequence (8) is reproduced. In all figures of this paper, only one of the DNA strands, the one that corresponds to its transcript, is shown. All the bases, excepting the canonical A-G-C-T-G and cocanonical G-G-G-T-G pentamers, are shown in small capital letters. A-G-C-T-G and G-G-G-T-G are identified by large capital letters in boxes. Amino acid residues of the VH coding sequence shown in italics are numbered in large type. Minus signs in front of numbers identify amino acid residues of the hydrophobic leader. The eight recognizable copies demonstrating greater than 62.5% base sequence homology to the idiosyncratic 15-base-long prototype building blocks are marked by solid bars, and bases homologous with the prototype are identified by asterisks. The four copies of the longer 21-base-long prototype building blocks are marked by shaded bars. The beginning and the end of each of the two hypervariable regions are indicated by HV- and those of each of the three framework regions are signified by FW-.

in the midst of its long second hypervariable region (Figs. 1 and 2). Thus, it is possible that this prototype sequence A-C-T-G-G-A-T-A-T-G-A-C-C-T-G-G-A-G-T-G-G originally contributed to the construction of the entire length of mouse immunoglobulin VH coding sequences. Yet its recognizable copies are found neither in adjacent noncoding sequences nor in attendant hydrophobic leader coding sequences of all the published and in press germ-line immunoglobulin VH DNA sequences examined. As far as its three copies that invariably occupy the fixed positions within the 5' half of VH coding sequences are concerned, it is likely that they are placed under surveillance by natural selection. These three copies are always translated in the same reading frame. Accordingly, their base sequence conservations are reflected in the amino acid sequence conservation. It should be recalled that the first and third copies specify relatively invariant first and second framework regions, and that even the second copy that specifies the first hypervariable region remains reasonably homologous with the first and third copies. Could it be that these three copies are intimately involved in the construction of antigen-binding sites directed against common pathogens of the contemporary world, hence, the surveillance by natural selection?
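
Because the three copies are read in the same frame, base-level similarity to the prototype carries over to the amino acid level. A minimal sketch of that step follows, assuming the standard genetic code and reading from the first base of each copy. `CODON_TABLE` and `translate` are hypothetical names introduced here for illustration, and the copy sequences are those tabulated for pCH 108A in Fig. 2.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (one-letter amino acids).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


prototype = "ACTGGATATGACCTGGAGTGG"   # 21-base-long prototype
first_copy = "TCAGGACCTGAGCTGGTGAAA"  # residues 7-13 of pCH 108A (framework 1)
third_copy = "CATGGAAAGAGCCTTGAGTGG"  # residues 41-47 of pCH 108A (framework 2)

print(translate(prototype))   # TGYDLEW  (Thr Gly Tyr Asp Leu Glu Trp)
print(translate(first_copy))  # SGPELVK
print(translate(third_copy))  # HGKSLEW
```

The shared Gly, Leu, Glu, and Trp positions in the output correspond to the residues that Fig. 2 marks as homologous with those specified by the prototype.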

Idiosyncrasy of the Other Prototype Building Block and Indiscriminate Occurrences of Its Copies. By contrast, the other prototype building block, which is only 14-15 bases long, is quite free-wheeling. Within the VH coding sequence, the only fixed position its copy occupies is the very terminal region, therefore contributing to the most invariant portion of the third framework region (Figs. 1 and 3). Although one of the few other copies residing within the VH coding sequence tends to occupy different positions within the second hypervariable region, this may merely reflect the enormity of this hypervariable region, which covers the 51st to 68th amino acid residues. Other recognizable copies demonstrating greater than 60% base sequence homology to this shorter prototype building block are found scattered in adjacent noncoding sequences: the terminal regions of the 5' and 3' intergenic spacers as well as the intervening sequence between the hydrophobic leader and VH coding sequences. Interestingly, one of its copies almost invariably occupies the coding-noncoding junction signaling the hydro-


TWO BUILDING BLOCKS OF pCH 108A MOUSE IG VH GENE

Capitalized amino acid names mark residues identical to those specified by the prototype. Entries shown as [unreadable] could not be recovered from the source.

ONE SHARED WITH SPACERS AND INTERVENING SEQUENCE
Prototype: AAG TCC ATG CTG AGT (Lys Ser Met Leu Ser)

Copy  Bases     Sequence              Homology  Location
1     3-18      [unreadable]          68.8%
2     35-48     [unreadable]          80.0%
3     169-182   AAC TGC AGG TA AGG    60.0%     hydrophobic leader, near residue -5
4     233-247   [unreadable]          66.7%
5     260-274   GCG TCC ACT CTG AGG   66.7%     residues -1 and 1
6     469-483   AAG GCC ACA TTG ACT   66.7%     residues 67-71 (LYS Ala Thr LEU Thr); hypervariable 2
7     545-560   CAG TCT ATTACTG TGC   62.5%     residues 92-96; framework 3
8     576-590   [unreadable]          73.3%

THE OTHER UNIQUE TO THE CODING SEQUENCE
Prototype: ACT GGA TAT GAC CTG GAG TGG (Thr Gly Tyr Asp Leu Glu Trp)

Copy  Bases     Residues  Sequence                      Amino acids                   Homology  Region
1     289-309   7-13      TCA GGA CCT GAG CTG GTG AAA   Ser GLY Pro Glu LEU Val Lys   57.2%     framework 1
2     358-378   30-36     ACT GAC TAC AAC ATG CAC TGG   THR Asp TYR Asn Met His TRP   66.7%     hypervariable 1
3     391-411   41-47     CAT GGA AAG AGC CTT GAG TGG   His GLY Lys Ser LEU GLU TRP   66.7%     framework 2
4     442-462   58-64     ACT GGC TAC AAC CAG AAG TTC   THR GLY TYR Asn Gln Lys Phe   66.7%     hypervariable 2

FIG. 2. Those recognizable copies of the two prototype building blocks identified in Fig. 1 are tabulated in two columns. Each column is headed by the prototype building block sequence shown in large capital letters. Amino acid residues specified by the prototype are shown in small capital letters. Its copies are aligned in descending order of their positions. The position of each copy in the sequence of Fig. 1 is identified by italic numbers of the first and last bases. Bases homologous with the prototype are indicated by asterisks, and the extent of base sequence homology of each copy with the prototype is given in percentage. With regard to those copies residing within coding sequences, amino acid residues specified by them are shown in small capital letters if they are homologous with those specified by the prototype building block. Otherwise, they are shown in italics.

phobic leader sequence end (Figs. 1 and 3), while another tends to occupy the junction that marks the end of VH coding sequences (Fig. 3). The above substantiates the evolutionary derivation of the pretranscriptional coding sequence fusion mechanism unique to immunoglobulin genes from the ubiquitous posttranscriptional splicing mechanism (5), because upstream signal sequences for both are embodied in recognizable copies of the same prototype sequence.
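
A "recognizable copy" in the sense used here is a stretch of sequence whose identity to a prototype meets a threshold (60% for the shorter prototype). One way such copies might be located is an ungapped sliding-window scan, sketched below. This is an assumption about procedure, not the authors' stated method: the alignments in Figs. 2 and 4 also allow short gaps, which this sketch ignores. The function `find_copies` and the toy sequence are hypothetical; the embedded 15-base window is copy 6 of Fig. 2.

```python
def find_copies(sequence: str, prototype: str, min_percent: int = 60):
    """Return (start, matches) for every window at least min_percent identical
    to the prototype. Start positions are 0-based; the figures use 1-based
    positions. Hypothetical helper: no gaps are allowed within a window.
    """
    k = len(prototype)
    hits = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        matches = sum(a == b for a, b in zip(window, prototype))
        # Integer comparison avoids floating-point rounding at the threshold.
        if matches * 100 >= min_percent * k:
            hits.append((start, matches))
    return hits


PROTOTYPE_15 = "AAGTCCATGCTGAGT"  # shorter prototype of pCH 108A

# Toy sequence: copy 6 of Fig. 2 (AAG GCC ACA TTG ACT) between arbitrary flanks.
toy_sequence = "GGG" + "AAGGCCACATTGACT" + "CCC"

for start, matches in find_copies(toy_sequence, PROTOTYPE_15):
    print(start, matches, round(100 * matches / len(PROTOTYPE_15), 1))
```

For the embedded window the scan reports 10 matching bases out of 15, that is, the 66.7% listed for copy 6 in Fig. 2.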

Reflecting its free-wheeling character, however, this prototype does not exist as a single sequence universal to all mouse germ-line VHs. Instead, this prototype building block is idiosyncratic to each subgroup of VHs and even to each VH. In Figs. 1 and 2, it should be noted that the 661-base-long sequence of pCH 108A (8) contains eight recognizable copies demonstrating greater than 60% base sequence homology to the 15-base-long prototype sequence A-A-G-T-C-C-A-T-G-C-T-G-A-G-T distributed along the entire length. Yet when this prototype sequence is applied to the 651-base-long sequence of MOPC141 (9), shown in Figs. 3 and 4, only two recognizable copies of it can be identified. Instead, this germ-line VH DNA segment contains nine copies demonstrating greater than 62.5% homology to the 14-base-long prototype sequence A-A-G-C-T-C-A-C-A-C-T-G-T-G. The prototypes of pCH 108A and MOPC141 are clearly related, yet only 60% homology can be noted. This notable difference in their shorter prototype building blocks can be reconciled with an obviously remote relatedness between pCH 108A and MOPC141. As already noted, there exists only 44.9% amino acid sequence homology between their products. By contrast, the germ-line NPb 182-2 VH gene (10) and pCH 108A are quite closely related, because the amino acid sequences specified by these two VH coding sequences are 79.6% homologous. Yet the germ-line NPb 182-2 DNA (10) contains six recognizable copies of the new, 14-base-long prototype sequence A-C-A-G-T-T-A-C-T-G-A-G-C-A and only two recognizable copies of the 15-base-long pCH 108A prototype building block sequence A-A-G-T-C-C-A-T-G-C-T-G-A-G-T. The base


[Fig. 3 sequence. The boxes, bars, asterisks, residue numbers, and amino acid labels of the printed figure are not reproduced. "[...]" marks bases unreadable in the source; "/" marks a coding-noncoding junction.]

AAGCTTGCCCTGTG CTTCCTTTATCCTCTCAGG AACCTCCCCCAATG CAAATCAGCCCTCAGGCAG
AGGATAA AAGCTCACACAAAG ATGAGAAGCCCCATCATCTTCTCATAGAGCCTCCATCAGAGC
ATG GCT GTC CTG GCA TTA CTC TTC TGC CT[...] GTA ACA TTC CC[...]
TCAGGGTTTCACAAGAGGGACTAAAGACATGT CAGCTAA TGTGTG ACTAATGGTAATGTCACTTGTCACTA
GG/T ATC CTT TCC CAG GTA C[...] AAG GAG TCA GGA CCT GGC CTG GTG GCG CCC
TCA CAG AGC CTG TCC ATC ACA TGC ACC GTC TCA GGG TTC TCA TTA ACC GGC TAT
GGT GTA AAC TGG GTT CGC CAG CCT CCA GGA AAG GGT CTG GAG TGG CTG GGA ATG
ATA TG[...]AT GGA AGC ACA GAC TAT AAT TCA GCT CTC AAA TCC AGA CTG
AGC ATC AGC AAG GAC AAC TCC AAG AGC CAA GTT TTC TTA AAA ATG AAC AGT CTC
CAA ACT GAT GAC ACA GCC AGG TAC TAC TGT GCC AG/AGACACAGTG AGGG
AAGTCCAA TGTGAGCCTGCACAAATACCTCTCTGCAGGGATGATCACAACCAGCAGGGGGCGCTGAGGACCC
AAGGACTT

FIG. 3. The 651-base-long germ-line DNA fragment of the mouse containing the coding sequence for MOPC141 VH and its attendant hydrophobic leader coding sequence (9) is reproduced in a manner corresponding to that of Fig. 1.

sequence homology between the above two shorter prototypes is only 40%.

What can one make of the observed idiosyncrasy of the shorter of the two prototype building blocks as well as the indiscriminate occurrence of its copies in coding and noncoding sequences alike? As to the idiosyncrasy, it is likely associated with the unique capacity of the immune system to anticipate its future needs by generating antigen-binding sites directed against antigens not yet in existence (4, 5). As to the indiscriminate occurrence of its copies, Smith (11) made a convincing prediction that, if long ignored by natural selection, any substantial length of DNA would spontaneously acquire a repetitious character uniquely its own.

The Ultimate Ancestor of All the Prototype Building Block Sequences. All the idiosyncratically divergent shorter prototype building blocks of immunoglobulin VHs share one distinct feature that suggests their common ancestry. One copy of each invariably specifies the third framework region, in which resides the triplet amino acid sequence Tyr-Tyr-Cys that has been conserved by the majority of VHs of not only the mouse but also man. In Figs. 1-4 it should be noted that the Tyr-Tyr-Cys of pCH 108A is specified by the base sequence TAT-TAC-TGT, whereas the same triplet amino acid sequence in MOPC141 is specified by TAC-TAC-TGT. The T-A-C-T-G pentamer found in both is a two-base deviant of the A-G-C-T-G pentamer (12). In the case of pCH 108A, this T-A-C-T-G pentamer is derived from T-G-C-T-G residing in its shorter prototype building block (Fig. 2). Needless to say, T-G-C-T-G is a single-base deviant of A-G-C-T-G. In the case of MOPC141, the corresponding T-A-C-T-G is derived from C-A-C-T-G in its shorter prototype (Fig. 4). Aside from C-A-C-T-G, this MOPC141 prototype building block contains A-G-C-T-C, a single-base deviant of the canonical A-G-C-T-G.
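
The terms "single-base deviant" and "two-base deviant" count the positions at which a pentamer differs from the canonical A-G-C-T-G. A short sketch of that count, for illustration only; the helper name `base_differences` is hypothetical.

```python
CANONICAL = "AGCTG"  # canonical pentamer named in the text


def base_differences(pentamer: str, reference: str = CANONICAL) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(pentamer) != len(reference):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(pentamer, reference))


# Pentamers discussed in the paragraph above.
for pentamer in ("TACTG", "TGCTG", "CACTG", "AGCTC"):
    print(pentamer, base_differences(pentamer))
```

The counts for T-A-C-T-G (2), T-G-C-T-G (1), and A-G-C-T-C (1) agree with the text; C-A-C-T-G, like T-A-C-T-G, differs from the canonical pentamer at two positions.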

As already remarked in the previous paper (1), all the prototype building blocks of divergent coding sequences are probably related to each other, because their ultimate ancestry might be found in the 20-base-long primordial sequence A-G-C-T-G-A-G-C-T-G-A-G-C-T-G-G-G-G-T-G of the entire euchromatic region of mammalian chromosomes, in which noncoding base pairs outnumber coding base pairs roughly 35 to 1 (12). Indeed, the other, 21-base-long prototype building block of mouse immunoglobulin VHs presently identified as A-C-T-G-G-A-T-A-T-G-A-C-C-T-G-G-A-G-T-G-G contains the decameric sequence A-C-C-T-G-G-A-G-T-G. This 10-base-long sequence is clearly derived from the A-G-C-T-G-G-G-G-T-G portion of the 20-base-long primordial sequence. It should be recalled that the 18-base-long sequence T-T-C-A-C-A-G-A-G-G-A-G-C-A-G-C-T-G was identified as the prototype building block of the original coding sequence for the serum albumin family of proteins in the previous paper (1). This prototype building block too contains the sequence A-G-G-A-G-A-G-C-T-G. In fact, the canonical A-G-C-T-G pentamer and cocanon-


TWO BUILDING BLOCKS OF MOPC141 MOUSE IG VH GENE

Capitalized amino acid names mark residues identical to those specified by the prototype.

ONE SHARED WITH SPACERS AND INTERVENING SEQUENCES
Prototype: A AGC TCA CAC TGT G

Copy  Bases     Sequence             Homology  Location
1     1-14      A AGC TTG CCC TGT G  78.5%
2     34-47     A ACC TCC CCC AAT G  64.3%
3     74-87     A AGC TCA CAC AAA G  78.5%
4     171-184   A AGC TGT AAG TGT G  71.4%     hydrophobic leader, near residue -5
5     217-229   C AGC TAA TG TGT G   64.3%
6     432-445   A AGC ACA GAC TAT A  71.4%     residues 56-59 (Ser Thr Asp Tyr); hypervariable 2
7     540-553   C AGG TAC TAC TGT G  64.3%     residues 92-95 (Arg Tyr Tyr CYS); framework 3
8     555-567   C AG- AGA CAC AGT G  64.3%     residue 97
9     572-583   A AGT CCA A TGT G    64.3%

THE OTHER UNIQUE TO THE CODING SEQUENCE
Prototype: ACT GGA TAT GAC CTG GAG TGG

Copy  Bases     Residues  Sequence                      Amino acids                   Homology  Region
1     286-306   7-13      TCA GGA CCT GGC CTG GTG GCG   Ser GLY Pro Gly LEU Val Ala   61.9%     framework 1
2     355-375   30-36     ACC GGC TAT GGT GTA AAC TGG   THR GLY TYR Gly Val Asn TRP   61.9%     hypervariable 1
3     388-408   41-47     CCA GGA AAG GGT CTG GAG TGG   Pro GLY Lys Gly LEU GLU TRP   66.7%     framework 2

FIG. 4. Those recognizable copies of the two prototype building blocks identified in Fig. 3 are tabulated in the same manner as in Fig. 2.

ical G-G-G-T-G pentamer are invariably prominent in all the coding and noncoding sequences of the mammalian euchromatic region thus far published (12). Not surprisingly, the 661-base-long sequence of Fig. 1 and the 651-base-long sequence of Fig. 3 contain two each of A-G-C-T-G and one each of G-G-G-T-G, whereas only 0.5 of each is expected by chance.
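
The chance expectation for a given pentamer can be approximated with a simple model. The sketch below assumes, purely for illustration, that the four bases are equally frequent and independent along the sequence, and it counts one strand only. The paper does not state the model behind its figure of 0.5, so the function `expected_occurrences` and these assumptions are not taken from it.

```python
def expected_occurrences(sequence_length: int, word_length: int) -> float:
    """Expected number of occurrences of one specific word in a random sequence.

    Assumes equal, independent base frequencies (an illustrative assumption):
    each of the (L - k + 1) windows matches with probability (1/4) ** k.
    """
    windows = sequence_length - word_length + 1
    return windows / 4 ** word_length


# Sequence lengths of Fig. 1 (pCH 108A) and Fig. 3 (MOPC141).
for length in (661, 651):
    print(length, round(expected_occurrences(length, 5), 2))
```

Under these assumptions the expectation is about 0.64 and 0.63 occurrences per sequence, the same order as the 0.5 quoted in the text; a different base composition would shift the value.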

This work was supported in part by National Institutes of Health Grant RO1 ATT 5620.

1. Ohno, S. (1981) Proc. Natl Acad. Sci. USA 78, 7657-7661.
2. Boman, H., Hermodson, M., Hammond, C. A. & Motulsky, A. G. (1976) Clin. Genet. 9, 513-526.
3. Nagase, S., Simamune, K. & Shumiya, S. (1979) Science 205, 590-591.
4. Cohn, M. (1968) Nucleic Acids in Immunology, eds. Plescia, O. & Braun, W. (Springer, New York), pp. 224-231.
5. Ohno, S., Epplen, J. T., Matsunaga, T. & Hozumi, T. (1981) Prog. Allergy 28, 8-39.
6. Proudfoot, N. J. (1980) Nature (London) 286, 840-841.
7. Bentley, D. L. & Rabbitts, T. H. (1980) Nature (London) 288, 730-733.
8. Givol, D., Zakut, R., Effron, K., Richavi, G., Ram, D. & Cohn, J. B. (1981) Nature (London) 292, 426-430.
9. Sakano, H., Maki, R., Kurosawa, Y., Roeder, W. & Tonegawa, S. (1980) Nature (London) 286, 676-683.
10. Bothwell, A. L. M., Paskind, M., Reth, M., Imanishi-Kari, T., Rajewsky, K. & Baltimore, D. (1981) Cell 24, 625-637.
11. Smith, P. G. (1976) Science 191, 528-535.
12. Ohno, S. (1981) Differentiation 18, 65-74.
