Proc. Natl. Acad. Sci. USA
Vol. 83, pp. 6075-6079, August 1986
Immunology

Immunoglobulin heavy chain locus of the rat: Striking homology to mouse antibody genes

(gene families/evolution/DNA sequence)

MARIANNE BRÜGGEMANN*, JILL FREE*, AUSTIN DIAMOND†, JONATHAN HOWARD†, STEVE COBBOLD*, AND HERMAN WALDMANN*

*University of Cambridge, Department of Pathology, Division of Immunology, Laboratories Block, Addenbrooke's Hospital, Hills Road, Cambridge CB2 2QQ, England; and †Department of Immunology, AFRC Institute of Animal Physiology, Babraham, Cambridge CB2 4AT, England

Communicated by C. Milstein, April 14, 1986

ABSTRACT DNA encoding the rat diversity segment (D), joining segment (JH), and constant (C) region μ, γ2a, γ1, γ2b, ε, and α of the Ig heavy chain has been isolated from a cosmid library. Restriction mapping allowed us to identify two gene clusters: D-JH-Cμ and Cγ1-Cγ2b-Cε-Cα in addition to a single Cγ2a gene. Analysis of genomic DNA by Southern blotting permitted identification of the Cγ2c gene and led to the proposal of the following gene order for the rat Ig heavy chain locus: D-JH-Cμ-Cδ-(Cγ2c, Cγ2a)-Cγ1-Cγ2b-Cε-Cα. There is striking homology between the rat and mouse Ig heavy chain loci as regards gene order and distance between CH genes. Partial DNA sequencing confirms this homology and shows that exon sequences are more conserved than are intron sequences. One of the most conserved intron regions between rat and mouse is that spanning the Ig heavy chain enhancer (91% homology). However, the relationship between the different Cγ subclasses in rat differs from that in mouse. Comparison of the Cγ CH3 domains shows that the rat Cγ2b gene is most homologous to mouse Cγ2a/Cγ2b, whereas the rat Cγ1 and Cγ2a genes, both very similar to each other, are most homologous to the mouse Cγ1 gene.

Mouse and human Ig heavy chain genes are well characterized (1). The gene order in the mouse [D-JH-Cμ-Cδ-Cγ3-Cγ1-Cγ2b-Cγ2a-Cε-Cα (2)] differs strikingly from that in the human [D-JH-Cμ-Cδ // Cγ3-Cγ1-Cψε2-Cα1 // Cγ2-Cγ4-Cε-Cα2 (3, 4)]; D is the diversity segment, JH is the heavy chain joining segment, and C is the constant region. Thus, while in mouse the four Cγ genes are clustered together, the evolution of the locus in humans seems to have involved a duplication of a segment including Cγ, Cε, and Cα genes (4). Pseudogenes have been found in the human but not in the mouse Ig heavy chain gene cluster. Extensive divergence between mouse and human sequences has been reported for the JH cluster, for Cδ, and for C, (3, 5, 6). Comparison of hinge exons (Cδ, Cαs) shows length heterogeneity, but less divergence has been found for other Ig domains (7, 8). To establish a better picture of the evolution of the gene cluster, it is necessary to characterize Ig heavy chain genes of other species. Here we present information about the structure of the rat Ig heavy chain locus.

Eight heavy chain isotypes are known in the rat (μ, δ, γ2c, γ2a, γ1, γ2b, ε, and α) (9). The order of the rat CH genes is not known, but an assignment to chromosome 6 has been proposed (10).

Rat CH genes have an added interest in that rat monoclonal antibodies directed against human cell-surface antigens are receiving increasing attention as potential therapeutic tools (11). There are well-established differences between rat and mouse antibodies in the ability of different isotypes to mediate effector functions like fixation of human complement or CH-mediated adherence reactions (12). Protein comparison suggests that rat and mouse diverged some 10⁷ years ago (13). Thus, because of this close relationship, it might be possible by sequence comparison to localize areas on the protein capable of mediating different effector functions. When such information is available through knowledge of gene structure, it may be possible by transfection of Ig genes that have been manipulated in vitro to construct antibody molecules that exhibit the desired properties.

MATERIALS AND METHODS

Screening of the Cosmid Library. The cosmid library was derived from a partial Mbo I digest of DNA from PVG rat liver and was cloned into the vector pTL5 (14-16).

The library (5 × 10⁵ colonies) was screened with nine different mouse DNA probes: a 2-kilobase pair (kb) Pst I fragment of plasmid p4-7-1 and a 1-kb Pst I fragment of plasmid p223, which include mouse D segments FL16 and SP2 (17); a 2-kb BamHI-EcoRI fragment of plasmid pSV-Vμ1 (18), including the enhancer and part of the JH region; a 0.77-kb EcoRI fragment of plasmid ppu/118, which covers Cμ (19); a 0.85-kb Pst I fragment of plasmid pABγ3-3, which covers Cγ3 (T. Honjo, personal communication); a 2.5-kb Sac I fragment of plasmid IgH2, which includes the Cγ1 gene (20); a 1.4-kb EcoRI fragment of plasmid pBR1.4, which includes part of the Cγ2b switch region (21); a 0.55-kb Pst I fragment of plasmid pABγ2a-1, which covers Cγ2a (22); and a 0.5-kb Pst I fragment of plasmid pABα-1, which covers Cα (T. Honjo, personal communication).

Other probes used in cosmid characterization included a 3.7-kb Sac I fragment encompassing the switch (S) region Sμ and a 4.5-kb BamHI fragment encompassing the Cμ and membrane μ region of plasmid pSV-Vμ1 (18); a 0.8-kb EcoRI-HindIII fragment from plasmid pSγ3 encompassing the switch Sγ3 region (23); an 8-kb EcoRI fragment of plasmid pSp51, a subclone of CHSpμ7 (24), encompassing the Cδ region; and a 0.9-kb and a 0.4-kb Pst I fragment of plasmid C230 encompassing Cε (25).

Southern Filter Hybridization. The DNA probes were nick-translated (26) to a specific activity of 2 × 10⁸ cpm/μg. Colony hybridization and Southern blots were performed as described (27).

DNA Sequencing. Nucleotide sequencing was carried out by using M13 single-stranded vectors tg130 and tg131 (28) with the dideoxy chain-termination procedures (29) and universal sequencing primer or synthetic oligonucleotides provided by G. Allen (Wellcome Laboratories).

Abbreviations: C, constant region; D, diversity segment; S, switch region; J, joining segment; kb, kilobase pair(s).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.




RESULTS

Structure of the Ig Heavy Chain Locus. By screening the rat cosmid library with mouse probes covering the D, JH, Cμ, Sγ, Cγ, and Cα region, 52 positive colonies were isolated. In Fig. 1 the complete EcoRI/Sma I restriction map obtained from 17 cosmids is depicted, further restriction analysis in the vicinity of the CH genes being indicated below the line. Cμ, Cε, and Cα genes were identified by their homology with the appropriate mouse CH probes. The rat Cγ genes were detected by cross-hybridization to the various mouse Cγ probes. [The isotypes of the cloned rat Cγ genes were identified in transfection experiments in which the CH genes were linked to a variable region and expressed in light chain-producing myeloma cells. Antibodies produced by these transfectants were analyzed serologically and biochemically (M.B., unpublished data).] Overlapping fragments were found for the D-JH-Cμ and Cγ1-Cγ2b-Cε-Cα regions. In addition, a single Cγ2a gene-containing cosmid was isolated. All linked genes were found to be aligned in the same transcriptional orientation.

Restriction Map of the Cosmids. A restriction map of the D-JH-Cμ region was obtained from six overlapping cosmids (Fig. 1 Top) by hybridization with D, JH, and Cμ region probes. In subsequent analysis, an Sμ and a membrane μ probe were used. The exact position of D, JH1, and enhancer gene segments was obtained by partial sequencing. The arrangement of the rat D-JH-Cμ region is indistinguishable from that of the mouse locus (2, 30).

The second region of the Ig heavy chain locus, obtained from 10 overlapping cosmids, identifies the cluster Cγ1-Cγ2b-Cε-Cα (Fig. 1 Bottom). A Cγ2a-containing cosmid could not be overlapped (Fig. 1 Middle). To identify the above gene order, we used mouse Cγ, Cε, and Cα probes and found that the alignment, Cγ-Cγ-Cε-Cα, corresponds with the order of the mouse CH genes (2). In addition, the positions of the Cγ S regions were determined by using mouse Sγ2b and Sγ3 probes. S regions of Cε and Cα as well as the membrane regions of Cγ, Cε, and Cα were not determined.

Correspondence Between Cloned and Genomic DNA. To determine whether the restriction map of the cloned rat DNA agrees with the genomic arrangement, we hybridized JH, Cμ, Cγ, and Cα region probes that had been obtained from the cosmids to PVG rat liver DNA that had been digested with various enzymes (Fig. 2).

Southern blot analysis showed that the JH-Cμ region lies on a 25-kb EcoRI fragment. Similarly, in a BamHI digest, the JH and the 5' Cμ region lie on a 9.5-kb fragment (Fig. 2A). Both findings agree with the cosmid map. Hybridization with a probe that spans the JH segments and the enhancer revealed the presence of a second hybridizing band of 15 kb in EcoRI-digested liver DNA. The origin of this band was not identified, although other Southern blots (not shown) showed that sequences in the probe located downstream of the enhancer hybridized to this fragment and that it did not contain DNA homologous to the JH segments themselves. Therefore, this additional band does not reflect the existence of a second JH cluster. Various weak bands obtained when hybridizing a probe that includes Cμ to genomic DNA are due to the presence of Sμ sequences in the Cμ probe; this was demonstrated by using a Cμ probe lacking the S region (not shown). The 3' end of Cμ, which lies on an 11.5-kb BamHI fragment, can be seen in Fig. 2A as a faint band directly above the 9.5-kb BamHI fragment. Our cosmid DNA did not contain the 11.5-kb BamHI fragment 3' of Cμ, nor did a probe including the complete Cδ gene hybridize to any of the cosmids, although the Cμ-Cδ distance in the mouse genome is only 4.5 kb (24). However, from hybridizing the Cδ probe to genomic DNA, we found the same size BamHI fragment as was obtained with the 3' Cμ probe (data not shown), which leaves the conclusion either that the ends of the cosmid clones are very close to the presumed Cδ region or that the distance between Cμ and Cδ in rat is greater than in mouse.

FIG. 1. Organization and restriction enzyme cleavage map of the PVG rat CH gene cluster. (Top) D-JH-Cμ region. (Middle) Sγ2a-Cγ2a region (could not be overlapped). (Bottom) Sγ1-Cγ1-Sγ2b-Cγ2b-Cε-Cα cluster. Structural genes are indicated by closed boxes; Sμ and Sγ regions and the μ membrane (m) exon are indicated by hatched boxes. The putative Ig heavy chain enhancer region is marked as "E." Restriction sites above the line (Sm = Sma I, R = EcoRI) correspond to mapped cosmid DNA indicated as clone numbers (dashed lines indicate vector attachment, arrows indicate extension of clones). Restriction sites below the line are obtained by hybridization to specific mouse probes and are only complete for unique restriction sites encompassing hybridizing fragments (B = BamHI, Bg = Bgl II, H = HindIII, S = Sac I, X = Xba I).

FIG. 2. Southern blot-hybridization analysis of rat JH (3.6-kb BamHI-Xba I fragment) and Cμ (7.2-kb HindIII fragment) (A); Cγ2a (3.6-kb Sac I fragment), Cγ1 (3.8-kb Sac I fragment), and Cγ2b (3.2-kb Sac I fragment) (B); and Cα (2-kb Sac I-EcoRI fragment) (C) region probes obtained from the cosmid clones. PVG liver DNA (10 μg) was digested with EcoRI (lanes R), BamHI (lanes B), or Sac I (lane S) and fractionated on 0.7% agarose gels. Sizes are shown in kb.

When using the different cloned rat Cγ genes to probe genomic DNA, we found identical multiple bands with each probe but a considerably stronger signal from the homologous genes (Fig. 2B). For example, probing genomic Southern blots with the Sac I fragment encompassing Cγ1 strongly "lit up" EcoRI fragments containing Cγ1 (11 kb) and Cγ2a (9.5 kb). Strong hybridization was also seen to an 8.5-kb EcoRI fragment, which we believe to contain Cγ2c. A 15-kb EcoRI fragment was more weakly detected, and that is the size expected for the Cγ2b EcoRI fragment. Indeed, a Cγ2b probe lit up this band most strongly. Similarly, when hybridizing the Cγ2a or Cγ1 probes to a BamHI digest, we saw a strong band of 4.8 kb (both Cγ2a and Cγ1 lie on 4.8-kb BamHI fragments) as well as a weak hybridization to a 2.5-kb band (which comes from sequences 3' of Cγ1). The BamHI fragment from the 3' end of Cγ2a was expected to be 5 kb long; this was not distinguished from the 4.8-kb Cγ1/Cγ2a doublet. As expected, the Cγ2b gene was found on a 6.4-kb BamHI fragment, and strong hybridization was obtained by using Cγ2b as a probe. When probing with Cγ1, a 10.5-kb BamHI fragment was detected in addition to the bands that can be attributed to Cγ2a, Cγ1, and Cγ2b. As there are four γ subclasses known in the rat, we assign the Cγ2c gene to the 10.5-kb band. It appears that there are only four Cγ genes in the rat and no pseudogenes, as all of the Cγ-hybridizing bands in the Southern blot can either be accounted for on the basis of the restriction map of the cloned Cγ2a, Cγ1, and Cγ2b genes or can be assigned to Cγ2c.

By probing genomic DNA with a probe for the 3' end of Cα, we saw only one band in various digests. The Cα gene lies on a 25-kb EcoRI fragment (not shown) and on a 9-kb BamHI fragment. The 3' part of the gene lies on a 7-kb Sac I fragment (Fig. 2C). These findings are in agreement with the cosmid map.
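The band-accounting argument used for the Cγ genes can be illustrated with a small sketch. It takes the fragment sizes quoted in the text, checks which genomic bands are explained by the cloned Cγ1, Cγ2a, and Cγ2b genes, and reports the remainder, which the authors assign to Cγ2c. The function name, the dictionary layout, and the 0.25-kb matching tolerance are assumptions made for illustration only; they are not part of the original analysis.

```python
# Fragment sizes (kb) expected from the cloned rat C-gamma genes, as quoted in the text.
CLONED_ECORI = {"Cgamma1": 11.0, "Cgamma2a": 9.5, "Cgamma2b": 15.0}
CLONED_BAMHI = {
    "Cgamma1": 4.8,
    "Cgamma2a": 4.8,
    "Cgamma2b": 6.4,
    "Cgamma1_3prime": 2.5,   # sequences 3' of Cgamma1
    "Cgamma2a_3prime": 5.0,  # expected, not resolved from the 4.8-kb doublet
}

# Bands (kb) reported on genomic Southern blots probed with C-gamma probes.
OBSERVED_ECORI = [15.0, 11.0, 9.5, 8.5]
OBSERVED_BAMHI = [10.5, 6.4, 4.8, 2.5]


def unexplained_bands(observed, predicted, tolerance=0.25):
    """Return observed band sizes that match no predicted fragment size.

    `tolerance` (kb) is a hypothetical gel-resolution allowance.
    """
    return [
        band
        for band in observed
        if not any(abs(band - size) <= tolerance for size in predicted.values())
    ]


print(unexplained_bands(OBSERVED_ECORI, CLONED_ECORI))  # [8.5]
print(unexplained_bands(OBSERVED_BAMHI, CLONED_BAMHI))  # [10.5]
```

In each digest exactly one band is left over (8.5 kb in EcoRI, 10.5 kb in BamHI), which matches the assignment of a single additional gene, Cγ2c, and the absence of further unexplained bands that would point to pseudogenes.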

Partial Sequence of Ig Genes. To determine whether the striking similarities of the gene order of the rat and mouse Ig heavy chain gene clusters reflect close homology at the sequence level, partial sequences of rat Ig heavy chain genes were obtained.

By comparing a 900-nucleotide stretch of the rat D-JH region (Fig. 3A) to the published mouse sequence (30), we could identify the rat D segment analogous to the mouse DQ52; it is located about 700 nucleotides 5' of JH1. This rat D segment is 11 nucleotides long and differs from the mouse DQ52 in that it is 1 nucleotide longer and has 1 base exchanged. Compared to the mouse, the rat JH1 shows a duplication of the first 3 bases that makes it exactly one triplet longer than the mouse analogue; in addition, it has 7 base exchanges that would lead to three amino acid differences. The nonamer and heptamer sequences implicated in V-D-J joining are well conserved. In the spacer between DQ52 and JH, several deletions and insertions were found. The overall homology is 82%.

The putative rat Ig heavy chain enhancer region located 2.5 kb 3' of JH1 (Fig. 3B) shows an overall homology of 91% to the mouse enhancer (31). Viral enhancer core sequences (32) and a consensus octamer sequence necessary for transcriptional activity recently described for promoters and enhancers (33) were unaltered. The enhancer was the most conserved intron region sequenced in this work.

Comparison of rat and mouse intron and exon structure of Cμ and Cα is shown in Fig. 3 C and D. The 342-nucleotide-long stretch of CH2 of Cμ revealed 34 nucleotide differences, and 268 nucleotides of the adjacent intron showed 42 nucleotide differences with a pronounced cluster right in the middle of the intron. The overall homology was 88%, while the exon shows 90% and the intron sequence shows 84% homology to the mouse Cμ gene (34). By comparing the CH3 domains of rat and mouse (35) Cα, we found 93% homology on the nucleotide and 90% homology on the deduced amino acid level. The adjacent intron shows 88% homology.

For comparison of rat and mouse Cγ genes, we chose the CH3 domains because they show more divergence than other domains of mouse as well as rat antibodies. By comparing sequences in and around the CH3 exon of rat and mouse Cγ genes, we found that rat Cγ2b is most similar to mouse Cγ2a/Cγ2b (36) (about 78% homology for CH3), while the rat Cγ2a and rat Cγ1 are extremely similar to each other (96% homology) and show strong homology to mouse Cγ1 (20) (about 80% for CH3) (Fig. 3E). The differences between rat Cγ1 and rat Cγ2a are mainly clustered at the 5' end of CH3.
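The homology figures quoted for the Cμ CH2 exon and its adjacent intron can be reproduced from the difference counts given in the text. The sketch below assumes that "homology" means the percentage of identical positions over the compared length, which is an interpretation rather than a definition stated in the paper; the function name is hypothetical.

```python
def percent_identity(length, differences):
    """Percentage of identical positions over a compared stretch of `length` nucleotides."""
    return 100.0 * (length - differences) / length


# Counts quoted in the text for rat versus mouse C-mu.
exon = percent_identity(342, 34)               # CH2 exon
intron = percent_identity(268, 42)             # adjacent intron
overall = percent_identity(342 + 268, 34 + 42)  # exon and intron together

print(round(exon), round(intron), round(overall))  # 90 84 88
```

Under this assumption the three values round to 90%, 84%, and 88%, in agreement with the exon, intron, and overall figures reported above.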

DISCUSSION

We have identified two clusters of the rat Ig heavy chain genelocus: D-JH-C,, and Cyl-Cy2bCeCa. The arrangement of therat CH genes is strikingly similar to that of the mouse. Thedistance between mouse genes Cy2b-15 kb-C72a-14 kb-C_-12kb-Ca (2) corresponds with the distance found in the ratbetween Cyl-14.5 kb-Cy2b-21 kb-CE-10 kb-Ca. Only the dis-tance between the most 3' Cy and the C, gene in the rat issomewhat greater than shown for BALB/c mice.Of the eight antibody isotypes described in the rat (9), our

analysis of genomic and cosmid DNA has identified sevenunique CH genes: C,,, four Cys, C6, and Ca. The C8 gene is notincluded on our cosmids, but a partial C8 cDNA has beenisolated (37), and a unique C,, gene has been described (38).We have identified the Cy2c gene by hybridization of Cyprobes to genomic DNA. The extent of cross-hybridization ofthis C72c band to different Cy probes suggests that the Cy2c

B R B S

Immunology: Briiggemann et al.

Page 4: Immunoglobulin heavy chain locus of the rat: Striking homology to

6078 Immunology: Bruggemann et al. Proc. Natl. Acad. Sci. USA 83 (1986)A

D-JH REGION

CCCTGTGGTC TGTGACTGGT AGGTGGTTTT GACTAAGCAA AGCATCACAGC GCAA 66 C

--< D >-------TGCTAACTGG GAGCACAGTG ACTTGTGGCT CAACAAAAAC CTTCTGTTTG

C 6 SC C

CAGCTTTTCA GGGSCAGCCT GAGCTATGAG GAAGTAGAGG GGCTTGAGAAG C C A

AGCCGGGGAA GAAAAGAGCG GACCTGAGAG GAAAGGGAGC TTTCTGGAGGT T A TA T T

GCAGGAGACA GTACAGAGGA GAACAACGTG CTGTGGACAA GTCCTGAGT6T 6 A G GT A G T AGA

GGGAAAGGAT GAGCAAACAT AGGCATCAGG AGGGTGGGTG CAGTGTCCTGA TGC A A A A A A

Bar HICCAAGAAC T GCCAAGAGG TCCCAGGCCA GAGCAGGCAG GTAAGATTTA

6 T A . A GGAG 6

CTGAGAGTAC AGGGTTGGGG CAGCTCCCTC ACTCCTTTCC TTGCTCCTTC6 A T G T T G T

TCCTGTTTCC TTCTCCTCTT CCCTCTTTAC AGGGTCTCGC TAGGCTAGCCCT ..G ... A T

AAGGCTAGCC TGAAAGAT A CCATCCTTCA GGTGTGCCCA TCCAGCTGAGT A A G T

TTAAGGTGGA GA TCTCTC CAAACAGCTG AGTTTCTGAG GTTT6GACCCA GA T C TG

Bar HICACAGGGGAT CCCAAGGAAC TTTGGGCTGG GTTTTATATG GCCCTGGATG

T CG 6 6G . CAC

GAGGGCTACT TAACTGTGTC TATAGTTACT CTSATGA CT GGGTCTAGGGT C 6 A TG A A C

AACTCAGGTC ACTGTCTCCT CAGGTGAGTC CTGTGTCTGG GGACTGTGGA66 C..AGG CA 6

GTTCAGATGG GCCAAGGCAG GCTGTAGTGA GAGTTTTAGT ACAGGAACAG6 T C T A G A T

JH1-------< TyrTyrTrpTyrPheAspPh.TrpGi yPro

TGGCAGAACA GAGACTGTGC TACTACTGGTACTTTGACTTCTGGGGCCCAA ..C 16 6

Val AlaGIyThrnetVaI ThrVai-SerSern>GGAACCATSGTCACCGTGTCCTCAGG TAAG ACC TT TCTTTCTGCA

6 C T CTGGCTTTThr

C TTCCCATT CTGAAACTGG GAAA GATGT TCCCAGATTT CCCCATGTCTA . A A T C A

BIgH ENHANCER

Sau3AGATCAGGACC AGGA CAC T GCAGCAGCTG GCAGGAAGCA GGTCATGTGG

A . A CPst I

< --

CAAGGCTATT TTGGGAAGGG AAAATAAAAC CACTAGGTAA ACTTCTAGTT6 6 C

GTGGTTTGAA 6AAGTGGTTT TGAAATGCTC TSTCCAGCCA CGCAGAACTGCA C A CA C

AAAGTCCAGG CTGASAAAAA CAACACCTGG ATAATTTGCA TTTCTAAAATC C 6

AAGTTGAG6A TTCATC GAA ACTGGAAA6G TCCTCTTTTA ACTTAGTGAGG C 6 T

TTCAATCTTT TAATTTTAGC TTGASAAGTT CTAGTTTCCC TCAAACTTAAC T

GTTTATCGAC TTCTAAAATA T AT TCATTTTCAA AATTAAGTTA6 ATTTAGA 6

Eco RI

CFu CH2 REGION

GGCCCAGGCA TGGCCCAGAG GGAGCAGCGA GTGGGTCTTA AGCCAGCCTG6 6 6 AG

< CH2AGCTCACACC TCAACCTTTC ATTCCAGCTG TCGTTGAGAT GAACCCCAAT

TG CA T

GTGAGTGTGT TCATTCCACC ACGTGATGCC TTCTCTGGCC CTGCACCCCGAA GC 6 6 A

CAAGTCCAGA CTCATCTGCG AGGCCACCAA CTTCAGTCCC AAACAGATCAT A 6 C A C

CAGTATCCTG GCTACAGGAT GGGAAGCCT6 TGAAATCTGG CTTCACCACAA TC 6

Bar HIGAGCCAGTGA CTGTCGAGGC CAAAOGATCC AGACCCCAAA CCTACAAGGT

T 6 CA AA C

CATAAGCACA CTGACCATCA CT6AAAGCSA CT6GCTGAAC CTGAATGT6TT T T

TCACCT6CCG CGTGGATCAC A6SGGTCTCA CCTBCTTGAA GAACBTGTCCA T T

CH2 >TCCACATSCG CT6CCAGTGA 6TA6CCT6T6 CTAABCCCAA T6CCTAGCCC

T 6 . G A

TCCCACATTA GA6CAGTCCT CCTACGGTTG TB6CCAATGC CACCCAGACA6 G A AA A 6

TSGTCATTTG CTTCTTGASC CTTGGCTTCC AACAGTGGCC AAGGCCAAGGC A T C C G A A

ATGAGCASTA G6CAGC AGO G6GATGAGAS TCAGATGGAG 6GAATCAGCAA TT A a A C A C.

TCTTCCCTTA AGCAGATTTS GAAGAT66AS ACTGAGCTTT TATCCAACTTT 6 66 A T

CACAACTAGA

Dd CH3 REGION

CH2 >ACCCCCTTAA CCGGCAAAAT TGCCAAAGTC ACAGGT6GGC CCAGATGCATGG A T C A T

ACCTGGGACA TTGTATGATG TTCCCTGCTT GCGTACCTSC TTTCTTCCTCA. C C A A G A

TAATACAGAT GCTCAG CTGCTCAGGC CCTTGTGTCA CAGAGGGAAAACTAA T T A A

CTGGAGCTAT CCAAAGAACT GCCCAGAA 6 GAAGGGCAGA G6TCTCTTGCT TG G 6 65

< CH3TCTCCTTGTC TGAGCCATAA CTCTTCTTTC TACCTTCCAG AAAACACCTT

TG

CCCACCCCAG GTCCACCTCG TACCGCCGCC GTCGGAGGAG CTGGCCCTGAGC

Sac IATGAGCTCGT GTCCCTGACA TGCCTGGTGC GAGGATTCAA CCCTAAAGAT

T CT A

GTGCTG6TGC GTTGGCTACA AGGGAATGAG GAGCTGCCCT CTSAAAGCTAA 6 T A T C A

CCTAGTGTTT GAGCCCCTGA GGGAGCCAGG CGAAGGAGCC ATCACCTACCA A 6 C

TGGTGACAAG CGTGCTGCGT GTGTCAGCTG AAACCT6GAA GCAGGGTSCCT A A A

CAGTACTCCT GCATG6TGGG CCACGAGGCC TTGCCCATGA GCTTCACCCAA

GAAGACCATC GACCGTCTGT CGGGTAAACC CACCAACGTC AACGTGTCTGT 6

CH3 >TGATCATGTC AGAGGGAGAT 6GCATTTGCT ACTGAGCCAC CTTSCCT6

C C

EY2b/yl/y2a CH3 REGION

< CH3Leu Val Arg Lys

GTGTCTGTTT CTATCCCACA GGG CTA GTC AGA AAA CCA CA6 GTA TACArg Thr Gln Val His

GACCACTCTC TGTATCCACA GGC AGA ACA CAA GTT CCG CAT GTA TACGACCACTCTC TGTATCCACA GGC ACA CCA CGA GGT CCA CAG GTA TAC

Gly Thr Pro Arg Gly Pro Gin Val Tyr

Val Gly Thr Gin Leu G1u Gin ThrGTC ATG GGT CCA CCG ACA GAG CAG TTG ACT GAG CAA ACG GTC

Ser Thr Asn GiuACC ATG TCA CCT ACC AAG GAA GAG ATG ACC CAG AAT GAA GTCACC ATG GCG CCT CCC AAG GAA GAG ATG ACC CAG AGT CAA GTCThr Met Ala Pro Pro Lys Glu Glu Met Thr Gin Ser Gin Val

Leu Leu Thr Ser Leu AsnAGT TTG ACC TGC TTG ACC TCA GGC TTC CTC CCT AAC GAC ATCAGT ATC ACC TGC ATG GTA AAA GGC TTC TAT CCC CCA GAC ATTAGT ATC ACC TGC ATG GTA AAA GGC TTC TAT CCC C'CA GAC ATTSer lie Thr Cys net Val Lys Gly Phe Tyr Pro Pro Asp Ile

Giy Val Thr Ser His Ile Glu LysGGT GTG GAG TGG ACC AGC AAC GGG CRT ATA G6A RAG ARC TAC

Val GinTAT GTG GAG TGG CAG ATG AAC GGG CAG CCA CRG G66 ARC TACTAT ACG GAG TGG AAG ATG AAC GGG CAG CCA CAG GAA AAC TACTyr Thr Glu Trp Lys Met Asn Gly Gin Pro Gin Glu Asn Tyr

Gl u Val Ser PheAAG AAC ACC GAG CCA GTG ATG GAC TCT GAC GGT TCT TTC TTCAAG AAC ACT CCA CCT ACG ATG GAC RCA GAT GGG AGT TRC TTC66G AAC ACT CCA CCT ACG ATG GAC ACA GAT GGG AGT TAC TTCLys Asn Thr Pro Pro Thr Met Asp Thr Asp Gly Ser Tyr Phe

Met Glu Arg Ser Arg Asp SeeATG TAC AGC AAG CTC ART GTG GAA AGG AGC AGG TGG GAT ASC

LysCTC TAC AGC AAG CTC AAT GTG AAG AAG GAA A66 TGG CAG CR6CTC TAC AGC AAG CTC ART GTA AAG AAA GAA ACA TGG CAG CAGLeu Tyr Ser Lys Leu Asn Val Lys Lys Glu Thr Trp Gin Gin

Arg Ala Pro Val ValAGA GCG CCC TTC GTC TGC TCC GTG GTC CAC GAG GGT CTG CACGGA AAC ACG TTC ACG TGT TCT GTG CTG CAT GAA GGC CTG CACGGA AAC ACT TTC ACG TGT TCT GTG CTG CAT GAG GGC CTG CACGly Asn Thr Phe Thr Cys Ser Val Leu His Glu Gly Leu His

CH3 >Val Ile Arq Pea

AAT CAC CAC GTG GAG AAG AGC ATC TCC CGG CCT CCG GGT AAAAAC CAC CAT ACT GAG AAG 6GT CTC TCC CAC TCT CCG GGT AAAAAC CAC CAT ACT GAG AAG AGT CTC TCC CAC TCT CCT GGT AAAAsn His His Thr Glu Lys Ser Leu Ser His Ser Pro Gly Lys

TGAGCACGGC ACCCAGAAAGC TCTCAGGTCC TAAGGGACAC TGACACCCATGACCCCAGA GTC AGTGGCC CCTCTTGGCC TAAAGGATGC CAAAACCTACTGATCCCAGA GTCCAGT6GCC CCTCTTGGCC TAAAGGATGC CAACACCTAC

TCTCCACCCI TCCCTTGTGT AAATAAAGCA CCCAGCACTG CCCTGGGACCCTCTACCACC TTTCTCGTGT AAAT6A6GCA CCCAGCTCTG CCTTGGGACCCTCTACCACC TTTCTCGTGT AA6TAA6GCA CCCAGCTCTG CCTTGGGACC

CTGCAARACT GTCCTGGTTC TTTCCGGGGT ATAGAGCCTA GGTCACGGGCCTGCRAAAAT GTCCTGGITC TTTCTGAGAT ACAGAGTCCA GCGAGGTCATCTGCAAAAAT GTCCTGGTTC TTTCTGAGAT ACAGAGTCCA GTGAGGTCAT

ITTAAGGTCT GGCTGGGGTT TAAGGCCAGA GTTGTCTTCA GGAAGAGAGTGGGCTGAGGG G TCTCCAGG GTTTG1GGCC TGAGGTTTGA CTAAGGAAAATGGCTGAGGG GCTATCCAGf GTTTGAGGCC TGAGGTTTGA CTAAGGAAAA

SacGAGGTTT6GA CACTGCCAGA CTCAGAGCTC G5511 --2bAGG6TGATCT ACACTGCCAG ACACAGCACT G551-YlRGGGTGGTCT ACACTGCCAG ACGCAGCACT G21-Y2a

FIG. 3. (A) Nucleotide sequence of the rat D-JH region with deduced amino acid sequence for JH1. D and J segments are marked with arrowheads, recognition sequences for recombination are indicated by dashes above the line, and nucleotides in the mouse sequence that differ fromthat in the rat are indicated below the line. (B) Comparison of the rat Ig heavy chain enhancer region to the mouse (EcoRI-Pst I fragment), withthose nucleotides in mouse that differ from rat given below the line. Enhancer core sequences in both transcriptional orientations (arrows) anda consensus octamer sequence are indicated by dashes above the line. (C) C,, CH2 domain and (D) Ca CH3 domain (endpoints depicted by arrowheads) were compared, with differences in the analogous mouse sequence indicated below the line. (E) Comparison of the CH3 domains(endpoints indicated by arrow heads) with deduced amino acid sequence of G55II Cy2b from cosmid 55-1-1 (Top), G55I CY1 from cosmid 55-1-1(Middle), and G21 Cy2. from cosmid 21-2-1 (Bottom). The polyadenylylation signal is indicated by dashes above the line.

gene is most closely related to Cγ1, although it also cross-hybridizes to Cγ2a. This might imply that there are three closely related Cγ genes in the rat: Cγ2a and Cγ1, as determined by sequence analysis, and Cγ2c, as determined by cross-hybridization with the rat Cγ probes. All of the C region genes isolated here (Cμ, Cγ2a, Cγ1, Cγ2b, Cε, and Cα) have been shown to be functional by DNA transfection experiments (M.B., unpublished data); thus, none of them is a pseudogene.

From Southern blot analysis, we have evidence (not shown) that the expression of a particular rat γ chain isotype is accompanied by deletion of the genes located 5' to the expressed C region, in a fashion similar to that described for the mouse. Thus, in the DNA of different rat hybridomas, the number of Cγ-hybridizing fragments varies according to the isotype expressed. Fewer bands can be seen in Cγ1- or Cγ2b-expressing cell lines, while IgM-, IgG2c-, and IgG2a-producing hybridomas show up to five hybridizing fragments in EcoRI-digested DNA. However, an assessment as to whether Cγ2a or Cγ2c is the most 5' Cγ gene is made difficult by the presence in these hybridomas of DNA rearrangements on the nonexpressed chromosome.

As we did not isolate the Cδ gene, our evidence that its location is 3' of Cμ stems only from Southern blot hybridization, where we find that the 3' end of Cμ and part of Cδ are on a BamHI fragment of the same size (data not shown).
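The deletion model described above (loss of the C genes lying 5' to the expressed one) can be made concrete with a small sketch. It is an illustration under stated assumptions, not an analysis from the paper: the gene order follows the order proposed for the rat locus, the relative position of Cγ2c and Cγ2a is unresolved and is fixed arbitrarily here, and the function names are hypothetical. The sketch models only the expressed chromosome, so it does not predict band counts, which also depend on the nonexpressed chromosome and on how many restriction fragments each gene yields.

```python
# C genes in 5' -> 3' order. ASSUMPTION: Cgamma2c is placed before Cgamma2a;
# the text leaves their relative order open.
GENE_ORDER = [
    "Cmu", "Cdelta", "Cgamma2c", "Cgamma2a",
    "Cgamma1", "Cgamma2b", "Cepsilon", "Calpha",
]


def retained_genes(expressed):
    """C genes left on the expressed chromosome once everything 5' is deleted."""
    index = GENE_ORDER.index(expressed)  # raises ValueError for unknown genes
    return GENE_ORDER[index:]


def retained_gamma_genes(expressed):
    """Subset of the retained genes that a Cgamma probe could detect."""
    return [gene for gene in retained_genes(expressed) if gene.startswith("Cgamma")]


for isotype in ["Cmu", "Cgamma2a", "Cgamma1", "Cgamma2b"]:
    print(isotype, retained_gamma_genes(isotype))
```

Under these assumptions a Cγ1- or Cγ2b-expressing chromosome retains fewer Cγ genes than an IgM-expressing one, which is the qualitative trend reported for the hybridomas.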

In genomic Southern blot analysis with a JH region probe that includes JH1-JH4 and the Ig heavy chain enhancer, we find two hybridizing BamHI and EcoRI bands. One band corresponds with the cosmid map. The second band is due to enhancer-downstream sequences, as was revealed by using probes covering different parts of the JHrS, region, and probably reflects sequence similarities to a fragment found elsewhere in the genome.

Taking these data together, we propose the following gene order for the rat Ig heavy chain locus: D-JH-Cμ-Cδ-[Cγ2c, Cγ2a]-Cγ1-Cγ2b-Cε-Cα. A high degree of sequence homology between rat and mouse C genes has been described for Cε (38) and Cδ (37) as well as for the κ light chain locus (39), a result that we confirm and extend here. Comparison of the sequences of rat and mouse Ig heavy chain genes suggests that Ig heavy chain gene protein-encoding sequences and regulatory sequences (enhancer, joining, and splice sites) have diverged more slowly than most intron sequences. This is in sharp contrast to the situation encountered with the Ig κ-chain locus, where Sheppard and Gutman (39), studying rat allotypes, found that κ light chain genes show a faster divergence of exon than intron sequences. We have no explanation to offer for this difference, although one has little idea of the constraints that might operate on the various intron sequences. Interestingly, a more marked difference between the mouse and rat κ chain loci is found in the Jκ region, where the rat contains additional Jκ segments that may have arisen by unequal crossing over (40).

The location of the rat Cγ1-Cγ2b genes within the Ig heavy chain gene cluster is equivalent to the position of mouse Cγ2b-Cγ2a; however, sequence homology shows that rat Cγ1 is more similar to mouse Cγ1 than to mouse Cγ2b. Comparison of the rat Cγ2a to the rat Cγ1 gene and adjacent regions shows 96% sequence homology, which extends over at least 2.5 kb (M.B., unpublished observation), suggesting that the generation of rat Cγ2a and rat Cγ1 from an ancestral Cγ gene must have occurred rather recently. In mouse, the most homologous pair (Cγ2b and Cγ2a) displays about 84% homology (36), which means that the divergence of mouse Cγ2b and mouse Cγ2a preceded that of rat Cγ2a and rat Cγ1. Similarly, an ancestral Cγ gene in mouse diverged to produce Cγ3, Cγ1, and Cγ2, and Cγ2b and Cγ2a were later produced from Cγ2 by duplication. There seems to be a different situation in the rat: an ancestral Cγ gene was duplicated, and one of the descendants became Cγ2b, while the other descendant gave rise to Cγ2a, Cγ1, and possibly Cγ2c.

We thank T. Honjo, D. Katz, U. Krawinkel, F.-T. Liu, K. Marcu, M. Neuberger, T. Rabbitts, M. Reth, J. Rogers, and F. Sablitzky for gifts of DNA probes and A. Winoto and L. Hood for facilities and advice during the preparation of the cosmid library. We are grateful to S. Hunt for valuable unpublished information and M. Neuberger for many helpful discussions. This work was supported by grants from the Medical Research Council and from the U.S. National Institutes of Health (Grant CA-34913). M.B. is a recipient of an "Ausbildungsstipendium" from the Deutsche Forschungsgemeinschaft.

1. Honjo, T. (1983) Annu. Rev. Immunol. 1, 499-528.
2. Shimizu, A., Takahashi, N., Yaoita, Y. & Honjo, T. (1982) Cell 28, 499-506.
3. Ravetch, J. V., Siebenlist, U., Korsmeyer, S., Waldmann, T. & Leder, P. (1982) Cell 27, 583-591.
4. Flanagan, J. G. & Rabbitts, T. H. (1982) Nature (London) 300, 709-713.
5. White, M. B., Shen, A. L., Ward, C. J., Tucker, P. W. & Blattner, F. R. (1985) Science 228, 733-737.
6. Ishida, N., Ueda, S., Hayashida, H., Miyata, T. & Honjo, T. (1982) EMBO J. 1, 1117-1123.
7. Takahashi, N., Ueda, S., Obata, M., Nikaido, T., Nakai, S. & Honjo, T. (1982) Cell 29, 671-679.
8. Flanagan, J. G., Lefranc, M.-P. & Rabbitts, T. H. (1984) Cell 36, 681-688.
9. Bazin, H., Beckers, A. & Querinjean, P. (1974) Eur. J. Immunol. 4, 44-48.
10. Wiener, F., Babonitz, M., Spira, J., Klein, G. & Bazin, H. (1982) Int. J. Cancer 29, 431-437.
11. Hale, G. & Waldmann, H. (1985) in Hybridoma Technology in the Biosciences and Medicine, ed. Springer, T. A. (Plenum, New York), pp. 453-471.
12. Hale, G., Clark, M. & Waldmann, H. (1985) J. Immunol. 134, 1-5.
13. Dayhoff, M. O. (1978) Atlas of Protein Sequence and Structure (National Biomedical Research Foundation, Washington, DC).
14. Steinmetz, M., Winoto, A., Minard, K. & Hood, L. (1982) Cell 28, 489-498.
15. Lund, T., Grosveld, F. G. & Flavell, R. A. (1982) Proc. Natl. Acad. Sci. USA 79, 520-524.
16. Diamond, A. G., Windle, J. M., Butcher, G. W., Winoto, A., Hood, L. & Howard, J. C. (1985) Transplant. Proc. 17, 1808-1811.
17. Reth, M. G. & Alt, F. W. (1984) Nature (London) 312, 418-423.
18. Neuberger, M. S. (1983) EMBO J. 2, 1373-1378.
19. Matthyssens, G. & Rabbitts, T. H. (1980) Nucleic Acids Res. 8, 703-713.
20. Honjo, T., Obata, M., Yamawaki-Kataoka, Y., Kataoka, T., Kawakami, T., Takahashi, N. & Mano, Y. (1979) Cell 18, 559-568.
21. Lang, R. B., Stanton, L. W. & Marcu, K. B. (1982) Nucleic Acids Res. 10, 611-630.
22. Schreier, P. H., Bothwell, A. L. M., Mueller-Hill, B. & Baltimore, D. (1981) Proc. Natl. Acad. Sci. USA 78, 4495-4499.
23. Stanton, L. W. & Marcu, K. B. (1982) Nucleic Acids Res. 10, 5993-6006.
24. Moore, W., Rogers, J., Hunkapiller, T., Early, P., Nottenburg, C., Weissman, I., Bazin, H., Wall, R. & Hood, L. E. (1981) Proc. Natl. Acad. Sci. USA 78, 1800-1804.
25. Liu, F.-T., Albrandt, K., Sutcliffe, J. G. & Katz, D. H. (1982) Proc. Natl. Acad. Sci. USA 79, 7852-7856.
26. Rigby, P. W. J., Dieckmann, M., Rhodes, C. & Berg, P. (1977) J. Mol. Biol. 133, 237-251.
27. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, Cold Spring Harbor, NY).
28. Kieny, M. P., Lathe, R. & Lecoq, J. P. (1983) Gene 26, 91-99.
29. Sanger, F., Coulson, A. R., Barrell, B. G., Smith, A. J. H. & Roe, B. A. (1980) J. Mol. Biol. 143, 161-178.
30. Sakano, H., Kurosawa, Y., Weigert, M. & Tonegawa, S. (1981) Nature (London) 290, 562-565.
31. Banerji, J., Olson, L. & Schaffner, W. (1983) Cell 33, 729-740.
32. Gillies, S. D., Morrison, S. L., Oi, V. T. & Tonegawa, S. (1983) Cell 33, 717-728.
33. Mason, J. O., Williams, T. & Neuberger, M. S. (1985) Cell 41, 479-487.
34. Kawakami, T., Takahashi, N. & Honjo, T. (1980) Nucleic Acids Res. 8, 3933-3945.
35. Tucker, P. W., Slightom, J. L. & Blattner, F. R. (1981) Proc. Natl. Acad. Sci. USA 78, 7684-7688.
36. Ollo, R., Auffray, C., Morchamps, C. & Rougeon, F. (1981) Proc. Natl. Acad. Sci. USA 78, 2442-2446.
37. Sire, J., Auffray, C. & Jordan, B. R. (1982) Gene 20, 377-386.
38. Steen, M.-L., Hellmann, L. & Pettersson, U. (1984) J. Mol. Biol. 177, 19-32.
39. Sheppard, H. W. & Gutman, G. A. (1981) Proc. Natl. Acad. Sci. USA 78, 7064-7068.
40. Sheppard, H. W. & Gutman, G. A. (1982) Cell 29, 121-127.
