complete sequence assembly and characterization of the c57bl/6

15
of January 2, 2019. This information is current as Heavy Chain V Region Characterization of the C57BL/6 Mouse Ig Complete Sequence Assembly and and Anne E. Corcoran Colette M. Johnston, Andrew L. Wood, Daniel J. Bolland http://www.jimmunol.org/content/176/7/4221 doi: 10.4049/jimmunol.176.7.4221 2006; 176:4221-4234; ; J Immunol References http://www.jimmunol.org/content/176/7/4221.full#ref-list-1 , 42 of which you can access for free at: cites 90 articles This article average * 4 weeks from acceptance to publication Fast Publication! Every submission reviewed by practicing scientists No Triage! from submission to initial decision Rapid Reviews! 30 days* Submit online. ? The JI Why Subscription http://jimmunol.org/subscription is online at: The Journal of Immunology Information about subscribing to Permissions http://www.aai.org/About/Publications/JI/copyright.html Submit copyright permission requests at: Email Alerts http://jimmunol.org/alerts Receive free email-alerts when new articles cite this article. Sign up at: Print ISSN: 0022-1767 Online ISSN: 1550-6606. Immunologists All rights reserved. Copyright © 2006 by The American Association of 1451 Rockville Pike, Suite 650, Rockville, MD 20852 The American Association of Immunologists, Inc., is published twice each month by The Journal of Immunology by guest on January 2, 2019 http://www.jimmunol.org/ Downloaded from by guest on January 2, 2019 http://www.jimmunol.org/ Downloaded from

Upload: others

Post on 03-Feb-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Complete Sequence Assembly and Characterization of the C57BL/6

of January 2, 2019.This information is current as

Heavy Chain V RegionCharacterization of the C57BL/6 Mouse Ig Complete Sequence Assembly and

and Anne E. CorcoranColette M. Johnston, Andrew L. Wood, Daniel J. Bolland

http://www.jimmunol.org/content/176/7/4221doi: 10.4049/jimmunol.176.7.4221

2006; 176:4221-4234; ;J Immunol 

Referenceshttp://www.jimmunol.org/content/176/7/4221.full#ref-list-1

, 42 of which you can access for free at: cites 90 articlesThis article

        average*  

4 weeks from acceptance to publicationFast Publication! •    

Every submission reviewed by practicing scientistsNo Triage! •    

from submission to initial decisionRapid Reviews! 30 days* •    

Submit online. ?The JIWhy

Subscriptionhttp://jimmunol.org/subscription

is online at: The Journal of ImmunologyInformation about subscribing to

Permissionshttp://www.aai.org/About/Publications/JI/copyright.htmlSubmit copyright permission requests at:

Email Alertshttp://jimmunol.org/alertsReceive free email-alerts when new articles cite this article. Sign up at:

Print ISSN: 0022-1767 Online ISSN: 1550-6606. Immunologists All rights reserved.Copyright © 2006 by The American Association of1451 Rockville Pike, Suite 650, Rockville, MD 20852The American Association of Immunologists, Inc.,

is published twice each month byThe Journal of Immunology

by guest on January 2, 2019http://w

ww

.jimm

unol.org/D

ownloaded from

by guest on January 2, 2019

http://ww

w.jim

munol.org/

Dow

nloaded from

Page 2: Complete Sequence Assembly and Characterization of the C57BL/6

Complete Sequence Assembly and Characterization of theC57BL/6 Mouse Ig Heavy Chain V Region1

Colette M. Johnston,2 Andrew L. Wood, Daniel J. Bolland, and Anne E. Corcoran3

The mechanisms that regulate variable (V) gene selection during the development of the mouse IgH repertoire are not fullyunderstood, due in part to the absence of the complete locus sequence. To better understand these processes, we have assembledthe entire 2.5-Mb mouse IgH (Igh) V region sequence of the C57BL/6 strain from public sequences and present the first completeannotated map of the region, including V genes, pseudogenes, repeats, and nonrepetitive intergenic sequences. In so doing, we havediscovered a new V gene family, VH16. We have identified clusters of conserved region-specific intergenic sequences and haveverified our assembly by genic and intergenic Southern blotting. We have observed that V pseudogenes are not evenly spreadthroughout the V region, but rather cluster together. The largest J558 family, which spans more than half of the locus, has twostrikingly different domains, which suggest points of evolutionary divergence or duplication. The 5� end contains widely spacedJ558 genes interspersed with 3609 genes and is pseudogene poor. The 3� end contains closely spaced J558 genes, no 3609 genes,and is pseudogene rich. Each occupies a different branch of the phylogenetic tree. Detailed analysis of 500-bp upstream of allfunctional genes has revealed several conserved binding sites, general and B cell-specific, as well as key differences betweenfamilies. This complete and definitive assembly of the mouse Igh V region will facilitate detailed study of promoter function andlarge-scale mechanisms associated with V(D)J recombination including locus contraction and antisense intergenictranscription. The Journal of Immunology, 2006, 176: 4221–4234.

I n the humoral immune system, a diverse Ab repertoire isproduced in the first instance by combinatorial and junctionaldiversity of IgH (Igh) and IgL chain (Igl) gene loci by a

process termed V(D)J recombination. The Igh locus on mousechromosome 12 is the first to undergo ordered V(D)J recombina-tion in early B lymphocytes. First, a D gene is recombined with aJ gene, and second, one of many V genes is recombined with thisDJ gene segment to produce a V(D)J rearranged gene. This isultimately expressed on the B cell surface as an IgH polypeptide,which associates with an IgL to form the BCR. To provide a largediversity of gene segments for recombination, the Igh locus, incommon with the Ig�L locus, is highly complex. It spans a regionof �3 Mb, containing 8 constant region genes, 4 J genes, 10–13 Dgenes and an unknown number of V genes, estimated to be 150(1–3). These genes have been classified into 15 families based onsequence homology (4). Understanding the genomic organizationof this large group of genes and the consequent constraints on theirrecombination ability is crucial to understanding Ab diversity.Several groups have mapped the position of functional V genes by

deletion mapping (4, 5) and yeast artificial chromosome (YAC)4

contig assembly (1), which have given much useful information,and have predicted, with reasonable accuracy, the relative posi-tions of genes within the locus. However, these studies have beenlimited in accuracy about the largest families, in particular theD-distal J558 family, which contains approximately half of the Vgenes, and have not been able to provide information about Vpseudogenes, repeats, and intergenic sequences.

It is believed that all of the functional V genes in the Igh locusare used in the large number of V(D)J recombinations that occur ina B cell population, albeit they may not all survive to be includedin the mature B cell repertoire. This assumption has not been ver-ified due to the absence of the complete locus sequence. However,there is a large amount of variation between mouse strains, both innumbers and relative usage of V genes (6). Studies in BALB/c,commonly used for immunological studies, have suggested thatgene usage varies throughout ontogeny, with the 3� V genes pref-erentially recombined in fetal liver B cells (7, 8), and early in bonemarrow B cell development (9). In contrast, the C57BL/6 straindoes not appear to exhibit this 3� recombination bias (10–12),suggesting strain-specific developmental differences in usage of Vgenes. Despite the lack of complete locus sequence, many factorsaffecting recombination have been studied in small V gene fami-lies, in which the number and relative position of genes could bepredicted reasonably accurately by Southern blotting and deletionmapping, and for which sequence was available. These include therecombination signal sequence (RSS), heptamer, nonamer (13,14), and spacer sequences (15); relative V gene promoter strength(16–18); distance between rearranging elements (19); requirementfor additional regulatory elements (18). However, with notable ex-ceptions (20), these studies have relied on sequences of the genes

*Laboratory of Chromatin and Gene Expression, Babraham Institute, Babraham Re-search Campus, Cambridge CB2 4AT, United Kingdom

Received for publication November 4, 2005. Accepted for publication January17, 2006.

The costs of publication of this article were defrayed in part by the payment of pagecharges. This article must therefore be hereby marked advertisement in accordancewith 18 U.S.C. Section 1734 solely to indicate this fact.1 This research was supported by the Biotechnology and Biological Sciences Re-search Council (to C.M.J. and A.L.W.), the Association for International CancerResearch (to D.J.B.), and a Medical Research Council Career Development Award (toA.E.C.).2 Current address: X Chromosome Group (Team 61), Wellcome Trust Sanger Insti-tute, Hinxton, Cambridge CB10 1SA, U.K.3 Address correspondence and reprint requests to Dr. Anne E. Corcoran, Laboratoryof Chromatin and Gene Expression, Babraham Institute, Babraham Research Campus,Cambridge CB2 4AT, UK. E-mail address: [email protected]

4 Abbreviations used in this paper: YAC, yeast artificial chromosome; BAC, bacterialartificial chromosome; Bright, B cell regulator of IgH transcription; DICE, down-stream Ig control element; RSS, recombination signal sequence; SATB1, special AT-rich sequence binding protein; Inr, initiator.

The Journal of Immunology

Copyright © 2006 by The American Association of Immunologists, Inc. 0022-1767/06/$02.00

by guest on January 2, 2019http://w

ww

.jimm

unol.org/D

ownloaded from

Page 3: Complete Sequence Assembly and Characterization of the C57BL/6

themselves, or at most a small amount of flanking sequence, andhave not been able to assess the role of chromatin context, pro-vided by the relatively large tracts of intervening sequence (2–50kb) between the relatively small (500 bp) V genes. Further, it re-cently has become increasingly clear that large-scale chromatinremodelling events, such as nuclear relocalization (21), antisenseintergenic transcription (22), and locus contraction (23), precedeV(D)J recombination. These are not confined to genes, but ratheraffect large chromatin domains. Thus, to gain a complete under-standing of how the Ig repertoire is established, it will be necessaryto investigate the large noncoding regions between genes.

In this study, we set out to assemble and annotate the primarysequence data of the mouse Igh V region from publicly availablesources, including the mouse genome sequencing project (En-sembl) (24), to provide a detailed picture of the locus, includingexact numbers and positions of genes within each family and theircorrect genomic context. This assembled and fully annotated se-quence is the first report that places V genes relative to flankingregions, pseudogenes, repeats, and nonrepetitive intergenic se-quences, enabling study of their role in V(D)J recombination. Fur-ther, studies of V(D)J recombination in the C57BL/6 mouse strainhave been limited. Because the C57BL/6 genome sequence is be-ing assembled in Ensembl, and this is likely to be the major mousestrain for future study, it will be vital to investigate the recombi-nation patterns of V genes in this strain with the benefit of com-plete locus knowledge.

Materials and MethodsSequence assembly

BLAST software (25) searches of National Center for Biotechnology In-formation (NCBI) and Ensembl databases, using known published V genesequences (26, 27) and those deposited in public V gene databases, includ-ing the International ImMunoGeneTics Information System (�http://imgt.cines.fr�), identified bacterial artificial chromosome (BAC) sequences fromC57BL/6 mice containing IgH (Igh) V region genes. Sequences were thenassembled using Sequencher software (Gene Codes) and confirmed by vi-sual inspection of overlaps.

Sequence analysis

Analysis of the sequence and identification of V genes and non-Ig geneswas performed using NIX (�www.hgmp.mrc.ac.uk/NIX�), a web-basedplatform that combines results from the following complementary se-quence analysis programs and databases, thereby providing optimal con-sensus for gene predictions: GRAIL (�http://compblo.ornl.gov/Grail-1.3/�),Fex (�http://www.softberry.com�), Hexon (�www.softberry.com�),MZEF (�http://rulal.cshl.org/tools/genefinder/�), GeneMark (�http://opal.blology.gatech.edu/GeneMark/�), GeneFinder (�www.biostat.jhsph.edu/�wmchen/gf.html�), FGene (�www.softberry.com�), BLAST (�www.ncbl.nlm.nih.gov/BLAST/�), Polyah (�www.softberry.com�), and tRNAscan(�www.genetics.wustl.edu/eddy/tRNAscan-SE/�). This platform was dis-continued after July 2005 because of the restructuring of the Human Ge-nome Mapping Project, but individual programs are still available. Anno-tation of V genes and pseudogenes (leader, exon, RSS heptamer, andnonamer) was performed in MacVector software (Oxford MolecularGroup) and exported to GenBank. The complete annotated sequence, in-cluding start and end positions of leader, V gene exon, and start of RSSheptamer and nonamer is available from EMBL (accession no.BN000872). The phylogenetic tree was constructed using TreeView soft-ware (28) (�http://taxonomy.zoology.gla.ac.uk/rod/treeview.html�) frommultiple sequence alignments generated by the ClustalW software tool atthe European Bioinformatics Institute (29) (�www.ebi.ac.uk/clustalw/index.html�). Alignment of transcription factor binding sites at V geneflanking regions was performed using Genomatix MatInspector software(�www.genomatix.de�) (30).

Analysis of repeat sequences

The RepeatMasker software program screens DNA sequences for low-complexity DNA sequences and interspersed repeats. The sequence wassubmitted to the RepeatMasker server (�http://woody.embl-heidelberg.de/Repeatmasker�), originally developed by A. Smit and P. Green (unpub-

lished data). Analysis was performed using RepeatMasker version 2002/05/15 with the rodent repeat database, cross�match software (version0.990329; �http://repeatmasker.org�), and the RepBase database (version7.4; G.I.R.I. �http://www.girnst.org/repbase/�). The most sensitive settingwas used for maximum accuracy.

Southern blotting

Genomic DNA from C57BL/6 kidney was digested with EcoRI or HindIII,separated by agarose gel electrophoresis, and blotted onto Biodyne B nylonmembrane (Pall). V gene family and intergenic PCR products were clonedinto pGEM-T Easy vectors (Promega), excised, purified, radioactively la-beled with [�32P]dCTP, and used to probe Southern blots. Southern blothybridization results were analyzed with a phosphorimager (Fujifilm).

ResultsSequence assembly of the mouse IgH locus

The Igh locus has been partially assembled by the Ensembl project(24) (�www.ensembl.org�). However, at the date of manuscriptsubmission, the Ensembl assembly is incomplete, with single con-tig coverage over large parts of NCBI Build m34 (freeze Septem-ber 2005), many contigs in draft form, and over 40 large and smallgaps encompassing �450 kb of sequence. Thus, we chose to as-semble the locus manually, which has enabled us to include manyBAC sequences not available on Ensembl, order the contigs cor-rectly, provide at least two-fold coverage over large parts, andclose all of the gaps. The Igh locus spans 3.3 Mb on mouse chro-mosome 12, from the 3� enhancer at position 108.5 Mb to the firstnon-V gene at 111.85 Mb (Ensembl). We have assembled 2.75 Mbof the Igh locus, which contains the complete mouse Igh V region(2.5 Mb) and upstream flanking sequence (109.11–111.85 Mb).We used Sequencher software, which enabled us to incorporateand align all BACs that we found by BLAST analysis (Fig. 1). Forsimplicity, only the most complete BACs are shown. A list of otheroverlapping BACs used is available on request. We noticed a num-ber of mistakes in the ordering of contigs in the Ensembl assembly.For example, contig AC074328 overlaps both AC073939 andAC087166 at the 5� end of the Igh locus in our assembly but isplaced further 3� in Ensembl. This may be because this locus con-tains a large number of very similar V genes and a high density ofLINE1 and other repeats (31), both of which make automated as-sembly less accurate.

Analysis of our assembled sequence using NIX software (�www.hgmp.mrc.ac.uk�) enabled us to identify sequences with homologyto known V gene fragments, including diverged pseudogenes andgene remnants. Further, it aided identification of germline genes,because many genes deposited with databases are recombined, ma-ture, somatically hypermutated, and, hence, nongermline V genes.V genes and pseudogenes were assigned to families using the no-menclature originally suggested by Brodeur and Riblet (32). Asequence was assigned to a V family if it had at least 80% identityat the nucleotide level over the entire coding exon sequence (ex-cluding the leader). A sequence was classed as a V gene if it hadan intact translation initiation codon (ATG), splice junctions, RSS,had no in-frame stop codons or frameshifts, and had a minimumlength of 291 bp. Where a sequence had all the features of a normalcoding gene, except for a noncanonical splice junction, we classedit as a coding gene, even though it may be nonfunctional. How-ever, this applied to only one gene (J558.13.103), which had aGT-to-GC splice site change, which allows splicing, but at lowerefficiency. A sequence was classed as a V pseudogene if it con-tained stop codons, frameshifts, lacked an ATG, or had a signifi-cantly altered RSS (spacer longer or shorter than 23 � 1 bp; anyalterations in the first three nucleotides of the heptamer (CAC); �2mismatches elsewhere in the heptamer; or �1 G residue in thenonamer). Most pseudogenes had more than one of the above

4222 SEQUENCE ASSEMBLY OF THE MOUSE Igh V REGION

by guest on January 2, 2019http://w

ww

.jimm

unol.org/D

ownloaded from

Page 4: Complete Sequence Assembly and Characterization of the C57BL/6

defects. For those with only one defect, this was usually in theRSS. If no evidence of expression of the V(D)J rearranged genewas found in cDNA and EST libraries, these were classed as pseu-dogenes. Pseudogenes were assigned to specific V families if theyhad at least 80% nucleotide identity to functional genes of thatfamily. In addition, the 7183 and 3609 families have three and onepseudogene members, respectively, that are related to other pseu-dogenes of that family with �80% similarity, but which have�80% similarity to the genic members, and we included these inthe family. A small number of pseudogenes (hereafter named PG),although recognizable as V gene remnants, could not be assignedto any particular family, due to similar homology levels to several

families. In total, based on these definitions, 110 genes and 85pseudogenes were identified (Table I).

Features of the mouse Igh locus V region

The distribution of V genes and V pseudogenes in the Igh V regionis shown in Fig. 2. All the V genes and pseudogenes are in thesame orientation, indicating that there have been no obvious in-versions during the evolution of this locus. There are large inter-genic distances (2–50 kb) between the relatively small genes(�500 bp). The 3�end of the V region is more compact than the 5�end, with average intergenic distance (between genes and/or pseu-dogenes) �10 kb, compared with 25 kb for the 5� end.

The relative positions of V gene families generally confirmsprevious mapping studies (1, 4, 5). The J558 family spans over50% of the V region (�1.53 Mb) at the 5� end (Fig. 2). Notabledifferences include the complete interspersion of the 3609 familywith the 5� Mb of the J558 region, except for a single member 3�of the J558 family, in contrast to its previous estimated position atthe 3� end of, or 3� of the J558 family (4). The interspersion ofmembers of the VH15 and VH10 families with the 3� end of theJ558 region shows that the J558 region is not completely separatefrom the other families, contrary to previous predictions (4). The“middle” families (i.e., all except J558, 3609, 7183, and Q52)show a reasonably high level of interspersion and family membersare not necessarily clustered together. The notable exception is thefive-member J606 family, which is not interspersed with any otherfamily. At the 3� end of the V region, the Q52 and 7183 familiesare completely interspersed but do not overlap with any other genefamilies. Pseudogenes belonging to a particular family always mapto the region occupied by the functional genes. Where pseudo-genes are present, the overall density of V gene sequences is in-creased. This is most marked in the 7183/Q52 region and the 3�end of the J558 region.

Table I. Summary of the V region genes and pseudogenes in the mouselgh locus grouped by family

V Gene Family Genes Pseudogenes

7183 10 11Q52 9 4S107 3 1SM7 4 036–60 6 2VH11 2 0X24 1 1VH16 1 0VGAM3.8 4 03609N 1 1VH12 1 0J606 5 0VH10 2 1VH15 1 03609 8 8J558 52 37Unclassified pg 0 19Total 195 110 85

FIGURE 1. Assembly of the mouse IgH chain V region. Sequencher alignment of BAC sequences used in the assembly of the V region. Arrows indicateorientation of BAC sequences. The diagram key at the bottom illustrates direction and degree of coverage as follows: �, single-strand coverage; u, doublecoverage of the same strand; f, coverage of both strands. The suffix (.number) indicates the version of the BAC used in the assembly.

4223The Journal of Immunology

by guest on January 2, 2019http://w

ww

.jimm

unol.org/D

ownloaded from

Page 5: Complete Sequence Assembly and Characterization of the C57BL/6

The upstream boundary of the Igh V region is marked by twonon-Ig genes, vasoactive intestinal peptide receptor type 2 (Vipr2)(first exon; Fig. 2) and zinc finger protein type 386 (Zfp386). Vipr2is expressed from the opposite strand to the V genes and has pre-viously been experimentally mapped to the telomeric end of chro-mosome 12 (33). We are confident that Vipr2 lies outside of theIgh locus, because the entire region from the telomere to Vipr2 hassynteny with human chromosome 7, yet the human Igh locus mapsto chromosome 14. Thus, this Vipr2 positioning is not conservedacross species and thus is not functionally important (33) and mayindeed be an evolutionary chromosomal breakpoint in the humanlineage. The 3� end of the V region is flanked by the downstreammarker D12Mit263, positioned at 109.05 Mb.

Identification of a novel V gene family

There is an additional gene located at 2.005 Mb, betweenVGAM3.8.1.57 and SM7.3.54, which does not fit the criteria forinclusion in any of the 15 V gene families. This gene has an openreading frame, a normal leader and RSS, and conserved splicejunctions, but does not have �80% identity to any other known V

gene. Its closest homology is to 7183.20.37 (72% identity at thenucleotide level over the exon sequence only), and it shows nu-cleotide identity in the 60–70% range to 7183, X24, VH10, VH11,S107, J606, and 3609N. At the protein level, the gene has homol-ogy in the range 47–57% to predicted proteins from these families.As a comparison, the single-member VH12 gene family has a clos-est identity of 73–78% (to the 36–60 family) and the single-mem-ber VH15 family has 66–68% identity to the SM7 family at thenucleotide level. We have named this novel gene VH16(VH16.1.55), in accordance with the existing nomenclature.

Identification of non-V pseudogenes in the V region

We have not found any functional non-V genes within the V re-gion but have found numerous pseudogenes, both processed andunprocessed, detailed in Table II. Between the second and thirdJ558 gene at the 5� end is a processed ornithine decarboxylasepseudogene, previously mapped within the first 9% of the Igh Vregion (34). This is in two fragments, the first corresponding toexons 1–11, and the second to exons 11 and 12. This and otherprocessed pseudogenes may have arisen from locus evolutionary

FIGURE 2. Scale map of V genes and pseudogenes in the mouse IgH locus V region. The 110 genes and 85 pseudogenes are depicted as trianglespointing in the direction of their orientation in the locus. Genes are numbered systematically and colored according to family. Pseudogenes are shown asopen triangles, with colors corresponding to their family of origin. The system of V gene nomenclature is as follows: (V gene family name).(gene numberwithin family).(gene number within locus), numbered from the 3� end. The 19 unclassified pseudogenes, depicted as open gray triangles, are included inthe total locus numbering, and are called PG 1 to 19. The gray box toward the bottom of the figure contains the “middle” V gene families located betweenthe J558/3609 and the 7183/Q52 regions.

4224 SEQUENCE ASSEMBLY OF THE MOUSE Igh V REGION

by guest on January 2, 2019http://w

ww

.jimm

unol.org/D

ownloaded from

Page 6: Complete Sequence Assembly and Characterization of the C57BL/6

events. For example, Q7TQD3, a pseudogene of the signaling mol-ecule ATTP, contains all 13 exons of the corresponding functionalgene on chromosome 7, but spliced together over a distance of 4kb, rather than the original 35 kb covered in chromosome 7. Fur-thermore, it is placed between two LINE L1 repeat segments.Thus, it was possibly generated as a processed RNA molecule,which was then transposed with the repeat. We suggest that noneof the non-V genes are functionally significant because their loca-tion is not conserved across species (35).

The mouse V region has a high proportion of interspersedrepeats.

This assembly has enabled detailed analysis of type and position ofinterspersed repeat sequences. Different repeats are known to bedifferentially associated with gene-rich active euchromatin(SINEs) and gene-poor inactive heterochromatin (LINEs) (36).RepeatMasker analysis revealed that the region contains a greaterproportion of interspersed repeats (52.4%) than most of the mousegenome (39%) (37). LINE (L1) elements constitute 40.4% of theIgh V region sequence, the value predicted by detection of L1sequence at YAC ends during YAC assembly of the locus (40%)(1, 31). This is similar to other AgR loci, such as the Ig� locus(41%) (38), and in marked contrast to 15% in the mouse autosomalgenome overall (39). The region also contains a correspondinglylower proportion of SINEs (2.1%) than the rest of the genome.Both of these features have been proposed to be characteristic ofmonoallelically expressed loci (39). In contrast, the human IgH Vregion contains a much lower proportion of LINEs (23%), whichis not significantly different from the rest of the human genome(35, 40). Of the 338 identifiable LINE elements in the human IghV region, only two are full length. In our mouse sequence, weidentified 22 full-length LINES out of a total of 1016 identifiableelements. It has been estimated that a significant proportion offull-length LINEs in the mouse genome are capable of active ret-rotransposition (41). Thus, the large number identified in the

mouse V region may have contributed to the greater expansion ofthe mouse locus compared with the human locus (discussedbelow).

FIGURE 3. Dot plot of V region vs V region (masked). Alignment ofthe masked (y axis) vs ummasked (x axis) sequences of the mouse IgHchain locus V region. “Middle” families refers to the 11 families not spe-cifically named that are all interspersed in this region. The positive strandwas searched in this analysis, with a stringency of 95% identity over 50-bpwindows.

Table II. Nonvariable region genes and pseudogenes within the mouse lgh V region locusa

Name

Locus Position (bp)

Str. DescriptionGene LocationEnsembl v34Start End

Vipr2 201 298 (�) Vasoactive intestinal peptide receptor 2 ex1 Chr12 (111.8 Mb)Zfp386 11702 13394 (�) Zinc finger protein 386 (Kruppel-like) ex3 Chr12 (111.8 Mb)Zfp386 17346 17577 (�) Zinc finger protein 386 (Kruppel-like) ex2 Chr12 (111.8 Mb)Zfp386 24433 24522 (�) Zinc finger protein 386 (Kruppel-like) ex1 Chr12 (111.8 Mb)Odc1 78928 80116 () Ornithine decarboxylase ex1–10 Chr1 (152.6 Mb)Odc1 89856 90035 () Ornithine decarboxylase ex11–12 Chr1 (152.6 Mb)Olfr141 137380 138138 () Olfactory receptor 141 pseudogene Chr2 (86.5 Mb)QZTQDG 175243 179493 (�) Signaling molecule ATTP ex1–13 Chr7 (41.0 Mb)Rpl7a 240067 240794 (�) Ribosomal protein L7a ChrX (160.4 Mb)Hmgb2 297130 297735 (�) High-mobility group box 2 ex2–5 Chr8 (56.6 Mb)Hmgb2 411290 412141 (�) High-mobility group box 2 ex1–5 Chr8 (56.6 Mb)Cdc5 452512 455407 (�) Cdc5-like/NpwBP pseudogene Chr17 (42.9 Mb)Rpl7a 479805 480709 (�) Ribosomal protein L7a ChrX (160.4 Mb)Toeb2 524683 525060 () Transcription elongation factor B Chr17 (21.6 Mb)Hmgb2 543987 544776 (�) High-mobility group box 2 ex2–5 Chr8 (56.6 Mb)Hmgb2 593946 595111 (�) High-mobility group box 2 ex1–5 Chr8 (56.6 Mb)Birc3 826533 826895 () Baculoviral inhibitor of apoptosis ex3–6 Chr9 (7.9 Mb)Hmgb2 845465 846268 (�) High-mobility group box 2 ex1–5 Chr8 (56.6 Mb)Hmgb2 894486 897235 (�) High-mobility group box 2 ex1–5 Chr8 (56.6 Mb)Rad23b 1804775 1805505 () RAD23b ex1–9 homolog (S. cerevisiae) Chr4 (55.3 Mb)Wdr22 1882119 1882424 () WD repeat domain 22 Chr12 (77.2 Mb)Rpl26 2245559 2245935 (�) Ribosomal protein L26 Chr11 (68.6 Mb)Ankrd12 2440199 2441262 () Ankyrin repeat domain 12 ex1–7 Chr17 (63.7 Mb)

a Str, refers to the orientation of the gene in relation to V region genes. (), same orientation; (�), opposite orientation; ex, exon. All listed genes are pseudogenes with85–99% identity to the original gene from which the pseudogene is derived, except the first two genes listed, which are functional genes outside the V region. Gene locationrefers to the chromosomal position of the corresponding original gene.

4225The Journal of Immunology

by guest on January 2, 2019http://w

ww

.jimm

unol.org/D

ownloaded from

Page 7: Complete Sequence Assembly and Characterization of the C57BL/6

4226 SEQUENCE ASSEMBLY OF THE MOUSE Igh V REGION

by guest on January 2, 2019http://w

ww

.jimm

unol.org/D

ownloaded from

Page 8: Complete Sequence Assembly and Characterization of the C57BL/6

Patterns of duplication and clustering in the V region

The Igh locus in the mouse has expanded by means of numerousduplication events, probably facilitated in part by the large amountof recombination that occurs naturally in the locus, and by theputative ability of the RAG genes to act as general transposases(42). To visualize the extent of duplication and identify patternswith the benefit of intergenic sequences we aligned the sequenceagainst a repeat-masked copy of itself. This analysis is depicted inFig. 3. Within the large J558 region, there are two strikingly dif-ferent sequence patterns. In the 5� part, in which the J558 genes arevery interspersed with 3609 genes, the whole region is relativelygene poor, with large intergenic distances (7–50 kb) between thegenes. Because the V genes are small (500 bp), it is clear that thereare extensive repeated sequences in many intergenic regions. Incontrast, toward the 3� end of the J558 region, which contains no3609 genes, there is a discrete domain comprising a tight clusteringof J558 genes and pseudogenes, with intergenic distances rangingfrom 3 to 25 kb. This region exhibits a much greater level ofduplication over both genic and intergenic regions. The 5� and 3�ends of the J558 region identified in the dot plot (Fig. 3) have verydifferent numbers of associated pseudogenes. The 5� A region(genes J558.89pg.195 to 3609.2pg.138) occupies 1 Mb and con-tains 29 J558 genes, 13 J558 pseudogenes, 7 3609 genes, and 83609 pseudogenes. The 3� B region (genes J558.47.137 toJ558.6.96) occupies 400 kb and contains 19 J558 genes and 23pseudogenes. These contrasting patterns are strongly suggestive ofa different pattern of duplication and evolution between the twoparts of the J558 region.

Downstream of the A and B J558 regions, there is a segmentalduplication of a cluster of four genes from different families (J558,PG, VH10, and J558), indicating a single duplication of this entiresubdomain. Adjacent to this is the discrete J606 region. Furtherdownstream, there is another tandem duplication involving genesfrom four families (VH11, 36-60, X24, and SM7). Interspersion ofthese small families is likely to have arisen by duplication of geo-graphical regions containing clusters of V genes from differentfamilies. This would explain the apparent randomness of intersper-sion of V gene families, and suggests that, in a chromatin context,the families may be less important than geography. At the 3� end,the 7183 and Q52 families, which are completely interspersed witheach other, form another separate domain, which, similar to theJ558 region, is highly repetitive.

Sequence comparison between V gene families

Sequence relatedness between V region gene families was as-sessed by ClustalW analysis of all V gene sequences that had afull-length exon. This included all family-specific pseudogenesand unclassified pseudogenes. Sequences entered into Clustal didnot include the leader exon. The alignments were used to create aphylogenetic tree, which accurately measures evolutionary diver-sity. The V genes fall into three distinct groups as shown in Fig.4a. Group 1 includes the J558, SM7, VH15, and VGAM3.8 genes.It can be further divided into sequence-related subgroups J558 Aand B. Strikingly, all of the J558 B genes originated from thedense 3� J558 B cluster identified in the dot plot, whereas the J558A genes originate from the 5� J558 A region, further underlining

the likely divergent evolution of these two parts of the J558 region.Group 2 includes the 3609, Q52, VH12, and 36–60 genes, whereasgroup 3 includes the 7183, VH10, J606, 3609N, VH16, S107, X24,and VH11 families. Fig. 4b, inset, illustrates that families belong-ing to these three evolutionary groups are not clustered together inthe IgH locus. Groups 1, 2, and 3 correspond to clans I, II, and III,originally ordered by corresponding protein structure conservationof representative family members (43, 44). This clan conservationoccurs most strongly in the framework 1 region, which predictsclan identity, and framework 3, which is family specific within aclan. Our study extends these groupings to all V genes and con-firms their integrity at the nucleotide level.

Southern blotting analysis of genic and intergenic regionsverifies the assembly

The assembly was verified experimentally by Southern blotting forthe 3609 family, which spans the 5� end of the sequence (Fig. 5a),the VGAM3.8 family in the middle region (Fig. 5b), and the Q52family at the 3�end (data not shown). The sequence enabled pre-diction of correct restriction fragment size. The 3609 genes weredetected with a 1.7-kb probe containing 3609.13pg.178 (gene M).Fig. 5a shows 16 3609 bands detected in C57BL/6 kidney digestedwith EcoRI or HindIII. These are the predicted number and frag-ment sizes (Fig. 5e). Overall, all except two of the 3609 genes werecorrectly detected at least once. Furthermore, no additional bandsother than those predicted were observed, confirming the accuracyof sequence assembly. In a few cases, the bands were only detect-able in one of the digests. For example, the HindIII digest did notdetect genes B, C, or O, due to large fragment sizes, which pre-cluded their resolution by Southern blot. In other cases, identical-sized bands were predicted for more than one gene, and thus it wasnot possible to determine whether all were detected. Gene P wasnot detected with the HindIII digest and shares a weak band withgenes N and O in the EcoRI digest. Because gene N, but neither Onor P, was detected in the HindIII digest, it is possible that genesO and P haven’t been detected in either digest. If so, this is likelydue to the relatively low homology of these 3609 pseudogenes tothe probe (82.7 and 81.8%, respectively), because the level of ho-mology required for detection in this Southern blotting procedureis 85%. For all genes, the bands in the Southern blot are not nec-essarily stoichiometric, because some target sequences are moresimilar to the probe sequence than others.

The correct number and size of bands also were obtained for theVGAM3.8 family in C57BL/6, corresponding to the four genesidentified in the sequence (Fig. 5b). These results confirm that thenumber and position of genes in our contig is correct for these twofamilies. Numbers of VGAM3.8 bands for C57BL/6 were thencompared with the BALB/c strain. Under the same conditions,BALB/c gave a larger number of bands (Fig. 5c), consistent withprevious Southern blotting analyses, and predictions of betweenfive and eight VGAM3.8 members in the BALB/c strain (45, 46).This emphasizes the differences in V gene number that occur be-tween mouse strains.

Different intergenic sequence patterns within a family may fur-ther suggest when evolutionary divergence occurred. Despite thehigh proportion of interspersed repeats observed (40.4%), we have

FIGURE 4. a, Phylogenetic tree showing sequence relationships between 195 V gene segments. All genes and pseudogenes listed in Table I are includedin this analysis. The short branch lengths of some pairs of genes/pseudogenes indicates that they may have arisen by recent duplication events. J558B (dotplot cluster) delineates the genes found in the 3� gene-dense part of the J558 region. b, V gene family distribution within the mouse Igh V region The lengthof boxes indicates the distance spanned by members of the gene family and is not related to numbers of genes. Single genes are represented by verticallines. Gray, white, and black shading indicates members of evolutionary groups 1, 2, and 3, respectively.

4227The Journal of Immunology

by guest on January 2, 2019http://w

ww

.jimm

unol.org/D

ownloaded from

Page 9: Complete Sequence Assembly and Characterization of the C57BL/6

detected several clusters of repeated intergenic sequences that areunique to the Igh V region and are region specific. For example,there is an intergenic region of �3.5 kb in the 5� J558 A region,which is repeated 13 times. A second group of intergenic se-quences at the 3� J558 region is repeated eight times. In addition,there are several other V gene family-specific intergenic repeatedsequence throughout the V region. We have examined the largestgroup of these repeated sequences in the 5� J558 A region bySouthern blotting, using a 1.15-kb probe cloned from the g targetsequence (Fig. 5d). The locations of the target sequences areshown in Fig. 5e, together with predicted fragment sizes. The 13bands predicted were clearly identified. Several resolved to thesame band, because the fragment sizes were identical, but the pres-ence of more than one band was clear from the stoichiometry ofthe signal. This result further verifies correct assembly of the locussequence.

V gene flanking sequences contain common and family-specificbinding sites

We examined flanking sequences for every functional V gene, tosearch for known regulatory features identified in smaller-scaleanalyses, as well as binding sites for additional transcription fac-tors known to be required for B cell development. To date, analysisof promoter sequences flanking V genes has been confined toavailable sequences from a few families. Nevertheless, several im-portant functionally conserved binding sites have been identified,as well as some significant differences between families. First, rep-resentative members of some, but not all, V gene families possessa consensus TATA box and/or an initiator (Inr) element (16). Thecore promoters of mammalian genes frequently contain a TATAbox located 25–30 bp upstream of the transcription start site and/oran Inr element, which overlaps the transcription start site. Bothexhibit a large amount of functional sequence degeneracy, andeach can act independently or cooperatively to direct basal tran-scription by RNA polymerase II, and (47). Second, all Igh V genepromoters studied to date contain an octamer sequence (ATG-CAAAT), located �70 bp upstream of the transcription start site(48) and 10–25 bp upstream of the TATA box when one is present.This sequence binds the POU family transcription factors Oct1 andOct2 and is necessary, but not sufficient, for transcription of Vgenes (49). In some V gene families, there is a conserved heptamersequence that is between 2 and 22 bases upstream of the octamer(50), but it is unknown whether this is present in all families.Further upstream, there is a conserved pyrimidine-rich region (50),shown to bind the ets family member, Pu.1 (51), which also isrequired for V gene transcription. More recently, downstream Igcontrol element (DICE), located downstream of the transcriptionstart site, has been implicated in promoter activity in the 7183 andJ558 families (52). In addition, several B cell-specific transcriptionfactors have been shown to be required for V(D)J recombination,but in most cases, it is unclear whether they bind the V gene pro-moter, an IgH enhancer, or have an indirect effect. These includePu.1 (53), Ikaros (54), Pax5 (55, 56), Stat5 (57), NF-�B (58), E2A(59), LEF1 (60), NFAT (61), OctPOU (62), and EBF (63, 64).Finally, several chromatin remodelling factors that bind matrix at-tachment regions have been implicated in regulating chromatinaccessibility for V gene recombination (65–67). These include Bcell regulator of IgH transcription (Bright) (67–69), special AT-rich sequence-binding protein (SATB1) (67, 70), and members ofthe winged helix/forkhead family of transcription factors (38, 71).

In this study, we aligned the 500 bp upstream of each V geneand searched for the consensus sequences for all of the aboveelements, using Genomatix MatInspector. Only functional genes

FIGURE 5. Southern blotting analysis to verify integrity of sequenceassembly. a, Blotting of the 3609 V gene family in C57BL/6. C57BL/6DNA digested with EcoRI or HindIII was electorphoresed, blotted, andprobed with a 3609 V family-specific probe (3609.9.164). A–P correspondto 3609 genes 1–16. Further details are shown in (e). b and c, Comparativeanalysis of the VGAM3.8 family in C57BL/6 and BALB/c. DNA preparedas above from C57BL/6 and BALB/c strains was probed with a VGAM3.8family-specific probe (VGAM3.8.1.57). Numbers 1–4 correspond toVGAM3.8 genes 1–4. Full gene name, percent homology, and expectedband size are shown in (e). d, Blotting of conserved J558 intergenic repeatregion. DNA from C57BL/6 kidney was probed with a probe generatedfrom one of the repeated sequences (23 kb 3� J558.66.165). a–m, individ-ual repeats, described in more detail in (e). e, Full gene/intergenic regionname, percent homology to probe sequence, and expected size of bandscontaining each gene, based on the assembled sequence, for regions blottedin a–d.

4228 SEQUENCE ASSEMBLY OF THE MOUSE Igh V REGION

by guest on January 2, 2019http://w

ww

.jimm

unol.org/D

ownloaded from

Page 10: Complete Sequence Assembly and Characterization of the C57BL/6

were considered, because these sequences were not as well con-served among pseudogenes. The search criteria were stringent, andthus it is possible that not all functional binding sites for a givenfactor were identified, because one or two sequence mismatcheswould preclude detection but may still constitute a functional sitein vivo. We observed highly conserved 5� flanking sequences up-stream of functional genes, both within families, and common toall genes (Fig. 6), particularly in the first 150 bp upstream. Mostfamilies had TATA-related sequences at a conserved distancedownstream of the octamer, but only the related J558 and SM7families contained a strong consensus TATA box at this position.We have made an exception to our otherwise highly stringentsearch criteria specifically for detection of TATA-related se-quences, because they vary considerably in the V region, and it hasbeen shown that a wide variety of variation in sequence can func-tionally replace the conventional TATA box (72). In addition, theJ558 and SM7 families had a second TATA sequence furtherdownstream, but this was a TATA-related sequence similar toother families, rather than a TATA box. The figure displays all ofthe TATA-related sequences, regardless of adherence to consensussequence. We searched only for the Inr element downstream of theTATA box, because its sequence is very small and degenerate(47). The vast majority of V genes contain an Inr element at afamily-specific conserved distance from the TATA box. Most fam-ilies contain only one, with the exception of J558 and SM7, whichcontain two or three, all at conserved distances from theTATA box.

Every V gene promoter contains an octamer sequence, whichexactly matches the consensus sequence ATGCAAAT, except forfive members of the J558 family that have single point mutations.The heptamer also is present in all promoters, with the exceptionof the VH10 family, and single members of the Q52, SM7, and36–60 families, usually at a distance from 2 (J558) to 21 bp up-stream from the octamer. This distance is conserved within, but notbetween, families. Single-nucleotide divergence from the consen-sus sequence is common and also family specific, as reported pre-viously (50). Unusually, the heptamer is downstream of the oc-tamer in the X24 and VH11 families.

All other binding sites exhibit considerable group and family-specific variation. Remarkably, the presence of DICE is specific togroup 1 families. Every J558 5� flanking sequence contains aDICE element at a conserved position downstream of the octamer,as does the related SM7 family. In both cases, two Inr elementsflank the DICE element. This sequence is absent from all otherfamilies, except for two members of the 7183 family.

Several other binding sites are present, not necessarily abun-dantly, but in a highly conserved pattern. Group 2 families (3609(excluding the outlying member at the 3� end of the J558 region)and Q52) contain conserved Pax5 binding sites 3� of the octamer.We chose to make the criteria for identification of a Pax5 site verystringent, because this paired binding site is highly degenerate(56). If the stringency is lowered other related families in group 2,such as 36–60, and some families in group 3, also have a Pax5 sitein this conserved position. However, notably, the J558 and SM7families (group 1) did not exhibit any conserved Pax5 sites, evenat quite low stringency (data not shown). Several of the 3609 andQ52 genes also have an Ikaros binding site between the heptamerand the octamer, This is only rarely present in other families. The3609 and Q52 genes also contain an E box, immediately upstreamof the heptamer for 3609, but in a relatively conserved position in5 of 9 Q52 genes, a position shared with 22 of 52 J558 genes.

Pu.1 binding sites are present upstream of 51 genes spreadacross several families, at positions that are conserved within butnot between groups (Fig. 6). Several group 3 families (7183, S107,

VH10, and VH11) contain a site between the octamer and the hep-tamer, which does not occur in any other groups. Pu.1 sites alsooccur in 20 J558 genes at the same �240 position shown below forQ52. However, in both cases, the Pu.1 site does not corresponds tothe pyrimidine-rich region between 0 and 50 bases upstream of theheptamer (50), reported previously to bind Pu.1 (51).

Differences also were observed within groups. For example, ingroup 2, six of nine Q52 genes have a conserved Pu.1 site at �240,and a conserved NF-�B site between the heptamer and the oc-tamer, neither of which were present in the 3609 family. NF-�Bsites also flank the TATA box in 12 of 39 group 1 J558 B genes,but notably in only 1 of 13 J558 A genes.

Binding sites for factors associated with matrix attachment re-gions and chromatin remodelling (Bright, SATB1, forkhead) wererelatively abundant but not positionally conserved. This confirmsprevious observations, which led to the proposal that different pat-terns of association with the nuclear matrix may influence differ-ential recombination efficiency (67). For example, Bright sites arepresent in 41 of 110 sequences in non-conserved positions between�250 and �500. An exception is the J558A subfamily, most ofwhich have a Bright site conserved at position �475. Forkheadbinding sites are present on 42 sequences, in unconserved positionson several genes, but strikingly in a conserved position flanking theTATA box in 25 of 39 J558 A genes. They are absent from J558B flanking regions, the second notable difference between theflanking regions of these J558 subgroups.

We also examined the 500-bp sequence downstream of all func-tional V genes for all of the binding sites investigated in the 500-bpupstream sequence above, in addition to the RSS. However, withthe exception of the RSS, we found no significant conserved se-quences (data not shown).

Overall, this analysis provides a comprehensive picture of nu-merous potential binding sites flanking V genes, both those pre-viously identified in representative families, and others previouslyunknown (e.g., Pax5 and NF-�B). The complete sequence willenable functional investigation of these and other potential bindingsites and their role in V(D)J recombination.

DiscussionWe report in this study the first complete sequence assembly andannotation of the entire mouse Igh V region. Mapping shows Vgenes in relation to V pseudogenes, repeats, intergenic sequences,and non-V pseudogenes for the first time. The assembly illustratessome of the ways in which the locus may have evolved. Analysisof upstream sequences indicates numerous conserved consensussequences and striking differences between families.

Comparison with other mouse IgH locus studies

Overall, the relative order of the gene families within the mouseIgh locus V region agrees broadly with the comprehensive mappublished by Mainville et al. (4), which was based on deletionmapping of IgH locus rearrangements. Minor differences, whichmay be due in part to strain differences, because the previous mapwas a composite of IgH a, b, and j haplotypes, are as follows: 1)the 3609 family (with the exception of one member, which liesdownstream of the most 3� J558 gene) maps to the 5� end 1 Mb,rather than the 3� end of the J558 region (0.5 Mb); 2) the singleVH15 member and the VH10 genes map within the 3� end of theJ558 region, rather than 3� of J558; 3) the J606 genes are clusteredtightly together in a pseudogene-free region immediately down-stream of the J558 region and are not interspersed with any otherfamily; and 4) there is an additional 3609N gene (pg) 3� of the firstVGAM3.8 gene.

4229The Journal of Immunology

by guest on January 2, 2019http://w

ww

.jimm

unol.org/D

ownloaded from

Page 11: Complete Sequence Assembly and Characterization of the C57BL/6

4230 SEQUENCE ASSEMBLY OF THE MOUSE Igh V REGION

by guest on January 2, 2019http://w

ww

.jimm

unol.org/D

ownloaded from

Page 12: Complete Sequence Assembly and Characterization of the C57BL/6

In a more recent study, YAC contigs assembled from three hap-lotype libraries, were extensively mapped by Southern blotting,estimating 150 V genes in the V region (1). Our assembly alsobroadly agrees with this study, albeit the final number of V genesis higher (195). For example, YAC mapping predicted 15 3609genes (1), compared with the 16 mapped here. A notable exceptionis that the J558 region does not overlap the 3� gene families asdescribed in that study. However, the authors speculated that low-stringency blotting may have caused some cross-hybridization tothe related 3� located SM7 genes. As can be seen from our phy-logenetic tree (Fig. 4a), SM7 genes are the most closely related tothe J558 family. Further, mapping of the YAC end sequences fromthis study to our assembled locus has revealed a small number ofdeletions in the YAC assembly.

A more recent study (2) used the available Ensembl sequencefrom NCBI assembly v30 to assemble a map of the V region andthus reflects flaws in the Ensembl assembly. These include gapstotaling 450 kb. A total of 104 genes and 37 pseudogenes weredocumented, compared with the 112 genes and 83 pseudogenesreported in this study. The order of genes is generally similar, butthere are several differences in assigning genes and pseudogenes.For example, they assign fewer functional genes: 3609 (4 vs 8);J606 (3 vs 5); and 7183 (8 vs 10). This study assigned our newsingle gene family VH16 as a pseudogene (5-P-34), even though itcan be translated in frame without stops or frameshifts. Further, itwas assigned to the 7183 family, despite only 72% identity at thenucleotide level, and its location is a considerable distance fromthe otherwise self-contained 7183 cluster.

In addition to locus mapping studies, previous reports sought topredict the numbers of V genes in the locus by retrieval of allknown V gene sequences. Gu et al (26) assembled a database of 67J558 genes from CB.20 mice, an Ighb haplotype strain closelyrelated to C57BL/6, and estimated that the total number of func-tional J558 genes would be �100, an estimate independently ver-ified for C57BL/6 (27). We have detected 89 J558 genes and pseu-dogenes in our contig, which is very close to this number.However, the number of functional J558 genes (52) was muchlower. This is likely due to the fact that the germline genes iden-tified above were isolated from a combination of germlinegenomic sequence and expressed cDNA libraries. Thus, at leastsome of the genomic sequences are likely to have beenpseudogenes.

Comparison with the human IgH

The entire human Igh V region on chromosome 14 has been se-quenced (35) and covers a 1-Mb region, containing 123 V genes,of which 44 are functional and 79 are pseudogenes. This contrastswith the mouse V region in which functional genes outnumberpseudogenes (110 vs 85). Thus the mouse locus contains morefunctional genes 56%, compared with 36% for human), and fewerpseudogenes. It is more expanded over a much wider area (2.5Mb), possibly due to a greater number of full-length LINE ele-ments (22, compared with 2 for the human locus), as discussedabove. There is a high degree of conservation among human pseu-

dogenes, with 77% highly conserved except for a few point mu-tations. The 5� flanking octamer is highly conserved across allhuman V gene families for functional V genes. The distance be-tween octamer and TATA and the sequence of the TATA box alsoare well conserved within, but not between families. Thus, somefeatures are conserved between mouse and human. In striking con-trast, the heptamer upstream of the octamer, present in all but onemouse V gene family, is present only in one family in the humanV locus. There are eight non-Ig sequences within the human Vregion, although several are pseudogenes and are unlikely to beimportant. They do not correspond to any of those we have foundin the mouse locus and thus are not functionally conserved.

Features of mouse V gene 5� flanking regions

Extensive studies of V gene flanking sequences have been ham-pered by the lack of available sequence, and have been confined tosmall families from which flanking sequences could be cloned(18), because V gene databases (e.g., ImMunoGenetics, V genedatabase) generally do not contain flanking sequence. Within afamily, flanking sequence up to 4 kb upstream of V genes can beconserved to almost the same extent as V gene sequences (73).Availability of all 5� flanking sequences will enable detailed studyof the elements and factors required for promoter function. Previ-ous studies have shown that family-specific differences in TATAsequence, presence or absence of an Inr element, and heptamer-octamer spacing (16) correlate with family-specific differences inpromoter strength (16, 18). In particular, it has been suggested thatJ558 genes may be recombined most frequently because they havea stronger promoter than other families, attributed to a strongTATA box and Inr elements (16). Our studies support the differ-ence in TATA and Inr elements between the J558 (and SM7) andother families. These were the only two families that contain astrong consensus TATA box, albeit all other families have aTATA-related sequence in a similar conserved position. Because agreat deal of divergence from the conventional TATA box cannonetheless support binding of TATA binding proteins (72), fur-ther functional studies are required to determine which family-specific TATA sequences are functional. Contrary to previous re-ports, all V genes have an Inr element in a similar position to theJ558 genes, but J558 and SM7 genes (and the third group 1 family,VGAM3.8) all have a second and sometimes a third Inr element inclose proximity, consistent with increased potential for transcrip-tion initiation.

A second TATA-related sequence was identified upstream ofthe heptamer in the Q52 family on the opposite strand. Thus, iffunctional, it would promoter transcription in the opposite direc-tion. Bidirectional transcription has previously been reported for asmall number of V gene promoters (74, 75).

We have shown that all V genes have an octamer, at a conserveddistance from the TATA sequence, similar to the human V region(35). The heptamer 2–22 bases upstream of the octamer also isvery conserved (only absent in the two-member VH10 family).This sequence has been proposed to be involved in activation of Hchain promoters before � (76). However, data from the human IgH

FIGURE 6. Putative transcription factor binding sites within the upstream regions of each functional V gene. Sequences are grouped by V gene familyand listed in the order in which they appear in the locus, beginning at the 3� end. Depicted is 500 bp of upstream sequence, starting on the right at �1bpfrom the ATG in the leader exon. Color-coded boxes representing the binding sites analyzed are provided as a labeled key at the bottom of the figure. Theconsensus sequences used (references in text, optimized by MatInspector) are as follows, in standard International Union of Pure and Applied Chemistrynotation: heptamer, CTCATGA; octamer, ATGCAAAK; TATA box, TAAATA; Inr, YYANWYY; Pu.1, TGRRGAAGT; DICE, TRTYBTCTHYACMR;Bright, RATWAA; Ikaros, NNNYGGGAWNNN; Pax-5, NCNNNRNKCANNGNWGNRKRGCSRSNNN; NF-�B, GGGRNNYYCC; forkhead,RWAAAYAW; SATB1, ATC rich; OctPOU, NNGAATATKCANNNN; NFAT, NANWGGAAAANN; LEF1, WWCAAAG; E-box, CASSTG; EBF,TWCCCNNGGGAWT (data not shown); and Stat-5, TTCYNRGAA (data not shown).

4231The Journal of Immunology

by guest on January 2, 2019http://w

ww

.jimm

unol.org/D

ownloaded from

Page 13: Complete Sequence Assembly and Characterization of the C57BL/6

locus do not support this, because the heptamer sequence is foundonly in one family (35).

The DICE element has been shown recently to be important forIg promoter specificity and interaction with the Ig intronic en-hancer (52). This study shows that the DICE binding motif ishighly conserved downstream of the Inr element, but only in theJ558 and related SM7 gene families. This was a surprising findingbecause the DICE sequence was identified originally in a 7183flanking region (77), and we have seen only two 7183 genes witha consensus DICE binding sequence. Its exclusive presence inJ558 and SM7 promoters may provide another reason for the rel-atively high strength of these promoters, compared with other fam-ilies (16, 18).

Binding sites for LEF1, NFAT, and OctPOU were observed in�10 genes and/or in nonconserved positions and thus were indi-vidually judged not to be significant. However, transcription fac-tors are known to act cooperatively in clusters, and some patternswere extremely family-specific and thus, taken together, may besignificant. To take the example above, LEF1, NFAT, and Oct-POU, although uncommon, were observed as a conserved clusterin five of nine 3609 family members.

EBF and Stat5 binding sites were rarely seen and were not in-cluded in Fig. 6. However, absence of a binding site does notpreclude a role for any factor in regulation of V gene transcriptionand recombination. For example, Stat5 has been shown recently tobe a key regulator of V gene promoters but to bind indirectly as acoactivator with Oct1 (78). The lack of EBF sites was particularlyinteresting because this B lineage commitment factor, in cooper-ation with E2A, has been shown to induce both Igh and Igl V(D)Jrecombination in nonlymphoid cells in the presence of RAG1 andRAG2 (79). Absence of EBF sites agrees with recent data sug-gesting that they are also absent from V � promoters (38). Thus,EBF may act preferentially on Ig enhancers, where there areknown binding sites, or in an indirect manner. The situation forE2A is less clear, because some E-box sites were observed at con-served positions here and also in V � promoters (38) albeit in aminority of V flanking regions overall.

We also found no CpG islands within the V gene flanking se-quences or indeed anywhere within the V region. CpG islands areassociated with the majority of housekeeping gene and many tis-sue-specific promoters, and thus we conclude that V gene promot-ers are not “classical” CpG island promoters. This observationsupports a model that suggests that paucity of CpG islands is afeature characteristic of monoallelically expressed loci (39).

Overall, we have observed a high degree of conservation ofsequence extending into the 5� flanking region, both in terms ofsequences common to most or all (octamer and heptamer), and interms of binding sites conserved within but not between families orgroups (e.g., DICE element in J558 and Pax5 site in 3609 and Q52families). These observations thus corroborate previous predic-tions from small families and set the stage for future experimentalstudies to determine the functional significance of these putativebinding sites.

A function for pseudogenes?

One of our aims was to determine the number and position of Vpseudogenes, which were unknown. In AgR loci, pseudogenes arebelieved to have arisen as part of the process of expansion of theV gene repertoire to generate adequate Ab diversity. A substantialproportion (40%) of germline V sequences have been estimated tohave inactivating mutations (80, 81). However, in many cases,such V genes have been found on nonproductively rearranged al-leles and undergo somatic hypermutation, suggesting that they canbe actively recombined and expressed and may play a functional

role (82). In other systems, pseudogenes have been shown to bothinhibit and activate corresponding coding genes, both in cis and intrans (83, 84). Thus, even inability to recombine does not precludea role for V pseudogenes. Our sequence shows that 44% of germ-line V gene sequences are pseudogenes, in agreement with previ-ous estimates (80, 81). We have observed a high level of sequenceconservation, as previously observed in the human locus, becausethe majority of pseudogenes fall easily into recognizable V genefamilies and occupy similar positions on the phylogenetic tree(Fig. 4a), while even those that were unclassified generally fallwithin the tree, although a few (e.g., PG10.56 and PG13.69) ap-pear rather diverged. Strikingly, these pseudogenes are not evenlyspread throughout the V region, but rather are clustered. Evenwithin one family (J558), the 3� end has a much higher ratio ofpseudogenes to genes than the 5� end, which correlates with ahigher gene density overall. Conversely, some regions are devoidof pseudogenes, for example, the J606 family. It will be interestingto determine whether different patterns contribute to different lev-els of local accessibility and V(D)J recombination. For example,higher pseudogene density may result in increased germline tran-scription and greater localized accessibility.

Intergenic nonrepeat sequences

We have shown that, in addition to a high proportion of inter-spersed repeats, the V region contains several region-specificgroups of conserved intergenic sequences, which we have con-firmed by Southern blotting. Recently, a number of studies haveproposed that transcription from intergenic sequences may havenumerous roles, including large-scale regulation of chromatinstructure to control gene expression (85–90). We have recentlyavailed of the sequence described in this study to demonstrate thatantisense intergenic transcription occurs throughout the Igh V re-gion in cells undergoing V to DJ recombination (22). We postulatethat this process opens up the silent V region following (D)J re-combination. The relationship among locus positioning, chromatinremodelling, and V(D)J recombination has not been extensivelystudied, except over the V genes, and even then only in small-scalestudies as the full sequence and knowledge of all genes was notavailable. Limited studies have been confined previously to the 3�7183 family (20). Because this family is close to the DJ region andrearranges early, its regulation may differ from the rest of the locus(91, 92). Further, this study did not include flanking or intergenicsequences. The complete upstream and downstream sequence forall V genes reported in this study will enable far more extensivestudies of flanking and intergenic regions to determine their role inlarge-scale mechanisms, such as intergenic and antisense transcrip-tion, locus contraction, the intergenic regulatory regions involved,and many other aspects of the role of chromosome positioning andchromatin context in V(D)J recombination.

AcknowledgmentsWe thank Peter Fraser for critically reviewing this manuscript andSimon Andrews of the Babraham Institute Bioinformatics facility for helpwith bioinformatics analysis.

DisclosuresThe authors have no financial conflict of interest.

References1. Chevillard, C., J. Ozaki, C. D. Herring, and R. Riblet. 2002. A three-megabase

yeast artificial chromosome contig spanning the C57BL mouse Igh locus. J. Im-munol. 168: 5659–5666.

2. de Bono, B., M. Madera, and C. Chothia. 2004. VH gene segments in the mouseand human genomes. J. Mol. Biol. 342: 131–143.

3. Ye, J. 2004. The immunoglobulin IGHD gene locus in C57BL/6 mice. Immuno-genetics 56: 399–404.

4232 SEQUENCE ASSEMBLY OF THE MOUSE Igh V REGION

by guest on January 2, 2019http://w

ww

.jimm

unol.org/D

ownloaded from

Page 14: Complete Sequence Assembly and Characterization of the C57BL/6

4. Mainville, C. A., K. M. Sheehan, L. D. Klaman, C. A. Giorgetti, J. L. Press, andP. H. Brodeur. 1996. Deletional mapping of fifteen mouse VH gene familiesreveals a common organization for three Igh haplotypes. J. Immunol. 156:1038–1046.

5. Brodeur, P. H., G. E. Osman, J. J. Mackle, and T. M. Lalor. 1988. The organi-zation of the mouse Igh-V locus: dispersion, interspersion, and the evolution ofVH gene family clusters. J. Exp. Med. 168: 2261–2278.

6. Jeong, H. D., J. L. Komisar, E. Kraig, and J. M. Teale. 1988. Strain-dependentexpression of VH gene families. J. Immunol. 140: 2436–2441.

7. Yancopoulos, G. D., S. V. Desiderio, M. Paskind, J. F. Kearney, D. Baltimore,and F. W. Alt. 1984. Preferential utilization of the most JH-proximal VH genesegments in pre-B-cell lines. Nature 311: 727–733.

8. Jeong, H. D., and J. M. Teale. 1989. VH gene family repertoire of resting B cells.Preferential use of D-proximal families early in development may be due todistinct B cell subsets. J. Immunol. 143: 2752–2760.

9. Malynn, B. A., G. D. Yancopoulos, J. E. Barth, C. A. Bona, and F. W. Alt. 1990.Biased expression of JH-proximal VH genes occurs in the newly generated rep-ertoire of neonatal and adult mice. J. Exp. Med. 171: 843–859.

10. Freitas, A. A., M. P. Lembezat, and A. Coutinho. 1989. Expression of antibodyV-regions is genetically and developmentally controlled and modulated by the Blymphocyte environment. Int. Immunol. 1: 342–354.

11. Atkinson, M. J., D. A. Michnick, C. J. Paige, and G. E. Wu. 1991. Ig generearrangements on individual alleles of Abelson murine leukemia cell lines from(C57BL/6 BALB/c) F1 fetal livers. J. Immunol. 146: 2805–2812.

12. Grandien, A., A. Coutinho, J. Andersson, and A. A. Freitas. 1991. EndogenousVH gene family expression in immunoglobulin-transgenic mice: evidence forselection of antibody repertoires. Int. Immunol. 3: 67–73.

13. Akamatsu, Y., N. Tsurushita, F. Nagawa, M. Matsuoka, K. Okazaki, M. Imai, andH. Sakano. 1994. Essential residues in V(D)J recombination signals. J. Immunol.153: 4520–4529.

14. Feeney, A. J. 2000. Factors that influence formation of B cell repertoire. Immu-nol. Res. 21: 195–202.

15. Montalbano, A., K. M. Ogwaro, A. Tang, A. G. Matthews, M. Larijani,M. A. Oettinger, and A. J. Feeney. 2003. V(D)J recombination frequencies can beprofoundly affected by changes in the spacer sequence. J. Immunol. 171:5296–5304.

16. Buchanan, K. L., E. A. Smith, S. Dou, L. M. Corcoran, and C. F. Webb. 1997.Family-specific differences in transcription efficiency of Ig heavy chain promot-ers. J. Immunol. 159: 1247–1254.

17. Whitcomb, E. A., B. B. Haines, A. P. Parmelee, A. M. Pearlman, andP. H. Brodeur. 1999. Germline structure and differential utilization of Igha andIghb VH10 genes. J. Immunol. 162: 1541–1550.

18. Love, V. A., G. Lugo, D. Merz, and A. J. Feeney. 2000. Individual V(H) pro-moters vary in strength, but the frequency of rearrangement of those V(H) genesdoes not correlate with promoter strength nor enhancer-independence. Mol. Im-munol. 37: 29–39.

19. Storb, U., D. Haasch, B. Arp, P. Sanchez, P. A. Cazenave, and J. Miller. 1989.Physical linkage of mouse � genes by pulsed-field gel electrophoresis suggeststhat the rearrangement process favors proximate target sequences. Mol. Cell. Biol.9: 711–718.

20. Williams, G. S., A. Martinez, A. Montalbano, A. Tang, A. Mauhar,K. M. Ogwaro, D. Merz, C. Chevillard, R. Riblet, and A. J. Feeney. 2001. Un-equal VH gene rearrangement frequency within the large VH7183 gene family isnot due to recombination signal sequence variation, and mapping of the genesshows a bias of rearrangement based on chromosomal location. J. Immunol. 167:257–263.

21. Kosak, S. T., J. A. Skok, K. L. Medina, R. Riblet, M. M. Le Beau, A. G. Fisher,and H. Singh. 2002. Subnuclear compartmentalization of immunoglobulin lociduring lymphocyte development. Science 296: 158–162.

22. Bolland, D., A. Wood, C. Johnston, M. Morgan, L. Chakalova, P. Fraser, andA. E. Corcoran. 2004. Antisense intergenic transcription in V(D)J recombination.Nature Immunology.

23. Fuxa, M., J. Skok, A. Souabni, G. Salvagiotto, E. Roldan, and M. Busslinger.2004. Pax5 induces V-to-DJ rearrangements and locus contraction of the immu-noglobulin heavy-chain gene. Genes Dev. 18: 411–422.

24. Hubbard, T., D. Barker, E. Birney, G. Cameron, Y. Chen, L. Clark, T. Cox,J. Cuff, V. Curwen, T. Down, et al. 2002. The Ensembl genome database project.Nucleic Acids Res. 30: 38–41.

25. Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basiclocal alignment search tool. J. Mol. Biol. 215: 403–410.

26. Gu, H., D. Tarlinton, W. Muller, K. Rajewsky, and I. Forster. 1991. Most pe-ripheral B cells in mice are ligand selected. J. Exp. Med. 173: 1357–1371.

27. Haines, B. B., C. V. Angeles, A. P. Parmelee, P. A. McLean, and P. H. Brodeur.2001. Germline diversity of the expressed BALB/c VhJ558 gene family. Mol.Immunol. 38: 9–18.

28. Page, R. D. 1996. TreeView: an application to display phylogenetic trees onpersonal computers. Comput. Appl. Biosci. 12: 357–358.

29. Chenna, R., H. Sugawara, T. Koike, R. Lopez, T. J. Gibson, D. G. Higgins, andJ. D. Thompson. 2003. Multiple sequence alignment with the Clustal series ofprograms. Nucleic Acids Res. 31: 3497–3500.

30. Cartharius, K., K. Frech, K. Grote, B. Klocke, M. Haltmeier, A. Klingenhoff,M. Frisch, M. Bayerlein, and T. Werner. 2005. MatInspector and beyond: pro-moter analysis based on transcription factor binding sites. Bioinformatics 21:2933–2942.

31. Herring, C. D., C. Chevillard, S. L. Johnston, P. J. Wettstein, and R. Riblet. 1998.Vector-hexamer PCR isolation of all insert ends from a YAC contig of the mouseIgh locus. Genome Res. 8: 673–681.

32. Brodeur, P. H., and R. Riblet. 1984. The immunoglobulin heavy chain variableregion (Igh-V) locus in the mouse. I. One hundred Igh-V genes comprise sevenfamilies of homologous genes. Eur. J. Immunol. 14: 922–930.

33. Mackay, M., J. Fantes, S. Scherer, S. Boyle, K. West, L. C. Tsui, E. Belloni,E. Lutz, V. Van Heyningen, and A. J. Harmar. 1996. Chromosomal localizationin mouse and human of the vasoactive intestinal peptide receptor type 2 gene: apossible contributor to the holoprosencephaly 3 phenotype. Genomics 37:345–353.

34. Richards-Smith, B. A., P. H. Brodeur, and R. W. Elliott. 1992. Deletion mappingof the mouse ornithine decarboxylase-related locus Odc-rs8 within Igh-V. Mamm.Genome 3: 568–574.

35. Matsuda, F., K. Ishii, P. Bourvagnet, K. Kuma, H. Hayashida, T. Miyata, andT. Honjo. 1998. The complete nucleotide sequence of the human immunoglob-ulin heavy chain variable region locus. J. Exp. Med. 188: 2151–2162.

36. Chen, T. L., and L. Manuelidis. 1989. SINEs and LINEs cluster in distinct DNAfragments of Giemsa band size. Chromosoma 98: 309–316.

37. Schonbach, C. 2004. From masking repeats to identifying functional repeats inthe mouse transcriptome. Brief Bioinform. 5: 107–117.

38. Brekke, K. M., and W. T. Garrard. 2004. Assembly and analysis of the mouseimmunoglobulin � gene sequence. Immunogenetics 56: 490–505.

39. Allen, E., S. Horvath, F. Tong, P. Kraft, E. Spiteri, A. D. Riggs, andY. Marahrens. 2003. High concentrations of long interspersed nuclear elementsequence distinguish monoallelically expressed genes. Proc. Natl. Acad. Sci. USA100: 9940–9945.

40. Matsumura, R., F. Matsuda, H. Nagaoka, J. Fujikura, E. K. Shin, Y. Fukita,M. Haino, and T. Honjo. 1994. Structural analysis of the human VH locus usingnonrepetitive intergenic probes and repetitive sequence probes. Evidence for re-cent reshuffling. J. Immunol. 152: 660–666.

41. Goodier, J. L., E. M. Ostertag, K. Du, and H. H. Kazazian, Jr. 2001. A novelactive L1 retrotransposon subfamily in the mouse. Genome Res. 11: 1677–1685.

42. Hiom, K., M. Melek, and M. Gellert. 1998. DNA transposition by the RAG1 andRAG2 proteins: a possible source of oncogenic translocations. Cell 94: 463–470.

43. Kirkham, P. M., F. Mortari, J. A. Newton, and H. W. Schroeder, Jr. 1992. Im-munoglobulin VH clan and family identity predicts variable domain structure andmay influence antigen binding. EMBO J. 11: 603–609.

44. Kirkham, P. M., and H. W. Schroeder, Jr. 1994. Antibody structure and theevolution of immunoglobulin V gene segments. Semin. Immunol. 6: 347–360.

45. Sims, M. J., U. Krawinkel, and M. J. Taussig. 1992. Characterization of germ-linegenes of the VGAM3.8 VH gene family from BALB/c mice. J. Immunol. 149:1642–1648.

46. Press, J. L., and C. A. Giorgetti. 1993. Molecular and kinetic analysis of anepitope-specific shift in the B cell memory response to a multideterminant anti-gen. J. Immunol. 151: 1998–2013.

47. Lo, K., and S. T. Smale. 1996. Generality of a functional initiator consensussequence. Gene 182: 13–22.

48. Parslow, T. G., D. L. Blair, W. J. Murphy, and D. K. Granner. 1984. Structure ofthe 5� ends of immunoglobulin genes: a novel conserved sequence. Proc. Natl.Acad. Sci. USA 81: 2650–2654.

49. Mason, J. O., G. T. Williams, and M. S. Neuberger. 1985. Transcription cell typespecificity is conferred by an immunoglobulin VH gene promoter that includes afunctional consensus sequence. Cell 41: 479–487.

50. Eaton, S., and K. Calame. 1987. Multiple DNA sequence elements are necessaryfor the function of an immunoglobulin heavy chain promoter. Proc. Natl. Acad.Sci. USA 84: 7634–7638.

51. Schwarzenbach, H., J. W. Newell, and P. Matthias. 1995. Involvement of the Etsfamily factor PU.1 in the activation of immunoglobulin promoters. J. Biol. Chem.270: 898–907.

52. Tantin, D., M. I. Tussie-Luna, A. L. Roy, and P. A. Sharp. 2004. Regulation ofimmunoglobulin promoter activity by TFII-I class transcription factors. J. Biol.Chem. 279: 5460–5469.

53. Klemsz, M. J., S. R. McKercher, A. Celada, C. Van Beveren, and R. A. Maki.1990. The macrophage and B cell-specific transcription factor PU.1 is related tothe ets oncogene. Cell 61: 113–124.

54. Georgopoulos, K., D. D. Moore, and B. Derfler. 1992. Ikaros, an early lymphoid-specific transcription factor and a putative mediator for T cell commitment. Sci-ence 258: 808–812.

55. Barberis, A., K. Widenhorn, L. Vitelli, and M. Busslinger. 1990. A novel B-celllineage-specific transcription factor present at early but not late stages of differ-entiation. Genes Dev. 4: 849–859.

56. Czerny, T., G. Schaffner, and M. Busslinger. 1993. DNA sequence recognition byPax proteins: bipartite structure of the paired domain and its binding site. GenesDev. 7: 2048–2061.

57. Liu, X., G. W. Robinson, F. Gouilleux, B. Groner, and L. Hennighausen. 1995.Cloning and expression of Stat5 and an additional homologue (Stat5b) involvedin prolactin signal transduction in mouse mammary tissue. Proc. Natl. Acad. Sci.USA 92: 8831–8835.

58. Wirth, T., and D. Baltimore. 1988. Nuclear factor NF-�B can interact function-ally with its cognate binding site to provide lymphoid-specific promoter function.EMBO J. 7: 3109–3113.

59. Blackwell, T. K., and H. Weintraub. 1990. Differences and similarities in DNA-binding preferences of MyoD and E2A protein complexes revealed by bindingsite selection. Science 250: 1104–1110.

60. Schilham, M. W., and H. Clevers. 1998. HMG box containing transcription fac-tors in lymphocyte differentiation. Semin. Immunol. 10: 127–132.

61. Kel, A., O. Kel-Margoulis, V. Babenko, and E. Wingender. 1999. Recognition ofNFATp/AP-1 composite elements within genes induced upon the activation ofimmune cells. J. Mol. Biol. 288: 353–376.

4233The Journal of Immunology

by guest on January 2, 2019http://w

ww

.jimm

unol.org/D

ownloaded from

Page 15: Complete Sequence Assembly and Characterization of the C57BL/6

62. Verrijzer, C. P., M. J. Alkema, W. W. van Weperen, H. C. Van Leeuwen,M. J. Strating, and P. C. van der Vliet. 1992. The DNA binding specificity of thebipartite POU domain and its subdomains. EMBO J. 11: 4993–5003.

63. Hagman, J., M. J. Gutch, H. Lin, and R. Grosschedl. 1995. EBF contains a novelzinc coordination motif and multiple dimerization and transcriptional activationdomains. EMBO J. 14: 2907–2916.

64. Lin, H., and R. Grosschedl. 1995. Failure of B-cell differentiation in mice lackingthe transcription factor EBF. Nature 376: 263–267.

65. Webb, C. F., C. Das, K. L. Eneff, and P. W. Tucker. 1991. Identification of amatrix-associated region 5� of an immunoglobulin heavy chain variable regiongene. Mol. Cell. Biol. 11: 5206–5211.

66. Webb, C. F., C. Das, S. Eaton, K. Calame, and P. W. Tucker. 1991. Novelprotein-DNA interactions associated with increased immunoglobulin transcrip-tion in response to antigen plus interleukin-5. Mol. Cell. Biol. 11: 5197–5205.

67. Goebel, P., A. Montalbano, N. Ayers, E. Kompfner, L. Dickinson, C. F. Webb,and A. J. Feeney. 2002. High frequency of matrix attachment regions and cut-likeprotein x/CCAAT-displacement protein and B cell regulator of IgH transcriptionbinding sites flanking Ig V region genes. J. Immunol. 169: 2477–2487.

68. Webb, C. F., E. A. Smith, K. L. Medina, K. L. Buchanan, G. Smithson, andS. Dou. 1998. Expression of bright at two distinct stages of B lymphocyte de-velopment. J. Immunol. 160: 4747–4754.

69. Rajaiya, J., M. Hatfield, J. C. Nixon, D. J. Rawlings, and C. F. Webb. 2005.Bruton’s tyrosine kinase regulates immunoglobulin promoter activation in asso-ciation with the transcription factor Bright. Mol. Cell. Biol. 25: 2073–2084.

70. Nakagomi, K., Y. Kohwi, L. A. Dickinson, and T. Kohwi-Shigematsu. 1994. Anovel DNA-binding motif in the nuclear matrix attachment DNA-binding proteinSATB1. Mol. Cell. Biol. 14: 1852–1860.

71. Cirillo, L. A., F. R. Lin, I. Cuesta, D. Friedman, M. Jarnik, and K. S. Zaret. 2002.Opening of compacted chromatin by early developmental transcription factorsHNF3 (FoxA) and GATA-4. Mol. Cell 9: 279–289.

72. Singer, V. L., C. R. Wobbe, and K. Struhl. 1990. A wide variety of DNA se-quences can functionally replace a yeast TATA element for transcriptional acti-vation. Genes Dev. 4: 636–645.

73. Cohen, J. B., K. Effron, G. Rechavi, Y. Ben-Neriah, R. Zakut, and D. Givol.1982. Simple DNA sequences in homologous flanking regions near immunoglob-ulin VH genes: a role in gene interaction? Nucleic Acids Res. 10: 3353–3370.

74. Nguyen, Q. T., N. Doyen, M. F. d’Andon, and F. Rougeon. 1991. Demonstrationof a divergent transcript from the bidirectional heavy chain immunoglobulin pro-moter VH441 in B-cells. Nucleic Acids Res. 19: 5339–5344.

75. Sun, Z., and G. R. Kitchingman. 1994. Bidirectional transcription from the hu-man immunoglobulin VH6 gene promoter. Nucleic Acids Res. 22: 861–868.

76. Kemler, I., E. Schreiber, M. M. Muller, P. Matthias, and W. Schaffner. 1989.Octamer transcription factors bind to two different sequence motifs of the im-munoglobulin heavy chain promoter. EMBO J. 8: 2001–2008.

77. Tantin, D., and P. A. Sharp. 2002. Mouse lymphoid cell line selected to have highimmunoglobulin promoter activity. Mol. Cell. Biol. 22: 1460–1473.

78. Bertolino, E., K. Reddy, K. L. Medina, E. Parganas, J. Ihle, and H. Singh. 2005.Regulation of interleukin 7-dependent immunoglobulin heavy-chain variablegene rearrangements by transcription factor STAT5. Nat. Immunol. 6: 836–843.

79. Romanow, W. J., A. W. Langerak, P. Goebel, I. L. Wolvers-Tettero,J. J. van Dongen, A. J. Feeney, and C. Murre. 2000. E2A and EBF act in synergywith the V(D)J recombinase to generate a diverse immunoglobulin repertoire innonlymphoid cells. Mol. Cell 5: 343–353.

80. Givol, D., R. Zakut, K. Effron, G. Rechavi, D. Ram, and J. B. Cohen. 1981.Diversity of germ-line immunoglobulin VH genes. Nature 292: 426–430.

81. Rechavi, G., D. Ram, L. Glazer, R. Zakut, and D. Givol. 1983. Evolutionaryaspects of immunoglobulin heavy chain variable region (VH) gene subgroups.Proc. Natl. Acad. Sci. USA 80: 855–859.

82. Schiff, C., M. Milili, and M. Fougereau. 1985. Functional and pseudogenes aresimilarly organized and may equally contribute to the extensive antibody diver-sity of the IgVHII family. EMBO J. 4: 1225–1230.

83. Hirotsune, S., N. Yoshida, A. Chen, L. Garrett, F. Sugiyama, S. Takahashi,K. Yagami, A. Wynshaw-Boris, and A. Yoshiki. 2003. An expressed pseudogeneregulates the messenger-RNA stability of its homologous coding gene. Nature423: 91–96.

84. Lee, J. T. 2003. Molecular biology: complicity of gene and pseudogene. Nature423: 26–28.

85. Yelin, R., D. Dahary, R. Sorek, E. Y. Levanon, O. Goldstein, A. Shoshan,A. Diber, S. Biton, Y. Tamir, R. Khosravi, et al. 2003. Widespread occurrence ofantisense transcription in the human genome. Nat. Biotechnol. 21: 379–386.

86. Shendure, J., and G. M. Church. 2002. Computational discovery of sense-anti-sense transcription in the human and mouse genomes. Genome Biol. 3:research0044.10044.14.

87. Gribnau, J., K. Diderich, S. Pruzina, R. Calzolari, and P. Fraser. 2000. Intergenictranscription and developmental remodeling of chromatin subdomains in the hu-man �-globin locus. Mol. Cell 5: 377–386.

88. Orphanides, G., and D. Reinberg. 2000. RNA polymerase II elongation throughchromatin. Nature 407: 471–475.

89. Cawley, S., S. Bekiranov, H. H. Ng, P. Kapranov, E. A. Sekinger, D. Kampa,A. Piccolboni, V. Sementchenko, J. Cheng, A. J. Williams, et al. 2004. Unbiasedmapping of transcription factor binding sites along human chromosomes 21 and22 points to widespread regulation of noncoding RNAs. Cell 116: 499–509.

90. Martens, J. A., L. Laprade, and F. Winston. 2004. Intergenic transcription isrequired to repress the Saccharomyces cerevisiae SER3 gene. Nature 429:571–574.

91. Corcoran, A. E., A. Riddell, D. Krooshoop, and A. R. Venkitaraman. 1998. Im-paired immunoglobulin gene rearrangement in mice lacking the IL-7 receptor.Nature 391: 904–907.

92. Chowdhury, D., and R. Sen. 2001. Stepwise activation of the immunoglobulin muheavy chain gene locus. EMBO J. 20: 6394–6403.

4234 SEQUENCE ASSEMBLY OF THE MOUSE Igh V REGION

by guest on January 2, 2019http://w

ww

.jimm

unol.org/D

ownloaded from