genomic v-gene repertoire in reptiles - biorxiv v-gene repertoire in reptiles david n. olivieri1,...
TRANSCRIPT
Genomic V-gene repertoire in reptiles
David N. Olivieri1, Bernardo von Haeften2, Christian Sanchez-Espinel3, Jose Faro2,4,5, FranciscoGambon-Deza4,6
1School of Computer Science, University of Vigo, Ourense 32004, Spain. 2Area of Immunology, Facultyof Biology, and Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain.
3Nanoimmunotech SL, Pza. Fernando Conde, Montero Rıos 9, 3620, Vigo, Spain. 4Instituto Biomedico deVigo, 36310 Vigo, Spain. 5Instituto Gulbenkian de Ciencia, 2781-901 Oeiras, Portugal. 6Servicio Gallego
de Salud (SERGAS), Unidad de Inmunologıa, Hospital do Meixoeiro, 36210 Vigo, [email protected], [email protected]
Abstract
Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionarylineages amongst terrestrial vertebrates. In reptiles, two main evolutionary lines emerged, onegave rise to Squamata, while the other gave rise to Testudines, Crocodylia and birds. In thisstudy, we determined the genomic variable (V)-gene repertoire in reptiles corresponding to thethree main immunoglobulin (Ig) loci and the four main T-cell receptor (TCR) loci. We showthat squamata lack the TCR γ/δ genes and snakes lack the Vκ genes. In representative speciesof testudines and crocodiles, the seven major Ig and TCR loci are maintained. As in mammals,genes of the Ig loci can be grouped into well-defined clans through a multi-species phylogeneticanalysis. We show that the reptile VH and Vλ genes are distributed amongst the establishedmammalian clans, while their Vκ genes are found within a single clan, nearly exclusive fromthe mammalian sequences. The reptile and mammal V-genes of the TRA locus cluster into sixcommon evolutionary clans. In contrast, the reptile V-genes from the TRB locus cluster into threeclans, which have few mammalian members. In this locus, the V-gene sequences from mammalsappear to have undergone different evolutionary diversification processes that occurred outsidethese shared reptile clans.
Keywords: Immunologic Repertoire, Reptile Evolution
1. Introduction
The order Reptilia is believed to have evolved from amphibian-like creatures, the labyrinthodonts,which date back to the Carboniferous period some 312 million years ago (Mya) (Laurin & Reisz,1995; Laurin & Girondot, 1999). The descendants of those precursors became progressively moreterrestrial and independent of their aquatic origins, venturing out on a larger expanse of land withmany new different habitats (Benton & Donoghue, 2007; deBraga & Rieppel, 1997; van Tuinen& Hadly, 2004; Reisz & Muller, 2004). Evolving from this primordial amphibious ancestor, alineage began that would later split into two separate lines, the Sauropsids and the Synapsids, thelatter of which would eventually give rise to the line of extant mammals (Benton, 2005; Hedges &
Preprint February 12, 2014
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;
Kumar, 2009). Shortly after this separation of the Sauropsids and Synapsids, the sauropsids furtherdivided into two separate lineages. One gave rise to the present day Squamata (lizards and snakes)and Tuatara, while the other gave rise in a very early period to Testudines, later to Crocodylia, andfinally, gave way to birds (Benton, 2005; Hedges & Kumar, 2009).
The first jawed vertebrates developed an adaptive immune system consisting of B and T lym-phocytes together with their associated antigen receptors, immunoglobulins (Ig) and T-cell recep-tors (TCR), respectively. This adaptive immune system evolved in parallel within Reptilia and thesister group of mammals from their common amphibian ancestor (Janes et al., 2010).
Although Igs and TCRs share a basic modular structure (Charlemagne et al., 1998; Schroeder& Cavacini, 2010; Narciso et al., 2011), Igs are made up of four polypeptide chains (two iden-tical larger or heavy (H) chains and two identical smaller or light (L) chains), while TCRs areheterodimers (Schroeder & Cavacini, 2010). The H chains of Igs consist typically of four or moreglobular domains, with the N-terminal domain varying amongst the Igs of different B-cell clones(termed variable (V) domain), and the remaining domains being shared by many B-cell clones(termed constant (C) domains). In contrast, L chains of Igs and TCR polypeptide chains consist oftwo domains, an N-terminal V domain, and a C domain. In addition, there are two different kindsof TCRs, one composed of α/β heterodimers and the other of γ/δ heterodimers. In both Igs andTCRs, only the V domains interact with antigen.
Each chain of the Igs and of TCRs is encoded in different loci. Jawed vertebrates have threemain loci corresponding to Ig genes (IGH, coding for H chains, and IGK and IGL, coding for κand λ L chains, respectively), and four main loci associated with the TCR genes (TRA, TRB, TRGand TRD, coding for TCR α, β, γ and δ chains, respectively) (Cannon et al., 2004; Danilova &Amemiya, 2009; Flajnik & Kasahara, 2010; Sun et al., 2012). In all these loci, multiple unique Vgene sequences (V-genes) exist that share a common homology. Each Ig and TCR V domain isencoded by a unique combination of a V-gene, a D segment (only in Ig H and TCR β and δ chains)and a J segment, the largest contribution to the V domain amino acid sequence being from the V-gene. Thus, identifying the V-gene repertoire within the genome of a species provides informationabout its potential for mounting an adaptive immune response against antigen.
The Ig and TCR loci may have undergone an independent evolution, because they are locatedin different chromosomes, or in widely separated regions within the same chromosome. One ex-ception to this general trend exists, the TRD locus, which is embedded within the TRA locus. Thisindependent evolution hypothesis is supported by the formation of V-gene clusters in phylogenetictrees constructed from these sequences (Olivieri et al., 2013). If true, this explanation would implythe existence of an ancestral gene (or genes) that define each of the extant Ig and TCR V-genes.
In reptiles, published studies of the Ig genes, together with the functionality of their products,are scarce (Gambon-Deza et al., 2012; Magadan-Mompo et al., 2013; Magadan-Mompo et al.,2013). At present, there are no studies of V-genes in the the TCR loci of reptiles. In Squamata,all the species studied so far have IgM, an IgD consisting of eleven domains, and IgY (Sun et al.,2011; Gambon-Deza et al., 2012). In Anole, the Vκ and Vλ chains have been described andthe absence of the IGK locus in snakes (Wei et al., 2009; Gambon-Deza et al., 2009) has beenconfirmed. Testudines have a more complex IGH locus structure (Li et al., 2012; Xu et al., 2009;Magadan-Mompo et al., 2013), having one IgM gene, one IgD gene encoding eleven constantdomains, and multiple genes for IgY and IgD2. Previous studies of the IGH locus in crocodiles
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;
have described three genes for IgM, one gene for IgD encoding seven viable constant domains,three IgA genes, three IgY genes that are similar to IgY in birds, and two IgD2 genes, whichencode chimeric Igs with IgA and IgD related constant domains (Magadan-Mompo et al., 2013).These studies have also detected in crocodiles that the IgA heavy chain is similar to the IgX andIgA of amphibians and birds, respectively.
In contrast to the growing literature related to Igs in reptiles, there are fewer studies of TCRs inthese species. In this article, we uncover the genomic V-gene repertoire of both immunologic re-ceptors in species representative of the three main orders of extant Reptilia: Squamata, Testudines,and Crocodylia, and show relations that provide a deeper understanding of the evolution of thislarge vertebrate lineage (Modesto & Anderson, 2004; Benton, 2005).
2. Methods
For these studies, we used genome data from whole genome shotgun (WGS) assemblies thatare publicly available at the NCBI and provided in FASTA format in the form of contigs. Un-til now, complete V-gene annotations have only been available in two species (i.e., human andmouse), while incomplete annotations have existed in a few species, and even in these cases, onlywithin certain loci. We used our VgenExtractor software tool (Olivieri et al., 2013) to obtain theV-gene sequences from genome files. Our software algorithm searches and extracts the genomesequences of functional V-genes based upon known motifs. In particular, the algorithm scans largegenome files and extracts candidate exon sequences that fulfill specific criteria: they are flanked byrecombination signal sequences, they have a valid reading frame without stop codon, the lengthis at least 280 bp long, and they contain two canonical cysteines and a tryptophan at specificpositions.
From the set of translated candidate exons produced by the VgenExtractor tool, a fraction ofthese sequences could pass the initial filter, purely by chance, but have structures that are quitedifferent from a functional V-gene. Such sequences are easily discarded by subjecting all the can-didate sequences to a BLASTP comparison against a V-gene consensus with a modest threshold(e.g., evalue = 10−15). This operation produces a set of exons possessing the structural require-ments for being functional V-genes.
Once viable V-genes are obtained we need to classify them into their respective loci. Thecorresponding amino acid sequences were analyzed with the suite of tools provided by Galaxy(Goecks et al., 2010). The analysis workflow of the amino acid sequences consist of a multipleBLASTP alignment against a set of consensus sequences, each representing one of the seven mainIg and TCR V-gene loci. Each candidate V-gene sequence is assigned to the locus having thehighest score with respect to the corresponding consensus.
We developed another independent algorithm for classifying V-genes. In this method, givena V-gene sequence, we determine to which locus this sequence most likely belongs by obtaininga heuristic score calculated from the results of a BLASTP comparison against the NR proteindatabase. The score is a weighted combination of information from sequence similarity results(or hits). In particular, the score is constructed from the following: (a) the relevance and contentwords from the protein description, (b) the similarity score, c) and the sequence coverage. A scoreis calculated for each locus, and the V-gene sequence is assigned to the most likely locus. This
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;
method has been validated from known annotations in the human and mouse (Olivieri et al., 2013).This technique is used to show the consistency of results obtained with the the simpler workflowdescribed above and can be used to resolve ambiguous loci predictions of the simpler method,which may arise in rare cases.
To compare the repertoire of V-genes in extant species, we carried out single and multi-speciesphylogenetic analysis. The sequences for this study are available at a new public database reposi-tory vgenerepertoire.org (Olivieri & Gambon-Deza, 2014). At present, approximately 20,000V-gene sequences from 82 mammals and 12 reptiles are indexed. For both the single and multi-species phylogenetic analysis, sequences were aligned using Clustal-omega (Sievers & Higgins,2014) and trees were constructed with FastTree (Price et al., 2010) (using maximum likelihoodand WAG matrix). We used MEGA5 (Tamura et al., 2011) for visualization.
In Crocodylidae we used the NCBI accession numbers for the following species: Americanalligator (Alligator mississippiensis; AKHW000000001), Chinese alligator (Alligator sinensis;AVPB0000000001), while the following species used assemblies found in www.crocgenomes.org: saltwater crocodile (Crocodylus porosus; croc sub2.assembly), and gharial (Gavialis gangeti-cus; PRJNA172383). In Testudines we used the NCBI accession numbers for the followingspecies: spiny softshell turtle (Apalone spinifera; APJP00000000.1), green sea turtle (Cheloniamydas; AJIM00000000.1), Chinese soft-shelled turtle (Pelodiscus sinensis; AGCU00000000.1),painted turtle (Chrysemys picta bellii; AHGY00000000.1). In Squamata we used the NCBI ac-cession numbers for the following species: green anole (Anolis carolinensis; AAWZ00000000.2),Burmese python (Python bivittatus; AEQU000000000.2), the king cobra (Ophiophagus hannah;AZIM00000000.1), and data from the Boa Assemblathon 2 (Bradnam & et.al., 2013) (Boa con-strictor; scaffold 6C file).
3. Results
From the WGS data obtained from the NCBI repositories of twelve Reptile species, we usedthe VgenExtractor tool to obtain functional V-genes and classify them with respect to the locusto which they belong. Table 1 presents a summary of all the functional V-genes encountered inthe analysis. For many of these species, it is the first time that Ig and TCR V-genes have beendescribed, and it completes the sparse data presently available with respect to these genes.
3.1. SquamataWithin the Squamata order, we studied the genomes of one Iguanidae (i.e., Anolis carolinen-
sis), and three snakes (i.e., Python bivittatus, Ophiophagus hannah and Boa constrictor). Thephylogenetic trees of the V-genes for these species are shown in Figure 1.
Within the antibody V genes of the Anole, we did not detect the Vκ genes, contradictingprevious studies (Alfoldi, 2011; Wei et al., 2009). A further analysis showed that we did notdetect these Vκ sequences because they lack tryptophan at position 41 (Figure 2). This resultis surprising, since the presence and position of this amino acid is a general characteristic of allfunctional V domains described so far, implying a structural alteration in this chain. The snakespecies, which arose later in the evolution of this lineage, no longer possess the κ chain genes.A possible explanation for this loss in snakes may be due to this structural modification of the κ
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;
60
8793
7792
908796
81
95
9779 99
8095
29
86
7659
66
708799
88
91
82
95
85
100
99
84
9894
100
9095
8793
80
90
100
0
79
78
90
83
0
8198
11
96121672
99
3785
801478890
87
94
76
75 83 92
47 91
54
72
0
96
9964
7563
96
8070
858399
99
95
9691
87
87
95
90
73
6
94
0
69
79
75
88
99
89
179492
99
18
75 86 86
98
83
9441
95
8097
2185
8488
96
0.0
0.5
1.0
1.5
2.0
77
89
8370
93
9695
3829
66
8169
92
91
90
79
78
56
86
95
66
23
85
37
76
90
83
9067
83
42
77
50
20
49
81
95
92
77
10
61
9867
92
22
97
94
88
94 99
86
70
41
8880
92
31
84
88
0.0
0.5
1.0
1.5
2.0
44
29 98
2543
75
95
93
9519
7082
90
85
52
299630
94
9288
91
29
87
88
72
84
59
53
84
86
62
0
93
92
87
1696
46
7334
26
98
91
49
83
82
78
81
95
73
74 94 97
92
77 83
85
10
7546
8740
48
94
90
99
87
92
99
73
94
88
9957
3698
95
90
949085
86
82 99
91 69
47
85
9631
84
88
0.0
0.5
1.0
1.5
2.0
93
100
7587
96
89
80
6574
95
89
96
95
93
92
9466
7164
89
88
92
88
91
93
75
80
7686
88
66
48
77
76
76
94
2764
70
51
84
85
86
78
87
84
6886
91 82
92
89
79
3193
9081 0
95
58
95
92
96
78
90
8270
74
9899
67
70
92
22
94
93
88
5198 90
98
75
86
78
87
98
0
99
7987
92
31
96
9741
84
88
9394
5796
0.0
0.5
1.0
1.5
2.0
IGHVIGKVIGLVTRAVTRBVTRDVTRGV
Squamata
Anolis Carolinensis
Boa constrictor Ophiophagus hannah
Python bivittatus
Figure 1: V-gene repertoire in Squamata. Trees were constructed with the amino acid sequences obtained from theVgenExtractor tool. These sequences were aligned with Clustal-omega. For constructing the phylogenetic trees, amaximum likelihood algorithm with the WAG matrix and 500 bootstrap replicates were realized for validation. Root-ing was performed at the midpoint and the linearization provided by Mega5 was applied to improve the visualizationof the trees. For each sequence, two independent classification techniques (view Material and Methods) were usedto determine its respective locus, thereby confirming that clades of the tree correspond to independent loci of Ig andTCR.
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;
Table 1: Distribution of V-genes across the main Ig and TCR loci.
Order/specie ighv igkv iglv trav trbv trgv trdv all
SquamataA. carolinensis 89 N/D 30 23 7 0 0 149P. bivittatus 71 0 34 27 2 0 0 134B. constrictor 54 0 17 13 1 0 0 85O. hannah 77 0 28 25 1 0 0 131TestudinesC. picta bellii 342 109 83 113 19 25 11 702P. sinensis 117 87 45 60 16 18 10 353A. spinifera 58 78 17 19 14 12 2 200C. mydas 57 49 25 40 13 16 7 207CrocodyliaA. mississippiensis 74 19 40 77 26 13 9 259A. sinensis 112 28 60 77 29 15 12 334C. porosus 46 17 22 29 19 16 3 152G. gangeticus 103 20 24 55 21 15 7 245
sequence found in older Squamata. For the case of the IGH locus in the Anole, the results agreewith those we described previously by an independent method (Gambon-Deza et al., 2009).
With respect to TCR genes, our analysis showed that the Anole possesses a low number of Vgenes in the TRB locus. Also, we found that V genes corresponding to the γ and δ chains (TRGVand TRDV genes) are absent. Because of this anomalous finding, we searched the Anole genomefor genes that encode the constant (C) regions of the TCR γ and δ chains, with negative results. Wealso analyzed transcriptome sequences for this species obtained from the liver and kidney (avail-able from the NCBI public repository: Anole liver: SRA064777, trace SRR648821, Anole kidney:SRA059267, trace SRR579557). While there were many Ig and TCR α/βmRNA sequences foundin these organs, we obtained negative results when searching for TCR γ/δ sequences. It is pos-
: : * : : : * * : * : : * * * * . : * : . * : : : : : : . : : * * * : : * * * : * * * * * * * : * * : * . * . . * : * * * : *
1 1032 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100
V1
V2
V3
V4
V5
V6
V7
V8
V9
V10
V11
V12
V13
V14
V15
T S G D I V V T Q T P S S L Q K N P G D R V D V Q C K T S S S V T S S S Y S Y M N F Y Q L L P G Q K P K L L I Y Y A T N R F E G M P D R F S G R G S G T D F T F T I N G V R P E D E G E Y Y C G Q S Y D S P LS S G N - V I T Q T P A S L Q K N P G D R V E I Q C K T S S S V T S S Y T N F I N F Y Q I L P G Q K P K L L I Y Y A T R R F E G T P D R F S G R G S G T D F T F T I D G V R P K D E G E Y Y C G Q V N N S P LS S G D I V V T Q T P S S L Q K N P G D R V D V Q C N T S S S V T G S S Y S Y M N F Y Q L R P G Q K P K L L I Y Y A T N R F E G T P D R F S G R G S G T D F T F T I N V V R P E D K G E Y Y C G Q G Y S H P LS S G E I V V T Q T P A S L Q K N P G D R V D I Q C K G S S S L D G - - - - Y I N F F Q L I P G Q K P K L L I Y Y A T N R F Q G T P H R F S G Q G S G T D Y T F S I N G V R P E D E G E Y Y C G H A I S Y P LS S G D I V V T Q T P A S L Q K N P G D R V D I Q C K T S S T V S - - - - T Y M N F Y K L I P G Q K P K L L I Y Y V T S R F E G T P D R F S G S G S G T D F T F T I S V V C P D D E G E Y Y C G Q G Y D Y P LS T G Q I V I T Q T P A L I Q A N P G D K V I F R C Q T S S S M G S - - - - Y M A L Y R L K D N Q K P K L L I Y D V T N R F E G T P Y R F S G Q G S G T D F T F T I N G V Q S E D E G E Y Y C C Q R Q S Y P LS S G Q I V I T Q T P A S L H V N P G D R V T I Q C K A S S S M S D D - - - - M A L Y Q F I P G K N P K L L L Y D S S S R F T G T P D R F S G S Y S G T D F T F T I S G V R P E D E R E Y Y C G Q H Y S Y P LS S G Q I V I T Q T P A S L H V N P G D R V I I Q C K A P S S M S N D - - - - M A L Y Q F I P G K N P K L L L Y E S S T R F T G T P D R F S G S Y S G T D F T F T I S G V H P E D E A K Y Y C G Q Y Y S R P LS S G Q I V I T Q A P A S L H V N P G D R V S I Q C K T S S S I S N D - - - - M H L Y Q F I P G Q N P K L L I Y Y A T G L F E G T P D R F S G S Y S G T D F T F T I S G V R P E D E A E Y Y C G Q S N S R P LS S G E I L I T Q T P V S L H V N P R D R V T L Q C K A S S S M S D D - - - - M A L Y Q F I P G K N P I L L L Y E S S T C F T G S P N H F S G S Y S G T D F T F T I S G V C P E D E G E Y Y C G Q Y Y N Q I LS S G Q I V V T Q S P A S L H V N P G D T V T I K C K A S S S I S N E - - - - M D L Y Q F I P G Q K P K L L I Y H A T Y R F E G T P D R F S G S Y S G T D F T F T I S G V R P E D E A E Y Y C G Q D Y S T P LS S G Q I V I T Q T P A S L H V N P G D R V T I Q C K A S S S M S D D - - - - M A L I Q F I L G Q K P K L L I H D G D T R F E G T P D R F S G S Y S G T D F T F T I S G V R P E D A A E Y Y C G Q D Y T T P LS S G Q I V I T Q T P A S L H V N P G D R V T I Q C K A S S S I S N Y - - - - M H L Y Q F I P G Q K P K L L I Y R A T S R F E G T P D R F S G S Y S G T D F T F T I S G V R P E D A A E Y Y C G Q G Y S T P LS S G Q I V I T Q T P A S L H V N P G D R V T I Q C K A S S S M S N D - - - - M A L Y Q F I L G Q K P K L L I H D A T S R F T G T P D H F S G S Y S G T D F T F T I S G V R P E D A A E Y Y C G Q G Y S T P LS S G Q I V I T Q T P A S L H V N P G D T V T I R C K A S S S M S N D - - - - M Q L Y Q F I P G Q N P K L L I Y D A T S R F T G T P D R F S G S Y S G T D F T F T I N G V R P E D E G E Y Y C G Q D Y S R P L
CDR1 CDR2W absence
Figure 2: Alignment of amino acid sequences deduced from V-genes obtained manually in the genome of A. car-olinensis. As can be seen, the amino acid tryptophan (W) does not occupy the canonical position 41, representingan anomaly as compared with other functional V-genes. We produced the alignment and graphical output with theSeaview software program (Gouy et al., 2010).
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;
Anole record Anole record Anole record Anole record Anole record
Anole record Anole record Anole record Anole record Anole record
610 kb
140 KbTRAV
IGH
IGHV IGHC
Figure 3: Recent duplications in IGHV (top) and TRAV (bottom) loci in the Anole. Track duplications tracks aredrawn in green. Constraints for duplications in: (a) the IGH locus are: segments ≥ 1Kb, identity ≥ 90%; (b) the TRAlocus are: segments ≤ 0.3Kb; identity ≥ 90%. Duplications only exist in segments where V-genes are located (not inconstant regions), and in the case of the IGH locus, some coincide with the position of V-genes, indicated by the redlines.
sible that cells having TCRγ/δ may be abundant in other specialized tissues where transcriptomedata is lacking. Nonetheless, our studies, carried out with data from different tissue and differentsequencing methods, produced null results for searches of V-genes or constant gene sequences forTCR γ and δ chains. Thus, these results strongly suggest that these species have lost the TCRγ/δreceptor.
In the three serpents studied, there are few V-genes (Table 1). Most of these sequences fromthese species belong to the IGH and IGL loci (Figure 1). In the three snakes, we found that theIGK locus does not exist. From a phylogenetic analysis, we found that V sequences from theIGH locus belong to a single group or clan that has undergone a recent expansion. This result isdifferent from that found in A. carolinensis, where many V-gene clans are found in the IGH locus.In the case of the IGL locus, the number of V-genes in Serpentes is similar to that in the Anole.Finally, just as with A. carolinensis, no V-gene sequences corresponding to the TRG and TRD lociwere found and only one or two V genes were identified in the snake TRB locus.
From studies we carried out previously in mammals, we found a higher birth and death rateamongst the V-genes of the Ig loci as compared to that in the TCR loci. Trees of V-gene sequencesin reptiles have the same structure, suggesting that the same mechanisms exist. Figure 3 showsrecent duplications that occurred within the IGH and TRA loci. These loci are contained withinchromosome segments of length 610 Kb and 140 Kb, respectively. In this figure, track duplicationsare drawn in green and represent those sequences having a similarity greater than 90 percent forsegment sizes > 1Kb. As can be seen, there are many duplications in the IGH locus and theseduplication processes occurs specifically in the IGHV region and not in the constant (i.e., IGHC)region. By applying the same similarity criteria, few duplication tracks are visible in the TRAVregion, and those that are visible have smaller segment sizes: ≤ 0.3 Kb. Finally, we can observethat some duplications coincide with V-genes (indicated with red lines) in the IGHV region, whileno V-genes coincide with the duplication tracks shown in the TRAV region.
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;
78
95
86
45
100
40
8888
7185
85 8581
100
99
81
82
100
93 978688
94
98
419985
768799 9387
968645 64 94
2789
7499
75 088
9392
83
55
83
84
98
84 7991
6
71
75
7386
50
209978 98
73
677786
91
88
7784
797799
81
86
41
78
8958
96
8871
979979
8140
72
7860
3698
5490
9288
74
829030
8878
2994
089
9089
84
81
92
85
41
98762473
83
5
29
97
83
2569 8246
66
90
93 888577
08092
7887
6955
86
87
7290 73
7995
5761
89777712879675
92
749581
897876
72
78
87
4090
85
9978
83
95
64
9445
99
99
81
93
899197
50907076
95
91
77
83
96
33
86
79
97
6391
79100
668664
984890
85
85
100
95
77
7282
71953142
83
6599
73808077
75
90
78
9559749193
37
36388799
71
92
8362
85
7897
9078
768996
82
89
88
529983
73
5898 8590
79
86
14
089
99
0
93
84
73
80
75
89
9993
917482
74
82
88
45
98
97
71
100
53
87
8576
97
95
98
21
100
65
72 96 84 73
8387
979594
85
83 90 9175
87 95
94
73
97 91
35 76 99
41
6898
778289 98
100
95 8156
75
72 52 67
26
92
99 3191
73
5194 98
84
98
19
81999138
82
8790 86
90
75
7885100100
92100
7598
9798
067
073 95
7946
86
87
82
9384
9088
74
92
79
31
5780
886879
64
7789
738570
74
87
9376
9488
517764
88
75
98
9697
76
80
77
79
9052
84
37
100
7886
84
7567
78
93
7579847977
77
9575
8391
73
7176
99
90
7699
769990
75
94
78
89
69
918177
91
76
90
75
94
87
97
21
77
87
72
88
9299
91
14
0
817097
88
87927595
81
91
99
801006274837696
69
70
10094
82
0
87889786
100
81998687
84
8458
84
91
75
576
89
7987
7671
96727494
9699
9299
90
81
95
97
98
91
94
83
0.0
0.5
1.0
1.5
2.0
76
99 66
82
97
98
84
6
98
75 95
14 58
937581
9675
100
7026
69
93
8987
8579
90
30
61
33
9189
31
89
72
97
83
99
96
8744
269
690
83
819377
89
5
49
9886
809889
85100
88
71
85
85
81
82
93
628891
89
56
99
9991
897890
93
29
9394
89
78
87
99
38
63
8682
9598
0
79
998574
40
90
9936
69
8780
91
79
70
79
99
857780
818821
89
8877
75
95
01881
93
90
87
81
7879
4589
89
73
7871
83
82
091
4198
91
0
94
7178
89
82
7498 95
0
85 99
64
54
42
92
90
85
859180 3033
86
86
83100
75
2085
77
72
7487
78 85 88
45
33
960
46
99
959590
0
98 8647 72 89
0
99
98
99 76
71
940
9985
96
89
90
81
61
7789
93
89
18
92
99
75
7459
98
2665
92
98
9910098
85
98
4897
80
98
7575
77
87
91
83
67
63
60
99
67728798
34
8485
78
73446
0 97
947763
95100
75
94
78
92
98
91
94
83
29
0,0
0,5
1,0
1,5
2,0
Apalone spinifera
IGKVIGLVTRAVTRBVTRDVTRGV
IGHV92
6 85
66
8969
83
77
99
99
67
88
14
78
88
44
93
97
7486
99
47963485
100
57
98
70
80
85
9178
87
80
7492
0
42
91
92
84
83
58
84
74
64
95
83
94
8476
23
41
87
96
6
9297
50
72
368577
89
75
86
75
6899776
35
89
83
90
74
76
70
25
81
70
79
98
100
7876
85
81 62
93
76 88
92
86
90
64
0 95
367781
87
83
85
56
74
88
96
78
80
84 76
82
90
93
26
88
85
9496
92
94
84
81
97
97
86
81
85
3
95
89
96
60
91
51
8894
99
91
92
84
93
98
0
1686 52 0
89
79
95
87
28
0.0
0.5
1.0
1.5
84
83
100
94
67
41
87
96
99
65
6
61
91
72
81
81
68
75
9763
90
93
74
89
73
94
73
8928
176873
97
89
78
60
94
67
85
9174
85
86
62
74
47
25
99
63
94
18
98
87
100
7179
88
89
7748
9392
90
75
89
85
86
7675
75
93
88
96
807776
90
93
17
63
26
8712
99
80
84
84
91
84
89
92
84
7796
93 25
67
73
77 85
82
98
70
80
80
85
99
36
78 49
91
82
76
81
95
94
76
97
8159
74
37
96
45
98
91
92
93
14
78
79
99
70
1483
74
8436
77
75
88
8875
89
73
83
79
95
87
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
Testudines
Chelonia mydas
Chrysemys picta bellii
Pelodiscus sinensis
Figure 4: V-genes in Testudines. The trees were constructed in the same way as described in Figure 1.
3.2. Testudines and CrocodyliaIn the differentiation of the reptile lineage to birds, the Testudine and Crocodylia branches
emerged. Four Testudine genomes are available: those of three land tortoises, Chrysemys pictabellii, Pelodiscus sinensis and Apalone spinifera, and one marine turtle, Chelonia mydas. Figure4 shows the phylogenetic analysis of the V genes sequences from these species. For the westernpainted turtle, C. picta bellii, we found more than 700 V-genes, being the specie with the mostV-genes that we have studied to date. By comparison, the marine turtle has ≈ 1/3 as many genes(Table 1). In all four turtles, there is a preponderance of Ig V-genes as compared with the numberof TCR V-genes. In the three Ig loci, there is an expansion of multiple V-gene clans, similarto what happens in the Anole, suggesting that the structure of these genes is more diverse thanthat found in mammals (Gambon-Deza et al., 2009). Within the Ig loci, there are more Ig heavychain genes than those for light chains; the λ light chain has the least Ig V-genes. With respectto TCR V-genes, the TRA locus have the largest set, while the TRB locus has on average muchless genes. Such a low amount of Vβ genes is observed in all reptiles, opening the possibility thatan underlying process exists. In contrast to the Squamata, each of the turtle species we studiedpossess TCR γ and δ V-genes, albeit with a wide variance amongst them. From the phylogeneticstudy of these V-genes, the TRGV genes are included in one specific clan, while the TRDV genesare within the clan of the TRAV genes (Figure 4).
In the order Crocodylia, there are four publicly available genomes (Alligator mississippiensis,Alligator sinensis Crocodylus porosus, and Gavialis gangeticus). In each species, we found V-
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;
57
0
968589100 92
83
7993 97
87
68
79
93
619693
869569
92
83
937769
67
98
5689
7510099
81
9997
87
4099
8239
66
099
9888
7998
79
80
80
99
85
91
68
80
41948499
54
8991
15
58
0
95
89
92
84
49
5693
97
7488
9884
0
96
82
96
81
96
97
88
79
33
8977
76
99
78
0
88
81
73
99
75
96
82
85
0
99
94
95
8864
77
7485
86
90
94 8565
25
86
8432
50 909285
7784 97
93
63
26
81
96
18100
8231
9566
88
1994
87
97
83
2991
3079 34
74
85
3
76
56
49
74
76
7685
91
85
20
96
69
84
9984
92
9880 35
80
9989
82 71
7883
79
95
87
98
28
0.0
0.5
1.0
1.5
9684
80
7897
87 96
75
9672100
8697
95
6992
6385
99
67
98
98
0
45
81
35 88
60
99
99
99
98
9788
79
9880
85 80 88
8290
9954
89
91
97
8992
84
49
56
97
74
88
98
84
0
96
82
96
9381
86
96
88
2179
97
7785
76
7571
80
96
82
74
91
88
6476
84
34
90
85
90
84
93
94
437379
88
7492
32
72
90
7
97
9363
26
12 92 95
100
81
96
100 75
97
66
30
7326
8197 87
072
8
70
83
83
75 88
94
97
8694
79
9334
74
85
3
78778367
49
74
76
7879
9371
40
38
92
680
91
84
86
20
96
94
9875
84
97
91
9398
100 3552
85
7679
82
9344
78
0
83
83
79
76
95
87
98
28
0.0
0.5
1.0
1.5
100
57
0
81
100
9436
93
87
68
79
93
43
618193
86
9569
92
9577
99
67
89
45
99
81
0
99 97
81
40
84660
99
98
7588
9979
98
79
80
99
85 751002591
68
41948454
89
91
100
9280
9589
92
84
0
49
56
6997
85
74
88
98
84
0
96
82
96
8196
93
5
99
78
98
92
80
86
0
85
76
938019
80
7781
9573
620
75
96
82
88
992175
94
88
64
9498
8466
76
77
90
989686
90
55
939194
75 94
8592
849850
72
90
7785
38
89
7 84
97
93
26
9592
95
779599
96 81
96100
7582
9985
9566
89
80
9326
74
87
83
95
8766
91
8385
993
74
79
34
3
33
717649
76
84
78
87
8576
79
43
8724
91
85
20
96
8980
75
69
81
80
099
92
0
9882
80
99
8397
7883
79
95
87
98
28
0.0
0.5
1.0
1.5
7985
0
80
9681100
866992
67
092
80
8290
5489
091
9758
8992
84
49
56
93
12
97
098
84
0
96
82
96
88
76
88
9185
76
96
82
56
9994
64
89
7585
90
96
86
90
7826
72
34
79
88 7492
863277
72
90
97
93
26
75
95
88
97
8287 83
88
85
52
93
74
85
3
76
98
8091
20
96
93 84
9114
85 44 780
839079
76
95
8798
28
0.0
0.5
1.0
1.5
IGKV
IGLV
TRAV
TRBV
TRDV
TRGV
IGHV
Crocodylia
Alligator mississippensis
Gavialis gangeticus
Alligator sinensis
Crocodylus porosus
Figure 5: V-genes in Crocodylia. The trees were constructed in the same way as described in Figure 1.
regions of the three main Ig loci and four TCR loci (Table 1). There are more V-genes for Igthan for TCR. Amongst the Ig V-genes, the majority of them are within the IGH locus. All fourCrocodylia species have higher numbers of V-genes in the IGL locus than in the IGK one. Figure5 shows the phylogenetic trees of the V-genes for each of these species.
3.3. Clans in V-gene sequencesWe studied the phylogenetic relationships of V-genes from the 82 mammal and 12 reptile
species available in Vgenerepertoire.org (Olivieri & Gambon-Deza, 2014). From the largeset of immunoglobulin V gene sequences in this study, there is evidence for distinct groups orclans (Lefranc & Lefranc, 2001) of related sequences from different species amongst the clades.Previous studies have identified such clans in human and mice; the IMGT (Lefranc et al., 2009;Giudicelli & Lefranc, 2012) has annotated three clans amongst the VH genes, three in Vκ, and atleast five clans in Vλ (Lefranc & Lefranc, 2001; Giudicelli & Lefranc, 1999).
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;
Figure 6 shows multi-species trees obtained from alignments of the V-genes from the Ig loci.Since the purpose of these trees was to explore the clan structure, we used all available sequencesof reptiles and mammals instead of trying to retain the same number of species from each class.In these plots, the sequences corresponding to mammals are represented in blue, while those forreptiles are drawn in red. The nodes having sequences which belong to clans annotated by theIMGT are labelled in the trees. As seen in the trees of Figure 6, many sequences do not belongto a previously defined clan. Since the clan studies to date have been based upon a limited setof species, the set of identified clans has been necessarily inadequate. It is only now, with theavailability of large number of V-genes sequences from a broad representation of mammal andreptile species (Olivieri & Gambon-Deza, 2014), that it is possible to define new clans.
In the IGHV tree of Figure 6, the V-gene sequences of reptiles are grouped together and thesegroups are interspersed within mammal clans. For example, Clan-II contains sequences fromTestudines, Crocodylia, and Squamata that together form subgroups, and these subgroups areinterposed with several mammalian subgroups comprising many sequences. The tree for the IGLVsequences shows that the reptile Vλ sequences are also grouped in several subgroups and these aredistributed amongst the five mammalian clans previously defined by the IMGT. In the IGK locus,more Vκ clans can be identified in mammals than previously described, so that the present clannomenclature is not sufficient to group all the taxa. All the reptile IGKV genes cluster into a singlegroup forming an independent clan. Also, this clan contains only a small group of mammal V-genesequences, indicating a different evolution in this locus between mammals and reptiles.
V-gene clans have not been previously identified within the TCR loci. The multi-speciestrees of Figure 7, consisting of TCR sequences, demonstrate that the clan concept is also rel-evant amongst these loci. Indeed, in the TRA locus, six clans can be identified, each havingsequences from both reptiles and mammals. As can be seen in the figure, the roots of each of theclans correspond to reptile sequences, indicating the maintenance of these sequences over longerperiods of time than those found in mammals. The clans are numbered according to the amountof taxons in the clan, ranked from largest (Clan I) to smallest (Clan V). The tree also demonstratesthat clans I and II diversified significantly in mammals, while in others, such as clan V, there wasless diversification in mammals than witnessed in reptiles. Also, we can deduce the presence ofadditional clans in reptiles that retain remnants of sequences that were later lost in mammals.
In the tree for the TRB locus in Figure 7, three distinct clans can be identified that groupmammal and reptile taxons. Nonetheless, the majority of mammalian sequences in this locus donot share the same clan with those from reptiles. Few mammal sequences are found within thereptile clans, while those grouped in separate clans appear to have undergone an evolutionarydiversification different from reptiles within this locus.
Trees from the TCR γ/δ have a similar structure. In the TRDV tree, three clans can be identi-fied that contain sequences from mammals, Testudines, and Crocodylia (this locus does not existin Squamata, as described previously). Similar results are seen in the multi-species tree for theTRG locus, but in this case, the majority of sequences from reptiles belong to a single clan.
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;
--------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
--------------------------------------------
----
--------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----
--------
----
----
----
----
----
----
----
----
--------
----
--------
----------------
--------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------
------------------------------------------------------------------------------------------------
------------------------------------
------------------------------------------------
------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0.0
0.2
0.4
0.6
0.8
IGHV
IGKV
IGLV
Test/Croc
Test/Croc/Squa
Test/Croc/Squa
Test/Squa
SquaTest/Croc
Test/Croc/Squa
Clan IV
Clan V
Clan III
Clan II
Clan I
Clan IIIClan II
Clan I
Test/Croc/Squa
Test/Squa
Test/Croc/Squa
Test/Croc/Squa
Test/Croc/Squa
Test/Squa
Test/Croc/Squa
Test/Squa
Clan II
Clan III
ClanI
---------------------------------------------------------------------------
---
---------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------
------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------
0.0
0.2
0.4
0.6
0.8
Figure 6: Trees were generated from 11 reptiles and 50 mammals using the V-gene sequences of the three Ig loci. Theclans with reptile taxa are marked in red while those for mammals are indicated in blue. Previous studies (Lefranc& Lefranc, 2001; Giudicelli & Lefranc, 1999) have defined the presence of the clans indicated in the plots. Thealignment of the amino acid sequences was performed with clustal-omega and tree was constructed using FastTreewith the WAG matrix. Plots were constructed using MEGA v5.2 with the linearized branch option. The labels of theReptilia orders correspond to Testudines (Test), Crocodylia (Croc), and Squamata (Squa).
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0.0
0.5
1.0
1.5
------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
------------
------------
------------
------------
------------------------------------
0.0
0.5
1.0
1.5
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----
--------
0.0
0.5
1.0
1.5
TRAVTRBV
TRDVTRGV
Squa/Tes/Cro
Squa/Tes/Cro
Tes/Cro
Squa/Tes/Cro
Squa/Tes/Cro
Tes/Cro
Cro
Cro
Tes/CroTes/Cro
Tes/Cro *
*
*
*
*
O
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------
------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------
0.0
0.5
1.0
1.5
Squa/Tes/Cro
Squa/Tes/Cro
Clan I
Clan III
Clan II
Clan V
Clan IV
Clan VI Clan I
Clan II
Clan III
Mammal Clans
Squa/Tes/Cro
Figure 7: Trees were generated from 11 reptiles and 50 mammals using the V-gene sequences of the four TCR loci.The clans in reptile taxa are marked in red while those for mammals are in blue. The nodes that root the mammalclans with the reptile clans are marked by (*), while others that are exclusively from mammals or reptiles are markedwith (o). The alignment and tree construction was performed in a similar way to that of Figure 6. The labels of theReptilia orders correspond to Testudines (Test), Crocodylia (Croc), and Squamata (Squa).
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;
4. Discussion
How did the genes that code for Igs and TCRs arise and evolve in lineages of reptiles? Itis known that extant reptiles and mammals are separate vertebrate lineages that descended froma common ancestor. These vertebrates must have inherited their antigen recognition machinery,namely Ig and TCR genes, from this common ancestor. These vertebrates evolved on land indifferent habitats, and underwent further speciation. After the divergence from mammals, thereptile lineage split in two. One of these lines led to present day Squamata, while the other gaveway to Testudines and Crocodylia. We found that the Ig V-gene repertoire in reptiles is similarin structure to that found in mammals. Just as in mammals, the data suggests that Ig V-genesunderwent a diversification process with high birth/death rates.
Another striking result is that the Squamata species lost the TCRγ/δ locus. Neither V- norC-gene sequences were found in either the genome or several transcriptome datasets, indicatingthe absence of these genes. In each of the Squamata species we studied, there is a low numberof V-genes in both the TRA and TRB loci, perhaps underscoring a particular characteristic oftheir TCRα/β. This study also revealed several relevant facts with regard to the evolution ofthe Ig loci. From A. carolinensis, we found that the Vκ sequences have an anomalous alterationin the canonical tryptophan, whose presence is a general characteristic of all functional V-genesdescribed so far. This alteration observed in Iguanidae may be related to the loss of this locus in thesuborder of Serpentes. It is known that birds also lost the IGK locus. However, in the evolutionaryline that gave rise to Testudines and Crocodylia, common to that of birds, we did not detect thisloss of tryptophan 41 at the canonical position. It is thus surprising that two evolutionary lines,serpents and birds, that diverged ∼ 275 Mya independently lost the IGK locus.
In the Testudines and Crocodylia lineage, there is no such gene loss as seen in the Squamata.Table 1, shows the V-gene distribution across all the seven major Ig and TCR loci. While theaverage number of TRB V-genes is larger than that found in Squamata, it is still less than theaverage number found in mammals (Olivieri & Gambon-Deza, 2014). For the case of the TRGand TRD loci, the average number of V-genes was similar to that found in mammals.
At the evolutionary level, there are studies that show that the VH genes are grouped into threeclans (Lefranc & Lefranc, 2001; Giudicelli & Lefranc, 1999; Lefranc et al., 2009). These studiesdistributed the sequences in clans from limited information based upon a small number of mammalspecies. In this work, we presented multi-species trees formed from the V-genes of more than 90mammal and reptile species. These trees show that additional clans exist beyond those describedfrom previous annotations.
In a previous analysis of A. carolinensis (Gambon-Deza et al., 2009), we showed that its VHgene sequences cluster into five distinct groups, one of which is quite different from the rest. Thenecessity to define new clans is shown in the reptilian VH and Vκ sequences. In the Vκ genesof Testudines and Crocodylia, there exists a single clan different from those of mammals andwithin which mammals have only a few V-gene sequences (Figure 6). This suggests that reptilesgenerated κ chains from a homogeneous ancestral group, from which mammals either diversifiedvery little or lost most of the variants that were generated. Both in serpents and in birds, the κ chainwas lost, indicating that the Vκ genes expressed in reptiles may be dispensable. In contrast, theIGKV tree of Figure 6 demonstrates the large κ chain diversification that took place in mammals,
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;
corresponding to the most common Ig chain in these species, such as the case in human and mouse.The retention of the five clans in the Vλ gene sequences, as described previously in mammals,
is significant because it suggests a positive selection mechanism. The reptile sequences maintainthis distribution of five V-genes clans found in mammals and each is rooted with a reptile clan.The difference in sequences (which may be structural) that conditioned the grouping within clanshas been maintained in both lineages for more than 300 Mya, indicating functional transcendencynot yet understood.
Instead of recognizing antigen directly, TCRs recognize a complex of antigen-derived pep-tide and a major histocompatibility complex (MHC) molecule. Thus, this restriction could beaccounted for by a co-evolution selection between the TCR V-genes with MHC molecules. Atpresent, there are few comprehensive studies exist that characterize the MHC molecules in reptiles(Glaberman et al., 2008; Glaberman & Caccone, 2008; Glaberman et al., 2009).
In a previous publication (Olivieri et al., 2013), we commented on the presence of long branchesin the trees corresponding to the TCR V-gene sequences of mammals. This same phenomena isalso observed in the TCR V-gene trees for reptiles. Due to the ability to study a large number ofV-gene sequences in these loci, we also describe for the first time the presence of well-definedclans in the multi-species trees. In the TRA locus, the reptiles and mammals share common clansand the roots of the clades are from both reptiles and mammals, indicating a common origin. Oneexplanation for this observed phenomena is a structural or functional preservation of products fromthis loci in species that diverged more than 300 Mya.
The TRBV multi-species tree of Figure 7 describes an evolution between reptiles and mammalsthat is quite different from that seen in the TRAV tree. The Vα sequences evolved from commonancestors, as seen by the shared mammal and reptile root nodes. The distribution of the Vαsequences are also evenly distributed amongst the clans. This same homogeneous distributionis not seen in the TRBV tree. In particular, the vast majority of mammal Vβ sequences havediversified into separate clans (i.e., mammal clans). In this way, the Vβ sequences of reptiles groupwithin tree old clusters with little representation from mammals. Thus, contrary to Vα sequences,reptile and mammal Vβ sequences are segregated from each other, in a way reminiscent of Vκsequences. While the common evolutionary features of Vα in reptiles and mammals supports acoevolution hypothesis between TCR and MHC, the evolutionary differences observed in the Vβsuggests this chain may have a different role in reptiles and mammals. Studies of MHC evolutionin reptiles could provide clues that clarify this issue.
1
ReferencesAlfoldi, J. e. a. (2011). The genome of the green anole lizard and a comparative analysis with birds and mammals.
Nature, 477, 587–591. doi:10.1038/nature1039.Benton, M. (2005). Vertebrate Palaeontology. Oxford, UK: Blackwell.Benton, M., & Donoghue, P. (2007). Paleontological evidence to date the tree of life. Mol. Biol. Evol., 24, 26–53.
doi:10.1093/molbev/msl150.Bradnam, K., & et.al. (2013). Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate
species. Gigascience, 2, 10. doi:10.1186/2047-217X-2-10.Cannon, J., Haire, R., Rast, J., & Litman, G. (2004). The phylogenetic origins of the antigen-binding receptors and
somatic diversification mechanisms. Immunological Reviews, 200, 12–22.
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;
Charlemagne, J., Fellah, J., Guerra, A., Kerfourn, F., & Partula, S. (1998). T-cell receptors in ectothermic vertebrates.Immunological Reviews, 166, 87–102.
Danilova, N., & Amemiya, C. T. (2009). Going adaptive. Annals of the New York Academy of Sciences, 1168,130–155.
deBraga, M., & Rieppel, O. (1997). Reptile phylogeny and the interrelationships of turtles. Zool. J. Linnean Soc.,120, 281–354.
Flajnik, M., & Kasahara, M. (2010). Origin and evolution of the adaptive immune system: genetic events and selectivepressures. Nat Rev Genet., 11, 47–59. doi:10.1038/nrg2703.
Gambon-Deza, F., Sanchez-Espinel, C., Mirete-Bachiller, S., & S, M.-M. (2012). Snakes antibodies. Dev CompImmunol., 38, 1–9. doi:10.1016/j.dci.2012.03.001.8.
Gambon-Deza, F., Sanchez-Espinel, C., & Magadan-Mompo, S. (2009). The immunoglobulin heavy chain locus inthe reptile anolis carolinensis. Mol Immunol., 46, 1679–87. doi:10.1016/j.molimm.2009.02.019.
Giudicelli, V., & Lefranc, M. (1999). Ontology for immunogenetics: the imgt-ontology. Bioinformatics, 15, 1047–54.Giudicelli, V., & Lefranc, M. (2012). Imgt-ontology 2012. Front Genet., 3, 79. doi:10.3389/fgene.2012.00079.Glaberman, S., & Caccone, A. (2008). Species-specific evolution of class i mhc genes in iguanas (order: Squamata;
subfamily: Iguaninae). Immunogenetics, 60, 371–82. doi:10.1007/s00251-008-0298-y.Glaberman, S., DuPasquier, L., & Caccone, A. (2008). Characterization of a nonclassical class i mhc gene in a
reptile, the galapagos marine iguana (amblyrhynchus cristatus). PLoS One, 3, e2859. doi:10.1371/journal.pone.0002859.
Glaberman, S., Moreno, M., & Caccone, A. (2009). Characterization and evolution of mhc class ii b genes ingalapagos marine iguanas (amblyrhynchus cristatus). Dev Comp Immunol, 33, 939–47. doi:doi:10.1016/j.dci.2009.03.003.
Goecks, J., Nekrutenko, A., Taylor, J., & Galaxy-Team. (2010). Galaxy: a comprehensive approach for supportingaccessible, reproducible, and transparent computational research in the life sciences. Genome Biol, 11, R86.doi:10.1186/gb-2010-11-8-r86.
Gouy, M., Guindon, S., & Olivier, G. (2010). Seaview version 4: a multiplatform graphical user interface for sequencealignment and phylogenetic tree building. Molecular Biology and Evolution, 27, 221–224.
Hedges, S., & Kumar, S. (2009). The Timetree of Life. New York, USA: Oxford Univ. Press.Janes, D., Organ, C., Fujita, M., Shedlock, A., & Edwards, S. (2010). Genome evolution in reptilia, the sister group
of mammals. Annu. Rev. Genom. Human Genet., 11, 239–264. doi:10.1146/annurev-genom-082509-141646.Laurin, M., & Girondot, M. (1999). Embryo retention in sarcopterygians, and the origin of the extra-embryonic
membranes of the amniotic egg. Ann. Sci. Nat. Zool., 20, 99–104.Laurin, M., & Reisz, R. (1995). A reevaluation of early amniote phylogeny. Zoological Journal of the Linnean
Society, 113, 165–223. doi:10.1111/j.1096-3642.1995.tb00932.x.Lefranc, M., Giudicelli, V., Ginestoux, C., Jabado-Michaloud, J., Folch, G., Bellahcene, F., Wu, Y., Gemrot, E.,
Brochet, X., Lane, J., Regnier, L., Ehrenmann, F., Lefranc, G., & Duroux, P. (2009). Imgt, the internationalimmunogenetics information system. Nucleic Acids Res, 37, D1006–12. doi:10.1093/nar/gkn838.
Lefranc, M., & Lefranc, G. (2001). The Immunoglobulin FactsBook. London, UK: Academic Press.Li, L., Wang, T., Sun, Y., Cheng, G., Yang, H., Wei, Z., Wang, P., Hu, X., Ren, L., Meng, Q., Zhang, R., Guo,
Y., Hammarstrom, L., Li, N., & Zhao, Y. (2012). Extensive diversification of igd-, igy-, and truncated igy(δfc)-encoding genes in the red-eared turtle (trachemys scripta elegans). J Immunol., 189, 3995–4004. doi:10.4049/jimmunol.1200188.
Magadan-Mompo, S., Sanchez-Espinel, C., & Gambon-Deza, F. (2013). Igh loci of american alligator and saltwatercrocodile shed light on iga evolution.. Immunogenetics, 65, 531–41. doi:10.1007/s00251-013-0692-y.
Magadan-Mompo, S., Sanchez-Espinel, C., & Gambon-Deza, F. (2013). Immunoglobulin genes of the turtles. Im-munogenetics, 65, 227–37. doi:10.1007/s00251-012-0672-7.
Modesto, S., & Anderson, J. (2004). The phylogenetic definition of reptilia. Systematic Biology, 53, 815–821.doi:10.1080/10635150490503026.
Narciso, J., Uy, I., Cabang, A., Chavez, J., Lorenzo, J., Padilla-Concepcion, G., & Padlan, E. (2011). Analysis of theantibody structure based on high-resolution crystallographic studies. New Biotechnology, 28, 435–47.
Olivieri, D., Faro, J., von Haeften, B., & Sanchez-Espinel, F., Gambon-Deza (2013). An automated algorithm for
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;
extracting functional immunologic v-genes from genomes in jawed vertebrates. Immunogenetics, 65, 691–702.doi:10.1007/s00251-013-0715-8.
Olivieri, D., & Gambon-Deza, F. (2014). Vgenerepertoire.org identifies and stores variable genes of immunoglobulinsand t-cell receptors from the genomes of jawed vertebrates. bioRxiv, . doi:10.1101/002139.
Price, M., Dehal, P., & AP, A. (2010). Fasttree 2–approximately maximum-likelihood trees for large alignments.PLoS One, 5, e9490. doi:10.1371/journal.pone.0009490.
Reisz, R., & Muller, J. (2004). Molecular timescales and the fossil record: a paleontological perspective. TrendsGenet., 20, 237–41;.
Schroeder, H., & Cavacini, L. (2010). Structure and function of immunoglobulins. Journal of Allergy and ClinicalImmunology, 125, S41 – S52.
Sievers, F., & Higgins, D. (2014). Clustal omega, accurate alignment of very large numbers of sequences. MethodsMol Biol., 1079, 105–16. doi:10.1007/978-1-62703-646-7_6.
Sun, Y., Wei, Z., Hammarstrom, L., & Zhao, Y. (2011). The immunoglobulin δ gene in jawed vertebrates: a compar-ative overview. Dev Comp Immunol., 35, 975–81. doi:10.1016/j.dci.2010.12.010.
Sun, Y., Wei, Z., Li, N., & Zhao, Y. (2012). A comparative overview of immunoglobulin genes and the generation oftheir diversity in tetrapods. Developmental & Comparative Immunology, 39, 103–109.
Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., & Kumar, S. (2011). Mega5: molecular evolutionarygenetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. MolecularBiology and Evolution, 28, 2731–2739. doi:10.1093/molbev/msr121.
van Tuinen, M., & Hadly, E. (2004). Error in estimation of rate and time inferred from the early amniote fossil recordand avian molecular clocks. J Mol Evol., 59, 267–76.
Wei, Z., Wu, Q., Ren, L., Hu, X., Guo, Y., Warr, G., Hammarstrom, L., Li, N., & Zhao, Y. (2009). Expression of igm,igd, and igy in a reptile, anolis carolinensis. J. Immunol., 183, 3858–64. doi:10.4049/jimmunol.0803251.
Xu, Z., Wang, G., & Nie, P. (2009). Igm, igd and igy and their expression pattern in the chinese soft-shelled turtlepelodiscus sinensis. Mol Immunol., 46, 2124–32. doi:10.1016/j.molimm.2009.03.028.
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;