genomic v-gene repertoire in reptiles - biorxiv v-gene repertoire in reptiles david n. olivieri1,...

16
Genomic V-gene repertoire in reptiles David N. Olivieri 1 , Bernardo von Haeften 2 , Christian S´ anchez-Espinel 3 , Jose Faro 2,4,5 , Francisco Gamb´ on-Deza 4,6 1 School of Computer Science, University of Vigo, Ourense 32004, Spain. 2 Area of Immunology, Faculty of Biology, and Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain. 3 Nanoimmunotech SL, Pza. Fernando Conde, Montero R´ ıos 9, 3620, Vigo, Spain. 4 Instituto Biom´ edico de Vigo, 36310 Vigo, Spain. 5 Instituto Gulbenkian de Ciˆ encia, 2781-901 Oeiras, Portugal. 6 Servicio Gallego de Salud (SERGAS), Unidad de Inmunolog´ ıa, Hospital do Meixoeiro, 36210 Vigo, Spain. [email protected], [email protected] Abstract Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionary lineages amongst terrestrial vertebrates. In reptiles, two main evolutionary lines emerged, one gave rise to Squamata, while the other gave rise to Testudines, Crocodylia and birds. In this study, we determined the genomic variable (V)-gene repertoire in reptiles corresponding to the three main immunoglobulin (Ig) loci and the four main T-cell receptor (TCR) loci. We show that squamata lack the TCR γ/δ genes and snakes lack the Vκ genes. In representative species of testudines and crocodiles, the seven major Ig and TCR loci are maintained. As in mammals, genes of the Ig loci can be grouped into well-defined clans through a multi-species phylogenetic analysis. We show that the reptile VH and Vλ genes are distributed amongst the established mammalian clans, while their Vκ genes are found within a single clan, nearly exclusive from the mammalian sequences. The reptile and mammal V-genes of the TRA locus cluster into six common evolutionary clans. In contrast, the reptile V-genes from the TRB locus cluster into three clans, which have few mammalian members. In this locus, the V-gene sequences from mammals appear to have undergone dierent evolutionary diversification processes that occurred outside these shared reptile clans. Keywords: Immunologic Repertoire, Reptile Evolution 1. Introduction The order Reptilia is believed to have evolved from amphibian-like creatures, the labyrinthodonts, which date back to the Carboniferous period some 312 million years ago (Mya) (Laurin & Reisz, 1995; Laurin & Girondot, 1999). The descendants of those precursors became progressively more terrestrial and independent of their aquatic origins, venturing out on a larger expanse of land with many new dierent habitats (Benton & Donoghue, 2007; deBraga & Rieppel, 1997; van Tuinen & Hadly, 2004; Reisz & M¨ uller, 2004). Evolving from this primordial amphibious ancestor, a lineage began that would later split into two separate lines, the Sauropsids and the Synapsids, the latter of which would eventually give rise to the line of extant mammals (Benton, 2005; Hedges & Preprint February 12, 2014 not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was . http://dx.doi.org/10.1101/002618 doi: bioRxiv preprint first posted online Feb. 12, 2014;

Upload: vutu

Post on 12-May-2018

216 views

Category:

Documents


1 download

TRANSCRIPT

Genomic V-gene repertoire in reptiles

David N. Olivieri1, Bernardo von Haeften2, Christian Sanchez-Espinel3, Jose Faro2,4,5, FranciscoGambon-Deza4,6

1School of Computer Science, University of Vigo, Ourense 32004, Spain. 2Area of Immunology, Facultyof Biology, and Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain.

3Nanoimmunotech SL, Pza. Fernando Conde, Montero Rıos 9, 3620, Vigo, Spain. 4Instituto Biomedico deVigo, 36310 Vigo, Spain. 5Instituto Gulbenkian de Ciencia, 2781-901 Oeiras, Portugal. 6Servicio Gallego

de Salud (SERGAS), Unidad de Inmunologıa, Hospital do Meixoeiro, 36210 Vigo, [email protected], [email protected]

Abstract

Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionarylineages amongst terrestrial vertebrates. In reptiles, two main evolutionary lines emerged, onegave rise to Squamata, while the other gave rise to Testudines, Crocodylia and birds. In thisstudy, we determined the genomic variable (V)-gene repertoire in reptiles corresponding to thethree main immunoglobulin (Ig) loci and the four main T-cell receptor (TCR) loci. We showthat squamata lack the TCR γ/δ genes and snakes lack the Vκ genes. In representative speciesof testudines and crocodiles, the seven major Ig and TCR loci are maintained. As in mammals,genes of the Ig loci can be grouped into well-defined clans through a multi-species phylogeneticanalysis. We show that the reptile VH and Vλ genes are distributed amongst the establishedmammalian clans, while their Vκ genes are found within a single clan, nearly exclusive fromthe mammalian sequences. The reptile and mammal V-genes of the TRA locus cluster into sixcommon evolutionary clans. In contrast, the reptile V-genes from the TRB locus cluster into threeclans, which have few mammalian members. In this locus, the V-gene sequences from mammalsappear to have undergone different evolutionary diversification processes that occurred outsidethese shared reptile clans.

Keywords: Immunologic Repertoire, Reptile Evolution

1. Introduction

The order Reptilia is believed to have evolved from amphibian-like creatures, the labyrinthodonts,which date back to the Carboniferous period some 312 million years ago (Mya) (Laurin & Reisz,1995; Laurin & Girondot, 1999). The descendants of those precursors became progressively moreterrestrial and independent of their aquatic origins, venturing out on a larger expanse of land withmany new different habitats (Benton & Donoghue, 2007; deBraga & Rieppel, 1997; van Tuinen& Hadly, 2004; Reisz & Muller, 2004). Evolving from this primordial amphibious ancestor, alineage began that would later split into two separate lines, the Sauropsids and the Synapsids, thelatter of which would eventually give rise to the line of extant mammals (Benton, 2005; Hedges &

Preprint February 12, 2014

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;

Kumar, 2009). Shortly after this separation of the Sauropsids and Synapsids, the sauropsids furtherdivided into two separate lineages. One gave rise to the present day Squamata (lizards and snakes)and Tuatara, while the other gave rise in a very early period to Testudines, later to Crocodylia, andfinally, gave way to birds (Benton, 2005; Hedges & Kumar, 2009).

The first jawed vertebrates developed an adaptive immune system consisting of B and T lym-phocytes together with their associated antigen receptors, immunoglobulins (Ig) and T-cell recep-tors (TCR), respectively. This adaptive immune system evolved in parallel within Reptilia and thesister group of mammals from their common amphibian ancestor (Janes et al., 2010).

Although Igs and TCRs share a basic modular structure (Charlemagne et al., 1998; Schroeder& Cavacini, 2010; Narciso et al., 2011), Igs are made up of four polypeptide chains (two iden-tical larger or heavy (H) chains and two identical smaller or light (L) chains), while TCRs areheterodimers (Schroeder & Cavacini, 2010). The H chains of Igs consist typically of four or moreglobular domains, with the N-terminal domain varying amongst the Igs of different B-cell clones(termed variable (V) domain), and the remaining domains being shared by many B-cell clones(termed constant (C) domains). In contrast, L chains of Igs and TCR polypeptide chains consist oftwo domains, an N-terminal V domain, and a C domain. In addition, there are two different kindsof TCRs, one composed of α/β heterodimers and the other of γ/δ heterodimers. In both Igs andTCRs, only the V domains interact with antigen.

Each chain of the Igs and of TCRs is encoded in different loci. Jawed vertebrates have threemain loci corresponding to Ig genes (IGH, coding for H chains, and IGK and IGL, coding for κand λ L chains, respectively), and four main loci associated with the TCR genes (TRA, TRB, TRGand TRD, coding for TCR α, β, γ and δ chains, respectively) (Cannon et al., 2004; Danilova &Amemiya, 2009; Flajnik & Kasahara, 2010; Sun et al., 2012). In all these loci, multiple unique Vgene sequences (V-genes) exist that share a common homology. Each Ig and TCR V domain isencoded by a unique combination of a V-gene, a D segment (only in Ig H and TCR β and δ chains)and a J segment, the largest contribution to the V domain amino acid sequence being from the V-gene. Thus, identifying the V-gene repertoire within the genome of a species provides informationabout its potential for mounting an adaptive immune response against antigen.

The Ig and TCR loci may have undergone an independent evolution, because they are locatedin different chromosomes, or in widely separated regions within the same chromosome. One ex-ception to this general trend exists, the TRD locus, which is embedded within the TRA locus. Thisindependent evolution hypothesis is supported by the formation of V-gene clusters in phylogenetictrees constructed from these sequences (Olivieri et al., 2013). If true, this explanation would implythe existence of an ancestral gene (or genes) that define each of the extant Ig and TCR V-genes.

In reptiles, published studies of the Ig genes, together with the functionality of their products,are scarce (Gambon-Deza et al., 2012; Magadan-Mompo et al., 2013; Magadan-Mompo et al.,2013). At present, there are no studies of V-genes in the the TCR loci of reptiles. In Squamata,all the species studied so far have IgM, an IgD consisting of eleven domains, and IgY (Sun et al.,2011; Gambon-Deza et al., 2012). In Anole, the Vκ and Vλ chains have been described andthe absence of the IGK locus in snakes (Wei et al., 2009; Gambon-Deza et al., 2009) has beenconfirmed. Testudines have a more complex IGH locus structure (Li et al., 2012; Xu et al., 2009;Magadan-Mompo et al., 2013), having one IgM gene, one IgD gene encoding eleven constantdomains, and multiple genes for IgY and IgD2. Previous studies of the IGH locus in crocodiles

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;

have described three genes for IgM, one gene for IgD encoding seven viable constant domains,three IgA genes, three IgY genes that are similar to IgY in birds, and two IgD2 genes, whichencode chimeric Igs with IgA and IgD related constant domains (Magadan-Mompo et al., 2013).These studies have also detected in crocodiles that the IgA heavy chain is similar to the IgX andIgA of amphibians and birds, respectively.

In contrast to the growing literature related to Igs in reptiles, there are fewer studies of TCRs inthese species. In this article, we uncover the genomic V-gene repertoire of both immunologic re-ceptors in species representative of the three main orders of extant Reptilia: Squamata, Testudines,and Crocodylia, and show relations that provide a deeper understanding of the evolution of thislarge vertebrate lineage (Modesto & Anderson, 2004; Benton, 2005).

2. Methods

For these studies, we used genome data from whole genome shotgun (WGS) assemblies thatare publicly available at the NCBI and provided in FASTA format in the form of contigs. Un-til now, complete V-gene annotations have only been available in two species (i.e., human andmouse), while incomplete annotations have existed in a few species, and even in these cases, onlywithin certain loci. We used our VgenExtractor software tool (Olivieri et al., 2013) to obtain theV-gene sequences from genome files. Our software algorithm searches and extracts the genomesequences of functional V-genes based upon known motifs. In particular, the algorithm scans largegenome files and extracts candidate exon sequences that fulfill specific criteria: they are flanked byrecombination signal sequences, they have a valid reading frame without stop codon, the lengthis at least 280 bp long, and they contain two canonical cysteines and a tryptophan at specificpositions.

From the set of translated candidate exons produced by the VgenExtractor tool, a fraction ofthese sequences could pass the initial filter, purely by chance, but have structures that are quitedifferent from a functional V-gene. Such sequences are easily discarded by subjecting all the can-didate sequences to a BLASTP comparison against a V-gene consensus with a modest threshold(e.g., evalue = 10−15). This operation produces a set of exons possessing the structural require-ments for being functional V-genes.

Once viable V-genes are obtained we need to classify them into their respective loci. Thecorresponding amino acid sequences were analyzed with the suite of tools provided by Galaxy(Goecks et al., 2010). The analysis workflow of the amino acid sequences consist of a multipleBLASTP alignment against a set of consensus sequences, each representing one of the seven mainIg and TCR V-gene loci. Each candidate V-gene sequence is assigned to the locus having thehighest score with respect to the corresponding consensus.

We developed another independent algorithm for classifying V-genes. In this method, givena V-gene sequence, we determine to which locus this sequence most likely belongs by obtaininga heuristic score calculated from the results of a BLASTP comparison against the NR proteindatabase. The score is a weighted combination of information from sequence similarity results(or hits). In particular, the score is constructed from the following: (a) the relevance and contentwords from the protein description, (b) the similarity score, c) and the sequence coverage. A scoreis calculated for each locus, and the V-gene sequence is assigned to the most likely locus. This

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;

method has been validated from known annotations in the human and mouse (Olivieri et al., 2013).This technique is used to show the consistency of results obtained with the the simpler workflowdescribed above and can be used to resolve ambiguous loci predictions of the simpler method,which may arise in rare cases.

To compare the repertoire of V-genes in extant species, we carried out single and multi-speciesphylogenetic analysis. The sequences for this study are available at a new public database reposi-tory vgenerepertoire.org (Olivieri & Gambon-Deza, 2014). At present, approximately 20,000V-gene sequences from 82 mammals and 12 reptiles are indexed. For both the single and multi-species phylogenetic analysis, sequences were aligned using Clustal-omega (Sievers & Higgins,2014) and trees were constructed with FastTree (Price et al., 2010) (using maximum likelihoodand WAG matrix). We used MEGA5 (Tamura et al., 2011) for visualization.

In Crocodylidae we used the NCBI accession numbers for the following species: Americanalligator (Alligator mississippiensis; AKHW000000001), Chinese alligator (Alligator sinensis;AVPB0000000001), while the following species used assemblies found in www.crocgenomes.org: saltwater crocodile (Crocodylus porosus; croc sub2.assembly), and gharial (Gavialis gangeti-cus; PRJNA172383). In Testudines we used the NCBI accession numbers for the followingspecies: spiny softshell turtle (Apalone spinifera; APJP00000000.1), green sea turtle (Cheloniamydas; AJIM00000000.1), Chinese soft-shelled turtle (Pelodiscus sinensis; AGCU00000000.1),painted turtle (Chrysemys picta bellii; AHGY00000000.1). In Squamata we used the NCBI ac-cession numbers for the following species: green anole (Anolis carolinensis; AAWZ00000000.2),Burmese python (Python bivittatus; AEQU000000000.2), the king cobra (Ophiophagus hannah;AZIM00000000.1), and data from the Boa Assemblathon 2 (Bradnam & et.al., 2013) (Boa con-strictor; scaffold 6C file).

3. Results

From the WGS data obtained from the NCBI repositories of twelve Reptile species, we usedthe VgenExtractor tool to obtain functional V-genes and classify them with respect to the locusto which they belong. Table 1 presents a summary of all the functional V-genes encountered inthe analysis. For many of these species, it is the first time that Ig and TCR V-genes have beendescribed, and it completes the sparse data presently available with respect to these genes.

3.1. SquamataWithin the Squamata order, we studied the genomes of one Iguanidae (i.e., Anolis carolinen-

sis), and three snakes (i.e., Python bivittatus, Ophiophagus hannah and Boa constrictor). Thephylogenetic trees of the V-genes for these species are shown in Figure 1.

Within the antibody V genes of the Anole, we did not detect the Vκ genes, contradictingprevious studies (Alfoldi, 2011; Wei et al., 2009). A further analysis showed that we did notdetect these Vκ sequences because they lack tryptophan at position 41 (Figure 2). This resultis surprising, since the presence and position of this amino acid is a general characteristic of allfunctional V domains described so far, implying a structural alteration in this chain. The snakespecies, which arose later in the evolution of this lineage, no longer possess the κ chain genes.A possible explanation for this loss in snakes may be due to this structural modification of the κ

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;

60

8793

7792

908796

81

95

9779 99

8095

29

86

7659

66

708799

88

91

82

95

85

100

99

84

9894

100

9095

8793

80

90

100

0

79

78

90

83

0

8198

11

96121672

99

3785

801478890

87

94

76

75 83 92

47 91

54

72

0

96

9964

7563

96

8070

858399

99

95

9691

87

87

95

90

73

6

94

0

69

79

75

88

99

89

179492

99

18

75 86 86

98

83

9441

95

8097

2185

8488

96

0.0

0.5

1.0

1.5

2.0

77

89

8370

93

9695

3829

66

8169

92

91

90

79

78

56

86

95

66

23

85

37

76

90

83

9067

83

42

77

50

20

49

81

95

92

77

10

61

9867

92

22

97

94

88

94 99

86

70

41

8880

92

31

84

88

0.0

0.5

1.0

1.5

2.0

44

29 98

2543

75

95

93

9519

7082

90

85

52

299630

94

9288

91

29

87

88

72

84

59

53

84

86

62

0

93

92

87

1696

46

7334

26

98

91

49

83

82

78

81

95

73

74 94 97

92

77 83

85

10

7546

8740

48

94

90

99

87

92

99

73

94

88

9957

3698

95

90

949085

86

82 99

91 69

47

85

9631

84

88

0.0

0.5

1.0

1.5

2.0

93

100

7587

96

89

80

6574

95

89

96

95

93

92

9466

7164

89

88

92

88

91

93

75

80

7686

88

66

48

77

76

76

94

2764

70

51

84

85

86

78

87

84

6886

91 82

92

89

79

3193

9081 0

95

58

95

92

96

78

90

8270

74

9899

67

70

92

22

94

93

88

5198 90

98

75

86

78

87

98

0

99

7987

92

31

96

9741

84

88

9394

5796

0.0

0.5

1.0

1.5

2.0

IGHVIGKVIGLVTRAVTRBVTRDVTRGV

Squamata

Anolis Carolinensis

Boa constrictor Ophiophagus hannah

Python bivittatus

Figure 1: V-gene repertoire in Squamata. Trees were constructed with the amino acid sequences obtained from theVgenExtractor tool. These sequences were aligned with Clustal-omega. For constructing the phylogenetic trees, amaximum likelihood algorithm with the WAG matrix and 500 bootstrap replicates were realized for validation. Root-ing was performed at the midpoint and the linearization provided by Mega5 was applied to improve the visualizationof the trees. For each sequence, two independent classification techniques (view Material and Methods) were usedto determine its respective locus, thereby confirming that clades of the tree correspond to independent loci of Ig andTCR.

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;

Table 1: Distribution of V-genes across the main Ig and TCR loci.

Order/specie ighv igkv iglv trav trbv trgv trdv all

SquamataA. carolinensis 89 N/D 30 23 7 0 0 149P. bivittatus 71 0 34 27 2 0 0 134B. constrictor 54 0 17 13 1 0 0 85O. hannah 77 0 28 25 1 0 0 131TestudinesC. picta bellii 342 109 83 113 19 25 11 702P. sinensis 117 87 45 60 16 18 10 353A. spinifera 58 78 17 19 14 12 2 200C. mydas 57 49 25 40 13 16 7 207CrocodyliaA. mississippiensis 74 19 40 77 26 13 9 259A. sinensis 112 28 60 77 29 15 12 334C. porosus 46 17 22 29 19 16 3 152G. gangeticus 103 20 24 55 21 15 7 245

sequence found in older Squamata. For the case of the IGH locus in the Anole, the results agreewith those we described previously by an independent method (Gambon-Deza et al., 2009).

With respect to TCR genes, our analysis showed that the Anole possesses a low number of Vgenes in the TRB locus. Also, we found that V genes corresponding to the γ and δ chains (TRGVand TRDV genes) are absent. Because of this anomalous finding, we searched the Anole genomefor genes that encode the constant (C) regions of the TCR γ and δ chains, with negative results. Wealso analyzed transcriptome sequences for this species obtained from the liver and kidney (avail-able from the NCBI public repository: Anole liver: SRA064777, trace SRR648821, Anole kidney:SRA059267, trace SRR579557). While there were many Ig and TCR α/βmRNA sequences foundin these organs, we obtained negative results when searching for TCR γ/δ sequences. It is pos-

: : * : : : * * : * : : * * * * . : * : . * : : : : : : . : : * * * : : * * * : * * * * * * * : * * : * . * . . * : * * * : *

1 1032 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100

V1

V2

V3

V4

V5

V6

V7

V8

V9

V10

V11

V12

V13

V14

V15

T S G D I V V T Q T P S S L Q K N P G D R V D V Q C K T S S S V T S S S Y S Y M N F Y Q L L P G Q K P K L L I Y Y A T N R F E G M P D R F S G R G S G T D F T F T I N G V R P E D E G E Y Y C G Q S Y D S P LS S G N - V I T Q T P A S L Q K N P G D R V E I Q C K T S S S V T S S Y T N F I N F Y Q I L P G Q K P K L L I Y Y A T R R F E G T P D R F S G R G S G T D F T F T I D G V R P K D E G E Y Y C G Q V N N S P LS S G D I V V T Q T P S S L Q K N P G D R V D V Q C N T S S S V T G S S Y S Y M N F Y Q L R P G Q K P K L L I Y Y A T N R F E G T P D R F S G R G S G T D F T F T I N V V R P E D K G E Y Y C G Q G Y S H P LS S G E I V V T Q T P A S L Q K N P G D R V D I Q C K G S S S L D G - - - - Y I N F F Q L I P G Q K P K L L I Y Y A T N R F Q G T P H R F S G Q G S G T D Y T F S I N G V R P E D E G E Y Y C G H A I S Y P LS S G D I V V T Q T P A S L Q K N P G D R V D I Q C K T S S T V S - - - - T Y M N F Y K L I P G Q K P K L L I Y Y V T S R F E G T P D R F S G S G S G T D F T F T I S V V C P D D E G E Y Y C G Q G Y D Y P LS T G Q I V I T Q T P A L I Q A N P G D K V I F R C Q T S S S M G S - - - - Y M A L Y R L K D N Q K P K L L I Y D V T N R F E G T P Y R F S G Q G S G T D F T F T I N G V Q S E D E G E Y Y C C Q R Q S Y P LS S G Q I V I T Q T P A S L H V N P G D R V T I Q C K A S S S M S D D - - - - M A L Y Q F I P G K N P K L L L Y D S S S R F T G T P D R F S G S Y S G T D F T F T I S G V R P E D E R E Y Y C G Q H Y S Y P LS S G Q I V I T Q T P A S L H V N P G D R V I I Q C K A P S S M S N D - - - - M A L Y Q F I P G K N P K L L L Y E S S T R F T G T P D R F S G S Y S G T D F T F T I S G V H P E D E A K Y Y C G Q Y Y S R P LS S G Q I V I T Q A P A S L H V N P G D R V S I Q C K T S S S I S N D - - - - M H L Y Q F I P G Q N P K L L I Y Y A T G L F E G T P D R F S G S Y S G T D F T F T I S G V R P E D E A E Y Y C G Q S N S R P LS S G E I L I T Q T P V S L H V N P R D R V T L Q C K A S S S M S D D - - - - M A L Y Q F I P G K N P I L L L Y E S S T C F T G S P N H F S G S Y S G T D F T F T I S G V C P E D E G E Y Y C G Q Y Y N Q I LS S G Q I V V T Q S P A S L H V N P G D T V T I K C K A S S S I S N E - - - - M D L Y Q F I P G Q K P K L L I Y H A T Y R F E G T P D R F S G S Y S G T D F T F T I S G V R P E D E A E Y Y C G Q D Y S T P LS S G Q I V I T Q T P A S L H V N P G D R V T I Q C K A S S S M S D D - - - - M A L I Q F I L G Q K P K L L I H D G D T R F E G T P D R F S G S Y S G T D F T F T I S G V R P E D A A E Y Y C G Q D Y T T P LS S G Q I V I T Q T P A S L H V N P G D R V T I Q C K A S S S I S N Y - - - - M H L Y Q F I P G Q K P K L L I Y R A T S R F E G T P D R F S G S Y S G T D F T F T I S G V R P E D A A E Y Y C G Q G Y S T P LS S G Q I V I T Q T P A S L H V N P G D R V T I Q C K A S S S M S N D - - - - M A L Y Q F I L G Q K P K L L I H D A T S R F T G T P D H F S G S Y S G T D F T F T I S G V R P E D A A E Y Y C G Q G Y S T P LS S G Q I V I T Q T P A S L H V N P G D T V T I R C K A S S S M S N D - - - - M Q L Y Q F I P G Q N P K L L I Y D A T S R F T G T P D R F S G S Y S G T D F T F T I N G V R P E D E G E Y Y C G Q D Y S R P L

CDR1 CDR2W absence

Figure 2: Alignment of amino acid sequences deduced from V-genes obtained manually in the genome of A. car-olinensis. As can be seen, the amino acid tryptophan (W) does not occupy the canonical position 41, representingan anomaly as compared with other functional V-genes. We produced the alignment and graphical output with theSeaview software program (Gouy et al., 2010).

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;

Anole record Anole record Anole record Anole record Anole record

Anole record Anole record Anole record Anole record Anole record

610 kb

140 KbTRAV

IGH

IGHV IGHC

Figure 3: Recent duplications in IGHV (top) and TRAV (bottom) loci in the Anole. Track duplications tracks aredrawn in green. Constraints for duplications in: (a) the IGH locus are: segments ≥ 1Kb, identity ≥ 90%; (b) the TRAlocus are: segments ≤ 0.3Kb; identity ≥ 90%. Duplications only exist in segments where V-genes are located (not inconstant regions), and in the case of the IGH locus, some coincide with the position of V-genes, indicated by the redlines.

sible that cells having TCRγ/δ may be abundant in other specialized tissues where transcriptomedata is lacking. Nonetheless, our studies, carried out with data from different tissue and differentsequencing methods, produced null results for searches of V-genes or constant gene sequences forTCR γ and δ chains. Thus, these results strongly suggest that these species have lost the TCRγ/δreceptor.

In the three serpents studied, there are few V-genes (Table 1). Most of these sequences fromthese species belong to the IGH and IGL loci (Figure 1). In the three snakes, we found that theIGK locus does not exist. From a phylogenetic analysis, we found that V sequences from theIGH locus belong to a single group or clan that has undergone a recent expansion. This result isdifferent from that found in A. carolinensis, where many V-gene clans are found in the IGH locus.In the case of the IGL locus, the number of V-genes in Serpentes is similar to that in the Anole.Finally, just as with A. carolinensis, no V-gene sequences corresponding to the TRG and TRD lociwere found and only one or two V genes were identified in the snake TRB locus.

From studies we carried out previously in mammals, we found a higher birth and death rateamongst the V-genes of the Ig loci as compared to that in the TCR loci. Trees of V-gene sequencesin reptiles have the same structure, suggesting that the same mechanisms exist. Figure 3 showsrecent duplications that occurred within the IGH and TRA loci. These loci are contained withinchromosome segments of length 610 Kb and 140 Kb, respectively. In this figure, track duplicationsare drawn in green and represent those sequences having a similarity greater than 90 percent forsegment sizes > 1Kb. As can be seen, there are many duplications in the IGH locus and theseduplication processes occurs specifically in the IGHV region and not in the constant (i.e., IGHC)region. By applying the same similarity criteria, few duplication tracks are visible in the TRAVregion, and those that are visible have smaller segment sizes: ≤ 0.3 Kb. Finally, we can observethat some duplications coincide with V-genes (indicated with red lines) in the IGHV region, whileno V-genes coincide with the duplication tracks shown in the TRAV region.

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;

78

95

86

45

100

40

8888

7185

85 8581

100

99

81

82

100

93 978688

94

98

419985

768799 9387

968645 64 94

2789

7499

75 088

9392

83

55

83

84

98

84 7991

6

71

75

7386

50

209978 98

73

677786

91

88

7784

797799

81

86

41

78

8958

96

8871

979979

8140

72

7860

3698

5490

9288

74

829030

8878

2994

089

9089

84

81

92

85

41

98762473

83

5

29

97

83

2569 8246

66

90

93 888577

08092

7887

6955

86

87

7290 73

7995

5761

89777712879675

92

749581

897876

72

78

87

4090

85

9978

83

95

64

9445

99

99

81

93

899197

50907076

95

91

77

83

96

33

86

79

97

6391

79100

668664

984890

85

85

100

95

77

7282

71953142

83

6599

73808077

75

90

78

9559749193

37

36388799

71

92

8362

85

7897

9078

768996

82

89

88

529983

73

5898 8590

79

86

14

089

99

0

93

84

73

80

75

89

9993

917482

74

82

88

45

98

97

71

100

53

87

8576

97

95

98

21

100

65

72 96 84 73

8387

979594

85

83 90 9175

87 95

94

73

97 91

35 76 99

41

6898

778289 98

100

95 8156

75

72 52 67

26

92

99 3191

73

5194 98

84

98

19

81999138

82

8790 86

90

75

7885100100

92100

7598

9798

067

073 95

7946

86

87

82

9384

9088

74

92

79

31

5780

886879

64

7789

738570

74

87

9376

9488

517764

88

75

98

9697

76

80

77

79

9052

84

37

100

7886

84

7567

78

93

7579847977

77

9575

8391

73

7176

99

90

7699

769990

75

94

78

89

69

918177

91

76

90

75

94

87

97

21

77

87

72

88

9299

91

14

0

817097

88

87927595

81

91

99

801006274837696

69

70

10094

82

0

87889786

100

81998687

84

8458

84

91

75

576

89

7987

7671

96727494

9699

9299

90

81

95

97

98

91

94

83

0.0

0.5

1.0

1.5

2.0

76

99 66

82

97

98

84

6

98

75 95

14 58

937581

9675

100

7026

69

93

8987

8579

90

30

61

33

9189

31

89

72

97

83

99

96

8744

269

690

83

819377

89

5

49

9886

809889

85100

88

71

85

85

81

82

93

628891

89

56

99

9991

897890

93

29

9394

89

78

87

99

38

63

8682

9598

0

79

998574

40

90

9936

69

8780

91

79

70

79

99

857780

818821

89

8877

75

95

01881

93

90

87

81

7879

4589

89

73

7871

83

82

091

4198

91

0

94

7178

89

82

7498 95

0

85 99

64

54

42

92

90

85

859180 3033

86

86

83100

75

2085

77

72

7487

78 85 88

45

33

960

46

99

959590

0

98 8647 72 89

0

99

98

99 76

71

940

9985

96

89

90

81

61

7789

93

89

18

92

99

75

7459

98

2665

92

98

9910098

85

98

4897

80

98

7575

77

87

91

83

67

63

60

99

67728798

34

8485

78

73446

0 97

947763

95100

75

94

78

92

98

91

94

83

29

0,0

0,5

1,0

1,5

2,0

Apalone spinifera

IGKVIGLVTRAVTRBVTRDVTRGV

IGHV92

6 85

66

8969

83

77

99

99

67

88

14

78

88

44

93

97

7486

99

47963485

100

57

98

70

80

85

9178

87

80

7492

0

42

91

92

84

83

58

84

74

64

95

83

94

8476

23

41

87

96

6

9297

50

72

368577

89

75

86

75

6899776

35

89

83

90

74

76

70

25

81

70

79

98

100

7876

85

81 62

93

76 88

92

86

90

64

0 95

367781

87

83

85

56

74

88

96

78

80

84 76

82

90

93

26

88

85

9496

92

94

84

81

97

97

86

81

85

3

95

89

96

60

91

51

8894

99

91

92

84

93

98

0

1686 52 0

89

79

95

87

28

0.0

0.5

1.0

1.5

84

83

100

94

67

41

87

96

99

65

6

61

91

72

81

81

68

75

9763

90

93

74

89

73

94

73

8928

176873

97

89

78

60

94

67

85

9174

85

86

62

74

47

25

99

63

94

18

98

87

100

7179

88

89

7748

9392

90

75

89

85

86

7675

75

93

88

96

807776

90

93

17

63

26

8712

99

80

84

84

91

84

89

92

84

7796

93 25

67

73

77 85

82

98

70

80

80

85

99

36

78 49

91

82

76

81

95

94

76

97

8159

74

37

96

45

98

91

92

93

14

78

79

99

70

1483

74

8436

77

75

88

8875

89

73

83

79

95

87

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

Testudines

Chelonia mydas

Chrysemys picta bellii

Pelodiscus sinensis

Figure 4: V-genes in Testudines. The trees were constructed in the same way as described in Figure 1.

3.2. Testudines and CrocodyliaIn the differentiation of the reptile lineage to birds, the Testudine and Crocodylia branches

emerged. Four Testudine genomes are available: those of three land tortoises, Chrysemys pictabellii, Pelodiscus sinensis and Apalone spinifera, and one marine turtle, Chelonia mydas. Figure4 shows the phylogenetic analysis of the V genes sequences from these species. For the westernpainted turtle, C. picta bellii, we found more than 700 V-genes, being the specie with the mostV-genes that we have studied to date. By comparison, the marine turtle has ≈ 1/3 as many genes(Table 1). In all four turtles, there is a preponderance of Ig V-genes as compared with the numberof TCR V-genes. In the three Ig loci, there is an expansion of multiple V-gene clans, similarto what happens in the Anole, suggesting that the structure of these genes is more diverse thanthat found in mammals (Gambon-Deza et al., 2009). Within the Ig loci, there are more Ig heavychain genes than those for light chains; the λ light chain has the least Ig V-genes. With respectto TCR V-genes, the TRA locus have the largest set, while the TRB locus has on average muchless genes. Such a low amount of Vβ genes is observed in all reptiles, opening the possibility thatan underlying process exists. In contrast to the Squamata, each of the turtle species we studiedpossess TCR γ and δ V-genes, albeit with a wide variance amongst them. From the phylogeneticstudy of these V-genes, the TRGV genes are included in one specific clan, while the TRDV genesare within the clan of the TRAV genes (Figure 4).

In the order Crocodylia, there are four publicly available genomes (Alligator mississippiensis,Alligator sinensis Crocodylus porosus, and Gavialis gangeticus). In each species, we found V-

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;

57

0

968589100 92

83

7993 97

87

68

79

93

619693

869569

92

83

937769

67

98

5689

7510099

81

9997

87

4099

8239

66

099

9888

7998

79

80

80

99

85

91

68

80

41948499

54

8991

15

58

0

95

89

92

84

49

5693

97

7488

9884

0

96

82

96

81

96

97

88

79

33

8977

76

99

78

0

88

81

73

99

75

96

82

85

0

99

94

95

8864

77

7485

86

90

94 8565

25

86

8432

50 909285

7784 97

93

63

26

81

96

18100

8231

9566

88

1994

87

97

83

2991

3079 34

74

85

3

76

56

49

74

76

7685

91

85

20

96

69

84

9984

92

9880 35

80

9989

82 71

7883

79

95

87

98

28

0.0

0.5

1.0

1.5

9684

80

7897

87 96

75

9672100

8697

95

6992

6385

99

67

98

98

0

45

81

35 88

60

99

99

99

98

9788

79

9880

85 80 88

8290

9954

89

91

97

8992

84

49

56

97

74

88

98

84

0

96

82

96

9381

86

96

88

2179

97

7785

76

7571

80

96

82

74

91

88

6476

84

34

90

85

90

84

93

94

437379

88

7492

32

72

90

7

97

9363

26

12 92 95

100

81

96

100 75

97

66

30

7326

8197 87

072

8

70

83

83

75 88

94

97

8694

79

9334

74

85

3

78778367

49

74

76

7879

9371

40

38

92

680

91

84

86

20

96

94

9875

84

97

91

9398

100 3552

85

7679

82

9344

78

0

83

83

79

76

95

87

98

28

0.0

0.5

1.0

1.5

100

57

0

81

100

9436

93

87

68

79

93

43

618193

86

9569

92

9577

99

67

89

45

99

81

0

99 97

81

40

84660

99

98

7588

9979

98

79

80

99

85 751002591

68

41948454

89

91

100

9280

9589

92

84

0

49

56

6997

85

74

88

98

84

0

96

82

96

8196

93

5

99

78

98

92

80

86

0

85

76

938019

80

7781

9573

620

75

96

82

88

992175

94

88

64

9498

8466

76

77

90

989686

90

55

939194

75 94

8592

849850

72

90

7785

38

89

7 84

97

93

26

9592

95

779599

96 81

96100

7582

9985

9566

89

80

9326

74

87

83

95

8766

91

8385

993

74

79

34

3

33

717649

76

84

78

87

8576

79

43

8724

91

85

20

96

8980

75

69

81

80

099

92

0

9882

80

99

8397

7883

79

95

87

98

28

0.0

0.5

1.0

1.5

7985

0

80

9681100

866992

67

092

80

8290

5489

091

9758

8992

84

49

56

93

12

97

098

84

0

96

82

96

88

76

88

9185

76

96

82

56

9994

64

89

7585

90

96

86

90

7826

72

34

79

88 7492

863277

72

90

97

93

26

75

95

88

97

8287 83

88

85

52

93

74

85

3

76

98

8091

20

96

93 84

9114

85 44 780

839079

76

95

8798

28

0.0

0.5

1.0

1.5

IGKV

IGLV

TRAV

TRBV

TRDV

TRGV

IGHV

Crocodylia

Alligator mississippensis

Gavialis gangeticus

Alligator sinensis

Crocodylus porosus

Figure 5: V-genes in Crocodylia. The trees were constructed in the same way as described in Figure 1.

regions of the three main Ig loci and four TCR loci (Table 1). There are more V-genes for Igthan for TCR. Amongst the Ig V-genes, the majority of them are within the IGH locus. All fourCrocodylia species have higher numbers of V-genes in the IGL locus than in the IGK one. Figure5 shows the phylogenetic trees of the V-genes for each of these species.

3.3. Clans in V-gene sequencesWe studied the phylogenetic relationships of V-genes from the 82 mammal and 12 reptile

species available in Vgenerepertoire.org (Olivieri & Gambon-Deza, 2014). From the largeset of immunoglobulin V gene sequences in this study, there is evidence for distinct groups orclans (Lefranc & Lefranc, 2001) of related sequences from different species amongst the clades.Previous studies have identified such clans in human and mice; the IMGT (Lefranc et al., 2009;Giudicelli & Lefranc, 2012) has annotated three clans amongst the VH genes, three in Vκ, and atleast five clans in Vλ (Lefranc & Lefranc, 2001; Giudicelli & Lefranc, 1999).

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;

Figure 6 shows multi-species trees obtained from alignments of the V-genes from the Ig loci.Since the purpose of these trees was to explore the clan structure, we used all available sequencesof reptiles and mammals instead of trying to retain the same number of species from each class.In these plots, the sequences corresponding to mammals are represented in blue, while those forreptiles are drawn in red. The nodes having sequences which belong to clans annotated by theIMGT are labelled in the trees. As seen in the trees of Figure 6, many sequences do not belongto a previously defined clan. Since the clan studies to date have been based upon a limited setof species, the set of identified clans has been necessarily inadequate. It is only now, with theavailability of large number of V-genes sequences from a broad representation of mammal andreptile species (Olivieri & Gambon-Deza, 2014), that it is possible to define new clans.

In the IGHV tree of Figure 6, the V-gene sequences of reptiles are grouped together and thesegroups are interspersed within mammal clans. For example, Clan-II contains sequences fromTestudines, Crocodylia, and Squamata that together form subgroups, and these subgroups areinterposed with several mammalian subgroups comprising many sequences. The tree for the IGLVsequences shows that the reptile Vλ sequences are also grouped in several subgroups and these aredistributed amongst the five mammalian clans previously defined by the IMGT. In the IGK locus,more Vκ clans can be identified in mammals than previously described, so that the present clannomenclature is not sufficient to group all the taxa. All the reptile IGKV genes cluster into a singlegroup forming an independent clan. Also, this clan contains only a small group of mammal V-genesequences, indicating a different evolution in this locus between mammals and reptiles.

V-gene clans have not been previously identified within the TCR loci. The multi-speciestrees of Figure 7, consisting of TCR sequences, demonstrate that the clan concept is also rel-evant amongst these loci. Indeed, in the TRA locus, six clans can be identified, each havingsequences from both reptiles and mammals. As can be seen in the figure, the roots of each of theclans correspond to reptile sequences, indicating the maintenance of these sequences over longerperiods of time than those found in mammals. The clans are numbered according to the amountof taxons in the clan, ranked from largest (Clan I) to smallest (Clan V). The tree also demonstratesthat clans I and II diversified significantly in mammals, while in others, such as clan V, there wasless diversification in mammals than witnessed in reptiles. Also, we can deduce the presence ofadditional clans in reptiles that retain remnants of sequences that were later lost in mammals.

In the tree for the TRB locus in Figure 7, three distinct clans can be identified that groupmammal and reptile taxons. Nonetheless, the majority of mammalian sequences in this locus donot share the same clan with those from reptiles. Few mammal sequences are found within thereptile clans, while those grouped in separate clans appear to have undergone an evolutionarydiversification different from reptiles within this locus.

Trees from the TCR γ/δ have a similar structure. In the TRDV tree, three clans can be identi-fied that contain sequences from mammals, Testudines, and Crocodylia (this locus does not existin Squamata, as described previously). Similar results are seen in the multi-species tree for theTRG locus, but in this case, the majority of sequences from reptiles belong to a single clan.

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;

--------

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

--------------------------------------------

----

--------

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----

--------

----

----

----

----

----

----

----

----

--------

----

--------

----------------

--------------------

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------

------------------------------------------------------------------------------------------------

------------------------------------

------------------------------------------------

------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

0.0

0.2

0.4

0.6

0.8

IGHV

IGKV

IGLV

Test/Croc

Test/Croc/Squa

Test/Croc/Squa

Test/Squa

SquaTest/Croc

Test/Croc/Squa

Clan IV

Clan V

Clan III

Clan II

Clan I

Clan IIIClan II

Clan I

Test/Croc/Squa

Test/Squa

Test/Croc/Squa

Test/Croc/Squa

Test/Croc/Squa

Test/Squa

Test/Croc/Squa

Test/Squa

Clan II

Clan III

ClanI

---------------------------------------------------------------------------

---

---------------

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------------------------------------------

0.0

0.2

0.4

0.6

0.8

Figure 6: Trees were generated from 11 reptiles and 50 mammals using the V-gene sequences of the three Ig loci. Theclans with reptile taxa are marked in red while those for mammals are indicated in blue. Previous studies (Lefranc& Lefranc, 2001; Giudicelli & Lefranc, 1999) have defined the presence of the clans indicated in the plots. Thealignment of the amino acid sequences was performed with clustal-omega and tree was constructed using FastTreewith the WAG matrix. Plots were constructed using MEGA v5.2 with the linearized branch option. The labels of theReptilia orders correspond to Testudines (Test), Crocodylia (Croc), and Squamata (Squa).

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

0.0

0.5

1.0

1.5

------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------

------------

------------

------------

------------

------------------------------------

0.0

0.5

1.0

1.5

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----

--------

0.0

0.5

1.0

1.5

TRAVTRBV

TRDVTRGV

Squa/Tes/Cro

Squa/Tes/Cro

Tes/Cro

Squa/Tes/Cro

Squa/Tes/Cro

Tes/Cro

Cro

Cro

Tes/CroTes/Cro

Tes/Cro *

*

*

*

*

O

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------

------------------------------------------------------

---------------------------------------------------------------------------------------------------------------------------------------------------

0.0

0.5

1.0

1.5

Squa/Tes/Cro

Squa/Tes/Cro

Clan I

Clan III

Clan II

Clan V

Clan IV

Clan VI Clan I

Clan II

Clan III

Mammal Clans

Squa/Tes/Cro

Figure 7: Trees were generated from 11 reptiles and 50 mammals using the V-gene sequences of the four TCR loci.The clans in reptile taxa are marked in red while those for mammals are in blue. The nodes that root the mammalclans with the reptile clans are marked by (*), while others that are exclusively from mammals or reptiles are markedwith (o). The alignment and tree construction was performed in a similar way to that of Figure 6. The labels of theReptilia orders correspond to Testudines (Test), Crocodylia (Croc), and Squamata (Squa).

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;

4. Discussion

How did the genes that code for Igs and TCRs arise and evolve in lineages of reptiles? Itis known that extant reptiles and mammals are separate vertebrate lineages that descended froma common ancestor. These vertebrates must have inherited their antigen recognition machinery,namely Ig and TCR genes, from this common ancestor. These vertebrates evolved on land indifferent habitats, and underwent further speciation. After the divergence from mammals, thereptile lineage split in two. One of these lines led to present day Squamata, while the other gaveway to Testudines and Crocodylia. We found that the Ig V-gene repertoire in reptiles is similarin structure to that found in mammals. Just as in mammals, the data suggests that Ig V-genesunderwent a diversification process with high birth/death rates.

Another striking result is that the Squamata species lost the TCRγ/δ locus. Neither V- norC-gene sequences were found in either the genome or several transcriptome datasets, indicatingthe absence of these genes. In each of the Squamata species we studied, there is a low numberof V-genes in both the TRA and TRB loci, perhaps underscoring a particular characteristic oftheir TCRα/β. This study also revealed several relevant facts with regard to the evolution ofthe Ig loci. From A. carolinensis, we found that the Vκ sequences have an anomalous alterationin the canonical tryptophan, whose presence is a general characteristic of all functional V-genesdescribed so far. This alteration observed in Iguanidae may be related to the loss of this locus in thesuborder of Serpentes. It is known that birds also lost the IGK locus. However, in the evolutionaryline that gave rise to Testudines and Crocodylia, common to that of birds, we did not detect thisloss of tryptophan 41 at the canonical position. It is thus surprising that two evolutionary lines,serpents and birds, that diverged ∼ 275 Mya independently lost the IGK locus.

In the Testudines and Crocodylia lineage, there is no such gene loss as seen in the Squamata.Table 1, shows the V-gene distribution across all the seven major Ig and TCR loci. While theaverage number of TRB V-genes is larger than that found in Squamata, it is still less than theaverage number found in mammals (Olivieri & Gambon-Deza, 2014). For the case of the TRGand TRD loci, the average number of V-genes was similar to that found in mammals.

At the evolutionary level, there are studies that show that the VH genes are grouped into threeclans (Lefranc & Lefranc, 2001; Giudicelli & Lefranc, 1999; Lefranc et al., 2009). These studiesdistributed the sequences in clans from limited information based upon a small number of mammalspecies. In this work, we presented multi-species trees formed from the V-genes of more than 90mammal and reptile species. These trees show that additional clans exist beyond those describedfrom previous annotations.

In a previous analysis of A. carolinensis (Gambon-Deza et al., 2009), we showed that its VHgene sequences cluster into five distinct groups, one of which is quite different from the rest. Thenecessity to define new clans is shown in the reptilian VH and Vκ sequences. In the Vκ genesof Testudines and Crocodylia, there exists a single clan different from those of mammals andwithin which mammals have only a few V-gene sequences (Figure 6). This suggests that reptilesgenerated κ chains from a homogeneous ancestral group, from which mammals either diversifiedvery little or lost most of the variants that were generated. Both in serpents and in birds, the κ chainwas lost, indicating that the Vκ genes expressed in reptiles may be dispensable. In contrast, theIGKV tree of Figure 6 demonstrates the large κ chain diversification that took place in mammals,

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;

corresponding to the most common Ig chain in these species, such as the case in human and mouse.The retention of the five clans in the Vλ gene sequences, as described previously in mammals,

is significant because it suggests a positive selection mechanism. The reptile sequences maintainthis distribution of five V-genes clans found in mammals and each is rooted with a reptile clan.The difference in sequences (which may be structural) that conditioned the grouping within clanshas been maintained in both lineages for more than 300 Mya, indicating functional transcendencynot yet understood.

Instead of recognizing antigen directly, TCRs recognize a complex of antigen-derived pep-tide and a major histocompatibility complex (MHC) molecule. Thus, this restriction could beaccounted for by a co-evolution selection between the TCR V-genes with MHC molecules. Atpresent, there are few comprehensive studies exist that characterize the MHC molecules in reptiles(Glaberman et al., 2008; Glaberman & Caccone, 2008; Glaberman et al., 2009).

In a previous publication (Olivieri et al., 2013), we commented on the presence of long branchesin the trees corresponding to the TCR V-gene sequences of mammals. This same phenomena isalso observed in the TCR V-gene trees for reptiles. Due to the ability to study a large number ofV-gene sequences in these loci, we also describe for the first time the presence of well-definedclans in the multi-species trees. In the TRA locus, the reptiles and mammals share common clansand the roots of the clades are from both reptiles and mammals, indicating a common origin. Oneexplanation for this observed phenomena is a structural or functional preservation of products fromthis loci in species that diverged more than 300 Mya.

The TRBV multi-species tree of Figure 7 describes an evolution between reptiles and mammalsthat is quite different from that seen in the TRAV tree. The Vα sequences evolved from commonancestors, as seen by the shared mammal and reptile root nodes. The distribution of the Vαsequences are also evenly distributed amongst the clans. This same homogeneous distributionis not seen in the TRBV tree. In particular, the vast majority of mammal Vβ sequences havediversified into separate clans (i.e., mammal clans). In this way, the Vβ sequences of reptiles groupwithin tree old clusters with little representation from mammals. Thus, contrary to Vα sequences,reptile and mammal Vβ sequences are segregated from each other, in a way reminiscent of Vκsequences. While the common evolutionary features of Vα in reptiles and mammals supports acoevolution hypothesis between TCR and MHC, the evolutionary differences observed in the Vβsuggests this chain may have a different role in reptiles and mammals. Studies of MHC evolutionin reptiles could provide clues that clarify this issue.

1

ReferencesAlfoldi, J. e. a. (2011). The genome of the green anole lizard and a comparative analysis with birds and mammals.

Nature, 477, 587–591. doi:10.1038/nature1039.Benton, M. (2005). Vertebrate Palaeontology. Oxford, UK: Blackwell.Benton, M., & Donoghue, P. (2007). Paleontological evidence to date the tree of life. Mol. Biol. Evol., 24, 26–53.

doi:10.1093/molbev/msl150.Bradnam, K., & et.al. (2013). Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate

species. Gigascience, 2, 10. doi:10.1186/2047-217X-2-10.Cannon, J., Haire, R., Rast, J., & Litman, G. (2004). The phylogenetic origins of the antigen-binding receptors and

somatic diversification mechanisms. Immunological Reviews, 200, 12–22.

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;

Charlemagne, J., Fellah, J., Guerra, A., Kerfourn, F., & Partula, S. (1998). T-cell receptors in ectothermic vertebrates.Immunological Reviews, 166, 87–102.

Danilova, N., & Amemiya, C. T. (2009). Going adaptive. Annals of the New York Academy of Sciences, 1168,130–155.

deBraga, M., & Rieppel, O. (1997). Reptile phylogeny and the interrelationships of turtles. Zool. J. Linnean Soc.,120, 281–354.

Flajnik, M., & Kasahara, M. (2010). Origin and evolution of the adaptive immune system: genetic events and selectivepressures. Nat Rev Genet., 11, 47–59. doi:10.1038/nrg2703.

Gambon-Deza, F., Sanchez-Espinel, C., Mirete-Bachiller, S., & S, M.-M. (2012). Snakes antibodies. Dev CompImmunol., 38, 1–9. doi:10.1016/j.dci.2012.03.001.8.

Gambon-Deza, F., Sanchez-Espinel, C., & Magadan-Mompo, S. (2009). The immunoglobulin heavy chain locus inthe reptile anolis carolinensis. Mol Immunol., 46, 1679–87. doi:10.1016/j.molimm.2009.02.019.

Giudicelli, V., & Lefranc, M. (1999). Ontology for immunogenetics: the imgt-ontology. Bioinformatics, 15, 1047–54.Giudicelli, V., & Lefranc, M. (2012). Imgt-ontology 2012. Front Genet., 3, 79. doi:10.3389/fgene.2012.00079.Glaberman, S., & Caccone, A. (2008). Species-specific evolution of class i mhc genes in iguanas (order: Squamata;

subfamily: Iguaninae). Immunogenetics, 60, 371–82. doi:10.1007/s00251-008-0298-y.Glaberman, S., DuPasquier, L., & Caccone, A. (2008). Characterization of a nonclassical class i mhc gene in a

reptile, the galapagos marine iguana (amblyrhynchus cristatus). PLoS One, 3, e2859. doi:10.1371/journal.pone.0002859.

Glaberman, S., Moreno, M., & Caccone, A. (2009). Characterization and evolution of mhc class ii b genes ingalapagos marine iguanas (amblyrhynchus cristatus). Dev Comp Immunol, 33, 939–47. doi:doi:10.1016/j.dci.2009.03.003.

Goecks, J., Nekrutenko, A., Taylor, J., & Galaxy-Team. (2010). Galaxy: a comprehensive approach for supportingaccessible, reproducible, and transparent computational research in the life sciences. Genome Biol, 11, R86.doi:10.1186/gb-2010-11-8-r86.

Gouy, M., Guindon, S., & Olivier, G. (2010). Seaview version 4: a multiplatform graphical user interface for sequencealignment and phylogenetic tree building. Molecular Biology and Evolution, 27, 221–224.

Hedges, S., & Kumar, S. (2009). The Timetree of Life. New York, USA: Oxford Univ. Press.Janes, D., Organ, C., Fujita, M., Shedlock, A., & Edwards, S. (2010). Genome evolution in reptilia, the sister group

of mammals. Annu. Rev. Genom. Human Genet., 11, 239–264. doi:10.1146/annurev-genom-082509-141646.Laurin, M., & Girondot, M. (1999). Embryo retention in sarcopterygians, and the origin of the extra-embryonic

membranes of the amniotic egg. Ann. Sci. Nat. Zool., 20, 99–104.Laurin, M., & Reisz, R. (1995). A reevaluation of early amniote phylogeny. Zoological Journal of the Linnean

Society, 113, 165–223. doi:10.1111/j.1096-3642.1995.tb00932.x.Lefranc, M., Giudicelli, V., Ginestoux, C., Jabado-Michaloud, J., Folch, G., Bellahcene, F., Wu, Y., Gemrot, E.,

Brochet, X., Lane, J., Regnier, L., Ehrenmann, F., Lefranc, G., & Duroux, P. (2009). Imgt, the internationalimmunogenetics information system. Nucleic Acids Res, 37, D1006–12. doi:10.1093/nar/gkn838.

Lefranc, M., & Lefranc, G. (2001). The Immunoglobulin FactsBook. London, UK: Academic Press.Li, L., Wang, T., Sun, Y., Cheng, G., Yang, H., Wei, Z., Wang, P., Hu, X., Ren, L., Meng, Q., Zhang, R., Guo,

Y., Hammarstrom, L., Li, N., & Zhao, Y. (2012). Extensive diversification of igd-, igy-, and truncated igy(δfc)-encoding genes in the red-eared turtle (trachemys scripta elegans). J Immunol., 189, 3995–4004. doi:10.4049/jimmunol.1200188.

Magadan-Mompo, S., Sanchez-Espinel, C., & Gambon-Deza, F. (2013). Igh loci of american alligator and saltwatercrocodile shed light on iga evolution.. Immunogenetics, 65, 531–41. doi:10.1007/s00251-013-0692-y.

Magadan-Mompo, S., Sanchez-Espinel, C., & Gambon-Deza, F. (2013). Immunoglobulin genes of the turtles. Im-munogenetics, 65, 227–37. doi:10.1007/s00251-012-0672-7.

Modesto, S., & Anderson, J. (2004). The phylogenetic definition of reptilia. Systematic Biology, 53, 815–821.doi:10.1080/10635150490503026.

Narciso, J., Uy, I., Cabang, A., Chavez, J., Lorenzo, J., Padilla-Concepcion, G., & Padlan, E. (2011). Analysis of theantibody structure based on high-resolution crystallographic studies. New Biotechnology, 28, 435–47.

Olivieri, D., Faro, J., von Haeften, B., & Sanchez-Espinel, F., Gambon-Deza (2013). An automated algorithm for

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;

extracting functional immunologic v-genes from genomes in jawed vertebrates. Immunogenetics, 65, 691–702.doi:10.1007/s00251-013-0715-8.

Olivieri, D., & Gambon-Deza, F. (2014). Vgenerepertoire.org identifies and stores variable genes of immunoglobulinsand t-cell receptors from the genomes of jawed vertebrates. bioRxiv, . doi:10.1101/002139.

Price, M., Dehal, P., & AP, A. (2010). Fasttree 2–approximately maximum-likelihood trees for large alignments.PLoS One, 5, e9490. doi:10.1371/journal.pone.0009490.

Reisz, R., & Muller, J. (2004). Molecular timescales and the fossil record: a paleontological perspective. TrendsGenet., 20, 237–41;.

Schroeder, H., & Cavacini, L. (2010). Structure and function of immunoglobulins. Journal of Allergy and ClinicalImmunology, 125, S41 – S52.

Sievers, F., & Higgins, D. (2014). Clustal omega, accurate alignment of very large numbers of sequences. MethodsMol Biol., 1079, 105–16. doi:10.1007/978-1-62703-646-7_6.

Sun, Y., Wei, Z., Hammarstrom, L., & Zhao, Y. (2011). The immunoglobulin δ gene in jawed vertebrates: a compar-ative overview. Dev Comp Immunol., 35, 975–81. doi:10.1016/j.dci.2010.12.010.

Sun, Y., Wei, Z., Li, N., & Zhao, Y. (2012). A comparative overview of immunoglobulin genes and the generation oftheir diversity in tetrapods. Developmental & Comparative Immunology, 39, 103–109.

Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., & Kumar, S. (2011). Mega5: molecular evolutionarygenetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. MolecularBiology and Evolution, 28, 2731–2739. doi:10.1093/molbev/msr121.

van Tuinen, M., & Hadly, E. (2004). Error in estimation of rate and time inferred from the early amniote fossil recordand avian molecular clocks. J Mol Evol., 59, 267–76.

Wei, Z., Wu, Q., Ren, L., Hu, X., Guo, Y., Warr, G., Hammarstrom, L., Li, N., & Zhao, Y. (2009). Expression of igm,igd, and igy in a reptile, anolis carolinensis. J. Immunol., 183, 3858–64. doi:10.4049/jimmunol.0803251.

Xu, Z., Wang, G., & Nie, P. (2009). Igm, igd and igy and their expression pattern in the chinese soft-shelled turtlepelodiscus sinensis. Mol Immunol., 46, 2124–32. doi:10.1016/j.molimm.2009.03.028.

not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was. http://dx.doi.org/10.1101/002618doi: bioRxiv preprint first posted online Feb. 12, 2014;