Bernard Dujon
Institut Pasteur, Paris
Bioinformatics and Genome data Analysis
How Eukaryotic Genomes Evolve : the example of Yeasts
MECHANISMS OF DUPLICATIONS
Whole genome duplications
  polyploidization (auto- or allo-)
  accidental (rare)
  highly unstable (no present-day genome is actually duplicated, except for some plants)
Segmental duplications
  various sizes of chromosome segments (several adjacent genes)
  intra- or inter-chromosomal
  frequent (human)
  chimeric genes (domain accretion)
  sufficiently stable
Tandem gene repeat formation
  arrays of paralogs
  unstable (looping out)
  rapid divergence
Dispersed (single) gene duplications (retrogenes)
  transposon-mediated
  chimeric genes (domain accretion)
WHOLE GENOME DUPLICATIONS
The genome of Saccharomyces cerevisiae, 1997
[Figure: Duplicated chromosomal blocks in S. cerevisiae. Gene-order maps of regions on chromosome 4 (YDR200c to YDR222w), chromosome 12 (YLR225c to YLR238w and YLR266c to YLR300w), chromosome 15 (YOR162c to YOR192c and YOR204w to YOR237w) and chromosome 16 (YPL145c to YPL119c); two positions are labeled "relic".]
Wolfe and Shields, Nature (1997) 387: 708-713
DUPLICATED BLOCKS IN THE GENOME OF S. cerevisiae
Seoighe and Wolfe, Gene (1999) 238: 253-261
Total number of genes in identified blocks: 3391 (58 % of genome)
Total number of paired genes (paralogs) in blocks: 898 (449 pairs), 26 %
Total number of unpaired genes in blocks: 2493, 74 %
60 to 80 ancient duplicated blocks can be identified in the entire yeast genome
DUPLICATED BLOCKS IN THE GENOME OF S. cerevisiae
WHOLE GENOME DUPLICATION FOLLOWED BY MASSIVE (ca. 92%) LOSS OF PARALOGOUS COPIES
Nb of "unique" genes prior to duplication: 5800 - 449 = 5351
Nb of paralogous copies lost: 5351 - 449 = 4902
Fraction of paralogous copies lost: 4902 / 5351 = 91.6 %
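The arithmetic on this slide can be checked with a short sketch. The figures 5800 (total genes), 449 (retained pairs) and 3391 (genes in blocks) come from the slides; the function and variable names are illustrative only.

```python
# Figures taken from the slides; names are hypothetical.
TOTAL_GENES = 5800       # approximate gene count of S. cerevisiae used on the slide
RETAINED_PAIRS = 449     # paralogous pairs still present in duplicated blocks
GENES_IN_BLOCKS = 3391   # genes located in identified blocks


def wgd_loss(total_genes, retained_pairs):
    """Return (unique genes before duplication, copies lost, fraction lost)."""
    unique_before = total_genes - retained_pairs
    copies_lost = unique_before - retained_pairs
    return unique_before, copies_lost, copies_lost / unique_before


unique_before, copies_lost, fraction_lost = wgd_loss(TOTAL_GENES, RETAINED_PAIRS)
paired_fraction = 2 * RETAINED_PAIRS / GENES_IN_BLOCKS
blocks_fraction = GENES_IN_BLOCKS / TOTAL_GENES

print(unique_before, copies_lost, round(100 * fraction_lost, 1))
print(round(100 * paired_fraction), round(100 * blocks_fraction))
```

The same function can be reused with other gene counts to see how sensitive the loss estimate is to the assumed genome size.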
[Figure: Charting genome evolution. Phylogenetic tree of S. cerevisiae, C. glabrata, K. lactis, D. hansenii and Y. lipolytica with four numbered branch points (1 to 4). Annotations: accidental genome duplication; extensive loss of duplicated genes; map dispersion; genome size control; MAT cassettes and centromeres; tandem repeat formation mechanism; reductive evolution; segmental duplication mechanism. Overall genome redundancy values shown: 44 %, 35 %, 32 %, 51 %, 42 %.]
SIGNATURES OF A WHOLE GENOME DUPLICATION
COMPARISON OF MAPS BETWEEN A DUPLICATED SPECIES AND NON DUPLICATED SPECIES
e.g. S. cerevisiae and K. lactis Dujon et al. Nature (2004) 430: 35-44
S. cerevisiae and K. waltii Kellis et al. Nature (2004) 428: 617-624
S. cerevisiae and Ashbya gossypii Dietrich et al. Science (2004) 304: 304-307
Tetraodon nigroviridis and Homo sapiens Jaillon et al. Nature (2004) 431: 946-957
one to two relationship between intermingled segments
COMPARISON BETWEEN TWO DUPLICATED SPECIES ORIGINATING FROM THE SAME EVENT
e.g. S. cerevisiae and C. glabrata Dujon et al. Nature (2004) 430: 35-44
coincidence between duplicated blocks
Ancient duplicated blocks in each genome
                                   S. cerevisiae   C. glabrata
Total nb of duplicated blocks
  internal to chromosomes                56             20
  subtelomeric                           21              0
Block size (kb)
  mean                                   42             27
  max.                                  243             89
Nb of gene pairs / block
  mean                                  5.8            3.8
  max.                                   15              6
Application of ADHoRe (Vandepoele et al. 2002) (r2 cutoff = 0.8, max gap = 35, min pair = 3)
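To illustrate what the "max gap" and "min pair" parameters control, here is a deliberately simplified sketch of block detection. It is not the ADHoRe algorithm (which also applies the r2 linearity cutoff, ignored here): it only chains paralog pairs whose gene indices stay within `max_gap` of the previous pair and keeps chains of at least `min_pairs`. The input coordinates are hypothetical.

```python
def find_blocks(pairs, max_gap=35, min_pairs=3):
    """Greedy chaining of paralog pairs into candidate duplicated blocks.

    pairs: list of (i, j) gene indices, i on one region and j on the other.
    Simplified illustration only; no linearity (r2) test is performed.
    """
    blocks, current = [], []
    for i, j in sorted(pairs):
        if current:
            prev_i, prev_j = current[-1]
            if i - prev_i <= max_gap and abs(j - prev_j) <= max_gap:
                current.append((i, j))
                continue
            # Chain is broken: keep it only if it has enough pairs.
            if len(current) >= min_pairs:
                blocks.append(current)
        current = [(i, j)]
    if len(current) >= min_pairs:
        blocks.append(current)
    return blocks


# Hypothetical gene indices: three collinear pairs, then two isolated pairs.
example_pairs = [(1, 101), (4, 105), (9, 110), (200, 50), (203, 300)]
blocks = find_blocks(example_pairs)
print(blocks)
```

With a smaller `max_gap` or a larger `min_pairs`, fewer and shorter blocks are reported, which is the trade-off the slide's parameter choice reflects.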
Coincidence of blocks: 38 specific to S. cerevisiae, 18 coincident in both genomes, 2 specific to C. glabrata
[Figure: schematic of genome duplication. Hypothetical ancestor with genes 1 to 20 on the chromosome of interest (X), plus other chromosomes; genome duplication produces an unstable intermediate; resulting chromosome A and chromosome B (plus other chromosomes) in species 1 and species 2; comparison between species 1 and species 2.]
Kellis et al. Nature (2004) 428: 617-624
MAP OF K. waltii GENOME RELATIVE TO S. cerevisiae
RECONSTRUCTION OF S. cerevisiae DUPLICATED BLOCKS RELATIVE TO K. waltii
Kellis et al. Nature (2004) 428: 617-624
Jaillon et al. Nature (2004) 431: 946-957
DISTRIBUTION OF IDENTITY BETWEEN PARALOGOUS GENE PAIRS IN FISHES
Tetraodon nigroviridis Takifugu rubripes
ancient duplicated pairs ancient duplicated pairs
Jaillon et al. Nature (2004) 431: 946-957
MAP OF ANCIENT DUPLICATED PAIRS ON ENTIRE GENOME OF Tetraodon nigroviridis
[Figure: Ancestral karyotype of bony vertebrates (12 chromosomes, A to L) and its evolution toward the Tetraodon karyotype (chromosomes 1 to 21) and the human karyotype (chromosomes 1 to 22 and X). Events labeled: duplication; fusions; amplification of transposable elements; translocations and fusions.]
Jaillon et al. Nature (2004) 431: 946-957
SEGMENTAL DUPLICATIONS
Segmental duplications in mammalian genomes:
segments of sequences of ≥ 90 % identity (recent) and ≥ 1 kb in length (weak criterion) or ≥ 5kb in length (more stringent)
interchromosomal
intrachromosomal
Example: rat genome
unassembled sequence reads
Total: 2.9 % of genome (rat)
1-2 % of genome (mouse)
5-6 % of genome (human)
Gibbs et al. Nature (2004) 428: 493-521
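The definition above (at least 90 % identity, and at least 1 kb or 5 kb in length) can be written as a small filter. This is a sketch under the assumption that pairwise alignments are already available as records; the `Alignment` class and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one pairwise alignment between two genomic segments."""
    chrom_a: str
    chrom_b: str
    length_bp: int
    identity: float  # fraction between 0 and 1


def is_segmental_duplication(aln, min_length_bp=1000, min_identity=0.90):
    """Apply the slide's criteria; use min_length_bp=5000 for the stringent version."""
    return aln.length_bp >= min_length_bp and aln.identity >= min_identity


def kind(aln):
    """Intrachromosomal if both copies lie on the same chromosome."""
    return "intrachromosomal" if aln.chrom_a == aln.chrom_b else "interchromosomal"


# Hypothetical example values.
example = Alignment("chr16", "chr16", length_bp=1200, identity=0.95)
print(is_segmental_duplication(example), kind(example))
print(is_segmental_duplication(example, min_length_bp=5000))
```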
DETAILED MAP OF SEGMENTAL DUPLICATIONS ON HUMAN CHROMOSOME 16
Martin et al. Nature (2004) 432: 988-994
interchromosomal
intrachromosomal
centromere
also detected by whole-genome shotgun
Martin et al. Nature (2004) 432: 988-994
DISTRIBUTION OF LENGTHS AND IDENTITIES OF SEGMENTAL DUPLICATIONS
Human chromosome 16 Rat genome
Tuzun et al. Genome Res. (2004) 14: 493-506
Ancient duplicated blocks in each genome
                                   K. lactis   D. hansenii   Y. lipolytica
Total nb of duplicated blocks
  internal to chromosomes               8            5              2
  subtelomeric                          1           10              0
Block size (kb)
  mean                                  9           19             90
  max.                                 25           59            148
Nb of gene pairs / block
  mean                                4.3          3.7            4.0
  max.                                 11            6              4
Sporadic segmental duplications?
Spontaneous segmental duplications in the yeast genome: experimental design
Wild type (two copies of ribosomal protein gene)
RPL20A
RPL20B
Deletion mutant
Slow growth (gene dosage effect)
RPL20B
Spontaneous normal growth mutants
mutation rate ≈ 10^-9 / generation / cell
?
R. KOSZUL, S. CABURET, B. DUJON, G. FISCHER Eucaryotic genome evolution through the spontaneous duplication of large chromosomal segments EMBO J. (2004) 23, 234-243
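To give a feel for what a rate of about 10^-9 per cell per generation means in practice, the following sketch computes the probability of observing at least one event, assuming events are independent and rare (a Poisson approximation). The number of cell generations used in the example is an arbitrary illustration, not a value from the study.

```python
import math


def prob_at_least_one(rate_per_cell_generation, n_cell_generations):
    """Poisson approximation: P(at least one event) = 1 - exp(-rate * n)."""
    expected_events = rate_per_cell_generation * n_cell_generations
    return -math.expm1(-expected_events)


RATE = 1e-9  # from the slide: about 10^-9 / generation / cell

# Hypothetical culture sizes (total cell generations screened).
for n in (1e7, 1e9, 1e10):
    print(f"{n:.0e} cell generations -> P = {prob_at_least_one(RATE, n):.3f}")
```

Under these assumptions, screening on the order of 10^9 cell generations is needed before a revertant becomes likely, which is why a growth-based selection is used.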
Formation of chimeric ORFs at junctions
intrachromosomal segmental duplications
YKF1072  in-frame fusion between YOR329c (SDC5) and YOR267c: chimeric protein
YKF1057  in-frame fusion between YOR372c (NDD1) and YOR267c: chimeric protein
YKF1223  in-frame fusion between YOR336w (KRE5) and YOR227w: chimeric protein
YKF1022  out-of-frame fusion between YOR328w and YOR272w: truncated protein
YKF1159  antiparallel fusion between YOR357c and YOR269w: truncated protein
YKF1050  fusion between YOR328w and intergene: truncated protein
YKF1080  fusion between YOR370c and intergene: truncated protein
YKF1124  fusion between intergene and YOR220w: truncated protein
YKF1175  fusion between LTRs
YKF1095  fusion between intergenic regions
YKF1016  fusion between intergenic regions
interchromosomal segmental duplications
YKF1114  out-of-frame fusion between YJR090c and YOR267c: truncated protein
YKF1085  fusion between LTRs
YKF1246  fusion between LTRs
YKF1122  fusion between LTRs
YKF1027  fusion between intergenic regions
1. original strain: wild-type fitness, initial genetic complexity
2. single gene deletion mutant: reduced fitness, initial genetic complexity
3. offspring of mutant: restored fitness, outcompetes its parent, increased genetic complexity (up to 300 genes simultaneously duplicated as a single segment)
spontaneous events: 10^-9 / generation / cell
TANDEM GENE REPEAT FORMATION
Well-known cases of gene tandems: α- and β-globin genes
[Figure: globin gene clusters on human chromosome 11 (including the Gγ and Aγ genes) and human chromosome 16, with pseudogenes indicated; approximate scale bar: 100 kb.]
Ancestor of chordates: 1 gene
Ancestor of vertebrates: 1 globin gene + 1 myoglobin gene
Duplication and divergence: 1 α gene + 1 β gene (e.g. Xenopus)
Gene number expansion (mammals, birds)
[Figure: % of total globin (10 to 50 %) as a function of age after fertilization (weeks), before birth (6 to 36 weeks) and postnatal (6 to 48 weeks).]
[Figure: Hox gene clusters. Drosophila genome: ANT-C (lab, pb, Dfd, Scr, Antp) and BX-C (Ubx, AbdA, AbdB), aligned with the head, thorax and abdomen of Drosophila larvae. Mouse or human genomes: four clusters (HoxA, HoxB, HoxC, HoxD) with paralogous genes numbered 1 to 9, aligned with the mouse embryo.]
Tandem repeat arrays in Yeasts
D. hansenii chromosome K
Similar to S. cerevisiae YHR179w OYE2 NADPH dehydrogenase (old yellow enzyme), isoform 1
pseudogenes
Amino-acid sequence identity between copies: from 82 % to 95 %

                 total nb of      direct         total nb of
                 tandem pairs     orientation    arrays
S. cerevisiae         61             79 %            50
C. glabrata           47             83 %            32
K. lactis             36             72 %            33
D. hansenii          329             92 %           247
Y. lipolytica         54             72 %            48
TANDEM REPEATS IN C. glabrata
[Figure: region CAGL0C03828g to CAGL0C04136g, containing a tandem array of genes homologous to YIL014w (MNT3): alpha-1,3-mannosyltransferases responsible for adding the terminal mannose residues of O-linked oligosaccharides. Other genes are labeled as similar to S. cerevisiae (SACE) YIL009ca, YIL010w, YIL011w, YDR007w, YNL031c, YNL030w.]
[Figure: region CAGL0E01683g to CAGL0E01925g, containing a tandem array of genes homologous to YLR120c, YLR121c or YDR144c (aspartic proteases). Other genes are labeled as similar to S. cerevisiae YOL128c, YOL126c, YOL125w, YOL124c.]
A. Thierry and B. Dujon, unpublished
CAGL0C03894g CAGL0C03916g CAGL0C03938g CAGL0C03960g CAGL0C03982g CAGL0C04004g CAGL0C04026g CAGL0C04048g
Identity between adjacent copies: 51% 75% 81% 76% 60% 59% 59%
Other pairwise identities shown: 78%, 50%, 63%, 56%, 64%

Alignment of two repeat units (positions 1-316 and 4302-4617):

   1 ACTCTTATACACCTAGTACCCGATCGCTTCTGTCAACGTCCCCGCTCGGTTACTGTGCATTCCTAACCCCCACAGATACAATGACTACAGCAATACTTCCACAACCACTTATCTCACTTCAGAAA 125
4302 ACTCTTATACACCTAGTACCCGATCGCTTCTGTCAACGTCCCCGCTCGGTTACTGTGCATTCCTAACCCCCACAGATACAATGACTACAGCAATACTTCCACAACCACTTATCTCACTTCAGAAA 4426
     (identical at all 125 positions)

 126 TGCTCTCATAACACTTTCCCGCCAGCAATCTCTCACTACCACAACACCCTTCCCATTGTTCCCTCGAGACTCACGCTGGCAGATCGCTTTCGGTAAATCCTTTGTAAACTAACTTTTTCACCAGG 250
4427 TGCTCTCATAACACTTTCCCGCCAGCAATCTCTCACTACCACAACACCCTTCCCATTGTTCCCTCGAGACTCACGCTGGCAGATCGCTTTCGGTAAATCCTTTGTAAACTAACTTTTTCACCAGG 4551
     (identical at all 125 positions)

 251 GTCTGCGCTGTTTCTCTGGCAACCTCGAGGACTCCCGTCGACTGGTGATGTGCGATAAAGCTGCCC
     |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||| ||
4552 GTCTGCGCTGTTTCTCTGGCAACCTCGAGGACTCCCGTCGACTGGTGATGTGAGATAAAGCTGTCC
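Percent identity values such as those above are obtained by counting identical positions in an alignment. As a minimal sketch, the function below does this for two ungapped, equal-length sequences, using the last segment of the alignment shown on the slide (positions 251 onward and 4552 onward). The function name is illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Return (% identity, list of 0-based mismatch positions) for ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    mismatches = [k for k, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]
    identity = 100.0 * (len(seq_a) - len(mismatches)) / len(seq_a)
    return identity, mismatches


# Last aligned segment from the slide.
UNIT_1 = "GTCTGCGCTGTTTCTCTGGCAACCTCGAGGACTCCCGTCGACTGGTGATGTGCGATAAAGCTGCCC"
UNIT_2 = "GTCTGCGCTGTTTCTCTGGCAACCTCGAGGACTCCCGTCGACTGGTGATGTGAGATAAAGCTGTCC"

identity, mismatches = percent_identity(UNIT_1, UNIT_2)
print(f"{identity:.1f} % identity, mismatches at {mismatches}")
```

Gapped alignments need an extra decision on how gap columns are counted, which is one reason identity values from different tools are not always directly comparable.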
TANDEM REPEATS IN C. glabrata
A. Thierry and B. Dujon, unpublished
EVOLUTION WITHIN THE ORTHOLOGOUS ALCOHOL DEHYDROGENASE GENE CLUSTER
pseudogene
pseudogene
Expansion in the human lineage
Expansion in the chicken lineage
Hillier et al. Nature (2004) 432: 695-716
DISPERSED (SINGLE) GENE DUPLICATIONS
(RETROGENES)
FORMATION OF RETROGENES AND PROCESSED PSEUDOGENES
Gene with intron: 5' exon 1, intron, exon 2 3' (flanking labels P and T)
transcription
splicing and polyadenylation
Messenger RNA: 5' exon 1, exon 2, AAAAAAA 3'
action of reverse transcriptase
Complementary DNA: AAAAAAA 3' / TTTTTTTT 5'
integration
Integrated copy without intron, with poly A:T tail
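The pathway above (transcription, splicing, polyadenylation, reverse transcription, integration) can be mimicked on strings to show why a retrogene lacks the intron and carries a poly(A) stretch. This is a toy sketch: the sequences are invented, only the sense strand is represented, and splicing is modeled as removal of a known intron sequence.

```python
def transcribe(dna):
    """DNA sense strand to pre-mRNA (T replaced by U)."""
    return dna.replace("T", "U")


def splice(pre_mrna, intron_rna):
    """Remove the first occurrence of the intron sequence."""
    return pre_mrna.replace(intron_rna, "", 1)


def polyadenylate(mrna, tail_length=7):
    return mrna + "A" * tail_length


def reverse_transcribe(mrna):
    """Sense-strand representation of the cDNA (U replaced by T)."""
    return mrna.replace("U", "T")


# Invented toy sequences.
EXON_1, INTRON, EXON_2 = "ATGGCC", "GTAAGTTTTCAG", "GAATAA"
gene = EXON_1 + INTRON + EXON_2

mrna = polyadenylate(splice(transcribe(gene), transcribe(INTRON)))
retrogene = reverse_transcribe(mrna)
print(gene)
print(retrogene)
```

The two signatures visible in `retrogene`, absence of the intron and a 3' run of A, are the ones listed on the slide.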
Yeast genome Ty elements (transposons of yeast)
[Figure: structure of a Ty element (6 kb): LTR, gag, pol (protease, integrase, reverse transcriptase), LTR; +1 frameshift indicated.]
Boeke, J. D. et al. 1985. Ty elements transpose through an RNA intermediate. Cell 40, 491-500
Ty elements: variable position in genome between strains, mutagenic
Engineered Ty element: artificial promoter, artificial intron, molecular tag
Experiment:
  galactose induction
  selection of [his+] mutants (reactivation of promoter-less gene)
  molecular analysis of integrated transposon
Steps: transcription, splicing, reverse transcription, integration at target (promoter-less HIS3)
Result: presence of molecular tag, loss of intron
ORIGINAL DISCOVERY OF RETROPOSONS
Single gene duplications in S. cerevisiae
Anecdotal observations:
ACP1 Hansche et al., (1978) Genetics 88, 673-687
HIS4 Greer and Fink, (1979) PNAS 76, 4006-4010
ADH2 Paquin et al., (1992) Genetics 130, 263-271
HXTx Brown et al., (1998) Mol. Biol. Evol. 15, 931-942
10^-10 to 10^-12 duplication / cell / generation
recent experimental demonstration
Ty-mediated gene duplication
GATase: glutamine amidotransferase
CPSase: carbamoylphosphate synthetase
DHOase: dihydro-orotase
ATCase: aspartate transcarbamylase
Haploid strain, select [Ura+] prototroph: ca. 10^-10 event / cell / generation
1- Insertion of Ty1 upstream of ATCase Roelants et al., (1997) Mol. Gen. Genet. 246, 767-773
2- Deletion of the GATase, CPSase mutated region Welcker et al., (2000) Genetics 156, 549-557
(RAD52-dependent)
3- Duplication of the ATCase coding sequence elsewhere in the genome Bach et al., (1995) Yeast 11, 169-177
Schacherer et al. (2004) Genome Res. 14, 1291-1297
Ty-mediated gene duplication
Spontaneous events
  Interchromosomal events: 3
  Intrachromosomal events: 1
Ty overexpression
  Interchromosomal events: 16
  Intrachromosomal events: 4
Schacherer et al. (2004) Genome Res. 14, 1291-1297
Ty-mediated gene duplication
Schacherer et al. (2004) Genome Res. 14, 1291-1297
polyA tails
microhomology regions between TyA and URA2
Accidental incorporation of URA2 mRNA in Ty-VLP
Reverse transcription of URA2 mRNA
Template switch onto Ty-RNA
Integration of cDNA
[Diagram: duplications increase redundancy; gene loss decreases redundancy; sequence divergence increases polymorphism; genetic drift and selection decrease polymorphism.]
GENE LOSS
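GENE RELICS
[Figure: comparison of S. uvarum (genes SuYBR060c, SuYBR061c, SuYDR037w, SuYDR038c on translocated chromosomes IVtII and IItIV) with S. cerevisiae (YBR060c and YBR061c on chromosome II; YDR036c, YDR037w (KRS1) and YDR038c on chromosome IV), showing a relic of a YDR037w paralog in S. cerevisiae. Dot plot of the YBR060c-YBR061c region (1000 to 5000 bp) against YDR037w (KRS1), stringency 15/23.]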
relic_ydr037w   1:GTGCCACAGCAAGTTAATGTCACGGCAGCTAGTGACGCTATTGCTAGTTTACACCTAGAT: 60
YDR037w         1:ATGTCTCAACAAGATAATGTCAAAGCCGCCGCTGAAGGTGTTGCTAACCTACATCTCGAC: 60

relic_ydr037w  61:GAGGCCACTGGAGAAATGGTCTCTAAGACAGAGTTGAAGAAGCGTATTAAGGGAATACAA: 120
YDR037w        61:GAAGCTACCGGGGAAATGGTCTCCAAGTCTGAATTGAAGAAGCGTATCAAGCAAAGACAA: 120

relic_ydr037w 121:ATTGAGGCCAAAAAG.CTGTCAAAAAGACTCTTGCGAAACCAAAACCAGCTTC....GAA: 175
YDR037w       121:GTCGAAGCTAAAAAGGCCGCCAAAAAGGCTGCCGCTCAACCAAAACCGGCTTCCAAAAAA: 180

relic_ydr037w 176:AAGACTAATTTCCTGGCCGGTTTATAGTCATCTCAATACT........AGATCACAGCAA: 227
YDR037w       181:AAAACAGATTTGTTCGCTGACCTGGATCCATCGCAATATTTCGAAACAAGATCTCGCCAA: 240

relic_ydr037w 228:ATCCAATTAA.GAAACAGACTCTTGATATAAATTTTTATCCATACAAGTTCCGATTATAT: 286
YDR037w       241:ATTCAAGAATTGAGAAAGACTCACGAACCAAATCCATACCCACACAAGTTTCACGTTTCT: 300

relic_ydr037w 287:ATATTCAATCCTGAATTTTTGGCCAAGTATGCCCATTC..AAAAAGGCGAAAATTTCCCT: 344
YDR037w       301:ATATCCAATCCTGAGTTCTTGGCCAAATATGCGCATTTGAAAAAAGGTGAAACCTTACCT: 360

relic_ydr037w 345:TAAGAGAAGTTTCACATTGCTAGGAGAGTTCATGCAGAAAGAGAATCAGCTTAAAAATTG: 404
YDR037w       361:GAAGAGAAGGTTTCAATTGCTGGTAGAATTCATGCCAAAAGAGAATCTGGCTCCAAATTG: 420

relic_ydr037w 405:AAATTCTACGTTCT...CAATGGTGGTGTTGAGCTCTAAATTATTTTACAATTTCAGGAT: 461
YDR037w       421:AAATTCTATGTTCTTCACGGTGATGGTGTTGAAGTTCAATTGATGTCCCAATTGCAGGAC: 480

relic_ydr037w 462:TATTACGACGAGAACCCATA..AAAAGGAGCATGACCTTT.AAGGAGGAGTAATAT....: 514
YDR037w       481:TACTGCGACCCAGACTCTTACGAAAAGGATCACGACCTTTTGAAAAGGGGTGATATCGTT: 540

relic_ydr037w 514:.......................ATATCCACCAAAGAAGACCGGCGGAGATGAGATATAT: 551
YDR037w       541:GGTGTCGAGGGTTACGTCGGAAGAACTCAACCAAAGAAAGGTGGTGAAGGTGAAGT.TTC: 599

relic_ydr037w 552:TTTTTTCGTTAACAGAGTGCAATT...GACAACTTGTTTGCAC...TTGCCTGCTAACTG: 605
YDR037w       600:CGTCTTCGTTAGCAGAGTGCAATTATTGACACCATGTTTGCACATGTTACCTGCCGACCA: 659

relic_ydr037w 606:TTTTGGTTTCAAAGATCAAGAAAATAGATA..............................: 635
YDR037w       660:CTTTGGTTTCAAAGACCAGGAAACCAGATACAGAAAGCGTTATTTGGATTTGATCATGAA: 719

relic_ydr037w 635:...........GAACCCGTTTTATTATTCAAT.TGACATCGCCCGTTATATCAGACGATT: 683
YDR037w       720:CAAAGACGCCAGAAACCGTTTTATTACCCGTTCTGAAATTATCCGTTACATCAGAAGATT: 779

relic_ydr037w 684:TTTGGATCAAAAAAAGTTTATTGGAGCAGAAGCAATTCTGAAATGAAGGTCCTAATATGA: 743
YDR037w       780:TTTGGACCAAAGAAAGTTTATTGAAGTAGAAAC..TCCAATGATGAACGTTATTGC.TGG: 836

relic_ydr037w 744:CCCCAATATGAC.ACATAATTCGGAATCTGCCACTTGTGAGTTTTATCAAGCCTATGCGG: 802
YDR037w       837:TGGTGCTACCGCTAAGCCATTTATTACCCACCA.TAATGACCTTGAT.ATGGACATGTAC: 894

relic_ydr037w 803:ATGTTTGTGACTAGTTGGATATGACTGAATTAATACTTTCAGAAATGGACAAGGAGATAT: 862
YDR037w       895:ATGAGAATTGCTCCAGAATTGTTCTTGAAACAAT.TGGTTGTCGGTGGTTTGGATCGTGT: 953
Average sequence identity between relic and gene = 62 % (1127 / 1818),
i.e. one copy of the two ancestral genes has undergone several hundred events of:
nucleotide substitutions,
single nucleotide deletions or insertions,
microdeletions.
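The three kinds of events listed above can be tallied directly from a gapped alignment. The sketch below does this for one 60-column segment of the relic/YDR037w alignment (relic positions 121 to 175), treating "." as the gap character. The way gap runs are counted (consecutive gap columns merged into one run) is a simplifying assumption, not necessarily the method used for the slide.

```python
# One 60-column segment of the alignment shown on the slides.
RELIC = "ATTGAGGCCAAAAAG.CTGTCAAAAAGACTCTTGCGAAACCAAAACCAGCTTC....GAA"
GENE = "GTCGAAGCTAAAAAGGCCGCCAAAAAGGCTGCCGCTCAACCAAAACCGGCTTCCAAAAAA"


def tally_events(seq_a, seq_b, gap="."):
    """Count matches, substitutions and gap runs between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = substitutions = gap_columns = 0
    gap_runs, run = [], 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            gap_columns += 1
            run += 1
            continue
        if run:                      # a gap run has just ended
            gap_runs.append(run)
            run = 0
        if x == y:
            matches += 1
        else:
            substitutions += 1
    if run:
        gap_runs.append(run)
    return {"matches": matches, "substitutions": substitutions,
            "gap_columns": gap_columns, "gap_runs": gap_runs}


result = tally_events(RELIC, GENE)
print(result)
print(f"overall identity on the slide: {1127 / 1818:.1%}")
```

Runs of length 1 correspond to single nucleotide deletions or insertions, and longer runs to microdeletions.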
Distribution of gene relics on the S. cerevisiae genomic map
[Figure: map of chromosomes 1 to 16 showing the positions of relics and functional paralogs (106 identified).]
Species-specific gene loss

Lost from C. glabrata
Carbohydrate metabolism: YBR018c, YBR019c, YBR020w, YDR009w, YIL162w, YNR071c
Cell cycle and DNA processing: YMR096w
Cell rescue, defense and virulence: YFL059w, YLL057c, YNL333w
Homeostasis of cations: YBR296c
Lipid, fatty-acid and isoprenoid metabolism: YLR189c
Metabolism of vitamins, cofactors, and prosthetic groups: YGR286c
Nitrogen and sulfur metabolism: YIR027c, YIR029w, YIR032c
Oligopeptide transporter: YPR194c
Pheromone response, mating-type determination, sex-specific proteins: YJL212c
Phosphate metabolism: YAR071w, YBR092c, YBR093c, YHR215w
Sporulation and germination: YDR104c, YOR313c
tRNA modification: YMR283c
Unclassified proteins: YJL100w, YMR321c, YOR129c, YPL273w

Lost from Y. lipolytica
Cation transporters: YJL094c
Cell cycle: YBR238c, YDR082w
Cell rescue, defense and virulence: YBR131w
DNA processing: YIL150c
DNA recombination and DNA repair: YDL200c
Fungal cell differentiation: YPL057c
mRNA transcription: YCR020c
Protein synthesis: YLR067c, YMR257c, YNL284c
Proteolytic degradation: YML111w, YMR275c
rRNA transcription: YBL014c, YML043c
Sporulation and germination: YER132c, YGL197w, YHR184w
Transcription: YLR139c
Unclassified proteins: YBR163w, YDR131c, YDR367w, YEL001c, YER004w, YER077c, YFR013w, YGL107c, YGR134w, YHR029c, YJL149w, YJR003c, YJR003c, YJR111c, YLL033w, YLR320w, YNR068c, YNR069c, YOL017w, YOR060c, YPL005w

Lost from D. hansenii
Amino acid metabolism: YFR018c
Cell growth and morphogenesis: YEL023c
DNA recombination and DNA repair: YCR014c
Lipid, fatty-acid and isoprenoid metabolism: YJL132w
Proteolytic degradation: YBR227c
Unclassified proteins: YMR265c, YNL187w, YPR002w

Lost from K. lactis
Carbohydrate metabolism: YGL156w
Unclassified proteins: YML005w, YPL207w, YPR147c

Criterion: a protein family represented in all yeast species but one
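The criterion above translates into a simple filter over a presence/absence table. The sketch below assumes such a table is available as a mapping from protein family to the set of species in which it is found; the family names and contents are hypothetical and only illustrate the logic.

```python
SPECIES = ["S. cerevisiae", "C. glabrata", "K. lactis", "D. hansenii", "Y. lipolytica"]


def species_specific_losses(family_presence, species=SPECIES):
    """Return {species: [families present in every species except that one]}."""
    losses = {s: [] for s in species}
    for family, present_in in family_presence.items():
        missing = [s for s in species if s not in present_in]
        if len(missing) == 1:
            losses[missing[0]].append(family)
    return losses


# Hypothetical presence table (family names are placeholders).
example_families = {
    "famA": set(SPECIES),                                 # present everywhere
    "famB": set(SPECIES) - {"C. glabrata"},               # lost from one species
    "famC": set(SPECIES) - {"K. lactis", "D. hansenii"},  # missing in two: not counted
}
losses = species_specific_losses(example_families)
print(losses)
```

Families missing from two or more species are deliberately ignored, since their absence could reflect a gain elsewhere in the tree rather than a single loss.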
reductive evolution
MORE TO THE EVOLUTIONARY DYNAMICS:
HGT and NUMTs
Possible cases of horizontal gene transfer
Species-specific genes (in yeasts) with homologs in Bacteria

Species         Gene name                     Homolog acc. no.   Homolog species           Function
K. lactis       KLLA0C09218g                  Q8PPU9             Xanthomonas axonopodis    Conserved glyoxalase domain protein
                KLLA0A02431g                  Q8EG95             Shewanella oneidensis     Hypothetical protein
                KLLA0A12089g                  P21340             Bacillus subtilis         Negative regulatory protein, acetyltransferase domain
                KLLA0B00451g, KLLA0D19949g    Q9JYX0             Neisseria meningitidis    Alcohol dehydrogenase
D. hansenii     DEHA0B15763g                  Q8ZIB2             Bacillus cereus           Protein ydhR precursor
Y. lipolytica   YALI0F04290g                  Q987V4             Rhizobium loti            D-amino peptidase
                YALI0F05654g                  Q8EAT4             Shewanella oneidensis     Conserved hypothetical protein
                YALI0F31867g                  Q9I5L7             Pseudomonas aeruginosa    Conserved hypothetical protein
                YALI0E33011g                  P45900             Bacillus subtilis         Conserved hypothetical protein, adenylate kinase family
                YALI0D21582g, YALI0F01408g    Q87HL8             Pseudomonas putida        Yee/YedE family protein
                YALI0A15400g, YALI0F11605g    Q92QU2             Rhizobium meliloti        Putative acetyltransferase

Summary of HGT
C. glabrata: none
K. lactis: 5 genes (including a pair of paralogs)
D. hansenii: 1 gene
Y. lipolytica: 8 genes (including two pairs of paralogs)
NUMTS in the genome of S. cerevisiae
Ricchetti et al., Nature (1999) 402: 96-100
Transfer of mitochondrial DNA to the nucleus: experimental design
Normal yeast chromosome: telomere, centromere, telomere
Engineered yeast chromosome: artificial cassette inserted (URA3 between two I-Sce I sites; positions TSH1 > and < TSH2 marked)
Expression plasmid (pPEX7): I-Sce I gene under Pgal
I-Sce I endonuclease: broken yeast chromosome
ca. 99 % of cases: cell arrest, no colony
ca. 1 % of cases: repair, loss of cassette, [ura-] colonies
Ricchetti et al., (1999) Nature 402, 96-100
Experimental design
Repaired yeast chromosome
[ura-] colonies?
PCR amplifications
Chromosome repaired by non-homologous end-joining
short PCR fragments
Chromosome repaired with insertion of novel DNA fragment
long PCR fragments
sequenced?
ccaagagataaaattgtacaagaagttataagaataatttta
ctattactttaatattttaaataactaatttagatcaatctaaaaaatctaagtgtttagatgataataaagaatatttattaaagtatt
gaaccccgaaaggaggaataagataaatatatagCAGGGTAAT
tatttatatttatatttc
(T)
(AT)
ATTACCCTGTTAatgattttaaaacaataattttgttttaagtattaataataatattaatattcgacctcttaattgaggatattataatcataattttttgatacaatttttgataaaaagAACAGGGTAAT
34-II-89
(A)
ATTACCCTGTTATattattattttttattattaataataataatttatagggtttattctgttttatcataaatacgtaaatatctaacttagctctcaaattatattacTAACAGGGTAAT
34pAT9
(ATAA)
ATTACCCTGTTATctttattatatttaagaatattattataattattattattattattatttttaataattaaaaatattaataataagtaaatattaattattgttcatttaatcattccaaaaatttaggtaatgatactgcttcgatcttaattggcatatttgcatgacctgtcccacacaactcagaacatgctccggccacgggagccg
34pAS15
(T)
(A)
ATTACCCTGTTAagtttccatagaagtaataataataataaatatattaaatattaatataattattaatta
TAACAGGGTAAT
622pBS8
ATTACCCTGTTATttagaatatttttaattaaataatataattaaatgaataccaaacttatattatatttaTAACAGGGTAAT
(A)
34pAS16
ATTACCCTGTTATtttataattttataaataatatattattataaatatttaatataattTAACAGGGTAAT
(A)
34pAS7
------------ATTACCCTGTTAT3'------------TAATGGGAC
3'TATTGTCCCATTA----------? CAGGGTAAT----------
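In the junction sequences above, upper-case letters appear to mark the flanking I-Sce I site remnants and lower-case letters the DNA captured during repair; that reading of the typography is an assumption. Under it, the inserted fragment can be separated from its flanks with a regular expression, as in this sketch applied to one of the sequences from the slide.

```python
import re

# One junction sequence copied from the slide (upper case: flanks, lower case: insert).
JUNCTION = ("ATTACCCTGTTAT"
            "tttataattttataaataatatattattataaatatttaatataatt"
            "TAACAGGGTAAT")


def split_junction(seq):
    """Split 'FLANKinsertFLANK' into (left flank, insert, right flank)."""
    match = re.fullmatch(r"([ACGT]+)([acgt]+)([ACGT]+)", seq)
    if match is None:
        raise ValueError("sequence does not have the expected flank/insert/flank layout")
    return match.groups()


left, insert, right = split_junction(JUNCTION)
at_content = sum(insert.count(base) for base in "at") / len(insert)
print(left, right, len(insert), f"{at_content:.0%} A+T")
```

The extracted insert could then be compared with the mitochondrial genome to locate its origin.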
mitochondria
nucleus
A flux of mitochondrial DNA sequences to the nucleus A flux of mitochondrial DNA sequences to the nucleus
Fragmentation of mtDNA
Transfer into the nucleus (?)
Integration into chromosomes following double-strand break
[Figure: map of numts on the 16 chromosomes of S. cerevisiae (I to XVI, 230 kb to 1522 kb; chromosome XII: 1078 kb + rDNA repeats), each numt labeled with a numeric coordinate; scale from 100 to 1600 kb.]
34 numts in the yeast nuclear genome
Numts in the human nuclear genome
211 numts in the human genome
93 % are insertions of single DNA fragments; 7 % are insertions of multiple, non-adjacent mtDNA fragments
numt size range: 47 to 14654 bp; 78-100 % identity to mtDNA
PCR amplification on DNA from 21 human donors and 3 chimpanzees (Pan troglodytes), using either one primer in the nuclear sequence and one in the numt sequence, or two primers in the nuclear sequence; some PCR fragments were directly sequenced for verification.
Results
10 numts common to H. sapiens and P. troglodytes, present in all human individuals tested: ancient
21 numts specific to H. sapiens and present in all 21 individuals tested
6 numts specific to H. sapiens but present in some individuals only
Conclusion: 27 insertions have occurred in the human genome since its separation from P. troglodytes; 6 of them are not fixed in the human population
fixation of one novel numt per 200 000 years in human lineage
Ricchetti, Tekaia, Dujon (2004) PLoS Biol. 2(9): e273
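The counts in this summary can be cross-checked in a few lines. The first two derived values follow directly from the slide; the last one, the time span implied by combining 27 insertions with the quoted rate of one per 200 000 years, is only a back-calculation offered for orientation and is not stated in the source.

```python
# Counts from the slide.
COMMON_WITH_CHIMP = 10    # shared with P. troglodytes: ancient
HUMAN_FIXED = 21          # human-specific, found in all 21 donors
HUMAN_POLYMORPHIC = 6     # human-specific, found in some donors only
YEARS_PER_NUMT = 200_000  # rate quoted on the slide

human_specific = HUMAN_FIXED + HUMAN_POLYMORPHIC
fraction_not_fixed = HUMAN_POLYMORPHIC / human_specific

# Back-calculation (assumption: all human-specific insertions accumulated at the quoted rate).
implied_years = human_specific * YEARS_PER_NUMT

print(human_specific, f"{fraction_not_fixed:.0%}", implied_years)
```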
numts insertion and human genetic diseases
Turner et al., (2003) Human Genet. 112, 303-309
16-year-old boy with sporadic case of Pallister-Hall syndrome (anomalous development: polydactyly, metacarpal fusion, hypothalamic hamartoma, bifid epiglottis)
72 bp insertional mutation in exon 14 of GLI3 gene
sequence identical to fragment of mtDNA (fragment of ser-tRNA - leu-tRNA genes)
sequence predicts a truncated protein (935 aa compared to 1580 aa for w.t.)
functional disruption of a key developmental gene
conception of patient temporally and geographically associated with high-level radioactive contamination following the Chernobyl accident
Borensztajn et al., (2002) Brit. J. Haematol. 117, 168-171
family case of 251 bp mtDNA fragment inserted into coagulation factor VII gene
Willett-Brozick et al., (2001) Human Genet. 109, 216-223
germline insertion of a 41 bp mtDNA fragment (12S rRNA) associated with a balanced translocation (t(9;11)(p24;q23)) of uncertain clinical significance, founder of mutation unknown.
[Pedigree: genotypes C / T and T / T; C / T with 72 bp insert]
de novo mitochondrial-nuclear DNA transfer of paternal origin (associated SNP)
SOME CONCLUSIONS AND PERSPECTIVES
Eukaryotic genome evolution represents a dynamic equilibrium between:
1- duplications and loss of genes
   consequences:
   1- formation of paralogs with possibility of neo-functionalization (acquisition of novel function) or subfunctionalization (specialization of function between members of a family)
   2- gene family expansion and reduction
   3- change of genetic maps (loss of synteny)
2- divergence of sequences (creation of alleles, polymorphism of population) and loss of divergence (genetic drift and selection)
   consequences: formation of pseudogenes (non-processed, disabled genes)
3- activity and elimination of transposable elements
   consequences: duplication of genes or fragments (domain accretion), change of genetic maps (chromosome rearrangements), formation of retrogenes and processed pseudogenes
4- possible acquisition of external sequences (HGT) or internal sequences (NUMTs)
   consequences: acquisition of novel functions (selection) or gene inactivation
5- what about non-coding RNA genes?
[Diagram: duplications increase redundancy; gene loss decreases redundancy; sequence divergence increases polymorphism; genetic drift and selection decrease polymorphism.]
The central dogma of molecular biology
DNA (replication) → transcription → RNA → translation → proteins
1970: reverse transcription (RNA → DNA)
1977: splicing (RNA → RNA)
present: editing
ATTENTION: This RNA is not exactly identical to the gene product
example of RNA editing
5' ATCGTTGCAGTC 3' (DNA)
5' AUCGUUGCAGUC 3' (transcript)
5' AUCGUUGUAGUC 3' (edited RNA)
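The editing example on this slide can be reproduced step by step. In this sketch the DNA sequence is the one shown on the slide; the edit is modeled as a single C-to-U change at the position where the two RNA sequences differ, and the function names are illustrative.

```python
DNA = "ATCGTTGCAGTC"  # sense strand from the slide


def transcribe(dna):
    """Sense-strand DNA to RNA (T replaced by U)."""
    return dna.replace("T", "U")


def edit_c_to_u(rna, position):
    """Apply a C-to-U edit at a 0-based position."""
    if rna[position] != "C":
        raise ValueError("no C at the requested position")
    return rna[:position] + "U" + rna[position + 1:]


transcript = transcribe(DNA)
edited = edit_c_to_u(transcript, 7)
differences = [k for k, (x, y) in enumerate(zip(transcript, edited)) if x != y]
print(transcript, edited, differences)
```

The edited RNA can no longer be deduced from the DNA sequence alone, which is the point the slide makes about the RNA not being identical to the gene.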
Variability of rDNA
                 types   loci   chr.   location
S. cerevisiae      1       1      1    internal
C. glabrata        1       2      2    subtelomeric
K. lactis          1       1      1    internal
D. hansenii        3       3      3    subtelomeric + 1 orphan unit
Y. lipolytica     >9       7      4    subtelomeric + several orphan units + 105 copies of 5S dispersed
[Figure: organization of the rDNA units (18S, 5.8S, 25S and 5S genes) in each species, including variant units.]
VARIABILITY OF NON-CODING RNA GENES IN YEASTS AND VERTEBRATES

                                    SACE   CAGL   KLLA   DEHA   YALI
Total tRNA genes                     274    207    162    205    510
(co-transcribed tRNA gene pairs)     (4)    (0)    (2)   (17)   (11)
Splicing RNA        U1                 1      1      1      1      2
                    U2                 1      1      1      1      1
                    U4                 1      1      1      2      1
                    U5                 1      1      1      1      1
                    U6                 1      1      1      1      1
Processing          U3                 1      1      1      2      3
                    RNase P            1      1      1      1      1
Protein transport (SRP)                1      1      1      1      2
Telomerase                             1      1      1     nd     nd

                                 Chicken   Human   synteny
Total tRNA genes                     280     496     33 %
Splicing RNA        U1                18     146
                    U2                 6      88
                    U4                 4     119
                    U5                 9      36
                    U6                15     821     20 %
                    U4atac             1       1
                    U6atac             4       5
                    U11                1       1
                    U12                1       2
Processing          U3                nd      nd
                    RNase P            1       1
Protein transport (SRP)                3      12
Telomerase                             1       1
Hillier et al. Nature (2004) 432: 695-716
The Génolevures Sequencing Consortium (GDR 2354 CNRS)
Génoscope and UMR8030 CNRS, Evry: J. Weissenbach, V. Anthouard, V. Barbe, L. Cattolico, S. Oztas, C. Scarpelli, P. Wincker
Génopole Institut Pasteur, Paris: C. Bouchier, L. Frangeul, L. Ma
LaBRI (UMR5800 CNRS), Centre de bioinformatique and IBGC (UMR5095 CNRS), Univ. Victor Segalen, Bordeaux: D. Sherman, E. Beyne, I. Lesur, M. Nikolski, H. Ferry-Dumazet, A. Groppi; A. de Daruvar, N. Goffard; M. Aigle, P. Durrens
Dynamique, évolution et expression des génomes de microorganismes (FRE2326 CNRS), Univ. Louis Pasteur, Strasbourg: J-L. Souciet, S. Potier, C. Bleykasten, J. de Montigny, L. Despons, N. Jauniaux, M-L. Straub, B. Wirth, M. Zeniou-Meyer
Unité de Génétique moléculaire des levures (URA2171 CNRS and Univ. P. M. Curie), Institut Pasteur, Paris: B. Dujon, J. Boyer, E. Fabre, C. Fairhead, G. Fischer, C. Hennequin, A. Kerrest, R. Koszul, I. Lafontaine, H. Muller, O. Ozier-Kalogeropoulos, S. Pellenz, G-F. Richard, E. Talla, F. Tekaia, A. Thierry
CLIB and Génétique moléculaire et cellulaire (UMR216 INRA and URA1925 CNRS), Institut National Agronomique, Grignon: C. Gaillardin, A. Babour, S. Barnay, J-M. Beckerich, S. Blanchin, A. Boisramé, S. Casaregola, P. Joyet, C. Neuvéglise, J-M. Nicaud, A. Suleau, D. Swennene
Institut de Génétique moléculaire (UMR8621 CNRS), Univ. Paris-Sud, Orsay: M. Bolotin-Fukuhara, F. Confanioleri, I. Zivanovic
Génétique des levures (UMR5122 CNRS), Univ. Claude Bernard, Lyon: M. Wésolowski-Louvel, M. Lemaire
IBMC (UPR9002 CNRS), Strasbourg: E. Westhof, R. Kachouri
Logiciels et banques de données, Institut Pasteur, Paris: B. Caudron
Biochimie et Génétique moléculaire, CEA, Saclay: C. Marck
Interactions macromoléculaires (URA2171 CNRS), Institut Pasteur, Paris: F. Hantraye