Nothing in (computational) biology makes sense except in the light of evolution
after Theodosius Dobzhansky (1970)
Comparative genomics and the new perspective on genome evolution
Conservation of gene order in bacterial species of the same genus
M. genitalium vs M. pneumoniae
[Figure: gene order comparison plot; one axis numbered 1 to 601, the other 1 to 401, in steps of 100]
Conservation of gene order in closely related bacterial genera
C. trachomatis vs C. pneumoniae
[Figure: gene order comparison plot; one axis numbered 1 to 1001, the other 1 to 801, in steps of 100]
Lack of gene order conservation - even in "closely related" bacteria of the same Proteobacterial subdivision
P. aeruginosa vs E. coli
[Figure: gene order comparison plot; paer axis numbered 1 to 5501, ecoli axis numbered 1 to 4201, in steps of 100; legend bins: <0.3, 0.3-0.8, 0.8-1.3, >1.3]
Genome Alignments - Method
Protein sets from complete genomes
BLAST cross-comparison
Pairwise Genome Alignment: local alignment algorithm Lamarck (gap opening penalty, gap extension penalty); statistics with Monte Carlo simulations
Table of Hits
Template-Anchored Genome Alignment
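The pairwise step above can be illustrated with a minimal sketch. This is not the Lamarck program named on the slide: it is a generic local alignment with affine gap penalties (Gotoh-style dynamic programming) applied to gene orders instead of residues, followed by a shuffling-based Monte Carlo estimate of significance. The function names, the scoring values, and the toy gene identifiers are all assumptions made for illustration; the hit table is modeled as a plain set of gene pairs.

```python
import random

NEG_INF = float("-inf")


def local_gene_alignment_score(genome_a, genome_b, hits, match=2.0,
                               mismatch=-1.0, gap_open=-2.0, gap_extend=-0.5):
    """Best local alignment score between two gene orders (affine gaps).

    genome_a, genome_b: lists of gene identifiers in chromosomal order.
    hits: set of (gene_a, gene_b) pairs treated as homologous (hypothetical
    stand-in for a table of BLAST hits).
    """
    n, m = len(genome_a), len(genome_b)
    # H: best score of an alignment ending at (i, j)
    # E: best score ending with a gap that skips a gene of genome_b
    # F: best score ending with a gap that skips a gene of genome_a
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            pair = (genome_a[i - 1], genome_b[j - 1])
            step = match if pair in hits else mismatch
            # Local alignment: scores never drop below zero
            H[i][j] = max(0.0, H[i - 1][j - 1] + step, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def monte_carlo_p_value(genome_a, genome_b, hits, trials=200, seed=0):
    """Empirical p-value: how often a shuffled gene order scores as well."""
    rng = random.Random(seed)
    observed = local_gene_alignment_score(genome_a, genome_b, hits)
    shuffled = list(genome_b)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if local_gene_alignment_score(genome_a, shuffled, hits) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly positive
    return (at_least + 1) / (trials + 1)


# Toy data (hypothetical): three homologous genes kept in the same order
toy_a = ["a1", "a2", "a3", "a4", "a5"]
toy_b = ["b9", "b1", "b2", "b3", "b8"]
toy_hits = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}
print(local_gene_alignment_score(toy_a, toy_b, toy_hits))
print(monte_carlo_p_value(toy_a, toy_b, toy_hits))
```

Shuffling one genome destroys gene order while keeping gene content, so the p-value reflects how unusual the observed conserved string is for that set of homologs.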
Genome Alignments - Statistics
Distribution of conserved gene string lengths
[Figure: y-axis 0.0 to 0.5; x-axis: string length, 2 to 20 and >20; series: cpneu-ctra, mjan-mthe, bsub-ecoli, drad-aero]
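A distribution of this kind can be tabulated from a list of string lengths. The sketch below is only an illustration: the function name and the input list are hypothetical, and the binning (lengths 2 to 20, then a single >20 bin, normalized to fractions) is inferred from the axis labels of the figure.

```python
from collections import Counter


def string_length_distribution(lengths, max_bin=20):
    """Fraction of conserved gene strings per length bin (2..max_bin, '>max_bin')."""
    overflow = f">{max_bin}"
    counts = Counter()
    for n in lengths:
        if n < 2:
            continue  # a single gene does not form a string
        counts[n if n <= max_bin else overflow] += 1
    total = sum(counts.values())
    bins = list(range(2, max_bin + 1)) + [overflow]
    return {b: (counts[b] / total if total else 0.0) for b in bins}


# Hypothetical lengths of conserved gene strings from one pairwise alignment
example = string_length_distribution([2, 2, 3, 25])
print(example[2], example[3], example[">20"])
```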
Genome Alignments - Statistics

Pairwise alignments    No. strings   No. genes   % in Gen1   % in Gen2
all homologs
  ecoli-hinf           138           566         13%         33%
  ecoli-bsub           89            322         8%          8%
  ecoli-mjan           10            30          1%          2%
probable orthologs
  ecoli-hinf           105           482         11%         28%
  ecoli-bsub           34            168         4%          4%
  ecoli-mjan           12            33          1%          2%
Genome Alignments - Statistics
Breakdown of genes in the genome
[Figure: bar chart with one bar per genome; y-axis 0 to 5000; categories: not in gene strings, in non-conserved gene strings (directons), in conserved gene strings]
Genome Alignments - Statistics
Fraction of the genome in conserved gene strings - from template-anchored alignments

Synechocystis sp.         5%   (minimum)
Aquifex aeolicus          10%
Archaeoglobus fulgidus    13%
Escherichia coli          14%
Treponema pallidum        17%
Thermotoga maritima       23%
Mycoplasma genitalium     24%  (maximum)
The three domains of life: the Tree
[Figure: phylogenetic tree. Bacteria: three proteobacterial subdivisions, Bacillus/Clostridium group, mycoplasmas, spirochetes, chlamydias, Aae, Tma, cyanobacteria, actinobacteria, Dra. Archaea: Hbs, Txx, Afu, Pxx, Mja, Mth, crenarchaea. Eukaryota.]
The three domains of life: relationships within clusters of orthologs (COGs)
[Figure: diagram of Eukaryotes, Archaea and Bacteria with COG counts; values as extracted: 729188, 21087, 111, 245, 315, 496. Panels: All COGs, Pan-archaeal COGs; legend: A+B+E, A+E, A+B, A]
Protein functions in the archaeo-eukaryotic and archaeo-bacterial subsets of the conserved archaeal core (310 COGs total)
[Figure: bar chart; y-axis 0 to 70; functional categories: information, metabolic, cellular, unknown; series: A+E-B, A+B-E]
Phylogenetic trees of aminoacyl-tRNA synthetases: HGT comes out loud and clear
[Figure, first tree, leaves: Eco E, Sce E cyt, Ath E cyt, Hsa Q, Sce Q, Eco Q, Hin Q, Mth E, Mja E, Pho E, Afu E, Sso E, Hpy E /1/, Hpy E /2/, Hin E, Tpa E, Bbu E, Cel E mit, Sce E mit, Ctr E, Mtu E, Ssp E, Mpn E, Mge E, Bsu E, Aae E]
[Figure, second tree, leaves: Mge W, Mpn W, Sce W mit, Cel W mit, Ssp W, Hpy W, Bsu W, Mtu W, Hin W, Eco W, Aae W, Ctr W, Tpa W, Bbu W, Afu W, Mth W, Mja W, Pho W, Sce W cyt, Hsa W cyt]
Eukaryotic programmed cell death - the bacterial contribution
Phylogenetic tree of the caspase-like protease superfamily
[Figure, leaves: Csp3_Hsa, PC_Ddis, ActD_Mxan, XF2779_Xfa, Mlr3300_Mlo, YOR197w_Sce, MC_Spo, MC5_At, MC4_At, MC2_At, MC_Hbr, MC3_At, MC1_At, MC_Rsph, MC_Geos, PK3_Scoe, Gingipain K_Pgin, Gingipain R_Pgin, CASP-like_Deha, Mlr3303_Mlo, Mlr2366_Mlo, Mll2372_Mlo, Mlr3463_Mlo, Mlr1804_Mlo, Mll5190_Mlo, PC_Hsa, PC_Cel, Csp1_Hsa, Csp2_Hsa, CED3_Cel, Csp10_Hsa, Csp9_Hsa]
Inconsistency Quotient
IQ = minimal number of events (Loss, Emergence, or HGT) required to reconcile a COG's phyletic pattern with the topology of the species tree
[Figure: example trees over species A, B, C, D showing patterns with IQ=1 and IQ=2; for IQ=2, two parsimonious scenarios (HGT, Loss)]
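A simplified way to see how such a count arises is Fitch parsimony on a presence/absence pattern. The sketch below is not the procedure used for the slides: it weights every presence/absence change equally, does not distinguish emergence from HGT, and does not count the original emergence of the gene. The tree encoding (nested tuples), the function name, and the four-species example are assumptions for illustration.

```python
def fitch_changes(tree, present):
    """Minimal number of presence/absence changes on a rooted binary tree.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    present: set of leaf names in which the COG is represented.
    Returns (possible_states_at_this_node, change_count).
    """
    if isinstance(tree, str):
        return {tree in present}, 0
    left, right = tree
    left_states, left_changes = fitch_changes(left, present)
    right_states, right_changes = fitch_changes(right, present)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra event needed here
        return shared, left_changes + right_changes
    # Children disagree: one gain or loss must occur on a branch below
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical species tree ((A,B),(C,D))
species_tree = (("A", "B"), ("C", "D"))
for pattern in [{"A", "B", "C", "D"}, {"A", "B"}, {"A", "C"}]:
    states, events = fitch_changes(species_tree, pattern)
    print(sorted(pattern), events)
```

A pattern confined to one clade needs a single event, whereas a pattern scattered across clades needs more, and several scenarios (for example two losses versus a gain plus a transfer) can reach the same minimum.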
Number of gene loss and HGT events in most parsimonious evolutionary scenarios for COGs (I values).
[Figure: histogram; x-axis: number of events (I), 0 to 12; y-axis: number of COGs, 0 to 700]
Conclusion
Comparative genomics shows that genome evolution is a highly dynamic process dominated by gene shuffling, lineage-specific gene loss, and horizontal gene transfer