Phylogenomics talk in 2000 at University of Maryland by J. Eisen


Uploaded by jonathan-eisen

Posted on 15-Jul-2015


TRANSCRIPT

Page 1

TIGR

Phylogenomics: Combining Evolutionary Reconstructions and Genome Analysis into a Single Composite Approach

[Title-slide figures: paralog dot plot (query ORF position vs. subject ORF position, 0 to 1,250,000); species tree of organisms with completed genomes (scale bar 0.05 changes; Archaea, Bacteria, Eukarya); A. rRNA tree of Bacterial and Archaeal Major Groups, B. Groups with Completed Genomes Highlighted; Figure 4, gene-order diagram of inversions around the terminus and around the origin, starting from the common ancestor of A and B (E. coli, V. cholerae); tree of the three V. cholerae photolyase homologs (MTHF type Class I CPD photolyases, 6-4 photolyases, blue light receptors, 8-HDF type CPD photolyases); A. ABC transporters and B. UvrA subfamily tree, showing a duplication in the UvrA family; histograms of number of species with high hits vs. frequency ("Papa Bear", "Mama Bear", "Baby Bear"; E. coli)]

Page 2

Topics of Discussion
• Introduction to phylogenomics
• Phylogenomics examples
  – Functional prediction
  – Not making functional predictions
  – Gene duplication
  – Genetic exchange within genomes
  – Gene loss
  – Specialization
  – Horizontal gene transfer

Page 3

“Nothing in biology makes sense except in the light of evolution.”

T. Dobzhansky (1973)

Page 4

Page 5

Uses of Evolutionary Analysis in Molecular Biology

• Identification of mutation patterns (e.g., ts/tv ratio)
• Amino-acid/nucleotide substitution patterns useful in structural studies (e.g., rRNA)
• Sequence searching matrices (e.g., PAM, BLOSUM)
• Motif analysis (e.g., Blocks)
• Functional predictions
• Classifying multigene families
• Evolutionary history puts other information into perspective (e.g., duplications, gene loss)
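The first item above, the transition/transversion (ts/tv) ratio, is simple to compute from a pairwise alignment. The sketch below is an illustration, not code from the talk: it assumes two already-aligned DNA strings, skips gap and ambiguity columns, and reports the raw, uncorrected ratio (no multiple-hit correction such as Kimura's).

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID = PURINES | PYRIMIDINES


def count_ts_tv(seq1: str, seq2: str) -> tuple[int, int]:
    """Count transitions and transversions between two aligned DNA sequences.

    Columns containing gaps or non-ACGT characters are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in VALID or b not in VALID:
            continue  # identical column, gap, or ambiguity code
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T
        else:
            tv += 1  # purine <-> pyrimidine
    return ts, tv


def ts_tv_ratio(seq1: str, seq2: str) -> float:
    """Raw (uncorrected) ts/tv ratio; infinite if no transversions were seen."""
    ts, tv = count_ts_tv(seq1, seq2)
    return ts / tv if tv else float("inf")


print(count_ts_tv("ACGTACGT", "GCGTATGA"))  # (2, 1)
print(ts_tv_ratio("ACGTACGT", "GCGTATGA"))  # 2.0
```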


Page 6

Evolutionary Studies Improve Most Aspects of Genome Analysis
• Phylogeny of species places comparative data in perspective
• Evolution of genes and gene families
  – Functional predictions
  – Identification of orthologs and paralogs
  – Species-specific mutation patterns
• Evolution of pathways
  – Convergence
  – Prediction of function
• Evolution of gene order/genome rearrangements
• Phylogenetic distribution patterns
• Identification of novel features

Page 7

Genome Information and Analysis Improve Studies of Evolution

• Complete genome information particularly useful
• Unbiased sampling
• More sequences of genes
• Presence/absence information needed to infer certain events (e.g., gene loss, duplication)
• Genome-wide mutation and substitution patterns (e.g., strand bias)
• Diversification and duplication
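One common way to look at genome-wide strand bias is GC skew, (G - C) / (G + C), computed in windows along a chromosome; its cumulative sum is often used as a rough indicator of where replication origin and terminus lie. The sketch below is a minimal illustration of that measure, assuming a plain sequence string and arbitrary window and step sizes chosen by the caller; it is not a method described in the talk.

```python
def gc_skew(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """(G - C) / (G + C) in sliding windows; returns (window start, skew) pairs."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows with no G or C get a skew of 0 rather than dividing by zero.
        result.append((start, (g - c) / (g + c) if g + c else 0.0))
    return result


def cumulative(values: list[float]) -> list[float]:
    """Running sum of window skews (cumulative GC skew)."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out


windows = gc_skew("GGGCCCGGGCCC", window=3, step=3)
print(windows)  # [(0, 1.0), (3, -1.0), (6, 1.0), (9, -1.0)]
print(cumulative([s for _, s in windows]))  # [1.0, 0.0, 1.0, 0.0]
```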

Page 8

Phylogenomic Analysis
• There are feedback loops between evolutionary and genome analysis, such that for many studies, genome and evolutionary analyses are interdependent.

• Therefore, I have proposed that they be combined into a single composite approach, which I refer to as phylogenomics.

• Phylogenomics involves combining evolutionary reconstructions of genes, proteins, pathways, and species with analysis of complete genome sequences.

Page 9

Outline of Phylogenomics

[Flowchart linking: Database; Species tree; Gene trees; Presence/Absence; Congruence; Evolutionary distribution; Gene evolution events; F(x) (function) predictions; Pathway evolution; Phenotype predictions]


Page 10

Page 11

Uses of Phylogenomics I:

Functional Predictions

Page 12

Predicting Function

• Identification of motifs
• Homology/similarity-based methods
  – Highest hit
  – Top hits
  – Clusters of orthologous groups
  – HMM models
  – Structural threading and modeling
  – Evolutionary reconstructions


Page 13

Types of Molecular Homology

• Homologs: genes that are descended from a common ancestor (e.g., all globins)

• Orthologs: homologs that have diverged after speciation events (e.g., human and chimp β-globins)

• Paralogs: homologs that have diverged after gene duplication events (e.g., α and β globin).

• Xenologs: homologs that have diverged after lateral transfer events

• Positional homology: common ancestry of specific amino acid or nucleotide positions in different genes
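Given a rooted gene tree, orthologs and paralogs can be told apart by asking what kind of event the two genes' last common ancestor represents. The sketch below uses the "species-overlap" rule (if the two child subtrees share a species, the node is called a duplication; otherwise a speciation). It is an illustrative assumption, not the talk's procedure: trees are nested 2-tuples, leaf labels follow a hypothetical `gene_species` convention, the globin tree is made up to match the examples above, and xenologs (lateral transfer) are not detected.

```python
def species_of(leaf: str) -> str:
    """Hypothetical label convention: '<gene>_<species>'."""
    return leaf.rsplit("_", 1)[1]


def leaves(node):
    """Leaf labels under a node; leaves are strings, internal nodes 2-tuples."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def lca(node, a, b):
    """Smallest subtree containing both leaves a and b (assumed present)."""
    if isinstance(node, tuple):
        for child in node:
            if {a, b} <= set(leaves(child)):
                return lca(child, a, b)
    return node


def event_at(node) -> str:
    """Species-overlap rule: shared species below both children => duplication."""
    left, right = node
    left_sp = {species_of(x) for x in leaves(left)}
    right_sp = {species_of(x) for x in leaves(right)}
    return "duplication" if left_sp & right_sp else "speciation"


def relationship(tree, a: str, b: str) -> str:
    return "paralogs" if event_at(lca(tree, a, b)) == "duplication" else "orthologs"


# Hypothetical globin tree: an alpha/beta duplication followed by speciation.
GLOBINS = (("HBA_human", "HBA_chimp"), ("HBB_human", "HBB_chimp"))
print(relationship(GLOBINS, "HBB_human", "HBB_chimp"))  # orthologs
print(relationship(GLOBINS, "HBA_human", "HBB_human"))  # paralogs
```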

Page 14

Phylogenomic Analysis of the MutS Family of Proteins

• Published analyses
  – Eisen JA et al. 1997. Nature Medicine 3(10):1076-1078.
  – Eisen JA. 1998. Nucleic Acids Research 26(18):4291-4300.

Page 15

Page 16

BLAST Search of H. pylori “MutS”

Sequences producing significant alignments:         Score (bits)   E Value
sp|P73625|MUTS_SYNY3 DNA MISMATCH REPAIR PROTEIN    117            3e-25
sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN    69             1e-10
sp|P44834|MUTS_HAEIN DNA MISMATCH REPAIR PROTEIN    64             3e-09
sp|P10339|MUTS_SALTY DNA MISMATCH REPAIR PROTEIN    62             2e-08
sp|O66652|MUTS_AQUAE DNA MISMATCH REPAIR PROTEIN    57             4e-07
sp|P23909|MUTS_ECOLI DNA MISMATCH REPAIR PROTEIN    57             4e-07

• BLAST search pulls up Synechocystis sp. MutS#2 with a much more significant E-value (3e-25) than the other MutS homologs
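To make the "highest hit" and "top hits" approaches listed earlier concrete, the sketch below applies both to the hit list on this slide. The functions are illustrative assumptions (a real pipeline would parse BLAST output rather than hard-code it), and the tie-breaking rule for top hits is arbitrary. Both methods label the H. pylori gene a "DNA mismatch repair protein", because that is what the database annotations say; the tree analysis in the following slides instead places it, and its best hit, in the MutS2 subfamily.

```python
from collections import Counter

# (subject, bit score, database annotation), copied from the hit list above.
HITS = [
    ("MUTS_SYNY3", 117, "DNA mismatch repair protein"),
    ("MUTS_THEMA", 69, "DNA mismatch repair protein"),
    ("MUTS_HAEIN", 64, "DNA mismatch repair protein"),
    ("MUTS_SALTY", 62, "DNA mismatch repair protein"),
    ("MUTS_AQUAE", 57, "DNA mismatch repair protein"),
    ("MUTS_ECOLI", 57, "DNA mismatch repair protein"),
]


def highest_hit(hits):
    """'Highest hit': copy the annotation of the single best-scoring hit."""
    if not hits:
        return None
    return max(hits, key=lambda h: h[1])[2]


def top_hits(hits, k=5):
    """'Top hits': majority annotation among the k best hits; None on a tie."""
    best = sorted(hits, key=lambda h: h[1], reverse=True)[:k]
    ranked = Counter(h[2] for h in best).most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


print(highest_hit(HITS))  # DNA mismatch repair protein
print(top_hits(HITS))     # DNA mismatch repair protein
```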

Page 17

H. pylori and MutS
• Prior to this genome, all species that encoded a MutS homolog also encoded a MutL homolog

• Experimental studies have shown that MutS and MutL always work together in mismatch repair

• Problem: what do we conclude about H. pylori mismatch repair?

Page 18

Phylogenetic Tree of MutS Family

[Tree figure of MutS homologs from bacteria, archaea, and eukaryotes (e.g., Aquae, Trepa, Fly, Xenla, Human, Yeast, Arath, Ecoli, Metth, Helpy)]

Page 19

MutS Subfamilies

[Same tree with subfamilies labeled: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2]

Page 20

MutS Subfamilies

(MMR = mismatch repair)
• MutS1: Bacterial - MMR
• MSH1: Eukaryotic - mitochondrial MMR
• MSH2: Eukaryotic - all MMR in nucleus
• MSH3: Eukaryotic - loop MMR in nucleus
• MSH6: Eukaryotic - base:base MMR in nucleus

• MutS2: Bacterial - function unknown
• MSH4: Eukaryotic - meiotic crossing-over
• MSH5: Eukaryotic - meiotic crossing-over

Page 21

Overlaying Functions onto Tree

[MutS family tree with subfamilies labeled: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2]

Page 22

Functional Prediction Using Tree

[MutS family tree with functions overlaid on subfamilies:
MSH1 - Repair in mitochondria
MSH3 - Repair of loops in nucleus
MSH6 - Repair of mismatches in nucleus
MutS1 - Repair of loops and mismatches
MSH4 - Meiotic crossing-over
MSH5 - Meiotic crossing-over
MutS2 - Unknown functions
MSH2 - Repair of loops and mismatches in nucleus]

Page 23

Table 3. Presence of MutS Homologs in Complete Genome Sequences

Species                                          # of MutS homologs   Which subfamilies?   MutL homologs

Bacteria
  Escherichia coli K12                           1                    MutS1                1
  Haemophilus influenzae Rd KW20                 1                    MutS1                1
  Neisseria gonorrhoeae                          1                    MutS1                1
  Helicobacter pylori 26695                      1                    MutS2                -
  Mycoplasma genitalium G-37                     -                    -                    -
  Mycoplasma pneumoniae M129                     -                    -                    -
  Bacillus subtilis 168                          2                    MutS1, MutS2         1
  Streptococcus pyogenes                         2                    MutS1, MutS2         1
  Mycobacterium tuberculosis                     -                    -                    -
  Synechocystis sp. PCC6803                      2                    MutS1, MutS2         1
  Treponema pallidum Nichols                     1                    MutS1                1
  Borrelia burgdorferi B31                       2                    MutS1, MutS2         1
  Aquifex aeolicus                               2                    MutS1, MutS2         1
  Deinococcus radiodurans R1                     2                    MutS1, MutS2         1

Archaea
  Archaeoglobus fulgidus VC-16, DSM4304          -                    -                    -
  Methanococcus jannaschii DSM 2661              -                    -                    -
  Methanobacterium thermoautotrophicum ΔH        1                    MutS2                -

Eukaryotes
  Saccharomyces cerevisiae                       6                    MSH1-6               3+
  Homo sapiens                                   5                    MSH2-6               3+
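The H. pylori puzzle can be checked directly against Table 3. The sketch below transcribes the table (species names without strain designations; "-" entered as 0 and the eukaryotic "3+" MutL entries entered as 3, which is an assumption about how to encode them) and asks which genomes encode a MutS homolog but no MutL homolog. In this table, only genomes whose sole MutS homolog is MutS2 fall into that group, while every genome with MutS1 or a eukaryotic MSH gene also has MutL.

```python
# (species, MutS subfamilies found, number of MutL homologs), transcribed from
# Table 3; "-" is entered as 0 and the eukaryotic "3+" MutL entries as 3.
TABLE3 = [
    ("Escherichia coli", ["MutS1"], 1),
    ("Haemophilus influenzae", ["MutS1"], 1),
    ("Neisseria gonorrhoeae", ["MutS1"], 1),
    ("Helicobacter pylori", ["MutS2"], 0),
    ("Mycoplasma genitalium", [], 0),
    ("Mycoplasma pneumoniae", [], 0),
    ("Bacillus subtilis", ["MutS1", "MutS2"], 1),
    ("Streptococcus pyogenes", ["MutS1", "MutS2"], 1),
    ("Mycobacterium tuberculosis", [], 0),
    ("Synechocystis sp.", ["MutS1", "MutS2"], 1),
    ("Treponema pallidum", ["MutS1"], 1),
    ("Borrelia burgdorferi", ["MutS1", "MutS2"], 1),
    ("Aquifex aeolicus", ["MutS1", "MutS2"], 1),
    ("Deinococcus radiodurans", ["MutS1", "MutS2"], 1),
    ("Archaeoglobus fulgidus", [], 0),
    ("Methanococcus jannaschii", [], 0),
    ("Methanobacterium thermoautotrophicum", ["MutS2"], 0),
    ("Saccharomyces cerevisiae", ["MSH1", "MSH2", "MSH3", "MSH4", "MSH5", "MSH6"], 3),
    ("Homo sapiens", ["MSH2", "MSH3", "MSH4", "MSH5", "MSH6"], 3),
]


def muts_without_mutl(table):
    """Species that encode at least one MutS homolog but no MutL homolog."""
    return [(sp, subfams) for sp, subfams, n_mutl in table if subfams and n_mutl == 0]


# Every genome with a MutS1 or eukaryotic MSH gene also has a MutL homolog ...
repair_type = [n for _, s, n in TABLE3 if any(x == "MutS1" or x.startswith("MSH") for x in s)]
print(all(n > 0 for n in repair_type))  # True
# ... whereas the MutS-without-MutL genomes contain only MutS2.
print(muts_without_mutl(TABLE3))
```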

Page 24

Why Was the MutS2 Family Missed?

BLAST Search of Syn. sp. MutS#2

Sequences producing significant alignments:              Score (bits)   E Value
sp|Q56239|MUTS_THETH DNA MISMATCH REPAIR PROTEIN MUT     91             3e-17
sp|P26359|SWI4_SCHPO MATING-TYPE SWITCHING PROTEIN       87             4e-16
sp|P27345|MUTS_AZOVI DNA MISMATCH REPAIR PROTEIN MUTS    83             1e-14
sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN MUTS    81             3e-14
sp|Q56215|MUTS_THEAQ DNA MISMATCH REPAIR PROTEIN MUTS    81             4e-14
sp|P10564|HEXA_STRPN DNA MISMATCH REPAIR PROTEIN HEXA    80             5e-14

• BLAST search pulls up standard MutS genes, but only with moderate E-values (best hit 3e-17)

Page 25

Problems with Similarity Based Functional Prediction

• Prone to database error propagation.
• Cannot identify orthologous groups reliably.
• Perform poorly in cases of evolutionary rate variation and non-hierarchical trees (similarity will not reflect evolutionary relationships in these cases).
• May be misled by modular proteins or large insertion/deletion events.
• Are not set up to deal with expanding data sets.


Page 26

Evolutionary Rate Variation

[Tree figure with tips labeled 1-6]
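Why rate variation misleads similarity searches can be shown with three leaves. In the hypothetical tree below (branch lengths are invented for illustration), leaves 1 and 2 are sisters, but leaf 2 evolves much faster. Measuring similarity as path length through the tree, the closest match to leaf 1 is leaf 3, so a "top hit" would point to a gene that is not its closest relative.

```python
from collections import defaultdict

# Hypothetical tree: leaves 1 and 2 are sisters, but leaf 2 evolves much faster.
#        root
#       /    \
#      X      3     edges: 1-X 0.05, 2-X 0.80, X-root 0.05, 3-root 0.10
#     / \
#    1   2
EDGES = [("1", "X", 0.05), ("2", "X", 0.80), ("X", "root", 0.05), ("3", "root", 0.10)]


def path_length(edges, a, b):
    """Sum of branch lengths on the unique path between nodes a and b."""
    graph = defaultdict(list)
    for u, v, w in edges:
        graph[u].append((v, w))
        graph[v].append((u, w))
    stack = [(a, None, 0.0)]  # depth-first walk; a tree has exactly one path
    while stack:
        node, parent, dist = stack.pop()
        if node == b:
            return dist
        for nxt, w in graph[node]:
            if nxt != parent:
                stack.append((nxt, node, dist + w))
    raise KeyError(f"no path from {a} to {b}")


def most_similar(edges, query, candidates):
    """Candidate with the smallest path length (a stand-in for 'top BLAST hit')."""
    return min(candidates, key=lambda c: path_length(edges, query, c))


print(round(path_length(EDGES, "1", "2"), 2))  # 0.85
print(round(path_length(EDGES, "1", "3"), 2))  # 0.2
print(most_similar(EDGES, "1", ["2", "3"]))    # 3, although 2 is the sister of 1
```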

Page 27

Rate Variation and Duplication

[Tree figure: genes 1A-3A and 1B-3B from Species 1, 2, and 3, with the duplication marked]

Page 28

Evolutionary Method

PHYLOGENETIC PREDICTION OF GENE FUNCTION
• Choose gene(s) of interest
• Identify homologs
• Align sequences
• Calculate gene tree
• Overlay known functions onto tree
• Infer likely function of gene(s) of interest

[Figure: Examples A and B compare the actual evolution (assumed to be unknown) of genes 1A-3A and 1B-3B in Species 1-3 with the calculated gene trees; possible duplications are marked, and one prediction is labeled "Ambiguous".]
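The last two steps of the method above (overlay known functions, infer the likely function) can be sketched as a walk up the gene tree from the query. This is a simplified illustration under stated assumptions, not the published procedure: trees are nested 2-tuples, leaf labels use a hypothetical `subfamily_species` convention, a node is treated as a duplication when both children contain the same species, and the prediction is reported as "ambiguous" when reaching characterized genes would require crossing a duplication or when the characterized genes disagree. The toy tree and the known functions are invented for the example.

```python
def species_of(leaf):
    # Hypothetical label convention: "<subfamily>_<species>".
    return leaf.rsplit("_", 1)[1]


def leaves(node):
    return [node] if isinstance(node, str) else [x for c in node for x in leaves(c)]


def path_to(node, target, path=()):
    """Internal nodes from the root down to the leaf `target`, or None."""
    if node == target:
        return list(path)
    if isinstance(node, tuple):
        for child in node:
            found = path_to(child, target, path + (node,))
            if found is not None:
                return found
    return None


def is_duplication(node):
    left, right = node
    return bool({species_of(x) for x in leaves(left)} & {species_of(x) for x in leaves(right)})


def predict_function(tree, query, known):
    """Climb from the query; stop at the first clade with characterized genes."""
    ancestors = path_to(tree, query)
    if ancestors is None:
        raise KeyError(query)
    for clade in reversed(ancestors):
        if is_duplication(clade):
            return "ambiguous"  # characterized genes lie across a duplication
        functions = {known[x] for x in leaves(clade) if x in known}
        if len(functions) == 1:
            return functions.pop()
        if functions:
            return "ambiguous"  # conflicting functions within the clade
    return None


TREE = ((("MutS1_Ecoli", "MutS1_Neigo"), "MutS1_Bacsu"), ("MutS2_Bacsu", "MutS2_Helpy"))
KNOWN = {"MutS1_Ecoli": "mismatch repair", "MutS1_Bacsu": "mismatch repair"}
print(predict_function(TREE, "MutS1_Neigo", KNOWN))  # mismatch repair
print(predict_function(TREE, "MutS2_Helpy", KNOWN))  # ambiguous
```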

Page 29

[Figure, panels A-D: MutS family trees labeled by gene and species (e.g., MutS.Aquae, MSH2.Human, MSH6.Yeast, MutS2.Metth) and by subfamily (MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2); functions overlaid on subfamilies:
MutS1 - All MMR (Bacteria)
MSH1 - MMR in mitochondria
MSH3 - MMR of large loops in nucleus
MSH6 - MMR of mismatches and small loops in nucleus
MSH2 - All MMR in nucleus
MSH4 - Segregation & crossover
MSH5 - Segregation & crossover]

Page 30

Evolution of the SNF2 Family of Proteins

[Tree figure of SNF2 family proteins (e.g., ETL1_M.m, MOT1_S.c, ERCC6_H.s, RAD54_S.c, RAD16_S.c, ISWI_D.m, BRG1_H.s, SNF2_S.c); subfamilies labeled SNF2, SNF2L, CHD1, ETL1, CSB, RAD54, RAD16, LODE]

Page 31

Novel RNA Polymerase in A. thaliana

[Tree figure of RNA polymerase subunits (RpoB/RPB2 homologs) from archaea, eukaryotes, viruses, bacteria, and plastids, with bootstrap values; groups labeled Archaeal, IV, II, III, I, Viral, Bacterial (RpoB), and Plastid (RpoBs)]

Page 32

Novel Large Subunit Rubisco in Chlorobium tepidum

[Figure: Rubisco Large Subunit Phylogeny, with bootstrap values; sequences from Agathis, Araucaria, Venturiella, Leucobryum, Mougeotia, Anabaena, Thife, Thiin, Metja, Pyrho, Pyrab, Pyrococcus karaensis, Arcfu, Bacsu, and C. tepidum (Chlte ORF02314); groups labeled Type I and Type X]

Page 33

Uses of Phylogenomics II:

Knowing When Not to Predict Functions

Page 34

Deinococcus radiodurans

Page 35

DNA Repair Genes in D. radiodurans Complete Genome

Process                         Genes in D. radiodurans
Nucleotide excision repair      UvrABCD, UvrA2
Base excision repair            AlkA, Ung, Ung2, GT, MutM, MutY-Nths, MPG
AP endonuclease                 Xth
Mismatch excision repair        MutS, MutL
Recombination
  Initiation                    RecFJNRQ, SbcCD, RecD
  Recombinase                   RecA
  Migration and resolution      RuvABC, RecG
Replication                     PolA, PolC, PolX, phage Pol
Ligation                        DnlJ
dNTP pools, cleanup             MutTs, RRase
Other                           LexA, RadA, HepA, UVDE, MutS2

Page 36

Recombination Genes in Genomes

Pathway / Protein name(s)   |--------Bacteria--------| |--Archaea--| Euks

Initiation
  RecBCD pathway
    RecB             + + - - - - - - + + - + - - - - - - - -
    RecC             + + - - - - - - + ±+ - ± - - - - - - - -
    RecD             + + - - ± - - - + ±+ - ++ - ± ±+ - - - - -
  RecF pathway
    RecF             + + + - + - - + + - + ± - - + - - ± ± ±
    RecJ             + + + + + - - + - + + + + + + - - - - -
    RecO             + + - - + - - + + - - - - - ± - - - - -
    RecR             + + + ±+ + - - + + - + + - + + - - - - -
    RecN             + + + + + - - + + - + - ± + + - - ± ± -
    RecQ             + + - - + - - + - - + - - - + - - - - + ++
  RecE pathway
    RecE/ExoVIII     + - - - - - - - - - - - - - - - - - - -
    RecT             + - - - + - - - - - - - - - - - - - - -
  SbcBCD pathway
    SbcB/ExoI        + + - - - - - - - - - - - - - - - - - -
    SbcC             + - - - + - - + - + + - + + + ± ± ± ± ± ±
    SbcD             + - - - + - - + - + + - + + + ± ± ± ± ± ±
  AddAB pathway
    AddA/RexA        - - + - + - - - - - + + - ± - - - - - -
    AddB/RexB        - - - - + - - - - - - - - - - - - - - -
  Rad52 pathway
    Rad52, Rad59     - - - - - - - - - - - - - - - - - - - ++ +
    Mre11/Rad32      ± - - - ± - - ± - ± ± - ± ± ± + + + + + +
    Rad50            ± - - - ± - - ± - ± ± - ± ± ± + + + ± + +
Recombinase
    RecA, Rad51      + + + + + + + + + + + + + + + + + + + ++ ++
Branch migration
    RuvA             + + + + + + + + + + + + + - + - - - - -
    RuvB             + + + + + + + + + + + + + - + - - - - -
    RecG             + + + + + - - + + + + - + + + - - - - -
Resolvases
    RuvC             + + + + - - - + + - + + + - + - - - - -
    RecG             + + + + + - - + + + + - + + + - - - - -
    Rus              + - - - - - - - ±+ - - - - ±+ - - - - - -
    CCE1             - - - - - - - - - - - - - - - - - - - +
Other recombination proteins
    Rad54            - - - - - - - - - - - - - - - - - - - + +
    Rad55            - - - - - - - - - - - - - - - - - - - + +
    Rad57            - - - - - - - - - - - - - - - - - - - + +
    Xrs2             - - - - - - - - - - - - - - - - - - - +

Page 37

Unusual Features of D. radiodurans DNA Repair Genes

Process                       Genes
Nucleotide excision repair    Two UvrAs
Base excision repair          Four MutY-Nths
Recombination                 RecD but not RecBC
Replication                   Four Pol genes
dNTP pools                    Many MutTs, two RRases
Other                         UVDE

Page 38

Problem:

The list of DNA repair gene homologs in the D. radiodurans genome is not significantly different from those of other bacterial genomes of similar size

Page 39

Gain and Loss of Repair Genes

[Figure: tree of Ecoli, Haein, Neigo, Helpy, Bacsu, Strpy, Mycge, Mycpn, Borbu, Trepa, Synsp (Bacteria); Metjn, Arcfu, Metth (Archaea); and Human, Yeast (Eukaryotes), with inferred gains (+) and losses (-) of DNA repair genes (e.g., +UvrABCD, +RecFJNOR, +RuvABC, -PhrI, -AlkA, -MutLS) and events marked d or t mapped onto branches; one annotation reads "from mitochondria"]
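Mapping gains and losses onto a species tree, as in this figure, can be done with parsimony on a presence/absence character. The sketch below uses Fitch parsimony, which is one possible choice (Dollo parsimony, which forbids regaining a lost gene, is a common alternative), and it weights gains and losses equally. The species tree and presence pattern are hypothetical; the top-down pass recomputes subtree sets for brevity, which is fine for small trees.

```python
def leaves(node):
    return [node] if isinstance(node, str) else [x for c in node for x in leaves(c)]


def fitch(node, states):
    """Bottom-up Fitch pass: (candidate state set, minimum number of changes)."""
    if isinstance(node, str):
        return {states[node]}, 0
    (left, lc), (right, rc) = fitch(node[0], states), fitch(node[1], states)
    common = left & right
    return (common, lc + rc) if common else (left | right, lc + rc + 1)


def map_events(node, states, parent=None, events=None):
    """Top-down pass: pick a state per node and record gains (0->1) and losses (1->0)."""
    if events is None:
        events = []
    candidates, _ = fitch(node, states)
    state = parent if parent in candidates else min(candidates)  # root ties -> absent
    if parent is not None and state != parent:
        events.append(("gain" if state == 1 else "loss", leaves(node)))
    if isinstance(node, tuple):
        for child in node:
            map_events(child, states, state, events)
    return events


# Hypothetical species tree and presence (1) / absence (0) of one repair gene.
TREE = ((("A", "B"), "C"), ("D", "E"))
PRESENT = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0}
print(fitch(TREE, PRESENT)[1])    # 1
print(map_events(TREE, PRESENT))  # [('gain', ['A', 'B'])]
```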

Page 40

Repair Studies in Different Species (determined by Medline searches as of 1998)

Humans          7028
E. coli         3926
S. cerevisiae   988
Drosophila      387
B. subtilis     284
S. pombe        116
Xenopus         56
C. elegans      25
A. thaliana     20
Methanogens     16
Haloferax       5
Giardia         0

Page 41

Uses of Phylogenomics III:

Gene Duplication

Page 42

Why Duplications Are Useful to Identify

• Allows division into orthologs and paralogs

• Aids functional predictions

• Recent duplications may be indicative of species-specific adaptations

• Helps identify mechanisms of duplication

• Can be used to study mutation processes in different parts of the genome

Page 43

Recent Duplications

Page 44

MutY-Nth

[Tree figure of the MutY-Nth family, including several D. radiodurans (DEIRA) genes such as ORF00829 and ORF02784, and homologs from other bacteria, archaea, and C. elegans]

Page 45

Expansion of MCP Family in V. cholerae

[Tree figure (labeled NJ) of methyl-accepting chemotaxis proteins from E. coli, B. subtilis, Synechocystis sp., H. pylori, C. jejuni, A. fulgidus, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans, M. tuberculosis, and V. cholerae, with a large group of V. cholerae genes (VC and VCA loci)]

Phosphate Transporters

[Figure: phosphate transporter homologs from ARCFU, SYNSP, THEMA, AQUAE, METJA, MCYTU (two), VIBCH, ECOLI and three D. radiodurans ORFs (ORF00198, ORFA00139, ORF00510)]

Levels of Paralogy Within a Genome

• All
– All members of a gene family are linked together.

• Top matches
– Only top-matching pairs are linked together. Therefore, in a large gene family, only the pair from the most recent duplication event is included.

• Recent
– Operational definition based on comparison to other species: only pairs that are more similar to each other than to selected other species are included.
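The three levels of paralogy can be made concrete with a small sketch. It assumes a hypothetical input: a dict of pairwise similarity scores between genes in one genome (higher means more similar) and, for each gene, its best score against a chosen set of other species. "Top matches" is interpreted here as reciprocal best hits, which is one possible reading of the slide rather than the original procedure.

```python
from itertools import combinations

def all_pairs(families):
    """'All': link every pair of genes within each gene family."""
    return {tuple(sorted(p)) for fam in families for p in combinations(fam, 2)}

def top_pairs(scores):
    """'Top matches', assumed here to mean reciprocal best hits within the genome."""
    best = {}  # gene -> (best partner, score)
    for (a, b), s in scores.items():
        for q, t in ((a, b), (b, a)):
            if q not in best or s > best[q][1]:
                best[q] = (t, s)
    return {tuple(sorted((q, t))) for q, (t, _) in best.items() if best[t][0] == q}

def recent_pairs(scores, best_other):
    """'Recent': keep pairs more similar to each other than either is to the other species."""
    return {tuple(sorted(p)) for p, s in scores.items()
            if s > best_other.get(p[0], 0) and s > best_other.get(p[1], 0)}

# Hypothetical three-member family: g1 and g2 come from the most recent duplication.
scores = {("g1", "g2"): 80, ("g1", "g3"): 40, ("g2", "g3"): 45}
best_other = {"g1": 60, "g2": 50, "g3": 70}  # best score of each gene to other species

print(sorted(all_pairs([["g1", "g2", "g3"]])))   # all three pairs
print(sorted(top_pairs(scores)))                 # [('g1', 'g2')]
print(sorted(recent_pairs(scores, best_other)))  # [('g1', 'g2')]
```

In this toy family the "All" level links every pair, while both the "Top" and "Recent" levels keep only g1-g2.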

C. pneumoniae Paralogs - All

[Figure: dot plot of paralogous ORF pairs, query ORF position vs. subject ORF position (both axes 0 to 1,250,000)]

C. pneumoniae Paralogs - Top

[Figure: dot plot of paralogous ORF pairs, query ORF position vs. subject ORF position (both axes 0 to 1,250,000)]

C. pneumoniae Paralogs - Recent

[Figure: dot plot of paralogous ORF pairs, query ORF position vs. subject ORF position (both axes 0 to 1,250,000)]

Uses of Phylogenomics IV:

Genetic Exchange within Genomes

Circular Maps

Uses of Phylogenomics V:

Gene Loss

Why Gene Loss is Useful to Identify

• Indicates that the gene is not absolutely required for survival

• Helps assess the likelihood of gene transfers

• Correlated loss of the same gene in different species may indicate a selective advantage to losing that gene

• Correlated loss of genes in a pathway indicates a conserved association among those genes

Example of Tracing Gene Loss

[Figure: presence or absence of a gene in each species (by species abbreviation), mapped onto a species tree spanning eukaryotes, archaea and bacteria, with the evolutionary origin of the gene and inferred losses marked]

Ancient Duplication in MutS Family

[Figure: MutS family tree (panels A and B) showing the gene duplication that separates the MutS1 and MutS2 subfamilies; species include E. coli, H. influenzae, N. gonorrhoeae, H. pylori, Synechocystis sp., B. subtilis, S. pyogenes, M. pneumoniae, M. genitalium, A. aeolicus, D. radiodurans, T. pallidum and B. burgdorferi]

Loss of MMR (mismatch repair)

• Lost in many pathogen species
• Mechanism of loss
– gene deletion (e.g., M. tuberculosis, H. pylori)
– frameshifts (e.g., N. meningitidis, S. pneumoniae)
– some species have evolved systems to turn MMR on and off depending on conditions (e.g., E. coli)

Need for Phylogenomics Example: Gene Duplication and Loss

• Genome analysis is required to determine the number of homologs in different species

• Evolutionary analysis is required to divide homologs into orthology groups and identify gene duplications

• Genome analysis is then required to determine the presence and absence of orthologs

• The loss of orthologs can then be traced onto the evolutionary tree of species
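Once presence and absence of orthologs is known, the last step can be sketched with Dollo parsimony, a technique substituted here for illustration: assume the gene arose once, at the last common ancestor of all species that carry it, and count each largest species clade that lacks it as one loss. The species tree and presence data below are hypothetical, and the sketch ignores lateral transfer, which can mimic loss.

```python
def leaves(tree):
    """Species names under a clade; a clade is a species name or a tuple of clades."""
    return [tree] if isinstance(tree, str) else [s for child in tree for s in leaves(child)]

def find_origin(tree, present):
    """Smallest clade containing every species that has the gene (inferred origin)."""
    for child in ([] if isinstance(tree, str) else tree):
        if present <= set(leaves(child)):
            return find_origin(child, present)
    return tree

def losses(clade, present):
    """Largest subclades with no copy of the gene; each one counts as one loss."""
    if not present & set(leaves(clade)):
        return [clade]
    if isinstance(clade, str):
        return []
    return [lost for child in clade for lost in losses(child, present)]

def trace_loss(tree, present):
    """Return (origin clade, list of clades where the gene was lost)."""
    if not present:
        raise ValueError("gene absent from every species")
    origin = find_origin(tree, set(present))
    return origin, losses(origin, set(present))

# Hypothetical species tree using abbreviation-style names.
species_tree = ((("EC", "HI"), "BS"), ("MJ", "MT"))

print(trace_loss(species_tree, {"EC", "BS"}))     # ((('EC', 'HI'), 'BS'), ['HI'])
print(trace_loss(species_tree, {"EC", "MJ"})[1])  # ['HI', 'BS', 'MT']
```

A gene found in only two distantly related species forces many losses under this model, which is exactly the situation where lateral transfer becomes the competing explanation.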

Uses of Phylogenomics VI:

Specialization

Circular Maps

Species Distribution of Homologs of D. radiodurans Genes

[Figure: histograms with axes "Number of Species With High Hits" and "Frequency"; panels labeled Papa Bear, Mama Bear and Baby Bear, plus an E. coli panel]

Specialized Genetic Elements (Chromosome II and Megaplasmid)

• Many two-component systems
• Nitrogen metabolism
• LexA
• Ribonucleotide reductase
• UvrA2
• Many transcription factors (e.g., HepA)
• Iron metabolism

Uses of Phylogenomics VII:

Genome Rearrangements

V. cholerae vs. E. coli All Hits

[Figure: dot plot; axes: V. cholerae coordinates (0 to 3,000,000) and E. coli coordinates (0 to 5,000,000)]

V. cholerae vs. E. coli Top Hits

[Figure: dot plot; axes: V. cholerae coordinates (0 to 3,000,000) and E. coli coordinates (0 to 5,000,000)]

V. cholerae vs. E. coli: Only if EC-Orf is Closest in All Genomes

[Figure: dot plot; axes: V. cholerae coordinates (0 to 3,000,000) and E. coli coordinates (0 to 5,000,000)]

V. cholerae vs. E. coli Proteins Top

[Figure: top protein matches plotted by V. cholerae ORF coordinates (axis 0 to 4,000,000)]

S. pneumoniae vs. S. pyogenes DNA F+R

[Figure: DNA dot plot, axes 0 to 2,000,000]

M. tuberculosis vs. M. leprae DNA

[Figure: DNA dot plot, axis 0 to 4,000,000]

Duplication and Gene Loss Model

[Figure: an ancestral gene set A-F is duplicated into A-F and A'-F'; different copies are then lost in each lineage, leaving A, C, D, F, A', B', E' in E. coli and B, C, D, F, A', B', D', E' in V. cholerae]

V. cholerae vs. E. coli Proteins Top

[Figure: top protein matches plotted by V. cholerae ORF coordinates (axis 0 to 4,000,000)]

C. trachomatis vs C. pneumoniae Dot Plot

[Figure: dot plot of C. trachomatis MoPn vs. C. pneumoniae AR39, with origin and termination regions marked]

[Figure 4: gene orders (genes 1 to 32) in the common ancestor of A and B and in descendant genomes A1-A3 and B1-B3, showing inversions around the terminus and inversions around the origin (*)]

Uses of Phylogenomics VIII:

Horizontal Gene Transfer and Species Evolution

Vertical Inheritance

Examples of Horizontal Transfers

• Antibiotic resistance genes on plasmids
• Insertion sequences
• Pathogenicity islands
• Toxin resistance genes on plasmids
• Agrobacterium Ti plasmid
• Viruses and viroids
• Organelle-to-nucleus transfers

Why Gene Transfers Are Useful to Identify

• Laterally transferred genes are frequently involved in environmental adaptation and/or pathogenicity

• Helps identify transposons, integrons, and other vectors of gene transfer

• Helps identify species associations in the environment

Steps in Lateral Gene Transfer

[Figure: diagram of the steps in lateral gene transfer, numbered 1, 2, 3-5 and 6, across panels A-D]

How to Infer Gene Transfers

• Unusual distribution patterns

• Unusual nucleotide composition

• High sequence similarity to supposedly distantly related species

• Unusual gene trees

• Observe transfer events
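The second criterion, unusual nucleotide composition, is often screened with a sliding-window scan. The sketch below is one simple version, not the method used in the talk: it computes G+C content in overlapping windows and flags windows whose content lies more than `z_cut` standard deviations from the genome-wide mean. The window size, step and cutoff are arbitrary illustrative defaults, and a flagged window is only a candidate, since composition also varies for reasons other than transfer.

```python
from statistics import mean, pstdev

def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def atypical_windows(genome, window=1000, step=500, z_cut=2.0):
    """Return (start, GC fraction) for windows whose GC content is a z-score outlier."""
    starts = list(range(0, len(genome) - window + 1, step))
    gcs = [gc_fraction(genome[s:s + window]) for s in starts]
    if not gcs:
        return []  # genome shorter than one window
    mu, sd = mean(gcs), pstdev(gcs)
    if sd == 0:
        return []  # perfectly uniform composition: nothing stands out
    return [(s, round(g, 3)) for s, g in zip(starts, gcs) if abs(g - mu) / sd > z_cut]

# Synthetic 21 kb "genome": 50% GC background with a 1 kb GC-rich insert at 10,000.
genome = "ATGC" * 2500 + "GGCC" * 250 + "ATGC" * 2500
print(atypical_windows(genome))  # [(9500, 0.75), (10000, 1.0), (10500, 0.75)]
```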

E. coli and S. typhimurium Transfer

[Figure: "Old Model" and "New Model" panels, each relating E. coli and S. typhimurium]

Archaeal genes in bacterial genomes*

Bacterial species       Best hits to archaeal genes
Thermotoga maritima     451 (24%)
Aquifex aeolicus        246 (16%)
Synechocystis sp.       126 (4%)
Borrelia burgdorferi    45 (3.6%)
Escherichia coli        99 (2.3%)

* Threshold: 10^-5 over 60% of sequence
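A tally like the one in the table could be computed along these lines. This is a sketch under stated assumptions: hits arrive as hypothetical `Hit` records from a similarity search, the 10^-5 threshold is treated as an E-value, "60% of sequence" is read as alignment coverage of the query, the best hit is the lowest E-value passing both filters, and the percentage is taken over all ORFs in the genome.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    query: str       # ORF in the bacterial genome being scanned
    domain: str      # domain of the matching sequence: "Archaea", "Bacteria" or "Eukarya"
    evalue: float
    coverage: float  # fraction of the query covered by the alignment (assumed meaning)

def archaeal_best_hits(hits, total_orfs, max_evalue=1e-5, min_coverage=0.6):
    """Count ORFs whose best passing hit is archaeal, and that count as % of all ORFs."""
    best = {}
    for h in hits:
        if h.evalue > max_evalue or h.coverage < min_coverage:
            continue  # fails the significance or coverage threshold
        if h.query not in best or h.evalue < best[h.query].evalue:
            best[h.query] = h
    n = sum(1 for h in best.values() if h.domain == "Archaea")
    return n, round(100.0 * n / total_orfs, 1)

# Hypothetical hits for a genome with 4 ORFs.
hits = [
    Hit("orf1", "Archaea", 1e-30, 0.90),
    Hit("orf1", "Bacteria", 1e-20, 0.95),
    Hit("orf2", "Bacteria", 1e-40, 0.80),
    Hit("orf2", "Archaea", 1e-50, 0.30),  # better E-value but too little coverage
    Hit("orf3", "Archaea", 1e-3, 0.90),   # not significant at 1e-5
]
print(archaeal_best_hits(hits, total_orfs=4))  # (1, 25.0)
```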

Evidence for lateral gene transfer in Thermotoga

1. 81 archaeal-like genes are clustered in 15 regions ranging in size from ~4 to 20 kb; many share conserved gene order with their archaeal counterparts.

2. Many of the archaeal-like genes lie in regions with a significantly different base composition from the rest of the chromosome.

3. Some of these regions are associated with a 30 bp repeat structure found only in thermophiles.

4. Initial phylogenetic analyses of some of these genes lend support to lateral gene transfer.

Region TM0987 - TM1003 (21 kb Archaea-like stretch)

[Figure: map of Thermotoga ORFs TM0987 to TM1003 with a legend distinguishing Thermotoga ORF, Archaea homolog, Bacterial homolog and Eukaryote homolog; percentage values from 48% to 79% are shown for individual ORFs, and a transposon and a regulatory protein are marked]

Best Matches to Prokaryotes

[Figure: best matches (0 to 700) plotted against ORFs in the target genome (500 to 4,500), with series for CAUCR, BACSU, ECOLI, MYCTU and SYNSP]

A. thaliana T1E2.8 is a Chloroplast-Derived HSP60

[Figure: HSP60 tree placing ARATH T1E2.8 (marked with asterisks) among homologs from bacteria, archaea, eukaryotes and chloroplasts; clades labeled Eukarya, Archaea, Bacteria and Cyano/Cpst]

Organellar HSP60s

[Figure: HSP60 tree with clades labeled Mitochondrial Forms, α-Proteobacteria, Cyanobacteria and Plastid Forms; sequences from DROME, ARATH, YEAST, CAUCR, RICPR, ECOLI, NEIME, AQUAE, CHLPN, DEIRA, BACSU, SYNSP, MCYTU, THEMA, BBUR, TREPA, PORGI, CHLTE and HELPY]

Evolution of Chromosome Partitioning Proteins (ParA)

[Figure: ParA phylogeny with groups labeled Chromosomal, Plasmid and Phage, Chlamydial, Inc, Borrelia Plasmids, Archaea and Misc; the V. cholerae homologs (2603.Vibch and ORFA00900) are marked with asterisks]

Horizontal Gene Transfer II

Reconciling a Tree of Life in the Context of Lateral Gene Transfer

rRNA Tree of Complete Genomes

[Figure: rRNA tree (scale bar: 0.05 changes) of species with complete genomes, grouped into Archaea, Bacteria and Eukarya: Mycobacterium tuberculosis, Bacillus subtilis, Synechocystis sp., Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, Pyrococcus horikoshii, Methanococcus jannaschii, Aeropyrum pernix, Aquifex aeolicus, Thermotoga maritima, Deinococcus radiodurans, Treponema pallidum, Borrelia burgdorferi, Helicobacter pylori, Campylobacter jejuni, Neisseria meningitidis, Escherichia coli, Vibrio cholerae, Haemophilus influenzae, Rickettsia prowazekii, Mycoplasma pneumoniae, Mycoplasma genitalium, Chlamydia trachomatis, Chlamydia pneumoniae]

Whole Genome Phylogeny

rRNA vs. Whole Genome Trees

[Figure: rRNA tree (scale bar: 0.05 changes) compared with a whole-genome tree for the same completely sequenced species from Archaea, Bacteria and Eukarya]

Outline of Phylogenomics

[Figure: flow diagram connecting Database, Species tree, Presence/Absence, Gene trees, Congruence, Evol. Distribution, Gene Evolution Events, F(x) Predictions, Phenotype Predictions and Pathway Evolution]

Evolutionary Genome Scanning
• Distribution patterns/phylogenetic profiles
• Patterns of evolution (ds/dn, correlations, constraints)
• Lateral gene transfers (organellar genes, pathogenicity islands)
• Subdividing gene families
• Functional predictions (gene trees, PG profiles)
• Gene duplications
• Gene loss
• Specialization
• Comparing close relatives
• Species evolution
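Phylogenetic profiles, the first item in this list, record which genomes carry a homolog of each gene; genes with matching profiles are candidates for functional linkage, for example membership in the same pathway. The sketch below uses Hamming distance between presence/absence vectors, one simple choice among several, and the genomes and gene names are hypothetical.

```python
def profile(present_in, genomes):
    """Presence (1) / absence (0) vector of a gene across an ordered list of genomes."""
    return tuple(int(g in present_in) for g in genomes)

def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))

def linked_pairs(profiles, max_dist=0):
    """Gene pairs whose profiles differ in at most max_dist genomes."""
    names = sorted(profiles)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if hamming(profiles[a], profiles[b]) <= max_dist]

genomes = ["g1", "g2", "g3", "g4", "g5"]  # hypothetical genomes
profiles = {
    "geneA": profile({"g1", "g2", "g5"}, genomes),
    "geneB": profile({"g1", "g2", "g5"}, genomes),
    "geneC": profile({"g1", "g3"}, genomes),
}
print(linked_pairs(profiles))  # [('geneA', 'geneB')]
```

Raising `max_dist` tolerates a few mismatches, at the cost of more spurious links when few genomes are available.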

Evolutionary Diversity Still Poorly Represented in Complete Genomes

[Figure: A. rRNA tree of Bacterial and Archaeal Major Groups. B. Groups with Completed Genomes Highlighted]

True Phylogenetic Methods Work Best

[Figure: MutS family trees (MutS, MutS2, MSH1 to MSH6 and related sequences) built with UPGMA and with neighbor-joining, shown side by side]
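Why UPGMA can mislead is easiest to see on a toy example, which is invented here and is not the MutS data. Take the true unrooted tree ((A,B),(C,D)) where A and C sit on branches of length 1, B and D on branches of length 4, and the internal branch has length 1, so rates are unequal. UPGMA's first join is simply the closest pair, whereas neighbor-joining picks the pair minimizing Q(i,j) = (n-2)d(i,j) - r_i - r_j, where r_i is the sum of distances from i to all other taxa. The sketch compares only the first join of each method.

```python
from itertools import combinations

# Path-length distances on the toy tree described above.
d = {frozenset(k): v for k, v in {
    ("A", "B"): 5, ("C", "D"): 5, ("A", "C"): 3,
    ("A", "D"): 6, ("B", "C"): 6, ("B", "D"): 9}.items()}
taxa = ["A", "B", "C", "D"]

def upgma_first_join(d, taxa):
    """UPGMA joins the pair with the smallest raw distance."""
    return min(combinations(taxa, 2), key=lambda p: d[frozenset(p)])

def nj_first_join(d, taxa):
    """Neighbor-joining joins the pair minimising Q(i,j) = (n-2)d(i,j) - r_i - r_j."""
    n = len(taxa)
    r = {i: sum(d[frozenset((i, k))] for k in taxa if k != i) for i in taxa}
    return min(combinations(taxa, 2),
               key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])

print(upgma_first_join(d, taxa))  # ('A', 'C'): not true neighbours
print(nj_first_join(d, taxa))     # ('A', 'B'): true neighbours
```

Here (A,B) and (C,D) tie for the lowest Q, and `min` returns the first, so either answer would be a correct neighbor pair.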

Acknowledgements

• Genome duplications: S. Salzberg, J. Heidelberg, O. White, A. Stoltzfus, J. Peterson

• Genome sequences and analysis: J. Heidelberg, T. Read, H. Tettelin, K. Nelson, J. Peterson, R. Fleischmann, D. Bryant

• Horizontal transfers: K. Nelson, W. F. Doolittle

• TIGR: C. Fraser, J. Venter, M-I. Benito, S. Kaul, Seqcore

• $$$: DOE, NSF, NIH, ONR

Other people

Mom and Dad
S. Karlin
M. Feldman
A. M. Campbell
R. Fernald
R. Shafer
D. Ackerly
D. Goldstein
M. Eisen
J. Courcelle
R. Myers
C. M. Cavanaugh
P. Hanawalt
J. Heidelberg
T. Read
S. Kaul
M-I Benito
J. C. Venter
C. Fraser
S. Salzberg
O. White
K. Nelson
H. Tettelin

$$$: NSF, ONR, DOE, NIH

Uses of Phylogenomics IX:

Evolution Within Species

M. tuberculosis strain phylogeny (Indels)

Musser-Type Evolution (Indel Phylogeny)

[Figure: indel-based phylogeny of M. tuberculosis strains; leaves are isolate labels (e.g., 98a, 107a, 43a) together with CD, C15, N1a and H37Rv]

Consistency Indices (Indel Phylogeny)

[Figure: consistency index (CI, 0.00 to 1.00) for each character (1 to 20), calculated over stored trees; maximum, average and minimum values shown]
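The consistency index of a character is the minimum number of changes it could require on any tree (for a two-state indel character, one) divided by the number of changes it actually requires on the tree in question; values below 1 signal homoplasy. A minimal sketch, assuming rooted binary trees written as nested tuples and using Fitch parsimony to count changes; the strains, trees and character states are hypothetical. Taking the minimum, mean and maximum over several stored trees mirrors the three curves in the figure.

```python
from statistics import mean

def fitch(tree, states):
    """Fitch parsimony on a rooted binary tree: (possible root states, number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (left, lc), (right, rc) = (fitch(child, states) for child in tree)
    common = left & right
    return (common, lc + rc) if common else (left | right, lc + rc + 1)

def consistency_index(tree, states):
    """CI = minimum possible changes / observed changes; None for constant characters."""
    _, steps = fitch(tree, states)
    minimum = len(set(states.values())) - 1
    return minimum / steps if steps else None

def ci_range(trees, states):
    """Minimum, average and maximum CI of one variable character over stored trees."""
    cis = [consistency_index(t, states) for t in trees]
    return min(cis), mean(cis), max(cis)

# Two hypothetical stored trees for five strains, and one indel character (1 = present).
tree1 = ((("s1", "s2"), "s3"), ("s4", "s5"))
tree2 = ((("s1", "s4"), "s2"), ("s3", "s5"))
indel = {"s1": 1, "s2": 0, "s3": 0, "s4": 1, "s5": 0}

print(consistency_index(tree1, indel))  # 0.5: two changes needed on tree1
print(ci_range([tree1, tree2], indel))  # (0.5, 0.75, 1.0)
```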

Phylogenomics I: Presence/Absence of Homologs

• Important to have complete genomes

• Similarity searches with high “homology threshold” (to prevent false positives)

• Iterative searches (to prevent false negatives)

• Multiple sequence alignments to confirm assignment of homology and to divide up multi-domain proteins

Phylogenomics II: Phylogenetic Analysis of Homologs

• Multiple sequence alignment

• Mask alignment (exclude certain regions)
– ambiguous regions of alignment
– hypervariable regions and regions with large gaps

• Phylogenetic tree with method of choice

• Robustness checks
– bootstrapping
– compare trees with different alignments
– compare trees with different tree-building methods
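Two of these steps, masking and bootstrapping, are mechanical enough to sketch. Below, masking is reduced to one assumed rule (drop columns where more than half the sequences have a gap), and a bootstrap replicate is built by resampling alignment columns with replacement, the standard nonparametric bootstrap for phylogenies. Building a tree from each replicate and counting how often each clade recurs is left to whatever tree-building method is in use.

```python
import random

def mask_gappy_columns(alignment, max_gap_frac=0.5):
    """Keep only columns where at most max_gap_frac of the sequences have a gap ('-')."""
    names = list(alignment)
    ncol = len(alignment[names[0]])
    keep = [j for j in range(ncol)
            if sum(alignment[n][j] == "-" for n in names) / len(names) <= max_gap_frac]
    return {n: "".join(alignment[n][j] for j in keep) for n in names}

def bootstrap_replicate(alignment, rng):
    """Resample columns with replacement to make a pseudo-alignment of the same length."""
    names = list(alignment)
    ncol = len(alignment[names[0]])
    cols = [rng.randrange(ncol) for _ in range(ncol)]
    return {n: "".join(alignment[n][j] for j in cols) for n in names}

# Hypothetical toy alignment; the third column is mostly gaps and gets masked.
aln = {"s1": "AC-GT", "s2": "AC-GA", "s3": "TCAG-"}
masked = mask_gappy_columns(aln)
print(masked)  # {'s1': 'ACGT', 's2': 'ACGA', 's3': 'TCG-'}

rng = random.Random(1)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_replicate(masked, rng) for _ in range(100)]
```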

Phylogenomics III: Inferring Evolutionary Events

• Infer evolutionary distribution patterns (overlay presence/absence onto species tree)

• Compare gene tree vs. species tree

• Compare gene tree vs. evolutionary distribution

• Infer gene duplication and transfer events

• Combine gene transfer and duplication information with evolutionary distribution analysis to infer gene loss, gene origin, and timing of gene duplications and transfers
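Comparing a gene tree with the species tree to infer duplications is commonly done by LCA reconciliation: each gene-tree node is mapped to the smallest species clade containing all species below it, and a node that maps to the same species clade as one of its children is counted as a duplication. The sketch assumes both trees are rooted, binary and correct, ignores transfers, and uses hypothetical gene names of the form SPECIES_copy.

```python
def leafset(tree):
    """Names under a node; a node is a name or a tuple of nodes."""
    return frozenset([tree]) if isinstance(tree, str) else frozenset().union(*map(leafset, tree))

def clades(tree):
    """Every clade of the species tree, each as a frozenset of species names."""
    found = [leafset(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found += clades(child)
    return found

def lca(species, all_clades):
    """Smallest species clade containing all the given species."""
    return min((c for c in all_clades if species <= c), key=len)

def reconcile(gene_tree, all_clades, species_of):
    """Return (species clade this node maps to, number of duplications in the subtree)."""
    if isinstance(gene_tree, str):
        return lca(frozenset([species_of(gene_tree)]), all_clades), 0
    mapped, dups = [], 0
    for child in gene_tree:
        m, d = reconcile(child, all_clades, species_of)
        mapped.append(m)
        dups += d
    here = lca(frozenset().union(*mapped), all_clades)
    if here in mapped:  # same species clade as a child: duplication, not speciation
        dups += 1
    return here, dups

def to_species(gene):
    """Hypothetical naming scheme: 'EC_2' belongs to species 'EC'."""
    return gene.split("_")[0]

species_tree = (("EC", "VC"), "BS")  # hypothetical species tree
cl = clades(species_tree)

ancient = ((("EC_1", "VC_1"), "BS_1"), (("EC_2", "VC_2"), "BS_2"))
print(reconcile(ancient, cl, to_species)[1])  # 1: a duplication before all speciations
```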

Phylogenomics IV: Functional Predictions and Evolution

• Overlay experimentally determined functions onto gene tree

• Infer changes in function
– many changes suggest that caution should be used in making new predictions

• Predict functions based on position in tree relative to genes with known functions and based on orthology groups

Phylogenomics V: Pathway Analysis

• Correlated presence/absence of all genes in pathway in different species?
– If not, maybe non-orthologous gene displacement
– Alternatively, pathway may be different between species

• Correlated evolutionary events for genes in pathway
– loss of all genes at once
– correlated duplications?

• Compare evolution of function between pathways
– The number of times an activity has evolved helps in making predictions of function/phenotype

Steps in Phylogenomic Analysis

• Create database of genes of interest

• Presence/absence of homologs in complete genomes

• Phylogenetic trees of each gene family

• Infer evolutionary events (gene origin, duplication, loss and transfer)

• Refine presence/absence (orthologs, paralogs, subfamilies)

• Functional predictions and functional evolution

• Analysis of pathways

Evolution as a Screening Method

• Gene duplications

• Gene loss

• Lateral gene transfers

• Organellar genes

• Structurally constrained genes

• Correlated evolutionary changes

Evolutionary Genome Scanning
• Distribution patterns/phylogenetic profiles
• Patterns of evolution
– ds/dn
– Structurally constrained genes
– Correlated evolutionary changes
• Lateral gene transfers
– Organellar genes
– Pathogenicity islands
• Subdividing gene families
– Orthologs vs paralogs
– Functional predictions
– Subfamilies
– Motif identification
• Gene duplications
• Gene loss

Genome Sequences Allow “Hypothesisless Research”

• DNA microarrays
• Proteomics
• GC skew and other nucleotide composition analyses
• Parallel genome-wide genetic experiments
• Evolutionary genome scanning
• Phylogenetic profiles
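GC skew is one of the composition analyses that needs no prior hypothesis. A common recipe, sketched here on a synthetic sequence, computes (G - C)/(G + C) in consecutive windows and accumulates it along the chromosome; in many bacterial genomes the cumulative curve reaches its minimum near the replication origin and its maximum near the terminus. The window size and the synthetic input are illustrative assumptions, and real genomes give much noisier curves.

```python
from itertools import accumulate

def gc_skew(seq, window):
    """(G - C) / (G + C) for consecutive, non-overlapping windows."""
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window].upper()
        g, c = w.count("G"), w.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews

def predict_origin_terminus(seq, window=1000):
    """Positions after the windows where cumulative skew is lowest (origin) and highest (terminus)."""
    cum = list(accumulate(gc_skew(seq, window)))
    lowest = min(range(len(cum)), key=cum.__getitem__)
    highest = max(range(len(cum)), key=cum.__getitem__)
    return (lowest + 1) * window, (highest + 1) * window

# Synthetic sequence built from C-rich and G-rich 100 bp windows.
c_rich = "G" * 10 + "C" * 30 + "A" * 60  # skew -0.5
g_rich = "G" * 30 + "C" * 10 + "A" * 60  # skew +0.5
seq = c_rich * 3 + g_rich * 5 + c_rich * 2
print(predict_origin_terminus(seq, window=100))  # (300, 800)
```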