Jonathan Eisen talk on "Phylogenomics of Microbes" at Lake Arrowhead Small Genomes meeting...
DESCRIPTION
Talk by Jonathan Eisen on phylogenomics of microbes at the Lake Arrowhead Small Genomes meeting in 2004.
TRANSCRIPT
TIGR
The Axis of Evol:
Uncultured Organisms, Phylogenetic Anchors and the Tree of Life
Famous Arrowhead 2004 Quotes
• Space-time continuum of genes and genomes
• Gene sequences are the wormhole that allows one to tunnel into the past
• The human mind can conceive of things with no basis in physical reality
• Thoughts can go faster than the speed of light
Outline
• Three phylogenomic tales, each with discovery and woe
  – Uncultured organisms I: complete genomes of symbionts
  – Uncultured organisms II: incomplete genomes and phylogenetic anchors
  – Predicting gene function
• One approach that can help with all the woe
Phylogenomic Tale I: Sequencing Symbiont Genomes
Symbiont Genome Sequencing
shotgun sequence
Warner Brothers, Inc.
Wu et al., 2004
Wolbachia pipientis wMel
Wolbachia Primer
• Intracellular bacteria related to the Rickettsias (e.g., the agent of typhus)
• Found in many invertebrate species, including insects, arachnids, isopods, and nematodes
• Transmitted vertically from mother to offspring, like mitochondria
• Many Wolbachia inhibit male survival or reproduction to promote transmission through females
• Wolbachia that infect parasitic filarial nematodes (e.g., Brugia malayi) are needed by the host. Treating people infected with these nematodes with antibiotics, which target bacteria rather than the worm, kills the Wolbachia and helps kill the nematode.
Wolbachia Mobile/Repetitive DNA
Repeat Class | Size (Median, bp) | Copies | Protein Motifs/Families | IS Family | Possible Terminal Inverted Repeat Sequence
1 | 1512 | 3 | Transposase | IS4 | 5’ ATACGCGTCAAGTTAAG 3’
2 | 360 | 12 | - | New | 5’ GGCTTTGTTGCAT CGCTA 3’
3 | 858 | 9 | Transposase | IS492/IS110 | 5’ GGCTTTGTTGCAT 3’
4 | 1404.5 | 4 | Conserved hypothetical, phage terminase | New | 5’ ATACCGCGAWTSAWTCGCGGTAT 3’
5 | 1212 | 15 | Transposase | IS3 | 5’ TGACCTTACCCAGAAAAAGTGGAGAGAAAG 3’
6 | 948 | 13 | Transposase | IS5 | 5’ AGAGGTTGTCCGGAAACAAGTAAA 3’
7 | 2405.5 | 8 | RT/maturase | - |
8 | 468 | 45 | - | - |
9 | 817 | 3 | Conserved hypothetical, transposase | ISBt12 |
10 | 238 | 2 | ExoD | - |
11 | 225 | 2 | RT/maturase | - |
12 | 1263 | 4 | Transposase | ??? |
13 | 572.5 | 2 | Transposase | ??? | None detected
14 | 433 | 2 | Ankyrin | - |
15 | 201 | 2 | - | - |
16 | 1400 | 6 | RT/maturase | - |
17 | 721 | 2 | Transposase | IS630 |
18 | 1191.5 | 2 | EF-Tu | - |
19 | 230 | 2 | Hypothetical | - |
Wu et al., 2004
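The last column lists possible terminal inverted repeats (TIRs). A minimal sketch of how such TIRs can be checked, assuming the candidate element is given 5’ to 3’ on one strand: the left end should match the TIR and the right end its reverse complement, with IUPAC ambiguity codes (such as W and S in class 4) allowed. The helper names and the example element are hypothetical; this is not the pipeline used by Wu et al.

```python
# Sketch: test a candidate mobile element for terminal inverted repeats (TIRs).
# Assumption: sequences are 5'->3' on the top strand; the TIR may use IUPAC codes.

# Complements of IUPAC nucleotide codes (W and S are their own complements).
COMPLEMENT = str.maketrans("ACGTWSMKRYBDHVN", "TGCAWSKMYRVHDBN")

# Bases that each IUPAC code can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence that may contain IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def matches(pattern: str, text: str) -> bool:
    """True if each base in text is allowed by the IUPAC code at the same position in pattern."""
    return len(pattern) == len(text) and all(
        base in IUPAC[code] for code, base in zip(pattern.upper(), text.upper())
    )


def has_tirs(element: str, tir: str) -> bool:
    """True if the element starts with the TIR and ends with its reverse complement."""
    k = len(tir)
    if len(element) < 2 * k:
        return False
    return matches(tir, element[:k]) and matches(reverse_complement(tir), element[-k:])


if __name__ == "__main__":
    tir1 = "ATACGCGTCAAGTTAAG"  # repeat class 1 (IS4) from the table
    element = tir1 + "GGGGCCCC" + reverse_complement(tir1)  # hypothetical element
    print(has_tirs(element, tir1))  # True
    tir4 = "ATACCGCGAWTSAWTCGCGGTAT"  # repeat class 4: its own reverse complement
    print(reverse_complement(tir4) == tir4)  # True
```

The class 4 sequence is palindromic in the DNA sense, so the same motif appears at both ends on opposite strands.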
Table S9. Ankyrin repeats in wMel proteins
Name | Locus | Annotation | Number of Repeats | Signal Peptide / Highly Expressed
00505 | WD0035 | ankyrin repeat domain protein | 6 |
00554 | WD0073 | ankyrin repeat domain protein | 5 |
00647 | WD0147 | ankyrin repeat domain protein | 11 |
00700 | WD0191 | ankyrin repeat domain protein | 1 |
00821 | WD0285 | prophage LambdaW1, ankyrin repeat domain protein | 3 | Y
00822 | WD0286 | prophage LambdaW1, ankyrin repeat domain protein | 3 |
00827 | WD0291 | prophage LambdaW1, ankyrin repeat domain protein | 5 |
00828 | WD0292 | prophage LambdaW1, ankyrin repeat domain protein | 2 |
00830 | WD0294 | ankyrin repeat domain protein | 9 |
00948 | WD0385 | ankyrin repeat domain protein | 11 |
01014 | WD0438 | ankyrin repeat domain protein | 2 |
01017 | WD0441 | ankyrin repeat domain protein | 1 | Y
01081 | WD0498 | ankyrin repeat domain protein | 9 |
01104 | WD0514 | ankyrin repeat domain protein | 6 |
01148 | WD0550 | ankyrin repeat domain protein | 6 |
01171 | WD0566 | ankyrin repeat domain protein | 2 |
01204 | WD0596 | prophage LambdaW4, ankyrin repeat domain protein | 9 |
01255 | WD0633 | prophage LambdaW5, ankyrin repeat domain protein | 4 |
01260 | WD0636 | prophage LambdaW5, ankyrin repeat domain protein | 2 | Y
01261 | WD0637 | prophage LambdaW5, ankyrin repeat domain protein | 3 | Y
01402 | WD0754 | ankyrin repeat domain protein | 2 |
01414 | WD0766 | ankyrin repeat domain protein | 8 |
00332 | WD1213 | ankyrin repeat domain protein, putative | 1 | Y
Wu et al., 2004
Selection Apparently Inefficient in wMel
• Likely not due to higher mutation rate
  – Full suite of DNA repair genes
• Likely not due to low amounts of homologous recombination
  – RecA present
  – Population studies suggest homologous recombination occurs
• Wolbachia has multiple types of bottlenecks
  – Maternal transmission, like obligate mutualists
  – Infectious sweeps of cytoplasmic incompatibility, like pathogens
Wu et al., 2004
• Sap-feeding insect
• Carrier of Xylella fastidiosa, which causes Pierce’s disease of grapevines
• Listed as a potential agro-terrorism agent
• There are >20,000 sharpshooter species, among which intracellular symbiotic bacteria are widespread
Glassy-winged Sharpshooter
Baumannia cicadellinicola genome project: 1° symbionts of the Glassy-winged Sharpshooter
[Figure: map of the Baumannia genome, coordinates 1 to 600,000]
Baumannia Genome Completed
Collaboration between Jonathan Eisen and Nancy Moran (U. Arizona). Analysis led by Dongying Wu in Eisen’s group.
“Whole genome” tree of insect endosymbionts
Repair function | Bu_APS | Bu_Bp | Bu_Sg | Wi_gl | Ca_bl | gebc
recG & ruvABC (recombinational repair) | - | - | - | - | - | recG & ruvABC
methyl-directed DNA repair (correct T-G) | mutSL | mutSLH | mutSL | - | - | mutSLH
8-oxo-dGTP prevention (mutT, mutY or mutM) | mutTY | mutY | mutTY | - | mutTY | mutTM
uvrD (or homolog rep) | rep | rep | rep | uvrD | - | uvrD
recA | - | - | - | recA | - | recA
phrB (UV pyrimidine dimer) | phrB | phrB | - | phrB | - | -
recBCD | recBCD | recBCD | recBCD | recBCD | recBCD | recBCD
mutL (mismatch repair) | mutL | mutL | mutL | - | - | mutL
recJ (rec-based excision, methyl-directed repair) | - | - | - | recJ | - | recJ
transcription-repair coupling factor (mfd) | mfd | mfd | - | mfd | - | -
uracil-DNA glycosylase (removes U from DNA) | ung | ung | - | ung | ung | ung
site-specific DNA inversion stimulation (fis) | fis | - | fis | - | - | -
DNA recombination protein rmuC (inversion) | - | - | - | - | rmuC | rmuC
DNA repair genes
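Coding the repair-gene table as presence (1) or absence (0) makes it easy to query, for example to count retained functions per genome or to list functions that gebc keeps while all three Bu_ genomes lack them. A small sketch: the values are transcribed from the table above, and the helper names are hypothetical.

```python
# Presence (1) / absence (0) of each repair function, transcribed from the table.
GENOMES = ["Bu_APS", "Bu_Bp", "Bu_Sg", "Wi_gl", "Ca_bl", "gebc"]
REPAIR = {
    "recG & ruvABC": (0, 0, 0, 0, 0, 1),
    "methyl-directed repair": (1, 1, 1, 0, 0, 1),
    "8-oxo-dGTP prevention": (1, 1, 1, 0, 1, 1),
    "uvrD/rep": (1, 1, 1, 1, 0, 1),
    "recA": (0, 0, 0, 1, 0, 1),
    "phrB": (1, 1, 0, 1, 0, 0),
    "recBCD": (1, 1, 1, 1, 1, 1),
    "mutL": (1, 1, 1, 0, 0, 1),
    "recJ": (0, 0, 0, 1, 0, 1),
    "mfd": (1, 1, 0, 1, 0, 0),
    "ung": (1, 1, 0, 1, 1, 1),
    "fis": (1, 0, 1, 0, 0, 0),
    "rmuC": (0, 0, 0, 0, 1, 1),
}


def count_per_genome(table):
    """Number of repair functions present in each genome (column sums)."""
    return {g: sum(row[i] for row in table.values()) for i, g in enumerate(GENOMES)}


def present_only_in(table, keep, absent_in):
    """Functions present in genome `keep` but absent from every genome in `absent_in`."""
    k = GENOMES.index(keep)
    others = [GENOMES.index(g) for g in absent_in]
    return sorted(f for f, row in table.items() if row[k] and not any(row[i] for i in others))


if __name__ == "__main__":
    print(count_per_genome(REPAIR))
    print(present_only_in(REPAIR, "gebc", ["Bu_APS", "Bu_Bp", "Bu_Sg"]))
```

With these values gebc retains 10 of the 13 listed functions and Ca_bl only 4; recBCD is the only function present in all six genomes.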
[Figure: biosynthetic pathways for folate, thiamine, biotin, glutathione, heme and siroheme]
Folate: PEP + erythrose 4-phosphate → aroH, aroB, aroD, aroE, aroK, aroA, aroC → chorismate → pabB, pabA, pabC; GTP → folE, folB, folK; then folP, folC, folA → folate
Thiamine: dxs (from glyceraldehyde-3-phosphate + pyruvate), thiG, thiH (L-tyrosine), thiS, thiF, thiI (L-cysteine → thiS-COSH), thiC, thiD (from 5’-phosphoribosyl-5-aminoimidazole) and thiE (?) → thiamine phosphate → thiamine
Biotin: unknown precursor → bioC, bioH → pimeloyl-CoA → bioF, bioA, bioD, bioB → biotin
Glutathione: L-glutamate + L-cysteine → gshA, gshB → glutathione
Heme and siroheme: L-glutamate → gltX, hemA, hemL (or succinyl-CoA + glycine) → delta-aminolevulinic acid → hemB → porphobilinogen → hemC, hemD → uroporphyrinogen III → cysG → siroheme; uroporphyrinogen III → hemE → coproporphyrinogen III → hemN → protoporphyrinogen → hemG, hemH → protoheme → cyoE → heme O
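Reading a pathway diagram against a genome amounts to asking which steps have a gene and which are missing. A minimal sketch, assuming a gene name in the genome’s annotation stands in for the presence of that step (real analyses need homology searches, and a missing gene may be replaced by a non-homologous enzyme). Gene lists are transcribed from the pathway labels above; the genome gene set in the tests is hypothetical.

```python
# Gene lists transcribed from the pathway labels, in pathway order.
PATHWAYS = {
    "folate": ["aroH", "aroB", "aroD", "aroE", "aroK", "aroA", "aroC",
               "pabB", "pabA", "pabC", "folE", "folB", "folK", "folP", "folC", "folA"],
    "biotin": ["bioC", "bioH", "bioF", "bioA", "bioD", "bioB"],
    "glutathione": ["gshA", "gshB"],
}


def missing_steps(pathways, genome_genes):
    """For each pathway, list genes not found in the genome, keeping pathway order."""
    return {name: [g for g in genes if g not in genome_genes]
            for name, genes in pathways.items()}


def complete_pathways(pathways, genome_genes):
    """Names of pathways with no missing genes."""
    return sorted(name for name, missing in missing_steps(pathways, genome_genes).items()
                  if not missing)


if __name__ == "__main__":
    genome = {"gshA", "gshB", "bioF", "bioA", "bioD", "bioB"}  # hypothetical gene set
    print(missing_steps(PATHWAYS, genome)["biotin"])  # ['bioC', 'bioH']
    print(complete_pathways(PATHWAYS, genome))  # ['glutathione']
```

Keeping the list in pathway order means a gap shows up at a recognizable point (here, before pimeloyl-CoA), consistent with the "unknown precursor" noted in the figure.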
Phylogenomic Tale II: Phylogenetic Anchors
rRNA and Uncultured Microbes
Eisen et al., 1992
Phylogenetic Anchors
Beja et al., 2000
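The anchor idea: when a large cloned fragment carries a gene that can be placed on a tree (such as rRNA), every other gene on that fragment can be assigned to the same organism. A minimal sketch under that assumption; fragment IDs, gene IDs and taxon labels are hypothetical, and fragments with conflicting anchors are skipped rather than resolved.

```python
def propagate_anchor_labels(fragments, anchor_taxon):
    """
    fragments: fragment id -> list of gene ids on that fragment
    anchor_taxon: gene id -> taxon, for anchor genes placed on a tree (e.g. rRNA)
    Returns gene id -> taxon for every gene on a fragment with exactly one anchor taxon.
    """
    labels = {}
    for genes in fragments.values():
        taxa = {anchor_taxon[g] for g in genes if g in anchor_taxon}
        if len(taxa) != 1:
            continue  # no anchor, or conflicting anchors (e.g. a chimeric clone)
        (taxon,) = taxa
        for g in genes:
            labels[g] = taxon
    return labels


if __name__ == "__main__":
    fragments = {
        "clone1": ["rRNA_1", "geneA", "geneB"],  # anchored: geneA, geneB inherit group_X
        "clone2": ["geneC"],                     # no anchor: stays unassigned
    }
    print(propagate_anchor_labels(fragments, {"rRNA_1": "group_X"}))
```

Genes on fragments without an anchor remain unassigned, which is why anchors only help where such fragments exist.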
Arg: glutamate → ArgA, ArgB, ArgC, ArgD, ArgE → ornithine → CarAB, ArgF, ArgG, ArgH → Arg
Val: pyruvate → IlvHI, IlvC, IlvD, IlvE → Val
Ile: threonine → IlvA → alpha-ketobutyrate; pyruvate + alpha-ketobutyrate → Ile
Phe: PEP + erythrose 4-phosphate → AroH, AroB, AroD, AroE, AroK, AroA, AroC → chorismate → PheA, HisC → Phe
Trp: chorismate → TrpEG, TrpD, TrpC, TrpAB → Trp
Thr: aspartate → ThrA, Asd, ThrA → homoserine → ThrB, ThrC → Thr
Met: homoserine → MetB, MetC, MetE → Met
Lys: DapA, DapB, DapD, DapC, DapE, DapF, LysA → Lys
His: PRPP + ATP → HisG, HisI, HisA, HisHF, HisB, HisC, HisB, HisD → His
Essential amino acid biosynthetic pathways
9,359 clones not included in the final assembly
Run_TA
7,152 assemblies (400 have been assembled)
Size class | Number of assemblies
<1kb | 6,996
1kb-2kb | 125
2kb-3kb | 18
3kb-4kb | 6
4kb-5kb | 3
5kb-6kb | 2
6kb-7kb | 1
7kb-8kb | 1
Another Symbiont Present
150 Bacteroides/Chlorobi (NJ tree/BLAST)
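The size-class counts above sum to the 7,152 assemblies. A small sketch of the binning, assuming half-open 1 kb bins (a 2,000 bp assembly falls in 2kb-3kb); the slide does not state how boundary lengths were assigned, and the lengths in the tests are hypothetical.

```python
from collections import Counter

# Counts per size class as listed on the slide.
SLIDE_COUNTS = {
    "<1kb": 6996, "1kb-2kb": 125, "2kb-3kb": 18, "3kb-4kb": 6,
    "4kb-5kb": 3, "5kb-6kb": 2, "6kb-7kb": 1, "7kb-8kb": 1,
}


def size_class(length_bp: int) -> str:
    """Assign a length to a 1 kb class; bins are assumed half-open, e.g. [1000, 2000)."""
    if length_bp < 1000:
        return "<1kb"
    kb = length_bp // 1000
    return f"{kb}kb-{kb + 1}kb"


def bin_lengths(lengths):
    """Count assemblies per size class."""
    return Counter(size_class(n) for n in lengths)


if __name__ == "__main__":
    print(sum(SLIDE_COUNTS.values()))  # 7152
    print(bin_lengths([450, 1200, 1999, 2000, 7400]))
```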
Thr: aspartate → ThrA, Asd → homoserine → ThrB, ThrC → Thr
Met: homoserine → MetB, MetC, MetE → Met
Lys: DapA, DapB, DapD, DapC, DapE, DapF, LysA → Lys
Phylogenetic Anchors and the Sargasso Sea Shotgun Sequencing
shotgun sequence
Warner Brothers, Inc.
rRNA as a Phylogenetic Anchor
Venter et al., 2004
Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA)
Venter et al., 2004
Sargasso Phylotypes
[Figure: weighted % of clones (y-axis, 0 to 0.5) assigned to each major phylogenetic group (x-axis: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota), estimated separately with six anchors: EFG, EFTu, HSP70, RecA, RpoB and rRNA]
Venter et al., 2004
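Each bar series in the chart is, for one anchor gene, the weighted share of clones assigned to each group. The slide does not give the weighting scheme; the sketch below simply assumes each anchor-bearing sequence carries a numeric weight (for example, the coverage depth of its assembly) and normalizes per group. The function name and the weights in the tests are hypothetical.

```python
from collections import defaultdict


def weighted_fractions(hits):
    """
    hits: (group, weight) pairs, one per sequence carrying a given anchor gene.
    Returns each group's share of the total weight (fractions summing to 1).
    """
    totals = defaultdict(float)
    for group, weight in hits:
        totals[group] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no weighted hits")
    return {group: w / grand_total for group, w in totals.items()}


if __name__ == "__main__":
    recA_hits = [("Alphaproteobacteria", 2.0), ("Cyanobacteria", 1.0),
                 ("Alphaproteobacteria", 1.0)]  # hypothetical weights
    print(weighted_fractions(recA_hits))
```

Running it once per anchor (EFG, EFTu, HSP70, RecA, RpoB, rRNA) yields one series per marker; agreement between series is what makes the non-rRNA anchors useful as a cross-check.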
Phylogenomic Tale III: Functional Prediction
Shotgun Sequencing Detects More Diversity than PCR-methods
Venter et al., 2004
Functional Diversity of Proteorhodopsins?
Venter et al., 2004
Deinococcus radiodurans
DNA Repair Genes in D. radiodurans Complete Genome
Process | Genes in D. radiodurans
Nucleotide Excision Repair | UvrABCD, UvrA2
Base Excision Repair | AlkA, Ung, Ung2, GT, MutM, MutY-Nths, MPG
AP Endonuclease | Xth
Mismatch Excision Repair | MutS, MutL
Recombination: Initiation | RecFJNRQ, SbcCD, RecD
Recombination: Recombinase | RecA
Recombination: Migration and resolution | RuvABC, RecG
Replication | PolA, PolC, PolX, phage Pol
Ligation | DnlJ
dNTP pools, cleanup | MutTs, RRase
Other | LexA, RadA, HepA, UVDE, MutS2
Problem:
The list of DNA repair gene homologs in the D. radiodurans genome is not significantly different from those of other bacterial genomes of similar size
[Figure: tree of bacterial phyla (scale bar 0.1): Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]
Tree based on Hugenholtz (2002) with some modifications.
~40 Phyla of Bacteria
[Figure: the same tree of bacterial phyla]
Tree based on Hugenholtz (2002) with some modifications.
Most DNA metabolism studies are in two phyla
[Figure: the same tree of bacterial phyla]
Tree based on Hugenholtz (2002) with some modifications.
Deinococcus is very distant from well-studied groups
[Figure: gains (+), losses (-) and other events (marked d and t) of DNA repair genes mapped onto a tree of Bacteria, Archaea and Eukaryotes (Ecoli, Haein, Neigo, Helpy, Bacsu, Strpy, Mycge, Mycpn, Borbu, Trepa, Synsp, Metjn, Arcfu, Metth, Human, Yeast); branch labels include +UvrABCD, +RecA, +RecFJNOR+RuvABC, -PhrII-RuvC, and genes acquired "from mitochondria"]
Gain and Loss of Repair Genes
Eisen and Hanawalt, 1999
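Mapping gains and losses onto a tree starts from presence/absence at the tips. One standard way to count the minimum number of gain/loss events for a single gene is Fitch parsimony; the sketch below applies it to a binary tree written as nested tuples. This is an illustrative choice, not necessarily how Eisen and Hanawalt (1999) inferred the events, and the tree and states in the tests are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf name, or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """
    Fitch parsimony for one gene's presence (1) / absence (0) at the leaves.
    Returns the candidate state set at this node and the minimum number of
    gain/loss events needed within this subtree.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree  # raises ValueError if a node is not binary
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    tree = (("A", "B"), ("C", "D"))
    print(fitch(tree, {"A": 1, "B": 1, "C": 0, "D": 0}))  # ({0, 1}, 1)
```

Fitch weights gains and losses equally; if independent gains are considered much less likely than losses (or than transfer), a weighted or Dollo-style model would give different event counts.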
Non-Homology Prediction: Phylogenetic Profiles
• Step 1: Search all genes in the organisms of interest against all other genomes
• Step 2: Ask yes or no: is each gene found in each other species?
• Step 3: Cluster genes by their distribution patterns (profiles)
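The three steps above can be sketched as follows, assuming step 1 (the homology search, e.g. BLAST) has already produced a set of (gene, genome) hits. Grouping genes with identical profiles is the simplest form of step 3; published methods often cluster by a distance between profiles instead. Gene and genome names in the tests are hypothetical.

```python
from collections import defaultdict


def build_profiles(genes, genomes, hits):
    """Step 2: yes/no (1/0) profile of each gene across genomes; hits holds (gene, genome) pairs."""
    return {gene: tuple(int((gene, genome) in hits) for genome in genomes) for gene in genes}


def cluster_identical(profiles):
    """Step 3 (simplest version): group genes that share exactly the same profile."""
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[profile].append(gene)
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


if __name__ == "__main__":
    hits = {("g1", "s1"), ("g1", "s3"), ("g2", "s1"), ("g2", "s3"), ("g3", "s2")}
    profiles = build_profiles(["g1", "g2", "g3"], ["s1", "s2", "s3"], hits)
    print(cluster_identical(profiles))  # [['g1', 'g2']]
```

Genes with matching profiles (here g1 and g2) are candidates for working in the same process, which is a prediction that needs no sequence similarity between them.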
Carboxydothermus hydrogenoformans
• Isolated in Yellowstone
• Thermophile (grows at 80°C)
• Anaerobic
• Grows on CO (carbon monoxide)
• Produces hydrogen gas
• Low-GC Gram-positive species
• Many archaeal-like genes
IV. Phylogeny-Based Genome Sequencing
Reminders
1. Phylogenetic anchors don’t work if you do not have data from across the tree (e.g., the Sargasso study was limited)
2. Non-homology methods rely on diverse genomes
3. Novel processes need to be studied in novel lineages
Biased Sampling of Bacterial Genomes
• Of 40 bacterial phyla, most genome sequences come from only 3 groups
• Sargasso Sea study limited by poor sampling of species in GenBank
• Difficult to figure out what types of species may be present except by analyzing rRNA genes
Hugenholtz 2002
[Figure: the same tree of bacterial phyla, with lineages marked as This project, Published, In progress, or Uncultured lineage]
Tree based on Hugenholtz (2002) with some modifications.
• Solution: TIGR Tree of Life Project
  – Eisen and Ward, Co-PIs
  – Selecting genome projects to increase phylogenetic diversity
  – Supported by NSF Tree of Life Program
  – Genomes from 6 new phyla in closure
  – More information at http://www.tigr.org/tol
Other people
Mom and Dad
H. Ochman
W. Martin
F. Robb
J. Battista
E. Orias
D. Bryant
S. O’Neill
M. Eisen
N. Moran
R. Myers
C. M. Cavanaugh
P. Hanawalt
J. Heidelberg
T. Read
N. Ward
M-I Benito
J. C. Venter
C. Fraser
S. Salzberg
O. White
I. Paulsen
H. Tettelin
$$$
NSF
ONR
DOE
NIH
Eisen Group
Martin Wu
Dongying Wu
James Sakwa
Jonathan Badger
Evolutionary Method
PHYLOGENETIC PREDICTION OF GENE FUNCTION
1. CHOOSE GENE(S) OF INTEREST
2. IDENTIFY HOMOLOGS
3. ALIGN SEQUENCES
4. CALCULATE GENE TREE
5. OVERLAY KNOWN FUNCTIONS ONTO TREE
6. INFER LIKELY FUNCTION OF GENE(S) OF INTEREST
[Figure: gene trees for homologs 1A to 3A and 1B to 3B from Species 1, 2 and 3, compared with the ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN); EXAMPLE A and EXAMPLE B mark inferred duplications, and one case is ambiguous]
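A toy version of the last two steps (overlay known functions onto the gene tree, then infer the function of the gene of interest): walk up from the query toward the root and stop at the first clade that contains annotated genes, predicting their function if they agree and reporting "ambiguous" if they do not. This simplification does not test for the duplications the figure warns about (orthologs vs. paralogs), which is exactly where the real method needs care; the tree, gene names and functions are hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

# A rooted tree: a leaf name, or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> List[str]:
    """All leaf names under a subtree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def path_to(tree: Tree, target: str) -> Optional[List[Tree]]:
    """Subtrees from the root down to the target leaf, or None if it is absent."""
    if isinstance(tree, str):
        return [tree] if tree == target else None
    for child in tree:
        sub = path_to(child, target)
        if sub is not None:
            return [tree] + sub
    return None


def predict_function(tree: Tree, query: str, known: Dict[str, str]) -> str:
    path = path_to(tree, query)
    if path is None:
        raise ValueError(f"{query} is not in the tree")
    # Walk from the query's parent clade toward the root.
    for clade in reversed(path[:-1]):
        functions = {known[leaf] for leaf in leaves(clade) if leaf != query and leaf in known}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return "unknown"


if __name__ == "__main__":
    tree = (("query", "geneA"), ("geneB", "geneC"))
    known = {"geneA": "function 1", "geneB": "function 2", "geneC": "function 2"}
    print(predict_function(tree, "query", known))  # function 1
```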