Jonathan Eisen talk on "Phylogenomics of Microbes" at Lake Arrowhead Small Genomes meeting...

Post on 10-May-2015


DESCRIPTION

Talk by Jonathan Eisen on Phylogenomics of microbes at Lake Arrowhead Small Genomes meeting in 2004.

TRANSCRIPT

TIGR

The Axis of Evol:

Uncultured Organisms, Phylogenetic Anchors and the Tree of Life

Famous Arrowhead 2004 Quotes

• Space-time continuum of genes and genomes

• Gene sequences are the wormhole that allows one to tunnel into the past

• The human mind can conceive of things with no basis in physical reality

• Thoughts can go faster than the speed of light


Outline

• Three phylogenomic tales, each with discovery and woe
  – Uncultured organisms I: complete genomes of symbionts
  – Uncultured organisms II: incomplete genomes and phylogenetic anchors
  – Predicting gene function

• One approach that can help with all the woe

Phylogenomic Tale I: Sequencing Symbiont Genomes

Symbiont Genome Sequencing

[Figure: panels labeled "shotgun" and "sequence". Image credit: Warner Brothers, Inc.]

Wolbachia pipientis wMel

Wu et al., 2004

Wolbachia Primer

• Intracellular bacteria related to Rickettsias (e.g., agent of typhus)
• Found in many invertebrate species including insects, arachnids, isopods, and nematodes
• Transmitted vertically from mother to offspring, like mitochondria
• Many Wolbachia inhibit male survival or reproduction, to promote transmission through females
• Wolbachia that infect parasitic filarial nematodes (e.g., Brugia malayi) are needed by the host. Treating people infected with these nematodes with antibiotics kills the Wolbachia and helps kill the nematode.

Wolbachia Mobile/Repetitive DNA

Repeat Class | Size (Median) | Copies | Protein motifs/families | IS Family | Possible Terminal Inverted Repeat Sequence
1 | 1512 | 3 | Transposase | IS4 | 5’ ATACGCGTCAAGTTAAG 3’
2 | 360 | 12 | - | New | 5’ GGCTTTGTTGCAT CGCTA 3’
3 | 858 | 9 | Transposase | IS492/IS110 | 5’ GGCTTTGTTGCAT 3’
4 | 1404.5 | 4 | Conserved hypothetical, phage terminase | New | 5’ ATACCGCGAWTSAWTCGCGGTAT 3’
5 | 1212 | 15 | Transposase | IS3 | 5’ TGACCTTACCCAGAAAAAGTGGAGAGAAAG 3’
6 | 948 | 13 | Transposase | IS5 | 5’ AGAGGTTGTCCGGAAACAAGTAAA 3’
7 | 2405.5 | 8 | RT/maturase | - |
8 | 468 | 45 | - | - |
9 | 817 | 3 | Conserved hypothetical, transposase | ISBt12 |
10 | 238 | 2 | ExoD | - |
11 | 225 | 2 | RT/maturase | - |
12 | 1263 | 4 | Transposase | ??? |
13 | 572.5 | 2 | Transposase | ??? | None detected
14 | 433 | 2 | Ankyrin | - |
15 | 201 | 2 | - | - |
16 | 1400 | 6 | RT/maturase | - |
17 | 721 | 2 | Transposase | IS630 |
18 | 1191.5 | 2 | EF-Tu | - |
19 | 230 | 2 | Hypothetical | - |

Wu et al., 2004

Table S9. Ankyrin repeats in wMel proteins

Name | Locus | Annotation | Number of Repeats | Flag (Signal Peptide or Highly Expressed)
00505 | WD0035 | ankyrin repeat domain protein | 6 |
00554 | WD0073 | ankyrin repeat domain protein | 5 |
00647 | WD0147 | ankyrin repeat domain protein | 11 |
00700 | WD0191 | ankyrin repeat domain protein | 1 |
00821 | WD0285 | prophage LambdaW1, ankyrin repeat domain protein | 3 | Y
00822 | WD0286 | prophage LambdaW1, ankyrin repeat domain protein | 3 |
00827 | WD0291 | prophage LambdaW1, ankyrin repeat domain protein | 5 |
00828 | WD0292 | prophage LambdaW1, ankyrin repeat domain protein | 2 |
00830 | WD0294 | ankyrin repeat domain protein | 9 |
00948 | WD0385 | ankyrin repeat domain protein | 11 |
01014 | WD0438 | ankyrin repeat domain protein | 2 |
01017 | WD0441 | ankyrin repeat domain protein | 1 | Y
01081 | WD0498 | ankyrin repeat domain protein | 9 |
01104 | WD0514 | ankyrin repeat domain protein | 6 |
01148 | WD0550 | ankyrin repeat domain protein | 6 |
01171 | WD0566 | ankyrin repeat domain protein | 2 |
01204 | WD0596 | prophage LambdaW4, ankyrin repeat domain protein | 9 |
01255 | WD0633 | prophage LambdaW5, ankyrin repeat domain protein | 4 |
01260 | WD0636 | prophage LambdaW5, ankyrin repeat domain protein | 2 | Y
01261 | WD0637 | prophage LambdaW5, ankyrin repeat domain protein | 3 | Y
01402 | WD0754 | ankyrin repeat domain protein | 2 |
01414 | WD0766 | ankyrin repeat domain protein | 8 |
00332 | WD1213 | ankyrin repeat domain protein, putative | 1 | Y

Wu et al., 2004

Selection Apparently Inefficient in wMel

• Likely not due to higher mutation rate
  – Full suite of DNA repair genes
• Likely not due to low amounts of homologous recombination
  – RecA present
  – Population studies suggest homologous recombination occurs
• Wolbachia has multiple types of bottlenecks
  – Maternal transmission like obligate mutualists
  – Infectious sweeps of cytoplasmic incompatibility like pathogens

Wu et al., 2004

Glassy-winged Sharpshooter

• Sap-feeding insect
• Carrier of Xylella fastidiosa, which causes Pierce’s disease of grapevines
• Listed as a potential agro-terrorism agent
• There are >20,000 sharpshooter species, within which intracellular symbiotic bacteria are widespread

Baumannia cicadellinicola genome project: 1° symbionts of the Glassy-winged Sharpshooter

Baumannia Genome Completed

[Figure: genome map with coordinate ticks from 1 to 600,000 in steps of 100,000]

Collaboration between Jonathan Eisen and Nancy Moran (U. Arizona). Analysis led by Dongying Wu in Eisen’s group.


“Whole genome” tree of insect endosymbionts

DNA repair genes

Function | Bu_APS | Bu_Bp | Bu_Sg | Wi_gl | Ca_bl | gebc
recG & ruvABC (recombinational repair) | - | - | - | - | - | recG & ruvABC
Methyl-directed DNA repair (correct T-G) | mutSL | mutSLH | mutSL | - | - | mutSLH
8-oxo-dGTP prevention (mutT, mutY || mutM) | mutTY | mutY | mutTY | - | mutTY | mutTM
uvrD (or homolog rep) | rep | rep | rep | uvrD | - | uvrD
recA | - | - | - | recA | - | recA
phrB (UV pyrimidine dimer) | phrB | phrB | - | phrB | - | -
recBCD | recBCD | recBCD | recBCD | recBCD | recBCD | recBCD
mutL (mismatch repair) | mutL | mutL | mutL | - | - | mutL
recJ (rec-based excision, methyl-directed repair) | - | - | - | recJ | - | recJ
Transcription-repair coupling factor (mfd) | mfd | mfd | - | mfd | - | -
Uracil-DNA glycosylase (remove U from DNA) | ung | ung | - | ung | ung | ung
Site-specific DNA inversion stimulation (fis) | fis | - | fis | - | - | -
DNA recombination protein rmuC (inversion) | - | - | - | - | rmuc | rmuc

[Figure: cofactor biosynthesis pathways, with the genes shown for each]

Folate: PEP + erythrose 4-phosphate → chorismate (aroH, aroB, aroD, aroE, aroK, aroA, aroC); chorismate branch (pabA, pabB, pabC); GTP branch (folE, folB, folK); → folate (folP, folC, folA)
Thiamine: glyceraldehyde-3-phosphate + pyruvate (dxs, thiG, thiH); L-cysteine → thiS-COSH (thiS, thiF, thiI); L-tyrosine; 5’-phosphoribosyl-5-aminoimidazole (thiC, thiD); → thiamine phosphate (thiE, ?) → thiamine
Biotin: unknown precursor → pimeloyl CoA → biotin (bioC, bioH, bioF, bioA, bioD, bioB)
Glutathione: L-glutamate + L-cysteine → glutathione (gshA, gshB)
Heme: L-glutamate (gltX, hemA, hemL) or succinyl CoA + glycine → delta-aminolevulinic acid → porphobilinogen → uroporphyrinogen III → coproporphyrinogen III → protoporphyrinogen → protoheme → heme O (hemB, hemC, hemD, hemE, hemN, hemG, hemH, cyoE)
Siroheme: cysG

Phylogenomic Tale II: Phylogenetic Anchors


rRNA and Uncultured Microbes

Eisen et al. 1992


Phylogenetic Anchors

Beja et al., 2000

Essential amino acid biosynthetic pathways

Arg: Glutamate → (ArgA, ArgB, ArgC, ArgD, ArgE) → Ornithine → (CarAB, ArgF, ArgG, ArgH) → Arg
Val: Pyruvate → (IlvHI, IlvC, IlvD, IlvE) → Val
Ile: Threonine → (IlvA) → Alpha-Ketobutyrate; Pyruvate + Alpha-Ketobutyrate → Ile
Phe, Trp: PEP + Erythrose 4-phosphate → (AroH, AroB, AroD, AroE, AroK, AroA, AroC) → Chorismate; Chorismate → (PheA, HisC) → Phe; Chorismate → (TrpEG, TrpD, TrpC, TrpAB) → Trp
Thr, Met, Lys: Aspartate → (ThrA, Asd, ThrA) → Homoserine; Homoserine → (ThrB, ThrC) → Thr; (MetB, MetC, MetE) → Met; (DapA, DapB, DapD, DapC, DapE, DapF, LysA) → Lys
His: PRPP + ATP → (HisG, HisI, HisA, HisHF, HisB, HisC, HisB, HisD) → His

Another Symbiont Present

• 9359 clones are not included in the final assembly
• Run_TA: 7152 assemblies (400 have been assembled)

Assembly size | Count
<1kb | 6996
1kb-2kb | 125
2kb-3kb | 18
3kb-4kb | 6
4kb-5kb | 3
5kb-6kb | 2
6kb-7kb | 1
7kb-8kb | 1

• 150 Bacteroides/Chlorobi (njtree/blast)

[Figure: aspartate-family amino acid pathways]

Aspartate → (ThrA, Asd) → Homoserine → (ThrB, ThrC) → Thr
(MetB, MetC, MetE) → Met
(DapA, DapB, DapD, DapC, DapE, DapF, LysA) → Lys

Phylogenetic Anchors and the Sargasso Sea Shotgun Sequencing

[Figure: panels labeled "shotgun" and "sequence". Image credit: Warner Brothers, Inc.]

rRNA as a Phylogenetic Anchor

Venter et al., 2004


Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA)

Venter et al., 2004


Sargasso Phylotypes

[Figure: bar chart. Axes: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota) and Weighted % of Clones (0 to 0.5). Series: EFG, EFTu, HSP70, RecA, RpoB, rRNA.]

Venter et al., 2004

Phylogenomic Tale III: Functional Prediction


Shotgun Sequencing Detects More Diversity than PCR-methods

Venter et al., 2004


Functional Diversity of Proteorhodopsins?

Venter et al., 2004

Deinococcus radiodurans

DNA Repair Genes in D. radiodurans Complete Genome

Process | Genes in D. radiodurans
Nucleotide Excision Repair | UvrABCD, UvrA2
Base Excision Repair | AlkA, Ung, Ung2, GT, MutM, MutY-Nths, MPG
AP Endonuclease | Xth
Mismatch Excision Repair | MutS, MutL
Recombination: Initiation | RecFJNRQ, SbcCD, RecD
Recombination: Recombinase | RecA
Recombination: Migration and resolution | RuvABC, RecG
Replication | PolA, PolC, PolX, phage Pol
Ligation | DnlJ
dNTP pools, cleanup | MutTs, RRase
Other | LexA, RadA, HepA, UVDE, MutS2

Problem:

The list of DNA repair gene homologs in the D. radiodurans genome is not significantly different from that of other bacterial genomes of similar size

~40 Phyla of Bacteria

[Figure: phylogenetic tree (scale bar 0.1) with tips Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

Tree based on Hugenholtz (2002) with some modifications.

Most DNA metabolism studies in two Phyla

[Figure: phylogenetic tree of the ~40 bacterial phyla, scale bar 0.1]

Tree based on Hugenholtz (2002) with some modifications.

Deinococcus is very distant from well studied groups

[Figure: phylogenetic tree of the ~40 bacterial phyla, scale bar 0.1]

Tree based on Hugenholtz (2002) with some modifications.

Gain and Loss of Repair Genes

[Figure: tree spanning BACTERIA (Ecoli, Haein, Neigo, Helpy, Bacsu, Strpy, Mycge, Mycpn, Borbu, Trepa, Synsp), ARCHAEA (Metjn, Arcfu, Metth), and EUKARYOTES (Human, Yeast), with a "from mitochondria" annotation. Branches carry repair gene labels prefixed with +, -, d, or t (e.g., +UvrABCD, -PhrII, dRecA/SMS, tRad25); a trailing ? marks uncertain calls.]

Eisen and Hanawalt, 1999

Non-Homology Prediction: Phylogenetic Profiles

• Step 1: Search all genes in organisms of interest against all other genomes

• Ask: Yes or No, is each gene found in each other species?

• Cluster genes by distribution patterns (profiles)
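The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: it assumes the homology search has already been run and reduced to a yes/no table, and it clusters only genes whose profiles match exactly. The names `build_profiles`, `cluster_by_profile`, `gene1` to `gene4`, and `genomeA` to `genomeD` are hypothetical.

```python
from collections import defaultdict

def build_profiles(hits, genomes):
    """Turn search results into presence/absence profiles.

    hits maps each gene to the set of genomes where a homolog was found;
    the result maps each gene to a tuple of 0/1 flags in `genomes` order.
    """
    return {gene: tuple(int(g in found) for g in genomes)
            for gene, found in hits.items()}

def cluster_by_profile(profiles):
    """Group genes that share an identical distribution pattern."""
    clusters = defaultdict(list)
    for gene, profile in profiles.items():
        clusters[profile].append(gene)
    return {profile: sorted(genes) for profile, genes in clusters.items()}

# Hypothetical toy input (assumed, not data from the talk).
genomes = ["genomeA", "genomeB", "genomeC", "genomeD"]
hits = {
    "gene1": {"genomeA", "genomeC"},
    "gene2": {"genomeA", "genomeC"},
    "gene3": {"genomeB"},
    "gene4": {"genomeA", "genomeB", "genomeC", "genomeD"},
}

profiles = build_profiles(hits, genomes)
clusters = cluster_by_profile(profiles)
for profile, genes in sorted(clusters.items()):
    print(profile, genes)
```

Genes that land in the same cluster (here `gene1` and `gene2`) are candidates for a shared role. With real data, exact matching is usually too strict, and a distance between profiles would take its place.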

Carboxydothermus hydrogenoformans

• Isolated in Yellowstone
• Thermophile (grows at 80°C)
• Anaerobic
• Grows on CO (carbon monoxide)
• Produces hydrogen gas
• Low-GC Gram-positive species
• Many archaeal-like genes


IV. Phylogenetic Based Genome Sequencing

Reminders

1. Phylogenetic anchors don’t work if you do not have data from across the tree (e.g., Sargasso study limited)
2. Non-homology methods rely on diverse genomes
3. Novel processes need to be studied in novel lineages

Biased Sampling of Bacterial Genomes

• Of 40 bacterial phyla, most genome sequences come from only 3 groups

• Sargasso Sea Study Limited By Poor Sampling of Species for Data in Genbank

• Difficult to figure out what types of species may be present except when analyzing rRNA genes

Hugenholtz 2002

[Figure: phylogenetic tree of the ~40 bacterial phyla (scale bar 0.1), with lineages marked as This project, Published, In progress, or Uncultured lineage. Tree based on Hugenholtz (2002) with some modifications.]

• Solution: TIGR Tree of Life Project
  – Eisen and Ward, Co-PIs
  – Selecting genome projects to increase phylogenetic diversity
  – Supported by NSF Tree of Life Program
  – Genomes from 6 new phyla in closure
  – More information at http://www.tigr.org/tol


TIGR: J. Heidelberg, T. Read, N. Ward, M-I Benito, J. C. Venter, C. Fraser, S. Salzberg, O. White, I. Paulsen, H. Tettelin

Eisen Group: Martin Wu, Dongying Wu, James Sakwa, Jonathan Badger

Other people: Mom and Dad, H. Ochman, W. Martin, F. Robb, J. Battista, E. Orias, D. Bryant, S. O’Neill, M. Eisen, N. Moran, R. Myers, C. M. Cavanaugh, P. Hanawalt

$$$: NSF, ONR, DOE, NIH

Evolutionary Method

PHYLOGENETIC PREDICTION OF GENE FUNCTION

1. CHOOSE GENE(S) OF INTEREST
2. IDENTIFY HOMOLOGS
3. ALIGN SEQUENCES
4. CALCULATE GENE TREE
5. OVERLAY KNOWN FUNCTIONS ONTO TREE
6. INFER LIKELY FUNCTION OF GENE(S) OF INTEREST

[Figure: flowchart of the method for genes 1 to 6 and paralogs 1A, 2A, 3A, 1B, 2B, 3B from Species 1, 2, and 3, shown against the ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN). Panels: EXAMPLE A, EXAMPLE B, METHOD. Tree nodes are labeled "Duplication" or "Duplication?", and one outcome is labeled "Ambiguous".]
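The last two steps named on this slide, overlaying known functions onto the gene tree and inferring the function of the gene of interest, can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure from the talk: the tree is given as nested tuples, the query gene is present in the tree, and a prediction is made only when all annotated genes in the smallest informative clade agree. The function names, the toy tree, and "function X" / "function Y" are hypothetical; the leaf labels reuse the 1A to 3B naming from the slide.

```python
def leaves(tree):
    """Return leaf names under a node; a leaf is a string, an internal node a tuple."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]

def predict_function(tree, query, known):
    """Predict the query's function from annotated relatives in the gene tree.

    Walk from the root towards the query. At each clade, collect the functions
    of annotated genes other than the query. The answer comes from the smallest
    clade that still holds an annotation: its function if all annotations
    agree, or None if they disagree (ambiguous) or none exist.
    """
    prediction = None
    node = tree
    while True:
        funcs = {known[name] for name in leaves(node)
                 if name in known and name != query}
        if not funcs:
            break  # nothing annotated this close to the query
        prediction = funcs.pop() if len(funcs) == 1 else None
        # Descend into the child clade that contains the query.
        node = next(child for child in node if query in leaves(child))
    return prediction

# Hypothetical gene tree: an old duplication produced the A and B paralog groups.
gene_tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known = {"1A": "function X", "3A": "function X", "1B": "function Y"}

print(predict_function(gene_tree, "2A", known))
print(predict_function(gene_tree, "3B", known))

# A query whose closest annotated relatives disagree gives no prediction.
mixed_tree = ("q", ("a", "b"))
print(predict_function(mixed_tree, "q", {"a": "function X", "b": "function Y"}))
```

Assigning by position in the tree rather than by the best-scoring hit is what separates the paralog groups here: `2A` takes its annotation from the A clade even though annotated B genes are also homologs.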


rRNA and Uncultured Microbes

Eisen et al., 1992
