a phylogeny driven genomic encyclopedia of bacteria and archaea

A phylogeny driven genomic encyclopedia of bacteria and archaea Jonathan A. Eisen Talk at Stanford University April 17, 2010 Saturday, April 24, 2010

Upload: jonathan-eisen

Post on 10-May-2015

DESCRIPTION

Slides for talk given by Jonathan Eisen at Stanford University April 17, 2010

TRANSCRIPT

Page 1: A phylogeny driven genomic encyclopedia of bacteria and archaea

A phylogeny driven genomic encyclopedia of bacteria and archaea

Jonathan A. Eisen

Talk at Stanford University, April 17, 2010

Saturday, April 24, 2010

Page 2: A phylogeny driven genomic encyclopedia of bacteria and archaea

Bacteria evolve

Page 3: A phylogeny driven genomic encyclopedia of bacteria and archaea

Fleischmann et al. 1995


Page 4: A phylogeny driven genomic encyclopedia of bacteria and archaea

Microbial genomes

From http://genomesonline.org

Page 5: A phylogeny driven genomic encyclopedia of bacteria and archaea


Page 6: A phylogeny driven genomic encyclopedia of bacteria and archaea

rRNA Tree of Life

Based on tree by Norm Pace

Page 7: A phylogeny driven genomic encyclopedia of bacteria and archaea

The Tree is not Happy

Based on tree by Norm Pace

Page 8: A phylogeny driven genomic encyclopedia of bacteria and archaea

Tree labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter

• At least 40 phyla of bacteria

As of 2002

Based on Hugenholtz, 2002

Page 9: A phylogeny driven genomic encyclopedia of bacteria and archaea

[Tree of bacterial phyla, same labels as on Page 8]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

As of 2002

Based on Hugenholtz, 2002

Page 10: A phylogeny driven genomic encyclopedia of bacteria and archaea

[Tree of bacterial phyla, same labels as on Page 8]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

As of 2002

Based on Hugenholtz, 2002

Page 11: A phylogeny driven genomic encyclopedia of bacteria and archaea

[Tree of bacterial phyla, same labels as on Page 8]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Archaea

As of 2002

Based on Hugenholtz, 2002

Page 12: A phylogeny driven genomic encyclopedia of bacteria and archaea

[Tree of bacterial phyla, same labels as on Page 8]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Eukaryotes

As of 2002

Based on Hugenholtz, 2002

Page 13: A phylogeny driven genomic encyclopedia of bacteria and archaea

Filling in the Genomic Phylogenetic Gaps

• Common approach within some eukaryotic groups

• Many small projects funded to fill in some bacterial or archaeal gaps

• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature


Page 14: A phylogeny driven genomic encyclopedia of bacteria and archaea

[Tree of bacterial phyla, same labels as on Page 8]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Solution I: sequence more phyla

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

Page 15: A phylogeny driven genomic encyclopedia of bacteria and archaea

Organisms Selected

Phylum                  Species selected
Chrysiogenes            Chrysiogenes arsenatis (GCA)
Coprothermobacter       Coprothermobacter proteolyticus (GCBP)
Dictyoglomi             Dictyoglomus thermophilum (GDT)
Thermodesulfobacteria   Thermodesulfobacterium commune (GTC)
Nitrospirae             Thermodesulfovibrio yellowstonii (GTY)
Thermomicrobia          Thermomicrobium roseum (GTR)
Deferribacteres         Geovibrio thiophilus (GGT)
Synergistes             Synergistes jonesii (GSJ)

Page 16: A phylogeny driven genomic encyclopedia of bacteria and archaea

Bacterial aTOL Project AIMS

• Improve resolution of deep branches in the bacterial tree

• Launch biological studies of these phyla

• Leverage data for interpreting environmental surveys


Page 17: A phylogeny driven genomic encyclopedia of bacteria and archaea

T. roseum genome


Page 18: A phylogeny driven genomic encyclopedia of bacteria and archaea

From http://genomesonline.org

Microbial genomes


Page 19: A phylogeny driven genomic encyclopedia of bacteria and archaea

The Tree of Life is Still Angry


Page 20: A phylogeny driven genomic encyclopedia of bacteria and archaea

Major Lineages of Actinobacteria

2.5.1 Acidimicrobidae
  2.5.1.1 Unclassified
  2.5.1.2 "Microthrixineae"
  2.5.1.3 Acidimicrobineae
  2.5.1.4 BD2-10
  2.5.1.5 EB1017
2.5.2 Actinobacteridae
  2.5.2.1 Unclassified
  2.5.2.10 Ellin306/WR160
  2.5.2.11 Ellin5012
  2.5.2.12 Ellin5034
  2.5.2.13 Frankineae
  2.5.2.14 Glycomyces
  2.5.2.15 Intrasporangiaceae
  2.5.2.16 Kineosporiaceae
  2.5.2.17 Microbacteriaceae
  2.5.2.18 Micrococcaceae
  2.5.2.19 Micromonosporaceae
  2.5.2.2 Actinomyces
  2.5.2.20 Propionibacterineae
  2.5.2.21 Pseudonocardiaceae
  2.5.2.22 Streptomycineae
  2.5.2.23 Streptosporangineae
  2.5.2.3 Actinomycineae
  2.5.2.4 Actinosynnemataceae
  2.5.2.5 Bifidobacteriaceae
  2.5.2.6 Brevibacteriaceae
  2.5.2.7 Cellulomonadaceae
  2.5.2.8 Corynebacterineae
  2.5.2.9 Dermabacteraceae
2.5.3 Coriobacteridae
  2.5.3.1 Unclassified
  2.5.3.2 Atopobiales
  2.5.3.3 Coriobacteriales
  2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
  2.5.6.1 Unclassified
  2.5.6.2 "Thermoleiphilaceae"
  2.5.6.3 MC47
  2.5.6.4 Rubrobacteraceae

2.5 Actinobacteria
  2.5.1 Acidimicrobidae
    2.5.1.1 Unclassified
    2.5.1.2 "Microthrixineae"
    2.5.1.3 Acidimicrobineae
      2.5.1.3.1 Unclassified
      2.5.1.3.2 Acidimicrobiaceae
    2.5.1.4 BD2-10
    2.5.1.5 EB1017
  2.5.2 Actinobacteridae
    2.5.2.1 Unclassified
    2.5.2.10 Ellin306/WR160
    2.5.2.11 Ellin5012
    2.5.2.12 Ellin5034
    2.5.2.13 Frankineae
      2.5.2.13.1 Unclassified
      2.5.2.13.2 Acidothermaceae
      2.5.2.13.3 Ellin6090
      2.5.2.13.4 Frankiaceae
      2.5.2.13.5 Geodermatophilaceae
      2.5.2.13.6 Microsphaeraceae
      2.5.2.13.7 Sporichthyaceae
    2.5.2.14 Glycomyces
    2.5.2.15 Intrasporangiaceae
      2.5.2.15.1 Unclassified
      2.5.2.15.2 Dermacoccus
      2.5.2.15.3 Intrasporangiaceae
    2.5.2.16 Kineosporiaceae
    2.5.2.17 Microbacteriaceae
      2.5.2.17.1 Unclassified
      2.5.2.17.2 Agrococcus
      2.5.2.17.3 Agromyces
    2.5.2.18 Micrococcaceae
    2.5.2.19 Micromonosporaceae
    2.5.2.2 Actinomyces
    2.5.2.20 Propionibacterineae
      2.5.2.20.1 Unclassified
      2.5.2.20.2 Kribbella
      2.5.2.20.3 Nocardioidaceae
      2.5.2.20.4 Propionibacteriaceae
    2.5.2.21 Pseudonocardiaceae
    2.5.2.22 Streptomycineae
      2.5.2.22.1 Unclassified
      2.5.2.22.2 Kitasatospora
      2.5.2.22.3 Streptacidiphilus
    2.5.2.23 Streptosporangineae
      2.5.2.23.1 Unclassified
      2.5.2.23.2 Ellin5129
      2.5.2.23.3 Nocardiopsaceae
      2.5.2.23.4 Streptosporangiaceae
      2.5.2.23.5 Thermomonosporaceae
    2.5.2.3 Actinomycineae
    2.5.2.4 Actinosynnemataceae
    2.5.2.5 Bifidobacteriaceae
    2.5.2.6 Brevibacteriaceae
    2.5.2.7 Cellulomonadaceae
    2.5.2.8 Corynebacterineae
      2.5.2.8.1 Unclassified
      2.5.2.8.2 Corynebacteriaceae
      2.5.2.8.3 Dietziaceae
      2.5.2.8.4 Gordoniaceae
      2.5.2.8.5 Mycobacteriaceae
      2.5.2.8.6 Rhodococcus
      2.5.2.8.7 Rhodococcus
      2.5.2.8.8 Rhodococcus
    2.5.2.9 Dermabacteraceae
      2.5.2.9.1 Unclassified
      2.5.2.9.2 Brachybacterium
      2.5.2.9.3 Dermabacter
  2.5.3 Coriobacteridae
    2.5.3.1 Unclassified
    2.5.3.2 Atopobiales
    2.5.3.3 Coriobacteriales
    2.5.3.4 Eggerthellales
  2.5.4 OPB41
  2.5.5 PK1
  2.5.6 Rubrobacteridae
    2.5.6.1 Unclassified
    2.5.6.2 "Thermoleiphilaceae"
      2.5.6.2.1 Unclassified
      2.5.6.2.2 Conexibacter
      2.5.6.2.3 XGE514
    2.5.6.3 MC47
    2.5.6.4 Rubrobacteraceae

Page 21: A phylogeny driven genomic encyclopedia of bacteria and archaea

[Tree of bacterial phyla, same labels as on Page 8]

• At least 100 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa even more poorly sampled

• Solution - use tree to really fill gaps

Well sampled phyla

Page 22: A phylogeny driven genomic encyclopedia of bacteria and archaea

http://www.jgi.doe.gov/programs/GEBA/pilot.html

Page 23: A phylogeny driven genomic encyclopedia of bacteria and archaea

GEBA Pilot Project Overview

• Identify major branches in rRNA tree for which no genomes are available

• Identify a cultured representative for each group

• Grow > 200 of these and prep DNA

• Sequence and finish 100

• Annotate, analyze, release data

• Assess benefits of tree guided sequencing
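The first step in the overview above (finding branches of the tree that have no genome) can be sketched as a tree traversal. This is a minimal illustration only, not the GEBA pipeline: the nested-tuple tree, the taxon names, and the set of sequenced taxa are all invented, and `unsampled_clades` is a hypothetical helper name.

```python
# Toy rooted tree as nested tuples; leaves are taxon names (all hypothetical).
TREE = ((("A1", "A2"), ("B1", "B2")), (("C1", "C2"), "D1"))
# Hypothetical set of taxa that already have a sequenced genome.
SEQUENCED = {"A1", "A2", "C1"}


def leaves(node):
    """Return the list of leaf names under a node."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node:
        out.extend(leaves(child))
    return out


def unsampled_clades(node, sequenced):
    """Return the largest clades (as sorted leaf lists) with no sequenced leaf."""
    tips = leaves(node)
    if not any(t in sequenced for t in tips):
        return [sorted(tips)]  # whole clade lacks a genome: report it once
    if isinstance(node, str):
        return []  # a leaf that is already sequenced
    found = []
    for child in node:
        found.extend(unsampled_clades(child, sequenced))
    return found


gaps = unsampled_clades(TREE, SEQUENCED)
for clade in gaps:
    print("no genome in clade:", ", ".join(clade))
```

Each reported clade would then need a cultured representative chosen from its leaves, which is the second step in the list and is not modeled here.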


Page 24: A phylogeny driven genomic encyclopedia of bacteria and archaea

GEBA Pilot Target List

[Bar chart: number of genomes (y-axis, 0 to 35) per phylum (x-axis). Bacteria (B): Actinobacteria (High GC), Aminanaerobia, Aquificae, Bacteroidetes, Chloroflexi, Deferribacteres, Deinococci, Delta Proteobacteria, Epsilon Proteobacteria, Firmicutes, Fusobacteria, Gamma Proteobacteria, Gemmatimonadetes, Haloanaerobiales, Planctomycetes, Spirochaetes, Thermodesulfobacteria, Thermodesulfobia, Thermovenabulae. Archaea (A): Halobacteria, Archaeoglobi, Methanobacteria, Methanomicrobia, Thermococci, Thermoprotei.]

Page 25: A phylogeny driven genomic encyclopedia of bacteria and archaea

Why Increase Taxonomic Coverage?

• Gene discovery

• Annotation, functional prediction

• Metagenomic analysis

• Mechanisms of diversification

• Species phylogeny and classification

Page 26: A phylogeny driven genomic encyclopedia of bacteria and archaea

GEBA Pilot Project: Components

• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow)

• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)

• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)

• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)

• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.)

• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)

• Adopt a microbe education project (Cheryl Kerfeld)

• Outreach (David Gilbert)

• $$$ (DOE, Eddy Rubin, Jim Bristow)

Page 27: A phylogeny driven genomic encyclopedia of bacteria and archaea

Assess Benefits of GEBA

• All genomes have some value

• But what, if any, is the benefit of tree-guided sequencing over other selection methods?

• Lessons for other large scale microbial genome projects?

Page 28: A phylogeny driven genomic encyclopedia of bacteria and archaea

GEBA Lesson 1

rRNA Tree is Useful for Identifying Phylogenetically Novel Genomes

rRNA tree topology is not perfect; genome-based trees are better

Page 29: A phylogeny driven genomic encyclopedia of bacteria and archaea

rRNA Tree of Life

Based on tree by Norm Pace

Page 30: A phylogeny driven genomic encyclopedia of bacteria and archaea


Page 31: A phylogeny driven genomic encyclopedia of bacteria and archaea


Page 32: A phylogeny driven genomic encyclopedia of bacteria and archaea

Whole genome tree built using AMPHORA by Martin Wu and Dongying Wu

Page 33: A phylogeny driven genomic encyclopedia of bacteria and archaea

PD of rRNA, Genome Trees Similar

From Wu et al. 2009. http://www.nature.com/nature/journal/v462/n7276/full/nature08656.html


Page 34: A phylogeny driven genomic encyclopedia of bacteria and archaea

Proteobacteria


Page 35: A phylogeny driven genomic encyclopedia of bacteria and archaea

GEBA Lesson 2

Phylogenetically-guided genome selection improves genome annotation

Page 36: A phylogeny driven genomic encyclopedia of bacteria and archaea

Predicting Function

• Key step in genome projects

• More accurate predictions help guide experimental and computational analyses

• Many diverse approaches

• Comparative and evolutionary analysis greatly improves most predictions

Page 37: A phylogeny driven genomic encyclopedia of bacteria and archaea

Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling

• Better definition of protein family sequence “patterns”

• Conversion of hypothetical into conserved hypotheticals

• Greatly improves “comparative” and “evolutionary” based predictions

• Linking distantly related members of protein families

• Improved non-homology prediction


Page 38: A phylogeny driven genomic encyclopedia of bacteria and archaea

From Wu et al. 2009. http://www.nature.com/nature/journal/v462/n7276/full/nature08656.html


Page 39: A phylogeny driven genomic encyclopedia of bacteria and archaea

GEBA Lesson 3

Improves analysis of genome data from uncultured organisms


Page 40: A phylogeny driven genomic encyclopedia of bacteria and archaea

Environmental Shotgun Sequencing

shotgun

clone


Page 41: A phylogeny driven genomic encyclopedia of bacteria and archaea


Page 42: A phylogeny driven genomic encyclopedia of bacteria and archaea

rRNA phylotyping from metagenomics

Venter et al., 2004


Page 43: A phylogeny driven genomic encyclopedia of bacteria and archaea

Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA)

Venter et al., 2004


Page 44: A phylogeny driven genomic encyclopedia of bacteria and archaea

Shotgun Sequencing Allows Use of Other Markers

[Bar chart: Sargasso phylotypes. y-axis: weighted % of clones (0 to 0.5). x-axis: major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). One series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.]

Venter et al., 2004

Page 45: A phylogeny driven genomic encyclopedia of bacteria and archaea

Shotgun Sequencing Allows Use of Other Markers

[Same bar chart of Sargasso phylotypes as on Page 44]

Venter et al., 2004

Cannot be done without good sampling of genomes

Page 46: A phylogeny driven genomic encyclopedia of bacteria and archaea

ABCDEFG

TUVWXYZ

Binning challenge


Page 47: A phylogeny driven genomic encyclopedia of bacteria and archaea

ABCDEFG

TUVWXYZ

Binning challenge

Best binning method: reference genomes


Page 48: A phylogeny driven genomic encyclopedia of bacteria and archaea

ABCDEFG

TUVWXYZ

Binning challenge

No reference genome? What do you do?


Page 49: A phylogeny driven genomic encyclopedia of bacteria and archaea

ABCDEFG

TUVWXYZ

Binning challenge

No reference genome? What do you do?

Phylogeny ...

Page 50: A phylogeny driven genomic encyclopedia of bacteria and archaea

Phylogenetic Binning Using AMPHORA

[Bar chart: y-axis 0 to 0.7. x-axis: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria. One series per marker gene: dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf.]

AMPHORA - each read on its own tree
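The per-marker bars in a chart like this can be tallied once every read has a phylum assignment. The following is a minimal sketch under stated assumptions: it takes the assignments as given (the tree placement itself is not modeled), the example reads are invented, and `phylum_fractions` is a hypothetical helper, not part of AMPHORA.

```python
from collections import Counter, defaultdict

# (marker gene, phylum assigned from that read's own tree); invented examples.
ASSIGNMENTS = [
    ("rpoB", "Alphaproteobacteria"),
    ("rpoB", "Alphaproteobacteria"),
    ("rpoB", "Cyanobacteria"),
    ("rplA", "Alphaproteobacteria"),
    ("rplA", "Unclassified Bacteria"),
]


def phylum_fractions(assignments):
    """Map marker -> {phylum: fraction of that marker's reads}."""
    counts = defaultdict(Counter)
    for marker, phylum in assignments:
        counts[marker][phylum] += 1
    fractions = {}
    for marker, tally in counts.items():
        total = sum(tally.values())
        fractions[marker] = {p: n / total for p, n in tally.items()}
    return fractions


fractions = phylum_fractions(ASSIGNMENTS)
for marker in sorted(fractions):
    for phylum, frac in sorted(fractions[marker].items()):
        print(f"{marker}\t{phylum}\t{frac:.2f}")
```

Reads that cannot be placed within a known phylum end up in an "Unclassified" bin, which is where sparse reference genome sampling shows up in the tallies.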

Page 51: A phylogeny driven genomic encyclopedia of bacteria and archaea

Phylogenetic Binning Using AMPHORA

[Same bar chart as on Page 50]

AMPHORA - each read on its own tree

Cannot be done without good sampling of genomes

Page 52: A phylogeny driven genomic encyclopedia of bacteria and archaea

GEBA Phylogenomic Lesson 5

We have still only scratched the surface of microbial diversity


Page 53: A phylogeny driven genomic encyclopedia of bacteria and archaea

Protein Family Rarefaction Curves

• Take data set of multiple complete genomes

• Identify all protein families using MCL

• Plot # of genomes vs. # of protein families
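The three steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis from the talk: the clustering step is taken as already done, the genome-to-family table is invented, and averaging over random genome orderings is one common way to smooth such a curve. The curve is printed rather than plotted to keep the sketch dependency-free.

```python
import random

# Hypothetical input: genome name -> set of protein family IDs
# (for example, cluster IDs produced by an earlier MCL run).
GENOME_FAMILIES = {
    "g1": {"f1", "f2", "f3"},
    "g2": {"f1", "f2", "f4"},
    "g3": {"f1", "f5", "f6"},
    "g4": {"f1", "f2", "f3", "f7"},
}


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct families after adding 1..N genomes in random order."""
    rng = random.Random(seed)
    names = sorted(genome_families)
    totals = [0] * len(names)
    for _ in range(n_permutations):
        rng.shuffle(names)
        seen = set()
        for i, name in enumerate(names):
            seen |= genome_families[name]  # families found so far
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


curve = rarefaction_curve(GENOME_FAMILIES)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, round(n_families, 2))
```

A curve that keeps climbing as genomes are added indicates that new protein families are still being discovered; a curve that flattens indicates saturation.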


Page 54: A phylogeny driven genomic encyclopedia of bacteria and archaea


Page 55: A phylogeny driven genomic encyclopedia of bacteria and archaea


Page 56: A phylogeny driven genomic encyclopedia of bacteria and archaea


Page 57: A phylogeny driven genomic encyclopedia of bacteria and archaea


Page 58: A phylogeny driven genomic encyclopedia of bacteria and archaea


Page 59: A phylogeny driven genomic encyclopedia of bacteria and archaea

Phylogenetic Distribution Novelty: Bacterial Actin Related Protein

Haliangium ochraceum DSM 14365 Patrik D’haeseleer, Adam Zemla, Victor Kunin

[Figure: labels not recoverable from the extracted text]

From Wu et al. 2009. http://www.nature.com/nature/journal/v462/n7276/full/nature08656.html

Page 60: A phylogeny driven genomic encyclopedia of bacteria and archaea

rRNA Tree of Life

Based on tree by Norm Pace

Page 61: A phylogeny driven genomic encyclopedia of bacteria and archaea

Phylogenetic Diversity: Sequenced Bacteria & Archaea

From Wu et al. 2009. http://www.nature.com/nature/journal/v462/n7276/full/nature08656.html


Page 62: A phylogeny driven genomic encyclopedia of bacteria and archaea

Phylogenetic Diversity with GEBA

From Wu et al. 2009. http://www.nature.com/nature/journal/v462/n7276/full/nature08656.html
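A phylogenetic diversity (PD) comparison of this kind can be illustrated on a toy tree. The sketch below assumes PD is the total length of the branches connecting a set of taxa to the root, which is one common definition; the measure used in Wu et al. 2009 may differ in detail. The tree, branch lengths, and taxon names are invented.

```python
# Toy rooted tree: node -> (parent, branch length to parent). All values invented.
PARENT = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 2.0),
    "D": ("n2", 0.5),
    "n1": ("root", 0.5),
    "n2": ("root", 1.5),
}


def phylogenetic_diversity(taxa, parent=PARENT):
    """Sum the lengths of all branches on the paths from the given taxa to the root."""
    used = set()
    for node in taxa:
        # Walk up until the root or a branch that was already counted.
        while node in parent and node not in used:
            used.add(node)
            node = parent[node][0]
    return sum(parent[n][1] for n in used)


before = phylogenetic_diversity({"A"})            # one sequenced genome
add_close = phylogenetic_diversity({"A", "B"})    # add a close relative
add_distant = phylogenetic_diversity({"A", "C"})  # add a distant lineage
print(before, add_close, add_distant)
```

In this toy case the distant lineage adds more branch length than the close relative, which is the intuition behind choosing genomes by their position in the tree.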


Page 63: A phylogeny driven genomic encyclopedia of bacteria and archaea

Phylogenetic Diversity: Isolates

From Wu et al. 2009. http://www.nature.com/nature/journal/v462/n7276/full/nature08656.html

Page 64: A phylogeny driven genomic encyclopedia of bacteria and archaea

Phylogenetic Diversity: All

From Wu et al. 2009. http://www.nature.com/nature/journal/v462/n7276/full/nature08656.html


Page 65: A phylogeny driven genomic encyclopedia of bacteria and archaea

[Tree of bacterial phyla, same labels as on Page 8]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa even more poorly sampled

Legend: Well sampled phyla; Poorly sampled; No cultured taxa

Page 66: A phylogeny driven genomic encyclopedia of bacteria and archaea

Uncultured Lineages: Technical Approaches

• Get into culture

• Enrichment cultures

• If abundant in low diversity ecosystems

• Flow sorting

• Microbeads

• Microfluidic sorting

• Single cell amplification

Page 67: A phylogeny driven genomic encyclopedia of bacteria and archaea

GEBA Phylogenomic Lesson 6

Need Experiments from Across the Tree of Life too


Page 68: A phylogeny driven genomic encyclopedia of bacteria and archaea

[Tree of bacterial phyla, same labels as on Page 8]

• At least 40 phyla of bacteria

As of 2002

Based on Hugenholtz, 2002

Page 69: A phylogeny driven genomic encyclopedia of bacteria and archaea

[Tree of bacterial phyla, same labels as on Page 8]

• At least 40 phyla of bacteria

• Experimental studies are mostly from three phyla

As of 2002

Based on Hugenholtz, 2002

Page 70: A phylogeny driven genomic encyclopedia of bacteria and archaea

[Tree of bacterial phyla, same labels as on Page 8]

• At least 40 phyla of bacteria

• Experimental studies are mostly from three phyla

• Some studies in other phyla

As of 2002

Based on Hugenholtz, 2002

Page 71: A phylogeny driven genomic encyclopedia of bacteria and archaea

[Tree of bacterial phyla, same labels as on Page 8]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Eukaryotes

As of 2002

Based on Hugenholtz, 2002

Page 72: A phylogeny driven genomic encyclopedia of bacteria and archaea

[Tree of bacterial phyla, same labels as on Page 8]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Viruses

As of 2002

Based on Hugenholtz, 2002

Page 73: A phylogeny driven genomic encyclopedia of bacteria and archaea

[Tree of bacterial phyla, same labels as on Page 8; scale bar 0.1]

Tree based on Hugenholtz (2002) with some modifications.

Need experimental studies from across the tree too

Page 74: A phylogeny driven genomic encyclopedia of bacteria and archaea


Page 75: A phylogeny driven genomic encyclopedia of bacteria and archaea

MICROBES


Page 76: A phylogeny driven genomic encyclopedia of bacteria and archaea

A Happy Tree of Life


Page 77: A phylogeny driven genomic encyclopedia of bacteria and archaea

Lateral transfer

• Many lines of evidence suggest it is important in adaptations in microbes
– E.g., K12 vs. O157:H7
– E.g., many genes show anomalous patterns

• However, does not appear to wipe out phylogenetic signal
– Core of genomes gives similar phylogeny
– Most acquired genes do not last long in lineages
– Many claims of LGT are more “identification of anomalies” than detecting LGT

Page 78: A phylogeny driven genomic encyclopedia of bacteria and archaea
