talk for uc davis applied phylogenetics course at bodega bay

Phylogenomics Jonathan A. Eisen UC Davis Bodega Applied Phylogenetics Workshop March 7, 2011 Tuesday, March 8, 2011

Upload: jonathan-eisen

Post on 10-May-2015


DESCRIPTION

Talk by Jonathan Eisen for UC Davis Applied Phylogenetics Course at Bodega Bay

TRANSCRIPT

Page 1: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Phylogenomics

Jonathan A. Eisen
UC Davis

Bodega Applied Phylogenetics Workshop
March 7, 2011

Tuesday, March 8, 2011

Page 2: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Fleischmann et al. 1995 Science 269:496-512

Tuesday, March 8, 2011

Page 3: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Whole Genome Shotgun Sequencing

Tuesday, March 8, 2011

Page 9: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Whole Genome Shotgun Sequencing

shotgun

sequence

Warner Brothers, Inc.

Tuesday, March 8, 2011

Page 14: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Assemble Fragments

sequencer output

assemble fragments

Closure & Annotation

Tuesday, March 8, 2011
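The "assemble fragments" step can be illustrated with a toy greedy overlap merger. This is only a minimal sketch under simplifying assumptions (error-free reads, a single strand, no repeats); the function names and the example reads are hypothetical, and it is not the assembler used for any genome mentioned in the talk.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads for illustration only.
reads = ["ATGGCGT", "GCGTACG", "TACGGA"]
contigs = greedy_assemble(reads)
print(contigs)
```

Reads that never overlap remain as separate contigs, which is roughly the situation that the closure step then has to resolve.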

Page 15: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

From http://genomesonline.org

Tuesday, March 8, 2011

Page 16: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Tuesday, March 8, 2011

Page 17: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Tuesday, March 8, 2011

Page 18: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Tuesday, March 8, 2011

Page 19: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Tuesday, March 8, 2011

Page 20: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Genome Sequences Have Revolutionized Microbiology

• Predictions of metabolic processes

• Better vaccine and drug design

• New insights into mechanisms of evolution

• Genomes serve as template for functional studies

• New enzymes and materials for engineering and synthetic biology

Tuesday, March 8, 2011

Page 21: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

General Steps in Analysis of Complete Genomes

• Identification/prediction of genes

• Characterization of gene features

• Characterization of genome features

• Prediction of gene function

• Prediction of pathways

• Integration with known biological data

• Comparative genomics

Tuesday, March 8, 2011

Page 22: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Genome Size

Tuesday, March 8, 2011

Page 23: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Genome Structure: More Variable than Once Thought

Tuesday, March 8, 2011

Page 24: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Tuesday, March 8, 2011

Page 25: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Why Completeness is Important

• Improves characterization of genome features

– Gene order, replication origins

• Better comparative genomics

– Genome duplications, inversions

• Presence and absence of particular genes can be very important

• Missing sequence might be important (e.g., centromere)

• Allows researchers to focus on biology not sequencing

Tuesday, March 8, 2011

Page 26: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Vibrio cholerae Metabolism

Tuesday, March 8, 2011

Page 27: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Tuesday, March 8, 2011

Page 28: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

From http://genomesonline.org

Tuesday, March 8, 2011

Page 29: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Phylogenomic Analysis

• Evolutionary reconstructions greatly improve genome analyses

• Genome analysis greatly improves evolutionary reconstructions

• There is a feedback loop such that these should be integrated

Tuesday, March 8, 2011

Page 31: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Outline

• Phylogenomic Tales

– Selecting genomes for sequencing

– Species evolution

– Predicting functions of genes

– Uncultured microbes

– Searching for novel organisms and genes

• All of these will be told in the context of a recent project, “A Genomic Encyclopedia of Bacteria and Archaea” (aka GEBA)

Tuesday, March 8, 2011

Page 32: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

GEBA Introduction

Knowing What We Don’t Know

Tuesday, March 8, 2011

Page 34: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

As of 2002

Tuesday, March 8, 2011

Page 35: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: tree of bacterial phyla. Lineage labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

As of 2002

Based on Hugenholtz, 2002

Tuesday, March 8, 2011

Page 38: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: tree of bacterial phyla, with the same lineage labels as on Page 35]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

As of 2002

Based on Hugenholtz, 2002

Tuesday, March 8, 2011

Page 39: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Need for Tree Guidance Well Established

• Common approach within some eukaryotic groups

• Many small projects funded to fill in some bacterial or archaeal gaps

• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature

Tuesday, March 8, 2011

Page 40: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: tree of bacterial phyla, with the same lineage labels as on Page 35]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Solution I: sequence more phyla

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen, Ward, Robb, Nelson, et al

Tuesday, March 8, 2011

Page 41: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Organisms Selected

Phylum                   Species selected
Chrysiogenes             Chrysiogenes arsenatis (GCA)
Coprothermobacter        Coprothermobacter proteolyticus (GCBP)
Dictyoglomi              Dictyoglomus thermophilum (GDT)
Thermodesulfobacteria    Thermodesulfobacterium commune (GTC)
Nitrospirae              Thermodesulfovibrio yellowstonii (GTY)
Thermomicrobia           Thermomicrobium roseum (GTR)
Deferribacteres          Geovibrio thiophilus (GGT)
Synergistes              Synergistes jonesii (GSJ)

Tuesday, March 8, 2011

Page 42: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: tree of bacterial phyla, with the same lineage labels as on Page 35]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Still highly biased in terms of the tree

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

Tuesday, March 8, 2011

Page 43: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Major Lineages of Actinobacteria

2.5 Actinobacteria
2.5.1 Acidimicrobidae
2.5.1.1 Unclassified
2.5.1.2 "Microthrixineae"
2.5.1.3 Acidimicrobineae
2.5.1.3.1 Unclassified
2.5.1.3.2 Acidimicrobiaceae
2.5.1.4 BD2-10
2.5.1.5 EB1017
2.5.2 Actinobacteridae
2.5.2.1 Unclassified
2.5.2.10 Ellin306/WR160
2.5.2.11 Ellin5012
2.5.2.12 Ellin5034
2.5.2.13 Frankineae
2.5.2.13.1 Unclassified
2.5.2.13.2 Acidothermaceae
2.5.2.13.3 Ellin6090
2.5.2.13.4 Frankiaceae
2.5.2.13.5 Geodermatophilaceae
2.5.2.13.6 Microsphaeraceae
2.5.2.13.7 Sporichthyaceae
2.5.2.14 Glycomyces
2.5.2.15 Intrasporangiaceae
2.5.2.15.1 Unclassified
2.5.2.15.2 Dermacoccus
2.5.2.15.3 Intrasporangiaceae
2.5.2.16 Kineosporiaceae
2.5.2.17 Microbacteriaceae
2.5.2.17.1 Unclassified
2.5.2.17.2 Agrococcus
2.5.2.17.3 Agromyces
2.5.2.18 Micrococcaceae
2.5.2.19 Micromonosporaceae
2.5.2.2 Actinomyces
2.5.2.20 Propionibacterineae
2.5.2.20.1 Unclassified
2.5.2.20.2 Kribbella
2.5.2.20.3 Nocardioidaceae
2.5.2.20.4 Propionibacteriaceae
2.5.2.21 Pseudonocardiaceae
2.5.2.22 Streptomycineae
2.5.2.22.1 Unclassified
2.5.2.22.2 Kitasatospora
2.5.2.22.3 Streptacidiphilus
2.5.2.23 Streptosporangineae
2.5.2.23.1 Unclassified
2.5.2.23.2 Ellin5129
2.5.2.23.3 Nocardiopsaceae
2.5.2.23.4 Streptosporangiaceae
2.5.2.23.5 Thermomonosporaceae
2.5.2.3 Actinomycineae
2.5.2.4 Actinosynnemataceae
2.5.2.5 Bifidobacteriaceae
2.5.2.6 Brevibacteriaceae
2.5.2.7 Cellulomonadaceae
2.5.2.8 Corynebacterineae
2.5.2.8.1 Unclassified
2.5.2.8.2 Corynebacteriaceae
2.5.2.8.3 Dietziaceae
2.5.2.8.4 Gordoniaceae
2.5.2.8.5 Mycobacteriaceae
2.5.2.8.6 Rhodococcus
2.5.2.8.7 Rhodococcus
2.5.2.8.8 Rhodococcus
2.5.2.9 Dermabacteraceae
2.5.2.9.1 Unclassified
2.5.2.9.2 Brachybacterium
2.5.2.9.3 Dermabacter
2.5.3 Coriobacteridae
2.5.3.1 Unclassified
2.5.3.2 Atopobiales
2.5.3.3 Coriobacteriales
2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
2.5.6.1 Unclassified
2.5.6.2 "Thermoleiphilaceae"
2.5.6.2.1 Unclassified
2.5.6.2.2 Conexibacter
2.5.6.2.3 XGE514
2.5.6.3 MC47
2.5.6.4 Rubrobacteraceae

Tuesday, March 8, 2011

Page 44: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: tree of bacterial phyla, with the same lineage labels as on Page 35]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Archaea

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

Tuesday, March 8, 2011

Page 45: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: tree of bacterial phyla, with the same lineage labels as on Page 35]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Eukaryotes

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

Tuesday, March 8, 2011

Page 46: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: tree of bacterial phyla, with the same lineage labels as on Page 35]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Viruses

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

Tuesday, March 8, 2011

Page 47: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Solution: Really Fill in the Tree

• GEBA

• A genomic encyclopedia of bacteria and archaea

Eisen & Ward, PIs

[Figure: tree of bacterial phyla, with the same lineage labels as on Page 35]

Tuesday, March 8, 2011

Page 48: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

http://www.jgi.doe.gov/programs/GEBA/pilot.html

Tuesday, March 8, 2011

Page 49: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

GEBA Pilot Project: Components

• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow)

• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)

• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)

• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)

• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.)

• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)

• Adopt a microbe education project (Cheryl Kerfeld)

• Outreach (David Gilbert)

• $$$ (DOE, Eddy Rubin, Jim Bristow)

Tuesday, March 8, 2011

Page 50: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

rRNA Tree of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Tuesday, March 8, 2011

Page 51: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Tuesday, March 8, 2011

Page 52: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Tuesday, March 8, 2011

Page 53: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Tuesday, March 8, 2011

Page 54: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

GEBA Pilot Target List

[Bar chart: # of Genomes (y-axis, 0 to 35) by Phyla (x-axis). Bacteria (B): Actinobacteria (High GC), Aminanaerobia, Aquificae, Bacteroidetes, Chloroflexi, Deferribacteres (appears twice), Deinococci, Delta Proteobacteria, Epsilon Proteobacteria, Firmicutes, Fusobacteria, Gamma Proteobacteria, Gemmatimonadetes, Haloanaerobiales, Planctomycetes, Spirochaetes, Thermodesulfobacteria, Thermodesulfobia, Thermovenabulae. Archaea (A): Halobacteria, Archaeoglobi, Methanobacteria, Methanomicrobia, Thermococci, Thermoprotei.]

Tuesday, March 8, 2011

Page 55: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

GEBA Pilot Project Overview

• Identify major branches in rRNA tree for which no genomes are available

• Identify those with a cultured representative in DSMZ

• DSMZ grew > 200 of these and prepped DNA

• Sequence and finish 200+

• Annotate, analyze, release data

• Assess benefits of tree guided sequencing

• 1st paper: Wu et al., Nature, Dec 2009

Tuesday, March 8, 2011
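The first two selection steps on this slide (find branches with no genome, then keep those with a cultured representative) amount to a simple filter. The sketch below is only an illustration of that logic; the lineage names, genome counts, and culture list are hypothetical placeholders, not GEBA data, and the real project worked from an rRNA tree rather than a flat list.

```python
def pick_targets(lineages, sequenced_counts, cultured):
    """Return lineages with no sequenced genome but an available culture."""
    return sorted(
        name
        for name in lineages
        if sequenced_counts.get(name, 0) == 0 and name in cultured
    )


# Hypothetical inputs for illustration only.
lineages = ["lineage_A", "lineage_B", "lineage_C", "lineage_D"]
sequenced_counts = {"lineage_A": 12, "lineage_C": 0}  # missing key means 0 genomes
cultured = {"lineage_B", "lineage_C"}                 # e.g. available from a culture collection

targets = pick_targets(lineages, sequenced_counts, cultured)
print(targets)
```

Here lineage_D is skipped because it has no cultured representative, even though it also lacks a genome.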

Page 56: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

GEBA Phylogenomic Lesson 1

The rRNA Tree of Life is a Useful Tool for Identifying Phylogenetically Novel Genomes

Tuesday, March 8, 2011

Page 57: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

rRNA Tree of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007.

Based on tree from Pace 1997 Science 276:734-740

Archaea

Eukaryotes

Bacteria

Tuesday, March 8, 2011

Page 58: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

The Core Gets Small ...

Tuesday, March 8, 2011

Page 59: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

The Pangenome

Tuesday, March 8, 2011

Page 60: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Islands Among Synteny

Tuesday, March 8, 2011

Page 61: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

The Pangenome

Tuesday, March 8, 2011
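The core genome and pangenome named on these slides can be expressed as set operations over per-genome gene-family sets. This is a minimal sketch assuming gene families have already been assigned; the strain names and family labels are hypothetical.

```python
from functools import reduce


def core_and_pan(genomes):
    """Core = families present in every genome; pan = families present in any."""
    gene_sets = list(genomes.values())
    core = reduce(set.intersection, gene_sets)
    pan = reduce(set.union, gene_sets)
    return core, pan


def core_trajectory(genomes):
    """Core size after adding each genome in turn (insertion order)."""
    sizes, core = [], None
    for genes in genomes.values():
        core = set(genes) if core is None else core & genes
        sizes.append(len(core))
    return sizes


# Hypothetical strains and gene families for illustration only.
genomes = {
    "strain_1": {"famA", "famB", "famC"},
    "strain_2": {"famA", "famB", "famD"},
    "strain_3": {"famA", "famE"},
}

core, pan = core_and_pan(genomes)
sizes = core_trajectory(genomes)
print(sorted(core), sorted(pan), sizes)
```

The shrinking list returned by core_trajectory is the pattern behind "The Core Gets Small": each added genome can only keep or reduce the shared set, while the union can only stay the same or grow.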

Page 62: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Network of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Archaea

Eukaryotes

Bacteria

Tuesday, March 8, 2011

Page 63: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Using the Core

Tuesday, March 8, 2011

Page 64: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Whole genome tree built using AMPHORA by Martin Wu and Dongying Wu

Tuesday, March 8, 2011

Page 65: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Tuesday, March 8, 2011

Page 66: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Four Models for Rooting TOL

from Lake et al. doi: 10.1098/rstb.2009.0035

Tuesday, March 8, 2011

Page 67: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

GEBA Phylogenomic Lesson 2

rRNA Tree is good but not perfect, and better genomic sampling improves phylogenetic inference

Tuesday, March 8, 2011

Page 68: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

16S Says Hyphomonas Is in Rhodobacterales

Badger et al. 2005

Tuesday, March 8, 2011

Page 69: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

WGT and individual gene trees: It's Related to Caulobacterales

Badger et al. 2005

Tuesday, March 8, 2011

Page 71: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Caveats: ignoring LGT and using concatenated alignments

Tuesday, March 8, 2011

Page 72: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Concatenated Alignment ML Tree

Tuesday, March 8, 2011
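A concatenated alignment is built by joining per-gene alignments end to end for each taxon. The sketch below assumes each gene is already aligned and stored as a dictionary from taxon to aligned sequence, and it pads taxa missing from a gene with gaps; the function name and the example data are hypothetical. As the caveats slide notes, this approach ignores LGT and treats all genes as sharing one history.

```python
def concatenate(alignments, gap="-"):
    """Join per-gene alignments into one supermatrix, padding missing taxa with gaps."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, gap * width)
    return supermatrix


# Hypothetical aligned genes for illustration only.
gene1 = {"sp1": "ACGT", "sp2": "ACGA", "sp3": "AC-T"}
gene2 = {"sp1": "GGC", "sp3": "GGT"}  # sp2 lacks this gene

matrix = concatenate([gene1, gene2])
for taxon, seq in matrix.items():
    print(taxon, seq)
```

The resulting supermatrix would then be passed to a tree-building program, for example to infer an ML tree.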

Page 73: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Green Non Sulfur Bacteria

Tuesday, March 8, 2011

Page 74: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Chlamydia-Verrucomicrobia

Tuesday, March 8, 2011

Page 75: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Proteobacteria

Tuesday, March 8, 2011

Page 77: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

GEBA Phylogenomic Lesson 3

Phylogenetics guided genome selection (and phylogenetics in general) improves genome annotation

Tuesday, March 8, 2011

Page 78: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Predicting Function

• Key step in genome projects

• More accurate predictions help guide experimental and computational analyses

• Many diverse approaches

• All improved by “phylogenomic” type analyses that integrate evolutionary reconstructions and understanding of how new functions evolve

Tuesday, March 8, 2011

Page 80: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Blast Search of H. pylori “MutS”

• Blast search pulls up Syn. sp MutS#2 with much higher p value than other MutS homologs

• Based on this TIGR predicted this species had mismatch repair

• Assumes functional constancy

Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.

Tuesday, March 8, 2011

Page 81: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Predicting Function

• Identification of motifs

– Short regions of sequence similarity that are indicative of general activity

– e.g., ATP binding

• Homology/similarity based methods

– Gene sequence is searched against a database of other sequences

– If significantly similar genes are found, their functional information is used

• Problem

– Genes frequently have similarity to hundreds of motifs and multiple genes, not all with the same function

Tuesday, March 8, 2011

Page 82: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

MutL??

From http://asajj.roswellpark.org/huberman/dna_repair/mmr.html

Tuesday, March 8, 2011

Page 83: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Phylogenetic Tree of MutS Family

[Figure: phylogenetic tree of MutS family proteins. Tip labels are abbreviated species names: Aquae, Trepa, Fly, Xenla, Rat, Mouse, Human, Yeast, Neucr, Arath, Borbu, Strpy, Bacsu, Synsp, Ecoli, Neigo, Thema, Theaq, Deira, Chltr, Spombe, Celeg, Metth, Helpy, mSaco]

Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.

Tuesday, March 8, 2011

Page 84: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

MutS Subfamilies

[Figure: the MutS family tree from Page 83, with subfamilies labeled: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2]

Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.

Tuesday, March 8, 2011

Page 85: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Overlaying Functions onto Tree

[Figure: the MutS family tree from Page 83, with subfamilies labeled: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2]

Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.

Tuesday, March 8, 2011

Page 86: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Functional Prediction Using Tree

[Figure: the MutS family tree from Page 83, with each subfamily labeled by function]

MSH1 - Mitochondrial Repair

MSH3 - Nuclear Repair of Loops

MSH6 - Nuclear Repair of Mismatches

MutS1 - Bacterial Mismatch and Loop Repair

MSH4 - Meiotic Crossing Over

MSH5 - Meiotic Crossing Over

MutS2 - Unknown Functions

MSH2 - Eukaryotic Nuclear Mismatch and Loop Repair

Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.

Tuesday, March 8, 2011

Page 87: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Tuesday, March 8, 2011

Page 88: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

PHYLOGENETIC PREDICTION OF GENE FUNCTION

[Figure: flowchart of the METHOD, worked through for EXAMPLE A and EXAMPLE B with genes from Species 1, Species 2, and Species 3. The ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN) is shown alongside, with duplication events marked and one case labeled Ambiguous.]

1. CHOOSE GENE(S) OF INTEREST

2. IDENTIFY HOMOLOGS

3. ALIGN SEQUENCES

4. CALCULATE GENE TREE

5. OVERLAY KNOWN FUNCTIONS ONTO TREE

6. INFER LIKELY FUNCTION OF GENE(S) OF INTEREST

Based on Eisen, 1998 Genome Res 8: 163-167.

Tuesday, March 8, 2011
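The last two steps of the method (overlay known functions onto the tree, then infer the likely function of the gene of interest) can be sketched on a toy gene tree. This is an illustrative simplification, not the procedure from Eisen 1998: the tree is a nested tuple, the rule used is "take the function shared by annotated genes in the smallest clade containing the query", and all gene names and annotations are hypothetical.

```python
def leaves(node):
    """All tip names under a node; a tip is a string, a clade is a tuple."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def clades_containing(node, query):
    """Yield clades that contain the query, from the smallest up to the whole tree."""
    if isinstance(node, str):
        return
    for child in node:
        if query in leaves(child):
            yield from clades_containing(child, query)
    if query in leaves(node):
        yield node


def infer_function(tree, query, known):
    """Predict a function from the nearest annotated relatives of the query."""
    for clade in clades_containing(tree, query):
        functions = {known[g] for g in leaves(clade) if g in known and g != query}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return "unknown"


# Hypothetical gene tree and annotations for illustration only.
tree = ((("geneA_sp1", "geneA_sp2"), "query_gene"),
        (("geneB_sp1", "geneB_sp2"), "geneB_sp3"))
known = {
    "geneA_sp1": "mismatch repair",
    "geneA_sp2": "mismatch repair",
    "geneB_sp1": "meiotic crossing over",
}

print(infer_function(tree, "query_gene", known))
print(infer_function(tree, "geneB_sp3", known))
```

A top-hit similarity search would instead copy the annotation of whichever sequence scores best, regardless of which subfamily the query falls in; the tree-based rule only borrows annotations from the clade the query actually belongs to.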

Page 89: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Phylogenetic Prediction of Gene Function

• Termed phylogenomics (Eisen et al. 1997)

• Greatly improves accuracy of functional predictions compared to similarity based methods (e.g., blast)

• Automated methods now available

– Sean Eddy, Steven Brenner, Kimmen Sjölander, etc.

• But …

Tuesday, March 8, 2011

Page 90: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Example 2: Recent Changes

[Figure: neighbor-joining (NJ) tree of a gene family, with some entries marked by asterisks. Tips are genes from E. coli, B. subtilis, Synechocystis sp., H. pylori, H. pylori 99, C. jejuni, A. fulgidus, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans, M. tuberculosis, and many V. cholerae genes (VC and VCA locus tags)]

• Phylogenomic functional prediction may not work well for very newly evolved functions

• Can use understanding of origin of novelty to better interpret these cases?

• Screen genomes for genes that have changed recently

– Pseudogenes and gene loss

– Contingency Loci

– Acquisition (e.g., LGT)

– Unusual dS/dN ratios

– Rapid evolutionary rates

– Recent duplications

Tuesday, March 8, 2011

Page 91: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Example 3: Non homology methods

• Many genes have homologs in other species but no homologs have ever been studied experimentally

• Non-homology methods can make functional predictions for these

• Example: phylogenetic profiling

Tuesday, March 8, 2011

Page 92: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Phylogenetic profiling basis

• Microbial genes are lost rapidly when not maintained by selection

• Genes can be acquired by lateral transfer

• Gain and loss frequently occur for entire pathways/processes

• Thus correlated presence/absence information might be used to identify genes with similar functions

Tuesday, March 8, 2011

Page 93: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Non-Homology Predictions: Phylogenetic Profiling

• Step 1: Search all genes in organisms of interest against all other genomes

• Ask: Yes or No, is each gene found in each other species

• Cluster genes by distribution patterns (profiles)

Tuesday, March 8, 2011
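The profiling steps on this slide can be sketched as follows: turn the search results into a yes/no vector per gene, then group genes whose vectors match. This is a minimal sketch that assumes the homology search has already been run; the gene and genome names are hypothetical, and grouping identical profiles is the simplest possible form of clustering (real analyses would typically allow near matches).

```python
from collections import defaultdict


def build_profiles(hits, genomes):
    """hits maps gene -> set of genomes with a detected homolog; returns 0/1 tuples."""
    return {
        gene: tuple(int(genome in found) for genome in genomes)
        for gene, found in hits.items()
    }


def cluster_by_profile(profiles):
    """Group genes that share exactly the same presence/absence profile."""
    clusters = defaultdict(list)
    for gene, profile in profiles.items():
        clusters[profile].append(gene)
    return {profile: sorted(genes) for profile, genes in clusters.items()}


# Hypothetical search results for illustration only.
genomes = ["genome_1", "genome_2", "genome_3", "genome_4"]
hits = {
    "gene_w": {"genome_1", "genome_3"},
    "gene_x": {"genome_1", "genome_3"},
    "gene_y": {"genome_2"},
    "gene_z": {"genome_1", "genome_2", "genome_3", "genome_4"},
}

profiles = build_profiles(hits, genomes)
clusters = cluster_by_profile(profiles)
for profile, genes in clusters.items():
    print(profile, genes)
```

In this toy data gene_w and gene_x share a profile, which under the reasoning of the previous slide would make them candidates for a shared pathway or process.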

Page 94: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Carboxydothermus hydrogenoformans

• Isolated from a Russian hotspring
• Thermophile (grows at 80°C)
• Anaerobic
• Grows very efficiently on CO (Carbon Monoxide)
• Produces hydrogen gas
• Low GC Gram positive (Firmicute)
• Genome Determined (Wu et al. 2005 PLoS Genetics 1: e65.)

Page 98: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

PG Profiling Works Better Using Orthology


Page 99: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

GEBA Lesson 3: Phylogeny driven genome selection (and phylogenetics) improves genome annotation

• Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes

• Better definition of protein family sequence “patterns”

• Greatly improves “comparative” and “evolutionary” based predictions

• Conversion of hypothetical into conserved hypotheticals

• Linking distantly related members of protein families

• Improved non-homology prediction

Page 100: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

GEBA Lesson 4: Metadata Important

Page 101: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

GEBA Phylogenomic Lesson 5

Phylogeny-driven genome selection helps discover new genetic diversity


Page 102: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Network of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Archaea

Eukaryotes

Bacteria


Page 103: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Protein Family Rarefaction

• Take data set of multiple complete genomes

• Identify all protein families using MCL

• Plot # of genomes vs. # of protein families
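
A minimal sketch of the rarefaction idea above, assuming the protein families have already been identified (for example with MCL) and each genome is represented by the set of family identifiers it contains. The function name `rarefaction_curve`, the genome and family identifiers, and the choice to average over random genome orderings are illustrative assumptions, not the procedure used in the talk.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Average number of distinct protein families seen after adding 1..N genomes.

    genome_families: dict mapping genome -> set of protein family identifiers.
    The genome order is shuffled n_permutations times and the cumulative
    family counts are averaged, so the curve does not depend on input order.
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Hypothetical toy input: three genomes and their protein families.
genome_families = {
    "g1": {"famA", "famB"},
    "g2": {"famA", "famC"},
    "g3": {"famD"},
}

curve = rarefaction_curve(genome_families)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, round(n_families, 2))
```

Plotting the number of genomes against the returned values gives the rarefaction curve; a curve that keeps rising suggests that each added genome still contributes new families.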


Page 110: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Families/PD not uniform


Page 111: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Structural Novelty

• Of the 17000 protein families in the GEBA56, 1800 are novel in sequence (Wu)

• Structural modeling suggests many are structurally novel too (D'haeseleer)

• 372 being crystallized by the PSI (Kerfeld)


Page 112: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

GEBA Phylogenomic Lesson 6

Improves analysis of genome data from uncultured organisms


Page 113: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Great Plate Count Anomaly

Culturing Count    Microscope Count

Page 114: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Great Plate Count Anomaly

Culturing Count <<<< Microscope Count

Page 115: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Environmental DNA Analysis

Culturing Count <<<< Microscope Count

DNA

Page 116: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

rRNA Phylotyping

• Collect DNA from environment

• PCR amplify rRNA genes using broad (so-called universal) primers

• Sequence

• Align to others

• Infer evolutionary tree

• Unknowns “identified” by placement on tree

• Some use BLAST, but not as good as phylogeny

Page 117: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

rRNA PCR

The Hidden Majority Richness estimates

Bohannan and Hughes 2003; Hugenholtz 2002

Page 118: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay


Page 119: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

rRNA data increasing exponentially too

Page 120: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

rRNA phylotyping issues

• Massive amounts of data
– 1 x 10^6 new partial sequences with new 454
– 2 x 10^6 full length sequences in DB

• Alignments of new sequences not always straightforward

• Solutions:
– Reliance on similarity scores (bad)
– High throughput automated phylogenetic tools
  • STAP
  • WATERs

Page 121: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Perna et al. 2003

Page 122: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay


Page 123: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay


Page 124: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay


Page 125: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Diversity of Proteorhodopsins by PCR

de la Torre et al 2003


Page 126: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Metagenomics

shotgun

sequence

Page 127: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Massive Diversity of Proteorhodopsins

Venter et al., 2004

Page 128: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay


Page 129: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Applied Phylogenetics


Page 130: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Example I: Functional Diversity


Page 132: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Example II: Phylotyping w/ many genes


Page 135: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Shotgun Sequencing Allows Use of Other Markers

[Figure: bar chart "Sargasso Phylotypes". x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). y-axis: Weighted % of Clones (0 to 0.5). Series: EFG, EFTu, HSP70, RecA, RpoB, rRNA.]

Venter et al., Science 304: 66-74. 2004

Page 136: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Shotgun Sequencing Allows Use of Other Markers

[Figure: "Sargasso Phylotypes" bar chart, with the same axes and legend as the chart on Page 135]

Venter et al., Science 304: 66-74. 2004

Page 137: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Shotgun Sequencing Allows Use of Other Markers

[Figure: "Sargasso Phylotypes" bar chart, with the same axes and legend as the chart on Page 135]

Cannot be done without good sampling of genomes

Venter et al., Science 304: 66-74. 2004

Page 138: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Example III: Binning

Page 139: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Metagenomics Challenge


Page 140: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Binning challenge


Page 141: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Binning challenge

Best binning method: reference genomes


Page 142: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Binning challenge

Best binning method: reference genomes


Page 143: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Binning challenge

No reference genome? What do you do?


Page 144: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

CFB Phyla


Page 145: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Phylogenetic Binning

[Figure: "Sargasso Phylotypes" bar chart, with the same axes and legend as the chart on Page 135]

Venter et al., Science 304: 66-74. 2004

Page 146: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Shotgun Sequencing Allows Use of Other Markers

[Figure: "Sargasso Phylotypes" bar chart, with the same axes and legend as the chart on Page 135]

Cannot be done without good sampling of genomes

Venter et al., Science 304: 66-74. 2004

Page 147: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Shotgun Sequencing Allows Use of Other Markers

[Figure: "Sargasso Phylotypes" bar chart, with the same axes and legend as the chart on Page 135]

GEBA Project improves metagenomic analysis

Venter et al., Science 304: 66-74. 2004

Page 148: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

GEBA Cyano

Sequencing status (as of 01/14):

Awaiting Material   11
Library             12
Production          22
Finishing            5
Grand Total         50

Ongoing/Planned Activities:

- Building Cyanobacterial Metadatabase (IMG-GOLD)

- 10th Cyanobacterial Molecular Biology Workshop, Lake Arrowhead, CA (06/10)

--> Cheryl will host: Workshop training as prep for virtual Jamboree

Page 149: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

GEBA RNB

Plan: Sequence multiple Root Nodule Bacteria (RNBs) across the planet. Pilot: 100 RNBs.

Alpha RNB: Bradyrhizobium, Mesorhizobium, Rhizobium, Sinorhizobium, Balneimonas-like, Devosia, Ochrobactrum, Phyllobacterium, Azorhizobium, Allorhizobium

Beta RNB: Cupriavidus, Burkholderia

Goal:
• Understand biogeographical effects on species evolution and understand host-specificity.

Rationale:
• N2 fixation by legume pastures and crops provides 65% of the N currently utilized in agricultural production.

• Contributes 25 to 90 million metric tonnes N per annum.

• Symbioses save $US 6-10 billion annually on N fertilizer.

• Grain and animal production enhanced by fixed nitrogen supplied by the symbiosis.

Nikos Kyrpides

Page 150: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Haloarchaeal GEBA-like


Page 151: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: phylogenetic tree of bacterial phyla. Labeled lineages: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Still not happy

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs


Page 152: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Shotgun Sequencing Allows Use of Other Markers

[Figure: "Sargasso Phylotypes" bar chart, with the same axes and legend as the chart on Page 135]

GEBA Project improves metagenomic analysis, but only a little

Venter et al., Science 304: 66-74. 2004

Page 153: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Phylogenomics Future 1

Need to adapt genomic and metagenomic methods to make better use of data

Page 154: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Improving Metagenomic Analysis

• Methods
– More automation
– Better phylogenetic methods for short reads
– Improved tools for using distantly related genomes in metagenomic analysis

• Data sets
– Rebuild protein family models
– New phylogenetic markers
– Need better reference phylogenies, including HGT

• More simulations

Page 155: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Automation


Page 156: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

AMPHORA

Guide tree

Page 157: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: bar chart by phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria); y-axis ticks from 0 to 0.7. Series (31 marker genes): dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf.]

Page 158: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

We have more than 700 complete genome sequences:

• Select 100 representatives
• Build gene families
• Identify families that are present in all organisms with equal numbers
• HMM building and phylogenetic analysis to identify the true markers
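
A minimal sketch of the family-screening step above, assuming gene families have already been built and summarized as copy numbers per genome. The function name `candidate_marker_families` and the toy family and genome names are hypothetical. "Present in all organisms with equal numbers" is read here as a nonzero copy number that is the same in every genome; the later HMM building and phylogenetic checks are not shown.

```python
def candidate_marker_families(family_counts, genomes):
    """Return families present in every genome with the same copy number.

    family_counts: dict mapping family -> {genome: copy number}.
    genomes: list of the representative genomes being screened.
    """
    candidates = []
    for family, counts in family_counts.items():
        copies = [counts.get(genome, 0) for genome in genomes]
        # Keep the family only if no genome lacks it and all counts agree.
        if min(copies) > 0 and len(set(copies)) == 1:
            candidates.append(family)
    return sorted(candidates)


# Hypothetical toy input: three representative genomes, four families.
genomes = ["g1", "g2", "g3"]
family_counts = {
    "fam_rpoB": {"g1": 1, "g2": 1, "g3": 1},   # universal, single copy
    "fam_dup": {"g1": 2, "g2": 1, "g3": 1},    # unequal copy numbers
    "fam_patchy": {"g1": 1, "g3": 1},          # missing from g2
    "fam_two": {"g1": 2, "g2": 2, "g3": 2},    # universal, two copies each
}

candidates = candidate_marker_families(family_counts, genomes)
print(candidates)
```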

[Figure: Phylogenetic Tree of Bacteria (built from 31 concatenated marker alignments); labeled clades: ε, γ, β, α, δ Proteobacteria; Firmicutes]

Page 159: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

More Markers


Page 160: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

AMPHORA 2 Coming w/ More Markers

Phylogenetic group      Genome Number   Gene Number   Marker Candidates
Archaea                 62              145415        106
Actinobacteria          63              267783        136
Alphaproteobacteria     94              347287        121
Betaproteobacteria      56              266362        311
Gammaproteobacteria     126             483632        118
Deltaproteobacteria     25              102115        206
Epsilonproteobacteria   18              33416         455
Bacteroides             25              71531         286
Chlamydiae              13              13823         560
Chloroflexi             10              33577         323
Cyanobacteria           36              124080        590
Firmicutes              106             312309        87
Spirochaetes            18              38832         176
Thermi                  5               14160         974
Thermotogae             9               17037         684

Page 161: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Distances between gene trees and the AMPHORA concatenated genome tree

[Figure: two bar-chart panels, NODAL distance and SPLIT distance, each listing the 16S rRNA gene and a set of protein-coding genes. Legend: ribosomal protein; transcription/translation related protein; DNA repair protein; protein of other function; AMPHORA marker. Reference value: distance between the genome tree and 100 random trees (average ± standard deviation).]

Page 162: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Fragments


Page 163: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Phylogenetic challenge

A single tree with everything


Page 164: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data

Thomas J. Sharpton1*, Samantha J. Riesenfeld1, Steven W. Kembel2, Joshua Ladau1, James P. O’Dwyer2,3, Jessica L. Green2, Jonathan A. Eisen4, Katherine S. Pollard1,5

1 The J. David Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America, 2 Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, Oregon, United States of America, 3 Institute of Integrative and Comparative Biology, University of Leeds, Leeds, United Kingdom, 4 Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America, 5 Institute for Human Genetics & Division of Biostatistics, University of California San Francisco, San Francisco, California, United States of America

Abstract

Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity.

Citation: Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O’Dwyer JP, et al. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061

Editor: Oded Beja, Technion-Israel Institute of Technology, Israel

Received July 22, 2010; Accepted December 17, 2010; Published January 20, 2011

Copyright: © 2011 Sharpton et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: Funding for this work was provided by the Gordon and Betty Moore Foundation (grant #1660, http://www.moore.org/). JPOD acknowledges funding from the EPSRC (grant #EP/G051402/1, http://www.epsrc.ac.uk). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

Introduction

A central goal of ecology and evolution is to understand the forces that shape biodiversity - the variety of life on Earth. It is becoming increasingly clear that global biodiversity is mostly microbial. It is estimated that there are millions of microbial species on the planet, relatively few of which have been isolated in culture [1–2]. Despite the recognized importance of microorganisms, we still know little about the magnitude and variability of microbial biodiversity in natural environments relative to what is known about plants and animals. This is a major knowledge gap, given that microbes are critical components of our planet, responsible for key ecosystems services including the production of agriculturally critical small molecules, the degradation of environmental contaminants, and the regulation of human host phenotypes.

Biodiversity science has traditionally focused on comparing species richness across space, time and environments. Out of necessity, microbial diversity studies usually examine the richness (i.e. number) of operational taxonomic units (OTUs), where OTUs are sequence similarity based surrogates for microbial taxa, which can be difficult to define. In addition to richness, OTUs have been used to characterize the abundance, range, and distribution of microbes, thereby improving our understanding of both natural ecosystems and human health [3–6]. OTUs are commonly identified by aligning sequences of the small subunit of ribosomal RNA (SSU-rRNA) from one or more samples and identifying groups of related sequences using a hierarchical clustering algorithm. This clustering is based upon a measure of distance between all pairs of sequences, which is typically defined using some variant of the percent sequence identity (PID) (e.g. [3,7–8]). For example, researchers traditionally cluster sequences that are no more than 3% diverged into the same OTU. This designation has been proposed as being roughly equivalent to a species-level classification [9], though evidence suggests that it may result in an underestimate of the true number of species [10].

The SSU-rRNA sequences for OTU identification are traditionally amplified from a sample via polymerase chain reaction (PCR) using universal primers. Each PCR product is then individually sequenced. One of the biggest drawbacks of this targeted sequencing approach is that it leverages PCR, which has been shown to exhibit sequence-based biases at the level of priming and extension [11–13]. In addition, the so-called ‘universal’ PCR primers used in such assays will fail to amplify

PLoS Computational Biology | www.ploscompbiol.org 1 January 2011 | Volume 7 | Issue 1 | e1001061

alignment used to build the profile, resulting in a multiple sequence alignment of full-length reference sequences and metagenomic reads. The final step of the alignment process is a quality control filter that 1) ensures that only homologous SSU-rRNA sequences from the appropriate phylogenetic domain are included in the final alignment, and 2) masks highly gapped alignment columns (see Text S1).

We use this high quality alignment of metagenomic reads and reference sequences to construct a fully-resolved, phylogenetic tree and hence determine the evolutionary relationships between the reads. Reference sequences are included in this stage of the analysis to guide the phylogenetic assignment of the relatively short metagenomic reads. While the software can be easily extended to incorporate a number of different phylogenetic tools capable of analyzing metagenomic data (e.g., RAxML [27], pplacer [28], etc.), PhylOTU currently employs FastTree as a default method due to its relatively high speed-to-performance ratio and its ability to construct accurate trees in the presence of highly-gapped data [29]. After construction of the phylogeny, lineages representing reference sequences are pruned from the tree. The resulting phylogeny of metagenomic reads is then used to compute a PD distance matrix in which the distance between a pair of reads is defined as the total tree path distance (i.e., branch length) separating the two reads [30]. This tree-based distance matrix is subsequently used to hierarchically cluster metagenomic reads via MOTHUR into OTUs in a fashion similar to traditional PID-based analysis [31]. As with PID clustering, the hierarchical algorithm can be tuned to produce finer or coarser clusters, corresponding to different taxonomic levels, by adjusting the clustering threshold and linkage method.

To evaluate the performance of PhylOTU, we employed statistical comparisons of distance matrices and clustering results for a variety of data sets. These investigations aimed 1) to compare PD versus PID clustering, 2) to explore overlap between PhylOTU clusters and recognized taxonomic designations, and 3) to quantify the accuracy of PhylOTU clusters from shotgun reads relative to those obtained from full-length sequences.
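
The tree path distance and threshold clustering described above can be illustrated with a small sketch. This is not the PhylOTU, FastTree, or MOTHUR implementation: the tree is a hand-written toy stored as child-to-parent links with branch lengths, the read names and the 0.05 threshold are hypothetical, and the functions `root_distances`, `patristic_distance`, and `nearest_neighbor_clusters` are illustrative names. Only nearest-neighbor (single) linkage is shown.

```python
from itertools import combinations


def root_distances(node, parent):
    """Map each ancestor of node (and node itself) to its path length from node."""
    distances = {node: 0.0}
    total = 0.0
    while node in parent:
        node, branch_length = parent[node]
        total += branch_length
        distances[node] = total
    return distances


def patristic_distance(a, b, parent):
    """Total branch length on the tree path between leaves a and b."""
    from_a = root_distances(a, parent)
    from_b = root_distances(b, parent)
    # The most recent common ancestor gives the smallest summed distance.
    return min(from_a[n] + from_b[n] for n in from_a if n in from_b)


def nearest_neighbor_clusters(leaves, parent, threshold):
    """Nearest-neighbor linkage: merge leaves joined by any pair within threshold."""
    leader = {leaf: leaf for leaf in leaves}

    def find(x):
        while leader[x] != x:
            x = leader[x]
        return x

    for a, b in combinations(leaves, 2):
        if patristic_distance(a, b, parent) <= threshold:
            leader[find(a)] = find(b)
    groups = {}
    for leaf in leaves:
        groups.setdefault(find(leaf), []).append(leaf)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy tree: child -> (parent, branch length).
parent = {
    "read1": ("n1", 0.01),
    "read2": ("n1", 0.02),
    "n1": ("root", 0.10),
    "read3": ("root", 0.50),
}

clusters = nearest_neighbor_clusters(["read1", "read2", "read3"], parent, threshold=0.05)
print(clusters)
```

Raising or lowering `threshold` gives coarser or finer clusters, which is the tuning knob the text refers to.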

PhylOTU Clusters Recapitulate PID Clusters

We sought to identify how PD-based clustering compares to commonly employed PID-based clustering methods by applying the two methods to the same set of sequences. Both PID-based clustering and PhylOTU may be used to identify OTUs from overlapping sequences. Therefore we applied both methods to a dataset of 508 full-length bacterial SSU-rRNA sequences (reference sequences; see above) obtained from the Ribosomal Database Project (RDP) [25]. Recent work has demonstrated that PID is more accurately calculated from pairwise alignments than multiple sequence alignments [32–33], so we used ESPRIT, which implements pairwise alignments, to obtain a PID distance matrix for the reference sequences [32]. We used PhylOTU to compute a PD distance matrix for the same data. Then, we used MOTHUR to hierarchically cluster sequences into OTUs based on both PID and PD. For each of the two distance matrices, we employed a range of clustering thresholds and three different definitions of linkage in the hierarchical clustering algorithm: nearest-neighbor, average, and furthest-neighbor.

To statistically evaluate the similarity of cluster composition between each pair of clustering results, we used two summary statistics that together capture the frequency with which sequences are co-clustered in both analyses: true conjunction rate (i.e., the proportion of pairs of sequences derived from the same cluster in the first analysis that also are clustered together in the second analysis) and true disjunction rate (i.e., the proportion of pairs of sequences derived from different clusters in the first analysis that also are not clustered together in the second analysis) (see Methods

Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this generalized workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001
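
The true conjunction and true disjunction rates defined in the text above can be computed directly from two cluster assignments. The sketch below follows those two definitions; the function name `co_clustering_rates`, the sequence names, and the cluster labels are hypothetical, and the paper's Methods may handle edge cases differently.

```python
from itertools import combinations


def co_clustering_rates(first, second):
    """Return (true conjunction rate, true disjunction rate).

    first, second: dicts mapping each sequence to its cluster label in the
    first and second analysis; both must cover the same sequences.
    """
    same_first = same_both = 0
    diff_first = diff_both = 0
    for a, b in combinations(sorted(first), 2):
        together_first = first[a] == first[b]
        together_second = second[a] == second[b]
        if together_first:
            same_first += 1
            same_both += int(together_second)
        else:
            diff_first += 1
            diff_both += int(not together_second)
    conjunction = same_both / same_first if same_first else float("nan")
    disjunction = diff_both / diff_first if diff_first else float("nan")
    return conjunction, disjunction


# Hypothetical toy assignments for four sequences.
first_analysis = {"s1": "A", "s2": "A", "s3": "A", "s4": "B"}
second_analysis = {"s1": "X", "s2": "X", "s3": "Y", "s4": "Y"}

conjunction, disjunction = co_clustering_rates(first_analysis, second_analysis)
print(conjunction, disjunction)
```

In this toy case three pairs share a cluster in the first analysis and one of them stays together in the second, while two of the three separated pairs stay separated.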


Page 165: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

AMPHORA ALL

• Build reference tree with concatenated alignment

• Align reads that match any of the HMMs to concatenated alignment

• Place reads into reference tree one at a time

Page 166: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Phylogenomics Future 2

We have still only scratched the surface of microbial diversity


Page 167: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

rRNA Tree of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007.

Based on tree from Pace 1997 Science 276:734-740

Archaea

Eukaryotes

Bacteria


Page 170: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Phylogenetic Diversity: Isolates

From Wu et al. 2009 Nature 462, 1056-1060

Page 171: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Phylogenetic Diversity: All

From Wu et al. 2009 Nature 462, 1056-1060


Page 172: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Uncultured Lineages:

• Get into culture
• Enrichment cultures
• If abundant in low diversity ecosystems
• Flow sorting
• Microbeads
• Microfluidic sorting
• Single cell amplification

Page 173: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

GEBA uncultured

Number of SAGs from Candidate Phyla

                                       OD1   OP11   OP3   SAR406
Site A: Hydrothermal vent              4     1      -     -
Site B: Gold Mine                      6     13     2     -
Site C: Tropical gyres (Mesopelagic)   -     -      -     2
Site D: Tropical gyres (Photic zone)   1     -      -     -

Sample collections at 4 additional sites are underway.

Phil Hugenholtz

Page 174: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay


Page 175: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay


Page 176: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay


Page 177: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay


Page 178: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Phylogenomics Future 3

Need Experiments from Across the Tree of Life too


Page 179: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: tree of bacterial phyla, with the same lineage labels as the tree on Page 151]

• At least 40 phyla of bacteria

As of 2002

Based on Hugenholtz, 2002


Page 180: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: tree of bacterial phyla, with the same lineage labels as the tree on Page 151]

• At least 40 phyla of bacteria

• Experimental studies are mostly from three phyla

As of 2002

Based on Hugenholtz, 2002


Page 181: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: tree of bacterial phyla, with the same lineage labels as the tree on Page 151]

• At least 40 phyla of bacteria

• Experimental studies are mostly from three phyla

• Some studies in other phyla

As of 2002

Based on Hugenholtz, 2002


Page 182: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: tree of bacterial phyla, with the same lineage labels as the tree on Page 151]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Eukaryotes

As of 2002

Based on Hugenholtz, 2002

Page 183: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: tree of bacterial phyla. Tip labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Viruses

As of 2002

Based on Hugenholtz, 2002

Page 184: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: tree of bacterial phyla, scale bar 0.1. Tip labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

Tree based on Hugenholtz (2002) with some modifications.

Need experimental studies from across the tree too

Page 185: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

[Figure: tree of bacterial phyla, scale bar 0.1. Tip labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

Tree based on Hugenholtz (2002) with some modifications.

Adopt a Microbe

Page 186: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Conclusion

• Phylogenetic sampling of genomes improves our understanding of microbial diversity in many ways

• Still need
– More biogeography
– More phenotypic/experimental data
– Deeper phylogenetic sampling

Page 187: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Page 188: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

MICROBES

Page 189: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

A Happy Tree of Life

Page 190: Talk for UC Davis Applied Phylogenetics Course at Bodega Bay

Acknowledgements

• GEBA: DOE-JGI, DSMZ
• GWSS: Nancy Moran & lab, Dongying Wu
• iSEEM: Katie Pollard, Jessica Green, Martin Wu
• RecA: Dongying Wu, Craig Venter, Doug Rusch, et al.
