Jonathan Eisen talk on "Genomic Encyclopedia" at Lake Arrowhead Small Genomes Meeting 2008
DESCRIPTION
Talk by Jonathan Eisen on "A genomic encyclopedia of bacteria and archaea" at the Lake Arrowhead Small Genomes meeting in 2008.
TRANSCRIPT
A Genomic Encyclopedia of Bacteria and Archaea (GEBA)
Jonathan A. Eisen
U. C. Davis and J. G. I.
Outline
• Background
– Why history matters
– Gaps in available genomes
• The GEBA pilot project
• Future needs
The Tree of Life
Famous Arrowhead 2004 Quotes
• Space-time continuum of genes and genomes
• Gene sequences are the wormhole that allows one to tunnel into the past
• The human mind can conceive of things with no basis in physical reality
• Thoughts can go faster than the speed of light
Famous Arrowhead Quotes 2006
• Publications, student degrees, etc.
• Not trying to say anything bad about anyone
• The human guts are a real milieu
• Where’s your evening gown?
• You better kiss everybody
• This is how you do metagenomics on 50 dollars, and that’s Canadian dollars
From http://genomesonline.org
Major Microbial Sequencing Efforts
• Coordinated, top-down efforts
– Fungal Genome Initiative (Broad/Whitehead)
– Gordon and Betty Moore Foundation Marine Microbial Genome Sequencing Project
– Sanger Center Pathogen Sequencing Unit
– NHGRI Human Gut Microbiome Project
– NIH Human Microbiome Program
• White paper or grant systems
– NIAID Microbial Sequencing Centers
– DOE/JGI Community Sequencing Program
– DOE/JGI BER Sequencing Program
– NSF/USDA Microbial Genome Sequencing
• Covers lots of ground and biological diversity
The Tree of Life
The Tree is not Happy
[Figure: tree of bacterial phyla, labeled Acidobacteria, Bacteroidetes, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]
• At least 40 phyla of bacteria
As of 2002
Based on Hugenholtz, 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea, Eukaryotes
As of 2002
Based on Hugenholtz, 2002
Need for Tree Guidance Well Established
• Common approach within some eukaryotic groups
– NHGRI animal projects
– FGI at Whitehead
– Plant sequencing at JGI
• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
• Many small projects funded to fill in some gaps
– DOE/TIGR Sequencing
– Multiple CSP projects
– Multiple NSF/USDA projects
– Private projects (e.g., Integrated Genomics, Diversa)
– TIGR (Eisen, Ward) Bacterial Tree of Life Project
Why Increase Taxonomic Coverage?
• Mechanisms of diversification
• Gene discovery
• Annotation, functional prediction
• Metagenomic analysis
• Species phylogeny and classification
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
• Eisen-Ward NSF Tree of Life Project
• A genome from each of eight phyla
Based on Hugenholtz, 2002
The Tree of Life is Still Angry
Within Phyla Diversity Immense
• Each phylum represents billions of years of evolution
• Some have hundreds of major lineages, most with no genomes
• Need to sample within phyla too
Major Lineages of Actinobacteria
2.5.1 Acidimicrobidae
  2.5.1.1 Unclassified
  2.5.1.2 "Microthrixineae"
  2.5.1.3 Acidimicrobineae
  2.5.1.4 BD2-10
  2.5.1.5 EB1017
2.5.2 Actinobacteridae
  2.5.2.1 Unclassified
  2.5.2.2 Actinomyces
  2.5.2.3 Actinomycineae
  2.5.2.4 Actinosynnemataceae
  2.5.2.5 Bifidobacteriaceae
  2.5.2.6 Brevibacteriaceae
  2.5.2.7 Cellulomonadaceae
  2.5.2.8 Corynebacterineae
  2.5.2.9 Dermabacteraceae
  2.5.2.10 Ellin306/WR160
  2.5.2.11 Ellin5012
  2.5.2.12 Ellin5034
  2.5.2.13 Frankineae
  2.5.2.14 Glycomyces
  2.5.2.15 Intrasporangiaceae
  2.5.2.16 Kineosporiaceae
  2.5.2.17 Microbacteriaceae
  2.5.2.18 Micrococcaceae
  2.5.2.19 Micromonosporaceae
  2.5.2.20 Propionibacterineae
  2.5.2.21 Pseudonocardiaceae
  2.5.2.22 Streptomycineae
  2.5.2.23 Streptosporangineae
2.5.3 Coriobacteridae
  2.5.3.1 Unclassified
  2.5.3.2 Atopobiales
  2.5.3.3 Coriobacteriales
  2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
  2.5.6.1 Unclassified
  2.5.6.2 "Thermoleiphilaceae"
  2.5.6.3 MC47
  2.5.6.4 Rubrobacteraceae
Major Lineages of Actinobacteria II
2.5 Actinobacteria
  2.5.1 Acidimicrobidae
    2.5.1.1 Unclassified
    2.5.1.2 "Microthrixineae"
    2.5.1.3 Acidimicrobineae
      2.5.1.3.1 Unclassified
      2.5.1.3.2 Acidimicrobiaceae
    2.5.1.4 BD2-10
    2.5.1.5 EB1017
  2.5.2 Actinobacteridae
    2.5.2.1 Unclassified
    2.5.2.2 Actinomyces
    2.5.2.3 Actinomycineae
    2.5.2.4 Actinosynnemataceae
    2.5.2.5 Bifidobacteriaceae
    2.5.2.6 Brevibacteriaceae
    2.5.2.7 Cellulomonadaceae
    2.5.2.8 Corynebacterineae
      2.5.2.8.1 Unclassified
      2.5.2.8.2 Corynebacteriaceae
      2.5.2.8.3 Dietziaceae
      2.5.2.8.4 Gordoniaceae
      2.5.2.8.5 Mycobacteriaceae
      2.5.2.8.6 Rhodococcus
      2.5.2.8.7 Rhodococcus
      2.5.2.8.8 Rhodococcus
    2.5.2.9 Dermabacteraceae
      2.5.2.9.1 Unclassified
      2.5.2.9.2 Brachybacterium
      2.5.2.9.3 Dermabacter
    2.5.2.10 Ellin306/WR160
    2.5.2.11 Ellin5012
    2.5.2.12 Ellin5034
    2.5.2.13 Frankineae
      2.5.2.13.1 Unclassified
      2.5.2.13.2 Acidothermaceae
      2.5.2.13.3 Ellin6090
      2.5.2.13.4 Frankiaceae
      2.5.2.13.5 Geodermatophilaceae
      2.5.2.13.6 Microsphaeraceae
      2.5.2.13.7 Sporichthyaceae
    2.5.2.14 Glycomyces
    2.5.2.15 Intrasporangiaceae
      2.5.2.15.1 Unclassified
      2.5.2.15.2 Dermacoccus
      2.5.2.15.3 Intrasporangiaceae
    2.5.2.16 Kineosporiaceae
    2.5.2.17 Microbacteriaceae
      2.5.2.17.1 Unclassified
      2.5.2.17.2 Agrococcus
      2.5.2.17.3 Agromyces
    2.5.2.18 Micrococcaceae
    2.5.2.19 Micromonosporaceae
    2.5.2.20 Propionibacterineae
      2.5.2.20.1 Unclassified
      2.5.2.20.2 Kribbella
      2.5.2.20.3 Nocardioidaceae
      2.5.2.20.4 Propionibacteriaceae
    2.5.2.21 Pseudonocardiaceae
    2.5.2.22 Streptomycineae
      2.5.2.22.1 Unclassified
      2.5.2.22.2 Kitasatospora
      2.5.2.22.3 Streptacidiphilus
    2.5.2.23 Streptosporangineae
      2.5.2.23.1 Unclassified
      2.5.2.23.2 Ellin5129
      2.5.2.23.3 Nocardiopsaceae
      2.5.2.23.4 Streptosporangiaceae
      2.5.2.23.5 Thermomonosporaceae
  2.5.3 Coriobacteridae
    2.5.3.1 Unclassified
    2.5.3.2 Atopobiales
    2.5.3.3 Coriobacteriales
    2.5.3.4 Eggerthellales
  2.5.4 OPB41
  2.5.5 PK1
  2.5.6 Rubrobacteridae
    2.5.6.1 Unclassified
    2.5.6.2 "Thermoleiphilaceae"
      2.5.6.2.1 Unclassified
      2.5.6.2.2 Conexibacter
      2.5.6.2.3 XGE514
    2.5.6.3 MC47
    2.5.6.4 Rubrobacteraceae
• At least 100 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa even more poorly sampled
• Solution - use tree to really fill gaps
Well sampled phyla
GEBA Pilot Project: Components
• Project management (David Bruce, Lynne Goodwin et al.)
• Selection of strains (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Libraries and DNA (Eileen Dalin et al.)
• Sequencing and closure (Susan Lucas, Alla Lapidus et al.)
• Annotation and database needs (Nikos Kyrpides)
• Analysis (Dongying Wu, Martin Wu, Jenna Morgan, Victor Kunin, Marcel Huntemann, Neil Rawlings, Ian Paulsen, Gary Xie, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Kostas Mavrommatis)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, Eddy Rubin, Jim Bristow)
GEBA Pilot I: Identifying Lineages without Genomes
GEBA Pilot II: Selecting Targets
Key Criteria
• Phylogenetic novelty
– Working from top of tree down
– Also selected one phylum to fill in in more detail: Actinobacteria
• Culturable
– Type strain preferred if all else equal
• DOE mission relevance
• Ready availability to us and community
– Of strain
– Of DNA
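One way to make "phylogenetic novelty, working from the top of the tree down" concrete is to rank candidate strains by how much new branch length, or phylogenetic diversity (PD), each would add to the set of genomes already sequenced. The sketch below is a hypothetical illustration under stated assumptions: the toy tree, its branch lengths, and the function names are invented here, and the greedy rule is a simple heuristic, not the selection procedure the GEBA team used.

```python
# Hypothetical toy rooted tree: node -> (parent, length of the branch above node).
TREE = {
    "A": ("n1", 1.0), "B": ("n1", 1.0),
    "C": ("n2", 3.0), "D": ("n2", 0.5),
    "n1": ("root", 2.0), "n2": ("root", 0.5),
}


def root_path(leaf):
    """Branches (named by their lower node) on the path from leaf up to the root."""
    branches = set()
    node = leaf
    while node in TREE:
        branches.add(node)
        node = TREE[node][0]
    return branches


def pd(leaves):
    """Phylogenetic diversity: total length of branches connecting the leaves to the root."""
    branches = set()
    for leaf in leaves:
        branches |= root_path(leaf)
    return sum(TREE[b][1] for b in branches)


def greedy_targets(sequenced, candidates, k):
    """Pick up to k candidates, each round taking the one that adds the most PD."""
    chosen = list(sequenced)
    pool = [c for c in candidates if c not in chosen]
    picks = []
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda c: pd(chosen + [c]) - pd(chosen))
        picks.append(best)
        chosen.append(best)
        pool.remove(best)
    return picks


if __name__ == "__main__":
    # Suppose A is already sequenced and there is budget for two more genomes.
    print(greedy_targets(["A"], ["B", "C", "D"], 2))  # ['C', 'B']
```

With A already sequenced, C is picked first because its long, previously unsampled branch adds 3.5 units of PD, more than B or D would.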
GEBA Pilot III: Partnership with DSMZ
GEBA Biggest Challenge: Getting DNA
• Getting quality DNA is the biggest bottleneck
• Decided to test, as part of the GEBA pilot, the possibility of getting DNA directly from culture collections
• DSMZ offered to do this for free
• ATCC is doing a small number for a fee
• Working with other culture collections
Quantification gel of the genomic DNA isolated from Conexibacter woesei (DSM 14684T)
Conexibacter woesei (DSM 14684T) was taken from the German Collection of Microorganisms and Cell Cultures (DSMZ). The genomic DNA was isolated using the Qiagen Genomic 500 DNA Kit (Qiagen 10262). The genomic DNA was 10-250 kb in size as determined by Pulsed Field Gel Electrophoresis (PFGE). The bulk of DNA had a size of 50-250 kb (see attached PFGE image). The DNA concentration is 500 ng/µl as estimated from the gel. Spectrophotometric measurements yielded a DNA concentration of 450 µg/ml; 300 µl of genomic DNA are shipped (150 µg).
Lane 1: c(λ-Marker) = 15 ng
Lane 2: c(λ-Marker) = 30 ng
Lane 3: c(λ-Marker) = 50 ng
Lane 4: DNA Molecular Weight Marker II (Roche 236250)
Lane 5: DSM 13279, Collinsella stercoris
Lane 6: DSM 43043, Intrasporangium calvum
Lane 7: DSM 18053, Dyadobacter fermentans
Lane 8: DSM 20476, Slackia heliotrinireducens
Lane 9: DSM 18081, Patulibacter minatonensis
Lane 10: DSM 14684, Conexibacter woesei
Lane 11: DSM 11002, Dethiosulfovibrio peptidovorans
Lane 12: DSM 11551, Halogeometricum borinquense
Lane 13: DNA Molecular Weight Marker II (Roche 236250)
Lane 14: c(λ-Marker) = 125 ng
Lane 15: c(λ-Marker) = 250 ng
Lane 16: c(λ-Marker) = 500 ng
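The shipped amount in the DSMZ note follows from concentration times volume: 300 µl at the gel estimate of 500 ng/µl gives 150 µg, while the spectrophotometric reading of 450 µg/ml (the same as 450 ng/µl) would correspond to 135 µg. A minimal check, with a hypothetical helper name:

```python
def dna_mass_ug(conc_ng_per_ul: float, volume_ul: float) -> float:
    """Total DNA mass in micrograms from concentration (ng/µl) and volume (µl)."""
    return conc_ng_per_ul * volume_ul / 1000.0  # 1 µg = 1000 ng


gel = dna_mass_ug(500, 300)   # gel-based estimate
spec = dna_mass_ug(450, 300)  # 450 µg/ml expressed as 450 ng/µl
print(f"gel: {gel:.0f} µg, spectrophotometer: {spec:.0f} µg")
# gel: 150 µg, spectrophotometer: 135 µg
```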
GEBA Pilot IV: Sequencing Progress
GEBA Pilot Target List
[Bar chart: # of genomes (0 to 35) per phylum. Bacteria (B): Actinobacteria (High GC), Aminanaerobia, Aquificae, Bacteroidetes, Chloroflexi, Deferribacteres, Deinococci, Delta Proteobacteria, Epsilon Proteobacteria, Firmicutes, Fusobacteria, Gamma Proteobacteria, Gemmatimonadetes, Haloanaerobiales, Planctomycetes, Spirochaetes, Thermodesulfobacteria, Thermodesulfobia, Thermovenabulae. Archaea (A): Halobacteria, Archaeoglobi, Methanobacteria, Methanomicrobia, Thermococci, Thermoprotei.]
GEBA Pilot Status 5-12-08
[Stacked bar chart: # of genomes (0 to 35) per phylum, by status: Closed, Post Draft, Production, Library, Awaiting Material]
Non Active Projects
[Stacked bar chart: # of genomes (0 to 16) per phylum, by status: Abandoned, On Hold]
GEBA Pilot Data Release
[Bar chart: # of genomes (0 to 30) per phylum]
Progress Report: GEBA Status 5-12-08
• Awaiting Material: 26%
• Library: 9%
• Production: 11%
• Post Draft: 51%
• Closed: 3%
Progress
Data
Organism | Domain | Phylum | Status | IMG-GEBA | NCBI-PID | Culture-ID | GOLD-ID
Acidimicrobium ferrooxidans DSM 10331 | Bacteria | Actinobacteria | draft | 2500645360 | 29525 | DSM 10331 | Gi02326
Actinosynnema mirum 101, DSM 43827 | Bacteria | Actinobacteria | draft | 2500395345 | 19705 | DSM 43827 | Gi02064
Alicyclobacillus acidocaldarius acidocaldarius 104-IA, DSM 446 | Bacteria | Firmicutes | draft | 2500575013 | 29405 | DSM 446 | Gi02324
Anaerococcus prevotii PC 1, DSM 20548 | Bacteria | Firmicutes | draft | 2500645363 | 29533 | DSM 20548 | Gi02318
Atopobium parvulum IPP 1246, DSM 20469 | Bacteria | Actinobacteria | draft | 2500575011 | 29401 | DSM 20469 | Gi02317
Beutenbergia cavernosa HKI 0122, DSM 12333 | Bacteria | Actinobacteria | draft | 2500395322 | 20827 | DSM 12333 | Gi02225
Brachybacterium faecium DSM 4810 | Bacteria | Actinobacteria | finished | 2500153401 | 17026 | DSM 4810 | Gi02066
Brachyspira murdochii DSM 12563 | Bacteria | Spirochaetes | draft | 2500645365 | 29543 | DSM 12563 | Gi02313
Capnocytophaga ochracea DSM 7271 | Bacteria | Bacteroidetes | draft | 2500575012 | 29403 | DSM 7271 | Gi02305
Catenulispora acidiphila ID139908, DSM 44928 | Bacteria | Actinobacteria | draft | 2500395338 | 21085 | DSM 44928 | Gi02233
Cellulomonas flavigena 134, DSM 20109 | Bacteria | Actinobacteria | draft | 2500395336 | 19707 | DSM 20109 | Gi02067
Chitinophaga pinensis UQM 2034, DSM 2588 | Bacteria | Bacteroidetes | draft | 2500395347 | 27951 | DSM 2588 | Gi02244
Conexibacter woesei ID131577, DSM 14684 | Bacteria | Actinobacteria | draft | 2500347307 | 20745 | DSM 14684 | Gi02154
Cryptobacterium curtum DSM 15641 | Bacteria | Actinobacteria | finished | 2500332002 | 20739 | DSM 15641 | Gi02234
Denitrovibrio acetiphilus N2460, DSM 12809 | Bacteria | Deferribacteres | draft | 2500575016 | 29431 | DSM 12809 | Gi02322
Desulfohalobium retbaense DSM 5692 | Bacteria | Deltaproteobacteria | draft | 2500575018 | 29199 | DSM 5692 | Gi02246
Desulfomicrobium baculatum DSM 04028 | Bacteria | Deltaproteobacteria | draft | 2500645356 | 29527 | DSM 4028 | Gi02302
Desulfotomaculum acetoxidans 5575, DSM 771 | Bacteria | Firmicutes | draft | 2500395337 | 27947 | DSM 771 | Gi02239
Dethiosulfovibrio peptidovorans SEBR 4207, DSM 11002 | Bacteria | Aminanaerobia | draft | 2500549401 | 20741 | DSM 11002 | Gi02152
Dyadobacter fermentans NS 114, DSM 18053 | Bacteria | Bacteroidetes | draft | 2500395342 | 20829 | DSM 18053 | Gi02155
Eggerthella lenta VPI 0255, DSM 2243 | Bacteria | Actinobacteria | draft | 2500549402 | 21093 | DSM 2243 | Gi02242
Geodermatophilus obscurus DSM 43160 | Bacteria | Actinobacteria | draft | 2500645366 | 29547 | DSM 43160 | Gi02257
Gordonia bronchialis DSM 43247 | Bacteria | Actinobacteria | draft | 2500645367 | 29549 | DSM 43247 | Gi02258
Haliangium ochraceum SMP-2, DSM 14365 | Bacteria | Deltaproteobacteria | draft | 2500395339 | 28711 | DSM 14365 | Gi02251
Halogeometricum borinquense DSM 11551 | Archaea | Halobacteria | finished | 2500153400 | 20743 | DSM 11551 | Gi02153
Halomicrobium mukohataei arg-2, DSM 12286 | Archaea | Halobacteria | draft | 2500395343 | 27945 | DSM 12286 | Gi02248
Halorhabdus utahensis AX-2, DSM 12940 | Archaea | Halobacteria | draft | 2500575004 | 29305 | DSM 12940 | Gi02250
Jonesia denitrificans DSM 20603 | Bacteria | Actinobacteria | draft | 2500168153 | 20833 | DSM 20603 | Gi02227
Kangiella koreensis SW-125, DSM 16069 | Bacteria | Gammaproteobacteria | draft | 2500645353 | 29443 | DSM 16069 | Gi02314
Kribbella flavida DSM 17836 | Bacteria | Actinobacteria | draft | 2500395325 | 21089 | DSM 17836 | Gi02235
Kytococcus sedentarius DSM 20547 | Bacteria | Actinobacteria | finished | 2500168150 | 21067 | DSM 20547 | Gi02226
Leptotrichia buccalis C-1013-b, DSM 1135 | Bacteria | Fusobacteria | draft | 2500645352 | 29445 | DSM 1135 | Gi02240
Meiothermus ruber DSM 1279 | Bacteria | Deinococci | draft | 2500395348 | 28827 | DSM 1279 | Gi02300
Meiothermus silvanus DSM 9946 | Bacteria | Deinococci | draft | 2500645369 | 29551 | DSM 9946 | Gi02308
Nakamurella multipartita DSM 44233 | Bacteria | Actinobacteria | draft | 2500645368 | 21081 | DSM 44233 | Gi02230
Nocardiopsis dassonvillei dassonvillei DSM 43111 | Bacteria | Actinobacteria | draft | 2500395320 | 19709 | DSM 43111 | Gi02065
Pedobacter heparinus HIM 762-3, DSM 2366 | Bacteria | Bacteroidetes | draft | 2500395321 | 27949 | DSM 2366 | Gi02243
Planctomyces limnophilus DSM 3776 | Bacteria | Planctomycetes | draft | 2500575009 | 29411 | DSM 3776 | Gi02301
Rhodothermus marinus DSM 4252 | Bacteria | Bacteroidetes | draft | 2500575002 | 29281 | DSM 4252 | Gi02303
Saccharomonospora viridis P101, DSM 43017 | Bacteria | Actinobacteria | finished | 2500347305 | 20835 | DSM 43017 | Gi02228
Sanguibacter keddieii DSM 10542 | Bacteria | Actinobacteria | finished | 2500153403 | 19711 | DSM 10542 | Gi02151
Sebaldella termitidis ATCC 33386 | Bacteria | Fusobacteria | draft | 2500645364 | 29539 | ATCC 33386 | Gi02490
Slackia heliotrinireducens DSM 20476 | Bacteria | Actinobacteria | finished | 2500168151 | 20831 | DSM 20476 | Gi02157
Sphaerobacter thermophilus 4ac11, DSM 20745 | Bacteria | Chloroflexi | draft | 2500347306 | 21087 | DSM 20745 | Gi02236
Spirosoma linguale DSM 74 | Bacteria | Bacteroidetes | draft | 2500395346 | 28817 | DSM 74 | Gi02298
Stackebrandtia nassauensis LLR-40K-21, DSM 44728 | Bacteria | Actinobacteria | draft | 2500549403 | 19713 | DSM 44728 | Gi02068
Streptobacillus moniliformis DSM 12112 | Bacteria | Fusobacteria | draft | 2500575005 | 29309 | DSM 12112 | Gi02312
Streptosporangium roseum NI 9100, DSM 43021 | Bacteria | Actinobacteria | draft | 2500395335 | 21083 | DSM 43021 | Gi02229
Sulfurospirillum deleyianum DSM 6946 | Bacteria | Epsilonproteobacteria | draft | 2500645361 | 29529 | DSM 6946 | Gi02323
Thermanaerovibrio acidaminovorans Su883, DSM 6589 | Bacteria | Aminanaerobia | draft | 2500645362 | 29531 | DSM 6589 | Gi02247
Thermobaculum terrenum YNP1, ATCC BAA-798 | Bacteria | Chloroflexi | draft | 2500645355 | 29523 | ATCC BAA-798 | Gi02489
Thermobispora bispora DSM 43833 | Bacteria | Actinobacteria | finished | 2500194801 | 20737 | DSM 43833 | Gi02237
Thermomonospora curvata DSM 43183 | Bacteria | Actinobacteria | draft | 2500645351 | 20825 | DSM 43183 | Gi02238
Tsukamurella paurometabola DSM 20162 | Bacteria | Actinobacteria | draft | 2500575010 | 29399 | DSM 20162 | Gi02254
Veillonella parvula Te3, DSM 2008 | Bacteria | Firmicutes | draft | 2500347300 | 21091 | DSM 2008 | Gi02241
Xylanimonas cellulosilytica DSM 15894 | Bacteria | Actinobacteria | draft | 2500153402 | 19715 | DSM 15894 | Gi02069
GEBA Pilot V: Benefit?
Value of 100 diverse genomes I: Gene discovery
• Gene families
– Will compare and contrast gene family diversity in these genomes versus random samples of previous genomes
– Will assess the rate of gene family discovery and whether / how much it is diminishing
• Specific examples of novelty
– Focusing on DOE mission areas
– Do we find novel forms of hydrogenases, cellulases, C-fixation, etc.?
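Assessing "the rate of gene family discovery" is commonly done with an accumulation curve: add genomes in random order and track how many distinct gene families have been seen so far. A minimal sketch, assuming each genome has already been reduced to a set of gene-family IDs; the clustering step, function names, and toy data are assumptions for illustration, not part of the GEBA analysis.

```python
import random


def discovery_curve(genomes, n_perm=100, seed=0):
    """Mean number of distinct gene families seen after adding 1..N genomes
    in random order (an accumulation curve averaged over n_perm shuffles).

    genomes: list of sets, each holding the gene-family IDs found in one genome.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(genomes)
    for _ in range(n_perm):
        order = list(genomes)
        rng.shuffle(order)
        seen = set()
        for i, families in enumerate(order):
            seen |= families
            totals[i] += len(seen)
    return [t / n_perm for t in totals]


def new_families_per_genome(curve):
    """Families gained at each step; values shrinking toward 0 mean discovery is slowing."""
    return [curve[0]] + [b - a for a, b in zip(curve, curve[1:])]


if __name__ == "__main__":
    # Hypothetical toy data: gene-family IDs per genome.
    toy = [{"f1", "f2", "f3"}, {"f2", "f3", "f4"}, {"f1", "f4", "f5"}, {"f2", "f5"}]
    curve = discovery_curve(toy)
    print(curve)
    print(new_families_per_genome(curve))
```

Running the same functions on a diverse set and on a random sample of earlier genomes of equal size gives two curves whose shapes can be compared directly.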
Value of 100 diverse genomes II: Annotation
• Ortholog identification
– Filling in gaps will help identify orthologs between species
– Diverse GC content and amino acid composition should also improve ortholog identification
• Examination of the rate of hypothetical protein conversion to “known” proteins
• Non-homology functional prediction should improve greatly
– Phylogenetic profiling
– Rosetta Stone domain sharing
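Phylogenetic profiling predicts that gene families sharing the same presence/absence pattern across genomes are functionally linked. Adding phylogenetically diverse genomes lengthens every profile, which should make chance matches rarer. A minimal sketch with hypothetical family names and toy genomes (Rosetta Stone domain-fusion analysis is not shown):

```python
def profile(family, genomes):
    """Presence (1) / absence (0) of a gene family across an ordered list of genomes."""
    return tuple(1 if family in g else 0 for g in genomes)


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def linked_pairs(families, genomes, max_dist=0):
    """Family pairs whose profiles differ in at most max_dist genomes."""
    profs = {f: profile(f, genomes) for f in families}
    ordered = sorted(families)
    pairs = []
    for i, f in enumerate(ordered):
        for g in ordered[i + 1:]:
            if hamming(profs[f], profs[g]) <= max_dist:
                pairs.append((f, g))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy genomes, each a set of gene-family IDs.
    genomes = [{"famA", "famB", "famC"}, {"famA", "famB"}, {"famC"}, set()]
    print(linked_pairs({"famA", "famB", "famC"}, genomes))  # [('famA', 'famB')]
```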
Based on Wu et al. 2005
Value of 100 diverse genomes III: Metagenomics
• More diverse genomes should improve anchoring and binning of all metagenomic data sets
• Will test by running phylotyping software against genome data sets with and without GEBA genomes
– MEGAN
– AMPHORA
• Should be a good complement to reference genome sequencing
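The with/without comparison can be illustrated with a toy binning experiment: count how many reads can be assigned to some reference before and after extra reference genomes are added. The sketch below swaps in a simple shared k-mer score as a stand-in; it is not how MEGAN (lowest common ancestor over BLAST hits) or AMPHORA (marker-gene phylogenies) assign reads, and the sequences, threshold, and function names are hypothetical.

```python
def kmers(seq, k=4):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign(read, references, k=4, min_shared=0.5):
    """Name of the reference sharing the largest fraction of the read's k-mers,
    or None if no reference reaches min_shared (the read stays unbinned)."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None
    best, best_frac = None, 0.0
    for name, seq in references.items():
        frac = len(read_kmers & kmers(seq, k)) / len(read_kmers)
        if frac > best_frac:
            best, best_frac = name, frac
    return best if best_frac >= min_shared else None


def binned_fraction(reads, references, **kwargs):
    """Fraction of reads that can be assigned to some reference."""
    hits = [assign(r, references, **kwargs) for r in reads]
    return sum(h is not None for h in hits) / len(reads)


if __name__ == "__main__":
    # Hypothetical toy sequences.
    old_refs = {"ref1": "ACGTACGTAC"}
    new_refs = {**old_refs, "ref2": "TTGGCCAATT"}
    reads = ["ACGTACG", "TTGGCCA"]
    print(binned_fraction(reads, old_refs), binned_fraction(reads, new_refs))  # 0.5 1.0
```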
[Figure: values on a 0 to 0.7 scale for the taxonomic groups Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi and Unclassified Bacteria, across 31 marker genes: dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf]
Value of 100 diverse genomes IV: Mechanisms of Diversification
• Lateral gene transfer
– Lateral gene transfer is fundamentally important in microbial evolution
– However, when we find “foreign” DNA in genomes we usually cannot pinpoint the origin of that DNA
– Having more diverse genomes may help better pin down source groups for each piece of foreign DNA
• Eukaryotic diversification
– Of ~200 eukaryotic-specific gene families, how many now show up in bacteria and archaea?
– Any patterns to where they are found?
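The eukaryotic-diversification question reduces to set intersection plus a tally: which families previously seen only in eukaryotes are detected in the new bacterial and archaeal genomes, and in which phyla. A minimal sketch, assuming family membership has already been computed for each genome; the inputs, labels, and function name are hypothetical.

```python
from collections import Counter


def euk_families_in_prokaryotes(euk_specific, genome_families, genome_phylum):
    """Find 'eukaryote-specific' gene families that turn up in new genomes.

    euk_specific:    set of family IDs previously seen only in eukaryotes
    genome_families: dict genome -> set of family IDs detected in that genome
    genome_phylum:   dict genome -> phylum label
    Returns (family -> set of genomes containing it, phylum -> number of hits).
    """
    found = {}
    for genome, families in genome_families.items():
        for fam in families & euk_specific:
            found.setdefault(fam, set()).add(genome)
    hits_by_phylum = Counter(
        genome_phylum[g] for genomes in found.values() for g in genomes
    )
    return found, hits_by_phylum


if __name__ == "__main__":
    # Hypothetical toy data.
    euk = {"E1", "E2", "E3"}
    fams = {"g1": {"E1", "X"}, "g2": {"E1", "E2"}, "g3": {"Y"}}
    phyla = {"g1": "Actinobacteria", "g2": "Bacteroidetes", "g3": "Firmicutes"}
    found, by_phylum = euk_families_in_prokaryotes(euk, fams, phyla)
    print(len(found), "of", len(euk), "families found;", dict(by_phylum))
```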
CRISPR - expanding the possible
34 out of 56 genomes contain CRISPRs
1-13 arrays (loci) per genome
Haliangium ochraceum SMP-2, DSM 14365: 807 repeats in total
a single array contains 382 repeats
Verminephrobacter eiseniae: 249 repeats
Value of 100 diverse genomes V: Phylogeny
16S Says Hyphomonas is in Rhodobacterales
Badger et al. 2005
Whole-Genome Tree Says It's Related to Caulobacterales
Badger et al. 2005
Tree of Life Example II
GEBA - What’s Next
• Repeat and/or scale up
• Need to determine the value of finished versus unfinished genomes
• Apply this method to other groups
  – Microbial eukaryotes
  – Viruses
• Really fill in bacterial and archaeal tree
The slopes of the linear regression lines represent the PD (phylogenetic diversity) contribution of the genomes (each window contains 50 genomes).
[Figure, two panels: slope (50-genome windows) plotted against window position; one panel is labelled "Greengenes ssrRNA"]
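The caption describes each plotted point as the slope of a linear regression fitted over a window of 50 consecutive genomes, read as the PD those genomes contribute. A minimal sketch of that calculation follows, assuming PD is the rooted variant of Faith's PD (total branch length joining the chosen genomes to the root) and that genomes are added in a fixed order. The toy tree, the order, and the window size of 3 are hypothetical, chosen only so the numbers can be checked by hand.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy tree: node -> (parent, branch length to parent); root "R" has no entry.
TOY_TREE: Dict[str, Tuple[str, float]] = {
    "A": ("R", 1.0), "B": ("R", 2.0),
    "a1": ("A", 0.5), "a2": ("A", 0.5),
    "b1": ("B", 0.3), "b2": ("B", 0.3),
}


def faith_pd(tree: Dict[str, Tuple[str, float]], taxa: Sequence[str]) -> float:
    """Rooted Faith's PD: total length of the branches joining `taxa` to the root."""
    counted = set()
    total = 0.0
    for node in taxa:
        # Walk towards the root, stopping at the root or an already-counted branch.
        while node in tree and node not in counted:
            parent, length = tree[node]
            counted.add(node)
            total += length
            node = parent
    return total


def cumulative_pd(tree: Dict[str, Tuple[str, float]], order: Sequence[str]) -> List[float]:
    """PD after adding the first 1, 2, ..., n genomes in `order`."""
    return [faith_pd(tree, order[:k]) for k in range(1, len(order) + 1)]


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def window_slopes(values: Sequence[float], window: int) -> List[float]:
    """Regression slope of PD against genome count in each sliding window."""
    slopes = []
    for start in range(len(values) - window + 1):
        xs = list(range(start + 1, start + window + 1))  # genome counts in this window
        slopes.append(ols_slope(xs, values[start:start + window]))
    return slopes


order = ["a1", "b1", "a2", "b2"]
pd_curve = cumulative_pd(TOY_TREE, order)  # approximately [1.5, 3.8, 4.3, 4.6]
print(window_slopes(pd_curve, window=3))    # approximately [1.4, 0.4]
```

The talk's analysis used 50-genome windows; here a steep slope early on reflects a genome from a new clade adding a long branch, while later genomes from already-sampled clades add little.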
GEBA: Long Run
• Need active community input
• Involvement of multiple funding agencies, labs, genome centers
• Integration/communication among all large-scale projects
• Follow recommendations of NAS, ASM, AAM reports
• Adopt a Microbe: link to educational initiatives
[Figure: tree of bacterial phyla; legend: well sampled phyla, poorly sampled, no cultured taxa. Phyla shown: Acidobacteria, Bacteroides, Fibrobacteres, Gemmatimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa are even more poorly sampled
Uncultured Lineages: Technical Approaches
• Get into culture
• Enrichment cultures
• If abundant in low diversity ecosystems
• Flow sorting
• Microbeads
• Microfluidic sorting
• Single cell amplification
A Happy Tree of Life
Taxonomic Bias in Cultures Too
Phylum | Culture collection (ACM): 3760 bacterial cultures | Sequenced genomes: 71 bacterial genomes
Proteobacteria | 54% | 45%
Actinobacteria | 18% | 7%
Firmicutes | 14% | 24%
Bacteroidetes | 6% | 1%
Other phyla | 3% | 20%
Slide by Hugenholtz
GEBA Pilot Project at JGI
• Select 200 organisms using tree/taxonomy as guide
• Collaborate with culture collections to obtain DNA
• Sequence to closure 100 for which DNA QC is good
• Sequencing by Sanger-454 hybrid approach
• Data, annotation released after shotgun and closure
• Assess tree-based sequencing by doing reconstructions with different selection criteria
Selecting Organisms Step 2:
• Selecting representatives from each lineage without a genome
• Preference given within groups for organisms of DOE mission relevance
• Focused on type strains of cultured species
• Selecting those for which we can get DNA
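The selection criteria on this slide read naturally as a filter-then-rank procedure. The sketch below is one hypothetical way to encode them: drop candidates whose lineage already has a genome or whose DNA cannot be obtained, then prefer type strains and DOE-relevant organisms within each lineage. The `Candidate` fields, the ranking order (type strain before DOE relevance), and the example strains are all assumptions for illustration, not the JGI's actual selection pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Candidate:
    # Hypothetical record describing one cultured strain.
    name: str
    lineage: str
    is_type_strain: bool
    doe_relevant: bool
    dna_available: bool


def select_representatives(candidates: List[Candidate],
                           sequenced_lineages: Set[str]) -> Dict[str, Candidate]:
    """Pick one strain per lineage that still lacks a genome sequence."""
    chosen: Dict[str, Candidate] = {}
    for c in candidates:
        # Skip lineages already covered and strains whose DNA cannot be obtained.
        if c.lineage in sequenced_lineages or not c.dna_available:
            continue
        best = chosen.get(c.lineage)
        # Assumed ranking: type strain first, then DOE mission relevance.
        rank = (c.is_type_strain, c.doe_relevant)
        if best is None or rank > (best.is_type_strain, best.doe_relevant):
            chosen[c.lineage] = c
    return chosen


pool = [
    Candidate("strain_1", "lineage_X", False, True, True),
    Candidate("strain_2", "lineage_X", True, False, True),
    Candidate("strain_3", "lineage_Y", True, True, False),  # no DNA available
    Candidate("strain_4", "lineage_Z", True, True, True),   # lineage already sequenced
]
picked = select_representatives(pool, sequenced_lineages={"lineage_Z"})
print({lineage: c.name for lineage, c in picked.items()})  # {'lineage_X': 'strain_2'}
```

Comparing tuples of booleans keeps the preference order explicit; swapping the tuple elements would instead put DOE relevance first.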
GEBA Pilot Project 4th month project status: 79/169 DNAs delivered
[Figure: bar chart per group (y-axis 0 to 40; legend: in JGI, pending): Actinobacteria, Firmicutes, Proteobacteria, Bacteroidetes, Methanomicrobia, Thermoprotei, Halobacteria, Fusobacteria, Thermi, Aminanaerobia, Planctomycetes, Spirochaetes, Chloroflexi, Deferribacteres, Aquificae, Haloanaerobiales, Archaeoglobi, Thermovenabulae, Acidobacteria, Methanobacteria, Thermococci, Thermodesulf..., Thermodesulfobia]
Sargasso Phylotypes
[Figure: bar chart of weighted % of clones (y-axis 0 to 0.5) by major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota), estimated separately with each marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA]
Other Markers Give Similar Phylotypes
Venter et al., 2004
[Figure: tree of bacterial phyla]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
  – Eisen-Ward NSF Tree of Life Project
  – A genome from each of eight phyla
Based on Hugenholtz, 2002
[Figure: tree of bacterial phyla]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution II: fill in phyla
  – JGI: Genomic Encyclopedia of Bacteria and Archaea
Based on Hugenholtz, 2002
[Figure: tree of bacterial phyla]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution III: sequence uncultured
The Tree of Life is Still Angry
Circular Maps
DNA Repair Genes in D. radiodurans Complete Genome
Process | Genes in D. radiodurans
Nucleotide excision repair | UvrABCD, UvrA2
Base excision repair | AlkA, Ung, Ung2, GT, MutM, MutY-Nths, MPG
AP endonuclease | Xth
Mismatch excision repair | MutS, MutL
Recombination: initiation | RecFJNRQ, SbcCD, RecD
Recombination: recombinase | RecA
Recombination: migration and resolution | RuvABC, RecG
Replication | PolA, PolC, PolX, phage Pol
Ligation | DnlJ
dNTP pools, cleanup | MutTs, RRase
Other | LexA, RadA, HepA, UVDE, MutS2
Problem:
The list of DNA repair gene homologs in the D. radiodurans genome is not significantly different from those of other bacterial genomes of similar size
[Figure: tree of bacterial phyla (scale bar 0.1)]
Tree based on Hugenholtz (2002) with some modifications.
~40 Phyla of Bacteria
[Figure: tree of bacterial phyla (scale bar 0.1)]
Tree based on Hugenholtz (2002) with some modifications.
Most DNA metabolism studies in two phyla
[Figure: tree of bacterial phyla (scale bar 0.1)]
Tree based on Hugenholtz (2002) with some modifications.
Deinococcus is very distant from well-studied groups
[Figure: tree of Bacteria, Archaea and Eukaryotes (Ecoli, Haein, Neigo, Helpy, Bacsu, Strpy, Mycge, Mycpn, Borbu, Trepa, Synsp, Metjn, Arcfu, Metth, Human, Yeast) with inferred gains (+), losses (-), duplications (d) and transfers (t) of DNA repair genes mapped onto the branches, including genes acquired from mitochondria]
Gain and Loss of Repair Genes
Eisen and Hanawalt, 1999 Mut Res 435: 171-213
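Mapping gains and losses onto a tree, as in this figure, amounts to reconstructing a presence/absence character on the branches. One standard way to do that is Fitch parsimony; the sketch below applies it to a single hypothetical gene on a small rooted binary tree. It is not necessarily the method behind the published figure, and the five taxa (`t1` to `t5`) and the presence pattern are made up for illustration.

```python
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> List[str]:
    """Taxon names below a node; used to label the branch leading to it."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def fitch_up(tree: Tree, states: Dict[str, int], sets: Dict[Tree, Set[int]]) -> int:
    """Bottom-up Fitch pass: store each node's candidate state set, return the change count."""
    if isinstance(tree, str):
        sets[tree] = {states[tree]}
        return 0
    left, right = tree
    cost = fitch_up(left, states, sets) + fitch_up(right, states, sets)
    common = sets[left] & sets[right]
    if common:
        sets[tree] = common
    else:
        # Children disagree: take the union and count one change.
        sets[tree] = sets[left] | sets[right]
        cost += 1
    return cost


def fitch_down(tree: Tree, sets: Dict[Tree, Set[int]], parent_state: int,
               changes: List[Tuple[str, str]]) -> None:
    """Top-down pass: keep the parent's state when allowed; record gains (0->1) and losses (1->0)."""
    own = sets[tree]
    state = parent_state if parent_state in own else min(own)
    if state != parent_state:
        changes.append((",".join(leaves(tree)), "gain" if state == 1 else "loss"))
    if not isinstance(tree, str):
        for child in tree:
            fitch_down(child, sets, state, changes)


def infer_changes(tree: Tree, states: Dict[str, int]) -> Tuple[int, List[Tuple[str, str]]]:
    """Minimum number of presence/absence changes and one parsimonious placement of them."""
    sets: Dict[Tree, Set[int]] = {}
    cost = fitch_up(tree, states, sets)
    changes: List[Tuple[str, str]] = []
    root_state = min(sets[tree])  # ties at the root are broken towards absence (0)
    fitch_down(tree, sets, root_state, changes)
    return cost, changes


# Hypothetical five-taxon tree and presence (1) / absence (0) of one repair gene.
tree = ((("t1", "t2"), "t3"), ("t4", "t5"))
presence = {"t1": 1, "t2": 1, "t3": 0, "t4": 0, "t5": 0}
print(infer_changes(tree, presence))  # (1, [('t1,t2', 'gain')])
```

Parsimony counts only the minimum number of events; distinguishing a gain from a lateral transfer, as the figure does with "t", needs additional evidence such as gene trees.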
Solution - Experiments
[Figure: tree of bacterial phyla (scale bar 0.1)]
Tree based on Hugenholtz (2002) with some modifications.
Need experimental studies from across the tree too
MICROBES
A Happy Tree of Life
TIGR
Other people
Mom and Dad
H. Ochman, F. Robb
J. Battista
E. Orias
D. Bryant, S. O’Neill
M. Eisen
N. Moran
R. Myers
C. M. Cavanaugh
P. Hanawalt
J. Heidelberg, N. Ward
J. Venter
C. Fraser
S. Salzberg
I. Paulsen
$$$
NSF, DOE
NIH
M. Wu
D. Wu
S. Chatterji
H. Huse
A. Hartman
Moore
JCVI
D. Rusch
A. Halpern
Eisen Group/Davis
J. Morgan
JGI
E. Eisenstadt
M. Frazier
T. Woyke
E. Rubin
GEBA
Annotation and data status summary
GBP - BDMTC
Annotation - Current status
• 56 draft genomes in IMG-GEBA site (free access)
• 61 in IMG-ER (password protected)
• 19 complete genomes (none in IMG)
24.6Mb of sequence
230,596 genes
227,562 proteins
155,641 with function (67.4%)
16,435 fused genes (6.3%)
• GC%: 74% Actinosynnema mirum 101, DSM 43827; 26% Streptobacillus moniliformis DSM 12112
• Size: 13.4Mb Ktedonobacter racemifer SOSP1-21, DSM 44963; 1.5Mb Streptobacillus moniliformis DSM 12112
• Scaffolds: 407 Ktedonobacter racemifer SOSP1-21, DSM 44963; 1 Atopobium parvulum
• Genes: 13,445 Ktedonobacter racemifer SOSP1-21, DSM 44963; 1,433 Cryptobacterium curtum DSM 15641
• With functions: 78.6% Thermanaerovibrio acidaminovorans; 50.6% Planctomyces limnophilus DSM 3776
• COG%: 79% Thermanaerovibrio acidaminovorans; 49% Planctomyces limnophilus
• SignalP: 38% Ferrimonas balearica; 14% Methanohalophilus mahii
• Transmembrane: 31% Eggerthella lenta; 17% Haliangium ochraceum
• Fused genes: 9% Desulfomicrobium baculatum; 3.7% Planctomyces limnophilus
• 16S: 12 Desulfotomaculum acetoxidans
Progress Report: GEBA Status 5-12-08
• Awaiting Material: 26%
• Library: 9%
• Production: 11%
• Post Draft: 51%
• Closed: 3%
GEBA Pilot Target List 5-12-08
[Figure: bar chart of # of genomes (y-axis 0 to 35) per phylum (B = Bacteria, A = Archaea): B: Actinobacteria (High GC), B: Aminanaerobia, B: Aquificae, B: Bacteroidetes, B: Chloroflexi, B: Deferribacteres, B: Deinococci, B: Delta Proteobacteria, B: Epsilon Proteobacteria, B: Firmicutes, B: Fusobacteria, B: Gamma Proteobacteria, B: Gemmatimonadetes, B: Haloanaerobiales, B: Planctomycetes, B: Spirochaetes, B: Thermodesulfobacteria, B: Thermodesulfobia, B: Thermovenabulae, A: Halobacteria, A: Archaeoglobi, A: Methanobacteria, A: Methanomicrobia, A: Thermococci, A: Thermoprotei]
GEBA Pilot Status 5-12-08
[Figure: bar chart of # of genomes (y-axis 0 to 35) per phylum, for the same groups as the target list, broken down by status: Closed, Post Draft, Production, Library, Awaiting Material]
Non Active Projects
[Figure: bar chart of # of genomes (y-axis 0 to 16) per phylum, for the same groups as the target list, split into Abandoned and On Hold]
GEBA Pilot Data Release
[Figure: bar chart of # of genomes (y-axis 0 to 30) per phylum, for the same groups as the target list]
GEBA Paper Plans
• Methods for large-scale microbial genome sequencing
  – Sequencing, closure methods
  – DNA sources (i.e., culture collections)
  – Outreach and educational issues
• How valuable is phylogenetic gap-based sequencing?
  – For annotation
  – For metagenomics
  – For gene discovery
• How deep to go in phylogenetic gap filling?
  – Breadth between phyla versus filling in phyla