Jonathan Eisen talk on "Genomic Encyclopedia" at Lake Arrowhead Small Genomes Meeting 2008


Uploaded by jonathan-eisen on 10-May-2015

DESCRIPTION

Talk by Jonathan Eisen on "A genomic encyclopedia of bacteria and archaea" at Lake Arrowhead Small Genomes meeting in 2008.

TRANSCRIPT

Page 1: Jonathan Eisen talk on "Genomic Encyclopedia" at Lake Arrowhead Small Genomes Meeting 2008


A Genomic Encyclopedia of Bacteria and Archaea

(GEBA)

Jonathan A. Eisen

U. C. Davis and J. G. I.

Page 2

Outline

• Background

– Why history matters

– Gaps in available genomes

• The GEBA pilot project

• Future needs

Page 3

The Tree of Life


Page 4

Famous Arrowhead 2004 Quotes

• Space-time continuum of genes and genomes

• Gene sequences are the wormhole that allows one to tunnel into the past

• The human mind can conceive of things with no basis in physical reality

• Thoughts can go faster than the speed of light


Page 5

Famous Arrowhead Quotes 2006

• Publications, student degrees, etc.

• Not trying to say anything bad about anyone

• The human guts are a real milieu

• Where’s your evening gown?

• You better kiss everybody

• This is how you do metagenomics on 50 dollars, and that’s Canadian dollars

Page 6

Page 7

From http://genomesonline.org

Page 8

Major Microbial Sequencing Efforts

• Coordinated, top-down efforts

– Fungal Genome Initiative (Broad/Whitehead)

– Gordon and Betty Moore Foundation Marine Microbial Genome Sequencing Project

– Sanger Center Pathogen Sequencing Unit

– NHGRI Human Gut Microbiome Project

– NIH Human Microbiome Program

• White paper or grant systems

– NIAID Microbial Sequencing Centers

– DOE/JGI Community Sequencing Program

– DOE/JGI BER Sequencing Program

– NSF/USDA Microbial Genome Sequencing

• Covers lots of ground and biological diversity

Page 9

The Tree of Life


Page 10

The Tree is not Happy


Page 11

Phyla labeled on the tree: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter

• At least 40 phyla of bacteria

As of 2002

Based on Hugenholtz, 2002

Page 12

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

As of 2002

Based on Hugenholtz, 2002

Page 13

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

As of 2002

Based on Hugenholtz, 2002

Page 14

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Archaea, Eukaryotes

As of 2002

Based on Hugenholtz, 2002

Page 15

Need for Tree Guidance Well Established

• Common approach within some eukaryotic groups

– NHGRI animal projects

– FGI at Whitehead

– Plant sequencing at JGI

• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature

• Many small projects funded to fill in some gaps

– DOE/TIGR Sequencing

– Multiple CSP projects

– Multiple NSF/USDA projects

– Private projects (e.g., Integrated Genomics, Diversa)

– TIGR (Eisen, Ward) Bacterial Tree of Life Project

Page 16

Why Increase Taxonomic Coverage?

• Mechanisms of diversification

• Gene discovery

• Annotation, functional prediction

• Metagenomic analysis

• Species phylogeny and classification

Page 17

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Solution I: sequence more phyla

• Eisen-Ward NSF Tree of Life Project

• A genome from each of eight phyla

Based on Hugenholtz, 2002


Page 18

The Tree of Life is Still Angry


Page 19

Within-Phylum Diversity Is Immense

• Each phylum represents billions of years of evolution

• Some have hundreds of major lineages, most with no genomes

• Need to sample within phyla too

Page 20

Major Lineages of Actinobacteria

2.5.1 Acidimicrobidae
  2.5.1.1 Unclassified
  2.5.1.2 "Microthrixineae"
  2.5.1.3 Acidimicrobineae
  2.5.1.4 BD2-10
  2.5.1.5 EB1017
2.5.2 Actinobacteridae
  2.5.2.1 Unclassified
  2.5.2.10 Ellin306/WR160
  2.5.2.11 Ellin5012
  2.5.2.12 Ellin5034
  2.5.2.13 Frankineae
  2.5.2.14 Glycomyces
  2.5.2.15 Intrasporangiaceae
  2.5.2.16 Kineosporiaceae
  2.5.2.17 Microbacteriaceae
  2.5.2.18 Micrococcaceae
  2.5.2.19 Micromonosporaceae
  2.5.2.2 Actinomyces
  2.5.2.20 Propionibacterineae
  2.5.2.21 Pseudonocardiaceae
  2.5.2.22 Streptomycineae
  2.5.2.23 Streptosporangineae
  2.5.2.3 Actinomycineae
  2.5.2.4 Actinosynnemataceae
  2.5.2.5 Bifidobacteriaceae
  2.5.2.6 Brevibacteriaceae
  2.5.2.7 Cellulomonadaceae
  2.5.2.8 Corynebacterineae
  2.5.2.9 Dermabacteraceae
2.5.3 Coriobacteridae
  2.5.3.1 Unclassified
  2.5.3.2 Atopobiales
  2.5.3.3 Coriobacteriales
  2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
  2.5.6.1 Unclassified
  2.5.6.2 "Thermoleiphilaceae"
  2.5.6.3 MC47
  2.5.6.4 Rubrobacteraceae

Page 21

Major Lineages of Actinobacteria II

2.5 Actinobacteria
  2.5.1 Acidimicrobidae
    2.5.1.1 Unclassified
    2.5.1.2 "Microthrixineae"
    2.5.1.3 Acidimicrobineae
      2.5.1.3.1 Unclassified
      2.5.1.3.2 Acidimicrobiaceae
    2.5.1.4 BD2-10
    2.5.1.5 EB1017
  2.5.2 Actinobacteridae
    2.5.2.1 Unclassified
    2.5.2.10 Ellin306/WR160
    2.5.2.11 Ellin5012
    2.5.2.12 Ellin5034
    2.5.2.13 Frankineae
      2.5.2.13.1 Unclassified
      2.5.2.13.2 Acidothermaceae
      2.5.2.13.3 Ellin6090
      2.5.2.13.4 Frankiaceae
      2.5.2.13.5 Geodermatophilaceae
      2.5.2.13.6 Microsphaeraceae
      2.5.2.13.7 Sporichthyaceae
    2.5.2.14 Glycomyces
    2.5.2.15 Intrasporangiaceae
      2.5.2.15.1 Unclassified
      2.5.2.15.2 Dermacoccus
      2.5.2.15.3 Intrasporangiaceae
    2.5.2.16 Kineosporiaceae
    2.5.2.17 Microbacteriaceae
      2.5.2.17.1 Unclassified
      2.5.2.17.2 Agrococcus
      2.5.2.17.3 Agromyces
    2.5.2.18 Micrococcaceae
    2.5.2.19 Micromonosporaceae
    2.5.2.2 Actinomyces
    2.5.2.20 Propionibacterineae
      2.5.2.20.1 Unclassified
      2.5.2.20.2 Kribbella
      2.5.2.20.3 Nocardioidaceae
      2.5.2.20.4 Propionibacteriaceae
    2.5.2.21 Pseudonocardiaceae
    2.5.2.22 Streptomycineae
      2.5.2.22.1 Unclassified
      2.5.2.22.2 Kitasatospora
      2.5.2.22.3 Streptacidiphilus
    2.5.2.23 Streptosporangineae
      2.5.2.23.1 Unclassified
      2.5.2.23.2 Ellin5129
      2.5.2.23.3 Nocardiopsaceae
      2.5.2.23.4 Streptosporangiaceae
      2.5.2.23.5 Thermomonosporaceae
    2.5.2.3 Actinomycineae
    2.5.2.4 Actinosynnemataceae
    2.5.2.5 Bifidobacteriaceae
    2.5.2.6 Brevibacteriaceae
    2.5.2.7 Cellulomonadaceae
    2.5.2.8 Corynebacterineae
      2.5.2.8.1 Unclassified
      2.5.2.8.2 Corynebacteriaceae
      2.5.2.8.3 Dietziaceae
      2.5.2.8.4 Gordoniaceae
      2.5.2.8.5 Mycobacteriaceae
      2.5.2.8.6 Rhodococcus
      2.5.2.8.7 Rhodococcus
      2.5.2.8.8 Rhodococcus
    2.5.2.9 Dermabacteraceae
      2.5.2.9.1 Unclassified
      2.5.2.9.2 Brachybacterium
      2.5.2.9.3 Dermabacter
  2.5.3 Coriobacteridae
    2.5.3.1 Unclassified
    2.5.3.2 Atopobiales
    2.5.3.3 Coriobacteriales
    2.5.3.4 Eggerthellales
  2.5.4 OPB41
  2.5.5 PK1
  2.5.6 Rubrobacteridae
    2.5.6.1 Unclassified
    2.5.6.2 "Thermoleiphilaceae"
      2.5.6.2.1 Unclassified
      2.5.6.2.2 Conexibacter
      2.5.6.2.3 XGE514
    2.5.6.3 MC47
    2.5.6.4 Rubrobacteraceae

Page 22

• At least 100 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa even more poorly sampled

• Solution: use the tree to really fill gaps

Well-sampled phyla

Page 23

Page 24

GEBA Pilot Project: Components

• Project management (David Bruce, Lynne Goodwin et al.)

• Selection of strains (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen)

• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)

• Libraries and DNA (Eileen Dalin et al.)

• Sequencing and closure (Susan Lucas, Alla Lapidus et al.)

• Annotation and database needs (Nikos Kyrpides)

• Analysis (Dongying Wu, Martin Wu, Jenna Morgan, Victor Kunin, Marcel Huntemann, Neil Rawlings, Ian Paulsen, Gary Xie, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Kostas Mavrommatis)

• Adopt a microbe education project (Cheryl Kerfeld)

• Outreach (David Gilbert)

• $$$ (DOE, Eddy Rubin, Jim Bristow)

Page 25

GEBA Pilot I: Identifying Lineages without Genomes

Page 26

Page 27

Page 28

Page 29

GEBA Pilot II: Selecting Targets

Page 30

Key Criteria

• Phylogenetic novelty

– Working from the top of the tree down

– Also selected one phylum, Actinobacteria, to sample in more detail

• Culturable

– Type strain preferred if all else is equal

• DOE mission relevance

• Ready availability to us and the community

– Of strain

– Of DNA
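The selection criteria above can be read as a greedy ranking. The Python sketch below is a minimal illustration under stated assumptions, not the actual GEBA selection procedure: each candidate strain carries a hypothetical lineage path from the top of the tree plus culturability, type-strain and availability flags, and DOE mission relevance is not modeled. Strains in the shallowest clade that still lacks a genome are picked first, and type strains win ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    lineage: tuple[str, ...]  # hypothetical path from the top of the tree, e.g. ("Actinobacteria", "Rubrobacteridae")
    culturable: bool
    type_strain: bool
    available: bool  # strain and DNA obtainable by us and the community


def gap_depth(lineage, sequenced):
    """Depth of the shallowest clade on this lineage with no genome yet, or None if all are covered."""
    for k in range(1, len(lineage) + 1):
        prefix = lineage[:k]
        if not any(path[:k] == prefix for path in sequenced):
            return k
    return None


def select_targets(candidates, sequenced, budget):
    """Greedy top-down selection: fill the shallowest gaps first, preferring type strains."""
    sequenced = set(sequenced)
    pool = [c for c in candidates if c.culturable and c.available]
    picks = []
    while pool and len(picks) < budget:
        scored = [(gap_depth(c.lineage, sequenced), c) for c in pool]
        scored = [(d, c) for d, c in scored if d is not None]
        if not scored:
            break  # every remaining candidate sits in an already-covered clade
        # Shallow gap first; then type strain first; then name, for a deterministic order.
        _, best = min(scored, key=lambda dc: (dc[0], not dc[1].type_strain, dc[1].name))
        picks.append(best.name)
        sequenced.add(best.lineage)  # this clade now has a genome
        pool.remove(best)
    return picks


# Hypothetical strains and lineages, for illustration only.
seq = [("Proteobacteria", "Gammaproteobacteria"), ("Actinobacteria", "Actinobacteridae")]
cands = [
    Candidate("strain_A", ("Actinobacteria", "Rubrobacteridae"), True, True, True),
    Candidate("strain_B", ("Actinobacteria", "Rubrobacteridae"), True, False, True),
    Candidate("strain_C", ("Deinococcus-Thermus",), True, True, True),
    Candidate("strain_D", ("OP11",), False, False, False),
]
print(select_targets(cands, seq, budget=3))  # ['strain_C', 'strain_A']
```

Adding each pick to `sequenced` before the next round keeps the sketch from spending two genomes on the same gap: strain_B is skipped because strain_A already covers its clade, and strain_D is never considered because it is not culturable.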

Page 31

GEBA Pilot III: Partnership with DSMZ

Page 32

GEBA Biggest Challenge: Getting DNA

• Getting quality DNA is the biggest bottleneck

• Decided to test, as part of the GEBA pilot, the possibility of getting DNA directly from culture collections

• DSMZ offered to do this for free

• ATCC is doing a small number for a fee

• Working with other culture collections

Page 33

Page 34

Quantification gel of the genomic DNA isolated from Conexibacter woesei (DSM 14684T)

Conexibacter woesei (DSM 14684T) was obtained from the German Collection of Microorganisms and Cell Cultures (DSMZ). The genomic DNA was isolated using the Qiagen Genomic 500 DNA Kit (Qiagen 10262). Pulsed-field gel electrophoresis (PFGE) showed the genomic DNA to range from 10 to 250 kb in size, with the bulk between 50 and 250 kb (see attached PFGE image). The DNA concentration was estimated from the gel at 500 ng/µl; spectrophotometric measurement yielded 450 µg/ml. A volume of 300 µl of genomic DNA (150 µg) was shipped.

Lane 1: c(λ-Marker) = 15 ng
Lane 2: c(λ-Marker) = 30 ng
Lane 3: c(λ-Marker) = 50 ng
Lane 4: DNA Molecular Weight Marker II (Roche 236250)
Lane 5: DSM 13279, Collinsella stercoris
Lane 6: DSM 43043, Intrasporangium calvum
Lane 7: DSM 18053, Dyadobacter fermentans
Lane 8: DSM 20476, Slackia heliotrinireducens
Lane 9: DSM 18081, Patulibacter minatonensis
Lane 10: DSM 14684, Conexibacter woesei
Lane 11: DSM 11002, Dethiosulfovibrio peptidovorans
Lane 12: DSM 11551, Halogeometricum borinquense
Lane 13: DNA Molecular Weight Marker II (Roche 236250)
Lane 14: c(λ-Marker) = 125 ng
Lane 15: c(λ-Marker) = 250 ng
Lane 16: c(λ-Marker) = 500 ng
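The shipped quantity for Conexibacter woesei follows from concentration times volume. This short Python check (the helper name is illustrative) shows that the stated 150 µg corresponds to the gel estimate of 500 ng/µl, whereas the spectrophotometric reading of 450 µg/ml, which equals 450 ng/µl, would give 135 µg for the same 300 µl.

```python
def dna_mass_ug(conc_ng_per_ul: float, volume_ul: float) -> float:
    """Total DNA mass in µg from a concentration in ng/µl and a volume in µl."""
    return conc_ng_per_ul * volume_ul / 1000.0  # 1000 ng = 1 µg


VOLUME_UL = 300.0       # volume shipped
GEL_NG_PER_UL = 500.0   # estimated from the quantification gel
SPEC_NG_PER_UL = 450.0  # spectrophotometric 450 µg/ml is the same as 450 ng/µl

print(dna_mass_ug(GEL_NG_PER_UL, VOLUME_UL))   # 150.0
print(dna_mass_ug(SPEC_NG_PER_UL, VOLUME_UL))  # 135.0
```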

Page 35

GEBA Pilot IV: Sequencing Progress

Page 36

Page 37

GEBA Pilot Target List

[Bar chart. X-axis (Phyla): B: Actinobacteria (High GC), B: Aminanaerobia, B: Aquificae, B: Bacteroidetes, B: Chloroflexi, B: Deferribacteres, B: Deinococci, B: Delta Proteobacteria, B: Epsilon Proteobacteria, B: Firmicutes, B: Fusobacteria, B: Gamma Proteobacteria, B: Gemmatimonadetes, B: Haloanaerobiales, B: Planctomycetes, B: Spirochaetes, B: Thermodesulfobacteria, B: Thermodesulfobia, B: Thermovenabulae, A: Halobacteria, A: Archaeoglobi, A: Methanobacteria, A: Methanomicrobia, A: Thermococci, A: Thermoprotei (B = Bacteria, A = Archaea). Y-axis: # of Genomes, 0 to 35.]

Page 38

GEBA Pilot Status 5-12-08

[Stacked bar chart by phylum (same categories as the target list). Y-axis: # of Genomes, 0 to 35. Legend: Closed, Post Draft, Production, Library, Awaiting Material.]

Page 39

Non-Active Projects

[Stacked bar chart by phylum (same categories as the target list). Y-axis: # of Genomes, 0 to 16. Legend: Abandoned, On Hold.]

Page 40

GEBA Pilot Data Release

[Bar chart by phylum (same categories as the target list). Y-axis: # of Genomes, 0 to 30.]

Page 41

Progress Report: GEBA Status 5-12-08

• Awaiting Material: 26%

• Library: 9%

• Production: 11%

• Post Draft: 51%

• Closed: 3%

Page 42

Progress

Page 43

Data

Page 44

| Organism | Domain | Phylum | Status | IMG-GEBA | NCBI-PID | Culture-ID | GOLD-ID |
|---|---|---|---|---|---|---|---|
| Acidimicrobium ferrooxidans DSM 10331 | Bacteria | Actinobacteria | draft | 2500645360 | 29525 | DSM 10331 | Gi02326 |
| Actinosynnema mirum 101, DSM 43827 | Bacteria | Actinobacteria | draft | 2500395345 | 19705 | DSM 43827 | Gi02064 |
| Alicyclobacillus acidocaldarius subsp. acidocaldarius 104-IA, DSM 446 | Bacteria | Firmicutes | draft | 2500575013 | 29405 | DSM 446 | Gi02324 |
| Anaerococcus prevotii PC 1, DSM 20548 | Bacteria | Firmicutes | draft | 2500645363 | 29533 | DSM 20548 | Gi02318 |
| Atopobium parvulum IPP 1246, DSM 20469 | Bacteria | Actinobacteria | draft | 2500575011 | 29401 | DSM 20469 | Gi02317 |
| Beutenbergia cavernosa HKI 0122, DSM 12333 | Bacteria | Actinobacteria | draft | 2500395322 | 20827 | DSM 12333 | Gi02225 |
| Brachybacterium faecium DSM 4810 | Bacteria | Actinobacteria | finished | 2500153401 | 17026 | DSM 4810 | Gi02066 |
| Brachyspira murdochii DSM 12563 | Bacteria | Spirochaetes | draft | 2500645365 | 29543 | DSM 12563 | Gi02313 |
| Capnocytophaga ochracea DSM 7271 | Bacteria | Bacteroidetes | draft | 2500575012 | 29403 | DSM 7271 | Gi02305 |
| Catenulispora acidiphila ID139908, DSM 44928 | Bacteria | Actinobacteria | draft | 2500395338 | 21085 | DSM 44928 | Gi02233 |
| Cellulomonas flavigena 134, DSM 20109 | Bacteria | Actinobacteria | draft | 2500395336 | 19707 | DSM 20109 | Gi02067 |
| Chitinophaga pinensis UQM 2034, DSM 2588 | Bacteria | Bacteroidetes | draft | 2500395347 | 27951 | DSM 2588 | Gi02244 |
| Conexibacter woesei ID131577, DSM 14684 | Bacteria | Actinobacteria | draft | 2500347307 | 20745 | DSM 14684 | Gi02154 |
| Cryptobacterium curtum DSM 15641 | Bacteria | Actinobacteria | finished | 2500332002 | 20739 | DSM 15641 | Gi02234 |
| Denitrovibrio acetiphilus N2460, DSM 12809 | Bacteria | Deferribacteres | draft | 2500575016 | 29431 | DSM 12809 | Gi02322 |
| Desulfohalobium retbaense DSM 5692 | Bacteria | Deltaproteobacteria | draft | 2500575018 | 29199 | DSM 5692 | Gi02246 |
| Desulfomicrobium baculatum DSM 4028 | Bacteria | Deltaproteobacteria | draft | 2500645356 | 29527 | DSM 4028 | Gi02302 |
| Desulfotomaculum acetoxidans 5575, DSM 771 | Bacteria | Firmicutes | draft | 2500395337 | 27947 | DSM 771 | Gi02239 |
| Dethiosulfovibrio peptidovorans SEBR 4207, DSM 11002 | Bacteria | Aminanaerobia | draft | 2500549401 | 20741 | DSM 11002 | Gi02152 |
| Dyadobacter fermentans NS 114, DSM 18053 | Bacteria | Bacteroidetes | draft | 2500395342 | 20829 | DSM 18053 | Gi02155 |
| Eggerthella lenta VPI 0255, DSM 2243 | Bacteria | Actinobacteria | draft | 2500549402 | 21093 | DSM 2243 | Gi02242 |
| Geodermatophilus obscurus DSM 43160 | Bacteria | Actinobacteria | draft | 2500645366 | 29547 | DSM 43160 | Gi02257 |
| Gordonia bronchialis DSM 43247 | Bacteria | Actinobacteria | draft | 2500645367 | 29549 | DSM 43247 | Gi02258 |
| Haliangium ochraceum SMP-2, DSM 14365 | Bacteria | Deltaproteobacteria | draft | 2500395339 | 28711 | DSM 14365 | Gi02251 |
| Halogeometricum borinquense DSM 11551 | Archaea | Halobacteria | finished | 2500153400 | 20743 | DSM 11551 | Gi02153 |
| Halomicrobium mukohataei arg-2, DSM 12286 | Archaea | Halobacteria | draft | 2500395343 | 27945 | DSM 12286 | Gi02248 |
| Halorhabdus utahensis AX-2, DSM 12940 | Archaea | Halobacteria | draft | 2500575004 | 29305 | DSM 12940 | Gi02250 |
| Jonesia denitrificans DSM 20603 | Bacteria | Actinobacteria | draft | 2500168153 | 20833 | DSM 20603 | Gi02227 |
| Kangiella koreensis SW-125, DSM 16069 | Bacteria | Gammaproteobacteria | draft | 2500645353 | 29443 | DSM 16069 | Gi02314 |
| Kribbella flavida DSM 17836 | Bacteria | Actinobacteria | draft | 2500395325 | 21089 | DSM 17836 | Gi02235 |
| Kytococcus sedentarius DSM 20547 | Bacteria | Actinobacteria | finished | 2500168150 | 21067 | DSM 20547 | Gi02226 |
| Leptotrichia buccalis C-1013-b, DSM 1135 | Bacteria | Fusobacteria | draft | 2500645352 | 29445 | DSM 1135 | Gi02240 |
| Meiothermus ruber DSM 1279 | Bacteria | Deinococci | draft | 2500395348 | 28827 | DSM 1279 | Gi02300 |
| Meiothermus silvanus DSM 9946 | Bacteria | Deinococci | draft | 2500645369 | 29551 | DSM 9946 | Gi02308 |
| Nakamurella multipartita DSM 44233 | Bacteria | Actinobacteria | draft | 2500645368 | 21081 | DSM 44233 | Gi02230 |
| Nocardiopsis dassonvillei subsp. dassonvillei DSM 43111 | Bacteria | Actinobacteria | draft | 2500395320 | 19709 | DSM 43111 | Gi02065 |
| Pedobacter heparinus HIM 762-3, DSM 2366 | Bacteria | Bacteroidetes | draft | 2500395321 | 27949 | DSM 2366 | Gi02243 |
| Planctomyces limnophilus DSM 3776 | Bacteria | Planctomycetes | draft | 2500575009 | 29411 | DSM 3776 | Gi02301 |
| Rhodothermus marinus DSM 4252 | Bacteria | Bacteroidetes | draft | 2500575002 | 29281 | DSM 4252 | Gi02303 |
| Saccharomonospora viridis P101, DSM 43017 | Bacteria | Actinobacteria | finished | 2500347305 | 20835 | DSM 43017 | Gi02228 |
| Sanguibacter keddieii DSM 10542 | Bacteria | Actinobacteria | finished | 2500153403 | 19711 | DSM 10542 | Gi02151 |
| Sebaldella termitidis ATCC 33386 | Bacteria | Fusobacteria | draft | 2500645364 | 29539 | ATCC 33386 | Gi02490 |
| Slackia heliotrinireducens DSM 20476 | Bacteria | Actinobacteria | finished | 2500168151 | 20831 | DSM 20476 | Gi02157 |
| Sphaerobacter thermophilus 4ac11, DSM 20745 | Bacteria | Chloroflexi | draft | 2500347306 | 21087 | DSM 20745 | Gi02236 |
| Spirosoma linguale DSM 74 | Bacteria | Bacteroidetes | draft | 2500395346 | 28817 | DSM 74 | Gi02298 |
| Stackebrandtia nassauensis LLR-40K-21, DSM 44728 | Bacteria | Actinobacteria | draft | 2500549403 | 19713 | DSM 44728 | Gi02068 |
| Streptobacillus moniliformis DSM 12112 | Bacteria | Fusobacteria | draft | 2500575005 | 29309 | DSM 12112 | Gi02312 |
| Streptosporangium roseum NI 9100, DSM 43021 | Bacteria | Actinobacteria | draft | 2500395335 | 21083 | DSM 43021 | Gi02229 |
| Sulfurospirillum deleyianum DSM 6946 | Bacteria | Epsilonproteobacteria | draft | 2500645361 | 29529 | DSM 6946 | Gi02323 |
| Thermanaerovibrio acidaminovorans Su883, DSM 6589 | Bacteria | Aminanaerobia | draft | 2500645362 | 29531 | DSM 6589 | Gi02247 |
| Thermobaculum terrenum YNP1, ATCC BAA-798 | Bacteria | Chloroflexi | draft | 2500645355 | 29523 | ATCC BAA-798 | Gi02489 |
| Thermobispora bispora DSM 43833 | Bacteria | Actinobacteria | finished | 2500194801 | 20737 | DSM 43833 | Gi02237 |
| Thermomonospora curvata DSM 43183 | Bacteria | Actinobacteria | draft | 2500645351 | 20825 | DSM 43183 | Gi02238 |
| Tsukamurella paurometabola DSM 20162 | Bacteria | Actinobacteria | draft | 2500575010 | 29399 | DSM 20162 | Gi02254 |
| Veillonella parvula Te3, DSM 2008 | Bacteria | Firmicutes | draft | 2500347300 | 21091 | DSM 2008 | Gi02241 |
| Xylanimonas cellulosilytica DSM 15894 | Bacteria | Actinobacteria | draft | 2500153402 | 19715 | DSM 15894 | Gi02069 |

Page 45

GEBA Pilot V: Benefit?

Page 46

Why Increase Taxonomic Coverage?

• Mechanisms of diversification

• Gene discovery

• Annotation, functional prediction

• Metagenomic analysis

• Species phylogeny and classification

Page 47

Value of 100 diverse genomes I: Gene discovery

• Gene families

– Will compare and contrast gene family diversity in these genomes versus random samples of previous genomes

– Will assess the rate of gene family discovery and whether, and by how much, it is diminishing

• Specific examples of novelty

– Focusing on DOE mission areas

– Do we find novel forms of hydrogenases, cellulases, C-fixation, etc.?
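The gene-family comparison can be sketched as a discovery curve: add genomes one at a time and count how many families each contributes that no earlier genome had. This is a minimal sketch assuming each genome has already been reduced to a set of gene-family identifiers (for example by clustering proteins); the genome names and family IDs are hypothetical. Computing the averaged curve once for the GEBA genomes and once for random samples of earlier genomes would give the two curves to compare.

```python
import random
from typing import Dict, List, Set


def discovery_curve(genomes: Dict[str, Set[str]], order: List[str]) -> List[int]:
    """Count the gene families each genome adds that no earlier genome in `order` had."""
    seen: Set[str] = set()
    new_counts: List[int] = []
    for name in order:
        novel = genomes[name] - seen
        new_counts.append(len(novel))
        seen |= novel
    return new_counts


def mean_discovery_curve(
    genomes: Dict[str, Set[str]], n_perm: int = 100, seed: int = 0
) -> List[float]:
    """Average discovery_curve over random genome orders, so no single ordering dominates."""
    rng = random.Random(seed)
    names = sorted(genomes)
    totals = [0.0] * len(names)
    for _ in range(n_perm):
        rng.shuffle(names)
        for i, k in enumerate(discovery_curve(genomes, names)):
            totals[i] += k
    return [t / n_perm for t in totals]


# Hypothetical gene-family memberships for three genomes.
toy = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam1", "fam2", "fam4"},
    "genome_C": {"fam1", "fam5", "fam6", "fam7"},
}
print(discovery_curve(toy, ["genome_A", "genome_B", "genome_C"]))  # [3, 1, 3]
```

A curve that stays high as genomes are added would suggest gene-family discovery is not yet diminishing.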

Page 48

Value of 100 diverse genomes II: Annotation

• Ortholog identification

– Filling in gaps will help identify orthologs between species

– Diverse GC content and amino acid composition should also improve ortholog identification

• Examination of the rate at which hypothetical proteins are converted to “known” proteins

• Non-homology functional prediction should improve greatly

– Phylogenetic profiling

– Rosetta Stone domain sharing
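Phylogenetic profiling predicts that gene families with the same pattern of presence and absence across genomes are functionally linked. A minimal sketch, assuming presence/absence has already been called for each family in each genome (the profiles below are made up): with only a handful of genomes, unrelated families often share a profile by chance, which is one reason adding diverse genomes should sharpen these predictions.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]  # 1 = family present in that genome, 0 = absent


def hamming(a: Profile, b: Profile) -> int:
    """Number of genomes in which the two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles: Dict[str, Profile], max_diff: int = 0) -> List[Tuple[str, str]]:
    """Family pairs whose profiles differ in at most `max_diff` genomes.

    Families present in every genome (or in none) carry no signal, so they are skipped.
    """
    informative = {f: p for f, p in profiles.items() if 0 < sum(p) < len(p)}
    return [
        (f1, f2)
        for f1, f2 in combinations(sorted(informative), 2)
        if hamming(informative[f1], informative[f2]) <= max_diff
    ]


# Hypothetical presence/absence across six genomes.
profiles = {
    "famA": (1, 1, 0, 1, 0, 1),
    "famB": (1, 1, 0, 1, 0, 1),
    "famC": (1, 1, 1, 1, 1, 1),
    "famD": (0, 0, 1, 0, 1, 0),
}
print(linked_pairs(profiles))  # [('famA', 'famB')]
```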

Page 49

Based on Wu et al. 2005

Page 50

Page 51

Value of 100 diverse genomes III: Metagenomics

• More diverse genomes should improve anchoring and binning of all metagenomic data sets

• Will test by running phylotyping software against genome data sets with and without GEBA genomes

– MEGAN

– AMPHORA

• Should be a good complement to reference genome sequencing
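The with-and-without test can be illustrated with a deliberately simple stand-in for a phylotyping tool. MEGAN and AMPHORA work from sequence similarity searches and marker-gene phylogenies respectively; this sketch instead assigns each read to the reference genome sharing the largest fraction of its k-mers, and the k-mer size, threshold and random toy genomes are all assumptions. It only shows the shape of the experiment: the fraction of reads that can be anchored, computed once without and once with an added genome.

```python
import random
from typing import Dict, List, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Precompute the k-mer set of each reference genome."""
    return {name: kmers(genome, k) for name, genome in references.items()}


def assign_read(read: str, index: Dict[str, Set[str]], k: int, min_shared: float = 0.5) -> Optional[str]:
    """Assign a read to the reference sharing the largest fraction of its k-mers, or None."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None
    best_name, best_frac = None, 0.0
    for name, ref_kmers in index.items():
        frac = len(read_kmers & ref_kmers) / len(read_kmers)
        if frac > best_frac:
            best_name, best_frac = name, frac
    return best_name if best_frac >= min_shared else None


def assigned_fraction(reads: List[str], references: Dict[str, str], k: int = 12) -> float:
    """Fraction of reads that can be anchored to some reference genome."""
    index = build_index(references, k)
    hits = sum(assign_read(r, index, k) is not None for r in reads)
    return hits / len(reads) if reads else 0.0


# Toy demonstration: random sequences stand in for genomes.
rng = random.Random(42)


def random_genome(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


old_refs = {"ref1": random_genome(3000), "ref2": random_genome(3000)}
new_genome = random_genome(3000)  # stands in for a newly sequenced, previously missing lineage
reads = [new_genome[i:i + 150] for i in range(0, 3000, 150)]
print(assigned_fraction(reads, old_refs))                              # close to 0
print(assigned_fraction(reads, {**old_refs, "new_ref": new_genome}))  # 1.0
```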

Page 52

[Figure: bar chart, y-axis 0 to 0.7, by group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria); legend lists the marker genes dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf]

Page 53

Value of 100 diverse genomes IV: Mechanisms of Diversification

• Lateral gene transfer

– Lateral gene transfer is fundamentally important in microbial evolution

– However, when we find “foreign” DNA in genomes, we usually cannot pinpoint the origin of that DNA

– Having more diverse genomes may help pin down the source group for each piece of foreign DNA

• Eukaryotic diversification

– Of ~200 eukaryote-specific gene families, how many now show up in bacteria and archaea?

– Are there any patterns to where they are found?
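One crude way to assign a source group to a piece of foreign DNA is to take the taxonomic group of its best database hit, and to call the result ambiguous when no group is clearly ahead. This sketch assumes hits have already been computed as (genome, group, score) triples; the scores, the margin and the genome names are hypothetical, and a phylogenetic analysis of each gene would be more reliable than best-hit scores. It illustrates why sparse sampling hurts: when the true source lineage has no genome, the best hits are distant and roughly tied.

```python
from typing import Dict, List, Optional, Tuple

Hit = Tuple[str, str, float]  # (reference genome, taxonomic group, similarity score)


def infer_source_group(hits: List[Hit], min_margin: float = 5.0) -> Optional[str]:
    """Return the group of the best hit if it beats every other group by min_margin, else None."""
    best_by_group: Dict[str, float] = {}
    for _genome, group, score in hits:
        best_by_group[group] = max(score, best_by_group.get(group, float("-inf")))
    if not best_by_group:
        return None
    ranked = sorted(best_by_group.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]
    (top_group, top_score), (_, second_score) = ranked[0], ranked[1]
    return top_group if top_score - second_score >= min_margin else None


# Hypothetical hits for one "foreign" gene before and after adding a genome from a new lineage.
sparse = [("genome_1", "Gammaproteobacteria", 42.0), ("genome_2", "Firmicutes", 40.0)]
expanded = sparse + [("new_genome", "Chloroflexi", 88.0)]
print(infer_source_group(sparse))    # None: no group is clearly closest
print(infer_source_group(expanded))  # Chloroflexi
```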

Page 54

CRISPR - expanding the possible

34 out of 56 genomes contain CRISPRs; 1-13 arrays (loci) per genome

Haliangium ochraceum SMP-2, DSM 14365: 807 repeats in total

a single array contains 382 repeats

Verminephrobacter eiseniae: 249 repeats
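A CRISPR array is a run of short, nearly identical direct repeats (typically about 20 to 50 bp) separated by unique spacers of similar length, so counting arrays and repeats per genome comes down to finding such runs. This sketch finds exact repeats of a fixed, assumed length separated by spacer-sized gaps; the repeat length and spacer limits are illustrative defaults, not values from the talk. Dedicated tools such as CRT or PILER-CR also tolerate mismatches and discover the repeat length themselves.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CrisprArray:
    repeat: str
    start: int  # 0-based position of the first repeat copy
    count: int  # number of repeat copies in the array


def find_crispr_arrays(
    seq: str,
    repeat_len: int = 30,
    min_spacer: int = 20,
    max_spacer: int = 80,
    min_repeats: int = 3,
) -> List[CrisprArray]:
    """Find exact repeats of length repeat_len separated by spacer-sized gaps."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - repeat_len + 1):
        positions[seq[i:i + repeat_len]].append(i)

    arrays: List[CrisprArray] = []
    for repeat, pos in positions.items():
        if len(pos) < min_repeats:
            continue
        run = [pos[0]]
        for p in pos[1:]:
            spacer = p - run[-1] - repeat_len
            if min_spacer <= spacer <= max_spacer:
                run.append(p)  # next copy of the repeat in the same array
            else:
                if len(run) >= min_repeats:
                    arrays.append(CrisprArray(repeat, run[0], len(run)))
                run = [p]  # start a new candidate array
        if len(run) >= min_repeats:
            arrays.append(CrisprArray(repeat, run[0], len(run)))
    return sorted(arrays, key=lambda a: a.start)


# Synthetic sequence: random flanks around 5 copies of one repeat with unique spacers.
rng = random.Random(7)


def random_dna(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


REPEAT = random_dna(30)
parts = [random_dna(200)]
for _ in range(5):
    parts += [REPEAT, random_dna(35)]
parts.append(random_dna(200))
for array in find_crispr_arrays("".join(parts)):
    print(array)
```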

Page 55

Value of 100 diverse genomes V: Phylogeny

Page 56

16S rRNA Says Hyphomonas Is in Rhodobacterales

Badger et al. 2005

Page 57

WGT Says It’s Related to Caulobacterales

Badger et al. 2005

Page 58

Tree of Life Example II

Page 59

Page 60

GEBA - What’s Next

• Repeat and/or scale up

• Need to determine the value of finished versus unfinished genomes

• Apply this method to other groups

– Microbial eukaryotes

– Viruses

• Really fill in bacterial and archaeal tree

Page 61

The slopes of the linear regression lines represent the phylogenetic diversity (PD) contribution of the genomes (each window contains 50 genomes).
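The windowed regression can be sketched as follows, assuming PD has already been computed for the first 1, 2, 3, ... genomes in some order (for example as a Faith-style sum of branch lengths over a reference tree). Each window of 50 consecutive genomes gets a least-squares slope of cumulative PD against genome count, i.e., the average PD added per genome in that window. Sliding the window one genome at a time, and the toy PD values, are assumptions.

```python
from typing import List, Sequence


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def window_slopes(cumulative_pd: Sequence[float], window: int = 50) -> List[float]:
    """Slope of cumulative PD vs. genome count in each sliding window of `window` genomes.

    cumulative_pd[i] is the total PD of the first i + 1 genomes in the order being evaluated.
    """
    slopes = []
    for start in range(len(cumulative_pd) - window + 1):
        xs = list(range(start, start + window))
        ys = list(cumulative_pd[start:start + window])
        slopes.append(ols_slope(xs, ys))
    return slopes


# Hypothetical ordering: the first 60 genomes each add 1.0 PD, later ones only 0.1.
cumulative, total = [], 0.0
for i in range(120):
    total += 1.0 if i < 60 else 0.1
    cumulative.append(total)
slopes = window_slopes(cumulative)
print(round(slopes[0], 3), round(slopes[-1], 3))  # 1.0 0.1
```

A falling slope along the window position indicates that later genomes add less new branch length than earlier ones.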

Page 62

[Figure: slope (50-genome windows) on the y-axis vs. window position on the x-axis]

Page 63

[Figure, Greengenes ssrRNA: slope (50-genome windows) on the y-axis vs. window position on the x-axis]

Page 64

GEBA: Long Run

• Need active community input

• Involvement of multiple funding agencies, labs, and genome centers

• Integration of and communication among all large-scale projects

• Follow recommendations of NAS, ASM, and AAM reports

• Adopt a Microbe: link to educational initiatives

Page 65

[Figure: tree of bacterial phyla: Acidobacteria, Bacteroides, Fibrobacteres, Gemmatimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa are even more poorly sampled

[Legend: well sampled phyla; poorly sampled; no cultured taxa]

Page 66

Uncultured Lineages: Technical Approaches

• Get into culture

• Enrichment cultures

• If abundant in low diversity ecosystems

• Flow sorting

• Microbeads

• Microfluidic sorting

• Single cell amplification

Page 67

Page 68

Page 69

Page 70

A Happy Tree of Life


Page 71

[Figure: tree of bacterial phyla: Acidobacteria, Bacteroides, Fibrobacteres, Gemmatimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa are even more poorly sampled

[Legend: well sampled phyla; poorly sampled; no cultured taxa]

Page 72

Taxonomic Bias in Cultures Too

[Figure: two pie charts. Culture collection (ACM), 3760 bacterial cultures: Proteobacteria 54%, Actinobacteria 18%, Firmicutes 14%, Bacteroidetes 6%, other phyla 3%. Sequenced genomes, 71 bacterial genomes: Proteobacteria 45%, Firmicutes 24%, Actinobacteria 7%, Bacteroidetes 1%, other phyla 20%.]

Slide by Hugenholtz

Page 73

GEBA Pilot Project at JGI

• Select 200 organisms using tree/taxonomy as guide

• Collaborate with culture collections to obtain DNA

• Sequence to closure the 100 for which DNA QC is good

• Sequencing by Sanger-454 hybrid approach

• Data and annotation released after shotgun and after closure

• Assess tree-based sequencing by doing reconstructions with different selection criteria
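One way to “use the tree as a guide” is to pick organisms that add the most new branch length (phylogenetic diversity) to what is already covered. The greedy sketch below is an illustration under that assumption, not the documented GEBA selection procedure; the tree, branch lengths and node names are hypothetical. Comparing this kind of selection with other criteria, such as random picks, is one way to run the reconstructions mentioned in the last bullet.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# node -> (parent, length of the branch to that parent); the root's parent is None.
Tree = Dict[str, Tuple[Optional[str], float]]


def branches_to_root(tree: Tree, leaf: str) -> Set[str]:
    """Nodes whose branch lies on the path from `leaf` up to the root."""
    nodes: Set[str] = set()
    node: Optional[str] = leaf
    while node is not None:
        nodes.add(node)
        node = tree[node][0]
    return nodes


def phylogenetic_diversity(tree: Tree, leaves: Iterable[str]) -> float:
    """Faith-style PD: total length of the branches connecting `leaves` to the root."""
    covered: Set[str] = set()
    for leaf in leaves:
        covered |= branches_to_root(tree, leaf)
    return sum(tree[n][1] for n in covered)


def greedy_pd_selection(tree: Tree, candidates: List[str], k: int) -> List[str]:
    """Repeatedly pick the candidate that adds the most not-yet-covered branch length."""
    chosen: List[str] = []
    covered: Set[str] = set()
    remaining = list(candidates)
    while remaining and len(chosen) < k:
        def gain(leaf: str) -> float:
            return sum(tree[n][1] for n in branches_to_root(tree, leaf) - covered)
        best = max(remaining, key=gain)
        chosen.append(best)
        covered |= branches_to_root(tree, best)
        remaining.remove(best)
    return chosen


# Hypothetical tree: X and Y are internal nodes; a, b, c, d are candidate organisms.
tree: Tree = {
    "root": (None, 0.0),
    "X": ("root", 1.0), "Y": ("root", 5.0),
    "a": ("X", 1.0), "b": ("X", 1.2), "c": ("Y", 0.5), "d": ("root", 3.0),
}
picked = greedy_pd_selection(tree, ["a", "b", "c", "d"], k=2)
print(picked, phylogenetic_diversity(tree, picked))  # ['c', 'd'] 8.5
```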

Page 74

Page 75

Selecting Organisms Step 2:

• Selecting representatives from each lineage without a genome

• Preference given within groups for organisms of DOE mission relevance

• Focused on type strains of cultured species

• Selecting those for which we can get DNA
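The selection rules on this slide can be written as a simple filter. In this sketch the type-strain and DNA-availability criteria are treated as hard requirements and DOE mission relevance as a tie-breaker within a lineage; the field names and organisms are hypothetical, and the real selection also involved judgment calls that a filter does not capture.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Candidate:
    name: str
    lineage: str
    type_strain: bool
    doe_relevant: bool
    dna_available: bool


def pick_representatives(candidates: List[Candidate], sequenced_lineages: Set[str]) -> Dict[str, str]:
    """One candidate per lineage that still lacks a genome, preferring DOE-relevant organisms."""
    best: Dict[str, Candidate] = {}
    for c in candidates:
        # Skip lineages already covered, non-type strains, and organisms without obtainable DNA.
        if c.lineage in sequenced_lineages or not c.type_strain or not c.dna_available:
            continue
        current = best.get(c.lineage)
        if current is None or (c.doe_relevant and not current.doe_relevant):
            best[c.lineage] = c
    return {lineage: c.name for lineage, c in best.items()}


# Hypothetical candidates (placeholders, not real GEBA selections).
pool = [
    Candidate("org1", "lineage_A", True, False, True),   # lineage_A already has a genome
    Candidate("org2", "lineage_B", True, False, True),
    Candidate("org3", "lineage_B", True, True, True),    # DOE-relevant, preferred over org2
    Candidate("org4", "lineage_C", True, True, False),   # no DNA available
    Candidate("org5", "lineage_D", False, True, True),   # not a type strain
]
print(pick_representatives(pool, sequenced_lineages={"lineage_A"}))  # {'lineage_B': 'org3'}
```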

Page 76

[Figure: bar chart (y-axis 0 to 40) by group, legend "in JGI" and "pending": Actinobacteria, Firmicutes, Proteobacteria, Bacteroidetes, Methanomicrobia, Thermoprotei, Halobacteria, Fusobacteria, Thermi, Aminanaerobia, Planctomycetes, Spirochaetes, Chloroflexi, Deferribacteres, Aquificae, Haloanaerobiales, Archaeoglobi, Thermovenabulae, Acidobacteria, Methanobacteria, Thermococci, Thermodesulf..., Thermodesulfobia]

GEBA Pilot Project 4th month project status: 79/169 DNAs delivered

Page 77

Sargasso Phylotypes

[Figure: weighted % of clones (y-axis, 0 to 0.5) by major phylogenetic group (x-axis: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota) for the markers EFG, EFTu, HSP70, RecA, RpoB and rRNA]

Other Markers Give Similar Phylotypes

Venter et al., 2004

Page 78

[Figure: tree of bacterial phyla: Acidobacteria, Bacteroides, Fibrobacteres, Gemmatimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Solution I: sequence more phyla

• Eisen-Ward NSF Tree of Life Project

• A genome from each of eight phyla

Based on Hugenholtz, 2002

Page 79

[Figure: tree of bacterial phyla: Acidobacteria, Bacteroides, Fibrobacteres, Gemmatimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Solution II: Fill in Phyla

• JGI - Genomic Encyclopedia of Bacteria and Archaea

Based on Hugenholtz, 2002


Page 80

[Figure: tree of bacterial phyla: Acidobacteria, Bacteroides, Fibrobacteres, Gemmatimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Solution III: Sequence Uncultured

Page 81

The Tree of Life is Still Angry


Page 82

Circular Maps

Page 83

DNA Repair Genes in D. radiodurans Complete Genome

Process: Genes in D. radiodurans
Nucleotide excision repair: UvrABCD, UvrA2
Base excision repair: AlkA, Ung, Ung2, GT, MutM, MutY-Nths, MPG
AP endonuclease: Xth
Mismatch excision repair: MutS, MutL
Recombination initiation: RecFJNRQ, SbcCD, RecD
Recombinase: RecA
Migration and resolution: RuvABC, RecG
Replication: PolA, PolC, PolX, phage Pol
Ligation: DnlJ
dNTP pools, cleanup: MutTs, RRase
Other: LexA, RadA, HepA, UVDE, MutS2

Page 84

Problem:

The list of DNA repair gene homologs in the D. radiodurans genome is not significantly different from those of other bacterial genomes of similar size

Page 85

[Figure: tree of bacterial phyla: Acidobacteria, Bacteroides, Fibrobacteres, Gemmatimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

Tree based on Hugenholtz (2002) with some modifications.

~40 Phyla of Bacteria

Page 86

[Figure: tree of bacterial phyla: Acidobacteria, Bacteroides, Fibrobacteres, Gemmatimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

Tree based on Hugenholtz (2002) with some modifications.

Most DNA metabolism studies are in two phyla

Page 87

[Figure: tree of bacterial phyla: Acidobacteria, Bacteroides, Fibrobacteres, Gemmatimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

Tree based on Hugenholtz (2002) with some modifications.

Deinococcus is very distant from well studied groups

Page 88

[Figure: species tree of Ecoli, Haein, Neigo, Helpy, Bacsu, Strpy, Mycge, Mycpn, Borbu, Trepa, Synsp, Metjn, Arcfu, Metth, Human and Yeast, grouped into Bacteria, Archaea and Eukaryotes, with DNA repair gene gains (+) and losses (-), plus other events marked d and t, mapped onto its branches; one set of genes is labeled "from mitochondria"]

Gain and Loss of Repair Genes

Eisen and Hanawalt, 1999 Mut Res 435: 171-213
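The figure maps repair-gene gains and losses onto a species tree. One standard way to infer such events from presence/absence data is Fitch parsimony, sketched below for a single gene. This is not necessarily the procedure Eisen and Hanawalt used: the topology and presence pattern are hypothetical, ties are broken toward “absent”, and every change is treated as a gain or loss, with no separate handling of lateral transfer or duplication.

```python
from typing import Dict, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name, or a (left, right) pair
Clade = Tuple[str, ...]
Event = Tuple[str, Clade]  # ("gain" or "loss", clade on whose stem branch it happened)


def leaves(t: Tree) -> List[str]:
    return [t] if isinstance(t, str) else leaves(t[0]) + leaves(t[1])


def clade(t: Tree) -> Clade:
    """Sorted leaf names below a node; used as the node's key."""
    return tuple(sorted(leaves(t)))


def fitch_up(t: Tree, present: Dict[str, int], sets: Dict[Clade, Set[int]]) -> int:
    """Bottom-up Fitch pass for one 0/1 character; returns the minimum number of changes."""
    if isinstance(t, str):
        sets[clade(t)] = {present[t]}
        return 0
    cost = fitch_up(t[0], present, sets) + fitch_up(t[1], present, sets)
    left, right = sets[clade(t[0])], sets[clade(t[1])]
    if left & right:
        sets[clade(t)] = left & right
    else:
        sets[clade(t)] = left | right
        cost += 1  # children disagree: one change is needed below this node
    return cost


def fitch_down(t: Tree, parent_state: int, sets: Dict[Clade, Set[int]], events: List[Event]) -> None:
    """Top-down pass: keep the parent's state when allowed, and record each change."""
    options = sets[clade(t)]
    state = parent_state if parent_state in options else min(options)
    if state != parent_state:
        events.append(("gain" if state == 1 else "loss", clade(t)))
    if not isinstance(t, str):
        fitch_down(t[0], state, sets, events)
        fitch_down(t[1], state, sets, events)


def gains_and_losses(tree: Tree, present: Dict[str, int]) -> Tuple[int, List[Event]]:
    sets: Dict[Clade, Set[int]] = {}
    cost = fitch_up(tree, present, sets)
    root_state = min(sets[clade(tree)])  # ties at the root are broken toward "absent"
    events: List[Event] = []
    fitch_down(tree, root_state, sets, events)
    return cost, events


# Hypothetical topology and presence pattern for one repair gene (1 = present).
species_tree = ((("Ecoli", "Haein"), "Bacsu"), ("Metjn", "Human"))
present = {"Ecoli": 1, "Haein": 1, "Bacsu": 0, "Metjn": 0, "Human": 0}
print(gains_and_losses(species_tree, present))  # (1, [('gain', ('Ecoli', 'Haein'))])
```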

Page 89

Solution - Experiments


Page 90

[Figure: tree of bacterial phyla: Acidobacteria, Bacteroides, Fibrobacteres, Gemmatimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

Tree based on Hugenholtz (2002) with some modifications.

Need experimental studies from across the tree too

Page 91

Page 92

MICROBES

Page 93

A Happy Tree of Life


Page 94

TIGR

Other people

Mom and Dad

H. Ochman, F. Robb

J. Battista

E. Orias

D. Bryant, S. O’Neill

M. Eisen

N. Moran

R. Myers

C. M. Cavanaugh

P. Hanawalt

J. Heidelberg, N. Ward

J. Venter

C. Fraser

S. Salzberg

I. Paulsen

$$$

NSF, DOE

NIH

M. Wu

D. Wu

S. Chatterji

H. Huse

A. Hartman

Moore

JCVI

D. Rusch

A. Halpern

Eisen Group/Davis

J. Morgan

JGI

E. Eisenstadt

M. Frazier

T. Woyke

E. Rubin

Page 95

CRISPR - expanding the possible

34 out of 56 genomes contain CRISPRs; 1-13 arrays (loci) per genome

Haliangium ochraceum SMP-2, DSM 14365: 807 repeats in total

a single array contains 382 repeats

Verminephrobacter eiseniae: 249 repeats

Page 96

GEBA

Annotation and data status summary

GBP - BDMTC

Page 97

Annotation - Current status

• 56 draft genomes in IMG-GEBA site (free access)

• 61 in IMG-ER (password protected)

• 19 complete genomes (none in IMG)

Page 98

Page 99

Page 100

24.6 Mb of sequence

230,596 genes

227,562 proteins

155,641 with function (67.4%)

16,435 fused genes (6.3%)

Page 101

Page 102

• GC%: 74% Actinosynnema mirum 101, DSM 43827; 26% Streptobacillus moniliformis DSM 12112

• Size: 13.4 Mb Ktedonobacter racemifer SOSP1-21, DSM 44963; 1.5 Mb Streptobacillus moniliformis DSM 12112

• Scaffolds: 407 Ktedonobacter racemifer SOSP1-21, DSM 44963; 1 Atopobium parvulum

• Genes: 13,445 Ktedonobacter racemifer SOSP1-21, DSM 44963; 1,433 Cryptobacterium curtum DSM 15641

• With functions: 78.6% Thermanaerovibrio acidaminovorans; 50.6% Planctomyces limnophilus DSM 3776

Page 103

• COG%: 79% Thermanaerovibrio acidaminovorans; 49% Planctomyces limnophilus

• SignalP: 38% Ferrimonas balearica; 14% Methanohalophilus mahii

• Transmembrane: 31% Eggerthella lenta; 17% Haliangium ochraceum

• Fused genes: 9% Desulfomicrobium baculatum; 3.7% Planctomyces limnophilus

• 16S rRNA genes: 12 Desulfotomaculum acetoxidans

Page 104

Page 105

Progress Report: GEBA Status 5-12-08

[Figure: pie chart of project status: Awaiting Material 26%, Library 9%, Production 11%, Post Draft 51%, Closed 3%]

Page 106

GEBA Pilot Target List 5-12-08

[Bar chart: # of Genomes (y-axis, 0 to 35) per phylum (x-axis). B = Bacteria: Actinobacteria (High GC), Aminanaerobia, Aquificae, Bacteroidetes, Chloroflexi, Deferribacteres, Deinococci, Delta Proteobacteria, Epsilon Proteobacteria, Firmicutes, Fusobacteria, Gamma Proteobacteria, Gemmatimonadetes, Haloanaerobiales, Planctomycetes, Spirochaetes, Thermodesulfobacteria, Thermodesulfobia, Thermovenabulae. A = Archaea: Halobacteria, Archaeoglobi, Methanobacteria, Methanomicrobia, Thermococci, Thermoprotei.]

Page 107

GEBA Pilot Status 5-12-08

[Bar chart: # of Genomes (y-axis, 0 to 35) per phylum (x-axis, same phyla as the GEBA Pilot Target List chart), broken down by status: Closed, Post Draft, Production, Library, Awaiting Material.]

Page 108

Non Active Projects

[Bar chart: # of Genomes (y-axis, 0 to 16) per phylum (x-axis, same phyla as the GEBA Pilot Target List chart), broken down by status: Abandoned, On Hold.]

Page 109

GEBA Pilot Data Release

[Bar chart: # of Genomes (y-axis, 0 to 30) per phylum (x-axis, same phyla as the GEBA Pilot Target List chart).]

Page 111

GEBA Paper Plans

• Methods for large-scale microbial genome sequencing
– Sequencing, closure methods
– DNA sources (i.e., culture collections)
– Outreach and educational issues
• How valuable is phylogenetic gap-based sequencing?
– For annotation
– For metagenomics
– For gene discovery
• How deep to go in phylogenetic gap filling?
– Breadth between phyla versus filling in phyla