ANL Soil Metagenomics 2014: soil reference database - let's do this

Post on 03-Jul-2015

Category:

Science

DESCRIPTION

Talk at 2014 ANL Soil Metagenomics Meeting

TRANSCRIPT

Is it time for a (community) effort towards a soil reference database?

Erick Cardenas, James Cole, Maude David, Aaron Garoutte, Adina Howe, Janet Jansson, Dave Myrold, James Tiedje, and you?

Modified version of slides will be available after presentation: http://www.slideshare.net/adinachuanghowe

The most important hands in soil microbiology

Significance of a soil-specific reference

• Need standardized resource to connect sequencing data at different levels

• Integrate sequencing data towards soil health and productivity

• Broadly enable “connecting the dots”

Genes

Organisms

Communities

Ecosystems

Soil metagenomic challenges

• The amount we know…

• Incredible microbial diversity

• Spatial heterogeneity

• Complex dynamics

• Lack of reference genomes (bacterial, archaeal, fungal)

HUMAN MICROBIOME PROJECT

Lessons from HMP

• 2009 Goals:

– Take advantage of high-throughput technologies to characterize the human microbiome across a large number of samples

– Determine whether there are associations between changes in the microbiome and health/disease

– Provide a standardized data resource and new technological approaches to enable such studies to be undertaken broadly in the scientific community

HMP metagenomic challenges

Soil

• Incredible microbial diversity

• Spatial heterogeneity

• Complex dynamics

• Lack of reference genomes (bacterial, archaeal, fungal)

HMP

• Microbial diversity

• Individual variation

• Complex host-associated dynamics

• Lack of reference genomes?

The HMP reference genome effort

• Add at least 900-3000 additional reference bacterial genome sequences to public databases

• Thorough representation of domains and major body sites

Not only sequencing… but access to data

Currently, over 1000 bacterial genomes at various stages of sequencing

Tools: Opening doors broadly

MetaPhlAn, Nature Methods 9, 811-814 (2012)

Nature Reviews Genetics 15, 577-584 (2014)

Vital et al., mBio, Vol. 5, 2014

Another example: GEBA

Comparison of:

• rRNA tree of life

• Genome sequences in the DSMZ culture collection

Are there any general benefits that come from this "phylogeny driven" approach?

Impact of “targeted” sequencing of improved references

Higher rate of discovery and characterization of new gene families

New ways to link distantly related homologs that would otherwise go undetected

Significant phylogenetic expansions of known protein families

Enrichment of genetic diversity

Can a similar strategy benefit soil studies?

What could we use it for?

• Target isolation and sequencing efforts; creation of a “most wanted” list

• Soil-specific framework for larger-scale sequencing and proteomic efforts to identify taxonomic and functional information

• Genome-centric investigation of soil genomes (e.g., distribution of shared genes among soil phyla); development of improved biomarkers for high throughput assays

• Providing data to tool developers to make bioinformatics/visualization easier for soil-specific studies

What are the challenges?

• How do we define a soil organism?

– Origin from soil?

– 16S rRNA gene sequence matches one from soil?

– What level of finishing is adequate?

• What is the most critical/practical metadata?

– Soil location

– Soil taxonomy

– Links to RefSeq IDs

– Is the strain available and where?
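One way to make the metadata question concrete is to sketch a record type holding the four candidate fields listed above. This is only an illustration, not a proposed standard: the class name `SoilGenomeRecord`, its field names, and the `missing_metadata` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SoilGenomeRecord:
    """Hypothetical metadata record for one reference genome (illustrative only)."""
    organism: str
    soil_location: Optional[str] = None   # where the isolate's soil came from
    soil_taxonomy: Optional[str] = None   # soil classification of that site
    refseq_ids: List[str] = field(default_factory=list)  # links to RefSeq entries
    strain_source: Optional[str] = None   # where the strain is available, if anywhere

    def missing_metadata(self) -> List[str]:
        """Return the names of metadata fields that are still empty."""
        candidates = {
            "soil_location": self.soil_location,
            "soil_taxonomy": self.soil_taxonomy,
            "refseq_ids": self.refseq_ids,
            "strain_source": self.strain_source,
        }
        return [name for name, value in candidates.items() if not value]


# Placeholder record: only the organism name is known so far.
record = SoilGenomeRecord(organism="placeholder organism")
print(record.missing_metadata())
```

A completeness check like this could also feed a tiered curation scheme, but which fields should be mandatory is exactly the open question.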

• Who to include?

– Fungi! Archaea!

• Expert curators?

– You?

– Tiered hierarchy of curation level

Some initial efforts

RefSoil (2011): Erick Cardenas, Aaron Garoutte, Adina Howe, Jim Tiedje

Bacterial genomes were retrieved from the GOLD database , and , and those associated with soil habitats were selected

Manually curated to exclude obligate human pathogens and extremophiles
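The two-step selection described here (keep soil-associated genomes, then exclude obligate human pathogens and extremophiles) can be sketched as a pair of filters. This is a minimal sketch, not the actual RefSoil procedure: the `GenomeEntry` type and its fields are hypothetical and do not reflect the real GOLD schema, and the boolean flags stand in for decisions a curator made by hand.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GenomeEntry:
    """Hypothetical genome record; not the actual GOLD schema."""
    name: str
    habitats: List[str]
    obligate_human_pathogen: bool = False  # set by a curator (assumed flag)
    extremophile: bool = False             # set by a curator (assumed flag)


def select_soil_candidates(entries: Iterable[GenomeEntry]) -> List[GenomeEntry]:
    """Keep entries with at least one habitat annotation mentioning soil."""
    return [e for e in entries if any("soil" in h.lower() for h in e.habitats)]


def apply_curation(entries: Iterable[GenomeEntry]) -> List[GenomeEntry]:
    """Drop entries flagged as obligate human pathogens or extremophiles."""
    return [e for e in entries
            if not (e.obligate_human_pathogen or e.extremophile)]


# Placeholder entries, for illustration only.
entries = [
    GenomeEntry("genome_A", ["Soil", "Rhizosphere"]),
    GenomeEntry("genome_B", ["Marine"]),
    GenomeEntry("genome_C", ["Soil"], obligate_human_pathogen=True),
    GenomeEntry("genome_D", ["Hot spring", "soil"], extremophile=True),
]
curated = apply_curation(select_soil_candidates(entries))
print([e.name for e in curated])
```

Keyword matching on habitat text is a crude proxy, which is one reason the manual curation step matters.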

Databases can be biased and redundant

Proteobacteria, 267

Firmicutes, 92

Actinobacteria, 75

Bacteroidetes, 12

Cyanobacteria, 7

Tenericutes, 5

Acidobacteria, 5

Other, 29

492 organisms, 19 phyla
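The bias is easier to see as proportions. The short sketch below takes the counts listed above and computes each phylum's share of the 492 genomes; the function name `phylum_shares` is just an illustrative helper. Proteobacteria alone account for a little over half of the set, while Acidobacteria make up about 1%.

```python
# Genome counts per phylum, copied from the list above.
phylum_counts = {
    "Proteobacteria": 267,
    "Firmicutes": 92,
    "Actinobacteria": 75,
    "Bacteroidetes": 12,
    "Cyanobacteria": 7,
    "Tenericutes": 5,
    "Acidobacteria": 5,
    "Other": 29,
}


def phylum_shares(counts):
    """Return each phylum's fraction of the total genome count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(phylum_counts.values())
shares = phylum_shares(phylum_counts)

print(f"Total genomes: {total}")
# Print phyla from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {share:6.1%}")
```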

NCBI Reference Genomes described as originating from soil

Proteobacteria

Actinobacteria

Firmicutes

Bacteroidetes

Cyanobacteria

Acidobacteria

Protein Models for Functions: FOAM Database

Nucl. Acids Res. (2014) doi: 10.1093/nar/gku702

Some Motivation

60 terrestrial NEON sites distributed across 20 ecoclimatic domains

Terrestrial scale streaming of lots of data including sequencing data for each site

If you’d like to contribute

• Join the breakout session Thursday evening (6-7 pm)

• Know someone with genomes or a database? Let us know. Want to contribute? Have an opinion? Have funding?

Adina Howe, adina.howe@gmail.com
