spin workshop microbial genomics @nist

Nate Olson Genome Scale Measurements Biosystems and Biomaterials Division Microbial Genomics @ NIST


Post on 16-Jul-2015


Nate Olson
Genome Scale Measurements
Biosystems and Biomaterials Division

Microbial Genomics @ NIST

Talk Overview

● Example Microbial and Genomic Programs
  ○ ERCC spike-in controls
  ○ Genome In a Bottle
  ○ Biothreat Detection

● Three Microbial Genomics Projects
  ○ Genomic Purity
  ○ SNP Method Evaluation
  ○ Genomic Reference Materials

Disclaimer

Opinions expressed in this paper are the authors’ and do not necessarily reflect the policies and views of DHS, NIST, or affiliated venues. Certain commercial equipment, instruments, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendations or endorsement by NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.

Microbial Genomics @ NIST

External RNA Control Consortium

Advancing Genomic and Biothreat Detection Metrology

Biothreat Detection

Genome In A Bottle Consortium

External RNA Control Consortium

Is the apparent difference between samples biological or an artifact?

External RNA Control Consortium

Spike in controls to assess technical performance

External RNA Control Consortium

https://github.com/usnistgov/erccdashboard

ERCC Dashboard facilitates RM use

Is this genetic mutation real or an artifact?

Genome In A Bottle Consortium

http://genomeinabottle.org/

Genome In A Bottle

RM to challenge measurement process

http://www.bioplanet.com/gcat

Genome In A Bottle

High confidence variants used for algorithm benchmarking

Biothreat Detection

Is this suspicious powder a biothreat agent, or a false alarm?

Biothreat Detection

Surrogate material to support first responder training

Biothreat Detection

Engineered yeast as surrogate for biothreat agents

Address measurement challenges with reference materials and documentary standards.

Summary

Microbial Genomics

Microbial Sample Characterization
● Genomic Purity - DHS
● Evaluating SNP calling methods - DHS
● Microbial Genomic RMs - FDA

Microbial Genomics Purity

Challenge: Identify low levels of contaminants without knowing their identity.

Genomic Purity

Approach: Taxonomic read classification paired with NGS

Experimental Design

Genomic Purity

● Seven Organisms
  ○ Bacillus anthracis
  ○ Escherichia coli O157:H7
  ○ Francisella tularensis
  ○ Pseudomonas aeruginosa
  ○ Salmonella enterica
  ○ Staphylococcus aureus
  ○ Yersinia pestis

● Simulated Datasets
  ○ Illumina error profile
  ○ 250 bp paired-end reads
  ○ 20X coverage

● 336 spiked datasets
  ○ Pairwise combinations
  ○ Contaminant concentrations from 5% to 2.5 x 10^-4 % of cells

● Pathoscope used for read classification (http://sourceforge.net/projects/pathoscope/)
  ○ Database - GenBank bacterial genomes
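The dataset count on this slide is consistent with simple arithmetic: seven organisms give 42 ordered sample/contaminant pairs, and 336 / 42 = 8, which suggests eight contaminant levels per pair. The sketch below reproduces that count. Treating the pairs as ordered, and the figure of eight levels per pair, are inferences from the numbers on the slide rather than details stated in the talk.

```python
from itertools import permutations

# Organisms listed on the slide.
ORGANISMS = [
    "Bacillus anthracis",
    "Escherichia coli O157:H7",
    "Francisella tularensis",
    "Pseudomonas aeruginosa",
    "Salmonella enterica",
    "Staphylococcus aureus",
    "Yersinia pestis",
]

TOTAL_DATASETS = 336  # figure quoted on the slide


def sample_contaminant_pairs(organisms):
    """Return every ordered (sample, contaminant) pair of distinct organisms."""
    return list(permutations(organisms, 2))


pairs = sample_contaminant_pairs(ORGANISMS)

# Assumption: each ordered pair was simulated at the same number of
# contaminant levels, so the total divides evenly by the pair count.
levels_per_pair, remainder = divmod(TOTAL_DATASETS, len(pairs))

print(f"ordered pairs: {len(pairs)}")
print(f"implied contaminant levels per pair: {levels_per_pair} (remainder {remainder})")
```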

In-Silico Experiment


Genomic Purity

Only Contaminant and Sample Genus Identified

Detected contaminants down to 5.0 x 10^-4 % of cells

● Sensitivity dependent on
  ○ Relative size of the sample and contaminant genomes
  ○ Genetic similarity of sample and contaminant to other organisms in the database
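One way to see why relative genome size affects sensitivity: if reads are sampled roughly in proportion to the DNA present, a contaminant with a small genome contributes fewer reads per cell than one with a large genome. The sketch below works through that arithmetic. It assumes uniform read sampling and one genome copy per cell, and the genome sizes and read count are round placeholder values chosen for illustration, not figures from the study.

```python
def expected_read_fraction(cell_fraction, contaminant_genome_bp, sample_genome_bp):
    """Expected fraction of reads that come from the contaminant.

    Assumes reads are drawn in proportion to total DNA, with one genome
    copy per cell and no extraction or sequencing bias.
    """
    if not 0.0 <= cell_fraction <= 1.0:
        raise ValueError("cell_fraction must be between 0 and 1")
    contaminant_dna = cell_fraction * contaminant_genome_bp
    sample_dna = (1.0 - cell_fraction) * sample_genome_bp
    return contaminant_dna / (contaminant_dna + sample_dna)


# Hypothetical genome sizes (round numbers, not measured values).
SMALL_GENOME_BP = 2_000_000
LARGE_GENOME_BP = 6_000_000

# 5.0 x 10^-4 % of cells, expressed as a fraction.
cell_fraction = 5.0e-4 / 100

TOTAL_READS = 1_000_000  # hypothetical sequencing depth

for label, contaminant_bp, sample_bp in [
    ("small contaminant in large sample", SMALL_GENOME_BP, LARGE_GENOME_BP),
    ("large contaminant in small sample", LARGE_GENOME_BP, SMALL_GENOME_BP),
]:
    fraction = expected_read_fraction(cell_fraction, contaminant_bp, sample_bp)
    print(f"{label}: about {fraction * TOTAL_READS:.2f} contaminant reads expected")
```

At the same cell concentration, the large-genome contaminant yields several times more reads than the small-genome one, so it is easier to detect.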

Genomic Purity

Able to Detect Contaminants at less than 1% Cell Concentrations for Most Pairwise Comparisons

Genomic Purity

Conclusions

● Next-generation sequencing in conjunction with read classification algorithms can be used to assess sample purity

● Achieved
  ○ Genus-level classification specificity
  ○ Sensitivity ranging from 5% to 2.5 x 10^-4 %

● Future work includes further validation of the method using real mixtures

Genomic Purity

SNP Method Evaluation

Challenge: Defining confidence in sample identification

SNP Method Evaluation

Approach: Whole genome (SNP) sample identification

SNP Method Evaluation Requirements

1. Reference with known truth
  ○ Genomic DNA
  ○ Data
    ■ real vs. simulated
2. Performance metrics
3. Replicates for assessing uncertainty
  ○ multiple sequencing runs
  ○ bootstrap replicates
  ○ multiple reference genomes
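As an illustration of the bootstrap option in the third requirement, the sketch below resamples per-position outcomes with replacement and reports a percentile interval for the fraction of correct calls. The outcome list is hypothetical, and the percentile method is one common choice, not necessarily the procedure used in this work.

```python
import random


def bootstrap_interval(outcomes, n_replicates=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 outcomes.

    outcomes: list of 1 (call agrees with the truth) or 0 (it does not).
    Returns (lower, upper) bounds for the fraction of correct calls.
    """
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(outcomes)
    estimates = []
    for _ in range(n_replicates):
        resample = rng.choices(outcomes, k=n)  # sample with replacement
        estimates.append(sum(resample) / n)
    estimates.sort()
    lower_index = int((alpha / 2) * n_replicates)
    upper_index = min(n_replicates - 1, int((1 - alpha / 2) * n_replicates))
    return estimates[lower_index], estimates[upper_index]


# Hypothetical data: 95 correct calls and 5 incorrect calls.
example_outcomes = [1] * 95 + [0] * 5
low, high = bootstrap_interval(example_outcomes)
print(f"fraction correct: 0.95, approximate 95% interval: [{low:.2f}, {high:.2f}]")
```

Multiple sequencing runs and multiple reference genomes capture sources of variation that resampling a single dataset cannot, so the bootstrap complements rather than replaces them.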

SNP Method Evaluation

Truth Tables can be static or dynamic

SNP Method Evaluation

Truth Table Values Used to Calculate Performance Metrics
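A minimal sketch of how truth table values feed performance metrics follows. The slide does not list which metrics were calculated, so the ones below are standard definitions chosen for illustration, and the counts in the example are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TruthTable:
    """Counts of SNP calls compared against a reference with known truth."""
    true_positive: int
    false_positive: int
    true_negative: int
    false_negative: int


def _ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def performance_metrics(table):
    """Compute standard metrics from truth table counts."""
    tp, fp = table.true_positive, table.false_positive
    tn, fn = table.true_negative, table.false_negative
    return {
        "sensitivity": _ratio(tp, tp + fn),  # fraction of true SNPs that were called
        "specificity": _ratio(tn, tn + fp),  # fraction of non-SNP positions left uncalled
        "precision": _ratio(tp, tp + fp),    # fraction of called SNPs that are true
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only.
example_table = TruthTable(true_positive=90, false_positive=5,
                           true_negative=9900, false_negative=10)
for name, value in performance_metrics(example_table).items():
    print(f"{name}: {value:.4f}")
```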

SNP Method Evaluation

Quality Score Algorithm

Conclusions

● Three requirements for evaluation
  ○ Reference with known truth
  ○ Performance metrics
  ○ Replicates for uncertainty assessment

● Working to develop tools for implementing these requirements

● Application of these requirements will help to establish confidence in SNP-based sample identification

SNP Method Evaluation

Genomic Reference Materials

Development of Microbial Reference Materials

Strains selected based on public health relevance and GC content

Genomic Reference Materials

Orthogonal Methods used to Characterize Genome Structure, Sequence, and Purity

Genomic Reference Materials

● Microbial genomic reference materials characterized
  ○ Genome Structure
  ○ Sequence
  ○ Purity
  ○ Stability

● Material and data will help validate pathogen detection assays as well as sequencing and bioinformatic workflows.

Genomic Reference Materials

Conclusions

Developing a measurement infrastructure to support genome-based characterization of microbial samples.

Microbial Genomics @ NIST

Summary

● Biosystems and Biomaterials Division
● Genome Scale Measurements

Genomic Purity
● Justin Zook
● Nancy Lin

Microbial Genomic Reference Materials
● Marc Salit
● Justin Zook
● Steven Lund
● Scott Jackson
● Marc Allard and others at FDA
● Heike Sichtig at the FDA

SNP Calling Method Evaluation
● NIST
  ○ Jayne Morrow
  ○ Justin Zook
  ○ Steven Lund
  ○ Nancy Lin
  ○ Marc Salit
● Northern Arizona University
  ○ Jim Schrup
  ○ Becky Coleman
  ○ Jason Sahl
  ○ Paul Keim
● University of New Hampshire
  ○ Jeff Foster

This work was supported by the Department of Homeland Security (DHS) Science and Technology Directorate under the Interagency Agreement HSHQPM-12-X-00078 with NIST and by two interagency agreements with the FDA.

Acknowledgements