Advanced Molecular Detection Duncan MacCannell, PhD Georgia Tech / CDC Collaborates March 12th, 2014 National Center for Emerging and Zoonotic Infectious Diseases Office of the Director


Page 1:

Advanced Molecular Detection

Duncan MacCannell, PhD

Georgia Tech / CDC Collaborates

March 12th, 2014

National Center for Emerging and Zoonotic Infectious Diseases
Office of the Director

Page 2:

Roche 454 PTP plate, Ion Torrent 314, Pacific Biosciences SMRT Cells (x3)

Page 3:

Devices and brand names provided for illustrative purposes only. Their use does not imply endorsement by CDC or HHS.

Page 4:

VOLUME OF RAW DATA

Page 5:

Data Acquisition/Analysis Challenges

For PulseNet USA alone:

>70,000 samples/year x 2 to 3 GB raw sequence + 5-10 GB intermediate

~0.9 petabytes of raw data/year

Transmission and storage?
Is better data compression the answer?
Distributed processing and extraction?
Is full WGS the right approach for large-scale surveillance?

Any solution must balance the advantages of WGS with the costs of implementation.
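The PulseNet USA estimate on this slide can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope reading of the slide's numbers: it assumes decimal units (1 PB = 1,000,000 GB) and assumes that the ~0.9 PB figure corresponds to the upper end of the per-sample range with intermediate files included. Neither assumption is stated on the slide.

```python
"""Back-of-the-envelope check of the PulseNet USA storage estimate."""

SAMPLES_PER_YEAR = 70_000      # ">70,000 samples/year" from the slide
RAW_GB = (2, 3)                # "2 to 3 GB raw sequence" per sample
INTERMEDIATE_GB = (5, 10)      # "5-10 GB intermediate" per sample
GB_PER_PB = 1_000_000          # assumption: decimal units (1 PB = 10^6 GB)


def annual_petabytes(samples: int, gb_per_sample: float) -> float:
    """Return the yearly data volume in petabytes for a per-sample size in GB."""
    return samples * gb_per_sample / GB_PER_PB


# Lower and upper bounds when raw and intermediate files are both kept.
low_pb = annual_petabytes(SAMPLES_PER_YEAR, RAW_GB[0] + INTERMEDIATE_GB[0])
high_pb = annual_petabytes(SAMPLES_PER_YEAR, RAW_GB[1] + INTERMEDIATE_GB[1])

# Upper bound when only the raw sequence is retained.
raw_only_high_pb = annual_petabytes(SAMPLES_PER_YEAR, RAW_GB[1])

print(f"raw + intermediate: {low_pb:.2f} to {high_pb:.2f} PB/year")
print(f"raw only (upper bound): {raw_only_high_pb:.2f} PB/year")
```

Under these assumptions the upper bound with intermediates is roughly 0.9 PB/year, while retaining only raw sequence would cut the volume several-fold, which is one way to frame the slide's storage and compression questions.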

Page 6:

Input: DNA/RNA

Source:
• Genomic
• Amplicon
• Whole sample

Host/vector/pathogen/environment

Library

Output: Information From Sequence Data

Comparative Genomics
• HR Straintyping/Subtyping
• Cluster identification
• Molecular evolution
• Genotypic characterization
• Virulence, AR, signatures
• Functional annotation
• Diagnostic dev/validation

Metagenomics
• Pathogen identification/discovery
• Culture-independent diagnostics
• Microbial ecology/diversity
• ….

Data Info.

ACAATTTGTGCATAACATGTGGACAGTTTTAATCACATGTGGGTAAATAGTTGTCCACATTTGCTTTTTT TGTCGAAAACCCTATCTCATATACAAACGACGTTTTTAGGTTTTAAAATACGTTTCGTATAAATATACAT TTTATATTTATTAGGTTGTACATTTGTTGCGCAACCTTATTCTTTTACCATCTTAGTAAAGGAGGGACAC CTTTGGAAAATATCTCTGATTTATGGAATAGTGCCTTAAAAGAATTAGAAAAAAAGGTAAGCAAGCCTAG TTATGAAACATGGTTAAAATCAACAACGGCTCATAACTTGAAGAAAGACGTATTAACGATTACAGCTCCA AATGAATTTGCTCGTGACTGGCTAGAATCTCATTACTCAGAACTTATTTCGGAAACACTATACGATTTAA CAGGGGCAAAATTAGCAATTCGCTTTATTATTCCCCAAAGTCAATCGGAAGAGGACATTGATCTTCCTCC AGTTAAGCGGAATCCAGCACAAGATGATTCAGCTCATTTACCACAGAGCATGTTAAATCCAAAATATACA TTTGATACATTTGTTATCGGCTCTGGTAACCGTTTTGCCCATGCAGCTTCATTAGCTGTAGCCGAGGCGC CAGCTAAAGCGTATAATCCACTCTTTATTTATGGGGGAGTTGGGCTTGGAAAGACGCATTTAATGCACGC AATTGGTCATTATGTAATTGAACATAATCCAAATGCAAAAGTTGTATATTTATCATCAGAAAAATTCACG AATGAATTTATTAACTCTATTCGTGATAATAAAGCTGTTGATTTTCGTAATAAATATCGCAACGTAGATG

NGS

Workflow:
• Platforms
• Chemistry
• Perf. char.
• Labor/TaT
• Cost

Bioinformatics

Workflow:
• Hardware/software
• Specialized skillsets
• Algorithms/pipelines
• Pathogen databases
• Data analysis/interpret/Integration/visualization

Increasingly Universal Workflows
Established sequencing workflows for a wide range of pathogens.

Objective, “Future-Proof” Data
Intrinsic quality metrics. Ability to back-test retrospective sequence data in silico for genes/markers identified at a future date.

MANY RESULTS POSSIBLE FROM A SINGLE DATASET!

A Moving Target
Rapidly evolving technology space. Changing hardware and COTS/OSS capabilities. Lots of choice, but lack of consistent standards. BIG DATA. New workforce and skillset is required.

Pathogen- and application-specific, CLIA-compliant assays

• File hashes/versioning
• Validated methods/databases
• Process logging/audit
• QA/QC
• Skills/proficiency
• Standards
• Reporting
• Security

Sample intake
Prep/staging
Extraction

Conversion
Library prep
Sequencing
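The slide lists file hashes/versioning and process logging/audit among the controls around a sequencing workflow. As a minimal sketch of what the file-hash part could look like, the example below builds a SHA-256 manifest for sequence files and reports which files differ between two manifests. The function names, the `*.fastq` pattern, and the manifest layout are illustrative assumptions, not anything prescribed by the talk.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large sequence files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory: Path, pattern: str = "*.fastq") -> dict[str, str]:
    """Map each matching file name to its SHA-256 digest (hypothetical layout)."""
    return {p.name: sha256_of_file(p) for p in sorted(directory.glob(pattern))}


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Return names that are new or whose digest differs from the old manifest."""
    return sorted(name for name in new if old.get(name) != new[name])
```

Storing such a manifest alongside each run would let a later audit confirm that the data being re-analyzed is byte-for-byte the data that was originally produced.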

Page 7:

WGS and Pathogen Genomics: Advantages

It’s universal…
DNA/RNA sequencing workflows and approaches can be applied to a wide range of pathogenic organisms.

It’s fundamental…
Genomics is a cornerstone for other “omic” approaches.
Sequence databases are the starting point for assay development/validation.

It’s objective…
Sequence-based methods avoid the subjectivity of phenotypic or fragment-based approaches.
The volume of data provides internal controls.

It’s (relatively) future proof…
Comprehensive sequencing captures the features you know about, and those you don’t.
Quality may change, but the sequence will not.
This makes it possible to back-test future approaches/targets on the data you collect today.
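The idea of back-testing a future target against data collected today can be illustrated with a small in silico search. The sketch below assumes an archive held as a plain dictionary of sample ID to assembled sequence and a marker given as an exact nucleotide string; the sample IDs, the marker, and the function names are hypothetical. A real pipeline would more likely use an aligner and tolerate mismatches, so this is only an exact-match toy.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_marker(sequence: str, marker: str) -> list[tuple[int, str]]:
    """Return (0-based position, strand) for every exact match of marker."""
    sequence = sequence.upper()
    forward = marker.upper()
    queries = [("+", forward)]
    reverse = reverse_complement(forward)
    if reverse != forward:  # skip the second pass for palindromic markers
        queries.append(("-", reverse))
    hits = []
    for strand, query in queries:
        start = sequence.find(query)
        while start != -1:
            hits.append((start, strand))
            start = sequence.find(query, start + 1)
    return sorted(hits)


def back_test(archive: dict[str, str], marker: str) -> dict[str, list[tuple[int, str]]]:
    """Screen every archived sequence and keep only the samples with hits."""
    results = {}
    for sample_id, sequence in archive.items():
        hits = find_marker(sequence, marker)
        if hits:
            results[sample_id] = hits
    return results


# Hypothetical archive; the first entry reuses the opening bases shown on the slide.
archive = {
    "sample_A": "ACAATTTGTGCATAACATGTGGACAGTTTT",
    "sample_B": "GGCTGTCCACATGAA",
    "sample_C": "TTTTAAAACCCCGGGG",
}
# Hypothetical marker "identified at a future date".
print(back_test(archive, "CATGTGGACAG"))
```

Because the archived sequences are unchanged, the same screen can be rerun whenever a new gene or marker of interest is described, without going back to the bench.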

Page 8:

WGS and Genomic Epidemiology: Limitations

It lacks standardization…
WGS is a rapidly-evolving technology space, both in terms of sequencing and analytics.
Standards and mechanisms for data/metadata analysis, storage and exchange remain under active debate and development.

Comprehensive databases are still being built…
Without a useful baseline understanding of pathogen features/diversity, interpretation may be limited.
Need curated and comprehensive epi-linked reference databases.

Many analyses require specialized bioinformatics infrastructure and staff.
Bioinformaticists, DBAs, programmers, system administrators, etc.
Technical and computational complexity of tasks can vary widely.

Data management, retention and release.
Storage. LIMS.

Page 9:

Advanced Molecular Detection
Proposed $30M FY2014 budget request to:

1. Improve pathogen identification and detection
Outcome: Rapid progress toward modernizing PulseNet and other critical lab-based surveillance systems

2. Adapt new diagnostics to meet evolving public health needs
Outcome: Enhance CDC’s ability to detect outbreaks early, develop new tests during outbreaks, and better characterize infectious disease threats

3. Help states meet future reference testing needs in a coordinated manner
Outcome: More effective and better integrated outbreak response activities

4. Implement enhanced, sustainable, and integrated laboratory information systems
Outcome: Labs inside and outside CDC can share information quickly and seamlessly, including with other CDC databases, such as MicrobeNet and PulseNet

5. Develop prediction, modeling, and early recognition tools
Outcome: Better equipped to prevent, detect and respond to infectious diseases.

Diagram labels: EPI, NGS, BIOINFO, AMD

Page 10:

AMD Initiative: Strategic Investments (1)

Scientific Infrastructure:
Critical laboratory and bioinformatics infrastructure at CDC, state/local PHL, and key overseas laboratories.
• Sequencers, mass-spec, other instrumentation, reagents.
• High performance computing, workstations.
• Data storage, networking; data integration, knowledge management.
• Service contracts, software licensing, etc.

Page 11:

AMD Initiative: Strategic Investments (2)

Workforce development:
Training for CDC and PHL staff (bioinformatics, genomics, -omics)
New or re-tooled fellowship programs (bioinformatics, genomics)
Recruitment of new staff and skillsets (bioinformaticians, data scientists, lab specialists, …)

Page 12:

AMD Initiative: Strategic Investments (3)

Consortia, partnerships and alignment of efforts
Academic institutions
State, Federal (NIH, FDA, DHS, DoD, DoE/National Laboratories)
Non-Profit/NGO
International community
Commercial/For-Profit
Clinical laboratories

Pilot projects with state/local and other partners.
Outbreak detection, investigation and response
Leverage existing laboratory-based surveillance systems

Page 13:

Challenges and Opportunities for CDC/GT

Training and workforce development.
Development of wet bench and bioinformatics curriculum for public health audiences.
Scientific exchanges. Fellowship programs.
MOOC-style coursework and training modules for PHL.

Bioinformatic challenges.
Analysis and visualization of complex structured and unstructured data. Epi/lab integration. Dashboards/decision support.
Development and standardization of deployable, CLIA-compatible bioinformatics workflows. Fieldable/portable systems.
Machine learning and other approaches for genotypic prediction of complex microbial phenotypes (e.g., antimicrobial resistance).
Approaches to address CIDT, e.g., accelerated metagenomic classification, lab/bifx approaches for complex sample matrices.
Tools for rapid assay design and validation from HTS data.
Hardware-accelerated algorithms, scalable HPC (+NoSQL/Hadoop).

Page 14:

Questions and Discussion

For more information please contact Centers for Disease Control and Prevention

1600 Clifton Road NE, Atlanta, GA 30333
Telephone: 1-800-CDC-INFO (232-4636)/TTY: 1-888-232-6348
Visit: www.cdc.gov | Contact CDC at: 1-800-CDC-INFO or www.cdc.gov/info

The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

National Center for Emerging and Zoonotic Infectious Diseases
Office of the Director