Speeding up sequencing: Sequencing in an hour enables sample to answer in a workday



Colin Davidson (1), Mindy Landes (2), Rongsu Qi (1), Chaitali Parikh (1), David Mandelman (2), Haythem Latif (2), Adam Harris (2), Nur Hasan (3), Poorani Subramanian (3), and Srinka Ghosh (1)

(1) Thermo Fisher Scientific, 200 Oyster Point Blvd, South San Francisco, CA 94080; (2) Thermo Fisher Scientific, 5781 Van Allen Way, Carlsbad, CA 92008; (3) CosmosID, 155 Gibbs St #436, Rockville, MD 20850

Figure 3

Transposon-based library preparation. Each transposon complex consists of a tetramer of MuA transposase and two double-stranded transposon end oligonucleotides. The complexes insert the transposon ends into target gDNA, resulting in fragmentation and simultaneous tagging with the transposon sequence. Those sequences are used as priming sites for introduction of sequencing adapters during amplification by PCR.

Figure 4

Observed read length distribution at 200 flows is similar to that at the standard 500 flows and matches the expected amplicon insert length distribution. The majority of the amplicons in the targeted sequencing libraries have a mean insert length of 95 bp, so the minimum number of flows needed was estimated to be 200. Analysis of Ion S5 XL runs (with 500 flows) at 300, 250, 200, and 180 flows demonstrated that at 200 flows the read length distribution is still comparable to that at 500 flows, whereas at 180 flows it starts to be compromised.

Decreasing overall sequencing workflow times on the Ion S5™ XL sequencing system. Improvements in library preparation, template preparation (clonal amplification of library molecules onto capture beads), and sequencing reagent flows on the S5 system provide a dramatic decrease in overall sequencing workflow time. For Ion AmpliSeq™-based libraries such as the pan-bacterial ID panel, sequencing and data analysis can be completed in about 6.5 hr. For whole genome sequencing libraries produced from bacterial isolates, the sequencing workflow (from library preparation to data) can be completed in approximately 11 hr.

Figure 1

Figure 5

Figure 6 Figure 7

Figure 8

INTRODUCTION Next-generation sequencing (NGS) is currently hindered by slow and often manual workflow procedures. Decreasing overall workflow times is critical for the widespread adoption of targeted and whole genome sequencing (WGS) in many time-sensitive applications, in particular infectious disease analysis. To this end, we describe improvements to the four main steps of the NGS workflow: i) library preparation; ii) template preparation; iii) sequencing; and iv) data analysis. Together, these advances dramatically decrease overall turnaround times. Ion Torrent semiconductor-based sequencing instruments utilize flow sequencing, with speed largely dependent on the number of nucleotide flows (one flow produces ~0.5 base) and the speed of the flows (Figure 2).
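The relationship between flow count and read length can be sketched with simple arithmetic. The minimal Python example below assumes the ~0.5 base per flow figure quoted above holds as a constant average, and uses the 95 bp mean insert length reported in the read length distribution caption. The function names are illustrative only, and key, barcode, and adapter bases are ignored, which may explain why the estimate lands slightly below the 200-flow minimum the poster reports.

```python
import math

# Approximate yield stated in the introduction (~0.5 base per flow).
# Treated here as a constant average, which is a simplifying assumption.
BASES_PER_FLOW = 0.5


def expected_read_length(flows: int, bases_per_flow: float = BASES_PER_FLOW) -> float:
    """Approximate number of bases sequenced after a given number of flows."""
    return flows * bases_per_flow


def flows_needed(read_length_bp: float, bases_per_flow: float = BASES_PER_FLOW) -> int:
    """Smallest whole number of flows expected to cover read_length_bp bases."""
    return math.ceil(read_length_bp / bases_per_flow)


if __name__ == "__main__":
    # Flow counts examined in the poster for the targeted libraries.
    for flows in (500, 300, 250, 200, 180):
        print(f"{flows} flows -> ~{expected_read_length(flows):.0f} bases")
    # Mean amplicon insert length reported for the targeted panel.
    print(f"95 bp insert -> ~{flows_needed(95)} flows")
```

Under these assumptions a 95 bp insert needs roughly 190 flows, while 180 flows would cover only about 90 bases, which is in line with the observation that read length distribution starts to be compromised at 180 flows.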

CONCLUSIONS For the targeted assays described, the full workflow could be completed within a standard workday.

ACKNOWLEDGEMENTS Bacterial ID AmpliSeq™ assay: Kunal Banjara, Jamsheed Ghadiri, Andrew Hutchison, Peter Vander Horn, Karen Clyde, Nisha Mulakken, Rajesh Gottimukkala, Diana Jeon, and Simon Cawley. Torrent Suite Software: Dominique Belhachemi, Christian Koller, and Mohit Gupta.

TRADEMARKS/LICENSING © 2016 Thermo Fisher Scientific, Inc. All rights reserved. All trademarks are the property of Thermo Fisher Scientific and its subsidiaries unless otherwise specified.

TT01: Speeding up sequencing: Sequencing in an hour enables sample to answer in a workday

Thermo Fisher Scientific • 5791 Van Allen Way • Carlsbad, CA 92008 • www.lifetechnologies.com

A sequencing run on the Ion S5™ XL System is started by inserting an Ion S5 chip preloaded with templated beads containing clonally amplified library molecules. Nucleotides are flowed across the chip, and the addition of bases by the DNA polymerase produces hydrogen ions; the resulting pH change is converted to a sequencing signal by the ion-sensitive wells that hold the templated beads. Read lengths of up to 400 bp can be produced with three available semiconductor chips that produce up to 5, 20, or 80 million reads per chip.

Figure 2

Whole Genome and Targeted Sequencing: Sample-to-data in ~6.5 to 11 hr

Workflow steps: bacterial sample, nucleic acid, library preparation, template preparation, rapid Ion S5 XL sequencing, CosmosID analysis.

Step                         Targeted                                  WGS                                           Range
Library preparation          Targeted AmpliSeq panel (3.5 hr)          WGS with MuSeek (1.5 hr)                      1.5 - 3.5 hr
Template preparation         Manual isothermal amplification (2 hr)    Isothermal amplification on Ion Chef (7 hr)   2 - 7 hr
Rapid Ion S5 XL sequencing   200 flows (55 min)                        300 flows (80 min)                            55 - 80 min
CosmosID analysis            N/A                                       ~1 hr                                         1 hr
Total time                   ~6.5 hr                                   ~11.0 hr
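As a quick check on the workflow figure, the per-step durations can be summed and the sequencing time converted to a per-flow time. The sketch below is only an illustration: the assignment of each duration to the targeted or WGS row is a reading of the flattened figure labels (including the 7 hr Ion Chef template preparation label that appears elsewhere in the transcript), and the dictionary and function names are hypothetical.

```python
# Step durations in minutes, transcribed from the workflow figure labels.
# The row assignment (targeted vs. WGS) is an interpretation of the figure.
WORKFLOW_MINUTES = {
    "targeted": {
        "library_prep": 210,    # Targeted AmpliSeq panel (3.5 hr)
        "template_prep": 120,   # Manual isothermal amplification (2 hr)
        "sequencing": 55,       # 200 flows (55 min)
        "cosmosid_analysis": 0, # listed as N/A in the figure
    },
    "wgs": {
        "library_prep": 90,      # WGS with MuSeek (1.5 hr)
        "template_prep": 420,    # Isothermal amplification on Ion Chef (7 hr)
        "sequencing": 80,        # 300 flows (80 min)
        "cosmosid_analysis": 60, # ~1 hr
    },
}

# Number of flows used in each rapid sequencing run.
FLOWS = {"targeted": 200, "wgs": 300}


def total_minutes(workflow: str) -> int:
    """Sum of all step durations for one workflow, in minutes."""
    return sum(WORKFLOW_MINUTES[workflow].values())


def seconds_per_flow(workflow: str) -> float:
    """Average wall-clock time per nucleotide flow during sequencing."""
    return WORKFLOW_MINUTES[workflow]["sequencing"] * 60 / FLOWS[workflow]


if __name__ == "__main__":
    for name in WORKFLOW_MINUTES:
        hours = total_minutes(name) / 60
        print(f"{name}: {hours:.2f} hr total, {seconds_per_flow(name):.1f} s per flow")
```

With these inputs the targeted workflow sums to about 6.4 hr and the WGS workflow to about 10.8 hr, consistent with the ~6.5 hr and ~11.0 hr totals in the figure, and both runs work out to roughly 16 seconds per flow.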

MATERIALS AND METHODS The new rapid workflow innovations were applied to two different library preparation protocols: i) targeted libraries created using a highly multiplexed PCR approach consisting of 1200 amplicons targeting the 16S rRNA gene as well as species-specific identification targets and antimicrobial resistance determinants; and ii) an unbiased WGS approach using a MuA transposon-based library preparation method (Figure 3).

WGS libraries: Purified Escherichia coli (balanced G:C content), Rhodopseudomonas palustris (high G:C content), and Staphylococcus aureus (low G:C content) gDNA were used as input into the MuSeek™ library preparation protocol (Figure 3).

Targeted libraries: Total nucleic acids were extracted from six bacterial cultures (Acinetobacter baumannii, Enterobacter cloacae, Enterococcus faecium, Klebsiella pneumoniae, Pseudomonas aeruginosa, Staphylococcus aureus) and used as input into the Ion AmpliSeq™ Library Kit 2.0. The kit protocol was followed, except that thermocycling was optimized by reducing the anneal/extend time from 4 min to 1 min.

Template preparation: Targeted libraries were clonally amplified by isothermal amplification using the Ion PGM™ Template IA 500 Kit. Template preparation for WGS libraries was performed using an automated, rapid isothermal amplification approach on the Ion Chef™ System.

Sequencing: Sequencing was performed on the Ion S5™ XL Sequencer using an Ion 520™ Chip. Sequencing times were improved by reducing flow times and the total number of flows. The implementation of On-Instrument Analysis (OIA) enabled near real-time base calling, reducing the total primary analysis time.

Bioinformatic Analysis: Unassembled WGS data were analyzed with the CosmosID MetaGenID Bioinformatics package using curated GenBook databases of Bacteria, Virus, Fungi, Parasite, Antibiotic Resistance, and Virulence Factors.
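The WGS test organisms were chosen to span balanced, high, and low G:C content. For readers unfamiliar with the measure, the short Python sketch below shows how a G:C fraction can be computed from a sequence string. It is a generic illustration with a hypothetical function name, not part of the poster's methods, and it ignores ambiguous bases such as N.

```python
def gc_fraction(sequence: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    # Normalise case and drop anything that is not an unambiguous base.
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc_count = sum(1 for base in bases if base in "GC")
    return gc_count / len(bases)


if __name__ == "__main__":
    # Toy sequences only; real genomes would be read from FASTA files.
    for name, seq in (("balanced", "ATGCATGC"), ("high", "GGCCGCAT"), ("low", "ATATATGC")):
        print(f"{name}: {gc_fraction(seq):.2f}")
```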

RESULTS Targeted and WGS libraries were generated, sequenced, and analyzed in about 6.5 to 11 hours, with targeted sequencing and analysis taking as little as 50 minutes compared to 2.5 hours and 1 hour for standard sequencing and analysis, respectively. Analysis of sequencing accuracy for targeted libraries revealed a raw read accuracy >99.5%, comparable to data from the standard workflow (Figure 4). The read length distribution for the targeted libraries was similar to that of the standard workflow (Figure 5), with 100% specificity for species identification and, at 200 flows or more, for mapping to 16S rRNA families, indicating rapid sequencing without compromising detection accuracy (Figure 6). Sequencing times on the Ion S5™ XL were greatly improved by the combination of reduced flow times, a reduced total number of flows, and the implementation of On-Instrument Analysis (OIA), in which phase estimation prior to base calling was moved onto the instrument (Figure 7). The modifications for rapid sequencing were found to be robust and highly reproducible (Figure 8). Sequencing libraries prepared from bacterial genomes representing a range of GC content demonstrated that the resulting shortened average read lengths did not adversely affect species identification (Figure 9). Antimicrobial resistance determinant identification from WGS libraries was robust down to 300 flows (Figure 10).

Read quality of Ion S5 XL with reduced flow numbers is comparable to that of standard Ion PGM flow numbers. Decreased flow numbers demonstrate a raw read accuracy >99.5% with Ion S5 XL at 300 and 200 flows.
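To put the >99.5% figure in context, an accuracy value can be converted to a Phred-scaled quality score. The poster does not state how raw read accuracy was calculated, so the sketch below assumes the common per-base definition (one minus errors divided by aligned bases). The function names are hypothetical.

```python
import math


def raw_read_accuracy(aligned_bases: int, errors: int) -> float:
    """Per-base accuracy, assuming accuracy = 1 - errors / aligned bases."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return 1.0 - errors / aligned_bases


def phred_from_accuracy(accuracy: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(error rate)."""
    error_rate = 1.0 - accuracy
    if error_rate <= 0.0:
        return math.inf  # no observed errors
    return -10.0 * math.log10(error_rate)


if __name__ == "__main__":
    # Illustrative counts only: 5 errors in 1000 aligned bases.
    accuracy = raw_read_accuracy(aligned_bases=1000, errors=5)
    print(f"accuracy = {accuracy:.3%}, Phred = Q{phred_from_accuracy(accuracy):.1f}")
```

Under this definition, 99.5% accuracy corresponds to roughly Q23, or about one error in every 200 bases.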

Performance of S5 XL sequencing with modified parameters is highly reproducible (n=3) for targeted sequencing using the AmpliSeq panel.

Sequencing times on the Ion S5 XL were improved by reducing flow times and the total number of flows and by implementing On-Instrument Analysis (OIA).

Reducing the flow number to 200 reduces neither the total number of reads nor the accuracy of species ID amplicon alignment or 16S rRNA amplicon assignment. ~100% specificity with species ID targets was observed at 180 flows. However, 16S rRNA identification specificity was affected at 180 flows. Based on the above results, a 200-flow minimum could be used for targeted sequencing with this AmpliSeq panel without compromising detection accuracy using universal and species-specific amplicons.

Reducing the flow number (400, 300, 200, 150) shortened the average read length for WGS libraries (reads subsampled to 1.1 million) but did not adversely affect species identification by percent matching of total unique k-mers at ≥300 flows.
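The "percent matching of total unique k-mers" measure can be illustrated with a toy calculation. The sketch below is not the CosmosID algorithm, whose details are not given in the poster. It simply assumes that "unique k-mers" means the distinct k-mers of one reference sequence and reports the share of them found in a set of reads. The function names and the choice of k are hypothetical.

```python
from typing import Iterable, Set


def unique_kmers(sequence: str, k: int) -> Set[str]:
    """Set of distinct substrings of length k in a sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def percent_unique_kmers_matched(reference: str, reads: Iterable[str], k: int) -> float:
    """Percentage of the reference's distinct k-mers seen in at least one read."""
    reference_kmers = unique_kmers(reference, k)
    if not reference_kmers:
        raise ValueError("reference is shorter than k")
    read_kmers: Set[str] = set()
    for read in reads:
        read_kmers |= unique_kmers(read, k)
    matched = reference_kmers & read_kmers
    return 100.0 * len(matched) / len(reference_kmers)


if __name__ == "__main__":
    # Toy data: a 10-base reference and two short reads, with k = 4.
    reference = "ACGTACGTGA"
    reads = ["ACGTAC", "TTTT"]
    print(f"{percent_unique_kmers_matched(reference, reads, k=4):.1f}% of reference k-mers matched")
```

Shorter reads contribute fewer k-mers each, which is one plausible reason why identification could degrade once the flow number drops below a certain point.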

Figure 10

Figure 9

Reducing the flow number to 300 does not reduce identification (percent matching of total unique k-mers) of antimicrobial resistance determinants and virulence factors in WGS libraries.

Isothermal Amplification on Ion Chef (7 hr)

For Research Use Only. Not for use in diagnostic applications.