The Value of Long Read Amplicon Sequencing for Clinical Applications · 2019-10-10



For Research Use Only. Not for use in diagnostic procedures. © Copyright 2019 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. Femto Pulse and Fragment Analyzer are trademarks of Agilent Technologies Inc. All other trademarks are the sole property of their respective owners.

The Value of Long Read Amplicon Sequencing for Clinical Applications

K. Neveling¹, R. Derks¹, A. Den Ouden¹, S. van der Heuvel¹, C. Heiner³, I. McLaughlin³, J. Harding³, L. Aro³, D. Lugtenberg¹, A. Mensenkamp¹, M. Kwint¹, M. Tjon-Pon-Fong¹, M. van der Vorst¹, M. Ligtenberg¹,², H. Yntema¹, M. Nelen¹, L. Vissers¹, L. Haer-Wigman¹, R. de Voer¹

¹Department of Human Genetics, Radboud university medical center, Nijmegen, the Netherlands
²Department of Pathology, Radboud university medical center, Nijmegen, the Netherlands
³Pacific Biosciences, 1305 O’Brien Drive, Menlo Park, CA 94025

NGS is commonly used for amplicon sequencing in clinical applications to study genetic disorders and detect disease-causing mutations. This approach has limited ability to phase sequence variants, and the data are difficult to interpret when pseudogenes are present. Highly accurate long-read amplicon sequencing can provide accurate, efficient, high-throughput (through multiplexing) sequences from single molecules, with read lengths largely limited by PCR. The data are easy to interpret because phased variants and breakpoints are present within individual high-fidelity reads. Here we show SMRT Sequencing of the PMS2 and OPN1 (MW and LW) genes using the Sequel System. For these genes, homologous regions make NGS and MLPA (multiplex ligation-dependent probe amplification) results very difficult to interpret.

Introduction

Long-Read SMRT Sequencing Workflow

Methods and Results OPN1 (MW and LW)

Sequencing Analysis Workflow

Circular Consensus Sequencing (CCS) Analysis

Methods and Results PMS2

Figure 1: Design of the PMS2 LR-PCR fragments.

Figure 2: Run metrics of a 16 kb amplicon run: (I) the majority of mapped CCS reads (HiFi reads) represent the 11.4, 13.6 and 16.8 kb PMS2 fragments; (II) the N50 polymerase read length is >100 kb; and (III) the insert read length density plot shows the three LR-PCR fragments.

Conclusion

Targeted long-read sequencing with PacBio is highly accurate (>99.99%) and detected every variant type examined here (SNVs, small indels, exon deletions, gene conversions and hybrid genes), including in homologous sequence contexts. These results demonstrate the added value of long-read amplicon sequencing:

Efficiency
• Fewer PCRs, no nested PCR
• Fewer additional tests (e.g., MLPA)
• Multiplexing for high throughput

Improved results, easier data interpretation and analysis
• Distinguish between genes and pseudogenes
• Variant phasing within long reads
• Precise breakpoint detection
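When a single read spans both variants, phasing comes directly from the read. The sketch below shows one minimal way to do this. It assumes the reads have already been aligned and reduced to a hypothetical dict of reference position to observed base. The code counts which allele combinations occur together on the same read, then calls two heterozygous variants either cis (same haplotype) or trans (different haplotypes). The positions, alleles and function names are illustrative only. This is not the authors' pipeline.

```python
from collections import Counter


def phase_two_sites(reads, pos_a, pos_b):
    """Count allele pairs that occur together on the same read.

    reads: list of dicts mapping reference position -> base seen on that read
    (a hypothetical, pre-processed representation of aligned long reads).
    Reads that do not cover both positions are skipped.
    """
    pairs = Counter()
    for read in reads:
        if pos_a in read and pos_b in read:
            pairs[(read[pos_a], read[pos_b])] += 1
    return pairs


def call_phase(pairs, site_a, site_b):
    """Call two heterozygous variants 'cis', 'trans' or 'ambiguous'.

    site_a, site_b: (ref_allele, alt_allele) at each position.
    """
    (ref_a, alt_a), (ref_b, alt_b) = site_a, site_b
    cis = pairs[(alt_a, alt_b)] + pairs[(ref_a, ref_b)]    # alt+alt and ref+ref haplotypes
    trans = pairs[(alt_a, ref_b)] + pairs[(ref_a, alt_b)]  # alt+ref and ref+alt haplotypes
    if cis == trans:
        return "ambiguous"
    return "cis" if cis > trans else "trans"


if __name__ == "__main__":
    # Hypothetical reads covering two heterozygous sites ~9 kb apart;
    # the last read does not span both sites and is ignored.
    reads = [{100: "A", 9100: "G"}] * 10 + [{100: "C", 9100: "T"}] * 9 + [{100: "A"}]
    pairs = phase_two_sites(reads, 100, 9100)
    print(call_phase(pairs, ("C", "A"), ("T", "G")))  # cis
```

In practice, a caller would also require minimum read support and a minimum fraction for the winning configuration. The simple majority here only illustrates the principle.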

Figure 3: Complete coverage of PMS2 by long-read sequencing (upper panel). Coverage of PMS2 is >6000x, whereas coverage of the pseudogene PMS2CL is >30x. Given this large difference, co-sequencing of the pseudogene does not interfere with the analysis (lower panel).

Figure 4: Long-read sequencing of PMS2 detects exon deletions >1 kb in size (upper panel), SNVs (lower panel, left) and small indels (lower panel, middle), and precisely maps the breakpoints of most exon deletions (lower panel, right).

Figure 5: Representation of 16 kb LR-PCRs for OPN1 LW and MW.

Figure 7: Protanopia patient with two PCR products (1x LW and 1x MW). Sequencing detected three MW copies, one of which carries an exon 1 derived from LW. All three copies map to different locations in the genome. The data are consistent with the patient’s phenotype.

For PMS2, three amplicons ranging in size from 11.4 kb to 16.8 kb were designed using unique primers, covering 36 kb of sequence. SMRT Sequencing produced HiFi reads with coverage ranging from 200-fold to 1500-fold. The data clearly showed two deletions >1 kb, with precise breakpoint mapping.
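Because a single HiFi read spans the whole amplicon, a deletion and both of its breakpoints can be seen in one read alignment. The sketch below reports such deletions from a read's CIGAR string (SAM format). It assumes the aligner represents each deletion as a single `D` operation. Aligners may instead split a large deletion into supplementary alignments, which this sketch does not handle. The function name and the 1,000 bp default threshold are illustrative choices, with the threshold chosen to match the >1 kb exon deletions in Figure 4. This is not the authors' pipeline.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference.
REF_CONSUMING = set("MDN=X")


def large_deletions(ref_start, cigar, min_len=1000):
    """Return (first_deleted_pos, last_deleted_pos, length) for each deletion >= min_len.

    ref_start is the 1-based leftmost aligned reference position (SAM POS);
    returned coordinates are 1-based and inclusive.
    """
    calls = []
    ref_pos = ref_start
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op == "D" and length >= min_len:
            calls.append((ref_pos, ref_pos + length - 1, length))
        if op in REF_CONSUMING:
            ref_pos += length
    return calls


if __name__ == "__main__":
    # Soft clip, a small 5 bp deletion (ignored), an insertion, then a 1.2 kb deletion.
    print(large_deletions(1000, "100S500M5D10I1200D300M"))  # [(1505, 2704, 1200)]
```

Soft clips and insertions do not advance the reference position. That is why the 1.2 kb deletion above starts at 1505 rather than 1605.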

Full-length amplicons for OPN1LW and OPN1MW (14 kb and 16 kb, respectively) were generated from samples with different known gene conversions or hybrid genes, and were then subjected to SMRT Sequencing. In all cases, PacBio sequencing was 100% concordant, detecting every gene conversion and hybrid gene originally identified by orthogonal technologies. In some cases, SMRT Sequencing also generated additional relevant data.
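Each full-length amplicon read covers the whole gene, so gene conversions and hybrid genes can be recognized one read at a time. The sketch below shows one minimal approach. It assumes a set of diagnostic LW/MW positions per exon is available. The exons, offsets and bases below are hypothetical placeholders, not real OPN1LW/OPN1MW differences. The poster does not describe the classification method it used.

```python
# HYPOTHETICAL diagnostic sites: exon -> list of (offset, LW base, MW base).
# Placeholder values for illustration only; NOT real OPN1LW/OPN1MW differences.
DIAGNOSTIC_SITES = {
    "exon1": [(12, "C", "T"), (40, "G", "A")],
    "exon3": [(25, "A", "G")],
    "exon5": [(33, "T", "G"), (57, "A", "T")],
}


def classify_exons(read_bases, sites=DIAGNOSTIC_SITES):
    """Label each exon on one full-length read as 'LW', 'MW' or 'unresolved'.

    read_bases maps (exon, offset) -> base observed in that read.
    """
    labels = {}
    for exon, exon_sites in sites.items():
        lw_votes = mw_votes = 0
        for offset, lw_base, mw_base in exon_sites:
            base = read_bases.get((exon, offset))
            if base == lw_base:
                lw_votes += 1
            elif base == mw_base:
                mw_votes += 1
        if lw_votes > mw_votes:
            labels[exon] = "LW"
        elif mw_votes > lw_votes:
            labels[exon] = "MW"
        else:
            labels[exon] = "unresolved"
    return labels


def describe_gene(labels):
    """Summarize exon labels: pure LW, pure MW, or an LW/MW hybrid."""
    resolved = {label for label in labels.values() if label != "unresolved"}
    if resolved == {"LW", "MW"}:
        return "hybrid"
    if resolved == {"LW"}:
        return "OPN1LW-like"
    if resolved == {"MW"}:
        return "OPN1MW-like"
    return "unresolved"


if __name__ == "__main__":
    # A read resembling the Figure 7 case: MW sequence with an LW-derived exon 1.
    read = {
        ("exon1", 12): "C", ("exon1", 40): "G",
        ("exon3", 25): "G",
        ("exon5", 33): "G", ("exon5", 57): "T",
    }
    labels = classify_exons(read)
    print(labels)                 # {'exon1': 'LW', 'exon3': 'MW', 'exon5': 'MW'}
    print(describe_gene(labels))  # hybrid
```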

[Diagram: a polymerase read (1-pass example) showing Adapter 1, Adapter 2 and barcode BC 1; per single polymerase read, subreads are sorted into Barcode Group 1, Barcode Group 2 … Barcode Group “n” (Barcodes 1, 2, 3 … “n”).]

In SMRT Analysis:
1. Pre-Process Filtering (Analysis Parameters)
2. Demultiplex: subreads for Barcode 1 (from a single polymerase read) are collected
3. Generate Circular Consensus: a high-accuracy CCS read

§ The CCS analysis method combines multiple passes from a single molecule, resulting in high individual read accuracy (>99%).
§ CCS generates HiFi reads ready for further analysis (alignment, variant calling, etc.) with standard informatics tools.
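A deliberately simplified model shows why multiple passes raise accuracy. Assume each pass errs independently at a given base with probability p, and the consensus is a per-position majority vote. If fewer than half the passes err, the consensus is guaranteed correct, so the binomial tail P(errors ≥ n/2) is an upper bound on consensus error. The real CCS algorithm works differently: it aligns subreads and builds the consensus statistically. Real errors, mostly indels, are also not fully independent. Treat this sketch as intuition only. The 10% per-pass error rate is an assumed illustrative value. The 16 kb insert and 150 kb polymerase read length come from the Figure 6 run metrics. Note that N50 summarizes a length distribution and is not a per-molecule value.

```python
from math import comb


def estimated_full_passes(polymerase_read_len, insert_len):
    """Rough count of complete passes over an insert (adapter bases ignored)."""
    return polymerase_read_len // insert_len


def majority_error_bound(n_passes, per_pass_error):
    """Upper bound on per-base consensus error under a simplified model.

    Assumes every pass errs independently with probability per_pass_error and
    the consensus is a per-position majority vote. If fewer than half of the
    passes err, the correct base wins, so P(errors >= n/2) bounds the error.
    """
    n, p = n_passes, per_pass_error
    k_min = (n + 1) // 2  # smallest integer >= n/2
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    passes = estimated_full_passes(150_000, 16_000)  # 9 passes
    for n in (1, 3, passes):
        # 10% per-pass error is an ASSUMED illustrative value, not from the poster.
        print(n, f"{majority_error_bound(n, 0.10):.1e}")
```

With the assumed 10% per-pass error, the bound falls from 1.0e-01 for one pass to 2.8e-02 for three passes and 8.9e-04 for nine passes. That is, above 99.9% per-base consensus accuracy in this toy model.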

Amplicon Preparation
• PCR Amplicon Generation
• Amplicon QC
• AMPure PB Purification
Barcode-tagged samples are pooled after PCR amplification.

SMRTbell Library Preparation
• DNA Damage Repair
• End Repair & Adapter Ligation
• AMPure PB Purification
• ExoIII and VII Library Cleanup
• AMPure PB Purification (x2-3)

Sequencing & Analysis
• Sequencing Primer Annealing
• Polymerase Binding
• Sequencing
• Data Analysis
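Because barcode-tagged samples are pooled after PCR, each read must later be assigned back to its sample. This is the Demultiplex step in SMRT Analysis. The sketch below illustrates the idea under several assumptions: symmetric barcodes at both amplicon ends, equal-length barcodes, and a Hamming-distance match. The barcode sequences, sample names and mismatch threshold are hypothetical. Production demultiplexers use alignment-based scoring that also tolerates insertions and deletions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read, barcodes, max_mismatches=2):
    """Return the sample whose barcode best matches the read ends, or None.

    barcodes: dict of sample name -> barcode sequence (hypothetical).
    With symmetric barcoding, the read starts with the barcode and ends with its
    reverse complement, so both ends are checked and the better match is used.
    """
    scored = []
    for sample, bc in barcodes.items():
        k = len(bc)
        if len(read) < 2 * k:
            continue  # read too short to contain barcodes at both ends
        dist = min(hamming(read[:k], bc), hamming(revcomp(read[-k:]), bc))
        scored.append((dist, sample))
    scored.sort()
    if not scored or scored[0][0] > max_mismatches:
        return None  # no barcode close enough
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # tie between samples: ambiguous
    return scored[0][1]


if __name__ == "__main__":
    barcodes = {"sample_A": "ACGTACGTAC", "sample_B": "TGCATGCATG"}
    insert = "GATTACA" * 5
    read_a = barcodes["sample_A"] + insert + revcomp(barcodes["sample_A"])
    junk = "G" * 10 + insert + "G" * 10
    print(assign_sample(read_a, barcodes))  # sample_A
    print(assign_sample(junk, barcodes))    # None
```

Rejecting reads with ties or with too many mismatches keeps cross-sample assignments rare. Some reads are discarded as the price.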

[Run-metric panels: (I) Mapped CCS Read Length, (II) Base Yield Density, (III) Insert Read Length Density]

Figure 6: Run metrics of a 16 kb amplicon run. The N50 polymerase read length is >150 kb. The insert read length density plot shows the ~16 kb amplicons. The run output was 18.5 Gb.

[Figure 1 labels: PMS2 Fragments 1, 2 and 3; PMS2; PMS2CL]