

Improved accuracy of ultra-low frequency variant detection using a novel library tagging strategy

Jiashi Wang*, Kevin Lai, Madelyn Light, Kristina Giorda, Mirna Jarosz, Yun Bao, and Caifu Chen

Integrated DNA Technologies, Redwood City, CA

* Corresponding author: [email protected]

Introduction

• Advances in NGS technology and throughput allow analysis of low-input clinical samples and are rapidly changing how future cancer care will be carried out

• Detection of ultra-low frequency (<1%) variants is confounded by errors introduced during NGS sample preparation, library target enrichment, and sequencing

• A unique library-tagging adapter strategy that offers significantly higher library conversion than previous molecular labeling approaches has been developed based on IDT xGen® Duplex Seq Adapters–Tech Access

Accurate variant detection at 0.1% allele frequency

Simple workflow and analysis overview

Figure 1. Novel double-stranded molecular tagging strategy enables use of genetic information embedded in both DNA strands. xGen Duplex Seq Adapters incorporate degenerate bases to pair top and bottom strands during analysis.

Figure 3. Duplexed molecular tagging and consensus analysis enable error correction. Diagram of analysis methods used to evaluate duplex adapters, with true positive (TP) variants shown in green. Reads that map to the same location and share the same unique molecular identifier (UMI) are used to build single-strand consensus reads (Min3: minimum of 3 reads) or, when both the top and bottom strands are observed, duplex consensus reads.
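The consensus logic in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration, not the analysis pipeline used for the poster. It assumes reads have already been aligned, trimmed to equal length, oriented to the same reference strand, and labeled `top` or `bottom`. It also assumes the duplex consensus is built from the two single-strand consensuses. The function names, the tuple layout, and the choice to mask disagreeing positions with `N` are all hypothetical.

```python
from collections import Counter, defaultdict

MIN_READS = 3  # "Min3" in the caption: at least 3 reads per single-strand family


def single_strand_consensus(seqs, min_reads=MIN_READS):
    """Majority-vote consensus of equal-length reads; None if the family is too small."""
    if len(seqs) < min_reads:
        return None
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def duplex_consensus(top, bottom):
    """Keep a base only where both strand consensuses agree; mask the rest with N."""
    if top is None or bottom is None:
        return None
    return "".join(a if a == b else "N" for a, b in zip(top, bottom))


def build_consensus(reads, min_reads=MIN_READS):
    """reads: iterable of (position, umi, strand, sequence), strand is 'top' or 'bottom'."""
    families = defaultdict(lambda: {"top": [], "bottom": []})
    for position, umi, strand, sequence in reads:
        families[(position, umi)][strand].append(sequence)

    result = {}
    for key, family in families.items():
        top = single_strand_consensus(family["top"], min_reads)
        bottom = single_strand_consensus(family["bottom"], min_reads)
        result[key] = {
            "top": top,
            "bottom": bottom,
            "duplex": duplex_consensus(top, bottom),
        }
    return result
```

Under these assumptions, a sequencing error present in one read of a family is outvoted within the strand, while a change present on every read of one strand but absent from the other (for example, damage to a single strand) is masked at the duplex step.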

Figure 2. Hybridization capture-based targeted sequencing workflow.

Conclusions

xGen Duplex Seq Adapters (1) are compatible with common library preparation kits and many sample types, including FFPE and cfDNA; (2) are easily incorporated into hybridization-based target enrichment workflows; (3) enable exceptional error correction strategies, reducing the number of false positive calls; and (4) can accurately detect rare variants as low as 0.1%.

Figure 4. Low-frequency variant model. Two Genome in a Bottle (GIAB) genomic DNA samples, NA12878 and NA24385, were mixed. All libraries were enriched with a custom xGen Lockdown® Panel (IDT) targeting a 75 kb region of highly polymorphic SNPs. Accuracy of variant calling was assessed over a 35 kb GIAB high-confidence region.
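To make the mixing model concrete, the expected allele frequency at a site can be estimated from the mixing fraction and the genotypes of the two samples. The sketch below is an assumption-laden illustration, not the authors' calculation. It assumes diploid genomes and equal DNA mass per genome copy. The function name is hypothetical, and the poster does not state which sample was the minor component.

```python
def expected_allele_frequency(mix_fraction, minor_alt_copies, major_alt_copies=0, ploidy=2):
    """Expected alternate allele frequency at one site in a two-sample mixture.

    mix_fraction     -- fraction of DNA contributed by the minor sample (0..1)
    minor_alt_copies -- alternate allele copies in the minor sample (0, 1, or 2 if diploid)
    major_alt_copies -- alternate allele copies in the major (background) sample
    """
    if not 0.0 <= mix_fraction <= 1.0:
        raise ValueError("mix_fraction must be between 0 and 1")
    minor_part = mix_fraction * minor_alt_copies / ploidy
    major_part = (1.0 - mix_fraction) * major_alt_copies / ploidy
    return minor_part + major_part


# Under these assumptions, a 0.2% mixture gives 0.1% at sites where the minor
# sample is heterozygous and the background sample is homozygous reference.
het_site = expected_allele_frequency(0.002, minor_alt_copies=1)
hom_site = expected_allele_frequency(0.002, minor_alt_copies=2)
```

Sites where the background sample also carries the alternate allele fall far from the low-frequency range and would not serve as low-frequency true positives in this kind of model.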

Figure 5. Accurate low-frequency variant detection. 100 ng of cell-line DNA (0.2% mixture) was acoustically sheared to 300 bp for library preparation with the KAPA Hyper Prep Kit (Kapa Biosystems) and xGen Duplex Seq Adapters. (A) Raw or duplicate-aware coverages are shown. (B) Sensitivity is correlated with coverage measured with each deduplication method, while using a variant-calling threshold of 0. The positive predictive value (PPV) was largely dictated by the degree of molecular tagging and read consensus reconstruction for low-frequency variant detection. (C) Rates of base substitution (i.e., pair-specific error rates) were measured with the Picard suite.
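Sensitivity and PPV, as reported in panel B, follow from comparing a call set against a truth set such as the GIAB high-confidence genotypes. The sketch below uses the standard definitions, sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). The function name and the representation of a variant as a `(chrom, pos, ref, alt)` tuple are illustrative assumptions, and the example variants are made up.

```python
def sensitivity_and_ppv(called, truth):
    """Return (sensitivity, PPV) for two sets of variants.

    called -- set of variants reported by the caller
    truth  -- set of variants known to be present in the mixture
    """
    true_positives = len(called & truth)
    false_positives = len(called - truth)
    false_negatives = len(truth - called)

    sensitivity = (
        true_positives / (true_positives + false_negatives) if truth else float("nan")
    )
    ppv = true_positives / (true_positives + false_positives) if called else float("nan")
    return sensitivity, ppv


# Hypothetical example: 4 known variants, 3 calls, 2 of which are correct.
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 40, "G", "A"), ("chr2", 900, "T", "C")}
called = {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A"), ("chr3", 7, "C", "A")}
example_sensitivity, example_ppv = sensitivity_and_ppv(called, truth)
```

With a calling threshold of 0, any residual error that survives deduplication becomes a false positive, which is consistent with the caption's observation that PPV depends mainly on the consensus method.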


Figure 6. Improved coverage and variant calling for cell-free DNA samples. Commercially acquired cell-free DNA (cfDNA) samples that were individually genotyped across the target region were mixed to model low-frequency variants with minimum alternative allele frequencies of 0.1%. 25 ng of cfDNA mixture was used for library preparation. The raw sequencing depth was ~80,000X.
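A back-of-the-envelope calculation shows why library conversion matters at this input amount and allele frequency. The sketch below assumes roughly 3.3 pg of DNA per haploid human genome (a commonly used approximation, not a figure from the poster) and treats sampling of variant molecules as binomial. The function names and the `conversion` parameter are hypothetical.

```python
import math

HAPLOID_GENOME_PG = 3.3  # approximate mass of one haploid human genome, in picograms


def haploid_copies(input_ng):
    """Approximate number of haploid genome copies in a given DNA input."""
    return input_ng * 1000.0 / HAPLOID_GENOME_PG


def expected_alt_molecules(input_ng, allele_frequency, conversion=1.0):
    """Expected variant-bearing molecules per locus that make it into the library."""
    return haploid_copies(input_ng) * conversion * allele_frequency


def prob_at_least(k, n, p):
    """Binomial probability of seeing at least k variant molecules among n sampled."""
    below = sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k))
    return 1.0 - below


copies = haploid_copies(25)                      # about 7,600 copies in 25 ng
alt_full = expected_alt_molecules(25, 0.001)     # about 7.6 at 100% conversion
alt_low = expected_alt_molecules(25, 0.001, conversion=0.2)  # about 1.5 at 20%
```

Under these assumptions, only a handful of variant molecules per locus exist in 25 ng at 0.1% allele frequency, so raw depth far above the number of input molecules mostly adds duplicates for consensus building rather than new molecules.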