Developing a framework for detection of low frequency somatic genetic alterations in targeted...


Upload: ronak-shah

Post on 23-Dec-2014



TRANSCRIPT

Page 1: Developing a framework for detection of low frequency somatic genetic alterations in targeted sequencing data

Overview of the Framework

Background

BACKGROUND: Cancer is a complex, heterogeneous disease of the genome. Most cancers result from an accumulation of multiple genetic alterations that lead to dysfunction of cancer-associated genes and pathways. Recent advances in sequencing technology have enabled comprehensive profiling of genetic alterations in cancer. We have established a targeted sequencing platform (IMPACT: Integrated Mutation Profiling of Actionable Cancer Targets) using hybridization capture and next-generation sequencing (NGS) technology, which can reveal mutations, indels and copy number alterations involving 340 cancer-related genes.

METHOD: To identify mutations, indels, and copy number alterations, we present a unified analytic framework developed in Perl to discover and genotype variation among multiple samples simultaneously with high sensitivity and specificity. Our framework incorporates many elements that have become standard practice for NGS data analysis, such as i) adaptor trimming, ii) mapping and duplicate masking, iii) local realignment around indels, iv) base quality score recalibration, v) SNV and indel calling, vi) annotation, and vii) filtering. Importantly, we use a tumor-normal pair approach, in which each tumor is always processed with a matched normal sample in order to distinguish somatic mutations from inherited variants. Local realignment is performed jointly for all samples from the same patient to maximize the sensitivity and specificity for detecting somatic indels. To distinguish true low-frequency somatic mutations from systematic sequencing artifacts, we genotype each candidate sequence variant in a collection of unmatched normal samples from multiple sequence runs. Filtering based on genotyping and genomic annotation not only eliminates sequencing artifacts but also provides confidence in the calls that are made.

We have applied this framework to analyze deep coverage targeted sequencing data from >1,000 archived tumor specimens and have implemented it for the prospective characterization of patient samples in the Molecular Diagnostics Service at Memorial Sloan-Kettering Cancer Center.

*Abstract altered after submission.
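The tumor-normal comparison and the genotyping against unmatched normals described in the METHOD paragraph can be illustrated with a small sketch. This is not the authors' Perl implementation: the `Genotype` class, the function name, and all three thresholds are hypothetical values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Genotype:
    """Read support for one candidate variant in one sample (hypothetical structure)."""
    alt_reads: int  # reads supporting the variant allele
    depth: int      # total reads covering the site

    @property
    def vaf(self) -> float:
        """Variant allele frequency; 0.0 when the site is uncovered."""
        return self.alt_reads / self.depth if self.depth else 0.0


def is_somatic_candidate(
    tumor: Genotype,
    matched_normal: Genotype,
    unmatched_normals: Sequence[Genotype],
    min_tumor_vaf: float = 0.02,      # assumed threshold, not from the poster
    max_normal_vaf: float = 0.01,     # assumed threshold, not from the poster
    max_artifact_normals: int = 1,    # assumed threshold, not from the poster
) -> bool:
    """Keep a call only if it is present in the tumor, absent from the
    matched normal, and not recurrent across the unmatched normal pool."""
    if tumor.vaf < min_tumor_vaf:
        return False  # too little support in the tumor
    if matched_normal.vaf > max_normal_vaf:
        return False  # likely inherited (germline) variant
    recurrent = sum(1 for g in unmatched_normals if g.vaf > max_normal_vaf)
    return recurrent <= max_artifact_normals  # recurrent in normals: likely artifact


# Hypothetical example: an 8% tumor variant that is clean in every normal.
clean_pool = [Genotype(0, 500) for _ in range(10)]
print(is_somatic_candidate(Genotype(40, 500), Genotype(0, 400), clean_pool))
```

The design point is that the matched normal removes inherited variants, while the pool of unmatched normals removes calls that recur in samples where no somatic mutation is expected.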

Results

Genome Informatics 2013 Meeting, 10/30/2013-11/03/2013, Cold Spring Harbor, NY

Targeted Sequencing

Developing a framework for detection of low frequency somatic genetic alterations in targeted sequencing data

Ronak H. Shah, A. Rose Brannon, Donavan T. Cheng, Helen H. Won, Sasinya N. Scott, Ahmet Zehir, Talia Mitchell, Ryma Benayed, Catherine O'Reilly, Aijazuddin Syed, Nancy Bouvier, Michael F. Berger

Department of Pathology, Memorial Sloan-Kettering Cancer Center, New York, NY

Conclusions

• This analysis framework helps to identify low-frequency, high-confidence somatic alterations, making our targeted sequencing platform suitable for clinical use.

• This platform may provide important individual information regarding tumor initiation and progression and a more reliable prediction of personalized cancer therapies.

Acknowledgements

Berger Lab and the Diagnostic Molecular Pathology Laboratory Staff

References

1. McKenna A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303.

2. Picard Tools: http://picard.sourceforge.net

3. Li H. et al. The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25:2078-9.

4. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17, May 2011.

5. Li H. et al. Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 25:1754-60.

Figure (IMPACT assay workflow): Prepare 24-48 libraries; hybridize & select with probes for 340 cancer genes (NimbleGen SeqCap: IMPACT Assay); sequence to 500-1000X (HiSeq 2500); align to genome & analyze.

Image 1: KRAS p.Q61H exon 3 mutation found at 8% allele frequency in a patient with liver cancer (panels: NORMAL, TUMOR).

Image 2: EGFR p.M766_A767insASV exon 20 insertion found at 7% allele frequency in a patient with lung cancer (panels: TUMOR, NORMAL).

Image 3: EGFR p.K745_A750del exon 19 deletion found at 6% allele frequency in a patient with lung cancer (panels: NORMAL, TUMOR).

Image 5: EGFR amplification observed with a positive fold change of 21, and CDKN2A/CDKN2B deletion observed with a negative fold change of 5, in a patient with glioblastoma (plot labels: EGFR Amplification, 21 folds; CDKN2A/CDKN2B Deletion, -5 folds; panels: NORMAL, TUMOR).
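The poster does not say how the fold changes in Image 5 were computed. One common convention, assumed here purely for illustration, is to take the ratio of normalized tumor coverage to normalized normal coverage and to report losses as the negated inverse ratio, so that a five-fold drop reads as -5. The function name and the input values are hypothetical.

```python
def signed_fold_change(tumor_cov: float, normal_cov: float) -> float:
    """Signed fold change between normalized tumor and normal coverage.

    Assumed convention (not stated in the poster): gains are reported as
    tumor/normal, losses as -(normal/tumor).
    """
    if tumor_cov <= 0 or normal_cov <= 0:
        raise ValueError("coverage values must be positive")
    if tumor_cov >= normal_cov:
        return tumor_cov / normal_cov
    return -(normal_cov / tumor_cov)


# Hypothetical normalized coverages that would reproduce the Image 5 labels.
print(signed_fold_change(2100.0, 100.0))  # gain
print(signed_fold_change(100.0, 500.0))   # loss
```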

Image 4: EML4-ALK fusion detected as an inversion, with 3% of reads supporting the fusion, in a patient with lung cancer.

Sensitivity to detect mutations at all frequencies (plot annotation: Correlation: 99%; see Image 7)

Effect of filters on mutation calling

Type of Mutation | Raw Calls | Filter using allele depth & variant frequency | Filter using annotation, genotyping information & variant frequency | Filter from genotyping information for other normals
SNVs | 9674 | 652 (15%) | 165 (25%) | 137 (83%)
INDELs | 1900660 | 1644 (0.08%) | 102 (6%) | 66 (64%)

Table 1: Change in the number of mutation calls with the application of filters. These mutations come from 30 samples sequenced in a single pool.
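The percentages in Table 1 appear to give the share of calls retained relative to the preceding column; that reading is an assumption, since the poster does not define them. The sketch below recomputes the step-wise retention from the counts in the table. Under this reading most cells agree with the printed values after rounding or truncation, while the first SNV step comes out near 6.7% rather than the printed 15%, so that cell may use a different denominator.

```python
from typing import List, Sequence


def stepwise_retention(counts: Sequence[int]) -> List[float]:
    """Percentage of calls kept at each filter step, relative to the previous step."""
    return [100.0 * after / before for before, after in zip(counts, counts[1:])]


# Counts copied from Table 1: raw calls followed by the three filter stages.
snv_counts = [9674, 652, 165, 137]
indel_counts = [1900660, 1644, 102, 66]

for name, counts in (("SNV", snv_counts), ("INDEL", indel_counts)):
    steps = ", ".join(f"{p:.2f}%" for p in stepwise_retention(counts))
    print(f"{name}: {steps}")
```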

Image 7: 99% correlation is achieved between expected and observed variant frequency at SNP sites for mixed normal samples vs. a normal sample on its own.
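A comparison of this kind can be sketched as follows, assuming that the expected frequency of a SNP in a mixture is its frequency in the contributing sample scaled by that sample's mixing fraction, and that "correlation" means the Pearson coefficient. Both are assumptions; the poster does not state either, and the data below are made up for illustration.

```python
import math
from typing import Sequence


def expected_vaf(mix_fraction: float, source_vaf: float) -> float:
    """Expected variant frequency after diluting a sample into a reference-only background."""
    return mix_fraction * source_vaf


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical heterozygous SNPs (VAF 0.5) diluted at several mixing fractions.
expected = [expected_vaf(f, 0.5) for f in (0.02, 0.05, 0.10, 0.20, 0.50)]
observed = [0.012, 0.024, 0.053, 0.097, 0.246]  # made-up observations
print(f"r = {pearson_r(expected, observed):.3f}")
```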

Sensitivity: Known and Novel Variant Frequency vs. Total Depth

Image 6: True-positive mutations were recovered with a 98% recall rate, along with many hotspot mutations at varying variant frequencies. Hotspot mutations are recurrent mutations in COSMIC and TCGA.

98% of targets at >50% of median

99% of targets at >20% of median
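These two figures read like a coverage-uniformity summary, that is, the share of capture targets whose coverage exceeds a given fraction of the median target coverage. That interpretation is an assumption, as the poster does not say what the median refers to. Under it, the statistic could be computed as in this sketch; the function name and the coverage values are hypothetical.

```python
from statistics import median
from typing import Sequence


def fraction_of_targets_above(coverages: Sequence[float], fraction_of_median: float) -> float:
    """Share of targets whose coverage exceeds fraction_of_median * median coverage."""
    if not coverages:
        raise ValueError("no targets supplied")
    threshold = fraction_of_median * median(coverages)
    return sum(1 for c in coverages if c > threshold) / len(coverages)


# Hypothetical per-target mean coverages.
per_target = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
for frac in (0.5, 0.2):
    share = fraction_of_targets_above(per_target, frac)
    print(f"{share:.0%} of targets at >{frac:.0%} of median")
```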