

• We obtained breast cancer tissues from the Breast Cancer Biospecimen Repository of Fred Hutchinson Cancer Research Center.

• We performed two rounds of next-generation sequencing: (1) primary whole-exome sequencing of the normal subsection, seven primary subsections, and one metastatic subsection, which identified 281 candidate mutations; and (2) deep sequencing of the normal subsection, eight primary subsections, and two metastatic subsections, targeted at those candidates, which validated 17 mutations.

• Deep targeted sequencing (median coverage of 1100 reads per locus) provided reliable counts of the normal and alternate alleles as input to our algorithm.

• The approximate anatomic locations of the samples and the frequencies of the 17 alternative alleles in each of the 12 tumor samples are shown in the plot at right.

Performance on simulated data

To validate our implementation of the EM optimization procedure and to understand our model’s behavior, we produced simulated deep sequencing data and measured the extent to which the model recovers the true clonal structure of the data. We observed two primary trends: the overall error rate, as measured by either genotype error or clone frequency error, decreases systematically as the number of samples increases, and increases as the number of clones increases. Overall, both error rates are low, especially for the case of 3 clones.
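A simulation of this kind might be sketched as follows. This is an illustrative sketch, not the authors' simulator: the function name `simulate_counts`, the Dirichlet prior on clone frequencies, the all-normal clone 0, and the assumption that every mutation is heterozygous in a diploid genome (so the alternate-allele probability is half the fraction of mutated cells) are all assumptions made here. The default depth of 1100 echoes the median coverage reported for the real data.

```python
import numpy as np

def simulate_counts(n_samples, n_clones, n_loci, depth=1100, seed=0):
    """Simulate alternate-allele counts under a binomial model (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    # Binary genotypes, clones x loci; clone 0 is assumed to be the normal clone.
    genotypes = rng.integers(0, 2, size=(n_clones, n_loci))
    genotypes[0, :] = 0
    # Clone frequencies, samples x clones; each row sums to 1.
    frequencies = rng.dirichlet(np.ones(n_clones), size=n_samples)
    # Assumed heterozygous diploid mutations: the alternate-allele probability
    # is half the fraction of cells carrying the mutation.
    alt_prob = 0.5 * (frequencies @ genotypes)
    total = np.full((n_samples, n_loci), depth)
    alt = rng.binomial(total, alt_prob)
    return alt, total, frequencies, genotypes

alt, total, frequencies, genotypes = simulate_counts(n_samples=10, n_clones=3, n_loci=17)
```

Running an inference procedure on `alt` and `total` and comparing its estimates with the returned `frequencies` and `genotypes` gives the two error measures discussed above.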

Abstract

The ability of cancer to evolve within an individual patient is the most significant reason that cancer treatments fail. Because today’s oncologists lack the means to predict a cancer’s next move, the disease can escape the current treatment strategies. Capturing the subclonal heterogeneity of tumors can improve cancer treatment significantly by providing effective insight into the structure of the tumor and the history of a patient's cancer. If the mutations specific to each subclone are known, clinicians can design the most efficient treatment to block the escape mechanisms of the tumor by attacking all subclones simultaneously.

We obtained data from next-generation sequencing of several tumor samples taken from a single patient at a single time point, and developed a novel method that analyzes these data to accurately estimate the frequency and mutational content of subclones. We model the counts of the alternative allele with a binomial distribution, and infer the parameters that maximize the likelihood of the observed data. The outputs of our algorithm are the genotypes of all clones and their frequencies in each sample. Our results can be used to infer a phylogenetic tree that describes the evolution of the cancer in time, and also provide a map of the clonal heterogeneity of the cancer that describes how the tumor evolved anatomically.

Methodology

Clonal structure can be inferred from multiple sections of a breast cancer by a novel binomial model

Habil Zare1, Junfeng Wang2, Alex Hu1, Daniela Witten3, Anthony Blau2, and William S. Noble1

1 Department of Genome Sciences, University of Washington, Seattle, WA, USA
2 Division of Hematology, Department of Medicine, University of Washington, Seattle, WA, USA
3 Department of Biostatistics, University of Washington, Seattle, WA, USA

Results

Conclusion

Data

A collection of subsections is subjected to next-generation sequencing to measure counts of two alleles at each locus: the normal allele, which was observed in a matched normal sample at that locus, and a tumor allele. The resulting count matrices are provided as input to an inference procedure that estimates the clonal genotypes and frequencies. We developed a novel method that uses the EM algorithm to maximize the likelihood of the observed data, assuming the alternative allele counts follow a binomial distribution.
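The quantity being maximized can be illustrated with a short sketch. The poster does not give the exact parameterization, so the one used here is an assumption: genotypes are binary (clones x loci), frequencies are samples x clones, and mutations are heterozygous in a diploid genome, giving an alternate-allele probability of half the mutated-cell fraction. The function name `binomial_log_likelihood` is hypothetical, and the EM updates themselves are not shown.

```python
import math
import numpy as np

# Element-wise log-gamma, used for the log binomial coefficient.
_log_gamma = np.vectorize(math.lgamma)

def binomial_log_likelihood(alt, total, frequencies, genotypes, eps=1e-9):
    """Log-likelihood of alternate-allele counts under an assumed binomial model."""
    alt = np.asarray(alt, dtype=float)
    total = np.asarray(total, dtype=float)
    freqs = np.asarray(frequencies, dtype=float)
    genos = np.asarray(genotypes, dtype=float)
    # Assumed heterozygous diploid mutations: p = 0.5 * mutated-cell fraction.
    p = np.clip(0.5 * (freqs @ genos), eps, 1.0 - eps)
    log_coeff = (_log_gamma(total + 1) - _log_gamma(alt + 1)
                 - _log_gamma(total - alt + 1))
    return float(np.sum(log_coeff + alt * np.log(p)
                        + (total - alt) * np.log1p(-p)))

# One sample made entirely of a clone that carries the mutation:
# 5 alternate reads out of 10 fits the model far better than "no mutation".
ll_mutated = binomial_log_likelihood([[5]], [[10]], [[1.0]], [[1]])
ll_normal = binomial_log_likelihood([[5]], [[10]], [[1.0]], [[0]])
```

An optimizer such as EM would search over the frequencies and genotypes for the values that make this log-likelihood as large as possible.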

Clone frequencies vary smoothly across the tumor

Each panel plots, for a different section, the pattern of inferred clone frequencies across subsections. Subsections in white were not subjected to sequencing. The primary clone frequencies vary in a monotonic fashion as we traverse the sample from left to right.

Evolution of a breast cancer

The phylogenetic tree above shows the clonal phylogeny inferred from our deep sequencing data, assuming there are 4 clones. Nodes correspond to observed or inferred clonal populations, and edges are annotated with the mutations that occur between the parent and child clones. Two mutations are grouped into a colored box if they occur on the same branch in all four phylogenies inferred under the assumption of 3, 4, 5, or 6 clones (data not shown). The tree provides valuable insight into the development of each clone by ordering in time the mutations that led to that clone.
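One simple way to check whether a set of binary clonal genotypes is consistent with a tree is sketched below. The poster does not state how its tree was built, so this is an assumption: it supposes each mutation arises exactly once and is never lost, with an unmutated normal root. Under that assumption a tree exists exactly when, for every pair of mutations, the sets of clones carrying them are either nested or disjoint. The function name `is_tree_consistent` is hypothetical.

```python
from itertools import combinations

def is_tree_consistent(genotypes):
    """Check binary genotypes (clones x mutations) for a perfect phylogeny.

    Assumes each mutation arises once and is never lost, and that the
    root of the tree is an unmutated (normal) clone.
    """
    n_mutations = len(genotypes[0])
    # For each mutation, the set of clones that carry it.
    carriers = [
        frozenset(c for c, row in enumerate(genotypes) if row[m])
        for m in range(n_mutations)
    ]
    for a, b in combinations(carriers, 2):
        # Overlapping carrier sets must be nested, otherwise no tree fits.
        if a & b and not (a <= b or b <= a):
            return False
    return True

# Clone 0 is normal; clones 2 and 3 both descend from clone 1.
tree_like = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1]]
# Every pair of mutations overlaps without nesting, so no tree fits.
conflicting = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
```

Here `is_tree_consistent(tree_like)` returns `True` and `is_tree_consistent(conflicting)` returns `False`.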

We developed a method to infer the clonal structure of a single cancer from multiple samples of the same tumor. We provide three types of evidence that the inferred structure is accurate: (1) analysis of simulated data, (2) analysis of the inferred clone frequencies relative to tumor anatomy, and (3) consistency of the clonal genotypes with a phylogenetic tree. The inferred clonal architecture of a tumor may help in understanding its etiology and in designing appropriate treatments.