Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim


Page 1: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Computational Identification of Tumor heterogeneity

2015-03-25 Sangwoo Kim

Page 2: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Tumor heterogeneity

• Inter-tumor heterogeneity: genetic and phenotypic variation between individuals with the same tumor type

• Intra-tumor heterogeneity: subclonal diversity within a tumor

Page 3: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Tumor heterogeneity in AML

Page 4: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Tumor progression and response

Page 5: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Heterogeneity and resistance

Page 6: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Inferring tumor heterogeneity

1. single cell sequencing

2. bulk sequencing and reconstruction

Page 7: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

COMPUTATIONAL IDENTIFICATION OF TUMOR SUBCLONES

Page 8: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Today’s paper 1 (PyClone)

• Sohrab Shah, Ph.D.
– Associate Professor in the Departments of Pathology and Computer Science, University of British Columbia
– Dr. Shah’s work focuses on characterization of cancer genomes for determination of pathogenic driver mutations in cancer subtypes, and on measuring and quantifying tumour evolution

Page 9: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Conceptual overview

• Sequencing – pool sequencing – unclassified tools

Page 10: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Allele frequency and Cellular prevalence

• Allele frequency (af):
– ratio of alternative-allele copies to total haploid copies at the locus

• Cellular prevalence (cp):
– proportion of tumor cells harboring a mutation

Example 1: subclone 1 (AA) 70%, subclone 2 (AB) 30%
• allele frequency = 15%
• cellular prevalence = 30%

Example 2: subclone 1 (AA) 70%, subclone 2 (AAB) 30%
• allele frequency = 10%
• cellular prevalence = 30%

Page 11: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Allele frequency to cellular prevalence

A toy example:

Example      AF    Genotype   CP
mutation 1   10%   AB         20%
mutation 2   10%   AAB        30%
mutation 3   10%   ABB        15%
mutation 4   20%   AB         40%
mutation 5   20%   AABB       40%
mutation 6   50%   AB         100%
mutation 7   50%   ABB        75%

Genotype (copy number) is essential for heterogeneity estimation
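The CP column of the toy table is consistent with a simple conversion, CP = AF × c(g) / b(g), where c(g) is the number of allele copies in the genotype and b(g) is the number of variant (B) copies. This formula is inferred from the table values rather than stated on the slide, and it ignores that unmutated cells may carry a different copy number. The function name below is hypothetical. A minimal Python sketch:

```python
def toy_cellular_prevalence(af: float, genotype: str) -> float:
    """Toy conversion CP = AF * c(g) / b(g); 'B' marks a variant allele copy."""
    copies = len(genotype)             # c(g): total allele copies, e.g. AAB -> 3
    variants = genotype.count("B")     # b(g): variant allele copies, e.g. AAB -> 1
    if variants == 0:
        raise ValueError("genotype must contain at least one variant allele")
    return af * copies / variants


# Rows of the toy table: (name, allele frequency, genotype)
rows = [
    ("mutation 1", 0.10, "AB"),
    ("mutation 2", 0.10, "AAB"),
    ("mutation 3", 0.10, "ABB"),
    ("mutation 4", 0.20, "AB"),
    ("mutation 5", 0.20, "AABB"),
    ("mutation 6", 0.50, "AB"),
    ("mutation 7", 0.50, "ABB"),
]

for name, af, genotype in rows:
    cp = toy_cellular_prevalence(af, genotype)
    print(f"{name}: AF={af:.0%} genotype={genotype} CP={cp:.0%}")
```

Mutations 1 to 3 share the same allele frequency but map to three different cellular prevalences, which is the point of the slide: without the genotype, AF alone does not determine CP.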

Page 12: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Cellular prevalence and evolution model

Assumptions:

1) The clonal population follows a perfect phylogeny: no site mutates more than once in its evolutionary history, and each site harbors at most one somatic mutant genotype

2) The clonal population follows a persistent phylogeny: mutations do not disappear or revert

Page 13: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Cellular prevalence and evolution model

[Figure: evolution model diagram; labels 10%, 10%, 20%, 30%, 30%]

What to infer:

1) number and composition of subclones

2) cellular prevalence (cp): proportion of tumor cells harboring a mutation

Page 14: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Input and Output

• Input (observation):
– a set of deeply sequenced mutations (AF)
• from one or multiple loci in each sample
– a measure of allele-specific copy number at each mutation locus (genotype)

• Output:
– CP of each mutation
– clustering among mutations
– overall CP and cluster

[Figure: inputs CNV and mutation (AF); outputs clusters and CP]

Page 15: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

PyClone population structure

Allele frequency of this mutation: 6*4*(2/4) / {2*2 + 4*3 + 6*4} = 12/40 = 30%

Cellular prevalence of this mutation: 6 / (4 + 6) = 60%
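The two expressions on this slide can be reproduced by counting allele copies per population. The sketch below reads the arithmetic as 2 normal cells with 2 copies each, 4 reference tumor cells with 3 copies each, and 6 variant tumor cells with 4 copies each, 2 of them mutated. That reading is an assumption taken from the formula, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Population:
    cells: int            # number of cells in this population
    copy_number: int      # allele copies per cell at the locus
    variant_copies: int   # variant allele copies per cell
    is_tumor: bool        # False for the normal population


def allele_frequency(populations):
    """Variant allele copies divided by all allele copies in the sample."""
    variant = sum(p.cells * p.variant_copies for p in populations)
    total = sum(p.cells * p.copy_number for p in populations)
    return variant / total


def cellular_prevalence(populations):
    """Fraction of tumor cells that carry the mutation."""
    tumor = sum(p.cells for p in populations if p.is_tumor)
    mutated = sum(p.cells for p in populations if p.is_tumor and p.variant_copies > 0)
    return mutated / tumor


populations = [
    Population(cells=2, copy_number=2, variant_copies=0, is_tumor=False),  # normal
    Population(cells=4, copy_number=3, variant_copies=0, is_tumor=True),   # reference
    Population(cells=6, copy_number=4, variant_copies=2, is_tumor=True),   # variant
]

print(f"AF = {allele_frequency(populations):.2f}")
print(f"CP = {cellular_prevalence(populations):.2f}")
```

Normal cells dilute the allele frequency but do not enter the cellular prevalence, which is defined over tumor cells only.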

Page 16: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Things to consider

• fraction of cancer cells: t
– fraction of normal cells = 1 − t

• genotypes of the normal, reference, and variant populations of the nth mutation
– gN, gR, gV ∈ G = {-, A, B, AA, AB, BB, AAA, AAB, ...}
– $\psi_n = (g_N^n, g_R^n, g_V^n) \in G^3$

• read depth at the locus of the nth mutation: dn

• number of reads harboring the nth mutation: bn

• cellular prevalence of the nth mutation: φn

Page 17: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

The generative model

prior parameter

posterior parameter

$\psi_n = (g_N^n, g_R^n, g_V^n)$

φn = fraction of cancer cells from the variant populations

Page 18: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

The probability

The probability of sampling a read containing the variant allele covering a mutation with state ψ = (gN, gR, gV) and cellular prevalence φ

c(g): copy number of the genotype (e.g. g = AAB, c(g) = 3)
b(g): number of variant alleles in the genotype (e.g. g = AAB, b(g) = 1)
µ(g): probability of sampling a variant allele from a cell = b(g)/c(g)
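The formula itself did not survive in the transcript. One way to assemble it from the definitions here is to weight each population (normal, reference, variant) by its cell fraction and copy number, then take the share of variant copies. The sketch below assumes that form, with t as the tumor fraction. It agrees with the arithmetic on the population structure slide, but the function names and the string encoding of genotypes are illustrative choices, not taken from the source.

```python
def copy_number(g: str) -> int:
    """c(g): number of allele copies; '-' denotes a genotype with zero copies."""
    return 0 if g == "-" else len(g)


def variant_copies(g: str) -> int:
    """b(g): number of variant (B) allele copies."""
    return g.count("B")


def mu(g: str) -> float:
    """mu(g) = b(g) / c(g); defined as 0 when the genotype has no copies."""
    c = copy_number(g)
    return variant_copies(g) / c if c > 0 else 0.0


def variant_read_probability(g_n: str, g_r: str, g_v: str, phi: float, t: float) -> float:
    """Assumed form: variant allele copies over all allele copies, weighted by
    the fractions of normal (1 - t), reference t(1 - phi), and variant t*phi cells."""
    weighted = [
        (1.0 - t, g_n),          # normal cells
        (t * (1.0 - phi), g_r),  # tumor cells without the mutation
        (t * phi, g_v),          # tumor cells with the mutation
    ]
    numerator = sum(w * copy_number(g) * mu(g) for w, g in weighted)
    denominator = sum(w * copy_number(g) for w, g in weighted)
    return numerator / denominator


# 2 normal (AA), 4 reference (AAA), 6 variant (AABB) cells: t = 10/12, phi = 6/10
print(variant_read_probability("AA", "AAA", "AABB", phi=0.6, t=10 / 12))
```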

Page 19: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

The probability of bn

When cp is given, we can calculate the probability of observing bn variant reads out of the dn reads at the locus
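As an illustration of this step, the sketch below treats bn as binomially distributed over dn reads, with success probability equal to the variant-read probability for a given cp. The binomial form is an assumption here (the slide's equation is not preserved), and the genotype state is fixed rather than summed over a prior. It scans a grid of candidate cp values and returns the one with the highest likelihood. All function names are hypothetical.

```python
import math


def log_binomial_pmf(b: int, d: int, p: float) -> float:
    """Log-probability of b variant reads out of d reads with success probability p."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite at the boundaries
    log_choose = math.lgamma(d + 1) - math.lgamma(b + 1) - math.lgamma(d - b + 1)
    return log_choose + b * math.log(p) + (d - b) * math.log(1.0 - p)


def variant_read_probability(phi, t, c_n, c_r, c_v, b_v):
    """Share of variant allele copies, assuming only the variant population
    (copy number c_v, b_v variant copies) carries the mutation."""
    total_copies = (1 - t) * c_n + t * (1 - phi) * c_r + t * phi * c_v
    return t * phi * b_v / total_copies


def best_phi_on_grid(b, d, t, c_n, c_r, c_v, b_v, steps=100):
    """Return the grid value of phi in [0, 1] that maximizes the binomial likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(
        grid,
        key=lambda phi: log_binomial_pmf(
            b, d, variant_read_probability(phi, t, c_n, c_r, c_v, b_v)
        ),
    )


# 300 variant reads out of 1000, tumor fraction 10/12, genotypes AA / AAA / AABB
best = best_phi_on_grid(b=300, d=1000, t=10 / 12, c_n=2, c_r=3, c_v=4, b_v=2)
print(f"most likely cp on the grid: {best:.2f}")
```

Estimating each mutation independently like this yields one cp value per mutation, so grouping the values into subclones remains a separate problem.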

Page 20: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

inferring cp from bn

1. Mutations with the same cellular prevalence are clustered into the same clone

2. We want to infer the most likely cellular prevalence of each mutation from the observations, and find clusters for subclones, e.g., if the best estimate is [0.7, 0.5, 0.5, 0.4, 0.2, 0.5, 1.0, 0.9, 0.1, 0.4]

always problematic!!

Page 21: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Getting cp by sampling

• Cp prior ~ Dirichlet process
– to have discrete cp values

• Sampling:
– Metropolis-Hastings algorithm

Let f(x) be a function that is proportional to the desired probability distribution P(x).

1. Initialization:
• Choose an arbitrary point x0 to be the first sample, and choose an arbitrary probability density Q(x|y) which suggests a candidate for the next sample value x, given the previous sample value y. For the Metropolis algorithm, Q must be symmetric; in other words, it must satisfy Q(x|y) = Q(y|x). A usual choice is to let Q(x|y) be a Gaussian distribution centered at y, so that points closer to y are more likely to be visited next, making the sequence of samples into a random walk. The function Q is referred to as the proposal density or jumping distribution.

2. For each iteration t:
• Generate a candidate x' for the next sample by picking from the distribution Q(x'|xt).
• Calculate the acceptance ratio α = f(x')/f(xt), which will be used to decide whether to accept or reject the candidate. Because f is proportional to the density of P, we have that α = f(x')/f(xt) = P(x')/P(xt).
• If α ≥ 1, then the candidate is more likely than xt; automatically accept the candidate by setting xt+1 = x'. Otherwise, accept the candidate with probability α; if the candidate is rejected, set xt+1 = xt instead.
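The steps above can be sketched in a few lines of Python. This is a generic random-walk Metropolis sampler with a symmetric Gaussian proposal, applied to an unnormalized standard normal as a stand-in target. It is not PyClone's sampler, and the function names, step size, and seed are illustrative assumptions.

```python
import math
import random


def metropolis(f, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: f is proportional to the target density,
    and f(x0) must be positive."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        candidate = rng.gauss(x, step)   # symmetric proposal Q(x'|x_t) centered at x_t
        alpha = f(candidate) / f(x)      # acceptance ratio f(x') / f(x_t)
        if alpha >= 1.0 or rng.random() < alpha:
            x = candidate                # accept the candidate
        samples.append(x)                # on rejection the previous value is repeated
    return samples


def unnormalized_normal(x):
    """Proportional to a standard normal density (normalizing constant omitted)."""
    return math.exp(-0.5 * x * x)


samples = metropolis(unnormalized_normal, x0=0.0, n_samples=20000)
kept = samples[1000:]                    # discard early samples as burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"sample mean = {mean:.2f}, sample variance = {var:.2f}")
```

Only ratios of f are ever evaluated, so the normalizing constant of the target is never needed. That is what makes the method usable for posteriors known only up to a constant.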

Page 22: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

example of cluster

Page 23: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

results (synthetic data)

• accuracy with synthetic data
– di ~ Poisson(10,000), t = 0.75, 8 clusters with CP ~ Uniform(0,1), genotype -> total copy number (1~5)
– AB, BB, NZ, TCN, PCN -> genotype prior (goto 17p)

Page 24: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

results (synthetic data)

Page 25: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

prior for mutational genotype

• copy number must be measured
– for each mutation site:
• c = total copy number
• c1, c2 = copy number of each homologous chromosome

• 5 different strategies for assigning genotype
– AB prior: gR=AA, gV=AB
– BB prior: gR=AA, gV=BB
– No Zygosity (NZ) prior: gR=AA, c(gV)=c, b(gV)=1
– Total Copy Number (TCN) prior: c(gV)=c, b(gV) ∈ {1, ..., c}
• gR=AA or c(gR)=c, b(gR)=0
– Parental Copy Number (PCN) prior: c(gV)=c, b(gV) ∈ {1, c1, c2}
• if b(gV) ∈ {c1, c2}, gR=gN (AA) => mutation occurred before copy number increase
• if b(gV)=1 (with c(gR)=c, b(gR)=0) => mutation occurred after copy number increase

Examples: c=4, c1=c2=2; c=3, c1=1, c2=2
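To make the five strategies concrete, the sketch below enumerates candidate (gR, gV) pairs for each prior, reading c as the total copy number and c1, c2 as the copy numbers of the two homologous chromosomes (as in the examples c=4, c1=c2=2). Each genotype is encoded as a pair (copy number, variant copies). The PCN branch follows one reading of the slide's bullets, and the function name and encoding are hypothetical rather than PyClone's interface.

```python
NORMAL = (2, 0)  # genotype AA encoded as (copy number, variant copies)


def candidate_states(prior, c=2, c1=1, c2=1):
    """Return a list of (gR, gV) pairs allowed under the named prior."""
    if prior == "AB":
        return [(NORMAL, (2, 1))]
    if prior == "BB":
        return [(NORMAL, (2, 2))]
    if prior == "NZ":
        # total copy number known, exactly one variant copy assumed
        return [(NORMAL, (c, 1))]
    if prior == "TCN":
        # any number of variant copies from 1 to c; gR is AA or has c copies
        references = sorted({NORMAL, (c, 0)})
        return [(g_r, (c, b)) for b in range(1, c + 1) for g_r in references]
    if prior == "PCN":
        states = []
        # mutation before the copy number change: variant copies match a homolog
        for b in sorted({c1, c2} - {0}):
            states.append((NORMAL, (c, b)))
        # mutation after the copy number change: a single variant copy
        states.append(((c, 0), (c, 1)))
        return states
    raise ValueError(f"unknown prior: {prior}")


for name in ("AB", "BB", "NZ", "TCN", "PCN"):
    print(name, candidate_states(name, c=3, c1=1, c2=2))
```

The more copy number information a prior uses, the fewer states it has to consider. For c=4, c1=c2=2, TCN yields eight candidate states while PCN yields two.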

Page 26: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

results (real data)

Data = physical mixture of 4 individuals (from 1000 Genomes), mixture proportions {0.01, 0.05, 0.20, 0.74}

- NA12156, NA12878, NA18507, NA19240
- generated 7 clusters (4 unique, NA18507+NA19240, NA12878+NA18507+NA19240, all four shared)

BeBin = Beta-Binomial (instead of binomial) to emulate over-dispersion
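To show what over-dispersion means here, the sketch below compares the variance of a binomial with that of a beta-binomial that has the same mean. The mean/precision parametrization (alpha = mean × precision, beta = (1 − mean) × precision) and the example values are illustrative assumptions, and the function names are hypothetical.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_choose(n: int, k: int) -> float:
    """Logarithm of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def binomial_pmf(k: int, n: int, p: float) -> float:
    return math.exp(log_choose(n, k) + k * math.log(p) + (n - k) * math.log(1 - p))


def beta_binomial_pmf(k: int, n: int, mean: float, precision: float) -> float:
    """Beta-binomial with alpha = mean * precision, beta = (1 - mean) * precision."""
    a = mean * precision
    b = (1 - mean) * precision
    return math.exp(log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))


def variance(pmf, n: int) -> float:
    """Variance of a distribution over 0..n given its probability mass function."""
    m = sum(k * pmf(k) for k in range(n + 1))
    return sum((k - m) ** 2 * pmf(k) for k in range(n + 1))


n, p, s = 100, 0.3, 50.0  # read depth, expected variant fraction, precision
var_bin = variance(lambda k: binomial_pmf(k, n, p), n)
var_bb = variance(lambda k: beta_binomial_pmf(k, n, p, s), n)
print(f"binomial variance:      {var_bin:.2f}")
print(f"beta-binomial variance: {var_bb:.2f}")
```

A smaller precision spreads the read counts further around the same mean, and as the precision grows the beta-binomial approaches the binomial.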

Page 27: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

results (real data)

True answer

Pyclone (7 clusters)

naïve (12 clusters): false separation of clusters with homozygous and heterozygous mutations

cluster1

Page 28: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

result (ovarian cancer)

Four spatially sampled high-grade serous ovarian cancer -> 49 deeply sequenced validated mutations

LOH

hetero

CNV1~3

IBBMM cluster 1,2,6 should be collapsed to PyClone cluster 1 => single cell sequencing of 25

Page 29: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

result (ovarian cancer)

IBBMM clusters 1 and 2 are one cluster (as PyClone expected)

PyClone cluster (yellow box = cluster 1)

IBBMM

non-somatic

Page 30: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Conclusions

• PyClone can infer clonal population structures in cancer

1. Using beta-binomial emission densities, which model data sets with more variance in allelic prevalence measurements more effectively than a binomial model.

2. Flexible prior probability estimates ('priors') of possible mutational genotypes are used, reflecting how allelic prevalence measurements are deterministically linked to zygosity and coincident copy-number variation events.

3. Bayesian nonparametric clustering is used to discover groupings of mutations and the number of groups simultaneously. This obviates fixing the number of groups a priori and allows for cellular prevalence estimates to reflect uncertainty in this parameter.

4. Multiple samples from the same cancer may be analyzed jointly to leverage the scenario in which clonal populations are shared across samples.

Page 31: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

Software

• Implemented in Python

• Freely available at

– http://compbio.bccrc.ca/software/request-to-download/?sw=pyClone

• License: GPL3 (free for academic use)

Page 32: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim
Page 33: Computational Identification of Tumor heterogeneity 2015-03-25 Sangwoo Kim

V-measure