-
Katharina Jahn, ETH Zurich, Basel With Jack Kuipers and Niko Beerenwinkel
September 12, 2016
Reconstructing mutation histories from single-cell data
-
2
Intra-tumour heterogeneity
[Figure: heterogeneous tumour, clonal expansion and mutation tree, drawn along a time axis]
-
§ Bulk sequencing data
  § Mixture of hundreds of thousands of cells
  § Deconvolution of admixed mutation profiles
  § Limited resolution: no low-frequency subclones, limited #subclones
  § Most data is of this type
§ Single-cell sequencing data
  § No deconvolution necessary
  § Higher error rates
  § Subsampling
  § Few data sets available
3
Why single-cell data?
-
§ Infinite sites assumption: no recurrent mutations, no back mutations
4
Single-cell phylogenies
[Figure panels: cell lineage tree; mutation tree; mutation tree with samples attached]
-
§ Mutation matrix: binary character state matrix
§ True matrix E forms perfect phylogeny
§ Observed matrix D contains noise
§ FN rate between 10% and 45%
5
Single-cell mutation matrices
True matrix E:
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 0 0 0 0
0 0 0 0 1 1 1
0 0 0 0 1 1 0

Observed matrix D ("-" marks a missing entry):
1 1 - 1 1 - 1
1 1 1 1 0 1 1
1 1 1 0 0 0 0
1 0 0 0 1 1 1
0 0 - 0 0 1 0

[Axis labels in both panels: cells, mutations]
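A minimal sketch of what "forms a perfect phylogeny" means for these matrices. It takes the clean 5 × 7 matrix on this slide as E and the noisy one as D, and assumes that rows are mutations and columns are cells. The function name `is_perfect_phylogeny` is hypothetical, and treating a missing entry as "not observed mutated" is a simplification made only for this illustration.

```python
from itertools import combinations

def is_perfect_phylogeny(matrix):
    """Return True if every pair of mutation rows is nested or disjoint."""
    # Assumption: each row is one mutation; collect the cells carrying it.
    carriers = [{j for j, v in enumerate(row) if v == 1} for row in matrix]
    for a, b in combinations(carriers, 2):
        overlap = a & b
        # Under infinite sites, two mutations either share no cells
        # or one carrier set contains the other.
        if overlap and overlap != a and overlap != b:
            return False
    return True

E = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 0],
]
# Noisy matrix from the slide; None stands for a missing entry ("-").
D = [
    [1, 1, None, 1, 1, None, 1],
    [1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 1, 1],
    [0, 0, None, 0, 0, 1, 0],
]

print(is_perfect_phylogeny(E), is_perfect_phylogeny(D))
```

The noisy matrix fails the check already on its first two rows, which is why a probabilistic error model is needed rather than a direct tree construction from D.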
-
6
Error model
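A sketch of the per-entry error model, assuming (as on the evaluation slides) that α denotes the false positive rate and β the false negative rate, and that E_ij and D_ij are the true and observed state of mutation i in sample j:

```latex
\begin{aligned}
P(D_{ij}=1 \mid E_{ij}=0) &= \alpha,  & P(D_{ij}=0 \mid E_{ij}=0) &= 1-\alpha, \\
P(D_{ij}=0 \mid E_{ij}=1) &= \beta,   & P(D_{ij}=1 \mid E_{ij}=1) &= 1-\beta.
\end{aligned}
```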
-
§ Model (T, σ, θ)
  § Mutation tree T
  § Attachment of samples σ
  § Error rates θ = (α, β)
§ Basic assumptions
  § Infinite sites
  § Independence of observational errors
7
Model for Learning Mutation Histories
[Figure: example mutation tree with root R, mutations M1 to M3 and attached samples s1 to s7]
-
§ Given mutation matrix D for n mutations and m samples
§ Likelihood
§ Posterior
8
Model for Learning Mutation Histories
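A sketch of how the two quantities named on this slide can be written under the stated assumptions (infinite sites, independent observational errors). Here E is the mutation matrix implied by the tree T and the attachment σ; treating missing entries as contributing a factor of 1 is an assumption of this sketch.

```latex
\begin{aligned}
P(D \mid T, \sigma, \theta) &= \prod_{i=1}^{n} \prod_{j=1}^{m} P(D_{ij} \mid E_{ij}), \\
P(T, \sigma, \theta \mid D) &\propto P(D \mid T, \sigma, \theta)\, P(T, \sigma, \theta).
\end{aligned}
```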
-
9
Marginalization of sample attachment
-
10
Marginalization of sample attachment
O(mn)
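One way to sum out the sample attachment in O(mn) time is sketched below. It is an illustration and not the SCITE code: the parent-vector encoding of the tree (`parent[i]` is the parent of mutation i, `-1` is the root), the uniform prior over the n + 1 attachment points, and the function name are all assumptions of this sketch.

```python
import math

def marginal_log_likelihood(D, parent, alpha, beta):
    """Log-likelihood of D given a mutation tree, attachments summed out.

    D[i][j] is 1, 0 or None (missing) for mutation i in sample j.
    parent[i] is the parent of mutation i; -1 denotes the root.
    """
    n, m = len(D), len(D[0])
    # Order the mutations so that every parent precedes its children.
    children = {v: [] for v in range(-1, n)}
    for v, p in enumerate(parent):
        children[p].append(v)
    order, stack = [], [-1]
    while stack:
        v = stack.pop()
        if v >= 0:
            order.append(v)
        stack.extend(children[v])

    def logp(obs, true):
        if obs is None:  # missing entries carry no information
            return 0.0
        if true == 1:
            return math.log(1 - beta) if obs == 1 else math.log(beta)
        return math.log(alpha) if obs == 1 else math.log(1 - alpha)

    total = 0.0
    for j in range(m):
        # Attached to the root, sample j carries no mutation at all.
        score = {-1: sum(logp(D[i][j], 0) for i in range(n))}
        for v in order:
            # Moving the attachment from parent[v] to v switches mutation v on.
            score[v] = score[parent[v]] + logp(D[v][j], 1) - logp(D[v][j], 0)
        # Uniform prior over the n + 1 attachment points, summed in log space.
        top = max(score.values())
        log_sum = top + math.log(sum(math.exp(s - top) for s in score.values()))
        total += log_sum - math.log(n + 1)
    return total
```

Each sample costs one pass over the n mutations for the root score and one pass over the tree, so the whole matrix is scored in O(mn).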
-
§ For n mutations and m samples
§ After marginalization
§ Independent of number of samples
11
Search Space Size
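The counts behind this slide can be checked numerically. The sketch below assumes the standard Cayley-type count of (n + 1)^(n − 1) rooted trees on n labelled mutations plus a fixed root, and n + 1 possible attachment points per sample; the function names are hypothetical.

```python
def num_mutation_trees(n):
    """Rooted trees on n labelled mutations plus a fixed root: (n + 1)^(n - 1)."""
    return (n + 1) ** (n - 1)

def num_trees_with_attachments(n, m):
    """Before marginalization, each of m samples picks one of n + 1 nodes."""
    return num_mutation_trees(n) * (n + 1) ** m

print(num_mutation_trees(11))             # about 61.9 billion
print(num_trees_with_attachments(11, 5))  # grows by a factor of 12 per sample
```

For n = 11 the first count is 61,917,364,224, in line with the "61 billion trees" quoted for the 11-mutation example later in the talk. After marginalization only the first function matters, which is why the search space no longer depends on m.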
-
§ Moves in joint (T,θ) space:
§ Transition probability:
§ Acceptance probability:
§ Ergodic mixture of moves
§ Markov chain converges to posterior distribution
12
MCMC Scheme
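For orientation, the standard Metropolis-Hastings acceptance probability for a proposed move from (T, θ) to (T', θ') is sketched below. The symbols q (transition probability) and ρ (acceptance probability) are assumed notation, and P(D | T, θ) is the likelihood with the sample attachment summed out.

```latex
\rho = \min\left\{ 1,\;
  \frac{q(T, \theta \mid T', \theta')\; P(D \mid T', \theta')\; P(T', \theta')}
       {q(T', \theta' \mid T, \theta)\; P(D \mid T, \theta)\; P(T, \theta)} \right\}
```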
-
§ Change (T, θ) component-wise
§ Tree moves: e.g. prune & reattach
§ θ moves: Gaussian random walk
13
MCMC moves
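The two move types can be sketched as follows. This is an illustration under assumptions, not the SCITE implementation: the tree is encoded as a parent vector (`-1` is the root), the subtree and its new parent are drawn uniformly, and the random walk is reflected at 0 and 1 to keep the error rate valid. All function names and the default step size are hypothetical.

```python
import random

def descendants(parent, v):
    """All nodes in the subtree rooted at v, including v itself."""
    sub, changed = {v}, True
    while changed:
        changed = False
        for node, p in enumerate(parent):
            if p in sub and node not in sub:
                sub.add(node)
                changed = True
    return sub

def prune_and_reattach(parent, rng=random):
    """Move a random subtree to a random node outside that subtree."""
    n = len(parent)
    v = rng.randrange(n)
    blocked = descendants(parent, v)
    # Valid targets: the root (-1) or any mutation that is not below v.
    targets = [-1] + [u for u in range(n) if u not in blocked]
    proposal = list(parent)
    proposal[v] = rng.choice(targets)
    return proposal

def gaussian_step(beta, sd=0.05, rng=random):
    """Random-walk proposal for an error rate, reflected into [0, 1]."""
    x = beta + rng.gauss(0.0, sd)
    while x < 0.0 or x > 1.0:
        x = -x if x < 0.0 else 2.0 - x
    return x

rng = random.Random(1)
tree = [-1, 0, 1, 1]  # root -> M0 -> M1 -> {M2, M3}
print(prune_and_reattach(tree, rng), gaussian_step(0.1, rng=rng))
```

Excluding the pruned subtree from the target list is what keeps the proposal a tree; the move may also return the unchanged tree when the old parent is drawn again.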
-
§ Maximum a posteriori
§ Maximum likelihood
14
Point estimates
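A sketch of the two point estimates in the notation of the model slides. Both are written over (T, θ) here; whether the sample attachment σ is summed out or maximised alongside T and θ is left open in this sketch.

```latex
\begin{aligned}
(T, \theta)_{\mathrm{MAP}} &= \arg\max_{(T, \theta)} P(T, \theta \mid D), \\
(T, \theta)_{\mathrm{ML}}  &= \arg\max_{(T, \theta)} P(D \mid T, \theta).
\end{aligned}
```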
-
§ Current datasets: often few samples, high error rates
§ Flat posterior, global optimum hard to find
§ #mutations > #samples
§ Idea: Use binary cell lineage trees
§ #binary leaf-labeled trees with m leaves vs. #mutation trees with n mutations
15
Alternative tree representation
-
§ For ML trees only
§ Tree scoring still in O(mn)
16
Alternative tree representation
[Figure: the same example with mutations M1 to M11 and samples s1 to s5, drawn in both tree representations. Captions: 61 billion trees (mutation trees), 180 trees (binary trees)]
-
§ Accuracy vs. #samples
  § ML trees
  § 40 mutations
  § False positive rate α = 10⁻⁵
§ Accuracy vs. missing data
  § ML trees
  § 20 mutations
  § False negative rate β = 0.1
17
Evaluation of SCITE on simulated data
-
§ 20 mutations, ML trees
Effect of doublet samples
-
§ SCITE, Jahn, Kuipers et al., 2016
§ KS, Kim & Simon, 2014
§ BP, BitPhylogeny, Yuan et al., 2015
§ PW, PhyloWGS, Deshwar et al., 2015
§ AT, AncesTree, El-Kebir et al., 2015
Comparison to previous approaches
Δd = normalized consensus node-based shortest path distance (Yuan et al. 2015)
n=20
-
§ Wang et al., Nature 2014
§ nuc-seq of 47 cells
§ 40 mutations
§ 1.4% missing data
§ α = 1.24 × 10⁻⁶
§ β = 0.097
20
ER+ breast tumor
-
[Figure: tree for the ER+ breast tumor. Nodes are labelled with mutated genes (among them CASP3, PIK3CA, FGFR2 and TRIB2) and with placeholder labels "dummy0" to "dummy10"; the cells s0 to s46 are attached to the nodes.]
§ ML tree
21
ER+ breast tumor
-
§ Sampling from posterior: false negative rate
§ Mean β more than twice the rate estimated by Wang et al.
§ False negative rate > allelic drop-out rate
22
ER+ breast tumor
-
§ Sampling from posterior: branchiness
23
ER+ breast tumor
-
§ Hou et al., Cell 2012
§ WES of 58 cancer cells
§ 18 selected mutations
§ 45% missing data
§ α = 6.04 × 10⁻⁵, β = 0.43
24
Myeloproliferative neoplasm
-
§ MAP tree
§ Sampling from posterior distribution: branchiness
25
Myeloproliferative neoplasm
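[Figure: MAP tree over the 18 selected mutations, listed in the order they appear in the figure: DNAJC17, ABCB5, SESN2, PDE4DIP, DLEC1, NTRK1, DMXL1, TOP1MT, ST13, ANAPC1, ARHGAP5, ASNS, MLL3, FAM115C, RETSAT, USP32, FRG1, PABPC1]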
-
§ Sampling from posterior distribution: false negative rate
26
Myeloproliferative neoplasm
-
27
Myeloproliferative neoplasm
§ ML tree from larger set of mutations
§ 78 mutations, 58 samples
§ Search performed in binary tree representation
§ Same overall structure
§ Order of mutations changes slightly
§ But the order is determined by only a few samples
-
§ SCITE: Single-cell based inference of tumor evolution https://github.com/cbg-ethz/SCITE
§ Genome Biology 2016 17:86 (Special Issue: Single-Cell Omics)
§ Robust against various types of noise
§ Posterior computation scales linearly with #samples
§ Search space size independent of #samples
§ Many mutations, few cells? SCITE on binary phylogenies
§ Observation: no branchings in upper part of tree
28
Conclusion
-
§ Testing infinite sites assumption (Jack’s talk)
§ Connect with spatial information (Mykola’s talk)
§ Modeling of doublets (Jack’s talk)
§ Joint use of bulk and single-cell data
§ Use of variant allele frequencies as data
§ Integration of copy number changes
29
Outlook
-
§ Jack Kuipers
§ Niko Beerenwinkel
30
Acknowledgements
Thank you for your attention!
-
Comparison to previous approaches
Δd = normalized consensus node-based shortest path distance (Yuan et al. 2015)
Modified from https://scientificbsides.files.wordpress.com/2015/02/comparingclonaltrees-idea2.png?w=1500&h=1163