Paula Tataru
Conditional expectations of sufficient statistics for continuous time Markov chains
Joint work with Asger Hobolth
November 28, 2012
Motivation
Methods
Results
Paula Tataru
2/30
Conditional expectations of sufficient statistics for CTMCs
Continuous-time Markov chains
A stochastic process {X(t) | t ≥ 0}
that fulfills the Markov property
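The formula on this slide is not preserved in the transcript. The Markov property is usually stated as follows; the symbols s, t ≥ 0, the states i, j, and the past trajectory x(u) are notation assumed here, not taken from the slide:

```latex
P\bigl(X(s+t) = j \mid X(s) = i,\; X(u) = x(u),\; 0 \le u < s\bigr)
  = P\bigl(X(s+t) = j \mid X(s) = i\bigr)
```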
• Credit risk
• States: different types of ratings
• Chemical reactions
• States: different states of a molecule
• DNA data
• States:
  • Nucleotides
  • Amino acids
  • Codons
Modeling DNA data
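1st base | 2nd base: T | C       | A        | G        | 3rd base
T        | TTT Phe     | TCT Ser | TAT Tyr  | TGT Cys  | T
T        | TTC Phe     | TCC Ser | TAC Tyr  | TGC Cys  | C
T        | TTA Leu     | TCA Ser | TAA Stop | TGA Stop | A
T        | TTG Leu     | TCG Ser | TAG Stop | TGG Trp  | G
C        | CTT Leu     | CCT Pro | CAT His  | CGT Arg  | T
C        | CTC Leu     | CCC Pro | CAC His  | CGC Arg  | C
C        | CTA Leu     | CCA Pro | CAA Gln  | CGA Arg  | A
C        | CTG Leu     | CCG Pro | CAG Gln  | CGG Arg  | G
A        | ATT Ile     | ACT Thr | AAT Asn  | AGT Ser  | T
A        | ATC Ile     | ACC Thr | AAC Asn  | AGC Ser  | C
A        | ATA Ile     | ACA Thr | AAA Lys  | AGA Arg  | A
A        | ATG Met     | ACG Thr | AAG Lys  | AGG Arg  | G
G        | GTT Val     | GCT Ala | GAT Asp  | GGT Gly  | T
G        | GTC Val     | GCC Ala | GAC Asp  | GGC Gly  | C
G        | GTA Val     | GCA Ala | GAA Glu  | GGA Gly  | A
G        | GTG Val     | GCG Ala | GAG Glu  | GGG Gly  | G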
• Nucleotides (n = 4)
• Amino acids (n = 20)
• Codons (n = 64)
Continuous-time Markov chains
• Characterized by a rate matrix
with properties
Modeling DNA data
• Nucleotides (n = 4)
• Jukes Cantor (JC)
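The rate matrix on this slide is not preserved in the transcript. A minimal sketch of a Jukes-Cantor rate matrix and of the usual rate-matrix properties (non-negative off-diagonal entries, rows summing to zero) is given here. The single rate parameter `alpha` and the helper names `jukes_cantor` and `is_rate_matrix` are assumptions made for illustration.

```python
import numpy as np

def jukes_cantor(alpha: float, n: int = 4) -> np.ndarray:
    """Rate matrix with every off-diagonal rate equal to alpha (assumed parametrization)."""
    Q = np.full((n, n), alpha, dtype=float)
    np.fill_diagonal(Q, -(n - 1) * alpha)   # diagonal makes each row sum to zero
    return Q

def is_rate_matrix(Q: np.ndarray, tol: float = 1e-12) -> bool:
    """Check non-negative off-diagonal entries and zero row sums."""
    off_diag = Q - np.diag(np.diag(Q))
    rows_sum_to_zero = np.allclose(Q.sum(axis=1), 0.0, atol=tol)
    return bool(np.all(off_diag >= 0) and rows_sum_to_zero)

Q = jukes_cantor(alpha=0.25)
print(Q)
print(is_rate_matrix(Q))
```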
Continuous-time Markov chains
• Waiting time is exponentially distributed
• Jumps are discretely distributed
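The two bullets above translate directly into a simulation recipe: wait an exponential time with rate q_i = -q_ii, then jump to state j with probability q_ij / q_i. The sketch below assumes a rate matrix given as a list of lists; the function name `simulate_ctmc` and the Jukes-Cantor example with rate `alpha` are illustrative choices, not taken from the slides.

```python
import random

def simulate_ctmc(Q, start, t_end, rng):
    """Simulate one path on [0, t_end); returns a list of (jump_time, state)."""
    path = [(0.0, start)]
    t, state = 0.0, start
    while True:
        rate_out = -Q[state][state]            # q_i, total rate of leaving `state`
        if rate_out <= 0.0:                    # absorbing state: no more jumps
            break
        t += rng.expovariate(rate_out)         # waiting time ~ Exp(q_i)
        if t >= t_end:
            break
        targets = [j for j in range(len(Q)) if j != state]
        weights = [Q[state][j] for j in targets]   # jump to j w.p. q_ij / q_i
        state = rng.choices(targets, weights=weights)[0]
        path.append((t, state))
    return path

alpha = 0.25
Q = [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]
path = simulate_ctmc(Q, start=0, t_end=10.0, rng=random.Random(1))
print(path[:5])
```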
General EM
• Full log likelihood
• E-step
• M-step
EM for Jukes Cantor
• Full log likelihood
• M-step
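The formulas on this slide are not preserved in the transcript. As a sketch, assume the Jukes-Cantor model on n states has a single rate α for every off-diagonal entry, and write N for the total number of jumps and T for the total time spent in all states (both symbols are assumptions made here). The full log likelihood of a completely observed path and the resulting M-step would then read:

```latex
\ell(\alpha) = N \log \alpha - (n-1)\,\alpha\,T
\qquad\Longrightarrow\qquad
\frac{\partial \ell}{\partial \alpha} = \frac{N}{\alpha} - (n-1)\,T = 0
\qquad\Longrightarrow\qquad
\hat{\alpha} = \frac{N}{(n-1)\,T}

% In the EM algorithm the unobserved N and T are replaced by their
% conditional expectations given the data (E-step):
\hat{\alpha} = \frac{\mathbb{E}[N \mid \text{data}]}{(n-1)\,\mathbb{E}[T \mid \text{data}]}
```

Under this parametrization the sufficient statistics are exactly the number of jumps and the time spent in states, which is why their conditional expectations are needed.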
Calculating expectations
Linear combinations
Methods
• In Hobolth & Jensen (2010), three methods are presented for evaluating the necessary statistics
• Eigenvalue decomposition (EVD)
• Uniformization (UNI)
• Exponentiation (EXPM)
• Extend efficiently to linear combinations
• Which method is more accurate?
• Which method is faster?
Eigenvalue decomposition
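The formulas on this slide are not preserved in the transcript. The sketch below assumes the usual integral representation for endpoint-conditioned statistics, I[a, b, c, d] = ∫₀ᵗ P_ac(s) P_db(t − s) ds, and evaluates it from a decomposition Q = U Λ U⁻¹, which is assumed to exist. The function name `evd_integrals` and the array layout are illustrative assumptions.

```python
import numpy as np

def evd_integrals(Q, t):
    """Return P(t) and I[a, b, c, d] = int_0^t P_ac(s) P_db(t - s) ds."""
    lam, U = np.linalg.eig(Q)                 # assumes Q is diagonalizable
    U_inv = np.linalg.inv(U)
    exp_lam = np.exp(lam * t)
    # J[i, j] = int_0^t exp(lam_i s) exp(lam_j (t - s)) ds
    diff = lam[:, None] - lam[None, :]
    same = np.isclose(diff, 0.0)              # (near-)equal eigenvalues
    safe = np.where(same, 1.0, diff)          # avoid division by zero
    J = np.where(same, t * exp_lam[:, None],
                 (exp_lam[:, None] - exp_lam[None, :]) / safe)
    P = (U * exp_lam) @ U_inv                 # U diag(exp(lam t)) U^-1
    # I[a,b,c,d] = sum_ij U[a,i] U_inv[i,c] U[d,j] U_inv[j,b] J[i,j]
    I = np.einsum("ai,ic,dj,jb,ij->abcd", U, U_inv, U, U_inv, J)
    return P.real, I.real

alpha, t = 0.25, 2.0
Q = np.full((4, 4), alpha)
np.fill_diagonal(Q, -3 * alpha)               # Jukes-Cantor example
P, I = evd_integrals(Q, t)
a, b = 0, 1
# Expected time in each state c, given X(0) = a and X(t) = b
expected_time = np.array([I[a, b, c, c] for c in range(4)]) / P[a, b]
print(expected_time, expected_time.sum())     # the expected times add up to t
```

In the same notation, the expected number of jumps from c to d given the endpoints would be `Q[c, d] * I[a, b, c, d] / P[a, b]`. The five-operand `einsum` is fine for n = 4 but is not tuned for larger state spaces such as codon models.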
Uniformization
• Choose truncation point s(μt)
• Bound error using the tail of Pois(m+1; μt)
• s(μt) = 4 + 6√(μt) + μt
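To make the truncation rule concrete, the sketch below applies uniformization to the simplest quantity, the transition matrix P(t) = exp(Qt), written as a Poisson-weighted sum of powers of R = I + Q/μ and cut off at s(μt). Choosing μ as the largest exit rate and rounding s(μt) up to an integer are assumptions made here; the function names are illustrative.

```python
import math
import numpy as np

def truncation_point(mu_t: float) -> int:
    """s(mu t) = 4 + 6 sqrt(mu t) + mu t, rounded up to an integer."""
    return math.ceil(4 + 6 * math.sqrt(mu_t) + mu_t)

def transition_probs_uni(Q: np.ndarray, t: float) -> np.ndarray:
    """Approximate P(t) = exp(Qt) by a truncated Poisson mixture of powers of R."""
    n = Q.shape[0]
    mu = np.max(-np.diag(Q))          # uniformization rate: largest exit rate
    R = np.eye(n) + Q / mu            # transition matrix of the uniformized chain
    s = truncation_point(mu * t)
    weight = math.exp(-mu * t)        # Pois(0; mu t)
    R_power = np.eye(n)               # R^0
    P = weight * R_power
    for m in range(1, s + 1):
        weight *= mu * t / m          # Pois(m; mu t) from Pois(m - 1; mu t)
        R_power = R_power @ R         # R^m
        P += weight * R_power
    return P

alpha, t = 0.25, 2.0
Q = np.full((4, 4), alpha)
np.fill_diagonal(Q, -3 * alpha)       # Jukes-Cantor example
print(truncation_point(3 * alpha * t))
print(transition_probs_uni(Q, t))
```

The number of matrix products grows with μt, which is why the cost of this method depends on μt. For very large μt the starting weight exp(−μt) underflows, so this plain version is only suitable for moderate values.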
Exponentiation
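The formulas on this slide are not preserved in the transcript. The sketch below assumes the standard block-matrix identity: for A = [[Q, B], [0, Q]], the upper-right block of exp(At) equals ∫₀ᵗ exp(Qs) B exp(Q(t − s)) ds. The helper `expm_taylor` is a simple stand-in for a library matrix exponential (scaling and squaring with a Taylor series), not the algorithm of the R expm package; all names are illustrative.

```python
import numpy as np

def expm_taylor(M, terms=20):
    """Stand-in matrix exponential: scaling and squaring with a Taylor series."""
    norm = np.linalg.norm(M, 1)
    # Scale so that the 1-norm of M / 2^k is at most 0.5
    k = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    M_scaled = M / (2 ** k)
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for i in range(1, terms + 1):
        term = term @ M_scaled / i        # M_scaled^i / i!
        result = result + term
    for _ in range(k):                    # undo the scaling by repeated squaring
        result = result @ result
    return result

def integral_expm(Q, B, t):
    """int_0^t exp(Qs) B exp(Q(t - s)) ds as the upper-right block of exp(At)."""
    n = Q.shape[0]
    A = np.zeros((2 * n, 2 * n))
    A[:n, :n] = Q
    A[:n, n:] = B
    A[n:, n:] = Q
    return expm_taylor(A * t)[:n, n:]

alpha, t = 0.25, 2.0
Q = np.full((4, 4), alpha)
np.fill_diagonal(Q, -3 * alpha)           # Jukes-Cantor example
B = np.zeros((4, 4))
B[2, 2] = 1.0                             # picks out time spent in state 2
print(integral_expm(Q, B, t))             # entry [a, b]: int_0^t P_a2(s) P_2b(t - s) ds
```

Putting a general matrix of coefficients in B gives the corresponding linear combination of integrals from a single matrix exponential.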
Algorithms
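Step           |          | EVD          | UNI            | EXPM*
1              | Computes | Λ, U, U^-1   | μ, s(μt), R    | A
               | Order    | O(n^3)       | O(n^2)         | O(n^2)
2              | Computes | J(t)         | R^m            | e^(At)
               | Order    | O(n^2)       | O(s(μt) n^3)   | O(n^3)
3              | Computes | Σ(C; t)      | Σ(C; t)        | Σ(C; t)
               | Order    | O(n^3)       | O(s(μt) n^2)   | O(n^2)
Influenced by  |          | Q            | μt             | -
* expm package in R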
Accuracy
• Use two models for which ∑(C; t) is available in analytical form
• JC, varying n
• HKY, varying t
• Measure accuracy as the normalized difference: approximation / true - 1
• as a function of n
• as a function of t
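The accuracy measure above is a one-liner. The sketch below illustrates it on a stand-in quantity with a closed form, the Jukes-Cantor transition probability P_aa(t) = 1/4 + 3/4 exp(−4αt), and a deliberately crude second-order Taylor approximation. Both the stand-in quantity and the approximation are assumptions chosen for illustration, not the statistics evaluated in the talk.

```python
import math

def normalized_difference(approximation: float, true_value: float) -> float:
    """approximation / true - 1; zero means the approximation is exact."""
    return approximation / true_value - 1.0

alpha, t = 0.25, 0.1
x = 4 * alpha * t
true_value = 0.25 + 0.75 * math.exp(-x)              # closed form for Jukes-Cantor
approximation = 0.25 + 0.75 * (1 - x + x ** 2 / 2)   # second-order Taylor expansion
print(normalized_difference(approximation, true_value))
```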
Speed
• Partition of computation
  • Precomputation
    • EVD
    • UNI
• Main computation
• Use three models and different time points to assess the speed
• GY, n = 61
• GTR, n = 4
• UNR, n = 4
• 10 equidistant time points
Results
• Accuracy
• Similar accuracy
• EXPM is the most accurate one
• Speed
• EXPM is the slowest
• EVD vs UNI
• UNI has potential
• Better cut off point s(μt)
• Ameliorate effect of μt by
• Adaptive uniformization
• On-the-fly variant of adaptive uniformization
Thank you!
Tataru P, Hobolth A. Comparison of methods for calculating conditional expectations of sufficient statistics for continuous time Markov chains. BMC Bioinformatics 12(1):465, 2011.