Paula Tataru

Conditional expectations of sufficient statistics for continuous time Markov chains

Joint work with Asger Hobolth

November 28, 2012

Motivation

Methods

Results

Continuous-time Markov chains

A stochastic process {X(t) | t ≥ 0} that fulfills the Markov property
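
The formula on this slide did not survive in the transcript. As a reminder, and assuming a finite state space with states i, j, the Markov property is usually written as follows (the notation is an assumption, not taken from the slide):

```latex
P\bigl(X(t+s) = j \mid X(s) = i,\ X(u) = x(u),\ 0 \le u < s\bigr)
  = P\bigl(X(t+s) = j \mid X(s) = i\bigr),
  \qquad s, t \ge 0 .
```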

Continuous-time Markov chains

• Credit risk

• States: different types of ratings

• Chemical reactions

• States: different states of a molecule

• DNA data

• States:

• Nucleotides

• Amino acids

• Codons

Modeling DNA data

1st    2nd base                                      3rd
base   T          C          A          G            base
T      TTT Phe    TCT Ser    TAT Tyr    TGT Cys      T
       TTC Phe    TCC Ser    TAC Tyr    TGC Cys      C
       TTA Leu    TCA Ser    TAA Stop   TGA Stop     A
       TTG Leu    TCG Ser    TAG Stop   TGG Trp      G
C      CTT Leu    CCT Pro    CAT His    CGT Arg      T
       CTC Leu    CCC Pro    CAC His    CGC Arg      C
       CTA Leu    CCA Pro    CAA Gln    CGA Arg      A
       CTG Leu    CCG Pro    CAG Gln    CGG Arg      G
A      ATT Ile    ACT Thr    AAT Asn    AGT Ser      T
       ATC Ile    ACC Thr    AAC Asn    AGC Ser      C
       ATA Ile    ACA Thr    AAA Lys    AGA Arg      A
       ATG Met    ACG Thr    AAG Lys    AGG Arg      G
G      GTT Val    GCT Ala    GAT Asp    GGT Gly      T
       GTC Val    GCC Ala    GAC Asp    GGC Gly      C
       GTA Val    GCA Ala    GAA Glu    GGA Gly      A
       GTG Val    GCG Ala    GAG Glu    GGG Gly      G

Modeling DNA data

• Nucleotides (n = 4)

• Amino acids (n = 20)

• Codons (n = 64)

Continuous-time Markov chains

• Characterized by a rate matrix

with properties
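
The matrix and its properties were shown as an image and are missing from the transcript. The standard properties of a rate matrix Q = (q_ij) are that off-diagonal entries are non-negative and each row sums to zero, so q_ii = -Σ_{j≠i} q_ij. A minimal sketch that checks them; the function name and the 3-state example are hypothetical:

```python
def is_rate_matrix(Q, tol=1e-12):
    """Check the defining properties of a CTMC rate matrix (list of rows)."""
    n = len(Q)
    for i, row in enumerate(Q):
        if len(row) != n:
            return False  # the matrix must be square
        # Off-diagonal rates are non-negative.
        if any(row[j] < 0 for j in range(n) if j != i):
            return False
        # Each row sums to zero, so q_ii = -sum_{j != i} q_ij.
        if abs(sum(row)) > tol:
            return False
    return True


# Hypothetical 3-state example, not taken from the slides.
Q_example = [
    [-3.0, 2.0, 1.0],
    [0.5, -0.5, 0.0],
    [1.0, 1.0, -2.0],
]
print(is_rate_matrix(Q_example))
```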

Modeling DNA data

• Nucleotides (n = 4)

• Jukes-Cantor (JC)
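
The JC rate matrix itself is not in the transcript. A sketch under the assumption that every off-diagonal rate equals a single parameter `alpha` (the parameter name and scaling are assumptions; other scalings of JC are also in use). Under that assumption the transition probabilities have a simple closed form:

```python
import math


def jc_rate_matrix(alpha, n=4):
    """Jukes-Cantor-type rate matrix: every off-diagonal rate equals alpha."""
    return [[-(n - 1) * alpha if i == j else alpha for j in range(n)]
            for i in range(n)]


def jc_transition_prob(alpha, t, same, n=4):
    """Closed-form P_ab(t); same=True for a == b, False for a != b."""
    decay = math.exp(-n * alpha * t)
    if same:
        return 1.0 / n + (n - 1) / n * decay
    return 1.0 / n - decay / n


Q = jc_rate_matrix(alpha=0.1)
p_same = jc_transition_prob(0.1, t=2.0, same=True)
p_diff = jc_transition_prob(0.1, t=2.0, same=False)
print(p_same, p_diff)
```

Because all states are exchangeable, the same two functions cover the amino acid (n = 20) and codon-sized cases by changing `n`.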

Continuous-time Markov chains

• Waiting time is exponentially distributed

• Jumps are discretely distributed
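
These two facts are enough to simulate a path: wait an exponential time with rate q_i = -q_ii in the current state i, then jump to j ≠ i with probability q_ij / q_i. A minimal sketch; the function name, the JC-type example matrix and the seed are assumptions made for illustration:

```python
import random


def simulate_ctmc(Q, start, t_end, rng):
    """Simulate one path on [0, t_end]; returns a list of (jump_time, state)."""
    path = [(0.0, start)]
    t, state = 0.0, start
    while True:
        rate_out = -Q[state][state]
        if rate_out <= 0:
            break  # absorbing state: no further jumps
        t += rng.expovariate(rate_out)  # exponential waiting time
        if t >= t_end:
            break
        # Jump to j != state with probability q_ij / q_i.
        targets = [j for j in range(len(Q)) if j != state]
        weights = [Q[state][j] for j in targets]
        state = rng.choices(targets, weights=weights)[0]
        path.append((t, state))
    return path


alpha = 0.5
Q = [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]
path = simulate_ctmc(Q, start=0, t_end=10.0, rng=random.Random(1))
print(path[:3])
```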

General EM

• Full log likelihood

• E-step

• M-step
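
The formulas behind these three bullets are not in the transcript. A common way to write them for a CTMC with rate matrix Q, completely observed path x and incomplete data y is sketched below. Here N_ij is the number of jumps from i to j and T_i is the total time spent in state i; the initial-state term is ignored, and the notation is an assumption rather than the slide's own:

```latex
\begin{align*}
\ell(Q; x) &= \sum_{i} \sum_{j \ne i} N_{ij} \log q_{ij}
              - \sum_{i} T_i \sum_{j \ne i} q_{ij}
  && \text{(full log likelihood)} \\
G\bigl(Q \mid Q^{(k)}\bigr) &= \mathrm{E}_{Q^{(k)}}\bigl[\ell(Q; x) \mid y\bigr]
  = \sum_{i} \sum_{j \ne i} \mathrm{E}_{Q^{(k)}}[N_{ij} \mid y] \log q_{ij}
    - \sum_{i} \mathrm{E}_{Q^{(k)}}[T_i \mid y] \sum_{j \ne i} q_{ij}
  && \text{(E-step)} \\
Q^{(k+1)} &= \arg\max_{Q}\, G\bigl(Q \mid Q^{(k)}\bigr)
  && \text{(M-step)}
\end{align*}
```

Since the full log likelihood is linear in N_ij and T_i, the E-step only requires the conditional expectations of these sufficient statistics.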

EM for Jukes-Cantor

• Full log likelihood

• M-step
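
The slide's formulas are missing from the transcript. As a sketch, assume the JC parameterization q_ij = alpha for all i ≠ j on n states. The full log likelihood then reduces to N log(alpha) - (n - 1) alpha T, where N is the total number of jumps and T the total time, and the M-step has the closed form alpha = E[N | data] / ((n - 1) T). Function names and the numbers below are hypothetical:

```python
import math


def jc_full_loglik(alpha, n_jumps, total_time, n=4):
    """Full log likelihood under the assumed parameterization q_ij = alpha."""
    return n_jumps * math.log(alpha) - (n - 1) * alpha * total_time


def jc_m_step(expected_jumps, total_time, n=4):
    """Maximizer of the expected full log likelihood with respect to alpha."""
    return expected_jumps / ((n - 1) * total_time)


# Hypothetical E-step output: 6 expected jumps over a total time of 10.
alpha_hat = jc_m_step(expected_jumps=6.0, total_time=10.0)
print(alpha_hat)
```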

Calculating expectations
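
The slide content is not in the transcript. For an endpoint-conditioned CTMC with X(0) = a and X(t) = b, the standard integral representation of the two expectations is sketched below; P(t) = e^{Qt} is the transition probability matrix, and the symbol I(a, b; i, j) is notation assumed here:

```latex
\begin{align*}
I(a, b; i, j) &= \int_0^t P_{ai}(s)\, P_{jb}(t - s)\, ds, \\
\mathrm{E}\bigl[T_i \mid X(0) = a, X(t) = b\bigr] &= \frac{I(a, b; i, i)}{P_{ab}(t)}, \\
\mathrm{E}\bigl[N_{ij} \mid X(0) = a, X(t) = b\bigr] &= \frac{q_{ij}\, I(a, b; i, j)}{P_{ab}(t)} .
\end{align*}
```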

Linear combinations

Methods

• Hobolth & Jensen (2010) present three methods to evaluate the necessary statistics

• Eigenvalue decomposition (EVD)

• Uniformization (UNI)

• Exponentiation (EXPM)

• Extend efficiently to linear combinations

• Which method is more accurate?

• Which method is faster?

Eigenvalue decomposition
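
The formulas for this method are missing from the transcript. A minimal sketch, assuming Q is diagonalizable as Q = U Λ U⁻¹ and that the quantity of interest is Σ(C; t) = ∫₀ᵗ e^{Qs} C e^{Q(t-s)} ds for a coefficient matrix C. The integral then becomes an elementwise product with a matrix J(t) built from the eigenvalues. The function name and the 3-state example are hypothetical:

```python
import numpy as np


def evd_integral(Q, C, t):
    """Sigma(C; t) = int_0^t exp(Qs) C exp(Q(t-s)) ds via Q = U diag(lam) U^-1."""
    lam, U = np.linalg.eig(Q)
    U_inv = np.linalg.inv(U)
    exp_lt = np.exp(lam * t)
    # J[k, l] = int_0^t exp(lam_k s) exp(lam_l (t - s)) ds
    diff = lam[:, None] - lam[None, :]
    close = np.abs(diff) < 1e-10          # (near-)equal eigenvalues
    safe = np.where(close, 1.0, diff)     # avoid division by zero
    J = np.where(close,
                 t * exp_lt[:, None],
                 (exp_lt[:, None] - exp_lt[None, :]) / safe)
    return (U @ (J * (U_inv @ C @ U)) @ U_inv).real


# Hypothetical 3-state rate matrix, not taken from the slides.
Q = np.array([[-1.0, 0.7, 0.3],
              [0.2, -0.5, 0.3],
              [0.4, 0.1, -0.5]])
S = evd_integral(Q, np.eye(3), t=1.5)
print(S.sum(axis=1))  # with C = I each row sums to t
```

With C = I the integral equals t e^{Qt}, so its rows sum to t; this gives a quick sanity check without a second method.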

Uniformization

• Choose truncation point s(μt)

• Bound error using the tail of Pois(m+1; μt)

• s(μt) = 4 + 6√(μt) + μt
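
A minimal sketch of how the truncation point can be used, assuming μ = max_i(-q_ii), R = I + Q/μ, and the series Σ(C; t) = (1/μ) Σ_m Pois(m+1; μt) Σ_{l=0}^{m} R^l C R^{m-l} for the integral ∫₀ᵗ e^{Qs} C e^{Q(t-s)} ds. The function name and the example matrix are hypothetical, and the code requires t > 0:

```python
import math

import numpy as np


def uni_integral(Q, C, t):
    """Sigma(C; t) by uniformization, truncated at s(mu t) = 4 + 6 sqrt(mu t) + mu t."""
    n = Q.shape[0]
    mu = np.max(-np.diag(Q))           # uniformization rate
    R = np.eye(n) + Q / mu             # transition matrix of the uniformized chain
    s = int(math.ceil(4 + 6 * math.sqrt(mu * t) + mu * t))
    total = np.zeros((n, n))
    A = C.astype(float)                # A_m = sum_{l=0}^{m} R^l C R^(m-l)
    R_pow = np.eye(n)                  # R^m
    log_pois = -mu * t                 # log Pois(0; mu t)
    for m in range(s + 1):
        log_pois += math.log(mu * t) - math.log(m + 1)  # log Pois(m+1; mu t)
        total += math.exp(log_pois) * A
        R_pow = R_pow @ R              # R^(m+1)
        A = A @ R + R_pow @ C          # A_(m+1) = A_m R + R^(m+1) C
    return total / mu


# Hypothetical 3-state rate matrix, not taken from the slides.
Q = np.array([[-1.0, 0.7, 0.3],
              [0.2, -0.5, 0.3],
              [0.4, 0.1, -0.5]])
S = uni_integral(Q, np.eye(3), t=1.5)
print(S.sum(axis=1))  # close to t, up to the truncated Poisson tail
```

The number of terms grows with μt, which is why a large μt slows the method down.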

Exponentiation
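
The formula for this method is not in the transcript. A sketch based on the block-matrix identity commonly attributed to Van Loan: for A = [[Q, C], [0, Q]], the upper-right block of e^{At} equals ∫₀ᵗ e^{Qs} C e^{Q(t-s)} ds. The helper `expm_taylor` is a simple stand-in (scaling and squaring with a Taylor series) for a library matrix exponential such as the R expm package; it and the example matrix are assumptions made to keep the block self-contained:

```python
import numpy as np


def expm_taylor(M, terms=20):
    """Stand-in matrix exponential: scaling and squaring with a Taylor series."""
    norm = np.linalg.norm(M, 1)
    k = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / (2 ** k)                   # scaled so that the series converges fast
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for i in range(1, terms + 1):
        term = term @ X / i            # X^i / i!
        E = E + term
    for _ in range(k):
        E = E @ E                      # undo the scaling by repeated squaring
    return E


def expm_integral(Q, C, t):
    """Sigma(C; t) as the upper-right block of exp(A t), A = [[Q, C], [0, Q]]."""
    n = Q.shape[0]
    A = np.block([[Q, C], [np.zeros((n, n)), Q]])
    return expm_taylor(A * t)[:n, n:]


# Hypothetical 3-state rate matrix, not taken from the slides.
Q = np.array([[-1.0, 0.7, 0.3],
              [0.2, -0.5, 0.3],
              [0.4, 0.1, -0.5]])
S = expm_integral(Q, np.eye(3), t=1.5)
print(S.sum(axis=1))  # with C = I each row sums to t
```

The approach needs no eigendecomposition, at the price of exponentiating a 2n × 2n matrix for every C and t.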

Algorithms

Step            EVD            UNI             EXPM*
1               Λ, U, U⁻¹      μ, s(μt), R     A
Order           O(n³)          O(n²)           O(n²)
2               J(t)           R^m             e^(At)
Order           O(n²)          O(s(μt)n³)      O(n³)
3               Σ(C; t)        Σ(C; t)         Σ(C; t)
Order           O(n³)          O(s(μt)n²)      O(n²)
Influenced by   Q              μt              -

* expm package in R

Accuracy

• Use two models for which Σ(C; t) is available in analytical form

• JC, varying n

• HKY, varying t

• Measure accuracy as the normalized difference: approximation / true - 1

• as a function of n

• as a function of t
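
The accuracy measure above is simple enough to state as code. A small sketch; the function name and the numbers are hypothetical, and a value of 0 means the approximation matches the analytical result exactly:

```python
def normalized_difference(approximation, true_value):
    """Normalized difference as defined above: approximation / true - 1."""
    return approximation / true_value - 1.0


# Hypothetical values: an approximation that is 1% too large.
print(normalized_difference(1.01, 1.0))
```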

Speed

• Partition of computation

• Precomputation

• EVD

• UNI

• Main computation

• Use three models and different time points to assess the speed

• GY, n = 61

• GTR, n = 4

• UNR, n = 4

• 10 equidistant time points

Results

• Accuracy

• Similar accuracy

• EXPM is the most accurate one

• Speed

• EXPM is the slowest

• EVD vs UNI

• UNI has potential

• Better cut off point s(μt)

• Ameliorate effect of μt by

• Adaptive uniformization

• On-the-fly variant of adaptive uniformization

Thank you!

Comparison of methods for calculating conditional expectations of sufficient statistics for continuous time Markov chains

BMC Bioinformatics 12(1):465, 2011