statistical analysis of compartmental models: epidemiology

Statistical analysis of compartmental models: epidemiology, molecular biology, and everything in between

Vladimir N. Minin

Department of Statistics, University of California, Irvine

BIRS Workshop “Challenges in the Statistical Modeling of Stochastic Processes for the Natural Sciences”

July 2017

joint work with Jason Xu (UW → UCLA), Sam Koelle (UW), Peter Guttorp (UW), Chuanfeng Wu (NIH), Cynthia Dunbar (NIH), Jan Abkowitz (UW), Amanda Koepke (NIST), Ira Longini (UF), Betz Halloran (Fred Hutch and UW), Jon Wakefield (UW), and Jon Fintzi (UW)

funding: NIH R01-AI107034 and NIH U54 GM111274

Hematopoiesis HMM driven by a branching process

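[Diagram: branching-process model of hematopoiesis. Compartments: HSC, Progenitor, and the mature cell types Mo, T, B, Gr, and NK. Rate labels: $\lambda$, $\nu_a$, $\mu_a$, $\nu_1, \dots, \nu_5$, and $\mu_1, \dots, \mu_5$.]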

• Latent process: each barcode lineage evolves as a multi-type branching process $X(t)$ whose components are counts of each cell type.

• Observation process: multivariate hypergeometric distribution, $Y \sim \text{mvhypgeo}(X)$, driven by the experimental design.

• Read data: read counts $\tilde{Y}$ are proportional to $Y$, with an unknown amplification constant.

• The likelihood is intractable because of the massive state space of the branching process.
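The observation step can be sketched as sampling without replacement from the latent counts. The snippet below is a minimal illustration, not the authors' implementation: the cell-type order, the latent counts, and the sample size are all hypothetical, and it assumes a single pooled draw of a fixed number of cells.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical latent counts X(t) for one barcode lineage, one entry per
# mature cell type. The values are illustrative only.
cell_types = ["Gr", "Mo", "T", "B", "NK"]
x = np.array([1200, 300, 450, 500, 50])

# Number of cells drawn into the sample (assumed fixed by the experiment).
n_sampled = 200

# Y | X ~ multivariate hypergeometric: draw n_sampled cells without replacement.
y = rng.multivariate_hypergeometric(x, n_sampled)

for name, latent, observed in zip(cell_types, x, y):
    print(f"{name}: latent={latent}, sampled={observed}")
```

Each sampled count is bounded by the corresponding latent count, and the sampled counts always sum to `n_sampled`.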


Systems (molecular) biology

Figure 2: Models and data. From: A framework for parameter estimation and model selection from experimental data in systems biology using approximate Bayesian computation. Juliane Liepe, Paul Kirk, Sarah Filippi, Tina Toni, Chris P Barnes & Michael P H Stumpf. Nature Protocols 9, 439–456 (2014). doi:10.1038/nprot.2014.025. Published online 23 January 2014.

(a–e) The full mRNA self-regulation model is shown in a. mRNA (m) produces protein P1, which can be transformed into protein P2. P1 is required to produce mRNA, whereas P2 degrades mRNA into the empty set (∅). P1 and P2 can also be degraded. The reactions that occur according to this model are shown in b. Fitting of the model to the data (c), which comprise mRNA measurements over time. The second model (d) is based on the first model, but it does not contain protein P2. The relevant reactions are shown in e.

Case study: cholera in Bangladesh

Goal: To understand the dynamics of endemic cholera in Bangladesh and to develop a model that can predict outbreaks several weeks in advance.

• Specifically, to understand how the disease dynamics are related to environmental covariates.

[Figure labels (locations in Bangladesh): Bakerganj, Chhatak, Chaugachha, Matlab, Mathbaria.]


Mathbaria cholera incidence data and covariates

[Figure: time series for Mathbaria, January 2004 to July 2013. Left axis: number of cholera cases in Mathbaria (0 to 10). Right axis: standardized covariates (−2 to 3). Series: case counts, water depth, water temperature.]

• Covariates are the smoothed standardized daily values $C_{WT}(i) = (WT(i) - \overline{WT})/s_{WT}$, $C_{WD}(i) = (WD(i) - \overline{WD})/s_{WD}$.

• We could use a phenomenological model (e.g., Poisson regression), but this would not tell us anything about the number of infected people or the disease transmission parameters.
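The standardization of a covariate can be sketched as follows. This is a minimal illustration: the temperature readings are made up, the smoothing step is omitted, and using the sample standard deviation (`ddof=1`) for $s_{WT}$ is an assumption.

```python
import numpy as np

def standardize(values):
    """Return (values - mean) / sample standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=1)

# Hypothetical daily water temperature readings (degrees Celsius).
water_temp = np.array([24.1, 25.3, 26.0, 27.8, 29.5, 30.2, 28.7])

c_wt = standardize(water_temp)
print(np.round(c_wt, 2))
```

After standardization, water depth and water temperature are on the same unit-free scale, so their regression coefficients can be compared directly.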


Deterministic SIR-like models

Easy system of nonlinear differential equations. Models the mean behavior of a stochastic system.

Problem: Poorly mimics stochastic dynamics when the count in one of the compartments is low. Not clear how to model noisy data.

Solution: Use a real stochastic model (e.g., continuous-time Markov chain or SDE).
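To make the deterministic option concrete, here is a minimal sketch that integrates the textbook SIR equations $dS/dt = -\beta S I$, $dI/dt = \beta S I - \gamma I$, $dR/dt = \gamma I$ with a forward Euler scheme. The equations, the function name `sir_ode_euler`, and the parameter values are assumptions for illustration; the slide does not show the exact system used.

```python
def sir_ode_euler(s0, i0, r0, beta, gamma, t_end, dt=0.01):
    """Integrate the deterministic SIR equations with forward Euler steps."""
    s, i, r = float(s0), float(i0), float(r0)
    for _ in range(int(round(t_end / dt))):
        new_infections = beta * s * i * dt   # flow from S to I in one step
        new_recoveries = gamma * i * dt      # flow from I to R in one step
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return s, i, r

# Hypothetical values: population of 1000, one initial case, beta * N / gamma = 5.
s, i, r = sir_ode_euler(999, 1, 0, beta=0.0005, gamma=0.1, t_end=100.0)
print(round(s, 1), round(i, 1), round(r, 1))
```

Starting from a single case, this deterministic path always produces a large outbreak when $\beta N / \gamma > 1$, whereas a stochastic model started from one infectious individual can go extinct by chance. That is the low-count regime where the two descriptions disagree.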


SIRS model

[Diagram: from state $(S, I, R)$, the possible transitions are to $(S-1, I+1, R)$, $(S, I-1, R+1)$, and $(S+1, I, R-1)$.]

• $S$, $I$, and $R$ denote the numbers of susceptible, infectious, and recovered individuals at time $t$; $N = S + I + R$.

• $\beta$ represents the infectious contact rate, $\alpha(t) = \alpha_{A_i} = \exp[\alpha_0 + \alpha_1 C_{WD}(i - \kappa) + \alpha_2 C_{WT}(i - \kappa)]$ represents the time-varying environmental force of infection, $\gamma$ is the recovery rate, and $\mu$ is the rate at which immunity is lost.
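A path of this continuous-time Markov chain can be simulated with the Gillespie algorithm. The sketch below rests on stated assumptions: the infection rate is taken to be $(\beta I + \alpha) S$, the recovery rate $\gamma I$, and the loss-of-immunity rate $\mu R$; $\alpha$ is held constant over the simulated window rather than varying with the covariates; and all numerical values are hypothetical.

```python
import math
import random

def gillespie_sirs(s, i, r, beta, alpha, gamma, mu, t_end, rng):
    """Simulate one SIRS path; return a list of (time, S, I, R) tuples."""
    t = 0.0
    path = [(t, s, i, r)]
    while True:
        rate_inf = (beta * i + alpha) * s   # (S, I, R) -> (S-1, I+1, R)
        rate_rec = gamma * i                # (S, I, R) -> (S, I-1, R+1)
        rate_loss = mu * r                  # (S, I, R) -> (S+1, I, R-1)
        total = rate_inf + rate_rec + rate_loss
        if total == 0.0:
            break
        t += rng.expovariate(total)         # exponential waiting time
        if t > t_end:
            break
        u = rng.random() * total            # pick the event proportionally to its rate
        if u < rate_inf:
            s, i = s - 1, i + 1
        elif u < rate_inf + rate_rec:
            i, r = i - 1, r + 1
        else:
            s, r = s + 1, r - 1
        path.append((t, s, i, r))
    return path

# Environmental force of infection from hypothetical coefficients and
# hypothetical standardized covariate values (water depth, water temperature).
alpha = math.exp(-5.0 + (-1.0) * 0.5 + 2.0 * 0.25)

rng = random.Random(7)
path = gillespie_sirs(990, 10, 0, beta=0.0005, alpha=alpha,
                      gamma=0.1, mu=0.01, t_end=50.0, rng=rng)
print(len(path), path[-1])
```

Every event moves exactly one individual between compartments, so $N = S + I + R$ stays constant along the path.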


Hidden SIRS model

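[Diagram: hidden Markov model. The latent states $X_{t_i} = (S_{t_i}, I_{t_i}, R_{t_i})$ at times $t_0, t_1, \dots, t_n$ form a chain, and each observation $y_{t_i}$ depends on $X_{t_i}$.]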

• $X_t$ is only indirectly observed through $y_t$ ⇒ hidden Markov model (HMM)

• $y_t \sim \text{Binomial}(I_t, \rho)$

• $\rho$ depends on the number of symptomatic infectious individuals that seek treatment

We are interested in the posterior distribution

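\[ \Pr(\theta \mid y) \propto \Pr(y \mid \theta)\,\Pr(\theta), \]

where $y = (y_{t_0}, \dots, y_{t_n})$, $\theta = (\beta, \gamma, \rho, \alpha_0, \dots, \alpha_k)$, and

\[ \Pr(y \mid \theta) = \sum_{X} \left( \prod_{i=0}^{n} \Pr(y_{t_i} \mid I_{t_i}, \rho) \left[ \Pr(X_{t_0} \mid \phi_S, \phi_I) \prod_{i=1}^{n} p(X_{t_i} \mid X_{t_{i-1}}, \theta) \right] \right). \]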

Problem: This likelihood is intractable because the state space of $X_t$ is too large, even for a moderately large population size $N$.
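A quick count shows the scale of the problem. With $S + I + R = N$ fixed, the number of possible states at a single time point is $(N+1)(N+2)/2$. The helper name `n_states` below is hypothetical.

```python
def n_states(population_size):
    """Count triples (S, I, R) of nonnegative integers with S + I + R = N."""
    return (population_size + 1) * (population_size + 2) // 2

n = n_states(10_000)
print(n)        # states at one time point: 50015001
print(n * n)    # entries in a dense one-step transition matrix
```

The sum over $X$ in the likelihood ranges over whole paths, so even forward-algorithm recursions that need a transition matrix of this size are out of reach for a population of ten thousand.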


Estimation results

Posterior medians and 95% CIs for the parameters of the SIRS model.

Coefficient    Estimate    95% CI
β × N          0.491       (0.103, 0.945)
γ              0.115       (0.096, 0.142)
(β × N)/γ      4.35        (0.99, 7.15)
α_0            -5.32       (-6.63, -4.51)
α_1            -1.37       (-1.98, -0.98)
α_2            2.18        (1.8, 2.62)
ρ × N          55.8        (43.4, 73.5)

Recall:

• $N$ is the population size (set artificially to 10,000).

• $\beta$ is the infectious contact rate.

• $\gamma$ is the recovery rate.

• $\rho$ is the reporting rate.

• $\alpha(t) = \alpha_{A_i} = \exp[\alpha_0 + \alpha_1 C_{WD}(i - \kappa) + \alpha_2 C_{WT}(i - \kappa)]$


Prediction results

[Figure: prediction results, March 2012 to May 2013. Top panel: counts (0 to 9) and posterior probability (0 to 1). Bottom panel: susceptible fraction (0 to 0.15) and infected fraction (0 to 0.06) against time (day). Series: susceptible, infected.]


What models cause all this trouble?

• Too broad of an answer: partially observed stochastic processes

• Too narrow of an answer: intractable HMMs

• Mouthful, but good enough answer: Markov compartmental models in which a subset of compartments is observed over time with noise


Statistical inference options

• “Exact” likelihood-based methods:

– Bayesian inference with particle MCMC (Koepke et al.)

– Maximum likelihood inference with iterated filtering, a particle-filter-based method (Ionides et al.)

• “Exact” summary-statistics-based methods:

– Approximate Bayesian Computation (Liepe et al.)

– Quasi- and pseudo-maximum likelihood estimators and generalized methods of moments (Chen and Hyrien, Xu et al.)

• Likelihood approximations:

– Trajectory matching with iid normal error

– Pretending some compartments change slowly (diffusion, branching processes)

– Emulating with GPs to reduce costly likelihood evaluations (Jandarov et al.)

– Linear noise approximation: makes Markov transition densities Gaussian, with complicated mean and covariance functions.
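The particle-based methods in the first group rest on a Monte Carlo estimate of the intractable likelihood. Below is a minimal sketch of a bootstrap particle filter for a hidden SIRS model with binomial observations. It is not the implementation from any of the cited papers: the function names, the known initial state, the constant $\alpha$, the assumed rates $(\beta I + \alpha) S$, $\gamma I$, $\mu R$, and all parameter values are assumptions, and the population is kept small so that the binomial coefficients stay within floating-point range.

```python
import math
import random

def simulate_sirs(state, theta, dt, rng):
    """Advance (S, I, R) by dt with the Gillespie algorithm."""
    s, i, r = state
    beta, alpha, gamma, mu = theta
    t = 0.0
    while True:
        rates = ((beta * i + alpha) * s, gamma * i, mu * r)
        total = sum(rates)
        if total == 0.0:
            return s, i, r
        t += rng.expovariate(total)
        if t > dt:
            return s, i, r
        u = rng.random() * total
        if u < rates[0]:
            s, i = s - 1, i + 1          # infection
        elif u < rates[0] + rates[1]:
            i, r = i - 1, r + 1          # recovery
        else:
            s, r = s + 1, r - 1          # loss of immunity

def binom_pmf(y, n, p):
    """Binomial(n, p) probability of y; zero outside 0..n."""
    if y < 0 or y > n:
        return 0.0
    return math.comb(n, y) * p**y * (1.0 - p)**(n - y)

def particle_loglik(ys, x0, theta, rho, dt, n_particles, rng):
    """Bootstrap particle filter estimate of log Pr(y | theta)."""
    particles = [x0] * n_particles
    loglik = 0.0
    for k, y in enumerate(ys):
        if k > 0:  # propagate each particle to the next observation time
            particles = [simulate_sirs(x, theta, dt, rng) for x in particles]
        weights = [binom_pmf(y, x[1], rho) for x in particles]
        mean_w = sum(weights) / n_particles
        if mean_w == 0.0:
            return -math.inf
        loglik += math.log(mean_w)
        # Resample particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return loglik

rng = random.Random(42)
theta = (0.002, 0.05, 0.1, 0.02)   # hypothetical (beta, alpha, gamma, mu)
rho, dt = 0.3, 7.0                 # hypothetical reporting probability, spacing
x0 = (190, 10, 0)                  # initial state, assumed known here

# Synthetic data: simulate a "true" path and thin the infectious counts.
truth, ys = x0, []
for k in range(10):
    if k > 0:
        truth = simulate_sirs(truth, theta, dt, rng)
    ys.append(sum(rng.random() < rho for _ in range(truth[1])))

ll = particle_loglik(ys, x0, theta, rho, dt, n_particles=200, rng=rng)
print(ys, ll)
```

Particle MCMC plugs an estimate of this kind into a Metropolis-Hastings acceptance ratio in place of the exact likelihood.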


Open problems and glimpses of possible solutions

• Open problem:

– Computationally stable inference procedure that can handle at least 10–100 compartments

• Current lines of attack:

– Ho et al. use clever re-parameterization to develop a new deterministic algorithm for computing SIR (and more complex) transition densities “exactly.” Doesn’t solve all the problems, because not all compartments are observed perfectly.

– Approximate Bayesian computation and/or indirect inference. But model comparison is tough!

– New data augmentation methods with block updates of latent variables.
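For readers unfamiliar with approximate Bayesian computation, the simplest variant is rejection sampling: draw parameters from the prior, simulate data, and keep the draws whose simulated data fall close to the observations. The sketch below uses a deliberately tiny toy model rather than any model from the talk. It assumes that `i0` infectious individuals recover independently at rate $\gamma$, so the number still infectious at time $t$ is Binomial($i_0$, $e^{-\gamma t}$). The observed count, the uniform prior, and the tolerance are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy pure-recovery model (an assumption for illustration only).
i0, t_obs = 1000, 5.0
y_obs = 368            # hypothetical count still infectious at time t_obs

n_draws, eps = 20_000, 10

gammas = rng.uniform(0.0, 1.0, size=n_draws)          # draws from the prior
y_sim = rng.binomial(i0, np.exp(-gammas * t_obs))     # simulate from the model
accepted = gammas[np.abs(y_sim - y_obs) <= eps]       # keep close matches

print(len(accepted), accepted.mean())
```

The accepted draws approximate the posterior of $\gamma$ given that the simulated count lies within `eps` of the observed one. Shrinking `eps` tightens the approximation at the cost of fewer accepted draws.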


References

• Xu J, Koelle S, Guttorp P, Wu C, Dunbar CE, Abkowitz JL, Minin VN. Statistical inference in partially observed stochastic compartmental models with application to cell lineage tracking of in vivo hematopoiesis, https://arxiv.org/abs/1610.07550.

• Koepke AA, Longini IM, Halloran ME, Wakefield J, Minin VN. Predictive modeling of cholera outbreaks in Bangladesh, Annals of Applied Statistics, 10, 575–595, 2016, arXiv:1402.0536 [stat.AP].

• Fintzi J, Wakefield J, Minin VN. Efficient data augmentation for fitting stochastic epidemic models to prevalence data, arXiv:1606.07995 [stat.CO].

• Ho LST, Xu J, Crawford FW, Minin VN, Suchard MA. Birth(death)/birth-death processes and their computable transition probabilities with statistical applications, arXiv:1603.03819 [stat.CO].

• http://www.stat.osu.edu/ pfc/BIRS/
