waterloo glmm talk

40

Upload: ben-bolker

Post on 21-Jun-2015

363 views

Category:

Science


3 download

DESCRIPTION

Generalized linear mixed models for quantitative biology at GLMM

TRANSCRIPT

Page 1: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Generalized linear mixed models

Ben Bolker

McMaster University, Mathematics & Statistics and Biology

26 September 2014

Page 2: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Acknowledgments

lme4: Doug Bates, MartinMächler, Steve Walker

Data: Josh Banta, Adrian Stier,Sea McKeon, David Julian,Jada-Simone White

NSERC (Discovery)

SHARCnet

Page 3: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Outline

1 Examples and de�nitions

2 EstimationOverviewMethods

3 Inference

4 Challenges & open questions

Page 4: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Outline

1 Examples and de�nitions

2 EstimationOverviewMethods

3 Inference

4 Challenges & open questions

Page 5: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

(Generalized) linear mixed models

(G)LMMs: a statistical modeling framework incorporating:

combinations of categorical and continuous predictors,and interactions

(some) non-Normal responses(e.g. binomial, Poisson, and extensions)

(some) nonlinearity(e.g. logistic, exponential, hyperbolic)

non-independent (grouped) data

Page 6: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

(Generalized) linear mixed models

(G)LMMs: a statistical modeling framework incorporating:

combinations of categorical and continuous predictors,and interactions

(some) non-Normal responses(e.g. binomial, Poisson, and extensions)

(some) nonlinearity(e.g. logistic, exponential, hyperbolic)

non-independent (grouped) data

Page 7: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

(Generalized) linear mixed models

(G)LMMs: a statistical modeling framework incorporating:

combinations of categorical and continuous predictors,and interactions

(some) non-Normal responses(e.g. binomial, Poisson, and extensions)

(some) nonlinearity(e.g. logistic, exponential, hyperbolic)

non-independent (grouped) data

Page 8: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

least−squaresnonlinear

generalizedlinear models

} correlation

smooth

nonlinearityscaled

variance

effectsrandom

} (non−normal errors)

nonlineartime seriesmodels

thresholds;mixtures;compound distributions;etc. etc. etc. etc.

nonlinearity

effects

randomnonlinearity

correlation

(nonlinearity)

generallinear models

logistic regressionbinomial regressionlog−linear models

linear regressionANOVAanalysis of covariancemultiple linear regression

(non−normal errors)(nonlinearity)

GLMM

mixed models

repeated−measurestime series (ARIMA)models;

generalizedadditivemodels

quasilikelihood negativebinomial models

Page 9: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Coral protection from seastars (Culcita)by symbionts (McKeon et al., 2012)

none shrimp crabs both

Number of predation events

Symbionts

Num

ber

of b

lock

s

0

2

4

6

8

10

1

2

0

1

2

0

2

0

1

2

Page 10: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Environmental stress: Glycera cell survival(D. Julian unpubl.)

H2S

Cop

per

0

33.3

66.6

133.3

0 0.03 0.1 0.32

Osm=12.8Normoxia

Osm=22.4Normoxia

0 0.03 0.1 0.32

Osm=32Normoxia

Osm=41.6Normoxia

0 0.03 0.1 0.32

Osm=51.2Normoxia

Osm=12.8Anoxia

0 0.03 0.1 0.32

Osm=22.4Anoxia

Osm=32Anoxia

0 0.03 0.1 0.32

Osm=41.6Anoxia

0

33.3

66.6

133.3

Osm=51.2Anoxia

0.0

0.2

0.4

0.6

0.8

1.0

Page 11: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Arabidopsis response to fertilization & herbivory(Banta et al., 2010)

Log(

1+fr

uit s

et)

0

1

2

3

4

5

unclipped clipped

●●●●●●

●●

●●

●●

●●

●●●

●●● ●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●● ●● ●

●●

●●

●●

●● ●

●●

● ●●● ●●● ●●● ●●● ●● ●● ●● ●

● ●● ●●● ●●●●

●●

● ●●

●●●●●● ●●

●● ●

● ●

: nutrient 1

unclipped clipped

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

● ●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●●● ●● ●●● ●●●● ●

●●

●●

●●●●●●● ●●● ●●

●●

●●●● ●●●●●●●

●●

: nutrient 8

Page 12: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Coral demography(J.-S. White unpubl.)

Before Experimental

●● ●

●●●

●●

●● ●

●●●

● ●●

●●●

●● ●● ●

●●

●● ●●

● ●●● ●

●● ●● ●●●●● ●

●●

● ●● ●

●● ●

● ●

●●●

●●●●

● ●●

●●

● ●

● ●● ●●

●●●

● ●●

●●

●●●

●●●

●●● ●●● ●●

●●●●

● ● ●●●

●●

● ●●●● ●●

●●●

●●● ●

●● ●●

●●●

●●

●● ●●

●●● ●●●●

●●● ●●

●● ●

● ●

●●0.00

0.25

0.50

0.75

1.00

0 10 20 30 40 50 0 10 20 30 40 50Previous size (cm)

Mor

talit

y pr

obab

ility

Treatment

Present

Removed

Page 13: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Technical de�nition

Yi︸︷︷︸response

conditionaldistribution︷︸︸︷Distr (g−1(ηi )︸ ︷︷ ︸

inverselink

function

, φ︸︷︷︸scale

parameter

)

Page 14: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Technical de�nition

Yi︸︷︷︸response

conditionaldistribution︷︸︸︷Distr (g−1(ηi )︸ ︷︷ ︸

inverselink

function

, φ︸︷︷︸scale

parameter

)

η︸︷︷︸linear

predictor

= Xβ︸︷︷︸�xede�ects

+ Zb︸︷︷︸randome�ects

Page 15: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Technical de�nition

Yi︸︷︷︸response

conditionaldistribution︷︸︸︷Distr (g−1(ηi )︸ ︷︷ ︸

inverselink

function

, φ︸︷︷︸scale

parameter

)

η︸︷︷︸linear

predictor

= Xβ︸︷︷︸�xede�ects

+ Zb︸︷︷︸randome�ects

b︸︷︷︸conditionalmodes

∼ MVN(0, Σ(θ)︸ ︷︷ ︸variance-covariancematrix

)

Page 16: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

What are random e�ects?

A method for . . .

accounting for among-individual, within-block correlation

compromising between complete pooling (no among-blockvariance)and �xed e�ects (in�nite among-block variance)

handling levels selected at random from a larger population

sharing information among levels (shrinkage estimation)

estimating variability among levels

allowing predictions for unmeasured levels

Page 17: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

What are random e�ects?

A method for . . .

accounting for among-individual, within-block correlation

compromising between complete pooling (no among-blockvariance)and �xed e�ects (in�nite among-block variance)

handling levels selected at random from a larger population

sharing information among levels (shrinkage estimation)

estimating variability among levels

allowing predictions for unmeasured levels

Page 18: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

What are random e�ects?

A method for . . .

accounting for among-individual, within-block correlation

compromising between complete pooling (no among-blockvariance)and �xed e�ects (in�nite among-block variance)

handling levels selected at random from a larger population

sharing information among levels (shrinkage estimation)

estimating variability among levels

allowing predictions for unmeasured levels

Page 19: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

What are random e�ects?

A method for . . .

accounting for among-individual, within-block correlation

compromising between complete pooling (no among-blockvariance)and �xed e�ects (in�nite among-block variance)

handling levels selected at random from a larger population

sharing information among levels (shrinkage estimation)

estimating variability among levels

allowing predictions for unmeasured levels

Page 20: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

What are random e�ects?

A method for . . .

accounting for among-individual, within-block correlation

compromising between complete pooling (no among-blockvariance)and �xed e�ects (in�nite among-block variance)

handling levels selected at random from a larger population

sharing information among levels (shrinkage estimation)

estimating variability among levels

allowing predictions for unmeasured levels

Page 21: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

What are random e�ects?

A method for . . .

accounting for among-individual, within-block correlation

compromising between complete pooling (no among-blockvariance)and �xed e�ects (in�nite among-block variance)

handling levels selected at random from a larger population

sharing information among levels (shrinkage estimation)

estimating variability among levels

allowing predictions for unmeasured levels

Page 22: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Outline

1 Examples and de�nitions

2 EstimationOverviewMethods

3 Inference

4 Challenges & open questions

Page 23: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Maximum likelihood estimation

Best �t is a compromise between two components(consistency of data with �xed e�ects and conditional modes;consistency of random e�ect with RE distribution)

Goodness-of-�t integrates over conditional modes

●●

−2

−1

0

1

2

1 2 3 4 5f

y

Page 24: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Shrinkage: Arabidopsis conditional modes

● ●

●●

● ● ●●

● ● ● ●● ● ● ●

● ● ●●

Genotype

Mea

n fr

uit s

et

0 5 10 15 20 25

0

0.1

1

1020

● group meanshrinkage est.

Page 25: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Estimation methods

deterministic : various approximate integrals (Breslow, 2004)

Penalized quasi-likelihood, Laplace,Gauss-Hermite quadrature, . . .�exibility and speed vs. accuracy

. . .

stochastic (Monte Carlo): frequentist and Bayesian (Booth andHobert, 1999; Ponciano et al., 2009; Sung, 2007)

usually slower but �exible and accurate

Page 26: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Estimation: Culcita (McKeon et al., 2012)

Log−odds of predation−6 −4 −2 0 2

Symbiont

Crab vs. Shrimp

Added symbiont

GLM (fixed)GLM (pooled)PQLLaplaceAGQ

Page 27: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Outline

1 Examples and de�nitions

2 EstimationOverviewMethods

3 Inference

4 Challenges & open questions

Page 28: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Wald tests

typical results of summary

exact for ANOVA, regression:approximation for GLM(M)s

fast

approximation is sometimesawful (Hauck-Donner e�ect) parameter

log−

likel

ihoo

d

Page 29: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

2D pro�les for Culcita data

Scatter Plot Matrix

.sig01

2 4 6 8 101214

−3−2−1

0

(Intercept)

0

5

10

1510 15

0 1 2 3

tttcrabs

−10−8−6−4−20

−4 −2 0

0 1 2 3

tttshrimp

−10−8−6−4−2 −6 −4 −2

0 1 2 3

tttboth

−12−10−8−6−4−2

0 1 2 3

Page 30: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Likelihood ratio tests

better than Wald, but still have to two problems:

�denominator degrees of freedom� (when estimating scale)for GLMMs, distributions are approximate anyway (Bartlettcorrections)Kenward-Roger correction? (Stroup, 2014)

Pro�le con�dence intervals: expensive/fragile

Page 31: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Parametric bootstrapping

�t null model to data

simulate �data� from null model

�t null and working model, compute likelihood di�erence

repeat to estimate null distribution

should be OK but ??? not well tested(assumes estimated parameters are �su�ciently� good)

Page 32: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Parametric bootstrap results

True p value

Infe

rred

p v

alue

0.020.040.060.08

0.02 0.06

Osm Cu

H2S

0.02 0.06

0.020.040.060.08

Anoxia

normalt(14)t(7)

Page 33: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Bayesian inference

If we have a good sample from the posterior distribution(Markov chains have converged etc. etc.) we get most of theinferences we want for free by summarizing the marginalposteriors

post hoc Bayesian can work, but mode at zero causes problems

Page 34: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Culcita con�dence intervals

●●●

●●●

●●●

●●●

●●●

block

(Intercept)

tttboth

tttcrabs

tttshrimp

−15 −10 −5 0 5 10 15Effect (log−odds of predation)

CI

Wald

profile

boot

MCMC

fun

● glmer

glmmADMB

MCMCglmm

Page 35: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Outline

1 Examples and de�nitions

2 EstimationOverviewMethods

3 Inference

4 Challenges & open questions

Page 36: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

On beyond R

Julia: MixedModels package

SAS: PROC MIXED, NLMIXED

AS-REML

Stata (GLLAMM, xtmelogit)

AD Model Builder

HLM, MLWiN

Page 37: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Challenges

Small/medium data: inference, singular �ts (blme, MCMCglmm)

Big data: speed!

Worst case: large n, small N (e.g. telemetry/genomics)

Model diagnosis

Con�dence intervals accounting for uncertainty in variances

See also: http://rpubs.com/bbolker/glmmchapter, https://groups.nceas.ucsb.edu/non-linear-modeling/projects

Page 38: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Spatial and temporal correlations

Sometimes blocking takes care of non-independence ...

but sometimes there is temporal or spatial correlation within

blocks

. . . also phylogenetic . . . (Ives and Zhu, 2006)

�G-side� vs. �R-side� e�ects

tricky to implement for GLMMs,but new possibilities on the horizon (Rousset and Ferdy, 2014;Rue et al., 2009)

Page 39: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Next steps

Complex random e�ects:regularization, model selection, penalized methods(lasso/fence)

Flexible correlation and variance structures

Flexible/nonparametric random e�ects distributions

hybrid & improved MCMC methods

Reliable assessment of out-of-sample performance

Page 40: Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

References

Banta, J.A., Stevens, M.H.H., and Pigliucci, M., 2010. Oikos, 119(2):359�369. ISSN 1600-0706.doi:10.1111/j.1600-0706.2009.17726.x.

Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society. Series B, 61(1):265�285.doi:10.1111/1467-9868.00176.

Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattlesymposium in biostatistics: Analysis of correlated data, pages 1�22. Springer. ISBN 0387208623.

Ives, A.R. and Zhu, J., 2006. Ecological Applications, 16(1):20�32.

McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):1095�1103. ISSN 0029-8549.doi:10.1007/s00442-012-2275-2.

Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356�362. ISSN 0012-9658.

Rousset, F. and Ferdy, J.B., 2014. Ecography, page no�no. ISSN 1600-0587. doi:10.1111/ecog.00566.

Rue, H., Martino, S., and Chopin, N., 2009. Journal of the Royal Statistical Society, Series B,71(2):319�392.

Stroup, W.W., 2014. Agronomy Journal, 106:1�17. doi:10.2134/agronj2013.0342.

Sung, Y.J., 2007. The Annals of Statistics, 35(3):990�1011. ISSN 0090-5364.doi:10.1214/009053606000001389.