Angewandte statistische Regression I
Dr. Matteo Tanadini ([email protected])
Fall Semester 2019 (ETHZ)
Lecture 7, Angewandte statistische Regression I
Outline
1 Introduction
2 Designs
3 Get to know data
4 Fitting a Mixed Effects Model with lme4
5 Checking the model assumptions
6 Inference procedure
7 Concluding remarks
Section 1
Introduction
What is a Mixed Model?
Definition: a statistical model that contains both random and fixed effects.
Example: a clinical trial where a new drug is tested along with the gold standard.
Usually, fixed effects are predictors of interest, for which we want to draw conclusions (e.g. drug).
Random effects are predictors that need to be considered in the analysis (design variables, e.g. patient), but are usually not of primary interest¹.
¹ Nevertheless, we may be interested in their variability.
What is a Mixed Model?
It is natural to use Mixed Effects Models when “grouped data” is present. Such data arises in experiments where an “experimental unit” is measured more than once (e.g. the diameter of a tree measured yearly).
How do we decide whether a variable must be considered random or not?
Design of the experiment/study
Questions addressed
Hospitals example:
Response variable: length of stay [days]
Design: 10 hospitals, about 1,000 hospitalisations per hospital
Q1: What is the average length of stay?
Q2: Which hospital has, on average, the longest stays?
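To make the two questions concrete, here is a minimal simulation sketch (invented parameters, not the course data): the hospital enters as a shared shift for all stays recorded in the same hospital.

```r
## Hypothetical simulation of the hospitals design (invented numbers)
set.seed(1)
n.hosp <- 10; n.per <- 1000
hospital <- factor(rep(paste0("H", 1:n.hosp), each = n.per))
b <- rnorm(n.hosp, mean = 0, sd = 1.5)               # hospital-level random effects
stay <- 6 + b[as.integer(hospital)] + rnorm(n.hosp * n.per, sd = 3)
## Q1 is a population-level (fixed) quantity:
mean(stay)
## Q2 is a hospital-level quantity, i.e. a question about the random effects:
sort(tapply(stay, hospital, mean), decreasing = TRUE)[1]
```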
Orthodont Example
data(Orthodont, package = "nlme")
names(Orthodont)
[1] "distance" "age" "Subject" "Sex"
head(Orthodont)
Grouped Data: distance ~ age | Subject
distance age Subject Sex
1 26 8 M01 Male
2 25 10 M01 Male
3 29 12 M01 Male
4 31 14 M01 Male
5 22 8 M02 Male
6 22 10 M02 Male
Orthodont Example
[Figure: lattice xyplot of distance against age for each Subject, in two panels (Male, Female).]
This graph was produced with xyplot() from the lattice package [6]. Similar graphs can be produced with the ggplot2 package [7].
Orthodont Example
Aims: model the growth of the measured distance² and inspect gender differences. For more information about the data set, type help(Orthodont) in R.
Response variable: distance
Fixed: Sex and age
Random: Subject
We are interested in growth (i.e. age) and gender (i.e. Sex). We must consider the person (Subject), but this is not our main interest.
² Between the pituitary gland and the pterygomaxillary fissure [mm].
Can we drop “Subject” from the analysis?
Given that we are not interested in “Subject”, can we simply drop it from the analysis? (Motivate your answer.)
[Figure: the same xyplot of distance against age, panelled by Male/Female.]
Can we drop “Subject” from the analysis?
Let’s fit a simple linear model:
lm.Ort <- lm(distance ~ age * Sex, data = Orthodont)
summary(lm.Ort)
Call:
lm(formula = distance ~ age * Sex, data = Orthodont)
Residuals:
Min 1Q Median 3Q Max
-5.616 -1.322 -0.168 1.330 5.247
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.341 1.416 11.54 < 2e-16 ***
age 0.784 0.126 6.22 1.1e-08 ***
SexFemale 1.032 2.219 0.47 0.64
age:SexFemale -0.305 0.198 -1.54 0.13
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 2.3 on 104 degrees of freedom
Multiple R-squared: 0.423, Adjusted R-squared: 0.406
F-statistic: 25.4 on 3 and 104 DF, p-value: 2.11e-12
Is that fine? Let’s check the model assumptions.
Can we drop “Subject” from the analysis?
par(mfrow = c(2, 2), oma = c(0, 0, 0, 0), mar = c(2, 4, 2, 1))
plot(lm.Ort)
[Figure: standard lm diagnostic plots for lm.Ort: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook’s distance; observations 35, 39 and 49 are flagged.]
The main model assumptions look good! What about the independence of the observations?
Can we drop “Subject” from the analysis?
[Figure: boxplots of residuals(lm.Ort) grouped by Subject, ordered by median residual.]
There is a clear “Subject” effect! Omitting it from the analysis is incorrect.
Can we drop “Subject” from the analysis?
The previous graph clearly showed that if we don’t consider “Subject” in our analysis, we cannot assume that the observations are independent.
Thus, the inference procedure (i.e. computing p-values and estimating confidence intervals) is not valid when subjects are omitted³.
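A numeric counterpart to the residuals-by-subject graph (a sketch; the slides make this point graphically): per-subject mean residuals that are systematically far from zero indicate within-subject dependence.

```r
## Per-subject mean residuals from the lm fit (assumes the nlme package is installed)
data(Orthodont, package = "nlme")
lm.Ort <- lm(distance ~ age * Sex, data = Orthodont)
subj.mean.res <- tapply(residuals(lm.Ort), Orthodont$Subject, mean)
round(sort(subj.mean.res), 2)   # systematic per-subject shifts -> dependence
```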
Design variables should always be included in the model!
³ Although the estimated regression coefficients are still unbiased.
Section 2
Designs
Experiments can have several random effects:
Agricultural study: plots and subplots; response variable = plant height
Educational study: schools, classes and pupils; response variable = grade
Fish reproduction study: mothers and fathers; response variable = larvae survival
Experiments can have several random effects, and things can become complex/complicated:
Educational study (II): schools, teachers and pupils; response variable = grade
Pedomorphological study: sites; response variables = [Zn] and [Cu]
Survival study: computer processors monitored over time; response variable = working/broken
Note: these are the 3 most common situations that yield “grouped data”.
Warning: not all studies that have “grouped data” are best analysed within the framework of Mixed Effects Models.
Section 3
Get to know data
When dealing with Mixed Models, it is essential to understand the data structure.
There are 2 fundamental steps to carry out before fitting models:
summaries (str(), head(), table(), ...)
graphics
The structure of the data determines what questions you can answer.
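As a sketch, the summary step can look like this for the Orthodont data (assumes the nlme package is installed):

```r
## Quick structural checks on grouped data, using the Orthodont example
data(Orthodont, package = "nlme")
str(Orthodont, give.attr = FALSE)   # variables and their types
table(Orthodont$Subject)            # measurements per subject (balanced design)
with(Orthodont, table(Sex, age))    # who was measured at which ages
```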
See MixedModels Lab.pdf
Section 4
Fitting a Mixed Effects Model with lme4
lme4
Currently, the most powerful way to fit Mixed Models in R is the lme4 package [1].
To fit a Linear Mixed Effects Model we use the function lmer().
There are 2 arguments that need to be specified: formula⁴ and data.
mod.0 <- lmer(distance ~ age * Sex + (1 | Subject),
data = Orthodont)
⁴ Which must contain at least one random effect.
Models can contain several random effects:
mem.0 <- lmer(grade ~ subject + (1 | School) + (1 | Class) + (1 | Student),
data = MathGrades)
##
fish.0 <- lmer(weight ~ days + (1 | Mother) + (1 | Father) + (1 | larvae),
data = FishDeveloppement)
We don’t need to specify whether random effects are nested or crossed.
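A hypothetical sketch of why: lme4 reads nesting from the factor coding itself. If class labels are unique across schools, (1 | School) + (1 | Class) is implicitly nested; if labels were reused across schools, one would write (1 | School/Class) or relabel the classes.

```r
## Invented toy coding: class labels unique across schools -> implicitly nested
school <- factor(rep(c("S1", "S2"), each = 4))
class  <- factor(rep(c("a", "b", "c", "d"), each = 2))
table(school, class)   # each class occurs in exactly one school
```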
Section 5
Checking the model assumptions
Model assumptions
There are two classes of assumptions:
Structural assumptions (model equation)
Distributional assumptions
All assumptions are assessed simultaneously using graphs.
Structural assumptions
When fitting mod.0, we assumed growth to be linear and allowed the two gender groups to have different slopes. We also assumed that growth is the same within each gender group; in other words, that the regression lines for the subjects run parallel to their gender line (i.e. a simple random intercept).
If we instead assume that growth (the slope) is the same in both groups, we would fit the model:
mem.1 <- lmer(distance ~ age + Sex + (1 | Subject),
data = Orthodont)
The graphical analysis suggested quite strongly that the two groups have different slopes.
Distributional assumptions
distance_{i,j,k} = β_i^{(0)} + β_i^{(1)} · age_{i,j,k} + b_j + ε_{i,j,k}
i = 1, ..., n_sex;  j = 1, ..., n_sub;  k = 1, ..., N
ε_{i,j,k} ∼ N(0, σ²_ε)
b_j ∼ N(0, σ²_b)
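These assumptions can be made tangible by simulating from the model (a sketch with invented parameter values, single-group version for brevity):

```r
## Simulate from the random-intercept model (invented sigma_b, sigma_eps, betas)
set.seed(42)
n.sub <- 27; ages <- c(8, 10, 12, 14)
b    <- rnorm(n.sub, 0, 1.8)                    # b_j ~ N(0, sigma_b^2)
eps  <- rnorm(n.sub * length(ages), 0, 1.3)     # eps ~ N(0, sigma_eps^2)
subj <- rep(seq_len(n.sub), each = length(ages))
age  <- rep(ages, times = n.sub)
distance <- 17 + 0.66 * age + b[subj] + eps     # beta0 + beta1 * age + b_j + eps
## all observations of one subject share its b_j, hence are correlated
```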
See MixedModels Lab.pdf
The fundamental graphics to assess model assumptions are:
Tukey-Anscombe plot
Scale-location plot
Quantile-Quantile plots
Residuals against predictor plots
Further graphics (if required)
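As a sketch, the first three plots can be produced by hand from any fitted model; resid() and fitted() work for lmer fits just as for lm (an lm stand-in is used here so the code runs without lme4):

```r
## Hand-made Tukey-Anscombe and QQ plots (lm stand-in; same calls work for lmer fits)
data(Orthodont, package = "nlme")
mod <- lm(distance ~ age * Sex, data = Orthodont)
plot(fitted(mod), resid(mod), main = "Tukey-Anscombe plot")
abline(h = 0, lty = 2)
qqnorm(resid(mod)); qqline(resid(mod))
```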
Section 6
Inference procedure
p-values and confidence intervals can be computed with the functions anova(), drop1() and confint().
Fixed effects are usually tested with Likelihood Ratio Tests.
For random effects, we prefer to estimate confidence intervals via likelihood profiling (profile()).
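To illustrate the likelihood-ratio idea, here is a sketch with nested lm fits so the code runs without lme4 (with lmer, the same comparison is done by calling anova() on two fitted models):

```r
## Likelihood-ratio test between two nested models (lm stand-in for illustration)
data(Orthodont, package = "nlme")
m0 <- lm(distance ~ age + Sex, data = Orthodont)
m1 <- lm(distance ~ age * Sex, data = Orthodont)
lrt <- as.numeric(2 * (logLik(m1) - logLik(m0)))
pchisq(lrt, df = 1, lower.tail = FALSE)   # p-value for the interaction term
```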
See MixedModels Lab.pdf
Section 7
Concluding remarks
Take home message
Mixed Effects Models are very powerful methods, well suited to analyse a wide variety of modern experiments
They are relatively easy to use
Complex experiments/studies require the use of appropriate visualisation techniques
There is some debate on how to perform inference. However, remember that statistical significance does not imply practical/biological significance.
lme4 is one of the most up-to-date software packages for analysing these experiments. It does not implement *everything*, for good reasons
Extensions to LMMs and GLMMs exist (e.g. mgcv)
Further topics (not covered here):
Generalised Mixed Models (GLMMs; glmer(..., family = ...))
Generalised Additive Mixed Models (GAMMs; mgcv)
R2 and stepwise model selection (Don’t use them with LMMs!)
Partial residual plots
Dynamic documents (knitr and RMarkdown)
Getting advice (Internet, consulting offices)
Literature (see next slide)
...
Literature
lme4: Mixed-effects modeling with R [2] (short!)
Mixed Effects Models in S and S-plus [4]
Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models [3]
Generalized additive models: an introduction with R [8]
Lattice: multivariate data visualization with R [5]
Bibliography I
Douglas Bates et al. lme4: Linear Mixed-Effects Models using ’Eigen’ and S4. R package version 1.1-18-1. 2018. URL: https://CRAN.R-project.org/package=lme4.
Douglas M. Bates. “lme4: Mixed-effects modeling with R”. 2010. URL: http://lme4.r-forge.r-project.org/book.
Julian J. Faraway. Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models. CRC Press, 2005.
Jose Pinheiro and Douglas Bates. Mixed-effects models in S and S-PLUS. Springer Science & Business Media, 2006.
Deepayan Sarkar. Lattice: multivariate data visualization with R. Springer Science & Business Media, 2008.
Bibliography II
Deepayan Sarkar. lattice: Trellis Graphics for R. R package version 0.20-35. 2017. URL: https://CRAN.R-project.org/package=lattice.
Hadley Wickham et al. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 3.0.0. 2018. URL: https://CRAN.R-project.org/package=ggplot2.
Simon Wood. Generalized additive models: an introduction with R. CRC Press, 2006.