solution pigs exercise - publicifsv.sund.ku.dk

Solution pigs exercise
Course repeated measurements - R exercise class 2

November 24, 2017

Contents

1 Question 1: Import data
  1.1 Data management
  1.2 Inspection of the dataset

2 Question 2: Descriptive statistics
  2.1 Raw data
  2.2 Summary statistics

3 Question 3: Modeling the group effect
  3.1 Fitting a model using an unstructured covariance matrix
  3.2 Inference on the mean parameters
  3.3 Technical note: the nlme provides not very accurate results in small samples
  3.4 Inspection of the variance-covariance parameters

4 Question 5: Investigating the group effect in the first four weeks

5 Question 6: Modeling the treatment effect
  5.1 Definition of the new variables
    5.1.1 the treatment variable
    5.1.2 number of weeks under treatment
    5.1.3 Interaction between time and treatment
  5.2 Model (a): non-parametric treatment effect
  5.3 Model (b): linear effect of the treatment
  5.4 Model (c): splitting the treatment effect into a linear effect and a non linear effect

6 Question 7: Predicted weight profiles
  6.1 Compute individual predictions
  6.2 Graphical display
  6.3 Note

7 Question 8: Estimate of the difference in weight between the group at the end of week 7
  7.1 Model (a)
  7.2 Model (b)
  7.3 Model (c)

8 Question 10: Specification of the covariance matrix (compound symmetry vs. unstructured)
  8.1 Comparison of the model fit
  8.2 Comparison of the fitted values

NOTE: This document contains an example of R code and related software outputs that answer the questions of the pigs exercise. The focus here is on the implementation using the R software and not on the interpretation; we refer to the SAS solution for a more detailed discussion of the results.


Load the packages that will be necessary for the analysis:

library(data.table) # data management
library(nlme) # implementation of models for repeated measurements (e.g. gls, lme)
library(ggplot2) # graphical display
library(fields) # graphical display: image.plot
library(multcomp) # test for linear hypothesis (glht function)
library(AICcmodavg) # predictSE.gls

1 Question 1: Import data

1.1 Data management

We first specify the location of the data through a variable called path.data:

path.data <- "http://publicifsv.sund.ku.dk/~jufo/courses/rm2017/vitamin.txt"

Then we use the function fread to import the dataset:

dtL.vitamin <- fread(path.data, header = TRUE)
str(dtL.vitamin)

Classes 'data.table' and 'data.frame':  60 obs. of  4 variables:
 $ grp   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ animal: int  1 1 1 1 1 1 2 2 2 2 ...
 $ week  : int  1 3 4 5 6 7 1 3 4 5 ...
 $ weight: int  455 460 510 504 436 466 467 565 610 596 ...
 - attr(*, ".internal.selfref")=<externalptr>

We relabel the levels of the group variable (1 becomes "C" and 2 becomes "T") using the function factor:

dtL.vitamin[, grp := factor(grp, levels = 1:2, labels = c("C","T"))]

and convert the animal variable to a factor. We also create week.factor, a character version of week with labels w1, w3, ..., w7:

dtL.vitamin[, animal := as.factor(animal)]
dtL.vitamin[, week.factor := paste0("w",as.factor(week))]
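The difference between the two conversions can be checked on toy vectors. The following is a minimal base R sketch; the objects grp.raw and week.raw are made up for illustration and are not part of the exercise data. It shows that factor with labels returns a factor, whereas paste0 returns a character vector.

```r
# Toy vectors (hypothetical values, for illustration only)
grp.raw <- c(1, 1, 2, 2)
week.raw <- c(1, 3, 1, 3)

# factor() maps the codes 1/2 to the labels "C"/"T"
grp <- factor(grp.raw, levels = 1:2, labels = c("C", "T"))

# paste0() returns a character vector, not a factor
week.factor <- paste0("w", as.factor(week.raw))

class(grp)          # class of the relabelled group variable
class(week.factor)  # class of the week labels
```

This is consistent with the summary output of the next subsection, where week.factor is reported with Class :character.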


1.2 Inspection of the dataset

The summary method provides useful information about the dataset:

summary(dtL.vitamin, maxsum = 10)

 grp        animal       week           weight       week.factor
 C:30   1      :6   Min.   :1.000   Min.   :436.0   Length:60
 T:30   2      :6   1st Qu.:3.000   1st Qu.:508.5   Class :character
        3      :6   Median :4.500   Median :565.0   Mode  :character
        4      :6   Mean   :4.333   Mean   :555.7
        5      :6   3rd Qu.:6.000   3rd Qu.:594.5
        6      :6   Max.   :7.000   Max.   :702.0
        7      :6
        8      :6
        9      :6
        10     :6

We have a total of 60 observations divided into 2 groups of 30 observations each. Further insight can be obtained with the table function:

dtL.vitamin[, table(grp,animal)]

   animal
grp 1 2 3 4 5 6 7 8 9 10
  C 6 6 6 6 6 0 0 0 0  0
  T 0 0 0 0 0 6 6 6 6  6

Each group contains 5 animals and each animal has 6 measurements.

dtL.vitamin[, table(animal,week.factor)]

      week.factor
animal w1 w3 w4 w5 w6 w7
    1   1  1  1  1  1  1
    2   1  1  1  1  1  1
    3   1  1  1  1  1  1
    4   1  1  1  1  1  1
    5   1  1  1  1  1  1
    6   1  1  1  1  1  1
    7   1  1  1  1  1  1
    8   1  1  1  1  1  1
    9   1  1  1  1  1  1
    10  1  1  1  1  1  1

Each animal has been measured once at each of the 6 timepoints. Note that there are no missing values:

colSums(is.na(dtL.vitamin))

        grp      animal        week      weight week.factor
          0           0           0           0           0


2 Question 2: Descriptive statistics

2.1 Raw data

We can visualize the weight variable using a spaghetti plot:

gg.spaguetti <- ggplot(dtL.vitamin, aes(x = week.factor, y = weight, group = animal, color = animal))
gg.spaguetti <- gg.spaguetti + geom_line() + geom_point()
gg.spaguetti <- gg.spaguetti + facet_grid(~grp, labeller = label_both)
gg.spaguetti <- gg.spaguetti + xlab("week")
gg.spaguetti

Here we use ggplot2 instead of matplot since the data is in the long format. One could instead convert dtL.vitamin to the wide format (e.g. using dcast) and use matplot. The ggplot2 syntax in the previous code chunk works as follows:

• first we specify the dataset to use and the variables corresponding to the x-axis and y-axis. We also specify that the points should be grouped and colored according to the variable animal.

• second we specify how to display the data: with points and lines.

• finally we request to split the elements to be plotted in two panels according to the variable grp.
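For comparison, the matplot alternative could look like the following sketch. It assumes the weights of the control group have been arranged in the wide format (one row per animal, one column per week); the values are those of animals 1 to 5 in the dataset, and the object names W and weeks are introduced here for illustration.

```r
# Weights of the 5 control animals in wide format (rows: animals, columns: weeks)
weeks <- c(1, 3, 4, 5, 6, 7)
W <- rbind(c(455, 460, 510, 504, 436, 466),
           c(467, 565, 610, 596, 542, 587),
           c(445, 530, 580, 597, 582, 619),
           c(485, 542, 594, 583, 611, 612),
           c(480, 500, 550, 528, 562, 576))
dimnames(W) <- list(as.character(1:5), paste0("w", weeks))

# matplot draws one curve per column, so the matrix is transposed
# to obtain one curve per animal
matplot(x = weeks, y = t(W), type = "b", pch = 20, lty = 1,
        xlab = "week", ylab = "weight", main = "grp: C")
```

Unlike facet_grid, matplot draws a single panel, so the treated group would need a second call on its own matrix.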

[Figure: spaghetti plot of weight (y-axis) against week (x-axis, w1 to w7), one line per animal (color legend: animal 1 to 10), in two panels: grp: C and grp: T.]

2.2 Summary statistics

We can compute the mean and standard deviation of the weight for each group at each time:

dt.descriptive <- dtL.vitamin[, .(n = .N, mean = mean(weight), sd = sd(weight)),
                              by = c("grp","week.factor")]
dt.descriptive

    grp week.factor n  mean       sd
 1:   C          w1 5 466.4 16.72722
 2:   C          w3 5 519.4 40.64234
 3:   C          w4 5 568.8 39.58788
 4:   C          w5 5 561.6 42.84040
 5:   C          w6 5 546.6 66.87900
 6:   C          w7 5 572.0 61.81828
 7:   T          w1 5 494.4 31.91081
 8:   T          w3 5 551.0 41.89272
 9:   T          w4 5 574.2 27.99464
10:   T          w5 5 567.0 62.06045
11:   T          w6 5 603.0 53.30572
12:   T          w7 5 644.0 57.54998

and plot it:

gg.mean <- ggplot(dt.descriptive, aes(x = week.factor, y = mean, group = grp, color = grp))
gg.mean <- gg.mean + geom_line() + geom_point()
gg.mean <- gg.mean + ylab("sample mean (weight)") + xlab("week")
gg.mean

[Figure: sample mean (weight) (y-axis, ticks from 500 to 650) against week (x-axis, w1 to w7), one line per group (color legend grp: C, T).]

gg.sd <- ggplot(dt.descriptive, aes(x = week.factor, y = sd, group = grp, color = grp))
gg.sd <- gg.sd + geom_line() + geom_point()
gg.sd <- gg.sd + ylab("sample standard deviation (weight)") + xlab("week")
gg.sd

[Figure: sample standard deviation (weight) (y-axis, ticks from 20 to 60) against week (x-axis, w1 to w7), one line per group (color legend grp: C, T).]

Instead of first computing the mean/variance and then plotting it, we could use ggplot to do both at the same time:

gg.mean2 <- ggplot(dtL.vitamin, aes(x = week.factor, y = weight, group = grp, color = grp))
# note: in ggplot2 >= 3.3.0 the argument fun.y is deprecated in favour of fun
gg.mean2 <- gg.mean2 + stat_summary(geom = "line", fun.y = mean,
                                    size = 3, fun.data = NULL)
gg.mean2

[Figure: mean weight (y-axis, ticks from 500 to 650) against week.factor (x-axis, w1 to w7), one thick line per group (legend grp: C, T).]

If we wanted to compute the correlation matrix, it would be easier to first move to the wide format:

dtW.vitamin <- dcast(dtL.vitamin, value.var = "weight",
                     formula = grp+animal ~ week.factor)
dtW.vitamin

    grp animal  w1  w3  w4  w5  w6  w7
 1:   C      1 455 460 510 504 436 466
 2:   C      2 467 565 610 596 542 587
 3:   C      3 445 530 580 597 582 619
 4:   C      4 485 542 594 583 611 612
 5:   C      5 480 500 550 528 562 576
 6:   T      6 514 560 565 524 552 597
 7:   T      7 440 480 536 484 567 569
 8:   T      8 495 570 569 585 576 677
 9:   T      9 520 590 610 637 671 702
10:   T     10 503 555 591 605 649 675

and then compute the correlation matrices relative to each group:

list("grp=T" = cor(dtW.vitamin[grp=="T", .(w1,w3,w4,w5,w6,w7)]),
     "grp=C" = cor(dtW.vitamin[grp=="C", .(w1,w3,w4,w5,w6,w7)]))

$`grp=T`
          w1        w3        w4        w5        w6        w7
w1 1.0000000 0.9505691 0.8271283 0.7324280 0.4525202 0.6711257
w3 0.9505691 1.0000000 0.8514018 0.8394614 0.4948230 0.8207430
w4 0.8271283 0.8514018 1.0000000 0.9521629 0.8698131 0.8880628
w5 0.7324280 0.8394614 0.9521629 1.0000000 0.8466142 0.9854188
w6 0.4525202 0.4948230 0.8698131 0.8466142 1.0000000 0.7803782
w7 0.6711257 0.8207430 0.8880628 0.9854188 0.7803782 1.0000000

$`grp=C`
            w1        w3        w4          w5        w6        w7
w1  1.00000000 0.2332189 0.2523425 -0.04856259 0.4263431 0.2441859
w3  0.23321886 1.0000000 0.9982323  0.93341452 0.7258512 0.8263880
w4  0.25234249 0.9982323 1.0000000  0.93923176 0.7595188 0.8489105
w5 -0.04856259 0.9334145 0.9392318  1.00000000 0.7265133 0.8502560
w6  0.42634312 0.7258512 0.7595188  0.72651331 1.0000000 0.9648446
w7  0.24418591 0.8263880 0.8489105  0.85025600 0.9648446 1.0000000
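As a quick check of how one entry of these matrices is obtained, the correlation between weeks 3 and 4 in the control group can be recomputed by hand from the five pairs of weights in the wide table. This is a base R sketch; the vector names w3.C and w4.C are introduced here for illustration.

```r
# Weights of the control animals (1 to 5) at weeks 3 and 4
w3.C <- c(460, 565, 530, 542, 500)
w4.C <- c(510, 610, 580, 594, 550)

# Pearson correlation: covariance divided by the product of the standard deviations
r.manual <- cov(w3.C, w4.C) / (sd(w3.C) * sd(w4.C))
r.builtin <- cor(w3.C, w4.C)

c(manual = r.manual, builtin = r.builtin)
```

Both values should agree with the entry (w3, w4) of the grp=C matrix, about 0.998.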


3 Question 3: Modeling the group effect

3.1 Fitting a model using an unstructured covariance matrix

We use the gls function to fit the mixed model. The correlation and weights arguments are used together to model the within-individual variability in weight with an unstructured covariance matrix: corSymm specifies a general correlation matrix and varIdent a separate variance for each week:

gls.UN <- gls(weight ~ week.factor + grp + grp:week.factor,
              data = dtL.vitamin,
              correlation = corSymm(form = ~1 | animal),
              weights = varIdent(form = ~1 | week.factor)
              )
logLik(gls.UN)

'log Lik.' -218.9236 (df=33)

3.2 Inference on the mean parameters

We can then extract the estimated coefficients:

summary(gls.UN)$tTable

                   Value Std.Error    t-value      p-value
(Intercept)        466.4  11.39344 40.9358448 5.487679e-39
week.factorw3       53.0  13.58787  3.9005379 2.981543e-04
week.factorw4      102.4  13.55350  7.5552443 1.041548e-09
week.factorw5       95.2  20.38023  4.6711939 2.446059e-05
week.factorw6       80.2  24.73664  3.2421536 2.160099e-03
week.factorw7      105.6  23.37022  4.5185709 4.063580e-05
grpT                28.0  16.11275  1.7377538 8.866687e-02
week.factorw3:grpT   3.6  19.21615  0.1873424 8.521818e-01
week.factorw4:grpT -22.6  19.16754 -1.1790765 2.441793e-01
week.factorw5:grpT -22.6  28.82200 -0.7841233 4.368203e-01
week.factorw6:grpT  28.4  34.98290  0.8118253 4.208997e-01
week.factorw7:grpT  44.0  33.05048  1.3312967 1.893800e-01

their confidence intervals:

intervals(gls.UN)[["coef"]]

                        lower  est.     upper
(Intercept)        443.491958 466.4 489.30804
week.factorw3       25.679757  53.0  80.32024
week.factorw4       75.148863 102.4 129.65114
week.factorw5       54.222804  95.2 136.17720
week.factorw6       30.463643  80.2 129.93636
week.factorw7       58.611022 105.6 152.58898
grpT                -4.396864  28.0  60.39686
week.factorw3:grpT -35.036658   3.6  42.23666
week.factorw4:grpT -61.138928 -22.6  15.93893
week.factorw5:grpT -80.550507 -22.6  35.35051
week.factorw6:grpT -41.937830  28.4  98.73783
week.factorw7:grpT -22.452450  44.0 110.45245
attr(,"label")
[1] "Coefficients:"

the F-tests:

anova(gls.UN, type = "marginal")

Denom. DF: 48
                numDF   F-value p-value
(Intercept)         1 1675.7434  <.0001
week.factor         5   42.9724  <.0001
grp                 1    3.0198  0.0887
week.factor:grp     5    5.2803  0.0006

3.3 Technical note: the nlme provides not very accurate results in small samples

The last p-value does not match the one in the SAS output. Indeed, according to gls, the test is computed as follows:

1-pf(5.2803, df1 = 5, df2 = 48)

[1] 0.0006067529

Here the degrees of freedom are clearly wrong. We perform a comparison between individuals, here the 10 pigs, so we do not really have 60 independent observations (minus 12 parameters) but something closer to 10 observations. The Satterthwaite approximation can be used to obtain a more sensible value for the degrees of freedom:

library(lavaSearch2) ## not (yet!) available on CRAN, see github/bozenne/lavaSearch2
system.time(
    df.Satterthwaite <- dfVariance(gls.UN, adjust.residuals = TRUE)
)

Loading required package: lava
lava version 1.5.1

Attaching package: 'lava'

The following object is masked from 'package:fields':

    surface

lavaSearch2 version 0.1.2
   user  system elapsed
 57.416   0.000  57.446

df.Satterthwaite[names(coef(gls.UN))]

       (Intercept)      week.factorw3      week.factorw4      week.factorw5      week.factorw6
          6.944444           6.944444           6.944444           6.944444           6.944444
     week.factorw7               grpT week.factorw3:grpT week.factorw4:grpT week.factorw5:grpT
          6.944444           6.944444           6.944444           6.944444           6.944444
week.factorw6:grpT week.factorw7:grpT
          6.944444           6.944444

We obtain something close to 7 degrees of freedom, so the p-value for the F-test of the interaction should be:

1-pf(5.2803, df1 = 5, df2 = 7)

[1] 0.02505935
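To see how strongly the conclusion depends on the denominator degrees of freedom, the p-value can be recomputed for several candidate values. The sketch below reuses the F statistic reported by anova (5.2803 with 5 numerator degrees of freedom); the intermediate value of 10, the number of pigs, is only included as an illustrative reference point and is not used elsewhere in this document.

```r
F.stat <- 5.2803   # F statistic of the week.factor:grp interaction (from anova)
df.num <- 5        # numerator degrees of freedom
df.den <- c(default = 48, pigs = 10, Satterthwaite = 7)

# upper tail probability of the F distribution for each denominator df
p.value <- pf(F.stat, df1 = df.num, df2 = df.den, lower.tail = FALSE)
names(p.value) <- names(df.den)
round(p.value, 5)
```

The p-value increases as the denominator degrees of freedom decrease, which is why the default value of 48 gives an overly optimistic result here.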

3.4 Inspection of the variance-covariance parameters

We can display the modeled variance-covariance matrix between the weight measurements within individuals using the getVarCov function:

Sigma.UN <- getVarCov(gls.UN, individuals = 1)
Sigma.UN

Marginal variance covariance matrix
       [,1]    [,2]    [,3]    [,4]    [,5]    [,6]
[1,] 649.05  714.65  453.01  707.86  623.37  742.52
[2,] 714.65 1703.40 1302.30 1903.90 1539.00 2027.50
[3,] 453.01 1302.30 1175.50 1623.60 1654.50 1754.20
[4,] 707.86 1903.90 1623.60 2843.40 2441.20 2885.70
[5,] 623.37 1539.00 1654.50 2441.20 3657.20 3191.60
[6,] 742.52 2027.50 1754.20 2885.70 3191.60 3566.80
  Standard Deviations: 25.477 41.272 34.285 53.324 60.475 59.723

This matrix can be converted into a correlation matrix:

Cor.UN <- cov2cor(Sigma.UN)
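The conversion performed by cov2cor amounts to dividing each covariance by the product of the two corresponding standard deviations. The sketch below illustrates this on the upper-left 3 by 3 block of Sigma.UN (weeks 1, 3 and 4), typed in from the output above; the object names Sigma.sub, sd.sub, Cor.manual and Cor.sub are introduced for illustration.

```r
# Upper-left block of the estimated covariance matrix (weeks 1, 3 and 4)
Sigma.sub <- matrix(c(649.05,  714.65,  453.01,
                      714.65, 1703.40, 1302.30,
                      453.01, 1302.30, 1175.50),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("w1", "w3", "w4"), c("w1", "w3", "w4")))

sd.sub <- sqrt(diag(Sigma.sub))                  # standard deviations
Cor.manual <- Sigma.sub / outer(sd.sub, sd.sub)  # correlation computed by hand
Cor.sub <- cov2cor(Sigma.sub)                    # same computation with cov2cor

round(sd.sub, 3)
round(Cor.sub, 3)
```

The standard deviations obtained this way should match the first three values of the Standard Deviations line printed by getVarCov.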

A graphical representation of the correlation matrix can be obtained with the following code:

seqTime <- paste0("week",unique(dtL.vitamin$week))
seqTime.num <- as.numeric(as.factor(seqTime))
palette.z <- rev(heat.colors(12))

par(mar = c(4,4,5,5))
image(x = seqTime.num, y = seqTime.num, z = Cor.UN,
      main = "correlation matrix", axes = FALSE, col = palette.z,
      xlab = "", ylab = "")
axis(1, at = seqTime.num, labels = seqTime)
axis(2, at = seqTime.num, labels = seqTime, las = 2)
image.plot(x = seqTime.num, y = seqTime.num, z = Cor.UN, legend.only = TRUE,
           col = palette.z)

[Figure: heat map titled "correlation matrix", both axes labelled week1 to week7, color scale from 0.4 to 1.0.]

4 Question 5: Investigating the group effect in the first four weeks

We can create a new dataset containing the data of the first four weeks with:

dt.tempo <- dtL.vitamin[week<=4]
table(dt.tempo$week)

 1  3  4
10 10 10

So we can use a syntax similar to Question 3 to fit the mixed model using only the first weeks:

gls.UN.w14 <- gls(weight ~ week.factor + grp:week.factor,
                  data = dtL.vitamin[week<=4],
                  correlation = corSymm(form = ~1 | animal),
                  weights = varIdent(form = ~1 | week.factor)
                  )
logLik(gls.UN.w14)

'log Lik.' -112.2367 (df=12)

We can then extract the estimated coefficients:

summary(gls.UN.w14)$tTable

                   Value Std.Error    t-value      p-value
(Intercept)        466.4  11.39342 40.9359096 1.018777e-23
week.factorw3       53.0  13.58786  3.9005406 6.772109e-04
week.factorw4      102.4  13.55359  7.5551913 8.555732e-08
week.factorw1:grpT  28.0  16.11273  1.7377566 9.507109e-02
week.factorw3:grpT  31.6  26.10287  1.2105946 2.378384e-01
week.factorw4:grpT   5.4  21.68364  0.2490357 8.054520e-01

the F-tests:

anova(gls.UN.w14, type = "marginal")

Denom. DF: 24
                numDF   F-value p-value
(Intercept)         1 1675.7487  <.0001
week.factor         2   40.1472  <.0001
week.factor:grp     3    2.1888  0.1155

As before, the F-test should be computed with something close to 7 degrees of freedom instead of 24, e.g. for the interaction:

1-pf(2.1888, df1 = 3, df2 = 7)

[1] 0.1772701


5 Question 6: Modeling the treatment effect

5.1 Definition of the new variables

We first define the new variables suggested in the exercise:

5.1.1 the treatment variable

This variable takes value:

• "No" in the control group.

• "No" in the treated group at week 4 and before.

• "Yes" in the treated group after week 4.

We can use the following syntax to obtain it:

dtL.vitamin[, treat := as.character(NA)] # initialization to missing
dtL.vitamin[grp == "C", treat := "No"]
dtL.vitamin[week<=4 & grp == "T", treat := "No"]
dtL.vitamin[week>4 & grp == "T", treat := "Yes"]

We can display the result for the first observation of each group at each time:

dtL.vitamin[,.(treat = treat[1]), by = c("week","grp")]

    week grp treat
 1:    1   C    No
 2:    3   C    No
 3:    4   C    No
 4:    5   C    No
 5:    6   C    No
 6:    7   C    No
 7:    1   T    No
 8:    3   T    No
 9:    4   T    No
10:    5   T   Yes
11:    6   T   Yes
12:    7   T   Yes

5.1.2 number of weeks under treatment

This variable takes value:

• 0 when no treatment is given.

• 1 at week 5 when a treatment is given.

• 2 at week 6 when a treatment is given.

• 3 at week 7 when a treatment is given.



dtL.vitamin[, vitaweeks := as.integer(NA)] # initialization to missing
dtL.vitamin[treat == "No", vitaweeks := 0]
dtL.vitamin[treat == "Yes" & week == 5, vitaweeks := 1]
dtL.vitamin[treat == "Yes" & week == 6, vitaweeks := 2]
dtL.vitamin[treat == "Yes" & week == 7, vitaweeks := 3]

We can display the result for the first observation of each group at each time:

dtL.vitamin[,.(vitaweeks = vitaweeks[1]), by = c("week","grp")]

    week grp vitaweeks
 1:    1   C         0
 2:    3   C         0
 3:    4   C         0
 4:    5   C         0
 5:    6   C         0
 6:    7   C         0
 7:    1   T         0
 8:    3   T         0
 9:    4   T         0
10:    5   T         1
11:    6   T         2
12:    7   T         3

A more concise syntax is:

setkeyv(dtL.vitamin, c("animal","week"))
dtL.vitamin[, vitaweeks2 := cumsum(treat=="Yes"), by = "animal"]

Here we count the number of weeks under treatment using the cumsum function. We can check that both definitions coincide using:

all(dtL.vitamin$vitaweeks == dtL.vitamin$vitaweeks2)

[1] TRUE
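The same counting logic can be written in base R with ave, which applies cumsum within each animal. This is a sketch on made-up data (two hypothetical animals "a" and "b"), assuming the rows are sorted by week within animal, as setkeyv ensures in the data.table version.

```r
# Toy long-format data: one control animal ("a") and one treated animal ("b")
d <- data.frame(animal = rep(c("a", "b"), each = 6),
                week = rep(c(1, 3, 4, 5, 6, 7), times = 2),
                treat = c(rep("No", 6), rep("No", 3), rep("Yes", 3)))

# within each animal, count the treated weeks accumulated so far
d$vitaweeks <- ave(as.numeric(d$treat == "Yes"), d$animal, FUN = cumsum)
d
```

The cumulative sum only gives the number of weeks under treatment because the treated measurements are taken at consecutive weeks (5, 6 and 7); with gaps in the schedule it would count treated visits rather than weeks.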


5.1.3 Interaction between time and treatment

To obtain interaction coefficients only at weeks 5, 6, and 7, we can define a new variable whose value is:

• baseline when the individual is not treated.

• the week number (e.g. w5, w6, w7) when the individual is treated.

dtL.vitamin[treat == "No", I.treat_week := "baseline"]
dtL.vitamin[treat == "Yes", I.treat_week := week.factor]

We can display the result for the first observation of each group at each time:

dtL.vitamin[,.(I.treat_week = I.treat_week[1]), by = c("week","grp")]

    week grp I.treat_week
 1:    1   C     baseline
 2:    3   C     baseline
 3:    4   C     baseline
 4:    5   C     baseline
 5:    6   C     baseline
 6:    7   C     baseline
 7:    1   T     baseline
 8:    3   T     baseline
 9:    4   T     baseline
10:    5   T           w5
11:    6   T           w6
12:    7   T           w7

We also define another interaction term with only 2 coefficients. Here we decided not to model an interaction at week 5:

dtL.vitamin[, I.treat_week67 := I.treat_week]
dtL.vitamin[week == 5, I.treat_week67 := "baseline"]

We can display the result for the first observation of each group at each time:

dtL.vitamin[,.(I.treat_week67 = I.treat_week67[1]), by = c("week","grp")]

    week grp I.treat_week67
 1:    1   C       baseline
 2:    3   C       baseline
 3:    4   C       baseline
 4:    5   C       baseline
 5:    6   C       baseline
 6:    7   C       baseline
 7:    1   T       baseline
 8:    3   T       baseline
 9:    4   T       baseline
10:    5   T       baseline
11:    6   T             w6
12:    7   T             w7

5.2 Model (a): non-parametric treatment effect

gls.UN.a0 <- try(gls(weight ~ week.factor + treat:week.factor,
                     data = dtL.vitamin,
                     correlation = corSymm(form = ~1 | animal),
                     weights = varIdent(form = ~1 | week.factor)))

Error in glsEstimate(object, control = control) :
  computed "gls" fit is singular, rank 10

The gls function cannot fit the model since the model is not properly defined by the formula. To see why, let us look at how many coefficients gls is trying to estimate:

X <- model.matrix(weight ~ week.factor + treat:week.factor, data = dtL.vitamin)
summary(X)

  (Intercept) week.factorw3    week.factorw4    week.factorw5    week.factorw6    week.factorw7
 Min.   :1    Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000
 1st Qu.:1    1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000
 Median :1    Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000
 Mean   :1    Mean   :0.1667   Mean   :0.1667   Mean   :0.1667   Mean   :0.1667   Mean   :0.1667
 3rd Qu.:1    3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000
 Max.   :1    Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000
 week.factorw1:treatYes week.factorw3:treatYes week.factorw4:treatYes week.factorw5:treatYes
 Min.   :0              Min.   :0              Min.   :0              Min.   :0.00000
 1st Qu.:0              1st Qu.:0              1st Qu.:0              1st Qu.:0.00000
 Median :0              Median :0              Median :0              Median :0.00000
 Mean   :0              Mean   :0              Mean   :0              Mean   :0.08333
 3rd Qu.:0              3rd Qu.:0              3rd Qu.:0              3rd Qu.:0.00000
 Max.   :0              Max.   :0              Max.   :0              Max.   :1.00000
 week.factorw6:treatYes week.factorw7:treatYes
 Min.   :0.00000        Min.   :0.00000
 1st Qu.:0.00000        1st Qu.:0.00000
 Median :0.00000        Median :0.00000
 Mean   :0.08333        Mean   :0.08333
 3rd Qu.:0.00000        3rd Qu.:0.00000
 Max.   :1.00000        Max.   :1.00000

So gls is trying to estimate interactions at weeks where nobody is treated yet (e.g. week.factorw1:treatYes) even though they do not exist. The corresponding columns in the design matrix (X) contain only 0, making the design matrix singular. We therefore need to manually define the interaction using the variable I.treat_week that we have defined in the last subsection:
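The rank deficiency can be reproduced without fitting any model. The following base R sketch builds a toy dataset with the same treatment schedule (one control and one treated animal, with hypothetical identifiers "C1" and "T1") and compares the number of columns of the design matrix with its rank.

```r
# Toy data: one control animal ("C1") and one treated animal ("T1")
d <- data.frame(animal = rep(c("C1", "T1"), each = 6),
                week.factor = rep(paste0("w", c(1, 3, 4, 5, 6, 7)), times = 2),
                treat = c(rep("No", 6), rep("No", 3), rep("Yes", 3)))

X <- model.matrix(~ week.factor + treat:week.factor, data = d)

ncol(X)                             # number of coefficients requested
qr(X)$rank                          # number of coefficients that can be estimated
colnames(X)[colSums(abs(X)) == 0]   # columns containing only zeros
```

The gap between the number of columns and the rank corresponds to the interaction columns for the weeks at which no animal is treated.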

gls.UN.a <- gls(weight ~ week.factor + I.treat_week,
                data = dtL.vitamin,
                correlation = corSymm(form = ~1 | animal),
                weights = varIdent(form = ~1 | week.factor))
logLik(gls.UN.a)

'log Lik.' -232.0818 (df=30)

5.3 Model (b): linear effect of the treatment

gls.UN.b <- gls(weight ~ week.factor + vitaweeks,
                data = dtL.vitamin,
                correlation = corSymm(form = ~1 | animal),
                weights = varIdent(form = ~1 | week.factor))
logLik(gls.UN.b)

'log Lik.' -243.859 (df=28)

5.4 Model (c): splitting the treatment effect into a linear effect and a non linear effect

Once again if we try to fit the model with interactions, we have an overparametrized model. We therefore redefine the interactions such that there is one degree of freedom left for vitaweeks.

gls.UN.c <- gls(weight ~ week.factor + vitaweeks + I.treat_week67,
                data = dtL.vitamin,
                correlation = corSymm(form = ~1 | animal),
                weights = varIdent(form = ~1 | week.factor))
logLik(gls.UN.c)

'log Lik.' -232.0818 (df=30)

As suggested by the log-likelihood, this is the same model as (a) but parametrized in another way:

logLik(gls.UN.a) - logLik(gls.UN.c)

'log Lik.' 7.538006e-10 (df=30)
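That models (a) and (c) only differ by their parametrization can also be checked on the design matrices: both have the same number of columns and span the same space. The sketch below uses a toy dataset (one hypothetical control animal and one hypothetical treated animal) with variables defined as in section 5.1; it only concerns the mean structure and does not fit any model.

```r
# Toy data: one control and one treated animal, weeks as in the exercise
week <- rep(c(1, 3, 4, 5, 6, 7), times = 2)
treated <- rep(c(FALSE, TRUE), each = 6) & week > 4
d <- data.frame(week.factor = paste0("w", week),
                vitaweeks = ifelse(treated, week - 4, 0),
                I.treat_week = ifelse(treated, paste0("w", week), "baseline"),
                I.treat_week67 = ifelse(treated & week > 5, paste0("w", week), "baseline"))

X.a <- model.matrix(~ week.factor + I.treat_week, data = d)
X.c <- model.matrix(~ week.factor + vitaweeks + I.treat_week67, data = d)

# stacking the two matrices does not increase the rank:
# both design matrices span the same space
c(a = qr(X.a)$rank, c = qr(X.c)$rank, both = qr(cbind(X.a, X.c))$rank)
```

Since the three ranks are equal, any mean profile that model (a) can represent can also be represented by model (c), and conversely.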


6 Question 7: Predicted weight profiles

6.1 Compute individual predictions

To compute the predicted profiles for all individuals, you can use the predict function:

dtL.vitamin[, weight.UN.a := predict(gls.UN.a, newdata = dtL.vitamin)]
dtL.vitamin[, weight.UN.b := predict(gls.UN.b, newdata = dtL.vitamin)]
dtL.vitamin[, weight.UN.c := predict(gls.UN.c, newdata = dtL.vitamin)]

6.2 Graphical display

We can directly display the prediction for a given model:

gg.prediction <- ggplot(dtL.vitamin, aes(x = week, y = weight.UN.a, group = grp, color = grp))
gg.prediction <- gg.prediction + geom_point() + geom_line()
gg.prediction <- gg.prediction + ylab("model (a): week grpTweek")
gg.prediction

[Figure: predicted weight under model (a) (y-axis "model (a): week grpTweek", ticks from 500 to 600) against week (x-axis), one line per group (color legend grp: C, T).]

To display the predictions of the three models on several panels, we need to gather them in the long format, with one row per animal, week, and model:

vec.name <- paste0("weight.",c("UN.a","UN.b","UN.c"))
vec.name

[1] "weight.UN.a" "weight.UN.b" "weight.UN.c"


dtL.prediction <- melt(dtL.vitamin, id.vars = c("grp","animal","week"),
                       value.name = "weight", variable.name = "model",
                       measure.vars = vec.name)
dtL.prediction

     grp animal week       model   weight
  1:   C      1    1 weight.UN.a 480.4000
  2:   C      1    3 weight.UN.a 535.2000
  3:   C      1    4 weight.UN.a 571.5000
  4:   C      1    5 weight.UN.a 570.4639
  5:   C      1    6 weight.UN.a 537.8247
 ---
176:   T     10    3 weight.UN.c 535.2000
177:   T     10    4 weight.UN.c 571.5000
178:   T     10    5 weight.UN.c 558.1363
179:   T     10    6 weight.UN.c 611.7753
180:   T     10    7 weight.UN.c 635.8557

We can then use facet to divide the window into three sub-windows, each displaying the result of a specific model:

gg.prediction2 <- ggplot(dtL.prediction, aes(x = week, y = weight, group = grp, color = grp))
gg.prediction2 <- gg.prediction2 + geom_point() + geom_line()
gg.prediction2 <- gg.prediction2 + facet_wrap(~model, labeller = label_both)
gg.prediction2

[Figure: predicted weight (y-axis, ticks from 500 to 600) against week (x-axis), one line per group (color legend grp: C, T), in three panels: model: weight.UN.a, model: weight.UN.b, model: weight.UN.c.]

6.3 Note

In the previous graph we have displayed the predicted values for all individuals. However we could only distinguish two curves for each model. This is because, given a group and a week, the predictions are the same for all individuals:

• we don’t model individual specific covariates like age.

• we use the marginal predictions and not predictions conditional on the random effects. In other words, if we have already observed an individual at weeks 1 and 3, we could use these values to obtain a more accurate prediction at week 4 (predictions conditional on the individual random effect). Here we display the predicted values as if we were to perform prediction for a new individual (i.e. not already included in the study).
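To give an idea of what a conditional prediction would look like, the sketch below applies the usual conditional mean formula of the multivariate normal distribution. It is an illustration under assumptions rather than the output of a fitted model: it plugs in the covariance block for weeks 1, 3 and 4 printed in section 3.4 and the fitted means of group C from section 3.2 (both from gls.UN), and predicts week 4 for animal 1 given its weights at weeks 1 and 3. The object names are introduced for illustration.

```r
# Estimated covariance between weeks 1, 3 and 4 (upper-left block of Sigma.UN)
Sigma <- matrix(c(649.05,  714.65,  453.01,
                  714.65, 1703.40, 1302.30,
                  453.01, 1302.30, 1175.50), nrow = 3, byrow = TRUE)
mu <- c(466.4, 519.4, 568.8)   # fitted means of group C at weeks 1, 3, 4 (gls.UN)
y.obs <- c(455, 460)           # weights of animal 1 at weeks 1 and 3

# conditional mean of week 4 given weeks 1 and 3 (multivariate normal formula)
w <- solve(Sigma[1:2, 1:2], Sigma[1:2, 3])
pred.marginal <- mu[3]
pred.conditional <- mu[3] + sum(w * (y.obs - mu[1:2]))

c(marginal = pred.marginal, conditional = pred.conditional, observed = 510)
```

Because animal 1 was lighter than average at weeks 1 and 3, the conditional prediction is pulled below the marginal one, towards the weight actually observed at week 4.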

7 Question 8: Estimate of the difference in weight between the group at the end of week 7

7.1 Model (a)

The estimated difference in weight is given by the interaction term:

CI.UN.a <- intervals(gls.UN.a, which = "coef")
CI.UN.a[["coef"]]["I.treat_weekw7",]

   lower     est.    upper
17.15920 55.71152 94.26384

This matches the difference in predicted profiles:

dtL.vitamin[grp=="T" & week=="7",unique(weight.UN.a)] - dtL.vitamin[grp=="C" & week=="7",unique(weight.UN.a)]

[1] 55.71152

However, once again, the confidence intervals are computed using the wrong degrees of freedom:

beta <- summary(gls.UN.a)$tTable["I.treat_weekw7","Value"]
sd.beta <- summary(gls.UN.a)$tTable["I.treat_weekw7","Std.Error"]

CI.default <- c("lower" = beta + qt(0.025, df = 60-9) * sd.beta,
                "est." = beta,
                "upper" = beta + qt(0.975, df = 60-9) * sd.beta)
CI.default

   lower     est.    upper
17.15920 55.71152 94.26384

CI.corrected <- c("lower" = beta + qt(0.025, df = 7) * sd.beta,
                  "est." = beta,
                  "upper" = beta + qt(0.975, df = 7) * sd.beta)
CI.corrected

    lower      est.     upper
 10.30283  55.71152 101.12021

7.2 Model (b)

In this model the difference is three times the linear term:

coef(gls.UN.b)["vitaweeks"]*3

vitaweeks
 11.57111

One can check that this matches the difference in predicted profiles:

dtL.vitamin[grp=="T" & week=="7",unique(weight.UN.b)] - dtL.vitamin[grp=="C" & week=="7",unique(weight.UN.b)]

[1] 11.57111

To obtain the p-value and the standard error (and deduce the confidence interval) one can use the glht function. We first need to indicate that we are interested in 3 times the coefficient vitaweeks:

coef.UN.b <- coef(gls.UN.b)
C <- matrix(0, nrow = 1, ncol = length(coef.UN.b), dimnames = list(NULL,names(coef.UN.b)))
C[,"vitaweeks"] <- 3
C

     (Intercept) week.factorw3 week.factorw4 week.factorw5 week.factorw6 week.factorw7 vitaweeks
[1,]           0             0             0             0             0             0         3

and then call glht:

glht.UN.b <- summary(glht(gls.UN.b, linfct = C))
glht.UN.b

	 Simultaneous Tests for General Linear Hypotheses

Fit: gls(model = weight ~ week.factor + vitaweeks, data = dtL.vitamin,
    correlation = corSymm(form = ~1 | animal), weights = varIdent(form = ~1 |
        week.factor))

Linear Hypotheses:
       Estimate Std. Error z value Pr(>|z|)
1 == 0    11.57      13.79   0.839    0.401
(Adjusted p values reported -- single-step method)

We can obtain the corresponding confidence interval using confint:

confint(glht(gls.UN.b, linfct = C))

	 Simultaneous Confidence Intervals

Fit: gls(model = weight ~ week.factor + vitaweeks, data = dtL.vitamin,
    correlation = corSymm(form = ~1 | animal), weights = varIdent(form = ~1 |
        week.factor))

Quantile = 1.96
95% family-wise confidence level

Linear Hypotheses:
       Estimate      lwr      upr
1 == 0  11.5711 -15.4488  38.5910

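In this case this is nearly three times the confidence interval of vitaweeks. The small difference arises because intervals relies on a Student's t quantile whereas glht uses the normal quantile (1.96):

3*intervals(gls.UN.b, which = "coef")[["coef"]]["vitaweeks",]

    lower      est.     upper
-16.07991  11.57111  39.22212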
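The role of the quantile can be checked numerically from the numbers printed above. The sketch below assumes that intervals uses a Student's t quantile with 60 - 7 = 53 degrees of freedom (number of observations minus number of mean parameters in model (b)); under this assumption it recovers the standard error from the t-based interval and rebuilds the interval with the normal quantile reported by glht. The object names are introduced for illustration.

```r
est <- 11.57111                                  # 3 times the vitaweeks coefficient
ci.t <- c(lower = -16.07991, upper = 39.22212)   # 3 times the interval from intervals()
df.t <- 60 - 7                                   # assumed degrees of freedom of the t quantile

# standard error recovered from the t-based interval
se <- unname(diff(ci.t)) / (2 * qt(0.975, df = df.t))

# interval based on the normal quantile, as reported by glht
ci.z <- est + c(lower = -1, upper = 1) * qnorm(0.975) * se
round(c(se = se, ci.z), 4)
```

If the assumption holds, the recovered standard error should be close to the 13.79 reported by glht and the rebuilt interval close to the one returned by confint.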

7.3 Model (c)

The results are the same as in model (a) but obtaining them is a bit more complex, since the difference is the interaction term at week 7 plus three times the linear term.

In this case using glht greatly simplifies the implementation:

coef.UN.c <- coef(gls.UN.c)
C <- matrix(0, nrow = 1, ncol = length(coef.UN.c), dimnames = list(NULL,names(coef.UN.c)))
C[,"vitaweeks"] <- 3
C[,"I.treat_week67w7"] <- 1
C

     (Intercept) week.factorw3 week.factorw4 week.factorw5 week.factorw6 week.factorw7 vitaweeks
[1,]           0             0             0             0             0             0         3
     I.treat_week67w6 I.treat_week67w7
[1,]                0                1

glht.UN.c <- summary(glht(gls.UN.c, linfct = C))
glht.UN.c

	 Simultaneous Tests for General Linear Hypotheses

Fit: gls(model = weight ~ week.factor + vitaweeks + I.treat_week67,
    data = dtL.vitamin, correlation = corSymm(form = ~1 | animal),
    weights = varIdent(form = ~1 | week.factor))

Linear Hypotheses:
       Estimate Std. Error z value Pr(>|z|)
1 == 0    55.71      19.20   2.901  0.00372 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

8 Question 10: Specification of the covariance matrix (compound symmetry vs. unstructured)

8.1 Comparison of the model fit

Specifying a compound symmetry correlation matrix:

gls.CS <- gls(weight ~ week.factor + I.treat_week,
              data = dtL.vitamin,
              correlation = corCompSymm(form = ~1 | animal))
logLik(gls.CS)

'log Lik.' -258.8007 (df=11)

is equivalent to a "standard" mixed model fitted using lme:

e.lme <- lme(weight ~ week.factor + I.treat_week,
             data = dtL.vitamin,
             random = ~1 | animal)
logLik(e.lme)

'log Lik.' -258.8007 (df=11)

or lmer from the lme4 package:

library(lme4)
e.lmer <- lmer(weight ~ week.factor + I.treat_week + (1|animal),
               data = dtL.vitamin)
logLik(e.lmer)

'log Lik.' -258.8007 (df=11)


As expected, the variance-covariance structure is much simpler than in the previous models:

list("unstructured" = unclass(getVarCov(gls.UN.a)),
     "compound symmetry" = unclass(getVarCov(gls.CS))
     )

$unstructured
         [,1]      [,2]      [,3]      [,4]      [,5]     [,6]
[1,] 794.7107  881.0221  444.6671  767.0827  417.5841  786.688
[2,] 881.0221 1791.5135 1205.0034 1847.9253 1213.9257 1945.202
[3,] 444.6671 1205.0034 1052.9483 1469.7741 1444.3001 1583.659
[4,] 767.0827 1847.9253 1469.7741 2676.8199 2114.2963 2692.863
[5,] 417.5841 1213.9257 1444.3001 2114.2963 3428.2684 2848.488
[6,] 786.6880 1945.2020 1583.6589 2692.8628 2848.4881 3346.580

$`compound symmetry`
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
[1,] 2196.278 1510.800 1510.800 1510.800 1510.800 1510.800
[2,] 1510.800 2196.278 1510.800 1510.800 1510.800 1510.800
[3,] 1510.800 1510.800 2196.278 1510.800 1510.800 1510.800
[4,] 1510.800 1510.800 1510.800 2196.278 1510.800 1510.800
[5,] 1510.800 1510.800 1510.800 1510.800 2196.278 1510.800
[6,] 1510.800 1510.800 1510.800 1510.800 1510.800 2196.278
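The compound symmetry matrix is fully determined by two numbers: a common variance and a common covariance. In the random intercept formulation these correspond to the between-animal variance and the residual variance. The sketch below rebuilds the matrix from the two fitted values printed above; the object names are introduced for illustration.

```r
sigma2.total <- 2196.278   # common variance (diagonal of the fitted matrix)
cov.within   <- 1510.800   # common covariance (off-diagonal elements)
n.times <- 6

# compound symmetry: same variance at every time, same covariance for every pair of times
Sigma.CS <- matrix(cov.within, nrow = n.times, ncol = n.times)
diag(Sigma.CS) <- sigma2.total

# equivalent random intercept formulation
var.animal   <- cov.within                 # between-animal variance
var.residual <- sigma2.total - cov.within  # within-animal (residual) variance
icc <- cov.within / sigma2.total           # common within-animal correlation

round(c(var.animal = var.animal, var.residual = var.residual, icc = icc), 3)
```

The unstructured matrix, by contrast, requires 21 distinct parameters for 6 timepoints (6 variances and 15 covariances).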

We can compare the two models using a likelihood ratio test:

anova(update(gls.CS, method = "REML"),
      update(gls.UN.a, method = "REML")
      )

                                  Model df      AIC      BIC    logLik   Test  L.Ratio p-value
update(gls.CS, method = "REML")       1 11 539.6013 560.8514 -258.8007
update(gls.UN.a, method = "REML")     2 30 524.1637 582.1184 -232.0818 1 vs 2 53.43764  <.0001

So it seems that the unstructured model gives a better fit (p<0.0001).
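The quantities in the anova table can be recomputed by hand from the two log-likelihoods and their numbers of parameters. The sketch below is a plain transcription of the likelihood ratio test and of the AIC formula using the values printed above; the object names are introduced for illustration.

```r
logLik.CS <- -258.8007; df.CS <- 11   # compound symmetry
logLik.UN <- -232.0818; df.UN <- 30   # unstructured

# likelihood ratio statistic and its chi-squared reference distribution
LR <- 2 * (logLik.UN - logLik.CS)
p.value <- pchisq(LR, df = df.UN - df.CS, lower.tail = FALSE)

# AIC = -2 * logLik + 2 * number of parameters
AIC.CS <- -2 * logLik.CS + 2 * df.CS
AIC.UN <- -2 * logLik.UN + 2 * df.UN

c(LR = LR, df = df.UN - df.CS, p.value = p.value, AIC.CS = AIC.CS, AIC.UN = AIC.UN)
```

The test therefore compares the statistic with a chi-squared distribution with 30 - 11 = 19 degrees of freedom, the number of additional covariance parameters in the unstructured model.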


8.2 Comparison of the fitted values

Computation of the predicted values with confidence intervals using predictSE.gls:

resCS.tempo <- predictSE.gls(gls.CS, newdata = dtL.vitamin)

dtL.vitamin[, weight.CS := resCS.tempo$fit]
dtL.vitamin[, weightInf.CS := resCS.tempo$fit - 1.96 * resCS.tempo$se.fit]
dtL.vitamin[, weightSup.CS := resCS.tempo$fit + 1.96 * resCS.tempo$se.fit]

resUN.tempo <- predictSE.gls(gls.UN.a, newdata = dtL.vitamin)

dtL.vitamin[, weight.UN.a := resUN.tempo$fit]
dtL.vitamin[, weightInf.UN.a := resUN.tempo$fit - 1.96 * resUN.tempo$se.fit]
dtL.vitamin[, weightSup.UN.a := resUN.tempo$fit + 1.96 * resUN.tempo$se.fit]

With the current dataset we could create one graph for each model. But putting the results of both models side by side may help to visualize discrepancies between the models. To do so we first convert the data to the long format. Since this involves reshaping several variables simultaneously, it might be easier to do that manually:

keep.colsCS <- c("grp","animal","week","weight.CS","weightInf.CS","weightSup.CS")
keep.colsUN <- c("grp","animal","week","weight.UN.a","weightInf.UN.a","weightSup.UN.a")

dt.tempo1 <- dtL.vitamin[,.SD,.SDcols = keep.colsCS]
setnames(dt.tempo1, old = names(dt.tempo1),
         new = c("grp","animal","week","estimate","lower", "upper"))
dt.tempo1[, model := "CS"]

dt.tempo2 <- dtL.vitamin[,.SD,.SDcols = keep.colsUN]
setnames(dt.tempo2, old = names(dt.tempo2),
         new = c("grp","animal","week","estimate","lower", "upper"))
dt.tempo2[, model := "UN"]

dtL.prediction2 <- rbind(dt.tempo1,dt.tempo2)
dtL.prediction2

     grp animal week estimate    lower    upper model
  1:   C      1    1 480.4000 451.3531 509.4469    CS
  2:   C      1    3 535.2000 506.1531 564.2469    CS
  3:   C      1    4 571.5000 542.4531 600.5469    CS
  4:   C      1    5 571.0101 536.6110 605.4093    CS
  5:   C      1    6 556.0101 521.6110 590.4093    CS
 ---
116:   T     10    3 535.2000 508.9659 561.4341    UN
117:   T     10    4 571.5000 551.3878 591.6122    UN
118:   T     10    5 558.1361 522.8819 593.3904    UN
119:   T     10    6 611.7753 571.3432 652.2075    UN
120:   T     10    7 635.8558 595.3615 676.3500    UN

melt can also produce a similar result in one operation, provided it is applied to dtL.vitamin (which contains the prediction columns) and each element of measure.vars lists the columns of the two models in the same order:

dtL.prediction2.bis <- melt(dtL.vitamin,
                            id.vars = c("grp","animal","week"),
                            measure.vars = list(c("weight.CS","weight.UN.a"),
                                                c("weightInf.CS","weightInf.UN.a"),
                                                c("weightSup.CS","weightSup.UN.a")),
                            variable.name = "model",
                            value.name = c("estimate","lower","upper"))
dtL.prediction2.bis

In the result, the column model is coded 1 for CS and 2 for UN, following the order of the columns in measure.vars.

We can now use ggplot2 to display the predictions:

gg.predictionIC <- ggplot(dtL.prediction2, aes(x = week, y = estimate, group = grp, color = grp))
gg.predictionIC <- gg.predictionIC + geom_point() + geom_line()
gg.predictionIC <- gg.predictionIC + geom_ribbon(aes(ymin = lower, ymax = upper, fill = grp), alpha = 0.33)
gg.predictionIC <- gg.predictionIC + facet_grid(grp~model, labeller = label_both)
gg.predictionIC <- gg.predictionIC + ylab("weight")
gg.predictionIC

[Figure: predicted weight with confidence bands (y-axis, ticks from 450 to 650) against week (x-axis), in a grid of panels: rows grp: C and grp: T, columns model: CS and model: UN (color legend grp: C, T).]
