counting chickens and other tales using random effect models and mcmc estimation in applied...

‘Counting chickens and other tales’ Using random effect models and

MCMC estimation in applied statistics research.

Dr William J. Browne

School of Mathematical Sciences

University of Nottingham

Outline

• Background to my research, random effect and multilevel models and MCMC estimation.

• Random effect models for complex data structures including artificial insemination and Danish chicken examples.

• Multivariate random effect models and great tit nesting behaviour.

• Sample size calculations for complex random effect models.

• Multilevel modelling of mastitis incidence.• Other research and future work.

Background

• 1995-1998 – PhD in Statistics, University of Bath.“Applying MCMC methods to multilevel models.”• 1998-2003 – Postdoctoral research positions at the

Centre for Multilevel Modelling at the Institute of Education, London.

• 2003-2006 – Lecturer in Statistics at University of Nottingham.

• 2006- Associate professor of Statistics at University of Nottingham.

Research interests:Multilevel modelling, complex random effect modelling,

applied statistics, Bayesian statistics and MCMC estimation.

Random effect models

• Models that account for the underlying structure in the dataset.• Originally developed for nested structures (multilevel models), for

example in education, pupils nested within schools.• An extension of linear modelling with the inclusion of random effects.A typical 2-level model is

Here i might index pupils and j index schools.Alternatively in another example i might index cows and j index herds.The important thing is that the model and statistical methods used are

the same!

),0(~),,0(~ 22

10

eijuj

ijjijij

NeNu

euxy

Estimation Methods for Multilevel Models

Due to additional random effects no simple matrix formulae exist for finding estimates in multilevel models.

Two alternative approaches exist:1. Iterative algorithms e.g. IGLS, RIGLS, that alternate

between estimating fixed and random effects until convergence. Can produce ML and REML estimates.

2. Simulation-based Bayesian methods e.g. MCMC that attempt to draw samples from the posterior distribution of the model.

One possible computer program to use for multilevel models which incorporates both approaches is MLwiN.

MLwiN

• Software package designed specifically for fitting multilevel models.

• Developed by a team led by Harvey Goldstein and Jon Rasbash at the Institute of Education in London over past 15 years or so. Earlier incarnations ML2, ML3, MLN.

• Originally contained ‘classical’ estimation methods (IGLS) for fitting models.

• MLwiN launched in 1998 also included MCMC estimation.

• My role in the team was as developer of the MCMC functionality in MLwiN in my time at Bath and during 4.5 years at the IOE.

Note: MLwiN core team relocated to Bristol in 2005.

MCMC Algorithm

• Consider the 2-level normal response model

• MCMC algorithms usually work in a Bayesian framework and so we need to add prior distributions for the unknown parameters.

• Here there are 4 sets of unknown parameters:

• We will add prior distributions

),0(~),,0(~ 22

10

eijuj

ijjijij

NeNu

euxy

22 ,,, euu

)(),(),( 22eu ppp

MCMC Algorithm (2)

One possible MCMC algorithm for this model then involves simulating in turn from the 4 sets of conditional distributions. Such an algorithm is known as Gibbs Sampling. MLwiN uses Gibbs sampling for all normal response models.

Firstly we set starting values for each group of unknown parameters,

Then sample from the following conditional distributions, firstly

To get .

)0(2

)0(2

)0()0( ,,, euu

),,,|( 2)0(

2)0()0( euuyp

)1(

MCMC Algorithm (3)

We next sample from

to get , then

to get , then finally

To get . We have then updated all of the unknowns in the model. The process is then simply repeated many times, each time using the previously generated parameter values to generate the next set

),,,|( 2)0(

2)0()1( euyup

)1(u),,,|( 2

)0()1()1(2

eu uyp 2

)1(u),,,|( 2

)1()1()1(2

ue uyp 2

)1(e

Burn-in and estimates

Burn-in: It is general practice to throw away the first n values to allow the Markov chain to approach its equilibrium distribution namely the joint posterior distribution of interest. These iterations are known as the burn-in.

Finding Estimates: We continue generating values at the end of the burn-in for another m iterations. These m values are then averaged to give point estimates of the parameter of interest. Posterior standard deviations and other summary measures can also be obtained from the chains.

So why use MCMC?

• Often gives better (in terms of bias) estimates for non-normal responses (see Browne and Draper, 2006).

• Gives full posterior distribution so interval estimates for derived quantities are easy to produce.

• Can easily be extended to more complex problems as we will see next.

• Potential downside 1: Prior distributions required for all unknown parameters.

• Potential downside 2: MCMC estimation is much slower than the IGLS algorithm.

• For more information see my book: MCMC Estimation in MLwiN – Browne (2003).

Extension 1: Cross-classified modelsFor example, schools by neighbourhoods. Schools will draw pupils from many different neighbourhoods and the pupils of a neighbourhood will go to several schools. No pure hierarchy can be found and pupils are said to be contained within a cross-classification of schools by neighbourhoods:

nbhd 1 nbhd 2 Nbhd 3

School 1 xx x

School 2 x x

School 3 xx x

School 4 x xxx

School S1 S2 S3 S4

Pupil P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12

Nbhd N1 N2 N3

Notation

With hierarchical models we use a subscript notation that has one subscript per level and nesting is implied reading from the left. For example, subscript pattern ijk denotes the i’th level 1 unit within the j’th level 2 unit within the k’th level 3 unit.

If models become cross-classified we use the term classification instead of level. With notation that has one subscript per classification, that also captures the relationship between classifications, notation can become very cumbersome. We propose an alternative notation introduced in Browne et al. (2001) that only has a single subscript no matter how many classifications are in the model.

Single subscript notationSchool S1 S2 S3 S4

Pupil P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12

Nbhd N1 N2 N3

i nbhd(i) sch(i)1 1 12 2 13 1 14 2 25 1 26 2 27 2 38 3 39 3 410 2 411 3 412 3 4

)1()3()(

)2()(0 iischinbhdi euuy

We write the model as

1)3(

4)2(

3011

1)3(

1)2(

101

euuy

euuy

where classification 2 is neighbourhood and classification 3 is school. Classification 1 always corresponds to the classification at which the response measurements are made, in this case pupils. For pupils 1 and 11 equation (1) becomes:

Classification diagrams

School

Pupil

Neighbourhood

School

Pupil

Neighbourhood

Nested structure where schools are contained within neighbourhoods

Cross-classified structure where pupils from a school come from many neighbourhoods and pupils from a neighbourhood attend several schools.

In the single subscript notation we lose information about the relationship (crossed or nested) between classifications. A useful way of conveying this information is with the classification diagram. Which has one node per classification and nodes linked by arrows have a nested relationship and unlinked nodes have a crossed relationship.

Example : Artificial insemination by donor

Women w1 w2 w3 Cycles c1 c2 c3 c4… c1 c2 c3 c4… c1 c2 c3 c4… Donations d1 d2 d1 d2 d3 d1 d2 Donors m1 m2 m3

1901 women279 donors 1328 donations12100 ovulatory cyclesresponse is whether conception occurs in a given cycle

In terms of a unit diagram:

Donor

Woman

Cycle

Donation

Or a classification diagram:

Model for artificial insemination data

),0(~

),0(~

),0(~

)()logit(

),1(~

2)4(

)4()(

2)3(

)3()(

2)2(

)2()(

)4()(

)3()(

)2()(i

uidonor

uidonation

uiwoman

idonoridonationiwomani

ii

Nu

Nu

Nu

uuuX

Binomialy

We can write the model as

2)4(u

0

1

2

3

4

5

6

7

2)2(u

2)3(u

Parameter Description Estimate(se)

intercept -4.04(2.30)

azoospermia 0.22(0.11)

semen quality 0.19(0.03)

womens age>35 -0.30(0.14)

sperm count 0.20(0.07)

sperm motility 0.02(0.06)

insemination to early -0.72(0.19)

insemination to late -0.27(0.10)

women variance 1.02(0.21)

donation variance 0.644(0.21)

donor variance 0.338(0.07)

Results:

Note cross-classified models can be fitted in IGLS but are far easier to fit using MCMC estimation.

Extension 2: Multiple membership models

When level 1 units are members of more than one higher level unit we describe a model for such data as a multiple membership model.

For example,

• Pupils change schools/classes and each school/class has an effect on pupil outcomes.

• Patients are seen by more than one nurse during the course of their treatment.

Notation

),0(~

)2(),0(~

)(

2

2)2(

)2(

)(

)2()2(,

ei

uj

inursejijjiii

Ne

Nu

euwXBy

Note that nurse(i) now indexes the set of nurses that treat patient i and w(2)

i,j is a weighting factor relating patient i to nurse j. For example, with four patients and three nurses, we may have the following weights:

n1(j=1) n2(j=2) n3(j=3)

p1(i=1) 0.5 0 0.5

p2(i=2) 1 0 0

p3(i=3) 0 0.5 0.5

p4(i=4) 0.5 0.5 0

4)2(

2)2(

144

3)2(

3)2(

233

2)2(

122

1)2(

3)2(

111

5.05.0)(

5.05.0)(

1)(

5.05.0)(

euuXBy

euuXBy

euXBy

euuXBy

Here patient 1 was seen by nurse 1 and 3 but not nurse 2 and so on. If we substitute the values of w(2)

i,j , i and j. from the table into (2) we get the series of equations :

Classification diagrams for multiple membership relationships

Double arrows indicate a multiple membership relationship between classifications.

patient

nurseWe can mix multiple membership, crossed and hierarchical structures in a single model.

patient

nurse

hospital

GP practice

Here patients are multiple members of nurses, nurses are nested within hospitals and GP

practice is crossed with both nurse and hospital.

Example involving nesting, crossing and multiple membership – Danish chickens

Production hierarchy10,127 child flocks 725 houses 304 farms

Breeding hierarchy10,127 child flocks200 parent flocks

farm f1 f2… Houses h1 h2 h1 h2 Child flocks c1 c2 c3… c1 c2 c3…. c1 c2 c3…. c1 c2 c3…. Parent flock p1 p2 p3 p4 p5….

Child flock

House

Farm

Parent flock

As a unit diagram: As a classification diagram:

Model and results

),0(~

),0(~),0(~

)()logit(

),1(~

2)4(

)4()(

2)3(

)3()(

2)2(

)2(

)(.

)4()(

)3()(

)2()2(,i

uifarm

uihouseuj

iflockpjiifarmihousejjii

ii

Nu

NuNu

euuuwXB

Binomialy

0

1

2

3

4

5

2)2(u

2)3(u

2)4(u

Parameter Description Estimate(se)

intercept -2.322(0.213)

1996 -1.239(0.162)

1997 -1.165(0.187)

hatchery 2 -1.733(0.255)

hatchery 3 -0.211(0.252)

hatchery 4 -1.062(0.388)

parent flock variance 0.895(0.179)

house variance 0.208(0.108)

farm variance 0.927(0.197)

Results:

Response is cases of salmonella

Note multiple membership models can be fitted in IGLS and this model/dataset represents roughly the most complex model that the method can handle.

Such models are far easier to fit using MCMC estimation.

Random effect modelling of great tit nesting behaviour

• An extension of cross-classified models to multivariate responses.

• Collaborative research with Richard Pettifor (Institute of Zoology, London), and Robin McCleery and Ben Sheldon (University of Oxford).

Wytham woods great tit dataset

• A longitudinal study of great tits nesting in Wytham Woods, Oxfordshire.

• 6 responses : 3 continuous & 3 binary. • Clutch size, lay date and mean

nestling mass.• Nest success, male and female

survival.• Data: 4165 nesting attempts over a

period of 34 years. • There are 4 higher-level classifications

of the data: female parent, male parent, nestbox and year.

Data background

Source Numberof IDs

Median#obs

Mean#obs

Year 34 104 122.5

Nestbox 968 4 4.30

Male parent 2986 1 1.39

Female parent 2944 1 1.41

Note there is very little information on each individual male and female bird but we can get some estimates of variability via a random effects model.

The data structure can be summarised as follows:

Diagrammatic representation of the dataset.

Nest-box

Year

1

2

3

4

5

6

70

♀16♂16

♀17♂17

♀15♂18

♀18♂11

69

♀14♂14

♀13♂15

♀15♂11

68

♀11♂10

♀13♂13

♀12♂11

67

♀11♂10

♀12♂11

♀10♂12

66

♀8♂8

♀9♂6

♀10♂9

65

♀5♂1

♀6♂5

♀7♂6

♀3♂7

64

♀1♂1

♀2♂2

♀3♂3

♀4♂4

Univariate cross-classified random effect modelling

• For each of the 6 responses we will firstly fit a univariate model, normal responses for the continuous variables and probit regression for the binary variables. For example using notation of Browne et al. (2001) and letting response yi be clutch size:

)2,0(~

),2)5(

,0(~)5()(

),2)4(

,0(~)4()(

),2)3(

,0(~)3()(

),2)2(

,0(~)2()(

)5()(

)4()(

)3()(

)2()(

eN

ie

uN

iyearu

uN

inestboxu

uN

ifemaleu

uN

imaleu

ie

iyearu

inestboxu

ifemaleu

imaleu

iy

Estimation

• We use MCMC estimation in MLwiN and choose ‘diffuse’ priors for all parameters.

• We run 3 MCMC chains from different starting points for 250k iterations each (500k for binary responses) and use the Gelman-Rubin diagnostic to decide burn-in length.

• We compared results with the equivalent classical model using the Genstat software package and got broadly similar results.

• We fit all four higher classifications and do not consider model comparison.

Clutch SizeParameter Estimate (S.E.) Percentage

variance β 8.808 (0.109) -

2)5(u (Year) 0.365 (0.101) 14.3%

2)4(u (Nest box) 0.107 (0.026) 4.2%

2)3(u (Male) 0.046 (0.043) 1.8%

2)2(u (Female) 0.974 (0.062) 38.1%

2e (Observation) 1.064 (0.055) 41.6%

Here we see that the average clutch size is just below 9 eggs with large variability between female birds and some variability between years. Male birds and nest boxes have less impact.

Lay Date (days after April 1st)Parameter Estimate (S.E.) Percentage

variance β 29.38 (1.07) -

2)5(u (Year) 37.74 (10.08) 50.3%

2)4(u (Nest box) 3.38 (0.56) 4.5%

2)3(u (Male) 0.22 (0.39) 0.3%

2)2(u (Female) 8.55 (1.03) 11.4%

2e (Observation) 25.10 (1.04) 33.5%

Here we see that the mean lay date is around the end of April/beginning of May. The biggest driver of lay date is the year which is probably indicating weather differences. There is some variability due to female birds but little impact of nest box and male bird.

Nestling MassParameter Estimate (S.E.) Percentage

variance β 18.829 (0.060) -

2)5(u (Year) 0.105 (0.032) 9.0%

2)4(u (Nest box) 0.026 (0.013) 2.2%

2)3(u (Male) 0.153 (0.030) 13.1%

2)2(u (Female) 0.163 (0.031) 14.0%

2e (Observation) 0.720 (0.035) 61.7%

Here the response is the average mass of the chicks in a brood at 10 days old. Note here lots of the variability is unexplained and both parents are equally important.

Human example

Sarah Victoria Browne

Born 20th July 2004

Birth Weight 6lb 6oz

Helena Jayne Browne

Born 22nd May 2006

Birth Weight 8lb 0oz

Father’s birth weight 9lb 13oz, Mother’s birth weight 6lb 8oz

Nest SuccessParameter Estimate (S.E.) Percentage

overdispersion β 0.010 (0.080) -

2)5(u (Year) 0.191 (0.058) 56.0%

2)4(u (Nest box) 0.025 (0.020) 7.3%

2)3(u (Male) 0.065 (0.054) 19.1%

2)2(u (Female) 0.060 (0.052) 17.6%

Here we define nest success as one of the ringed nestlings captured in later years. The value 0.01 for β corresponds to around a 50% success rate. Most of the variability is explained by the Binomial assumption with the bulk of the over-dispersion mainly due to yearly differences.

Male SurvivalParameter Estimate (S.E.) Percentage

overdispersion β -0.428 (0.041) -

2)5(u (Year) 0.032 (0.013) 41.6%

2)4(u (Nest box) 0.006 (0.006) 7.8%

2)3(u (Male) 0.025 (0.023) 32.5%

2)2(u (Female) 0.014 (0.017) 18.2%

Here male survival is defined as being observed breeding in later years. The average probability is 0.334 and there is very little over-dispersion with differences between years being the main factor. Note the actual response is being observed breeding in later years and so the real probability is higher!

Female survivalParameter Estimate (S.E.) Percentage

overdispersion β -0.302 (0.048) -

2)5(u (Year) 0.053 (0.018) 36.6%

2)4(u (Nest box) 0.065 (0.024) 44.8%

2)3(u (Male) 0.014 (0.017) 9.7%

2)2(u (Female) 0.013 (0.014) 9.0%

Here female survival is defined as being observed breeding in later years. The average probability is 0.381 and again there isn’t much over-dispersion with differences between nestboxes and years being the main factors.

Multivariate modelling of the great tit dataset

• We now wish to combine the six univariate models into one big model that will also account for the correlations between the responses.

• We choose a MV Normal model and use latent variables (Chib and Greenburg, 1998) for the 3 binary responses that take positive values if the response is 1 and negative values if the response is 0.

• We are then left with a 6-vector for each observation consisting of the 3 continuous responses and 3 latent variables. The latent variables are estimated as an additional step in the MCMC algorithm and for identifiability the elements of the level 1 variance matrix that correspond to their variances are constrained to equal 1.

Multivariate Model

),0(~

),)5(

,0(~)5()(

),)4(

,0(~)4()(

),)3(

,0(~)3()(

),)2(

,0(~)2()(

)5()(

)4()(

)3()(

)2()(

eMVN

ie

uMVN

iyearu

uMVN

inestboxu

uMVN

ifemaleu

uMVN

imaleu

ie

iyearu

inestboxu

ifemaleu

imaleu

iy

05.000000

005.00000

0005.0000

0002.000

0000150

000005.0

)(

],)(*6,6[6~))((

iuS

iuSinvWisharti

up

Here the vector valued response is decomposed into a mean vector plus random effects for each classification.

Inverse Wishart priors are used for each of the classification variance matrices. The values are based on considering overall variability in each response and assuming an equal split for the 5 classifications.

Use of the multivariate model

• The multivariate model was fitted using an MCMC algorithm programmed into the MLwiN package which consists of Gibbs sampling steps for all parameters apart from the level 1 variance matrix which requires Metropolis sampling (see Browne 2006).

• The multivariate model will give variance estimates in line with the 6 univariate models.

• In addition the covariances/correlations at each level can be assessed to look at how correlations are partitioned.

Partitioning of covariances

a

Laying date

Clu

tch

siz

e

Raw phenotypic correlation

(1) Causal environmental effect

(2) Female condition effect

[partitioning ofcovariance

at different levels]

Laying date

Clu

tch

Siz

e

(i) Territory

(ii) Female

(iii) Observation

Laying date

Clu

tch

Siz

e

Laying date

Clu

tch

Siz

e

Laying date

Clu

tch

Siz

e

Laying date

Clu

tch

Siz

e

Laying date

Clu

tch

Siz

e

b

Fecundity

Su

rviv

al

partitioning ofcovariance atdifferent levels

Raw phenotypic correlation

Fecundity

Sur

viva

l

Fecundity

Sur

viva

l

(i) Female

(ii) Observation

Correlations from a 1-level model• If we ignore the structure of the data and consider it as 4165

independent observations we get the following correlations:

CS LD NM NS MS

LD -0.30 X X X X

NM -0.09 -0.06 X X X

NS 0.20 -0.22 0.16 X X

MS 0.02 -0.02 0.04 0.07 X

FS -0.02 -0.02 0.06 0.11 0.21

Note correlations in bold are statistically significant i.e. 95% credible interval doesn’t contain 0.

Correlations in full modelCS LD NM NS MS

LD N, F, O

-0.30

X X X X

NM F, O

-0.09

F, O

-0.06

X X X

NS Y, F

0.20

N, F, O

-0.22

O

0.16

X X

MS -

0.02

-

-0.02

-

0.04

Y

0.07

X

FS F, O

-0.02

F, O

-0.02

-

0.06

Y, F

0.11

Y, O

0.21

Key: Blue +ve, Red –ve: Y – year, N – nestbox, F – female, O - observation

Pairs of antagonistic covariances at different classifications

There are 3 pairs of antagonistic correlations i.e. correlations with different signs at different classifications:

LD & NM : Female 0.20 Observation -0.19Interpretation: Females who generally lay late, lay heavier

chicks but the later a particular bird lays the lighter its chicks.

CS & FS : Female 0.48 Observation -0.20Interpretation: Birds that lay larger clutches are more likely

to survive but a particular bird has less chance of surviving if it lays more eggs.

LD & FS : Female -0.67 Observation 0.11Interpretation: Birds that lay early are more likely to survive

but for a particular bird the later they lay the better!

Sample size calculations in random effect models

• A current ESRC grant (2006-2009) that funds a postdoc (Mousa Golalizadeh).

• The grant will consider the problem of deciding on how much data to collect for a research question taking account of the likely structure in the collected data (See later slides).

• The grant will also investigate how various MCMC algorithm developments perform in practice when applied to real datasets.

• Finally we will investigate when complex models are identifiable in the presence of ‘sparse’ data.

Background

• Many quantitative research questions are of the form of a hypothesis – A has a significant effect on B.

• To answer such a question data is collected that allows the researcher to (hopefully) test whether statistically A has a significant effect on B. (In fact we aim to reject the hypothesis that A doesn’t significantly affect B).

• A test is performed and either the researcher is happy and A indeed has a significant effect on B or is left wondering why the data collected do not back up their hypothesis. Is the hypothesis false or was the data not sufficient?

• The sufficiency of the data is the motivation for sample size calculations.

Example

• Suppose I have the research question ‘Are Welshmen on average taller than 175 cms?’

• I now need to get hold of a random sample of n Welshmen and measure each of their heights.

• I make some statistical assumption about the distribution of the heights of Welshmen e.g. that they come from a Normal distribution.

• I might like to check this assumption by plotting a histogram of the data.

• I can then form a statistical hypothesis test and test whether indeed Welshmen are taller than 175cms.

• I need to decide how big to make n, my sample of Welshmen.

Hypothesis Testing

• Let us assume our null hypothesis is that the average height of Welshmen (μ) is 175cm.

• So we test H0:μ=175 vs HA:μ>175 (or alternatively H0:θ=0 vs HA:θ>0 where θ=μ-175)

• In practice we calculate from our sample its mean ( ) and standard deviation (s2) and use these along with n to form a test statistic which we can compare with the distribution assumed under H0

x

Type I and Type II errors• No hypothesis test is perfect and there is always the possibility of

errors

• P(Type I error) = α = significance level or size• P(Type II error) = β, 1-β is the power of the test. • In general we fix α to some value e.g. 0.05, 0.01 then 1-β depends on

our sample size.

Truth

H0 True H0 False

Decision

Reject H0 Type I error Correct

Accept H0 Correct Type II error

Example hypothesis test

• Let us assume that in reality our sample mean is 180cms and the population standard deviation (sd) is 5cms (known).

• We can then form a test statistic as follows:

• Note here that for small n and unknown sd we should use a student-t distribution rather than Normal.

• For a 1-sided Z test we wish Z= > 1.645 and so we need our sample to be of size 3 to reject H0, using a student-t distribution increases this to 5. (Here α=0.05)

• However if the sample mean had been only 176cms then we would need n > (1.645*5)2 = 68 Welshmen to reject H0

n

)1,0(~5

5175N

nn

XZ

Power calculations• Our last slide in some sense is backwards as we cannot get

from a given sample mean to choosing a sample size!• What we do instead is use different terminology and play

God!• We will choose an ‘effect size’, γ which will represent a

guess at the increase in the sample mean for Welshmen. • There then exists an (approximate) formula that links four

quantities, size (α), power (1-β), effect size (γ) and sample size (n)

• Note that the standard error (SE) of γ is a function of n and σ the population sd which is assumed known.

• We can now evaluate one of these quantities conditional on the others e.g. what sample size is required given α,1-β and γ?

11)(

zzSE

Here RHS is sum of cases H0 true and H0 false.

Welsh height exampleHere we have looked at two examples with effect sizes 5 and 1

respectively. Assume σ takes the value 5 and so let us suppose we take a sample of size 25 Welshmen.

Then

Case 1: 5/(5/√25)=1.645+z1-β,z1-β=3.355β=0.9996

Case 2: 1/(5/ √25)=1.645+z1-β,z1-β=-0.645β=0.25946

So here a sample of 25 Welshmen from a population with mean 180cms would almost always result in rejecting H0,

but if the population mean is 176cms then only 26% of such samples would be rejected.

We can plot curves of how power increases with sample size as shown in the next slide.

11)(

zzSE

Power curve for Welshmen example

Here we see the two power curves for the two scenarios:

Extending the idea

• The simple formula can

be used in many situations and hypothesis tests.• To generalise the idea we assume that γ is an effect

size associated with a statistic that we wish to compare with a (null) hypothesized value of 0.

• The complication occurs in finding a formula for the standard error for the statistic and relating this formula to the sample size, n.

• We will next consider an alternative approach before returning to look at how the above approach extends to multilevel models.

11)(

zzSE

The use of simulation

• In reality our (hoped for) research path will be as follows:Construct research question -> Form null hypothesis that

we believe false -> Collect appropriate data -> Reject hypothesis therefore proving our research question.

• Assuming what we believe in our research question is correct and hence null hypothesis is false we can still be let down by not collecting enough data.

• The idea behind using simulation is to simulate the data gathering process (assuming we know the right answer) many times and see how often we can reject the null hypothesis. The percentage of rejected null hypotheses (via simulation) will then estimate power.

Simulation in our example

• Consider our Welsh height example case 2 where we believe Welshmen have a mean height of 176cms (and sd = 5cms) and we are testing the hypothesis H0:μ=175cms, and we consider a sample size 25.

• Then we generate N samples (e.g. 5000) of size 25,

• and for each sample form a lower bound for the confidence interval of the form

• . This we compare with the value 175 and the proportion greater than 175 is an estimate of the power of the test.

• We can repeat this exercise for different sample sizes and form a power curve.

).(.645.1 ii xESx

NiNxi ,...,1),5,176(~ 2

Power curve comparisonNote simulation curve is a good approximation of the theoretical curve although there are some minor (Monte Carlo) errors even with 5000 simulations per sample size.

Advantages/Disadvantages

• Theoretical approach is quick when the formula can be derived.

• Approximations for more complex situations exist which are equally quick.

• Simulation approach generalizes to more situations but is much slower and we may need large numbers of simulations per scenario to get accurate power estimates.

What happens with multilevel data?

We will here mainly consider 2-level models and take as our application area education, so we have students nested within schools.

When deciding on a sampling scheme we have many choices:

• How many schools, N ?• How many pupils per school, nj ?• Should we collect the same size sample from

each school ?Our decision will depend on which parameter we

wish to estimate in the model.

Education Example • For motivation we considered a two level dataset with exam marks

measured for each student in a collection of schools. In fact this dataset exists and has 4915 students in 96 schools.

• Our hypothesis of interest is that the exam mark for an average student is > 20 (null hypothesis = 20) which with such a large sample results in the null hypothesis being rejected for our particular data.

• If we fit the following multilevel model to the data we get the estimates given:

• If we treat these estimates as population values, we are interested in what power for testing our hypothesis results from various combinations of N and nj

367.139,205.16,685.21

),0(~),,0(~22

0

22

0

eu

eijuj

ijjij

NeNu

euy

Design effect formula• If we assume balance then with n pupils in each of N schools for our

simple model (and only this simple model) the following formula holds:• Design effect = 1 + (n-1)ρ where ρ is the intra-class correlation. • So if we know the simple random sample size required for a given power

we need to multiply this by the design effect.• For example our data has ρ=16.205/(16.205+139.367)=0.104• So for schools of size 10 pupils we would need 1+9*0.104=1.94 times as

many students (in total) to get the same power.• For this model (and this model only) we could therefore perform our

power calculations assuming simple random sampling from a population with variance 155.572 and scale up the sample required based on the design.

• So

• And for schools of size 10 we require 1.94*338.4=657 pupils which we can round up to 66 schools.

4.33884.0645.1685.1

572.155 n

n

Simulating multilevel designs

• The process here is similar to the earlier example except that we need to simulate from a multilevel model and fit the models using MLwiN (Rasbash, Browne et al. 2000).

• To this end we will write macro code in the MLwiN macro language to perform the task.

• The MLwiN macro language allows datasets to be simulated, models to be set up and run using various algorithms and results collected.

• It has the advantage of performing all the operations in one package but programming in the macro language is not for the faint hearted!

Simulation continued• We will perform simulations for schools of 10 pupils where number

of schools (N) ranges from 5 to 70. For each N, 5000 datasets are generated.

• For each dataset we need to generate 10*N level 1 residuals with variance 139.367, N level 2 residuals with variance 16.205 and add these residuals up correctly with the fixed effect estimate 21.685.

• MLwiN has commands to generate random Normally distributed observations but also has a SIMU command which given a model is set up and estimates given will simulate from it directly making life easier.

• For each simulated dataset we fit the variance components model using the RIGLS algorithm. For small numbers of level 2 units we may have estimation difficulties but MLwiN has an ERROR 0 command which simply ignores such problems.

• Note it is also important to ensure the command BATCH 1 is included else MLwiN may only run RIGLS for 1 iteration for each model!!

Comparison of formula/simulations

• The following graph compares the design effect formula to the simulation approach:

Multilevel mastitis modelling!

• Martin Green has been successful in obtaining 4 years of funding from Wellcome to come and work with me.

• The project is entitled ‘Use of Bayesian statistical methods to investigate farm management strategies, cow traits and decision-making in the prevention of clinical and sub-clinical mastitis in dairy cows.’

• Martin is a specialist farm animal veterinary surgeon and has recently also been appointed to a chair in Nottingham’s new vet school.

• He completed a PhD in 2004 at the University of Warwick in veterinary epidemiology and used MCMC to fit binary response multilevel models in both MLwiN and WinBUGS to look at various factors that affect clinical mastitis in dairy cows.

Wellcome Fellowship

• In the 4 years of the grant we are fitting random effect models to a large dataset that Martin has been collecting in an earlier Milk Development Council funded grant.

• In particular we are looking at how farm management practices, environmental conditions and cow characteristics influence the risk of mastitis during the dry period.

• We aim to get both interesting applied results and also some novel statistical methodology from the grant and MCMC will again play a big part.

• From a statistical point of view we are looking at model fit diagnostics and model comparison in binary response random effect models.

Other applications

• Current PhD student projects:Kelly Handley : Statistical Analysis of Mass

Spectrometry data.Chris Brignell : Statistical Shape Analysis in

Brain Imaging.• Various MMath /BSc. projects looking at

applications of multilevel modelling of share prices, educational data, house prices and disease mapping.

Future Work

• Co-Investigator on BBSRC grant application (joint between Nottingham and Bristol vet schools) entitled ‘An investigation of Molecular Characteristics, Infection Patterns and Prevention of E. Coli and S. uberis Mastitis in UK Dairy Herds.’ (Main investigators Andrew Bradley & Martin Green).

• Have been investigating MCMC algorithms for ‘structured multivariate Normal models’ which are what in reality the IGLS algorithm fits. This model family also includes multilevel time series models. I have been investigating these models with an MMath. student and I will have a visitor from Turkey who has government funding to visit me and investigate the models.

• I am a named collaborator on the ESRC research node held by Kelvyn Jones and the multilevel team in Bristol.

References• Browne, W.J. (2003). MCMC Estimation in MLwiN. London: Institute

of Education, University of London.• Browne, W.J. (2006). MCMC Estimation of ‘constrained’ variance

matrices with applications in multilevel modelling. Computational Statistics and Data Analysis. 50: 1655-1677.

• Browne, W.J. and Draper D. (2006). A Comparison of Bayesian and likelihood methods for fitting multilevel models (with discussion). Bayesian Analysis. 1: 473-550.

• Browne, W.J., Goldstein, H. and Rasbash, J. (2001). Multiple membership multiple classification (MMMC) models. Statistical Modelling 1: 103-124.

• Chib, S. and Greenburg, E. (1998). Analysis of multivariate probit models. Biometrika 85, 347-361.

• Rasbash, J., Browne, W.J., Goldstein, H., Yang, M., Plewis, I., Healy, M., Woodhouse, G.,Draper, D., Langford, I., Lewis, T. (2000). A User’s Guide to MLwiN, Version 2.1, London: Institute of Education, University of London.

counting chickens and other tales using random effect models and mcmc estimation in applied...

Documents

random effect models

mcmc estimation

mcmc methods

fitting multilevel models

complex random effect

fitting models

use mcmc

mcmc functionality