Download - Andrew Thomson on Generalised Estimating Equations (and simulation studies)

Andrew Thomson on

Generalised Estimating Equations

(and simulation studies)

Topics Covered

• What are GEE?

• Relationship with robust standard errors

• Why they are not as complicated as they appear

• How simulation does (or does not) resolve the differences between GEE approaches

Issues…

• My results are questionable (thanks to Richard…)

• Not shown in their entirety

• But – Agree with other studies

• Fixed cluster size is definitely correct

A simple example

• Consider simple uncorrelated linear regression, e.g. height on weight

• Minimize sum of squares

$$y_i = \beta_0 + \beta_1 x_i$$

$$\sum_i \left(y_i - \beta_0 - \beta_1 x_i\right)^2$$

Simple example II

• Differentiate wrt each parameter and set = 0

• In general if we have p covariates then minimizing ss is the same as solving p estimating equations

$$\sum_i 2(-1)\left(y_i - \beta_0 - \beta_1 x_i\right) = 0$$

$$\sum_i 2(-x_i)\left(y_i - \beta_0 - \beta_1 x_i\right) = 0$$
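As a minimal sketch of the point above, the two estimating equations can be solved directly as a 2 x 2 linear system and compared with an ordinary least-squares fit. The data values are made up for illustration, and the variable names are not from the slides.

```python
import numpy as np

# Hypothetical toy data: x = weight (kg), y = height (cm). Values are made up.
x = np.array([55.0, 62.0, 70.0, 81.0, 90.0])
y = np.array([160.0, 165.0, 172.0, 178.0, 183.0])

# The two estimating equations are linear in (beta0, beta1):
#   sum(y_i - b0 - b1 * x_i)         = 0
#   sum(x_i * (y_i - b0 - b1 * x_i)) = 0
# Rearranged into matrix form M @ beta = rhs.
M = np.array([[len(x), x.sum()],
              [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
beta_ee = np.linalg.solve(M, rhs)

# Direct least-squares fit of the same model, for comparison.
X = np.column_stack([np.ones_like(x), x])
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_ee, beta_ls)
```

Both routes should give the same intercept and slope, which is the sense in which minimizing the sum of squares and solving the estimating equations are the same problem.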

Extensions

• Non-linear regression (logistic)

• Weighting, based on the correlation of the results

$$\sum_{j=1}^{J} D_j^T V_j^{-1} S_j = 0$$

where

$$V_j = A^{1/2} R(\alpha) A^{1/2}$$

and $R(\alpha)$ is a working correlation matrix

Surprisingly – Not that bad

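• For each cluster, $D_j^T$ is a $2 \times m_{ij}$ matrix

Control:

$$D_j^T = \begin{pmatrix} \pi_0(1-\pi_0) & \cdots & \pi_0(1-\pi_0) \\ 0 & \cdots & 0 \end{pmatrix}$$

IV:

$$D_j^T = \begin{pmatrix} \pi_1(1-\pi_1) & \cdots & \pi_1(1-\pi_1) \\ \pi_1(1-\pi_1) & \cdots & \pi_1(1-\pi_1) \end{pmatrix}$$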

• $A$ is an $m_{ij} \times m_{ij}$ matrix with diagonal elements $\pi_i(1-\pi_i)$

• $R(\alpha)$: 2 common choices

– Independence: identity matrix

– Exchangeable: 1s on the diagonal, rho everywhere else

• Unadjusted studies: $V_j = \pi_i(1-\pi_i)\,R(\alpha)$

• $S_j = (Y_j - \mu_j)$ is an $m_{ij} \times 1$ vector
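A minimal sketch of the two working correlation choices and the resulting working covariance for an unadjusted study. The function names are hypothetical, and the symbol pi for the arm-level event probability is an assumption about the notation.

```python
import numpy as np

def working_correlation(m, rho=0.0):
    """m x m exchangeable matrix: 1s on the diagonal, rho elsewhere.

    rho = 0 gives the independence choice (the identity matrix).
    """
    return (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))

def working_covariance(pi, m, rho=0.0):
    """V = A^{1/2} R A^{1/2} with A = pi * (1 - pi) * I (unadjusted case)."""
    a_half = np.sqrt(pi * (1.0 - pi)) * np.eye(m)
    return a_half @ working_correlation(m, rho) @ a_half

V = working_covariance(pi=0.3, m=4, rho=0.1)
print(V)
```

Because every diagonal element of A is the same in an unadjusted study, V reduces to pi(1 - pi) times the working correlation matrix.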

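So what is $D_j^T V_j^{-1}$?

• Independence – Control

$$\begin{pmatrix} 1 & \cdots & 1 \\ 0 & \cdots & 0 \end{pmatrix}$$

• Independence – IV

$$\begin{pmatrix} 1 & \cdots & 1 \\ 1 & \cdots & 1 \end{pmatrix}$$

• Exch Control

$$\begin{pmatrix} \dfrac{1}{1+(m_{ij}-1)\rho} & \cdots & \dfrac{1}{1+(m_{ij}-1)\rho} \\ 0 & \cdots & 0 \end{pmatrix}$$

• Exch IV

$$\begin{pmatrix} \dfrac{1}{1+(m_{ij}-1)\rho} & \cdots & \dfrac{1}{1+(m_{ij}-1)\rho} \\ \dfrac{1}{1+(m_{ij}-1)\rho} & \cdots & \dfrac{1}{1+(m_{ij}-1)\rho} \end{pmatrix}$$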
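The four cases can be checked numerically. The sketch below assumes a logit link with an intercept and a single arm indicator, so each row of the transposed derivative matrix is pi(1 - pi) or zero; `dt_vinv` is a hypothetical helper name.

```python
import numpy as np

def dt_vinv(pi, m, rho, treated):
    """Return the 2 x m matrix D^T V^{-1} for one cluster of size m."""
    mu_deriv = pi * (1.0 - pi)
    # Row 1: derivative w.r.t. the intercept. Row 2: w.r.t. the arm effect,
    # which is zero in the control arm.
    d_t = np.vstack([np.full(m, mu_deriv),
                     np.full(m, mu_deriv if treated else 0.0)])
    R = (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))
    V = mu_deriv * R
    return d_t @ np.linalg.inv(V)

print(dt_vinv(pi=0.3, m=5, rho=0.0, treated=False))  # independence, control
print(dt_vinv(pi=0.4, m=5, rho=0.2, treated=True))   # exchangeable, IV
```

The pi(1 - pi) terms cancel, which is why only 1s, 0s and the factor 1 / (1 + (m - 1) rho) survive.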

Missing Out Some Algebra

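• Independence. Estimate $\hat\pi_0$ and $\hat\pi_1$ as

$$\hat\pi_i = \frac{\sum_j O_{ij}}{\sum_j m_{ij}}, \qquad i = 0, 1$$

(with $O_{ij}$ read here as the number of events in cluster $j$ of arm $i$)

• And estimate OR as

$$\frac{\hat\pi_1 / (1-\hat\pi_1)}{\hat\pi_0 / (1-\hat\pi_0)}$$

• Exch: estimate $\hat\pi_i$ as

$$\hat\pi_i = \frac{\sum_j O_{ij} / \left(1+(m_{ij}-1)\rho\right)}{\sum_j m_{ij} / \left(1+(m_{ij}-1)\rho\right)}$$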
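A small sketch of these estimators, assuming O is the per-cluster event count and rho is supplied rather than estimated. The data are invented and the function names are hypothetical.

```python
import numpy as np

def pi_hat(events, sizes, rho=0.0):
    """Weighted proportion: rho = 0 is independence, rho > 0 exchangeable."""
    w = 1.0 / (1.0 + (sizes - 1.0) * rho)
    return np.sum(w * events) / np.sum(w * sizes)

def odds_ratio(p1, p0):
    """Odds ratio of the intervention arm relative to control."""
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))

# Hypothetical cluster-level data (events, cluster sizes) for each arm.
control_events = np.array([2, 5, 1, 8])
control_sizes = np.array([10, 25, 8, 40])
iv_events = np.array([4, 9, 3, 15])
iv_sizes = np.array([12, 30, 9, 35])

for rho in (0.0, 0.1):
    p0 = pi_hat(control_events, control_sizes, rho)
    p1 = pi_hat(iv_events, iv_sizes, rho)
    print(rho, p0, p1, odds_ratio(p1, p0))
```

In a real GEE fit rho would be estimated alongside the regression parameters; fixing it here only isolates the weighting.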

Simple Interpretation

• Independence gives equal weight to each observation

• Exchangeable down-weights each observation by the variance inflation $1 + (m_{ij} - 1)\rho$ of its cluster (measured by rho)

• No obvious working correlation matrix which gives equal weight to each cluster

Note on Simulation

• Used to make inferences about a method's behaviour when its theoretical properties are unclear

• Simulator has choice over

– Parameters varied

– Output measured

• These should answer relevant questions

Relevance for simulation studies

• Equal cluster sizes give the same point estimate

• Any potential benefits of one approach over the other in terms of precision (measured by MSE) cannot be found

• Simulation studies should always consider the variable cluster size case
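The first bullet can be illustrated with a short sketch: with equal cluster sizes the exchangeable weights are a common factor and cancel, so the estimate does not depend on rho. The event counts and sizes are made up, and `pi_hat` is a hypothetical helper.

```python
import numpy as np

def pi_hat(events, sizes, rho):
    """Proportion estimate with weights 1 / (1 + (m - 1) * rho) per cluster."""
    w = 1.0 / (1.0 + (sizes - 1.0) * rho)
    return np.sum(w * events) / np.sum(w * sizes)

events = np.array([3, 7, 4, 9])
equal_sizes = np.full(4, 20)
unequal_sizes = np.array([10, 40, 15, 60])

for rho in (0.0, 0.05, 0.3):
    print(rho,
          pi_hat(events, equal_sizes, rho),
          pi_hat(events, unequal_sizes, rho))
```

Only the unequal-size column moves with rho, which is why a simulation restricted to fixed cluster sizes cannot separate the two working correlation structures on point estimates or MSE.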

Unadjusted studies

• What outcome (OR, RR, RD) are we interested in measuring?

• What weights do we use for each cluster?

• Does the estimating procedure (e.g. confidence interval construction) have the right size?

Estimating the Variance

• Done using robust standard errors: $F\,(\mathrm{Cov}(Y_j))\,F$

• F is a matrix which depends on V and D

• $\mathrm{Cov}(Y_j)$ is estimated by $S_j S_j^T$

• Independence is identical to robust standard errors

• Criticism of GEE is also criticism of RSE
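A one-parameter sketch of the sandwich idea, under simplifying assumptions that are not from the slides: a single arm, an independence working correlation and an identity link, so each cluster's derivative matrix is a column of ones. The data are invented.

```python
import numpy as np

def robust_variance(events, sizes):
    """Sandwich variance of pi_hat = sum(events) / sum(sizes)."""
    pi_hat = events.sum() / sizes.sum()
    v = pi_hat * (1.0 - pi_hat)        # working variance, V_j = v * I
    bread = 1.0 / np.sum(sizes / v)    # (sum_j D_j^T V_j^{-1} D_j)^{-1}
    resid = events - sizes * pi_hat    # 1^T S_j: summed residuals per cluster
    meat = np.sum((resid / v) ** 2)    # sum_j D^T V^{-1} S_j S_j^T V^{-1} D
    return pi_hat, bread * meat * bread

def naive_variance(events, sizes):
    """Binomial variance that ignores clustering."""
    pi_hat = events.sum() / sizes.sum()
    return pi_hat * (1.0 - pi_hat) / sizes.sum()

# Hypothetical cluster data: event counts and cluster sizes.
events = np.array([2, 15, 1, 20, 3, 12])
sizes = np.array([20, 30, 15, 40, 25, 30])
pi_hat, v_robust = robust_variance(events, sizes)
print(pi_hat, v_robust, naive_variance(events, sizes))
```

The working variance cancels between bread and meat, leaving the sum of squared cluster residuals divided by the squared total sample size.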

Problems and solutions

• $S_j S_j^T$ is biased downwards for small samples (< 40 clusters), so p-values are too small

• We “know” what this bias is (function of D and V). Let's call it H

• We replace $S_j S_j^T$ with $H^{-1} S_j S_j^T H^{-1}$

• Basically changing the filling of our sandwich

C.I Construction

1. Wald Test

a) Independence

b) Exchangeable

c) Bias Corrected

2. Score Test (adjusted score test): evaluate the score equations at H0 to obtain a χ2 statistic.

More on the score test

• Score test is conservative

• Using bias correction will make it worse

• Multiply χ2 statistic by J / (J-1)

• CI construction is done using the bisection algorithm
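Inverting a score test by bisection can be sketched as follows. This is a stand-in, not the GEE score statistic: it uses the score statistic for a single binomial proportion, and the `adjust` argument only mimics the role of the J / (J-1) multiplier. All names are hypothetical.

```python
CHI2_1DF_95 = 3.841458820694124  # 95th percentile of chi-square with 1 df

def score_stat(p0, events, n, adjust=1.0):
    """Score statistic for H0: p = p0, with variance evaluated at p0."""
    p_hat = events / n
    return adjust * (p_hat - p0) ** 2 / (p0 * (1.0 - p0) / n)

def bisect_limit(inside, outside, accept, tol=1e-10):
    """Locate the boundary between an accepted and a rejected value."""
    while abs(outside - inside) > tol:
        mid = 0.5 * (inside + outside)
        if accept(mid):
            inside = mid
        else:
            outside = mid
    return 0.5 * (inside + outside)

def score_ci(events, n, adjust=1.0):
    """95% interval: all p0 the score test does not reject (0 < events < n)."""
    p_hat = events / n
    accept = lambda p0: score_stat(p0, events, n, adjust) <= CHI2_1DF_95
    lower = bisect_limit(p_hat, 1e-12, accept)
    upper = bisect_limit(p_hat, 1.0 - 1e-12, accept)
    return lower, upper

print(score_ci(12, 40))
```

For this binomial stand-in the limits should agree with the closed-form Wilson interval, which gives a check on the bisection. The GEE case has no closed form, which is presumably why bisection is needed there.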

Results! - Size (5% Nominal)

| Method | 4-6 clusters | 15-20 clusters |
|---|---|---|
| Naïve | 12% | 12% |
| Ind | 11% | 9.5% |
| Exch | 9% | 8% |
| B.C. | 7.5% | 7% |
| Adj. Score | 5.2% | 5% |

Power

• H0 is not true.

• Simulation studies tend to use beta-binomial distribution to simulate

• Common rho (?)

• If size is above nominal, power will be inflated as well. If they have the same size, does MSE have an effect?
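A minimal sketch of beta-binomial data generation for a power simulation, assuming a common rho in both arms and the usual parameterisation in which the intracluster correlation is 1 / (a + b + 1). The probabilities, rho, cluster counts and function name are illustrative assumptions.

```python
import numpy as np

def simulate_arm(rng, pi, rho, sizes):
    """Draw one event count per cluster from a beta-binomial distribution."""
    # Beta parameters giving mean pi and intracluster correlation rho.
    a = pi * (1.0 - rho) / rho
    b = (1.0 - pi) * (1.0 - rho) / rho
    cluster_p = rng.beta(a, b, size=len(sizes))
    return rng.binomial(sizes, cluster_p)

rng = np.random.default_rng(2024)
sizes = rng.integers(10, 51, size=6)    # variable cluster sizes, 10 to 50
control = simulate_arm(rng, pi=0.30, rho=0.05, sizes=sizes)
treated = simulate_arm(rng, pi=0.45, rho=0.05, sizes=sizes)  # H0 is not true
print(sizes, control, treated)
```

Repeating this many times and recording how often each method rejects H0 gives the empirical power. Setting both arms to the same probability gives the empirical size instead.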

Power results

• In general above nominal.

• Due to incorrect size

• Naïve > Ind > Exch > B.C = Score

• This result is expected and surprising at the same time. Score and B.C actually attain the nominal level

• Considered later

Adjusted studies

• Very few have been done ( 2.5)

• Beta-binomial distribution is not amenable to including covariates

• Cluster level covariate – same argument applies for the fixed / variable cluster size issue

• Results are identical

Why is the adjusted score powerful?

1. The score test is just better

2. Power is based on p-values, rather than on whether the C.I. contains 1. It is possible to have a p-value that is significant while the confidence interval contains 1

3. Score statistic not derived for all data sets due to model fitting

Fitting the models

• R – various libraries (gee, geese, geepack). No score test. Crashes

• STATA – xtgee – no score test

• SAS – Proc Genmod. Score test. No score test CI construction

• S-Plus – code from authors (allegedly)

Convergence

• Depends on number of clusters

• 15 – 20 clusters 100% convergence

• 10 clusters 99.7% convergence

• 4 – 6 clusters 99% convergence

• Score test – lose even more in SAS

• 15 – 20 clusters lose another 0.5%

• 4 – 6 clusters lose another 1%

Conclusions

• If you wish to use GEE then the adjusted score test is the (only?) appropriate way for a small number of clusters

• This is perhaps questionable

• The most complicated model to fit in terms of code.

What Should Simulation Do?

• Reflect what you’ll see in practice

– Variable cluster size

– Include individual level covariates (ideally imbalanced)

• Look not only at size but power (and coverage)

• Measure MSE for no IV cases

• Sensitivity to departures from assumptions

Number of Studies that do this

• 0

• Mine does.

• Perhaps ‘luck’ rather than judgement

• Designed it 2 years ago

• Decided 2 months ago that it was actually quite good

‘Luck’

• 1 supervisor, 2 advisors

• One advisor suggested MSE

• The other was adamant I did sensitivity analysis

• Richard obviously made outstanding contribution.

• Something of a consortium approach

Data sharing

• Given this – might be useful to have data files available online

• Use these for any further analysis methods that may become available

• Server space? Interactivity?

• Results?

Thank You
