Andrew Thomson on Generalised Estimating Equations (and simulation studies)

Uploaded by jace, posted on 26-Jan-2016


Page 1: Andrew Thomson on Generalised Estimating Equations  (and simulation studies)

Andrew Thomson on

Generalised Estimating Equations

(and simulation studies)

Page 2

Topics Covered

• What are GEE?

• Relationship with robust standard errors

• Why they are not as complicated as they appear

• How well simulation can (or cannot) distinguish between different GEE approaches

Page 3

Issues…

• My results are questionable (thanks to Richard…)

• Not shown in their entirety

• But – they agree with other studies

• The fixed cluster size results are definitely correct

Page 4

A simple example

• Consider simple uncorrelated linear regression, e.g. height on weight:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

• Minimize the sum of squares:

$$\sum_i (y_i - \beta_0 - \beta_1 x_i)^2$$

Page 5

Simple example II

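• Differentiate with respect to each parameter and set the result equal to 0:

$$-2\sum_i (y_i - \beta_0 - \beta_1 x_i) = 0$$

$$-2\sum_i x_i\,(y_i - \beta_0 - \beta_1 x_i) = 0$$

• In general, if we have p parameters (including the intercept), minimizing the sum of squares is the same as solving p estimating equations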
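The two estimating equations can be checked numerically. This is a minimal sketch, not the author's code: it assumes made-up height/weight data (`weight` as x, `height` as y), solves the two equations above as a 2 × 2 linear system, and compares the result with NumPy's least-squares fit.

```python
import numpy as np

def solve_estimating_equations(x, y):
    """Solve sum(y - b0 - b1*x) = 0 and sum(x*(y - b0 - b1*x)) = 0 for (b0, b1)."""
    n = len(x)
    # Both equations are linear in (b0, b1):
    #   n*b0      + sum(x)*b1   = sum(y)
    #   sum(x)*b0 + sum(x^2)*b1 = sum(x*y)
    lhs = np.array([[n, x.sum()], [x.sum(), (x * x).sum()]])
    rhs = np.array([y.sum(), (x * y).sum()])
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(1)
weight = rng.normal(70, 10, size=50)                   # hypothetical covariate
height = 100 + 1.0 * weight + rng.normal(0, 5, 50)     # hypothetical outcome
b_ee = solve_estimating_equations(weight, height)

# Direct least-squares fit on the design matrix [1, x]
X = np.column_stack([np.ones_like(weight), weight])
b_ls, *_ = np.linalg.lstsq(X, height, rcond=None)
print(b_ee, b_ls)
```

The two coefficient pairs agree to rounding error, which is the point of the slide: least squares is one instance of solving estimating equations.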

Page 6

Extensions

• Non-linear regression (logistic)

• Weighting, based on the correlation of the results

$$\sum_{j=1}^{J} D_j^T V_j^{-1} S_j = 0$$

where $V_j = A_j^{1/2} R(\alpha) A_j^{1/2}$ and $R(\alpha)$ is a working correlation matrix

Page 7

Surprisingly – Not that bad

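• For each cluster, $D_j^T$ is a $2 \times m_{ij}$ matrix (one row for the intercept, one for the intervention effect):

Control:

$$D_j^T = \mu_0(1-\mu_0)\begin{pmatrix} 1 & \cdots & 1 \\ 0 & \cdots & 0 \end{pmatrix}$$

IV:

$$D_j^T = \mu_1(1-\mu_1)\begin{pmatrix} 1 & \cdots & 1 \\ 1 & \cdots & 1 \end{pmatrix}$$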

Page 8

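• $A_j$ is an $m_{ij} \times m_{ij}$ diagonal matrix with diagonal elements $\mu_i(1-\mu_i)$

• Two common choices for $R(\alpha)$:
  – Independence: the identity matrix
  – Exchangeable: 1s on the diagonal, $\rho$ everywhere else

• Unadjusted studies: $V_j = \mu_i(1-\mu_i)\,R(\alpha)$

• $S_j = Y_j - E(Y_j)$ is an $m_{ij} \times 1$ vector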

Page 9

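So what is $D_j^T V_j^{-1}$?

• Independence – Control: $\begin{pmatrix} 1 & \cdots & 1 \\ 0 & \cdots & 0 \end{pmatrix}$

• Independence – IV: $\begin{pmatrix} 1 & \cdots & 1 \\ 1 & \cdots & 1 \end{pmatrix}$

• Exch – Control: $\dfrac{1}{1+(m_{ij}-1)\rho}\begin{pmatrix} 1 & \cdots & 1 \\ 0 & \cdots & 0 \end{pmatrix}$

• Exch – IV: $\dfrac{1}{1+(m_{ij}-1)\rho}\begin{pmatrix} 1 & \cdots & 1 \\ 1 & \cdots & 1 \end{pmatrix}$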
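These forms follow because the exchangeable matrix satisfies $R\mathbf{1} = (1+(m_{ij}-1)\rho)\mathbf{1}$, so each row of ones is simply divided by $1+(m_{ij}-1)\rho$, and the $\mu_i(1-\mu_i)$ factors in $D_j^T$ and $V_j$ cancel. A small numerical check, assuming a logistic model with an intercept and an intervention indicator; the values of μ, ρ and the cluster size are arbitrary, and `exchangeable` / `dt_vinv` are hypothetical helper names:

```python
import numpy as np

def exchangeable(m, rho):
    """Working correlation: 1 on the diagonal, rho everywhere else."""
    return (1 - rho) * np.eye(m) + rho * np.ones((m, m))

def dt_vinv(m, mu, rho, iv):
    """D_j^T V_j^{-1} for one cluster of size m in an unadjusted logistic model.

    iv = 1 for an intervention cluster, 0 for a control cluster.
    """
    a = mu * (1 - mu)                                   # binomial variance (diagonal of A_j)
    Dt = a * np.vstack([np.ones(m), iv * np.ones(m)])   # 2 x m derivative matrix
    V = a * exchangeable(m, rho)                        # V_j = mu(1 - mu) R(rho)
    return Dt @ np.linalg.inv(V)

m, mu, rho = 6, 0.3, 0.1
out = dt_vinv(m, mu, rho, iv=1)
print(np.allclose(out, np.ones((2, m)) / (1 + (m - 1) * rho)))  # True
```

Setting `rho=0` reproduces the independence rows of plain 1s and 0s.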

Page 10

Missing Out Some Algebra

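• Independence: estimate $\mu_i$ as

$$\hat\mu_i = \frac{\sum_j O_{ij}}{\sum_j m_{ij}}$$

• And estimate the OR as

$$\widehat{OR} = \frac{\hat\mu_1/(1-\hat\mu_1)}{\hat\mu_0/(1-\hat\mu_0)}$$

• Exch: estimate $\mu_i$ as

$$\hat\mu_i = \frac{\sum_j O_{ij}/(1+(m_{ij}-1)\hat\rho)}{\sum_j m_{ij}/(1+(m_{ij}-1)\hat\rho)}$$

where $O_{ij}$ is the number of events in cluster j of arm i (i = 0 control, i = 1 IV)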
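A sketch of the two estimators, assuming `O` holds the event counts and `m` the sizes of the clusters in one arm; setting ρ = 0 recovers the independence estimate. The trial data below are invented for illustration:

```python
import numpy as np

def arm_mean(O, m, rho=0.0):
    """Weighted event proportion for one arm.

    O: events per cluster, m: cluster sizes, rho: working ICC
    (rho = 0 reproduces the independence estimate sum(O) / sum(m)).
    """
    O, m = np.asarray(O, float), np.asarray(m, float)
    w = 1.0 / (1.0 + (m - 1.0) * rho)   # per-observation weight in each cluster
    return np.sum(w * O) / np.sum(w * m)

def odds_ratio(mu1, mu0):
    return (mu1 / (1 - mu1)) / (mu0 / (1 - mu0))

# Hypothetical trial: events and cluster sizes per arm
O0, m0 = [3, 10, 1], [10, 40, 5]   # control
O1, m1 = [6, 12, 4], [12, 30, 8]   # intervention

for rho in (0.0, 0.05):
    mu0, mu1 = arm_mean(O0, m0, rho), arm_mean(O1, m1, rho)
    print(rho, round(odds_ratio(mu1, mu0), 3))
```

With equal cluster sizes the weights are constant and cancel, so both estimators return the same value whatever ρ is.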

Page 11

Simple Interpretation

• Independence gives equal weight to each observation

• Exchangeable weights each cluster by the inverse of the variance of its mean (which depends on rho)

• There is no obvious working correlation matrix that gives equal weight to each cluster
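One way to read these bullets is through the total weight a cluster of size $m_{ij}$ receives, $m_{ij}/(1+(m_{ij}-1)\rho)$. The sketch below (hypothetical cluster sizes) shows that ρ = 0 weights clusters in proportion to their size, while larger ρ flattens the weights towards equality only as ρ approaches 1, where the exchangeable matrix becomes singular.

```python
import numpy as np

def cluster_weight(m, rho):
    """Total weight a cluster of size m receives in the exchangeable estimate."""
    m = np.asarray(m, float)
    return m / (1.0 + (m - 1.0) * rho)

sizes = [5, 20, 80]   # hypothetical cluster sizes
for rho in (0.0, 0.05, 0.5, 0.99):
    w = cluster_weight(sizes, rho)
    print(rho, np.round(w / w.sum(), 3))   # relative weights, summing to 1
```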

Page 12

Note on Simulation

• Used to make inferences about a method's behaviour when its theoretical properties are unclear

• The simulator has a choice over
  – the parameters varied
  – the output measured

• These should answer relevant questions

Page 13

Relevance for simulation studies

• With equal cluster sizes, independence and exchangeable GEE give the same point estimate

• So any precision benefit of one approach over the other (measured by MSE) cannot be detected

• Simulation studies should always consider the variable cluster size case
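A minimal simulation sketch of this point, assuming a beta-binomial outcome with a common ρ, a true proportion μ = 0.2 in a single arm, the exchangeable estimator given the true ρ, and two hypothetical sets of cluster sizes. It returns the MSE of each estimator and the largest difference between the paired estimates:

```python
import numpy as np

def beta_binomial_counts(rng, m, mu, rho):
    """Events per cluster: p_j ~ Beta(a, b) with ICC rho, then O_j ~ Binomial(m_j, p_j)."""
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    p = rng.beta(a, b, size=len(m))
    return rng.binomial(m, p)

def weighted_mean(O, m, rho):
    """Exchangeable-weighted proportion; rho = 0 gives the independence estimate."""
    w = 1.0 / (1.0 + (m - 1.0) * rho)
    return np.sum(w * O) / np.sum(w * m)

def mse_comparison(sizes, mu=0.2, rho=0.05, n_sims=2000, seed=0):
    rng = np.random.default_rng(seed)
    m = np.asarray(sizes)
    est_ind = np.empty(n_sims)
    est_exch = np.empty(n_sims)
    for s in range(n_sims):
        O = beta_binomial_counts(rng, m, mu, rho)
        est_ind[s] = weighted_mean(O, m, 0.0)    # independence
        est_exch[s] = weighted_mean(O, m, rho)   # exchangeable, true rho assumed known
    mse_ind = np.mean((est_ind - mu) ** 2)
    mse_exch = np.mean((est_exch - mu) ** 2)
    return mse_ind, mse_exch, np.max(np.abs(est_ind - est_exch))

print(mse_comparison([20] * 10))                                # equal sizes
print(mse_comparison([2, 5, 5, 10, 10, 20, 40, 60, 80, 150]))   # variable sizes
```

With ten clusters of 20 the largest difference is zero up to rounding, so the two MSEs are identical; only the variable-size case lets the estimators, and hence their MSEs, separate.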

Page 14

Unadjusted studies

• What effect measure (OR, RR, RD) are we interested in estimating?

• What weights do we use for each cluster?

• Does the inference procedure (e.g. confidence interval construction) have the right size?

Page 15

Estimating the Variance

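• Done using robust standard errors:

$$\widehat{\mathrm{Var}}(\hat\beta) = F\left(\sum_j D_j^T V_j^{-1}\,\mathrm{Cov}(Y_j)\,V_j^{-1} D_j\right)F$$

• F is a matrix which depends on V and D: $F = \left(\sum_j D_j^T V_j^{-1} D_j\right)^{-1}$

• $\mathrm{Cov}(Y_j)$ is estimated by $S_j S_j^T$

• GEE with an independence working correlation is identical to regression with robust standard errors

• So any criticism of GEE is also a criticism of robust standard errors (RSE)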
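For the identity link with an independence working correlation ($D_j = X_j$, $V_j = I$), the sandwich reduces to the familiar cluster-robust variance of least squares, which is the sense in which independence GEE coincides with robust standard errors. A sketch under that linear-model assumption (the logistic case adds the $\mu(1-\mu)$ factors); function and variable names are illustrative:

```python
import numpy as np

def sandwich_independence(X, y, cluster):
    """Least-squares fit with a cluster-robust (sandwich) covariance.

    Equivalent to an identity-link GEE with independence working correlation:
      F = (sum_j X_j^T X_j)^{-1}          (bread)
      M = sum_j X_j^T S_j S_j^T X_j       (filling; S_j = residuals of cluster j)
      Var(beta_hat) = F M F
    """
    F = np.linalg.inv(X.T @ X)
    beta = F @ (X.T @ y)
    resid = y - X @ beta
    M = np.zeros_like(F)
    for j in np.unique(cluster):
        idx = cluster == j
        u = X[idx].T @ resid[idx]    # X_j^T S_j
        M += np.outer(u, u)          # X_j^T S_j S_j^T X_j
    return beta, F @ M @ F

# Illustrative data: 8 hypothetical clusters of 5 sharing a random effect
rng = np.random.default_rng(0)
cluster = np.repeat(np.arange(8), 5)
x = rng.normal(size=40)
y = 1.0 + 0.5 * x + rng.normal(size=8)[cluster] + rng.normal(size=40)
X = np.column_stack([np.ones(40), x])
beta, cov = sandwich_independence(X, y, cluster)
print(beta, np.sqrt(np.diag(cov)))
```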

Page 16

Problems and solutions

• $S_j S_j^T$ is biased downwards in small samples (< 40 clusters), so p-values are too small

• We “know” what this bias is (a function of D and V). Let's call it $H_j$

• We replace $S_j S_j^T$ with $H_j^{-1} S_j S_j^T (H_j^{-1})^T$

• Basically, we are changing the filling of our sandwich
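The slide does not spell out H. One widely used choice, the Mancl–DeRouen correction, takes $H_j = I - D_j F D_j^T V_j^{-1}$; this is an assumption here, not necessarily the correction used in the talk. For the identity link with independence working correlation it becomes $I - X_j F X_j^T$, and since that matrix is symmetric the new filling is $(H_j^{-1}S_j)(H_j^{-1}S_j)^T$. A sketch with invented data:

```python
import numpy as np

def sandwich_bias_corrected(X, y, cluster):
    """Cluster-robust covariance with a bias-corrected filling (identity link,
    independence working correlation). Assumes H_j = I - X_j F X_j^T, i.e. the
    Mancl-DeRouen form specialised to this case."""
    F = np.linalg.inv(X.T @ X)
    beta = F @ (X.T @ y)
    resid = y - X @ beta
    M = np.zeros_like(F)
    for j in np.unique(cluster):
        idx = cluster == j
        Xj, Sj = X[idx], resid[idx]
        Hj = np.eye(len(Sj)) - Xj @ F @ Xj.T   # symmetric in this case
        Sj_adj = np.linalg.solve(Hj, Sj)       # H_j^{-1} S_j
        u = Xj.T @ Sj_adj
        M += np.outer(u, u)                    # X_j^T H_j^{-1} S_j S_j^T H_j^{-1} X_j
    return beta, F @ M @ F

rng = np.random.default_rng(1)
cluster = np.repeat(np.arange(6), [3, 8, 4, 10, 2, 5])   # hypothetical, unequal sizes
y = rng.normal(size=32)
X = np.ones((32, 1))                                     # intercept-only model
beta, cov_bc = sandwich_bias_corrected(X, y, cluster)
print(beta, cov_bc)
```

In the intercept-only case each cluster's residual sum is inflated by $1/(1 - m_{ij}/n)$, so the corrected variance is never smaller than the uncorrected one.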

Page 17

C.I Construction

1. Wald test
   a) Independence
   b) Exchangeable
   c) Bias corrected

2. Score test (adjusted score test): evaluate the score equations under H0 to obtain a χ2 statistic.

Page 18

More on the score test

• Score test is conservative

• Using bias correction will make it worse

• Multiply the χ2 statistic by J/(J − 1), where J is the number of clusters

• CI construction is done using the bisection algorithm
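The slides do not give the score statistic itself, so this sketch uses a hypothetical stand-in `stat(b)` (a quadratic in b, as for a Wald-type statistic with standard error 0.2) to show the remaining two steps: scaling by J/(J − 1) and finding each confidence limit by bisection, i.e. the value of b where the scaled statistic crosses the 95% point of χ2 with 1 df (3.841). A real application would plug in the adjusted GEE score statistic evaluated at H0: β1 = b.

```python
CHI2_1_95 = 3.841458820694124  # 95th percentile of chi-squared with 1 df

def bisect_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Root of f on [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid                  # sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid     # sign change lies in [mid, hi]
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def invert_test_ci(stat, estimate, n_clusters, width=10.0):
    """CI = all b whose J/(J-1)-scaled statistic stays below the chi2(1) 95% point."""
    scale = n_clusters / (n_clusters - 1)

    def excess(b):
        return scale * stat(b) - CHI2_1_95

    lower = bisect_root(excess, estimate - width, estimate)
    upper = bisect_root(excess, estimate, estimate + width)
    return lower, upper

def stat(b):
    """Hypothetical stand-in for the score statistic at H0: beta_1 = b."""
    return ((b - 0.4) / 0.2) ** 2

print(invert_test_ci(stat, estimate=0.4, n_clusters=6))
```

Bisection only needs the statistic to be evaluable at each trial value, which is why it suits score-test inversion where no closed-form interval exists.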

Page 19

Results! - Size (5% Nominal)

Method        4–6 clusters    15–20 clusters
Naïve         12%             12%
Ind           11%             9.5%
Exch          9%              8%
B.C.          7.5%            7%
Adj. Score    5.2%            5%
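Each entry is a rejection rate over simulated data sets generated under H0. A sketch of how such an entry and its Monte Carlo standard error can be computed from a vector of p-values; the 1000 uniform p-values are an illustrative stand-in for a correctly sized test, since the number of replicates behind the table is not stated:

```python
import numpy as np

def empirical_size(p_values, alpha=0.05):
    """Proportion of simulated data sets (generated under H0) with p < alpha,
    plus its binomial Monte Carlo standard error."""
    p = np.asarray(p_values, float)
    size = np.mean(p < alpha)
    mc_se = np.sqrt(size * (1 - size) / len(p))
    return size, mc_se

# Under a correctly sized test, p-values are Uniform(0, 1) when H0 holds
rng = np.random.default_rng(0)
size, se = empirical_size(rng.uniform(size=1000))
print(f"size = {size:.3f} (MC SE {se:.3f})")
```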

Page 20

Power

• H0 is not true.

• Simulation studies tend to use the beta-binomial distribution to simulate data

• Common rho (?)

• If size is above nominal, power will be inflated as well. If two methods have the same size, does MSE have an effect?
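A minimal sketch of simulating cluster event counts from a beta-binomial with a common ρ, assuming the usual parameterisation in which each cluster's probability is drawn from Beta(a, b) with a = μ(1 − ρ)/ρ and b = (1 − μ)(1 − ρ)/ρ, so that the intra-cluster correlation is 1/(a + b + 1) = ρ. The check compares the empirical variance of the counts with $m\mu(1-\mu)(1+(m-1)\rho)$, the same variance inflation that drives the exchangeable weights; the values of m, μ and ρ are arbitrary.

```python
import numpy as np

def beta_binomial_counts(rng, m, mu, rho, n):
    """n clusters of size m: p_j ~ Beta(a, b) with ICC rho, O_j ~ Binomial(m, p_j)."""
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    p = rng.beta(a, b, size=n)
    return rng.binomial(m, p)

rng = np.random.default_rng(42)
m, mu, rho = 30, 0.2, 0.05
O = beta_binomial_counts(rng, m, mu, rho, n=200_000)
theory = m * mu * (1 - mu) * (1 + (m - 1) * rho)   # inflated binomial variance
print(O.mean() / m, O.var(), theory)
```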

Page 21

Power results

• In general, power is above nominal

• This is due to incorrect size

• Naïve > Ind > Exch > B.C. = Score

• This result is expected and surprising at the same time: Score and B.C. actually attain the nominal level

• Considered later

Page 22

Adjusted studies

• Very few have been done ( 2.5)

• The beta-binomial distribution is not amenable to including covariates

• With a cluster-level covariate, the same argument applies to the fixed / variable cluster size issue

• Results are identical

Page 23

Why is the adjusted score powerful?

1. The score test is just better

2. Power is based on p-values rather than on whether the CI contains 1. It is possible to have a significant p-value while the confidence interval contains 1

3. The score statistic could not be obtained for all data sets because of model-fitting problems

Page 24

Fitting the models

• R – various packages (gee, geepack and its geese function): no score test; crashes

• Stata – xtgee: no score test

• SAS – PROC GENMOD: score test available, but no score-test CI construction

• S-Plus – code from authors (allegedly)

Page 25

Convergence

• Depends on number of clusters

• 15 – 20 clusters 100% convergence

• 10 clusters 99.7% convergence

• 4 – 6 clusters 99% convergence

• Score test: even more data sets are lost in SAS
  – 15–20 clusters: lose another 0.5%
  – 4–6 clusters: lose another 1%

Page 26

Conclusions

• If you wish to use GEE then the adjusted score test is the (only?) appropriate way for a small number of clusters

• This is perhaps questionable

• It is also the most complicated model to fit in terms of code

Page 27

What Should Simulation Do?

• Reflect what you’ll see in practice
  – Variable cluster size
  – Individual-level covariates (ideally imbalanced)

• Look not only at size but also at power (and coverage)

• Measure MSE for no IV cases

• Sensitivity to departures from assumptions

Page 28

Number of Studies that do this

• 0

• Mine does.

• Perhaps ‘luck’ rather than judgement

• Designed it 2 years ago

• Decided 2 months ago that it was actually quite good

Page 29

‘Luck’

• 1 supervisor, 2 advisors

• One advisor suggested MSE

• The other was adamant I did sensitivity analysis

• Richard obviously made an outstanding contribution

• Something of a consortium approach

Page 30

Data sharing

• Given this, it might be useful to have the data files available online

• Use these for any further analysis methods that may become available

• Server space? Interactivity?

• Results?

Page 31

Thank You