‘interpreting coefficients from longitudinal models’

‘Interpreting coefficients from longitudinal models’ Professor Vernon Gayle and Dr Paul Lambert (Stirling University) Wednesday 1st April 2009

Page 1: ‘Interpreting coefficients from longitudinal models’

‘Interpreting coefficients from longitudinal models’

Professor Vernon Gayle and Dr Paul Lambert (Stirling University)

Wednesday 1st April 2009

Page 2: ‘Interpreting coefficients from longitudinal models’

Structure of this Session

Briefly mention:

• Change Score Models

• Transitions (tables etc.)

• Repeated Cross-Sectional Data

• Duration Models

• Panel Models

Page 3: ‘Interpreting coefficients from longitudinal models’

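Change in Score (first difference model)

$Y_{i2} - Y_{i1} = \beta'(X_{i2} - X_{i1}) + (\varepsilon_{i2} - \varepsilon_{i1})$

Here $\beta'$ is simply the coefficient from a regression on the difference or change in scores

The panel fixed effects linear model is a special case of the change score model

This modelling approach identifies on switchers! (β' is estimated only from individuals whose X changes between the two occasions)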
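The link between the change score model, the fixed effects model and "switchers" can be checked numerically. The following is a minimal sketch on simulated two-wave data; every name and parameter value in it is hypothetical and chosen only for illustration. It assumes a binary X, no intercept in the difference equation (as in the formula above) and no time effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                    # hypothetical individuals, two waves each
alpha = rng.normal(size=n)                 # unobserved individual effect, fixed in time
x1 = rng.integers(0, 2, size=n).astype(float)
switch = rng.random(n) < 0.3               # roughly 30% change their X between waves
x2 = np.where(switch, 1.0 - x1, x1)
beta = 2.0                                 # assumed true effect used in the simulation
y1 = alpha + beta * x1 + rng.normal(size=n)
y2 = alpha + beta * x2 + rng.normal(size=n)

# Change score (first difference) estimator: regress (y2 - y1) on (x2 - x1), no intercept
dx, dy = x2 - x1, y2 - y1
b_fd = (dx @ dy) / (dx @ dx)

# Fixed effects (within) estimator: subtract each person's two-wave mean from x and y
x_mean, y_mean = (x1 + x2) / 2, (y1 + y2) / 2
xw = np.concatenate([x1 - x_mean, x2 - x_mean])
yw = np.concatenate([y1 - y_mean, y2 - y_mean])
b_fe = (xw @ yw) / (xw @ xw)

# Non-switchers have dx == 0 and contribute nothing: dropping them changes nothing
b_switchers = (dx[switch] @ dy[switch]) / (dx[switch] @ dx[switch])

print(b_fd, b_fe, b_switchers)
```

With two waves the three numbers printed are identical up to floating point error, which is one way to see that the estimate rests entirely on the individuals who change their value of X.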

Page 4: ‘Interpreting coefficients from longitudinal models’

Transitions

• Historically, social mobility tables

• Large literature on log-linear models

• Essentially cross-sectional models are fitted

• Care is required if β is essentially a lagged effect (e.g. the association between mother & daughter)

– In some circumstances this may swamp other effects

Page 5: ‘Interpreting coefficients from longitudinal models’

Repeated Cross-Sectional Surveys

• UK has a wealth of repeated cross-sectional data

– Much of it is comparable

• Often not considered longitudinal because there are no explicit repeated contacts

• However, very useful for trend over time analyses

• Cross-sectional models are employed

– Be careful of the interpretation of β and the int of time

– Time is often survey year, but can be cohort (e.g. YCS)

Page 6: ‘Interpreting coefficients from longitudinal models’

Duration Models

• Modelling time to an event taking place

• Duration is the outcome

Page 7: ‘Interpreting coefficients from longitudinal models’

Simple approach: accelerated life model

$\log_e t_i = \beta x_{1i} + e_i$

This is a regression model; β is the effect on the log duration

When there are no (or only a small number of) right-censored cases this approach is suitable – it may be questioned by referees however!

This model is a little old fashioned, but often results are very similar to hazard models (although in practice betas should be carefully compared to hazard models)
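As an illustration of how β is read in the log-duration regression, here is a minimal sketch on simulated data with no censoring. The covariate, the parameter values and all variable names are assumptions made for the example, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.integers(0, 2, size=n).astype(float)    # hypothetical binary covariate
beta0, beta1 = 1.0, 0.5                         # assumed true values for the simulation
log_t = beta0 + beta1 * x + rng.normal(scale=0.5, size=n)
t = np.exp(log_t)                               # durations, all fully observed (no censoring)

# Ordinary least squares of log duration on x
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, np.log(t), rcond=None)
b0, b1 = coef

# b1 is an additive effect on log duration, so exp(b1) is a multiplicative
# effect on duration itself (sometimes called a time ratio)
time_ratio = np.exp(b1)
print(b1, time_ratio)
```

With a single binary covariate, b1 is simply the difference in mean log duration between the two groups, so exp(b1) is the ratio of their geometric mean durations. A positive β here means longer durations, which is the opposite sign convention to a hazard model, where a positive β means the event happens sooner.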

Page 8: ‘Interpreting coefficients from longitudinal models’

Duration Models

• Duration models

• Survival models

• Cox regression

• Failure time analysis

• Event history models

• Hazard models

Cox, D.R. (1972) ‘Regression models and life tables’, JRSS B, 34, pp. 187-220.

These are all the same thing – depending on your substantive discipline

Page 9: ‘Interpreting coefficients from longitudinal models’

Hazard Models

• Model time to an event

• They do not model duration – they model the ‘hazard’

Hazard: measure of the probability that an event occurs at time t conditional on it not having occurred before t

• These models appropriately control for right-censored data
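The verbal definition of the hazard can be written out as follows. The notation (T for the event time, Δt for a short interval, h_0 for a baseline hazard) is introduced here for illustration and does not appear on the slides; the second line is the standard proportional hazards form associated with Cox (1972), given as context rather than as the authors' own specification.

```latex
% Hazard: instantaneous risk of the event at t, given survival up to t
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}

% Proportional hazards (Cox) form: covariates shift the hazard multiplicatively
h(t \mid x) = h_0(t) \exp(\beta x)
```

Right-censored cases contribute the information that T exceeded their last observed time, which is why they remain in the risk set until they are censored.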

Page 10: ‘Interpreting coefficients from longitudinal models’

Hazard Models

• Hazard models are similar to logit models

• β is estimated on the logit scale

• β estimates the increase/decrease in the speed at which individuals (in the group) leave the risk set

• β is about speed and not rate (as is commonly suggested)

Page 11: ‘Interpreting coefficients from longitudinal models’

Alternative Types of Event History Analysis

Describing sequences / trajectories:

characterise progression through states into clusters / sequences / frameworks

• Growing recent social science interest in sequence analysis

– Often analyse cluster membership as categorical factor

A problem – neutrality of data, e.g. cluster 1= Men in full time employment

Page 12: ‘Interpreting coefficients from longitudinal models’

Panel Models

Page 13: ‘Interpreting coefficients from longitudinal models’

Orthodox Panel Data Structure

[Figure: individuals 1 to 5, each with a run of repeated observations (t); the runs shown are of length 4, 4, 5, 5 and 2]

Page 14: ‘Interpreting coefficients from longitudinal models’

Panel Regression Approach

• xt suite in Stata

• β can usually be interpreted relatively easily

• Similarity to β in the multilevel modelling framework

Page 15: ‘Interpreting coefficients from longitudinal models’

Standard Linear Model Slopes and Intercepts

Constant slopes, constant intercept

$\beta_0$ is a constant intercept

$\beta_1$ is a constant slope

Page 16: ‘Interpreting coefficients from longitudinal models’

Possible Slopes and Intercepts

Constant slopes, varying intercepts: the fixed effects model

$\beta_{0j}$ is not a constant intercept

$\beta_1$ is a constant slope

Varying slopes, varying intercepts: separate regression for each individual

$\beta_{0j}$ is not a constant intercept

$\beta_{1j}$ is not a constant slope
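Written as equations, the three slope and intercept configurations look like this. The indexing (observation i within individual j) is an assumption borrowed from the multilevel convention suggested by the j subscripts; the slides do not spell the equations out.

```latex
% Standard linear model: constant intercept, constant slope
y_{ij} = \beta_0 + \beta_1 x_{ij} + \varepsilon_{ij}

% Fixed effects model: varying intercepts, constant slope
y_{ij} = \beta_{0j} + \beta_1 x_{ij} + \varepsilon_{ij}

% Separate regression for each individual: varying intercepts, varying slopes
y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \varepsilon_{ij}
```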

Page 17: ‘Interpreting coefficients from longitudinal models’

Regression Approach

Fixed or Random effects estimators

• Fierce debate

– F.E. will tend to be consistent

– R.E. standard errors will be efficient, but R.E. may not be consistent

– R.E. assumes no correlation between observed X variables and unobserved characteristics

Page 18: ‘Interpreting coefficients from longitudinal models’

xt Regression Approach: Fixed or Random effects

– Economists tend towards F.E.

(attractive property of consistent β)

– With continuous Y – little problem, fit both F.E. and R.E. models and then Hausman test f.e. / r.e.

(don’t be surprised if it points towards F.E. model)

(Steve Pudney’s suggestion)
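For reference, the Hausman comparison of the two sets of estimates is usually written as below. The symbols (b for the estimated coefficient vectors, V for their estimated covariance matrices, k for the number of coefficients compared) are notation assumed here, not taken from the slides.

```latex
% Hausman statistic comparing fixed effects and random effects estimates
H = (b_{FE} - b_{RE})' \, [\, V_{FE} - V_{RE} \,]^{-1} \, (b_{FE} - b_{RE})

% Under the null hypothesis that the random effects are uncorrelated with X
H \sim \chi^2_k
```

A large H says the two sets of coefficients differ by more than sampling variation would suggest, which is read as evidence against the R.E. assumption and in favour of F.E.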

Page 19: ‘Interpreting coefficients from longitudinal models’

xt Regression Approach: Fixed or Random effects estimators

• Preference for Random Effects (RE) models in some areas (e.g. education studies)

• Frequent criticism – A key assumption in RE models is that random effects are uncorrelated with the observed variables in the model

• In practice this assumption goes untested and could potentially result in biased estimates (see Halaby 2004 Ann. Rev. Sociology 30)

Page 20: ‘Interpreting coefficients from longitudinal models’

Which approaches in practice?

• Some more general thoughts

– banana skins

– flies in the ointment

Page 21: ‘Interpreting coefficients from longitudinal models’

Repeated Measures Data

[Figure: scatter plot of Y (0 to 40) against X (0 to 15) showing observed and fitted values, e.g. results for a standard regression. Vernon Gayle & Paul Lambert.]

Unsuitable approach for individual level change over time (growth or development) (this illustration shows observations with constant slopes but different intercepts)

Page 22: ‘Interpreting coefficients from longitudinal models’

Repeated Measures Data

[Figure: Y (0 to 40) against X (0 to 15), with Individual 1 to Individual 4 shown separately, e.g. results from a fixed effects panel model. Vernon Gayle & Paul Lambert.]

Change over time (growth or development): panel modelling
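The contrast between a standard regression and a fixed effects model on repeated measures data can be reproduced with a few lines of arithmetic. This is a sketch on made-up, noise-free data for four hypothetical individuals who share a slope of 2 but have different intercepts; the numbers are not those behind the figures.

```python
import numpy as np

# Four hypothetical individuals, five observations each, common slope, different intercepts.
# Individuals with higher X values are given lower intercepts, so pooling misleads.
slope = 2.0
intercepts = np.array([40.0, 30.0, 20.0, 10.0])
ids = np.repeat(np.arange(4), 5)
x = np.tile(np.arange(5.0), 4) + 3.0 * ids       # each person covers a different X range
y = intercepts[ids] + slope * x                  # no noise, to keep the arithmetic exact

# Standard (pooled) regression: one intercept and one slope for everybody
X = np.column_stack([np.ones_like(x), x])
b_pooled = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Fixed effects (within) estimator: subtract each person's own means first
x_bar = np.array([x[ids == i].mean() for i in range(4)])[ids]
y_bar = np.array([y[ids == i].mean() for i in range(4)])[ids]
xw, yw = x - x_bar, y - y_bar
b_within = (xw @ yw) / (xw @ xw)

print(b_pooled, b_within)
```

The within estimator recovers the common slope of 2 exactly, while the pooled slope comes out negative (about -0.83) because it mixes the within-person change with the between-person differences in intercepts.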

Page 23: ‘Interpreting coefficients from longitudinal models’

Repeated Measures: Cluster or Pop Ave Approach

[Figure: Y (0 to 30) against X (5 to 30) for Individual 1 to Individual 4. Vernon Gayle & Paul Lambert.]

No clear change over time (growth or development): repeated measures a nuisance

Page 24: ‘Interpreting coefficients from longitudinal models’

The Hausman test is very sensitive and will usually lead to a preference for the FE model

Substantively the RE may be better; the FE is more appropriate in relation to growth or individual level change

Page 25: ‘Interpreting coefficients from longitudinal models’

Fixed or Random Effect Estimators?

In our view R.E. is most appropriate when there are substantively important fixed in time X variables (which are not correlated with unobserved effects)

F.E. can be especially misleading for variables that change little in time (e.g. trade union membership) because they are “identified by changers”

This may be compounded by measurement errors

Page 26: ‘Interpreting coefficients from longitudinal models’

A further thought about fixed effects models….

Page 27: ‘Interpreting coefficients from longitudinal models’

The Panel Model

[Diagram: earnings (y) linked to time changing x vars, education level (x) fixed in time, and unobserved ability]

The F.E. panel model estimator is theoretically attractive in this situation

F.E. is commonly used in economics, as the effect of education level is correlated with ability

Remember that this rests on the (potentially strong) assumption that ability is fixed in time

Page 28: ‘Interpreting coefficients from longitudinal models’

The Panel Model

[Diagram: earnings (y) linked to time changing x vars, education level (x) fixed in time, and unobserved ability, with a correlation marked between education level and unobserved ability]

R.E. is commonly used in multilevel modelling, but the effect of education level may be correlated with ability

Remember that this rests on the (potentially strong) assumption that ability is fixed in time

Page 29: ‘Interpreting coefficients from longitudinal models’

The Panel Model

[Diagram: explanatory variable and unobserved characteristic]

Fixed Effects: the standard theoretical position (two slides back) is questionable if there is two-way causality (econometrician Stephen Pudney makes this point)

Page 30: ‘Interpreting coefficients from longitudinal models’

Population Ave Model (Marginal Models)

• Is a model that accounts for clustering between individuals all we need?

    logit y x1, cluster(id)

• Becoming more popular (Pickles – preference in USA in public health)

• Do we need ‘subject’ specific random/fixed effect? (is ‘frailty’ or unobserved heterogeneity important)

• Time constant X variables might be analytically important

• Marginal Modelling (GEE approaches) may be all we need (e.g. estimating a policy or ‘social group’ difference)

Page 31: ‘Interpreting coefficients from longitudinal models’

Some further thoughts on comparing estimates between models……

Page 32: ‘Interpreting coefficients from longitudinal models’

Binary Outcome Panel Models: An example

Married women’s employment (SCELI Data)

y is the woman working yes=1; no=0

x woman has child aged under 1 year

I have contrived this illustration….

Page 33: ‘Interpreting coefficients from longitudinal models’

                Probit            Probit
                β        s.e.     β        s.e.
Child under 1   -1.95    0.56     -1.95    0.40
Constant        0.67     0.14     0.67     0.10
Log likelihood  -54.70            -109.39
n               101               202
Pseudo R2       0.13              0.13
Clusters        -                 -

Consistent β, smaller standard errors (double the sample size), but Stata thinks that there are 202 individuals and not 101 people surveyed in two waves!

Page 34: ‘Interpreting coefficients from longitudinal models’

                Probit            Probit            Probit
                β        s.e.     β        s.e.     β        Robust s.e.
Child under 1   -1.95    0.56     -1.95    0.40     -1.95    0.56
Constant        0.67     0.14     0.67     0.10     0.67     0.14
Log likelihood  -54.70            -109.39           -109.39
n               101               202               202
Pseudo R2       0.13              0.13              0.13
Clusters        -                 -                 101

Consistent β; standard errors are now corrected – Stata knows that there are 101 individuals (i.e. repeated measures)
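The pattern in this table (identical coefficients, standard errors that shrink when the data are stacked and recover once clustering is declared) can be reproduced in a small sketch. To stay self-contained it uses a linear probability model fitted by least squares as a stand-in for the probit, simulated data, and a cluster-robust sandwich estimator without any small-sample correction; all of these are assumptions of the sketch, so the numbers will not match the slide.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 101                                            # people, as in the slide
x = (rng.random(n) < 0.2).astype(float)            # hypothetical binary covariate
y = (rng.random(n) < 0.7 - 0.5 * x).astype(float)  # hypothetical binary outcome

def ols(X, y):
    """Return coefficients, residuals and (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    return b, y - X @ b, XtX_inv

def naive_se(X, y):
    """Conventional standard errors, treating every row as independent."""
    b, e, XtX_inv = ols(X, y)
    s2 = (e @ e) / (len(y) - X.shape[1])
    return np.sqrt(np.diag(s2 * XtX_inv))

def cluster_se(X, y, ids):
    """Cluster-robust sandwich standard errors (no small-sample correction)."""
    b, e, XtX_inv = ols(X, y)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(ids):
        s = X[ids == g].T @ e[ids == g]            # score contributions summed within person
        meat += np.outer(s, s)
    return np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

X1 = np.column_stack([np.ones(n), x])
ids1 = np.arange(n)

# Contrived "wave 2": an exact copy of wave 1, stacked underneath it
X2 = np.vstack([X1, X1])
y2 = np.concatenate([y, y])
ids2 = np.concatenate([ids1, ids1])

se_one_wave = naive_se(X1, y)                      # 101 rows
se_stacked_naive = naive_se(X2, y2)                # 202 rows treated as 202 people
se_stacked_cluster = cluster_se(X2, y2, ids2)      # 202 rows, 101 clusters
se_one_wave_robust = cluster_se(X1, y, ids1)       # one row per cluster

print(se_one_wave, se_stacked_naive, se_stacked_cluster, se_one_wave_robust)
```

In this sketch the naive standard errors from the stacked data shrink by the factor sqrt((n - 2) / (2n - 2)), roughly 0.70, purely because the rows are counted twice. The clustered standard errors from the stacked data equal the robust standard errors from a single wave exactly, so duplicating a person's record adds no information once the clustering is acknowledged. Statistical packages typically also apply a small-sample correction, which is omitted here.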

Page 35: ‘Interpreting coefficients from longitudinal models’

                Probit            Probit            Probit                R.E. Probit
                β        s.e.     β        s.e.     β        Robust s.e.  β        s.e.
Child under 1   -1.95    0.56     -1.95    0.40     -1.95    0.56         -19.41   1.22
Constant        0.67     0.14     0.67     0.10     0.67     0.14         6.39     0.28
Log likelihood  -54.70            -109.39           -109.39               -49.57
n               101               202               202                   202
Pseudo R2       0.13              0.13              0.13
Clusters        -                 -                 101                   101

Beware: β and standard errors are no longer measured on the same scale

Stata knows that there are 101 individuals (i.e. repeated measures)

Page 36: ‘Interpreting coefficients from longitudinal models’

β in Binary Panel Models

The β in a probit random effects model is scaled differently – Mark Stewart suggests

β (r.e.) × √(1 − rho) compared with β from the pooled probit

rho (analogous to an ICC) is the proportion of the total variance contributed by the person level variance

Panel logit models also have this issue!
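The rescaling of a random effects probit coefficient can be motivated with a short derivation. It assumes the usual latent variable formulation with a normally distributed person effect u and a standard normal error e; the symbols y*, σ_u and Φ are notation introduced here rather than taken from the slides.

```latex
% Latent variable form of the random effects probit
y^*_{it} = x_{it}\beta_{RE} + u_i + e_{it}, \qquad
u_i \sim N(0, \sigma_u^2), \quad e_{it} \sim N(0, 1)

% rho: share of the total latent variance due to the person level
\rho = \frac{\sigma_u^2}{\sigma_u^2 + 1}
\quad\Longrightarrow\quad 1 - \rho = \frac{1}{1 + \sigma_u^2}

% Averaging over u gives the marginal (pooled) probability
P(y_{it} = 1 \mid x_{it})
  = \Phi\!\left( \frac{x_{it}\beta_{RE}}{\sqrt{1 + \sigma_u^2}} \right)
  = \Phi\!\left( x_{it}\beta_{RE}\sqrt{1 - \rho} \right)

% Hence the comparison with the pooled probit coefficient
\beta_{pooled} \approx \beta_{RE}\sqrt{1 - \rho}
```

As a rough check against the contrived table, the ratio 1.95 / 19.41 is about 0.10, which under this scaling would correspond to rho of about 0.99. The slides do not report rho, so this is only an implied value.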

Page 37: ‘Interpreting coefficients from longitudinal models’

β in Binary Panel Models

• Conceptually two types of β in a binary random effects model

• X is time changing - β is the ‘effect’ for a woman of changing her value of X

• X is fixed in time - β is analogous to the effect for two women (e.g. Chinese / Indian) with the same value of the random effect (e.g. ui=0)

– For fixed in time X Fiona Steele suggests simulating to get a more appropriate value of β

Page 38: ‘Interpreting coefficients from longitudinal models’

Population Ave Model / Marginal Models

• Motivation for thinking about these approaches:

– Not really been adopted in British Sociology

• Population average models/Marginal Modelling/GEE approaches are developing rapidly. They might be useful for estimating a policy or ‘social group’ differences

• Population average models are becoming more popular (Pickles – preference in USA in public health)

• Is a model that accounts for clustering between individual observations adequate?

Simple pop. average model: regress y x1, cluster(id)

Page 39: ‘Interpreting coefficients from longitudinal models’

Conclusion

• Clustering is sometimes part of the substantive story

– e.g. orthodox hierarchical (or multi-level) situation, pupils nested in schools

• Explicitly modelling hierarchical structure may be desirable

– Ironically, in some instances even with ‘highly’ clustered data we would tell a similar story whichever model we used (strength of coefficient, signs & significance)

Page 40: ‘Interpreting coefficients from longitudinal models’

Conclusion

• Population average models/Marginal Modelling/GEE might be useful for estimating a policy or ‘social group’ differences

– Is the ‘average’ effect for a group substantively more interesting, or more important for informing policy or practice?

Page 41: ‘Interpreting coefficients from longitudinal models’

Conclusion

• Some estimators (xtprobit) don’t have F.E. equivalents (xtlogit F.E. is not equivalent to R.E.)

• Here population average approaches might be attractive since a key assumption in RE models is that random effects are uncorrelated with the observed variables in the model and this can’t be formally tested