
EHA Diagnostics

Sociology 229A: Event History Analysis
Class 5

Copyright © 2008 by Evan Schofer
Do not copy or distribute without permission


Announcements

• Class topics:
• Cox model: examining the baseline hazard
  – And hazard for various groups in your data
• Cox model diagnostics (part 1)
• Discussion of readings


Cox Model: Baseline Hazard

• Cox models involve a “baseline hazard”
• Note: baseline = when all covariates are zero
• Question: What does the baseline hazard look like?
  – Or the baseline survivor & integrated hazard?
  – Stata can estimate the baseline survivor, hazard, and integrated hazard. Two steps:
• 1. You must ask Stata to save the info when you run the Cox model
  – Ex: stcox gdp degradation education democracy ngo ingo, robust nohr basehc(h0)
• 2. Use the “stcurve” command to plot the baseline curves
  – Ex: stcurve, hazard OR stcurve, survival
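For reference, the “baseline hazard” is the h_0(t) term in the usual way of writing the Cox model. The notation below (h_0, H_0, S_0, x, β) is the conventional textbook one, not something taken from the slides:

```latex
h(t \mid x) = h_0(t)\,\exp(x_1\beta_1 + \dots + x_k\beta_k),
\qquad
h(t \mid x = 0) = h_0(t)

H_0(t) = \int_0^t h_0(u)\,du,
\qquad
S_0(t) = \exp\{-H_0(t)\}
```

Setting every covariate to zero leaves only h_0(t), which is why the baseline curves describe a case with all X variables equal to zero.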


Cox Model: Baseline Hazard

• Baseline rate: Adoption of environmental law

[Figure: Cox proportional hazards regression. Smoothed hazard function (y-axis, 0 to .08) vs. analysis time (x-axis, 1970 to 2000).]


Cox Model: Baseline Hazard

• Note: It may not always make sense to plot the baseline hazard
• Baseline shows hazard when X variables are zero
• Sometimes zero values aren’t very useful/interesting
  – Example: Does it make sense to plot hazard of countries adopting laws, if X vars = zero?
    • Hazard rate might be quite low
    • In some cases, you’ll just get a flat zero curve
  – Or extremely high values
  – Solutions:
    • 1. Rescale indep vars before running the Cox model
    • 2. Use stcurve to choose relevant values of vars.


Cox Model: Estimated Hazards

• You can also use stcurve to plot estimated hazard rates based on values of indep vars
• Ex: What is hazard curve if democracy = 1, 5, 10?
• Strategy: use “at” subcommand:
• stcurve, hazard at(democ=1) at2(democ=10)
• NOTE: All other variables are pegged at the mean…
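Because all other variables are held fixed, the two curves that stcurve draws differ only by a constant multiplier at every point in time. A minimal Python sketch of that multiplier follows; the coefficient value is hypothetical (it is not an estimate from these slides) and the function name is made up for illustration.

```python
import math

def hazard_ratio(beta, x_a, x_b):
    """Ratio h(t | x = x_b) / h(t | x = x_a) implied by one Cox coefficient."""
    return math.exp(beta * (x_b - x_a))

# Hypothetical democracy coefficient, chosen only for illustration.
beta_democracy = 0.25

# Compare democracy = 10 with democracy = 1; the ratio does not depend on time.
ratio = hazard_ratio(beta_democracy, 1, 10)
print(round(ratio, 2))
```

With these assumed numbers the democracy=10 curve sits at exp(0.25 × 9) times the democracy=1 curve throughout the plot.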


Cox: Estimated Hazard Rate

• Hazard rate for adoption of environmental law

[Figure: Cox proportional hazards regression. Smoothed hazard function (y-axis, 0 to .8) vs. analysis time (x-axis, 1970 to 2000), with separate curves for democracy=1 and democracy=10.]


Cox Model Diagnostics

• Issues that you must deal with:
• 1. How to estimate results with “ties” in your data
  – Ties = cases that fail at the exact same time
• 2. How to identify violations of the proportional hazard assumption
• 3. Dealing with outliers/influential cases
• 4. Assessing model fit
  – Most of this also applies to parametric models
    • Ties are not a concern
    • But, additional issues come up: choosing the right functional form (shape) to model the hazard.


Cox Model Issues: Ties

• How to handle ties in data
• It is mathematically complex to estimate models when there are tied failures
  – That is: two or more cases that have events at the exact same time
• Several mathematical approaches:
  – Breslow approximation – simplest approach
    • Stata default, but not the best choice!
  – Efron approximation – generally better
    • More computationally intensive, but given the power of modern computers it is not an issue
    • stcox var1 var2 var3, efron


Cox Model Issues: Ties

  – Exact marginal – “continuous time approximation”
  – Box-Steffensmeier & Jones: “Averaged Likelihood”
    • Assumes ties didn’t happen EXACTLY at the same time… and considers all possible orderings
  – Exact partial – “discrete”
  – Box-Steffensmeier & Jones: “exact discrete method”
    • Assumes ties happened EXACTLY at the same time
  – Advice:
    • Use Efron at a minimum
    • Exact methods are often more accurate
      – Exact marginal often makes most sense… events rarely occur at the EXACT same time… unless you have discrete data
      – But, exact methods can take a LONG time.
      – For big datasets with many ties, Efron is OK.
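To make the difference between these approaches concrete, here is a small Python sketch that computes the partial-likelihood contribution of a single failure time with tied events under the Breslow approximation, the Efron approximation, and an average over all possible orderings of the tied cases. It uses the textbook formulas rather than anything Stata-specific, and the risk scores exp(xβ) are invented for illustration. The averaged version differs from the exact marginal likelihood only by a constant factor, which does not affect coefficient estimates.

```python
import math
from itertools import permutations

def breslow(tied, others):
    """Breslow: every tied failure is divided by the full risk-set total."""
    total = sum(tied) + sum(others)
    return math.prod(tied) / total ** len(tied)

def efron(tied, others):
    """Efron: remove the tied cases from the denominator a fraction at a time."""
    d = len(tied)
    total = sum(tied) + sum(others)
    denom = math.prod(total - (k / d) * sum(tied) for k in range(d))
    return math.prod(tied) / denom

def averaged(tied, others):
    """Mean contribution over every possible ordering of the tied failures."""
    contributions = []
    for order in permutations(tied):
        remaining = sum(tied) + sum(others)
        value = 1.0
        for score in order:
            value *= score / remaining   # this case fails next
            remaining -= score           # then leaves the risk set
        contributions.append(value)
    return sum(contributions) / len(contributions)

# Hypothetical risk scores exp(x*beta): two tied failures, two cases still at risk.
tied, others = [2.0, 1.0], [1.0, 1.0]
print(breslow(tied, others), efron(tied, others), averaged(tied, others))
```

In this toy risk set the Efron value lands much closer to the averaged value than the Breslow value does, which is the pattern behind the advice to use Efron at a minimum. The loop over permutations also hints at why exact methods get slow as the number of ties grows.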


Proportional Hazard Assumption

• Key assumption: Proportional hazards
• Estimated hazards are proportional over time
• i.e., Estimates of a hazard ratio do NOT vary over time
  – Example: Effect of “abstinence” program on sexual behavior
    • Issue: Do abstinence programs lower the rate in a consistent manner across time?
      – Or, perhaps the rate is lower initially… but then the rate jumps back up (maybe even exceeds the control group).
  – Groups are assumed to have “parallel” hazards
    • Rather than rates that diverge, converge (or cross).


Proportional Hazard Assumption

• Strategies:

• 1. Visually examine raw hazard plots for sub-groups in your data
• Watch for non-parallel trends
• A crude method… not the best approach… but often identifies big violations


Proportional Hazard Assumption
• Visual examination of raw hazard rate

[Figure: Smoothed hazard estimates, by west (west = 0 vs. west = 1). Y-axis: 0 to .15; x-axis: analysis time, 1970 to 2000.]

You want them to change proportionally

If one doubles, so does the other…


Proportional Hazard Assumption

• 2. Plot –ln(–ln(survival)) versus ln(time) across values of X variables
• What Stata calls “stphplot”
• Parallel lines indicate proportional hazards
• Again, convergence and divergence (or crossing) indicate violation
  – A less-common approach: compare observed survivor plot to predicted values (for different values of X)
    • What Stata calls “stcoxkm”
    • If observed are similar to predicted, assumption is not likely to be violated.
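Why parallel lines are the signature of proportional hazards: under a Cox model S(t | x) = S_0(t)^exp(xβ), so –ln(–ln S) for two groups differs by the constant xβ at every time point. The Python sketch below checks this numerically. The Weibull baseline, its parameters, and the coefficient are all assumptions made for illustration and do not come from the slides.

```python
import math

def neg_log_neg_log_survival(t, x, beta, shape=1.5, scale=0.01):
    """-ln(-ln S(t | x)) for a Cox model with an assumed Weibull baseline."""
    cumulative_baseline = scale * t ** shape                 # H0(t)
    cumulative = cumulative_baseline * math.exp(beta * x)    # H(t | x)
    return -math.log(cumulative)                             # because S = exp(-H)

beta = 0.7  # hypothetical coefficient for a 0/1 group variable
times = [1, 5, 10, 20]

# Vertical gap between the x = 0 and x = 1 curves at each time point.
gaps = [
    neg_log_neg_log_survival(t, 0, beta) - neg_log_neg_log_survival(t, 1, beta)
    for t in times
]
print(gaps)
```

Every gap equals β, so the two curves are vertical shifts of one another. If the effect of x changed with time, the gap would change too, and the plotted lines would converge, diverge, or cross.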


Proportional Hazard Assumption
• -ln(-ln(survivor)) vs. ln(time) – “stphplot”

[Figure: -ln[-ln(Survival Probability)] (y-axis, -1 to 4) vs. ln(analysis time) (x-axis, 7.585 to 7.605), by west (west = 0 vs. west = 1).]

Parallel=good

Convergence suggests violation of proportional hazard assumption

(But, I’ve seen worse!)


Proportional Hazard Assumption
• Cox estimate vs. observed KM – “stcoxkm”

[Figure: Survival Probability (y-axis, 0.00 to 1.00) vs. analysis time (x-axis, 1970 to 2000). Four curves: Observed: west = 0; Observed: west = 1; Predicted: west = 0; Predicted: west = 1.]

Predicted differs from observed for countries in West


Proportional Hazard Assumption

• 3. Piecewise Models
• Piecewise = break model up into pieces (by time)
  – Ex: Split analysis into “early” vs “late” time
• If coefficients vary in different time periods, hazards are not proportional
  – Example:
    • stcox var1 var2 var3 if _t < 10
    • stcox var1 var2 var3 if _t >= 10
    • Look for large changes in coefficients!


Proportional Hazard Assumption

• In a piecewise model, coefficients would differ in non-proportional models

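[Figure: two panels, “Proportional” and “Non-Proportional”, each comparing the effect in an “Early” and a “Late” time period.]

Proportional: Here, the effect is the same in both time periods

Non-Proportional: Here, the effect is negative in the early period and positive in the late period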


Piecewise Models
• Look at coefficients at 2 (or more) spans of time

EARLY
. stcox gdp degradation education democracy ngo ingo if year < 1985, robust nohr

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .4465818   .4255587     1.05   0.294    -.3874979    1.280661
 degradation |   -.282548   .1572746    -1.80   0.072    -.5908005    .0257045
   education |  -.0195118   .0328195    -0.59   0.552    -.0838368    .0448131
   democracy |   .2295673   .2625205     0.87   0.382    -.2849634     .744098
         ngo |   .6792462   .3110294     2.18   0.029     .0696399    1.288853
        ingo |   .6664661   .4804229     1.39   0.165    -.2751456    1.608078
------------------------------------------------------------------------------

LATE
. stcox gdp degradation education democracy ngo ingo if year >= 1985, robust nohr

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .4963942    .357739     1.39   0.165    -.2047613     1.19755
 degradation |  -.5702894   .2395257    -2.38   0.017    -1.039751   -.1008277
   education |   .0142118   .0143762     0.99   0.323    -.0139649    .0423886
   democracy |   .2541799   .0981386     2.59   0.010     .0618317    .4465281
         ngo |   .1742862   .1448187     1.20   0.229    -.1095532    .4581256
        ingo |  -.1134661   .2104308    -0.54   0.590    -.5259028    .2989707
------------------------------------------------------------------------------

Note: Effect of ngo is larger in early period
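One rough way to put a number on “larger in early period” is to divide the gap between the two coefficients by its approximate standard error. The Python sketch below does this for ngo using the coefficients and robust standard errors reported in the EARLY and LATE outputs. Treating the two estimates as independent is an assumption made here for illustration, and this check is not part of the original slides.

```python
import math

def coefficient_difference_z(b_early, se_early, b_late, se_late):
    """z statistic for b_early - b_late, treating the estimates as independent."""
    return (b_early - b_late) / math.sqrt(se_early ** 2 + se_late ** 2)

# ngo coefficient and standard error: early period (year < 1985), then late period.
z_ngo = coefficient_difference_z(0.6792462, 0.3110294, 0.1742862, 0.1448187)
print(round(z_ngo, 2))
```

The gap comes out at about 1.5 standard errors, so it is suggestive rather than decisive on its own. A visual comparison of coefficients is a screening step, and the formal tests of the proportional hazard assumption remain the more reliable guide.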


Proportional Hazard Assumption

• 4. Tests based on re-estimating model
• Try including time interactions in your model
• Recall: Interactions – effect of A on C varies with B
• If effect of variable X on hazard rate (or ratio) varies with time, then hazards aren’t proportional
  – Recall example: Abstinence programs
    • Perhaps abstinence programs have a big effect initially, but the effect diminishes (or reverses) later on


Proportional Hazard Assumption

• Red = Abstinence group; green = control

[Figure: two panels of hazard curves, labeled “No time interaction” and “Positive time interaction”.]

In non-proportional case, the effect of abstinence programs varies across time


Proportional Hazard Assumption

• Strategy: Create variables that reflect the interaction of X variables with time
• Significant effects of time interactions indicate non-proportional hazard
• Fortunately, inclusion of the interaction term in the model corrects the problem.
• Issue: X variables can interact with time in multiple ways…
  – Linearly
  – With “log time” or time squared
  – With time dummies
  – You may have to try a range of things…
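Each way of interacting X with time corresponds to a different function g(t) in a model of the form h(t) = h_0(t) exp(β1·X + β2·X·g(t)), so the hazard ratio for X becomes exp(β1 + β2·g(t)). The Python sketch below evaluates that ratio for three choices of g(t). The coefficients, the cutoff for the late-period dummy, and all function names are hypothetical and chosen only for illustration.

```python
import math

def linear(t):
    """g(t) = t: the effect of X drifts steadily with time."""
    return float(t)

def log_time(t):
    """g(t) = ln(t): the effect changes quickly early on, then levels off."""
    return math.log(t)

def late_dummy(t, cutoff=10):
    """g(t) = 1 in the late period, 0 in the early period (assumed cutoff)."""
    return 1.0 if t >= cutoff else 0.0

def hazard_ratio_at(t, beta_x, beta_interaction, time_function):
    """Hazard ratio for a one-unit change in X at time t."""
    return math.exp(beta_x + beta_interaction * time_function(t))

# Hypothetical coefficients: a protective effect of X that fades over time.
beta_x, beta_interaction = -0.8, 0.05
for t in (1, 10, 20):
    print(t, round(hazard_ratio_at(t, beta_x, beta_interaction, linear), 3))
```

With a zero interaction coefficient the ratio is the same at every t, which is the proportional case. A non-zero interaction coefficient makes the ratio move with time, and in this example it eventually crosses 1, so the effect reverses.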


Proportional Hazard Assumption

• Red = Abstinence group; green = control

[Figure: two panels of hazard curves.]

Linear time interaction: Effect grows consistently over time. Try “Abstinence*time”

Interaction with time-period: Effect differs early vs. late. Try “Abstinence*DLate”


Proportional Hazard Assumption

• 5. Grambsch & Therneau test
  – Ex: Stata “estat phtest”
• Test for non-zero slope of Schoenfeld residuals vs time
  – A zero slope implies a constant log hazard ratio, i.e., proportional hazards
• Can be applied to general model, or for each variable

. stcox gdp degradation education democracy ngo ingo, robust nohr scaledsch(sca*) schoenfeld(sch*)

. estat phtest

      Test of proportional hazards assumption

      Time:  Time
      ----------------------------------------------------------------
                  |                     chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      global test |                    18.14        6         0.0059
      ----------------------------------------------------------------

Significant chi-square indicates violation of proportional hazard assumption


Proportional Hazard Assumption

• Variable-by-variable test “estat phtest”:

. estat phtest, detail

      Test of proportional hazards assumption

      Time:  Time
      ----------------------------------------------------------------
                  |       rho            chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      gdp         |      0.09035         0.63        1         0.4277
      degradation |     -0.22735         3.41        1         0.0646
      education   |      0.06915         0.47        1         0.4950
      democracy   |     -0.04929         0.20        1         0.6560
      ngo         |     -0.18691         4.56        1         0.0327
      ingo        |     -0.03759         0.34        1         0.5609
      ------------+---------------------------------------------------
      global test |                     18.14        6         0.0059
      ----------------------------------------------------------------

Note: Certain variables are especially problematic…


Proportional Hazard Assumption
• Notes on estat phtest:
  – 1. Requires that you calculate “Schoenfeld residuals” when you run the original Cox model
  – And, if you want a test for each variable, you must also request scaled Schoenfeld residuals
  – 2. Test is based on identifying non-zero time trend… but how should we characterize time?
    • Options: normal/linear time, log time, time dummies, etc.
      – Results may differ depending on your choice
      – Ex: estat phtest, log – specifies “log time”
    • Plot of smoothed Schoenfeld residuals can indicate best way to characterize time
      – Linear trend (not a curve) indicates that time is characterized OK
      – Ex: estat phtest, plot(ngo) OR estat phtest, log plot(ngo)


Proportional Hazard Assumption

• What if the assumption is violated?

• 1. Improve model specification
• Add time interactions to address nonproportionality
• Ex: If high democracies are not proportional to low democracies, try adding “highdemoc*time”
• Variables can be interacted with linear time, log time, time dummies, etc., to address the issue
• 2. Model groups separately
• Split sample along variables that are non-proportional.


Proportional Hazard Assumption

• What if the assumption is violated?

• 3. Use a stratified Cox model
• Allows a different baseline hazard for each group
  – But, you can’t estimate effect of stratifying variable!
• Ex: stcox var1 var2 var3, strata(Dhighdemoc)
• 4. Use a piecewise model
• Split time into chunks… in which PH assumption is met
  – Requires sufficient sample size in all time periods!
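In equation form, the stratified Cox model gives each stratum its own baseline hazard while sharing one set of coefficients. The notation below (s for stratum, h_{0s} for its baseline) is the conventional one and is not taken from the slides:

```latex
h_s(t \mid x) = h_{0s}(t)\,\exp(x\beta), \qquad s = 1, \dots, S
```

Because the stratifying variable is absorbed entirely into h_{0s}(t), which is left unspecified, the model has no coefficient for it. That is why its effect cannot be estimated.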


Proportional Hazard Assumption

• What if the assumption is violated?

• 5. Live with it (but temper your conclusions)
• Violation of proportional hazard assumption tends to:
  – Overestimate the effect of variables whose hazard ratios are increasing over time
  – And, underestimate those whose hazard ratios are decreasing
• However, Allison points out: Cox model is reasonably robust
  – Other issues (e.g., model misspecification) are bigger issues
