EHA Diagnostics
Sociology 229A: Event History Analysis
Class 5
Copyright © 2008 by Evan Schofer
Do not copy or distribute without permission
Announcements
• Class topics:
• Cox model: examining the baseline hazard
– And hazard for various groups in your data
• Cox model diagnostics (part 1)
• Discussion of readings
Cox Model: Baseline Hazard
• Cox models involve a “baseline hazard”
• Note: baseline = hazard when all covariates are zero
• Question: What does the baseline hazard look like?
– Or the baseline survivor & integrated hazard?
– Stata can estimate the baseline survivor, hazard, and integrated hazard. Two steps:
• 1. You must ask Stata to save the info when you run the Cox model
– Ex: stcox gdp degradation education democracy ngo ingo, robust nohr basehc(h0)
• 2. Use the “stcurve” command to plot the baseline curves
– Ex: stcurve, hazard OR stcurve, survival
Cox Model: Baseline Hazard
• Baseline rate: Adoption of environmental law
[Figure: Cox proportional hazards regression. Smoothed hazard function (y-axis, 0 to .08) vs. analysis time (x-axis, 1970 to 2000)]
Cox Model: Baseline Hazard
• Note: It may not always make sense to plot the baseline hazard
• Baseline shows hazard when X variables are zero
• Sometimes zero values aren’t very useful/interesting
– Example: Does it make sense to plot hazard of countries adopting laws, if X vars = zero?
• Hazard rate might be quite low
• In some cases, you’ll just get a flat zero curve
– Or extremely high values
– Solutions:
• 1. Rescale indep vars before running Cox model
• 2. Use stcurve to choose relevant values of vars.
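A minimal sketch of why rescaling helps, written in Python rather than Stata so it runs on its own. It assumes a Cox-type hazard of the form h(t|x) = h0(t) * exp(beta * x). The coefficient, the mean democracy score, and the hazard at the mean are all hypothetical numbers chosen for illustration, not estimates from the class data.

```python
import math

def hazard(h0, beta, x):
    """Cox-type hazard: baseline hazard times exp(beta * x)."""
    return h0 * math.exp(beta * x)

# Hypothetical values, for illustration only (not from the class data).
beta = 0.25        # assumed coefficient on democracy
mean_democ = 6.0   # assumed sample mean of democracy
h_at_mean = 0.05   # assumed hazard for a country at the mean

# Raw coding: the baseline refers to democracy = 0, which may be far from the data.
h0_raw = h_at_mean / math.exp(beta * mean_democ)

# Centered coding (x_c = democracy - mean): the baseline now refers to the mean.
h0_centered = hazard(h0_raw, beta, mean_democ)

# Rescaling moves the baseline, but the hazard ratio per unit of x is unchanged.
hr_raw = hazard(h0_raw, beta, 7.0) / hazard(h0_raw, beta, 6.0)
hr_centered = hazard(h0_centered, beta, 1.0) / hazard(h0_centered, beta, 0.0)

print(h0_raw, h0_centered, hr_raw, hr_centered)
```

With these made-up numbers the raw baseline is much lower than the centered one, which is the "flat curve near zero" problem described above; the coefficient itself is unaffected by centering.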
Cox Model: Estimated Hazards
• You can also use stcurve to plot estimated hazard rates based on values of indep vars
• Ex: What is hazard curve if democracy = 1, 5, 10?
• Strategy: use “at” subcommand:
• stcurve , hazard at(democ=1) at2(democ=10)
• NOTE: All other variables are pegged at the mean…
Cox: Estimated Hazard Rate
• Hazard rate for adoption of environmental law
[Figure: Cox proportional hazards regression. Smoothed hazard function (y-axis, 0 to .8) vs. analysis time (x-axis, 1970 to 2000), with separate curves for democracy=1 and democracy=10]
Cox Model Diagnostics
• Issues that you must deal with:
• 1. How to estimate results with “ties” in your data
– Ties = cases that fail at the exact same time
• 2. How to identify violations of the proportional hazard assumption
• 3. Dealing with outliers/influential cases
• 4. Assessing model fit
– Most of this also applies to parametric models
• Ties are not a concern
• But, additional issues come up: choosing the right functional form (shape) to model the hazard.
Cox Model Issues: Ties
• How to handle ties in data
• It is mathematically complex to estimate models when there are tied failures
– That is: two cases that have events at the exact same time
• Several mathematical approaches:
– Breslow approximation – simplest approach
• Stata default, but not the best choice!
– Efron approximation – generally better
• More computationally intensive, but given the power of modern computers it is not an issue
• stcox var1 var2 var3, efron
Cox Model Issues: Ties
– Exact marginal – “continuous time approximation”
– Box-Steffensmeier & Jones: “Averaged Likelihood”
• Assumes ties didn’t happen EXACTLY at the same time… and considers all possible orderings
– Exact partial – “discrete”
– Box-Steffensmeier & Jones: “exact discrete method”
• Assumes ties happened EXACTLY at the same time
– Advice:
• Use Efron at a minimum
• Exact methods are often more accurate
– Exact marginal often makes most sense… events rarely occur at the EXACT same time… unless you have discrete data
– But, exact methods can take a LONG time.
– For big datasets with many ties, Efron is OK.
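To make the difference between the tie-handling approaches concrete, here is a small Python sketch of the partial-likelihood contribution at a single failure time with two tied failures. It uses the textbook Breslow and Efron formulas and compares them with the average over all possible orderings of the tied cases. The relative risks exp(xb) are hypothetical values, and the function names are made up for this illustration; this is not how Stata implements these options internally.

```python
import math
from itertools import permutations

def breslow(risk_failed, risk_all):
    """Breslow: every tied failure uses the full risk-set total as denominator."""
    d = len(risk_failed)
    return math.prod(risk_failed) / sum(risk_all) ** d

def efron(risk_failed, risk_all):
    """Efron: the k-th tied failure removes k/d of the tied cases' total risk."""
    d = len(risk_failed)
    total, tied = sum(risk_all), sum(risk_failed)
    denom = math.prod(total - (k / d) * tied for k in range(d))
    return math.prod(risk_failed) / denom

def averaged_over_orderings(risk_failed, risk_all):
    """Average the sequential (untied) likelihood over all orderings of the ties."""
    total = sum(risk_all)
    terms = []
    for order in permutations(risk_failed):
        remaining, p = total, 1.0
        for r in order:
            p *= r / remaining   # chance this case fails next among those left
            remaining -= r       # it then leaves the risk set
        terms.append(p)
    return sum(terms) / len(terms)

# Hypothetical relative risks exp(xb) for four cases at risk; the first two fail together.
risk_all = [2.0, 1.0, 1.0, 0.5]
risk_failed = risk_all[:2]

b = breslow(risk_failed, risk_all)
e = efron(risk_failed, risk_all)
a = averaged_over_orderings(risk_failed, risk_all)
print(b, e, a)
```

In this toy example the Efron value lands much closer to the averaged-ordering value than the Breslow value does, which is the intuition behind "use Efron at a minimum". The loop over permutations also shows why exact methods get slow: the number of orderings grows factorially with the number of ties.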
Proportional Hazard Assumption
• Key assumption: Proportional hazards
• Hazards for different groups are proportional to each other over time
• i.e., the hazard ratio does NOT vary over time
– Example: Effect of “abstinence” program on sexual behavior
• Issue: Do abstinence programs lower the rate in a consistent manner across time?
– Or, perhaps the rate is lower initially… but then the rate jumps back up (maybe even exceeds the control group).
– Groups are assumed to have “parallel” hazards (parallel on a log scale)
• Rather than rates that diverge, converge (or cross).
Proportional Hazard Assumption
• Strategies:
• 1. Visually examine raw hazard plots for sub-groups in your data
• Watch for non-parallel trends
• A crude method… not the best approach… but often identifies big violations
Proportional Hazard Assumption
• Visual examination of raw hazard rate
[Figure: Smoothed hazard estimates, by west. Hazard (y-axis, 0 to .15) vs. analysis time (x-axis, 1970 to 2000), with curves for west = 0 and west = 1]
You want them to change proportionally
If one doubles, so does the other…
Proportional Hazard Assumption
• 2. Plot -ln(-ln(survival probability)) versus ln(time) across values of X variables
• What Stata calls “stphplot”
• Parallel lines indicate proportional hazards
• Again, convergence and divergence (or crossing) indicates violation
– A less-common approach: compare observed survivor plot to predicted values (for different values of X)
• What Stata calls “stcoxkm”
• If observed are similar to predicted, assumption is not likely to be violated.
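Why parallel lines? Under proportional hazards, S1(t) = S0(t)^exp(beta), so -ln(-ln S0(t)) and -ln(-ln S1(t)) differ by the constant beta at every t. The Python sketch below checks this numerically. The Weibull baseline, its parameters, and beta = 0.7 are assumptions picked only to have something to compute; they are not from the class data.

```python
import math

def neg_log_neg_log(s):
    """The transform plotted by a log-log survival plot: -ln(-ln(S))."""
    return -math.log(-math.log(s))

def weibull_survival(t, scale, shape):
    """Weibull survivor function S(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))

beta = 0.7                            # assumed log hazard ratio for group 1 vs group 0
times = [1.0, 2.0, 5.0, 10.0, 20.0]

ph_gaps, non_ph_gaps = [], []
for t in times:
    s0 = weibull_survival(t, scale=15.0, shape=1.5)
    # Proportional hazards: group 1 survivor is S0 raised to exp(beta).
    s1_ph = s0 ** math.exp(beta)
    # Non-proportional case: group 1 has a different Weibull shape.
    s1_non_ph = weibull_survival(t, scale=15.0, shape=0.8)
    ph_gaps.append(neg_log_neg_log(s0) - neg_log_neg_log(s1_ph))
    non_ph_gaps.append(neg_log_neg_log(s0) - neg_log_neg_log(s1_non_ph))

print(ph_gaps)      # constant vertical gap: parallel lines
print(non_ph_gaps)  # gap changes with t: lines converge, diverge, or cross
```

In the proportional case the vertical gap equals beta at every time point. In the second case the gap drifts with t, which is the pattern to watch for in the plot.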
Proportional Hazard Assumption
• -ln(-ln(survivor)) vs. ln(time) – “stphplot”
[Figure: -ln[-ln(Survival Probability)] (y-axis, -1 to 4) vs. ln(analysis time) (x-axis, 7.585 to 7.605), with lines for west = 0 and west = 1]
Parallel=good
Convergence suggests violation of proportional hazard assumption
(But, I’ve seen worse!)
Proportional Hazard Assumption
• Cox estimate vs. observed KM – “stcoxkm”
[Figure: Survival Probability (y-axis, 0.00 to 1.00) vs. analysis time (x-axis, 1970 to 2000). Observed and predicted curves for west = 0 and west = 1]
Predicted differs from observed for countries in West
Proportional Hazard Assumption
• 3. Piecewise Models
• Piecewise = break model up into pieces (by time)
– Ex: Split analysis into “early” vs “late” time
• If coefficients vary in different time periods, hazards are not proportional
– Example:
• stcox var1 var2 var3 if _t < 10
• stcox var1 var2 var3 if _t >= 10
• Look for large changes in coefficients!
Proportional Hazard Assumption
• In a piecewise model, coefficients would differ in non-proportional models
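[Figure: two panels, “Proportional” and “Non-Proportional”, each comparing the effect in the “Early” and “Late” periods]
Proportional: Here, the effect is the same in both time periods
Non-Proportional: Here, the effect is negative in the early period and positive in the late period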
Piecewise Models
• Look at coefficients at 2 (or more) spans of time

EARLY
. stcox gdp degradation education democracy ngo ingo if year < 1985, robust nohr

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .4465818   .4255587     1.05   0.294    -.3874979    1.280661
 degradation |   -.282548   .1572746    -1.80   0.072    -.5908005    .0257045
   education |  -.0195118   .0328195    -0.59   0.552    -.0838368    .0448131
   democracy |   .2295673   .2625205     0.87   0.382    -.2849634     .744098
         ngo |   .6792462   .3110294     2.18   0.029     .0696399    1.288853
        ingo |   .6664661   .4804229     1.39   0.165    -.2751456    1.608078
------------------------------------------------------------------------------

LATE
. stcox gdp degradation education democracy ngo ingo if year >= 1985, robust nohr

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .4963942    .357739     1.39   0.165    -.2047613     1.19755
 degradation |  -.5702894   .2395257    -2.38   0.017    -1.039751   -.1008277
   education |   .0142118   .0143762     0.99   0.323    -.0139649    .0423886
   democracy |   .2541799   .0981386     2.59   0.010     .0618317    .4465281
         ngo |   .1742862   .1448187     1.20   0.229    -.1095532    .4581256
        ingo |  -.1134661   .2104308    -0.54   0.590    -.5259028    .2989707
------------------------------------------------------------------------------
Note: Effect of ngo is larger in early period
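One rough way to judge whether a change in coefficients is “large” is a z-statistic for the difference between the early and late estimates. The Python sketch below plugs in the ngo and ingo coefficients and standard errors from the output above. It assumes the two estimates are independent, which is an added simplification (the same countries can appear in both periods) and not something the slides claim, so treat the result as a heuristic rather than a formal test.

```python
import math

def coef_difference_z(b1, se1, b2, se2):
    """z-statistic for b1 - b2, ASSUMING the two estimates are independent."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Coefficients and standard errors copied from the EARLY and LATE output above.
z_ngo = coef_difference_z(0.6792462, 0.3110294, 0.1742862, 0.1448187)
z_ingo = coef_difference_z(0.6664661, 0.4804229, -0.1134661, 0.2104308)

print(round(z_ngo, 2), round(z_ingo, 2))
```

Both values come out near 1.5, below the conventional 1.96 cutoff. So under this (assumed) independence, the early/late differences are suggestive rather than conclusive, which is one reason to also run the formal tests covered on the following slides.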
Proportional Hazard Assumption
• 4. Tests based on re-estimating model
• Try including time interactions in your model
• Recall: Interactions – effect of A on C varies with B
• If effect of variable X on hazard rate (or ratio) varies with time, then hazards aren’t proportional
– Recall example: Abstinence programs
• Perhaps abstinence programs have a big effect initially, but the effect diminishes (or reverses) later on
Proportional Hazard Assumption
• Red = Abstinence group; green = control
[Figure: two panels, “No time interaction” and “Positive time interaction”]
In non-proportional case, the effect of abstinence programs varies across time
Proportional Hazard Assumption
• Strategy: Create variables that reflect the interaction of X variables with time
• Significant effects of time interactions indicate non-proportional hazard
• Fortunately, inclusion of the interaction term in the model corrects the problem.
• Issue: X variables can interact with time in multiple ways…
– Linearly
– With “log time” or time squared
– With time dummies
– You may have to try a range of things…
Proportional Hazard Assumption
• Red = Abstinence group; green = control
[Figure: two panels]
Linear time interaction: Effect grows consistently over time. Try “Abstinence*time”
Interaction with time-period: Effect differs early vs. late. Try “Abstinence*DLate”
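A short Python sketch of what these interaction terms look like and what they imply. It builds the three kinds of interaction variables named above (linear time, log time, and a “late” dummy) and shows how the hazard ratio for X changes with time once an interaction is in the model. The function names, the cutoff, and the coefficients are hypothetical, picked so the effect reverses as in the abstinence example.

```python
import math

def time_interactions(x, t, cutoff):
    """Interact x with linear time, log time, and a 'late period' dummy."""
    late = 1 if t >= cutoff else 0
    return {"x_time": x * t, "x_logtime": x * math.log(t), "x_late": x * late}

def hazard_ratio(b_main, b_interaction, time_term):
    """Hazard ratio for a one-unit change in x when its effect depends on time."""
    return math.exp(b_main + b_interaction * time_term)

# One record: x = 1 (e.g., in the program), observed at t = 10, with 'late' meaning t >= 5.
row = time_interactions(x=1, t=10.0, cutoff=5.0)

# Hypothetical coefficients: protective at first, effect fades and then reverses.
b_main, b_time = -0.8, 0.1
hr_by_time = {t: hazard_ratio(b_main, b_time, t) for t in (0, 4, 8, 12)}

print(row)
print(hr_by_time)
```

With these made-up coefficients the hazard ratio starts below 1, reaches 1 at t = 8, and exceeds 1 afterwards. A significant coefficient on the interaction term is what signals that the hazard ratio is not constant.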
Proportional Hazard Assumption
• 5. Grambsch & Therneau test
– Ex: Stata “estat phtest”
• Tests for a non-zero slope of Schoenfeld residuals vs. time
– A zero slope implies the log hazard ratio is constant over time, i.e., hazards are proportional
• Can be applied to the model as a whole, or to each variable
. stcox gdp degradation education democracy ngo ingo, robust nohr scaledsch(sca*) schoenfeld(sch*)
. estat phtest

      Test of proportional hazards assumption

      Time:  Time
      ----------------------------------------------------------------
                  |               chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      global test |              18.14        6         0.0059
      ----------------------------------------------------------------
Significant chi-square indicates violation of proportional hazard assumption
Proportional Hazard Assumption
• Variable-by-variable test “estat phtest”:
. estat phtest, detail

      Test of proportional hazards assumption

      Time:  Time
      ----------------------------------------------------------------
                  |       rho            chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      gdp         |      0.09035         0.63        1         0.4277
      degradation |     -0.22735         3.41        1         0.0646
      education   |      0.06915         0.47        1         0.4950
      democracy   |     -0.04929         0.20        1         0.6560
      ngo         |     -0.18691         4.56        1         0.0327
      ingo        |     -0.03759         0.34        1         0.5609
      ------------+---------------------------------------------------
      global test |                     18.14        6         0.0059
      ----------------------------------------------------------------
Note: Certain variables are especially problematic…
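The Prob>chi2 column is the upper-tail probability of a chi-square distribution with the listed degrees of freedom. As a sanity check on how to read the table, the Python sketch below recomputes two of the p-values from the reported chi2 and df, using only the standard library. The helper chi2_sf is written for this illustration and only handles df = 1 or even df, which is enough for the values shown here.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability (this sketch supports df = 1 or even df)."""
    if df == 1:
        # A chi-square with 1 df is a squared standard normal.
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        half = x / 2.0
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    raise ValueError("this sketch only handles df = 1 or even df")

p_global = chi2_sf(18.14, 6)   # global test row
p_ngo = chi2_sf(4.56, 1)       # ngo row

print(round(p_global, 4), round(p_ngo, 4))
```

Both values agree with the table to within rounding of the reported chi2 statistics: the global test and the ngo test each fall below .05, which is why they are flagged as problematic.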
Proportional Hazard Assumption
• Notes on estat phtest:
– 1. Requires that you calculate “Schoenfeld residuals” when you run the original Cox model
– And, if you want a test for each variable, you must also request scaled Schoenfeld residuals
– 2. Test is based on identifying non-zero time trend… but how should we characterize time?
• Options: normal/linear time, log time, time dummies, etc.
– Results may differ depending on your choice
– Ex: estat phtest, log – specifies “log time”
• Plot of smoothed Schoenfeld residuals can indicate best way to characterize time
– Linear trend (not a curve) indicates that time is characterized OK
– Ex: estat phtest, plot(ngo) OR estat phtest, log plot(ngo)
Proportional Hazard Assumption
• What if the assumption is violated?
• 1. Improve model specification
• Add time interactions to address nonproportionality
• Ex: If high democracies are not proportional to low democracies, try adding “highdemoc*time”
• Variables can be interacted with linear time, log time, time dummies, etc., to address the issue
• 2. Model groups separately
• Split sample along variables that are non-proportional.
Proportional Hazard Assumption
• What if the assumption is violated?
• 3. Use a stratified Cox model
• Allows a different baseline hazard for each group
– But, you can’t estimate effect of stratifying variable!
• Ex: stcox var1 var2 var3, strata(Dhighdemoc)
• 4. Use a piecewise model
• Split time into chunks… in which PH assumption is met
– Requires sufficient sample size in all time periods!
Proportional Hazard Assumption
• What if the assumption is violated?
• 5. Live with it (but temper your conclusions)
• Violation of proportional hazard assumption tends to:
– Overestimate the effect of variables whose hazard ratios are increasing over time
– And, underestimate those whose hazard ratios are decreasing
• However, Allison points out: Cox model is reasonably robust
– Other issues (e.g., model misspecification) are bigger problems