Event History Analysis 5

Sociology 8811 Lecture 19

Copyright © 2007 by Evan Schofer
Do not copy or distribute without permission

Announcements

• Class topic:
  • Time-varying data: example
  • More details on Cox models & fully parametric proportional hazard models

• Paper Assignment #2 handed out today
  • Due April 26

EHA Models: In Greater Depth

• Issues:
  – Properties of Cox models (semi-parametric) and fully parametric models
    • Plus relevant assumptions, diagnostics
    • Strategies for outliers
    • Model fit
  – Choosing a model. What should you do?
  – Other issues
    • Accelerated failure time models
    • Frailty
    • Etc.

Event History Example

• What factors affect how soon a country passes an environmental protection law?
  • Event: Passing an environmental law in a given year
  • Risk set: All countries that have not yet passed an environmental protection law
    – We decided that risk begins in 1970 (when such laws were invented)
      • Countries independent after 1970 are treated as entering the analysis “late”
  • Option #2: Duration since independence (age)
    – But, that was less appropriate for the research question.

Example: Environmental Laws

• Cross-national time series dataset of nearly 100 countries
  • Event: when a country writes its first comprehensive environmental law (e.g., EPA)
  • Data taken from various sources
  • Independent variables: GDP, population, democracy, degradation, education, domestic and international NGOs
• Time duration: analyses are from 1970-1998
  • In other words, countries enter the “risk set” in 1970, or when they become independent
• Total sample of 97 countries
  • 73 countries have an event between 1970 and 1998.


Time-Varying Data Structure

• In the previous example, each row of data was a separate survey respondent

• Because survey respondents were not tracked over multiple years, this data was not “time-varying”

• In the current example, we have the advantage of time-varying data

• Each row of data is a country-year
• Our independent variables may change over time.

States, Spells, and Events

• Example (India):

[Figure: India's state (0 or 1) plotted against year, 1970 to 1998. Spell #1 (state 0) ends when the law is written; Spell #2 (state 1) follows.]

States, Spells, and Events

• Example (Iran):

[Figure: Iran's state (0 or 1) plotted against year, 1970 to 1998. A single spell (Spell #1) in state 0 covers the whole period: no law written as of 1998.]

Time-Varying Data Structure

• Example:

newname2  newid3  year  law  eventnum  start  end   ss  es  pop
INDIA     1119    1978  0    1         1978   1979  0   0   656941
INDIA     1119    1979  0    1         1979   1980  0   0   672021
INDIA     1119    1980  0    1         1980   1981  0   0   687332
INDIA     1119    1981  0    1         1981   1982  0   0   702821
INDIA     1119    1982  0    1         1982   1983  0   0   718426
INDIA     1119    1983  0    1         1983   1984  0   0   734072
INDIA     1119    1984  0    1         1984   1985  0   0   749677
INDIA     1119    1985  0    1         1985   1986  0   0   765147
INDIA     1119    1986  1    1         1986   1987  0   1   781893
INDIA     1119    1987  0    1         1987   1988  1   1   798680
INDIA     1119    1988  0    1         1988   1989  1   1   815590
INDIA     1119    1989  0    1         1989   1990  1   1   832535
INDIA     1119    1990  0    1         1990   1991  1   1   849515
INDIA     1119    1991  0    1         1991   1992  1   1   866530

Column callouts on the slide: law = law written; start, end = spell; ss, es = state (at the start and end of each record); pop = population.

Time-Varying Data Structure

(Data: the same India country-year rows as on the previous slide.)

• Stset command:
stset end, failure(es==1) origin(1970)

Note: It is common to drop cases that are not at risk (ex: if start state = 1). BUT, it is not necessary… Stata drops cases after the event by default… unless you specify exit(time .)
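The default exit rule in the note can be illustrated outside Stata. The following is a minimal Python sketch of the idea, not Stata's implementation: it takes the year and es values from the India rows on the slide and keeps records only up to and including the first one with es == 1. The helper name records_at_risk is hypothetical.

```python
# (year, es) pairs copied from the India rows on the slide.
india = [(1978, 0), (1979, 0), (1980, 0), (1981, 0), (1982, 0),
         (1983, 0), (1984, 0), (1985, 0), (1986, 1), (1987, 1),
         (1988, 1), (1989, 1), (1990, 1), (1991, 1)]


def records_at_risk(rows):
    """Keep country-years up to and including the first failure (es == 1).

    Hypothetical helper: it mimics the idea that a case leaves the risk
    set after its first event, so later records are not analyzed.
    """
    kept = []
    for year, es in rows:
        kept.append((year, es))
        if es == 1:
            break  # the first event ends the spell at risk
    return kept


at_risk = records_at_risk(india)
print(len(at_risk), at_risk[-1])
```

Under this rule India contributes its country-years through 1986 and nothing afterwards, which is why dropping the later rows by hand is optional.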

Time-Varying Data Structure

• What if countries pass multiple laws?
  • Called “repeated events”
  • 1. Start state could be reset to zero
  • 2. We can override the Stata default of removing cases after the first event occurs: exit(time .)

newname2  newid3  year  law  eventnum  start  end   ss  es  pop
INDIA     1119    1978  0    1         1978   1979  0   0   656941
INDIA     1119    1979  0    1         1979   1980  0   0   672021
INDIA     1119    1980  0    1         1980   1981  0   0   687332
INDIA     1119    1981  0    1         1981   1982  0   0   702821
INDIA     1119    1982  0    1         1982   1983  0   0   718426
INDIA     1119    1983  0    1         1983   1984  0   0   734072
INDIA     1119    1984  0    1         1984   1985  0   0   749677
INDIA     1119    1985  0    1         1985   1986  0   0   765147
INDIA     1119    1986  1    1         1986   1987  0   1   781893
INDIA     1119    1987  0    1         1987   1988  0   0   798680
INDIA     1119    1988  0    1         1988   1989  0   0   815590
INDIA     1119    1989  0    1         1989   1990  0   0   832535
INDIA     1119    1990  0    1         1990   1991  0   1   849515
INDIA     1119    1991  0    1         1991   1992  0   0   866530

Cumulative Survivor Function

[Figure: Kaplan-Meier survival estimate. Y-axis from 0.00 to 1.00; x-axis: analysis time, 1970 to 2000.]
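For readers who want to see what sits behind a Kaplan-Meier plot, here is a minimal Python sketch of the standard product-limit calculation: at each failure time, the running survival estimate is multiplied by (1 - failures / number at risk). It is an illustration only. The toy durations are hypothetical, the function name is an assumption, and the sketch ignores late entry into the risk set, which the country data would need.

```python
def kaplan_meier(durations, events):
    """Return [(time, S(time))] at each distinct failure time.

    durations: time at which each case failed or was censored.
    events: 1 if the case failed at that time, 0 if it was censored.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # Cases still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        failures = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical toy data (not the environmental-law dataset).
toy_durations = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_durations, toy_events)
print(km_curve)
```

Censored cases never lower the curve directly; they only shrink the number at risk at later failure times.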

Cumulative Survivor Function by Region

[Figure: Kaplan-Meier survival estimates, by gregion. Y-axis from 0.00 to 1.00; x-axis: analysis time, 1970 to 2000. Separate curves for gregion = industrialized west, central and south america, asia, middle east, and africa.]

Cumulative Survivor Function: West vs. non-West

[Figure: Kaplan-Meier survival estimates, by west. Y-axis from 0.00 to 1.00; x-axis: analysis time, 1970 to 2000. Separate curves for west = 0 and west = 1.]

Smoothed Hazard Function: West vs. non-West

[Figure: Smoothed hazard estimates, by west. Y-axis from 0 to .15; x-axis: analysis time, 1970 to 2000. Separate curves for west = 0 and west = 1.]

Constant Rate Model: Example

• Simple one-variable model comparing west vs. non-west

streg west, dist(exponential) nohr

Exponential regression -- log relative-hazard form

No. of subjects      =           97             Number of obs    =      2047
No. of failures      =           81
Time at risk         =         2047
                                                Wald chi2(1)     =     12.10
Log pseudolikelihood =    275.49924             Prob > chi2      =    0.0005

                              (Std. Err. adjusted for 97 clusters in newid3)
------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        west |   .6931146   .1992638     3.48   0.001     .3025648    1.083664
       _cons |   -3.34054   .0807514   -41.37   0.000     -3.49881    -3.18227
------------------------------------------------------------------------------
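In an exponential model the hazard is constant over time and equals exp(_cons + b*X), so the two coefficients above translate directly into rates. The short Python sketch below works through that arithmetic with the values copied from the output. It assumes analysis time is measured in years, and the variable names are illustrative.

```python
import math

# Coefficients copied from the streg output above.
b_west = 0.6931146
b_cons = -3.34054

rate_nonwest = math.exp(b_cons)           # constant hazard when west = 0
rate_west = math.exp(b_cons + b_west)     # constant hazard when west = 1

# With a constant hazard, the expected waiting time is 1 / rate.
wait_nonwest = 1 / rate_nonwest
wait_west = 1 / rate_west

print(round(rate_nonwest, 4), round(rate_west, 4))
print(round(wait_nonwest, 1), round(wait_west, 1))
```

The West's estimated rate is roughly double the non-West rate (about 0.07 versus 0.035 events per country-year), because exp(.6931146) is very close to 2.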

Constant Rate Model: Example

• Model with time-varying covariates

No. of subjects      =           92             Number of obs    =      1938
No. of failures      =           77
Time at risk         =         1938
                                                Wald chi2(6)     =     94.29
Log pseudolikelihood =    282.11796             Prob > chi2      =    0.0000

                              (Std. Err. adjusted for 92 clusters in newid3)
------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   -.044568   .1842564    -0.24   0.809    -.4057039    .3165679
 degradation |  -.4766958   .1044108    -4.57   0.000    -.6813372   -.2720543
   education |   .0377531   .0130314     2.90   0.004     .0122121    .0632942
   democracy |   .2295392   .0959669     2.39   0.017     .0414475     .417631
         ngo |   .4258148   .1576803     2.70   0.007     .1167671    .7348624
        ingo |   .3114173    .365112     0.85   0.394    -.4041891    1.027024
       _cons |  -4.565513   1.864396    -2.45   0.014    -8.219663   -.9113642
------------------------------------------------------------------------------

Democratic countries enact laws at a higher rate than less-democratic countries

Constant Rate Model: Example

• Same model – with Hazard Ratios

No. of subjects      =           92             Number of obs    =      1938
No. of failures      =           77
Time at risk         =         1938
                                                Wald chi2(6)     =     94.29
Log pseudolikelihood =    282.11796             Prob > chi2      =    0.0000

                              (Std. Err. adjusted for 92 clusters in newid3)
------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .9564106   .1762248    -0.24   0.809     .6665075    1.372409
 degradation |   .6208314   .0648215    -4.57   0.000       .50594    .7618129
   education |   1.038475   .0135328     2.90   0.004     1.012287     1.06534
   democracy |    1.25802   .1207283     2.39   0.017     1.042318     1.51836
         ngo |   1.530837   .2413828     2.70   0.007     1.123858    2.085195
        ingo |   1.365359    .498509     0.85   0.394     .6675179    2.792742
------------------------------------------------------------------------------

A 1-point increase in democracy increases the hazard rate by 25.8%!
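The hazard ratios are simply the exponentiated coefficients from the coefficient table, and the percent change in the hazard is (exp(b) - 1) * 100. The Python sketch below reproduces that conversion for three of the reported coefficients. The helper name percent_change and its units argument are illustrative assumptions, not Stata features.

```python
import math

# Coefficients copied from the exponential model's coefficient table.
coefs = {"degradation": -0.4766958, "education": 0.0377531, "democracy": 0.2295392}


def percent_change(coef, units=1.0):
    """Percent change in the hazard for a `units`-point increase in X."""
    return (math.exp(coef * units) - 1) * 100


for name, b in coefs.items():
    # Hazard ratio, then percent change for a 1-point increase.
    print(name, round(math.exp(b), 4), round(percent_change(b), 1))
```

Because the model is multiplicative, a larger change in X compounds rather than adds: percent_change(b, 2) uses exp(2b), not twice the 1-point percentage.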

Constant Rate Model: Example

• What if we expect global civil society to have a particularly strong effect in the non-West?
  • Option #1: Create an interaction term

No. of subjects      =           92             Number of obs    =      1938
No. of failures      =           77
Time at risk         =         1938
                                                Wald chi2(8)     =     91.25
Log pseudolikelihood =     282.5435             Prob > chi2      =    0.0000

                              (Std. Err. adjusted for 92 clusters in newid3)
------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |  -.0789765   .2546507    -0.31   0.756    -.5780827    .4201298
 degradation |  -.4656443   .1177774    -3.95   0.000    -.6964838   -.2348047
   education |   .0425672   .0137641     3.09   0.002       .01559    .0695444
   democracy |   .2277121   .0951693     2.39   0.017     .0411836    .4142406
         ngo |   .4069064   .1595268     2.55   0.011     .0942397    .7195732
        ingo |  -.1326514   .6842896    -0.19   0.846    -1.473834    1.208532
     nonwest |  -3.345421    4.94285    -0.68   0.499    -13.03323    6.342387
ingoXnonwest |     .49408   .6819827     0.72   0.469    -.8425815    1.830741
       _cons |   -1.28664   5.692187    -0.23   0.821    -12.44312    9.869841
------------------------------------------------------------------------------

Constant Rate Model: Example

• What if we expect global civil society to have a particularly strong effect in the non-West?
  • Option #2: Include only non-Western countries in the analysis

No. of subjects      =           76             Number of obs    =      1720
No. of failures      =           61
Time at risk         =         1720
                                                Wald chi2(6)     =     55.26
Log pseudolikelihood =    215.57325             Prob > chi2      =    0.0000

                              (Std. Err. adjusted for 76 clusters in newid3)
------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .3521921   .3470927     1.01   0.310    -.3280971    1.032481
 degradation |  -.7326479   .2566293    -2.85   0.004    -1.235632   -.2296637
   education |   .0314009   .0193698     1.62   0.105    -.0065633     .069365
   democracy |   .2387203   .0935281     2.55   0.011     .0554087     .422032
         ngo |   .3604018   .1984957     1.82   0.069    -.0286426    .7494462
        ingo |   .5447586   .4949746     1.10   0.271    -.4253738    1.514891
       _cons |  -8.446306   3.872579    -2.18   0.029    -16.03642   -.8561915
------------------------------------------------------------------------------

Cox Models

• The basic Cox model:

$$h(t) = h_0(t)\, e^{b_1 X_1 + b_2 X_2 + \dots + b_n X_n}$$

• Where h(t) is the hazard rate
• h0(t) is some baseline hazard function (to be inferred from the data)
• This obviates the need for building a specific functional form into the model
• Also written as:

$$h(t) = h_0(t) \exp(\beta X)$$

Cox Model: Example

• Mostly similar to exponential model…

Cox regression -- Breslow method for ties

No. of subjects      =           92             Number of obs    =      1938
No. of failures      =           77
Time at risk         =         1938
                                                Wald chi2(6)     =     65.49
Log pseudolikelihood =   -287.27209             Prob > chi2      =    0.0000

                              (Std. Err. adjusted for 92 clusters in newid3)
------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .4572288   .2025104     2.26   0.024     .0603157    .8541419
 degradation |  -.4311475   .1131853    -3.81   0.000    -.6529867   -.2093083
   education |   .0027517   .0136965     0.20   0.841     -.024093    .0295964
   democracy |   .2836321   .0911985     3.11   0.002     .1048862    .4623779
         ngo |   .2874221   .1614045     1.78   0.075    -.0289248     .603769
        ingo |   -.026845   .2391101    -0.11   0.911    -.4954922    .4418021
------------------------------------------------------------------------------

Most effects are similar… though the education effect loses significance (and the gdp effect, not significant in the exponential model, is significant here)…

Cox Model Issues: Ties

• 1. How to handle ties in data
  • It is mathematically complex to estimate models when there are tied failures
    – That is: two cases that have events at the exact same time
  • Several mathematical approaches:
    – Breslow approximation – simplest approach
      • Stata default, but not the best choice!
    – Efron approximation – generally better
      • More computationally intensive, but given the power of modern computers it is not an issue
      • Efron is generally preferred

Cox Model Issues: Ties

– Exact marginal – “continuous time approximation”
  – Box-Steffensmeier & Jones: “Averaged Likelihood”
  • Assumes ties didn’t happen EXACTLY at the same time… and considers all possible orderings
– Exact partial – “discrete”
  – Box-Steffensmeier & Jones: “exact discrete method”
  • Assumes ties happened EXACTLY at the same time
– Advice:
  • Use Efron at a minimum
  • Exact methods are often more accurate
    – Exact marginal often makes most sense… events rarely occur at the EXACT same time
    – But, exact methods can take a LONG time.
    – For big datasets with many ties, Efron is OK.
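To make the difference between the two approximations concrete, the Python sketch below computes the log partial-likelihood contribution of a single failure time under the textbook Breslow and Efron formulas. With d tied failures, Breslow divides by the full risk-set sum d times, while Efron removes a fraction l/d of the tied cases' risk from the sum for l = 0, …, d-1. This illustrates the formulas only and is not Stata's code. The function name and the toy risk scores are hypothetical.

```python
import math


def tied_loglik(risk_scores, failed, method="breslow"):
    """Log partial-likelihood contribution of one failure time.

    risk_scores: exp(x*b) for every case at risk at this time.
    failed: indices (into risk_scores) of the cases failing at this time.
    """
    d = len(failed)
    total = sum(risk_scores)                         # sum over the risk set
    tied = sum(risk_scores[i] for i in failed)       # sum over tied failures
    numer = sum(math.log(risk_scores[i]) for i in failed)
    if method == "breslow":
        denom = d * math.log(total)
    elif method == "efron":
        denom = sum(math.log(total - (l / d) * tied) for l in range(d))
    else:
        raise ValueError("method must be 'breslow' or 'efron'")
    return numer - denom


# Hypothetical example: four cases at risk with equal risk, two tied failures.
scores = [1.0, 1.0, 1.0, 1.0]
breslow = tied_loglik(scores, [0, 1], "breslow")
efron = tied_loglik(scores, [0, 1], "efron")
print(breslow, efron)
```

With a single failure (d = 1) the two methods give the same value; they diverge only when failures are tied, which is common in yearly data like the country-year file.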

Cox Model: Baseline Hazard

• Cox models involve a “baseline hazard”
  • Note: baseline = when all covariates are zero
  • Question: What does the baseline hazard look like?
    – Or baseline survivor & integrated hazard?
    – Stata can estimate the baseline survivor, hazard, integrated hazard. Two steps:
      • 1. You must ask Stata to save the info when you run the Cox model
        – Ex: stcox gdp degradation education democracy ngo ingo, robust nohr basehc(h0)
      • 2. Use “stcurve” command to plot the baseline curves
        – Ex: stcurve, hazard OR stcurve, survival

Cox Model: Baseline Hazard

• Baseline rate: Adoption of environmental law

[Figure: smoothed hazard function from the Cox proportional hazards regression. Y-axis from 0 to .08; x-axis: analysis time, 1970 to 2000.]

Cox Model: Baseline Hazard

• Note: It may not always make sense to plot the baseline hazard
  • Baseline shows hazard when X variables are zero
  • Sometimes zero values aren’t very useful/interesting
    – Example: Does it make sense to plot hazard of countries adopting laws, if GDP is zero?
      • Hazard rate is quite low
      • In some cases, you’ll just get a flat zero curve
        – Or extremely high values
    – Solutions:
      • 1. Rescale indep vars before running Cox model
      • 2. Use stcurve to choose relevant values of vars.

Cox Model: Estimated Hazards

• You can also use stcurve to plot estimated hazard rates based on values of indep vars
  • Ex: What is hazard curve if democracy = 1, 5, 10?
  • Strategy: use “at” subcommand:
    • stcurve , hazard at(democ=1) at2(democ=10)
    • NOTE: All other variables are pegged at the mean…

Cox: Estimated Hazard Rate

• Hazard rate for adoption of environmental law

[Figure: smoothed hazard function from the Cox proportional hazards regression. Y-axis from 0 to .8; x-axis: analysis time, 1970 to 2000. Separate curves for democracy=1 and democracy=10.]

Proportional Hazard Assumption

• Key assumption: Proportional hazards
  • Estimated hazards are proportional over time
  • i.e., Estimates of a hazard ratio do NOT vary over time
    – Example: Effect of “abstinence” program on sexual behavior
      • Issue: Do abstinence programs lower the rate in a consistent manner across time?
        – Or, perhaps the rate is lower initially… but then the rate jumps back up (maybe even exceeds the control group).
    – Groups are assumed to have “parallel” hazards
      • Rather than rates that diverge, converge (or cross).
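The assumption follows directly from the form of the Cox model. As a short worked equation, using the b and X notation of the basic Cox model and assuming, for illustration, a single dummy variable X that marks group membership:

```latex
\frac{h(t \mid X = 1)}{h(t \mid X = 0)}
  = \frac{h_0(t)\, e^{b \cdot 1}}{h_0(t)\, e^{b \cdot 0}}
  = e^{b}
\qquad\Longrightarrow\qquad
\ln h(t \mid X = 1) - \ln h(t \mid X = 0) = b
```

The baseline hazard cancels, so the hazard ratio does not depend on t. On the log scale the two groups' hazards differ by the constant b at every time point, which is the sense in which they are “parallel”.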

Proportional Hazard Assumption

• Strategies:

• 1. Visually examine raw hazard plots for sub-groups in your data
  • Watch for non-parallel trends
  • A simple, crude method… but often identifies big violations

Proportional Hazard Assumption

• Visual examination of raw hazard rate

[Figure: Smoothed hazard estimates, by west. Y-axis from 0 to .15; x-axis: analysis time, 1970 to 2000. Separate curves for west = 0 and west = 1.]

Parallel trends in hazard rate look good!

Proportional Hazard Assumption

• 2. Plot -ln(-ln(survival)) versus ln(time) across values of X variables
  • What Stata calls “stphplot”
  • Parallel lines indicate proportional hazards
  • Again, convergence and divergence (or crossing) indicates violation
  – A less-common approach: compare observed survivor plot to predicted values (for different values of X)
    • What Stata calls “stcoxkm”
    • If observed are similar to predicted, assumption is not likely to be violated.
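The reason parallel lines indicate proportional hazards can be shown in a few steps. This is a sketch under the Cox model with a single covariate X and coefficient b, where S_0(t) denotes the baseline survivor function (notation assumed here, not taken from the slides):

```latex
S(t \mid X) = S_0(t)^{\exp(bX)}
\;\Longrightarrow\;
-\ln S(t \mid X) = e^{bX}\,\bigl[-\ln S_0(t)\bigr]
\;\Longrightarrow\;
-\ln\bigl[-\ln S(t \mid X)\bigr] = -\ln\bigl[-\ln S_0(t)\bigr] - bX
```

For any two values of X the transformed curves differ only by a constant that does not involve t, so they are vertical shifts of one another whatever time scale is used on the x-axis. Curves that converge, diverge, or cross suggest that b changes over time.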

Proportional Hazard Assumption

• -ln(-ln(survivor)) vs. ln(time) – “stphplot”

[Figure: stphplot output. Y-axis: -ln[-ln(Survival Probability)], from -1 to 4; x-axis: ln(analysis time), from 7.585 to 7.605. Separate lines for west = 0 and west = 1.]

Convergence suggests violation of proportional hazard assumption

(But, I’ve seen worse!)

Proportional Hazard Assumption

• Cox estimate vs. observed KM – “stcoxkm”

[Figure: stcoxkm output. Y-axis: Survival Probability, from 0.00 to 1.00; x-axis: analysis time, 1970 to 2000. Observed and predicted curves for west = 0 and west = 1.]

Predicted differs from observed for countries in West

Proportional Hazard Assumption

• 3. Piecewise Models
  • Piecewise = break model up into pieces (by time)
    – Ex: Split analysis into “early” vs “late” time
  • If coefficients vary in different time periods, hazards are not proportional
    – Example:
      • stcox var1 var2 var3 if _t < 10
      • stcox var1 var2 var3 if _t >= 10
      • Look for large changes in coefficients!

Proportional Hazard Assumption

• In a piecewise model, coefficients would differ in non-proportional models

[Figure: two panels, “Proportional” and “Non-Proportional”, each comparing an Early and a Late period.]

Proportional: Here, the effect is the same in both time periods

Non-Proportional: Here, the effect is negative in the early period and positive in the late period

Piecewise Models

• Look at coefficients at 2 (or more) spans of time

EARLY
. stcox gdp degradation education democracy ngo ingo if year < 1985, robust nohr

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .4465818   .4255587     1.05   0.294    -.3874979    1.280661
 degradation |   -.282548   .1572746    -1.80   0.072    -.5908005    .0257045
   education |  -.0195118   .0328195    -0.59   0.552    -.0838368    .0448131
   democracy |   .2295673   .2625205     0.87   0.382    -.2849634     .744098
         ngo |   .6792462   .3110294     2.18   0.029     .0696399    1.288853
        ingo |   .6664661   .4804229     1.39   0.165    -.2751456    1.608078
------------------------------------------------------------------------------

LATE
. stcox gdp degradation education democracy ngo ingo if year >= 1985, robust nohr

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .4963942    .357739     1.39   0.165    -.2047613     1.19755
 degradation |  -.5702894   .2395257    -2.38   0.017    -1.039751   -.1008277
   education |   .0142118   .0143762     0.99   0.323    -.0139649    .0423886
   democracy |   .2541799   .0981386     2.59   0.010     .0618317    .4465281
         ngo |   .1742862   .1448187     1.20   0.229    -.1095532    .4581256
        ingo |  -.1134661   .2104308    -0.54   0.590    -.5259028    .2989707
------------------------------------------------------------------------------

Note: Effect of ngo is larger in early period (.679 vs. .174)

Proportional Hazard Assumption

• 4. Tests based on re-estimating model
  • Try including time interactions in your model
  • Recall: Interactions – effect of A on C varies with B
  • If effect of variable X on hazard rate (or ratio) varies with time, then hazards aren’t proportional
    – Recall example: Abstinence programs
      • Perhaps abstinence programs have a big effect initially, but the effect diminishes (or reverses) later on

Proportional Hazard Assumption

• Red = Abstinence group; green = control

[Figure: two panels, “Proportional” and “Non-Proportional”, comparing the abstinence and control groups over time.]

In non-proportional case, the effect of abstinence programs varies across time

Proportional Hazard Assumption

• Strategy: Create variables that reflect the interaction of X variables with time
  • Significant effects of time interactions indicate non-proportional hazard
  • Fortunately, inclusion of the interaction term in the model corrects the problem.
• Issue: X variables can interact with time in multiple ways…
  – Linearly
  – With “log time” or time squared
  – With time dummies
  – You may have to try a range of things…
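The three kinds of time interaction listed above can be built as ordinary new columns in the country-year file. The Python sketch below shows the construction on a few made-up rows. The variable names (ngoXtime, ngoXlogtime, ngoXlate) and the data values are hypothetical; the 1985 cut point is an assumed example that matches the early/late split used in the piecewise models.

```python
import math

# Hypothetical country-year rows; the values are made up for illustration.
rows = [
    {"year": 1971, "ngo": 2.0},
    {"year": 1980, "ngo": 3.5},
    {"year": 1990, "ngo": 5.0},
]

ORIGIN = 1970       # risk begins in 1970 in the lecture example
LATE_START = 1985   # assumed cut point for a "late" period dummy

for row in rows:
    t = row["year"] - ORIGIN                        # analysis time in years
    row["ngoXtime"] = row["ngo"] * t                # linear time interaction
    row["ngoXlogtime"] = row["ngo"] * math.log(t)   # log-time interaction
    late = 1 if row["year"] >= LATE_START else 0    # time-period dummy
    row["ngoXlate"] = row["ngo"] * late

print(rows[1])
```

Each interaction would then enter the model alongside the original variable; a significant interaction coefficient is the signal of non-proportionality described above.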

Proportional Hazard Assumption

• Red = Abstinence group; green = control

[Figure: two panels comparing the abstinence and control groups over time.]

Linear time interaction: Effect grows consistently over time. Try “Abstinence*time”

Interaction with time-period: Effect differs early vs. late. Try “Abstinence*DLate”

Proportional Hazard Assumption

• 5. Grambsch & Therneau test
  – Ex: Stata “estat phtest”
  • Test for non-zero slope of Schoenfeld residuals vs time
    – A zero slope implies the log hazard ratio is constant over time, i.e., hazards are proportional
  • Can be applied to general model, or for each variable

stcox gdp degradation education democracy ngo ingo, robust nohr scaledsch(sca*) schoenfeld(sch*)

. estat phtest

      Test of proportional hazards assumption

      Time:  Time
      ----------------------------------------------------------------
                  |       chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      global test |      18.14        6         0.0059
      ----------------------------------------------------------------

Significant chi-square indicates violation of proportional hazard assumption

Proportional Hazard Assumption

• Variable-by-variable test “estat phtest”:

. estat phtest, detail

      Test of proportional hazards assumption

      Time:  Time
      ----------------------------------------------------------------
                  |       rho            chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      gdp         |      0.09035         0.63        1         0.4277
      degradation |     -0.22735         3.41        1         0.0646
      education   |      0.06915         0.47        1         0.4950
      democracy   |     -0.04929         0.20        1         0.6560
      ngo         |     -0.18691         4.56        1         0.0327
      ingo        |     -0.03759         0.34        1         0.5609
      ------------+---------------------------------------------------
      global test |                     18.14        6         0.0059
      ----------------------------------------------------------------

Note: Certain variables are especially problematic… (here ngo, p = 0.033, and to a lesser extent degradation, p = 0.065)

Proportional Hazard Assumption

• Notes on estat phtest:
  – 1. Requires that you calculate “schoenfeld residuals” when you run the original Cox model
    – And, if you want a test for each variable, you must also request scaled schoenfeld residuals
  – 2. Test is based on identifying non-zero time trend… but how should we characterize time?
    • Options: normal/linear time, log time, time dummies, etc
      – Results may differ depending on your choice
      – Ex: estat phtest, log – specifies “log time”
    • Plot of smoothed Schoenfeld residuals can indicate best way to characterize time
      – Linear trend (not a curve) indicates that time is characterized OK
      – Ex: estat phtest, plot(ngo) OR estat phtest, log plot(ngo)

Proportional Hazard Assumption

• What if the assumption is violated?

• 1. Improve model specification
  • Add time interactions to address nonproportionality
  • Ex: If high democracies are not proportional to low democracies, try adding “highdemoc*time”
  • Variables can be interacted with linear time, log time, time dummies, etc., to address the issue

• 2. Model groups separately
  • Split sample along variables that are non-proportional.

Proportional Hazard Assumption

• What if the assumption is violated?

• 3. Use a stratified Cox model
  • Allows a different baseline hazard for each group
    – But, you can’t estimate effect of stratifying variable!
  • Ex: stcox var1 var2 var3, strata(Dhighdemoc)

• 4. Use a piecewise model
  • Split time into chunks… in which PH assumption is met
    – Requires sufficient sample size in all time periods!

• 5. Live with it (but temper your conclusions)
  • Allison points out: Cox model is reasonably robust
    – Other issues (e.g., model misspecification) are bigger issues.