
More EHA Models & Diagnostics Sociology 229A: Event History Analysis Class 7 Copyright © 2008 by Evan Schofer Do not copy or distribute without permission




Announcements

• Assignment #5 due

• Final paper assignment handed out
  – Due at end of quarter

• Class topics:
  – AFT models
  – Stratified models
  – More on residuals, diagnostics
  – Discussion: Empirical paper


Short Paper Assignment

• New Topic: Organizational mortality among “licensed lenders”

• A type of credit company regulated by New York State
  – “Mom & pop” lenders… eventually largely outcompeted by modern banks/credit cards…
  – Examples:
    • Empire City Personal Loan Company
      – Founded 1932, dissolved 1938
    • American Credit Company
      » Renamed “Liberty Loan Company” in 1942
      – Founded 1902, dissolved 1964
      – Branch office in 1947; dissolved in 1955
      – Branch office in 1955; censored in 1965


Short Paper Assignment

• Licensed lenders dataset
  – Unit of analysis: Organization
    • Branch offices each have an independent government license and are treated as fully separate organizations
  – Data structure:
    • Annual data set
    • Time-series / “long form”, split-spell data
  – Outcome of interest: Organizational mortality
    • When the organization dies/dissolves/shuts down
  – Rudimentary independent variables included…
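To make the split-spell (“long form”) structure concrete, here is a minimal Python sketch that expands one organization’s lifetime into annual records. The `Spell` helper, the field names, and the convention of placing the event flag on the final year are assumptions for illustration, not the layout of the actual assignment dataset.

```python
from dataclasses import dataclass


@dataclass
class Spell:
    """One organization's observed lifetime (hypothetical helper type)."""
    org: str
    start_year: int
    end_year: int   # year of dissolution, or last year observed if censored
    died: bool      # True = dissolved, False = censored


def to_annual_records(spell):
    """Expand one spell into one long-form record per calendar year.

    Assumed coding: the event flag is 1 only on the final record of an
    organization that actually dissolved; every other record is 0.
    """
    records = []
    for year in range(spell.start_year, spell.end_year + 1):
        is_last = year == spell.end_year
        records.append({
            "org": spell.org,
            "year": year,
            "age": year - spell.start_year,
            "dead": int(is_last and spell.died),
        })
    return records


# Founding/dissolution years taken from the examples above.
empire = to_annual_records(Spell("Empire City Personal Loan Company", 1932, 1938, True))
branch = to_annual_records(Spell("American Credit Co. branch (1955)", 1955, 1965, False))
for rec in empire:
    print(rec)
```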


Short Paper Assignment

• Project goals:
  – 1. Test a series of hypotheses (which I provide) using EHA models
  – 2. Run some simple EHA diagnostics
    • Check proportionality assumption for one X var
    • Check for outliers using residuals
  – 3. Write up results (4-5 pages)
    • Like the methods/results section of a short journal article…


Accelerated Failure Time Models

• We’ve been modeling the hazard rate: h(t)
  – Most parametric approaches build on the Cox (proportional hazards) strategy…

• An alternative approach: model log time
  – Using a parametric approach like exponential or Weibull
  – Focus is time rather than hazard rate:

    $\ln(t_i) = X_i\beta + e_i$

  – Where the last term “e” is assumed to have a distribution that defines the model (e.g., making it Weibull)
  – Recall: the odd distribution of e is the problem with OLS
  – What if we introduced a more complex distribution for e here?
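A short derivation shows why the log-time form is the same model expressed in a different metric. It is a sketch for the simplest case, the exponential model; the symbols $\beta_{PH}$, $\beta_{AFT}$ and $T_0$ are introduced here for illustration only.

$$h(t \mid X) = e^{X\beta_{PH}} \quad\Rightarrow\quad S(t \mid X) = \exp\!\left(-t\,e^{X\beta_{PH}}\right)$$

$$T_0 = T\,e^{X\beta_{PH}} \quad\Rightarrow\quad P(T_0 > u) = S\!\left(u\,e^{-X\beta_{PH}} \mid X\right) = e^{-u}$$

$$\ln T = -X\beta_{PH} + \ln T_0 = X\beta_{AFT} + e, \qquad \beta_{AFT} = -\beta_{PH}, \quad e = \ln T_0$$

So $T_0$ is unit exponential whatever the value of X, and e follows an extreme-value distribution rather than a normal one. For a Weibull model with shape p, the same steps give $\beta_{AFT} = -\beta_{PH}/p$.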


Accelerated Failure Time Models

• Cleves et al. 2004: AFT (or “log time”) models aren’t actually new kinds of models

• Rather, they re-express the same models in a different metric…

• Instead of expressing effects on the hazard rate, coefficients reflect effects on log time to event

• Instead of “hazard ratios” you can compute “time ratios”
  – Substantive emphasis is on TIME to event
    • This can be desirable… more concrete than hazard rates
  – Issue: coefficients have opposite signs!!!
    • A variable that increases the hazard rate will decrease time to event.
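As a worked illustration of time ratios, this sketch converts coefficients copied from the exponential model output on the following slides into hazard ratios and time ratios. It relies on the exponential-model identity b_AFT = -b_PH (visible by comparing the two outputs); for other distributions the conversion differs.

```python
import math

# Coefficients copied from the exponential model output
# (log relative-hazard form) on the "Proportional Hazard vs. AFT" slides.
ph_coef = {
    "edu": 0.3020663,
    "coho2": 0.6366232,
    "coho3": 0.7340517,
    "lfx": -0.0022632,
    "pnoj": 0.1734636,
    "pres": -0.143771,
}


def hazard_ratio(b_ph):
    """exp(b) in the PH metric: multiplicative change in the hazard per unit of X."""
    return math.exp(b_ph)


def time_ratio(b_aft):
    """exp(b) in the AFT metric: multiplicative change in time to event per unit of X."""
    return math.exp(b_aft)


for name, b in ph_coef.items():
    b_aft = -b  # exponential model only: AFT coefficient = negated PH coefficient
    print(f"{name:6s} HR = {hazard_ratio(b):.4f}   TR = {time_ratio(b_aft):.4f}")
```

For edu, the hazard ratio is about 1.35 and the time ratio about 0.74: each extra year of education multiplies the hazard of an upward move by roughly 1.35 and shortens the expected time to that move by roughly a quarter.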


Proportional Hazard vs. AFT

• Blossfeld data: Upward employment moves
. streg edu coho2 coho3 lfx pnoj pres if pres <=65, dist(exponential) nohr

Exponential regression -- log relative-hazard form

No. of subjects =          591                     Number of obs   =       591
No. of failures =           84
Time at risk    =        40161
                                                   LR chi2(6)      =    131.39
Log likelihood  =   -253.68509                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |   .3020663   .0429622     7.03   0.000     .2178619    .3862708
       coho2 |   .6366232   .2713856     2.35   0.019     .1047172    1.168529
       coho3 |   .7340517   .2766077     2.65   0.008     .1919105    1.276193
         lfx |  -.0022632   .0020781    -1.09   0.276    -.0063363    .0018098
        pnoj |   .1734636   .1003787     1.73   0.084    -.0232751    .3702022
        pres |   -.143771   .0142008   -10.12   0.000     -.171604    -.115938
       _cons |  -5.116249   .6197422    -8.26   0.000    -6.330922   -3.901577
------------------------------------------------------------------------------

Log relative hazard = Proportional hazards model


Proportional Hazard vs. AFT

• Blossfeld data: Upward employment moves
. streg edu coho2 coho3 lfx pnoj pres if pres <=65, dist(exponential) nohr time

Exponential regression -- accelerated failure-time form

No. of subjects =          591                     Number of obs   =       591
No. of failures =           84
Time at risk    =        40161
                                                   LR chi2(6)      =    131.39
Log likelihood  =   -253.68509                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |  -.3020663   .0429622    -7.03   0.000    -.3862708   -.2178619
       coho2 |  -.6366232   .2713856    -2.35   0.019    -1.168529   -.1047172
       coho3 |  -.7340517   .2766077    -2.65   0.008    -1.276193   -.1919105
         lfx |   .0022632   .0020781     1.09   0.276    -.0018098    .0063363
        pnoj |  -.1734636   .1003787    -1.73   0.084    -.3702022    .0232751
        pres |    .143771   .0142008    10.12   0.000      .115938     .171604
       _cons |   5.116249   .6197422     8.26   0.000     3.901577    6.330922
------------------------------------------------------------------------------

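The streg option “time” requests the AFT form.

Note that the log likelihood and z values are the same. However, every coefficient has the opposite sign, because it is now expressed on the log-time metric (for the exponential model, the AFT coefficients are exactly the negatives of the hazard coefficients).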


Proportional vs. AFT Metric

• Weibull models: Here, coefficients differ…
. streg edu coho2 coho3 lfx pnoj pres if pres <=65, dist(weibull) nohr

Weibull regression -- log relative-hazard form

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |   .3004217   .0438282     6.85   0.000     .2145201    .3863234
       coho2 |   .6259013   .2775622     2.25   0.024     .0818895    1.169913
       coho3 |   .7189294   .2886739     2.49   0.013     .1531389     1.28472
         lfx |  -.0022896   .0020818    -1.10   0.271    -.0063698    .0017906
        pnoj |   .1719096   .1007356     1.71   0.088    -.0255286    .3693478
        pres |  -.1430822   .0146639    -9.76   0.000     -.171823   -.1143414
       _cons |  -5.043614   .7361298    -6.85   0.000    -6.486402   -3.600826

. streg edu coho2 coho3 lfx pnoj pres if pres <=65, dist(weibull) nohr time

Weibull regression -- accelerated failure-time form

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         edu |  -.3048278    .046158    -6.60   0.000    -.3952959   -.2143598
       coho2 |   -.635081   .2753596    -2.31   0.021    -1.174776   -.0953861
       coho3 |  -.7294735   .2817224    -2.59   0.010    -1.281639   -.1773078
         lfx |   .0023232   .0021333     1.09   0.276    -.0018581    .0065045
        pnoj |  -.1744309   .1019852    -1.71   0.087    -.3743182    .0254564
        pres |   .1451807   .0163841     8.86   0.000     .1130684    .1772929
       _cons |   5.117586   .6280134     8.15   0.000     3.886702    6.348469
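Why do the Weibull coefficients differ? For a Weibull with shape p, the two metrics are linked by b_PH = -p × b_AFT. The output above does not print p, but it can be backed out from the two sets of coefficients. This sketch does that, using values copied from the two outputs.

```python
# Weibull coefficients copied from the two outputs above:
# PH = log relative-hazard form, AFT = accelerated failure-time form.
weibull_ph = {"edu": 0.3004217, "coho2": 0.6259013, "coho3": 0.7189294,
              "lfx": -0.0022896, "pnoj": 0.1719096, "pres": -0.1430822,
              "_cons": -5.043614}
weibull_aft = {"edu": -0.3048278, "coho2": -0.635081, "coho3": -0.7294735,
               "lfx": 0.0023232, "pnoj": -0.1744309, "pres": 0.1451807,
               "_cons": 5.117586}

# For a Weibull with shape p: b_PH = -p * b_AFT, so p = -b_PH / b_AFT.
implied_p = {k: -weibull_ph[k] / weibull_aft[k] for k in weibull_ph}
for k, p in implied_p.items():
    print(f"{k:6s} implied p = {p:.4f}")
```

Every coefficient implies p ≈ 0.986, slightly below 1 (a gently declining hazard). Because p is so close to the exponential case (p = 1), the Weibull and exponential estimates are nearly identical.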


Accelerated Failure Time Models

• Remarks:
  – 1. AFT models are less common, but you’ll run across them occasionally
  – 2. It is important to recognize them…
    • Because coefficient interpretations are opposite!
  – 3. Stata currently offers more parametric options for AFT models
    • Log-logistic and log-normal are only available in AFT form
    • These can produce non-monotonic hazard curves, which might be useful…
  – So, you might consider them if you are having trouble with model fit.
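To see what a non-monotonic hazard looks like, this sketch evaluates a log-logistic hazard under one common parameterization, S(t) = 1 / (1 + (t/a)^k), with hypothetical parameters. With shape k > 1 the hazard rises, peaks at t = a(k - 1)^(1/k), and then falls; with k ≤ 1 it only declines. Stata parameterizes the log-logistic in its own way, so treat these parameters as illustrative.

```python
def loglogistic_hazard(t, scale, shape):
    """Hazard of a log-logistic distribution with S(t) = 1 / (1 + (t/scale)**shape)."""
    u = t / scale
    return (shape / scale) * u ** (shape - 1) / (1 + u ** shape)


def loglogistic_peak(scale, shape):
    """Time at which the hazard peaks; only defined when shape > 1."""
    if shape <= 1:
        raise ValueError("hazard is monotonically decreasing when shape <= 1")
    return scale * (shape - 1) ** (1 / shape)


# Hypothetical parameters, chosen only for illustration.
scale, shape = 50.0, 3.0
for t in (10, 25, 40, 60, 100, 200):
    print(t, round(loglogistic_hazard(t, scale, shape), 5))
print("peak at t =", round(loglogistic_peak(scale, shape), 2))
```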


Parametric Models & Predictions

• Parametric models allow prediction of failure times for all cases
  • Whether using proportional hazard or AFT metric
  – Strategy: run model, then use “predict” command
  – Issues:
    • 1. You have many prediction options…
      – “Mean” estimated time; Median estimated time (+ log options)
    • 2. If you have split-spell data, you’ll get a prediction for EACH record in the data
      – Predictions take into account X variables
      – As X variables change, predicted time changes, too!
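The mean and median predictions have simple closed forms for the two distributions used so far. This sketch assumes the PH parameterizations h(t) = exp(xb) (exponential) and S(t) = exp(-exp(xb) t^p) (Weibull); it shows the formulas, not Stata’s own `predict` code, and the linear predictors in the usage lines are hypothetical.

```python
import math


def exponential_predictions(xb):
    """Mean and median time to event for an exponential model, PH metric.

    xb is the case's linear predictor X*beta, so the hazard is exp(xb).
    """
    rate = math.exp(xb)
    return {"mean": 1.0 / rate, "median": math.log(2) / rate}


def weibull_predictions(xb, p):
    """Mean and median time for a Weibull PH model with S(t) = exp(-exp(xb) * t**p)."""
    lam = math.exp(xb)
    return {
        "mean": math.gamma(1.0 + 1.0 / p) * lam ** (-1.0 / p),
        "median": (math.log(2) / lam) ** (1.0 / p),
    }


# Hypothetical split-spell case: two records for the same person whose X values
# (and therefore linear predictors) differ, so their predictions differ too.
for record, xb in [("record 1", -4.2), ("record 2", -5.0)]:
    print(record, exponential_predictions(xb), weibull_predictions(xb, p=1.2))
```

Setting p = 1 in the Weibull formulas reproduces the exponential ones, since Γ(2) = 1.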


Predicted Times

• Blossfeld job data (upward moves)
. list id duration event sex time mdtime

     +---------------------------------------------------+
     | id   duration   event   sex       time     mdtime |
     |---------------------------------------------------|
  1. |  1        427       0     1   130.2342   90.27149 |
  2. |  2         45       1     2   192.2021   133.2243 |
  3. |  2         33       0     2   5651.612   3917.399 |
  4. |  2        219       0     2   5131.651    3556.99 |
 14. |  6         25       1     1   205.6662    142.557 |
 20. |  7          5       1     2   116.0007   80.40555 |
 21. |  7         14       0     2   416.3065   288.5616 |
 29. | 10        120       1     1    690.877   478.8794 |
 30. | 10        141       1     1   2412.739   1672.383 |
 31. | 10        120       0     1   21855.97   15149.41 |
 37. | 12         27       1     1   92.27634   63.96109 |
 38. | 12         70       0     1   2605.027   1805.667 |
 39. | 13         38       0     2   774.3403   536.7318 |
 40. | 13        101       0     2   1094.581   758.7059 |
 41. | 14         35       0     2   579.2303   401.4919 |
 42. | 14         86       0     2   528.3259   366.2076 |
 43. | 15         11       0     1   1612.258   1117.532 |
 44. | 15         11       1     1   139.5957   96.76038 |
     +---------------------------------------------------+

For case 7 (observation 20), the predicted median time is 80 months, but the actual upward move occurred in 5 months…

The model really doesn’t expect this case to have an upward job transition so soon…
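The listing also allows a quick sanity check, assuming `time` holds the predicted mean and `mdtime` the predicted median. Under an exponential model both are proportional to 1/exp(xb), so mean/median = 1/ln 2 ≈ 1.4427 on every record. This sketch computes that ratio for a few rows copied from the listing; a constant ratio is consistent with an exponential model, although the slide does not say which distribution was fit.

```python
import math

# (time, mdtime) pairs copied from a few observations in the listing above.
rows = {
    1: (130.2342, 90.27149),
    2: (192.2021, 133.2243),
    20: (116.0007, 80.40555),
    31: (21855.97, 15149.41),
    44: (139.5957, 96.76038),
}
for obs, (mean_t, median_t) in rows.items():
    print(obs, round(mean_t / median_t, 4))
print("1/ln 2 =", round(1 / math.log(2), 4))
```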


Parametric Models & Predictions

• Useful things you can do with predictions:
  – 1. Highlight some examples to give your reader a concrete sense of event timing…
  – 2. Construct predictions that reflect different values of X variables
    • Ex: Run model. Make predictions. Recode Xs. Make further predictions
      – Example: How would the predicted time-to-event change if the case were male rather than female?
      – Ex: Environmental treaties: What is the predicted time to treaty signing if democracy were 10 rather than 1?
    • Vividly illustrates coefficient effects.
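The “recode Xs, predict again” workflow can be sketched with the exponential coefficients from the upward-moves model. The covariate profile is hypothetical (not a case from the data), and the median formula assumes the exponential PH form; in Stata you would instead change the variable and rerun `predict`.

```python
import math

# Exponential PH coefficients from the upward-moves model on the earlier slides.
beta = {"_cons": -5.116249, "edu": 0.3020663, "coho2": 0.6366232,
        "coho3": 0.7340517, "lfx": -0.0022632, "pnoj": 0.1734636,
        "pres": -0.143771}


def linear_predictor(profile):
    """xb = constant + sum of coefficient * covariate value."""
    return beta["_cons"] + sum(beta[k] * v for k, v in profile.items())


def median_time(profile):
    """Median time to event under the exponential model: ln(2) / exp(xb)."""
    return math.log(2) / math.exp(linear_predictor(profile))


# Hypothetical covariate profile, for illustration only.
case = {"edu": 10, "coho2": 0, "coho3": 1, "lfx": 60, "pnoj": 1, "pres": 30}
recoded = dict(case, edu=13)  # "recode X": same case with three more years of education

before, after = median_time(case), median_time(recoded)
print(f"median before: {before:.1f} months, after: {after:.1f} months")
print(f"ratio: {after / before:.3f}")  # equals exp(-3 * 0.3020663) in this model
```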


Residuals – Summary

• From Cleves et al. (2004) An Introduction to Survival Analysis Using Stata, p. 184:
  • 1. Cox-Snell residuals
    • … are useful for assessing overall model fit
  • 2. Martingale residuals
    • … are useful in determining the functional form of the covariates to be included in the model
  • 3. Schoenfeld residuals (scaled & unscaled), score residuals, and efficient score residuals
    • … are useful for checking & testing the proportional hazard assumption, examining leverage points, and identifying outliers
    • NOTE: A residual is produced for each independent variable…
  • 4. Deviance residuals
    • … are useful in examining model accuracy and identifying outliers.


Cox-Snell Residuals

• Cox-Snell residuals for case i:

    $CSresid_i = \hat{H}_0(t_i)\exp(X_i\hat{\beta})$

  • Where $\hat{H}_0(t)$ (“H(t)-hat”) is the estimate of the baseline cumulative hazard
    – Based on model results
  • B-hats are estimates from the model
  • Xi are values for each case in your data
  – Interpretation: “The expected number of events in a given time-interval”
    – Box-Steffensmeier & Jones 2004.
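To show where the baseline cumulative hazard comes from, here is a minimal sketch that computes Cox-Snell residuals after a Cox model using the Breslow estimator of H0(t), one common choice; it does not claim to reproduce Stata’s internals. The coefficients are taken as given rather than estimated, and the data are made up.

```python
import math


def breslow_cumhaz(times, events, xb):
    """Breslow estimate of the baseline cumulative hazard H0(t), returned as a step function.

    times: follow-up time per case; events: 1 = event, 0 = censored;
    xb: each case's linear predictor X_i * beta_hat (beta_hat is taken as given).
    """
    steps, total = [], 0.0
    for s in sorted({t for t, d in zip(times, events) if d == 1}):
        deaths = sum(1 for t, d in zip(times, events) if d == 1 and t == s)
        risk_set = sum(math.exp(x) for t, x in zip(times, xb) if t >= s)
        total += deaths / risk_set
        steps.append((s, total))

    def H0(t):
        value = 0.0
        for s, h in steps:
            if s <= t:
                value = h
        return value

    return H0


def cox_snell_residuals(times, events, xb):
    """CS_i = H0_hat(t_i) * exp(X_i * beta_hat), one residual per case."""
    H0 = breslow_cumhaz(times, events, xb)
    return [H0(t) * math.exp(x) for t, x in zip(times, xb)]


# Hypothetical data: four cases, one censored, with made-up linear predictors.
times = [2, 3, 5, 7]
events = [1, 0, 1, 1]
xb = [0.2, -0.1, 0.5, 0.0]
cs = cox_snell_residuals(times, events, xb)
print([round(r, 3) for r in cs])
print("sum of residuals:", round(sum(cs), 6), "events:", sum(events))
```

A handy check: with this estimator the residuals sum to the number of events, which is the same as saying the martingale residuals (event minus Cox-Snell) sum to zero.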


Cox-Snell residuals: Model Fit

• Cox-Snell residuals can be plotted to assess model fit

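• If the model fits well, a graph of the integrated (cumulative) hazard of the Cox-Snell residuals vs. the Cox-Snell residuals themselves will fall on the 45-degree line (slope 1, through the origin)
  – Reason: under a correct model, the residuals behave like a censored sample from a unit exponential distribution, whose cumulative hazard is H(r) = r

  – Strategy in Stata:
    • Run Cox model, request martingale residuals
    • Use “predict” to compute Cox-Snell residuals
    • Stset your data again, with Cox-Snell as time variable
    • Compute integrated hazard (e.g., Nelson-Aalen)
    • Graph integrated hazard versus residuals.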
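The same steps can be mimicked in plain Python: treat the Cox-Snell residuals as “times”, estimate their Nelson-Aalen cumulative hazard, and compare it with the 45-degree line. Summarizing the plot as a single maximum gap, and trimming the sparse right tail at the 90th percentile, are illustrative choices, not a standard test; the residuals here are simulated.

```python
import random


def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard: running sum of d_j / n_j at each distinct event time."""
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    total = 0.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        j, deaths = i, 0
        while j < len(pairs) and pairs[j][0] == t:
            deaths += pairs[j][1]
            j += 1
        if deaths:
            total += deaths / at_risk
            curve.append((t, total))
        at_risk -= j - i  # everyone with time t leaves the risk set afterwards
        i = j
    return curve


def max_gap_from_45_degree_line(cs_resid, events, keep_quantile=0.9):
    """Largest |H(r) - r| for residuals r at or below the given quantile.

    The right tail is dropped because few cases remain there.
    """
    curve = nelson_aalen(cs_resid, events)
    cutoff = sorted(cs_resid)[int(keep_quantile * (len(cs_resid) - 1))]
    return max(abs(h - r) for r, h in curve if r <= cutoff)


# Simulated residuals: unit exponential (good fit) vs. rate 2 (poor fit).
random.seed(0)
n = 5000
good = [random.expovariate(1.0) for _ in range(n)]
bad = [random.expovariate(2.0) for _ in range(n)]
events = [1] * n
print("good fit gap:", round(max_gap_from_45_degree_line(good, events), 3))
print("poor fit gap:", round(max_gap_from_45_degree_line(bad, events), 3))
```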



Cox-Snell Model Fit Example

• Cox-Snell plot for Environmental Law data

[Figure: Nelson-Aalen cumulative hazard and reference line plotted against the partial Cox-Snell residual]

This looks quite bad. Cumulative hazard should fall on the line… Instead, there is a sizable gap.

Note: Don’t worry much about deviations from the line at the right edge of the plot. There are typically few cases there…


Martingale Residuals

• Martingale residuals: More intuitive…
  • Difference between the observed event indicator (1 = event, 0 = censored) and the expected number of events a case is predicted to have
    – Based on hazard rate given X vars…

• Martingale residuals range from –infinity to +1
  – Often very skewed
  – Deviance residuals: a transformation of martingale residuals that makes them more symmetric around zero.
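The relationship among Cox-Snell, martingale, and deviance residuals fits in a few lines. This sketch assumes one record per case (no split spells) and uses the standard deviance-residual formula; the (event, Cox-Snell) pairs are hypothetical.

```python
import math


def martingale(event, cox_snell):
    """Observed events (1 = event, 0 = censored) minus expected events (the Cox-Snell residual)."""
    return event - cox_snell


def deviance(event, cox_snell):
    """Deviance residual: sign(m) * sqrt(-2 * [m + event * ln(event - m)])."""
    m = martingale(event, cox_snell)
    inner = m + (event * math.log(event - m) if event else 0.0)
    # inner is <= 0 in exact arithmetic; max() guards against tiny rounding errors
    return math.copysign(math.sqrt(max(0.0, -2.0 * inner)), m)


# Hypothetical (event, Cox-Snell residual) pairs.
for event, cs in [(1, 0.05), (1, 1.0), (1, 2.5), (0, 0.3), (0, 4.0)]:
    print(event, cs, round(martingale(event, cs), 3), round(deviance(event, cs), 3))
```

An event with a small Cox-Snell residual (0.05) has a martingale residual of 0.95 but a deviance residual of about 2.02, while a censored case with a Cox-Snell residual of 4 moves from a martingale residual of -4 to a deviance residual of about -2.83: the long negative tail is pulled in and the positive side stretched out.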


MG Residuals and Functional Form

• Issue: What functional form of independent variables should you choose?
  • Ex: Should you log your independent variables?
  – Skewness is one consideration; but you also want to specify the correct relationship between vars…
  – In OLS regression we can plot X vars versus residuals to identify departures from linearity

• In EHA, we can do something similar:
  • Estimate Cox model without covariates, save martingale residuals
  • Use “lowess” command to plot mean residuals versus X variables
  • Functional form that is closest to a straight line = best.
    – With no covariates in the model, the smoothed curve traces the shape of the X variable’s effect; a straight line suggests that form can enter the model linearly


MG Residuals and Functional Form

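• Stata code:

*
* Use Martingale Residuals to check functional form
*
stset tf, fail(des)

* Estimate a Cox model with NO covariates
* -- option "estimate" makes this happen
* Plus, create a new variable "mg" containing
* Martingale residuals
* (newer Stata releases do this in two steps:
*  stcox, estimate   then   predict mg, mgale)
stcox , mgale(mg) estimate

* Transformed versions of lfx (assumed definitions;
* ln() returns missing when lfx = 0)
gen lfxcubed = lfx^3
gen loglfx = ln(lfx)

* Next, plot residuals versus different transformations
* of your X variables (with smoothed mean – lowess)
lowess mg lfx
lowess mg lfxcubed
lowess mg loglfx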
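The lowess comparison can also be mimicked outside Stata. This sketch is an approximation: it uses a lowess-style local linear smoother (tricube weights, no robustness iterations, so it will not match Stata’s `lowess` output exactly), `frac=0.8` mirrors the bandwidth shown on the plots, and the “departure from a straight line” score is an illustrative summary rather than a standard test. The residuals are made up so that their mean falls linearly in log experience.

```python
import math


def local_linear_smooth(x, y, frac=0.8):
    """Lowess-style smoother without robustness iterations.

    At each x0, fit a tricube-weighted least-squares line to the nearest
    frac * n points and evaluate it at x0. Returns (sorted x, fitted values).
    """
    pts = sorted(zip(x, y))
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    k = min(len(xs), max(2, int(frac * len(xs))))
    fitted = []
    for x0 in xs:
        h = sorted(abs(a - x0) for a in xs)[k - 1] or 1e-12
        w = [max(0.0, 1 - (abs(a - x0) / h) ** 3) ** 3 for a in xs]
        sw = sum(w)
        mx = sum(wi * a for wi, a in zip(w, xs)) / sw
        my = sum(wi * b for wi, b in zip(w, ys)) / sw
        sxx = sum(wi * (a - mx) ** 2 for wi, a in zip(w, xs))
        sxy = sum(wi * (a - mx) * (b - my) for wi, a, b in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return xs, fitted


def departure_from_straight_line(xs, fitted):
    """Largest gap between the smoothed curve and its own least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(fitted) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, fitted))
             / sum((a - mx) ** 2 for a in xs))
    return max(abs(b - (my + slope * (a - mx))) for a, b in zip(xs, fitted))


# Hypothetical martingale residuals whose mean falls linearly in log(experience).
lfx = list(range(1, 61))
mg = [0.3 - 0.15 * math.log(v) for v in lfx]
candidates = {
    "raw": lfx,
    "cubed": [v ** 3 for v in lfx],
    "logged": [math.log(v) for v in lfx],
}
for name, xvals in candidates.items():
    gap = departure_from_straight_line(*local_linear_smooth(xvals, mg))
    print(f"{name:7s} departure from a straight line: {gap:.4f}")
```

Because a local linear fit reproduces a straight line exactly, the logged version scores essentially zero here, while the raw and cubed versions bend away from a line.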


Martingale Functional Form Example

• Blossfeld employment termination data

• Should labor force experience be raw, logged, cubed?

[Figure: Lowess smoother of martingale residuals vs. lfxcubed (bandwidth = .8)]

Labor force experience is CUBED…

Note the SHARP curve near zero… Very non-linear

This is really bad.


Martingale Functional Form Example

• Blossfeld employment termination data

• Should labor force experience be raw, logged, cubed?

[Figure: Lowess smoother of martingale residuals vs. lfx (bandwidth = .8)]

This is RAW labor force experience

Not bad… close to a flat line.


Martingale Functional Form Example

• Blossfeld employment termination data

• Should labor force experience be raw, logged, cubed?

[Figure: Lowess smoother of martingale residuals vs. loglfx (bandwidth = .8)]

Labor force experience, logged

This is the best yet… but not a big difference from raw…


Discussion: Empirical Example

• Soule, Sarah A. and Susan Olzak. 2004. “When Do Movements Matter? The Politics of Contingency and the Equal Rights Amendment.” American Sociological Review, Vol. 69, No. 4 (Aug., 2004), pp. 473-497.
