Event History Models Sociology 229: Advanced Regression Class 5 Copyright © 2008 by Evan Schofer Do not copy or distribute without permission

Post on 21-Dec-2015


Page 1

Event History Models

Sociology 229: Advanced Regression, Class 5

Copyright © 2008 by Evan Schofer. Do not copy or distribute without permission

Page 2

Announcements

• Assignment 3 due

• Agenda
  – EHA models
  – Discrete time models
  – More details on Cox models & fully parametric proportional hazard models
  – Break
  – Discussion of paper: Allison and McGinnis

Page 3

Review

• Event history analysis focuses attention on rates of events/failures over time

• Descriptive approaches include:
  – Survivor plots
  – Hazard plots
  – Integrated / cumulative hazard plots

• Also, we can conduct non-parametric tests to see if rates differ across groups

• Example: Log rank test
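Survivor plots are usually drawn from the Kaplan-Meier estimator. At each event time, the running survival probability is multiplied by the share of the risk set that did not fail.

The Python sketch below makes a few assumptions:
  – The data (`times`, `events`) are a tiny hypothetical example.
  – A case censored at the same time as an event is still counted in that event's risk set, which is the usual convention.

In Stata, the same curve comes from `sts graph` after `stset`.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return a list of (t, S(t)) pairs, one per distinct event time.

    times  -- observed duration for each case
    events -- 1 if the case failed at that time, 0 if it was censored
    """
    failures = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(failures):
        # Risk set: every case still under observation at time t
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - failures[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: 8 cases, 3 of them censored
times = [2, 3, 3, 5, 6, 8, 8, 9]
events = [1, 1, 0, 1, 0, 1, 1, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```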

Page 4

Hazard Plot: Marriage

• Smoothed Hazard Rate: Full Sample

[Figure: Smoothed hazard estimate. X-axis: analysis time (0 to 80).]

Page 5

EHA Models

• Strategy: Model the hazard rate as a function of covariates

• Goal: Estimate coefficients that show the impact of independent variables on the hazard rate
  – Also, we can use information from the sample to compute t-values (and p-values)

• Test hypotheses about coefficients.

Page 6

EHA Models

• Issue: In standard regression, we must choose a proper “functional form” relating X’s to Y’s

• OLS is a “linear” model: it assumes a linear relationship
  – e.g.: $Y = a + b_1X_1 + b_2X_2 + \cdots + b_nX_n + e$

• Logistic regression for discrete dependent variables assumes an “S-curve” relationship between variables

• When modeling the hazard rate h(t) over time, what relationship should we assume?

• There are many options: assume a flat hazard, or various S-shaped, U-shaped, or J-shaped curves

• We’ll discuss details later…

Page 7

Constant Rate Models

• The simplest parametric EHA model assumes that the base hazard rate is generally “flat” over time

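• Any observed changes are due to changed covariates
  – Called a “Constant Rate” or “Exponential” model
  – Note: the assumption of a constant rate isn’t always tenable

• Formula: $\ln h(t) = a + b_1X_1 + b_2X_2 + \cdots + b_nX_n$

• Usually rewritten as:

$$h(t) = e^{a + b_1X_1 + b_2X_2 + \cdots + b_nX_n} = e^{\beta X}$$

where $\beta X$ is shorthand for $a + b_1X_1 + b_2X_2 + \cdots + b_nX_n$.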
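Here is what “constant rate” means concretely. The hazard depends on the covariates but never on t. So under this model, the probability of surviving past t is $S(t) = e^{-h t}$.

This is a minimal Python sketch. The constant and coefficient values are hypothetical, not estimates from the course data.

```python
import math


def constant_rate_hazard(a, b, x):
    """h(t) = exp(a + b1*x1 + ... + bn*xn); note that t never appears."""
    return math.exp(a + sum(bi * xi for bi, xi in zip(b, x)))


def survivor(h, t):
    """Probability of no event by time t when the hazard is a constant h."""
    return math.exp(-h * t)


# Hypothetical constant and coefficients for two covariates
a, b = -3.0, [0.5, -0.2]
h = constant_rate_hazard(a, b, [1, 2])   # ln h = -3.0 + 0.5*1 - 0.2*2 = -2.9
print(h, survivor(h, 10), survivor(h, 20))
```

Because h is the same at every t, surviving 20 time units is just surviving 10 units twice in a row.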

Page 8

Constant Rate Models

• Is the constant rate assumption tenable?

[Figure 3. Estimated hazard rate of entry into first marriage for entire sample. Y-axis: estimated hazard rate (0 to .2); x-axis: age in years (12 to 80).]

Page 9

Constant Rate Models

• Question: Is the constant rate assumption tenable?

• Answer: Harder question than it seems…
  – The hazard rate goes up and down over time
  – Not constant at all, even if smoothed

• However, if the changes were merely the result of changes in the independent variables, then the underlying (base) rate might, in fact, be constant

• If your model doesn’t include variables that account for the time variation in h(t), then a constant-rate model isn’t suitable.

Page 10

Constant Rate Models

• Let’s run an analysis anyway…

• Ignore possible violation of assumptions regarding the functional form of h(t)

• Recall: the constant rate model is:

$$h(t) = e^{a + b_1X_1 + b_2X_2 + \cdots + b_nX_n} = e^{\beta X}$$

• In this case, we’ll only specify one X variable:
  – DFEMALE: dummy variable indicating women
  – The coefficient reflects the difference in the hazard rate for women versus men.
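With a single dummy variable, the constant-rate model has a closed-form maximum likelihood solution:
  – Each group's estimated rate is its number of events divided by its total time at risk.
  – a is the log of the men's rate.
  – b is the log of the women-to-men rate ratio.

The Python sketch below illustrates this on hypothetical toy data, not the marriage sample. `streg` should reach the same estimates by numerical maximization.

```python
import math


def exponential_dummy_mle(times, events, female):
    """MLE of a and b in ln h = a + b*female for a constant-rate model.

    times  -- time at risk contributed by each case
    events -- 1 if the case experienced the event, 0 if censored
    female -- 1 for women, 0 for men (the dummy X)
    """
    failures = [0, 0]        # events observed among men [0] and women [1]
    exposure = [0.0, 0.0]    # total time at risk among men [0] and women [1]
    for t, e, g in zip(times, events, female):
        failures[g] += e
        exposure[g] += t
    rate_men = failures[0] / exposure[0]
    rate_women = failures[1] / exposure[1]
    a = math.log(rate_men)               # log hazard when female = 0
    b = math.log(rate_women / rate_men)  # difference in log hazard for women
    return a, b


# Hypothetical toy data: three men followed by three women
times = [10, 20, 15, 5, 8, 12]
events = [1, 1, 0, 1, 1, 1]
female = [0, 0, 0, 1, 1, 1]
a, b = exponential_dummy_mle(times, events, female)
print(round(a, 4), round(b, 4))
```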

Page 11

Constant Rate Model: Marriage

• A simple one-variable model comparing gender

. streg sex, dist(exponential) nohr

No. of subjects =        29269                  Number of obs   =      29269
No. of failures =        24108
Time at risk    =       693938
                                                LR chi2(1)      =     213.53
Log likelihood  =   -30891.849                  Prob > chi2     =     0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Dfemale |   .1898716   .0130504    14.55   0.000     .1642933    .2154499
       _cons |  -3.655465   .0216059  -169.19   0.000    -3.697812   -3.613119
------------------------------------------------------------------------------

• The positive coefficient for DFemale indicates a higher hazard rate for women
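`_cons` is the log hazard for the reference group (men, Dfemale = 0). That means the output can be turned back into actual rates, measured in events per unit of analysis time.

This short Python sketch uses the estimates printed above. It assumes that `sex` in the command is the same 0/1 dummy reported as `Dfemale` in the output.

```python
import math

# Estimates copied from the streg output above
cons = -3.655465      # log hazard for men (Dfemale = 0)
b_female = 0.1898716  # shift in log hazard for women

rate_men = math.exp(cons)
rate_women = math.exp(cons + b_female)
print(round(rate_men, 4), round(rate_women, 4), round(rate_women / rate_men, 2))
```

The ratio of the two rates is exactly exp(b), which is why exponentiating the coefficient alone is enough to compare the groups.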

Page 12

Constant Rate Coefficients

• Interpreting the EHA coefficient: b = .19

• Coefficients reflect change in the log of the hazard
  – Recall one of the ways to write the formula:

$$\ln h(t) = a + b_1X_1 + b_2X_2 + \cdots + b_nX_n$$

• But we aren’t interested in log rates
  – We’re interested in change in the actual rate

• Solution: Exponentiate the coefficient
  – i.e., use the “inverse-log” function on a calculator
  – The result reflects the impact on the actual rate.

Page 13

Constant Rate Coefficients

• Exponentiate the coefficient to generate the “hazard ratio”

$$e^{\text{coef}} = e^{(.19)} = 1.21 = \text{Hazard Ratio}$$

• The hazard ratio is the factor by which the hazard rate is multiplied for each one-unit increase in the independent variable

• Multiplying by 1.21 results in a 21% increase
• A hazard ratio of 2.00 = a 100% increase
• A hazard ratio of .25 = a 75% decrease in the rate.
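Here is the same calculation in code; the “inverse-log” step is just `exp`. This is a minimal Python sketch, and `hazard_ratio` and `percent_change` are hypothetical helper names, not part of any package.

```python
import math


def hazard_ratio(coef):
    """Exponentiate an EHA coefficient to get the multiplicative effect on h(t)."""
    return math.exp(coef)


def percent_change(hr):
    """Percent change in the hazard rate implied by a hazard ratio."""
    return (hr - 1.0) * 100.0


hr = hazard_ratio(0.19)
print(round(hr, 2))                 # the 1.21 on the slide
print(round(percent_change(hr)))    # roughly a 21% increase
print(percent_change(2.00), percent_change(0.25))
```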

Page 14

Constant Rate Coefficients

• The variable FEMALE is a dummy variable
  – Women = 1, Men = 0
  – An increase from 0 to 1 (men to women) reflects a 21% increase in the hazard rate
  – Continuous measures, however, can change by many points (e.g., firm size, age, etc.)

• To determine the effect of a multiple-point increase (e.g., firm size of 10 vs. 7), multiply by the hazard ratio repeatedly, once per unit

• Ex: Hazard Ratio = .95, increase = 3 units:
  – .95 x .95 x .95 = .86, indicating a 14% decrease.
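Multiplying repeatedly is the same as raising the hazard ratio to a power. Equivalently, multiply the coefficient by the number of units and then exponentiate.

This Python sketch reproduces the slide's .95 example. `multi_unit_ratio` is a hypothetical helper name.

```python
import math


def multi_unit_ratio(hr, units):
    """Hazard ratio for a `units`-point increase in X: hr multiplied by itself `units` times."""
    return hr ** units


hr = 0.95
combined = multi_unit_ratio(hr, 3)                  # .95 x .95 x .95
print(round(combined, 2), round((1 - combined) * 100, 1))

# Equivalent route: multiply the coefficient by the number of units, then exponentiate
b = math.log(hr)
print(round(math.exp(3 * b), 2))
```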

Page 15

Hypothesis Tests: Marriage

• Final issue: Is women’s 21% higher hazard rate significantly different from men’s rate?

• Or is the observed difference likely due to chance?

• Solution: Hazard rate models calculate standard errors for coefficient estimates

• Allowing calculation of t-values and p-values

--------------------------------------------------
      _t |      Coef.   Std. Err.        t    P>|t|
---------+----------------------------------------
  Female |   .1898716   .0130504    14.55    0.000
   _cons |  -3.465594   .0099415  -348.60    0.000
--------------------------------------------------
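The test statistic is the coefficient divided by its standard error, compared against a standard normal distribution. That is why the Page 11 `streg` output labels it z.

This is a minimal Python sketch using the Female estimate. 1.959964 is the usual two-sided 95% normal critical value. The interval it produces should agree with the one reported for Dfemale in the Page 11 `streg` output (.1642933 to .2154499).

```python
import math


def wald_test(coef, se, z_crit=1.959964):
    """z statistic, two-sided normal p-value, and 95% CI for one coefficient."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    ci = (coef - z_crit * se, coef + z_crit * se)
    return z, p, ci


z, p, ci = wald_test(0.1898716, 0.0130504)   # Female coefficient and SE
print(round(z, 2), p < 0.001, [round(v, 4) for v in ci])
```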

Page 16

Types of EHA Models

• Two main types of proportional EHA models:

• 1. Parametric models
  – Specify a functional form of h(t)
  – Constant rate; also Gompertz, Weibull, etc.

• 2. Cox models
  – Also called “semi-parametric”
  – Don’t specify a particular form for h(t)

• Each makes assumptions
  – Like OLS assumptions regarding functional form, error variance, normality, etc.
  – If assumptions are violated, results can’t be trusted.

Page 17

Parametric Models

• Parametric models make assumptions about the shape of the hazard rate over time
  – Conditional on X

• Much like OLS regression assumes a linear relationship between X and Y, and logit assumes an S-curve

• Options: constant, Gompertz, Weibull
  – There is a piecewise exponential option, too

• Note: They also make standard statistical assumptions:
  – Independent random sample
  – Properly specified model, etc.
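To make the options concrete, here are the hazard shapes in one common proportional-hazards parameterization:
  – λ = exp(a + b1X1 + ... + bnXn) carries the covariates.
  – p and γ are shape parameters.

This Python sketch uses hypothetical parameter values. The piecewise exponential option is left out.

```python
import math


def exponential_hazard(t, lam):
    """Constant rate: flat over time."""
    return lam


def weibull_hazard(t, lam, p):
    """p > 1: rising; p < 1: falling; p == 1 reduces to the exponential. Needs t > 0."""
    return lam * p * t ** (p - 1)


def gompertz_hazard(t, lam, gamma):
    """gamma > 0: rising; gamma < 0: falling; gamma == 0 reduces to the exponential."""
    return lam * math.exp(gamma * t)


lam = 0.05  # hypothetical value of exp(a + bX)
for t in (1, 5, 10, 20):
    print(t, exponential_hazard(t, lam),
          round(weibull_hazard(t, lam, 1.5), 4),
          round(gompertz_hazard(t, lam, -0.1), 4))
```

In every case, the covariates enter only through λ. So changing X rescales the whole curve rather than changing its shape, which is what makes these proportional hazard models.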

Page 18

Cox Models

• The basic Cox model:

$$h(t) = h_0(t)\, e^{(b_1X_1 + b_2X_2 + \cdots + b_nX_n)}$$

• Where h(t) is the hazard rate

• h0(t) is some baseline hazard function (to be inferred from the data)
  – This obviates the need for building a specific functional form into the model

• bX’s are coefficients and covariates

Page 19

Cox Model: Example

• Marriage example:

No. of subjects =        29269                  Number of obs   =      29269
No. of failures =        24108
Time at risk    =       693938
                                                LR chi2(1)      =    1225.71
Log likelihood  =   -229548.82                  Prob > chi2     =     0.0000

--------------------------------------------------
      _t |      Coef.   Std. Err.        z    P>|z|
---------+----------------------------------------
  Female |   .4551652   .0131031    34.74    0.000
--------------------------------------------------

Page 20

Cox vs. Parametric: Differences

• Cox models do not make assumptions about the time-dependence of the hazard rate
  – Cox models focus on the time-ordering of observed events ONLY
    • They do not draw information from periods in which no events occur
  – After all, to do this you’d need to make some assumption about what rate you’d expect in that interval…
  – Benefit: One less assumption to be violated
  – Cost: A Cox model is less efficient than a properly specified parametric model
    • Standard errors are bigger; more data are needed to get statistically significant results.
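The partial likelihood shows why Cox models use only the ordering of events. Each event contributes exp(bX) for the case that failed, divided by the sum of exp(bX) over everyone still at risk, and h0(t) cancels out.

This is a minimal Python sketch for one covariate, with these assumptions:
  – No two events share a time (tie corrections such as Breslow or Efron are left out).
  – A crude grid search stands in for a proper optimizer.
  – The data are hypothetical.

```python
import math


def cox_log_partial_likelihood(beta, times, events, x):
    """Sum over events of beta*x_i - log(sum of exp(beta*x_j) over the risk set)."""
    ll = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # censored cases only appear inside risk sets
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        denom = sum(math.exp(beta * x[j]) for j in risk_set)
        ll += beta * x[i] - math.log(denom)
    return ll


def fit_beta(times, events, x):
    """Crude grid search for the maximizing beta on [-3, 3] in steps of 0.01."""
    grid = [k / 100 for k in range(-300, 301)]
    return max(grid, key=lambda b: cox_log_partial_likelihood(b, times, events, x))


# Hypothetical data: no two events share a time
times = [2, 3, 5, 7, 11, 13]
events = [1, 1, 0, 1, 1, 1]
female = [1, 0, 1, 1, 0, 0]

beta_hat = fit_beta(times, events, female)
# Squaring the times changes every gap between events but not their order,
# so the estimate is identical: the time-dependence of h(t) is never used.
beta_sq = fit_beta([t ** 2 for t in times], events, female)
print(beta_hat, beta_sq)
```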

Page 21

Cox vs. Parametric: Similarities

• Models discussed so far are all “proportional hazard” models
  – Assumption: covariates (X’s) raise or lower the hazard rate in a proportional manner across time
  – Ex: If women have a higher risk of marriage than men, their risk will be elevated by the same proportion at all times…

• Another way of putting it:
  – Cox models assume that independent variables don’t interact with time
    • At least, not in ways you haven’t controlled for
    • i.e., that the hazard rates at different values of X are proportional to each other over time (parallel on a log scale)
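Here is a small numerical check of what proportionality means:
  – Under a proportional hazards model, the women-to-men ratio of h(t) is the same at every t, whatever shape the baseline takes.
  – If the female effect is allowed to change with time (an X-by-time interaction), the ratio drifts.

In this Python sketch, the baseline curve and the interaction term c are hypothetical. b = 0.19 simply reuses the coefficient size from the earlier slides.

```python
import math


def baseline(t):
    """Hypothetical baseline hazard that rises and then falls over time."""
    return 0.02 + 0.1 * math.exp(-((t - 25) ** 2) / 50)


def hazard_ph(t, female, b=0.19):
    """Proportional: the covariate multiplies the baseline by exp(b*X) at every t."""
    return baseline(t) * math.exp(b * female)


def hazard_non_ph(t, female, b=0.19, c=-0.02):
    """Not proportional: the female effect b + c*t changes with time."""
    return baseline(t) * math.exp((b + c * t) * female)


for t in (10, 25, 40):
    ph_ratio = hazard_ph(t, 1) / hazard_ph(t, 0)
    non_ph_ratio = hazard_non_ph(t, 1) / hazard_non_ph(t, 0)
    print(t, round(ph_ratio, 3), round(non_ph_ratio, 3))
```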

Page 22

Proportional Hazard Models

• Proportionality: X variables shift h(t) up or down in a proportional manner

[Figure: two panels plotting h(t) against time for women and men, titled “Proportional” and “Not Proportional”.]

Page 23

Proportional Hazard Models

• Issue: Does the hazard rate for women diverge from or converge with the rate for men over time?
  – If so, the proportion (or ratio) of the rates changes
  – The proportional hazard assumption is violated

• Upcoming classes:
  – We’ll discuss how to check the proportional hazard assumption and address violations…

Page 24

Reading Discussion

• Hironaka, Ann M. 2005. “World Patterns in Civil War Duration.” Chapter 2 in Neverending Wars. Cambridge, MA: Harvard University Press.

• How are the models set up?
• What were the outcomes? Findings?

• Empirical Example: Soule, Sarah A. and Susan Olzak. 2004. “When Do Movements Matter? The Politics of Contingency and the Equal Rights Amendment.” American Sociological Review, Vol. 69, No. 4 (Aug., 2004), pp. 473-497.