Event-history analysis: discrete- & continuous-time methods Society for Longitudinal & Life-course Studies Summer School, University of Amsterdam, 25-29 August 2014 Prof. dr. K. Neels, Sociology Department, University of Antwerp QASS-Programme, KULeuven



Outline

1. Introduction

2. Descriptive discrete-time methods

3. Discrete-time models

4. Descriptive continuous-time methods

5. Continuous-time models

6. Advanced topics

Introduction

1. Applied longitudinal data analysis: example

2. Modeling event-occurrence: methodological features

3. Event-history data

4. Censoring

Introduction

1. Applied longitudinal data analysis: example

Marital dissolution (South, 2001)

• Observation plan:

- 23-year study

- data on 3523 couples in different generations who married in different years

• Effect of wives’ employment on marital dissolution: 2 hypotheses

- Effect might diminish over time:

more women enter the labor force and working becomes normative

- Effect might increase:

changing mores weaken the link between marriage and parenthood

Singer et al, 2003, v.

Marital dissolution (South, 2001)

• Results:

- Effect of wives’ employment increases over time:

risk differential is higher in 1990s than in 1970s

- Effect of wives’ employment also increases with marital duration

• Conclusions:

- Research based on cross-sectional data has too often assumed that the effects of predictors like wives’ employment are constant over calendar time, i.e. it ignores the ‘EMPLOYMENT*TIME’ interaction

- Research has also too often assumed that the effects of predictors like wives’ employment are constant over marital duration: predictors of divorce among newlyweds are likely to differ from those among couples who have been married for years, i.e. it ignores the ‘EMPLOYMENT*DURATION’ interaction

Singer et al, 2003, v.

Introduction

1. Applied longitudinal data analysis: example

2. Modeling event-occurrence: methodological features

Research questions involving events

• 3 methodological features:

- Target event whose occurrence is being studied:

e.g. whether and when marital dissolution occurs

- Beginning of time:

an initial starting point when no one under study has yet experienced the target

event, e.g. date of marriage

- Metric for clocking time:

a meaningful scale on which event occurrence is recorded, e.g. marital duration

Singer et al, 2003, 305-324.

Lexis Chart: Marriage & Divorce

Hinde, 1998, 12.

[Figure: Lexis chart for Person A, with calendar time (in years, 2000 to 2006) on the horizontal axis and age (in years, 20 to 26) on the vertical axis. The line running from Marriage to Divorce represents marital duration.]

Lexis Chart: General Form

Hinde, 1998, 12.

[Figure: general form of the Lexis chart for Person A, with calendar time (in years, 2000 to 2009) on the horizontal axis and age (in years, 20 to 26) on the vertical axis. The line starts at entry into the risk set and ends with event occurrence or censoring; the line itself represents exposure or duration.]

Research questions involving events

• Event

sharp disjunction between what precedes and what follows, transition from one

state to another state

• State Space

- States can be physical (migration, becoming home owner), psychological (healthy

or depressed) or social (married or divorced)

- Requirement for survival analysis: states are mutually exclusive (non-overlapping)

and exhaustive (i.e. the state space includes all possible states).

• Number of states

- Most applications focus on two states. Two-state methods are expanded later

using models for competing risks.

• Repeatable versus non-repeatable events

- Some states are non-repeatable (first job, death, …): once entered, a person can

never re-enter. Other states are repeatable (changing jobs, having children,…):

different episodes or spells can be analyzed for every person.

Singer et al, 2003, 305-324.

The beginning of time…

• Beginning of time:

moment when everyone in the population is in one, and only one, of the possible

states (i.e. origin state); moment when persons enter risk set for transition to

destination state.

• Timing of the transition (event time):

distance from the beginning of time until event occurrence; distance between entry

into risk set and event occurrence.

• Starting points for metric:

- Birth: all studies using age as the metric of time (e.g. occurrence of death,

depression, suicidal thoughts,…)

- Precipitating event: e.g. graduation for studying transition to first employment,

marriage for study of divorce, first birth for studying occurrence and timing of

second births,…

- Arbitrary start time: e.g. age 14 or 16 for analysis of occurrence of first birth

Singer et al, 2003, 305-324.

Metric of time

• Metric

unit in which passing of time is recorded (e.g. seconds, minutes, hours, days,

months, years, decades,…)

• Continuous versus Discrete measurement

Occurrence of events can be recorded in precise units (continuous time

measurement) or coarse intervals (discrete-time measurements).

• Reasons for discrete time data:

- events can only occur at discrete time points (graduating, enrolling for higher

education, …)

- events can theoretically occur at any point in time but tend to occur at certain

intervals (e.g. leaving a job at the end of the contract, although it is theoretically

possible to quit in the middle of a contract)

- coarse measurement, e.g. retrospectively collected duration data necessitate

collapsing duration into intervals (memory problems, rounding errors).

Singer et al, 2003, 305-324.

Overview: discrete-time versus continuous-time methods

• Continuous-time methods

- methods assuming that time is measured exactly or in small enough discrete units

to avoid large numbers of ties

- predominant method of event history analysis in social sciences, engineering and

biostatistics

• Discrete-time methods

- methods assuming coarse measurement of time – e.g. months, years or decades –

resulting in large numbers of ties

- robust alternative to continuous-time methods

- computationally intensive

• Discrete-time versus continuous-time

Continuous-time and discrete-time data have implications for methodological aspects

of survival analysis: parameter definition, model construction, estimation and testing

Allison, 1984, 9-14; Allison, 2004, 369-385.

Introduction

1. Applied longitudinal data analysis: example

2. Modeling event-occurrence: methodological features

3. Event-history data

Event history data

• Event History:

- longitudinal record of all the changes in qualitative variables and their timing

- continuous observation (i.e. independent of waves,…)

- if studying causes of events, histories should include data on explanatory variables

- explanatory variables may be constant in time (e.g. race, sex,…)

- explanatory variables may change in time (e.g. income, treatment,…)

• Collection of event histories

- retrospectively (Neels, 1998; Neels, 2000; Neels, 2006; FFS; GGS)

- prospectively (Gadeyne, 2006; Neels & De Wachter 2010 & 2011)

- administrative records: death certificates, birth register, social security,…

- panel design: combination of prospective and retrospective features

Blossfeld et al., 2002, 1-37; Allison, 1984, 9-14.

Event history data: caveats

• Non-factual data:

- retrospective questions concerning motivational, attitudinal, cognitive or affective

states are problematic

- respondents cannot accurately recall timing of changes for those variables

- analysis of relationship between attitudes and behaviour:

combination of panel design and retrospectively collected factual data

e.g. Generations and Gender Panel Survey (UNECE - PAU)

- 3 wave panel design with waves spaced at intervals of 3 years

- each wave measures values, attitudes, intentions,…

- effects of values at wave 1 on divorce, becoming a parent,… are measured

retrospectively at subsequent wave (i.e. histories are drawn up for different

kinds of events)

Blossfeld et al., 2002, 1-37; Allison, 1984, 9-14.

Event history data: examples

Generations & Gender Survey (GGS)

[Figure: GGS panel design drawn on an age by time diagram (calendar years 2005 to 2011), showing time at risk and event occurrence. Wave 1 collects past biographies, covariates and intentions; Wave 2 collects events between waves 1 and 2, covariates and intentions; Wave 3 collects events and covariates.]

Source: www.unece.org/pau/ggp

Event history data: examples

Modeling change (Neels, 2009):

$$\ln\left[\frac{\Pr(T = t \mid T \geq t)}{1 - \Pr(T = t \mid T \geq t)}\right] = \alpha(t) + \phi(x) + \gamma(z)$$

Events & Outcomes:
- Transition to Adulthood
- Partnership Formation & Dissolution
- Fertility
- Retirement

Time Dimensions:
- Age
- Time
- Duration
- …

Individual Covariates:
- Education
- Income
- Deprivation
- Health
- Values
- …

Contextual Effects:
- Household
- Family
- Socio-economic Context
- Cultural Context
- Welfare-State Context
- …

Introduction

1. Applied longitudinal data analysis: example

2. Modeling event-occurrence: methodological features

3. Event-history data

4. Censoring

Time-to-event data and event-history models

• Characteristics of event history data:

- censoring

- time-varying covariates

• Standard Techniques

- means inappropriate for analysis of time-to-event data

- multiple regression, … ill suited to deal with censoring

- multiple regression, … ill suited to deal with time-varying covariates

• Methods of Event History Analysis

- developed to study whether and when events occur

- developed to deal with censoring

- capable of incorporating time-varying covariates

Allison, 1984, 9-14; Singer et al., 2003, 305-324.

Censoring: examples

• Observations of event histories are often censored:

censoring occurs when the information about the duration in the origin state is

incompletely recorded

• Examples

[Figure: seven spells, labelled A to G, positioned relative to the observation period along time t.]

Blossfeld et al., 2002, 38-42; Allison, 1984, 9-14.

Censoring: examples

• Observation A

- fully censored to the left

- date of entry into and exit from origin state are unknown

- difficult to take into account

• Observation B

- partially censored to the left

- date of entry into initial state is unknown

- length of time spent in initial state is unknown

- information can be incorporated if hazard rate is time-constant

• Observation C

- no censoring

- the individual enters the initial state and experiences the event within the observation period

Blossfeld et al., 2002, 38-42.

Censoring: examples

• Observation D

- right censoring within observation period (e.g. migration, death,…)

- can be taken into account if process generating censoring is independent from

process under study

- but dropouts or missing values in panel study are usually selective

• Observation E

- right censoring at the end of the observation period (e.g. date of a retrospective

survey, last wave in panel design,…)

- right censoring is unproblematic if censoring occurs independently from process

under study

- survival analysis designed to deal with right censoring

Blossfeld et al., 2002, 38-42.
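As a minimal sketch of how right censoring at the end of the observation period enters the data, the Stata lines below derive a duration and an event indicator for three records. The values are those of the three women in the person file shown later in this course (first births in 1988, 1978 and 1973, with observation ending in 1990). The variable names id, kid1, kid2, time and event and the local macro endobs are illustrative assumptions, not names from the FFS data file.

```stata
* Three illustrative records: year of first birth and year of second birth
* (missing if no second birth was observed).
clear
input id kid1 kid2
1  1988 .
3  1978 1979
21 1973 1979
end

* Assumed end of the observation period (right-censoring date).
local endobs = 1990

* event = 1 if the second birth is observed within the observation period.
generate byte event = !missing(kid2) & kid2 <= `endobs'

* Duration in years since first birth: up to the event if it is observed,
* otherwise up to the end of observation (right-censored spell).
generate time = cond(event, kid2 - kid1, `endobs' - kid1)

list id kid1 kid2 time event, noobs
```

Under these assumptions the first record is right-censored after 2 years, while the other two records end with an event after 1 and 6 years.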

Censoring: examples

• Observation F

- completely censored on the right

- entry into and exit from initial state occur after observation period

- occurs in retrospective life history studies where different birth cohorts are

observed over different spans of life: variables controlling for selection should be

introduced into model

• Observation G

- left and right censoring combined

- e.g. panel study: the individual is observed to be in the same state throughout the

observation period, but the dates of entry into and exit from that state are missing

Blossfeld et al., 2002, 38-42.

Causes of censoring

• Censoring: unknown whether and when event occurs

- Individual may never experience event

- Individual may still experience event, but after observation period

• Censoring can occur at one or multiple points in time, e.g.

- prospective follow-up of a single cohort (single point)

- prospective follow-up of multiple cohorts (multiple points)

- retrospective study of age-heterogeneous sample (multiple points)

• Amount of censoring depends on:

- rate at which events occur

- length of the observation period (influenced by research design!)

• In most cases, censoring is inevitable

Singer et al., 2003, 305-324.

Types of censoring: noninformative versus informative

• Noninformative censoring mechanism

- Operates independent of event occurrence and risk of event occurrence: e.g.

under control of researcher, determined by design (e.g. survey date)

- we can assume that censored observations are not a selective subgroup

• Informative censoring mechanism

- censoring occurs because individuals have experienced the event or are likely to

do so in the future (migration as a result of recidivism,…)

- censored observations differ systematically from noncensored individuals

- no statistical procedure can yield unbiased estimates if the censoring mechanism

is informative.

- For censoring that is not determined by design, independence between censoring

and event-occurrence has to be assumed: validate findings with other sources

wherever possible

Singer et al., 2003, 305-324.

Types of censoring: right versus left censoring

• Right censoring

- event time is unknown because event occurrence not observed

- most frequently encountered in practice

- type for which survival models were developed

• Left censoring

- event time unknown because time of entry into risk set is unknown

- question omitted in questionnaire,…

- often encountered in the analysis of repeatable events

- not easily addressed even with the most sophisticated methods: exclude left-censored

spells unless the hazard is constant over time (see also Allison, 1984, 57)

Singer et al., 2003, 305-324.

Descriptive discrete-time methods

1. Single-decrement life table

2. Person-period files & descriptive statistics

Descriptive discrete-time methods

1. Single-decrement life table

Second birth: FFS1991

Singer et al., 2003, 326-356; Lesthaeghe, 1996.

Fertility & Family Surveys

• Flanders & Brussels, March – October 1991

• Complete retrospective histories on:

- activities (educational career, jobs, unemployment,…)

- activities of current partner

- primary relationships

- children

- pregnancies and subfecundity

• Timing of life events measured in year-format:

discrete-time model

• Transition to second child:

- N=1773 women who had 1st child between 1968-1990

- follow-up from 1st birth until 2nd birth/end 1990

• Parallel histories:

Time-varying covariates, lagged 1 year

Second birth: FFS1991

Hinde, 1998, 12.

. tab time birth2

           |   RECODE of RKIDBY2
      time |         0          1 |     Total
-----------+----------------------+----------
         0 |       104         26 |       130
         1 |       102        163 |       265
         2 |        53        407 |       460
         3 |        42        244 |       286
         4 |        33        141 |       174
         5 |        31         67 |        98
         6 |        35         38 |        73
         7 |        23         28 |        51
         8 |        23         11 |        34
         9 |        31         12 |        43
        10 |        17          7 |        24
        11 |        18          5 |        23
        12 |        18          5 |        23
        13 |        27          2 |        29
        14 |        18          2 |        20
        15 |         8          0 |         8
        16 |         8          0 |         8
        17 |         8          2 |        10
        18 |         8          0 |         8
        19 |         5          0 |         5
        21 |         1          0 |         1
-----------+----------------------+----------
     Total |       613      1,160 |     1,773

Life Table

• Life Table

Tracks the event histories (“lives”) of a sample of individuals from the beginning of time (when no one has yet experienced the target event) through the end of data collection.

• Calendar time vs Duration

- Event histories recorded in calendar time are re-arranged as a function of duration since entry into the initial state

• Time intervals

- Duration is divided into a number of substantively meaningful time intervals

- each interval includes its initial time, denoted "[", and excludes its concluding time, denoted ")", e.g. [0,1)

• For each time interval:

- risk set, i.e. number who entered the interval

- number who made the transition/experienced the event

- number censored during the interval

Singer et al., 2003, 326-356; Lesthaeghe, 1996.
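The half-open interval convention can be illustrated with a short Stata sketch that collapses a duration measured in months into yearly intervals. The three durations and the variable names id, months and interval are hypothetical. floor() assigns a duration of exactly 12 months to interval [1,2), in line with the rule that each interval includes its initial time and excludes its concluding time.

```stata
* Hypothetical durations in months since entry into the initial state.
clear
input id months
1  7
2 12
3 29
end

* floor() maps months 0-11 to interval 0, i.e. [0,1), months 12-23 to
* interval 1, i.e. [1,2), and so on.
generate interval = floor(months / 12)

list, noobs
```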

Risk Set

• Definition

- the number of people who enter each successive period

- the number eligible to experience an event in the interval

• Irreversible

- once an individual experiences an event or is censored, the individual drops out of the risk set of all future intervals

- everyone is retained in the risk set until the last moment of eligibility

• Assumption of non-informative censoring

- if censoring is non-informative, the risk set can be assumed to represent all individuals who would have been at risk of event occurrence if everyone could have been followed that long

- under the assumption of non-informative censoring, the experience of each interval’s risk set can be generalized back to the entire population

Singer et al., 2003, 326-356; Lesthaeghe, 1996.

Life-table functions: discrete-time hazard, h(t) (1)

• Discrete random variable T

Values T_i indicate the time period j when individual i experienced the target event: e.g. T_i = 1 if individual i experiences the event in period 1

• Probability density function

Probability that individual i will experience the event in time period j (= unconditional probability!):

$$\Pr[T_i = j]$$

Singer et al., 2003, 326-356.

Life-table functions: discrete-time hazard, h(t) (2)

• Discrete-time hazard h(t_ij)

Conditional probability that individual i will experience the event in time period j, given that he or she did not experience the event in an earlier time period:

$$h(t_{ij}) = \Pr[T_i = j \mid T_i \geq j]$$

• Hazard function

The set of h(t_ij) is the hazard function for individual i. If individuals are not distinguished on the basis of predictors, the hazard function for a random member of the population is denoted as h(t).

Singer et al., 2003, 326-356.
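A worked illustration of why the conditioning matters, using the second-birth counts reported in the life table of this course: 407 second births occur in interval [2,3), when 1378 of the initial 1773 women are still at risk. The second ratio is shown only for contrast. It is a naive proportion of the initial sample, not an estimator proposed in the course.

```latex
\hat{h}(t_{[2,3)}) = \frac{407}{1378} \approx 0.2954
\qquad \text{versus} \qquad
\frac{407}{1773} \approx 0.2296
```

The hazard conditions on the women who have neither had a second birth nor been censored before the interval starts, whereas the second ratio keeps women in the denominator who have already left the risk set.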

Estimating discrete-time hazards

• Estimate of the discrete-time hazard in period j

n events_j represents the number of individuals who experience the target event in time period j and n at risk_j represents the number of individuals at risk during time period j:

$$\hat{h}(t_{ij}) = \frac{n\ \text{events}_j}{n\ \text{at risk}_j}$$

• maximum likelihood estimate of the discrete-time hazard function

• discrete limit of the Kaplan-Meier estimate of the hazard for continuous-time data

• As the discrete-time hazard is a conditional probability:

$$0 \leq \hat{h}(t_{ij}) \leq 1$$

Singer et al., 2003, 326-356.

Standard Error of Estimated Hazard Probabilities

Calculation:

• h(t_j): fraction of the risk set in period j experiencing the target event in that period

• the standard error of h(t_j) is obtained as the standard error of a proportion (i.e. the square root of pq/N):

$$se(\hat{h}(t_j)) = \sqrt{\frac{\hat{h}(t_j)\,(1 - \hat{h}(t_j))}{n\ \text{at risk}_j}}$$

Rules of Thumb:

• for a given risk set, the closer the hazard is to 0.50 the less precise the estimate; the closer to 0 or 1 the more precise: usually hazards are small and thus estimated precisely.

• the larger the risk set, the more precise the estimate of the hazard; the smaller the risk set, the less precise.

• Estimated standard errors are larger when fewer people are at risk. As the risk set declines over time, hazards estimated at later durations tend to be less precise than those estimated at earlier durations.

Singer et al., 2003, 325-356.
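To make the rules of thumb concrete, the formula can be applied to two intervals of the second-birth life table of this course: [2,3), with a hazard of 0.2954 and 1378 women at risk, and [17,18), with a hazard of 0.0833 and only 24 women at risk. This is an illustrative calculation from the rounded table values.

```latex
se\big(\hat{h}(t_{[2,3)})\big) = \sqrt{\frac{0.2954 \times 0.7046}{1378}} \approx 0.0123
\qquad
se\big(\hat{h}(t_{[17,18)})\big) = \sqrt{\frac{0.0833 \times 0.9167}{24}} \approx 0.0564
```

Although the hazard in [17,18) is closer to 0, its standard error is more than four times larger, because the risk set has shrunk from 1378 to 24.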

Second birth: life table

Hinde, 1998, 12.

Interval   Risk set   Events        Censoring     Discrete-time
           at start   in interval   in interval   hazard
[0,1)        1773         26           104          0.0147
[1,2)        1643        163           102          0.0992
[2,3)        1378        407            53          0.2954
[3,4)         918        244            42          0.2658
[4,5)         632        141            33          0.2231
[5,6)         458         67            31          0.1463
[6,7)         360         38            35          0.1056
[7,8)         287         28            23          0.0976
[8,9)         236         11            23          0.0466
[9,10)        202         12            31          0.0594
[10,11)       159          7            17          0.0440
[11,12)       135          5            18          0.0370
[12,13)       112          5            18          0.0446
[13,14)        89          2            27          0.0225
[14,15)        60          2            18          0.0333
[15,16)        40          0             8          0.0000
[16,17)        32          0             8          0.0000
[17,18)        24          2             8          0.0833
[18,19)        14          0             8          0.0000
[19,20)         6          0             5          0.0000
[20,21)         1          0             0          0.0000
[21,22)         1          0             1          0.0000
Total        8560       1160           613
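The risk-set and hazard columns of this life table can be rebuilt from the event and censoring counts alone. The Stata sketch below does so for the first five intervals. The variable names interval, events, censored, exits, riskset and hazard are illustrative assumptions, and the initial risk set of 1773 women is taken from the table.

```stata
* Event and censoring counts for the first five intervals of the life table.
clear
input interval events censored
0  26 104
1 163 102
2 407  53
3 244  42
4 141  33
end

* Risk set at the start of an interval = initial N minus everyone who left
* (through an event or through censoring) in earlier intervals.
* sum() is Stata's running-sum function.
generate exits   = events + censored
generate riskset = 1773 - (sum(exits) - exits)

* Discrete-time hazard: events divided by the risk set of the same interval.
generate hazard = events / riskset

format hazard %6.4f
list interval riskset events censored hazard, noobs
```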

Graphical representation of the hazard function

• Plot of the discrete-time hazard

- Plot the discrete-time hazard probabilities over exposure/duration as a series of points joined together by line segments.

- identify periods of high risk

- characterize the shape of the hazard function, with a view to modeling the discrete-time hazard function

• Note: the largest number of events does not necessarily occur in a period where the hazard is high. The number of people affected by the hazard in each period depends on the number at risk (i.e. the value of the survivor function).

Singer et al., 2003, 326-356.
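The second-birth life table of this course offers a concrete instance. The hazard in [4,5) is more than twice the hazard in [1,2), yet fewer events occur there, because the risk set has shrunk from 1643 to 632 women (an illustrative calculation from the rounded table values):

```latex
\text{events in } [1,2):\quad 0.0992 \times 1643 \approx 163
\qquad
\text{events in } [4,5):\quad 0.2231 \times 632 \approx 141
```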

Second birth: hazard function

Hinde, 1998, 12.

[Figure: Hazard function. Discrete-time hazard Pr(T = t | T >= t), on a scale from 0.00 to 0.35, plotted against time since first birth (in years), for intervals [0,1) to [21,22).]

Life-table functions: survivor function, S(t_ij)

• Definition

The survival probability S(t_ij) is the probability that individual i will survive past time period j, i.e. the probability that individual i does not experience the event in time period j or any earlier time period

• Discrete variable T:

$$S(t_{ij}) = \Pr[T_i > j]$$

• Survivor function

The set of S(t_ij) for an individual is that individual’s survivor function. When individuals are not distinguished on the basis of predictors, the survivor function for a random member of the population is denoted as S(t).

Singer et al., 2003, 326-356.

Characteristics of S(t_ij)

• S(t_0) = 1

At the beginning of time, when no one has yet experienced the event, S(t) = 1

• Behaviour of the survivor function over time:

- S(t) declines over time toward its lower bound 0: S(t) never increases!

- S(t) declines quickly if the discrete-time hazard is high

- S(t) declines slowly if the discrete-time hazard is low

- S(t) remains constant if the discrete-time hazard = 0

- S(t) does not necessarily reach 0 by the end of the observation period

• S(t) cumulates risk over the observation period to estimate the fraction of the initial population surviving up to each successive time period

• S(t) indicates the proportion of people exposed to each period’s hazard:

- if h(t) is high when S(t) is high, many people are affected

- if h(t) is high when S(t) is low, few people are affected

Singer et al., 2003, 326-356.

Maximum Likelihood Estimators of S(t)

• Direct method

- only applicable for intervals preceding the first instance of censoring

- number of individuals surviving is not affected by censoring

$$\hat{S}(t_j) = \frac{n\ \text{who have not experienced the event by the end of time period } j}{n\ \text{in the data set}}$$

• Indirect method

- equally applicable in the presence of censoring

- $[1 - \hat{h}(t_j)]$ denotes the probability of surviving interval j

$$\hat{S}(t_j) = [1 - \hat{h}(t_j)]\,[1 - \hat{h}(t_{j-1})]\,[1 - \hat{h}(t_{j-2})] \cdots [1 - \hat{h}(t_1)]$$

$$\hat{S}(t_j) = \hat{S}(t_{j-1})\,[1 - \hat{h}(t_j)]$$

Singer et al., 2003, 326-356.
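A minimal Stata sketch of the indirect method, using the risk sets and event counts of the first five intervals of the second-birth life table of this course. The variable names are illustrative assumptions. The running product of the interval survival probabilities is computed as the exponential of a running sum of logs, because Stata's sum() function returns a running sum. The recursive form is added as a cross-check.

```stata
* Risk sets and events for the first five intervals of the life table.
clear
input interval riskset events
0 1773  26
1 1643 163
2 1378 407
3  918 244
4  632 141
end

generate hazard = events / riskset

* Survival through the end of each interval: product of (1 - hazard) over
* this and all earlier intervals, written as exp(running sum of logs).
generate surv_end = exp(sum(ln(1 - hazard)))

* Same quantity via the recursion S(t_j) = S(t_j-1) * [1 - h(t_j)].
generate surv_rec = 1 - hazard in 1
replace  surv_rec = surv_rec[_n-1] * (1 - hazard) if _n > 1

format hazard surv_end surv_rec %6.4f
list, noobs
```

Both columns should agree and reproduce 0.9853, 0.8876, 0.6254, 0.4592 and 0.3567, the values that the survivor-function table of this course lists against the start of the following interval.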

Maximum Likelihood Estimators of S(t)

• Interpretation

The estimated survivor function $\hat{S}(t_j)$ provides maximum likelihood estimates of the probability that an individual randomly selected from the population will survive through each successive time period

• Extrapolation of sample experience

- observed risk sets decline as a result of both censoring and event occurrence

- $\hat{S}(t_j)$ declines only as a result of event occurrence

- assuming independent censoring allows $\hat{S}(t_j)$ to estimate what would have happened to the initial population had there been no censoring

Singer et al., 2003, 326-356.

Standard Error of Estimated Survival Probabilities

• More complex than $se(\hat{h}(t_j))$:

The survival probability in period j is estimated as the product of the interval survival probabilities $[1 - \hat{h}(t)]$ in period j and all previous periods

• Greenwood’s approximation:

$$se(\hat{S}(t_j)) = \hat{S}(t_j)\sqrt{\frac{\hat{h}(t_1)}{n_1(1-\hat{h}(t_1))} + \frac{\hat{h}(t_2)}{n_2(1-\hat{h}(t_2))} + \cdots + \frac{\hat{h}(t_j)}{n_j(1-\hat{h}(t_j))}} = \hat{S}(t_j)\sqrt{\sum_{i=1}^{j}\frac{\hat{h}(t_i)}{n_i(1-\hat{h}(t_i))}}$$

where n_i is the number at risk in period i

• Approximation unreliable for periods where the risk set drops below N=20

Singer et al., 2003, 326-356.
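As a worked illustration of Greenwood's approximation, take the first two intervals of the second-birth life table of this course: risk sets of 1773 and 1643, hazards of 0.0147 and 0.0992, and an estimated survival probability of 0.8876 through the end of [1,2). The calculation uses the rounded table values.

```latex
se\big(\hat{S}(t_2)\big)
\approx 0.8876 \sqrt{\frac{0.0147}{1773\,(1 - 0.0147)} + \frac{0.0992}{1643\,(1 - 0.0992)}}
\approx 0.8876 \sqrt{0.0000084 + 0.0000670}
\approx 0.0077
```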

Second birth: survivor function

Singer et al., 2003, 326-356.

Interval   Risk set   Events        Censoring     Discrete-time   Probability   Survivor function
           at start   in interval   in interval   hazard          of survival   (at start of interval)
[0,1)        1773         26           104          0.0147          0.9853        1.0000
[1,2)        1643        163           102          0.0992          0.9008        0.9853
[2,3)        1378        407            53          0.2954          0.7046        0.8876
[3,4)         918        244            42          0.2658          0.7342        0.6254
[4,5)         632        141            33          0.2231          0.7769        0.4592
[5,6)         458         67            31          0.1463          0.8537        0.3567
[6,7)         360         38            35          0.1056          0.8944        0.3046
[7,8)         287         28            23          0.0976          0.9024        0.2724
[8,9)         236         11            23          0.0466          0.9534        0.2458
[9,10)        202         12            31          0.0594          0.9406        0.2344
[10,11)       159          7            17          0.0440          0.9560        0.2205
[11,12)       135          5            18          0.0370          0.9630        0.2107
[12,13)       112          5            18          0.0446          0.9554        0.2029
[13,14)        89          2            27          0.0225          0.9775        0.1939
[14,15)        60          2            18          0.0333          0.9667        0.1895
[15,16)        40          0             8          0.0000          1.0000        0.1832
[16,17)        32          0             8          0.0000          1.0000        0.1832
[17,18)        24          2             8          0.0833          0.9167        0.1832
[18,19)        14          0             8          0.0000          1.0000        0.1679
[19,20)         6          0             5          0.0000          1.0000        0.1679
[20,21)         1          0             0          0.0000          1.0000        0.1679
[21,22)         1          0             1          0.0000          1.0000        0.1679
Total        8560       1160           613

Second birth: survivor function

Singer et al., 2003, 326-356.

[Figure: Survivor Function. Proportion surviving, on a scale from 0.00 to 1.00, plotted against time since first birth (in years), for intervals [0,1) to [21,22).]

Life-table functions: Median Lifetime

• Median Lifetime

- As a result of censoring, the sample mean is inappropriate to identify the center of the distribution of T

- the estimated median lifetime identifies that value of T for which the value of the estimated survivor function equals 0.50: half of the population leaves the initial state before the estimated median lifetime, the other half leaves after the median lifetime or never leaves at all

- a precise estimate is obtained through linear interpolation:

$$\text{Estimated median lifetime} = m + \left[\frac{\hat{S}(t_m) - 0.50}{\hat{S}(t_m) - \hat{S}(t_{m+1})}\right]\big((m+1) - m\big)$$

where m represents the time interval where the sample survivor function is just above 0.50, $\hat{S}(t_m)$ is the value of the sample survivor function in that interval and $\hat{S}(t_{m+1})$ is the value of the sample survivor function in the following interval

Singer et al., 2003, 326-356.
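The interpolation can be applied to the second-birth survivor function of this course. The calculation reads the survivor column as the proportion still without a second birth at the exact duration at which each interval starts. That reading is inferred from the table, where the column begins at 1.0000. On that reading the estimated survivor function falls from 0.6254 at 3 years to 0.4592 at 4 years, so the median lies between these two durations:

```latex
\text{Estimated median lifetime}
= 3 + \left[\frac{0.6254 - 0.50}{0.6254 - 0.4592}\right](4 - 3)
= 3 + \frac{0.1254}{0.1662}
\approx 3.75 \text{ years}
```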

Life-table functions: Median Lifetime

• 3 important limitations

- the estimated median lifetime merely identifies an “average” lifetime: it says little about the distribution of event times and is relatively insensitive to extreme values

- the median lifetime does not necessarily reflect the moment when h(t_j) is particularly elevated; it merely indicates the point where S(t_j) declines below 0.50

- the median lifetime says little about the distribution of the hazard: identical median lifetimes can result from dramatically different hazard functions

Singer et al., 2003, 326-356.

Life table functions: conclusions

• Hazard h(t)

Unlike the probability density function, the conditional character of the hazard h(t) provides an unbiased description of how the process is evolving at each point in time

• Survivor function S(t)

- the survivor function provides the proportion surviving in the initial state at each point t

- S(t) has a ‘memory’: the proportion surviving at t largely depends on the magnitude of the hazards prior to t and only marginally on the hazard at t

- the number of events occurring at each point in time is affected by the magnitude of h(t) and the proportion of the population still at risk of experiencing the event: small values of h(t) at the beginning of exposure may affect much larger numbers of people than large values of h(t) by the end of the exposure (e.g. prostate cancer).

• Median Lifetime

- intuitive description of time-to-event data

- very different types of hazard functions may have the same median lifetime

- complement with other descriptives: percentiles, S(t) at fixed durations,…

Descriptive discrete-time methods

1. Single-decrement life table

2. Person-period files & descriptive statistics

Discrete-time Life-table: Person-Period File

N and Risk Sets in construction of discrete-time life-table:

• h(t_j) relates events in period j to the risk set of period j

• Individuals are included in the risk set of each period in which they are at risk

• Individuals contribute several person-periods to the discrete-time life-table

• Let N represent the number of individuals:

sum of risk sets in life-table = N*exposure, where exposure is the average number of periods at risk per individual

Discrete-time Event History Analysis in Practice:

• Data are rearranged from Person format to Person-Period format:

- Person File: N records represent data on N individuals

- Person-Period File: ‘N*Exposure’ records represent data on N individuals

Principles of the Person-Period File (PPF):

a) for each unit of time that an individual is known to be at risk, a separate

observational record is created, i.e. a person-period

b) For each person-period:

- time variable indicating time-period j of the record in question

- event indicator is coded 0 for all time-periods except the last. In the last period the

event indicator is coded 0 for censored individuals and 1 for individuals

experiencing target event

- explanatory variables are assigned the values they take on in each person-year:

same value in each person period for time-constant variables; current or lagged

values for time-varying covariates

c) Person periods are pooled in a single sample. N of records in person-period file

(PPF) is equal to sum of risk sets in life table throughout observation period.

Discrete-time Life-table: Person-Period File

Singer et al., 2003, 326-356.

Descriptive discrete-time methods

Second birth: Construction of person-period file

Person File:

ID  BIRTHY  EDUC  KID1  KID2  T  EVENT
1   1962    12    1988  -     2  0
3   1954    11    1978  1979  1  1
21  1951    3     1973  1979  6  1

Person-Period File*:

ID  BIRTHY  EDUC  KID1  KID2  YEAR  T  EVENT
1   1962    12    1988  -     1988  0  0
1   1962    12    1988  -     1989  1  0
1   1962    12    1988  -     1990  2  0
3   1954    11    1978  1979  1978  0  0
3   1954    11    1978  1979  1979  1  1
21  1951    3     1973  1979  1973  0  0
21  1951    3     1973  1979  1974  1  0
21  1951    3     1973  1979  1975  2  0
21  1951    3     1973  1979  1976  3  0
21  1951    3     1973  1979  1977  4  0
21  1951    3     1973  1979  1978  5  0
21  1951    3     1973  1979  1979  6  1

* Observation ends in 1990 as 1991 is only partially observed
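The expansion from person file to person-period file can be sketched in a few lines. The following is a minimal Python illustration (the course code itself is in Stata), using the three women from the second-birth example; the function name `to_person_period` and the dictionary layout are assumptions made for this sketch only.

```python
# Person file: one record per woman, values copied from the second-birth example.
# T = last observed year since first birth (0 = year of first birth);
# EVENT = 1 if the second birth occurred in that year, 0 if censored.
persons = [
    {"ID": 1, "KID1": 1988, "T": 2, "EVENT": 0},
    {"ID": 3, "KID1": 1978, "T": 1, "EVENT": 1},
    {"ID": 21, "KID1": 1973, "T": 6, "EVENT": 1},
]


def to_person_period(person):
    """Expand one person record into one record per period at risk."""
    records = []
    for t in range(person["T"] + 1):
        is_last = t == person["T"]
        records.append({
            "ID": person["ID"],
            "YEAR": person["KID1"] + t,
            "T": t,
            # 0 in every period except the last; the last period carries EVENT
            "EVENT": person["EVENT"] if is_last else 0,
        })
    return records


# Pool all person-periods into a single sample.
ppf = [record for person in persons for record in to_person_period(person)]

for record in ppf:
    print(record)
```

Each woman contributes T + 1 records, so the pooled file holds 3 + 2 + 7 person-periods for these three cases.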

Descriptive discrete-time methods

• Discrete-time life-table using Person-Period File:

- person-period file & standard cross-tabulation routines:

generate J x 2 table (i.e. period*event)

- each row reflects risk set in period j

- row percentages correspond to estimated hazard in each period j

• Person-period file:

- PPF is also used for fitting discrete-time event history models

- conceptual basis for continuous-time models
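The cross-tabulation logic can be illustrated with a small Python sketch on made-up person-period records (the data and variable names are hypothetical, not taken from the course dataset): the number of records per period is the risk set, and the share of records with an event is the estimated hazard.

```python
from collections import Counter

# Hypothetical person-period records: (period, event indicator).
rows = [
    (1, 0), (1, 0), (1, 1), (1, 0), (1, 0),  # 5 at risk, 1 event
    (2, 0), (2, 1), (2, 1), (2, 0),          # 4 at risk, 2 events
    (3, 1), (3, 0),                          # 2 at risk, 1 event
]

# Each row of the period*event table: risk set and number of events.
risk_set = Counter(period for period, _ in rows)
events = Counter(period for period, event in rows if event == 1)

# Row percentage of events = estimated hazard in period j.
hazard = {j: events[j] / risk_set[j] for j in sorted(risk_set)}

for j in hazard:
    print(j, risk_set[j], events[j], hazard[j])
```

The sum of the risk sets equals the number of records in the person-period file, as stated above.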

Person-Period File: Estimation of Discrete-time Life-table

Descriptive discrete-time methods

Setting up a person-period file

* SETTING UP A PERSON-PERIOD FILE

generate npyears = time+1

expand npyears

bysort id: generate time_tv = _n - 1

generate birth2_tv = 0

replace birth2_tv = birth2 if time_tv == time

generate year_tv = yrbrnkid1 + time_tv

generate age_tv = year_tv - yrbrn

* ESTIMATE OF h(t) USING CROSSTABULATION

tab time_tv birth2_tv, row

* CALCULATING OBSERVED VALUE OF Pr(T = t | T >= t) AT EACH TIME t

bysort time_tv: egen ht_obs = mean(birth2_tv)

label variable ht_obs "Observed h(t)"

* PLOTTING MEAN VALUE OF h(t) AGAINST TIME

line ht_obs time_tv

[Figure: Observed h(t) (0 to .3) plotted against time_tv (0 to 20)]

Descriptive discrete-time methods

Discrete-time models

1. Within-group discrete-time life-table functions

2. Basic discrete-time hazard model

3. Maximum Likelihood Estimation

4. Specification of the baseline hazard function

5. Time-varying covariates

6. Proportionality assumption

7. Unobserved heterogeneity

Discrete-time models

1. Within-group discrete-time life-table functions

Discrete-time survival analysis

• Discrete-time life-table

- ad hoc procedure to gauge effect of covariates on event occurrence

• Discrete-time models

- relate hazard of event-occurrence to predictors

- quantify predictor effect size

• Discrete-time Survival Analysis:

- Specify a suitable model for the hazard and understand its assumptions

- Using sample data to estimate model parameters

- Evaluate model fit and model parameters

Singer et al., 2003, 357-467.

Discrete-time models

Discrete-time models: within-group hazard functions

• Exploratory analysis:

- plot sample hazard (survivor) functions for groups distinguished by predictors

- categorical predictors or temporarily categorize continuous variables

• Two major questions:

- Overall shape of the hazard for each group:

similar location of peaks and troughs as a function of exposure for different groups

considering presence of sampling variation

- relative level of the hazard function across groups:

Is there a differential across groups?

Does the differential have the same direction over time?

Is the relative magnitude of the differential constant over time?

Singer et al., 2003, 357-467.

Discrete-time models

Discrete-time models: within-group survival functions

• Hazard function

- provides information on process under consideration at every point in time

- unobserved variable generating event occurrence

- groups with higher hazards have more frequent event occurrence

• Survival function

the survivor function cumulates information on the hazard from the event origin to the period

under consideration:

- provides information on compounded effect of predictor variables

- allows comparison of median lifetimes across levels of predictor variables

- groups with lower levels of survival function have more frequent event occurrence

Singer et al., 2003, 357-467.

$$\hat{S}(t_j) = [1-\hat{h}(t_j)]\,[1-\hat{h}(t_{j-1})]\,[1-\hat{h}(t_{j-2})] \cdots [1-\hat{h}(t_1)]$$
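How the survivor function cumulates the hazard can be shown with a short Python sketch. The hazard values are hypothetical, and the median rule used here (first period in which the estimated survivor function hits or falls below 0.50) is a simple convention assumed for illustration.

```python
# Hypothetical estimated hazards h(t_1), h(t_2), h(t_3).
hazard = [0.2, 0.5, 0.5]

# S(t_j) is the product of (1 - h) over all periods up to and including j.
survival = []
s = 1.0
for h in hazard:
    s *= 1.0 - h
    survival.append(s)

# First period in which S(t_j) hits or falls below 0.50 (None if it never does).
median_period = next(
    (j for j, s_j in enumerate(survival, start=1) if s_j <= 0.5), None
)

print(survival, median_period)
```

A group with higher hazards reaches low values of the survivor function sooner, which is why lower survival corresponds to more frequent event occurrence.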


Discrete-time models

1. Within-group discrete-time life-table functions

2. Basic discrete-time hazard model

A statistical model for the discrete-time hazard

Statistical Model

• represent the relationship between population discrete-time hazard function and

predictors

Two complications

• model should describe the shape of the entire discrete-time hazard function over time

• discrete-time hazard is a conditional probability:

- value of h(tij) is bounded between 0 and 1

- requires use of link function (e.g. logit, complementary log-log,…)

Singer et al., 2003, 357-467.

Discrete-time models

• Basic population model:

• Baseline logit hazard function γ(t)

- time is the fundamental covariate of the model:

research question on effect of predictors on timing of event occurrence is

answered by analysing dependence of hazard on time and covariates

- different functional specifications (general, linear, quadratic,…)

- omitting t from equation implies specific model, i.e. constant h(t) over time

• Covariate effects ϕ(x)

- categorical predictors

- continuous covariates (different functional specifications)

- time-constant and time-varying covariates

Discrete-time hazard models: basic model

Neels, 2005, 74-104.

$$\ln\left[\frac{h(t)}{1-h(t)}\right] = \ln\left[\frac{\Pr(T=t \mid T \ge t)}{1-\Pr(T=t \mid T \ge t)}\right] = \gamma(t) + \phi(x)$$

Discrete-time models

• Assumption 1: hazard functions

- different hazard function is postulated for each value of single covariate

- different hazard function is postulated for each combination of covariates

• Assumption 2: shape of the baseline hazard function

- any shape can be specified for the baseline hazard function:

i.e. ‘main’-effect of exposure/time on logit hazard

- other hazard functions constrained to have same shape as baseline function:

covariate effect vertically shifts the logit baseline hazard function

• Assumption 3: covariate effect on logit

- covariate effects on logit of the hazard are constant over time

- difference of logit hazard for different groups is constant over time

Note: assumptions can be relaxed by allowing covariate*time-interaction

Discrete-time hazard models: basic model

Singer et al., 2003, 357-467.

Discrete-time models

• Population model with general specification for time and dichotomous (X1) and

continuous (X2) predictors

α’s represent baseline logit hazard function for group where both X1 and X2

equal zero

β1 and β2 shift hazard function relative to baseline function:

identity of baseline group depends on covariates introduced into model

Discrete-time hazard models: example

Singer et al., 2003, 357-467.

$$\ln\left[\frac{h(t_j)}{1-h(t_j)}\right] = [\alpha_1 D_1 + \alpha_2 D_2 + \dots + \alpha_k D_k] + \beta_1 X_1 + \beta_2 X_2$$

Discrete-time models

• Basic population model:

• From logit h(tj) to odds & odds-ratios: the proportional odds model

- e^{γ(t)} is the baseline odds hazard function

- e^{ϕ(x)} is an odds-ratio:

odds hazard function of other groups is proportional to that of baseline group

Discrete-time hazard models: inverse transformation (1)

Neels, 2005, 74-104.

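$$\operatorname{logit}\, h(t_j) = \ln\left[\frac{\Pr(T=t \mid T \ge t)}{1-\Pr(T=t \mid T \ge t)}\right] = \gamma(t) + \phi(x)$$

$$\text{Odds}\; h(t_j) = \frac{\Pr(T=t \mid T \ge t)}{1-\Pr(T=t \mid T \ge t)} = e^{\gamma(t)+\phi(x)} = e^{\gamma(t)} \times e^{\phi(x)}$$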

Discrete-time models

• Basic population model:

• From logit h(tj) to the hazard:

- relationship between covariates and hazard is nonlinear

- effect of covariate on hazard is not constant

- compare within-groups functions on a hazard, odds and logit scale

Discrete-time hazard models: inverse transformation (2)

Neels, 2005, 74-104.

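$$\operatorname{logit}\, h(t_j) = \ln\left[\frac{\Pr(T=t \mid T \ge t)}{1-\Pr(T=t \mid T \ge t)}\right] = \gamma(t) + \phi(x)$$

$$\Pr(T=t \mid T \ge t) = \frac{e^{\gamma(t)+\phi(x)}}{1+e^{\gamma(t)+\phi(x)}} = \frac{1}{1+e^{-(\gamma(t)+\phi(x))}}$$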
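The three scales (logit, odds, hazard) can be compared numerically. The Python sketch below uses hypothetical values for the baseline logit hazard γ(t) and the covariate effect ϕ(x); it shows that the odds ratio is the same in every period while the difference in hazards is not.

```python
import math


def inv_logit(z):
    """Map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


gamma = [-2.0, -1.0, 0.0]  # hypothetical baseline logit hazard in periods 1-3
phi = 0.7                  # hypothetical covariate effect on the logit scale

odds_ratios = []
hazard_differences = []
for period, g in enumerate(gamma, start=1):
    h_base = inv_logit(g)         # hazard of the baseline group
    h_group = inv_logit(g + phi)  # hazard of the comparison group
    odds_ratio = (h_group / (1 - h_group)) / (h_base / (1 - h_base))
    odds_ratios.append(odds_ratio)
    hazard_differences.append(h_group - h_base)
    print(period, round(h_base, 3), round(h_group, 3), round(odds_ratio, 3))
```

The odds ratio equals exp(0.7) in each period, whereas the gap between the two hazards changes with the level of the baseline hazard.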

Discrete-time models

• Discrete-time model using clog-log link

• Assumptions

- for each combination of predictors there is a clog-log hazard function

- each clog-log function has an identical shape

- distance between clog-log hazard functions is constant throughout exposure

Singer et al., 2003, 357-467.

Discrete-time models: complementary log-log link function

$$\ln[-\ln(1-h(t))] = \gamma(t) + \phi(x)$$

Discrete-time models

• Results

- clog-log: exponentiation of parameter estimates yields hazard ratio

- logit: exponentiation of parameter estimates yields odds ratio

- results are similar when hazard is small (e.g. when time is measured less coarsely):

denominators of the odds approach 1 and odds ratio converges to hazard ratio

Singer et al., 2003, 357-467.

Discrete-time models: complementary log-log link function

$$\text{hazard ratio} = \frac{h(t)_{\text{GROUP1}}}{h(t)_{\text{GROUP2}}}$$

$$\text{odds ratio} = \frac{h(t)_{\text{GROUP1}} / (1-h(t)_{\text{GROUP1}})}{h(t)_{\text{GROUP2}} / (1-h(t)_{\text{GROUP2}})}$$
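The convergence of the odds ratio to the hazard ratio for small hazards can be checked with a short Python sketch. The hazard values are hypothetical; group 1 is simply given twice the hazard of group 2 at three different levels.

```python
def odds(h):
    """Odds corresponding to a discrete-time hazard h."""
    return h / (1.0 - h)


results = []
for h_group2 in [0.2, 0.02, 0.002]:
    h_group1 = 2 * h_group2  # hazard ratio fixed at 2
    hazard_ratio = h_group1 / h_group2
    odds_ratio = odds(h_group1) / odds(h_group2)
    results.append((h_group2, hazard_ratio, odds_ratio))
    print(h_group2, hazard_ratio, round(odds_ratio, 4))
```

As the hazards shrink, the denominators (1 - h) approach 1 and the odds ratio moves towards the hazard ratio of 2.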


Discrete-time models

1. Within-group discrete-time life-table functions

2. Basic discrete-time hazard model

3. Maximum Likelihood Estimation

• Maximum Likelihood

- seek estimates of population parameters α and β that maximize the likelihood of

observing the sample data

- express the probability to observe the specific pattern of event occurrence

actually observed

• Likelihood function

- each individual contributes Ji terms to the likelihood function

- contribution for each period at risk depends on value of h(tij) and event

occurrence

- each individual contributes only part of the term depending on event occurrence

Maximum Likelihood Estimates of the Discrete-time Model (1)

$$\text{Likelihood} = \prod_{i=1}^{n} \prod_{j=1}^{J_i} h(t_{ij})^{EVENT_{ij}} \, (1-h(t_{ij}))^{(1-EVENT_{ij})}$$

Singer et al., 2003, 357-467.

Discrete-time models

• ML-estimates of α and β

Likelihood function expresses the probability of observing the sample data on

event occurrence that were actually observed as a function of the unknown

population parameters. According to the discrete-time model:

$$h(t_{ij}) = \frac{1}{1+e^{-(\alpha(t)+\beta(x))}}$$

and thus:

$$\text{Likelihood} = \prod_{i=1}^{n} \prod_{j=1}^{J_i} \left[\frac{1}{1+e^{-(\alpha(t)+\beta(x))}}\right]^{EVENT_{ij}} \left[1-\frac{1}{1+e^{-(\alpha(t)+\beta(x))}}\right]^{(1-EVENT_{ij})}$$

indicating that the likelihood function is expressed as a function of the observed

variables and the unknown population parameters

Maximum Likelihood Estimates of the Discrete-time Model (2)

Singer et al., 2003, 357-467.

Discrete-time models

• Logarithm of the Likelihood Function

- make mathematics of estimation tractable

- standard logistic regression routines applied to the person-period data set

provide estimates of the parameters of the discrete-time

hazard model

Maximum Likelihood Estimates of the Discrete-time Model (3)

Singer et al., 2003, 357-467.

$$LL = \sum_{i=1}^{n} \sum_{j=1}^{J_i} \left[ EVENT_{ij} \ln h(t_{ij}) + (1-EVENT_{ij}) \ln(1-h(t_{ij})) \right]$$
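The log-likelihood can be evaluated directly on a person-period file. The Python sketch below uses hypothetical records and a general (one hazard per period) specification without covariates; in that case the ML estimate of each hazard is the observed proportion of events in the period, and any other set of values gives a lower log-likelihood. The function name `log_likelihood` is an assumption of this sketch.

```python
import math
from collections import Counter

# Hypothetical person-period records: (period, event indicator).
rows = [
    (1, 0), (1, 0), (1, 1), (1, 0), (1, 0),
    (2, 0), (2, 1), (2, 1), (2, 0),
    (3, 1), (3, 0),
]


def log_likelihood(rows, hazard):
    """Sum over person-periods of EVENT*ln(h) + (1-EVENT)*ln(1-h)."""
    ll = 0.0
    for period, event in rows:
        h = hazard[period]
        ll += event * math.log(h) + (1 - event) * math.log(1 - h)
    return ll


# General specification: hazard in each period = events / risk set.
risk_set = Counter(period for period, _ in rows)
events = Counter(period for period, event in rows if event == 1)
h_hat = {j: events[j] / risk_set[j] for j in risk_set}

ll_max = log_likelihood(rows, h_hat)
ll_other = log_likelihood(rows, {1: 0.3, 2: 0.4, 3: 0.6})  # arbitrary alternative

print(round(ll_max, 4), round(ll_other, 4), round(-2 * ll_max, 4))  # last: deviance
```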


Discrete-time models

1. Within-group discrete-time life-table functions

2. Basic discrete-time hazard model

3. Maximum Likelihood Estimation

4. Specification of the baseline hazard function

• General specification of baseline hazard function: advantages

1) treating time as a categorical variable:

- independent estimate of the hazard in each interval

- does not constrain baseline hazard function

2) easily interpretable

3) informative on shape of baseline hazard function

4) consistent with life table estimates

5) specification with lowest possible deviance (best model fit)

• General specification of baseline hazard function: disadvantages

1) lacks parsimony:

requires large number of parameters if observation period is long

2) fitted hazard functions often erratic due to sampling variation

Discrete-time models: alternative specifications for baseline

Singer et al., 2003, 357-467.

Discrete-time models

• Alternative specifications of baseline hazard function

- more parsimonious than general specification

- similar goodness-of-fit

- smooth interpretable function

• Situations when alternative specifications should be considered

- large number of discrete time periods:

large number of dummies, difficulty in fitting covariate*time interactions

- hazard is expected to be near zero in some time periods

may yield difficulty in generating ML-estimates, large or implausible estimates

- some time periods have small risk sets

general specification unable to discern differences between intervals

Discrete-time models: alternative specifications for baseline

Singer et al., 2003, 357-467.

Discrete-time models

Order of        Behaviour of          Number of     Remarks
polynomial      logit hazard          parameters
0               Constant              1             - Largest deviance
1               Linear                2
2               Quadratic             3
3               Cubic                 4
4               3 stationary points   5
5               4 stationary points   6
General         -                     J             - Smallest deviance
specification                                       - Reproduces life-table

Remark for the polynomial specifications: consider using centred time variable to facilitate interpretation of effect

Singer et al., 2003, 357-467.

Discrete-time models: alternative specifications for baseline

Discrete-time models

• Logarithm of time

- Logarithm of time is used as a predictor of logit hazard

- similar to Weibull specification for continuous-time models

- indistinguishable from linear specification if number of time periods is small

• Step function

- combination of continuous and categorical specifications of time

- allows one or several discontinuities in the baseline hazard function to be modelled

- more parsimonious than completely general specification

Singer et al., 2003, 357-467.

Discrete-time models: alternative specifications for baseline

Discrete-time models

* CONSTANT LOGIT(t) NULL MODEL

logit event_tv

predict ht_constant

* LINEAR LOGIT_h(t) MODEL

logit event_tv time_tv

predict ht_linear

* QUADRATIC MODEL OF LOGIT_h(t)

gen time_quadratic = time_tv^2

logit event_tv time_tv time_quadratic

predict ht_quadratic

* CUBIC MODEL OF logit h(t)

gen time_cubic = time_tv^3

logit event_tv time_tv time_quadratic time_cubic

predict ht_cubic

* STEP FUNCTION OF LOGIT_h(t)

recode time_tv (0=0)(1=1)(2=2)(3=3)(4/35=4), gen(time_cat4)

logit event_tv time_tv time_quadratic i.time_cat4

predict ht_step

* GENERAL SPECIFICATION OF logit_h(t)

logit event_tv i.time_tv

predict ht_general

* ht_obs is the observed h(t) computed when the person-period file was set up

graph twoway line ht_obs ht_constant ht_linear ht_quadratic ht_cubic ht_general time_tv

Second birth: modeling baseline hazard function

Singer et al., 2003, 357-467.

[Figure: h(t) (0 to .3) against time_tv (0 to 20). Legend: Observed h(t); Fitted h(t), linear model; Fitted h(t), quadratic model; Pr(event_tv); Fitted h(t), step function; Fitted h(t), general specification]

Discrete-time models

* INCLUDING TIME-CONSTANT CATEGORICAL COVARIATES

recode HIGHDIP (1/5 = 1 Low)(6/10 = 2 Medium)(11/13 = 3 High), gen(HIGHDIP3)

bysort time_tv HIGHDIP3: egen htobs_edu1 = mean(event_tv) if HIGHDIP3 == 1

bysort time_tv HIGHDIP3: egen htobs_edu2 = mean(event_tv) if HIGHDIP3 == 2

bysort time_tv HIGHDIP3: egen htobs_edu3 = mean(event_tv) if HIGHDIP3 == 3

graph twoway line htobs_edu1 htobs_edu2 htobs_edu3 time_tv

logit event_tv time_tv time_quadratic i.time_cat4 i.HIGHDIP3, or

predict ht_stepedu

gen htfit_edu1 = ht_stepedu if HIGHDIP3 == 1

gen htfit_edu2 = ht_stepedu if HIGHDIP3 == 2

gen htfit_edu3 = ht_stepedu if HIGHDIP3 == 3

graph twoway line htfit_edu1 htfit_edu2 htfit_edu3 time_tv

Second birth: adding covariate effect

Singer et al., 2003, 357-467.

Discrete-time models

Discrete-time models: including covariates

Singer et al., 2003, 357-467.

[Figure, first panel: observed hazards htobs_edu1, htobs_edu2, htobs_edu3 (0 to .4) against time_tv (0 to 20)]

[Figure, second panel: fitted hazards htfit_edu1, htfit_edu2, htfit_edu3 (0 to .4) against time_tv (0 to 20)]


Discrete-time models

1. Within-group discrete-time life-table functions

2. Basic discrete-time hazard model

3. Maximum Likelihood Estimation

4. Specification of the baseline hazard function

5. Time-varying covariates

• Time-varying covariates

- easy to implement in discrete-time models because of person-period data

- time-varying covariates simply take on appropriate value in each period

• Assumptions

- for each value of the predictor in time period j, there is a postulated value of logit

hazard: the effect of time-varying covariates does not compare static groups of

people

- for individuals with constant values of the time-varying predictor, joining

consecutive postulated values of logit hazard for constant values of the time-

varying predictor yields logit hazard functions with identical shapes

- for individuals with constant values of the time-varying predictor, the distance

between each of these logit hazard functions is identical in every time period

Singer et al., 2003, 357-467.

Discrete-time models: time-varying covariates

Discrete-time models

• Caveats of time-varying covariates: state dependence

State Dependence: value of time-varying predictor at time tj is affected by event

occurrence status at time tj:

e.g. marital dissolution and employment: employment may be affected by marital

status because married individuals are more likely to be working

• Caveats of time-varying covariates: rate dependence

rate dependence: value of time-varying predictor at time tj is affected by value of

the hazard at time tj

e.g. spousal satisfaction is affected by risk of divorce as marital instability is likely

to decrease spousal satisfaction

e.g. marital status as a time-varying predictor of first birth

Singer et al., 2003, 357-467.

Discrete-time models: time-varying covariates

Discrete-time models

• Types of time-varying predictors

- Defined time-varying predictors:

values are predetermined in advance of data collection for everyone in the study

(e.g. seasons, birthday, treatment schemes,…);

- Ancillary time-varying predictors:

values are determined by stochastic process totally external to study

participants (e.g. economic context, marriage market,…);

- Contextual time-varying predictors:

also external process but closer link between units and larger risk of reverse

causation (e.g. parental divorce, school characteristics,…);

- Internal time-varying predictors:

describe individual’s changeable psychological, physical and social states over

time: direction of causality often unclear

• Risk of state & rate dependence depends on type of time-varying covariate

- defined and ancillary time-varying predictors: generally no problem

- contextual and internal predictors: risk of reverse causation!

Singer et al., 2003, 177-181; 357-467.

Discrete-time models: time-varying covariates

Discrete-time models

• Solving state and rate dependence: lagged predictors

- link current outcome status to prior predictor status

- e.g. set of time-varying lagged predictors indicating whether a ‘cause’ occurred

1, 2 or 3 years earlier

• Problems with lagged predictors

- may require imputation for first period (i.e. data on predictors prior to actual

observation period)

- less compelling theoretically, e.g. anticipatory effects (using predictor status in a

later period to predict event at an earlier stage),

e.g. risk of depression increases prior to (parental) divorce
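Constructing a lagged predictor in a person-period file can be sketched as follows. This is a Python illustration on hypothetical records; the variable names `EMPLOYED` and `EMPLOYED_LAG1` are invented for the example. The lag is taken within persons, and the first period of each person has no prior value, which is where imputation or earlier data would be needed.

```python
# Hypothetical person-period records with a time-varying covariate.
ppf = [
    {"ID": 1, "T": 0, "EMPLOYED": 0},
    {"ID": 1, "T": 1, "EMPLOYED": 1},
    {"ID": 1, "T": 2, "EMPLOYED": 1},
    {"ID": 2, "T": 0, "EMPLOYED": 1},
    {"ID": 2, "T": 1, "EMPLOYED": 0},
]

# Walk through each person's records in time order and carry the
# previous period's value forward as the lagged predictor.
previous = {}
for record in sorted(ppf, key=lambda r: (r["ID"], r["T"])):
    record["EMPLOYED_LAG1"] = previous.get(record["ID"])  # None in first period
    previous[record["ID"]] = record["EMPLOYED"]

for record in ppf:
    print(record)
```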

Singer et al., 2003, 357-467.

Discrete-time models: time-varying covariates


Discrete-time models

1. Within-group discrete-time life-table functions

2. Basic discrete-time hazard model

3. Maximum Likelihood Estimation

4. Specification of the baseline hazard function

5. Time-varying covariates

6. Proportionality assumption

• Proportionality assumption

- assumes that each predictor has an identical effect in every time period

- covariate effect is independent of duration of residence in initial state

- logit hazard functions are equidistant for different groups of predictor; odds of

one group are proportional to that of baseline hazard function

• Violating the proportionality assumption

- effect of predictor may increase over time:

distance between logit hazard functions increases over time; predictor becomes

increasingly important to differentiate groups

- effect of predictor may decrease over time:

distance between logit hazard functions decreases over time; effect of predictor

becomes less critical with time

- effect of predictor may grow stronger/weaker throughout time

distance between logit hazard functions changes throughout exposure period

Singer et al., 2003, 357-467.

Discrete-time models: proportionality assumption

Discrete-time models

• Nonproportional discrete-time hazard models

- include statistical interaction between substantive predictors and time

- cross-product between substantive predictor and variables reflecting time

- compare deviance of models with and without interactions with time

• Caution with nonproportional discrete-time models

- less likely to find ‘significant’ interaction in case of general specification for

baseline hazard function: interaction requires large number of additional

parameters (df).

- examine the pattern of estimates in case of interaction with general specification

of the baseline hazard function

- consider testing more parsimonious specifications of the interaction with time

(e.g. linear, quadratic terms for time, step functions,…)
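For a single dichotomous predictor X and a linear interaction with time, the proportional and nonproportional specifications can be written side by side. This is a sketch in which the symbols β1 and β2 are chosen for illustration and are not taken from the slides:

```latex
\begin{aligned}
\text{proportional:} \quad & \operatorname{logit} h(t_j) = \gamma(t_j) + \beta_1 X \\
\text{nonproportional:} \quad & \operatorname{logit} h(t_j) = \gamma(t_j) + \beta_1 X + \beta_2 (X \times t_j) \\
\text{difference between groups:} \quad & \operatorname{logit} h(t_j \mid X=1) - \operatorname{logit} h(t_j \mid X=0) = \beta_1 + \beta_2 t_j
\end{aligned}
```

With β2 = 0 the distance between the logit hazard functions is constant. Comparing the deviances of the two models tests this restriction with one degree of freedom, against one degree of freedom per time dummy when the interaction uses the general specification.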

Singer et al., 2003, 357-467.

Discrete-time models: proportionality assumption

Discrete-time models

Migration History & Social Mobility Surveys

• VUBrussels, Ghent University, Université de Liège

• Turkish & Moroccan men aged 18+ residing in Belgium

• Collected in 1994-1996

• Analysis of educational careers (Neels, 1998)

• Analysis of transition to employment (Neels, 2000)

Education & Transition to Employment

• 938 school leavers finishing educational career in Belgium

• Unemployment after graduation/drop-out:

retrospective data on duration of unemployment in months

• Type of first job on the basis of the EGP classification

• Covariates: age, nationality, educational trajectory, educational

level, province

Source: Neels (2000)

Turkish & Moroccan Schoolleavers

Discrete-time models

* TIME-VARYING EFFECTS (schoolleaver.dta)

gen timexnational = INTERVAL*national

logit TRANS i.INTERVAL national timexnational, or

predict ht_nonlinear

gen ht_moroccan = ht_nonlinear if national == 0

gen ht_turkish = ht_nonlinear if national == 1

sort INTERVAL

graph twoway line ht_moroccan ht_turkish INTERVAL

Singer et al., 2003, 357-467.

Discrete-time models: proportionality assumption

Discrete-time models

Singer et al., 2003, 357-467.

Discrete-time models: proportionality assumption

[Figure: fitted hazard (.1 to .6) against INTERVAL (0 to 15) for ht_moroccan and ht_turkish]


Discrete-time models

1. Within-group discrete-time life-table functions

2. Basic discrete-time hazard model

3. Maximum Likelihood Estimation

4. Specification of the baseline hazard function

5. Time-varying covariates

6. Proportionality assumption

7. Unobserved heterogeneity

• Assumption of no unobserved heterogeneity

- models assume that population hazard function for individual i depends only on

his or her predictor values

- unobserved heterogeneity arises when one or more important predictors are omitted

from the model

• Illustration of unobserved heterogeneity

Assume initial sample includes equal proportions of 3 groups of individuals:

- high risk group with constant h(t) over time

- medium risk group with constant h(t) over time

- low risk group with constant h(t) over time

Overall hazard function will decline (!) as a result of increasing selectivity and

changing composition of the risk set in terms of the unobserved variable over time.
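The selection mechanism can be reproduced with a few lines of Python. The three hazard values are hypothetical; the sketch tracks the expected share of each group still at risk and computes the overall hazard as total expected events divided by the total risk set.

```python
# Hypothetical constant hazards for the high, medium and low risk groups.
group_hazards = [0.30, 0.15, 0.05]

# Equal proportions of the three groups at the start.
at_risk = [1 / 3, 1 / 3, 1 / 3]

overall = []
for period in range(1, 11):
    expected_events = [n * h for n, h in zip(at_risk, group_hazards)]
    overall.append(sum(expected_events) / sum(at_risk))
    # Those who experienced the event leave the risk set.
    at_risk = [n - e for n, e in zip(at_risk, expected_events)]

for period, h in enumerate(overall, start=1):
    print(period, round(h, 4))
```

Although every group has a constant hazard, the overall hazard falls from period to period because the high risk group leaves the risk set fastest.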

Singer et al., 2003, 357-467; Allison, 1984.

Discrete-time models: unobserved heterogeneity (1)

Discrete-time models

Allison, 1984.

Discrete-time models: unobserved heterogeneity (2)

[Figure: h(t) (0,00 to 0,30) over periods 0 to 25 for Group 1 h(t), Group 2 h(t) and Total h(t)]

Discrete-time models

• Symptoms of unobserved heterogeneity

even when the hazard within each unobserved group is constant over time, unobserved

heterogeneity generates an observed hazard function that appears to be declining

over time

• Correcting for frailty or unobserved heterogeneity

- include a time-invariant random disturbance at person-level

- several distributions possible: normal, gamma distributed,…

- STATA: xtlogit, xtcloglog, xtmelogit,…

Singer et al., 2003, 357-467; Allison, 1984; Blossfeld, 2003.

Discrete-time models: unobserved heterogeneity (3)

Discrete-time models

* DISCRETE-TIME INDIVIDUAL FRAILTY-MODEL

* NO FRAILTY

cloglog TRANS i.INTERVAL

predict phat_nofrailty

* COMPLEMENTARY LOG-LOG WITH INDIVIDUAL-LEVEL FRAILTY

xtset ID

xtcloglog TRANS i.INTERVAL

predict phat_frailty, pu0

* COMPARING FRAILTY VERSUS NONFRAILTY MODELS

* (preserve/restore keeps the person-period data available after collapse)

preserve

collapse (mean) phat_nofrailty phat_frailty, by(INTERVAL)

twoway (line phat_nofrailty INTERVAL, sort) (connected phat_frailty INTERVAL, sort)

restore

* MIXED EFFECTS LOGIT WITH INDIVIDUAL FRAILTY

logit TRANS i.INTERVAL

xtmelogit TRANS i.INTERVAL || ID:, or

See STATA Reference but also Blossfeld, Golsch & Röhwer (2003)

Discrete-time models: unobserved heterogeneity

Discrete-time models

Singer et al., 2003, 357-467; Allison, 1984; Blossfeld, 2003.

Discrete-time models: unobserved heterogeneity

[Figure: hazard (.1 to .6) against duration of unemployment in intervals (0 to 15) for hazard_logit and hazard_xtmelogit]

Discrete-time models

Descriptive continuous-time methods

1. Kaplan-Meier

2. Kernel smoothing

Descriptive continuous-time methods

1. Kaplan-Meier

Features of continuous-time event occurrence data

• Timing of events:

- an infinite number of possible instants when the target event can occur

- finite observation period includes infinite number of instants

e.g. months > weeks > days > hours > minutes > seconds …

• Consequences for distribution of event-times:

- the probability of observing any particular event-time is infinitesimally small

- the probability of event occurrence at any specific instant approaches 0

- the probability that 2 individuals share the same event time (i.e. tied events):

infinitesimally small as well

- co-occurrence of events (i.e. ties or tied events) is very unlikely

• Event-occurrence in discrete-time

- events can only occur in limited number of time periods

- probability of event occurrence exceeds 0 in at least some periods

- ties are pervasive

Singer et al, 2003, 468-502.

Descriptive continuous-time methods

Survivor function in continuous time

• Similar definition in discrete and continuous time

• T is a continuous random variable

• value Ti indicates precise instant when individual i experiences event

• tj represents the potential values of T

• the survival probability for individual i at time tj is the probability that his or her event time Ti will exceed tj

• probability that individual i remains in origin state (e.g. alive, unemployed,…) for a duration of at least tj

• the initial value of the continuous-time survivor function at time t0 is 1

Singer et al, 2003, 468-502.

$$S(t_{ij}) = \Pr[T_i > t_j]$$

Descriptive continuous-time methods

Hazard function in continuous time

• Discrete-time hazard:

- conditional probability that event occurs in time period or interval of time

- events are related to a risk set

• Continuous time:

- infinite number of infinitesimally small instants of time when events can occur

- Concept of probability breaks down in continuous time: in truly continuous time

the probability that T takes on any specific value tj has to be 0, i.e.:

$$\lim_{t' \to t} \Pr(t \le T < t' \mid T \ge t) = 0$$

- continuous-time hazard rate regards the ratio of the transition probability to the

length of the time interval to represent the probability of future changes in the

dependent variable per unit of time:

$$h(t) = \lim_{t' \to t} \frac{\Pr(t \le T < t' \mid T \ge t)}{t' - t}$$

Singer et al, 2003; Blossfeld, 2002, 21-37.

Descriptive continuous-time methods

Hazard function in continuous time

• Continuous-time hazard:

- h(t) is the first-order derivative of cumulative hazard ( –ln S(tij))

- h(t) is NOT a probability, but the unobserved rate at which events occur

- Rates such as h(t) relate events to exposure, i.e. they assess the conditional

probability of event occurrence per unit of time.

- In contrast to probabilities, rates are not bounded from above:

rates cannot be negative, but can easily exceed unity.

- Analysis of continuous-time rate requires revision of statistical models

incorporating the effects of predictors:

- logit used for the analysis of the discrete-time hazard is only defined for

values of hazards between 0 and 1

- ln(h(t)) logarithm of the hazard is defined for all values greater than 0

Singer et al, 2003; Blossfeld, 2002; Allison, 1984, 23.

$$h(t) = \lim_{t' \to t} \frac{\Pr(t \le T < t' \mid T \ge t)}{t' - t}$$

Descriptive continuous-time methods

Hazard function in continuous time

• Interpretation of continuous-time hazard h(t):

- if h(t) is constant over time, say h(t)=1.25, then 1.25 is the expected number of

events in a time interval that is one unit long, i.e. number of events/exposure

• Examples of rates from everyday life:

- ‘rate’ of travel is 60 km per hour

note: speed can be evaluated instantaneously

- ‘rate’ of pay is 25.000 € per year

• Changing the unit of time allows rates to be expressed in different ways

- 60 km/h = 1 km/min

- 25000 €/year = 2083 €/month

• 1/h(t) gives the expected length of time until an event occurs:

- h(t) = 1.25: expected time until event occurrence is 1/1.25 = .80 time units

- 60 km/h: expected time until 1 km has been travelled is 1/60 hour = 1 min
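The link between a constant rate and the expected waiting time can be made explicit. The following short derivation assumes that h(t) = h is constant and uses the standard relationship S(t) = exp(-H(t)) between the survivor function and the cumulative hazard:

```latex
\begin{aligned}
H(t) &= \int_0^t h \, du = h\,t \\
S(t) &= \exp(-H(t)) = e^{-ht} \\
E[T] &= \int_0^\infty S(t)\, dt = \int_0^\infty e^{-ht}\, dt = \frac{1}{h}
\end{aligned}
```

For h = 1.25 this gives E[T] = 1/1.25 = 0.80 time units, in line with the example above. When h(t) varies over time, 1/h(t) is better read as the expected waiting time if the current rate were to persist.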

Singer, 2003, 473; Blossfeld, 2002, 32; Allison, 1984, 23

Descriptive continuous-time methods

Descriptive continuous-time methods: Cumulative Hazard

• Cumulative Hazard H(tij)

assesses at each point in time j the total amount of accumulated risk that an

individual i has faced from the beginning of time until time tj:

- easily estimated using by-products of the Kaplan-Meier method

- because H(tij) cumulates h(tij), changing level of H(tij) over time reveals

information about the underlying distribution of h(tij)

- not a probability (even in discrete time) or a rate

- not bounded from above (double bounded nature of S(t) makes it relatively

insensitive to changes in hazard)

Singer et al., 2003, 468-502; Blossfeld et al., 2003.

$$H(t_{ij}) = \text{cumulation of } h(t_{ij}) \text{ between } t_0 \text{ and } t_j = \int_0^{t_j} h(t_{ij})\, dt$$

Descriptive continuous-time methods

Descriptive continuous-time methods: Cumulative Hazard

• Estimating H(tij): -ln(S(t))

- mathematical relationship between cumulative hazard and survivor function:

$$H(t_j) = -\ln S(t_j)$$

$$S(t_j) = \exp\left(-\int_0^{t_j} h(t)\,dt\right), \text{ where } \int_0^{t_j} h(t)\,dt \text{ is the cumulative hazard rate}$$

- Estimate of H(tj) is obtained using KM-estimate of S(t):

$$\hat{H}(t_j) = -\ln \hat{S}_{KM}(t_j)$$

- More popular for descriptive analysis

Singer et al., 2003, 468-502; Blossfeld et al., 2003.

Descriptive continuous-time methods

Descriptive continuous-time methods: Kaplan-Meier

• Grouped estimation methods

- artificially categorize continuous variable T

- different categorizations yield different estimates

• Kaplan Meier

- capitalize on raw event times

- construct intervals so that each contains just 1 observed event time:

each interval starts at observed event time and ends just before the next

- initial interval: begins at t0 and ends immediately before the first event

- final interval: begins at the latest event time and ends at latest event time (if there

are no larger censored event times) or infinity (if the largest event time is

censored)

- if an individual is censored at an observed event time, the

tie is broken by assuming that event precedes censoring (censored case is thus

included in the risk set of that interval)

Singer et al., 2003, 468-502; Blossfeld et al., 2003.

Descriptive continuous-time methods

Descriptive continuous-time methods: Kaplan-Meier

• Kaplan Meier estimate of Survivor Function

- is obtained by applying discrete-time estimator to data in intervals:

- Kaplan-Meier estimates are plotted as a step function:

estimated survival probability is associated with the entire interval

- more refined than discrete-time or actuarial estimate because updated most

frequently (separate interval for each event!)

- standard errors are estimated using discrete-time formula

• Median lifetime

- identify first interval when the value of the estimated survivor function either

precisely hits or falls below 0.50

- less common: linear interpolation between bracketing event times

Singer et al., 2003, 468-502; Blossfeld et al., 2003.

$$\hat{S}(t_j) = (1-\hat{p}(t_1))\,(1-\hat{p}(t_2)) \cdots (1-\hat{p}(t_j))$$
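The Kaplan-Meier steps described above can be sketched in Python on a handful of hypothetical durations (the function name `kaplan_meier` and the data are assumptions of this example, not the course dataset). Cases censored exactly at an event time stay in that risk set, the cumulative hazard is obtained as -ln S(t), and the median lifetime is the first event time at which the estimate hits or falls below 0.50.

```python
import math


def kaplan_meier(times, events):
    """Return (event time, number at risk, number of events, S(t)) per event time.

    times  : observed durations (event or censoring times)
    events : 1 if the duration ends in the target event, 0 if censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival = 1.0
    table = []
    for t in event_times:
        # Cases censored exactly at t are still counted in the risk set,
        # i.e. events are assumed to precede censoring.
        n_at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - n_events / n_at_risk
        table.append((t, n_at_risk, n_events, survival))
    return table


# Hypothetical durations; one case is censored at the event time 3.
times = [2, 3, 3, 5, 7, 8, 9, 10]
events = [1, 1, 0, 1, 0, 1, 1, 0]

km = kaplan_meier(times, events)
for t, n, d, s in km:
    # Last column: cumulative hazard H(t) = -ln S(t).
    print(t, n, d, round(s, 3), round(-math.log(s), 3))

# Median lifetime: first event time at which S(t) hits or falls below 0.50.
median_lifetime = next((t for t, _, _, s in km if s <= 0.5), None)
print(median_lifetime)
```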

Descriptive continuous-time methods

National Databank Mortality

Census 1991

• survey covering entire population of legal residents in 1991

• Household Form:

household composition, dwelling characteristics, …

• Individual Form:

education, job characteristics, marital status, …

National Register & Death Register• mortality in period 1991-1996

• day of death: continuous-time measurement

• cause of death

Prospective quasi-experiment• 58 months follow-up period

• first 22 months omitted to avoid reverse causation

Socio-economic Mortality Differentials

Source: Deboosere & Gadeyne (2000)

Descriptive continuous-time methods: Kaplan-Meier

* KM-ESTIMATE OF SURVIVOR FUNCTION, S(t)

sts list

* SAVING KM-ESTIMATE OF SURVIVOR FUNCTION TO DATASET

sts generate S_KM = s

label var S_KM "KM-estimate of survivor function S(t)"

* PLOTTING KM-ESTIMATE OF SURVIVOR FUNCTION

sts graph

* KM-ESTIMATE OF CUMULATIVE HAZARD FUNCTION, H(t)

generate CHAZ_KM = -ln(S_KM)

label var CHAZ_KM "KM-estimate of cumulative hazard"

* PLOTTING CUMULATIVE HAZARD FUNCTION

line CHAZ_KM _t, sort

* KM-ESTIMATE OF LOG CUMULATIVE HAZARD FUNCTION, Log H(t)

generate LMLS_KM = ln(-ln(S_KM))

label var LMLS_KM "KM-estimate of log cumulative hazard"

* PLOTTING LOG CUMULATIVE HAZARD FUNCTION

line LMLS_KM _t, sort

Singer et al., 2003, 468-502; Blossfeld et al., 2003.

[Figures: KM-estimate of survivor function S(t), KM-estimate of cumulative hazard, and KM-estimate of log cumulative hazard, each plotted against _t (0 to 40)]

Descriptive continuous-time methods: comparing subgroups

* ESTIMATING & PLOTTING WITHIN-GROUP SURVIVOR FUNCTIONS

sts list, by(EDUCATION) compare

* Generate KM-estimates of survivor function S(t) by subgroup (e.g. EDUCATION)

sts generate KMS_EDU = s, by(EDUCATION)

* Plotting KM-estimate of S(t) for subgroups (e.g. educational level)

sts graph, by(EDUCATION)

* TESTING WHETHER WITHIN-GROUP SURVIVOR FUNCTIONS DIFFER SIGNIFICANTLY

sts test EDUCATION, logrank

* WITHIN-GROUP LOG CUMULATIVE HAZARD FUNCTIONS

sort _t EDUCATION

gen LMLSEDU1_KM = ln(-ln(KMS_EDU)) if (EDUCATION == 1)

gen LMLSEDU2_KM = ln(-ln(KMS_EDU)) if (EDUCATION == 2)

gen LMLSEDU3_KM = ln(-ln(KMS_EDU)) if (EDUCATION == 3)

gen LMLSEDU4_KM = ln(-ln(KMS_EDU)) if (EDUCATION == 4)

gen LMLSEDU5_KM = ln(-ln(KMS_EDU)) if (EDUCATION == 5)

twoway (line LMLSEDU1_KM LMLSEDU2_KM LMLSEDU3_KM LMLSEDU4_KM LMLSEDU5_KM _t)

Singer et al., 2003, 468-502.

[Figures: Kaplan-Meier survival estimates by EDUCATION (Unknown, None & Primary, Lower Secondary, Higher Secondary, Higher Education) against analysis time (0 to 40); within-group log cumulative hazard functions LMLSEDU1_KM to LMLSEDU5_KM against _t (0 to 40)]

2. Kernel smoothing

Descriptive continuous-time methods: Kernel Smoothing

• similar to moving average

- estimate a function’s average value at time tj by aggregating together all point

estimates available within the focal time’s temporal vicinity

• ‘difference’ the cumulative hazard function

- differencing: H(tj) – H(tj-1)

- yields a measure of the local rate of change in the cumulative hazard function at that time

• Bandwidth

- temporal scope used to calculate smoothed point estimates, e.g. +/- 2 days

- smaller bandwidth yields more erratic series

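- larger bandwidth weakens relationship between smoothed value and each specific point in time, increasing bias

- larger bandwidth narrows the temporal region the smoothed series describes, because no smoothed value can be computed within one bandwidth of either end of the observation period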

Singer et al., 2003, 468-502.
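The differencing and bandwidth ideas can be made concrete with a few lines of Python. This is a sketch under stated assumptions: it uses an Epanechnikov kernel (the Stata examples in this course use a Gaussian kernel), and the function names and the toy cumulative hazard are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: weight 0.75 * (1 - u^2) inside [-1, 1], zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def smoothed_hazard(t, event_times, cum_hazard, bandwidth):
    """Kernel-smoothed hazard at time t.

    event_times : sorted distinct event times t_1 < t_2 < ...
    cum_hazard  : estimated cumulative hazard H(t_j) at those times
    """
    estimate = 0.0
    previous = 0.0
    for t_j, h_j in zip(event_times, cum_hazard):
        increment = h_j - previous        # differencing: H(t_j) - H(t_j-1)
        previous = h_j
        estimate += epanechnikov((t - t_j) / bandwidth) * increment
    return estimate / bandwidth


# Hypothetical cumulative hazard that rises by 0.01 at each of 10 event times.
event_times = list(range(1, 11))
cum_hazard = [0.01 * j for j in event_times]

narrow = smoothed_hazard(5.5, event_times, cum_hazard, bandwidth=0.4)
wide = smoothed_hazard(5.5, event_times, cum_hazard, bandwidth=2.0)
print(narrow, wide)
```

With the narrow bandwidth no event time falls in the window around t = 5.5, so the estimate drops to zero, which is the erratic behaviour the slide warns about. The wider window averages over four increments and lands close to the underlying rate of 0.01 per time unit.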

Descriptive continuous-time methods: Kernel Smoothing

* KERNEL-SMOOTHED HAZARD FUNCTIONS

* NO BOUNDARY ADJUSTMENT FOR BANDWIDTH
sts graph, hazard kernel(gaussian) width(0.1)
sts graph, hazard kernel(gaussian) width(0.5)
sts graph, hazard kernel(gaussian) width(1)
sts graph, hazard kernel(gaussian) width(2)
sts graph, hazard kernel(gaussian) width(4)
sts graph, hazard kernel(gaussian) width(5)

Singer et al., 2003, 468-502.

Descriptive continuous-time methods: Kernel Smoothing

[Figures: six panels of the smoothed hazard estimate against analysis time (0 to 40), one per bandwidth]

Singer et al., 2003, 468-502.

Continuous-time models

1. Continuous-time Regression Models

2. A Statistical Model for the Continuous-time Hazard

3. Partial Likelihood Estimation

4. Example from the National Databank Mortality

5. Nonparametric Strategies for Displaying Results

6. Time-varying predictors

7. Nonproportional hazards models

1. Continuous-time Regression Models

Toward a Statistical Model for the Continuous-time Hazard

• Parametric models specify time-dependency of hazard

- constant hazard (exponential model)

- piecewise constant (constant h(t) within interval, but varies across intervals)

- monotonic duration dependence (Weibull model, Gompertz model,…)

- nonmonotonic duration dependence (Log-logistic model, Sickle model,…)

• Semi-parametric models

- time dependency of h(t) is left unspecified (Cox regression)

Singer et al., 2003, 503-606.

2. A Statistical Model for the Continuous-time Hazard

Toward a Statistical Model for the Continuous-time Hazard

• Cox Regression Model for H(tij)

- cumulative hazard is semi-bounded: greater than 0 without upper bound

- natural log of H(t) yields unbounded function:

- log cumulative hazard is negative when cumulative hazard is lower than 1

- logarithm expands vertical distance between small values and compresses

vertical distance between large values

- transformation stabilizes distance between within-group functions over time

- general specification for H(tij) similar to general specification in discrete-time:

where log H0(tj) represents the unspecified general baseline log cumulative

hazard function

Singer et al., 2003, 503-606.

$$\text{log negative log survivor function} = \ln(-\ln S(t))$$

$$\log H(t_{ij}) = \log H_0(t_j) + \beta_1 X_i$$

Toward a Statistical Model for the Continuous-time Hazard

• Cox regression in terms of hazard

- distance β1 associated with effect of predictor remains identical whether log

cumulative hazard, log H(tij), or log hazard, log h(tij) is considered

- ratio associated with effect of predictor remains identical whether

cumulative hazard, H(tij) or hazard h(tij) is considered

- mathematical identities allow substitution of h(tij) for H(tij) in Cox model:

or equivalently by antilogging:

Singer et al., 2003, 503-606.

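ratio associated with effect of predictor: $e^{\beta_1}$

$$\log h(t_{ij}) = \log h_0(t_j) + \beta_1 X_{1ij} + \beta_2 X_{2ij} + \dots + \beta_k X_{kij}$$

$$h(t_{ij}) = h_0(t_j)\, e^{\beta_1 X_{1ij} + \beta_2 X_{2ij} + \dots + \beta_k X_{kij}}$$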
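One way to see why the hazard can be substituted for the cumulative hazard: for a time-constant predictor, the hazard is the time derivative of the cumulative hazard, so the multiplicative factor carries over unchanged. A short worked step for a single predictor $X_i$, offered as an illustration rather than a full proof:

```latex
\begin{align*}
H(t_{ij}) &= H_0(t_j)\, e^{\beta_1 X_i} \\
h(t_{ij}) = \frac{d}{dt} H(t_{ij}) &= \frac{d H_0(t_j)}{dt}\, e^{\beta_1 X_i} = h_0(t_j)\, e^{\beta_1 X_i} \\
\frac{H(t_{ij})}{H_0(t_j)} &= \frac{h(t_{ij})}{h_0(t_j)} = e^{\beta_1 X_i}
\end{align*}
```

Taking logs of the last line gives the same vertical distance $\beta_1 X_i$ on the log cumulative hazard scale and on the log hazard scale.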

3. Partial Likelihood Estimation

Cox Regression & Partial Likelihood Estimation

• ‘Conditional’ or partial likelihood

- likelihood function is constructed by ‘conditioning’ on the observed times, i.e.

given that someone experienced an event at time tj, what is the probability that it

is individual i.

- consequence of conditioning: only those individuals who actually experience the

target event contribute an explicit term to the likelihood function: number of

terms in partial likelihood function equals number of individuals with observed

event times

- Conditional probability is obtained by dividing individual i’s hazard at time tj by the sum of all contemporaneous hazards faced by everyone in the risk set (incl. individual i and censored cases). Contribution of individual i to the partial likelihood function:

$$\frac{h(t^*_{ij})}{\sum_{\text{risk set at } t^*_{ij}} h(t^*_{ij})}$$

where $t^*_{ij}$ represents the time tj at which individual i experiences the target event

Singer et al., 2003, 503-606.

Cox Regression & Partial Likelihood Estimation

• Contributions to partial likelihood

a) individuals with observed event times

- each contribute one explicit term to partial likelihood function

- contribute to the denominator of the explicit contribution for any noncensored

individual whose observed event time is smaller than or equal to their

observed event time (cfr. KM)

b) censored individuals only contribute indirectly

- censored individuals contribute to the denominator of the explicit contribution for any noncensored individual whose observed event time is smaller than or equal to their time of censoring (cfr. KM)

• Partial likelihood

Singer et al., 2003, 503-606.

$$\prod_{\text{noncensored individuals}} \frac{h(t^*_{ij})}{\sum_{\text{risk set at } t^*_{ij}} h(t^*_{ij})}$$

Cox Regression & Partial Likelihood Estimation

• Combination of Cox Model & Partial Likelihood

expresses the probability of observing the observed sample as a function of the

unknown parameters

and allows us to determine the estimates of the unknown parameters that maximize the partial likelihood function WITHOUT (!) estimating the baseline hazard function

Singer et al., 2003, 503-606.

$$\prod_{\text{noncensored individuals}} \frac{h(t^*_{ij})}{\sum_{\text{risk set at } t^*_{ij}} h(t^*_{ij})} = \prod_{\text{noncensored individuals}} \frac{h_0(t_j)\, e^{\beta_1 X_{1ij} + \beta_2 X_{2ij} + \dots + \beta_k X_{kij}}}{\sum_{\text{risk set at } t^*_{ij}} h_0(t_j)\, e^{\beta_1 X_{1ij} + \beta_2 X_{2ij} + \dots + \beta_k X_{kij}}}$$

$$= \prod_{\text{noncensored individuals}} \frac{e^{\beta_1 X_{1ij} + \beta_2 X_{2ij} + \dots + \beta_k X_{kij}}}{\sum_{\text{risk set at } t^*_{ij}} e^{\beta_1 X_{1ij} + \beta_2 X_{2ij} + \dots + \beta_k X_{kij}}}$$
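Because the baseline hazard cancels, the partial likelihood can be evaluated from the predictor values and the ordering of the observed times alone. The following Python sketch illustrates this for one predictor and no tied event times; it is not how Stata's stcox is implemented, and the function name, toy data and grid search are hypothetical choices for the illustration.

```python
import math


def log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood for one predictor, assuming no tied event times."""
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue                      # censored cases add no explicit term
        # Risk set at t_i: everyone still under observation, censored or not.
        risk_scores = [math.exp(beta * x[l])
                       for l, t_l in enumerate(times) if t_l >= t_i]
        # log of exp(beta * x_i) / sum of risk scores; no baseline hazard needed
        total += beta * x[i] - math.log(sum(risk_scores))
    return total


# Hypothetical toy data: five individuals, the last one censored.
times = [2, 4, 5, 7, 9]
events = [1, 1, 1, 1, 0]
x = [1, 0, 1, 0, 1]

# Crude grid search for the maximizing beta (real software uses Newton-Raphson).
grid = [b / 100 for b in range(-300, 301)]
best_beta = max(grid, key=lambda b: log_partial_likelihood(b, times, events, x))
print(best_beta, log_partial_likelihood(best_beta, times, events, x))
```

The censored individual never contributes a term of its own but appears in every denominator up to its censoring time, which matches the description of indirect contributions.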

4. Example from National Databank Mortality

Cox Regression & Partial Likelihood Estimation

* Generating home ownership indicator

recode HOUSING (1 3 5 =1 owner)(2 4 6 = 0 renter)(7=.), gen(OWNER)

tab HOUSING OWNER

* Kaplan-Meier of general mortality by home ownership

sts list, by(OWNER) compare

sts graph, by(OWNER)

sts generate SOWNER_KM = s, by(OWNER)

gen LMLSKM_RENTER = ln(-ln(SOWNER_KM)) if OWNER == 0

gen LMLSKM_OWNER = ln(-ln(SOWNER_KM)) if OWNER == 1

twoway (line LMLSKM_RENTER LMLSKM_OWNER _t, sort)

* Cox Proportional hazards model

stcox OWNER, nohr // option 'nohr' reports coefficients (log hazard ratios) instead of hazard ratios

stcox OWNER // Cox model reporting hazard ratios (default)

Singer et al., 2003, 503-606.

Cox Regression & Partial Likelihood Estimation

Cox regression -- Breslow method for ties

No. of subjects =        19025                  Number of obs    =       19025
No. of failures =         1590
Time at risk    =    655546.38
                                                LR chi2(1)       =       45.88
Log likelihood  =   -15574.663                  Prob > chi2      =      0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       OWNER |   .6767467   .0378309    -6.98   0.000     .6065171    .7551084
------------------------------------------------------------------------------

Singer et al., 2003, 503-606.
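The reported hazard ratio, z statistic and confidence interval are linked by simple arithmetic on the log scale. The Python sketch below retraces that arithmetic from the two numbers in the OWNER row; the variable names are hypothetical, and the standard-error conversion assumes the usual delta-method relation se(HR) = HR × se(b).

```python
import math

hazard_ratio = 0.6767467       # reported Haz. Ratio for OWNER
se_hazard_ratio = 0.0378309    # reported Std. Err. of the hazard ratio

coefficient = math.log(hazard_ratio)              # log hazard ratio (what nohr reports)
se_coefficient = se_hazard_ratio / hazard_ratio   # delta method: se(b) = se(HR) / HR
z = coefficient / se_coefficient

z_crit = 1.959964                                 # 97.5th percentile of the standard normal
ci_lower = math.exp(coefficient - z_crit * se_coefficient)
ci_upper = math.exp(coefficient + z_crit * se_coefficient)
percent_change = 100 * (hazard_ratio - 1)         # relative difference in hazard

print(round(coefficient, 4), round(z, 2))
print(round(ci_lower, 4), round(ci_upper, 4), round(percent_change, 1))
```

Read this way, the mortality hazard of owners is roughly a third lower than that of renters at every point in time, and the interval is computed on the coefficient scale before being exponentiated.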

[Figures: Kaplan-Meier survival estimates by OWNER (renter vs. owner) against analysis time (0 to 40); log cumulative hazard functions LMLSKM_RENTER and LMLSKM_OWNER against _t (0 to 40)]

5. Nonparametric Strategies for Displaying Results

Risk Scores

• Risk score

- summarize the effect of several predictors simultaneously

- compares risk of event occurrence for individual i with specific profile to the risk

of event occurrence of baseline individual

- measured in relative not absolute terms:

compares relative level of fitted hazard to that of baseline hazard function

- center predictors to obtain substantively meaningful baseline (e.g. age)

- risk scores are calculated for every member of the sample, irrespective of censoring

Note: risk scores are the result of the individual’s predictor values and the fitted model!

Singer et al., 2003, 503-606.

$$\text{risk score} = \frac{h_0(t_j)\, e^{\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k}}{h_0(t_j)} = e^{\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k}$$

Recovering baseline functions

• Partial Likelihood Estimation

- parameter estimates are obtained with general unspecified baseline

- baseline hazard function cancels out of the partial likelihood function

• Recovering baseline hazard function

- not a predicted function but nonparametric estimate based on risk scores

- use parameter estimates for model under consideration to calculate risk scores

for all cases (both cases with event occurrence and censored individuals)

- apply procedure similar to Kaplan-Meier:

for each interval estimate p(t) as the ratio of the number of events in the interval

to the sum of the risk scores of the cases at risk during the interval, i.e.:

Blossfeld et al., 2002, 228-254.

$$\hat{p}_0(t_j) = \frac{\text{number of events}_j}{\sum_{\text{risk set}_j} \exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)}$$
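The recovery step can be sketched in Python for a single predictor. The ratio of events to summed risk scores follows the formula above; accumulating those ratios into a cumulative baseline hazard is an added assumption of this sketch (a Breslow-type estimate), and it is not claimed to reproduce Stata's `predict, basechazard` exactly. Names and data are hypothetical.

```python
import math


def baseline_cumulative_hazard(times, events, x, beta):
    """Return [(t_j, p0_hat(t_j), H0_hat(t_j))] at each distinct event time."""
    risk_score = [math.exp(beta * xi) for xi in x]   # one score per case
    cumulative = 0.0
    rows = []
    for t in sorted({u for u, d in zip(times, events) if d == 1}):
        n_events = sum(1 for u, d in zip(times, events) if u == t and d == 1)
        # Sum of risk scores over everyone still at risk at t, censored cases included.
        denominator = sum(r for u, r in zip(times, risk_score) if u >= t)
        p0 = n_events / denominator
        cumulative += p0                 # assumption: accumulate into H0_hat
        rows.append((t, p0, cumulative))
    return rows


# Hypothetical toy data and a hypothetical fitted coefficient.
times = [2, 4, 5, 7, 9]
events = [1, 1, 1, 1, 0]
x = [1, 0, 1, 0, 1]

rows_null = baseline_cumulative_hazard(times, events, x, beta=0.0)
rows_fit = baseline_cumulative_hazard(times, events, x, beta=-0.27)
for row in rows_fit:
    print(row)
```

With beta set to zero every risk score equals 1, so the denominator is just the size of the risk set and the procedure collapses to the Kaplan-Meier style ratio of events to cases at risk.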

Recovering baseline functions

* Recovering fitted cumulative hazard functions
* (run directly after stcox OWNER; the baseline group is OWNER == 0, i.e. renters)
predict CHAZCOX_RENTER, basechazard
* owners: baseline cumulative hazard times the estimated hazard ratio for OWNER
gen CHAZCOX_OWNER = CHAZCOX_RENTER*0.6767467
line CHAZCOX_RENTER CHAZCOX_OWNER _t, c(J J) sort

* Plotting log cumulative hazard functions
gen LNCHAZCOX_OWNER = ln(CHAZCOX_OWNER)
gen LNCHAZCOX_RENTER = ln(CHAZCOX_RENTER)
line LNCHAZCOX_RENTER LNCHAZCOX_OWNER _t, c(J J) sort

Blossfeld et al., 2002, 228-254.

[Figures: cumulative baseline hazard (renters) and CHAZCOX_OWNER against _t (0 to 40); LNCHAZCOX_RENTER and LNCHAZCOX_OWNER against _t (0 to 40)]

6. Time-varying predictors

Cox regression & time-varying covariates

• No special strategies required:

where X1i is time-constant for individual i and X2ij is time-varying for individual i

• Time-varying predictors and partial likelihood

- partial likelihood estimation requires ratio of contemporaneous risk score of

individual experiencing event occurrence to sum of contemporaneous risk

scores of everyone in risk set

- at every point in time where event occurs (!!!) values of time-varying predictors

must be known for everyone in the risk set

- implementation: program time-varying covariates or use episode splitting

Singer et al., 2003, 503-606.

$$\ln h(t_{ij}) = \ln h_0(t_j) + \beta_1 X_{1i} + \beta_2 X_{2ij}$$

$$h(t_{ij}) = h_0(t_j)\, e^{(\beta_1 X_{1i} + \beta_2 X_{2ij})}$$
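Episode splitting can be illustrated with a minimal Python sketch: one spell is cut at the moment a time-varying covariate changes, so that each sub-episode carries a single covariate value. The function name, the argument layout and the example (a covariate that switches from 0 to 1 at month 12) are hypothetical; in Stata this is typically done with data-management commands rather than by hand.

```python
def split_episode(start, stop, event, change_time, value_before, value_after):
    """Split one spell (start, stop] at the change time of a time-varying covariate.

    Returns rows (start, stop, event, covariate). Only the last sub-episode can
    end in the event; the earlier sub-episode ends as censored.
    """
    if change_time is None or change_time >= stop:
        return [(start, stop, event, value_before)]   # covariate never changes in spell
    if change_time <= start:
        return [(start, stop, event, value_after)]    # change happened before the spell
    return [
        (start, change_time, 0, value_before),        # censored at the change
        (change_time, stop, event, value_after),      # carries the original outcome
    ]


# Hypothetical individual: observed 0 to 40 months, event at month 40,
# covariate changes from 0 to 1 at month 12.
rows = split_episode(0, 40, 1, change_time=12, value_before=0, value_after=1)
for row in rows:
    print(row)
```

After splitting, the value of the predictor is known for this individual at any event time between 0 and 40, which is what the partial likelihood requires for every member of the risk set.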

7. Nonproportional hazards models

Nonproportional hazards

• Basic Cox model assumes proportionality

- in a model with time-constant predictors the hazard function of each group is

proportional to baseline hazard function

- assumption may not hold: check with e.g. plots of within-group log cumulative hazard functions, analysis of residuals,…

• Solutions

- stratification for predictor that violates proportionality assumption:

allows a separate baseline hazard function for each level of stratification variable

- include PREDICTOR*TIME-interaction in model to specify time-varying effect

Singer et al., 2003, 503-606.

$$H(t_{ij}) = H_0(t_j)\, e^{\beta_1 X_{1i}}$$

Nonproportional hazards: stratification

• Stratified Cox Regression

- separate baseline (log) hazard function for each level of stratification variable

- predictor effects constrained to be identical across strata at all points in time

• Partial likelihood estimation

- contribution of individual i is divided by sum of contemporaneous risk scores in

individual i’s stratum

- total partial likelihood across full sample is obtained by multiplying stratum-

specific partial likelihoods

Singer et al., 2003, 503-606; SPSS, Statistical Algorithms (Version 14.0)

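$$\ln h(t_{ij}) = \ln h_{0s}(t_j) + \beta_1 X_{1ij} + \beta_2 X_{2ij} + \dots + \beta_k X_{kij}$$

$$\prod_{\text{strata}} \; \prod_{\text{noncensored individuals in stratum } s} \frac{e^{\beta_1 X_{1ij} + \beta_2 X_{2ij} + \dots + \beta_k X_{kij}}}{\sum_{\text{risk set in stratum } s \text{ at } t^*_{ij}} e^{\beta_1 X_{1ij} + \beta_2 X_{2ij} + \dots + \beta_k X_{kij}}}$$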

Nonproportional hazards: stratification

• Advantages of stratification

- reduces number of ties: tied individuals are spread across strata

- reduces computation time:

each individual is only compared to members of same stratum

• Disadvantages of stratification

- effect of stratification variable is not estimated

- no test to compare stratified model to unstratified model,

i.e. test whether stratification is required in the first place

• When to use stratification

- stratification variable is a ‘nuisance’ that is of no analytic interest: e.g.

2 different hospitals where patients were followed, spells for same individual

- baseline hazard functions in strata are too different to model easily

Singer et al., 2003, 503-606; Allison, 2004.

Nonproportional hazards: interactions with time

• Effect of time-constant predictor X varies linearly over time:

- β1 reflects shift in ln(h(tij)) associated with unit difference in X at time c

- effect associated with a unit difference in X increases (when β2 > 0) or decreases (when β2 < 0) linearly with time

- centering the TIME-variable at origin or median lifetime facilitates interpretation

• Effect of time-constant predictor varies piecewise over intervals of time

- divide time in k intervals represented by k time indicators D1-Dk

- interactions between X and time indicators allows different effect in each interval

(cfr. general specification in discrete-time model)

Singer et al., 2003, 503-606.

$$\ln h(t_{ij}) = \ln h_0(t_j) + \beta_1 X_i + \beta_2 X_i (TIME_{ij} - c)$$

$$\ln h(t_{ij}) = \ln h_0(t_j) + \beta_1 X_i D_{1ij} + \beta_2 X_i D_{2ij} + \dots + \beta_k X_i D_{kij}$$
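Under the linear interaction with time, the hazard ratio for a unit difference in X is no longer a single number but a function of time. A small Python sketch with purely hypothetical coefficient values (β1 = -0.40, β2 = 0.01, centering constant c = 20) shows how it would be evaluated:

```python
import math


def hazard_ratio_at(t, beta1, beta2, c=0.0):
    """Hazard ratio for a unit difference in X at time t: exp(beta1 + beta2 * (t - c))."""
    return math.exp(beta1 + beta2 * (t - c))


# Hypothetical values, chosen only for illustration.
beta1, beta2, c = -0.40, 0.01, 20.0

for t in (0, 10, 20, 30, 40):
    print(t, round(hazard_ratio_at(t, beta1, beta2, c), 3))
```

At t = c the interaction term vanishes and the ratio equals exp(β1), which is why centering TIME at a meaningful value makes β1 directly interpretable.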

Advanced Topics

1. Competing risks analysis

2. Repeated events and shared frailty

Multiple Decrement Life-table & Competing Risks

• Multiple-decrement life-table

Initial state          Transition      Destination states
Parental home (S1)     nest leaving    Living alone (S2)
                                       Cohabiting (S3)
                                       Married (S4)

Competing Risks: why consider multiple kinds of events?

• Different kinds of events can often be distinguished

- mortality: death from cancer, heart disease, accident,…

- transition to employment: white collar job, blue collar job, managerial job,…

- nest leaving: marriage, unmarried cohabitation, living single,…

• Effect of covariates may differ for each type of event

- effect of smoking on lung cancer, cardiovascular disease, car accident,…

- effect of regional labour markets on type of employment

- effect of values on unmarried cohabitation or marriage

• Time-path of effect may differ for each type of event

- effect of smoking on lung cancer, cardiovascular disease

- effect of education on type of employment

• Lumping together events may yield biased substantive conclusions

Allison, 1982; Allison, 1984, 42-50.

Competing Risks: a classification of multiple kinds of events

Allison, 1984, 42-50.

• Competing Risks

occurrence of one type of event removes the individual from the risk of another

event type; occurrence of one event is assumed to be noninformative for

occurrence of other event types

• Examples of competing risks

- e.g. death from competing causes: death from lung cancer (smoking) removes person from risk set of death from heart disease (obesity)

- voluntary versus involuntary job terminations

- marital dissolution: divorce versus death of spouse

• Strategy for analysis

Analyse type-specific or cause-specific hazard functions. Implementation differs in

discrete time and continuous time!

Discrete-time Models for Competing Risks

• Cause-specific ‘hazard’ in discrete time

Define a discrete-time hazard rate for each type of event:

• Competing risks & maximum likelihood

Unlike the likelihood function for the continuous-time model, the discrete-time

likelihood cannot be factored into separate components for the m types of events.

Hence maximum likelihood estimation must be done simultaneously for all kinds of

events.

• Strategy for analysis

Whereas the single-event model can be estimated using a person-period file and a

binary logit model, the multiple-event model can be estimated using a person-period

file and a multinomial logit model.

Allison, 1982; Neels, 2005.

$$P_{tj} = \Pr(T = t, J = j \mid T \geq t), \quad \text{where } P_t = \sum_{j=1}^{m} P_{tj}$$

Discrete-time Models for Competing Risks

• Example:

- 2 events (white collar job, blue collar job) versus censoring

- dependent variable with 3 categories:

0 = survival/censoring; 1 = event 1; 2 = event 2

- 2 simultaneous equations:

- survival is reference category for both equations

Allison, 1982; Neels, 2005.

$$\ln \frac{P(Y = 1)}{P(Y = 0)} = b_{1k} X_{1k} \equiv Z_1$$

$$\ln \frac{P(Y = 2)}{P(Y = 0)} = b_{2k} X_{2k} \equiv Z_2$$
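The two logit equations determine all three outcome probabilities in a person-period. The Python sketch below converts two linear predictors Z1 and Z2 into the probabilities of survival, event 1 and event 2; the function name and the values of Z1 and Z2 are hypothetical.

```python
import math


def multinomial_probabilities(z1, z2):
    """Probabilities for Y = 0 (survival/censoring), 1 and 2, with Y = 0 as reference."""
    denominator = 1.0 + math.exp(z1) + math.exp(z2)
    return {
        0: 1.0 / denominator,
        1: math.exp(z1) / denominator,    # discrete-time hazard of event 1
        2: math.exp(z2) / denominator,    # discrete-time hazard of event 2
    }


# Hypothetical linear predictors for one person-period.
probs = multinomial_probabilities(z1=-2.0, z2=-2.5)
overall_hazard = probs[1] + probs[2]      # P_t = sum of the cause-specific P_tj
print(probs, overall_hazard)
```

The overall discrete-time hazard is the sum of the two event probabilities, so the probability of surviving the interval is 1 minus that sum.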

Discrete-time Models for Competing Risks

* PART 2E COMPETING RISKS (schoolleaver.dta)

numlabel LOGIT_M3, add

logit TRANS i.INTERVAL national i.dip_11, or

mlogit LOGIT_M3 national i.INTERVAL i.dip_11, rrr

Allison, 1982; Neels, 2005.

Discrete-time Models for Competing Risks

Allison, 1982; Neels, 2005.

                          SINGLE DECREMENT*   COMPETING RISKS MODEL*
                          Employment          White collar/Unemploy   Blue collar/Unemploy
                          Exp(b)   Sig        Exp(b)   Sig            Exp(b)   Sig
CONSTANT                   .274    **          .141    **              .086    ***
TURK                      1.087    ns          .578    ***            1.224    ***
NONE                      Ref.     -           .206    ***            2.779    **
PRIMARY (Origin)          1.280    ns          .192    ***            3.973    ***
PRIMARY (Belgium)         1.212    ns          .197    ***            3.692    ***
LSE (Origin)              1.217    ns          .100    **             3.954    ***
LSE Voc&Tech (Belgium)    1.422    ns          .229    ***            4.331    ***
LSE Gen&Arts (Belgium)     .936    ns          .237    ***            2.617    **
HSE (Origin)              1.089    ns          .618    ns             1.757    ns
HSE Voc/Tech (Belgium)    1.329    ns          .239    ***            3.994    ***
HSE Gen/Arts (Belgium)    1.593    ns         1.013    ns             1.888    ns
Tertiary (Origin)          .614    ns          .177    ns             1.638    ns
Tertiary (Belgium)        1.262    ns         REF.     -              REF.     -

* Models include general specification of baseline hazard functions (parameter estimates omitted)

Continuous-time Models for Competing Risks

• Cause-specific hazard functions

Suppose there are m different kinds of events. Let j = 1, 2, …, m be the index distinguishing between the different kinds of events. Let Pj(t,t+s) denote the probability that event type j occurs in the interval between t and t+s, given that the individual is at risk at time t. Note that the individual is not at risk at time t if any of the m events has occurred prior to t. The cause-specific hazard rate hj(t) is the instantaneous risk of experiencing that type of event, given that the individual is still at risk (i.e. has not experienced this or any of the competing events earlier)

• Cause-specific hazard rates in continuous time:

overall hazard is just the sum of the different cause-specific hazard rates

Allison, 1982; Allison, 1984, 42-50.

$$h_j(t) = \lim_{s \to 0} \frac{P_j(t, t+s)}{s}$$

$$h(t) = \sum_{j=1}^{m} h_j(t)$$
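The limit definition and the additivity of cause-specific hazards can be checked numerically in a simple special case. The Python sketch below assumes constant cause-specific hazards (an assumption made only for this illustration, with hypothetical values), for which the probability of event type j in an interval of length s has a closed form.

```python
import math


def prob_event_j(h_j, all_hazards, s):
    """P_j(t, t+s) under constant cause-specific hazards (illustrative assumption)."""
    h = sum(all_hazards)                        # overall hazard
    return (h_j / h) * -math.expm1(-h * s)      # (h_j / h) * (1 - exp(-h * s))


# Hypothetical constant hazards for three competing causes.
cause_hazards = [0.002, 0.0015, 0.0005]
s = 1e-6                                        # a very short interval

approx = [prob_event_j(h_j, cause_hazards, s) / s for h_j in cause_hazards]
print(approx)          # each ratio approaches the corresponding h_j as s shrinks
print(sum(approx))     # and their sum approaches the overall hazard
```

As s shrinks, each ratio Pj(t,t+s)/s settles at the cause-specific hazard, and the ratios add up to the overall hazard.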

Continuous-time Models for Competing Risks

• Cause-specific hazard functions

each of the m cause-specific hazards can be expressed as a function of explanatory

variables. The most common specification is the proportional hazards model:

• Competing risks & partial-likelihood

partial likelihood estimation for covariate effects poses no particular problems

because likelihood function for the continuous-time model factors into a separate

component for each different kind of event. This implies that the covariate effects for

each type of event can be estimated separately.

• Strategy for analysis

estimate covariate effects for each of the m events as if there were only one kind of

event; events other than j are treated as if the individual were censored at the time

that the event occurred

Allison, 1982; Allison, 1984, 42-50.

$$\log h_j(t) = \log h_{0j}(t) + \beta_1 X_1 + \dots + \beta_k X_k$$

Continuous-time Models for Competing Risks

* CAUSE-SPECIFIC HAZARD FUNCTIONS: CANCER

stset MONTH_CONT, fail(CAUSE==1)

stcox i.EDUCATION

* CAUSE-SPECIFIC HAZARD FUNCTIONS: CIRCULATORY DISEASES

stset MONTH_CONT, fail(CAUSE==2)

stcox i.EDUCATION

* CAUSE-SPECIFIC HAZARD FUNCTIONS: RESPIRATORY DISEASES

stset MONTH_CONT, fail(CAUSE==3)

stcox i.EDUCATION

* CAUSE-SPECIFIC HAZARD FUNCTIONS: OTHER DISEASES

stset MONTH_CONT, fail(CAUSE==4)

stcox i.EDUCATION

Allison, 1982; Allison, 1984, 42-50.

Continuous-time Models for Competing Risks

• Similarity of covariate effects

null hypothesis that the set of coefficients associated with covariate effects is identical

across event types is tested by comparing goodness-of-fit statistics for the separate

models to that of the global model that does not distinguish between event types:

which is chi-square distributed with p(k-1) degrees of freedom, where p is the number

of predictors and k is the number of competing event-types.

Singer et al., 2003, 586-595.

$$-2LL_{\text{global model}} - \sum_{\text{event types}} \left(-2LL_{\text{event-specific model}}\right)$$

Continuous-time Models for Competing Risks

Overall Mortality: -2LL = 33664.643

Cause-specific mortality: -2LL = 33643.214

Cancer: -2LL = 12176.997

Circulatory: -2LL = 11287.542

Respiratory: -2LL = 3902.120

Other: -2LL = 6276.555

Difference in -2LL: 21.429 (df=12)

Effect of EDUCATION differs significantly across causes of death (the difference in -2LL of 21.429 exceeds the 5% critical chi-square value of 21.03 for df = 12)

Allison, 1982; Allison, 1984, 42-50.
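The comparison of the global model with the cause-specific models is plain arithmetic on the reported -2LL values. The Python sketch below retraces it; it assumes that EDUCATION enters each model as 4 dummy variables (5 categories), which is consistent with the reported 12 degrees of freedom but is not stated on the slide, and it uses a closed-form chi-square tail probability that is valid only for even degrees of freedom.

```python
import math

minus2ll_global = 33664.643
minus2ll_by_cause = {
    "cancer": 12176.997,
    "circulatory": 11287.542,
    "respiratory": 3902.120,
    "other": 6276.555,
}
n_predictors = 4                          # assumption: 4 EDUCATION dummies
n_event_types = len(minus2ll_by_cause)

statistic = minus2ll_global - sum(minus2ll_by_cause.values())
df = n_predictors * (n_event_types - 1)   # p(k-1)


def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!"""
    half = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(half))


p_value = chi2_upper_tail_even_df(statistic, df)
print(round(statistic, 3), df, round(p_value, 3))
```

The p-value falls just below 0.05, so the conclusion that the education effect differs by cause of death holds at the 5% level but not by a wide margin.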

Competing Risks: alternative approach

• Treating competing risks as censoring may yield biased estimates of cause-specific hazard functions: it assumes that individuals would have had the same risk of experiencing the event had they not experienced the competing event (e.g. entry into marriage vs unmarried cohabitation)

• Alternative strategy: nested model

- fit a discrete-time or continuous-time hazard model for partnership formation (event occurrence)

- estimate a binary or multinomial logit model of partnership type for individuals experiencing the event

Allison, 1982; Neels, 2005.

2. Repeated events and shared frailty

Repeated events and shared frailty

• Many events are repeatable in essence, e.g.

Job changes, births, marriages, divorces, arrests, convictions, visits to physician

• Spells

- interval between events for each individual, e.g.

| Observation period |

|---------------------x---------------------x---------------------x------------------|

0 Event 1 Event 2 Event 3 Censoring

- persons with 3 events have 4 spells, one of which is censored

- there is only 1 right censored interval per individual

- first interval may be left censored

Allison, 1984, 51-57.

Repeated events and shared frailty

• Strategy 1: separate analyses for each event

- e.g. analyse each birth interval separately

- useful if model is likely to differ from one event to another

e.g. first birth versus later births

- tedious and statistically inefficient if process is essentially similar

• Strategy 2: pooling spells

- treat interval between events for each individual as a separate observation:

i.e. generate ‘person-spell’-file

- estimate models for single- or multiple-decrement transitions

Allison, 1984, 51-57.

Repeated events and shared frailty

• Statistically independent spells

- people who are frequently arrested will continue to be frequently arrested

- spells are assumed to be statistically independent, conditional on covariates:

dependence between spells must be accounted for by covariates in the model

- assumption is likely to be violated, in which case estimates of standard errors are biased downward

• Solutions?

- include covariates that reflect individuals’ prior history

e.g. number of prior arrests, length of prior intervals,…

- modify estimated standard errors to reflect N individuals rather than spells

- introduce individual as a stratification variable:

i.e. different baseline hazard functions are allowed per individual, but parameter

estimates are constrained to be the same

- estimate models incorporating random coefficients for individuals

Allison, 1984, 51-57; Allison, 2004.

Repeated events and shared frailty: discrete-time model

* SPELLS NESTED IN PATIENTS: GENERATING PERSON-PERIOD-SPELL FILE

bysort patient: generate spell = _n

expand time

bysort patient spell: generate days = _n

generate infection = 0

replace infection = infect if time == days

* MODEL WITHOUT RANDOM EFFECT AT THE INDIVIDUAL LEVEL

gen months_lin = int(days/30)

gen months_qua = months_lin^2

cloglog infection months_lin months_qua age female, eform

* random-effects complementary-log-log model

xtset patient

xtcloglog infection months_lin months_qua age female, eform

* mixed-effects logistic regression (random intercept at the patient-level)

xtmelogit infection months_lin months_qua age female || patient:, or

predict log_frailtyxtmelogit, reffects

tab log_frailtyxtmelogit

gen frailtyxtmelogit = exp(log_frailtyxtmelogit)

tab frailtyxtmelogit

Allison, 1984, 51-57.

Repeated events and shared frailty

* SHARED FRAILTY IN MODELING RECURRENT EVENTS

* RANDOM-EFFECTS COX MODELS (kidney.dta)

list patient time infect age female in 1/10

* Declaring data to be survival data (stset) and descriptives

stset time, fail(infect)

sts graph

sts graph, by(female)

sts graph, by(female) hazard

* COX REGRESSION WITHOUT FRAILTY

stcox age female

* COX REGRESSION WITH SHARED FRAILTY

stcox age female, shared(patient)

predict log_frailtycox, effects

tab log_frailtycox

gen frailtycox = exp(log_frailtycox)

tab frailtycox

Allison, 1984, 51-57; Allison, 2004.

Repeated events and shared frailty

Cox regression -- Breslow method for ties

No. of subjects =           76                  Number of obs    =          76
No. of failures =           58
Time at risk    =         7424
                                                LR chi2(2)       =        6.67
Log likelihood  =   -185.10993                  Prob > chi2      =      0.0355

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.002245   .0091153     0.25   0.805     .9845377    1.020271
      female |   .4499194   .1340786    -2.68   0.007     .2508832    .8068592
------------------------------------------------------------------------------

Allison, 1984, 51-57; Allison, 2004.

[Figures: smoothed hazard estimates by female (0/1) against analysis time (0 to 600); Kaplan-Meier survival estimates by female (0/1) against analysis time (0 to 600)]

Repeated events and shared frailty

Cox regression --
         Breslow method for ties                Number of obs      =        76
         Gamma shared frailty                   Number of groups   =        38
Group variable: patient

No. of subjects =           76                  Obs per group: min =         2
No. of failures =           58                                 avg =         2
Time at risk    =         7424                                 max =         2

                                                Wald chi2(2)       =     11.66
Log likelihood  =   -181.97453                  Prob > chi2        =    0.0029

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.006202   .0120965     0.51   0.607     .9827701    1.030192
      female |   .2068678    .095708    -3.41   0.001     .0835376    .5122756
-------------+----------------------------------------------------------------
       theta |   .4754497   .2673108
------------------------------------------------------------------------------
Likelihood-ratio test of theta=0: chibar2(01) =     6.27 Prob>=chibar2 = 0.006

Allison, 1984, 51-57; Allison, 2004.

Suggested Reading

Blossfeld, H.-P., K. Golsch and G. Rohwer. 2007. Event History Analysis with Stata. London: LEA.

Cleves, M. A., W. W. Gould and R. G. Gutierrez. 2010. An Introduction to Survival Analysis using Stata. Third Edition. College Station: Stata Press.

Singer, J. D. and J. B. Willett. 2003. Applied Longitudinal Data Analysis. Oxford: Oxford University Press.
