Event History Analysis: Introductions
Sociology 229 Event History Analysis
Class 1
Copyright © 2008 by Evan Schofer
Do not copy or distribute without permission
Announcements
• Agenda
  – Introductions
  – Review syllabus
  – Short intro lecture
  – Break
  – Optional statistics review session (~2 hours)
Regression and EHA: Examples
• Medical Research on Drug Efficacy
• Question #1: Do patients with larger doses of a drug have lower cholesterol?
• Approach: OLS Regression
  – If assumptions are met, OLS is appropriate
  – Independent variable = dosage (“level” of drug)
  – Dependent variable = cholesterol (“level”)
Regression Example: Cholesterol
The relationship between the level of X and Y is modeled as a linear function:
Y = a + bX + e
[Figure: scatterplot of Cholesterol Level (vertical axis, 100 to 300) against Drug Dosage (mg) (horizontal axis, 0 to 70)]
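A minimal sketch of how the intercept a and slope b in Y = a + bX + e are estimated by OLS, using the closed-form least-squares formulas for a single predictor. The dosage and cholesterol values are hypothetical toy numbers, not data from the lecture, and `ols_fit` is an illustrative helper name.

```python
def ols_fit(x, y):
    """Return (a, b) for Y = a + bX + e, minimizing the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance-style cross products and the spread of X around its mean
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    b = sxy / sxx              # slope: change in Y per 1-unit change in X
    a = mean_y - b * mean_x    # intercept: predicted Y when X = 0
    return a, b

# Hypothetical toy data: drug dosage (mg) and cholesterol level
dose = [0, 10, 20, 30, 40, 50, 60, 70]
chol = [280, 262, 255, 231, 220, 198, 190, 165]
a, b = ols_fit(dose, chol)
print(f"a = {a:.1f}, b = {b:.2f}")
```

A negative b here would mean that higher doses go with lower cholesterol levels.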
Example 2: Drug & Mortality
• Suppose a different question:
• Does increased drug dosage reduce the incidence of mortality among patients?
• The dependent variable has a different character
• 1. Whereas cholesterol is measured as a “level” (continuously), mortality is “discrete”
• Either the patient lives or they don’t (not a “level”)
• 2. Also, TIMING is an issue
  – Not just whether a patient survives, but how long
  – A drug that extends life is good, even if patients eventually die
Logit/Probit Strategies
• Research strategies to address this problem:
• 1. Use a non-linear regression model for discrete outcomes: Logit, Probit, etc.
  – Dependent variable is a dummy variable for patient mortality
  – Look for a relationship between dosage and mortality
• Benefit: Easy. An analog of regression
• Limitation: Doesn’t take timing into account
  – All patients who die have the same influence on the model, whether the drug dosage lets them live 5 days or 20 years
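A sketch of the logit strategy, assuming a single predictor (dosage) and a 0/1 mortality dummy. It fits P(mortality) = 1 / (1 + exp(-(a + bX))) by Newton-Raphson maximum likelihood; the data and the helper names `logistic` and `fit_logit` are hypothetical. Note what the input lacks: only the dummy enters, so a death at day 5 and a death at year 20 look identical to the model.

```python
import math

def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, iters=25):
    """Fit P(y = 1) = logistic(a + b*x) by Newton-Raphson maximum likelihood."""
    a = b = 0.0
    for _ in range(iters):
        g_a = g_b = 0.0            # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0   # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = logistic(a + b * xi)
            w = p * (1.0 - p)
            g_a += yi - p
            g_b += (yi - p) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: add (information)^-1 times the gradient, written out for 2x2
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Hypothetical toy data: dosage (mg) and a mortality dummy (1 = died during study).
# Timing is discarded: every death is coded 1, however long the patient lived.
dose = [0, 0, 10, 10, 20, 20, 30, 30, 40, 40, 50, 50, 60, 60, 70, 70]
died = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0]
a, b = fit_logit(dose, died)
print(f"a = {a:.3f}, b = {b:.4f}")
```

In practice one would use a statistics package (e.g., Stata's `logit` or `probit`) rather than hand-written Newton steps; the loop is spelled out only to make the likelihood explicit.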
Logit/Probit Strategy: Visual
The relationship between the level of X and the discrete variable Y is modeled as a non-linear function
[Figure: Mortality (Yes/No) plotted against Drug Dosage (mg), 0 to 70]
Drug & Mortality: OLS Regression
• Option #2: Use OLS regression to model the time elapsed (duration) until mortality
  – Rather than ask “did they live or die?” (logit/probit), you ask “how long did they live?”
• Compute a variable that reflects the time until mortality (in relevant time units, e.g., months since drug therapy started)
• Model time as the dependent variable
• Observe: Do patients with high drug doses die later than ones with low doses?
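One way to compute the duration variable described above, assuming each patient record holds a therapy start date and, if observed, a death date. Months are counted as whole calendar months; that convention, the helper name `months_between`, and the patient records are hypothetical choices for illustration.

```python
from datetime import date

def months_between(start, end):
    """Whole months elapsed from start to end (a partial final month is dropped)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months

# Hypothetical patients: (id, therapy start date, death date or None if still alive)
patients = [
    ("p1", date(2001, 1, 10), date(2003, 7, 10)),
    ("p2", date(2001, 3, 5), date(2001, 9, 1)),
    ("p3", date(2001, 3, 5), None),  # alive when the study ends
]

# Dependent variable for the OLS duration strategy: months until mortality
months_until_mortality = {
    pid: (months_between(start, death) if death is not None else None)
    for pid, start, death in patients
}
print(months_until_mortality)
```

Patient p3 gets no value at all: the OLS duration strategy has no obvious place to put cases that are still alive when the study ends.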
OLS Duration Strategy: Visual
Q: Where do you put individuals who were alive at the end of the study?
[Figure: Months Until Mortality (vertical axis, 0 to 80) plotted against Drug Dosage (mg) (horizontal axis, 0 to 70)]
Drug & Mortality: OLS Regression
• Problem #1: What about patients who don’t experience mortality during study?
• This is called “censored data”
  – If the study lasts 80 months, you know that Y > 80…
  – But you don’t have an exact value
• What do you do?
  – Treat them as experiencing mortality at the very end of the study? Or approximate their time of mortality?
  – Exclude them? NO! That selects on the dependent variable!
• Possible solution: Use models for censored data
  – Ex: tobit model; “censored normal regression”
    » Stata: tobit, cnreg
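To make the censored-data idea concrete, here is a sketch of the log-likelihood that a censored normal (tobit-style) regression maximizes for right-censored durations, assuming normally distributed errors. An exactly observed duration contributes the normal log-density; a censored case contributes only log P(Y > c), the probability of lasting past the censoring point c. This is not Stata's implementation: the function names, parameter values, and data are hypothetical, and the step of maximizing over a, b, and sigma is left out.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def normal_logsf(c, mu, sigma):
    """log P(Y > c) for Y ~ Normal(mu, sigma), via the complementary error function."""
    z = (c - mu) / sigma
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

def censored_normal_loglik(a, b, sigma, x, y, censored):
    """Log-likelihood of Y = a + bX + e with e ~ Normal(0, sigma), right-censored cases allowed."""
    total = 0.0
    for xi, yi, ci in zip(x, y, censored):
        mu = a + b * xi
        if ci:
            total += normal_logsf(yi, mu, sigma)   # only know the true Y exceeds yi
        else:
            total += normal_logpdf(yi, mu, sigma)  # exact duration observed
    return total

# Hypothetical data: months until mortality; the study ends at month 80
dose = [10, 30, 50, 70]
months = [22.0, 41.0, 80.0, 80.0]
censored = [False, False, True, True]  # last two patients were alive at month 80
ll = censored_normal_loglik(20.0, 0.9, 15.0, dose, months, censored)
print(round(ll, 3))
```

Censored cases still carry information (they lasted at least 80 months) without being given a made-up death time.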
Drug & Mortality: OLS Regression
• Problem #2: Temporal data often violates normality assumption of OLS regression
• Often violations are quite bad
  – “Censored” data is a surmountable problem, but the normality violation usually is not
  – So we shouldn’t typically use OLS
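A small simulation illustrating why duration data tend to violate OLS normality, assuming durations come from a constant-rate (exponential) process, a common baseline in EHA. The exponential distribution has a skewness of 2, versus 0 for a normal distribution, so its durations pile up near zero with a long right tail. The seed, the 24-month mean, and the sample size are arbitrary choices.

```python
import random

def sample_skewness(values):
    """Moment-based skewness: 0 for a symmetric (e.g., normal) distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(229)
# Hypothetical durations: constant hazard with a mean of 24 months
exp_durations = [random.expovariate(1 / 24) for _ in range(20000)]
# Comparison: normally distributed values with the same mean and SD
normal_values = [random.gauss(24, 24) for _ in range(20000)]

skew_exp = sample_skewness(exp_durations)
skew_norm = sample_skewness(normal_values)
print(f"exponential durations: {skew_exp:.2f}, normal values: {skew_norm:.2f}")
```

The normal comparison values also go negative, which a duration never can; that is a second way the normal model fits timing data poorly.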
Drug and Mortality: EHA Strategy
• Event History Analysis (EHA) provides purchase on this exact type of problem
  – And others, as well
• In essence, EHA models a dependent variable that reflects both:
  – 1. Whether or not a patient experiences mortality (like logit), and…
  – 2. When it occurs (like an OLS regression of duration)
• Note: This information is typically encoded in 2 or more variables
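A minimal sketch of the usual two-variable encoding, assuming every patient starts therapy at month 0 and the study lasts 80 months (the hypothetical window from the censoring example): `duration` records how long the case was observed, and `event` records whether mortality was observed (1) or the case was censored (0). The `Spell` class and `make_spell` helper are illustrative names, not a standard API.

```python
from dataclasses import dataclass
from typing import Optional

STUDY_LENGTH_MONTHS = 80  # hypothetical study window

@dataclass(frozen=True)
class Spell:
    """One case in EHA format: how long it was observed, and whether the event occurred."""
    case_id: str
    duration: float  # months from start of therapy to mortality or to end of study
    event: int       # 1 = mortality observed, 0 = censored (still alive at study end)

def make_spell(case_id: str, months_to_death: Optional[float],
               study_length: float = STUDY_LENGTH_MONTHS) -> Spell:
    """Encode one patient as a (duration, event) pair."""
    if months_to_death is not None and months_to_death <= study_length:
        return Spell(case_id, months_to_death, 1)
    # Death not observed within the window: record time under observation, flag as censored
    return Spell(case_id, study_length, 0)

spells = [make_spell("p1", 30), make_spell("p2", 5), make_spell("p3", None)]
for s in spells:
    print(s)
```

The pair carries both pieces of information at once: `event` plays the role of the logit dummy, and `duration` the role of the OLS time variable, with censored cases kept in the data.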
EHA: Overview and Terminology
• EHA is referred to as “dynamic” modeling
  – i.e., it addresses the timing of outcomes: rates
• Dependent variable is best conceptualized as a rate of some occurrence
  – Not a “level” or “amount” as in OLS regression
  – Think: “How fast?” “How often?”
• The “occurrence” may be something that can occur only once for each case: e.g., mortality
• Or, it may be repeatable: e.g., marriages, strategic alliances.
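A sketch of the simplest descriptive rate: total events divided by total time at risk (person-time). It assumes the rate is roughly constant over the observation window, which real EHA models usually relax. The same function handles once-only events (mortality: 0 or 1 per case) and repeatable events (strategic alliances: any count per case); `occurrence_rate` is an illustrative name and all numbers are hypothetical.

```python
def occurrence_rate(cases):
    """Crude rate: total events divided by total time at risk (person-time).

    Each case is (time_at_risk, number_of_events). For a once-only event such as
    mortality the count is 0 or 1; for a repeatable event it can be larger.
    """
    total_events = sum(events for _, events in cases)
    total_time = sum(time for time, _ in cases)
    if total_time <= 0:
        raise ValueError("total time at risk must be positive")
    return total_events / total_time

# Hypothetical mortality data: (months observed, died?)
mortality = [(12, 1), (80, 0), (30, 1), (80, 0)]
# Hypothetical firm data: (years observed, strategic alliances formed)
alliances = [(10, 3), (6, 0), (8, 2)]

print(occurrence_rate(mortality))  # deaths per patient-month
print(occurrence_rate(alliances))  # alliances per firm-year
```

Dividing by time at risk is what turns a count into a rate: a censored patient who survived 80 months adds no events but a lot of exposure, which lowers the estimated mortality rate.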
EHA: Overview
• EHA involves both descriptive and parametric analysis of data
• Just like regression
  – Scatterplots, partial plots = descriptive
  – OLS model/hypothesis tests = parametric
• Descriptive analyses/plots
  – Allow description of the overall rate of some outcome
  – For all cases, or for various subgroups
• Parametric models
  – Allow hypothesis testing about variables that affect the rate (and can include control variables)
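The slide does not name a specific descriptive tool; one standard choice, swapped in here, is the Kaplan-Meier estimator of the survivor function, which handles censored cases by keeping them in the risk set until they drop out. A minimal sketch, assuming `durations` and `events` use the (duration, event) encoding with 1 = event observed and 0 = censored; `kaplan_meier` is an illustrative name. Running it separately on subgroups gives the subgroup comparisons mentioned above.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct time an event occurs."""
    data = sorted(zip(durations, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather every case (event or censored) with this exact duration
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Conditional probability of surviving past t, given survival to t
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censored cases leave the risk set after t
    return curve

# Hypothetical data: months observed and event indicator
durations = [2, 3, 3, 5, 8, 8, 10]
events = [1, 1, 0, 1, 0, 1, 0]
for t, s in kaplan_meier(durations, events):
    print(t, round(s, 3))
```

By the usual convention, cases censored at the same time as an event are still counted as at risk at that time, which is why the risk set is reduced only after the survival update.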
EHA: Types of Questions
• Some types of questions EHA can address:
• 1. Mortality: Does drug dosage reduce rates?
  – Does the “rate” decrease with larger doses?
  – Also: control for race, gender, treatment options, etc.
• 2. Life stage transitions: timing of marriage
  – Is the rate affected by gender, class, religion?
• 3. Organizational mortality
  – Is the rate affected by size, historical era, competition?
• 4. Inter-state war
  – Is the rate affected by economic or political factors?
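As a first taste of a parametric model for question 1, here is a sketch of the simplest one, a constant-rate (exponential) model with log(rate) = b0 + b1 * high_dose, where `high_dose` is a hypothetical 0/1 indicator. For this model with one binary covariate, the maximum-likelihood rate in each group is events divided by time at risk, so exp(b1) is the ratio of the high-dose to the low-dose mortality rate; b1 < 0 would mean larger doses reduce the rate. The function name and data are hypothetical; control variables, continuous dosage, and hypothesis tests need iterative fitting and are not shown.

```python
import math

def exponential_binary_fit(spells):
    """Fit log(rate) = b0 + b1 * group under a constant-rate (exponential) model.

    spells: iterable of (duration, event, group) with event and group in {0, 1}.
    The MLE of each group's rate is its event count divided by its time at risk.
    """
    events = [0, 0]
    exposure = [0.0, 0.0]
    for duration, event, group in spells:
        events[group] += event
        exposure[group] += duration
    if min(events) == 0:
        raise ValueError("each group needs at least one event for a finite estimate")
    rate_low = events[0] / exposure[0]
    rate_high = events[1] / exposure[1]
    b0 = math.log(rate_low)               # log rate in the low-dose group
    b1 = math.log(rate_high / rate_low)   # log rate ratio, high vs. low dose
    return b0, b1

# Hypothetical data: (months observed, died?, high_dose?)
data = [
    (10, 1, 0), (20, 1, 0), (80, 0, 0), (30, 1, 0),
    (60, 1, 1), (80, 0, 1), (80, 0, 1), (45, 1, 1),
]
b0, b1 = exponential_binary_fit(data)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, rate ratio = {math.exp(b1):.3f}")
```

Unlike the logit, this model uses both the event indicator and the time at risk: a high-dose patient who lives 60 months before dying adds more exposure than a low-dose patient who dies at 10 months.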