Quantifying Disease Outcomes

Readings

• Jewell Chapters 2 & 3

• Rothman and Greenland Chapter 3

Types of disease outcomes

Random variables representing disease outcomes in epidemiological studies may be

• Continuous – IQ in the methylmercury study, lung function in the “Home Allergen Study”

• A count – number of wheeze episodes in the past month in the Home Allergen Study

• Categorical – whether the patient died in hospital, was transferred to a nursing home or sent home

• Binary – whether or not a study subject develops cancer, whether or not the child has asthma

Binary outcomes are probably the most commonly encountered in epidemiology and much of our course will focus on this case.

Quantifying the rates of disease

More complicated than one might think! Good study design requires careful consideration. E.g.,

• What proportion of women will be diagnosed with breast cancer in the next year?

• What proportion of women will be diagnosed with breast cancer in their lifetime?

• What proportion of women have breast cancer now?

Age affects all these definitions – we will come back to this issue presently.

Prevalence vs Incidence (Jewell p10)

• Point Prevalence (prevalence) = proportion of a defined population who have the disease at a specified point in time

• Interval Prevalence = proportion who have the disease at any point in an interval of time

• Incidence Proportion = proportion of those at risk for a disease at the beginning of an interval who have it by the end of the interval. Also called Cumulative Incidence Proportion

Prevalence vs incidence (cont’d)

Adapted from Jewell Fig 2.1. Lines represent duration of disease

In a hypothetical starting population of 100 individuals, 6 have disease between times t0 and t1.

Point prevalence at t is 4/100 if case 4 is still considered at risk, 4/99 otherwise.

Incidence proportion in [t0,t1] is 4/98 (cases 1 and 4 were not at risk because they already had the disease)

Interpreting disease prevalence

Caution needed: a rare chronic disease might have the same prevalence as a common disease that kills quickly. E.g. the Framingham data in the table below.

Prevalence can be very useful for non-fatal chronic conditions. E.g. prevalence of obesity (BMI > 30) is 22% in Australia, 34% in USA, 3% in Japan, 5% in China.

Ten-year coronary heart disease (CHD) rates from the Framingham Heart Study (Jewell Table 2.1). Counts with column percentages.

                   Incidence                  Prevalence
Cholesterol    CHD         No CHD         CHD         No CHD
High           85 (75%)    462 (47%)      38 (54%)    371 (52%)
Low            28 (25%)    516 (53%)      33 (46%)    347 (48%)

Disease Rates

Incidence proportions can be difficult to interpret over long intervals of time:

– Risk may vary within the interval

– Doesn’t reflect variation in time to disease onset

– Doesn’t account for loss to follow-up

A disease incidence rate in the time period from t0 to t1 can be defined as

$$h = \frac{\#\text{ events in } (t_0, t_1)}{\text{population time at risk in } (t_0, t_1)}$$

Illustration

Point prevalence at t=0 is 0/5 and at t=5 is 1/2

Incidence proportion from 0 to 5 is 3/5

Incidence rate in [0,5]: 3/(5+1+4+3+1)=3/14

Adapted from Jewell Fig 2.2. X denotes disease onset, O death.
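The arithmetic of the illustration can be reproduced in a few lines of R. This is only a sketch: the vector `person_time` holds the five at-risk times read off the incidence-rate line above (5, 1, 4, 3, 1), and the variable names are illustrative, not part of the original slides.

```r
# Time at risk contributed by each of the 5 subjects in [0, 5]
# (values taken from the incidence-rate calculation in the illustration)
person_time <- c(5, 1, 4, 3, 1)
n_events    <- 3                      # disease onsets observed in [0, 5]

# Incidence proportion: events divided by number at risk at the start
incidence_proportion <- n_events / length(person_time)

# Incidence rate: events divided by total person-time at risk
incidence_rate <- n_events / sum(person_time)

print(c(proportion = incidence_proportion, rate = incidence_rate))
```

The proportion ignores how long each subject was followed, whereas the rate divides by the 14 units of person-time actually at risk.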

Incidence rates and survival analysis

A disease incidence rate has a close connection with hazard functions from survival analysis. Consider an instantaneous incidence proportion (per unit time) obtained by letting the time interval go to zero:

$$h(t) = \lim_{\Delta \to 0} \frac{1}{\Delta} \cdot \frac{\#\text{ new cases in } (t, t+\Delta)}{\#\text{ at risk for disease at time } t}$$

Let I(t) be the incidence proportion over [0,t], for t in the study period [0,T], in a population of N individuals. Then in the absence of censoring,

$$h(t) = \lim_{\Delta \to 0} \frac{1}{\Delta} \cdot \frac{N\,(I(t+\Delta) - I(t))}{N\,(1 - I(t))} = \frac{I'(t)}{1 - I(t)}$$

This is the standard definition of a hazard function, with I(t) corresponding to a standard cdf.
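As a quick numerical check of the relation h(t) = I'(t)/(1 − I(t)), the sketch below assumes an exponential time to disease, so that I(t) = 1 − exp(−ht) and the hazard should come out constant. The rate `h_true = 0.2`, the evaluation times and the step `delta` are arbitrary illustrative choices, not values from the slides.

```r
h_true <- 0.2                          # assumed constant hazard (illustrative)
I <- function(t) 1 - exp(-h_true * t)  # incidence proportion = exponential cdf

# Finite-difference version of the limit defining the hazard
hazard_approx <- function(t, delta = 1e-6) {
  (I(t + delta) - I(t)) / delta / (1 - I(t))
}

# The approximation should return roughly h_true at every t
times <- c(0.5, 3, 10)
print(sapply(times, hazard_approx))
```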

Incidence rates (cont’d)

Standard survival analysis tools can be used to compute hazard and incidence functions

• Incidence function estimated by 1-S(t), with S(t) estimated using the Kaplan-Meier curve

• Nelson-Aalen estimate of the cumulative hazard function

• Lifetable methods (very common)

• Exponential models

Figure: Mortality hazard for California Males in 1980 (Adapted from Jewell Fig 2.3)
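To make the first two tools concrete, here is a minimal base-R sketch of the Kaplan-Meier and Nelson-Aalen calculations. The six follow-up times and event indicators are invented toy values used purely for illustration; in practice one would more likely call `survfit()` from the `survival` package rather than code the estimators by hand.

```r
# Toy data (hypothetical): follow-up time and status (1 = event, 0 = censored)
time   <- c(2, 3, 3, 5, 7, 8)
status <- c(1, 1, 0, 1, 0, 1)

# Distinct event times, number at risk and number of events at each
event_times <- sort(unique(time[status == 1]))
n_risk  <- sapply(event_times, function(t) sum(time >= t))
n_event <- sapply(event_times, function(t) sum(time == t & status == 1))

# Kaplan-Meier survival estimate and the implied incidence function 1 - S(t)
S_km      <- cumprod(1 - n_event / n_risk)
incidence <- 1 - S_km

# Nelson-Aalen estimate of the cumulative hazard
H_na <- cumsum(n_event / n_risk)

print(data.frame(event_times, n_risk, n_event, S_km, incidence, H_na))
```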

Constant hazards – closed population

• Assume time to event follows an exponential distribution with rate h.

• Let Ti be the age the event occurs for subject i.

• The log-likelihood can be written as:

$$l(h) = \sum_{i=1}^{n} \left( \log h - h T_i \right)$$

• And simple algebra shows that

$$\hat{h} = n \Big/ \sum_{i=1}^{n} T_i$$

• These calculations assume a closed population where a group of n individuals begins observation at time 0, no new individuals can enter, and individuals leave the population only by experiencing the event.
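The "simple algebra" can be spelled out as follows. This is the usual maximum-likelihood argument written in the notation of this slide (T_i the event ages, n the number of subjects); it is a worked sketch, not a reproduction of anything on the original slides.

```latex
\begin{align*}
l(h) &= \sum_{i=1}^{n} \left( \log h - h T_i \right)
      = n \log h - h \sum_{i=1}^{n} T_i \\
\frac{dl}{dh} &= \frac{n}{h} - \sum_{i=1}^{n} T_i = 0
      \quad\Longrightarrow\quad
      \hat{h} = \frac{n}{\sum_{i=1}^{n} T_i} \\
\frac{d^2 l}{dh^2} &= -\frac{n}{h^2} < 0
      \quad \text{(so the stationary point is a maximum)}
\end{align*}
```

In words, the estimated rate is the number of events divided by the total time at risk, which is the incidence rate defined earlier.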

Constant hazards – open population

Open population: allows for new entry (not necessarily at age 0 - left truncation). Subjects can leave for reasons unrelated to occurrence of the event of interest (censoring). Assume exponential with rate h and let

– Ti be the age that subject i leaves the population

– di be an event indicator (1 if the subject experiences the event, 0 if censored)

– Ei be the age when the subject entered the population.

The likelihood is:

$$L(h) = \prod_{i=1}^{n} h^{d_i} \exp\left( -\int_{E_i}^{T_i} h \, dt \right)$$

and the mle of h is

$$\hat{h} = \sum_{i=1}^{n} d_i \Big/ \sum_{i=1}^{n} R_i = d / R$$

where R_i = T_i − E_i is the person-years contributed by subject i, d is the total # events and R is the total person time at risk.
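A small R sketch of the open-population estimate d/R follows. For data it borrows the seven subjects of the hypothetical example that appears later in these notes, under the assumption that T_i is the age at onset or, when there is no onset, the age at death. The `optimize()` call is only a sanity check that d/R maximises the log-likelihood; the variable names are illustrative.

```r
entry <- c(71, 65, 60, 61, 69, 62, 64)   # E_i: age at entry
exit  <- c(75, 72, 72, 79, 72, 67, 77)   # T_i: age at onset, or at death if no onset
event <- c(1, 0, 1, 1, 1, 1, 0)          # d_i: 1 = event, 0 = censored

R_i   <- exit - entry                    # person-years contributed by each subject
h_hat <- sum(event) / sum(R_i)           # closed-form mle d / R

# Numerical check: maximise log L(h) = d log(h) - h R directly
loglik <- function(h) sum(event) * log(h) - h * sum(R_i)
opt <- optimize(loglik, interval = c(1e-6, 1), maximum = TRUE)

print(c(closed_form = h_hat, numerical = opt$maximum))
```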

Piecewise constant hazards

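Let h(t) be the hazard at time t and, as before, let

– Ti be the age that subject i leaves the population

– Ei be the age when the subject entered the population.

But now assume a different constant hazard in each agegroup. Let hk be the hazard in agegroup k, for k=1,…,K. Let δik be an indicator of whether subject i experienced the event in agegroup k and rik be the time subject i was at risk in agegroup k, so that di = Σk δik indicates whether subject i experienced the event at all.

The likelihood is:

$$L = \prod_{i=1}^{n} h(T_i)^{d_i} \exp\left( -\int_{E_i}^{T_i} h(t) \, dt \right) = \prod_{i=1}^{n} \prod_{k=1}^{K} h_k^{\delta_{ik}} \exp\left( -h_k r_{ik} \right)$$

and the mle of h_k is

$$\hat{h}_k = \sum_{i=1}^{n} \delta_{ik} \Big/ \sum_{i=1}^{n} r_{ik}$$

or # events in agegroup k / person time at risk in agegroup k.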

Hypothetical Example

Age at entry   Age at onset   Age at death   (δi1, δi2)   (ri1, ri2)
71             75             79             (0,1)        (0,4)
65             -              72             (0,0)        (5,2)
60             72             72             (0,1)        (10,2)
61             79             80             (0,1)        (9,9)
69             72             75             (0,1)        (1,2)
62             67             68             (1,0)        (5,0)
64             -              77             (0,0)        (6,7)

Data for estimating constant hazards for [60,70) and [70,80)

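Estimated hazard: agegroup [60,70): 1 event / 36 person-years ≈ 0.028 per person-year

agegroup [70,80): 4 events / 26 person-years ≈ 0.154 per person-year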

Note: in practice might do more precise actuarial adjustments (e.g. half year contributions to time at risk)
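The split of each subject's follow-up into the two agegroups can be sketched in R as below. The entry ages, exit ages and event indicators are read from the hypothetical example table, assuming a subject stops being at risk at onset or, without onset, at death. No actuarial half-year adjustment is applied, and the helper names are illustrative.

```r
entry <- c(71, 65, 60, 61, 69, 62, 64)   # age at entry
exit  <- c(75, 72, 72, 79, 72, 67, 77)   # age at onset, or at death if no onset
event <- c(1, 0, 1, 1, 1, 1, 0)          # 1 = disease onset observed

lower <- c(60, 70)                       # agegroup k covers [lower[k], upper[k])
upper <- c(70, 80)

# r[i, k]: time subject i spends at risk inside agegroup k
r <- sapply(seq_along(lower), function(k)
  pmax(0, pmin(exit, upper[k]) - pmax(entry, lower[k])))

# delta[i, k]: 1 if subject i has the event while in agegroup k
delta <- sapply(seq_along(lower), function(k)
  as.numeric(event == 1 & exit >= lower[k] & exit < upper[k]))

# mle in each agegroup: events / person-time at risk
h_hat <- colSums(delta) / colSums(r)
print(rbind(events = colSums(delta), pyr = colSums(r), hazard = h_hat))
```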

Analysis via Poisson Regression

• Create a line of data for each individual in each interval where they were at risk

• Include agegroup as a binary covariate

• Include log(PYR) as an offset

• Poisson regression models the rate on the log scale, so the coefficients need to be exponentiated to get estimated rates in each interval

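# Poisson regression for Hypothetical Example
# One record per subject per agegroup in which the subject was at risk
y   <- c(1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0)    # event indicator
age <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1)    # 0 = [60,70), 1 = [70,80)
pyr <- c(4, 5, 2, 10, 2, 9, 9, 1, 2, 5, 6, 7)   # person-years at risk
summary(glm(y ~ age, offset = log(pyr), family = "poisson"))

            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -3.583      1.000  -3.584 0.000339
age            1.712      1.118   1.531 0.125768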
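Converting the log-scale coefficients back to rates can be sketched as follows. The data vectors are the ones from the Poisson regression above; the names `rate_60_70`, `rate_70_80` and `rate_ratio` are illustrative.

```r
y   <- c(1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0)    # event indicator
age <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1)    # 0 = [60,70), 1 = [70,80)
pyr <- c(4, 5, 2, 10, 2, 9, 9, 1, 2, 5, 6, 7)   # person-years at risk

fit <- glm(y ~ age, offset = log(pyr), family = poisson)
b   <- coef(fit)

# The intercept is the log rate in the reference agegroup [60,70);
# adding the age coefficient gives the log rate in [70,80)
rate_60_70 <- exp(b[["(Intercept)"]])
rate_70_80 <- exp(b[["(Intercept)"]] + b[["age"]])
rate_ratio <- exp(b[["age"]])

print(c(rate_60_70 = rate_60_70, rate_70_80 = rate_70_80, rate_ratio = rate_ratio))
```

Because the model has one parameter per agegroup, the fitted rates should agree with the events / person-time estimates, 1/36 and 4/26.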

Real Example – arsenic in drinking water

SW Taiwan population

Agegrp   PYR       events
22.5     2595529   7
27.5     1846189   19
32.5     1402764   17
37.5     1215899   41
42.5     1191615   75
47.5     1111810   112
52.5     957985    160
57.5     774836    200
62.5     634758    258
67.5     492203    230
72.5     342767    190
77.5     199630    108
82.5     96293     45

High arsenic village

Agegrp   PYR    events
22.5     1861   0
27.5     987    0
32.5     928    0
37.5     759    0
42.5     758    0
47.5     815    0
52.5     798    1
57.5     544    4
62.5     401    3
67.5     236    1
72.5     126    1
77.5     70     0
82.5     59     0

Data extracted from public population and mortality records and cancer registry, then reported in terms of person years at risk (PYR) in 5-year agegroups, as well as numbers of cancers in each agegroup. Table shows agegroup midpoints.

Standardized Rates

Consider a comparison of female lung cancer incidence rates between US and Taiwan. This is hard because

– Age distributions vary between the two countries

– Incidence rates vary substantially with age

We’ll discuss three different ways to address this issue.

– Direct Standardization – recalculate # cases so as to calibrate to an appropriate external population

– Indirect Standardization – compute ratio of observed to expected cases, with expecteds computed with appropriate adjustment for the age, gender and ethnicity mix of the population of interest.

– Regression-based adjustments

Direct Standardization

Suppose hk, k=1,…,K are the age-specific incidence rates for a population of interest. The following expression represents the average incidence rate that would be seen in a standard population that had age-specific person-years-at-risk of r1,…,rK:

$$I_S = \sum_{k=1}^{K} r_k h_k \Big/ \sum_{k=1}^{K} r_k$$

– Can replace PYR by population size in each agegroup

– Sometimes called external standardization.

– Often reported per 100,000 population

Example – stomach cancer, Denmark males 1988-92

d_i = # new stomach cancer cases (1988-1992) in the ith agegroup,

y_i is the person-years-at-risk in the ith agegroup (# males in the agegroup times 5),

w_i is the number of persons in the ith agegroup per 100,000 standard world population.

The age standardized incidence rate per 100,000 world population for stomach cancers among Danish males is Σ w_i d_i / y_i = 9.03.

Age   di    yi        wi      wi*di/yi
0     0     749800    12000   0.00
5     0     695500    10000   0.00
10    0     808900    9000    0.00
15    1     931100    9000    0.01
20    2     1017500   8000    0.02
25    6     1032700   8000    0.05
30    4     955800    6000    0.02
35    16    946500    6000    0.10
40    34    1025500   6000    0.20
45    76    926900    6000    0.49
50    97    718900    5000    0.68
55    150   626800    4000    0.96
60    187   590800    4000    1.27
65    302   553100    3000    1.64
70    315   449900    2000    1.40
75    309   337200    1000    0.92
80    247   196200    500     0.63
85    152   115700    500     0.66
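The direct standardization for the Danish data can be reproduced with the short R sketch below. The vectors `d`, `y` and `w` are transcribed from the table (cases, person-years and world-standard weights by agegroup); the crude rate is added only for comparison and is not reported on the slide.

```r
# Danish males 1988-92, agegroups starting at 0, 5, ..., 85
d <- c(0, 0, 0, 1, 2, 6, 4, 16, 34, 76, 97, 150, 187, 302, 315, 309, 247, 152)
y <- c(749800, 695500, 808900, 931100, 1017500, 1032700, 955800, 946500,
       1025500, 926900, 718900, 626800, 590800, 553100, 449900, 337200,
       196200, 115700)
w <- c(12000, 10000, 9000, 9000, 8000, 8000, 6000, 6000, 6000, 6000,
       5000, 4000, 4000, 3000, 2000, 1000, 500, 500)

# The world-standard weights add up to 100,000 persons
stopifnot(sum(w) == 100000)

# Age-standardized rate per 100,000: sum over agegroups of w_i * d_i / y_i
asr <- sum(w * d / y)

# Crude (unstandardized) rate per 100,000 person-years, for comparison
crude <- 1e5 * sum(d) / sum(y)

print(round(c(age_standardized = asr, crude = crude), 2))
```

Since the weights sum to 100,000, the weighted sum is already the standardized rate per 100,000 and no further division by Σ w_i is needed.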

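Back to arsenic example

Average incidence rate in whole pop: 1462 events / 12,862,278 PYR ≈ 11.4 per 100,000 person-years

Average rate in high arsenic village: 10 events / 8,342 PYR ≈ 119.9 per 100,000 person-years

Village rate standardized to whole population: Σ r_k h_k / Σ r_k ≈ 127.9 per 100,000 person-years, with r_k the Taiwan PYR and h_k the village rate in agegroup k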
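The three rates for the arsenic example can be computed from the two data tables as sketched below. Reporting per 100,000 person-years is an assumption made here for readability, and the variable names are illustrative.

```r
# SW Taiwan population and high arsenic village, agegroup midpoints 22.5 to 82.5
taiwan_pyr     <- c(2595529, 1846189, 1402764, 1215899, 1191615, 1111810,
                    957985, 774836, 634758, 492203, 342767, 199630, 96293)
taiwan_events  <- c(7, 19, 17, 41, 75, 112, 160, 200, 258, 230, 190, 108, 45)
village_pyr    <- c(1861, 987, 928, 759, 758, 815, 798, 544, 401, 236, 126, 70, 59)
village_events <- c(0, 0, 0, 0, 0, 0, 1, 4, 3, 1, 1, 0, 0)

per <- 1e5   # report rates per 100,000 person-years (assumed scale)

# Crude average rates: total events / total person-years
rate_whole   <- per * sum(taiwan_events) / sum(taiwan_pyr)
rate_village <- per * sum(village_events) / sum(village_pyr)

# Direct standardization: village age-specific rates h_k applied to
# the whole-population person-years r_k
h_village        <- village_events / village_pyr
rate_village_std <- per * sum(taiwan_pyr * h_village) / sum(taiwan_pyr)

print(round(c(whole = rate_whole, village = rate_village,
              village_standardized = rate_village_std), 1))
```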

Role of standardized rates of disease

• For a stand-alone epidemiological study, standardization is not so much of an issue. The biggest decision has to do with how to model the data.

• For vital statistics record keeping, it is very important. The International Agency for Research on Cancer (IARC) has a great website that defines many of the relevant terms and why they are important.

http://www-dep.iarc.fr/glossary.htm

Indirect standardization

Example – Cape Cod, MA

           Whole State         Cape Cod
agegroup   pop      cases      pop     expected
5-24       819538   20         12717   0
25-34      552659   768        8881    12
35-44      465950   3619       8601    67
45-54      306719   6014       5430    106
55-64      272295   7357       5809    157
65-74      262749   9723       6189    229
75-84      173447   6919       3604    144
85+        68434    2013       1386    41

Total of 864 cases, expected 756. So SIR = 100*864/756=114.

The Standardized Incidence Ratio (SIR) is the ratio of observed new cases to the number expected if the population of interest experienced disease at the same rate as a comparison population, often multiplied by 100. The Standardized Mortality Ratio (SMR) does the same calculation, but for deaths.
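The expected counts and the SIR for Cape Cod can be reproduced with the sketch below. The population and case vectors are transcribed from the table, and the observed total of 864 is taken from the text since the Cape Cod cases are not broken down by agegroup there. Variable names are illustrative.

```r
# Whole state (comparison population), agegroups 5-24, 25-34, ..., 85+
state_pop   <- c(819538, 552659, 465950, 306719, 272295, 262749, 173447, 68434)
state_cases <- c(20, 768, 3619, 6014, 7357, 9723, 6919, 2013)

# Cape Cod (population of interest)
cape_pop <- c(12717, 8881, 8601, 5430, 5809, 6189, 3604, 1386)
observed <- 864   # total observed cases on Cape Cod, as reported in the text

# Expected cases: state age-specific rates applied to Cape Cod population
expected_k <- cape_pop * state_cases / state_pop
expected   <- sum(expected_k)

# SIR on the conventional x100 scale
sir <- 100 * observed / expected

print(round(expected_k))
print(round(c(expected = expected, SIR = sir), 1))
```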

An SIR for the arsenic example

Agegroup midpoint   Taiwan pyr   Taiwan cancers   Village pyr   Village cancers   Expected
22.5                2595529      7                1861          0                 0.0050
27.5                1846189      19               987           0                 0.0102
32.5                1402764      17               928           0                 0.0112
37.5                1215899      41               759           0                 0.0256
42.5                1191615      75               758           0                 0.0477
47.5                1111810      112              815           0                 0.0821
52.5                957985       160              798           1                 0.1333
57.5                774836       200              544           4                 0.1404
62.5                634758       258              401           3                 0.1630
67.5                492203       230              236           1                 0.1103
72.5                342767       190              126           1                 0.0698
77.5                199630       108              70            0                 0.0379
82.5                96293        45               59            0                 0.0276
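The table stops at the expected counts, so the sketch below carries the calculation through to the SIR. It assumes the same definition as in the Cape Cod example (observed over expected, with expected counts from the Taiwan age-specific rates applied to village person-years); variable names are illustrative.

```r
taiwan_pyr     <- c(2595529, 1846189, 1402764, 1215899, 1191615, 1111810,
                    957985, 774836, 634758, 492203, 342767, 199630, 96293)
taiwan_events  <- c(7, 19, 17, 41, 75, 112, 160, 200, 258, 230, 190, 108, 45)
village_pyr    <- c(1861, 987, 928, 759, 758, 815, 798, 544, 401, 236, 126, 70, 59)
village_events <- c(0, 0, 0, 0, 0, 0, 1, 4, 3, 1, 1, 0, 0)

# Expected village cancers if the village had the Taiwan age-specific rates
expected_k <- village_pyr * taiwan_events / taiwan_pyr

observed <- sum(village_events)
expected <- sum(expected_k)
sir      <- observed / expected     # multiply by 100 for the x100 convention

print(round(expected_k, 4))
print(c(observed = observed, expected = round(expected, 3), SIR = round(sir, 2)))
```

With 10 observed cancers against fewer than one expected, the ratio is well above 1, in line with the crude comparison of village and whole-population rates.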
