Lecture 21: Poisson regression, log-linear regression
BMTRY 701 Biostatistical Methods II

Upload: wenda

Post on 22-Jan-2016


Page 1: Lecture 21: poisson regression log-linear regression

Lecture 21: Poisson regression, log-linear regression

BMTRY 701

Biostatistical Methods II

Page 2: Lecture 21: poisson regression log-linear regression

Poisson distribution

Used for count data
Generally, rare events in space or time
Upper limit is theoretically infinite
Examples:
• earthquakes, hurricanes
• cancer incidence (spatial)
• absences in school year
• AIDS deaths in a region

Assessing disease in different groups:
• Probability, Risk, Rate, Incidence, Prevalence

Page 3: Lecture 21: poisson regression log-linear regression

The Poisson distribution

Probability mass function:

$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots$$

Approximates a binomial for rare events
Notice it has only ONE parameter: λ
Mean = variance = λ
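As a quick numerical check of the mass function and of the rare-event approximation, the R sketch below compares the formula with the built-in `dpois` and with a binomial that has many trials and a small event probability. The values λ = 2 and n = 10000 are illustrative assumptions, not taken from the lecture.

```r
# Illustrative values (assumptions, not from the lecture)
lambda <- 2
k <- 0:10

# Poisson pmf written out from the formula lambda^k * exp(-lambda) / k!
pmf_formula <- lambda^k * exp(-lambda) / factorial(k)

# Same quantity from R's built-in Poisson density
pmf_builtin <- dpois(k, lambda)

# Binomial with many trials and small probability, same mean n * p = lambda
n <- 10000
pmf_binom <- dbinom(k, size = n, prob = lambda / n)

print(all.equal(pmf_formula, pmf_builtin))  # do formula and dpois agree?
print(max(abs(pmf_binom - pmf_builtin)))    # largest binomial-vs-Poisson gap
```

The single parameter λ controls both the mean and the variance, which is why only one number is passed to `dpois`.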

Page 4: Lecture 21: poisson regression log-linear regression

Simple Poisson distribution example

The infection rate at a Neonatal Intensive Care Unit (NICU) is typically expressed as a number of infections per patient days. This is obviously counting a number of events across both time and patients.

Assume that the probability of getting an infection over a short time period is proportional to the length of the time period. In other words, a patient who stays one hour in the NICU has twice the risk of a single infection as a patient who stays 30 minutes.

Assume that, for a small enough interval, the probability of getting two infections is negligible.

Assume that the probability of infection does not change over time or over infants.

Assume independence.
• The probability of seeing an infection in one child does not increase or decrease the probability of seeing an infection in another child.
• If an infant gets an infection during one time interval, it doesn't change the probability that he or she will get another infection during a later time interval.
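Under these assumptions the number of infections seen over a given exposure is Poisson with mean equal to rate × patient-days. The R sketch below illustrates this with a made-up rate of 5 infections per 1000 patient-days and 200 patient-days of follow-up; both numbers are hypothetical.

```r
# Hypothetical inputs (not from the lecture)
rate_per_day <- 5 / 1000   # infections per patient-day
patient_days <- 200        # total exposure

# Expected number of infections = rate * exposure
lambda <- rate_per_day * patient_days

# Probability of seeing no infections, and of seeing two or more
p_none     <- dpois(0, lambda)
p_two_plus <- 1 - ppois(1, lambda)

# Proportionality assumption: doubling the exposure doubles the expected count
lambda_double <- rate_per_day * (2 * patient_days)

print(c(lambda = lambda, p_none = p_none, p_two_plus = p_two_plus))
```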

Page 5: Lecture 21: poisson regression log-linear regression

Poisson regression

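Based on the idea that the log of the probability (or rate) of disease is a linear function of risk factors

The rate ratio (“relative risk”) is modeled

$$\log(r_i) = \beta_0 + \beta_1 \,\text{Group}_i$$

Interpretation of slope:

$$\log(r_1) - \log(r_0) = (\beta_0 + \beta_1 \cdot 1) - (\beta_0 + \beta_1 \cdot 0) = \beta_1$$

$$\log\left(\frac{r_1}{r_0}\right) = \beta_1$$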
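Because the slope is the log of the rate ratio, exponentiating it gives the rate ratio itself. The R sketch below fits the two-group model to a small set of hypothetical counts (equal exposure in every observation, so no offset is needed) and compares exp(slope) with the ratio of the group means.

```r
# Hypothetical counts observed over equal exposure in two groups
y     <- c(3, 5, 4, 6, 9, 11, 8, 12)
group <- c(0, 0, 0, 0, 1, 1, 1, 1)

# Poisson regression with the default log link
fit <- glm(y ~ group, family = poisson)

# exp(slope) is the estimated rate ratio r1 / r0
rate_ratio_model <- unname(exp(coef(fit)["group"]))

# With a single binary covariate this equals the ratio of group means
rate_ratio_direct <- mean(y[group == 1]) / mean(y[group == 0])

print(c(model = rate_ratio_model, direct = rate_ratio_direct))
```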

Page 6: Lecture 21: poisson regression log-linear regression

Implementation

r_i is the rate

Often we observe
• a number of events
• a geographic region, time, or number of person-years

Need to account for these differences
• rates based on smaller “exposure” are less precise
• adjustment is made

Page 7: Lecture 21: poisson regression log-linear regression

Implementation

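Unless there is uniform time, space, etc., the following is generally implemented:

$$\log(r_i) = \beta_0 + \beta_1 \,\text{Group}_i$$

$$\log\left(\frac{\text{cases}_i}{\text{total}_i}\right) = \beta_0 + \beta_1 \,\text{Group}_i$$

$$\log(\text{cases}_i) - \log(\text{total}_i) = \beta_0 + \beta_1 \,\text{Group}_i$$

$$\log(\text{cases}_i) = \beta_0 + \beta_1 \,\text{Group}_i + \log(\text{total}_i)$$

The term log(total_i) is the “OFFSET”.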
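In R the offset can be supplied to `glm` either through its `offset` argument or as an `offset()` term in the formula. The sketch below uses a small hypothetical data set (the column names `cases`, `total` and `group` are chosen to mirror the equations and are not from the lecture) to show that the two forms give the same fit, and that exp(slope) is the ratio of the pooled group rates.

```r
# Hypothetical data: event counts with unequal exposure (e.g. person-years)
d <- data.frame(
  cases = c(4, 7, 12, 3, 15, 30, 22, 9),
  total = c(1000, 1500, 2500, 800, 1200, 2600, 2000, 700),
  group = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Offset supplied through the offset argument
fit_arg <- glm(cases ~ group, family = poisson, offset = log(total), data = d)

# Equivalent: offset() term inside the formula (no coefficient is estimated)
fit_term <- glm(cases ~ group + offset(log(total)), family = poisson, data = d)

# Rate ratio from the model and from the pooled rates in each group
rr_model  <- unname(exp(coef(fit_arg)["group"]))
rr_direct <- with(d, (sum(cases[group == 1]) / sum(total[group == 1])) /
                     (sum(cases[group == 0]) / sum(total[group == 0])))

print(rbind(offset_argument = coef(fit_arg), offset_term = coef(fit_term)))
print(c(model = rr_model, direct = rr_direct))
```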

Page 8: Lecture 21: poisson regression log-linear regression

Offset term

Notice: NO COEFFICIENT on offset
Adjusts for population size or space
Example: breast cancer incidence per county in South Carolina
• cases are the number of women (& men) diagnosed within a county in SC in one year.
• the offset would be the log of the population size in the county in that year (population probably estimated)

Page 9: Lecture 21: poisson regression log-linear regression

Caveat

Standard Poisson regression relies on the Poisson assumption that the variance equals the mean

If events tend to occur in clusters, then there is “overdispersion”

This leads to a more general form of model: the log-linear model (later)
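One common way to check for overdispersion is to compare the Pearson chi-square statistic with its residual degrees of freedom; a ratio well above 1 suggests the variance exceeds the mean. The R sketch below simulates clustered-looking counts from a negative binomial (all simulation settings are assumptions for illustration) and then refits with the `quasipoisson` family. That family is used here only as one familiar adjustment; it is not necessarily the model the lecture turns to later.

```r
set.seed(701)  # arbitrary seed so the sketch is reproducible

# Simulated overdispersed counts: negative binomial, variance = mu + mu^2 / 2
n  <- 200
x  <- rep(c(0, 1), each = n / 2)
mu <- exp(1.5 + 0.3 * x)
y  <- rnbinom(n, mu = mu, size = 2)

# Standard Poisson fit, which assumes variance = mean
fit_pois <- glm(y ~ x, family = poisson)

# Pearson chi-square divided by residual degrees of freedom
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

# Quasi-Poisson: same coefficients, standard errors scaled by sqrt(dispersion)
fit_quasi <- glm(y ~ x, family = quasipoisson)

print(dispersion)
print(summary(fit_quasi)$dispersion)
```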

Page 10: Lecture 21: poisson regression log-linear regression

Example: Catheter-Related Bloodstream Infections in the ICU (Critical Care Medicine, 2004).

Objective: To determine whether a multifaceted systems intervention would eliminate catheter-related bloodstream infections (CR-BSIs)

Design: prospective cohort in surgical ICU at JHU including all patients with central venous catheter in ICU.

Two ICUs

Interventions:
• educating staff
• creating catheter insertion cart
• asking providers daily if catheters could be removed
• implementing checklist to ensure adherence to guidelines
• empowering nurses to stop catheter insertion if violation of guidelines was observed.

Page 11: Lecture 21: poisson regression log-linear regression

Example: Catheter-Related Bloodstream Infections in the ICU (Critical Care Medicine, 2004).

Analysis
• Poisson regression
• Outcome is rate of CR-BSIs
• Data structure
    number of infections per quarter in ICU
    number of catheter days (counting every patient who has catheter at 12am each day). Patients each counted only once
    indicator of control vs. intervention ICU
• Intervention not implemented until 1st quarter 1999.

Page 12: Lecture 21: poisson regression log-linear regression

Dataset

. list

     +---------------------------------------------------------+
     | quarter   ncase   cathdays    rate   dataset   quartern |
     |---------------------------------------------------------|
  1. | Qtr1-98       6       1057    5.68         1          1 |
  2. | Qtr2-98       4       1018    3.93         1          2 |
  3. | Qtr3-98      10        899   11.12         1          3 |
  4. | Qtr4-98       8        952     8.4         1          4 |
  5. | Qtr1-99       3        952    3.15         1          5 |
     |---------------------------------------------------------|
  6. | Qtr2-99      10        939   10.65         1          6 |
  7. | Qtr3-99       5       1045    4.78         1          7 |
  8. | Qtr4-99       9        927    9.71         1          8 |
  9. | Qtr1-00       7       1060     6.6         1          9 |
 10. | Qtr2-00       7       1094     6.4         1         10 |
     |---------------------------------------------------------|
 11. | Qtr3-00       5        850    5.88         1         11 |
 12. | Qtr4-00      10        822   12.17         1         12 |
 13. | Qtr1-01      11        868   12.67         1         13 |
 14. | Qtr2-01       4        830    4.82         1         14 |
 15. | Qtr3-01       4        603    6.63         1         15 |
     |---------------------------------------------------------|
 16. | Qtr4-01       5        551    9.07         1         16 |
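The `rate` column in the listing appears to be the number of infections per 1000 catheter days. A short R check on the first five rows, with the values copied from the listing:

```r
# First five rows copied from the listing (Qtr1-98 to Qtr1-99)
ncase    <- c(6, 4, 10, 8, 3)
cathdays <- c(1057, 1018, 899, 952, 952)

# Infections per 1000 catheter days
rate <- 1000 * ncase / cathdays

print(round(rate, 2))
```

Rounded to two decimals, these reproduce the listed rates 5.68, 3.93, 11.12, 8.4 and 3.15.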

Page 13: Lecture 21: poisson regression log-linear regression

Observed Data

[Figure: observed rate of infection per 1000 catheter days (y-axis, 0 to 20) by quarter (x-axis, ticks at 5, 10, 15, 20), with separate series for the Intervention ICU and the Control ICU.]

Page 14: Lecture 21: poisson regression log-linear regression

R code

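# read in the ICU infection data
data <- read.csv("csicu7.csv")

# plot all observed rates as black points
plot(data$quartern, data$rate, xlab="Quarter", ylab="Rate of Infection per 1000 catheter days", pch=16)

# overplot the ICU coded dataset==1 in red
points(data$quartern[data$dataset==1], data$rate[data$dataset==1], pch=16, col=2)

# connect the points within each ICU
lines(data$quartern[data$dataset==0], data$rate[data$dataset==0], col=1)
lines(data$quartern[data$dataset==1], data$rate[data$dataset==1], col=2)

legend(12,22, c("Intervention ICU","Control ICU"), col=c(1,2), pch=c(16,16))

# dotted vertical line at quarter 5 (Qtr1-99, when the intervention began)
abline(v=5, lty=3)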

Page 15: Lecture 21: poisson regression log-linear regression

Estimating the Poisson regression

Want to model change in rates

However, for the first 4 quarters there was no intervention.

Based on the observed data and on the data structure, what model is appropriate?

Page 16: Lecture 21: poisson regression log-linear regression

Poisson regression model

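$$\log(r_i) = \beta_1 + \beta_2 \,IV_i + \beta_3 \,quarter_i + \beta_4 \,spline_i + \beta_5 \,IV_i \cdot quarter_i + \beta_6 \,IV_i \cdot spline_i$$

When the model is fitted to the counts, log(cathdays_i) is added to the right-hand side as the offset (no coefficient).

What is the model for
• IV=0 and quarter<5?
• IV=0 and quarter≥5?
• IV=1 and quarter<5?
• IV=1 and quarter≥5?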
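One way to answer the four questions is to substitute the indicator values into the model. The derivation below is a sketch that assumes the spline term is spline_i = max(quarter_i − 5, 0), which is how the knot variable `k1` is built in the R code for this example, and it works on the rate scale, so the offset does not appear. At quarter = 5 the spline term is zero, so the early and late forms agree there.

```latex
\begin{aligned}
IV_i = 0,\ \text{quarter}_i \le 5: \quad
  \log(r_i) &= \beta_1 + \beta_3\,\text{quarter}_i \\
IV_i = 0,\ \text{quarter}_i > 5: \quad
  \log(r_i) &= \beta_1 + \beta_3\,\text{quarter}_i + \beta_4(\text{quarter}_i - 5) \\
            &= (\beta_1 - 5\beta_4) + (\beta_3 + \beta_4)\,\text{quarter}_i \\
IV_i = 1,\ \text{quarter}_i \le 5: \quad
  \log(r_i) &= (\beta_1 + \beta_2) + (\beta_3 + \beta_5)\,\text{quarter}_i \\
IV_i = 1,\ \text{quarter}_i > 5: \quad
  \log(r_i) &= (\beta_1 + \beta_2 - 5\beta_4 - 5\beta_6)
             + (\beta_3 + \beta_4 + \beta_5 + \beta_6)\,\text{quarter}_i
\end{aligned}
```

The slopes after the knot, β3 + β4 and β3 + β4 + β5 + β6, are the quantities compared when asking whether the change in infection rates differs between the two ICUs.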

Page 17: Lecture 21: poisson regression log-linear regression

R code

ncase <- data$ncase
cathdays <- data$cathdays
control <- data$dataset
intervention <- 1 - control
quartern <- data$quartern

# create knot for spline model
k1 <- ifelse(quartern>5, quartern-5, 0)

# FIT MODEL WITH INTERACTIONS WITH TIME FOR BOTH GROUPS
reg <- glm(ncase ~ intervention*quartern + intervention*k1,
           family=poisson, offset=log(cathdays))
summary(reg)

Page 18: Lecture 21: poisson regression log-linear regression

Results

Call:
glm(formula = ncase ~ intervention * quartern + intervention * k1,
    family = poisson, offset = log(cathdays))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.6005  -0.8439  -0.2368   0.6349   2.4233

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
(Intercept)           -5.20386    0.37944 -13.715   <2e-16 ***
intervention           0.73339    0.45986   1.595    0.111
quartern               0.07517    0.09148   0.822    0.411
k1                    -0.08774    0.10365  -0.847    0.397
intervention:quartern -0.02874    0.11302  -0.254    0.799
intervention:k1       -0.08355    0.13080  -0.639    0.523
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 108.489  on 39  degrees of freedom
Residual deviance:  61.317  on 34  degrees of freedom
AIC: 213.76
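Two quick readings of this output can be done by hand. The R sketch below uses only numbers copied from the summary: it converts the intercept to a rate per 1000 catheter days, and compares the residual deviance with its degrees of freedom as a rough overdispersion check. Treating the residual deviance as approximately chi-square is an assumption that can be poor when counts are small, so the p-value is indicative only.

```r
# Values copied from the summary output
intercept <- -5.20386
resid_dev <- 61.317
resid_df  <- 34

# Fitted rate per 1000 catheter days at quartern = 0 with intervention = 0
base_rate <- 1000 * exp(intercept)

# Rough overdispersion check: residual deviance relative to its df
dev_ratio <- resid_dev / resid_df

# Approximate goodness-of-fit p-value (chi-square approximation assumed)
p_gof <- pchisq(resid_dev, df = resid_df, lower.tail = FALSE)

print(c(base_rate = base_rate, dev_ratio = dev_ratio, p_gof = p_gof))
```

A ratio of about 1.8 is above the value of 1 implied by the Poisson variance assumption, which ties this example back to the caveat about overdispersion.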

Page 19: Lecture 21: poisson regression log-linear regression

Fitted model, rate scale

[Figure: fitted rate of infection per 1000 catheter days (y-axis, 0 to 20) by quarter (x-axis, ticks at 5, 10, 15, 20), with fitted lines for the Intervention ICU and the Control ICU.]

Page 20: Lecture 21: poisson regression log-linear regression

R code

# coefficients from the fitted model reg
b <- coef(reg)

# linear predictor (log rate) before and after the knot at quarter 5
fit.early.0 <- b[1] + b[3]*seq(1,5,1)
fit.late.0 <- (b[1]-b[4]*5) + (b[3]+b[4])*seq(5,20,1)
fit.early.1 <- (b[1]+b[2]) + (b[3]+b[5])*seq(1,5,1)
fit.late.1 <- (b[1]+b[2]-b[4]*5-b[6]*5) + (b[3]+b[4]+b[5]+b[6])*seq(5,20,1)

# convert to rates per 1000 catheter days
fit.early.0
rate.early.0 <- exp(fit.early.0)*1000
rate.early.0

rate.early.1 <- exp(fit.early.1)*1000
rate.late.0 <- exp(fit.late.0)*1000
rate.late.1 <- exp(fit.late.1)*1000

# add lines to plot for fitted control ICU
lines(seq(1,5,1), rate.early.0, col=2)
lines(seq(5,20,1), rate.late.0, col=2)
# add lines to plot for fitted intervention ICU
lines(seq(1,5,1), rate.early.1, col=1)
lines(seq(5,20,1), rate.late.1, col=1)

Page 21: Lecture 21: poisson regression log-linear regression

Fitted model, linear predictor scale

[Figure: fitted linear predictor (y-axis, about -6.0 to -4.5) by quarter (x-axis, ticks at 5, 10, 15, 20), with fitted lines for the Intervention ICU and the Control ICU.]

Page 22: Lecture 21: poisson regression log-linear regression

Real question

Is the change in infection rates different in the two ICUs?

That is, are the slopes after Q5 different?

How to test that:
• slope in control ICU: β3 + β4
• slope in intervention ICU: β3 + β4 + β5 + β6

What is the hypothesis test?

Page 23: Lecture 21: poisson regression log-linear regression

Linear Combination of Coefficients

> estimable(reg, c(0,0,0,0,1,1))
                Estimate Std. Error X^2 value DF   Pr(>|X^2|)
(0 0 0 0 1 1) -0.1122858 0.03091206  13.19452  1 0.0002807688
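The contrast (0,0,0,0,1,1) picks out β5 + β6, the difference between the two post-knot slopes, so the test is of H0: β5 + β6 = 0. The R sketch below rechecks the arithmetic of the reported row using only numbers shown in the outputs, and then defines `wald_lincomb`, a hypothetical helper written for this illustration (it is not part of any package) that computes the same kind of Wald test from `coef` and `vcov`. The `estimable` function itself presumably comes from the gmodels package; the slides do not say.

```r
# Values copied from the estimable() output and the coefficient table
est <- -0.1122858
se  <- 0.03091206
b5  <- -0.02874   # intervention:quartern
b6  <- -0.08355   # intervention:k1

# Wald chi-square on 1 df and its p-value
x2 <- (est / se)^2
p  <- pchisq(x2, df = 1, lower.tail = FALSE)
print(c(sum_b5_b6 = b5 + b6, x2 = x2, p = p))

# Hypothetical helper: Wald test of sum(cvec * beta) = 0 for a fitted glm
wald_lincomb <- function(fit, cvec) {
  est <- sum(cvec * coef(fit))
  se  <- sqrt(drop(t(cvec) %*% vcov(fit) %*% cvec))
  x2  <- (est / se)^2
  c(estimate = est, se = se, x2 = x2,
    p = pchisq(x2, df = 1, lower.tail = FALSE))
}
```

With the fitted model from this example, a call such as `wald_lincomb(reg, c(0,0,0,0,1,1))` would be expected to give the same estimate, standard error and chi-square as the `estimable` row.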

Page 24: Lecture 21: poisson regression log-linear regression

Example: Breast Cancer Incidence in SC

Cunningham et al.

Hypothesize that there are differences in subtypes of breast cancer by race
• ER+ vs. ER-
• Grades 1, 2, 3
• Stage 1, 2, 3, 4

Incidence of breast cancer varies by age

Data:
• Tumor registry data for SC (and Ohio)
• Census data for SC

Page 25: Lecture 21: poisson regression log-linear regression
Page 26: Lecture 21: poisson regression log-linear regression

Poisson modeling

Rate of incidence per cancer type

Modeled as a function of ER, grade and race

> summary(reg1)

Call:
glm(formula = nc ~ age + age2 + age3 + bl + er + gr + age * bl +
    age2 * bl + age3 * bl + age * er + age2 * er + age3 * er +
    age * gr + age2 * gr + age3 * gr + bl * er + bl * gr + er * gr,
    family = poisson, offset = log(9 * popn))
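The offset in this call is log(9 * popn) rather than log(popn). The slide does not explain the factor 9; it may be the number of years of registry data, but that is a guess. Whatever its meaning, multiplying the exposure by a constant only shifts the intercept by the log of that constant and leaves every other coefficient unchanged. The R sketch below shows this on a small hypothetical data set (the values and the single `age` covariate are made up).

```r
# Hypothetical data (not from the registry)
d <- data.frame(
  nc   = c(2, 5, 9, 14, 20, 26),
  popn = c(50000, 48000, 45000, 40000, 30000, 20000),
  age  = 1:6
)

# Same model with two different scalings of the exposure
fit_1 <- glm(nc ~ age, family = poisson, offset = log(popn), data = d)
fit_9 <- glm(nc ~ age, family = poisson, offset = log(9 * popn), data = d)

# Intercepts differ by log(9); the slope is the same
print(coef(fit_1) - coef(fit_9))
print(log(9))
```

The scaling therefore changes only the units in which the baseline rate is expressed, not the estimated rate ratios.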

Page 27: Lecture 21: poisson regression log-linear regression

Results

Page 28: Lecture 21: poisson regression log-linear regression

Confidence Intervals

[Figure: twelve panels of incidence (per 100K) by age category (10-14, 15-19, 20-24, ..., 80-84, 85+), one panel for each combination of ER status (er+, er-, er unk) and grade (1, 2, 3, unk).]

Page 29: Lecture 21: poisson regression log-linear regression

Incidence Ratio for AA vs. EA