case-crossover study

Post on 19-Feb-2017


Analysis of Time-series Data: Case-crossover Study

Jinseob Kim

July 17, 2015

Jinseob Kim Analysis of Time-series Data July 17, 2015 1 / 30

Contents

1 Concepts: Individual data, Design

2 Conditional logistic regression: Review of basic linear regression, Logistic regression, Conditional logistic regression

3 Practice: Issues, In R

Objective

1 Individual risk vs. population risk

2 The concept of the case-crossover design

3 Caveats

4 Application: the season package in R

Concepts


Concepts: Individual data

Two approaches to studying the relationship between weather and health outcomes

Population-based study

Y: number of events (daily death counts or number of hospital admissions)

X: temperature

Estimates population risk (% change in daily death counts corresponding to a change in temperature)

Individual-based study

Y: 1 if an event occurs, 0 otherwise

X: temperature

Estimates individual risk (% change in the individual probability of an event, or the odds ratio, corresponding to a change in temperature)
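The two approaches can be sketched in base R with simulated data. Everything below is an assumption made for illustration (the simulated temperatures, the true slopes 0.02 and 0.05, the variable names); it is not data from the slides. A Poisson regression on daily counts gives a % change in counts, and a logistic regression on individual 0/1 outcomes gives an odds ratio.

```r
set.seed(1)

# Population view: one row per day, outcome = daily death count (simulated)
n_days <- 365
temp <- rnorm(n_days, mean = 20, sd = 5)                # hypothetical temperature
deaths <- rpois(n_days, lambda = exp(3 + 0.02 * temp))  # assumed true slope 0.02
pop_fit <- glm(deaths ~ temp, family = poisson)
# % change in daily death counts per 1-unit rise in temperature
pct_change <- (exp(coef(pop_fit)["temp"]) - 1) * 100

# Individual view: one row per person, outcome = 1 if the event occurs (simulated)
n_people <- 5000
temp_i <- rnorm(n_people, mean = 20, sd = 5)
event <- rbinom(n_people, 1, plogis(-3 + 0.05 * temp_i))  # assumed true slope 0.05
ind_fit <- glm(event ~ temp_i, family = binomial)
# Odds ratio per 1-unit rise in temperature
odds_ratio <- exp(coef(ind_fit)["temp_i"])

print(c(pct_change = unname(pct_change), odds_ratio = unname(odds_ratio)))
```

The same exposure enters both models; only the unit of analysis (day vs. person) and therefore the interpretation of the coefficient differ.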


Data structure change

Aggregated form: (Year, week, case)

(2006, 1, 20): one record holding 20 cases

Individual form: (Year, week, event)

(2006, 1, 1), (2006, 1, 1), ..., (2006, 1, 1): 20 case records

(2005, 53, 0), ..., (2005, 53, 0), (2006, 2, 0), ..., (2006, 2, 0): control records
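The restructuring above can be sketched in base R. This is a minimal illustration that assumes the adjacent weeks (2005 week 53 and 2006 week 2) serve as the control periods for each of the 20 cases, as on the slide; the `id` column and the data frame names are hypothetical.

```r
# Aggregated record: 20 cases in week 1 of 2006
n_cases <- 20

# One row per case (event = 1)
case_rows <- data.frame(id = seq_len(n_cases), year = 2006, week = 1, event = 1)

# For each case, the neighbouring weeks act as control periods (event = 0)
ctrl_before <- data.frame(id = seq_len(n_cases), year = 2005, week = 53, event = 0)
ctrl_after  <- data.frame(id = seq_len(n_cases), year = 2006, week = 2,  event = 0)

# Stack and sort so that each person's case row is followed by its controls
long <- rbind(case_rows, ctrl_before, ctrl_after)
long <- long[order(long$id, -long$event), ]

head(long)
table(long$event)
```

Each `id` then defines one matched set (one case row plus its control rows), which is the unit a conditional logistic regression conditions on.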

Concepts: Design

Case + Crossover

Case: only patients (cases) are used.

Crossover: other time points of the same patient serve as the controls.

If the average air pollution on control days < the average air pollution on case days,

we conclude that the event is associated with higher air pollution.

Various ways to choose control days

Corrects for bias due to time trends
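One common way to choose control days is the time-stratified scheme: split the calendar into fixed-length strata and take the other days in the case day's stratum as controls, optionally matching on day of the week. The sketch below is a simplified illustration in base R; `control_days()` is a hypothetical helper written for this example and is not the internal code of any package.

```r
# Hypothetical helper: control days for one case day under a
# time-stratified design with fixed-length strata counted from `origin`.
control_days <- function(case_day, origin, stratum_length = 28, match_dow = TRUE) {
  # 0-based index of the stratum that contains the case day
  stratum <- as.integer(case_day - origin) %/% stratum_length
  first_day <- origin + stratum * stratum_length
  candidates <- first_day + 0:(stratum_length - 1)
  # The case day itself is never its own control
  candidates <- candidates[candidates != case_day]
  if (match_dow) {
    # Same weekday means the gap to the case day is a multiple of 7 days
    gap <- as.integer(candidates - case_day)
    candidates <- candidates[gap %% 7 == 0]
  }
  candidates
}

origin <- as.Date("1987-01-01")
case_day <- as.Date("1987-01-15")
controls <- control_days(case_day, origin)
print(controls)
```

Because controls are taken from both before and after the case day within a short window, a slow time trend in exposure affects case and control days about equally.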


Conditional logistic regression


Conditional logistic regression: Review of basic linear regression

Reminder

β estimation in linear regression

1 Ordinary Least Squares (OLS): semi-parametric

2 Maximum Likelihood Estimator (MLE): parametric

Least squares

Minimize the sum of squares: no normality assumption on y is needed.

Figure: OLS fitting

Likelihood?

Likelihood vs. probability

Discrete: likelihood = probability. The probability of rolling a 1 with a die is 1/6.

Continuous: likelihood ≠ probability. When one number is drawn from 0 to 1, the probability that it is exactly 0.7 is 0.

Figure: Likelihood
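The distinction can be checked numerically in base R. The sketch assumes a fair die and a Uniform(0, 1) draw; the variable names are illustrative. For the continuous case the density at 0.7 is positive, while the probability of landing in an ever narrower interval around 0.7 shrinks towards 0.

```r
# Discrete: likelihood equals probability
p_one <- 1 / 6   # probability of rolling a 1 with a fair die

# Continuous: Uniform(0, 1)
density_at_07 <- dunif(0.7, min = 0, max = 1)   # density (likelihood) at 0.7

# Probability of falling within +/- h of 0.7, for shrinking h
half_widths <- c(0.1, 0.01, 0.001)
prob_near_07 <- punif(0.7 + half_widths) - punif(0.7 - half_widths)

print(density_at_07)
print(prob_near_07)
```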

Maximum likelihood estimator (MLE)

Assume ε1, ..., εn are mutually independent.

1 Find the likelihood function of each observation.

2 Multiply all the likelihoods to get the likelihood of the whole data set (valid because of independence).

3 Find the β that maximizes the likelihood.
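The three steps can be sketched for a simple linear regression in base R. The data are simulated and the normal error distribution is an assumption of this example; on the log scale, the product in step 2 becomes a sum. With normal errors the MLE of β agrees with the OLS estimate, which the last lines compare.

```r
set.seed(42)
x <- runif(100, 0, 10)
y <- 1 + 2 * x + rnorm(100, sd = 1.5)   # simulated data, assumed normal errors

# Steps 1 and 2: log-likelihood of each observation, summed over observations
# (sum of logs = log of the product). optim() minimizes, so return the negative.
neg_loglik <- function(par) {
  mu <- par[1] + par[2] * x
  sigma <- exp(par[3])   # optimize log(sigma) so that sigma stays positive
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

# Step 3: find the parameters that maximize the likelihood
start <- c(mean(y), 0, log(sd(y)))
mle <- optim(start, neg_loglik, method = "BFGS")

# Compare with ordinary least squares
ols <- lm(y ~ x)
print(rbind(MLE = mle$par[1:2], OLS = unname(coef(ols))))
```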

MLE: maximum likelihood estimator

Maximize the likelihood of the observed data: a distributional assumption on y or ε is required.


Logistic function: MLE

Figure: Fitting Logistic Function

LRT? Wald? Score?

Likelihood ratio test vs. Wald test vs. score test

1 Methods for judging statistical significance.

2 Comparing likelihoods vs. comparing β values vs. comparing slopes.
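The three tests can be computed by hand for a one-covariate logistic regression, which makes the "likelihood vs. β vs. slope" contrast concrete. This is a sketch on simulated data (the sample size and the true coefficients are assumptions); each statistic is referred to a chi-squared distribution with 1 degree of freedom.

```r
set.seed(7)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + 0.8 * x))   # simulated data

null_fit <- glm(y ~ 1, family = binomial)     # beta1 = 0
full_fit <- glm(y ~ x, family = binomial)     # beta1 free

# LRT: compare the two maximized log-likelihoods
lrt_stat <- as.numeric(2 * (logLik(full_fit) - logLik(null_fit)))

# Wald: compare the estimated beta1 with 0, scaled by its standard error
est <- coef(summary(full_fit))["x", ]
wald_stat <- unname((est["Estimate"] / est["Std. Error"])^2)

# Score: slope of the log-likelihood at beta1 = 0, scaled by the information
p0 <- mean(y)                                  # fitted probability under the null
score <- sum(x * (y - p0))
info <- p0 * (1 - p0) * sum((x - mean(x))^2)
score_stat <- score^2 / info

stats <- c(LRT = lrt_stat, Wald = wald_stat, Score = score_stat)
p_values <- pchisq(stats, df = 1, lower.tail = FALSE)
print(rbind(statistic = stats, p_value = p_values))
```

In large samples the three statistics are close to each other; they can differ noticeably in small samples.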

Comparison

Figure: Comparison

Conditional logistic regression: Logistic regression

Model

$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1}$$

$$p_i = P(Y_i = 1) = \frac{\exp(\beta_0 + \beta_1 x_{i1})}{1 + \exp(\beta_0 + \beta_1 x_{i1})}$$

$$P(Y_i = 0) = \frac{1}{1 + \exp(\beta_0 + \beta_1 x_{i1})}$$

$$P(Y_i = y_i) = \left(\frac{\exp(\beta_0 + \beta_1 x_{i1})}{1 + \exp(\beta_0 + \beta_1 x_{i1})}\right)^{y_i} \left(\frac{1}{1 + \exp(\beta_0 + \beta_1 x_{i1})}\right)^{1 - y_i}$$

Likelihood

$$\text{Likelihood} = \prod_{i=1}^{n} P(Y_i = y_i) = \prod_{i=1}^{n} \left(\frac{\exp(\beta_0 + \beta_1 x_{i1})}{1 + \exp(\beta_0 + \beta_1 x_{i1})}\right)^{y_i} \left(\frac{1}{1 + \exp(\beta_0 + \beta_1 x_{i1})}\right)^{1 - y_i}$$

Each individual contributes a likelihood (the probability of observing that individual's data).

Multiplying them all gives the likelihood.

The goal is to find the β that maximizes it.

Cases and controls each contribute their own likelihood separately.
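The logistic likelihood can be maximized directly and checked against `glm()`. This is a sketch on simulated data (the true coefficients -1 and 0.7 are assumptions); `dbinom(..., log = TRUE)` evaluates the log of each individual's P(Yi = yi), and summing the logs is equivalent to taking the product over individuals.

```r
set.seed(3)
x <- rnorm(300)
y <- rbinom(300, 1, plogis(-1 + 0.7 * x))   # simulated 0/1 outcomes

# Negative log-likelihood: one term per individual, summed over individuals
neg_loglik <- function(beta) {
  p <- plogis(beta[1] + beta[2] * x)        # exp(eta) / (1 + exp(eta))
  -sum(dbinom(y, size = 1, prob = p, log = TRUE))
}

# Maximize the likelihood (by minimizing its negative log)
fit_optim <- optim(c(0, 0), neg_loglik, method = "BFGS")

# Reference fit
fit_glm <- glm(y ~ x, family = binomial)

print(rbind(optim = fit_optim$par, glm = unname(coef(fit_glm))))
```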

Conditional logistic regression

Conditional likelihood

Matched case-control set

A case and its controls (1:1 or 1:N) form one matched set.

Each matched set yields one likelihood.

For each set, compute the likelihood of observing our data.

Multiplying over all sets gives the total likelihood.

Definition

Let the i-th stratum (1 ≤ i ≤ N) contain 1 case (call this person A) and n_i controls.

Conditional likelihood of the i-th stratum:

$$L_i = P(\text{A is the case and the others are controls} \mid \text{1 case and } n_i \text{ controls})$$

Total likelihood:

$$\prod_{i=1}^{N} L_i$$
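Under the logistic model with a single exposure x, the conditional likelihood of a stratum takes the standard form $L_i = \exp(\beta x_{i0}) / \sum_{j=0}^{n_i} \exp(\beta x_{ij})$, where $x_{i0}$ is the case's exposure and the sum runs over the case and its controls; the stratum intercept cancels. This explicit form is not written on the slides. The sketch below maximizes the total conditional log-likelihood in base R on made-up exposure values; in practice `clogit()` from the survival package fits this kind of model.

```r
# Hypothetical matched sets: first value = exposure of the case,
# remaining values = exposures of that case's controls
sets <- list(
  c(30, 25, 27, 24),
  c(22, 23, 20),
  c(35, 31, 29, 33, 30)
)

# Total conditional log-likelihood: sum over strata of log(L_i)
cond_loglik <- function(beta) {
  sum(vapply(sets, function(x) {
    beta * x[1] - log(sum(exp(beta * x)))
  }, numeric(1)))
}

# One-dimensional maximization over beta
fit <- optimize(cond_loglik, interval = c(-5, 5), maximum = TRUE)
beta_hat <- fit$maximum

print(c(beta_hat = beta_hat, odds_ratio = exp(beta_hat)))
```

At β = 0 every member of a stratum is equally likely to be the case, so each stratum contributes 1/(n_i + 1) to the likelihood.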


Practice


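Practice: Issues

Are the controls reliable?

Days 7 or 14 before and after the case day, etc.: are these really valid controls?

The time from exposure to disease must be short.

Exposure must not accumulate.

Suited to acute disease and transient effects of exposure (e.g., heat waves and death)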

Practice: In R

season package

> library(season)

> data(CVDdaily) # cardiovascular disease data

> CVDdaily=subset(CVDdaily,date<=as.Date('1987-12-31')) # subset for example

> head(CVDdaily)

date cvd dow tmpd o3mean o3tmean Mon Tue Wed Thu Fri Sat

3 1987-01-01 55 Thursday 54.50 -16.0073 -15.89619 0 0 0 1 0 0

5 1987-01-02 73 Friday 58.50 -11.6595 -11.19102 0 0 0 0 1 0

9 1987-01-03 64 Saturday 55.25 -10.3241 -10.51787 0 0 0 0 0 1

12 1987-01-04 57 Sunday 54.75 -18.6471 -18.27014 0 0 0 0 0 0

15 1987-01-05 56 Monday 54.50 -17.5291 -17.13201 1 0 0 0 0 0

18 1987-01-06 65 Tuesday 49.75 -22.7846 -22.74711 0 1 0 0 0 0

month winter spring summer autumn

3 1 1 0 0 0

5 1 1 0 0 0

9 1 1 0 0 0

12 1 1 0 0 0

15 1 1 0 0 0

18 1 1 0 0 0


casecross()

> # Effect of ozone on CVD death

> model1 = casecross(cvd ~ o3mean+tmpd+Mon+Tue+Wed+Thu+Fri+Sat, data=CVDdaily)

> # match on day of the week

> model2 = casecross(cvd ~ o3mean+tmpd,matchdow=TRUE, data=CVDdaily)

> # match on temperature to within a degree

> model3 = casecross(cvd ~ o3mean+Mon+Tue+Wed+Thu+Fri+Sat, data=CVDdaily, matchconf='tmpd', confrange=1)


casecross(formula = cvd ~ o3mean + tmpd + Mon + Tue + Wed + Thu +

Fri + Sat, data = CVDdaily, exclusion = 2, stratalength = 28,

matchdow = FALSE, usefinalwindow = FALSE, matchconf = "",

confrange = 0, stratamonth = FALSE)

Time-stratified case-crossover with a stratum length of 28 days

Total number of cases 17502

Number of case days with available control days 364

Average number of control days per case day 23.2

Parameter Estimates:

coef exp(coef) se(coef) z Pr(>|z|)

o3mean -0.002882613 0.9971215 0.001128975 -2.55330077 0.01067073

tmpd 0.001461400 1.0014625 0.001981047 0.73769030 0.46070267

Mon 0.042733425 1.0436596 0.028942815 1.47647783 0.13981566

Tue 0.057910712 1.0596204 0.028772745 2.01269332 0.04414690

Wed -0.010008025 0.9900419 0.029171937 -0.34307029 0.73154558

Thu -0.016790296 0.9833499 0.029455877 -0.57001513 0.56866744

Fri 0.027247952 1.0276226 0.029173235 0.93400517 0.35030123

Sat 0.001855841 1.0018576 0.028900116 0.06421568 0.94879849
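The columns of the table above are linked by simple formulas, which can be verified for the `o3mean` row in base R. The 95% confidence interval and the odds ratio per 10-unit increase are additions for illustration (a Wald interval is assumed); they are not part of the printed output.

```r
# Values copied from the o3mean row of the output above
coef_o3 <- -0.002882613
se_o3   <-  0.001128975

odds_ratio <- exp(coef_o3)            # the exp(coef) column
z          <- coef_o3 / se_o3         # the z column
p_value    <- 2 * pnorm(-abs(z))      # the Pr(>|z|) column (two-sided)

# Not in the output: Wald 95% confidence interval for the odds ratio
ci_95 <- exp(coef_o3 + c(-1, 1) * qnorm(0.975) * se_o3)

# Not in the output: odds ratio for a 10-unit increase in o3mean
or_per_10 <- exp(10 * coef_o3)

print(c(odds_ratio = odds_ratio, z = z, p_value = p_value))
print(ci_95)
print(or_per_10)
```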


casecross(formula = cvd ~ o3mean + tmpd, data = CVDdaily, matchdow = TRUE,

exclusion = 2, stratalength = 28, usefinalwindow = FALSE,

matchconf = "", confrange = 0, stratamonth = FALSE)

Time-stratified case-crossover with a stratum length of 28 days

Matched on day of the week

Total number of cases 17502

Number of case days with available control days 364

Average number of control days per case day 3

Parameter Estimates:

coef exp(coef) se(coef) z Pr(>|z|)

o3mean -0.0030752572 0.9969295 0.001188540 -2.5874238 0.009669658

tmpd -0.0004095116 0.9995906 0.002131744 -0.1921017 0.847662557


casecross(formula = cvd ~ o3mean + Mon + Tue + Wed + Thu + Fri +

Sat, data = CVDdaily, matchconf = "tmpd", confrange = 1,

exclusion = 2, stratalength = 28, matchdow = FALSE, usefinalwindow = FALSE,

stratamonth = FALSE)

Time-stratified case-crossover with a stratum length of 28 days

Matched on tmpd plus/minus 1

Total number of cases 15180

Number of case days with available control days 318

Average number of control days per case day 4.9

Parameter Estimates:

coef exp(coef) se(coef) z Pr(>|z|)

o3mean -0.003238583 0.9967667 0.00131839 -2.4564691 1.403099e-02

Mon 0.182058170 1.1996840 0.03577818 5.0885255 3.608582e-07

Tue 0.144181049 1.1550932 0.03563272 4.0463108 5.203115e-05

Wed 0.099443480 1.1045560 0.03554924 2.7973451 5.152447e-03

Thu 0.088518237 1.0925542 0.03459482 2.5587140 1.050601e-02

Fri 0.108107305 1.1141673 0.03437323 3.1451022 1.660288e-03

Sat 0.023660066 1.0239422 0.03525152 0.6711786 5.021068e-01


END

Email : secondmath85@gmail.com
