developing risk prediction models using nested case ... prediction...harrell (1996), also pencina...

40
Developing Risk Prediction Models Using Nested Case-Control Data 28 May 2015 Agus Salim () VicBiostat Seminar, 28 May 2015 1 / 40

Upload: others

Post on 21-Mar-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Developing Risk Prediction Models Using NestedCase-Control Data

28 May 2015

Agus Salim () VicBiostat Seminar, 28 May 2015 1 / 40

Page 2: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Background

Recently, there has been an explosion of prediction models developedto predict risk of various diseases

A simple ”risk prediction models” search in PubMed reveals thatthere are 300 publications in the last 10 years alone.

The number is rapidly growing, in a large extent due to discoveries ofnew biomarkers from ’omics fields.

Among others, risk prediction models are useful for identifying andprioritizing high-risk individuals for interventions.

Agus Salim () VicBiostat Seminar, 28 May 2015 2 / 40

Page 3: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

ATP III risk calculator

The most widely-used risk prediction model is arguably the AdultTreatment Panel III (ATP-III) risk calculator for estimating 10-yearrisk of coronary heart disease

It has been used as the basis of statin (LDL-lowering drug) for thosewith predicted risk > 20%.

http://cvdrisk.nhlbi.nih.gov/calculator.asp

Agus Salim () VicBiostat Seminar, 28 May 2015 3 / 40

Page 4: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

ATP III risk calculator

The ATP III calculator is based on a risk prediction model developedusing data from the Framingham cohort.

Cohort study is used as alternatives such as case-control cannotunbiasedly estimate absolute risk.

The underlying statistical model is the proportional hazard (PH)model.(https://www.framinghamheartstudy.org/risk-functions/coronary-heart-disease/hard-10-year-risk.php).

Agus Salim () VicBiostat Seminar, 28 May 2015 4 / 40

Page 5: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

A quick review of PH model

The hazard rate, λ(t) = the probability of experiencing event at t,given that subject survives a moment before.

λ(t) = P(T ∈ [t, t + δt)|T ≥ t − δt)

=f (t)

S(t)

Agus Salim () VicBiostat Seminar, 28 May 2015 5 / 40

Page 6: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

A quick review of PH model

Hazard rate depends on individuals’ characteristics, i.e., individualwith more risk factors should have larger hazard rates.

λi (t) = λ0(t)exp[xTi β + zTi γ]

λ0(t) is called baseline hazard function

β and γ are log hazard ratios and reflects the effect of exposure onhazard.

Agus Salim () VicBiostat Seminar, 28 May 2015 6 / 40

Page 7: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

A quick review of PH model

Suppose we observe K unique event times at t1, t2, . . . tK forindividuals 1, 2, . . . k respectively.

The probability of observing an event occurred at ti for individual i ,given that event times are unique (there can only be one event at tj)is,

exp[xTi β + zTi γ]∑j ;j∈<i

exp[xTj β + zTj γ]

Where <i is the riskset at time ti (the subset of subjects that werestill at risk moment before ti ).

Agus Salim () VicBiostat Seminar, 28 May 2015 7 / 40

Page 8: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

A quick review of PH model

Assuming independence between the different events, the parametersare estimated by maximizing the partial likelihood,

L(β, γ) =K∏i=1

exp[xTi β + zTi γ]∑j ;j∈<i

exp[xTj β + zTj γ]

Note:

The log hazard ratios can be estimated without the need to specifythe baseline hazard function.

The likelihood is simply a product over all different event times.

A particular individual can potentially be in multiple risk-set.

Agus Salim () VicBiostat Seminar, 28 May 2015 8 / 40

Page 9: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

A quick review of PH model

But, if we are interested in the absolute risk, we will need to estimatethe baseline hazard function

Given β and γ, we use the Breslow estimator which assumes thecumulative baseline hazard function is a step function with ’jumps’ atthe unique event times,

Λ0(t) =∑i ;ti≤t

1∑j∈<i

exp[xTj β + zTj γ]

The absolute risk is estimated as

Fi (t) = 1− Si (t)

= 1− exp[−Λi (t)]

= 1− exp[−Λ0(t)]exp[xTi β+zTi γ]

Agus Salim () VicBiostat Seminar, 28 May 2015 9 / 40

Page 10: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Measuring Model Performance

Discrimination: how well the model separate those who will developthe event from the rest.

Calibration: how well the model estimates the true absolute risk.Calibration quality can be poor when models developed in onepopulation is applied to another population with different event rates.H-L GoF test is often used to examine the calibration quality.

Agus Salim () VicBiostat Seminar, 28 May 2015 10 / 40

Page 11: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Discrimination Quality: AUC and C-statistic

When there is no censoring (we know event times for everybody),Area Under the Curve (AUC) statistic measure the degree ofconcordant between the predicted risk and outcome (event time). Let(Ti ,Tj) be event times for individuals i and j and Fi (t), Fj(t) thepredicted risk,

AUC = P[Ti<Tj , Fi (t)<Fj(t) OR Ti>Tj , Fi (t)>Fj(t)]

If cij , i<j is an indicator whether individuals (i , j) are concordant andthere are n individuals in the cohort, then AUC is estimated as

ˆAUC =2

n(n − 1)

∑i

∑j>i

cij

Agus Salim () VicBiostat Seminar, 28 May 2015 11 / 40

Page 12: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Discrimination Quality: AUC and C-statistic

With censoring, we cannot compare some of the pairs. These pairsare ’unuseable’

If both i and j are censored, then we do not know the ordering of eventtimes.Also, if we observe event for i at Ti but j already censored at this point.

Harrell (1996), also Pencina and D’Agostino (2004) proposed to useonly the ’useable’ pairs and estimate what they call c-statistic,

C =1

Q

∑cij

where the summation is across all usable pairs and Q the number ofsuch pairs.

When comparing two models (’OLD’ and ’NEW’), the ’NEW’ isbetter if it has statistically higher c-statistic.

Agus Salim () VicBiostat Seminar, 28 May 2015 12 / 40

Page 13: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Discrimination Quality: AUC and C-statistic

To improve the c-statistic, the new model needs to be able turn thediscordant pairs into concordant ones.

It has been observed that even a fairly large effect size of the newbiomarkers will only result in a meagre increase in c-statistic (Pepe etal., 2004 ; Ware 2006).

In clinical application, predicted risk are often categorized andrecommendation will be based on categorized risk

E.g, ATP III guideline categorized 10-year risk into <10%, 10− 20%and >20%. Statin initiation is recommended only for those in thehighest category and optional for those in the second.

From this perspective, even a small change in the predicted risk willstill be deemed useful if it re-classifies the individual into a moreappropriate risk category.

Agus Salim () VicBiostat Seminar, 28 May 2015 13 / 40

Page 14: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Discrimination Quality: Net Reclassification Index (NRI)

For those who developed event, appropriate re-classification occur ifan individual is moved up in the risk category, by the new model

For those who did not develop event, appropriate re-classificationoccur if an individual is moved down in the risk category, by the newmodel

Let

peup = prop of those with events whose risk classification is moved up

pedown = prop of those with events whose risk classification is moved down

pneup = prop of those without events whose risk classification is moved up

pnedown = prop of those without events whose risk classification is moved down

Agus Salim () VicBiostat Seminar, 28 May 2015 14 / 40

Page 15: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Discrimination Quality: Net Reclassification Index (NRI)

The estimated net reclassification for those with events is

ˆNRIe

= peup − pedown

The estimated net reclassification for those without events is

ˆNRIne

= pnedown − pneup

Pencina et al (2008) gives expression for the asymptotic variance of theseestimates and hypothesis testing can thus be carried out.

Agus Salim () VicBiostat Seminar, 28 May 2015 15 / 40

Page 16: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Alternatives to Cohort Study

Cohort study is often expensive to perform as measurements need tobe collected on all cohort members.

With limited research budget, we need a viable alternative.

Two study designs that utilize only a sub-cohort: case-cohort (CCH)and nested case-control (NCC) studies.

Difference: CCH selects the subset at baseline, NCC selects the”controls” post-baseline as event occurs (Langholz and Thomas, 1990for details).

CCH has been shown to unbiasedly estimate absolute risk [Ganna etal (2012) and Cook et al (2012)] but NCC has not...

Agus Salim () VicBiostat Seminar, 28 May 2015 16 / 40

Page 17: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Nested Case-Control Study

In fact, Ganna et al showed that matched NCC biasedly estimateabsolute risk (Figure 1D from Ganna et al).

Agus Salim () VicBiostat Seminar, 28 May 2015 17 / 40

Page 18: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Nested Case-Control Study

To understand why they observe that bias, we look at the NCCsampling and how the parameters are estimated.

The sampling of controls in NCC is based on incidence densitysampling: for each incident case, we select a subset of the riskset.

Note that probability of selection for each control depends on variousfactors (length of follow-up, matching factors etc).

Agus Salim () VicBiostat Seminar, 28 May 2015 18 / 40

Page 19: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Nested Case-Control Study

Parameters are estimated by maximizing partial likelihood just like inthe cohort study case, but the denominator is different,

L(β, γ) =K∏i=1

exp[xTi β + zTi γ]∑j ;j∈Ri

exp[xTj β + zTj γ]

where Ri is the set that consist of the case (individual i) and theselected controls.

Problem 1: Epidemiologist like to match case and controls onconfounders (for efficiency, logistic reasons). Suppose that the caseand controls are matched so that the matched controls have the samez value as the case, zj = zi ,∀j ∈ Ri , then we lose the ability toestimate log HR associated with z ,

Lmatched(β) =K∏i=1

exp[xTi β]∑j ;j∈Ri

exp[xTj β]

Agus Salim () VicBiostat Seminar, 28 May 2015 19 / 40

Page 20: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Nested Case-Control Study

Problem 2: even in the unmatched case, the Breslow estimator forcumulative baseline hazard will be biased, because we have higherproportion of cases in the NCC data relative to the proportion in thecohort.

Langholz and Borgan (1997) attempted to remedy this by using aweighted version of Breslow estimator, where the controls selected attime ti are weighted by the inverse of their probability of beingselected

ΛLB0 (t) =

∑i ;ti≤t

1∑j∈Ri

n(ti )m+1exp[xTj β + zTj γ]

This is what was used by Ganna et al in the case of unmatched NCCstudy.

Agus Salim () VicBiostat Seminar, 28 May 2015 20 / 40

Page 21: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Nested Case-Control Study

But we have not solved the first problem.

Samuelsen (1997) proposed an alternative approach to estimatehazard ratios from NCC data. His approach ’breaks’ the case-controlpairing and all unique individuals are placed in one pool...

In the partial likelihood, each individual is weighted by the inverse oftheir probability of being included into the study (IPW). For cases, wecan assume this probability is 1. The parameters are then estimatedusing a ’weighted’ partial likelihood

Lw (β, γ) =K∏i=1

wi × exp[xTi β + zTi γ]∑j ;j∈Ri

wj × exp[xTj β + zTj γ]

Ri is the set of all individuals in the pool who has not experiencedevent moment before time ti and wj is the weight for individual j .

Agus Salim () VicBiostat Seminar, 28 May 2015 21 / 40

Page 22: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Nested Case-Control Study

For controls, the probability of inclusion can computed using K-Mtype formula

pj = 1−∏

j ;ti<tj

[1− m

n(ti )− 1

]Note: the product is taken over all event times when individual j hasnot experienced event. In the case of matched NCC, the product isonly taken over all event times when individual j has not experiencedevent and the case has the same matching characteristics.

The estimates from this approach have been shown to be unbiased,provided that the probability of inclusion depends only on variablesfully-observed in the cohort.

Agus Salim () VicBiostat Seminar, 28 May 2015 22 / 40

Page 23: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Nested Case-Control Study

The cumulative baseline hazard is estimated as

Λw0 (t) =

∑i ;ti≤t

1∑j∈Ri

wj × exp[xTj β + zTj γ]

Agus Salim () VicBiostat Seminar, 28 May 2015 23 / 40

Page 24: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Weighted Net Reclassification Index (wNRI)

We can also use the IPW weights to calculate the weighted NRI. Forthose with events the estimate is

ˆwNRIe

= pw ,eup − pw ,edown

For those without events the estimate is

ˆwNRIne

= pw ,nedown − pw ,neup

where pw are the weighted proportions, with individual weight given by theIPW weight. We also derive the variance expression of the wNRI neededfor performing hypothesis testing.

Agus Salim () VicBiostat Seminar, 28 May 2015 24 / 40

Page 25: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Simulation Studies

We generated 500 independent simulated cohorts (of 50,000 subjectseach) that closely mimic the characteristics of subjects in theSingapore Chinese Health Study (SCHS) cohort. All simulatedvariables have the same mean and covariance structure as thecorresponding variables in the SCHS cohort.

The event times are generated assuming proportional hazard modelwith constant baseline hazard.

Random censoring time is generated as exponential r.v with rate =0.05

Results in ∼96% of cohort members being censored and an averagefollow-up time ∼ 20 years.

For simplicity, we assume everybody enters the cohort at beginning ofthe study (no staggered entry).

Agus Salim () VicBiostat Seminar, 28 May 2015 25 / 40

Page 26: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Simulation Studies

Within each simulated cohort, two types of NCC study is conducted:(1) without matching, (2) with gender-matching

Log hazard ratios, cumulative baseline hazard and absolute risk areestimated using: (1) full cohort data, (2) NCC data via Samuelsen(weighted) method, (3) NCC data via L-B method.

Average estimates and empirical standard errors are calculated over500 realisations

Agus Salim () VicBiostat Seminar, 28 May 2015 26 / 40

Page 27: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Simulation Results: Log hazard ratios

Agus Salim () VicBiostat Seminar, 28 May 2015 27 / 40

Page 28: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Simulation Results: Log hazard ratios

Without matching, both L-B and weighted approaches unbiasedlyestimate log hazard ratios

With gender-matching, L-B approach unable to estimate log HR forgender

There is noticeable benefit of matching in reducing the SE ofestimates (better efficiency)

Agus Salim () VicBiostat Seminar, 28 May 2015 28 / 40

Page 29: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Simulation Results: Absolute Risk Estimates

Agus Salim () VicBiostat Seminar, 28 May 2015 29 / 40

Page 30: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Simulation Results: Absolute Risk Estimates

Agus Salim () VicBiostat Seminar, 28 May 2015 30 / 40

Page 31: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Application: Does CRP, creatinine and HbA1c improvesCHD risk prediction?

In western populations high-sensitivity C-reactive protein (hsCRP),and to a lesser degree serum creatinine and haemoglobin A1c(HbA1c) predict risk of coronary heart disease (CHD).

Data on Asian populations where CHD are expected to increase aresparse and it is not clear if these biomarkers will actually improve theCHD risk classification.

Study design: NCC with up to 3 controls per case, conducted withinSCHS cohort. Controls are matched to case on age (+/- 1 years),gender, dialect group, blood storage time (+/- 6 months).

Outcome: CHD (ICD 9:410-414). To be eligible for selections intoNCC study, participants need to not have coronary heart disease orstroke, at the time of their blood collection.

Study period: 1993-2010, average follow-up time: 12.4 years.

Agus Salim () VicBiostat Seminar, 28 May 2015 31 / 40

Page 32: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Application: Does CRP, creatinine and HbA1c improvesCHD risk prediction?

We will compare the model with only ATP III risk factors: age, totalcholesterol, HDL-C, SBP, antihypertensive treatment and currentsmoking status with a new model with CRP, creatinine and HbA1cadded.

Parameters estimation used the weighted approach and separateanalyses are conducted for male and female.

I will discuss only results for male (298 cases, 667 controls).

Agus Salim () VicBiostat Seminar, 28 May 2015 32 / 40

Page 33: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Application: Basic Characteristics

Agus Salim () VicBiostat Seminar, 28 May 2015 33 / 40

Page 34: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Application: Model Comparisons

The weighted NRI were performed using ATP III risk cut-off (<10%,10− 20% and >20%).

Agus Salim () VicBiostat Seminar, 28 May 2015 34 / 40

Page 35: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Application: SCHS Male

Agus Salim () VicBiostat Seminar, 28 May 2015 35 / 40

Page 36: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Application: Absolute risk estimates

Addition of CRP and creatinine increases predicted risk for high-riskindividuals with elevated CRP and creatinine, while reducing the predictedrisk for those with low CRP and creatinine

Agus Salim () VicBiostat Seminar, 28 May 2015 36 / 40

Page 37: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Conclusions

Developing risk prediction model using NCC data is feasible and offerssignificant savings.

Including CRP and serum creatinine into ATP III model improvesCHD risk classification in Chinese and could potentially lead to bettertreatment recommendation for subjects with elevated CRP andcreatinine not previously ’picked’ by ATP III model.

Agus Salim () VicBiostat Seminar, 28 May 2015 37 / 40

Page 38: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Collaborators:

National University of Singapore: Rob van Dam, Tai Eshyong

Singapore Chinese Health Study: Koh Woon Puay (Duke-NUS),Yuan Jian-Min (U of Pittsburgh)

The Australian National University: Alan Welsh

Funding:

Singapore Chinese Health Study: National Institute of Health,USA [Grant Number NCI RO1 CA055069, R35 CA053890, R01CA080205, R01 CA098497 and R01 CA144034]

This work: Singapore National Medical Research Council [GrantNumber 1261 (to Rob van Dam), 1270 (to Agus Salim)].

Agus Salim () VicBiostat Seminar, 28 May 2015 38 / 40

Page 39: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

References

Cai T, Zheng Y.(2012). Evaluating Prognostic Accuracy of biomarkersunder Nested Case-control studies. Biostatistics, 13:89-100.

Cook NR et al. (2012). Comparison of the Framingham and Reynolds Riskscores for global cardiovascular risk prediction in the multiethnic Women’sHealth Initiative. Circulation, 125:1748-1756.

Ganna A et al. (2012). Risk prediction measures for case-cohort and nestedcase-control designs: an application to cardiovascular disease. AmericanJournal of Epidemiology, 175:715-24.

Harrell FE et al. (1996). Tutorial in biostatistics: multivariable prognosticmodels: issues in developing models, evaluating assumptions and adequacy,and measuring and reducing errors. Statistics in Medicine, 15:361 387.

Langholz B, Borgan O. (1995). Estimation of absolute risk from nestedcase-control data. Biometrics, 53: 767-774.

Langholz B, Thomas DC. (1990). Nested case-control and case-cohortmethods of sampling from a cohort: a critical comparison. American Journalof Epidemiology, 131:169-76.

Agus Salim () VicBiostat Seminar, 28 May 2015 39 / 40

Page 40: Developing Risk Prediction Models Using Nested Case ... prediction...Harrell (1996), also Pencina and D’Agostino (2004) proposed to use only the ’useable’ pairs and estimate

Pencina MJ, D’Agostino RB. (2004). Overall C as a measure ofdiscrimination in survival analysis: model specific population value andconfidence interval estimation. Statistics in Medicine, 23:21092123.

Pencina MJ et al. (2008). Evaluating the added predictive ability of a newmarker: from area under the ROC curve to reclassification and beyond.Statistics in Medicine, 27: 157-172.

Pepe MS et al. Limitations of the odds ratio in gauging the performance ofa diagnostic, prognostic, or screening marker. American Journal ofEpidemiology, 159(9):882-90.

Samuelsen, SO. (1997). A pseudolikelihood approach to analysis of nestedcase-control studies. Biometrika, 84: 379-394.

Salim et al. Absolute risk estimation using nested case-control data.submitted.

Salim et al. C - reactive protein and serum creatinine, but not HbA1c,improve prediction of coronary heart disease risk in non-diabetic Chinese:the Singapore Chinese Health Study. submitted.

Zhou QM, Zheng Y, Cai T. Assessment of biomarkers for risk predictionwith nested case-control studies. (2013). Clinical Trials, 10:677-679.

Agus Salim () VicBiostat Seminar, 28 May 2015 40 / 40