Survival Data Analysis
Model Development
Sandra Gardner, PhD
Dalla Lana School of Public Health
University of Toronto March 4, 2015
CHL5209H
Agenda
• Model development
▫ Purposeful selection of covariates method
Reference: Hosmer, Lemeshow & May 2008
▫ Model development tips
▫ SAS coding examples
Model development
• Select model looking at overall K-M survival plot and other diagnostic plots
• Which model?
▫ non-parametric
▫ Cox model
▫ parametric models (e.g. Weibull)
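A rough sketch of how the overall K-M plot and two common diagnostic plots might be requested. The data set sda.brain and the variables weeks and event (0 = censored) are taken from the SAS code later in these slides; the particular choice of plots is an assumption, not necessarily what was shown in class.

```sas
ods graphics on;

/* Overall Kaplan-Meier curve plus two diagnostic plots:                */
/*   logsurv : -log S(t) vs t, roughly linear under an exponential model */
/*   loglogs : log(-log S(t)) vs log(t), roughly linear under a Weibull  */
proc lifetest data=sda.brain plots=(survival logsurv loglogs);
  time weeks*event(0);
run;
```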
Model development
• Choose covariates for the model
▫ Time varying covariates?
▫ Variable selection methods
Forward, backward, stepwise, best score
Not currently available for Proc Lifereg
▫ Purposeful selection of covariates (Reference: Hosmer, Lemeshow and May, 2008)
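Since automated selection is not offered by Proc Lifereg, one option is to run it in Proc Phreg. A minimal sketch, assuming the covariates of sda.brain listed later in these slides; the entry and stay levels are illustrative values, not recommendations from the course.

```sas
/* Stepwise selection for a Cox model.                           */
/* slentry / slstay are illustrative thresholds (assumptions).   */
proc phreg data=sda.brain;
  model weeks*event(0) = treat age male race karn local grade path
                         resect75 nitro interval
        / selection=stepwise slentry=0.25 slstay=0.15;
run;
```

Replacing selection=stepwise with forward, backward or score gives the other methods named above.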
Purposeful selection of covariates (1)
• Step 1
▫ Model each covariate separately (univariate analysis)
▫ Fit multivariate model including all variables where p<.25
• Step 2
▫ Identify covariates to be removed from multivariate models
• Step 3
▫ Check for confounding (important changes in beta values).
Purposeful selection of covariates (2)
• Step 4
▫ Add variables previously excluded in step 1 to check for confounding
• Step 5
▫ Examine scale of continuous covariates
Linearity, transformation of covariates
• Step 6
▫ Check for interactions
Purposeful selection of covariates (3)
• Step 7
▫ Model evaluation
▫ Goodness of fit
Step 1: explore data
Variables in Creation Order
# Variable Type Len Format Informat
1 treat Num 8 BEST12. BEST32.
2 resect75 Num 8 BEST12. BEST32.
3 age Num 8 BEST12. BEST32.
4 interval Num 8 BEST12. BEST32.
5 karn Num 8 BEST12. BEST32.
6 race Num 8 BEST12. BEST32.
7 local Num 8 BEST12. BEST32.
8 male Num 8 BEST12. BEST32.
9 nitro Num 8 BEST12. BEST32.
10 weeks Num 8 BEST12. BEST32.
11 event Num 8 BEST12. BEST32.
12 path Num 8 BEST12. BEST32.
13 grade Num 8 BEST12. BEST32.
14 lweeks Num 8
15 age50 Num 8
Overall Survival
Estimated median=27.4 and mean=44.5
Step 1: explore data (examples)
male  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0            79    35.59                    79               35.59
1           143    64.41                   222              100.00

path  Frequency  Percent  Cumulative Frequency  Cumulative Percent
1           149    67.12                   149               67.12
2            30    13.51                   179               80.63
3            35    15.77                   214               96.40
4             8     3.60                   222              100.00
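Listings like the ones above can be produced along these lines. This is a sketch assuming the data set sda.brain used in the SAS code later in these slides; the exact statements behind the slides are not shown.

```sas
/* Variables in creation order (the varnum option) */
proc contents data=sda.brain varnum;
run;

/* One-way frequency tables for categorical covariates */
proc freq data=sda.brain;
  tables male path;
run;
```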
Step 1: data manipulation
• Do you understand how the data was collected?
▫ What is the quality of the data?
• Calculating outcome and censoring variables
• Data linkage
• Double check the results of any data manipulation
Step 1: missing data
• Check patterns of missing data
• Strategies to consider
▫ Delete observations
▫ Add missing category
▫ Missing data imputation
• Strategy will depend on amount of missing data and why the data is missing
• Are data missing at random?
• Missing a covariate?
▫ Add random effects to model?
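One possible way to check the amount and pattern of missing data before choosing a strategy. This is a sketch assuming the covariates of sda.brain; the slides do not say whether this data set has any missing values.

```sas
/* Number of non-missing (n) and missing (nmiss) values per numeric variable */
proc means data=sda.brain n nmiss;
run;

/* Missing-data patterns only; nimpute=0 means nothing is imputed */
proc mi data=sda.brain nimpute=0;
  var treat age male race karn local grade path resect75 nitro interval;
  ods select MissPattern;
run;
```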
Step 1: other model development tips
• Clean and label data before analysis
• Explore the distribution of covariates
▫ Assess need to recode or rescale covariates at this step or at univariate modeling (step 1)
▫ Check for highly correlated (collinear) relationships amongst covariates
• Consult
▫ Subject matter specialists
▫ Statistical and medical literature
• Present
▫ Use graphics to supplement tabular results
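A quick screen for highly correlated covariates could look like the following. The variable list and the use of rank (Spearman) correlations are assumptions made for illustration.

```sas
/* Pairwise rank correlations among covariates; large absolute values */
/* flag pairs that may be collinear in the multivariate model.        */
proc corr data=sda.brain spearman;
  var age interval karn path grade resect75;
run;
```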
Step 1/2 – remove variables?
Example univariate model
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    2.7478          0.1227     2.5073      2.9882      501.57      <.0001
path                1    0.3974          0.0692     0.2617      0.5331       32.94      <.0001
Scale               1    0.8934          0.0447     0.8100      0.9855

Table of univariate results
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
treat               1    0.1775          0.1282    -0.0737      0.4288        1.92      0.1660
age50               1   -0.4161          0.1260    -0.6631     -0.1691       10.90      0.0010
age                 1   -0.0176          0.0048    -0.0270     -0.0081       13.14      0.0003
male                1    0.1109          0.1342    -0.1522      0.3740        0.68      0.4088
race                1   -0.7266          0.2276    -1.1727     -0.2805       10.19      0.0014
karn                1    0.5376          0.1237     0.2951      0.7801       18.88      <.0001
local               1    0.1919          0.1516    -0.1052      0.4890        1.60      0.2055
grade               1    0.3380          0.2423    -0.1370      0.8130        1.95      0.1631
path                1    0.3974          0.0692     0.2617      0.5331       32.94      <.0001
resect75            1    0.4085          0.1437     0.1269      0.6902        8.08      0.0045
nitro               1   -0.4702          0.1250    -0.7151     -0.2252       14.15      0.0002
interval            1    0.1768          0.0347     0.1089      0.2447       26.02      <.0001
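The univariate fits can be run one covariate at a time with a small macro. This is a sketch: the macro name univ is hypothetical, and the log-normal distribution is an assumption carried over from the SAS code later in these slides (the slides do not state which distribution produced the tables above).

```sas
/* Hypothetical helper: fit one univariate model per call */
%macro univ(var);
  proc lifereg data=sda.brain;
    model weeks*event(0) = &var / d=lnormal;  /* d=lnormal is an assumption */
    title "Univariate model: &var";
  run;
%mend univ;

%univ(treat)
%univ(age)
%univ(path)
```

Covariates with p<.25 in these fits (all except male in the table above) go forward to the first multivariate model.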
Step 1: multivariate model (1)
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    3.0454          0.3345     2.3897      3.7010       82.87      <.0001
treat               1    0.2208          0.1045     0.0159      0.4257        4.46      0.0347
age                 1   -0.0097          0.0044    -0.0183     -0.0011        4.91      0.0267
race                1   -0.3972          0.1930    -0.7753     -0.0190        4.24      0.0396
karn                1    0.3324          0.1102     0.1165      0.5483        9.11      0.0025
local               1    0.2652          0.1247     0.0207      0.5096        4.52      0.0335
grade               1    0.4349          0.1993     0.0443      0.8256        4.76      0.0291
path                1    0.2582          0.0633     0.1341      0.3823       16.62      <.0001
resect75            1    0.2397          0.1215     0.0016      0.4778        3.89      0.0484
nitro               1   -0.3022          0.1085    -0.5148     -0.0896        7.76      0.0053
interval            1    0.1054          0.0325     0.0418      0.1690       10.53      0.0012
Scale               1    0.7667          0.0384     0.6950      0.8458
Step 3/4: multivariate model (2)
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    3.0177          0.3368     2.3576      3.6778       80.28      <.0001
treat               1    0.2164          0.1047     0.0113      0.4216        4.28      0.0387
age                 1   -0.0099          0.0044    -0.0185     -0.0013        5.12      0.0236
male                1    0.0773          0.1112    -0.1407      0.2952        0.48      0.4873
race                1   -0.3998          0.1930    -0.7781     -0.0216        4.29      0.0383
karn                1    0.3184          0.1119     0.0990      0.5378        8.09      0.0044
local               1    0.2605          0.1248     0.0158      0.5051        4.35      0.0369
grade               1    0.4259          0.1997     0.0346      0.8173        4.55      0.0329
path                1    0.2597          0.0633     0.1355      0.3838       16.80      <.0001
resect75            1    0.2480          0.1220     0.0088      0.4871        4.13      0.0421
nitro               1   -0.3067          0.1086    -0.5196     -0.0938        7.97      0.0048
interval            1    0.1050          0.0325     0.0414      0.1687       10.46      0.0012
Scale               1    0.7663          0.0384     0.6946      0.8453
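The model above adds male, which was excluded at step 1, back into the multivariate model to check for confounding. A minimal sketch of the two fits being compared, assuming the data set sda.brain and the log-normal distribution used in the SAS code later in these slides:

```sas
/* Multivariate model (1): covariates with univariate p<.25 */
proc lifereg data=sda.brain;
  model weeks*event(0) = treat age race karn local grade path
                         resect75 nitro interval / d=lnormal;
run;

/* Multivariate model (2): male added back to check for confounding */
proc lifereg data=sda.brain;
  model weeks*event(0) = treat age male race karn local grade path
                         resect75 nitro interval / d=lnormal;
run;
```

In the output shown on the slides, adding male barely moves the other estimates (for example treat is 0.2208 without male and 0.2164 with it), and male itself has p=0.4873.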
Step 5: quadratic?
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    3.1532          0.6884     1.8039      4.5025       20.98      <.0001
treat               1    0.2218          0.1052     0.0156      0.4279        4.45      0.0350
age                 1   -0.0150          0.0276    -0.0691      0.0390        0.30      0.5850
age*age             1    0.0001          0.0003    -0.0005      0.0006        0.04      0.8421
race                1   -0.3983          0.1931    -0.7768     -0.0198        4.25      0.0392
karn                1    0.3319          0.1102     0.1160      0.5478        9.08      0.0026
local               1    0.2656          0.1247     0.0211      0.5100        4.53      0.0332
grade               1    0.4351          0.1994     0.0442      0.8259        4.76      0.0291
path                1    0.2581          0.0633     0.1340      0.3823       16.62      <.0001
resect75            1    0.2417          0.1220     0.0025      0.4809        3.92      0.0477
nitro               1   -0.3011          0.1096    -0.5159     -0.0864        7.55      0.0060
interval            1    0.1162          0.1013    -0.0824      0.3147        1.32      0.2514
interval*interval   1   -0.0015          0.0131    -0.0272      0.0243        0.01      0.9106
Scale               1    0.7666          0.0384     0.6949      0.8457
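The quadratic check can be coded by adding squared terms directly in the model statement, as the parameter names age*age and interval*interval in the output suggest. A sketch, again assuming sda.brain and a log-normal model:

```sas
/* Squared terms for the two continuous covariates */
proc lifereg data=sda.brain;
  model weeks*event(0) = treat age age*age race karn local grade path
                         resect75 nitro interval interval*interval
        / d=lnormal;
run;
```

Neither squared term is significant in the output above (p=0.8421 and p=0.9106), which is consistent with keeping age and interval linear.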
Step 5: continuous or categorical?
Step 6: interactions
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    2.4132          0.4041     1.6212      3.2051       35.67      <.0001
treat               1    0.2373          0.1030     0.0354      0.4391        5.31      0.0212
age                 1    0.0028          0.0063    -0.0096      0.0151        0.19      0.6628
race                1   -0.4147          0.1897    -0.7865     -0.0428        4.78      0.0289
karn                1    1.4432          0.4093     0.6409      2.2455       12.43      0.0004
local               1    0.2703          0.1227     0.0298      0.5108        4.85      0.0276
grade               1    0.3340          0.8534    -1.3385      2.0066        0.15      0.6955
path                1    0.2563          0.0625     0.1338      0.3788       16.81      <.0001
resect75            1    0.2207          0.1196    -0.0137      0.4552        3.40      0.0650
nitro               1   -0.2995          0.1067    -0.5086     -0.0905        7.89      0.0050
interval            1    0.1166          0.0322     0.0534      0.1798       13.06      0.0003
age*karn            1   -0.0230          0.0082    -0.0390     -0.0070        7.92      0.0049
age*grade           1    0.0022          0.0157    -0.0287      0.0330        0.02      0.8896
Scale               1    0.7535          0.0377     0.6831      0.8312
Step 3/6: interaction example
Table of age50 by karn
Cell contents: Frequency, Percent, Row Pct, Col Pct

age50     karn=0    karn=1     Total
0             43        73       116
           19.37     32.88     52.25
           37.07     62.93
           40.95     62.39
1             62        44       106
           27.93     19.82     47.75
           58.49     41.51
           59.05     37.61
Total        105       117       222
           47.30     52.70    100.00

P=0.0014
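A cross-tabulation like this one, with a test of association, can be requested as follows. This is a sketch assuming sda.brain; the slide does not say which test produced the p-value, and the chisq option prints several.

```sas
/* Two-way table of age group by Karnofsky group with chi-square tests */
proc freq data=sda.brain;
  tables age50*karn / chisq;
run;
```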
Step 6
Unadjusted/adjusted:
Unadjusted:
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    4.2052          0.2404     3.7340      4.6765      305.87      <.0001
age                 1   -0.0176          0.0048    -0.0270     -0.0081       13.14      0.0003
Scale               1    0.9284          0.0465     0.8415      1.0243

Adjusted for karn:
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    3.7787          0.2593     3.2705      4.2870      212.34      <.0001
age                 1   -0.0137          0.0048    -0.0231     -0.0043        8.18      0.0042
karn                1    0.4619          0.1244     0.2182      0.7056       13.80      0.0002
Scale               1    0.9000          0.0451     0.8158      0.9929
Stratifying:
karn=0
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    3.2973          0.3389     2.6330      3.9616       94.64      <.0001
age                 1   -0.0043          0.0065    -0.0170      0.0084        0.44      0.5091
Scale               1    0.8359          0.0590     0.7278      0.9600

karn=1
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    4.6371          0.3289     3.9924      5.2818      198.72      <.0001
age                 1   -0.0224          0.0070    -0.0361     -0.0087       10.30      0.0013
Scale               1    0.9478          0.0673     0.8246      1.0893
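Stratified fits like these can be obtained with a by statement. A sketch assuming sda.brain and a log-normal model; the output data set name bykarn is hypothetical.

```sas
/* By-group processing needs the data sorted by the stratifying variable */
proc sort data=sda.brain out=bykarn;
  by karn;
run;

/* Separate age model within each Karnofsky group */
proc lifereg data=bykarn;
  by karn;
  model weeks*event(0) = age / d=lnormal;
run;
```

Each stratum gets its own intercept, age slope and scale, which is why the slopes differ slightly from those of a single model with an age*karn interaction.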
Interaction:
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    3.2994          0.3622     2.5894      4.0093       82.97      <.0001
age                 1   -0.0043          0.0069    -0.0179      0.0093        0.39      0.5347
karn                1    1.3256          0.4765     0.3916      2.2596        7.74      0.0054
age*karn            1   -0.0179          0.0095    -0.0366      0.0008        3.52      0.0605
Scale               1    0.8933          0.0448     0.8097      0.9854

Estimate
Label              Estimate  Standard Error  z Value  Pr > |z|  Exponentiated
age, karn=0        -0.00430        0.006921    -0.62    0.5347         0.9957
age, karn=1        -0.02222        0.006576    -3.38    0.0007         0.9780
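The two labelled estimates above are linear combinations of the model coefficients: the age effect when karn=0 is the age coefficient alone, and when karn=1 it is age plus age*karn (-0.0043 + -0.0179 = -0.0222, exponentiated 0.9780). A sketch of estimate statements that would give such output, assuming karn is entered as a numeric 0/1 variable (no class statement) in a log-normal model:

```sas
proc lifereg data=sda.brain;
  model weeks*event(0) = age karn age*karn / d=lnormal;
  /* age effect when karn=0: the age coefficient only */
  estimate 'age, karn=0' age 1 / exp;
  /* age effect when karn=1: age coefficient plus the interaction */
  estimate 'age, karn=1' age 1 age*karn 1 / exp;
run;
```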
Step 6
Step 3/6: confounding example
Table of resect75 by karn
Cell contents: Frequency, Percent, Row Pct, Col Pct

resect75  karn=0    karn=1     Total
0             36        22        58
           16.22      9.91     26.13
           62.07     37.93
           34.29     18.80
1             69        95       164
           31.08     42.79     73.87
           42.07     57.93
           65.71     81.20
Total        105       117       222
           47.30     52.70    100.00

p=0.0088
Unadjusted/adjusted:
Unadjusted:
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    3.0637          0.1233     2.8221      3.3054      617.50      <.0001
resect75            1    0.4085          0.1437     0.1269      0.6902        8.08      0.0045
Scale               1    0.9390          0.0471     0.8511      1.0359

Adjusted for karn:
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    2.8780          0.1281     2.6270      3.1290      504.96      <.0001
resect75            1    0.3101          0.1409     0.0338      0.5863        4.84      0.0278
karn                1    0.4896          0.1244     0.2458      0.7334       15.50      <.0001
Scale               1    0.9068          0.0455     0.8220      1.0004
Confounding:
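\[
\Delta\hat{\beta}\% = 100\,\frac{\hat{\theta}_1 - \hat{\beta}_1}{\hat{\beta}_1}
= 100\,\frac{0.4085 - 0.3101}{0.3101} = 31.73
\]
\[
100\,\frac{\hat{\beta}_2\,(\bar{a}_1 - \bar{a}_0)}{\hat{\beta}_1}
= 100\,\frac{(0.5793 - 0.3793)\,0.4896}{0.3101} = 31.58
\]
Here $\hat{\theta}_1 = 0.4085$ is the unadjusted resect75 coefficient, $\hat{\beta}_1 = 0.3101$ and $\hat{\beta}_2 = 0.4896$ are the resect75 and karn coefficients from the adjusted model, and $\bar{a}_1 = 0.5793$, $\bar{a}_0 = 0.3793$ are the proportions with karn=1 among resect75=1 and resect75=0.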
Stratifying:
karn=0
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    2.9096          0.1381     2.6388      3.1803      443.60      <.0001
resect75            1    0.2598          0.1706    -0.0745      0.5941        2.32      0.1277
Scale               1    0.8289          0.0585     0.7217      0.9519

karn=1
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    3.3160          0.2088     2.9068      3.7252      252.28      <.0001
resect75            1    0.3801          0.2322    -0.0750      0.8352        2.68      0.1017
Scale               1    0.9792          0.0696     0.8519      1.1256
No interaction:
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    2.9096          0.1511     2.6134      3.2057      370.77      <.0001
resect75            1    0.2620          0.1866    -0.1038      0.6277        1.97      0.1603
karn                1    0.4064          0.2453    -0.0744      0.8873        2.74      0.0976
resect75*karn       1    0.1119          0.2846    -0.4459      0.6697        0.15      0.6942
Scale               1    0.9066          0.0454     0.8218      1.0002

Estimate
Label              Estimate  Standard Error  z Value  Pr > |z|  Exponentiated
resection, karn=0    0.2620          0.1866     1.40    0.1603         1.2995
resection, karn=1    0.3739          0.2149     1.74    0.0820         1.4533
Step 7: final model?
Analysis of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    2.4051          0.4000     1.6212      3.1891       36.15      <.0001
treat               1    0.2377          0.1030     0.0359      0.4395        5.33      0.0210
age                 1    0.0029          0.0062    -0.0092      0.0151        0.22      0.6385
race                1   -0.4151          0.1897    -0.7870     -0.0432        4.79      0.0287
karn                1    1.4431          0.4094     0.6407      2.2456       12.42      0.0004
local               1    0.2697          0.1226     0.0293      0.5101        4.84      0.0279
grade               1    0.4494          0.1961     0.0650      0.8337        5.25      0.0219
path                1    0.2570          0.0623     0.1349      0.3791       17.02      <.0001
resect75            1    0.2203          0.1196    -0.0142      0.4547        3.39      0.0656
nitro               1   -0.2997          0.1067    -0.5088     -0.0906        7.89      0.0050
interval            1    0.1167          0.0322     0.0535      0.1799       13.11      0.0003
age*karn            1   -0.0230          0.0082    -0.0390     -0.0070        7.92      0.0049
Scale               1    0.7537          0.0377     0.6832      0.8313
Step 7: Cox-Snell residuals
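One common way to produce a Cox-Snell residual check for a Proc Lifereg model is sketched below. It assumes the final model's covariate list and a log-normal distribution; the names resid and coxsnell are hypothetical, and the plot shown in class may have been built differently.

```sas
/* Save Cox-Snell residuals from the fitted model */
proc lifereg data=sda.brain;
  model weeks*event(0) = treat age race karn local grade path
                         resect75 nitro interval age*karn / d=lnormal;
  output out=resid cres=coxsnell;
run;

/* If the model fits, the residuals behave like a censored sample from a  */
/* unit exponential, so -log S(r) plotted against r should follow a       */
/* straight line through the origin with slope 1.                         */
proc lifetest data=resid plots=(logsurv) notable;
  time coxsnell*event(0);
run;
```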
Additional SAS code
/* Sort so treat=1 appears first; with order=data the last level (treat=0) is the reference */
proc sort data=sda.brain out=sbrain;
  by descending treat;
run;

/* Treatment and age, with an effect plot of the fitted model */
proc lifereg data=sbrain order=data;
  class treat;
  model weeks*event(0) = treat age / d=lnormal;
  effectplot / noobs;
  title 'LifeReg: effect plot';
run;

/* Age group by Karnofsky group interaction */
proc lifereg data=sbrain order=data;
  class age50 karn;
  model weeks*event(0) = age50 karn age50*karn / d=lnormal;
  /* compare age50 levels within each level of karn */
  slice age50*karn / sliceby=karn diff cl exp;
  effectplot interaction(x=age50 sliceby=karn) / noobs link;
  title 'LifeReg: Age groups and Karnofsky score groups - LogNormal model';
run;

/* Treatment and pathology */
proc lifereg data=sbrain order=data;
  class treat path;
  model weeks*event(0) = treat path / d=lnormal;
  lsmeans path / diff exp;
  /* single contrast: mean of path categories 2, 3, 4 versus category 1 (.33 approximates 1/3) */
  lsmestimate path 'path 2,3,4 vs 1' .33 .33 .33 -1;
  title 'LifeReg: Treatment & Pathology - LogNormal model';
run;
Effect plots
Analysis of Maximum Likelihood Parameter Estimates
Parameter  Level   DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    4.1190          0.2456     3.6376      4.6004      281.26      <.0001
treat      1        1    0.1883          0.1245    -0.0558      0.4323        2.29      0.1306
treat      0        0    0.0000               .          .           .           .           .
age                 1   -0.0177          0.0048    -0.0271     -0.0083       13.52      0.0002
Scale               1    0.9230          0.0463     0.8366      1.0183

age  xbeta1  xbeta0
 20  3.9533  4.4189
 40  3.5993  4.0649
 60  3.2453  3.7109
 80  2.8913  3.3569
Interaction plots
Analysis of Maximum Likelihood Parameter Estimates
Parameter  Level   DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    3.1154          0.1371     2.8468      3.3840      516.70      <.0001
age50      1        1   -0.0579          0.1782    -0.4072      0.2915        0.11      0.7454
age50      0        0    0.0000               .          .           .           .           .
karn       1        1    0.7133          0.1734     0.3734      1.0531       16.92      <.0001
karn       0        0    0.0000               .          .           .           .           .
age50*karn 1 1      1   -0.4958          0.2477    -0.9813     -0.0103        4.01      0.0453
age50*karn 1 0      0    0.0000               .          .           .           .           .
age50*karn 0 1      0    0.0000               .          .           .           .           .
age50*karn 0 0      0    0.0000               .          .           .           .           .
Scale               1    0.8962          0.0449     0.8123      0.9886
Hypothesis tests
Analysis of Maximum Likelihood Parameter Estimates
Parameter  Level   DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept           1    3.0473          0.0952     2.8607      3.2340     1023.77      <.0001
treat      1        1    0.1951          0.1203    -0.0407      0.4309        2.63      0.1048
treat      0        0    0.0000               .          .           .           .           .
path       4        1    1.1837          0.3307     0.5354      1.8319       12.81      0.0003
path       3        1    0.8068          0.1702     0.4732      1.1405       22.47      <.0001
path       2        1    0.3911          0.1784     0.0415      0.7408        4.81      0.0283
path       1        0    0.0000               .          .           .           .           .
Scale               1    0.8875          0.0444     0.8046      0.9790

Least Squares Means Estimate
Effect  Label              Estimate  Standard Error  z Value  Pr > |z|
path    path 2,3,4 vs 1      0.7545          0.1486     5.08    <.0001
Reference
• Applied Survival Analysis, D.W. Hosmer, S. Lemeshow, S. May, Wiley 2008