Post on 14-Dec-2015
The %LRpowerCorr10 SAS Macro
Power Estimation for Logistic Regression Models with Several Predictors of Interest
in the Presence of Covariates
D. Keith Williams, M.P.H., Ph.D.
Zoran Bursac, M.P.H., Ph.D.
Department of Biostatistics
University of Arkansas for Medical Sciences
The Premise for Linear and Logistic Regression Power and Sample Size
• Power to detect significance among specific predictors in the presence of other covariates in a model.
• For linear regression Proc Power works great!
• Logistic regression power estimation is ‘quirky’
Common Approaches to Estimate Logistic Regression Power
• Power for one predictor possibly in the presence of other covariates.
• The %powerlog macro allows for correlation among these predictors.
• A weakness: we are commonly interested in the power to detect the significance of more than one predictor.
LRpowerCorr10
1. Up to 10 predictors
2. 2 binary, 4 uniform (-3,3), and 4 normal
3. Specify a correlation among predictors
4. Specify an odds ratio value for the predictors
5. Specify the set of factors of interest and the set of covariates
A Power Scenario
logit = -2.2 + ln(1.5) x1 + ln(1.5) x2 + ln(1.1) x3 +
ln(1.05) x4 + ln(1.02) x5 + ln(1.05) x6 +
ln(1.01) x7 + ln(1.05) x8 + ln(1.02) x9 + ln(1.03) x10
Risk factors of interest: x1, x2, x3
Covariates: x4 through x10
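The intercept of -2.2 in this scenario is close to the log-odds of an event probability of 0.1, the value reported as P(Y=1) = .1 in the macro output. The slides do not say how the intercept was chosen, so the check below assumes it was set to the logit of that baseline probability. Each odds ratio enters the model as its natural log.

```latex
\[
\operatorname{logit}(0.1) = \ln\!\left(\frac{0.1}{1 - 0.1}\right) = \ln\!\left(\tfrac{1}{9}\right) \approx -2.197,
\qquad
\frac{1}{1 + e^{2.2}} \approx 0.0998,
\qquad
\beta_1 = \ln(1.5) \approx 0.405 .
\]
```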
%LRpowerCorr Example
%LRpowerCorr10(2000,1000,.2,.1,
1.5,1.5,
1.1, 1.05,1.02,1.05,
1.01,1.05,1.02,1.03,
cx1 cx2 cx3 cx4 cx5 cx6 cx7 cx8 cx9 cx10, cx4 cx5 cx6 cx7 cx8 cx9 cx10,
.05, 3, 0.1,0.5);
Arguments, in order:
2000: sample size
1000: number of simulations
.2: correlation among predictors
.1: mean proportion of '1's
1.5, 1.5, 1.1: odds ratios for the 3 risk factors of interest
1.05 through 1.03: odds ratios for the covariates
cx1 ... cx10: full model
cx4 ... cx10: reduced model
.05: level of significance
3: the number of terms of interest
0.1, 0.5: prob of '1' for the binary cx1 and cx2
%LRpowerCorr10 Output
Sample size = 2000; Simulations = 1000; Rho = .2; P(Y=1) = .1
OR1=1.5, OR2=1.5, OR3=1.1, OR4=1.05, OR5=1.02, OR6=1.05
OR7=1.01, OR8=1.05, OR9=1.02, OR10=1.03
Full Model: cx1 cx2 cx3 cx4 cx5 cx6 cx7 cx8 cx9 cx10
Reduced Model: cx4 cx5 cx6 cx7 cx8 cx9 cx10
Power LCL UCL
88% 86% 90%
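The reported limits are consistent with a normal-approximation (Wald) 95% interval for a proportion estimated from 1000 simulations. The slides do not state which interval the macro uses, so the following is only a sketch under that assumption; the data set and variable names are illustrative.

```sas
/* Sketch: Wald 95% interval for a simulated power estimate.        */
/* Assumption: the macro's LCL/UCL come from a normal approximation; */
/* the slides do not confirm this.                                   */
data power_ci;
  power = 0.88;                          /* proportion of rejections */
  nsims = 1000;                          /* number of simulations    */
  se    = sqrt(power * (1 - power) / nsims);
  z     = probit(0.975);                 /* two-sided 95% quantile   */
  lcl   = power - z * se;
  ucl   = power + z * se;
  put lcl= ucl=;                         /* written to the SAS log   */
run;
```

With these inputs the limits round to 86% and 90%, the values shown on the slide.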
A Key Point about Linear Regression
• We rarely have conjectured values for particular betas in an ordinary linear regression.
• Therefore, for linear regression models, one conjectures the difference in R-square between a model that includes the predictors of interest and a model without these predictors.
The Hypothetical Scenario
A model with 4 terms
Predictors for PSA of interest that we choose to power:
1. SVI
2. c_volume
Two covariates to be included: cpen, gleason
Details
The full model:
$y = \beta_0 + \beta_1\,\mathrm{SVI} + \beta_2\,\mathrm{C\_vol} + \beta_3\,\mathrm{cpen} + \beta_4\,\mathrm{gleason}$
The reduced model:
$y = \beta_0 + \beta_3\,\mathrm{cpen} + \beta_4\,\mathrm{gleason}$
We want to power the test that a model with these 2 predictors is statistically better than a model excluding them.
The Corresponding Hypothesis
H(o): $\beta_1 = \beta_2 = 0$
H(a): At least one of the above is non-zero in the full model when the difference in R-square = ?
Hypothetical Full Model
Root MSE          30.98987    R-Square    0.4467
Dependent Mean    23.73013    Adj R-Sq    0.4226
Coeff Var        130.59291
Note: c_volume and svi are the predictors of interest.
Parameter Estimates
                  Parameter     Standard
Variable    DF     Estimate        Error    t Value    Pr > |t|
Intercept    1    -40.76878     33.24420      -1.23      0.2232
c_volume     1      2.02821      0.58404       3.47      0.0008
svi          1     17.85690     10.75049       1.66      0.1001
cpen         1      1.10381      1.32538       0.83      0.4071
gleason      1      6.39294      5.02522       1.27      0.2065
Hypothetical Reduced Model
Root MSE          33.42074    R-Square    0.3424
Dependent Mean    23.73013    Adj R-Sq    0.3285
Coeff Var        140.83671
Note: R-Square difference 0.45 - 0.34 = 0.11
Parameter Estimates
                  Parameter     Standard
Variable    DF     Estimate        Error    t Value    Pr > |t|
Intercept    1    -71.59827     34.91893      -2.05      0.0431
cpen         1      4.82868      1.01632       4.75      <.0001
gleason      1     12.28661      5.19873       2.36      0.0202
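The R-square difference between the two fits maps directly onto the partial F statistic for the two predictors of interest. The worked example below uses the unrounded R-square values from the two outputs and assumes n = 97; the slides do not state the sample size, but that value is consistent with the reported adjusted R-square and is the largest N in the PROC POWER call.

```latex
\[
F = \frac{\left(R^2_{\text{full}} - R^2_{\text{reduced}}\right)/q}
         {\left(1 - R^2_{\text{full}}\right)/(n - p - 1)}
  = \frac{(0.4467 - 0.3424)/2}{(1 - 0.4467)/(97 - 4 - 1)}
  \approx 8.67
  \quad \text{on } (2,\ 92) \text{ degrees of freedom,}
\]
```

where q = 2 is the number of tested predictors and p = 4 is the number of predictors in the full model.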
proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    rsqfull = 0.45
    rsqreduced = 0.34
    ntotal = 97 80 70 60 50 40
    power = .;
  plot x=n min=40 max=100
    key = oncurves
    yopts=(ref=0.8 .977 crossref=yes);
run;
The POWER Procedure
Type III F Test in Multiple Regression
Fixed Scenario Elements
Method                                  Exact
Model                                   Random X
Number of Predictors in Full Model      4
Number of Test Predictors               2
Alpha                                   0.05
R-square of Full Model                  0.45
Difference in R-square                  0.11
Computed Power
Index    N Total    Power
1        97         0.979
2        80         0.949
3        70         0.916
4        60         0.864
5        50         0.787
6        40         0.677
[Plot: power (0.65 to 1.00) versus Total Sample Size (40 to 100), with reference lines at power 0.8 and 0.98 crossing the curve at N = 51.45 and N = 95.14]
Model Fit Statistics
                             Intercept
             Intercept          and
Criterion      Only         Covariates
AIC           124.318         113.996
SC            126.903         139.846
-2 Log L      122.318          93.996
The SAS System
Analysis of Maximum Likelihood Estimates
                            Standard        Wald
Parameter    DF  Estimate      Error  Chi-Square  Pr > ChiSq
Intercept     1   -5.5161     2.2471      6.0260      0.0141
age           1    0.0646     0.0583      1.2294      0.2675
sesdum2       1   -1.7862     3.0841      0.3354      0.5625
sesdum3       1    0.2955     2.2550      0.0172      0.8957
sector        1    2.9796     1.2481      5.6988      0.0170
age_ses2      1    0.1054     0.0559      3.5514      0.0595
age_ses3      1    0.0140     0.0316      0.1952      0.6586
age_sect      1   -0.0342     0.0309      1.2231      0.2688
ses2_sect     1   -0.3094     1.4409      0.0461      0.8300
ses3_sect     1   -0.7396     1.2489      0.3507      0.5537
Model Fit Statistics
                             Intercept
             Intercept          and
Criterion      Only         Covariates
AIC           124.318         111.054
SC            126.903         123.979
-2 Log L      122.318         101.054
Analysis of Maximum Likelihood Estimates
                            Standard        Wald
Parameter    DF  Estimate      Error  Chi-Square  Pr > ChiSq
Intercept     1   -3.8874     0.9955     15.2496      <.0001
age           1    0.0297     0.0135      4.8535      0.0276
sesdum2       1    0.4088     0.5990      0.4657      0.4950
sesdum3       1   -0.3051     0.6041      0.2551      0.6135
sector        1    1.5746     0.5016      9.8543      0.0017
The Corresponding Hypothesis
H(o): $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
H(a): At least one of the above is non-zero in the full model
LRchisq = 101.054 - 93.996 = 7.0582
Pvalue = 0.22 (Implies none are helpful)
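The likelihood ratio statistic is the difference between the two -2 Log L values, referred to a chi-square distribution. A minimal sketch of that calculation follows. The degrees of freedom are an assumption: the slides do not state them, and the value 5 is taken from the five extra terms in the larger fitted model (age_ses2, age_ses3, age_sect, ses2_sect, ses3_sect). The data set and variable names are illustrative.

```sas
/* Sketch: likelihood ratio test from the two reported -2 Log L values. */
/* Assumption: df = 5, the number of terms dropped from the full model. */
data lrtest;
  m2ll_reduced = 101.054;                  /* -2 Log L, reduced model */
  m2ll_full    = 93.996;                   /* -2 Log L, full model    */
  df           = 5;
  lrchisq = m2ll_reduced - m2ll_full;      /* LR chi-square statistic */
  pvalue  = 1 - probchi(lrchisq, df);      /* upper-tail probability  */
  put lrchisq= pvalue=;                    /* written to the SAS log  */
run;
```

Under that assumption the p-value comes out close to the 0.22 quoted on the slide.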
Power for Logistic Models Background
• Most existing tools are based on the Hsieh, Bloch, and Larsen (1998) paper and the Agresti (1996) text.
• %powerlog macro and other software.
• Recent publication by Demidenko (2008)
SAS 9.2 Proc Power for Logistic
The LOGISTIC statement performs power and sample size analyses for the likelihood ratio chi-square test of a single predictor in binary logistic regression, possibly in the presence of one or more covariates. All predictor variables are assumed to be independent of each other. So, this analysis is not applicable to studies with correlated predictors — for example, most observational studies (as opposed to randomized studies).
Common Approaches to Estimate Logistic Regression Power
• Calculate the power to detect significance of one predictor possibly in the presence of other predictors.
• The %powerlog macro allows for correlation among these predictors.
• A weakness: in many instances we are interested in the power to detect the significance of more than one predictor.
The %PowerLog Macro
Logistic Regression
$\mathrm{Logit} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
• Power for a one s.d. unit increase from the mean of X1
• Any number of other covariates in the model are accounted for by supplying the R-square of an ordinary regression of X1 on those covariates:
$X_1 = \beta_0 + \beta_1 X_2 + \dots + \beta_{k-1} X_k$
%Powerlog Function Example
%powerlog(p1=.5, p2=.6667, power=.8, rsq=%str(0,.0565,.1141), alpha=.05);
p1: Prob of 1 at mean of X1
p2: Prob of 1 at mean + SD of X1
rsq: Three hypothetical values of the R-square of X1 regressed on any number of other covariates
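The two probabilities passed to %powerlog imply an odds ratio for a one s.d. increase in X1, which is how this example lines up with an odds ratio setting such as OR7=2 in %LRpowerCorr10. A small sketch of the conversion, with illustrative variable names:

```sas
/* Sketch: odds ratio implied by p1 and p2 in the %powerlog call. */
data or_check;
  p1 = 0.5;                          /* P(Y=1) at the mean of X1      */
  p2 = 0.6667;                       /* P(Y=1) at mean + 1 SD of X1   */
  odds1     = p1 / (1 - p1);
  odds2     = p2 / (1 - p2);
  oddsratio = odds2 / odds1;         /* approximately 2               */
  beta      = log(oddsratio);        /* log-odds per SD of X1         */
  put oddsratio= beta=;              /* written to the SAS log        */
run;
```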
%LRpowerCorr10 versus %powerlog n=70
Sample size = 70; Simulations = 1000; Rho = 0; P(Y=1) = .5
OR1=1, OR2=1, OR3=1, OR4=1, OR5=1, OR6=1
OR7=2, OR8=1, OR9=1, OR10=1
Full Model: cx7 cx8 cx9 cx10
Reduced Model: cx8 cx9 cx10
Power LCL UCL
79% 76% 81%
%LRpowerCorr10 versus %powerlog n=75
Sample size = 75; Simulations = 1000; Rho = .1; P(Y=1) = .5
OR1=1, OR2=1, OR3=1, OR4=1, OR5=1, OR6=1
OR7=2, OR8=1, OR9=1, OR10=1
Full Model: cx7 cx8 cx9 cx10
Reduced Model: cx8 cx9 cx10
Power LCL UCL
81% 78% 83%
%LRpowerCorr10 versus %powerlog n=80
Sample size = 80; Simulations = 1000; Rho = .2; P(Y=1) = .5
OR1=1, OR2=1, OR3=1, OR4=1, OR5=1, OR6=1
OR7=2, OR8=1, OR9=1, OR10=1
Full Model: cx7 cx8 cx9 cx10
Reduced Model: cx8 cx9 cx10
Power LCL UCL
80% 77% 82%
The %LRpowerCorr10 Macro
• Power Estimation
– One or more predictors of interest
– Different distributions of predictors
– Other covariates in model
– Correlation among predictors
– Specify OR values associated with predictors
– Average proportion of ‘1’s
%LRpowerCorr(N, Simulations, Correlation)
1. Define logit: specify the association between each covariate x and the outcome y through its parameter estimate $\beta$.
2. Draw a sample of size N from the specified logit. Convert logits to binary.
3. PROC LOGISTIC: fit the full multivariate model. Save -2LnLikelihood.
4. PROC LOGISTIC: fit the reduced multivariate model. Save -2LnLikelihood.
5. Perform the likelihood ratio test (the difference in the reduced and full -2LnLikelihoods).
6. Is the resulting chi-square test statistic > the chi-square critical value (with respect to the correct number of d.f.)? If so, reject the null. If not, fail to reject the null. Save the result.
7. Loop: repeat steps 2 to 6 for each simulation.
8. Calculate the proportion of correct rejections (i.e. the power to detect the specified associations).
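The flow above can be sketched in a few SAS steps. This is not the authors' macro: it is a simplified illustration with one normal predictor of interest (x1, odds ratio 2) and one independent normal covariate (x2, odds ratio 1), no correlation among predictors, and an intercept of 0. All data set names, variable names, and settings are hypothetical.

```sas
/* Sketch of simulation-based power for a likelihood ratio test.      */
/* Assumptions: two independent N(0,1) predictors, intercept 0,       */
/* OR = 2 for x1 and OR = 1 for x2; not the %LRpowerCorr10 source.    */
%let n     = 70;      /* sample size per simulated data set */
%let nsims = 1000;    /* number of simulated data sets      */
%let alpha = 0.05;    /* significance level                 */
%let dftest = 1;      /* terms dropped in the reduced model */

data sim;
  call streaminit(12345);
  do sim = 1 to &nsims;
    do i = 1 to &n;
      x1 = rand('normal');
      x2 = rand('normal');
      logit = log(2)*x1 + log(1)*x2;           /* specified logit     */
      y = rand('bernoulli', 1/(1 + exp(-logit))); /* convert to binary */
      output;
    end;
  end;
run;

ods exclude all;                               /* suppress listings   */
proc logistic data=sim;                        /* full model          */
  by sim;
  model y(event='1') = x1 x2;
  ods output FitStatistics=fit_full;
run;
proc logistic data=sim;                        /* reduced model       */
  by sim;
  model y(event='1') = x2;
  ods output FitStatistics=fit_red;
run;
ods exclude none;

data lr;                                       /* LR test per data set */
  merge fit_full(where=(Criterion='-2 Log L')
                 rename=(InterceptAndCovariates=m2ll_full))
        fit_red (where=(Criterion='-2 Log L')
                 rename=(InterceptAndCovariates=m2ll_red));
  by sim;
  lrchisq = m2ll_red - m2ll_full;
  reject  = (lrchisq > cinv(1 - &alpha, &dftest));
run;

proc means data=lr mean;                       /* mean of reject = power */
  var reject;
run;
```

Running both fits with BY processing avoids a macro loop; the mean of the reject indicator is the estimated power.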
%LRpowerCorr10 Variables
SAMPLESIZE      The sample size to be evaluated
NSIMS           The number of simulation runs
P               The correlation among the predictors
AVEP            The average proportion of “1” responses in the samples with only intercept in model
OR1 - OR2       The odds ratios associated with binary CX1-CX2
OR3 - OR6       The odds ratios associated with uniform (-3,3) CX3-CX6
OR7 - OR10      The odds ratios associated with N(0,1) CX7-CX10
FULLMODEL       The predictor terms in the full model among CX1 CX2 CX3 CX4 CX5 CX6 CX7 CX8 CX9 CX10
REDUCEDMODEL    The predictor terms in the reduced model among CX1 CX2 CX3 CX4 CX5 CX6 CX7 CX8 CX9 CX10
ALPHA           The significance level of the test
DFTEST          The degrees of freedom of the test
PCX1            Probability of ‘1’ for binary CX1
PCX2            Probability of ‘1’ for binary CX2
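Example from Hosmer, Applied Logistic Regression
‘The low birth weight study’
$\mathrm{low} = \beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{lwt} + \beta_3\,\mathrm{race} + \beta_4\,\mathrm{ftv} + \beta_5\,\mathrm{smoke} + \beta_6\,\mathrm{plt} + \beta_7\,\mathrm{ht} + \beta_8\,\mathrm{ui}$
Primary Risk Factors of Interest: age, lwt, race, ftv
Confounders: smoke, plt, ht, ui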
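We wish to find the power to detect significance for at least one of the risk factors in the full model
Full Model
$\mathrm{low} = \beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{lwt} + \beta_3\,\mathrm{race} + \beta_4\,\mathrm{ftv} + \beta_5\,\mathrm{smoke} + \beta_6\,\mathrm{plt} + \beta_7\,\mathrm{ht} + \beta_8\,\mathrm{ui}$
Reduced Model
$\mathrm{low} = \beta_0 + \beta_5\,\mathrm{smoke} + \beta_6\,\mathrm{plt} + \beta_7\,\mathrm{ht} + \beta_8\,\mathrm{ui}$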
The Corresponding Hypothesis
H(o): $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
H(a): At least one of the above is non-zero in the full model
Hypothesized Odds Ratios
• AGE OR=1.1 (CX7) Normal
• LWT OR=1.5 (CX1) Binary
• RACE OR=1.5 (CX2) Binary
• FTV OR=1.1 (CX3) Uniform
• SMOKE OR=1.02 (CX8) Normal
• PLT OR=1.02 (CX9) Normal
• HT OR=1.02 (CX10) Normal
• UI OR=1.02 (CX4) Uniform
• P(Y=1)=0.1
• Investigate N = 900
• Rho=0.2
Macro Commands
%LRpowerCorr10 (900,1000,.2,.1,1.5,1.5,
1.1,1.02,1.02,1.02,
1.1,1.02,1.02,1.02,
cx1 cx2 cx3 cx7 cx4 cx8 cx9 cx10 ,
cx4 cx8 cx9 cx10,
.05,
4,
0.25,0.5);
Output
Sample size = 900; Simulations = 1000; Rho = .2; P(Y=1) = .1
OR1=1.5, OR2=1.5, OR3=1.1, OR4=1.02, OR5=1.02, OR6=1.02
OR7=1.1, OR8=1.02, OR9=1.02, OR10=1.02
Full Model: cx1 cx2 cx3 cx7 cx4 cx8 cx9 cx10
Reduced Model: cx4 cx8 cx9 cx10
Power LCL UCL
73% 70% 75%
Recent Development %Quickpower Macro
%quickpower2(100,.2,.1, 1.5,1.5,
1.1,1.02,1.02,1.02,
1.1,1.02,1.02,1.02,
cx1 cx2 cx3 cx7 cx4 cx8 cx9 cx10 ,
cx4 cx8 cx9 cx10,
8,
.05,
4,
0.25,0.5);
A trick to get a good guess for N
The POWER Procedure
Type III F Test in Multiple Regression
Fixed Scenario Elements
Method                                  Exact
Model                                   Random X
Number of Predictors in Full Model      8
Number of Test Predictors               4
Alpha                                   0.05
R-square of Full Model                  0.01971
R-square of Reduced Model               0.007397
Nominal Power                           0.8
Computed N Total
Actual Power    N Total
0.800           962
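The fixed scenario elements above correspond to a PROC POWER call that solves for the total sample size rather than the power. The slides do not show the call that %quickpower2 issues or how it obtains the two R-square values, so the following is only a sketch that plugs the reported values back into the MULTREG statement.

```sas
/* Sketch: solve for N in the linear-regression analogue.              */
/* Assumption: R-square values are copied from the output above; how   */
/* %quickpower2 derives them is not shown in the slides.               */
proc power;
  multreg
    model = random
    alpha = .05
    nfullpredictors = 8
    ntestpredictors = 4
    rsqfull = 0.01971
    rsqreduced = 0.007397
    ntotal = .
    power = 0.8;
run;
```

The resulting N then serves as the starting sample size for a full %LRpowerCorr10 simulation.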
Resulting in…
Sample size = 962; Simulations = 1000; Rho = .2; P(Y=1) = .1
OR1=1.5, OR2=1.5, OR3=1.1, OR4=1.02, OR5=1.02, OR6=1.02
OR7=1.1, OR8=1.02, OR9=1.02, OR10=1.02
Full Model: cx1 cx2 cx3 cx7 cx4 cx8 cx9 cx10
Reduced Model: cx4 cx8 cx9 cx10
Power LCL UCL
76% 74% 79%
LRpowerCorr C Macro
Approximate Power Curve
%LRpowerCorr10C (50,150,500,.1,.5,
1,1,
1,1,1,1,
2.0,1,1,1,
cx7 cx8 cx9 cx10,
cx8 cx9 cx10,
.05,
1,
.25,.25);
ods graphics on;
proc logistic data=base desc plots(only)=(roc(id=obs) effect);
model reject=n1/;
run;
ods graphics off;
The SAS Macros
• www.uams.edu/biostat/williams
• Text file versions of the %LRpowerCorr and %quickpower SAS macros with an example
• Copy and paste into SAS to run.