Government & Healthcare Apps, SESUG 2011
Analysis of a Binary Outcome Variable Using the FREQ and
the LOGISTIC Procedures
Arthur Li
INTRODUCTION
A common application in the health care industry:
[Diagram: Exposure (X), e.g. smoking, and Outcome (Y), e.g. cancer; additional exposures X1 (age) and X2 (gender); tools: PROC FREQ and PROC LOGISTIC]
One starting point: create a contingency table
CONTINGENCY TABLE

                    BREATHING TEST
SMOKING STATUS    ABNORMAL    NORMAL
CURRENT              131        927
NEVER                 38        741

Forthofer & Lehnen (1981) (Agresti, 1990)
Subjects: Caucasians who work in certain industrial plants in Houston
Response (Y): breathing test
Explanatory variable (X): smoking status
STUDY DESIGN
Three types of study design in observational studies:
Cross-sectional: X and Y are collected at the same time. Prevalence Ratio = $P_1 / P_0$
Cohort: X is collected first. Relative Risk (RR) = $P_1 / P_0$
Case-control: Y is collected first. You can't calculate RR

                 Outcome (Y)
Exposure (X)      1      0
     1            A      B
     0            C      D

$P_1 = \frac{A}{A+B}$
$P_0 = \frac{C}{C+D}$
ODDS RATIO

                 Outcome (Y)
Exposure (X)      1      0
     1            A      B
     0            C      D

$\text{Odds}_1 = \frac{A}{B}$
$\text{Odds}_0 = \frac{C}{D}$
$\text{Odds Ratio} = \frac{\text{Odds}_1}{\text{Odds}_0} = \frac{AD}{BC}$
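As a worked check, the risk and odds ratio formulas above can be applied to the breathing-test counts from the contingency table (A = 131, B = 927, C = 38, D = 741). This is only an arithmetic sketch, and the rounded values are hand calculations.

```latex
\begin{align*}
P_1 &= \frac{A}{A+B} = \frac{131}{1058} \approx 0.1238, &
P_0 &= \frac{C}{C+D} = \frac{38}{779} \approx 0.0488, \\
\frac{P_1}{P_0} &\approx 2.54, &
\text{OR} &= \frac{AD}{BC} = \frac{131 \times 741}{927 \times 38}
           = \frac{97071}{35226} \approx 2.76.
\end{align*}
```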
ODDS RATIO
The OR measures the strength of the association between X and Y. It ranges from 0 to infinity:
OR = 1: no association
OR > 1: the exposed group (X = 1) has higher odds
OR < 1: the non-exposed group (X = 0) has higher odds
To test the association between X and Y:
Use the chi-square statistic
Use the 95% CI for the OR and check whether it includes 1
PROC FREQ

                         BREATHING TEST (Y)
SMOKING STATUS (X)    ABNORMAL (1)    NORMAL (0)
CURRENT (1)              131 (A)        927 (B)
NEVER (0)                 38 (C)        741 (D)

The data are entered directly from the cell counts of the table:
data breathTest;
  input test $ 1-8 neversmk $ 10-16 count;
datalines;
abnormal current 131
normal   current 927
abnormal never   38
normal   never   741
;

PROC FREQ
proc freq data=breathTest;
  weight count;
  tables neversmk*test;
run;

The FREQ Procedure
Table of neversmk by test

neversmk     test
Frequency |
Percent   |
Row Pct   |
Col Pct   | abnormal | normal   |  Total
----------+----------+----------+
current   |      131 |      927 |   1058
          |     7.13 |    50.46 |  57.59
          |    12.38 |    87.62 |
          |    77.51 |    55.58 |
----------+----------+----------+
never     |       38 |      741 |    779
          |     2.07 |    40.34 |  42.41
          |     4.88 |    95.12 |
          |    22.49 |    44.42 |
----------+----------+----------+
Total            169       1668     1837
                9.20      90.80   100.00
PROC FREQ - RELRISK
proc freq data=breathTest;
  weight count;
  tables neversmk*test/relrisk;
run;
The RELRISK option computes the OR, the RR for col1 (abnormal), and the RR for col2 (normal).

Estimates of the Relative Risk (Row1/Row2)

Type of Study                  Value     95% Confidence Limits
--------------------------------------------------------------
Case-Control (Odds Ratio)     2.7557       1.8962      4.0047
Cohort (Col1 Risk)            2.5383       1.7904      3.5987
Cohort (Col2 Risk)            0.9211       0.8960      0.9470

Sample Size = 1837

The odds of having an abnormal test result are about 2.8 times as high for current smokers as for those who have never smoked (95% CI: 1.9 to 4.0).
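The odds ratio and its confidence limits can also be approximated by hand. The sketch below assumes the usual large-sample (Woolf) standard error of log(OR); the data set name or_ci and its variable names are made up for illustration. The results should land close to the Case-Control row reported by PROC FREQ.

```sas
/* Hand calculation of the odds ratio and its 95% confidence limits. */
/* Assumption: large-sample (Woolf) standard error of log(OR).       */
data or_ci;
  a = 131; b = 927;   /* current smokers: abnormal, normal */
  c = 38;  d = 741;   /* never smoked:    abnormal, normal */
  odds_ratio = (a * d) / (b * c);
  se_log_or  = sqrt(1/a + 1/b + 1/c + 1/d);
  z          = quantile('normal', 0.975);   /* about 1.96 */
  lower      = exp(log(odds_ratio) - z * se_log_or);
  upper      = exp(log(odds_ratio) + z * se_log_or);
run;

proc print data=or_ci noobs;
  var odds_ratio se_log_or lower upper;
  format odds_ratio se_log_or lower upper 8.4;
run;
```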
PROC FREQ - CHISQ
proc freq data=breathTest;
  weight count;
  tables neversmk*test/relrisk chisq;
run;

Statistics for Table of neversmk by test

Statistic                        DF       Value      Prob
---------------------------------------------------------
Chi-Square                        1     30.2421    <.0001
Likelihood Ratio Chi-Square       1     32.3820    <.0001
Continuity Adj. Chi-Square        1     29.3505    <.0001
Mantel-Haenszel Chi-Square        1     30.2257    <.0001
Phi Coefficient                          0.1283
Contingency Coefficient                  0.1273
Cramer's V                               0.1283
LOGISTIC REGRESSION MODEL
Use logistic regression to study the association between the “Breathing Test” & “Smoking”
For logistic regression, maximum likelihood estimation (MLE), not ordinary least squares (OLS), is used to estimate the parameters
Why not use a linear probability model, $p = \alpha + \beta X$ with $p \in [0, 1]$?
The probability is bounded, but $\alpha + \beta X$ is not
The relationship between p and X can be nonlinear
LOGISTIC REGRESSION MODEL
A logistic regression is used for predicting the probability of occurrence of an event by fitting data to a logit function
$\text{logit}(p) = \alpha + \beta X$
$\text{logit}(p) = \log(\text{odds})$
$p = \frac{\exp(\alpha + \beta X)}{1 + \exp(\alpha + \beta X)}$
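LOGISTIC REGRESSION MODEL

                    BREATHING TEST
SMOKING STATUS    ABNORMAL    NORMAL
CURRENT              131        927
NEVER                 38        741

$\text{logit}(p) = \alpha + \beta X$, where $p = \text{prob(abnormal)}$
Reference cell coding: $X = 1$ (current), $X = 0$ (never)
$\text{logit}(p_1) = \alpha + \beta$ (current)
$\text{logit}(p_0) = \alpha$ (never)
$\beta = \log(\text{odds}_1 / \text{odds}_0)$
$\text{OR}_{1 \text{ vs } 0} = \exp(\beta)$
β: the increment in log odds for current smokers compared to those who never smoked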
LOGISTIC REGRESSION MODEL
proc logistic data=breathTest;
  class neversmk /param=ref;
  weight count;
  model test = neversmk;
run;

The LOGISTIC Procedure
Model Information

Data Set                      WORK.BREATHTEST
Response Variable             test
Number of Response Levels     2
Weight Variable               count
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Read       4
Number of Observations Used       4
Sum of Weights Read            1837
Sum of Weights Used            1837
LOGISTIC REGRESSION MODEL
proc logistic data=breathTest;
  class neversmk /param=ref;
  weight count;
  model test = neversmk;
run;

Response Profile

Ordered                   Total         Total
  Value    test       Frequency        Weight
      1    abnormal           2      169.0000
      2    normal             2     1668.0000

Probability modeled is test='abnormal'.

By default, PROC LOGISTIC models the probability of the response level with the lower ordered value
LOGISTIC REGRESSION MODEL
To model the probability of being “normal”, use any one of the following:

proc logistic data=breathTest descending;
  class neversmk /param=ref;
  weight count;
  model test = neversmk;
run;

proc logistic data=breathTest;
  class neversmk /param=ref;
  weight count;
  model test (descending) = neversmk;
run;

proc logistic data=breathTest;
  class neversmk /param=ref;
  weight count;
  model test (event="normal") = neversmk;
run;
LOGISTIC REGRESSION MODEL
Reference Cell Coding
proc logistic data=breathTest;
  class neversmk /param=ref;
  weight count;
  model test = neversmk;
run;

Class Level Information

                          Design
Class       Value       Variables
neversmk    current         1
            never           0

Reference cell coding estimates the difference between the effect of each level and the last level
The result is easy to interpret
LOGISTIC REGRESSION MODEL
Effect Coding
proc logistic data=breathTest;
  class neversmk;
  weight count;
  model test = neversmk;
run;

Class Level Information

                          Design
Class       Value       Variables
neversmk    current         1
            never          -1

Effect coding estimates the difference between the effect of each level and the average effect over all levels
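For a two-level variable, the two codings describe the same fitted model, so their parameters can be related directly. The short derivation below is a sketch; the symbols $\alpha_e$ and $\beta_e$ for the effect-coded intercept and slope are introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
\text{Reference coding:}\quad
  &\text{logit}(p_{\text{current}}) = \alpha + \beta,
  &&\text{logit}(p_{\text{never}}) = \alpha \\
\text{Effect coding:}\quad
  &\text{logit}(p_{\text{current}}) = \alpha_e + \beta_e,
  &&\text{logit}(p_{\text{never}}) = \alpha_e - \beta_e \\
\text{Equating the two:}\quad
  &\beta_e = \tfrac{1}{2}\beta,
  &&\alpha_e = \alpha + \tfrac{1}{2}\beta \\
\text{so that}\quad
  &\text{OR}_{\text{current vs never}} = \exp(\beta) = \exp(2\beta_e).
\end{align*}
```

Under effect coding, then, exponentiating the reported estimate alone does not give the odds ratio for current versus never smokers.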
LOGISTIC REGRESSION MODEL
By default, the last ordered value of the classification variable is considered the reference level. With PARAM=REF, "never" sorts after "current", so it is coded 0 and serves as the reference.
LOGISTIC REGRESSION MODEL
proc logistic data=breathTest;
  class neversmk (ref="never") /param=ref;
  weight count;
  model test = neversmk;
run;

Model Fit Statistics

                 Intercept     Intercept and
Criterion             Only        Covariates
AIC               1130.417          1100.035
SC                1129.803          1098.808
-2 Log L          1128.417          1096.035

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        32.3820     1        <.0001
Score                   30.2421     1        <.0001
Wald                    28.2434     1        <.0001

Model Fit Statistics: information for model selection
These are goodness-of-fit measures that are used to compare one model to another
LOGISTIC REGRESSION MODEL
Testing Global Null Hypothesis: BETA=0
H0: all regression coefficients = 0
Similar to the overall F statistic in linear regression
The likelihood ratio test (LRT) is more reliable, especially for small N
LOGISTIC REGRESSION MODEL
proc logistic data=breathTest;
  class neversmk (ref="never") /param=ref;
  weight count;
  model test = neversmk;
run;

Type 3 Analysis of Effects

                       Wald
Effect       DF    Chi-Square    Pr > ChiSq
neversmk      1       28.2434        <.0001

Analysis of Maximum Likelihood Estimates

                                      Standard          Wald
Parameter             DF   Estimate      Error    Chi-Square    Pr > ChiSq
Intercept              1    -2.9704     0.1663      318.9365        <.0001
neversmk   current     1     1.0136     0.1907       28.2434        <.0001

Because NEVERSMK has only 1 df, the Type 3 test and the Wald test for its parameter are identical
LOGISTIC REGRESSION MODEL
Interpreting the estimate for neversmk (current): 1.0136
Current smokers have a 1.01 increase in the log odds of having an abnormal test compared to people who never smoked
OR = exp(1.0136) = 2.756
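With a single binary predictor, the model is saturated, so the maximum likelihood estimates can be checked directly against the cell counts. The DATA step below is a minimal sketch of that check; the variable names are illustrative only.

```sas
/* Recover the intercept and slope from the 2x2 cell counts.         */
/* Valid here because the model has one binary predictor, so the     */
/* fitted log odds equal the observed log odds in each group.        */
data _null_;
  a = 131; b = 927;   /* current smokers: abnormal, normal */
  c = 38;  d = 741;   /* never smoked:    abnormal, normal */
  alpha      = log(c / d);               /* log odds, reference group (never) */
  beta       = log((a / b) / (c / d));   /* log odds ratio, current vs never  */
  odds_ratio = exp(beta);
  put alpha= 8.4 beta= 8.4 odds_ratio= 8.4;
run;
```

The values written to the log should agree, up to rounding, with the Intercept and neversmk estimates in the PROC LOGISTIC output.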
LOGISTIC REGRESSION MODEL
proc logistic data=breathTest;
  class neversmk (ref="never") /param=ref;
  weight count;
  model test = neversmk;
run;

Odds Ratio Estimates

                                  Point         95% Wald
Effect                         Estimate    Confidence Limits
neversmk current vs never         2.756      1.896     4.004

Result from PROC FREQ:

Estimates of the Relative Risk (Row1/Row2)

Type of Study                  Value     95% Confidence Limits
--------------------------------------------------------------
Case-Control (Odds Ratio)     2.7557       1.8962      4.0047
Cohort (Col1 Risk)            2.5383       1.7904      3.5987
Cohort (Col2 Risk)            0.9211       0.8960      0.9470

Sample Size = 1837
LOGISTIC REGRESSION MODEL
The ODDSRATIO statement (new in SAS 9.2):
ODDSRATIO <'label'> variable </ options>;

proc logistic data=breathTest;
  class neversmk (ref="never") /param=ref;
  weight count;
  model test = neversmk;
  oddsratio 'smoking' neversmk;
run;

Wald Confidence Interval for Odds Ratios

Label                                  Estimate    95% Confidence Limits
smoking neversmk current vs never         2.756       1.896       4.004
LOGISTIC REGRESSION MODEL
proc logistic data=breathTest;
  class neversmk (ref="never") /param=ref;
  weight count;
  model test = neversmk;
  oddsratio 'smoking' neversmk/cl=pl;
run;

Profile Likelihood Confidence Interval for Odds Ratios

Label                                  Estimate    95% Confidence Limits
smoking neversmk current vs never         2.756       1.916       4.054

The Wald CI is based on a normal approximation
The profile likelihood (PL) CI is based on the value of the log-likelihood
The PL CI is generally preferred for small sample sizes
CONFOUNDING
[Diagram: Smoking, Test, and Age]
Not including Age can lead to either over- or underestimating the relationship between Smoking & Test
CONFOUNDING
[Figure: Log(odds) for smokers and non-smokers plotted by Age (< 40, ≥ 40)]
Adjusting for age, you are comparing smokers and non-smokers at common values of age
INTERACTION
Interaction: the relationship between “Smoking” and “Test” differs depending on the level of Age (here, < 40 or ≥ 40)
[Figure: Log(odds) for smokers and non-smokers plotted by Age (< 40, ≥ 40)]
Age is referred to as an effect modifier
INTERACTION & CONFOUNDING
PROC FREQ: use it to analyze the association of interest when there is only one confounder or one effect modifier
If you want to control for multiple confounders or include multiple effect modifiers in your model, you need to use PROC LOGISTIC
THE PURPOSES AND STRATEGIES FOR MODEL BUILDING
The methods of fitting a regression model differ depending upon your research purpose
Two purposes:
Investigating the essential association between an outcome variable and a set of explanatory variables (epidemiologic field)
Predicting the outcome variable by using a set of explanatory variables
THE PURPOSES AND STRATEGIES FOR MODEL BUILDING
Situations for building a prediction model:
Statistical decision making
Generating (not testing) hypotheses for a future study
A prediction model needs to be validated in an independent sample to evaluate its usefulness
For building a prediction model, one only needs to consider interaction effects, not confounding
Techniques for building a prediction model: forward, backward, and stepwise selection, etc.
The focus of this talk is not on building a prediction model but rather on estimating the relationship between a main explanatory variable and an outcome variable
THE PURPOSES AND STRATEGIES FOR MODEL BUILDING
For estimating association, interaction and confounding issues must be considered
Which should be evaluated first? Confounding effect or interaction effect?
THE PURPOSES AND STRATEGIES FOR MODEL BUILDING
Is the association between “Smoking” & “Test” different in the 2 age groups?
Yes: There is an interaction. Report age-specific ORs
No: No interaction. Is “Age” a confounder?
    Yes: Report the age-adjusted OR
    No: Report the crude OR
THE PURPOSES AND STRATEGIES FOR MODEL BUILDING
Effect modification (interaction) can be detected via statistical testing
A confounding effect cannot be tested statistically

Outcome    Main Var    Covariate     OR      P        Include?
Y          X                         2.3     <0.05
Y          X           Z             4.2     0.2      YES
Y          X           Z             2.4     0.01     MAYBE
PROC FREQ: INTERACTION EFFECT
data breathTestAge;
  input test $ 1-8 neversmk $ 10-16 over40 $ 18-20 count;
datalines;
normal   never   no  577
abnormal never   no  34
normal   current no  682
abnormal current no  57
normal   never   yes 164
abnormal never   yes 4
normal   current yes 245
abnormal current yes 74
;
PROC FREQ: INTERACTION EFFECT
proc freq data=breathTestAge;
  weight count;
  tables over40*neversmk*test/chisq relrisk cmh;
run;
The CMH option produces:
Cochran-Mantel-Haenszel statistics (test for association between the row and column variables after adjusting for the 3rd variable)
The adjusted Mantel-Haenszel and logit estimates of the odds ratio and relative risks
The Breslow-Day test for homogeneity of odds ratios
PROC FREQ: INTERACTION EFFECT
proc freq data=breathTestAge;
  weight count;
  tables over40*neversmk*test/chisq relrisk cmh;
run;

Breslow-Day Test for
Homogeneity of the Odds Ratios
------------------------------
Chi-Square        18.0829
DF                      1
Pr > ChiSq         <.0001

Total Sample Size = 1837

The association between smoking status and the breathing test is not the same across the age groups
PROC FREQ: INTERACTION EFFECT
proc freq data=breathTestAge;
  weight count;
  tables over40*neversmk*test/chisq relrisk cmh;
run;

Statistics for Table 1 of neversmk by test
Controlling for over40=no

Statistic                        DF       Value      Prob
---------------------------------------------------------
Chi-Square                        1      2.4559    0.1171
Likelihood Ratio Chi-Square       1      2.4893    0.1146
Continuity Adj. Chi-Square        1      2.1260    0.1448
Mantel-Haenszel Chi-Square        1      2.4541    0.1172
Phi Coefficient                          0.0427
Contingency Coefficient                  0.0426
Cramer's V                               0.0427

Estimates of the Relative Risk (Row1/Row2)

Type of Study                  Value     95% Confidence Limits
--------------------------------------------------------------
Case-Control (Odds Ratio)     1.4184       0.9144      2.2000
Cohort (Col1 Risk)            1.3861       0.9190      2.0906
Cohort (Col2 Risk)            0.9772       0.9499      1.0054

Sample Size = 1350
PROC FREQ: INTERACTION EFFECT
proc freq data=breathTestAge;
  weight count;
  tables over40*neversmk*test/chisq relrisk cmh;
run;

Statistics for Table 2 of neversmk by test
Controlling for over40=yes

Statistic                        DF       Value      Prob
---------------------------------------------------------
Chi-Square                        1     35.4510    <.0001
Likelihood Ratio Chi-Square       1     45.1246    <.0001
Continuity Adj. Chi-Square        1     33.9203    <.0001
Mantel-Haenszel Chi-Square        1     35.3782    <.0001
Phi Coefficient                          0.2698
Contingency Coefficient                  0.2605
Cramer's V                               0.2698

Estimates of the Relative Risk (Row1/Row2)

Type of Study                  Value     95% Confidence Limits
--------------------------------------------------------------
Case-Control (Odds Ratio)    12.3837       4.4416     34.5272
Cohort (Col1 Risk)            9.7429       3.6253     26.1844
Cohort (Col2 Risk)            0.7868       0.7374      0.8394
PROC FREQ: INTERACTION EFFECT
proc freq data=breathTestAge;
  weight count;
  tables over40*neversmk*test/chisq relrisk cmh;
run;

Summary Statistics for neversmk by test
Controlling for over40

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic    Alternative Hypothesis     DF       Value      Prob
----------------------------------------------------------------
    1        Nonzero Correlation         1     25.2444    <.0001
    2        Row Mean Scores Differ      1     25.2444    <.0001
    3        General Association         1     25.2444    <.0001

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study     Method               Value     95% Confidence Limits
----------------------------------------------------------------------
Case-Control      Mantel-Haenszel     2.5683       1.7618      3.7441
(Odds Ratio)      Logit               1.9840       1.3252      2.9702

Cohort            Mantel-Haenszel     2.4174       1.6754      3.4879
(Col1 Risk)       Logit               1.8475       1.2641      2.7001

Cohort            Mantel-Haenszel     0.9289       0.9046      0.9538
(Col2 Risk)       Logit               0.9437       0.9195      0.9686

These statistics and the adjusted OR are only useful if the OR is homogeneous across the categories of the adjusting variable
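To see where the Mantel-Haenszel odds ratio comes from, it can be computed by hand from the two age strata using the standard formula, the sum of a*d/n over strata divided by the sum of b*c/n. The sketch below uses illustrative data set and variable names (mh, a, b, c, d), with a and b the abnormal and normal counts for current smokers and c and d the same counts for never smokers.

```sas
/* Hand calculation of the Mantel-Haenszel common odds ratio.        */
/* One row per age stratum; a, b = current (abnormal, normal),       */
/* c, d = never (abnormal, normal).                                  */
data mh;
  input stratum $ a b c d;
  n   = a + b + c + d;   /* stratum sample size       */
  num = a * d / n;       /* contribution to numerator */
  den = b * c / n;       /* contribution to denominator */
datalines;
no  57 682 34 577
yes 74 245  4 164
;
run;

proc sql;
  select sum(num) / sum(den) as or_mh format=8.4
  from mh;
quit;
```

The result should be close to the Mantel-Haenszel odds ratio in the output above. It is a weighted average of the two stratum-specific odds ratios, which is why it is hard to interpret when those odds ratios differ as much as they do here.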
PROC LOGISTIC: INTERACTION EFFECT
$\text{logit}(p) = \alpha + \beta_{smoke} X_{smoke} + \beta_{age40} X_{age40} + \beta_{int} X_{smoke} X_{age40}$
$X_{smoke} = 1$ (current), $0$ (never)
$X_{age40} = 1$ (yes), $0$ (no)

proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no")/param=ref;
  weight count;
  model test = neversmk over40 neversmk*over40;
run;
PROC LOGISTIC: INTERACTION EFFECT
proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no")/param=ref;
  weight count;
  model test = neversmk over40 neversmk*over40;
run;

Wald Test:

Analysis of Maximum Likelihood Estimates

                                             Standard          Wald
Parameter                      DF  Estimate     Error    Chi-Square    Pr > ChiSq
Intercept                       1   -2.8315    0.1765      257.4193        <.0001
neversmk         current        1    0.3495    0.2240        2.4355        0.1186
over40           yes            1   -0.8820    0.5359        2.7086        0.0998
neversmk*over40  current yes    1    2.1668    0.5691       14.4985        0.0001
PROC LOGISTIC: INTERACTION EFFECT
proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no")/param=ref;
  weight count;
  model test = neversmk over40 neversmk*over40;
run;

Likelihood Ratio Test:
$H_1$ (full model): $\text{logit}(p) = \alpha + \beta_{smoke} X_{smoke} + \beta_{age40} X_{age40} + \beta_{int} X_{smoke} X_{age40}$
$H_0$ (reduced model): $\text{logit}(p) = \alpha + \beta_{smoke} X_{smoke} + \beta_{age40} X_{age40}$
$LR = [-2 \log L(\text{Reduced model})] - [-2 \log L(\text{Full model})] = 2\,[\log L(\text{Full model}) - \log L(\text{Reduced model})]$
$LR \sim \chi^2_{df}$, where $df = (\text{\# terms in Full model}) - (\text{\# terms in Reduced model})$
PROC LOGISTIC: INTERACTION EFFECT
Full model:
proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no")/param=ref;
  weight count;
  model test = neversmk over40 neversmk*over40;
run;

Model Fit Statistics

                 Intercept     Intercept and
Criterion             Only        Covariates
AIC               1130.417          1055.467
SC                1130.497          1055.785
-2 Log L          1128.417          1047.467

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        80.9500     3        <.0001
Score                   95.7956     3        <.0001
Wald                    81.3305     3        <.0001
PROC LOGISTIC: INTERACTION EFFECT
Reduced model:
proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no")/param=ref;
  weight count;
  model test = neversmk over40;
run;

Model Fit Statistics

                 Intercept     Intercept and
Criterion             Only        Covariates
AIC               1130.417          1074.123
SC                1130.497          1074.361
-2 Log L          1128.417          1068.123

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        60.2942     2        <.0001
Score                   61.2515     2        <.0001
Wald                    56.4737     2        <.0001
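Before automating the test, the arithmetic can be checked by hand using the two -2 Log L values (Intercept and Covariates) printed in the Model Fit Statistics tables for the full and reduced models. The DATA step below is a minimal sketch with the values typed in; the variable names are illustrative.

```sas
/* Likelihood ratio test for the interaction term, by hand.          */
/* The -2 Log L values are copied from the Model Fit Statistics      */
/* output of the full and reduced models.                            */
data _null_;
  neg2l_reduced = 1068.123;   /* model: neversmk over40                 */
  neg2l_full    = 1047.467;   /* model: neversmk over40 neversmk*over40 */
  lr = neg2l_reduced - neg2l_full;   /* likelihood ratio statistic      */
  df = 3 - 2;                        /* difference in number of terms   */
  p  = sdf('chisq', lr, df);         /* upper-tail chi-square probability */
  put lr= df= p=;
run;
```

The SDF function returns the upper-tail probability directly, which avoids the loss of precision that 1 - PROBCHI can suffer when the p-value is very small.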
PROC LOGISTIC: INTERACTION EFFECT
/* Full model: save the fit statistics and global tests as data sets */
proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no")/param=ref;
  weight count;
  model test = neversmk over40 neversmk*over40;
  ods output FitStatistics = log2Ratio_full GlobalTests = df_full;
run;
/* Store -2 Log L of the full model in a macro variable */
data _null_;
  set log2Ratio_full;
  if Criterion = '-2 Log L';
  call symput('neg2L_full', InterceptAndCovariates);
run;
/* Store the model df of the full model in a macro variable */
data _null_;
  set df_full;
  if Test = 'Likelihood Ratio';
  call symput('df_full', DF);
run;
PROC LOGISTIC: INTERACTION EFFECT
/* Reduced model: same steps without the interaction term */
proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no")/param=ref;
  weight count;
  model test = neversmk over40;
  ods output FitStatistics = log2Ratio_reduce GlobalTests = df_reduce;
run;
data _null_;
  set log2Ratio_reduce;
  if Criterion = '-2 Log L';
  call symput('neg2L_reduce', InterceptAndCovariates);
run;
data _null_;
  set df_reduce;
  if Test = 'Likelihood Ratio';
  call symput('df_reduce', DF);
run;
PROC LOGISTIC: INTERACTION EFFECT
data result;
  LR = &neg2L_reduce - &neg2L_full;
  df = &df_full - &df_reduce;
  p = 1-probchi(LR,df);
  label LR = 'Likelihood Ratio';
run;
proc print data=result label noobs;
  title "Likelihood ratio test";
run;

Likelihood ratio test

Likelihood
     Ratio    df              p
   20.6558     1     .000005497
PROC LOGISTIC: INTERACTION EFFECT
proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no")/param=ref;
  weight count;
  model test = neversmk over40 neversmk*over40;
  oddsratio neversmk/ at (over40 ='no');
  oddsratio neversmk/ at (over40 ='yes');
run;

Wald Confidence Interval for Odds Ratios

Label                                        Estimate    95% Confidence Limits
neversmk current vs never at over40=no          1.418       0.914       2.200
neversmk current vs never at over40=yes        12.383       4.441      34.525
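The age-specific odds ratios follow from the coefficients of the interaction model. As a sketch, using the estimates reported in the Analysis of Maximum Likelihood Estimates table (0.3495 for smoking and 2.1668 for the interaction) and hand-rounded results:

```latex
\begin{align*}
\text{OR}_{\text{over40 = no}}  &= \exp(\beta_{smoke}) = \exp(0.3495) \approx 1.418 \\
\text{OR}_{\text{over40 = yes}} &= \exp(\beta_{smoke} + \beta_{int})
                                 = \exp(0.3495 + 2.1668) = \exp(2.5163) \approx 12.38
\end{align*}
```

The main-effect coefficient alone gives the odds ratio only in the reference age group; in the other group the interaction coefficient has to be added before exponentiating.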
NURSE HEALTH STUDY
Nurses' Health Study (NHS): nurses aged 30 to 55 who were enrolled in 1976
Part of the study investigated the association between oral contraceptive (OC) use and breast cancer (BC)
NURSE HEALTH STUDY

                              BREAST CANCER
              AGE 30 - 39 (0)              AGE 40 - 55 (1)
OC USE     CASE (1)   CONTROL (0)      CASE (1)   CONTROL (0)
YES (1)       71         28418            143        20661
NO (0)        35         12267            321        44424

data nurse_study;
  input bc age oc count;
datalines;
1 0 1 71
0 0 1 28418
1 0 0 35
0 0 0 12267
1 1 1 143
0 1 1 20661
1 1 0 321
0 1 0 44424
;
NURSE HEALTH STUDY
proc freq data=nurse_study order=data;
  weight count;
  tables age*oc*bc/chisq relrisk cmh;
run;

Breslow-Day Test for
Homogeneity of the Odds Ratios
------------------------------
Chi-Square         0.1521
DF                      1
Pr > ChiSq         0.6966

There is no evidence of interaction (p = 0.70)
Next, check for confounding
NURSE HEALTH STUDY
Summary Statistics for oc by bc
Controlling for age

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic    Alternative Hypothesis     DF       Value      Prob
----------------------------------------------------------------
    1        Nonzero Correlation         1      0.4361    0.5090
    2        Row Mean Scores Differ      1      0.4361    0.5090
    3        General Association         1      0.4361    0.5090

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study     Method               Value     95% Confidence Limits
----------------------------------------------------------------------
Case-Control      Mantel-Haenszel     0.9419       0.7882      1.1256
(Odds Ratio)      Logit               0.9415       0.7882      1.1246

Cohort            Mantel-Haenszel     0.9422       0.7897      1.1243
(Col1 Risk)       Logit               0.9419       0.7894      1.1238

Cohort            Mantel-Haenszel     1.0003       0.9994      1.0013
(Col2 Risk)       Logit               1.0003       0.9995      1.0012
NURSE HEALTH STUDY
proc freq data=nurse_study order=data;
  weight count;
  tables oc*bc/chisq relrisk;
run;

Statistics for Table of oc by bc

Statistic                        DF       Value      Prob
---------------------------------------------------------
Chi-Square                        1     17.8881    <.0001
Likelihood Ratio Chi-Square       1     18.1401    <.0001
Continuity Adj. Chi-Square        1     17.5337    <.0001
Mantel-Haenszel Chi-Square        1     17.8879    <.0001
Phi Coefficient                         -0.0130
Contingency Coefficient                  0.0130
Cramer's V                              -0.0130

Estimates of the Relative Risk (Row1/Row2)

Type of Study                  Value     95% Confidence Limits
--------------------------------------------------------------
Case-Control (Odds Ratio)     0.6944       0.5858      0.8230
Cohort (Col1 Risk)            0.6957       0.5874      0.8239
Cohort (Col2 Risk)            1.0019       1.0010      1.0028
NURSE HEALTH STUDY
Unadjusted OR = 0.69, adjusted OR = 0.94
Age is a confounder
In this situation, the age-adjusted statistic and the age-adjusted odds ratio should be reported
After adjusting for age, there is no association between using OC and having BC (p = 0.51; age-adjusted OR = 0.94, 95% CI = 0.79 to 1.13)
NURSE HEALTH STUDY
$\text{logit}(p) = \alpha + \beta_{OC} X_{OC} + \beta_{age} X_{age}$

proc logistic data=nurse_study descending;
  weight count;
  model bc = oc age;
run;

Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1     -5.9083      0.1156     2612.5788        <.0001
oc            1     -0.0602      0.0911        0.4360        0.5090
age           1      0.9835      0.1133       75.3707        <.0001

Odds Ratio Estimates

             Point         95% Wald
Effect    Estimate    Confidence Limits
oc           0.942      0.788     1.126
age          2.674      2.141     3.338
NURSE HEALTH STUDY
proc logistic data=nurse_study descending;
  weight count;
  model bc = oc;
run;

Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1     -5.0704      0.0532     9095.8096        <.0001
oc            1     -0.3646      0.0867       17.6834        <.0001

Odds Ratio Estimates

             Point         95% Wald
Effect    Estimate    Confidence Limits
oc           0.694      0.586     0.823
CONCLUSION
Analyzing variables with dichotomized outcomes by using the FREQ and LOGISTIC procedures is a common task for statisticians in the health care industry
Simply knowing how to use the procedures is not sufficient
Understanding the goal of model building and following correct model-building steps are extremely important in order to obtain accurate and unbiased results