Analysis of a Binary Outcome Variable Using the FREQ and the LOGISTIC Procedures Arthur Li


Post on 28-Nov-2014


Government & Healthcare Apps, SESUG 2011

TRANSCRIPT

Page 1: Analysis Of A Binary Outcome Variable

Analysis of a Binary Outcome Variable Using the FREQ and the LOGISTIC Procedures

Arthur Li

Page 2: Analysis Of A Binary Outcome Variable

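INTRODUCTION

A common application in the health care industry: relating an exposure (X), such as smoking, to a binary outcome (Y), such as cancer, often alongside additional exposures such as age (X1) and gender (X2).

[Diagram: Exposure X (smoking) -> Outcome Y (cancer); additional exposures X1 (age) and X2 (gender); procedures: PROC FREQ, PROC LOGISTIC]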

Page 3: Analysis Of A Binary Outcome Variable

CONTINGENCY TABLE

One starting point is to create a contingency table.

                    BREATHING TEST
SMOKING STATUS      ABNORMAL    NORMAL
CURRENT                  131       927
NEVER                     38       741

Source: Forthofer & Lehnen (1981); Agresti (1990)
Subjects: Caucasians who work in certain industrial plants in Houston
Response variable (Y): breathing test
Explanatory variable (X): smoking status

Page 4: Analysis Of A Binary Outcome Variable

STUDY DESIGN

Three types of study design in observational studies:

Cross-sectional: X and Y are collected at the same time. Prevalence Ratio = P1 / P0
Cohort: X is collected first. Relative Risk (RR) = P1 / P0
Case-control: Y is collected first. RR cannot be calculated.

                     Outcome (Y)
                     1       0
Exposure (X)    1    A       B
                0    C       D

$$P_1 = \frac{A}{A+B}, \qquad P_0 = \frac{C}{C+D}$$
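As a worked illustration, the two proportions can be computed from the breathing-test counts in the contingency table (A = 131, B = 927, C = 38, D = 741). This assumes, purely for illustration, that the table is read as cohort-type data so that a risk ratio is meaningful:

```latex
\[
P_1 = \frac{131}{131 + 927} = \frac{131}{1058} \approx 0.1238, \qquad
P_0 = \frac{38}{38 + 741} = \frac{38}{779} \approx 0.0488
\]
\[
\mathrm{RR} = \frac{P_1}{P_0} \approx 2.54
\]
```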

Page 5: Analysis Of A Binary Outcome Variable

ODDS RATIO

                     Outcome (Y)
                     1       0
Exposure (X)    1    A       B
                0    C       D

$$\text{Odds}_1 = \frac{A}{B}, \qquad \text{Odds}_0 = \frac{C}{D}, \qquad \text{Odds Ratio} = \frac{\text{Odds}_1}{\text{Odds}_0} = \frac{AD}{BC}$$
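The same arithmetic can be sketched in a short DATA step. This is only a hand check under stated assumptions: the cell counts are the breathing-test counts from the contingency table, and the data set name `or_by_hand` and all variable names are illustrative, not part of the original slides.

```sas
/* Hand calculation of risks, odds, RR and OR from 2x2 cell counts.
   Data set and variable names are illustrative. */
data or_by_hand;
   A = 131;   /* exposed (current smoker), outcome = 1 (abnormal) */
   B = 927;   /* exposed (current smoker), outcome = 0 (normal)   */
   C = 38;    /* not exposed (never),      outcome = 1 (abnormal) */
   D = 741;   /* not exposed (never),      outcome = 0 (normal)   */

   P1 = A / (A + B);              /* risk in the exposed group     */
   P0 = C / (C + D);              /* risk in the non-exposed group */
   risk_ratio = P1 / P0;

   odds1 = A / B;                 /* odds in the exposed group     */
   odds0 = C / D;                 /* odds in the non-exposed group */
   odds_ratio = odds1 / odds0;    /* equals (A*D)/(B*C)            */
run;

proc print data=or_by_hand noobs;
   var P1 P0 risk_ratio odds1 odds0 odds_ratio;
   format P1 P0 risk_ratio odds1 odds0 odds_ratio 8.4;
run;
```

With these counts the risk ratio comes out near 2.54 and the odds ratio near 2.76.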

Page 6: Analysis Of A Binary Outcome Variable

ODDS RATIO

The OR measures the strength of the association between X and Y. It ranges from 0 to infinity:

OR = 1: no association
OR > 1: the exposed group (X = 1) has higher odds of the outcome
OR < 1: the non-exposed group (X = 0) has higher odds of the outcome

Page 7: Analysis Of A Binary Outcome Variable

ODDS RATIO

The OR measures the strength of the association between X and Y (range: 0 to infinity; 1 means no association).

To test the association between X and Y:
- Use the chi-square statistic
- Use the 95% CI for the OR and check whether it includes 1
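One common way to build the 95% CI for the OR is the large-sample (Wald) interval on the log scale. The sketch below assumes that this is the interval intended here and uses the breathing-test counts from the contingency table (A = 131, B = 927, C = 38, D = 741):

```latex
\[
\widehat{\mathrm{OR}} = \frac{AD}{BC} = \frac{131 \times 741}{927 \times 38} \approx 2.756,
\qquad
\mathrm{SE}\bigl(\log \widehat{\mathrm{OR}}\bigr)
  = \sqrt{\frac{1}{A} + \frac{1}{B} + \frac{1}{C} + \frac{1}{D}} \approx 0.1907
\]
\[
95\%\ \mathrm{CI}
  = \exp\bigl(\log 2.756 \pm 1.96 \times 0.1907\bigr)
  \approx (1.90,\ 4.00)
\]
```

Because this interval does not include 1, it points to an association between smoking status and the breathing test at the 5% level.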

Page 8: Analysis Of A Binary Outcome Variable

PROC FREQ

                         BREATHING TEST (Y)
SMOKING STATUS (X)       ABNORMAL (1)    NORMAL (0)
CURRENT (1)              131 (A)         927 (B)
NEVER (0)                 38 (C)         741 (D)

data breathTest;
   input test $ 1-8 neversmk $ 10-16 count;
   datalines;
abnormal current 131
normal   current 927
abnormal never    38
normal   never   741
;

Page 9: Analysis Of A Binary Outcome Variable

PROC FREQ

The data are entered directly as the cell counts of the table, so the WEIGHT statement is used to supply the count for each cell.

proc freq data=breathTest;
   weight count;
   tables neversmk*test;
run;

The FREQ Procedure

Table of neversmk by test

neversmk     test

Frequency |
Percent   |
Row Pct   |
Col Pct   |abnormal|normal  |  Total
----------+--------+--------+
current   |    131 |    927 |   1058
          |   7.13 |  50.46 |  57.59
          |  12.38 |  87.62 |
          |  77.51 |  55.58 |
----------+--------+--------+
never     |     38 |    741 |    779
          |   2.07 |  40.34 |  42.41
          |   4.88 |  95.12 |
          |  22.49 |  44.42 |
----------+--------+--------+
Total          169     1668     1837
              9.20    90.80   100.00

Page 10: Analysis Of A Binary Outcome Variable

PROC FREQ - RELRISK

proc freq data=breathTest;
   weight count;
   tables neversmk*test / relrisk;
run;

The RELRISK option computes:
- the odds ratio (OR)
- the relative risk for column 1 (col1)
- the relative risk for column 2 (col2)

                         BREATHING TEST (Y)
                         col1            col2
SMOKING STATUS (X)       ABNORMAL (1)    NORMAL (0)
CURRENT (1)              131 (A)         927 (B)
NEVER (0)                 38 (C)         741 (D)

Page 11: Analysis Of A Binary Outcome Variable

PROC FREQ - RELRISK

Output of the RELRISK option (OR, RR for col1, RR for col2):

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value      95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      2.7557       1.8962       4.0047
Cohort (Col1 Risk)             2.5383       1.7904       3.5987
Cohort (Col2 Risk)             0.9211       0.8960       0.9470

Sample Size = 1837

The odds of an abnormal test result for current smokers are about 2.8 times the odds for those who have never smoked (95% CI: 1.9 to 4.0).

Page 12: Analysis Of A Binary Outcome Variable

PROC FREQ - CHISQ

proc freq data=breathTest;
   weight count;
   tables neversmk*test / relrisk chisq;
run;

Statistics for Table of neversmk by test

Statistic                        DF       Value      Prob
----------------------------------------------------------
Chi-Square                        1     30.2421    <.0001
Likelihood Ratio Chi-Square       1     32.3820    <.0001
Continuity Adj. Chi-Square        1     29.3505    <.0001
Mantel-Haenszel Chi-Square        1     30.2257    <.0001
Phi Coefficient                          0.1283
Contingency Coefficient                  0.1273
Cramer's V                               0.1283
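For a 2x2 table the Pearson chi-square statistic has a closed form, so the first row of this output can be checked by hand. Using the cell counts A = 131, B = 927, C = 38, D = 741 and N = 1837:

```latex
\[
\chi^2 = \frac{N\,(AD - BC)^2}{(A+B)(C+D)(A+C)(B+D)}
       = \frac{1837 \times (131 \times 741 - 927 \times 38)^2}{1058 \times 779 \times 169 \times 1668}
       \approx 30.24
\]
```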

Page 13: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

Use logistic regression to study the association between "Breathing Test" and "Smoking".

For logistic regression, maximum likelihood estimation (MLE), not ordinary least squares (OLS), is used to estimate the parameters.

Why not use a linear probability model?

$$p = \alpha + \beta X, \qquad p \in [0, 1]$$

- The probability is bounded between 0 and 1, but the linear predictor is not
- The relationship between p and X can be nonlinear

Page 14: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

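Logistic regression is used to predict the probability of occurrence of an event by fitting data to a logit function.

$$\operatorname{logit}(p) = \alpha + \beta X$$

$$\operatorname{logit}(p) = \log(\text{odds}) = \log\frac{p}{1-p}$$

$$p = \frac{\exp(\alpha + \beta X)}{1 + \exp(\alpha + \beta X)}$$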
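The step from the logit to the probability is a short piece of algebra, shown here for completeness:

```latex
\begin{align*}
\operatorname{logit}(p) = \log\frac{p}{1-p} &= \alpha + \beta X \\
\frac{p}{1-p} &= \exp(\alpha + \beta X) \\
p &= \frac{\exp(\alpha + \beta X)}{1 + \exp(\alpha + \beta X)}
\end{align*}
```

For any real value of α + βX, the resulting p stays strictly between 0 and 1, which is what the linear probability model cannot guarantee.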

Page 15: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

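                    BREATHING TEST
SMOKING STATUS      ABNORMAL    NORMAL
CURRENT                  131       927
NEVER                     38       741

$$p = \text{prob(abnormal)}; \qquad \operatorname{logit}(p) = \alpha + \beta X$$

Reference cell coding:

$$X = \begin{cases} 1 & \text{current} \\ 0 & \text{never} \end{cases}$$

$$\operatorname{logit}(p \mid \text{never}) = \alpha \quad (X = 0), \qquad \operatorname{logit}(p \mid \text{current}) = \alpha + \beta \quad (X = 1)$$

$$\beta = \log\frac{\text{odds}_1}{\text{odds}_0} = \log(\mathrm{OR}_{1\ \text{vs}\ 0}), \qquad \mathrm{OR} = \exp(\beta)$$

β: the increment in log odds for current smokers compared with those who never smoked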
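Because this model has a single binary predictor, it fits the 2x2 table exactly, so the maximum likelihood estimates can be read directly off the cell counts. A worked sketch with the breathing-test counts:

```latex
\[
\hat{\alpha} = \log(\text{odds}_0) = \log\frac{38}{741} \approx -2.970,
\qquad
\hat{\alpha} + \hat{\beta} = \log(\text{odds}_1) = \log\frac{131}{927} \approx -1.957
\]
\[
\hat{\beta} = \log\frac{131/927}{38/741} \approx 1.014,
\qquad
\mathrm{OR} = \exp(\hat{\beta}) \approx 2.756
\]
```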

Page 16: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

proc logistic data=breathTest;
   class neversmk / param=ref;
   weight count;
   model test = neversmk;
run;

The LOGISTIC Procedure

Model Information

Data Set                      WORK.BREATHTEST
Response Variable             test
Number of Response Levels     2
Weight Variable               count
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Read        4
Number of Observations Used        4
Sum of Weights Read             1837
Sum of Weights Used             1837

Page 17: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

Response Profile

Ordered                      Total          Total
  Value    test          Frequency         Weight
      1    abnormal              2       169.0000
      2    normal                2      1668.0000

Probability modeled is test='abnormal'.

By default, PROC LOGISTIC models the probability of the response level with the lower ordered value.

Page 18: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

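To model the probability of being "normal", use any one of the following.

The DESCENDING option in the PROC LOGISTIC statement:

proc logistic data=breathTest descending;
   class neversmk / param=ref;
   weight count;
   model test = neversmk;
run;

The DESCENDING response option in the MODEL statement:

proc logistic data=breathTest;
   class neversmk / param=ref;
   weight count;
   model test (descending) = neversmk;
run;

The EVENT= response option in the MODEL statement:

proc logistic data=breathTest;
   class neversmk / param=ref;
   weight count;
   model test (event="normal") = neversmk;
run;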

Page 19: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

Reference Cell Coding

proc logistic data=breathTest;
   class neversmk / param=ref;
   weight count;
   model test = neversmk;
run;

Class Level Information

                          Design
Class        Value        Variables
neversmk     current          1
             never            0

Reference cell coding estimates the difference between the effect of each level and the effect of the last (reference) level, which makes the result easy to interpret.

Page 20: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

Effect Coding

proc logistic data=breathTest;
   class neversmk;
   weight count;
   model test = neversmk;
run;

Class Level Information

                          Design
Class        Value        Variables
neversmk     current          1
             never           -1

Effect coding (the default when PARAM= is not specified) estimates the difference between the effect of each level and the average effect over all levels.
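With a two-level variable coded 1 and -1, the slope no longer equals the log odds ratio. A short derivation, where α* and β* are notation introduced here for the effect-coded intercept and slope (they do not appear in the slides):

```latex
\[
\operatorname{logit}(p_{\text{current}}) = \alpha^{*} + \beta^{*}, \qquad
\operatorname{logit}(p_{\text{never}}) = \alpha^{*} - \beta^{*}
\]
\[
\log(\mathrm{OR}_{\text{current vs never}})
  = (\alpha^{*} + \beta^{*}) - (\alpha^{*} - \beta^{*}) = 2\beta^{*},
\qquad
\mathrm{OR} = \exp(2\beta^{*})
\]
```

So under effect coding the slope is half the log odds ratio, while the odds ratio itself is unchanged by the choice of coding.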

Page 21: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

Class Level Information (PARAM=REF)

                          Design
Class        Value        Variables
neversmk     current          1
             never            0

By default, the last ordered value of the classification variable is used as the reference level.

Page 22: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

proc logistic data=breathTest;
   class neversmk (ref="never") / param=ref;
   weight count;
   model test = neversmk;
run;

Model Fit Statistics

                 Intercept     Intercept and
Criterion        Only          Covariates
AIC              1130.417      1100.035
SC               1129.803      1098.808
-2 Log L         1128.417      1096.035

These goodness-of-fit measures are used to compare one model with another (information for model selection).

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square     DF     Pr > ChiSq
Likelihood Ratio        32.3820      1         <.0001
Score                   30.2421      1         <.0001
Wald                    28.2434      1         <.0001

Page 23: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square     DF     Pr > ChiSq
Likelihood Ratio        32.3820      1         <.0001
Score                   30.2421      1         <.0001
Wald                    28.2434      1         <.0001

H0: all regression coefficients = 0

These tests are similar to the overall F test in linear regression.

Page 24: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square     DF     Pr > ChiSq
Likelihood Ratio        32.3820      1         <.0001
Score                   30.2421      1         <.0001
Wald                    28.2434      1         <.0001

H0: all regression coefficients = 0

The likelihood ratio test (LRT) is more reliable than the score and Wald tests, especially for small N.

Page 25: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

Type 3 Analysis of Effects

                           Wald
Effect       DF      Chi-Square     Pr > ChiSq
neversmk      1         28.2434         <.0001

Analysis of Maximum Likelihood Estimates

                                     Standard          Wald
Parameter             DF   Estimate     Error    Chi-Square    Pr > ChiSq
Intercept              1    -2.9704    0.1663      318.9365        <.0001
neversmk   current     1     1.0136    0.1907       28.2434        <.0001

Because NEVERSMK has only 1 df, the Type 3 test and the test of its parameter estimate give identical results.

Page 26: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

Analysis of Maximum Likelihood Estimates

                                     Standard          Wald
Parameter             DF   Estimate     Error    Chi-Square    Pr > ChiSq
Intercept              1    -2.9704    0.1663      318.9365        <.0001
neversmk   current     1     1.0136    0.1907       28.2434        <.0001

Current smokers have a 1.01 increase in the log odds of an abnormal test compared with people who never smoked.

OR = exp(1.0136) = 2.756
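The back-transformation from the estimates to an odds ratio, a Wald interval, and predicted probabilities can be sketched in a DATA step. The estimates and standard error are copied from the output above; the data set name `logit_by_hand` and the variable names are illustrative assumptions.

```sas
/* Back-transform the reported logistic estimates by hand.
   Data set and variable names are illustrative. */
data logit_by_hand;
   alpha   = -2.9704;   /* intercept: log odds for never smokers     */
   beta    =  1.0136;   /* increment in log odds for current smokers */
   se_beta =  0.1907;   /* standard error of beta                    */

   odds_ratio = exp(beta);
   z          = quantile('NORMAL', 0.975);    /* about 1.96 */
   or_lower   = exp(beta - z * se_beta);      /* Wald 95% limits */
   or_upper   = exp(beta + z * se_beta);

   /* predicted probability of an abnormal test in each group */
   p_never   = exp(alpha) / (1 + exp(alpha));
   p_current = exp(alpha + beta) / (1 + exp(alpha + beta));
run;

proc print data=logit_by_hand noobs;
   var odds_ratio or_lower or_upper p_never p_current;
   format odds_ratio or_lower or_upper 8.3 p_never p_current 8.4;
run;
```

The odds ratio should print as roughly 2.76 with limits near 1.90 and 4.00, and the two probabilities should be close to the observed proportions 38/779 and 131/1058.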

Page 27: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

Odds Ratio Estimates

                                   Point          95% Wald
Effect                          Estimate      Confidence Limits
neversmk current vs never          2.756       1.896       4.004

Result from PROC FREQ:

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value      95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      2.7557       1.8962       4.0047
Cohort (Col1 Risk)             2.5383       1.7904       3.5987
Cohort (Col2 Risk)             0.9211       0.8960       0.9470

Sample Size = 1837

Page 28: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

The ODDSRATIO statement (new in SAS 9.2):

ODDSRATIO <'label'> variable </ options>;

proc logistic data=breathTest;
   class neversmk (ref="never") / param=ref;
   weight count;
   model test = neversmk;
   oddsratio 'smoking' neversmk;
run;

Wald Confidence Interval for Odds Ratios

Label                                      Estimate     95% Confidence Limits
smoking neversmk current vs never             2.756       1.896       4.004

Page 29: Analysis Of A Binary Outcome Variable

LOGISTIC REGRESSION MODEL

proc logistic data=breathTest;
   class neversmk (ref="never") / param=ref;
   weight count;
   model test = neversmk;
   oddsratio 'smoking' neversmk / cl=pl;
run;

Profile Likelihood Confidence Interval for Odds Ratios

Label                                      Estimate     95% Confidence Limits
smoking neversmk current vs never             2.756       1.916       4.054

- The Wald CI is based on a normal approximation
- The profile likelihood (PL) CI is based on the value of the log-likelihood
- The PL CI is generally preferred for small sample sizes

Page 30: Analysis Of A Binary Outcome Variable

CONFOUNDING

[Diagram: Smoking, Test, and Age]

Not including Age can lead to either over- or under-estimation of the relationship between Smoking and Test.

Page 31: Analysis Of A Binary Outcome Variable

CONFOUNDING

[Figure: log(odds) of the test result by Age (< 40, ≥ 40), plotted for smokers and non-smokers]

[Diagram: Smoking, Test, and Age]

When adjusting for age, you are comparing smokers and non-smokers at common values of age.

Page 32: Analysis Of A Binary Outcome Variable

INTERACTION

Interaction: the relationship between "Smoking" and "Test" differs depending on the level of Age (for example, < 40 versus ≥ 40).

[Figure: log(odds) of the test result by Age (< 40, ≥ 40), plotted for smokers and non-smokers]

Age is then referred to as an effect modifier.

Page 33: Analysis Of A Binary Outcome Variable

INTERACTION & CONFOUNDING

PROC FREQ can analyze the association of interest when there is only one confounder or one effect modifier.

To control for multiple confounders or include multiple effect modifiers in your model, you need to use PROC LOGISTIC.

Page 34: Analysis Of A Binary Outcome Variable

THE PURPOSES AND STRATEGIES FOR MODEL BUILDING

The methods of fitting a regression model differ depending upon your research purpose

Two purposes:

1. Investigating the essential association between an outcome variable and a set of explanatory variables (common in the epidemiologic field)

2. Predicting the outcome variable by using a set of explanatory variables

Page 35: Analysis Of A Binary Outcome Variable

THE PURPOSES AND STRATEGIES FOR MODEL BUILDING

Situations for building a prediction model:
- statistical decision making
- generating (not testing) hypotheses for a future study

A prediction model needs to be validated in an independent sample to evaluate its usefulness.

For building a prediction model, only the interaction effect needs to be considered (not confounding).

Techniques for building a prediction model: forward, backward, and stepwise selection, etc.

The focus of this talk is not on building a prediction model but on estimating the relationship between a main explanatory variable and an outcome variable.

Page 36: Analysis Of A Binary Outcome Variable

THE PURPOSES AND STRATEGIES FOR MODEL BUILDING

For estimating association, interaction and confounding issues must be considered

Which should be evaluated first? Confounding effect or interaction effect?

Page 37: Analysis Of A Binary Outcome Variable

THE PURPOSES AND STRATEGIES FOR MODEL BUILDING

Is the association between "Smoking" and "Test" different in the 2 age groups?

- Yes: there is an interaction. Report the age-specific ORs.
- No: there is no interaction. Is "Age" a confounder?
   - Yes: report the age-adjusted OR.
   - No: report the crude OR.

Page 38: Analysis Of A Binary Outcome Variable

THE PURPOSES AND STRATEGIES FOR MODEL BUILDING

Effect Modification (interaction) can be detected via statistical testing

Confounding effect cannot be tested statistically

Outcome    Main Var    Covariate    OR     P        Include?
Y          X                        2.3    <0.05
Y          X           Z            4.2    0.2      YES
Y          X           Z            2.4    0.01     MAYBE

Page 39: Analysis Of A Binary Outcome Variable

PROC FREQ: INTERACTION EFFECT

data breathTestAge;
   input test $ 1-8 neversmk $ 10-16 over40 $ 18-20 count;
   datalines;
normal   never   no  577
abnormal never   no   34
normal   current no  682
abnormal current no   57
normal   never   yes 164
abnormal never   yes   4
normal   current yes 245
abnormal current yes  74
;

Page 40: Analysis Of A Binary Outcome Variable

PROC FREQ: INTERACTION EFFECT

proc freq data=breathTestAge;
   weight count;
   tables over40*neversmk*test / chisq relrisk cmh;
run;

The CMH option produces:
- Cochran-Mantel-Haenszel statistics (tests for association between the row and column variables after adjusting for the third variable)
- The adjusted Mantel-Haenszel and logit estimates of the odds ratio and relative risks
- The Breslow-Day test for homogeneity of the odds ratios

Page 41: Analysis Of A Binary Outcome Variable

PROC FREQ: INTERACTION EFFECT

Breslow-Day Test for Homogeneity of the Odds Ratios

Chi-Square        18.0829
DF                      1
Pr > ChiSq         <.0001

Total Sample Size = 1837

The association between smoking status and the breathing test is not the same across age groups.

Page 42: Analysis Of A Binary Outcome Variable

PROC FREQ: INTERACTION EFFECT

Statistics for Table 1 of neversmk by test
Controlling for over40=no

Statistic                        DF       Value      Prob
----------------------------------------------------------
Chi-Square                        1      2.4559    0.1171
Likelihood Ratio Chi-Square       1      2.4893    0.1146
Continuity Adj. Chi-Square        1      2.1260    0.1448
Mantel-Haenszel Chi-Square        1      2.4541    0.1172
Phi Coefficient                          0.0427
Contingency Coefficient                  0.0426
Cramer's V                               0.0427

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value      95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      1.4184       0.9144       2.2000
Cohort (Col1 Risk)             1.3861       0.9190       2.0906
Cohort (Col2 Risk)             0.9772       0.9499       1.0054

Sample Size = 1350

Page 43: Analysis Of A Binary Outcome Variable

PROC FREQ: INTERACTION EFFECT

Statistics for Table 2 of neversmk by test
Controlling for over40=yes

Statistic                        DF       Value      Prob
----------------------------------------------------------
Chi-Square                        1     35.4510    <.0001
Likelihood Ratio Chi-Square       1     45.1246    <.0001
Continuity Adj. Chi-Square        1     33.9203    <.0001
Mantel-Haenszel Chi-Square        1     35.3782    <.0001
Phi Coefficient                          0.2698
Contingency Coefficient                  0.2605
Cramer's V                               0.2698

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value      95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)     12.3837       4.4416      34.5272
Cohort (Col1 Risk)             9.7429       3.6253      26.1844
Cohort (Col2 Risk)             0.7868       0.7374       0.8394

Page 44: Analysis Of A Binary Outcome Variable

PROC FREQ: INTERACTION EFFECT

Summary Statistics for neversmk by test
Controlling for over40

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic    Alternative Hypothesis       DF       Value      Prob
-------------------------------------------------------------------
    1        Nonzero Correlation           1     25.2444    <.0001
    2        Row Mean Scores Differ        1     25.2444    <.0001
    3        General Association           1     25.2444    <.0001

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study      Method                Value      95% Confidence Limits
--------------------------------------------------------------------------
Case-Control       Mantel-Haenszel      2.5683       1.7618       3.7441
  (Odds Ratio)     Logit                1.9840       1.3252       2.9702

Cohort             Mantel-Haenszel      2.4174       1.6754       3.4879
  (Col1 Risk)      Logit                1.8475       1.2641       2.7001

Cohort             Mantel-Haenszel      0.9289       0.9046       0.9538
  (Col2 Risk)      Logit                0.9437       0.9195       0.9686

These statistics and the adjusted OR are useful only if the OR is homogeneous across the categories of the adjusting variable.

Page 45: Analysis Of A Binary Outcome Variable

PROC LOGISTIC: INTERACTION EFFECT

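$$X_{smoke} = \begin{cases} 1 & \text{current} \\ 0 & \text{never} \end{cases}, \qquad X_{age40} = \begin{cases} 1 & \text{yes} \\ 0 & \text{no} \end{cases}$$

$$\operatorname{logit}(p) = \alpha + \beta_{smoke} X_{smoke} + \beta_{age40} X_{age40} + \beta_{int} X_{smoke} X_{age40}$$

proc logistic data=breathTestAge;
   class neversmk (ref="never") over40 (ref="no") / param=ref;
   weight count;
   model test = neversmk over40 neversmk*over40;
run;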

Page 46: Analysis Of A Binary Outcome Variable

PROC LOGISTIC: INTERACTION EFFECT

Analysis of Maximum Likelihood Estimates

                                               Standard          Wald
Parameter                        DF  Estimate     Error    Chi-Square    Pr > ChiSq
Intercept                         1   -2.8315    0.1765      257.4193        <.0001
neversmk         current          1    0.3495    0.2240        2.4355        0.1186
over40           yes              1   -0.8820    0.5359        2.7086        0.0998
neversmk*over40  current yes      1    2.1668    0.5691       14.4985        0.0001

Wald test of the interaction term: chi-square = 14.4985, p = 0.0001

Page 47: Analysis Of A Binary Outcome Variable

PROC LOGISTIC: INTERACTION EFFECT

Likelihood Ratio Test:

$$H_1: \operatorname{logit}(p) = \alpha + \beta_{smoke} X_{smoke} + \beta_{age40} X_{age40} + \beta_{int} X_{smoke} X_{age40}$$

$$H_0: \operatorname{logit}(p) = \alpha + \beta_{smoke} X_{smoke} + \beta_{age40} X_{age40}$$

$$LR = 2\,[\log L(\text{Full model}) - \log L(\text{Reduced model})] = [-2 \log L(\text{Reduced model})] - [-2 \log L(\text{Full model})]$$

$$LR \sim \chi^2_{df}, \qquad df = (\#\ \text{terms in Full model}) - (\#\ \text{terms in Reduced model})$$

Page 48: Analysis Of A Binary Outcome Variable

PROC LOGISTIC: INTERACTION EFFECT

Full model:

proc logistic data=breathTestAge;
   class neversmk (ref="never") over40 (ref="no") / param=ref;
   weight count;
   model test = neversmk over40 neversmk*over40;
run;

Model Fit Statistics

                 Intercept     Intercept and
Criterion        Only          Covariates
AIC              1130.417      1055.467
SC               1130.497      1055.785
-2 Log L         1128.417      1047.467

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square     DF     Pr > ChiSq
Likelihood Ratio        80.9500      3         <.0001
Score                   95.7956      3         <.0001
Wald                    81.3305      3         <.0001

Page 49: Analysis Of A Binary Outcome Variable

PROC LOGISTIC: INTERACTION EFFECT

Reduced model:

proc logistic data=breathTestAge;
   class neversmk (ref="never") over40 (ref="no") / param=ref;
   weight count;
   model test = neversmk over40;
run;

Model Fit Statistics

                 Intercept     Intercept and
Criterion        Only          Covariates
AIC              1130.417      1074.123
SC               1130.497      1074.361
-2 Log L         1128.417      1068.123

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square     DF     Pr > ChiSq
Likelihood Ratio        60.2942      2         <.0001
Score                   61.2515      2         <.0001
Wald                    56.4737      2         <.0001
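Putting the two sets of fit statistics side by side gives the likelihood ratio test of the interaction term by hand. The model with neversmk*over40 has -2 Log L = 1047.467 (3 terms) and the model without it has -2 Log L = 1068.123 (2 terms):

```latex
\[
LR = 1068.123 - 1047.467 = 20.656, \qquad df = 3 - 2 = 1
\]
\[
20.656 > \chi^2_{1,\,0.95} = 3.84 \quad\Rightarrow\quad p < 0.05
\]
```

Since both models share the same intercept-only fit, the same value is the difference of the two global likelihood ratio chi-squares: 80.9500 - 60.2942 = 20.6558.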

Page 50: Analysis Of A Binary Outcome Variable

PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
   class neversmk (ref="never") over40 (ref="no") / param=ref;
   weight count;
   model test = neversmk over40 neversmk*over40;
   /* save the fit statistics and global tests of the full model */
   ods output FitStatistics = log2Ratio_full
              GlobalTests   = df_full;
run;

/* store -2 Log L of the full model in a macro variable */
data _null_;
   set log2Ratio_full;
   if Criterion = '-2 Log L';
   call symput('neg2L_full', InterceptAndCovariates);
run;

/* store the df of the full model in a macro variable */
data _null_;
   set df_full;
   if Test = 'Likelihood Ratio';
   call symput('df_full', DF);
run;

Page 51: Analysis Of A Binary Outcome Variable

PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
   class neversmk (ref="never") over40 (ref="no") / param=ref;
   weight count;
   model test = neversmk over40;
   /* save the fit statistics and global tests of the reduced model */
   ods output FitStatistics = log2Ratio_reduce
              GlobalTests   = df_reduce;
run;

/* store -2 Log L of the reduced model in a macro variable */
data _null_;
   set log2Ratio_reduce;
   if Criterion = '-2 Log L';
   call symput('neg2L_reduce', InterceptAndCovariates);
run;

/* store the df of the reduced model in a macro variable */
data _null_;
   set df_reduce;
   if Test = 'Likelihood Ratio';
   call symput('df_reduce', DF);
run;

Page 52: Analysis Of A Binary Outcome Variable

PROC LOGISTIC: INTERACTION EFFECT

data result;
   LR = &neg2L_reduce - &neg2L_full;   /* likelihood ratio statistic */
   df = &df_full - &df_reduce;
   p  = 1 - probchi(LR, df);           /* upper-tail chi-square probability */
   label LR = 'Likelihood Ratio';
run;

proc print data=result label noobs;
   title "Likelihood ratio test";
run;

Likelihood ratio test

Likelihood
     Ratio     df              p
   20.6558      1     .000005497

Page 53: Analysis Of A Binary Outcome Variable

PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
   class neversmk (ref="never") over40 (ref="no") / param=ref;
   weight count;
   model test = neversmk over40 neversmk*over40;
   oddsratio neversmk / at (over40='no');
   oddsratio neversmk / at (over40='yes');
run;

Wald Confidence Interval for Odds Ratios

Label                                           Estimate     95% Confidence Limits
neversmk current vs never at over40=no             1.418       0.914       2.200
neversmk current vs never at over40=yes           12.383       4.441      34.525
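These age-specific odds ratios follow directly from the parameter estimates of the interaction model (0.3495 for neversmk and 2.1668 for neversmk*over40). As a worked check, writing β_smoke and β_int for those two coefficients:

```latex
\[
\mathrm{OR}_{\text{over40 = no}} = \exp(\beta_{smoke}) = \exp(0.3495) \approx 1.418
\]
\[
\mathrm{OR}_{\text{over40 = yes}} = \exp(\beta_{smoke} + \beta_{int})
  = \exp(0.3495 + 2.1668) = \exp(2.5163) \approx 12.38
\]
```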

Page 54: Analysis Of A Binary Outcome Variable

NURSE HEALTH STUDY

The Nurses' Health Study (NHS) enrolled nurses aged 30 to 55 in 1976.

Part of the study investigated the association between oral contraceptive (OC) use and breast cancer (BC).

Page 55: Analysis Of A Binary Outcome Variable

NURSE HEALTH STUDY

data nurse_study;
   input bc age oc count;
   datalines;
1 0 1 71
0 0 1 28418
1 0 0 35
0 0 0 12267
1 1 1 143
0 1 1 20661
1 1 0 321
0 1 0 44424
;

                                 BREAST CANCER
                AGE 30-39 (0)                 AGE 40-55 (1)
OC USE          CASE (1)    CONTROL (0)       CASE (1)    CONTROL (0)
YES (1)               71          28418            143          20661
NO (0)                35          12267            321          44424

Page 56: Analysis Of A Binary Outcome Variable

NURSE HEALTH STUDY

proc freq data=nurse_study order=data;
   weight count;
   tables age*oc*bc / chisq relrisk cmh;
run;

Breslow-Day Test for Homogeneity of the Odds Ratios

Chi-Square         0.1521
DF                      1
Pr > ChiSq         0.6966

There is no evidence of interaction, so the next step is to check for confounding.

Page 57: Analysis Of A Binary Outcome Variable

NURSE HEALTH STUDY

Summary Statistics for oc by bc
Controlling for age

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic    Alternative Hypothesis       DF       Value      Prob
-------------------------------------------------------------------
    1        Nonzero Correlation           1      0.4361    0.5090
    2        Row Mean Scores Differ        1      0.4361    0.5090
    3        General Association           1      0.4361    0.5090

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study      Method                Value      95% Confidence Limits
--------------------------------------------------------------------------
Case-Control       Mantel-Haenszel      0.9419       0.7882       1.1256
  (Odds Ratio)     Logit                0.9415       0.7882       1.1246

Cohort             Mantel-Haenszel      0.9422       0.7897       1.1243
  (Col1 Risk)      Logit                0.9419       0.7894       1.1238

Cohort             Mantel-Haenszel      1.0003       0.9994       1.0013
  (Col2 Risk)      Logit                1.0003       0.9995       1.0012

Page 58: Analysis Of A Binary Outcome Variable

NURSE HEALTH STUDY

proc freq data=nurse_study order=data;
   weight count;
   tables oc*bc / chisq relrisk;
run;

Statistics for Table of oc by bc

Statistic                        DF       Value      Prob
----------------------------------------------------------
Chi-Square                        1     17.8881    <.0001
Likelihood Ratio Chi-Square       1     18.1401    <.0001
Continuity Adj. Chi-Square        1     17.5337    <.0001
Mantel-Haenszel Chi-Square        1     17.8879    <.0001
Phi Coefficient                         -0.0130
Contingency Coefficient                  0.0130
Cramer's V                              -0.0130

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value      95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      0.6944       0.5858       0.8230
Cohort (Col1 Risk)             0.6957       0.5874       0.8239
Cohort (Col2 Risk)             1.0019       1.0010       1.0028

Page 59: Analysis Of A Binary Outcome Variable

NURSE HEALTH STUDY

Unadjusted OR = 0.69; adjusted OR = 0.94. Age is a confounder.

In this situation, the age-adjusted statistics and odds ratio should be reported.

After adjusting for age, there is no statistically significant association between using OC and having BC (p = 0.51; age-adjusted OR = 0.94, 95% CI: 0.79 to 1.13).
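To see where the gap between the crude and adjusted estimates comes from, the odds ratio can be computed by hand within each age group and again after collapsing over age. This is a sketch only: the counts are taken from the nurse_study DATALINES, and the data set name `or_by_age` and the variable names are illustrative.

```sas
/* Stratum-specific and crude odds ratios from the cell counts.
   Data set and variable names are illustrative. */
data or_by_age;
   /* age 30-39: OC users 71 cases, 28418 controls;
                 non-users 35 cases, 12267 controls */
   or_age0 = (71 * 12267) / (28418 * 35);

   /* age 40-55: OC users 143 cases, 20661 controls;
                 non-users 321 cases, 44424 controls */
   or_age1 = (143 * 44424) / (20661 * 321);

   /* crude: collapse the two age groups into one 2x2 table */
   or_crude = ((71 + 143) * (12267 + 44424))
            / ((28418 + 20661) * (35 + 321));
run;

proc print data=or_by_age noobs;
   format or_age0 or_age1 or_crude 8.4;
run;
```

The two age-specific odds ratios come out near 0.88 and 0.96, both much closer to 1 than the crude value of about 0.69, which is the pattern expected when age confounds the association.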

Page 60: Analysis Of A Binary Outcome Variable

NURSE HEALTH STUDY

$$\operatorname{logit}(p) = \alpha + \beta_{OC} X_{OC} + \beta_{age} X_{age}$$

proc logistic data=nurse_study descending;
   weight count;
   model bc = oc age;
run;

Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter     DF   Estimate       Error    Chi-Square    Pr > ChiSq
Intercept      1    -5.9083      0.1156     2612.5788        <.0001
oc             1    -0.0602      0.0911        0.4360        0.5090
age            1     0.9835      0.1133       75.3707        <.0001

Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits
oc            0.942       0.788       1.126
age           2.674       2.141       3.338

Page 61: Analysis Of A Binary Outcome Variable

NURSE HEALTH STUDY

proc logistic data=nurse_study descending;
   weight count;
   model bc = oc;
run;

Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter     DF   Estimate       Error    Chi-Square    Pr > ChiSq
Intercept      1    -5.0704      0.0532     9095.8096        <.0001
oc             1    -0.3646      0.0867       17.6834        <.0001

Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits
oc            0.694       0.586       0.823

Page 62: Analysis Of A Binary Outcome Variable

CONCLUSION

Analyzing variables with dichotomized outcomes by using the FREQ and LOGISTIC procedures is a common task for statisticians in the health care industry

Simply knowing how to use the procedures is not sufficient

Understanding the goal of model building and following correct model-building steps are extremely important in order to obtain accurate and unbiased results