HSRP 734: Advanced Statistical Methods
June 5, 2008
Introduction
• Categorical data analysis
– multinomial
– 2x2 and RxC analysis
– 2x2xK, RxCxK analysis
• Stratified analysis (CMH) considers the problem of controlling for other variables
Introduction
• Need to extend to scientific questions of higher dimension.
• When the number of potential covariates increases, traditional methods of contingency table analysis become limited
• One alternative approach to stratified analyses is the development of regression models that incorporate covariates and interactions among variables.
Introduction
• Logistic regression is a form of regression analysis in which the outcome variable is binary or dichotomous
• General theory: analysis of variance (ANOVA) and logistic regression are both special cases of the Generalized Linear Model (GLM)
OBJECTIVES
• To describe what simple and multiple logistic regression are and how to perform them
• To describe maximum likelihood techniques to fit logistic regression models
• To describe Likelihood ratio and Wald tests
OBJECTIVES
• To describe how to interpret odds ratios for logistic regression with categorical and continuous predictors
• To describe how to estimate and interpret predicted probabilities from logistic models
• To describe how to do the above 5 using SAS Enterprise
What is Logistic Regression?
• In a nutshell:
A statistical method used to model dichotomous or binary outcomes (though not limited to them) using predictor variables.
Used when the research question focuses on whether or not an event occurred, rather than when it occurred (time course information is not used).
What is Logistic Regression?
• What is the “Logistic” component?
Instead of modeling the outcome, Y, directly, the method models the log odds(Y) using the logistic function.
What is Logistic Regression?
• What is the “Regression” component?
Methods used to quantify association between an outcome and predictor variables. Could be used to build predictive models as a function of predictors.
What is Logistic Regression?
[Scatter plot: Length of Stay (days) vs. Age (yrs.)]
What is Logistic Regression?
[Plot: 100-day Mortality (Died=1, Alive=0) vs. Age (yrs.)]
Fig 1. Logistic regression curves for the three drug combinations. The dashed reference line represents the probability of DLT of .33. The estimated MTD can be obtained as the value on the horizontal axis that coincides with a vertical line drawn through the point where the dashed line intersects the logistic curve. Taken from “Parallel Phase I Studies of Daunorubicin Given With Cytarabine and Etoposide With or Without the Multidrug Resistance Modulator PSC-833 in Previously Untreated Patients 60 Years of Age or Older With Acute Myeloid Leukemia: Results of Cancer and Leukemia Group B Study 9420,” Journal of Clinical Oncology, Vol 17, Issue 9 (September), 1999: 2831. http://www.jco.org/cgi/content/full/17/9/2831
What can we use Logistic Regression for?
• To estimate adjusted prevalence rates, adjusted for potential confounders (sociodemographic or clinical characteristics)
• To estimate the effect of a treatment on a dichotomous outcome, adjusted for other covariates
• To explore how well characteristics predict a categorical outcome
History of Logistic Regression
• Logistic function was invented in the 19th century to describe the growth of populations and the course of autocatalytic chemical reactions.
• Quetelet and Verhulst
• Population growth was most easily described by exponential growth, but this led to impossible values
History of Logistic Regression
• Logistic function was the solution to a differential equation that was examined from trying to dampen exponential population growth models.
History of Logistic Regression
• Published in 3 different papers in the 1840s. The first paper showed how the logistic models agreed very well with the actual course of the populations of France, Belgium, Essex, and Russia for periods up to the early 1830s.
The Logistic Curve

LOGIT(p) = ln[ p / (1 - p) ] = z

p = exp(z) / [1 + exp(z)]

[Plot: p (probability) vs. z (log odds)]
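The logit and its inverse can be sketched in a few lines of Python (a minimal illustration, not course code):

```python
import math

def logit(p):
    """LOGIT(p) = ln(p / (1 - p)): maps a probability p in (0, 1) to a log odds z."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Inverse: p = exp(z) / (1 + exp(z)): maps a log odds z back into (0, 1)."""
    return math.exp(z) / (1 + math.exp(z))

# The two functions undo each other:
p = 0.8
z = logit(p)                   # ln(0.8 / 0.2) = ln(4) ≈ 1.386
print(round(inv_logit(z), 3))  # → 0.8
```

Whatever value z takes on the real line, inv_logit(z) is always a valid probability, which is what makes the logistic curve suitable for modeling binary outcomes.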
Logistic Regression
• Simple logistic regression = logistic regression with 1 predictor variable
• Multiple logistic regression = logistic regression with multiple predictor variables
• Multiple logistic regression = Multivariable logistic regression = Multivariate logistic regression
The Logistic Regression Model

Logistic Regression:

ln[ P(Y) / (1 - P(Y)) ] = β0 + β1X1 + β2X2 + … + βKXK

Linear Regression:

Y = β0 + β1X1 + β2X2 + … + βKXK + ε
The Logistic Regression Model

ln[ P(Y) / (1 - P(Y)) ] = β0 + β1X1 + β2X2 + … + βKXK

ln[ P(Y) / (1 - P(Y)) ] is the log(odds) of the dichotomous outcome; X1, …, XK are the predictor variables.
The Logistic Regression Model

ln[ P(Y) / (1 - P(Y)) ] = β0 + β1X1 + β2X2 + … + βKXK

ln[ P(Y) / (1 - P(Y)) ] is the log(odds) of the outcome; β0 is the intercept and β1, …, βK are the model coefficients.
Logistic Regression uses Odds Ratios
• Does not model the outcome directly, so effect estimates are not quantified by means (i.e., differences in means)
• Estimates of effect are instead quantified by “Odds Ratios”
Relationship between Odds & Probability
Odds(event) = Probability(event) / [1 - Probability(event)]

Probability(event) = Odds(event) / [1 + Odds(event)]
The Odds Ratio
Definition of Odds Ratio: the ratio of two odds estimates.

So, if Pr(response | trt) = 0.40 and Pr(response | placebo) = 0.20,

Then:

Odds(response | trt group) = 0.40 / (1 - 0.40) = 0.667

Odds(response | placebo group) = 0.20 / (1 - 0.20) = 0.25

OR (Trt vs. Placebo) = 0.667 / 0.25 = 2.67
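The arithmetic above can be checked with a short Python sketch (illustrative only):

```python
def odds(p):
    # Odds = p / (1 - p)
    return p / (1 - p)

odds_trt = odds(0.40)      # 0.40 / 0.60 ≈ 0.667
odds_plb = odds(0.20)      # 0.20 / 0.80 = 0.25
OR = odds_trt / odds_plb   # ratio of the two odds
print(round(odds_trt, 3), round(odds_plb, 2), round(OR, 2))  # → 0.667 0.25 2.67
```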
Interpretation of the Odds Ratio
• Example cont’d:

Outcome = response, OR (trt vs. plb) = 2.67
Then, the odds of a response in the treatment group were estimated to be 2.67 times the odds of having a response in the placebo group.
Alternatively, the odds of having a response were 167% higher in the treatment group than in the placebo group.
Odds Ratio vs. Relative Risk
• An Odds Ratio of 2.67 for trt. vs. placebo does NOT mean that the outcome is 2.67 times as LIKELY to occur.
• It DOES mean that the ODDS of the outcome occurring are 2.67 times as high for trt. vs. placebo.
Odds Ratio vs. Relative Risk
• The Odds Ratio is NOT mathematically equivalent to the Relative Risk (Risk Ratio)
• However, for “rare” events, the Odds ratio can approximate the Relative risk (RR)
OR = RR × [1 - P(response | plb)] / [1 - P(response | trt)]
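A quick numeric sketch (with hypothetical response rates, not from the slides) shows how the approximation behaves:

```python
def odds_ratio(p1, p2):
    # OR = [p1/(1-p1)] / [p2/(1-p2)]
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

def relative_risk(p1, p2):
    # RR = p1 / p2
    return p1 / p2

# Common outcome (40% vs. 20%): OR and RR differ noticeably
print(round(odds_ratio(0.40, 0.20), 2), round(relative_risk(0.40, 0.20), 2))   # → 2.67 2.0

# Rare outcome (0.4% vs. 0.2%): the OR closely approximates the RR
print(round(odds_ratio(0.004, 0.002), 3), round(relative_risk(0.004, 0.002), 3))  # → 2.004 2.0
```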
Maximum Likelihood
Idea of Maximum Likelihood
• Flipped a fair coin 10 times: T, H, H, T, T, H, H, T, H, H
• What is the Pr(Heads) given the data? 1/100? 1/5? 1/2? 6/10?
• Did you do the home experiment?
T, H, H, T, T, H, H, T, H, H
• What is the Pr(Heads) given the data?

• Most reasonable data-based estimate would be 6/10.

• In fact,

p̂ = X / N = # of heads / total # of flips

is the ML estimator of p.
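The coin-flip ML estimate is trivial to compute directly:

```python
# The ten observed flips from the slide
flips = ["T", "H", "H", "T", "T", "H", "H", "T", "H", "H"]

# ML estimator: p_hat = X / N = (# of heads) / (total # of flips)
p_hat = flips.count("H") / len(flips)
print(p_hat)  # → 0.6
```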
Maximum Likelihood
• The method of maximum likelihood estimation chooses values for parameter estimates (regression coefficients) which make the observed data “maximally likely.”
• Standard errors are obtained as a by-product of the maximization process
The Logistic Regression Model

ln[ P(Y) / (1 - P(Y)) ] = β0 + β1X1 + β2X2 + … + βKXK

ln[ P(Y) / (1 - P(Y)) ] is the log(odds) of the outcome; β0 is the intercept and β1, …, βK are the model coefficients.
Maximum Likelihood
• We want to choose β’s that maximize the probability of observing the data we have:

L = Pr(y1, y2, …, yN) = Pr(y1) Pr(y2) ⋯ Pr(yN) = ∏(i=1..N) Pr(y_i)

Assumption: independent y’s
Maximum Likelihood
• Define p = Pr(y=1). Then for a dichotomous outcome, Pr(y=0) = 1 - Pr(y=1) = 1 - p. Then:

Pr(y) = p^y (1 - p)^(1-y)

For y = 1: Pr(y=1) = p^1 (1 - p)^0 = p

For y = 0: Pr(y=0) = p^0 (1 - p)^1 = 1 - p
• So, given that Pr(y_i) = p_i^(y_i) (1 - p_i)^(1-y_i):

L = ∏(i=1..N) Pr(y_i) = ∏(i=1..N) p_i^(y_i) (1 - p_i)^(1-y_i)

  = ∏(i=1..N) [ p_i / (1 - p_i) ]^(y_i) (1 - p_i)
• Taking the logarithm of both sides:

ln L = Σ_i y_i ln[ p_i / (1 - p_i) ] + Σ_i ln(1 - p_i)

Can you see why? Remember that:

ln[ P(Y) / (1 - P(Y)) ] = β0 + β1X1 + β2X2 + … + βKXK = x_i′β
• Substituting in using the logistic regression model:

ln L = Σ_i y_i (x_i′β) - Σ_i ln[ 1 + exp(x_i′β) ]

• Now we choose values of β that make this equation as large as possible.

• Maximizing ln L maximizes L

• Maximizing involves derivatives & iteration
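As a rough illustration of the iterative maximization, here is a minimal pure-Python sketch that fits a one-predictor logistic model by gradient ascent on ln L. The data and learning rate are hypothetical, and real software (e.g., SAS) uses Newton-Raphson rather than this crude loop:

```python
import math

# Hypothetical data (illustrative only): x = predictor, y = 0/1 outcome
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0,   0,   1,   0,   1,   1]

def log_lik(b0, b1):
    # ln L = sum_i [ y_i*(b0 + b1*x_i) - ln(1 + exp(b0 + b1*x_i)) ]
    return sum(y * (b0 + b1 * x) - math.log(1 + math.exp(b0 + b1 * x))
               for x, y in zip(xs, ys))

b0 = b1 = 0.0
for _ in range(20000):
    # Gradient of ln L: sum_i (y_i - p_i) * (1, x_i)
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        g0 += y - p
        g1 += (y - p) * x
    b0 += 0.01 * g0
    b1 += 0.01 * g1

# The fitted coefficients make the observed data more likely than beta = 0,
# and the positive association with x shows up as b1 > 0
print(log_lik(b0, b1) > log_lik(0.0, 0.0), b1 > 0)  # → True True
```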
Maximum Likelihood
• The method of maximum likelihood estimation chooses values for parameter estimates which make the observed data “maximally likely.”
• ML estimators have great properties:
– Unbiased (estimate true β’s)
– Asymptotically efficient (narrow CI’s)
– Asymptotically Normally distributed (can calculate CI’s and Test Statistics using familiar Z formulas)
Estimating a Logistic Regression Model
Steps:
• Observe data on outcome, Y, and characteristics X1, X2, …, XK
• Estimate model coefficients using ML
• Perform inference: calculate confidence intervals, odds ratios, etc.
The Logistic Regression Model

ln[ P(Y) / (1 - P(Y)) ] = β0 + β1X1 + β2X2 + … + βKXK

ln[ P(Y) / (1 - P(Y)) ] is the log(odds) of the dichotomous outcome; X1, …, XK are the predictor variables.
The Logistic Regression Model

ln[ P(Y) / (1 - P(Y)) ] = β0 + β1X1 + β2X2 + … + βKXK

ln[ P(Y) / (1 - P(Y)) ] is the log(odds) of the outcome; β0 is the intercept and β1, …, βK are the model coefficients.
Form for Predicted Probabilities
ln[ P(Y) / (1 - P(Y)) ] = β0 + β1X1 + β2X2 + … + βKXK

P(Y) = exp(β0 + β1X1 + β2X2 + … + βKXK) / [1 + exp(β0 + β1X1 + β2X2 + … + βKXK)]

In this latter form, the logistic regression model directly relates the probability of Y to the predictor variables.
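This latter form can be written as a small Python helper. The coefficients below are hypothetical, chosen only to illustrate the calculation:

```python
import math

def predicted_probability(beta, x):
    """P(Y) = exp(b0 + b1*x1 + … + bK*xK) / [1 + exp(b0 + b1*x1 + … + bK*xK)],
    with beta = [b0, b1, …, bK] and x = [x1, …, xK]."""
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return math.exp(eta) / (1 + math.exp(eta))

# Hypothetical coefficients for a two-predictor model:
beta = [-1.0, 0.5, 0.25]
print(round(predicted_probability(beta, [2.0, 1.0]), 3))  # → 0.562
```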
The Logistic Regression Model

Logistic Regression:

ln[ P(Y) / (1 - P(Y)) ] = β0 + β1X1 + β2X2 + … + βKXK

Linear Regression:

Y = β0 + β1X1 + β2X2 + … + βKXK + ε
Why not use linear regression for dichotomous outcomes?
• If we model Y directly and Y is dichotomous, this necessarily violates the linear regression assumptions (homoscedasticity)
• One of the more intuitive reasons not to is that you will end up with predicted Y’s other than 0 or 1 (possibly outside the [0, 1] range).
Assumptions in logistic regression
• Assumptions in logistic regression
– Yi are from Bernoulli or binomial (ni, πi) distributions
– Yi are independent
– Log odds P(Yi = 1) or logit P(Yi = 1) is a linear function of covariates
• Relationships among probability, odds and log odds

Measure                            Min    Max    Name
Pr(Y=1)                            0      1      prob
Pr(Y=1) / [1 - Pr(Y=1)]            0      ∞      odds
log{ Pr(Y=1) / [1 - Pr(Y=1)] }     -∞     ∞      log odds
Commonality between linear and logistic regression
• Operating on the logit scale allows a linear model that is similar to linear regression to be applied
• Both linear and logistic regression are part of the family of Generalized Linear Models (GLM)
Logistic Regression is a Generalized Linear Model (GLM)
• Family of regression models that use the same general framework
• Outcome variable determines choice of model
Outcome        GLM Model
Continuous     Linear regression
Dichotomous    Logistic regression
Counts         Poisson regression
Logistic Regression Models are estimated by Maximum Likelihood
• Using this estimation gives model coefficient estimates that are asymptotically consistent, efficient, and normally distributed.
• Thus, a 95% Confidence Interval for βK is given by:

β̂K ± z_{α/2} · SE(β̂K) = (βL, βU)
Logistic Regression Models are estimated by Maximum Likelihood
• The Odds Ratio for the kth model coefficient is:

OR_K = exp(β̂K)

• We can also get a 95% CI for the OR from:

( e^βL , e^βU ), where (βL, βU) is a 95% CI for βK
The Logistic Regression Model

Example:
In Assisted Reproduction Technology (ART) clinics, one of the main outcomes is clinical pregnancy.
There is much empirical evidence that the candidate mother’s age is a significant factor that affects the chances of pregnancy success.
A recent study examined the effect of the mother’s age, along with clinical characteristics, on the odds of pregnancy success on the first ART attempt.
The Logistic Regression Model
ln[ Pr(pregnancy) / (1 - Pr(pregnancy)) ] = 2.67 - 0.13 × Age

Pr(pregnancy) = exp(2.67 - 0.13 × Age) / [1 + exp(2.67 - 0.13 × Age)]
The Logistic Regression Model
ln[ Pr(pregnancy) / (1 - Pr(pregnancy)) ] = 2.67 - 0.13 × Age

Q1. What is the effect of Age on Pregnancy?

A. OR_Age = exp(-0.13) = 0.88

This implies that for every 1 yr. increase in age, the odds of pregnancy decrease by 12%.
The Logistic Regression Model
Q2. What is the predicted probability of a 25 yr. old having pregnancy success with first ART attempt?
The Logistic Regression Model
ln[ Pr(pregnancy) / (1 - Pr(pregnancy)) ] = 2.67 - 0.13 × Age

Pr(pregnancy) = exp(2.67 - 0.13 × Age) / [1 + exp(2.67 - 0.13 × Age)]
The Logistic Regression Model
Q2. What is the predicted probability of a 25 yr. old having pregnancy success with first ART attempt?
A. From this model, a 25 yr. old has about a 36% chance of pregnancy success.
Pr(pregnancy) = exp(2.67 - 0.13 × 25) / [1 + exp(2.67 - 0.13 × 25)] = 0.359
Hypothesis testing
• Usually interested in testing
• Two types of tests we’ll discuss:
1. Likelihood Ratio test
2. Wald test
H0: βK = 0
Likelihood Ratio test
• Idea is to compare the (log) Likelihood of two models to test H0: βK = 0

• Two models:
1. Full model = with predictor included
2. Reduced model = without predictor

• Then,

-2 ln( L̂_Reduced / L̂_Full ) = -2 ln L̂_Reduced - ( -2 ln L̂_Full ) ~ χ²
with df = # of extra parameters in full model
(here df = 1; critical value = 3.84 for α = 0.05)
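With hypothetical log-likelihood values for the two models (illustrative numbers, not from the slides), the LR statistic works out as:

```python
# Hypothetical maximized log-likelihoods (illustrative only)
lnL_full, lnL_reduced = -120.5, -123.9

# LR statistic: G2 = -2 ln(L_reduced / L_full) = (-2 ln L_reduced) - (-2 ln L_full)
G2 = (-2 * lnL_reduced) - (-2 * lnL_full)
print(round(G2, 1))  # → 6.8

# Compare with the chi-square critical value (df = 1, alpha = 0.05)
critical = 3.84
print(G2 > critical)  # → True: reject H0
```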
Wald test
• Idea is to use a large-sample Z statistic from a single model to test H0: βK = 0

Here, Z = β̂K / SE(β̂K), where Z ~ N(0, 1)

• Critical Z value for α = 0.05 is 1.96 (two-sided)
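A minimal sketch of the Wald statistic with a hypothetical estimate and standard error:

```python
# Hypothetical coefficient estimate and standard error (illustrative only)
beta_hat, se = 0.9, 0.3

z = beta_hat / se  # Wald statistic: beta_hat / SE(beta_hat)
print(round(z, 1), abs(z) > 1.96)  # → 3.0 True (reject H0 at alpha = 0.05)
```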
Hypothesis testing
• As the sample size gets larger and larger, the Wald test will approximate the Likelihood ratio test.
• The LR test is preferred but Wald test is common
• Why? Not to scold the Wald, but…
Predictive ability of Logistic regression
• Generalized R-squared statistics are controversial
• ROC curve plots Sensitivity vs. 1-Specificity based on fitted model
• c statistic = Area under ROC curve commonly used to summarize predictive ability of model
SAS Enterprise:
chd.sas7bdat
Logistic Regression
• Motivating example
• Consider the problem of exploring how the risk for coronary heart disease (CHD) changes as a function of age.
• How would you test whether age is associated with CHD?
• What does a scatter plot of CHD and age look like?
[Plot: Proportion of CHD vs. Age Group]
Logistic regression
Logistic regression

Taking the exp of β1 gives the odds ratio:

exp(β1) = odds of CHD taking Drug / odds of CHD taking Placebo

e^β1 = odds ratio of CHD for persons taking the “New Drug” vs. “Placebo”
Logistic Regression
• We can add multiple predictor variables in modeling the log odds of getting CHD:
log[ Pr(CHD) / Pr(no CHD) ] = β0 + β1·Drug_i + β2·(Age_i - 45)

Drug_i = 1 if Drug, 0 if Placebo, for person i
Age_i = Person i’s age in years
Interpretation for Drug

• Interpretation of coefficient β1 when there is more than one variable
– log odds ratio when the other variables are held constant
– e.g., log odds ratio between having CHD with and without drug, adjusting for age
Interpretation for Age

• Interpretation of coefficient β2 when there is more than one variable
• Interpretation of coefficient β2 for a continuous covariate
– e.g., log odds ratio for 1 year of change in age (unit difference in covariate), adjusting for drug
• Maximum Likelihood Estimates:

β̂1 = ?   SE(β̂1) = ?
e^β̂1 = ?
z = β̂1 / SE(β̂1) = ?
95% CI for β1 = ?
95% CI for e^β1 = ?

• Conclusion: Strong evidence that the odds of CHD are associated with the drug.
Likelihood Ratio test

• Idea is to compare the (log) Likelihood of two models to test H0: βK = 0

• Two models:
1. Full model = with predictor included
2. Reduced model = without predictor

• Then,

-2 ln( L̂_Reduced / L̂_Full ) = -2 ln L̂_Reduced - ( -2 ln L̂_Full ) ~ χ²
with df = # of extra parameters in full model
(here df = 1; critical value = 3.84 for α = 0.05)
Suggested exercises
• Read Kleinbaum Chapters 1, 2, 3 Detailed Outline
• Chapter 1 in Kleinbaum & Klein: Practice Exercises (can check answers)
• No need to hand in
Looking ahead
• HW 3: Due June 12th
• Next 2 classes: Model Building, Diagnostics & Extensions
• Review & Exam 1