Logistic regression
Analysis of proportion data
• We know how many times an event occurred, and how many times did not occur.
• We want to know whether these proportions are affected by a treatment or a factor.
• Examples: proportion dying, proportion responding to a treatment, proportion of individuals of one sex, proportion flowering.
The old-fashioned way:
• People used to model these data using percentage mortality as the response variable.
• The problems with this are:
  • Errors are not normally distributed
  • The variance is not constant
  • The response is bounded (between 0 and 1)
  • We lose information on the size of the sample
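To see why losing the sample size matters, compare two hypothetical samples (the counts are invented for illustration) that both show 30% mortality. Under a binomial model the standard error of an observed proportion is $\sqrt{pq/n}$, so the same percentage can carry very different precision:

```python
import math

def proportion_se(events: int, n: int) -> float:
    """Binomial standard error of an observed proportion, sqrt(p*q/n)."""
    p = events / n
    q = 1 - p
    return math.sqrt(p * q / n)

# Two hypothetical samples with the same 30% mortality
samples = [(3, 10), (300, 1000)]  # (deaths, sample size)

for deaths, n in samples:
    print(f"{deaths}/{n}: p = {deaths / n:.2f}, SE = {proportion_se(deaths, n):.4f}")
```

Both samples reduce to "30%", yet the standard error of the larger one is ten times smaller; modelling the counts directly keeps that information.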
However…
• Some data, such as percentage plant cover, are better analyzed with conventional models (normal errors and constant variance) after an arcsine transformation (the response variable is then measured in radians):
• $\sin^{-1}\sqrt{\text{proportion}}$
[Figure: arcsine transformation, $\sin^{-1}\sqrt{\text{proportion}}$ (y-axis, 0 to 1.6) plotted against proportion (x-axis, 0 to 1)]
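A minimal sketch of the arcsine square-root (angular) transformation using only Python's standard library. It maps proportions in [0, 1] onto angles from 0 to $\pi/2 \approx 1.571$ radians; the function name `arcsine_sqrt` is illustrative only:

```python
import math

def arcsine_sqrt(p: float) -> float:
    """Angular transformation sin^-1(sqrt(p)), returned in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p = {p:.2f} -> {arcsine_sqrt(p):.4f} rad")
```

The transformation stretches values near 0 and 1 relative to those near 0.5, which is what helps stabilize the variance of proportion data.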
If the response variable takes the form of a percentage change in some measurement, it is usually better to use either:
• analysis of covariance, with final weight as the response variable and initial weight as a covariate, or
• a response variable specified as a relative growth rate, measured as log(final/initial).
Both can be analyzed with normal errors without further transformation.
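The relative growth rate is simple to compute from initial and final measurements. This sketch uses hypothetical weights and assumes the natural log; it contrasts log(final/initial) with percentage change to show one reason the log ratio behaves better: doubling and halving give values of equal size and opposite sign, whereas percentage change gives +100% and −50%.

```python
import math

def percentage_change(initial: float, final: float) -> float:
    """Percentage change relative to the initial measurement."""
    return 100.0 * (final - initial) / initial

def relative_growth_rate(initial: float, final: float) -> float:
    """Relative growth rate as log(final/initial), natural log assumed."""
    return math.log(final / initial)

# Hypothetical plants: one doubles from 10 g to 20 g, another halves from 20 g to 10 g
for initial, final in ((10.0, 20.0), (20.0, 10.0)):
    print(f"{initial} -> {final}: "
          f"% change = {percentage_change(initial, final):+.1f}, "
          f"RGR = {relative_growth_rate(initial, final):+.4f}")
```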
Rationale for logistic regression
• The traditional transformation of proportion data was the arcsine. This transformation took care of the error distribution. There is nothing wrong with it, but a simpler approach is often preferable and is likely to produce a model that is easier to interpret.
The logistic curve
• The logistic curve is commonly used to describe data on proportions.
• It has asymptotes at 0 and 1, so negative proportions and responses of more than 100% cannot be predicted.
Binomial errors
• If p is the proportion of individuals observed to respond in a given way,
• then the proportion of individuals that respond in the alternative way is 1 − p, and we shall call this proportion q.
• n is the size of the sample (or the number of attempts).
• An important point is that the variance of the binomial distribution is not constant. In fact, the variance of a binomial distribution with mean np is:
$$s^2 = npq$$
So the variance changes with the mean like this:
[Figure: variance $s^2$ (y-axis, 0 to 0.3) plotted against proportion (x-axis, 0 to 1)]
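Since the binomial variance is $s^2 = npq$, its dependence on the proportion can be tabulated directly. This sketch takes n = 1 (a single 0/1 trial), so the variance is just $pq$: it is zero at p = 0 and p = 1 and largest, 0.25, at p = 0.5.

```python
def binomial_variance(n: int, p: float) -> float:
    """Variance of a binomial count with n trials and probability p: n*p*q."""
    q = 1.0 - p
    return n * p * q

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p:.1f}  s^2 = {binomial_variance(1, p):.3f}")
```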
The logistic model
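The logistic model for p as a function of x is given by:
$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
This model is bounded since (taking $\beta_1 > 0$):
• if $x \to -\infty$, then $p \to 0$
• if $x \to \infty$, then $p \to 1$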
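The bounds can be checked numerically. The sketch below evaluates the logistic model $p = e^{\beta_0+\beta_1 x}/(1+e^{\beta_0+\beta_1 x})$ for hypothetical coefficients $\beta_0 = -3$ and $\beta_1 = 0.05$ (chosen only for illustration). For a non-negative linear predictor it uses the algebraically equivalent form $1/(1+e^{-\eta})$, so `math.exp` never overflows:

```python
import math

def logistic_p(x: float, b0: float, b1: float) -> float:
    """p = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)), computed without overflow."""
    eta = b0 + b1 * x  # linear predictor
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

B0, B1 = -3.0, 0.05  # hypothetical coefficients, for illustration only

for x in (-500, -100, 0, 60, 100, 500):
    print(f"x = {x:5d}  p = {logistic_p(x, B0, B1):.6f}")
```

However far x is pushed, p approaches but never crosses 0 or 1 (until floating-point rounding eventually returns exactly 0.0 or 1.0 for extreme inputs).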
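The trick to linearizing the logistic model is a simple transformation. Starting from
$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
we have $1 - p = \frac{1}{1 + e^{\beta_0 + \beta_1 x}}$, so the odds are $\frac{p}{1-p} = e^{\beta_0 + \beta_1 x}$. Taking natural logs gives the logit transformation:
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$
See the class website for a fuller description of the logit transformation.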
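A quick numerical check of the linearization, again with hypothetical coefficients $\beta_0 = -3$ and $\beta_1 = 0.05$ (for illustration only): applying the logit $\ln(p/(1-p))$ to the output of the logistic model recovers the straight line $\beta_0 + \beta_1 x$.

```python
import math

def logit(p: float) -> float:
    """Log odds, ln(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def logistic_p(x: float, b0: float, b1: float) -> float:
    """Logistic model p = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    e = math.exp(b0 + b1 * x)
    return e / (1.0 + e)

B0, B1 = -3.0, 0.05  # hypothetical coefficients

for x in (0, 20, 40, 60, 80, 100):
    p = logistic_p(x, B0, B1)
    print(f"x = {x:3d}  p = {p:.4f}  logit(p) = {logit(p):+.4f}  "
          f"b0 + b1*x = {B0 + B1 * x:+.4f}")
```

On the p scale the relationship is S-shaped; on the logit scale it is linear in x, which is what lets it be fitted as a generalized linear model.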
Hypericum cumulicola:
• Small, short-lived perennial herb
• Narrowly endemic and endangered
• Flowers are small and bisexual
• Self-compatible, but requires pollinators to set seed
Menges et al. (1999)
Dolan et al. (1999)
Boyle and Menges (2001)
Demographic data
• 15 populations (various patch sizes)
• More than 80 individuals per population each year
• Data on height and number of reproductive structures
• Survival between August 1994 and August 1995
Histogram of height (cm) Hypericum cumulicola (1994)
Call:
glm(formula = survival ~ rep_structures * height, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0576  -0.9510   0.5748   0.7394   1.5518

Coefficients:
                        Estimate Std. Error z value Pr(>|z|)
(Intercept)            2.043e+00  1.888e-01  10.819  < 2e-16 ***
rep_structures        -9.112e-03  2.518e-03  -3.619 0.000296 ***
height                -2.717e-02  7.588e-03  -3.581 0.000343 ***
rep_structures:height  1.219e-04  4.096e-05   2.977 0.002912 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1018.68  on 878  degrees of freedom
Residual deviance:  925.22  on 875  degrees of freedom
AIC: 933.22

Number of Fisher Scoring iterations: 4
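Two quantities in this glm summary can be reproduced by hand from the printed (rounded) values. The drop in deviance from the null model (1018.68 on 878 df) to the fitted model (925.22 on 875 df) is a likelihood-ratio statistic that can be compared with a χ² distribution on 3 df. And if survival is coded 0/1 per plant (an assumption, though the matching AIC supports it), the residual deviance equals −2 log-likelihood, so the AIC is the residual deviance plus twice the number of coefficients (4 here). The sketch uses the closed-form χ² upper tail for 3 df:

```python
import math

def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with 3 df."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

# Values as printed in the glm summary
null_dev, null_df = 1018.68, 878
resid_dev, resid_df = 925.22, 875
n_coef = 4  # intercept, rep_structures, height, rep_structures:height

lr_stat = null_dev - resid_dev  # likelihood-ratio (deviance) drop
lr_df = null_df - resid_df
aic = resid_dev + 2 * n_coef    # assumes a 0/1 response, so deviance = -2 log-likelihood

print(f"LR statistic = {lr_stat:.2f} on {lr_df} df, p = {chi2_sf_df3(lr_stat):.2e}")
print(f"AIC = {aic:.2f}")
```

The computed AIC agrees with the printed 933.22, and the p-value for the overall deviance drop is vanishingly small.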
Calculating a given proportion
• You can back-transform from logits (z) to proportions (p) by
$$p = \frac{1}{1 + \frac{1}{\exp(z)}} = \frac{1}{1 + e^{-z}}$$
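As an illustration, the back-transformation can be applied to a linear predictor built from the coefficients in the glm output (rounded as printed). The covariate values are hypothetical plants chosen only to show the arithmetic; with zero reproductive structures and zero height, the logit is just the intercept, 2.043, which back-transforms to a survival probability of about 0.885.

```python
import math

# Coefficients as printed in the glm summary (logit scale)
B_INTERCEPT = 2.043
B_REP = -9.112e-03
B_HEIGHT = -2.717e-02
B_REP_HEIGHT = 1.219e-04

def survival_logit(rep_structures: float, height: float) -> float:
    """Linear predictor z for survival ~ rep_structures * height."""
    return (B_INTERCEPT
            + B_REP * rep_structures
            + B_HEIGHT * height
            + B_REP_HEIGHT * rep_structures * height)

def back_transform(z: float) -> float:
    """Proportion from a logit, p = 1 / (1 + 1/exp(z)); fine for moderate z."""
    return 1.0 / (1.0 + 1.0 / math.exp(z))

for rep, h in ((0, 0), (0, 50), (100, 50)):  # hypothetical plants
    z = survival_logit(rep, h)
    print(f"rep_structures = {rep:3d}, height = {h:2d} cm: "
          f"z = {z:+.3f}, p = {back_transform(z):.3f}")
```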
Survival vs height
Survival vs rep_structures
Height × rep structures interaction
[Figure: survival against height (cm), in four panels: 0 fruits, 100 fruits, 200 fruits, 1000 fruits]
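Because of the interaction term, the effect of height on the logit of survival depends on the number of reproductive structures: its slope is $-0.02717 + 0.0001219 \times$ rep_structures. A small sketch, treating the four panel labels (0, 100, 200 and 1000 fruits) as rep_structures values, computes that slope from the printed coefficients and the count at which it changes sign. The estimates are rounded and their standard errors are not propagated, so the crossover is only approximate:

```python
# Coefficients as printed in the glm summary (logit scale)
B_HEIGHT = -2.717e-02
B_REP_HEIGHT = 1.219e-04

def height_slope(rep_structures: float) -> float:
    """Change in the logit of survival per cm of height at fixed rep_structures."""
    return B_HEIGHT + B_REP_HEIGHT * rep_structures

for fruits in (0, 100, 200, 1000):
    print(f"{fruits:4d} fruits: slope = {height_slope(fruits):+.5f} per cm")

crossover = -B_HEIGHT / B_REP_HEIGHT  # rep_structures value where the slope is zero
print(f"slope changes sign near {crossover:.0f} reproductive structures")
```

Under these estimates, taller plants have lower predicted survival when they carry few reproductive structures, but higher predicted survival when they carry many.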