Post on 22-Dec-2015
![Page 1: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/1.jpg)
Basic Statistical Concepts
Donald E. Mercante, Ph.D.
Biostatistics
School of Public Health
LSU-HSC
![Page 2: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/2.jpg)
![Page 3: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/3.jpg)
Two Broad Areas of Statistics
Descriptive Statistics
- Numerical descriptors
- Graphical devices
- Tabular displays
Inferential Statistics
- Hypothesis testing
- Confidence intervals
- Model building/selection
![Page 4: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/4.jpg)
Descriptive Statistics
When computed for a population of values, numerical descriptors are called
Parameters
When computed for a sample of values, numerical descriptors are called
Statistics
![Page 5: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/5.jpg)
Descriptive Statistics
Two important aspects of any population
Magnitude of the responses
Spread among population members
![Page 6: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/6.jpg)
Descriptive Statistics
Measures of Central Tendency (magnitude)
Mean - most widely used
- uses all the data
- best statistical properties
- susceptible to outliers
Median - does not use all the data
- resistant to outliers
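A minimal sketch of the outlier point above, using Python's standard `statistics` module. The readings are hypothetical values chosen for illustration and do not come from the slides.

```python
from statistics import mean, median

# Hypothetical blood pressure readings (mmHg); illustrative only.
readings = [118, 120, 122, 124, 126]
with_outlier = readings + [220]  # add one extreme value

# Without the outlier, mean and median agree.
print("mean:", mean(readings), "median:", median(readings))

# With the outlier, the mean is pulled upward while the median barely moves.
print("mean:", round(mean(with_outlier), 2), "median:", median(with_outlier))
```

The mean shifts by more than 16 units after one extreme value is added, whereas the median shifts by 1.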
![Page 7: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/7.jpg)
Descriptive Statistics
Measures of Spread (variability)
range - simple to compute
- does not use all the data
variance - uses all the data
- best statistical properties
- measures average squared distance of values from a reference point (the mean)
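As a rough illustration of the two spread measures, the sketch below computes the range and the variance for a small hypothetical data set (the values are assumptions, not from the slides). It shows both the sample variance (divisor n - 1) and the population variance (divisor n).

```python
from statistics import mean, pvariance, variance

values = [4, 8, 6, 2, 10]  # hypothetical data, illustrative only

# Range uses only the two extreme values.
data_range = max(values) - min(values)

# Variance uses every value: average squared distance from the mean.
center = mean(values)
squared_distances = [(v - center) ** 2 for v in values]

sample_var = variance(values)       # divides by n - 1 (a statistic)
population_var = pvariance(values)  # divides by n (a parameter, if values are the whole population)

print("range:", data_range)
print("sum of squared distances:", sum(squared_distances))
print("sample variance:", sample_var, "population variance:", population_var)
```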
![Page 8: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/8.jpg)
Properties of Statistics
• Unbiasedness - On target
• Minimum variance - Most reliable
• If an estimator possesses both properties, then it is a MINVUE = MINimum Variance Unbiased Estimator
• For normally distributed data, the sample mean and sample variance are UMVUE = Uniformly Minimum Variance Unbiased Estimators
![Page 9: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/9.jpg)
Inferential Statistics
- Hypothesis Testing
- Interval Estimation
![Page 10: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/10.jpg)
Hypothesis Testing
Specifying hypotheses:
H0: “null” or no effect hypothesis
H1: research or alternative hypothesis
Note: Only H0 (null) is tested.
![Page 11: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/11.jpg)
Errors in Hypothesis Testing
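| Decision | Reality: H0 True | Reality: H0 False |
| --- | --- | --- |
| Fail to Reject H0 | Correct decision | Type II error (β) |
| Reject H0 | Type I error (α) | Correct decision (power = 1 - β) |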
![Page 12: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/12.jpg)
Hypothesis Testing
In parametric tests, actual parameter values are specified for H0 and H1.
H0: µ ≤ 120
H1: µ > 120
![Page 13: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/13.jpg)
Hypothesis Testing
Another example of explicitly
specifying H0 and H1.
H0: (parameter) = 0
H1: (parameter) ≠ 0
![Page 14: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/14.jpg)
Hypothesis Testing
General framework:
• Specify null & alternative hypotheses
• Specify test statistic
• State rejection rule (RR)
• Compute test statistic and compare to RR
• State conclusion
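The five steps can be sketched for a one-sample t-test. The summary statistics (mean 140.8, s.d. 9.5, N = 12) and the critical value 2.201 are the ones quoted in the deck's Standing SBP example. The two-sided hypotheses and the null value of 120 are assumptions made here for illustration only.

```python
import math

# Step 1: hypotheses (assumed for illustration): H0: mu = 120 vs H1: mu != 120
mu0 = 120.0

# Summary statistics from the Standing SBP example in the deck
xbar, s, n = 140.8, 9.5, 12

# Step 2: test statistic is t = (xbar - mu0) / (s / sqrt(n))
# Step 3: rejection rule: reject H0 if |t| > 2.201
#         (two-sided 5% critical value for 11 degrees of freedom, as quoted in the deck)
t_crit = 2.201

# Step 4: compute the test statistic and compare it to the rejection rule
t_stat = (xbar - mu0) / (s / math.sqrt(n))
reject_h0 = abs(t_stat) > t_crit

# Step 5: state the conclusion
print(f"t = {t_stat:.2f}, critical value = {t_crit}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```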
![Page 15: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/15.jpg)
Common Statistical Tests

| Test Name | Purpose |
| --- | --- |
| One-sample (z) t-test | Test value of a mean |
| Two-sample (z) t-test | Compare two means |
| Paired t-test | Compare difference in means (compare related means) |
| ANOVA | Test for differences in 2 or more means |
![Page 16: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/16.jpg)
Common Statistical Tests (cont.)

| Test | Purpose |
| --- | --- |
| Test on binomial proportion(s) | Test whether a binomial proportion equals a hypothesized value, or whether proportions equal each other |
| Test on correlation coefficient(s) | Test whether a correlation coefficient = 0, or whether coefficients equal each other |
| Regression | Test whether slope = 0 |
| RxC contingency table analysis | Test whether two categorical variables are related |
![Page 17: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/17.jpg)
Advanced Topics
| Test | Purpose |
| --- | --- |
| Multivariate Tests, e.g., MANOVA | Test value of several parameters simultaneously |
| Repeated Measures / Crossovers | Test means when subjects are repeatedly measured |
| Survival Analysis | Estimate and compare survival probabilities for one or more groups |
| Nonparametric Tests | Many analogous to standard parametric tests |
![Page 18: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/18.jpg)
P-Values
p = Probability of obtaining a result at least this extreme, given that the null is true.
P-values are probabilities
0 ≤ p ≤ 1
Computed from the distribution of the test statistic
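To make "computed from the distribution of the test statistic" concrete, here is a small sketch for the simplest case: a test statistic that follows a standard normal distribution under H0. The function name `two_sided_p_from_z` is hypothetical, and other tests (t, chi-square, F) would use their own reference distributions.

```python
import math

def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a test statistic that is N(0, 1) under H0."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

# A statistic exactly at the null value is not extreme at all,
# while larger |z| values give smaller p-values.
for z in (0.0, 1.0, 1.96, 3.0):
    print(f"z = {z:4.2f}  p = {two_sided_p_from_z(z):.4f}")
```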
![Page 19: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/19.jpg)
Epidemiological Concepts
Rate: a proportion, specifically a fraction, where the numerator, c, is included in the denominator:
$$\text{rate} = \frac{c}{c+d}$$
- Useful for comparing groups of unequal size
Example:
$$\text{neonatal mortality rate} = \frac{\text{\# deaths} < 28 \text{ days old}}{\text{total \# live births}}$$
![Page 20: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/20.jpg)
Epidemiological Concepts
Measures of Morbidity:
Incidence Rate: # new cases occurring during a given time interval divided by the population at risk at the beginning of that interval.
Prevalence Rate: total # cases at a given time divided by the population at risk at that time.
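The two morbidity measures differ only in what is counted in the numerator and when the population at risk is measured. The sketch below uses entirely hypothetical counts (none come from the slides), and the function names are illustrative.

```python
def incidence_rate(new_cases: int, at_risk_at_start: int) -> float:
    """New cases during the interval / population at risk at the start of the interval."""
    return new_cases / at_risk_at_start

def prevalence_rate(total_cases: int, at_risk_now: int) -> float:
    """All existing cases at a point in time / population at risk at that time."""
    return total_cases / at_risk_now

# Hypothetical community: 10,000 people at risk, 50 new cases this year,
# and 200 people living with the condition at the survey date.
incidence = incidence_rate(new_cases=50, at_risk_at_start=10_000)
prevalence = prevalence_rate(total_cases=200, at_risk_now=10_000)

print(f"incidence:  {incidence * 1000:.1f} per 1,000")
print(f"prevalence: {prevalence * 1000:.1f} per 1,000")
```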
![Page 21: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/21.jpg)
Epidemiological Concepts
Most people think of the probability (p) of an event as a natural way to quantify the chance that the event will occur, with 0 ≤ p ≤ 1:
0 = event will certainly not occur
1 = event certain to occur
But there are other ways of quantifying the chances that an event will occur...
![Page 22: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/22.jpg)
Epidemiological Concepts
Odds and Odds Ratio:
$$O = \text{Odds of an event} = \frac{\text{expected \# times the event will occur}}{\text{expected \# times the event will not occur}}$$
For example, O = 4 means we expect 4 times as many occurrences as non-occurrences of an event.
In gambling, we say the odds are 5 to 2. This corresponds to the single number 5/2 = Odds.
![Page 23: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/23.jpg)
Epidemiological Concepts
The relationship between probability & odds
$$O = \frac{p}{1-p} = \frac{\text{prob of event}}{\text{prob of no event}}$$
$$p = \frac{O}{1+O}$$
![Page 24: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/24.jpg)
Epidemiological Concepts

| Probability | Odds |
| --- | --- |
| .1 | .11 |
| .2 | .25 |
| .3 | .43 |
| .4 | .67 |
| .5 | 1.00 |
| .6 | 1.50 |
| .7 | 2.33 |
| .8 | 4.00 |
| .9 | 9.00 |

Odds < 1 correspond to probabilities < 0.5
0 < Odds < ∞
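The probability and odds columns can be regenerated from the conversion O = p / (1 - p) and its inverse p = O / (1 + O). This is a minimal sketch; the function names are hypothetical.

```python
def odds_from_prob(p: float) -> float:
    """Convert a probability (0 <= p < 1) to odds."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

def prob_from_odds(odds: float) -> float:
    """Convert odds (>= 0) back to a probability."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)

# Rebuild the probability/odds table for p = 0.1, 0.2, ..., 0.9
for k in range(1, 10):
    p = k / 10
    print(f"{p:.1f}  {odds_from_prob(p):.2f}")
```

Odds equal 1 exactly when p = 0.5, which is why odds below 1 correspond to probabilities below 0.5.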
![Page 25: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/25.jpg)
Example 1: Odds Ratio
Death sentence by race of defendant in 147 trials

| | Blacks | Nonblacks | Total |
| --- | --- | --- | --- |
| Death | 28 | 22 | 50 |
| Life | 45 | 52 | 97 |
| Total | 73 | 74 | 147 |
![Page 26: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/26.jpg)
Example 2: Odds Ratio
Odds of death sentence = 50/97 = 0.52
For Blacks: O = 28/45 = 0.62
For Nonblacks: O = 22/52 = 0.42
Ratio of Black Odds to Nonblack Odds = 1.47
This is called the Odds Ratio
$$OR = \frac{28/45}{22/52} = \frac{28 \times 52}{22 \times 45} = \frac{1456}{990} = 1.47$$
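The odds ratio arithmetic can be checked directly from the 2x2 table of death sentences by race of defendant. The sketch computes it both as a ratio of group odds and as the cross-product ratio; the variable names are illustrative.

```python
# Counts from the 147-trial table: (death sentences, life sentences) per group
death_black, life_black = 28, 45
death_nonblack, life_nonblack = 22, 52

# Odds of a death sentence within each group
odds_black = death_black / life_black
odds_nonblack = death_nonblack / life_nonblack

# Odds ratio as a ratio of odds, and as the cross-product ratio
odds_ratio = odds_black / odds_nonblack
cross_product = (death_black * life_nonblack) / (death_nonblack * life_black)

print(f"odds (Blacks)    = {odds_black:.2f}")
print(f"odds (Nonblacks) = {odds_nonblack:.2f}")
print(f"odds ratio       = {odds_ratio:.2f}")
```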
![Page 27: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/27.jpg)
Logistic Regression
Odds ratios are directly related to the parameters of the logit (logistic regression) model.
Logistic Regression is a statistical method that models binary (e.g., Yes/No; T/F; Success/Failure) data as a function of one or more explanatory variables.
We would like a model that predicts the probability of a success, i.e., P(Y=1), using a linear function.
![Page 28: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/28.jpg)
Logistic Regression
Problem: Probabilities are bounded by 0 and 1, but linear functions are inherently unbounded.
Solution: Transform P(Y=1) = p to an odds, p/(1-p), which removes the upper bound. If we take the log of the odds, the lower bound is also removed.
Setting this result equal to a linear function of the explanatory variables gives us the logit model.
![Page 29: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/29.jpg)
Logistic Regression
Logit or Logistic Regression Model
$$\log\left(\frac{p_i}{1-p_i}\right) = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik}$$
Where $p_i$ is the probability that $y_i = 1$.
The expression on the left is called the logit or log odds.
![Page 30: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/30.jpg)
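Logistic Regression
Probability of success:
$$p_i = P(Y_i = 1) = \frac{1}{1 + e^{-(\alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik})}}$$
Odds Ratio for Each Explanatory Variable:
$$OR \text{ for } X_i = e^{\beta_i}$$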
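A small sketch connecting the logit model to the earlier death-sentence table. With a single 0/1 explanatory variable (here X = 1 for Black defendants, an encoding assumed for illustration), the intercept is the log odds in the X = 0 group and the slope is the log odds ratio, so the model reproduces the observed proportions. The names `alpha`, `beta`, and `inv_logit` are notation chosen here, not taken from the slides, and no fitting routine is run.

```python
import math

def inv_logit(eta: float) -> float:
    """Probability of success for a given value of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))

# Log odds for the reference group (X = 0, nonblack defendants): 22 death vs 52 life
alpha = math.log(22 / 52)
# Log odds ratio for X = 1 (Black defendants: 28 death vs 45 life) relative to X = 0
beta = math.log((28 / 45) / (22 / 52))

p_nonblack = inv_logit(alpha)         # equals 22/74
p_black = inv_logit(alpha + beta)     # equals 28/73
odds_ratio = math.exp(beta)           # e^beta recovers the odds ratio

print(f"P(death | nonblack) = {p_nonblack:.3f}")
print(f"P(death | Black)    = {p_black:.3f}")
print(f"odds ratio = exp(beta) = {odds_ratio:.2f}")
```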
![Page 31: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/31.jpg)
Screening Tests
Suppose a new screening test for herpes virus has been developed and the following summary for 1000 individuals has been compiled:

| | Has Herpes | Does Not Have Herpes |
| --- | --- | --- |
| Screened Positive | 45 | 10 |
| Screened Negative | 5 | 940 |
![Page 32: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/32.jpg)
Screening Tests
How do we evaluate the usefulness of such a test?
Diagnostics:
- sensitivity
- specificity
- false positive rate
- false negative rate
- predictive value positive
- predictive value negative
![Page 33: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/33.jpg)
Screening Tests
Generic Screening Test Table

| | With Disease | Without Disease | Total |
| --- | --- | --- | --- |
| Screened Positive | a | b | a+b |
| Screened Negative | c | d | c+d |
| Total | a+c | b+d | N |
![Page 34: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/34.jpg)
Screening Tests
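$$\text{Sensitivity} = \frac{a}{a+c} \qquad \text{Specificity} = \frac{d}{b+d}$$
$$\text{False positive rate} = \frac{b}{b+d} \qquad \text{False negative rate} = \frac{c}{a+c}$$
$$\text{Yield or predictive value } (+) = \frac{a}{a+b} \qquad \text{Yield or predictive value } (-) = \frac{d}{c+d}$$
$$\text{prevalence} = \frac{a+c}{N}$$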
![Page 35: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/35.jpg)
Screening Tests
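$$\text{Sensitivity} = \frac{45}{50} = 90\% \qquad \text{Specificity} = \frac{940}{950} = 98.95\%$$
$$\text{False positive rate} = \frac{10}{950} = 1.05\% \qquad \text{False negative rate} = \frac{5}{50} = 10\%$$
$$\text{Yield or predictive value } (+) = \frac{45}{55} = 81.82\% \qquad \text{Yield or predictive value } (-) = \frac{940}{945} = 99.47\%$$
$$\text{prevalence} = \frac{50}{1000} = 5\%$$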
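The screening diagnostics can be recomputed from the four cell counts of the herpes example, using the generic a, b, c, d layout (a = true positives, b = false positives, c = false negatives, d = true negatives). This is a sketch with a hypothetical function name. Note that the negative predictive value divides by the screened-negative row total, c + d = 945.

```python
def screening_metrics(a: int, b: int, c: int, d: int) -> dict:
    """Diagnostic measures for a 2x2 screening table (all returned as proportions)."""
    n = a + b + c + d
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "false_positive_rate": b / (b + d),
        "false_negative_rate": c / (a + c),
        "predictive_value_positive": a / (a + b),
        "predictive_value_negative": d / (c + d),
        "prevalence": (a + c) / n,
    }

# Counts from the herpes screening table (1000 individuals)
metrics = screening_metrics(a=45, b=10, c=5, d=940)

for name, value in metrics.items():
    print(f"{name:26s} {value * 100:6.2f}%")
```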
![Page 36: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/36.jpg)
Interval Estimation
Statistics such as the sample mean, median, variance, etc., are called point estimates
- vary from sample to sample
- do not incorporate precision
![Page 37: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/37.jpg)
Interval Estimation
Take as an example the sample mean:
$\bar{X}$ estimates $\mu$ (popn mean)
Or the sample variance:
$S^2$ estimates $\sigma^2$ (popn variance)
![Page 38: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/38.jpg)
Interval Estimation
Recall Example 1, a one-sample t-test on the population mean. The test statistic was
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
This can be rewritten to yield:
![Page 39: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/39.jpg)
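Interval Estimation
$$P\left(-t_{1-\alpha/2,\,n-1} \le \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \le t_{1-\alpha/2,\,n-1}\right) = 1 - \alpha$$
Which can be rearranged to give a (1-α)100% Confidence Interval for µ:
$$\bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$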
Form: Estimate ± Multiple of Std Error of the Est.
![Page 40: Basic Statistical Concepts Donald E. Mercante, Ph.D. Biostatistics School of Public Health L S U - H S C](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d795503460f94a5d0f9/html5/thumbnails/40.jpg)
Interval Estimation
Example 1: Standing SBP
Mean = 140.8, s.d. = 9.5, N = 12
95% CI for µ: 140.8 ± 2.201 (9.5/sqrt(12))
140.8 ± 6.036
(134.8, 146.8)
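The interval arithmetic for the Standing SBP example can be reproduced as below. The critical value 2.201 is taken from the slide rather than computed, and the function name `t_confidence_interval` is hypothetical.

```python
import math

def t_confidence_interval(xbar: float, s: float, n: int, t_crit: float):
    """Return (half_width, lower, upper) for: estimate +/- t_crit * standard error."""
    std_error = s / math.sqrt(n)   # standard error of the sample mean
    half_width = t_crit * std_error
    return half_width, xbar - half_width, xbar + half_width

# Mean = 140.8, s.d. = 9.5, N = 12; t critical value 2.201 as given on the slide
half_width, lower, upper = t_confidence_interval(xbar=140.8, s=9.5, n=12, t_crit=2.201)

print(f"140.8 ± {half_width:.3f}")
print(f"({lower:.1f}, {upper:.1f})")
```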