statistical modeling and analysis of scientific inquiry: the basics of hypothesis testing
![Page 1: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/1.jpg)
Statistical Modeling and Analysis of Scientific Inquiry:
The Basics of Hypothesis Testing
![Page 2: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/2.jpg)
Statistics: The Science of Data
• Data comprises quantitative measurements of individuals
• Individuals are a representative sample from a population
• Population is modeled by a probability density function representing the likelihood of measurement values
• Statistics is a collection of tools and techniques for organizing, analyzing, illustrating, and interpreting data
![Page 3: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/3.jpg)
Basic Data Analysis Tools
• Data: $x_1, x_2, \ldots, x_n$
• Mean and median: what’s the middle?
– Sample mean, $\bar{x}$, is the average
– Median is the middle data point (of the sorted list)
• Standard deviation, IQR, median absolute deviation: how much variability?
• Histograms and box plots: what does the distribution look like?
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad \mathrm{IQR} = Q_3 - Q_1, \qquad \mathrm{MAD} = \mathrm{median}\left(|x_i - M|\right)$$
where $M$ is the sample median.
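The summary statistics above can be sketched in a few lines of Python. This is only an illustration: the data values are made up, the helper name `summarize` is hypothetical, and the quartiles use the standard library's "inclusive" interpolation rule, which is one of several conventions (other tools may give slightly different Q1 and Q3).

```python
import statistics

def summarize(data):
    """Return the summary statistics named on the slide for a list of numbers."""
    med = statistics.median(data)
    # Quartile conventions differ between tools; "inclusive" is an assumption here.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": med,
        "stdev": statistics.stdev(data),  # sample std dev, n - 1 denominator
        "iqr": q3 - q1,
        "mad": statistics.median(abs(x - med) for x in data),
    }

# Made-up data, for illustration only.
summary = summarize([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```

The mean and standard deviation react strongly to a single extreme value, while the median, IQR, and MAD change little, which is why both families of summaries are listed.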
![Page 4: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/4.jpg)
Histograms and Box Plots
Each bar's height is the number of data points that fall between the bar's left and right edges on the horizontal axis.
Should look like a piecewise constant approximation of the density (like Riemann sums in calc).
The box is bounded by the first and third quartiles, with the mid line being the median.
The whiskers go out to the most extreme data points within q1-1.5*IQR and q3+1.5*IQR.
Outliers are plotted beyond the whiskers.
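The box plot rules above can be sketched as a small calculation. The function name `box_plot_summary` and the data are hypothetical, and the quartile rule ("inclusive" interpolation from Python's `statistics` module) is an assumption; plotting packages may compute quartiles slightly differently.

```python
import statistics

def box_plot_summary(data, k=1.5):
    """Compute the box, whisker ends, and outliers for a box plot."""
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers stop at the most extreme data points that are still inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "box": (q1, med, q3),
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }

# Made-up data with one suspiciously large value.
box = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(box)
```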
![Page 5: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/5.jpg)
Science and Statistics: An Abstract View
• Theory: we have a population of individuals or “experimental units” (EUs)
– In bio applications, these are typically organisms
– In medical applications, these are typically patients
• Inquiry: we propose hypotheses about the properties of these EUs.
– How does an organism respond to stress?
– How does a patient respond to treatment?
– Does one treatment work better than another?
![Page 6: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/6.jpg)
Principles of Statistical Modeling
• Modeling Concept 1: We can characterize the EUs with a vector of attributes that can be observed
• Modeling Concept 2: EUs selected randomly from the population produce attributes according to a probability distribution
• Modeling Concept 3: The population’s probability distribution is known except for a parameter vector that must be estimated from observations
• Modeling Concept 4: “Truth” is defined by this unknown parameter vector.
![Page 7: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/7.jpg)
Elements of a Hypothesis Test
• Sample of data
• Two competing hypotheses: the null and its alternative
• A statistic, which is a function of the data with a known sampling distribution
• A rejection criterion against which we assess the statistic’s value to decide whether or not we can reject the null.
![Page 8: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/8.jpg)
The Math of Statistics, 1
• The parametrically modeled probability distribution: $f(x;\theta)$
– $f(x;\theta)$: probability density function
– $x$: possible value of the EU observable
– $\theta$: (unknown) parameter characterizing the population
• The parameter $\theta$ represents truth about the population
• Question: what can we say about $\theta$ after we’ve seen some x’s?
![Page 9: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/9.jpg)
The Math of Statistics, 2
• The probability density models EUs by weighting the possible measurement values
• Area under curve tells us probabilities
[Figure: A Few Normal Density Functions (mu=0, sigma=1; mu=1, sigma=1; mu=1, sigma=2; mu=1, sigma=0.2). Axes: Measurement Value vs. Probability Density.]
[Figure: A Few Gamma Density Functions (theta = 1; theta = 1/2; theta = 2). Axes: Measurement Value vs. Probability Density.]
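To make "area under curve tells us probabilities" concrete, here is a minimal sketch for the standard normal density. The helper names `prob_between` and `riemann_area` are hypothetical; `NormalDist` is from Python's standard library. The second helper approximates the same area with a midpoint Riemann sum, so the two numbers should agree closely.

```python
from statistics import NormalDist

def prob_between(dist, a, b):
    """P(a < X < b), computed from the cumulative distribution function."""
    return dist.cdf(b) - dist.cdf(a)

def riemann_area(dist, a, b, steps=2000):
    """Midpoint Riemann sum of the density between a and b."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (k + 0.5) * width) for k in range(steps)) * width

standard = NormalDist(mu=0, sigma=1)
p_exact = prob_between(standard, -1, 1)   # probability within one sigma of the mean
p_area = riemann_area(standard, -1, 1)    # same quantity as an area under the curve
print(p_exact, p_area)
```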
![Page 10: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/10.jpg)
The Math of Statistics, 3
• The sample is a collection $x_1, x_2, x_3, \ldots, x_n$
• Ideally the histogram of these would look like the probability density (if we knew $\theta$)
[Figure: histogram of a sample.]
![Page 11: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/11.jpg)
Population vs. Sample
• Population is fixed
– Very large
– Impractical to investigate all members
• Population has one distribution
• Population has parameters
– Fixed, but usually not known
• Samples are random
– Large enough to be representative
– Small enough to be studied
• Each sample has a histogram
• Sample has statistics
– Known, but repeated samples will have different values
Meta: we can think of a population of possible statistic values!
![Page 12: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/12.jpg)
The biggest idea in statistics
• In most circumstances, a larger sample produces an average that more accurately represents a population mean.
• If $x_1, x_2, x_3, \ldots, x_n$ has average $\bar{x}_n$
• If the population has mean μ and std dev σ
• Then the population of averages has mean μ and std dev $\sigma/\sqrt{n}$
• And the sample average tends to be normally distributed as n grows
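A small simulation can illustrate the claim about the population of averages. The sketch below assumes an exponential population with mean 1 and std dev 1, chosen here only because it is clearly not normal; the sample size, number of repetitions, and seed are arbitrary, and `sample_means` is a hypothetical helper.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Averages of `reps` samples of size n from an exponential population (mean 1, std dev 1)."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

rng = random.Random(0)          # fixed seed so the run is repeatable
n = 25
means = sample_means(n, 4000, rng)

mean_of_means = statistics.fmean(means)   # should be close to the population mean, 1
sd_of_means = statistics.stdev(means)     # should be close to 1 / sqrt(25) = 0.2
print(mean_of_means, sd_of_means)
```

A histogram of `means` would look roughly bell shaped even though the population itself is strongly skewed.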
![Page 13: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/13.jpg)
Hypothesis Testing For the Mean
• Population is characterized by a central value μ and a spread σ of values around that.
– Should be symmetric
– Tails should taper relatively quickly
– The actual values of μ and σ are not known
• The question is the following: Is the unknown μ equal to a specified value μ0?
– H0: μ = μ0
– HA: μ ≠ μ0
![Page 14: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/14.jpg)
Mistakes That Can Be Made, 1
• Choosing HA when H0 is true
– Type I error
– The Greek letter α is used to denote its probability
– In applications, this is usually a false positive or false detection.
– Common approach is to select a value of α we’re willing to tolerate
• α = 0.05 is the most common choice
– Concept: Over many, many repetitions when H0 is true, we’d declare H0 to be false a fraction α of the time (5% of the time for α = 0.05)
![Page 15: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/15.jpg)
Mistakes That Can Be Made, 2
• Choosing H0 when H0 is false
– Type II error
– The Greek letter β is used to denote its probability
– In applications, this is usually a false negative or missed detection.
– Common approach is to hope β is small
– 1 - β is called the power of the test
• Represents the likelihood of detecting a real effect!
• This is the probability of selecting HA when HA is true
– Note that HA being true is complicated: as long as μ ≠ μ0 the alternative HA is true, even if μ differs from μ0 by only $10^{-15}$!
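Both error rates can be estimated by repeating an experiment many times on a computer. The sketch below is an illustration under stated assumptions: normal data with σ = 1, samples of size n = 10, a two-sided t-test of μ0 = 0 at α = 0.05, and the critical value 2.262 taken from a standard t table for 9 degrees of freedom. The function name `rejection_rate`, the seed, and the repetition count are arbitrary choices.

```python
import math
import random

T_CRIT_DF9 = 2.262  # two-sided 5% critical value, 9 degrees of freedom (t table)

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=10, reps=20000, seed=1):
    """Fraction of simulated samples in which the t-test rejects H0: mu = mu0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        t_stat = (x_bar - mu0) / (s / math.sqrt(n))
        if abs(t_stat) > T_CRIT_DF9:
            rejections += 1
    return rejections / reps

alpha_hat = rejection_rate(true_mean=0.0)       # H0 true: every rejection is a Type I error
power_far = rejection_rate(true_mean=1.0)       # H0 false by a lot: rejections are detections
power_near = rejection_rate(true_mean=0.1)      # H0 false by a little: hard to detect
print(alpha_hat, power_far, power_near)
```

The first rate should sit near the chosen α, while the other two show how strongly the power 1 - β depends on how far the true mean is from μ0.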
![Page 16: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/16.jpg)
Some Concepts and Lingo
• Generally H0 is something you expect not to be true.
– For example, you expect a non-zero mean
• In science, models can only be demonstrated to be false.
• We reject an actually true H0 fairly infrequently (depends on the α we choose)
• When H0 is not rejected by the test, we say that we “fail to reject H0,” not that we accept H0.
– The Type II error probability is difficult to assess
![Page 17: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/17.jpg)
How to Test
• Collect a sample $x_1, x_2, x_3, \ldots, x_n$
• Form the t-statistic
$$T = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{x}-\mu_0)}{s}$$
• If H0 is true, T has a known probability density
– Student’s T distribution with n-1 degrees of freedom
• Choose critical value, $t_\alpha$, of T distribution
– Such that $|T| > t_\alpha$ would occur with probability α.
H0: μ = μ0
HA: μ ≠ μ0
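The testing recipe above can be sketched as follows. The data and the hypothesized value μ0 = 3 are made up, `t_statistic` is a hypothetical helper, and the critical value 2.365 is the two-sided 5% value for 7 degrees of freedom read from a standard t table.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """One-sample t-statistic: (x_bar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample std dev, n - 1 denominator
    return (x_bar - mu0) / (s / math.sqrt(n))

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data, n = 8
t_stat = t_statistic(sample, mu0=3)

T_CRIT = 2.365                      # t table: alpha = 0.05 two-sided, n - 1 = 7 df
reject_h0 = abs(t_stat) > T_CRIT
print(t_stat, reject_h0)
```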
![Page 18: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/18.jpg)
The P Value
• Instead of comparing the T statistic with the critical value, we often compare the p-value directly with α
– Plug the T statistic into its (null) distribution and find the associated probability.
– Reject H0 when the p-value is less than α
[Figure: null density of T with the observed T value and its negative marked. The p-value is the two shaded tail areas added together.]
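The "shaded area" can be computed by integrating the T density numerically. The sketch below uses only the Python standard library: it writes out the Student's t density and applies Simpson's rule, which is one convenient way to get the area and not necessarily how any statistics package does it. The function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_p_value(t_stat, df, steps=2000):
    """Area of both tails beyond |t_stat|, using Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_area = 2 * (h / 3) * total  # probability that -|t| < T < |t|
    return 1.0 - central_area

# Hypothetical example: T = sqrt(7) from a sample of n = 8 (7 degrees of freedom).
p_value = two_sided_p_value(math.sqrt(7), df=7)
print(p_value)
```

If the p-value falls below the chosen α, the observed T lies beyond the critical value, so the two ways of testing give the same decision.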
![Page 19: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/19.jpg)
Doing this in Excel
• Data in a column or row
• Compute the sample mean with the average function
• Compute the sample standard deviation with the stdev function
• Compute the t statistic
$$T = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{x}-\mu_0)}{s}$$
• Compute the p-value by plugging the t statistic into the integral with tdist(abs(T),n-1,2)
– That last 2 is for two-tailed integral
– tdist does not accept a negative first argument, hence the abs
• Alternatively, use ttest to compute.
– ttest is designed for two-sample comparison, so you have to trick it by creating a sample with all μ0’s
![Page 20: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/20.jpg)
More On Student’s T
[Figure: densities of the normal distribution and of t with 4, 8, and 16 df.]
[Figure: density of T in three cases. Null true: centered at 0. Slightly false null: centered near 0. Extremely false null: centered far from 0.]
![Page 21: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/21.jpg)
Type I and Type II
α is the black shaded area: depends only on the null
β is the red shaded area: depends on how far the red curve is shifted
Some alternatives are easier to detect
![Page 22: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/22.jpg)
The Alternative Hypothesis
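• If H0 is true, the test statistic
$$T = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{x}-\mu_0)}{s}$$
has Student’s T distribution with n-1 degrees of freedom
• If HA is true, then it is the statistic centered at the true mean μ,
$$T_\mu = \frac{\bar{x}-\mu}{s/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{x}-\mu)}{s}$$
that has the T distribution!
H0: μ = μ0
HA: μ ≠ μ0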
![Page 23: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/23.jpg)
The Alternative Hypothesis
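$$T = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} = \frac{\bar{x}-\mu}{s/\sqrt{n}} + \frac{\mu-\mu_0}{s/\sqrt{n}} = T_\mu + \frac{\mu-\mu_0}{s/\sqrt{n}} = T_\mu + \frac{d}{s/\sqrt{n}}$$
where $T_\mu = (\bar{x}-\mu)/(s/\sqrt{n})$ has the T distribution with n-1 degrees of freedom and $d = \mu - \mu_0$ is the effect size.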
![Page 24: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/24.jpg)
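The Alternative Hypothesis
• We fail to reject the null when
$$-t_\alpha < T < t_\alpha \iff -t_\alpha - \frac{d}{s/\sqrt{n}} < T_\mu < t_\alpha - \frac{d}{s/\sqrt{n}}$$
where $T = T_\mu + d/(s/\sqrt{n})$ is the test statistic and $T_\mu = (\bar{x}-\mu)/(s/\sqrt{n})$ has the T distribution
• What this tells us:
– If we have s and n fixed, an effect of size d leads to a power of 1 - β.
– If we have s and n fixed, a power of 1 - β requires an effect size no smaller than d.
– If we want a power of 1 - β and an effect size of d, then we need n samples to achieve our goal.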
![Page 25: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/25.jpg)
Effect Size, Sample Size, and Power
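• To detect an alternative of $|\mu-\mu_0| = d$ with power 1-β, we need
$$n = \left(\frac{s}{d}\right)^2\left(t_{\alpha,n-1} + t_{(1-\beta),n-1}\right)^2$$
• With n samples, an effect size of d can be detected with power 1-β from
$$t_{(1-\beta),n-1} = \frac{d}{s/\sqrt{n}} - t_{\alpha,n-1}$$
Here $t_{\alpha,n-1}$ is the two-sided critical value ($|T| > t_{\alpha,n-1}$ occurs with probability α under H0) and $t_{(1-\beta),n-1}$ is the (1-β) quantile of the T distribution with n-1 degrees of freedom.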
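A rough version of the sample size calculation can be sketched with the standard library. Note the swap: the sketch replaces the T quantiles with normal quantiles, which avoids the problem that n appears on both sides of the formula but is only an approximation that tends to give a somewhat smaller n than the t-based answer. The function name `approx_sample_size` and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def approx_sample_size(s, d, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-sided one-sample test.

    s: std dev estimate, d: effect size |mu - mu0| to detect.
    Uses z quantiles in place of t quantiles (an approximation).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_power = z.inv_cdf(power)           # (1 - beta) quantile
    return math.ceil((s / d) ** 2 * (z_alpha + z_power) ** 2)

n_half_sigma = approx_sample_size(s=1.0, d=0.5)   # effect of half a std dev
n_one_sigma = approx_sample_size(s=1.0, d=1.0)    # effect of one std dev
print(n_half_sigma, n_one_sigma)
```

Halving the effect size roughly quadruples the required n, because d enters the formula squared.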
![Page 26: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/26.jpg)
Multi-Group Similarity Testing
• Population comprises a fixed set of groups: 1, 2, …, p
– Usually thought of as “statistically identical” individuals within the groups
– Each group receives a different “treatment”
– Process leads to groups that may have different means μ1, …, μp
– Groups have the same variance σ²
– We sample from each group, size n1, …, np
• The question is the following: Is at least one treatment different?
– H0: μ1 = μ2 = … = μp
– HA: At least one of the μi’s is different
![Page 27: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/27.jpg)
A Digression
• Given two numbers, how do we compare them?
– Subtract to compute the difference
– Divide to compute the ratio
• Statistical use of subtraction relies on T-statistics
– Two numbers are equal if difference is 0
• Statistical use of division relies on F-statistics
– Two numbers are equal if ratio is 1
![Page 28: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/28.jpg)
Probability Density Functions
• The normal distribution(mu,sigma): bell shaped, with
– mu+/- sigma containing 68%
– mu+/- 2sigma containing 95.4%
– mu+/- 3sigma containing 99.7%
• Chi squared (m)
– This distribution is what you get when you square m normal(0,1)’s and add them up
$$Z_1^2 + Z_2^2 + Z_3^2 + \cdots + Z_m^2$$
– The quantity below is chi squared (n-1)
$$\frac{(n-1)S^2}{\sigma^2} = \frac{(x_1-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{\sigma^2}$$
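The chi squared (n-1) claim can be checked by simulation. A chi squared distribution with k degrees of freedom has mean k and variance 2k, so for normal data the simulated values of (n-1)S²/σ² should average about n-1. The sketch assumes normal data with σ = 2 and n = 5; these numbers, the seed, and the helper name are arbitrary.

```python
import random
import statistics

def scaled_sample_variance(n, sigma, rng):
    """Draw n normal(0, sigma) values and return (n - 1) * S^2 / sigma^2."""
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    return (n - 1) * statistics.variance(sample) / sigma ** 2

rng = random.Random(2)   # fixed seed so the run is repeatable
n, sigma = 5, 2.0
draws = [scaled_sample_variance(n, sigma, rng) for _ in range(10000)]

draws_mean = statistics.fmean(draws)      # chi squared (4) has mean 4
draws_var = statistics.variance(draws)    # chi squared (4) has variance 8
print(draws_mean, draws_var)
```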
![Page 29: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/29.jpg)
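Probability Density Functions
• The T-distribution comes from dividing a normal(0,1) by the square root of a chi-squared (divided by its degrees of freedom)
$$T = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{x}-\mu_0)}{s}$$
• The F-distribution comes from a ratio of chi-squareds (each divided by its degrees of freedom)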
![Page 30: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/30.jpg)
</digression>: ANOVA• Collect a sample
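$$\begin{array}{ll}
x_{11}, x_{12}, x_{13}, \ldots, x_{1n_1} & \text{: treatment 1} \\
x_{21}, x_{22}, x_{23}, \ldots, x_{2n_2} & \text{: treatment 2} \\
\quad\vdots & \\
x_{p1}, x_{p2}, x_{p3}, \ldots, x_{pn_p} & \text{: treatment } p
\end{array}$$
• Test the hypothesis:
H0: μ1 = μ2 = … = μp
HA: At least one of the μi’s is different
• Assumption: common variance σ²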
![Page 31: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/31.jpg)
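How To Test
• All treatments have the same mean under H0
• With $x_{ji}$ the i-th observation from treatment j and $n = n_1 + \cdots + n_p$:
$$\text{Grand mean: } \bar{x} = \frac{1}{n}\sum_{j}\sum_{i} x_{ji}$$
$$\text{Group means: } \bar{x}_j = \frac{1}{n_j}\sum_{i} x_{ji}$$
$$S^2_{Full} = \frac{1}{n}\sum_{j}\sum_{i}\left(x_{ji}-\bar{x}_j\right)^2$$
$$S^2_{H_0} = \frac{1}{n}\sum_{j}\sum_{i}\left(x_{ji}-\bar{x}\right)^2$$
$$F = \frac{n-p}{p-1}\cdot\frac{S^2_{H_0}-S^2_{Full}}{S^2_{Full}} \ \text{ is } \ F(p-1,\,n-p)$$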
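The F statistic above can be computed directly from its definition. The sketch below is a minimal illustration with made-up groups; `anova_f` is a hypothetical helper, and it returns only the statistic and its degrees of freedom, since the critical value or p-value would still come from an F table or a statistics package.

```python
def anova_f(groups):
    """One-way ANOVA F statistic built from the two variance estimates.

    groups: list of lists, one list of observations per treatment.
    Returns (F, (p - 1, n - p)).
    """
    p = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Full model: each group has its own mean.
    s2_full = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g) / n
    # Null model: all groups share the grand mean.
    s2_h0 = sum((x - grand_mean) ** 2 for g in groups for x in g) / n
    f_stat = (n - p) / (p - 1) * (s2_h0 - s2_full) / s2_full
    return f_stat, (p - 1, n - p)

# Made-up data: three treatments, three observations each.
f_stat, dof = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, dof)
```

The ratio is near 1 when the group means barely differ and grows as the group means spread apart relative to the within-group scatter.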
![Page 32: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/32.jpg)
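Pseudo-ANOVA
• Collect a sample
$$\begin{array}{ll}
x_{11}, x_{12}, x_{13}, \ldots, x_{1n_1} & \text{: treatment 1} \\
x_{21}, x_{22}, x_{23}, \ldots, x_{2n_2} & \text{: treatment 2} \\
\quad\vdots & \\
x_{p1}, x_{p2}, x_{p3}, \ldots, x_{pn_p} & \text{: treatment } p
\end{array}$$
• Test the hypothesis:
H0: μ1 = μ2 = … = μp = 0
HA: At least one of the μi’s is different from 0
• Assumption: common variance σ²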
![Page 33: Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649d995503460f94a83afd/html5/thumbnails/33.jpg)
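How To Test
• All treatments have the same mean, 0, under H0
• With $x_{ji}$ the i-th observation from treatment j and $n = n_1 + \cdots + n_p$:
$$\text{Hypothesized mean: } 0$$
$$\text{Group means: } \bar{x}_j = \frac{1}{n_j}\sum_{i} x_{ji}$$
$$S^2_{Full} = \frac{1}{n}\sum_{j}\sum_{i}\left(x_{ji}-\bar{x}_j\right)^2$$
$$S^2_{H_0} = \frac{1}{n}\sum_{j}\sum_{i}\left(x_{ji}-0\right)^2$$
$$F = \frac{n-p}{p}\cdot\frac{S^2_{H_0}-S^2_{Full}}{S^2_{Full}} \ \text{ is } \ F(p,\,n-p)$$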
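The pseudo-ANOVA statistic differs from ordinary ANOVA only in the null model: deviations are measured from the hypothesized mean 0 instead of the grand mean, and the numerator has p degrees of freedom. A minimal sketch with made-up data follows; `pseudo_anova_f` is a hypothetical name.

```python
def pseudo_anova_f(groups):
    """F statistic for H0: every group mean equals 0.

    groups: list of lists, one list of observations per treatment.
    Returns (F, (p, n - p)).
    """
    p = len(groups)
    n = sum(len(g) for g in groups)
    group_means = [sum(g) / len(g) for g in groups]
    # Full model: each group has its own mean.
    s2_full = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g) / n
    # Null model: every observation has mean 0.
    s2_h0 = sum(x ** 2 for g in groups for x in g) / n
    f_stat = (n - p) / p * (s2_h0 - s2_full) / s2_full
    return f_stat, (p, n - p)

# Made-up data: three treatments scattered around zero.
f_zero, dof_zero = pseudo_anova_f([[-1, 0, 1], [0, 1, 2], [-2, -1, 0]])
print(f_zero, dof_zero)
```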