HYPOTHESIS TESTING
All of the figures in this PowerPoint are from: Statistics without Math – Magnusson & Mourao
The concept of accepting a hypothesis through the rejection of a “null” hypothesis is largely credited to Karl Popper.
A null hypothesis is a statement about how the world would be if our conjecture is wrong.
Difference in means:
Observed is 3.8 - 7.7 = -3.9
Null is 7.7 - 7.0 = 0.7
Enough to reject the null hypothesis?
We can assure ourselves that the difference between our observed distribution and that of the null is not accidental by generating a large number of predictions based on the null hypothesis.
We could repeat this exercise a lot (say 100) times, then determine the percentage of predicted outcomes that have, in this case, a difference between the means as big or bigger than our observed value of -3.9. If these are fewer than 5%, we reject the null hypothesis.
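The resampling exercise described above can be sketched in Python. This is only a sketch under stated assumptions: the two samples are invented (they are not the data behind the figures, and were chosen only so that their means match the 3.8 and 7.7 quoted above), predictions from the null are generated by shuffling group labels (one of several ways to do it), "as big or bigger" is read as two-sided, and the names `group_a`, `group_b`, and `null_diffs` are hypothetical.

```python
import random
import statistics

# Invented samples (NOT the data behind the slides); chosen only so that
# the group means match the 3.8 and 7.7 quoted in the text.
group_a = [2.0, 3.0, 4.0, 5.0, 5.0]   # mean 3.8
group_b = [6.0, 7.0, 8.0, 8.0, 9.5]   # mean 7.7

observed = statistics.mean(group_a) - statistics.mean(group_b)  # about -3.9

rng = random.Random(1)      # fixed seed so the run is repeatable
pooled = group_a + group_b
n_a = len(group_a)
n_sims = 100                # "say 100" repetitions, as in the text

null_diffs = []
for _ in range(n_sims):
    shuffled = pooled[:]
    rng.shuffle(shuffled)   # under the null, the group labels are arbitrary
    diff = statistics.mean(shuffled[:n_a]) - statistics.mean(shuffled[n_a:])
    null_diffs.append(diff)

# Share of null outcomes at least as extreme as the observed difference
# (two-sided: the sign of the difference is ignored).
as_extreme = sum(abs(d) >= abs(observed) for d in null_diffs)
proportion = as_extreme / n_sims
reject_null = proportion < 0.05   # the 5% rule from the text

print(f"observed = {observed:.1f}, proportion = {proportion:.2f}, reject = {reject_null}")
```

With only 100 repetitions the proportion is coarse (steps of 0.01); more repetitions give a steadier estimate.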
William S. Gosset standardized this process by dividing the difference in means by the s.d. (strictly, the standard error of the difference) to derive the t statistic (Student, 1908). The resulting test is now known as Student’s t-test.
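For reference, one common form of the statistic is the pooled two-sample version written out here. The symbols are assumptions introduced for illustration and do not appear on the slides: sample means x̄₁ and x̄₂, sample variances s₁² and s₂², and sample sizes n₁ and n₂.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}
```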
It is conventional to represent the distributions of data horizontally, rather than vertically.
Assuming the data are normally distributed, we can use the theoretical characteristics of the distribution as the basis for testing the differences between the means.
This is, basically, what mathematicians/statisticians do. They don’t physically sample their null populations.
Type I Error – falsely rejecting the null hypothesis and deciding that a phenomenon exists when it does not.
Type II Error – accepting the null hypothesis when it is false. For a given sample size, its probability is generally inversely related to the probability of making a Type I error: lowering one tends to raise the other.
We avoid Type I errors by setting the bar high for rejecting the null hypothesis (< 5%). This is very important to the progress of science – we don’t want to build knowledge based on faulty previous work.
The ability of a statistical test to reject the null hypothesis when it is indeed false is called the “power of the test”.
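Power can be estimated by simulation, as in this rough sketch. Everything numerical here is an assumption made for illustration: normally distributed data with s.d. 1, 20 observations per group, a true difference of one s.d., and an approximate two-sided 5% critical value of 2.024 for 38 degrees of freedom. The function names are hypothetical.

```python
import math
import random
import statistics

def t_stat(a, b):
    """Pooled two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))

def rejection_rate(true_diff, n=20, reps=2000, t_crit=2.024, seed=0):
    """Fraction of simulated experiments in which the null is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        if abs(t_stat(a, b)) > t_crit:
            rejections += 1
    return rejections / reps

type_i_rate = rejection_rate(true_diff=0.0)  # null true: should sit near 0.05
power = rejection_rate(true_diff=1.0)        # null false: this is the power

print(f"Type I rate = {type_i_rate:.3f}, power = {power:.3f}")
```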
Now let’s look at an example where we compare more than 2 groups: streams with carnivorous fish, without fish, and with herbivorous fish.
When we do pair-wise comparisons using t-tests, we compound Type I errors.
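The compounding can be put in numbers with a short sketch. It assumes, as a simplification, that the pairwise tests are independent (tests that share a group are not strictly independent), and the function name `familywise_error` is hypothetical.

```python
from math import comb

def familywise_error(n_groups, alpha=0.05):
    """Number of pairwise tests and the chance of at least one false
    rejection among them, assuming the tests are independent."""
    n_pairs = comb(n_groups, 2)
    return n_pairs, 1 - (1 - alpha) ** n_pairs

# Three stream types, as in the example above.
pairs, fwer = familywise_error(3)
print(pairs, round(fwer, 3))
```

With three groups there are three pairwise tests, and the chance of at least one Type I error rises from 5% to roughly 14%.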
Sir Ronald Fisher used the comparison of variances.
If the variability (e.g. range) is similar in each category and the means differ, then the total variability will be greater than the variability within any one category.
$V_i < V_T$
and $VR = V_T / V_i > 1$
With no difference among means,
$V_i = V_T$ and $VR = 1$
Fisher used the ratio of the variances, now known as the F-statistic.
Computer programs create the null distribution based on the mean variability (variance). If the variance is grossly dissimilar among groups, the null variance will be underestimated and we may be led to commit a Type I error in our Fisher’s test (aka analysis of variance or ANOVA).
It is very important to visually inspect one’s data (and test for similarity of variances – e.g. Levene’s test).
The variability between the means of the two groups is due to the factor (in this case fish). The difference between this and the total variability is the residual variability (not attributed to any particular cause).
When we do this with variance, it’s called partitioning variance.
F is the ratio of the factor mean square to the residual mean square. Mean squares are the sums of squares divided by the df, and are analogous to the variances. Less a few constants:
$F = (\sigma^2_{Factor} + \sigma^2_{Residual}) / \sigma^2_{Residual}$
When variance due to the factor is zero (null hypothesis is correct), F = 1.
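The partitioning and the F ratio can be worked through by hand, as in this sketch. The response values are invented for three hypothetical stream types and are not the data behind the slides; the variable names are likewise assumptions.

```python
# Invented response values for three stream types (illustration only).
groups = {
    "carnivorous fish": [2.0, 3.0, 4.0],
    "no fish": [5.0, 6.0, 7.0],
    "herbivorous fish": [4.0, 5.0, 6.0],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = sum(all_values) / len(all_values)

# Total variability: every observation against the grand mean.
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

# Factor variability: group means against the grand mean.
ss_factor = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in groups.values()
)

# Residual variability: observations against their own group mean.
ss_residual = sum(
    (v - sum(values) / len(values)) ** 2
    for values in groups.values()
    for v in values
)

df_factor = len(groups) - 1
df_residual = len(all_values) - len(groups)

ms_factor = ss_factor / df_factor        # factor mean square
ms_residual = ss_residual / df_residual  # residual mean square
f_ratio = ms_factor / ms_residual

print(f"SS total = {ss_total:.1f}, SS factor = {ss_factor:.1f}, "
      f"SS residual = {ss_residual:.1f}, F = {f_ratio:.1f}")
```

The factor and residual sums of squares add up to the total, which is the partition described above.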
You can run into problems when you categorize a continuous phenomenon.
Plotting the results of the “narrow” sampling regime at two temperatures:
John estimates a probability of 0.78 that the null hypothesis is correct.
Mary estimates a probability of 0.035 that the null hypothesis is correct, so she rejects the null hypothesis.
$V_{Residual} + V_{Factor} + V_{Levels} = V_{Total}$
Plotting the results of the “wide” sampling regime at two temperatures:
John now rejects the null hypothesis (P = 0.013), and Mary accepts the null hypothesis (P = 0.22).
$V_{Residual} + V_{Factor} + V_{Levels} + V_{Width} = V_{Total}$
We might expect a direct relationship between the number of trees and the area of a given reserve.
Here, our plot shows how closely the data conform to our hypothetical relationship (Y = a + bX, where a = 0 and b = 1).
With insect activity and temperature, we may not know the true relationship and we draw a line that represents what the relationship may be based upon the distribution of the data.
We can fit the line to the data based on minimizing the distance of the points to the line (A), using the minimum area of the triangles formed by the horizontal and vertical lines from the points to the lines (B), or by minimizing the sum of the squared vertical distances of the points to the line (least squares regression)(C).
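Option (C), least squares, has a simple closed form, sketched here. The temperature and activity values are invented for illustration, and the helper name `sum_sq_residuals` is hypothetical.

```python
# Invented data: temperature (X) and insect activity (Y), illustration only.
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
activity = [2.0, 4.0, 5.0, 8.0, 9.0]

n = len(temperature)
mean_x = sum(temperature) / n
mean_y = sum(activity) / n

# Least-squares slope and intercept for Y = a + bX.
sxx = sum((x - mean_x) ** 2 for x in temperature)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temperature, activity))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def sum_sq_residuals(a, b):
    """Sum of squared vertical distances from the points to Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(temperature, activity))

best = sum_sq_residuals(intercept, slope)
# Any other line gives a larger sum of squared vertical distances.
tilted = sum_sq_residuals(intercept, slope + 0.05)

print(f"a = {intercept:.2f}, b = {slope:.2f}, SS = {best:.2f} (tilted line: {tilted:.2f})")
```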
The same basic concepts apply to regression analysis as in ANOVA.
If the residual variation approaches the total variation, we assume that there is no effect of the measured variable, i.e. $V_{Factor} = 0$ (null hypothesis is correct).
If we had studied a smaller temperature range, we would have decreased the variation due to the factor while maintaining the residual variation, and we would have lost our ability to detect an effect.
[Figure: diagrams linking trees, shrubs and monkeys, with relationships marked +]
Vertical lines represent the variability not explained by the linear model.
We can use these residuals to calculate the partial regressions.
Monkeys = -0.667 x trees
Monkeys = 1.667 x shrubs
Monkeys = 0.33 + (-0.667 x trees) + (1.667 x shrubs) [equation for the multiple regression]
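The link between residuals and partial regressions can be checked numerically. In this sketch the counts are invented for six hypothetical plots, so the coefficients will not match the -0.667 and 1.667 above; the point is only that the slope obtained from residuals equals the coefficient from the multiple regression. All function and variable names are assumptions.

```python
def mean(values):
    return sum(values) / len(values)

def simple_fit(x, y):
    """Least-squares intercept and slope of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(x, y):
    """Vertical distances of the points from the fitted line of y on x."""
    a, b = simple_fit(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Invented values for six hypothetical plots (not the data behind the slides).
trees = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
shrubs = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
monkeys = [2.0, 4.0, 2.0, 6.0, 3.0, 6.0]

# Partial regression for trees: remove the shrub effect from both variables,
# then regress what is left of monkeys on what is left of trees.
monkeys_given_shrubs = residuals(shrubs, monkeys)
trees_given_shrubs = residuals(shrubs, trees)
_, partial_trees = simple_fit(trees_given_shrubs, monkeys_given_shrubs)

# The same coefficient from the two-predictor least-squares equations.
def centered_sp(u, v):
    """Sum of products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))

s_tt, s_ss = centered_sp(trees, trees), centered_sp(shrubs, shrubs)
s_ts = centered_sp(trees, shrubs)
s_tm, s_sm = centered_sp(trees, monkeys), centered_sp(shrubs, monkeys)
multiple_trees = (s_ss * s_tm - s_ts * s_sm) / (s_tt * s_ss - s_ts ** 2)

print(f"partial = {partial_trees:.4f}, multiple = {multiple_trees:.4f}")
```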
THINKING ABOUT YOUR DATA, BIOLOGICALLY
Computer-Generated “Phantom” Variables
In this theoretical example (in which the data were randomly generated), a graduate student is appalled that none of the factors his advisor suggested are significant.
So, he tests for all possible interactions. He finds a significant interaction (p=0.001) between snags and herbivorous fish. There are enough plausible biological explanations for an interaction of snags and herbivorous fish on crayfish density to allow for an in-depth discussion in the thesis.
The problem is, this example is based on random data. With 25 possible effects/interactions in an ANOVA, we would expect about one to be “significant” at the 0.05 level by chance.
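A quick simulation shows where "about one" comes from. The sketch assumes that every null hypothesis is true, that the 25 tests are independent, and that P values are therefore uniform between 0 and 1; the variable names are hypothetical.

```python
import random

n_tests = 25    # possible effects/interactions, as in the text
alpha = 0.05

# Expected number of "significant" results when nothing is really going on.
expected_false_positives = n_tests * alpha

# Simulate many random-data analyses: under a true null, each P value is
# uniform on (0, 1), so it falls below alpha with probability alpha.
rng = random.Random(42)
n_experiments = 20000
total = 0
for _ in range(n_experiments):
    total += sum(rng.random() < alpha for _ in range(n_tests))
mean_false_positives = total / n_experiments

print(f"expected = {expected_false_positives:.2f}, simulated = {mean_false_positives:.2f}")
```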
Data for 30 lakes:
Pollution = heavy metal conc. in ppb
Fish = mean # per gill-net per h
Phytoplankton = chlorophyll conc.
Crayfish = # per trap-hour
The problem is that the variables are all on different scales. Divide each by the s.d. of that variable, effectively putting all of them in units of s.d. – e.g. an increase of one s.d. in pollution leads to a decrease of so many s.d.’s in the # of crayfish.
These are called standardized estimates of parameters. The standardized coefficients can be used in “path analysis”.
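Standardizing is a one-line transformation, sketched here with invented numbers for five hypothetical lakes. One assumption goes beyond the text: the sketch also subtracts the mean, which does not change the slope but sets the intercept to zero. The function name `standardize` is hypothetical.

```python
import statistics

def standardize(values):
    """Express values in units of s.d. (centred on the mean, an assumption
    that goes beyond the 'divide by s.d.' step described in the text)."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - m) / sd for v in values]

# Invented values for five hypothetical lakes (illustration only).
pollution = [1.0, 2.0, 3.0, 4.0, 5.0]
crayfish = [2.0, 1.0, 4.0, 3.0, 5.0]

z_pollution = standardize(pollution)
z_crayfish = standardize(crayfish)

# Simple regression slope on the standardized variables.
sxx = sum(x ** 2 for x in z_pollution)   # means are zero after centring
sxy = sum(x * y for x, y in zip(z_pollution, z_crayfish))
standardized_slope = sxy / sxx

print(f"standardized slope = {standardized_slope:.2f}")
```

For a simple regression, the standardized slope is the correlation coefficient, which is why these coefficients can be compared across variables measured in different units.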
Run multiple regression on standardized variables:
Crayfish = 0.0 - 0.16 x pollution - 0.39 x fish + 0.55 x phytoplankton
The effect of pollution is not significant (p=0.53), that of fish is questionable (p=0.07), and there is a strong effect of phytoplankton (p=0.01).
This is counterintuitive, because a simple regression indicates a significant (p=0.03) positive effect of pollution on crayfish (slope = 0.41). This is because Fig. 10.2 only shows “direct effects” and doesn’t represent the system biologically.
Need to look at “indirect effects” as well. Get standardized regressions for each direct effect by simple regression.
Calculate indirect effects by multiplying path coefficients along paths:
To get the overall effect of pollution, we add the direct and indirect effects: (-0.16) + (0.26) + (0.31) = 0.41, which is the simple regression value.
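The bookkeeping can be sketched as follows. The direct and indirect effects are the values quoted above; the individual path coefficients are not given in this transcript, so the two passed to `path_effect` are hypothetical and serve only to show the multiplication step. The function name is also an assumption.

```python
from math import prod

def path_effect(coefficients):
    """Effect along one path: the product of the path coefficients on it."""
    return prod(coefficients)

# Hypothetical coefficients for a two-step path (NOT taken from the figure).
example_indirect = path_effect([0.5, -0.4])

# Values quoted in the text: one direct effect and two indirect effects.
direct = -0.16
indirect = [0.26, 0.31]
overall = direct + sum(indirect)

print(round(overall, 2))
```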
The mass of a tree is not a linear function of its diameter, but may conform to a power function of the form:
$Biomass = a \times Diameter^{b} + e_1$
Ignoring the error term, if we take the logs of both sides, we transform the equation into a linear one that can be treated with ordinary least-squares methods:
$\log_{10}(biomass) = \log_{10}(a) + b \log_{10}(diameter) + e_2$
$\log_{10}(biomass) = -0.775 + 2.778 \log_{10}(diameter)$
Take antilog of the equation to yield:
$Biomass = 0.168 \times Diameter^{2.778}$
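The log transformation and back-transformation can be checked with a short sketch. The diameters are hypothetical, and the biomass values are generated without noise from the fitted equation above, so the sketch only confirms that a straight-line fit on the log scale recovers the coefficients; it says nothing about the original data.

```python
import math

a_true, b_true = 0.168, 2.778   # coefficients quoted in the text

# Hypothetical diameters; biomass is generated without noise for illustration.
diameters = [5.0, 10.0, 20.0, 40.0, 80.0]
biomass = [a_true * d ** b_true for d in diameters]

# Take logs of both sides to turn the power function into a straight line.
log_d = [math.log10(d) for d in diameters]
log_m = [math.log10(m) for m in biomass]

# Ordinary least squares on the log-log scale.
mean_x = sum(log_d) / len(log_d)
mean_y = sum(log_m) / len(log_m)
sxx = sum((x - mean_x) ** 2 for x in log_d)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_d, log_m))
b_hat = sxy / sxx
log_a_hat = mean_y - b_hat * mean_x

# Take the antilog of the intercept to return to the original scale.
a_hat = 10 ** log_a_hat

print(f"log10(a) = {log_a_hat:.3f}, b = {b_hat:.3f}, a = {a_hat:.3f}")
```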