Statistics for Biologists (colstons)
BIOLOGY
Introduction
• Biological studies deal with organisms, which show variety
• We cannot rely on a single measurement, so we must take a sample
• This sample of data must be summarised and analysed to find out if it is reliable
Summarising data
• MEAN: sum of the sample values ÷ sample size, i.e. $\sum x \div n$
• MEDIAN: the middle number in a list arranged in rank order, e.g. 7 in 2, 5, 7, 7, 8, 23, 31
• MODE: the measurement which occurs most frequently, e.g. 7 in 2, 5, 7, 7, 8, 23, 31
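As a quick check of the three averages, the list from the slide can be passed to Python's standard `statistics` module. This is only an illustrative sketch; the variable names are not from the original slides.

```python
import statistics

# The example list from the slide, already in rank order
values = [2, 5, 7, 7, 8, 23, 31]

mean = sum(values) / len(values)      # sum of samples ÷ sample size
median = statistics.median(values)    # middle number of the ranked list
mode = statistics.mode(values)        # most frequent measurement

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

Here the median and mode are both 7, while the mean (about 11.9) is pulled upwards by the two large values.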
Distribution Curves
• A visual summary of data
• They can be produced as follows:
1. Collect data
2. Split results into equal-size classes
3. Make a tally chart
4. Plot a histogram of frequency against size class
• Data can show a normal distribution or a skewed distribution
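The four steps for producing a distribution curve might be sketched as follows. The measurements and the class width of 5 are hypothetical values chosen for illustration, and the histogram is printed as text rather than plotted.

```python
from collections import Counter

# Step 1: hypothetical measurements (e.g. leaf lengths in mm), not from the slides
lengths = [12, 15, 17, 18, 21, 22, 22, 24, 25, 27, 28, 31, 33, 36]
class_width = 5  # assumed width of each size class

# Step 2: assign each result to an equal-size class, labelled by its lower bound
classes = [(x // class_width) * class_width for x in lengths]

# Step 3: tally the number of results in each class
tally = Counter(classes)

# Step 4: a text histogram of frequency against size class
for lower in sorted(tally):
    print(f"{lower:>2}-{lower + class_width - 1:<2} {'#' * tally[lower]}")
```

With this made-up data the tallies rise to a peak in the 20-24 class and fall away on either side, which is the roughly symmetrical shape expected of a normal distribution.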
Distribution curves
• Normal distribution
• Symmetrical bell-shaped curve around the mean
• Use parametric tests to analyse data
[Figure: example of a normal distribution curve]
Distribution curves
• Skewed data
• Asymmetrical curve around the mode
• Use non-parametric tests to analyse data
[Figure: example of a skewed distribution curve]
Standard Deviation
• Standard deviation (SD) is a measure of the spread of the data
[Figure: two distribution curves, one with a large SD and one with a small SD]
Standard deviation
• A high SD indicates data which shows great variation from the mean
• A low SD indicates data which shows little variation from the mean value
• For normally distributed data, about 68% of all values lie within the range mean ± 1 SD
• About 95% of all values lie within mean ± 2 SD
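The 68% and 95% figures can be checked against the normal distribution itself. A minimal sketch using `statistics.NormalDist` from the Python standard library (a standard normal curve with mean 0 and SD 1 is assumed):

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1
z = NormalDist(mu=0, sigma=1)

within_1sd = z.cdf(1) - z.cdf(-1)   # proportion within mean ± 1 SD
within_2sd = z.cdf(2) - z.cdf(-2)   # proportion within mean ± 2 SD

print(f"within 1 SD: {within_1sd:.1%}")
print(f"within 2 SD: {within_2sd:.1%}")
```

The exact proportions are close to 68.3% and 95.4%, which the slides round to 68% and 95%.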
SD and confidence limits
[Figure: normal distribution curve with 68% of values within ±1 SD of the mean and 95% within ±2 SD]
Calculating SD
• Can only be used for normally distributed data
• Calculate as follows:
– Sum the values for $x^2$, i.e. $\sum x^2$
– Sum the values for $x$, then square the total, i.e. $(\sum x)^2$
– Divide $(\sum x)^2$ by $n$
– Subtract this from $\sum x^2$ and divide the result by $n$
– Take the square root of this (see hand-out)
Calculating SD
$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n}}$$
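The calculation of SD can be followed step by step in code. This is a sketch on a small hypothetical sample (not data from the slides), using the same divide-by-n form as the formula.

```python
import math
import statistics

# Hypothetical sample, not from the slides
x = [4, 6, 7, 8, 10]
n = len(x)

sum_x_squared = sum(v ** 2 for v in x)   # sum of the x² values
square_of_sum = sum(x) ** 2              # (sum of x), then squared

# Subtract (sum of x)²/n from the sum of x², divide by n, take the square root
sd = math.sqrt((sum_x_squared - square_of_sum / n) / n)

print(f"SD = {sd:.3f}")
print(f"statistics.pstdev gives {statistics.pstdev(x):.3f}")
```

Because the formula divides by n, the result agrees with `statistics.pstdev`. Many textbooks divide by n − 1 instead (`statistics.stdev`), which gives a slightly larger value for small samples.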
Confidence limits
• About 95% of all values lie within ±2 SD of the mean
• Any value which lies outside this range is said to be significantly different from the others
• We say that we are working to 95% confidence limits or to a 5% significance level.
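A minimal sketch of applying 95% confidence limits: values outside mean ± 2 SD are flagged. The measurements are hypothetical, and the SD is calculated with the divide-by-n form (`statistics.pstdev`).

```python
import statistics

# Hypothetical measurements, not from the slides
values = [10, 11, 12, 12, 13, 13, 13, 14, 14, 15, 16, 25]

mean = statistics.mean(values)
sd = statistics.pstdev(values)   # SD using the divide-by-n formula

lower, upper = mean - 2 * sd, mean + 2 * sd

# Values outside mean ± 2 SD fall outside the 95% confidence limits
outside = [v for v in values if v < lower or v > upper]
print(f"limits: {lower:.2f} to {upper:.2f}; outside: {outside}")
```

In this made-up sample only the value 25 falls outside the limits.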
Comparison tests
• To compare two samples of data we look at the overlap between the two distribution curves.
• This depends on:
– The distance between the two mean values
– The spread of each sample (standard deviation)
• The greater the overlap, the more similar the two samples are.
Comparison tests
[Figure: distribution curves for Sample 1 and Sample 2, with each mean and the region of overlap marked]
Comparison tests
[Figure: distribution curves for Sample 1 and Sample 2 with a smaller region of overlap]
When the SD is small, the overlap is less.
The null hypothesis
• In order to compare two sets of data we must first assume that there is no difference between them.
• This is called the null hypothesis
• We must also produce an alternative hypothesis which states that there is a difference.
The t-test
• Used to compare the overlap of two sets of data
• Samples must show normal distribution
• Sample size (n) should be greater than 30
• This tests for differences between two sets of data
The t-test
• To calculate t:
– Check the data are normally distributed by drawing a tally chart
– Work out the difference in means, $|\bar{x}_1 - \bar{x}_2|$
– Calculate $s^2 \div n$ for each set of data, where $s^2$ is the variance
– Put these into the equation for t:
The t-test
$$t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
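The equation for t could be evaluated as in the sketch below. The two samples are hypothetical and kept very short for readability (the slides recommend n greater than 30), and the function name `t_statistic` is not from the slides. As an assumption, the variance is taken with an n − 1 denominator (`statistics.variance`); swap in `statistics.pvariance` to use the divide-by-n form.

```python
import math
import statistics

# Hypothetical samples, kept short for readability (not from the slides)
sample_1 = [10, 12, 14, 16, 18]
sample_2 = [20, 22, 24, 26, 28]

def t_statistic(a, b):
    """Difference in means divided by sqrt(s_a²/n_a + s_b²/n_b)."""
    diff = abs(statistics.mean(a) - statistics.mean(b))
    # Assumption: s² is the sample variance (n - 1 denominator)
    se_squared = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    return diff / math.sqrt(se_squared)

t = t_statistic(sample_1, sample_2)
print(f"t = {t:.2f}")
```

For these samples the means differ by 10 and each $s^2 \div n$ term is 2, so t = 10 ÷ √4 = 5.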
The t-test
• Compare the value of t with the critical value at n1 + n2 – 2 degrees of freedom
• Use a probability value of 5%
• If t is greater than the critical value we can reject the null hypothesis…
• … there is a significant difference between the two sets of data
• … there is less than a 5% probability that a difference this large arose by chance
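The comparison with the critical value might look like this in code. The function name `compare_t` and the example numbers are hypothetical; the critical value has to be looked up in a t table and is passed in rather than computed.

```python
def compare_t(t, n1, n2, critical_value):
    """Return the degrees of freedom and whether the null hypothesis is rejected."""
    df = n1 + n2 - 2
    return df, t > critical_value

# Hypothetical result: t = 5.0 from two samples of 5 values each.
# 2.306 is the tabulated two-tailed 5% critical value for 8 degrees of freedom.
df, reject_null = compare_t(t=5.0, n1=5, n2=5, critical_value=2.306)
print(f"df = {df}, reject null hypothesis: {reject_null}")
```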
Mann-Whitney U test
• Compares two sets of data
• Data can be skewed
• Sample size can be small; 5<n<30
• For details refer to stats book
Chi squared
• Some data is categoric
• This means that it belongs to one or more categories
• Examples include
– eye colour
– presence or absence data
– texture of seeds
• For these we use a chi squared test ($\chi^2$)
• This tests for an association between two or more variables
Chi squared
• Draw a contingency table
• These are the observed values
                Blue eyes   Green eyes   Row totals
Fair hair       a           b            a+b
Ginger hair     c           d            c+d
Column totals   a+c         b+d          a+b+c+d
Chi squared
• Now work out the expected values, where
$$E = \frac{(\text{Row total}) \times (\text{Column total})}{\text{Grand total}}$$
Chi squared
                Blue eyes              Green eyes             Row totals
Fair hair       (a+b)(a+c)/(a+b+c+d)   (a+b)(b+d)/(a+b+c+d)   a+b
Ginger hair     (c+d)(a+c)/(a+b+c+d)   (c+d)(b+d)/(a+b+c+d)   c+d
Column totals   a+c                    b+d                    a+b+c+d
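The expected values for a contingency table can be worked out from the row and column totals. The sketch below uses hypothetical counts in place of a, b, c and d; none of the numbers come from the slides.

```python
# Hypothetical observed counts, not from the slides:
# rows = hair colour (fair, ginger), columns = eye colour (blue, green)
observed = [[20, 10],
            [5, 15]]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = (row total × column total) ÷ grand total, for every cell
expected = [[r * c / grand_total for c in column_totals] for r in row_totals]

for row in expected:
    print(row)
```

With row totals of 30 and 20, column totals of 25 and 25, and a grand total of 50, the expected values are 15 and 15 for fair hair and 10 and 10 for ginger hair.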
Chi squared
• For each box work out $(O - E)^2 \div E$
• Find the sum of these to get $\chi^2$
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
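A short sketch of the chi squared sum, using hypothetical observed counts and the expected values that follow from their row and column totals, listed box by box:

```python
# Hypothetical observed counts and their expected values, listed box by box
observed = [20, 10, 5, 15]
expected = [15.0, 15.0, 10.0, 10.0]

# Work out (O - E)² ÷ E for each box, then sum the results
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_squared = sum(contributions)

print(f"chi squared = {chi_squared:.2f}")
```

For these numbers each box has |O − E| = 5, giving contributions of 25/15 twice and 25/10 twice, so chi squared is about 8.33.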
Chi squared
• Compare $\chi^2$ with the critical value at the 5% significance level
• There will be (no. rows – 1) × (no. columns – 1) degrees of freedom
• If $\chi^2$ is greater than the critical value we can say that the variables are associated with one another in some way
• We reject the null hypothesis
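The degrees of freedom and the final comparison can be sketched as below. The function name `chi_squared_decision` and the example value of 8.33 are hypothetical; the critical value is taken from a chi squared table and passed in.

```python
def chi_squared_decision(chi_squared, n_rows, n_columns, critical_value):
    """Return the degrees of freedom and whether the null hypothesis is rejected."""
    df = (n_rows - 1) * (n_columns - 1)
    return df, chi_squared > critical_value

# Hypothetical result from a 2 x 2 table.
# 3.841 is the tabulated 5% critical value for 1 degree of freedom.
df, associated = chi_squared_decision(8.33, n_rows=2, n_columns=2,
                                      critical_value=3.841)
print(f"df = {df}, variables associated: {associated}")
```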
Spearman Rank
• Two sets of data may show a correlation
• The data can be plotted on a scatter graph:
[Figure: three scatter graphs showing positive correlation, no correlation and negative correlation]
Spearman Rank
• We calculate the correlation by assigning a rank to the values:
Data 1   Rank   Data 2   Rank
12       1      24       1
14       2      29       2.5
18       3.5    29       2.5
18       3.5    38       4

• 12 is the lowest value in Data 1, so we call it rank 1
• 14 is the 2nd lowest value, so we call it rank 2
• The two values of 18 should be ranks 3 and 4, but they are the same. We find the average of 3 and 4 and give them this rank: (3 + 4)/2 = 3.5
• Similarly for Data 2: 24 is rank 1, the two values of 29 take the average of ranks 2 and 3 (2.5), and 38 is rank 4
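The ranking rule, including the averaging of tied ranks, could be written as the following sketch. The function name `average_ranks` is hypothetical; the two data sets are the ones from the slides.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1), giving tied values their average rank."""
    ordered = sorted(values)
    ranks = {}
    for value in set(values):
        first = ordered.index(value) + 1          # lowest rank the value occupies
        last = first + ordered.count(value) - 1   # highest rank the value occupies
        ranks[value] = (first + last) / 2         # average of the occupied ranks
    return [ranks[v] for v in values]

data_1 = [12, 14, 18, 18]
data_2 = [24, 29, 29, 38]

ranks_1 = average_ranks(data_1)
ranks_2 = average_ranks(data_2)
print(ranks_1)
print(ranks_2)
```

This reproduces the ranks 1, 2, 3.5, 3.5 for Data 1 and 1, 2.5, 2.5, 4 for Data 2.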
Spearman Rank
• Find the difference D between each pair of ranks
• Square this difference
• Sum the $D^2$ values
• Calculate the Spearman Rank Correlation Coefficient $r_s$
$$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$
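As a worked sketch, the coefficient can be calculated for the ranks of the two example data sets. Four pairs is far too few for a meaningful test, so this only illustrates the arithmetic.

```python
# Ranks of the two example data sets from the slides
ranks_1 = [1, 2, 3.5, 3.5]
ranks_2 = [1, 2.5, 2.5, 4]
n = len(ranks_1)

# Difference D between each pair of ranks, squared and summed
sum_d_squared = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))

# Spearman Rank Correlation Coefficient
r_s = 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

print(f"sum of D squared = {sum_d_squared}, r_s = {r_s:.2f}")
```

The differences are 0, −0.5, 1 and −0.5, so the sum of the squared differences is 1.5 and r_s = 1 − 9/60 = 0.85, a positive correlation.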
Spearman Rank
• Compare rs with the critical value at the 5% level
• If it is greater than the critical value (ignoring the sign) then we reject the null hypothesis
• … there is a significant correlation between the two sets of data
• If the value is positive there is a positive correlation
• If it is negative then there is a negative correlation
Quick guide
Is your data interval data, or is it categoric data (data that can only be placed in a number of categories)?
Interval / Categoric
Quick guide
Are you looking for a correlation between two sets of data, e.g. the rate of photosynthesis and light intensity?
Yes / No
Quick guide
Use the Chi squared test
Quick guide
Use the Spearman Rank test
Quick guide
Are you comparing data from two populations?
Yes / No
Quick guide
Is your data normally distributed?
Yes / No
[Figure: example distribution curve]
Quick guide
Use a t-test
Quick guide
Use a Mann-Whitney U test
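The quick guide can be read as a small decision tree. The sketch below is one interpretation: the order of the branches is inferred from the order of the slides, and the function name `choose_test` and its parameters are hypothetical.

```python
def choose_test(categoric, looking_for_correlation=False,
                comparing_two_populations=False, normally_distributed=False):
    """Suggest a test by following the questions in the quick guide."""
    if categoric:
        return "Chi squared test"
    if looking_for_correlation:
        return "Spearman Rank test"
    if comparing_two_populations:
        return "t-test" if normally_distributed else "Mann-Whitney U test"
    # Assumption: the guide gives no recommendation for this branch
    return "no test suggested by this guide"

print(choose_test(categoric=True))
print(choose_test(categoric=False, looking_for_correlation=True))
print(choose_test(categoric=False, comparing_two_populations=True,
                  normally_distributed=True))
print(choose_test(categoric=False, comparing_two_populations=True))
```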