Statistics for biologists, colstons

BIOLOGY


Introduction

• Biological studies deal with organisms which show variety

• We cannot rely on a single measurement and so we must take a sample

• This sample of data must be summarised and analyzed to find out if it is reliable


Summarising data

• MEAN Sum of the sample values ÷ sample size: $\bar{x} = \sum x \div n$

• MEDIAN Middle number in a list when arranged in rank order: 2, 5, 7, 7, 8, 23, 31 (median = 7)

• MODE The measurement which occurs most frequently: 2, 5, 7, 7, 8, 23, 31 (mode = 7)
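
The three summary values can be checked with a short Python sketch. It uses the list from the slide and the standard `statistics` module; the variable names are only illustrative.

```python
import statistics

values = [2, 5, 7, 7, 8, 23, 31]  # the list used on the slide

mean = statistics.mean(values)      # sum of the values ÷ sample size
median = statistics.median(values)  # middle value when ranked in order
mode = statistics.mode(values)      # most frequently occurring value

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

Note that the mean is pulled upwards by the two large values (23 and 31), while the median and mode are not.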


Distribution Curves

• A visual summary of data

• They can be produced as follows:

1. Collect data

2. Split results into equal size classes

3. Make a tally chart

4. Plot a histogram of frequency against size class

• Data can show normal distribution or skewed distribution
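
The four steps above can be sketched in Python. The measurements and the class width of 5 are invented for illustration; they are not from the presentation.

```python
from collections import Counter

# Step 1: collect data (hypothetical measurements, e.g. leaf lengths in mm)
data = [12, 15, 17, 18, 21, 22, 22, 23, 25, 26, 27, 31, 33, 38]

# Step 2: split results into equal size classes (10-14, 15-19, ...)
class_width = 5

# Step 3: tally chart, keyed by the lower bound of each class
tally = Counter((x // class_width) * class_width for x in data)

# Step 4: plot frequency against size class (here as a text histogram)
for lower in sorted(tally):
    upper = lower + class_width - 1
    print(f"{lower:2d}-{upper:2d} | {'#' * tally[lower]}")
```

With real data the shape of this histogram suggests whether the distribution is roughly symmetrical or skewed.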


Distribution curves

• Normal distribution

• Symmetrical bell-shaped curve around the mean

• Use parametric tests to analyse data

[Figure: histogram showing a normal distribution]


Distribution curves

• Skewed data

• Asymmetrical curve around the mode

• Use non-parametric tests to analyse data

[Figure: histogram showing a skewed distribution]


Standard Deviation

• Standard deviation (SD) is a measure of the spread of the data

[Figure: two distribution curves, one labelled Large SD and one labelled Small SD]

Standard deviation

• A high SD indicates data which shows great variation from the mean

• A low SD indicates data which shows little variation from the mean value

• For normally distributed data, about 68% of all data values lie within the range mean ± 1 SD

• About 95% of all values lie within mean ± 2 SD
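
The 68% and 95% figures can be checked against the theoretical normal curve. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` is available.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, SD 1

# Fraction of the area under the curve within 1 SD and 2 SD of the mean
within_1sd = z.cdf(1) - z.cdf(-1)
within_2sd = z.cdf(2) - z.cdf(-2)

print(f"within 1 SD: {within_1sd:.1%}")
print(f"within 2 SD: {within_2sd:.1%}")
```

The values are close to 68% and 95%; the slide figures are the usual rounded versions.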



SD and confidence limits

[Figure: normal distribution curve with the 68% and 95% ranges marked]

Calculating SD

• Can only be used for normally distributed data

• Calculate as follows:

– Sum the values for $x^2$, i.e. $\sum x^2$

– Sum the values for $x$, then square the total, i.e. $(\sum x)^2$

– Divide $(\sum x)^2$ by $n$

– Subtract this from $\sum x^2$ and divide the result by $n$

– Take the square root of this (see hand-out)


Calculating SD


$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n}}$$
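
A minimal Python sketch of this calculation, following the steps on the slide one by one. The data values are hypothetical, and the function name `sd` is only illustrative. Because this version divides by n, it should agree with `statistics.pstdev`; many textbooks divide by n − 1 for a sample instead.

```python
import math
import statistics

def sd(values):
    """Standard deviation using the slide formula (divides by n)."""
    n = len(values)
    sum_x2 = sum(x * x for x in values)   # sum of the x squared values
    sum_x_squared = sum(values) ** 2      # (sum of x), then squared
    return math.sqrt((sum_x2 - sum_x_squared / n) / n)

# Hypothetical measurements, invented for illustration
heights = [150, 160, 165, 170, 170, 175, 180, 190]

s = sd(heights)
print(f"SD = {s:.2f}")
print(f"statistics.pstdev gives {statistics.pstdev(heights):.2f}")
```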

Confidence limits

• 95% of all values lie within 2SD of the mean

• Any value which lies outside this range is said to be significantly different from the others

• We say that we are working to 95% confidence limits or to a 5% significance level.
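
As a sketch of how the 2 SD rule might be applied, the code below flags any value outside mean ± 2 SD. The data are invented for illustration, and the SD is calculated by dividing by n (`statistics.pstdev`), which is an assumption about the method intended here.

```python
import statistics

# Hypothetical measurements with one unusually large value
data = [48, 50, 51, 49, 52, 50, 47, 53, 50, 70]

mean = statistics.mean(data)
sd = statistics.pstdev(data)  # SD dividing by n

lower = mean - 2 * sd
upper = mean + 2 * sd

# Values outside the 95% confidence limits
outside = [x for x in data if x < lower or x > upper]

print(f"95% limits: {lower:.1f} to {upper:.1f}")
print("outside the limits:", outside)
```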


Comparison tests

• To compare two samples of data we look at the overlap between the two distribution curves.

• This depends on:

– The distance between the two mean values

– The spread of each sample (standard deviation)

• The greater the overlap, the more similar the two samples are.


Comparison tests


[Figure: two distribution curves, Sample 1 and Sample 2, with each mean marked and the overlap between them labelled]

Comparison tests


[Figure: Sample 1 and Sample 2 curves with the overlap labelled]

When the SD is small, the overlap is less.

The null hypothesis

• In order to compare two sets of data we must first assume that there is no difference between them.

• This is called the null hypothesis

• We must also produce an alternative hypothesis which states that there is a difference.


The t-test

• Used to compare the overlap of two sets of data

• Samples must show normal distribution

• Sample size (n) should be greater than 30

• This tests for differences between two sets of data


The t-test

• To calculate t:

– Check data is normally distributed by drawing a tally chart

– Work out the difference in means $|\bar{x}_1 - \bar{x}_2|$

– For each set of data, calculate the variance divided by the sample size (this is $s^2 \div n$)

– Put these into the equation for t:


The t-test


$$t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
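
The equation for t can be sketched in Python as follows. The function name `t_statistic` and the two samples are hypothetical; the samples are kept very short for readability, well below the sample size recommended on the earlier slide. The variance here divides by n (`statistics.pvariance`) to match the SD formula given earlier, which is an assumption; many textbooks use n − 1.

```python
import math
import statistics

def t_statistic(sample1, sample2):
    """t = |mean1 - mean2| / sqrt(s1^2/n1 + s2^2/n2)."""
    n1, n2 = len(sample1), len(sample2)
    diff = abs(statistics.mean(sample1) - statistics.mean(sample2))
    var1 = statistics.pvariance(sample1)  # s1 squared, dividing by n
    var2 = statistics.pvariance(sample2)  # s2 squared, dividing by n
    return diff / math.sqrt(var1 / n1 + var2 / n2)

# Hypothetical samples, invented for illustration
sample1 = [1, 2, 3, 4, 5]
sample2 = [3, 4, 5, 6, 7]

t = t_statistic(sample1, sample2)
degrees_of_freedom = len(sample1) + len(sample2) - 2
print(f"t = {t:.3f} with {degrees_of_freedom} degrees of freedom")
```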

The t-test

• Compare the value of t with the critical value at $n_1 + n_2 - 2$ degrees of freedom

• Use a probability value of 5%

• If t is greater than the critical value we can reject the null hypothesis…

• … there is a significant difference between the two sets of data

• … there is less than a 5% probability that a difference this large is due to chance alone

Mann-Whitney U-test

• Compares two sets of data

• Data can be skewed

• Sample size can be small; 5<n<30

• For details refer to stats book


Chi squared

• Some data is categoric

• This means that it belongs to one or more categories

• Examples include

– eye colour

– presence or absence data

– texture of seeds

• For these we use a chi squared test ($\chi^2$)

• This tests for an association between two or more variables

Chi squared

• Draw a contingency table

• These are the observed values

                Blue eyes   Green eyes   Row totals
Fair hair       a           b            a+b
Ginger hair     c           d            c+d
Column totals   a+c         b+d          a+b+c+d

Chi squared

• Now work out the expected values:

• Where

$$E = \frac{(\text{Row total}) \times (\text{Column total})}{\text{Grand total}}$$

Chi squared

                Blue eyes              Green eyes             Row totals
Fair hair       (a+b)(a+c)/(a+b+c+d)   (a+b)(b+d)/(a+b+c+d)   a+b
Ginger hair     (c+d)(a+c)/(a+b+c+d)   (c+d)(b+d)/(a+b+c+d)   c+d
Column totals   a+c                    b+d                    a+b+c+d

Chi squared

• For each box work out $(O - E)^2 \div E$

• Find the sum of these to get $\chi^2$

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
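
A minimal Python sketch of the calculation for a 2 × 2 contingency table. The observed counts are invented for illustration; rows stand for hair colour and columns for eye colour, as in the table on the slides.

```python
# Hypothetical observed counts: rows = fair, ginger; columns = blue, green
observed = [
    [30, 10],
    [15, 25],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected value for each box: (row total x column total) / grand total
expected = [
    [r * c / grand_total for c in column_totals]
    for r in row_totals
]

# Chi squared: sum of (O - E)^2 / E over every box
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print("expected values:", expected)
print(f"chi squared = {chi_squared:.2f}")
```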

Chi squared

• Compare $\chi^2$ with the critical value at the 5% significance level

• There will be (no. rows – 1) x (no. columns – 1) degrees of freedom

• If $\chi^2$ is greater than the critical value we can say that the variables are associated with one another in some way

• We reject the null hypothesis
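
The comparison step might be sketched as below. The critical values are the standard 5% values for 1 to 4 degrees of freedom from a chi squared table; the function names are hypothetical.

```python
# Standard chi squared critical values at the 5% level, keyed by degrees of freedom
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def degrees_of_freedom(n_rows, n_columns):
    """(no. rows - 1) x (no. columns - 1)."""
    return (n_rows - 1) * (n_columns - 1)

def is_significant(chi_squared, n_rows, n_columns):
    """True if chi squared exceeds the 5% critical value, so the null hypothesis is rejected."""
    df = degrees_of_freedom(n_rows, n_columns)
    return chi_squared > CRITICAL_5_PERCENT[df]

# A 2 x 2 table has 1 degree of freedom, so the critical value is 3.841
print(is_significant(11.43, 2, 2))
print(is_significant(2.00, 2, 2))
```

For larger tables the dictionary would need extending from a printed table of critical values.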

Spearman Rank

• Two sets of data may show a correlation

• The data can be plotted on a scatter graph:

[Figure: three scatter graphs showing positive correlation, no correlation and negative correlation]

Spearman Rank

• We calculate the correlation by assigning a rank to the values:

Data 1   Rank      Data 2   Rank
12                 24
14                 29
18                 29
18                 38

Spearman Rank


Data 1   Rank      Data 2   Rank
12       1         24
14                 29
18                 29
18                 38

The value 12 is the lowest value in Data 1, so we call it rank 1

Spearman Rank


Data 1   Rank      Data 2   Rank
12       1         24
14       2         29
18                 29
18                 38

The value 14 is the 2nd lowest value in Data 1, so we call it rank 2

Spearman Rank


Data 1   Rank      Data 2   Rank
12       1         24
14       2         29
18       ?         29
18       ?         38

The two values of 18 should be ranks 3 and 4, but they are the same. We find the average of 3 and 4 and give them both this rank

Spearman Rank


Data 1   Rank      Data 2   Rank
12       1         24
14       2         29
18       3.5       29
18       3.5       38

(3+4)/2 = 3.5

Spearman Rank


Data 1   Rank      Data 2   Rank
12       1         24
14       2         29
18       3.5       29
18       3.5       38

Similarly for Data 2

Spearman Rank


Data 1   Rank      Data 2   Rank
12       1         24       1
14       2         29
18       3.5       29
18       3.5       38

Spearman Rank


Data 1   Rank      Data 2   Rank
12       1         24       1
14       2         29       2.5
18       3.5       29       2.5
18       3.5       38

The two values of 29 get the average of ranks 2 and 3

Spearman Rank


Data 1   Rank      Data 2   Rank
12       1         24       1
14       2         29       2.5
18       3.5       29       2.5
18       3.5       38       4
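
The ranking rule shown on these slides, where tied values share the average of the ranks they would have occupied, can be sketched in Python. The function name `average_ranks` is hypothetical; the data are the values from the slides.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1), giving tied values their average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first_position = ordered.index(v) + 1  # lowest rank the value would take
        count = ordered.count(v)               # how many times the value occurs
        rank_of[v] = first_position + (count - 1) / 2
    return [rank_of[v] for v in values]

data1 = [12, 14, 18, 18]
data2 = [24, 29, 29, 38]

print(average_ranks(data1))
print(average_ranks(data2))
```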

Spearman Rank

• Find the difference D between each rank

• Square this difference

• Sum the $D^2$ values

• Calculate the Spearman Rank Correlation Coefficient $r_s$

$$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$
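
As a worked sketch, the formula can be applied to the ranks from the earlier slides. This assumes the two data sets are paired row by row (12 with 24, 14 with 29, and so on), which the slides do not state explicitly. The function name `spearman_rs` is hypothetical.

```python
def spearman_rs(ranks1, ranks2):
    """Spearman rank coefficient: 1 - 6 * sum(D^2) / (n * (n^2 - 1))."""
    n = len(ranks1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks1, ranks2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks taken from the final ranking slide
ranks_data1 = [1, 2, 3.5, 3.5]
ranks_data2 = [1, 2.5, 2.5, 4]

rs = spearman_rs(ranks_data1, ranks_data2)
print(f"rs = {rs:.2f}")
```

Here the differences D are 0, −0.5, 1 and −0.5, so the sum of $D^2$ is 1.5 and $r_s$ = 1 − 9/60 = 0.85. With only four pairs this is purely illustrative.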

Spearman Rank

• Compare $r_s$ with the critical value at the 5% level

• If it is greater than the critical value (ignoring the sign) then we reject the null hypothesis

• … there is a significant correlation between the two sets of data

• If the value is positive there is a positive correlation

• If it is negative then there is a negative correlation

Quick guide

Is your data interval data, or is it categoric data (data that can only be placed in a number of categories)?

Interval   Categoric

Quick guide

Are you looking for a correlation between two sets of data, e.g. the rate of photosynthesis and light intensity?

Yes   No

Quick guide

Use the Chi squared test


Quick guide

Use the Spearman Rank test


Quick guide

Are you comparing data from two populations?

Yes   No

Quick guide

Is your data normally distributed?

Yes   No

[Figure: histogram showing a normal distribution]

Quick guide

Use a t-test


Quick guide

Use a Mann-Whitney U test
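
The quick guide can be read as a small decision tree. The sketch below is one interpretation of how the questions link together, inferred from the order of the slides rather than stated in them; the function and parameter names are hypothetical.

```python
def choose_test(categoric,
                looking_for_correlation=False,
                comparing_two_populations=False,
                normally_distributed=False):
    """Suggest a test by following the quick guide questions in order."""
    if categoric:
        return "Chi squared test"
    if looking_for_correlation:
        return "Spearman Rank test"
    if comparing_two_populations:
        if normally_distributed:
            return "t-test"
        return "Mann-Whitney U test"
    return None  # the quick guide gives no recommendation for this case

print(choose_test(categoric=True))
print(choose_test(categoric=False, looking_for_correlation=True))
print(choose_test(categoric=False, comparing_two_populations=True,
                  normally_distributed=False))
```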

