1 measures of variability chapter 5 of howell (except 5.3 and 5.4) people are all slightly different...

1

Measures of Variability

• Chapter 5 of Howell (except 5.3 and 5.4)

• People are all slightly different (that’s what makes it fun)
  • Not everyone scores the same on the same scale

• This is interesting for us - we must take it into account
  • The variation tells us about the people we studied

2

Example of variability

• Imagine this variable:
  • 5 7 3 8 2 2 9 1 9 3

• The mean is 4.9
  • We sort of expect 4.9 to be representative of the scores, but:

[Figure: histogram of the first sample; x-axis: score (1 to 9), y-axis: frequency]

The data is at the edges - not at all close to 4.9!

3

A second sample:

• Look at this one:
  • 4 4 4 5 5 5 5 6 6

• The mean is also about 4.9 (44 / 9 ≈ 4.89)

• But the distribution:

[Figure: histogram of the second sample; x-axis: score (1 to 9), y-axis: frequency]

Same mean as before, but the numbers are very clustered close to the mean!

4

How do we explain this difference?

• Both have the same mean
  • The mean obviously doesn’t tell the whole story!

• What is the actual difference between those data sets?
  • The left one is more “spread out” than the one on the right

[Figure: the two histograms side by side; left: first sample (spread out), right: second sample (clustered)]
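The comparison between the two samples can be reproduced with a short sketch. The helper name `frequency_table` is made up for this illustration; it simply counts how often each score occurs, which is what the histogram bars show.

```python
from collections import Counter
from statistics import mean

# The two samples from the earlier slides.
sample_a = [5, 7, 3, 8, 2, 2, 9, 1, 9, 3]
sample_b = [4, 4, 4, 5, 5, 5, 5, 6, 6]


def frequency_table(scores):
    """Count how often each score from 1 to 9 occurs (the histogram bars)."""
    counts = Counter(scores)
    return {value: counts.get(value, 0) for value in range(1, 10)}


mean_a = mean(sample_a)
mean_b = mean(sample_b)
freq_a = frequency_table(sample_a)
freq_b = frequency_table(sample_b)

# Both means round to 4.9, but the frequency tables look very different.
print(f"mean A = {mean_a:.1f}, mean B = {mean_b:.1f}")
print("A:", freq_a)
print("B:", freq_b)
```

Sample A has scores scattered from 1 to 9, while every score in sample B falls between 4 and 6.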

5

Variability

• Measures of variability capture this “spreadness” of the data
  • Not applicable to nominal variables

• Various ways to measure it

• How far does the data stretch?

• How far, on average, is it spread from the mean?

6

Extents of the data - the range

• The range is the total width of the data

• Consider x, with a sample
  • 7 4 3 4 5 6 3

• These values range all the way from 3 (the smallest value) to 7 (the biggest value) - its range is 4

• Easy to calculate:
  • $\text{range}_x = \max(x) - \min(x)$

• (the largest value of x minus the smallest value of x)

• A high range value means the data is very spread out

7

Example: calculating the range

• Calculate the range for x, from the sample:
  • 26 28 32 15 25 12

• Step 1 - find the largest value of x
  • In this sample, it is 32

• Step 2 - find the smallest value of x
  • In this sample, it is 12

• Step 3 - biggest minus smallest
  • 32 - 12 = 20

• The range is 20
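The three steps can be sketched in Python as a quick check. The function name `value_range` is an assumption made for this example; Python's built-in `max` and `min` do the actual work.

```python
def value_range(scores):
    """Return the range: the largest value minus the smallest value."""
    return max(scores) - min(scores)


x = [26, 28, 32, 15, 25, 12]

largest = max(x)                 # Step 1: largest value of x
smallest = min(x)                # Step 2: smallest value of x
range_x = largest - smallest     # Step 3: biggest minus smallest

print(largest, smallest, range_x)
```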

8

Why the range is cool / why it sucks

• Gives an idea of how far the data is spread
  • A higher range number means the data is more spread apart

• Can compare various samples’ ranges to see which is spread the most

• But: can’t distinguish between these two samples (both have range = 10)

[Figure: two histograms side by side, both with x-axis values 1 to 11 and a range of 10; the right one is more clustered]

9

A better idea of variation

• The right histogram shows more clustering, but has a few values which “throw off” the range
  • Range can be fooled by “extreme values” - outliers

• There exist better measures which are “outlier proof”
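A small sketch of how outliers fool the range. The scores below are hypothetical, invented only for this illustration (they are not the data behind the histograms), and `value_range` is a made-up helper name.

```python
def value_range(scores):
    """Range = largest value minus smallest value."""
    return max(scores) - min(scores)


# Hypothetical scores, invented for illustration only.
clustered = [5, 5, 6, 6, 6, 6, 7, 7]

# The same scores with one extreme value added at each end.
with_outliers = [1] + clustered + [11]

range_clustered = value_range(clustered)
range_with_outliers = value_range(with_outliers)

print(range_clustered, range_with_outliers)
```

Only two of the ten scores changed, yet the range jumps from 2 to 10, even though most of the data is still tightly clustered.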

10

Outlier proofing - Variance

• The variance presents a better measure of data spread
  • Not as easily influenced by outliers

• Variance is based on the average squared distance of the scores from the mean

• It is not on the variable’s scale
  • The variance is not in the same units as the variable

• Still useful - bigger values mean more spread

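11

Calculating variance (brace yourself)

• Variance is calculated using a formula:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Variance is (approximately) the mean of the squared deviations of the observations from the mean; the sum is divided by n - 1 rather than n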

12

Calculating variance (in English)

• Easy if broken down into 5 small steps!

• Step 1: Work out the mean of x, and n

• Step 2: For each data point, work out the deviation (x minus the mean of x)

• Step 3: For each data point, square the deviations you got above

• Step 4: Add all the squared deviations together

• Step 5: Divide your sum by n minus 1
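The five steps map directly onto a few lines of Python. This is a minimal sketch; the function name `sample_variance` and the tiny data set `[2, 4, 6]` are assumptions made for illustration.

```python
def sample_variance(scores):
    """Sample variance, following the five steps one at a time."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two data points")

    mean_x = sum(scores) / n                       # Step 1: mean of x, and n
    deviations = [x - mean_x for x in scores]      # Step 2: x minus the mean
    squared = [d ** 2 for d in deviations]         # Step 3: square each deviation
    total = sum(squared)                           # Step 4: add them together
    return total / (n - 1)                         # Step 5: divide by n minus 1


# Hypothetical data: mean is 4, deviations are -2, 0, 2,
# squared deviations are 4, 0, 4, their sum is 8, and 8 / 2 = 4.
example = sample_variance([2, 4, 6])
print(example)
```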

13

Example: working out s²

• Work out the variance for x, based on the sample:
  • 16, 12, 15, 14, 20

• By the numbers!

• Step 1: work out the mean and n
  • n is 5
  • 16+12+15+14+20 = 77
  • 77 / 5 = 15.4
  • The mean is 15.4


• For the remaining steps, make yourself a table with three columns:
  • x, x - x̄, (x - x̄)²
  • Each column is a step - fill in one at a time

• Step 2: Work out the deviation (x minus mean of x)

x     x - x̄    (x - x̄)²
16     0.6
12    -3.4
15    -0.4
14    -1.4
20     4.6

16

Example: working the variance

• Step 3: Square the deviations (column 2 times column 2)

x     x - x̄    (x - x̄)²
16     0.6      0.36
12    -3.4     11.56
15    -0.4      0.16
14    -1.4      1.96
20     4.6     21.16

17

Example: working the variance

• Step 4: sum the squared deviations
  • 0.36+11.56+0.16+1.96+21.16 = 35.2

• Step 5: divide the sum by (n-1)
  • n = 5
  • n-1 = 4
  • 35.2 / 4 = 8.8

• The variance of this data set is 8.8
  • Simple, but tedious!
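The hand calculation can be checked against Python's standard library. This sketch assumes the `statistics` module is an acceptable stand-in for the hand method: `statistics.variance` divides by n - 1 (as in Step 5), while `statistics.pvariance` divides by n.

```python
from statistics import pvariance, variance

x = [16, 12, 15, 14, 20]

s_squared = variance(x)        # sample variance: divides by n - 1
divide_by_n = pvariance(x)     # population variance: divides by n

print(round(s_squared, 2))
print(round(divide_by_n, 2))
```

The first value matches the 8.8 worked out by hand; the second is smaller (35.2 / 5) because it divides by n instead of n - 1.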

18

Variance: The bad news

• Variance is a good measure of spread, but it is in odd units
  • A bigger number means more spread, but the number itself means very little

• Because we square in the formula, we cause the numbers to lose their scale

• The variance of an IQ scale is not in IQ points

• Would be nice to have a measure of variation which is in the correct units!

19

The Standard Deviation

• The standard deviation is a measure of variation
  • Has all the good properties of the variance

• PLUS it is in the same scale as the variable
  • Standard deviation of IQ scores is expressed in IQ points

• Gives an intuitive understanding of how far apart the scores truly are spread

– “Scores were centered at 100 and spread by 15”

20

Calculating the standard deviation

• Very simple formula:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

• To work it out, calculate variance and then take its square root

21

Example: working out s

• Work out the standard deviation for x, based on the sample:
  • 16, 12, 15, 14, 20

• Step 1: Work out the variance
  • s² = 8.8 (from the previous example)

• Step 2: find the square root
  • √8.8 = 2.966

• The standard dev is 2.966
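The two steps can be sketched in Python. As a cross-check, the sketch also calls `statistics.stdev`, which computes the sample standard deviation (dividing by n - 1) directly from the raw scores; the variable names are chosen just for this example.

```python
import math
from statistics import stdev

x = [16, 12, 15, 14, 20]

s_squared = 8.8               # Step 1: the variance from the previous example
s = math.sqrt(s_squared)      # Step 2: take the square root

s_check = stdev(x)            # same quantity, computed from the raw scores

print(round(s, 3))
print(round(s_check, 3))
```

Squaring `s` again gives back the variance, which is the relationship the next slide describes.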

22

Variance and standard deviation

• If you have variance, it is easy to work out standard deviation
  • Square root the variance

• If you have the standard deviation, it is easy to work out the variance
  • Square it

23

Using the standard deviation with the mean

• By looking at the mean and std deviation at the same time, we can get a good idea of a variable:

[Figure: two histograms, labelled A and B (x-axis: 1 to 8)]

First histogram - Mean: 5.35, Std dev: 1.008

Second histogram - Mean: 5.35, Std dev: 2.3

24

Understanding distributions

• The mean tells us the “middle” of the distribution

• The standard dev tells us the “spreadness” of the data

• From this we can derive a lot
  • A low std dev means that everyone scored almost the same

• A high std dev tells you there was a lot of disagreement
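To see the mean and the standard deviation working together, here is a small sketch that summarises the two samples from the start of the lecture. The helper name `describe` is an assumption for this example; `statistics.stdev` is the sample standard deviation (n - 1 in the denominator).

```python
from statistics import mean, stdev

# The two samples from the start of the lecture.
sample_a = [5, 7, 3, 8, 2, 2, 9, 1, 9, 3]
sample_b = [4, 4, 4, 5, 5, 5, 5, 6, 6]


def describe(scores):
    """Return (mean, standard deviation) for a list of scores."""
    return mean(scores), stdev(scores)


mean_a, sd_a = describe(sample_a)
mean_b, sd_b = describe(sample_b)

print(f"A: mean = {mean_a:.2f}, std dev = {sd_a:.2f}")
print(f"B: mean = {mean_b:.2f}, std dev = {sd_b:.2f}")
```

The means are nearly identical, but sample A's standard deviation is several times larger than sample B's, which matches the "spread out" versus "clustered" histograms.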
