Measures of Variability
• Chapter 5 of Howell (except 5.3 and 5.4)
• People are all slightly different (that’s what makes it fun)
  – Not everyone scores the same on the same scale
• This is interesting for us, and we must take it into account
  – The variation tells us about the people we studied
Example of variability
• Imagine this variable:
  – 5 7 3 8 2 2 9 1 9 3
• The mean is 4.9 (49 / 10)
  – We sort of expect 4.9 to be representative of the scores, but:
[Figure: histogram of the first sample, frequency of each score from 1 to 9]
Most of the data sit at the edges, not close to 4.9!
A second sample:
• Look at this one:
  – 4 4 4 5 5 5 5 6 6
• The mean is also about 4.9 (44 / 9 ≈ 4.89)
• But the distribution:
[Figure: histogram of the second sample, frequency of each score from 1 to 9]
Almost the same mean as before, but the numbers are tightly clustered around the mean!
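As a quick check, here is a minimal Python sketch (using only the standard-library `statistics` module) showing that both samples have a mean of about 4.9 even though their histograms look very different:

```python
from statistics import mean

# The two samples from the slides above
sample_1 = [5, 7, 3, 8, 2, 2, 9, 1, 9, 3]
sample_2 = [4, 4, 4, 5, 5, 5, 5, 6, 6]

print(round(mean(sample_1), 2))  # 4.9  (49 / 10)
print(round(mean(sample_2), 2))  # 4.89 (44 / 9)
```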
How do we explain this difference?
• Both have (about) the same mean
  – so the mean obviously doesn’t tell the whole story!
• What is the actual difference between those data sets?
  – The left one is more “spread out” than the one on the right
[Figure: the two histograms side by side, first sample on the left, second sample on the right]
Variability
• Measures of variability capture this “spreadness” of the data
  – not applicable to nominal variables
• Various ways to measure it
• How far does the data stretch?
• How far, on average, is it spread from the mean?
Extents of the data: the range
• The range is the total width of the data
• Consider x, with the sample
  – 7 4 3 4 5 6 3
• These values range all the way from 3 (the smallest value) to 7 (the biggest value), so its range is 7 - 3 = 4
• Easy to calculate:
  – $\text{range}_x = \max(x) - \min(x)$
• (the largest value of x minus the smallest value of x)
• A high range value means the data are very spread out
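The range formula translates directly into code. This is a minimal Python sketch; the function name `value_range` is hypothetical (chosen so it does not shadow Python's built-in `range`):

```python
def value_range(values):
    """Range of a sample: largest value minus smallest value."""
    if not values:
        raise ValueError("the range is undefined for an empty sample")
    return max(values) - min(values)

x = [7, 4, 3, 4, 5, 6, 3]
print(value_range(x))  # 4
```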
Example: calculating the range
• Calculate the range for x, from the sample:
  – 26 28 32 15 25 12
• Step 1: find the largest value of x
  – in this sample, it is 32
• Step 2: find the smallest value of x
  – in this sample, it is 12
• Step 3: biggest minus smallest
  – 32 - 12 = 20
• The range is 20
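The same three steps, written one per line as a minimal Python sketch so each intermediate value can be checked against the worked example:

```python
x = [26, 28, 32, 15, 25, 12]

largest = max(x)                   # Step 1: largest value of x
smallest = min(x)                  # Step 2: smallest value of x
sample_range = largest - smallest  # Step 3: biggest minus smallest

print(largest, smallest, sample_range)  # 32 12 20
```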
Why the range is cool / why it sucks
• Gives an idea of how far spread the data are
  – a higher range means the data are more spread apart
• Can compare various samples’ ranges to see which is spread the most
• But: can’t distinguish between these two samples (both have range = 10)
[Figure: two histograms of scores from 1 to 11, left and right, both with range = 10]
A better idea of variation
• The right histogram shows more clustering, but has a few values which “throw off” the range
  – The range can be fooled by “extreme values” (outliers)
• There exist better measures that use all of the scores, so they are less easily fooled by a few extreme values
Outlier proofing: variance
• The variance presents a better measure of data spread
  – less easily thrown off by a few outliers than the range (though, because deviations are squared, outliers still pull it up)
• Variance is based on the average squared distance of the scores from the mean
• It is not on the variable’s scale
  – the variance is not in the same units as the variable (it is in squared units)
• Still useful - bigger values mean more spread
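To see why variance helps where the range fails, here is a minimal Python sketch with two hypothetical samples (not the data behind the histograms above). Both run from 1 to 11, so both have a range of 10, but `statistics.variance` (which divides by n − 1) tells them apart: 11 for the evenly spread sample, 5 for the clustered one.

```python
from statistics import variance

# Hypothetical samples: both have range 10 (from 1 to 11)
spread_out = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
clustered = [1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 11]

for name, xs in [("spread_out", spread_out), ("clustered", clustered)]:
    print(name, "range:", max(xs) - min(xs), "variance:", variance(xs))
```

The range sees only the two extreme values; the variance uses every score, so the clustered sample gets the smaller value.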
Calculating variance (brace yourself)
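• Variance is calculated using a formula:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Variance is (almost) the mean of the squared deviations of the observations from their mean: the sum is divided by n − 1 rather than n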
Calculating variance (in English)
• Easy if broken down into 5 small steps!
• Step 1: Work out the mean of x, and n (the number of scores)
• Step 2: For each data point, work out the deviation (x minus the mean of x)
• Step 3: For each data point, square the deviations you got above
• Step 4: Add all the squared deviations together
• Step 5: Divide your sum by n minus 1
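The five steps map one-to-one onto a short function. A minimal Python sketch (the name `sample_variance` is hypothetical), with each step marked in a comment:

```python
def sample_variance(values):
    """Sample variance s^2, computed with the five steps above."""
    # Step 1: the mean of x, and n
    n = len(values)
    if n < 2:
        raise ValueError("need at least two scores")
    mean_x = sum(values) / n
    # Step 2: deviation of each score from the mean
    deviations = [x - mean_x for x in values]
    # Step 3: square each deviation
    squared = [d ** 2 for d in deviations]
    # Step 4: add the squared deviations together
    total = sum(squared)
    # Step 5: divide the sum by n - 1
    return total / (n - 1)

print(round(sample_variance([16, 12, 15, 14, 20]), 2))  # 8.8
```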
Example: working out $s^2$
• Work out the variance for x, based on the sample:
  – 16, 12, 15, 14, 20
• By the numbers!
• Step 1: work out the mean and n
  – n is 5
  – 16 + 12 + 15 + 14 + 20 = 77
  – 77 / 5 = 15.4
  – The mean is 15.4
Example: working out $s^2$
• For the remaining steps, make yourself a table:
• Make three columns: $x$, $x - \bar{x}$ and $(x - \bar{x})^2$
  – Each column is a step; fill in one at a time
Example: working out $s^2$
• Step 2: Work out the deviation (x minus mean of x)
$x$    $x - \bar{x}$    $(x - \bar{x})^2$
16      0.6
12     -3.4
15     -0.4
14     -1.4
20      4.6
Example: working out the variance
• Step 3: Square the deviations (multiply each value in column 2 by itself)
$x$    $x - \bar{x}$    $(x - \bar{x})^2$
16      0.6      0.36
12     -3.4     11.56
15     -0.4      0.16
14     -1.4      1.96
20      4.6     21.16
Example: working out the variance
• Step 4: sum the squared deviations
  – 0.36 + 11.56 + 0.16 + 1.96 + 21.16 = 35.2
• Step 5: divide the sum by (n - 1)
  – n = 5
  – n - 1 = 4
  – 35.2 / 4 = 8.8
• The variance of this data set is 8.8
  – Simple, but tedious!
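The worked table can be reproduced row by row and then checked against the standard library's `statistics.variance`, which uses the same n − 1 divisor. A minimal Python sketch:

```python
from statistics import mean, variance

x = [16, 12, 15, 14, 20]
x_bar = mean(x)  # 15.4

# One row per score: x, deviation, squared deviation (rounded for display)
for value in x:
    deviation = value - x_bar
    print(value, round(deviation, 2), round(deviation ** 2, 2))

sum_sq = sum((value - x_bar) ** 2 for value in x)
print(round(sum_sq, 2))                  # 35.2
print(round(sum_sq / (len(x) - 1), 2))   # 8.8
print(variance(x))                       # standard-library check: 8.8
```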
Variance: The bad news
• Variance is a good measure of spread, but it is in odd units
  – A bigger number means more spread, but the number itself means very little
• Because we square the deviations in the formula, the result is in squared units and loses the variable’s original scale
• The variance of an IQ scale is not in IQ points (it is in squared IQ points)
• Would be nice to have a measure of variation which is in the correct units!
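The "odd units" problem is easy to demonstrate. In this minimal Python sketch, the rescaling is hypothetical (every score multiplied by 10, as if switching to a unit ten times smaller): the variance grows by a factor of 100, not 10, because it lives in squared units.

```python
from statistics import variance

scores = [16, 12, 15, 14, 20]
# Hypothetical rescaling: the same scores expressed in units 10 times smaller
scaled = [10 * s for s in scores]

print(variance(scores))  # 8.8
print(variance(scaled))  # 880: 100 times larger, not 10 times
```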
The Standard Deviation
• The standard deviation is a measure of variation
  – Has all the good properties of the variance
• PLUS it is on the same scale as the variable
  – The standard deviation of IQ scores is expressed in IQ points
• Gives an intuitive understanding of how far apart the scores are truly spread
– “Scores were centered at 100 and spread by 15”
Calculating the standard deviation
• Very simple formula:

$$s = \sqrt{s^2}$$

• To work it out, calculate the variance and then take its square root
Example: working out s
• Work out the standard deviation for x, based on the sample:
  – 16, 12, 15, 14, 20
• Step 1: Work out the variance
  – $s^2 = 8.8$ (from the previous example)
• Step 2: find the square root
  – $\sqrt{8.8} \approx 2.966$
  – The standard deviation is 2.966
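The two steps in a minimal Python sketch: take the variance, then its square root with `math.sqrt`, and compare against `statistics.stdev`, which computes the same n − 1 standard deviation directly:

```python
import math
from statistics import stdev, variance

x = [16, 12, 15, 14, 20]

s_squared = variance(x)    # Step 1: the variance (8.8)
s = math.sqrt(s_squared)   # Step 2: its square root
print(round(s, 3))         # 2.966
print(round(stdev(x), 3))  # same result straight from the standard library
```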
Variance and standard deviation
• If you have the variance, it is easy to work out the standard deviation
  – Take the square root of the variance
• If you have the standard deviation, it is easy to work out the variance
  – Square it
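Converting in either direction is a one-liner. A minimal Python sketch; both function names are hypothetical, and the IQ figure reuses the "spread by 15" example above:

```python
import math

def sd_from_variance(var):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(var)

def variance_from_sd(sd):
    """Variance: the standard deviation squared."""
    return sd ** 2

print(variance_from_sd(15))   # 225: an SD of 15 IQ points is a variance of 225
print(sd_from_variance(225))  # 15.0
```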
Using the standard deviation with the mean
• By looking at the mean and std deviation at the same time, we can get a good idea of a variable:
[Figure: two histograms, A and B, of scores from 1 to 8]
A: Mean: 5.35, Std dev: 1.008
B: Mean: 5.35, Std dev: 2.3