Section 1 Topic 3: Summarising metric data: median, IQR, and boxplots
TRANSCRIPT
Section 1 Topic 3 1
Section 1 Topic 3
Summarising metric data: Median, IQR, and boxplots
Section 1 Topic 3 2
Summarising metric data: Median, IQR & Box Plots
Can we describe a distribution with just one or two numbers?
What is the median, how is it calculated and what does it tell us?
What is the interquartile range, how is it calculated and what does it tell us?
What is a five number summary?
What is a box plot and why is it useful?
Section 1 Topic 3 3
Will less than the whole picture do?
Summary statistics
Measures of centre: Median, Mean
Measures of spread: Range, Interquartile Range, Standard Deviation
Section 1 Topic 3 4
Median
3 5 1 4 8
First, numerically order the data set
1 3 4 5 8
50% higher than or equal to median
50% lower than or equal to median
Location of Median = (n+1)/2
= (5+1)/2
= 3rd observation
Notes p.97
For an odd number of data values the median will be one of the data values
1 3 4 5 8
Median = 4
For an even number of data values the median may not coincide with an actual data value
3 4 5 8
Location of Median = (n+1)/2
= (4+1)/2
= 2.5th observation, halfway between the 2nd and 3rd values
Median = (4+5)/2 = 4.5
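The location rule above can be sketched in Python. This is a minimal illustration rather than the method from the notes; the function name `median_by_location` is an assumption made for this example.

```python
import math


def median_by_location(values):
    """Find the median with the (n + 1) / 2 location rule (hypothetical helper)."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)      # first, numerically order the data set
    n = len(ordered)
    location = (n + 1) / 2        # 1-based position of the median
    # For odd n the location is a whole number, so both picks are the same value.
    # For even n it ends in .5, so we average the two neighbouring values.
    lower = ordered[math.floor(location) - 1]
    upper = ordered[math.ceil(location) - 1]
    return (lower + upper) / 2


print(median_by_location([3, 5, 1, 4, 8]))  # location 3   -> 4.0
print(median_by_location([3, 4, 5, 8]))     # location 2.5 -> 4.5
```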
Section 1 Topic 3 6
Limitations: Range
The range depends on only the two extreme values.
Data set 1: 5 6 7 8 9 10 11 12     Range = 12 - 5 = 7
Data set 2: 5 12 12 12 12 12 12 12     Range = 12 - 5 = 7
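A short Python check makes the limitation concrete: the two data sets on this slide are distributed very differently, yet they share the same range. The helper name `value_range` is an assumption for this sketch.

```python
def value_range(values):
    """Range = largest value minus smallest value (hypothetical helper)."""
    return max(values) - min(values)


data_set_1 = [5, 6, 7, 8, 9, 10, 11, 12]
data_set_2 = [5, 12, 12, 12, 12, 12, 12, 12]

# Only the two extreme values matter, so both ranges come out the same.
print(value_range(data_set_1))  # 7
print(value_range(data_set_2))  # 7
```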
Section 1 Topic 3 7
Interquartile range
Quartiles are the points that divide a distribution into quarters
Q1 (25%)   Q2 (50%, the Median)   Q3 (75%)
The interquartile range (IQR) is defined to be the spread of the middle 50% of data values, so that
IQR = Q3 - Q1
Notes p.99
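The definition IQR = Q3 - Q1 can be sketched in Python. The slides do not state which convention is used to locate Q1 and Q3, so this sketch assumes one common choice: Q1 is the median of the lower half of the ordered data and Q3 is the median of the upper half, leaving out the middle value when n is odd. Other conventions can give slightly different quartiles. The function names are assumptions for this example.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]        # values below the median position
    upper_half = ordered[n - half:]    # values above it (middle value skipped if n is odd)
    return median(lower_half), median(ordered), median(upper_half)


def interquartile_range(values):
    """IQR = Q3 - Q1, the spread of the middle 50% of the data."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


print(quartiles([5, 6, 7, 8, 9, 10, 11, 12]))                # (6.5, 8.5, 10.5)
print(interquartile_range([5, 6, 7, 8, 9, 10, 11, 12]))      # 4.0
print(interquartile_range([5, 12, 12, 12, 12, 12, 12, 12]))  # 0.0
```

Under this convention the two data sets from the range slide, which both have a range of 7, have IQRs of 4 and 0.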
Section 1 Topic 3 8
Why is the IQR more useful than the range?
IQR describes the middle 50% of observations.
Upper 25% and lower 25% of observations are discarded.
IQR generally not affected by outliers.
Section 1 Topic 3 9
Picturing quartiles with a histogram
[Figure: histogram with Frequency (0 to 14) on the vertical axis; Q1, Q2 and Q3 divide the data into the bottom 25%, middle 50% and top 25%]
Notes p.97
Section 1 Topic 3 10
Five number summary
Minimum value, Q1, Median, Q3, Maximum value
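A five number summary could be computed along the following lines. This is a sketch under a stated assumption: Q1 and Q3 are taken as the medians of the lower and upper halves of the ordered data (middle value left out when n is odd), which may differ from the convention in the notes. The name `five_number_summary` is hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a data set."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])        # median of the lower half (assumed convention)
    q3 = median(ordered[n - half:])    # median of the upper half (assumed convention)
    return ordered[0], q1, median(ordered), q3, ordered[-1]


# The data set from the median slide: 3 5 1 4 8
print(five_number_summary([3, 5, 1, 4, 8]))  # (1, 2.0, 4, 6.5, 8)
```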
Section 1 Topic 3 11
The Boxplot
Graphical representation of the five number summary
Notes p.98
Section 1 Topic 3 12
Constructing a Boxplot
Notes p.99
Section 1 Topic 3 13
*Exercise 4
Notes p.103
Section 1 Topic 3 14
Relating a boxplot to the shape of the distribution: Symmetric
[Figure: box plot marked Q1, M, Q3]
For a symmetric distribution, the box plot is also symmetric. The median
is in the middle of the box and the whiskers are approximately equal in
length.
Notes p.104
Section 1 Topic 3 15
Positively skewed distributions
[Figure: box plot marked Q1, M, Q3; positive skew]
The box plot of a positively skewed distribution has the median off-centre
and to the left. The left-hand whisker will be short, while the right-hand
whisker will be long, reflecting the gradual tailing off of data values to the
right.
Section 1 Topic 3 16
Negatively skewed distributions
[Figure: box plot marked Q1, M, Q3; negative skew]
The box plot of a negatively skewed distribution has the median off-centre
and to the right. The right-hand whisker will be short, while the left-hand
whisker will be long, reflecting the gradual tailing off of data values to the left.
Section 1 Topic 3 17
Boxplot with outliers
Possible outliers are defined as any values outside of the interval
(Q1 - 1.5 × IQR, Q3 + 1.5 × IQR)
We say "possible" since the point may just be part of the tail of the distribution, but we may not have enough data to be sure.
Notes p.101
Section 1 Topic 3 18
Boxplot with outliers
Min   Q1   M    Q3   Max
38    63   70   75   76
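As a rough check, the 1.5 × IQR rule can be applied to the five number summary on this slide. The raw data are not given, so this sketch only tests the minimum and maximum against the interval. The function name `outlier_fences` is an assumption, as is treating a value lying exactly on a boundary as not an outlier.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the interval (Q1 - k * IQR, Q3 + k * IQR) as a (low, high) pair."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Five number summary from the slide
minimum, q1, med, q3, maximum = 38, 63, 70, 75, 76

low, high = outlier_fences(q1, q3)
print(low, high)        # 45.0 93.0
print(minimum < low)    # True: 38 lies below the interval, so it is a possible outlier
print(maximum > high)   # False: 76 lies inside the interval
```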
Section 1 Topic 3 19
*Exercise 5
Notes p.107