Chapter 3: Dispersion
Statistics and probability
CHAPTER 3:
DATA ANALYSIS - MEASURES OF DISPERSION
3.3 Introduction
3.4 Ungrouped Data
    3.4.1 Range
    3.4.2 Inter-Quartile Range
    3.4.3 Semi Inter-Quartile Range
    3.4.4 Variance and Standard Deviation
3.5 Grouped Data
    3.5.1 Range
    3.5.2 Inter-Quartile Range
    3.5.3 Semi Inter-Quartile Range
    3.5.4 Variance and Standard Deviation
3.6 Relative Dispersion
3.7 Skewness
3.3 INTRODUCTION
Measures of central value give us one single figure that represents the entire data set.
However, averages alone cannot adequately describe a set of observations; it is also
necessary to describe the variability, or dispersion, of the observations.
Two sets of data might have the same mean value but not necessarily the same
spread. For instance, the number sets 6, 7, 8, 9, 6 and 2, 7, 9, 13, 5, 6 have almost the same
mean (7.2 and 7 respectively), but most of the numbers in the first set lie close to the mean.
The second set, on the other hand, is spread further from the mean. The difference in spread
can be quantified by a measure of dispersion.
This chapter covers five measures of dispersion:
1) Range
2) Inter-quartile Range
3) Semi Inter-quartile range
4) Variance
5) Standard deviation
The range, however, is not a good measure of dispersion because it is determined entirely
by the two extreme values and ignores all other observations. Variance and standard
deviation are the most useful and widely used measures of dispersion because, although
they too are influenced by extreme values, their calculation uses every observation.
The standard deviation is the square root of the variance. The deviation in the term
refers to the difference between an observed value and the mean. It gives us an idea of
how closely the values cluster around the mean: generally, the larger the standard
deviation of a data set, the larger the spread of the observations around the mean.
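The point made in the introduction can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module on the two number sets quoted above; the variable names are illustrative only.

```python
from statistics import mean, stdev

# The two number sets quoted in the introduction.
set_1 = [6, 7, 8, 9, 6]
set_2 = [2, 7, 9, 13, 5, 6]

for name, data in (("first set", set_1), ("second set", set_2)):
    # stdev() is the sample standard deviation (divisor n - 1).
    print(f"{name}: mean = {mean(data):.2f}, "
          f"range = {max(data) - min(data)}, "
          f"std dev = {stdev(data):.2f}")
```

The means are close, but both the range and the standard deviation of the second set are noticeably larger.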
3.4 UNGROUPED DATA
3.4.1 RANGE
The range is the difference between the highest and lowest value in the
distribution.
Range = Highest value – Lowest value
Example 1:
Calculate the range of the share prices of ABS Co. Ltd recorded over a seven-day week:
Prices of shares (RM’00):
21 20 28 16 22 25
Range = 28 – 16 = 12 (RM 1,200)
3.4.2 INTER-QUARTILE RANGE
The inter-quartile range is the difference between the upper and lower quartiles, Q3 – Q1.
It covers the middle 50% of the observations.
For example, for the data in Example 1, arranged in ascending order:
16 20 21 22 25 28
Q1 = (n + 1)/4 th = 7/4 = 1.75th observation = (16 + 20)/2 = 18
Q3 = 3(n + 1)/4 th = 21/4 = 5.25th observation = (25 + 28)/2 = 26.5
Here a quartile that falls between two observations is taken as their midpoint.
Therefore inter-quartile range = 26.5 – 18 = 8.5 (RM 850)
3.4.3 SEMI INTER-QUARTILE RANGE
The semi inter-quartile range (quartile deviation) is the average of the
differences of the quartiles from the median.
Interquartile Range = Q3 – Q1
Semi Inter-Quartile Range = (Q3 – Q1) / 2
(Same example)
If Q3 = 26.5 and Q1 = 18,
Semi Inter-Quartile Range = (26.5 – 18) / 2 = 4.25 (RM 425)
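The range, inter-quartile range and semi inter-quartile range of Example 1 can be reproduced with a short script. This is only a sketch: the helper `quartile` is a hypothetical name, and it assumes the convention used in the worked example, namely that a quartile at a fractional position is the midpoint of the two neighbouring observations.

```python
def quartile(sorted_data, k):
    """k-th quartile (k = 1, 2, 3) at position k(n + 1)/4, counting from 1.

    Assumption: a fractional position is resolved by averaging the two
    neighbouring observations, as in the worked example.
    """
    n = len(sorted_data)
    if n < 3:
        raise ValueError("this sketch needs at least 3 observations")
    pos = k * (n + 1) / 4
    lower = int(pos)               # 1-based index just below the position
    if pos == lower:
        return sorted_data[lower - 1]
    return (sorted_data[lower - 1] + sorted_data[lower]) / 2

prices = sorted([21, 20, 28, 16, 22, 25])   # RM'00, from Example 1
data_range = prices[-1] - prices[0]
q1, q3 = quartile(prices, 1), quartile(prices, 3)
iqr = q3 - q1
semi_iqr = iqr / 2
print(data_range, q1, q3, iqr, semi_iqr)
```

Library routines usually interpolate linearly rather than taking the midpoint, so their quartiles can differ slightly; for instance, `statistics.quantiles(prices, n=4)` gives 19 and 25.75 for Q1 and Q3 on the same data.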
3.4.4 VARIANCE AND STANDARD DEVIATION
For ungrouped data, the variance is given as
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$
The standard deviation is
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$
or simply $s = \sqrt{\text{variance}}$
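The two forms of the numerator are equivalent. A short derivation, using the definition of the sample mean, $\bar{x} = \sum x / n$, is sketched here:

```latex
\begin{align*}
\sum (x - \bar{x})^2
  &= \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
  &= \sum x^2 - 2\,\frac{(\sum x)^2}{n} + \frac{(\sum x)^2}{n} \\
  &= \sum x^2 - \frac{(\sum x)^2}{n}
\end{align*}
```

The second form is usually easier for hand calculation because it needs only the totals $\sum x$ and $\sum x^2$.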
Example 2:
1. Find the variance and standard deviation for the sample data:
5, 2, 3, 4, 5, 6, 3
Solution:
$$\sum x = 5 + 2 + 3 + 4 + 5 + 6 + 3 = 28$$
$$\sum x^2 = 5^2 + 2^2 + 3^2 + \cdots + 3^2 = 124, \qquad \frac{(\sum x)^2}{n} = \frac{28^2}{7} = 112$$
Variance,
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{124 - 112}{7 - 1} = \frac{12}{6} = 2$$
Standard deviation,
$$s = \sqrt{\text{variance}} = \sqrt{2} = 1.414$$
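2. Calculate the standard deviation of each of the following sets of sample data. Hence,
determine which set is more dispersed about its mean.
Data A: 2, 7, 10, 9, 2, 5, 16
Data B: 10, 8, 14, 20, 40, 32, 1, 4, 8, 36, 12, 32
Solution:
For data A:
$$\sum x = 51, \qquad \sum x^2 = 519, \qquad \frac{(\sum x)^2}{n} = \frac{51^2}{7}$$
$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{519 - \frac{51^2}{7}}{7 - 1}} = \sqrt{\frac{147.43}{6}} = 4.957$$
For data B:
$$\sum x = 217, \qquad \sum x^2 = 5929, \qquad \frac{(\sum x)^2}{n} = \frac{217^2}{12}$$
$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{5929 - \frac{217^2}{12}}{12 - 1}} = \sqrt{\frac{2004.92}{11}} = 13.50$$
Therefore, data B is more dispersed than data A.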
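The hand calculations in Example 2 can be verified with a few lines of code. The sketch below implements the shortcut formula for the sample standard deviation (the function name `sample_sd` is illustrative) and cross-checks it against `statistics.stdev` from the Python standard library.

```python
from math import sqrt
from statistics import stdev

def sample_sd(data):
    """Sample standard deviation via the shortcut formula
    s = sqrt((sum(x^2) - (sum(x))^2 / n) / (n - 1))."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

samples = {
    "Example 2.1": [5, 2, 3, 4, 5, 6, 3],
    "Data A": [2, 7, 10, 9, 2, 5, 16],
    "Data B": [10, 8, 14, 20, 40, 32, 1, 4, 8, 36, 12, 32],
}

results = {}
for name, data in samples.items():
    results[name] = sample_sd(data)
    # Cross-check against the standard library's sample standard deviation.
    assert abs(results[name] - stdev(data)) < 1e-9
    print(f"{name}: s = {results[name]:.3f}")
```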
3.5 GROUPED DATA
3.5.1 RANGE
Range for grouped data:
Range = Upper boundary of last class – Lower boundary of first class
3.5.2 INTER-QUARTILE RANGE
The values of Q3 and Q1 are obtained either from the cumulative
frequency curve (graphical method) or by using a formula.
Interquartile Range = Q3 – Q1
3.5.3 SEMI INTER-QUARTILE RANGE
The semi inter-quartile range (quartile deviation) is the average of the
differences of the quartiles from the median.
Semi Inter-Quartile Range = (Q3 – Q1) / 2
3.5.4 VARIANCE AND STANDARD DEVIATION
For grouped data, the variance is given as
$$s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} \quad\text{or}\quad s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}$$
And the standard deviation is
$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} \quad\text{or}\quad s = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}}$$
where x is the class midpoint, f is the class frequency and $n = \sum f$.
Example 3:
The marks obtained by 50 students in a certain college are shown below.
Marks               No. of students
10 but under 20     3
20 but under 30     7
30 but under 40     10
40 but under 50     20
50 but under 60     7
60 but under 70     3
Find the inter-quartile range, variance and standard deviation by using both
graphical (semi inter-quartile range only) and calculation methods.
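The calculation method for Example 3 might be sketched as follows. The chapter does not state which quartile formula it intends, so this sketch assumes the common linear-interpolation formula Q_k = L + ((kn/4 – F) / f) × c, where L is the lower boundary of the quartile class, F the cumulative frequency before it, f its frequency and c its width. The function name `grouped_quartile` is hypothetical.

```python
from math import sqrt

# (lower boundary, upper boundary, frequency) for each class in Example 3.
classes = [(10, 20, 3), (20, 30, 7), (30, 40, 10),
           (40, 50, 20), (50, 60, 7), (60, 70, 3)]

n = sum(f for _, _, f in classes)

def grouped_quartile(k):
    """k-th quartile by linear interpolation inside the quartile class
    (assumed formula: L + ((k*n/4 - F) / f) * c)."""
    target = k * n / 4
    cumulative = 0
    for lower, upper, f in classes:
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("target position lies beyond the data")

q1, q3 = grouped_quartile(1), grouped_quartile(3)
iqr = q3 - q1

# Variance from class midpoints x, using the shortcut formula.
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
sd = sqrt(variance)

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, semi-IQR = {iqr / 2}")
print(f"variance = {variance:.2f}, standard deviation = {sd:.2f}")
```

Under these assumptions the quartiles are 32.5 and 48.75 marks, and the variance is 7600/49, about 155.1. Values read from a hand-drawn cumulative frequency curve will only approximate these figures.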
3.6 RELATIVE DISPERSION
On its own, the standard deviation says little about how large the dispersion is
relative to the size of the values. To compare the dispersion of different sets of data, we
need a measure of relative dispersion, which expresses the standard deviation
relative to the mean. For this we use the coefficient of variation.
Coefficient of variation = (standard deviation / mean) × 100
C.V = (s / x̄) × 100
Example 4:
An analysis of monthly wages paid to workers in two firms, A and B, belonging
to the same industry gives the following results.
                                                  A        B
Average monthly wages                             RM 105   RM 95
Standard deviation of the distribution of wages   RM 20    RM 22
Calculate the coefficient of variation and comment on your finding.
Larger Percentage → Greater Variation
Larger Variation → Less Consistency
Smaller Variation → More Consistency
Solution:
Firm A: C.V = (20 / 105) × 100 = 19.05 %
Firm B: C.V = (22 / 95) × 100 = 23.16 %
The value of the coefficient of variation for firm B is higher than that of firm A.
Therefore, wages in firm B are less consistent compared to firm A.
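The same comparison can be written as a small function. This is a minimal sketch; `coefficient_of_variation` is an illustrative name, and the inputs are the figures given in Example 4.

```python
def coefficient_of_variation(sd, mean):
    """Coefficient of variation as a percentage: (sd / mean) * 100."""
    return sd / mean * 100

# Figures from Example 4 (monthly wages in RM).
cv_a = coefficient_of_variation(sd=20, mean=105)
cv_b = coefficient_of_variation(sd=22, mean=95)

print(f"Firm A: {cv_a:.2f} %")
print(f"Firm B: {cv_b:.2f} %")
less_consistent = "B" if cv_b > cv_a else "A"
print(f"Wages are less consistent in firm {less_consistent}")
```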
3.7 SKEWNESS
Although frequency curves can take any shape, certain shapes are often encountered. The
most common is the bell-shaped distribution.
i) Symmetrical
[Figure: symmetrical bell-shaped curve with the mean, median and mode at the centre]
This distribution is perfectly symmetrical, and the mean, median and mode all have the
same value. The normal curve is the standard example of a symmetrical distribution.
ii) Asymmetrical
Skewness is a term used to describe the asymmetry of a frequency
curve. Skewness can be either positive or negative.
• Positive skewness
When the frequency distribution has a tail stretching out to the right, it is
said to be positively skewed.
[Figure: positively skewed curve; from left to right: mode, median, mean]
• Negative skewness
When the frequency distribution has a tail stretching out to the left, it is
said to be negatively skewed.
[Figure: negatively skewed curve; from left to right: mean, median, mode]
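The ordering of the mode, median and mean shown in the two figures can be illustrated with a small numerical sketch. The data below are made up for illustration and do not come from the chapter; the second set is simply the mirror image of the first.

```python
from statistics import mean, median, mode

# Hypothetical data with a long right tail (positively skewed).
right_tailed = [2, 3, 3, 3, 4, 5, 6, 8, 11]
# Its mirror image about 7, giving a long left tail (negatively skewed).
left_tailed = [14 - x for x in right_tailed]

for name, data in (("right tail", right_tailed), ("left tail", left_tailed)):
    print(f"{name}: mode = {mode(data)}, median = {median(data)}, "
          f"mean = {mean(data)}")
```

For the right-tailed set the mode, median and mean are 3, 4 and 5; for the left-tailed set they are 11, 10 and 9. This ordering is typical of single-peaked skewed data but is not guaranteed for every data set.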
iii) Measuring skewness by calculation
To measure the amount of skewness in a set of data, a number of measures
have been suggested. The most common measure is the Pearson
coefficient of skewness, which is calculated as either
a) Skewness = (Mean – Mode) / Standard Deviation; or
b) Skewness = 3 (Mean – Median) / Standard Deviation
EXERCISE
1. The table below gives an analysis of the debtors’ balances of Fuel
Suppliers Bhd.
a) Calculate the median and quartile deviation.
b) Estimate the mean and standard deviation.
c) Calculate the coefficient of variation.
d) Calculate the measure of skewness of these data.
Balance Outstanding (RM’00)    No. of accounts
20 but under 39.9              1
40 but under 59.9              3
60 but under 79.9              6
80 but under 99.9              10
100 but under 119.9            5
120 but under 139.9            3
140 but under 159.9            2
2. The sales orders (RM) achieved by 10 companies in Shah Alam are as follows.
Area         A    B    C    D    E    F    G    H    I    J
Sales (RM)   150  130  140  150  140  300  110  120  140  120
For these sales data, calculate:
i) Arithmetic mean
ii) Mode
iii) Median
iv) Mean Deviation
v) Standard Deviation
3. A long-term investor is considering selling one of three stocks, X, Y or
Z. He decides to hold on to the more consistent stocks. Use the
information in the following table to answer the questions below.
Stock   Average Price   Standard Deviation
X       RM 76           RM 12
Y       RM 45           RM 6
Z       RM 50           s
i) If stocks Y and Z have the same relative dispersion, what is the
standard deviation of stock Z?
ii) Which stock should he sell?
4. The following frequency polygon shows the profit (in RM'000) of a random
sample of 48 shops in 2006.
[Figure: frequency polygon. Horizontal axis: Profit (RM'000), marked at 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5 and 11.5; vertical axis: No. of shops, scaled from 0 to 16; plotted frequencies, in order: 4, 9, 14, 11, 7, 3]
i) Construct a frequency table using the above polygon.
ii) From the frequency table, calculate the median, mean and standard
deviation.
5. The times taken (in minutes) to take orders in 10 randomly selected telephone
calls at telephone company A were as follows.
2.82 3.71 2.45 5.42 6.24 4.24 3.70 4.75 3.25 4.02
i) Calculate the range of the data.
ii) Estimate the mean and standard deviation of the time taken.
iii) Calculate the coefficient of variation.
iv) The mean and standard deviation of the time taken to take orders at company B
were 3.77 and 2.12 minutes respectively. Calculate the coefficient of
variation for these data and compare your answer with that obtained in
part (iii).
6. The following data show the results (CGPA) of 80 students in class A.
CGPA        Number of students
2.5 - 2.6   4
2.7 - 2.8   13
2.9 - 3.0   24
3.1 - 3.2   10
3.3 - 3.4   18
3.5 - 3.6   11
a) Draw a histogram to represent these data.
b) By using the histogram, state the modal class.
c) Calculate the mean, median and standard deviation of the results.
d) The mean and standard deviation of the results for students in class
B are 3.12 and 0.366 respectively. Using the appropriate
measurement, compare the performance consistency of the two
classes.
7. FLY company employed 50 salespersons to sell its products. The
following information is gathered from its travelling claims records.
Distance travelled per month (in km)   No. of salespersons
300 and less than 400                  3
400 and less than 500                  8
500 and less than 600                  18
600 and less than 700                  11
700 and less than 800                  10
(i) Draw an ogive for the data.
(ii) Using the ogive in (i), estimate
a) the median, lower quartile and upper quartile;
b) the percentage of salespersons who travelled less than 550 km in a
month;
c) the minimum monthly distance travelled by the 30% of salespersons
who travel the most.
(iii) Find the mean and standard deviation for the above data.
(iv) Determine the shape of skewness using an appropriate calculation.
8. A research physician wants to estimate the average age of people with
diabetes. She takes a random sample of 27 diabetics and obtains the
following ages.
54 48 61 38 23 79 70 82 56
38 7 79 83 57 41 10 75 60
68 76 61 77 65 55 21 53 83
(i) Construct a frequency table by using all the steps required.
(ii) Find the mean and median of the data.
(iii) Given that the standard deviation of the ages is 22.068,
a) determine the shape of the distribution; and
b) find the coefficient of variation for the above data.