Dispersion (Measures of Variability)
Introduction and Definition :
Measures of Central Tendency are called averages of the first order, but they are not sensitive to the variability among the data. Two distributions may have the same Mean, Median and Mode, yet the variability among the data in the two distributions may be quite different. For example, consider two groups ‘A’ and ‘B’ :
Group A : 65, 66, 67, 68, 71, 73, 74, 77, 77, 77
Group B : 42, 54, 58, 62, 67, 77, 77, 85, 93, 100
Computing the averages we get,
Group A : Mean = 71.5, Median = 72.0, Mode = 77
Group B : Mean = 71.5, Median = 72.0, Mode = 77
It is clear that Group A and Group B have the same values of Mean, Median and Mode, but a careful perusal of the data in both groups shows that the values in Group B are much more widely scattered than the values in Group A.
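The comparison of the two groups can be checked with a minimal sketch using Python's standard `statistics` module. The helper names `averages` and `largest_deviation` are illustrative assumptions, not part of the original material; the second helper is only a crude indicator of scatter.

```python
from statistics import mean, median, mode

group_a = [65, 66, 67, 68, 71, 73, 74, 77, 77, 77]
group_b = [42, 54, 58, 62, 67, 77, 77, 85, 93, 100]

def averages(values):
    """Return (mean, median, mode) of a list of observations."""
    return mean(values), median(values), mode(values)

def largest_deviation(values):
    """Largest absolute distance of any observation from the mean."""
    centre = mean(values)
    return max(abs(x - centre) for x in values)

for name, values in (("A", group_a), ("B", group_b)):
    # Same three averages for both groups, very different scatter.
    print(name, averages(values), largest_deviation(values))
```

Both groups give the same three averages, while the largest deviation from the mean is 6.5 in Group A and 29.5 in Group B.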
Sometimes two series may have a similar formation, but their Measures of Central Tendency may be different.
Series A : 10, 11, 12, 13, 14
Series B : 30, 31, 32, 33, 34
Series C : 100, 101, 102, 103, 104
The series have entirely different averages but the same formation. Clearly, Measures of Central Tendency do not indicate how the individual values in the distribution differ from each other or from the central value. When the extent of variation of the individual values in relation to other values or in relation to the central value is large, the Measures of Central Tendency fail to represent the series fully.
The Measures of Dispersion (or variability) coupled with the Measures of Central Tendency give a fairly good idea (not the full idea) about the nature of the distribution. To have a complete idea about the nature of the data, Moments and Kurtosis must also be measured.
Dispersion is the spread or scatter of values from the Measure of Central Tendency. A Measure of Dispersion is designed to state the extent to which individual observations (or items) vary from their average. Here we account only for the amount of variation, not its direction.
D.C. Brooks defines dispersion as “Dispersion or spread is the degree of scatter or variation of variable about the central value”.
Measures of Dispersion are called averages of the second order because they are based on the deviations of the different values from the Mean or other Measures of Central Tendency, which are called averages of the first order.
Objectives of Dispersion :
1. To know the average variation of the different values from the average of a series.
2. To know about the composition of a series, or the dispersion of the values on either side of the central tendency.
3. To know the range of values.
4. To compare the disparity between two or more series expressed in different units in order to find out the degree of variation.
5. To know whether the Central Tendency truly represents the series or not. If the dispersion is large, the central tendency does not represent the series.
Importance of Dispersion :
1. A conclusion drawn from the central tendency carries little meaning without knowing the variation of the various items of the series from the average.
2. Inequalities in the distribution of wealth and income can be measured by dispersion.
3. Dispersion is used to compare and measure the concentration of economic power and monopoly in a country.
4. Dispersion is used in output control and price control.
Characteristics of a good Measure of Dispersion :
1. It should be simple to understand and easy to calculate.
2. It should be rigidly defined.
3. It should be based on all the items of the series.
4. It should not be unduly affected by the extreme items of the series.
5. It should be least affected by sampling fluctuations.
6. It should be amenable to further algebraic treatment.
Merits :
1. They indicate the dispersal character of the statistical series.
2. They indicate the dependability or reliability of the average value of a series.
3. They enable the statistician to compare two or more statistical series with regard to their uniformity, consistency or equitability.
4. They enable one to control the variability of a phenomenon under one's purview.
5. They facilitate further statistical analysis of the series through devices like the co-efficient of Skewness, co-efficient of Kurtosis, co-efficient of correlation, variance analysis, etc.
6. They supplement the Measures of Central Tendency in finding out more information about the nature of a series.
Demerits :
1. They are liable to misinterpretation and wrong generalization by a statistician of a biased character.
2. They are liable to yield inappropriate results, as there are different methods of calculating the dispersion.
3. Except for one or two, most measures of dispersion involve a complicated process of computation.
4. By themselves they cannot give any idea about the symmetrical or skewed character of a series.
5. Like Measures of Central Tendency, most Measures of Dispersion do not give a convincing idea about a series to a layman.
Different Measures of Dispersion :
Absolute Measures :
1. Range
2. Quartile Deviation
3. Average Deviation (Mean Deviation)
4. Standard Deviation
5. Lorenz Curve
Relative Measures :
1. Co-efficient of Range
2. Co-efficient of Quartile Deviation
3. Co-efficient of Mean Deviation
4. Co-efficient of Variation
An absolute Measure of Dispersion is expressed in terms of the units of measurement of the variable. A relative Measure of Dispersion, generally known as a co-efficient of dispersion, is expressed as a pure number independent of the units of measurement of the variable. The main disadvantage of an absolute measure of dispersion is that it cannot be used to compare the variability of two distributions measured in different units. Comparison of distributions with respect to their variability about the central value is done by a relative measure of dispersion.
Range :
It is defined as the difference between the extreme values in the distribution, i.e.,
Range = Largest Value in the Distribution - Smallest Value in the Distribution
In the case of a continuous frequency distribution, the Range is calculated by either of the following two methods :
1. By subtracting the lower limit of the lowest class from the upper limit of the highest class.
OR
2. By subtracting the mid-value of the lowest class from the mid-value of the highest class.
Important:
1. In the calculation of the Range only the values of the variable are taken into account and the frequencies are completely ignored.
2. Open ended classes have no Range since they have no highest and lowest value.
3. Sometimes the variability of two series is measured by the Range alone, though it is only a rough measure of variability.
Co-efficient of Range :
$$\text{Co-efficient of Range} = \frac{\text{Max. Value} - \text{Min. Value}}{\text{Max. Value} + \text{Min. Value}} = \frac{\text{Absolute Range}}{\text{Sum of the extreme values}}$$
Inter-quartile Range :
$$IR = Q_3 - Q_1$$
Percentile Range :
$$PR = P_{90} - P_{10}$$
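As a small illustration, the Range and the Co-efficient of Range can be computed for Group A and Group B of the introduction. This is only a sketch; the function names `value_range` and `coefficient_of_range` are assumptions made for the example.

```python
group_a = [65, 66, 67, 68, 71, 73, 74, 77, 77, 77]
group_b = [42, 54, 58, 62, 67, 77, 77, 85, 93, 100]

def value_range(values):
    """Absolute Range: largest value minus smallest value."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Relative measure: (max - min) / (max + min), a pure number."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

for name, values in (("A", group_a), ("B", group_b)):
    print(name, value_range(values), round(coefficient_of_range(values), 3))
```

Group A has a Range of 12 and Group B a Range of 58, although both groups have the same Mean, Median and Mode.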
Quartile Deviation :
Q.D. = Half the difference between the upper and lower quartiles = (Q3 – Q1)/2 = Semi-inter quartile Range.
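Co-efficient of Quartile Deviation :
$$\text{Co-efficient of Q.D.} = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
For Symmetric Distribution :
In a symmetric distribution $Q_3 - \text{Median} = \text{Median} - Q_1$, so that $\text{Median} = \frac{Q_3 + Q_1}{2}$. Hence
$$\text{Median} - \text{Q.D.} = \frac{Q_3 + Q_1}{2} - \frac{Q_3 - Q_1}{2} = Q_1 \qquad (1)$$
$$\text{Median} + \text{Q.D.} = \frac{Q_3 + Q_1}{2} + \frac{Q_3 - Q_1}{2} = Q_3 \qquad (2)$$
i.e. Median = Q1 + Q.D. = Q3 – Q.D.
For Asymmetric Distribution :
Median ≠ Q1 + Q.D. and Median ≠ Q3 – Q.D.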
Merits of Quartile Deviation :
1. Simple to understand and easy to compute.
2. Not affected by extreme values.
3. Can be computed even if the distribution has unequal intervals.
4. Can be computed in the case of open ended intervals.
Demerits of Quartile Deviation :
1. It is not based on all the observations of the series, because it does not take the frequencies below the lower quartile and above the upper quartile into consideration.
2. Not amenable to algebraic treatment.
3. Affected by sampling fluctuations.
4. It is a distance on the scale and is not measured from an average. Therefore, it fails to show variations around an average.
Use :
The quartile deviation, as a measure of dispersion, is mainly employed in open ended distributions. In many situations, we encounter such distributions because of the need to keep certain information confidential.
Mean Deviations :
Mean deviation (also called Average Deviation) is defined as the arithmetic mean of the absolute deviations of all the values from their Mean or Median or Mode.
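For individual observations :
$$\text{Mean Deviation} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \text{Median}| \quad \text{or} \quad \frac{1}{n}\sum_{i=1}^{n} |x_i - \text{Mean}|$$
For a frequency distribution :
$$\text{Mean Deviation} = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - \text{Median}| \quad \text{or} \quad \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - \text{Mean}|$$
Where $N = \sum f$.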
Steps to Calculate Mean Deviation in Individual Values (or Observations) :
In the case of individual observations, the following steps are involved in the calculation of Mean Deviation :
1. Calculate the Mean or Median of the given series.
2. Write down the deviations (dxi) of each item (xi) either from the Mean or the Median without considering the sign.
3. Sum up the deviations disregarding the signs. This is $\sum_{i=1}^{n} |dx_i|$.
4. Divide the total of the deviations by the number of observations; the resulting value is the Mean Deviation.
Steps to Calculate Mean Deviation in Discrete Series :
In the case of a discrete series, the following steps are involved in the calculation of Mean Deviation :
1. Calculate the Mean or Median of the given series.
2. Write down the deviations (dxi) of each item (xi) either from the Mean or the Median without considering the sign.
3. Multiply the deviations by the frequencies, giving $f\,|dx_i|$.
4. Find the sum of the products so obtained. This is $\sum f\,|dx|$.
5. Divide the sum of the products by the total frequency; the resulting value is the Mean Deviation. Expressed as a formula :
$$\text{M.D.} = \frac{\sum f\,|dx|}{\sum f}$$
Steps to Calculate Mean Deviation in Continuous Series :
For a continuous series the procedure is the same as for a discrete series, with one minor difference : each class is replaced by its mid-value, and the frequencies are multiplied by the deviations of the mid-values from the Mean (or Median).
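The steps above can be sketched in a few lines of Python. The function names are illustrative assumptions, the values are assumed to be listed in ascending order, and the median rule follows the "value of the (N + 1)/2-th item" convention used in this material. The marks data are those of the worked example on Mean Deviation around the Median that appears later in this section.

```python
def weighted_mean(values, freqs):
    """Arithmetic mean of a discrete frequency distribution."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

def discrete_median(values, freqs):
    """Value of the ((N + 1) / 2)-th item, read off the cumulative frequencies."""
    position = (sum(freqs) + 1) / 2
    cumulative = 0
    for x, f in zip(values, freqs):
        cumulative += f
        if cumulative >= position:
            return x
    raise ValueError("empty frequency table")

def mean_deviation(values, freqs, centre):
    """Sum of f * |x - centre| divided by the total frequency."""
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / sum(freqs)

marks = [5, 10, 15, 20, 25]
students = [6, 7, 8, 11, 8]

md_median = mean_deviation(marks, students, discrete_median(marks, students))
md_mean = mean_deviation(marks, students, weighted_mean(marks, students))
print(md_median, md_mean)
```

For a continuous series the same functions can be used by passing the class mid-values as `values`.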
Coefficient of Mean Deviation :
It is a relative measure of dispersion and is computed by the following formula :
Coefficient of Mean Deviation = Mean Deviation / Mean, when the Mean is used as the reference point
Coefficient of Mean Deviation = Mean Deviation / Median, when the Median is used as the reference point
In general,
Coefficient of Mean Deviation = Mean Deviation / Average Used
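Short Cut Method of finding Mean Deviation :
When the Mean (or Median) is not a whole number but a fraction, the following short-cut formula can be used :
Mean Deviation from Mean = ( [ Sum of the values > Mean ] - [ Sum of the values < Mean ] ) / Total Number of Values
Mean Deviation from Median = ( [ Sum of the values > Median ] - [ Sum of the values < Median ] ) / Total Number of Values
The difference of the two sums equals the sum of the absolute deviations exactly when the number of values above the average equals the number of values below it (as is typically the case for the Median). Otherwise the term (number of values above - number of values below) × average must also be subtracted in the numerator.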
Short Cut Method of finding Mean Deviation ( for Frequency Distribution ) :
$$\text{Mean Deviation from Mean} = \frac{\sum f\,|x - A| + (\sum f_a - \sum f_b)\,c}{\sum f} = \frac{\sum f\,|dx| + (\sum f_a - \sum f_b)\,c}{\sum f}$$
Where,
$\sum f\,|dx|$ = Sum of the products of the absolute deviations [ considering all deviations, positive or negative, as positive ] and the respective frequencies, when the deviations are taken from the Assumed Mean A.
$\sum f_a$ = Sum of the frequencies above the Mean in the table = Sum of the frequencies of the values less than the Mean.
$\sum f_b$ = Sum of the frequencies below the Mean in the table = Sum of the frequencies of the values more than the Mean.
c = Difference between the real Mean and the arbitrary (assumed) Mean.
Similarly, we can find the Mean Deviation from Median :
$$\text{Mean Deviation from Median} = \frac{\sum f\,|dx| + (\sum f_a - \sum f_b)\,c}{\sum f}$$
Where,
$\sum f\,|dx|$ = Sum of the products of the absolute deviations [ considering all deviations, positive or negative, as positive ] and the respective frequencies, when the deviations are taken from the Assumed Median.
$\sum f_a$ = Sum of the frequencies above the Median in the table = Sum of the frequencies of the values less than the Median.
$\sum f_b$ = Sum of the frequencies below the Median in the table = Sum of the frequencies of the values more than the Median.
c = Difference between the real Median and the arbitrary (assumed) Median.
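Note : The assumed Mean or Median must be selected from the class in which the real Mean or Median lies.
Mean Deviation may also be calculated by the following method :
$$\text{M.D. from Mean} = \frac{\sum fX_b - \sum fX_a - (\sum f_b - \sum f_a)\,\bar{X}}{N}$$
Where,
$\sum fX_a$ = Sum of the products of the Mid-points (X) and frequencies above the Mean in the table (Mid-points less than the Mean).
$\sum fX_b$ = Sum of the products of the Mid-points (X) and frequencies below the Mean in the table (Mid-points more than the Mean).
$\sum f_a$ = Sum of the frequencies of the Mid-points above the Mean in the table.
$\sum f_b$ = Sum of the frequencies of the Mid-points below the Mean in the table.
N = Total number of observations.
$$\text{M.D. from Median} = \frac{\sum fX_b - \sum fX_a - (\sum f_b - \sum f_a)\,\text{Median}}{N}$$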
Ex : Find the Mean Deviation around the Median of the following series :
Marks (x)    : 5, 10, 15, 20, 25
Students (f) : 6, 7, 8, 11, 8
Solution :
Marks (x)    f     c.f.    |d| = |x – 15|    f|d|
5            6     6       10                60
10           7     13      5                 35
15           8     21      0                 0
20           11    32      5                 55
25           8     40      10                80
Total        N = 40                          Σf|d| = 230
Median = Value of the [(N+1)/2]th term = Value of the 20.5th term = 15
$$\text{Mean Deviation} = \frac{\sum f\,|d|}{\sum f} = \frac{230}{40} = 5.75 \text{ marks}$$
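Ex : Find the Mean Deviation around the Mean of the following data :
Class Interval    Frequency
0 – 10            8
10 – 20           12
20 – 30           10
30 – 40           8
40 – 50           3
50 – 60           2
60 – 70           7
Solution :
For calculating the Mean Deviation :
Mid Value (x)    f     fx      x – 29    |x – x̄|    f|x – x̄|
5                8     40      - 24      24         192
15               12    180     - 14      14         168
25               10    250     - 4       4          40
35               8     280     6         6          48
45               3     135     16        16         48
55               2     110     26        26         52
65               7     455     36        36         252
Total            N = 50    Σfx = 1450               Σf|x – x̄| = 800
$$\text{Mean} = \bar{x} = \frac{\sum fx}{\sum f} = \frac{1450}{50} = 29$$
$$\text{M.D.} = \frac{1}{\sum f}\sum f_i\,|x_i - \bar{x}| = \frac{1}{50} \times 800 = 16$$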
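The result of this example can be cross-checked with a short sketch that computes the Mean Deviation both directly and through the sums taken separately over the mid-points greater than and smaller than the Mean. The variable names are assumptions made for illustration; the alternative route uses the identity Σf|x – x̄| = ΣfX(greater) – ΣfX(smaller) – (Σf(greater) – Σf(smaller)) x̄.

```python
mid_values = [5, 15, 25, 35, 45, 55, 65]
freqs = [8, 12, 10, 8, 3, 2, 7]
table = list(zip(mid_values, freqs))

n = sum(freqs)
mean = sum(f * x for x, f in table) / n

# Direct method: average of the absolute deviations, weighted by frequency.
md_direct = sum(f * abs(x - mean) for x, f in table) / n

# Alternative method: split the table at the mean.
greater = [(x, f) for x, f in table if x > mean]
smaller = [(x, f) for x, f in table if x < mean]
sum_fx_greater = sum(f * x for x, f in greater)
sum_fx_smaller = sum(f * x for x, f in smaller)
sum_f_greater = sum(f for _, f in greater)
sum_f_smaller = sum(f for _, f in smaller)
md_split = (sum_fx_greater - sum_fx_smaller
            - (sum_f_greater - sum_f_smaller) * mean) / n

coefficient_md = md_direct / mean  # relative measure, reference point = mean
print(mean, md_direct, md_split, round(coefficient_md, 4))
```

Both routes give a Mean Deviation of 16 about the Mean of 29.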
Standard Deviation :
Standard Deviation is the most important, the most reliable and the most widely used measure of dispersion. The term ‘standard’ is assigned to this measure of variation probably for the following reasons.
(i) It is the most commonly used of all measures of variation and is the most flexible in terms of variety of applications.
(ii) The area under a normal curve within a fixed number of standard deviations from the Mean on either side of it is always the same, e.g., in any normal curve the area within Mean ± 1 standard deviation is always 68.27% of the total area, and the area within Mean ± 2 standard deviations is 95.45% of the total area.
(iii) The sum of the squares of the deviations about the Mean is the least as compared to the sum of the squares of the deviations about the Median or Mode; therefore, the root mean square deviation about the Mean is the least.
It is the most important of all the measures of dispersion because it is used in many other statistical operations, e.g., sampling techniques, correlation and regression analysis, finding the co-efficient of variation, skewness, kurtosis, etc. Standard deviation is also called ‘Mean Error’ or ‘Mean Square Error’ or ‘Root-Mean Square Deviation’. Unlike the Mean Deviation, which may be calculated around any average, the standard deviation is always computed around the Mean.
It is the square root of the Arithmetic Mean of the squared deviations of all values from their Mean.
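Standard Deviation :
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}}$$
For Frequency Distribution :
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i\,(X_i - \bar{X})^2}{\sum f}}$$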
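Short-Cut Method For Finding Standard Deviation :
For a discrete distribution, with deviations d = X – A taken from an assumed mean A :
$$\sigma = \sqrt{\frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2}$$
Standard deviation is calculated for a Continuous Series by using the Mid-Points of the classes.
Standard Deviation from Step-Deviation Method :
$$\text{S.D.} = h \times \sqrt{\frac{\sum d'^2}{n} - \left(\frac{\sum d'}{n}\right)^2}, \qquad d' = \frac{d}{h} = \frac{X - A}{h}$$
where h is the common factor (usually the class width).
And, if the frequencies are given, then
$$\text{S.D.} = h \times \sqrt{\frac{\sum f d'^2}{\sum f} - \left(\frac{\sum f d'}{\sum f}\right)^2}$$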
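A minimal sketch can confirm that the definition and the step-deviation formula agree, and that series with the same formation have the same Standard Deviation whatever their averages. The function names `sd_direct` and `sd_step`, and the choice of assumed mean and common factor, are assumptions for illustration. The data are Series A, B and C from the introduction.

```python
from math import sqrt

def sd_direct(values):
    """Square root of the mean of squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)

def sd_step(values, assumed_mean, h=1):
    """Step-deviation method: d' = (X - A) / h, then scale back by h."""
    n = len(values)
    steps = [(x - assumed_mean) / h for x in values]
    return h * sqrt(sum(d * d for d in steps) / n - (sum(steps) / n) ** 2)

series = {
    "A": [10, 11, 12, 13, 14],
    "B": [30, 31, 32, 33, 34],
    "C": [100, 101, 102, 103, 104],
}

for name, values in series.items():
    # Any assumed mean works; the first value is used here.
    print(name, sd_direct(values), sd_step(values, assumed_mean=values[0]))
```

All three series give the same Standard Deviation, the square root of 2, although their Means are 12, 32 and 102.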
Combined Standard Deviation of Two or More Groups :
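Let A and B be two groups with $n_1$ and $n_2$ the respective numbers of values, $\bar{x}_1$ and $\bar{x}_2$ the respective Means and $\sigma_1$ and $\sigma_2$ the respective standard deviations; then their combined S.D. $\sigma_{12}$ is given by
$$\sigma_{12} = \sqrt{\frac{n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}}$$
where $d_1 = \bar{x}_1 - \bar{x}_{12}$, $d_2 = \bar{x}_2 - \bar{x}_{12}$ and $\bar{x}_{12} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$ is the combined Mean.
This can also be given by the following formula :
$$\sigma_{12} = \sqrt{\frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2} + \frac{n_1 n_2\,(\bar{x}_1 - \bar{x}_2)^2}{(n_1 + n_2)^2}}$$
and, when $n_1 = n_2$,
$$\sigma_{12} = \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2} + \frac{(\bar{x}_1 - \bar{x}_2)^2}{4}}$$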
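The two forms of the combined Standard Deviation can be compared with the Standard Deviation of the pooled data in a short sketch. The function names are hypothetical; `statistics.pstdev` is the population standard deviation (divisor n), which matches the definition used here. The data are Series A and B from the introduction.

```python
from math import sqrt
from statistics import mean, pstdev

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined S.D. using deviations of the group means from the combined mean."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    return sqrt((n1 * sd1**2 + n2 * sd2**2 + n1 * d1**2 + n2 * d2**2) / (n1 + n2))

def combined_sd_alt(n1, mean1, sd1, n2, mean2, sd2):
    """Equivalent form written with the difference of the two group means."""
    total = n1 + n2
    return sqrt((n1 * sd1**2 + n2 * sd2**2) / total
                + n1 * n2 * (mean1 - mean2) ** 2 / total**2)

group_1 = [10, 11, 12, 13, 14]
group_2 = [30, 31, 32, 33, 34]
args = (len(group_1), mean(group_1), pstdev(group_1),
        len(group_2), mean(group_2), pstdev(group_2))

pooled = pstdev(group_1 + group_2)  # S.D. of all ten values taken together
print(combined_sd(*args), combined_sd_alt(*args), pooled)
```

Here each group has variance 2 and the Means differ by 20, so the equal-size form gives a combined variance of 2 + 400/4 = 102.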
Co-efficient of Variation :
Karl Pearson’s Co-efficient of Variation
$$\text{C.V.} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\ \%$$
Relations Between Measures of Dispersion :
For a normal (or approximately normal) distribution the following relations hold approximately :
1. Q.D. = (2/3) S.D.
2. M.D. = (4/5) S.D.
3. A.M. ± Q.D. would cover 50 % of the items.
4. A.M. ± S.D. would cover 68.27 % of the items.
5. A.M. ± M.D. would cover 57.51 % of the items.
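These relations can be checked numerically for the standard normal distribution with `statistics.NormalDist` (available from Python 3.8). This is a sketch under the assumption of exact normality; it uses the known result that the Mean Deviation of a normal distribution equals S.D. × √(2/π). The names `qd_in_sd`, `md_in_sd` and `coverage` are illustrative.

```python
from math import pi, sqrt
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Q.D. in units of S.D.: distance from the median to the upper quartile.
qd_in_sd = standard_normal.inv_cdf(0.75)
# M.D. in units of S.D. for a normal distribution.
md_in_sd = sqrt(2 / pi)

def coverage(k):
    """Share of items lying within the mean +/- k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(round(qd_in_sd, 4), round(md_in_sd, 4))
print(round(coverage(qd_in_sd), 4), round(coverage(1), 4), round(coverage(md_in_sd), 4))
```

The ratios come out near 0.6745 and 0.7979, which the rules of thumb round to 2/3 and 4/5.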
Lorenz Curve:
Graphically the Dispersion is studied by means of Lorenz Curve. Lorenz Curve is a cumulative percentage curve in which the percentage of items (or frequencies) are shown with the corresponding percentage of factors like income, wealth, profits, etc. The curve is also used to study the distribution of land, wages, income, etc, among the population of a country.
Ex : Calculate the Semi-inter quartile Range and the co-efficient of quartile deviation from the following data.
Age (in years)    No. of persons
15 – 25           3
25 – 35           61
35 – 45           132
45 – 55           153
55 – 65           140
65 – 75           51
75 – 85           3
Solution :
Age        Frequency    c.f.
15 – 25    3            3
25 – 35    61           64
35 – 45    132          196
45 – 55    153          349
55 – 65    140          489
65 – 75    51           540
75 – 85    3            543
N = 543
N/4 = 543/4 = 135.75, so Q1 lies in the class 35 – 45 (c.f. of the preceding class = 64).
3N/4 = 407.25, so Q3 lies in the class 55 – 65 (c.f. of the preceding class = 349).
Q1 = 35 + [(135.75 – 64)/132] × 10 = 40.4356
Q3 = 55 + [(407.25 – 349)/140] × 10 = 59.1607
Q3 – Q1 = 18.7251
Q3 + Q1 = 99.5963
Semi – inter quartile Range = (Q3 – Q1)/2
= 18.7251/2 = 9.363 (approx.)
Co-efficient of quartile deviation = (Q3 – Q1)/(Q3 + Q1)
= 18.7251/99.5963
= 0.188
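The quartiles of this example can be reproduced with a small interpolation helper. It is a sketch assuming equal class widths and values spread evenly within each class; `grouped_quantile` is a hypothetical name, not a library function.

```python
def grouped_quantile(lower_limits, width, freqs, position):
    """Linear interpolation inside the class that contains the given
    cumulative position (e.g. N/4 for Q1, 3N/4 for Q3)."""
    cumulative = 0
    for low, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= position:
            return low + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("position exceeds the total frequency")

lower_limits = [15, 25, 35, 45, 55, 65, 75]
freqs = [3, 61, 132, 153, 140, 51, 3]
n = sum(freqs)

q1 = grouped_quantile(lower_limits, 10, freqs, n / 4)
q3 = grouped_quantile(lower_limits, 10, freqs, 3 * n / 4)
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)
print(round(q1, 4), round(q3, 4), round(quartile_deviation, 4), round(coefficient_qd, 3))
```

Half of 18.725 is about 9.36, so the Semi-inter quartile Range is roughly 9.36 years.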
Ex : Calculate the Mean Deviation from the Mean for the following data :
Marks      No. of Students
0 – 10     6
10 – 20    5
20 – 30    8
30 – 40    15
40 – 50    7
50 – 60    6
60 – 70    3
Solution :
Computational Table
Mid-Value (x)    Frequency (f)    fx      |x – x̄|    f|x – x̄|
5                6                30      28.4       170.4
15               5                75      18.4       92.0
25               8                200     8.4        67.2
35               15               525     1.6        24.0
45               7                315     11.6       81.2
55               6                330     21.6       129.6
65               3                195     31.6       94.8
Total            N = 50           Σfx = 1670         Σf|x – x̄| = 659.2
$$\bar{x} = \frac{\sum fx}{N} = \frac{1670}{50} = 33.4$$
$$\text{M.D.} = \frac{\sum f\,|x - \bar{x}|}{\sum f} = \frac{659.2}{50} = 13.184$$
Ex : Determine the S.D. from the following data :
Yield (in gm) : 1216, 1232, 1202, 1141, 1374, 1407, 1372, 1221, 1167, 1453, 1278, 1329
Solution :
(x)      x – 1200 = d    d2
1216     16              256
1232     32              1024
1202     2               4
1141     - 59            3481
1374     174             30276
1407     207             42849
1372     172             29584
1221     21              441
1167     - 33            1089
1453     253             64009
1278     78              6084
1329     129             16641
Total    Σd = 992        Σd2 = 195738
$$\text{S.D.} = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{195738}{12} - \left(\frac{992}{12}\right)^2} = \sqrt{16311.5 - 6833.7} = 97.35$$
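The arithmetic of this example can be verified with a few lines of Python, comparing the short-cut formula with `statistics.pstdev` (population standard deviation, divisor n). The variable names are assumptions made for the sketch.

```python
from math import sqrt
from statistics import pstdev

yields = [1216, 1232, 1202, 1141, 1374, 1407,
          1372, 1221, 1167, 1453, 1278, 1329]
assumed_mean = 1200

n = len(yields)
d = [x - assumed_mean for x in yields]   # deviations from the assumed mean
sum_d = sum(d)
sum_d2 = sum(v * v for v in d)

sd_shortcut = sqrt(sum_d2 / n - (sum_d / n) ** 2)
sd_library = pstdev(yields)              # computed about the true mean
print(sum_d, sum_d2, round(sd_shortcut, 2), round(sd_library, 2))
```

The choice of assumed mean does not affect the result; it only keeps the deviations small enough to square by hand.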
Ex : Find out the S.D. from the following table giving the wages of 230 persons.
Wages (Rs)    No. of Persons
140 – 160     12
160 – 180     18
180 – 200     35
200 – 220     42
220 – 240     50
240 – 260     45
260 – 280     20
280 – 300     8
Solution :
Wages (Rs)    Mid Value (x)    d = x - 230    d’ = d/20    f      fd’     d’2    fd’2
140 – 160     150              - 80           - 4          12     - 48    16     192
160 – 180     170              - 60           - 3          18     - 54    9      162
180 – 200     190              - 40           - 2          35     - 70    4      140
200 – 220     210              - 20           - 1          42     - 42    1      42
220 – 240     230              0              0            50     0       0      0
240 – 260     250              20             1            45     45      1      45
260 – 280     270              40             2            20     40      4      80
280 – 300     290              60             3            8      24      9      72
Total                                                      Σf = 230    Σfd’ = - 105    Σfd’2 = 733
$$\text{S.D.} = i \times \sqrt{\frac{\sum f d'^2}{\sum f} - \left(\frac{\sum f d'}{\sum f}\right)^2} = 20 \times \sqrt{\frac{733}{230} - \left(\frac{-105}{230}\right)^2} = 34.5$$
where i = 20 is the class width.
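A short sketch of the step-deviation calculation for this grouped example follows. The variable names are illustrative assumptions; integer division is exact here because the mid-values are spaced by the class width.

```python
from math import sqrt

mid_values = list(range(150, 300, 20))       # 150, 170, ..., 290
freqs = [12, 18, 35, 42, 50, 45, 20, 8]
assumed_mean, width = 230, 20

# Step deviations d' = (x - A) / i.
steps = [(x - assumed_mean) // width for x in mid_values]

total_f = sum(freqs)
sum_fd = sum(f * d for f, d in zip(freqs, steps))
sum_fd2 = sum(f * d * d for f, d in zip(freqs, steps))

sd = width * sqrt(sum_fd2 / total_f - (sum_fd / total_f) ** 2)
print(total_f, sum_fd, sum_fd2, round(sd, 1))
```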
Case (1) :
Mr Ranveer wants to invest Rs. 10,000 in one of two companies, X and Y. The average return in a year from company X is Rs. 4000 with a Standard Deviation of Rs. 25, while in company Y the average return in a year is Rs. 5000 with a Standard Deviation of Rs. 40. Which company would you recommend to Mr Ranveer for investment ? Justify your answer.
Solution :
Co-efficient of variation of Company X = (25/4000) × 100 = 0.625 %
Co-efficient of variation of Company Y = (40/5000) × 100 = 0.8 %
Since the co-efficient of variation of company X is less than that of company Y, company X is more consistent and Mr Ranveer is advised to invest in company X.
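The comparison in this case can be written as a tiny sketch. The function name `coefficient_of_variation` and the dictionary layout are assumptions for illustration only.

```python
def coefficient_of_variation(mean, sd):
    """Karl Pearson's co-efficient of variation, in per cent."""
    return sd / mean * 100

companies = {"X": (4000, 25), "Y": (5000, 40)}   # (average return, S.D.) in Rs
cv = {name: coefficient_of_variation(m, s) for name, (m, s) in companies.items()}

# The company with the smaller co-efficient of variation is the more consistent.
most_consistent = min(cv, key=cv.get)
print(cv, most_consistent)
```

Although company Y has the higher average return, its returns vary more relative to that average, which is why the relative measure and not the absolute S.D. is used for the comparison.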
Ex : Weekly salaries of 100 employees in a firm are given below.
Salaries per week (Rs)    No. of Employees
400 – 500                 2
500 – 600                 15
600 – 700                 21
700 – 800                 30
800 – 900                 20
900 – 1000                9
1000 – 1100               3
Calculate the percentage of employees with salaries, in Rs per week, in the Range (Mean – 2 S.D.) to (Mean + 2 S.D.).
Solution :
Salaries in Rs per week    Mid Value (x)    f     d = x – 750    d’ = d/100    fd’     d’2    fd’2    c.f.
400 – 500                  450              2     - 300          - 3           - 6     9      18      2
500 – 600                  550              15    - 200          - 2           - 30    4      60      17
600 – 700                  650              21    - 100          - 1           - 21    1      21      38
700 – 800                  750              30    0              0             0       0      0       68
800 – 900                  850              20    100            1             20      1      20      88
900 – 1000                 950              9     200            2             18      4      36      97
1000 – 1100                1,050            3     300            3             9       9      27      100
Total                                       Σf = 100                           Σfd’ = - 10    Σfd’2 = 182
$$\text{Mean} = 750 + \frac{-10}{100} \times 100 = 740$$
$$\text{S.D.} = 100 \times \sqrt{\frac{\sum f d'^2}{n} - \left(\frac{\sum f d'}{n}\right)^2} = 100 \times \sqrt{\frac{182}{100} - \left(\frac{-10}{100}\right)^2} = 100 \times \sqrt{1.82 - 0.01} = 134.5$$
2 S.D. = 2 × 134.5 = 269
Mean – 2 S.D. = 740 – 269 = 471
Mean + 2 S.D. = 740 + 269 = 1,009
If the value 471 corresponds to the n1th value and 1,009 corresponds to the n2th value, then by interpolation within the classes 400 – 500 and 1000 – 1100 :
$$\frac{n_1 - 0}{471 - 400} = \frac{2}{100} \;\Rightarrow\; n_1 = \frac{71 \times 2}{100} = 1.4$$
$$\frac{n_2 - 97}{1{,}009 - 1{,}000} = \frac{3}{100} \;\Rightarrow\; n_2 = 97 + \frac{9 \times 3}{100} = 97.27$$
Required Percentage is 97.27 – 1.4 = 96 % approx.
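The whole calculation can be reproduced with a short sketch. It assumes, as the interpolation above does, that salaries are spread evenly inside each class; `employees_below` is a hypothetical helper name. The code does not round 2 S.D. to 269, so the intermediate figures differ slightly from the hand calculation.

```python
from math import sqrt

lower_limits = [400, 500, 600, 700, 800, 900, 1000]
freqs = [2, 15, 21, 30, 20, 9, 3]
width = 100
total = sum(freqs)

mids = [low + width / 2 for low in lower_limits]
mean = sum(f * x for x, f in zip(mids, freqs)) / total
sd = sqrt(sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs)) / total)

def employees_below(salary):
    """Estimated number of employees earning less than `salary`."""
    count = 0.0
    for low, f in zip(lower_limits, freqs):
        if salary >= low + width:
            count += f                              # whole class lies below
        elif salary > low:
            count += f * (salary - low) / width     # part of the class
    return count

low_cut, high_cut = mean - 2 * sd, mean + 2 * sd
percentage = (employees_below(high_cut) - employees_below(low_cut)) / total * 100
print(round(mean, 1), round(sd, 1), round(percentage, 1))
```

The result, a little under 96 %, is close to the 95.45 % that a normal curve would place within Mean ± 2 S.D.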