Dispersion (Measures of Variability)

Introduction and Definition :

Measures of Central Tendency are called averages of the first order, but they are not sensitive to the variability among the data. Two distributions may have the same Mean, Median and Mode, yet the variability among the data in the two distributions may be quite different. For example, consider two groups ‘A’ and ‘B’:

Group A :  65, 66, 67, 68, 71, 73, 74, 77, 77, 77

Group B :  42, 54, 58, 62, 67, 77, 77, 85, 93, 100

Computing the averages, we get :

          Group A    Group B
Mean      71.5       71.5
Median    72.0       72.0
Mode      77         77

It is clear that Group A and Group B have the same values of Mean, Median and Mode, but a careful perusal of the data shows that the values in Group B are much more widely scattered than the values in Group A.
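The comparison of the two groups can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the helper name `summary` is an illustrative assumption, not part of the original notes.

```python
from statistics import mean, median, mode

# Data for the two groups as listed in the introduction.
group_a = [65, 66, 67, 68, 71, 73, 74, 77, 77, 77]
group_b = [42, 54, 58, 62, 67, 77, 77, 85, 93, 100]

def summary(values):
    """Return (mean, median, mode, range) for a list of numbers."""
    return mean(values), median(values), mode(values), max(values) - min(values)

summary_a = summary(group_a)
summary_b = summary(group_b)

# The three averages agree, while the range (a crude measure of spread) does not.
print("Group A:", summary_a)
print("Group B:", summary_b)
```

The first three entries of both tuples coincide, while the last entry (the range) is far larger for Group B, which is the point the averages alone cannot show.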

Conversely, two or more series may have a similar formation (the same scatter) while their Measures of Central Tendency differ.

A :  10, 11, 12, 13, 14

B :  30, 31, 32, 33, 34

C :  100, 101, 102, 103, 104

The series have entirely different averages but the same formation. Clearly, Measures of Central Tendency do not indicate how the individual values in a distribution differ from each other or from the central value. When the variation of individual values in relation to other values or to the central value is large, the Measures of Central Tendency fail to represent the series fully.

The Measures of Dispersion (or variability), coupled with the Measures of Central Tendency, give a fairly good idea (not the full idea) of the nature of the distribution. To have a complete idea of the nature of the data, Moments and Kurtosis must also be measured.

Dispersion is the spread or scatter of values around a Measure of Central Tendency. A Measure of Dispersion is designed to state the extent to which individual observations (or items) vary from their average. Here we account only for the amount of variation, not its direction.

D.C. Brooks defines dispersion as “Dispersion or spread is the degree of scatter or variation of variable about the central value”.

Measures of Dispersion are called Averages of the Second order because they are based on the deviations of the different values from the mean or other measures of central tendency, which are called averages of the First order.

Objectives of Dispersion :

1. To know the average variation of the different values from the average of a series.

2. To know about the composition of a series, or the dispersion of the values on either side of the central tendency.

3. To know the range of values.

4. To compare the disparity between two or more series expressed in different units in order to find out the degree of variation.

5. To know whether the Central Tendency truly represents the series or not. If the dispersion is large, the central tendency does not represent the series well.

Importance of Dispersion :

1. Conclusions drawn from a measure of central tendency carry little meaning without knowing how the items of the series vary around the average.

2. Inequalities in the distribution of wealth and income can be measured by measures of dispersion.

3. Dispersion is used to compare and measure the concentration of economic power and monopoly in a country.

4. Dispersion is used in output control and price control.

Characteristics of a good Measure of Dispersion :

1. It should be simple to understand and easy to calculate.

2. It should be rigidly defined.

3. It should be based on all the items of the series.

4. It should not be unduly affected by the extreme items of the series.

5. It should be least affected by sampling fluctuations.

6. It should be amenable to further algebraic treatment.

Merits :

1. They indicate the dispersal character of a statistical series.

2. They indicate the dependability or reliability of the average value of a series.

3. They enable the statistician to compare two or more statistical series with regard to their uniformity, consistency or equitability.

4. They enable one to control the variability of a phenomenon under one's purview.

5. They facilitate further statistical analysis of the series through devices like the co-efficient of Skewness, co-efficient of Kurtosis, co-efficient of correlation, variance analysis, etc.

6. They supplement the Measures of Central Tendency in finding out more information about the nature of a series.

Demerits :

1. They are liable to misinterpretation and wrong generalization by a biased statistician.

2. They are liable to yield inconsistent results, as there are different methods of calculating dispersion.

3. Except for one or two, most measures of dispersion involve a complicated process of computation.

4. By themselves, they cannot give any idea about the symmetrical or skewed character of a series.

5. Like measures of central tendency, most measures of dispersion do not give a layman a convincing idea about a series.

Different Measures of Dispersion :

Absolute Measures :

1. Range

2. Quartile Deviation

3. Average (Mean) Deviation

4. Standard Deviation

5. Lorenz Curve

Relative Measures :

1. Co-efficient of Range

2. Co-efficient of Quartile Deviation

3. Co-efficient of Mean Deviation

4. Co-efficient of Variation

An absolute Measure of Dispersion is expressed in the units of measurement of the variable. A relative measure of dispersion, generally known as a co-efficient of dispersion, is expressed as a pure number independent of the units of measurement of the variable. The main disadvantage of an absolute measure of dispersion is that it cannot be used to compare the variability of two series measured in different units. Distributions are compared with respect to their variability around the central value by means of relative measures of dispersion.

Range :

It is defined as the difference between the extreme values in the distribution, i.e.,

Range = Largest Value in the Distribution - Smallest Value in the Distribution

In the case of a continuous frequency distribution, the range is calculated by either of the following two methods :

1. By subtracting the lower limit of the lowest class from the upper limit of the highest class, OR

2. By subtracting the mid-value of the lowest class from the mid-value of the highest class.

Important:

1. In the calculation of Range, only the values of the variable are taken into account and the frequencies are completely ignored.

2. Open ended classes have no Range since they have no highest or lowest value.

3. Sometimes the variability of two series is measured by the Range alone, though it is only a rough measure of variability.

Co-efficient of Range :

$$\text{Co-efficient of Range} = \frac{\text{Max. Value} - \text{Min. Value}}{\text{Max. Value} + \text{Min. Value}} = \frac{\text{Absolute Range}}{\text{Sum of the extreme values}}$$
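As an illustration, the Range and the Co-efficient of Range, (largest value - smallest value) / (largest value + smallest value), can be computed for Groups A and B from the introduction. This is a minimal sketch; the function names are illustrative assumptions.

```python
def value_range(values):
    """Absolute range: largest value minus smallest value."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Relative range: (max - min) / (max + min), a unit-free number."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

group_a = [65, 66, 67, 68, 71, 73, 74, 77, 77, 77]
group_b = [42, 54, 58, 62, 67, 77, 77, 85, 93, 100]

range_a, range_b = value_range(group_a), value_range(group_b)
coef_a, coef_b = coefficient_of_range(group_a), coefficient_of_range(group_b)
print(range_a, round(coef_a, 4))
print(range_b, round(coef_b, 4))
```

By coincidence the two groups share the same sum of extremes (142), so here the co-efficient simply rescales the range; in general it also removes the unit of measurement.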

Inter-quartile Range :

$$I.R. = Q_3 - Q_1$$

Percentile Range :

$$P.R. = P_{90} - P_{10}$$

Quartile Deviation :

Q.D. = Half the difference between the upper and lower quartiles = (Q3 – Q1)/2 = Semi-inter quartile Range.

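Co-efficient of Quartile Deviation :

$$\text{Co-efficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

For Symmetric Distribution :

In a symmetric distribution the quartiles are equidistant from the Median, i.e., $\text{Median} - Q_1 = Q_3 - \text{Median}$, so that $\text{Median} = \frac{Q_3 + Q_1}{2}$. Hence

$$Q_1 + Q.D. = Q_1 + \frac{Q_3 - Q_1}{2} = \frac{Q_3 + Q_1}{2} = \text{Median} \qquad (1)$$

$$Q_3 - Q.D. = Q_3 - \frac{Q_3 - Q_1}{2} = \frac{Q_3 + Q_1}{2} = \text{Median} \qquad (2)$$

For Asymmetric Distribution :

Median ≠ Q1 + Q.D. and Median ≠ Q3 – Q.D.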

Merits of Quartile Deviation :

1. Simple to understand and easy to compute.

2. Not affected by extreme values.

3. Can be computed even if the distribution has unequal class intervals.

4. Can be computed in the case of open ended intervals.

Demerits of Quartile Deviation :

1. It is not based on all the observations of the series, because it does not take into consideration the frequencies below the lower quartile and above the upper quartile.

2. Not amenable to algebraic treatment.

3. Affected by sampling fluctuations.

4. It is a distance on the scale and is not a measure from an average. Therefore, it fails to show variations around an average.

Use :

The quartile deviation, as a measure of dispersion, is mainly employed in open ended distributions. In many situations, we encounter such distributions because of the need to keep certain information confidential.

Mean Deviations :

Mean deviation (also called Average Deviation) is defined as the arithmetic mean of the absolute deviations of all the values from their Mean or Median or Mode.

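For individual observations :

$$\text{Mean Deviation} = \frac{\sum_{i=1}^{n} |x_i - \text{Mean}|}{n} \quad \text{or} \quad \frac{\sum_{i=1}^{n} |x_i - \text{Median}|}{n}$$

For a frequency distribution :

$$\text{Mean Deviation} = \frac{\sum_{i=1}^{n} f_i\,|x_i - \text{Mean}|}{N} \quad \text{or} \quad \frac{\sum_{i=1}^{n} f_i\,|x_i - \text{Median}|}{N}$$

where N = Σf.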

Steps to Calculate Mean Deviation in Individual Values (or Observations) :

In the case of individual observations, the following steps are involved in the calculation of Mean Deviation :

1. Calculate the Mean or Median of the given series.

2. Write down the deviations (dxi) of each item (xi) from either the Mean or the Median, without considering the sign.

3. Sum up the deviations, disregarding the signs. This is $\sum_{i=1}^{n} |dx_i|$.

4. Divide the total of the deviations by the number of observations; the resulting value is the Mean Deviation.
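The four steps for individual observations can be sketched as follows. This is an illustrative implementation, not a prescribed one; the function name `mean_deviation` and its `center` parameter are assumptions introduced here, and the data are series A (10 to 14) from the introduction.

```python
from statistics import mean, median

def mean_deviation(values, center="mean"):
    """Mean of the absolute deviations from the chosen average."""
    # Step 1: compute the Mean or the Median.
    c = mean(values) if center == "mean" else median(values)
    # Steps 2 and 3: take deviations without sign and add them up.
    total = sum(abs(x - c) for x in values)
    # Step 4: divide by the number of observations.
    return total / len(values)

series_a = [10, 11, 12, 13, 14]
md_mean = mean_deviation(series_a, "mean")
md_median = mean_deviation(series_a, "median")
print(md_mean, md_median)
```

For this symmetric series the Mean and the Median coincide at 12, so both calls give the same Mean Deviation (6/5).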

Steps to Calculate Mean Deviation in Discrete Series :

In the case of a discrete series, the following steps are involved in the calculation of Mean Deviation :

1. Calculate the Mean or Median of the given series.

2. Write down the deviations (dxi) of each item (xi) from either the Mean or the Median, without considering the sign.

3. Multiply the deviations by the respective frequencies ( f|dxi| ).

4. Find the sum of the products so obtained. This is Σf|dx|.

5. Divide the sum of the products by the total frequency; the resulting value is the Mean Deviation.

Expressed as a formula :

$$M.D. = \frac{\sum f\,|dx|}{\sum f}$$

Steps to Calculate Mean Deviation in Continuous Series :

As regards the calculation of mean deviation in a continuous series, the procedure is the same as in the case of a discrete series, with one minor difference: the classes are replaced by their mid-values, and the frequencies are multiplied by the deviations of the mid-values from the Mean (or Median).

Coefficient of Mean Deviation :

It is a relative measure of dispersion and is computed by the following formulas :

Coefficient of Mean Deviation = Mean Deviation / Mean, when the mean is used as the reference point

Coefficient of Mean Deviation = Mean Deviation / Median, when the median is used as the reference point

In general, Coefficient of Mean Deviation = Mean Deviation / Average Used

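Short Cut Method of finding Mean Deviation :

When the mean (or median) is not a whole number but a fraction, the following short-cut formula can be used :

Mean Deviation from Mean = ( [ Sum of the values > Mean ] - [ Sum of the values < Mean ] ) / Total Number of Values

Mean Deviation from Median = ( [ Sum of the values > Median ] - [ Sum of the values < Median ] ) / Total Number of Values

These formulas are exact only when the number of values above the average equals the number of values below it, as is typically the case around the Median. Otherwise, Average × (number of values above - number of values below) must also be subtracted from the numerator.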
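The short-cut can be compared with the direct definition in a few lines. This is a sketch under stated assumptions: the function names are illustrative, and the three-value list is hypothetical data chosen so that the counts above and below the mean differ, which is exactly when the correction term, average × (count above - count below), matters.

```python
from statistics import mean

def md_direct(values, center):
    """Mean deviation from `center` by the definition."""
    return sum(abs(x - center) for x in values) / len(values)

def md_shortcut(values, center):
    """Short-cut: sums above and below the average, with the count correction."""
    above = [x for x in values if x > center]
    below = [x for x in values if x < center]
    correction = center * (len(above) - len(below))
    return (sum(above) - sum(below) - correction) / len(values)

group_b = [42, 54, 58, 62, 67, 77, 77, 85, 93, 100]  # from the introduction
skewed = [1, 2, 6]                                   # hypothetical data

direct_b = md_direct(group_b, mean(group_b))
short_b = md_shortcut(group_b, mean(group_b))
direct_s = md_direct(skewed, mean(skewed))
short_s = md_shortcut(skewed, mean(skewed))
print(direct_b, short_b, direct_s, short_s)
```

For Group B, five values lie on each side of the mean, so the correction vanishes and the plain short-cut already agrees with the definition; for the hypothetical list it does not vanish.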

Short Cut Method of finding Mean Deviation ( for Frequency Distribution ) :

$$\text{Mean Deviation from Mean} = \frac{\sum f\,|x - A| + c\,(\sum f_a - \sum f_b)}{\sum f} = \frac{\sum f\,|dx| + c\,(\sum f_a - \sum f_b)}{\sum f}$$

Where,

Σf|dx| = Sum of the products of the absolute deviations [ considering all deviations, positive or negative, as positive ] and the respective frequencies, when the deviations are taken from the Assumed Mean A.

Σf_a = Sum of the frequencies lying above the Mean in the table, i.e., the frequencies of the values less than the Mean.

Σf_b = Sum of the frequencies lying below the Mean in the table, i.e., the frequencies of the values more than the Mean.

c = Difference between the real Mean and the arbitrary (assumed) Mean, i.e., c = Mean - A.

Similarly, we can find the Mean Deviation from the Median :

$$\text{Mean Deviation from Median} = \frac{\sum f\,|dx| + c\,(\sum f_a - \sum f_b)}{\sum f}$$

Here Σf_a and Σf_b are the sums of the frequencies of the values less than and more than the Median respectively, c is the difference between the real Median and the arbitrary (assumed) Median, and Σf|dx| is the sum of the products of the absolute deviations and the respective frequencies when the deviations are taken from the Assumed Median.

Note : The assumed Mean or Median must be selected from the class in which the real Mean or Median lies.

Mean deviation may also be calculated by the following method :

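$$\text{M.D. from Mean} = \frac{\sum fX_b - \sum fX_a - (\sum f_b - \sum f_a)\,\bar{X}}{N}$$

$$\text{M.D. from Median} = \frac{\sum fX_b - \sum fX_a - (\sum f_b - \sum f_a)\,\text{Median}}{N}$$

Where,

ΣfX_a = Sum of the products of the Mid-points (X) and frequencies lying above the Mean (or Median) in the table, i.e., for the mid-points less than it.

ΣfX_b = Sum of the products of the Mid-points (X) and frequencies lying below the Mean (or Median) in the table, i.e., for the mid-points more than it.

Σf_a = Sum of the frequencies of the Mid-points less than the Mean (or Median).

Σf_b = Sum of the frequencies of the Mid-points more than the Mean (or Median).

N = Total number of observations.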
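The two frequency-distribution short-cuts can be checked against the direct definition. The sketch below uses the mid-values and frequencies of the continuous-series example in these notes (classes 0 – 10 to 60 – 70); the variable names, and the choice of 25 as assumed mean (the mid-value of the class containing the real mean), are assumptions made for illustration. To avoid ambiguity the code names the two groups by value (`less`, `greater`) rather than by position in the table.

```python
mids = [5, 15, 25, 35, 45, 55, 65]
freqs = [8, 12, 10, 8, 3, 2, 7]
n = sum(freqs)
mean_x = sum(f * x for x, f in zip(mids, freqs)) / n

# Direct definition: sum of f * |x - mean| divided by N.
md_direct = sum(f * abs(x - mean_x) for x, f in zip(mids, freqs)) / n

# Split the table into mid-points less than and greater than the mean.
fx_less = sum(f * x for x, f in zip(mids, freqs) if x < mean_x)
fx_greater = sum(f * x for x, f in zip(mids, freqs) if x > mean_x)
f_less = sum(f for x, f in zip(mids, freqs) if x < mean_x)
f_greater = sum(f for x, f in zip(mids, freqs) if x > mean_x)

# Method based on sums of fX on either side of the mean.
md_sides = (fx_greater - fx_less - (f_greater - f_less) * mean_x) / n

# Assumed-mean short-cut with correction c = real mean - assumed mean.
assumed = 25
c = mean_x - assumed
md_assumed = (sum(f * abs(x - assumed) for x, f in zip(mids, freqs))
              + c * (f_less - f_greater)) / n

print(mean_x, md_direct, md_sides, md_assumed)
```

All three routes give the same Mean Deviation for this table, which is a useful consistency check when the subscripts in the printed formulas are hard to read.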

Ex : Find the Mean Deviation around the Median of the following series :

Marks (x)   5    10   15   20   25
Students    6    7    8    11   8

Solution :

Marks (x)   f    c.f.   |d| = |x – 15|   f|d|
5           6    6      10               60
10          7    13     5                35
15          8    21     0                0
20          11   32     5                55
25          8    40     10               80
Total       N = 40                       Σf|d| = 230

Median = Value of the [(N+1)/2]th term = 20.5th term = 15

Mean Deviation = Σf|d| / Σf = 230/40 = 5.75 marks
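The arithmetic of this example can be reproduced with a short script. This is a sketch only: the variable names are illustrative, and the median is located as the first mark whose cumulative frequency reaches the (N+1)/2-th position, as in the solution.

```python
from itertools import accumulate

marks = [5, 10, 15, 20, 25]
students = [6, 7, 8, 11, 8]

n = sum(students)
cumulative = list(accumulate(students))   # running totals (c.f. column)
position = (n + 1) / 2                    # 20.5th term

# Median: first mark whose cumulative frequency covers the position.
median_mark = next(x for x, cf in zip(marks, cumulative) if cf >= position)

# Sum of f * |x - median|, then divide by the total frequency.
total_abs_dev = sum(f * abs(x - median_mark) for x, f in zip(marks, students))
md_median = total_abs_dev / n
print(median_mark, total_abs_dev, md_median)
```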

Ex : Find the Mean Deviation around the Mean of the following data :

Class Interval   Frequency
0 – 10           8
10 – 20          12
20 – 30          10
30 – 40          8
40 – 50          3
50 – 60          2
60 – 70          7

Solution : For calculating the Mean deviation :

Mid Value (x)   f    fx     x – 29   |x – x̄|   f|x – x̄|
5               8    40     - 24     24         192
15              12   180    - 14     14         168
25              10   250    - 4      4          40
35              8    280    6        6          48
45              3    135    16       16         48
55              2    110    26       26         52
65              7    455    36       36         252
Total           N = 50   Σfx = 1450             Σf|x – x̄| = 800

Mean = Σfx / Σf = 1450/50 = 29

M.D. = Σf|x – x̄| / Σf = 800/50 = 16

Standard Deviation :

Standard Deviation is the most important, the most reliable and the most widely used measure of dispersion. The term ‘standard’ is assigned to this measure of variation probably because of the following reasons.

(i) It is the most commonly used of all measures of variation and is the most flexible in terms of variety of applications.

(ii) The area under any normal curve within a fixed number of standard deviations from the Mean on either side of it remains the same, e.g., in any normal curve the area within Mean ± 1 standard deviation is always 68.27% of the total area, and the area within Mean ± 2 standard deviations is 95.45% of the total area.

(iii) The sum of squares of the deviations about the Mean is the least as compared to the sum of the squares of the deviations about the Median or Mode; therefore, the root mean square deviation about the Mean is the least.

It is the most important of all the measures of dispersion because it is used in many other statistical operations, e.g., sampling techniques, correlation and regression analysis, finding the co-efficient of variation, skewness, kurtosis, etc. Standard deviation is also called ‘Mean Error’ or ‘Mean Square Error’ or ‘Root-Mean Square Deviation’. Unlike the Mean Deviation, which may be calculated around any average, the standard deviation is always computed around the Mean.

It is the square-root of the Arithmetic Mean of the squared deviations of all values from their Mean.

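Standard Deviation :

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}}$$

For Frequency Distribution :

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i\,(X_i - \bar{X})^2}{\sum f}}$$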

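Short-Cut Method For Finding Standard Deviation :

For discrete Distribution :

$$\sigma = \sqrt{\frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2}$$

where d = X – A is the deviation of each value from an assumed mean A.

Standard deviation is calculated for Continuous Series by taking the Mid-Points of the classes as X.

Standard Deviation from Step-Deviation Method :

$$S.D. = h\,\sqrt{\frac{\sum d'^2}{n} - \left(\frac{\sum d'}{n}\right)^2}, \qquad d' = \frac{d}{h} = \frac{X - A}{h}$$

And, if the frequencies are given, then

$$S.D. = h\,\sqrt{\frac{\sum f d'^2}{\sum f} - \left(\frac{\sum f d'}{\sum f}\right)^2}$$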
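The step-deviation formula, S.D. = h × sqrt( Σfd'²/Σf - (Σfd'/Σf)² ) with d' = (X - A)/h, gives the same result as the definition whatever assumed mean A and common factor h are chosen. The sketch below checks this numerically; the function names and the small frequency table are hypothetical and serve only as an illustration.

```python
from math import sqrt, isclose

def sd_direct(xs, fs):
    """Root of the frequency-weighted mean squared deviation from the mean."""
    n = sum(fs)
    m = sum(f * x for x, f in zip(xs, fs)) / n
    return sqrt(sum(f * (x - m) ** 2 for x, f in zip(xs, fs)) / n)

def sd_step_deviation(xs, fs, assumed, h):
    """Step-deviation method with d' = (x - assumed) / h."""
    n = sum(fs)
    steps = [(x - assumed) / h for x in xs]
    first = sum(f * d for d, f in zip(steps, fs)) / n       # sum(f d') / sum(f)
    second = sum(f * d * d for d, f in zip(steps, fs)) / n  # sum(f d'^2) / sum(f)
    return h * sqrt(second - first ** 2)

# Hypothetical mid-values and frequencies.
mids = [5, 15, 25, 35, 45]
freqs = [3, 7, 10, 6, 4]

sd_a = sd_direct(mids, freqs)
sd_b = sd_step_deviation(mids, freqs, assumed=25, h=10)
sd_c = sd_step_deviation(mids, freqs, assumed=35, h=5)
print(sd_a, sd_b, sd_c)
```

Choosing A near the centre of the table and h equal to the class width keeps the d' column to small integers, which is the whole purpose of the method in hand calculation.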

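Combined Standard Deviation of Two or More Groups :

Let A and B be two groups with $n_1$ and $n_2$ the respective numbers of values, $\bar{x}_1$ and $\bar{x}_2$ the respective means, and $\sigma_1$ and $\sigma_2$ the respective standard deviations; then their combined S.D. $\sigma_{12}$ is given by

$$\sigma_{12} = \sqrt{\frac{n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}}$$

where $d_1 = \bar{x}_1 - \bar{x}_{12}$, $d_2 = \bar{x}_2 - \bar{x}_{12}$ and $\bar{x}_{12} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$ is the combined mean.

This can also be given by the following formula :

$$\sigma_{12} = \sqrt{\frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2} + \frac{n_1 n_2\,(\bar{x}_1 - \bar{x}_2)^2}{(n_1 + n_2)^2}}$$

When $n_1 = n_2$, this reduces to

$$\sigma_{12} = \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2} + \frac{(\bar{x}_1 - \bar{x}_2)^2}{4}}$$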
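Both forms of the combined standard deviation can be verified against the standard deviation of the pooled data. This is a sketch only: the function names are assumptions, the data are series A and B from the introduction, and `statistics.pstdev` (the population standard deviation, with divisor n) is used to match the definition in these notes.

```python
from math import sqrt, isclose
from statistics import mean, pstdev

def combined_sd(n1, m1, s1, n2, m2, s2):
    """Combined S.D. using deviations of the group means from the combined mean."""
    m12 = (n1 * m1 + n2 * m2) / (n1 + n2)
    d1, d2 = m1 - m12, m2 - m12
    return sqrt((n1 * s1**2 + n2 * s2**2 + n1 * d1**2 + n2 * d2**2) / (n1 + n2))

def combined_sd_alt(n1, m1, s1, n2, m2, s2):
    """Equivalent form that uses the difference of the two group means."""
    within = (n1 * s1**2 + n2 * s2**2) / (n1 + n2)
    between = n1 * n2 * (m1 - m2) ** 2 / (n1 + n2) ** 2
    return sqrt(within + between)

series_a = [10, 11, 12, 13, 14]
series_b = [30, 31, 32, 33, 34]

args = (len(series_a), mean(series_a), pstdev(series_a),
        len(series_b), mean(series_b), pstdev(series_b))
sd_first = combined_sd(*args)
sd_second = combined_sd_alt(*args)
sd_pooled = pstdev(series_a + series_b)
print(sd_first, sd_second, sd_pooled)
```

Each series on its own has a small standard deviation, but the combined value is large because the two means are 20 units apart; the second term of the formula captures exactly that gap.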

Co-efficient of Variation :

$$\text{Karl Pearson's Co-efficient of Variation} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\ \%$$

Relations Between Measures of Dispersion :

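For a normal (or nearly normal) distribution, the following approximate relations hold :

1. Q.D. ≈ (2/3) S.D.

2. M.D. ≈ (4/5) S.D.

3. A.M. ± Q.D. would cover 50 % of the items.

4. A.M. ± S.D. would cover 68.27 % of the items.

5. A.M. ± M.D. would cover 57.51 % of the items.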
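These relations can be checked for a standard normal distribution (S.D. = 1). The sketch below assumes normality, which is what makes the ratios hold: for a normal curve the quartile deviation equals the 75th-percentile z-value, and the mean deviation equals sqrt(2/π) times the S.D. The names `coverage`, `qd` and `md` are illustrative.

```python
from math import sqrt, pi
from statistics import NormalDist

z = NormalDist()            # standard normal: mean 0, S.D. 1

qd = z.inv_cdf(0.75)        # quartile deviation of a normal curve, in S.D. units
md = sqrt(2 / pi)           # mean deviation of a normal curve, in S.D. units

def coverage(k):
    """Share of the area lying within mean ± k standard deviations."""
    return z.cdf(k) - z.cdf(-k)

print(round(qd, 4), round(md, 4))
print(round(coverage(qd), 4), round(coverage(1), 4), round(coverage(md), 4))
```

The exact ratios (about 0.6745 and 0.7979) are what the rules of thumb 2/3 and 4/5 approximate.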

Lorenz Curve:

Graphically, dispersion is studied by means of the Lorenz Curve. The Lorenz Curve is a cumulative percentage curve in which the cumulative percentage of items (or frequencies) is plotted against the corresponding cumulative percentage of a factor such as income, wealth or profits. The curve is also used to study the distribution of land, wages, income, etc., among the population of a country.

Ex : Calculate the Semi-inter quartile Range and the co-efficient of quartile deviation from the following data.

Age (in years)   No. of persons (f)   c.f.
15 – 25          3                    3
25 – 35          61                   64
35 – 45          132                  196
45 – 55          153                  349
55 – 65          140                  489
65 – 75          51                   540
75 – 85          3                    543

N = 543

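Solution :

N/4 = 543/4 = 135.75, so Q1 lies in the class 35 – 45 (c.f. 196).

3N/4 = 407.25, so Q3 lies in the class 55 – 65 (c.f. 489).

Q1 = 35 + [(135.75 – 64)/132] × 10 = 40.435

Q3 = 55 + [(407.25 – 349)/140] × 10 = 59.16

Q3 – Q1 = 18.725

Q3 + Q1 = 99.595

Semi – inter quartile Range = (Q3 – Q1)/2

= 18.725/2 = 9.3625

Co-efficient of quartile deviation = (Q3 – Q1)/(Q3 + Q1)

= 18.725/99.595

= 0.188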
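The quartiles in this example come from linear interpolation inside the class that contains the N/4-th and 3N/4-th items. A minimal sketch of that calculation follows; the helper `grouped_quantile` is a hypothetical name introduced here and assumes equal class widths.

```python
def grouped_quantile(lowers, width, freqs, position):
    """Interpolate the value at a cumulative position in a grouped table."""
    cumulative = 0
    for lower, f in zip(lowers, freqs):
        if cumulative + f >= position:
            # lower limit + (position - c.f. before class) / f * class width
            return lower + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("position exceeds the total frequency")

lowers = [15, 25, 35, 45, 55, 65, 75]   # lower limits of the age classes
freqs = [3, 61, 132, 153, 140, 51, 3]
n = sum(freqs)

q1 = grouped_quantile(lowers, 10, freqs, n / 4)
q3 = grouped_quantile(lowers, 10, freqs, 3 * n / 4)
semi_iqr = (q3 - q1) / 2
coef_qd = (q3 - q1) / (q3 + q1)
print(round(q1, 3), round(q3, 3), round(semi_iqr, 4), round(coef_qd, 3))
```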

Ex : Calculate the Mean deviation from the Mean for the following data :

Marks             0 – 10   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60   60 – 70
No. of Students   6        5         8         15        7         6         3

Computational Table

Mid-Value (x)   Frequency (f)   fx     |x – x̄|   f|x – x̄|
5               6               30     28.4       170.4
15              5               75     18.4       92.0
25              8               200    8.4        67.2
35              15              525    1.6        24.0
45              7               315    11.6       81.2
55              6               330    21.6       129.6
65              3               195    31.6       94.8
Total           N = 50          Σfx = 1670        Σf|x – x̄| = 659.2

x̄ = Σfx / N = 1670/50 = 33.4

M.D. = Σf|x – x̄| / Σf = 659.2/50 = 13.184

Ex : Determine the S.D. from the following data :

Yield (in gm) :

1216   1374   1167
1232   1407   1453
1202   1372   1278
1141   1221   1329

Solution :

x       d = x – 1200    d²
1216    16              256
1232    32              1024
1202    2               4
1141    - 59            3481
1374    174             30276
1407    207             42849
1372    172             29584
1221    21              441
1167    - 33            1089
1453    253             64009
1278    78              6084
1329    129             16641
Total   Σd = 992        Σd² = 195738

$$S.D. = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{195738}{12} - \left(\frac{992}{12}\right)^2} = \sqrt{16311.5 - 6833.8} = 97.35$$
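The assumed-mean calculation for the yield data can be cross-checked against a library routine. This is a minimal sketch; `statistics.pstdev` computes the population standard deviation (divisor n), which is the definition used in these notes, and the variable names are illustrative.

```python
from math import sqrt
from statistics import pstdev

yields = [1216, 1232, 1202, 1141, 1374, 1407,
          1372, 1221, 1167, 1453, 1278, 1329]
assumed_mean = 1200

# Deviations from the assumed mean and their squares.
d = [x - assumed_mean for x in yields]
n = len(yields)
sum_d = sum(d)
sum_d2 = sum(v * v for v in d)

# Short-cut formula: sqrt( sum(d^2)/n - (sum(d)/n)^2 ).
sd_shortcut = sqrt(sum_d2 / n - (sum_d / n) ** 2)
sd_library = pstdev(yields)
print(sum_d, sum_d2, round(sd_shortcut, 2), round(sd_library, 2))
```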

Ex : Find out the S.D. from the following table giving the wages of 230 persons.

Wages (Rs)   No. of Persons
140 – 160    12
160 – 180    18
180 – 200    35
200 – 220    42
220 – 240    50
240 – 260    45
260 – 280    20
280 – 300    8

Solution :

Wages (Rs)   Mid Value (x)   d = x - 230   d’ = d/20   f     fd’    d’2   fd’2
140 – 160    150             - 80          - 4         12    - 48   16    192
160 – 180    170             - 60          - 3         18    - 54   9     162
180 – 200    190             - 40          - 2         35    - 70   4     140
200 – 220    210             - 20          - 1         42    - 42   1     42
220 – 240    230             0             0           50    0      0     0
240 – 260    250             20            1           45    45     1     45
260 – 280    270             40            2           20    40     4     80
280 – 300    290             60            3           8     24     9     72
Total                                                  Σf = 230   Σfd’ = - 105   Σfd’2 = 733

$$S.D. = i \times \sqrt{\frac{\sum f d'^2}{\sum f} - \left(\frac{\sum f d'}{\sum f}\right)^2} = 20 \times \sqrt{\frac{733}{230} - \left(\frac{-105}{230}\right)^2} = 34.5$$

where i = 20 is the common factor (class width).

Case (1) :

Mr Ranveer wants to invest Rs. 10,000 in one of the two companies X and Y. The average return in a year from company X is Rs. 4000 with a Standard Deviation of Rs. 25, while in company Y the average return in a year is Rs. 5000 with a Standard Deviation of Rs. 40. Which company would you recommend to Mr Ranveer for investment ? Justify your answer.

Solution :

Co-efficient of variation of Company X = (25/4000) × 100 = 0.625 %

Co-efficient of variation of Company Y = (40/5000) × 100 = 0.8 %

Since the co-efficient of variation of company X is less than that of company Y, company X is more consistent, and Mr Ranveer is advised to invest in company X.
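The comparison rests on the co-efficient of variation, (Standard Deviation / Mean) × 100. A minimal sketch of the calculation for the two companies follows; the function name is an illustrative assumption.

```python
def coefficient_of_variation(sd, mean_value):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean_value * 100

cv_x = coefficient_of_variation(25, 4000)   # company X
cv_y = coefficient_of_variation(40, 5000)   # company Y

# The series with the smaller co-efficient is the more consistent one.
more_consistent = "X" if cv_x < cv_y else "Y"
print(round(cv_x, 3), round(cv_y, 3), more_consistent)
```

Note that this criterion ranks the companies by consistency of return only; company Y still has the higher average return.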

Ex : Weekly salaries of 100 employees in a firm are given below.

Salaries per week (Rs)   No. of Employees
400 – 500                2
500 – 600                15
600 – 700                21
700 – 800                30
800 – 900                20
900 – 1000               9
1000 – 1100              3

Calculate the percentage of employees with salaries, in Rs per week, in the following Range : (Mean – 2 S.D.), (Mean + 2 S.D.).

Solution :

Salaries (Rs per week)   Mid Value (x)   f     d = x – 750   d’ = d/100   fd’    d’2   fd’2   c.f.
400 – 500                450             2     - 300         - 3          - 6    9     18     2
500 – 600                550             15    - 200         - 2          - 30   4     60     17
600 – 700                650             21    - 100         - 1          - 21   1     21     38
700 – 800                750             30    0             0            0      0     0      68
800 – 900                850             20    100           1            20     1     20     88
900 – 1000               950             9     200           2            18     4     36     97
1000 – 1100              1,050           3     300           3            9      9     27     100
Total                                    Σf = 100            Σfd’ = - 10         Σfd’2 = 182

Mean = 750 + (Σfd’/Σf) × 100 = 750 + (- 10/100) × 100 = 740

$$S.D. = 100 \times \sqrt{\frac{\sum f d'^2}{n} - \left(\frac{\sum f d'}{n}\right)^2} = 100 \times \sqrt{\frac{182}{100} - \left(\frac{-10}{100}\right)^2} = 100 \times \sqrt{1.81} = 134.5$$

2 S.D. = 2 × 134.5 = 269

Mean – 2 S.D. = 740 – 269 = 471

Mean + 2 S.D. = 740 + 269 = 1,009

If the value 471 corresponds to the n1th value and 1,009 corresponds to the n2th value, then, interpolating within the classes 400 – 500 and 1000 – 1100,

n1 = 0 + [(471 – 400)/100] × 2 = 1.4

n2 = 97 + [(1,009 – 1,000)/100] × 3 = 97.27

Required Percentage is 97.27 – 1.4 = 96 % approx.
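The whole calculation can be reproduced in a few lines. This is a sketch under the same assumption as the hand solution, namely that salaries are spread evenly within each class so that cumulative counts can be interpolated linearly; the helper `cumulative_below` is a hypothetical name.

```python
from math import sqrt

lowers = [400, 500, 600, 700, 800, 900, 1000]   # lower class limits
width = 100
freqs = [2, 15, 21, 30, 20, 9, 3]
n = sum(freqs)

mids = [lo + width / 2 for lo in lowers]
mean_x = sum(f * x for x, f in zip(mids, freqs)) / n
sd = sqrt(sum(f * (x - mean_x) ** 2 for x, f in zip(mids, freqs)) / n)

def cumulative_below(value):
    """Number of employees earning less than `value`, interpolated within classes."""
    total = 0.0
    for lo, f in zip(lowers, freqs):
        if value >= lo + width:
            total += f                          # whole class lies below the value
        elif value > lo:
            total += (value - lo) / width * f   # part of the class lies below
    return total

n1 = cumulative_below(mean_x - 2 * sd)
n2 = cumulative_below(mean_x + 2 * sd)
percentage = (n2 - n1) / n * 100
print(round(mean_x, 1), round(sd, 1), round(percentage, 1))
```

The result is close to the 95.45 % that a normal curve would give within Mean ± 2 S.D.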
