chapter 3 dispersion


Upload: elan-johnson

Post on 12-Jan-2016


DESCRIPTION

statistics and probability


CHAPTER 3:

DATA ANALYSIS - MEASURES OF DISPERSION

3.3 Introduction
3.4 Ungrouped Data
    3.4.1 Range
    3.4.2 Inter-Quartile Range
    3.4.3 Semi Inter-Quartile Range
    3.4.4 Variance and Standard Deviation
3.5 Grouped Data
    3.5.1 Range
    3.5.2 Inter-Quartile Range
    3.5.3 Semi Inter-Quartile Range
    3.5.4 Variance and Standard Deviation
3.6 Relative Dispersion
3.7 Skewness


3.3 INTRODUCTION

Measures of central tendency give a single figure that represents the entire data set. However, averages alone cannot adequately describe a set of observations; it is also necessary to describe the variability, or dispersion, of the observations.

Two sets of data may have the same (or nearly the same) mean but very different spreads. For instance, the sets 6, 7, 8, 9, 6 and 2, 7, 9, 13, 5, 6 have means of 7.2 and 7 respectively, yet the values in the first set all lie close to the mean (between 6 and 9), whereas the values in the second set are spread much further from it (between 2 and 13). This difference in spread is quantified by a measure of dispersion.
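The comparison above can be checked with a short Python sketch. It assumes Python 3's standard `statistics` module and uses the sample standard deviation (`stdev`, divisor n − 1), the same convention as the variance formulas in section 3.4.4; the variable names are illustrative only.

```python
from statistics import mean, stdev

# The two number sets quoted in the introduction (illustrative variable names)
set_1 = [6, 7, 8, 9, 6]
set_2 = [2, 7, 9, 13, 5, 6]

for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    spread = max(data) - min(data)  # range = highest - lowest
    print(f"{name}: mean = {mean(data):.2f}, range = {spread}, "
          f"sample s.d. = {stdev(data):.3f}")
```

This prints means of 7.20 and 7.00, ranges of 3 and 11, and standard deviations of about 1.304 and 3.742: similar centres, very different spreads.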

This chapter covers five measures of dispersion:

1) Range

2) Inter-quartile Range

3) Semi Inter-quartile range

4) Variance

5) Standard deviation

The range, however, is not a good measure of dispersion: it is influenced by extreme values, and its calculation uses only the highest and lowest observations. Variance and standard deviation are the most useful and widely used measures of dispersion because, although they are also influenced by extreme values, their calculation covers all the observations.

The standard deviation is the square root of the variance. The "deviation" in its name refers to the difference between each observed value and the mean. The standard deviation indicates how closely the data values cluster around the mean: generally, the larger the standard deviation of a data set, the greater the spread of the observations around the mean.

3.4 UNGROUPED DATA

3.4.1 RANGE

The range is the difference between the highest and lowest values in the distribution.

Range = Highest value – Lowest value


Example 1:

The share prices of ABS Co. Ltd recorded over one week (six observations) are given below. Calculate the range.

Prices of shares (RM’00):

21 20 28 16 22 25

Range = 28 − 16 = 12 (RM’00), i.e. RM 1,200
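The range calculation can be written as a one-line Python function. The helper name `data_range` is hypothetical, chosen for this sketch; the prices are those of Example 1.

```python
# Share prices of ABS Co. Ltd from Example 1, in RM'00
prices = [21, 20, 28, 16, 22, 25]


def data_range(values):
    """Hypothetical helper: range = highest value - lowest value."""
    return max(values) - min(values)


r = data_range(prices)
print(f"Range = {r} (RM'00) = RM {r * 100:,}")  # Range = 12 (RM'00) = RM 1,200
```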

3.4.2 INTER-QUARTILE RANGE

The inter-quartile range is the difference between the upper and lower quartiles, Q3 − Q1. It covers the middle 50% of the observations.

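For example, arranging the data in Example 1 in ascending order (n = 6):

16 20 21 22 25 28

$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{ observation} = \left(\frac{7}{4}\right)^{\text{th}} = 1.75^{\text{th}} \text{ observation}$$

The 1.75th position lies between the 1st and 2nd observations. Taking the average of these two observations,

$$Q_1 = \frac{16 + 20}{2} = 18$$

$$Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}} \text{ observation} = \left(\frac{21}{4}\right)^{\text{th}} = 5.25^{\text{th}} \text{ observation}$$

The 5.25th position lies between the 5th and 6th observations, so

$$Q_3 = \frac{25 + 28}{2} = 26.5$$

Therefore, inter-quartile range = 26.5 − 18 = 8.5 (RM’00), i.e. RM 850.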
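The quartile positions above can be sketched in Python. The helper `quartile` is hypothetical, written for this note; it follows the (n + 1)k/4 position rule and, by default, the same midpoint convention as the worked example. Because textbooks differ on how to treat a fractional position, it also offers linear interpolation as an assumed alternative, which gives Q1 = 19 and Q3 = 25.75 for these prices.

```python
import math


def quartile(values, k, method="midpoint"):
    """Hypothetical helper returning Q_k (k = 1 or 3) via the (n + 1)k/4 position rule.

    method="midpoint":    average the two neighbouring observations when the
                          position is fractional (the convention in the example).
    method="interpolate": move the fractional part of the way from the lower
                          neighbour towards the upper one (linear interpolation).
    """
    data = sorted(values)
    n = len(data)
    pos = k * (n + 1) / 4  # 1-based position in the ordered data
    if pos < 1 or pos > n:
        raise ValueError("too few observations for the (n + 1)k/4 rule")
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return data[lower - 1]  # the position is a whole number
    a, b = data[lower - 1], data[lower]  # observations at positions lower, lower + 1
    if method == "midpoint":
        return (a + b) / 2
    return a + frac * (b - a)


prices = [21, 20, 28, 16, 22, 25]  # Example 1, RM'00
q1, q3 = quartile(prices, 1), quartile(prices, 3)
print(q1, q3, q3 - q1)  # 18.0 26.5 8.5
print(quartile(prices, 1, "interpolate"), quartile(prices, 3, "interpolate"))  # 19.0 25.75
```

For comparison, Python's `statistics.quantiles(prices, n=4)` (Python 3.8 and later) uses the interpolating (n + 1) rule by default and returns [19.0, 21.5, 25.75], so hand answers and software answers can legitimately differ depending on the convention.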

3.4.3 SEMI INTER-QUARTILE RANGE

The semi inter-quartile range (quartile deviation) is half the inter-quartile range. It is also the average of the distances of the two quartiles from the median M, since [(Q3 − M) + (M − Q1)]/2 = (Q3 − Q1)/2.

$$\text{Interquartile range} = Q_3 - Q_1$$

$$\text{Semi inter-quartile range} = \frac{Q_3 - Q_1}{2}$$

Using the same example, with Q3 = 26.5 and Q1 = 18:

$$\text{Semi inter-quartile range} = \frac{26.5 - 18}{2} = 4.25 \text{ (RM'00), i.e. RM 425}$$
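A small sketch confirming that the two descriptions of the semi inter-quartile range agree: half the inter-quartile range equals the average distance of the quartiles from the median, because the median cancels. It assumes Python 3's `statistics.median` and reuses the quartiles found in the example; the variable names are illustrative.

```python
from statistics import median

prices = [21, 20, 28, 16, 22, 25]  # Example 1, RM'00
q1, q3 = 18, 26.5                  # quartiles found in section 3.4.2
m = median(prices)                 # (21 + 22) / 2 = 21.5

siqr = (q3 - q1) / 2
# Average distance of the two quartiles from the median; the median cancels
avg_distance = ((q3 - m) + (m - q1)) / 2

print(siqr, avg_distance)      # 4.25 4.25
print(f"RM {siqr * 100:.0f}")  # RM 425
```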


3.4.4 VARIANCE AND STANDARD DEVIATION

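For ungrouped (sample) data, the variance is given as

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$

The standard deviation is

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$

or simply $s = \sqrt{\text{variance}}$.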

Example 2:

1. Find the variance and standard deviation for the sample data:

5, 2, 3, 4, 5, 6, 3

Solution:

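$$\sum x = 5 + 2 + 3 + 4 + 5 + 6 + 3 = 28$$

$$\sum x^2 = 5^2 + 2^2 + 3^2 + \cdots + 3^2 = 124, \qquad \frac{(\sum x)^2}{n} = \frac{28^2}{7} = 112$$

Variance,

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{124 - 112}{7 - 1} = \frac{12}{6} = 2$$

Standard deviation,

$$s = \sqrt{\text{variance}} = \sqrt{2} = 1.414$$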
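The two forms of the variance formula can be checked against each other on the Example 2 data with a minimal Python sketch. The function names are hypothetical; `statistics.variance` from the standard library is used only as a cross-check, since it also divides by n − 1.

```python
import math
import statistics


def sample_variance(xs):
    """Hypothetical helper, computational form: (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def sample_variance_by_deviation(xs):
    """Hypothetical helper, definitional form: sum (x - mean)^2 / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


data = [5, 2, 3, 4, 5, 6, 3]  # Example 2, question 1
var = sample_variance(data)
print(var, sample_variance_by_deviation(data))  # 2.0 2.0
print(round(math.sqrt(var), 3))                 # 1.414
# Cross-check against the standard library's sample variance
assert math.isclose(var, statistics.variance(data))
```

The computational form needs only Σx and Σx², which suits hand calculation; on a computer the deviation form is numerically safer when the values are large and close together.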


2. Calculate the standard deviations of the following sets of sample data. Hence, determine which set is more dispersed about its mean.

Data A: 2, 7, 10, 9, 2, 5, 16

Data B: 10,8,14, 20, 40, 32, 1, 4, 8, 36, 12, 32

Solution:

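For data A (n = 7):

$$\sum x = 51, \qquad \sum x^2 = 519, \qquad \frac{(\sum x)^2}{n} = \frac{51^2}{7} = 371.57$$

$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{519 - 371.57}{7 - 1}} = \sqrt{\frac{147.43}{6}} = 4.957$$

For data B (n = 12):

$$\sum x = 217, \qquad \sum x^2 = 5929, \qquad \frac{(\sum x)^2}{n} = \frac{217^2}{12} = 3924.08$$

$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{5929 - 3924.08}{12 - 1}} = \sqrt{\frac{2004.92}{11}} = 13.50$$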

Therefore, data B is more dispersed than data A.
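The same comparison can be reproduced with Python's `statistics.stdev`, which uses the n − 1 divisor like the formulas above. As an added check that goes slightly beyond the example, the sketch also prints the coefficient of variation from section 3.6, because the two sets have quite different means.

```python
from statistics import mean, stdev

data_a = [2, 7, 10, 9, 2, 5, 16]
data_b = [10, 8, 14, 20, 40, 32, 1, 4, 8, 36, 12, 32]

s_a = stdev(data_a)  # sample standard deviation, divisor n - 1
s_b = stdev(data_b)
print(f"s_A = {s_a:.3f}, s_B = {s_b:.2f}")  # s_A = 4.957, s_B = 13.50

more_dispersed = "B" if s_b > s_a else "A"
print(f"Data {more_dispersed} is more dispersed")

# Relative dispersion (section 3.6), since the two means differ
for name, xs, s in (("A", data_a, s_a), ("B", data_b, s_b)):
    print(f"C.V. of {name} = {s / mean(xs) * 100:.1f} %")
```

Both measures point the same way: the coefficients of variation come out at about 68.0 % for A and 74.7 % for B.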


3.5 GROUPED DATA

3.5.1 RANGE

For grouped data, the range is found from the class boundaries:

Range = Upper boundary of last class − Lower boundary of first class

3.5.2 INTER-QUARTILE RANGE

Interquartile range = Q3 − Q1

The values of Q3 and Q1 are obtained either from the cumulative frequency curve (graphical method) or by using a formula.

3.5.3 SEMI INTER-QUARTILE RANGE

The semi inter-quartile range (quartile deviation) is the average of the distances of the quartiles from the median, i.e. half the inter-quartile range:

$$\text{Semi inter-quartile range} = \frac{Q_3 - Q_1}{2}$$

3.5.4 VARIANCE AND STANDARD DEVIATION

For grouped data, the variance is given as

$$s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} \quad \text{or} \quad s^2 = \frac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n - 1}$$

and the standard deviation is

$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} \quad \text{or} \quad s = \sqrt{\frac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n - 1}}$$

where x is the class midpoint, f is the class frequency and n = Σf.


Example 3:

The marks obtained by 50 students in a certain college are shown below.

Marks              No. of students
10 but under 20    3
20 but under 30    7
30 but under 40    10
40 but under 50    20
50 but under 60    7
60 but under 70    3

Find the inter-quartile range, variance and standard deviation by calculation, and use the graphical method to find the semi inter-quartile range.
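The calculation method for Example 3 can be sketched in Python. The interpolation formula for a grouped quantile, Q = L + ((pn − F)/f) × c, is not stated in these notes; it is assumed here as the usual textbook formula, with the class limits 10, 20, ... taken as boundaries because the classes are "but under". The variance uses the n − 1 divisor of section 3.5.4, and both function names are hypothetical.

```python
import math

# Example 3: class limits (lower, upper) and numbers of students.
# The classes are "10 but under 20" etc., so the limits act as class boundaries.
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [3, 7, 10, 20, 7, 3]


def grouped_quantile(classes, freqs, p):
    """Hypothetical helper: p-th quantile by interpolation, L + (p*n - F) / f * c.

    L = lower boundary of the quantile class, F = cumulative frequency before
    that class, f = frequency of that class, c = class width, n = sum of f.
    """
    n = sum(freqs)
    target = p * n
    cum = 0
    for (lower, upper), f in zip(classes, freqs):
        if cum + f >= target:
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("p must lie between 0 and 1")


def grouped_mean_variance(classes, freqs):
    """Hypothetical helper: mean and sample variance from class midpoints,
    using (sum f x^2 - (sum f x)^2 / n) / (n - 1)."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, mids))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))
    return sum_fx / n, (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


q1 = grouped_quantile(classes, freqs, 0.25)
q3 = grouped_quantile(classes, freqs, 0.75)
xbar, var = grouped_mean_variance(classes, freqs)
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}, SIQR = {(q3 - q1) / 2}")
print(f"mean = {xbar}, variance = {var:.2f}, s.d. = {math.sqrt(var):.2f}")
```

Under these assumptions the sketch gives Q1 = 32.5, Q3 = 48.75 (inter-quartile range 16.25, semi inter-quartile range 8.125), mean 41, variance about 155.10 and standard deviation about 12.45. Values read from a hand-drawn ogive will only approximate the quartiles.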

3.6 RELATIVE DISPERSION

The standard deviation on its own is an absolute measure, expressed in the units of the data, so it cannot be used directly to compare the dispersion of data sets with different means or units. To compare the dispersion between different sets of data, we need a measure of relative dispersion, which expresses the standard deviation as a percentage of the mean. For this we use the coefficient of variation.

$$\text{Coefficient of variation} = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$$

$$\text{C.V.} = \frac{s}{\bar{x}} \times 100\%$$

Example 4:

An analysis of the monthly wages paid to workers in two firms, A and B, belonging to the same industry gives the following results.

                                                   Firm A    Firm B
Average monthly wages                              RM 105    RM 95
Standard deviation of the distribution of wages    RM 20     RM 22

Calculate the coefficient of variation for each firm and comment on your findings.

Larger Percentage → Greater Variation

Larger Variation → Less Consistency

Smaller Variation → More Consistency


Solution:

$$\text{Firm A: C.V.} = \frac{20}{105} \times 100\% = 19.05\%$$

$$\text{Firm B: C.V.} = \frac{22}{95} \times 100\% = 23.16\%$$

The coefficient of variation for firm B is higher than that for firm A. Therefore, wages in firm B are less consistent than those in firm A.
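The coefficient of variation for Example 4 can be computed with a few lines of Python. The helper name `coefficient_of_variation` is hypothetical; the figures are those in the table above.

```python
def coefficient_of_variation(sd, mean):
    """Hypothetical helper: C.V. = (standard deviation / mean) x 100, in percent."""
    return sd / mean * 100


# Example 4: (average monthly wage, standard deviation) in RM
firms = {"A": (105, 20), "B": (95, 22)}

cvs = {name: coefficient_of_variation(sd, avg) for name, (avg, sd) in firms.items()}
for name, cv in cvs.items():
    print(f"Firm {name}: C.V. = {cv:.2f} %")  # 19.05 % and 23.16 %

# Larger C.V. -> greater relative variation -> less consistency
less_consistent = max(cvs, key=cvs.get)
print(f"Wages in firm {less_consistent} are less consistent")
```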

3.7 SKEWNESS

Although frequency curves can take any shape, certain shapes are encountered often. The most common is the bell-shaped distribution.

i) Symmetrical

[Figure: symmetrical bell-shaped curve, with mean = median = mode at the centre]

In a perfectly symmetrical distribution, such as the normal curve, the mean, median and mode all have the same value.

ii) Asymmetrical

Skewness describes the asymmetry of a frequency curve. Skewness can be either positive or negative.

• Positive skewness

When a frequency distribution has a tail stretching out to the right, it is said to be positively skewed.

[Figure: positively skewed curve, with mode, median and mean in that order from left to right]


• Negative skewness

When a frequency distribution has a tail stretching out to the left, it is said to be negatively skewed.

[Figure: negatively skewed curve, with mean, median and mode in that order from left to right]

iii) Measuring skewness by calculation

A number of measures have been suggested for the amount of skewness in a set of data. The most common is the Pearson coefficient of skewness, which is given by either

a)

$$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$

or

b)

$$\text{Skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$

A positive value indicates positive skewness, a negative value indicates negative skewness, and a value of zero indicates a symmetrical distribution.

EXERCISE

1. The table below gives an analysis of the debtors’ balances of Fuel Suppliers Bhd.

a) Calculate the median and quartile deviation.

b) Estimate the mean and standard deviation.

c) Calculate the coefficient of variation.

d) Calculate the measure of skewness of these data.

Balance outstanding (RM’00)    No. of accounts
20 but under 39.9              1
40 but under 59.9              3
60 but under 79.9              6
80 but under 99.9              10
100 but under 119.9            5
120 but under 139.9            3
140 but under 159.9            2
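Pearson's two coefficients of skewness from section 3.7 can be sketched in Python for ungrouped data. The helper `pearson_skewness` is hypothetical, and data A from Example 2 is used purely as an illustration; for grouped data, as in Exercise 1, the mean, median, mode and standard deviation would first have to be estimated from the frequency table.

```python
from statistics import mean, median, mode, stdev


def pearson_skewness(data):
    """Hypothetical helper returning Pearson's two coefficients of skewness
    for ungrouped data: (mean - mode) / s and 3(mean - median) / s."""
    m, md, s = mean(data), median(data), stdev(data)
    sk_mode = (m - mode(data)) / s  # formula (a)
    sk_median = 3 * (m - md) / s    # formula (b)
    return sk_mode, sk_median


data_a = [2, 7, 10, 9, 2, 5, 16]  # data A from Example 2, used only as an illustration
sk1, sk2 = pearson_skewness(data_a)
print(f"(a) {sk1:.3f}  (b) {sk2:.3f}")  # (a) 1.066  (b) 0.173
```

Both coefficients are positive, indicating positive skewness (the value 16 forms a tail to the right), although the two formulas need not agree in size. If the data had several modes, `statistics.mode` (Python 3.8 and later) would return only the first one met, so formula (b) is usually the safer choice.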


2. The sales orders (RM) achieved by 10 companies in Shah Alam are given below.

Area          A     B     C     D     E     F     G     H     I     J
Sales (RM)    150   130   140   150   140   300   110   120   140   120

For these sales data, calculate:

i) the arithmetic mean

ii) the mode

iii) the median

iv) the mean deviation

v) the standard deviation

3. A long-term investor is considering selling one of three stocks, X, Y or Z. He decides to hold on to the more consistent stocks. Using the information in the following table, answer the questions below.

Stock    Average price    Standard deviation
X        RM 76            RM 12
Y        RM 45            RM 6
Z        RM 50            s

i) If stocks Y and Z have the same relative dispersion, what is the standard deviation of stock Z?

ii) Which stock should he sell?

4. The following frequency polygon shows the profit (RM’00) of a random sample of 48 shops in 2006.

i) Construct a frequency table using the polygon.

ii) From the frequency table, calculate the median, mean and standard deviation.

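[Figure: frequency polygon of No. of shops (vertical axis, 0 to 16) against Profit (RM'000) (horizontal axis, marked at 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5 and 11.5); the plotted frequencies are labelled 4, 9, 14, 11, 7 and 3]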


5. The times taken (in minutes) to complete orders in 10 randomly selected telephone calls handled by telephone company A were:

2.82 3.71 2.45 5.42 6.24 4.24 3.70 4.75 3.25 4.02

i) Calculate the range of the data.

ii) Estimate the mean and standard deviation of the time taken.

iii) Calculate the coefficient of variation.

iv) The mean and standard deviation of the order times for company B were 3.77 and 2.12 minutes respectively. Calculate the coefficient of variation for these data and compare your answer with that obtained in part (iii).

6. The following data show the CGPA results of 80 students in class A.

CGPA         Number of students
2.5 - 2.6    4
2.7 - 2.8    13
2.9 - 3.0    24
3.1 - 3.2    10
3.3 - 3.4    18
3.5 - 3.6    11

a) Draw a histogram to represent these data.

b) Using the histogram, state the modal class.

c) Calculate the mean, median and standard deviation of the results.

d) The mean and standard deviation of the results for students in class B are 3.12 and 0.366 respectively. Using an appropriate measure, compare the consistency of performance of the two classes.

7. FLY company employed 50 salespersons to sell its products. The following information is gathered from its travelling claims records.

Distance travelled per month (km)    No. of salespersons
300 and less than 400                3
400 and less than 500                8
500 and less than 600                18
600 and less than 700                11
700 and less than 800                10

(i) Draw an ogive for the data.

(ii) Using the ogive in (i), estimate

a) the median, lower quartile and upper quartile;

b) the percentage of salespersons who travelled less than 550 km in a month;

c) the minimum monthly distance travelled by the 30% most active travelling salespersons.

(iii) Find the mean and standard deviation for the above data.

(iv) Determine the shape (skewness) of the distribution using an appropriate calculation.


8. A research physician wants to estimate the average age of people with

diabetes. She takes a random sample of 27 diabetics and obtains the

following ages.

54 48 61 38 23 79 70 82 56

38 7 79 83 57 41 10 75 60

68 76 61 77 65 55 21 53 83

(i) Construct a frequency table, showing all the steps required.

(ii) Find the mean and median of the data.

(iii) Given that the standard deviation of the ages is 22.068,

a) determine the shape of the distribution; and

b) calculate the coefficient of variation for the above data.