agec 405 lecture iii


Analysing Data


Descriptive Statistics


Descriptive statistics

Descriptive statistics provides an objective way of describing and summarising data.


Data description


Two key measures of data description

• Location – to show where the centre of the data is, giving some kind of typical or average value

• Dispersion (spread) – to show how spread out the data is around this centre, giving an idea of the range of values.


Measures of location

• Three basic measures of location are used:
– Arithmetic mean: the average value
– Median: the middle value
– Mode: the most frequent value

• Three data structures:
– Untabulated (raw data)
– Tabulated (ungrouped)
– Tabulated (grouped)

For use with Curwin & Slater, Quantitative Methods for Business Decisions, 6th Edition ISBN: 9781844805747

Mean - Untabulated (raw data)

The mean for untabulated data is obtained by dividing the sum of all values by the number of values in the data set. Thus,

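Mean for population data:

$\mu = \dfrac{\sum x}{N}$

Mean for sample data:

$\bar{x} = \dfrac{\sum x}{n}$

where N is the number of values in the population and n is the number of values in the sample.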

Example 1

The following are the ages of all eight employees of a small company:

53 32 61 27 39 44 49 57

Find the mean age of these employees.


Solution 1

$\mu = \dfrac{\sum x}{N} = \dfrac{362}{8} = 45.25$ years

Thus, the mean age of all eight employees of this company is 45.25 years, or 45 years and 3 months.
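The calculation above can be checked with a short Python sketch. The helper name `raw_mean` is illustrative and does not come from the lecture. Python's built-in `statistics.mean` would give the same answer.

```python
def raw_mean(values):
    """Mean of untabulated (raw) data: the sum of all values divided by their count."""
    if not values:
        raise ValueError("the mean needs at least one value")
    return sum(values) / len(values)


# Ages of the eight employees in Example 1
ages = [53, 32, 61, 27, 39, 44, 49, 57]
print(sum(ages), len(ages), raw_mean(ages))  # 362 8 45.25
```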


Mean - tabulated (ungrouped data)

Sample mean of tabulated data:

$\bar{x} = \dfrac{\sum fx}{n}$

Here x is the value of an observation, f is its frequency, and $n = \sum f$ is the total number of observations.

Example

The number of working days lost by employees in the last quarter is shown below. Calculate the average number of working days lost.

Number of days (x)   Number of employees (f)
0                    410
1                    430
2                    290
3                    180
4                    110
5                    20
Total                1440

x       f       fx
0       410     0
1       430     430
2       290     580
3       180     540
4       110     440
5       20      100
Total   1440    2090

$\bar{x} = \dfrac{\sum fx}{n} = \dfrac{2090}{1440} = 1.451$ days lost
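The frequency-table mean, Σfx / n, can be sketched in Python as follows. The function name `tabulated_mean` is our own. It uses the working-days data from the table above.

```python
def tabulated_mean(values, freqs):
    """Mean of an ungrouped frequency table: sum(f * x) / sum(f)."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    n = sum(freqs)                                      # n = sum of f
    sum_fx = sum(f * x for x, f in zip(values, freqs))  # sum of f * x
    return sum_fx / n


days_lost = [0, 1, 2, 3, 4, 5]             # x
employees = [410, 430, 290, 180, 110, 20]  # f
print(round(tabulated_mean(days_lost, employees), 3))  # 1.451
```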

Mean

• The mean can be strongly affected by outliers.


Outliers

Definition: Values that are very small or very large relative to the majority of the values in a data set are called outliers or extreme values.

Example 3

Table 2 lists the 2000 populations (in thousands) of the five Pacific states.

State          Population (thousands)
Washington     5894
Oregon         3421
Alaska         627
Hawaii         1212
California     33,872   (an outlier)

Table 2

Notice that the population of California is very large compared to the populations of the other four states. Hence, it is an outlier. Show how the inclusion of this outlier affects the value of the mean.

Solution 3

If we do not include the population of California (the outlier), the mean population of the remaining four states (Washington, Oregon, Alaska, and Hawaii) is

$\text{Mean} = \dfrac{5894 + 3421 + 627 + 1212}{4} = \dfrac{11{,}154}{4} = 2788.5$ thousand

Now, to see the impact of the outlier on the value of the mean, we include the population of California and find the mean population of all five Pacific states. This mean is

$\text{Mean} = \dfrac{5894 + 3421 + 627 + 1212 + 33{,}872}{5} = \dfrac{45{,}026}{5} = 9005.2$ thousand

Including the single outlier more than triples the mean, which rises from 2788.5 thousand to 9005.2 thousand.
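The comparison in Solution 3 can be reproduced with a minimal Python sketch using the Table 2 figures. The dictionary and helper names are illustrative only.

```python
# Table 2 populations (thousands)
populations = {
    "Washington": 5894,
    "Oregon": 3421,
    "Alaska": 627,
    "Hawaii": 1212,
    "California": 33872,  # the outlier
}


def mean(values):
    """Arithmetic mean of raw data."""
    return sum(values) / len(values)


without_outlier = mean([p for state, p in populations.items() if state != "California"])
with_outlier = mean(list(populations.values()))
print(without_outlier, with_outlier)  # 2788.5 9005.2
```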

Mean - tabulated (grouped data)

Mean for population data:

$\mu = \dfrac{\sum fx}{N}$

Mean for sample data:

$\bar{x} = \dfrac{\sum fx}{n}$

where x is the midpoint and f is the frequency of a class.

Calculate the mean of the grouped data below

Weight (oz)   Class midpoint (x)   Frequency (f)   fx
19.2-19.4     19.3                 1               19.3
19.5-19.7     19.6                 2               39.2
19.8-20.0     19.9                 8               159.2
20.1-20.3     20.2                 4               80.8
20.4-20.6     20.5                 3               61.5
20.7-20.9     20.8                 2               41.6
Total                              Σf = n = 20     Σfx = 401.6

Mean

• n = 20
• Σfx = 401.6

$\bar{x} = \dfrac{\sum fx}{n} = \dfrac{401.6}{20} = 20.08$ oz
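Here is a minimal Python sketch of the grouped-data mean. It represents each class by its midpoint, which is why the grouped mean is only an approximation of the mean of the original values. The class limits and frequencies come from the weight table above. The function name is illustrative.

```python
def grouped_mean(classes, freqs):
    """Approximate mean of grouped data: each class is represented by its midpoint."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]  # x
    n = sum(freqs)                                                # n = sum of f
    return sum(f * x for x, f in zip(midpoints, freqs)) / n       # sum(f * x) / n


weight_classes = [(19.2, 19.4), (19.5, 19.7), (19.8, 20.0),
                  (20.1, 20.3), (20.4, 20.6), (20.7, 20.9)]
frequencies = [1, 2, 8, 4, 3, 2]
print(round(grouped_mean(weight_classes, frequencies), 2))  # 20.08
```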

Median

Definition: The median is the value of the middle term in a data set that has been ranked in increasing order.

Median cont.

The calculation of the median consists of the following two steps:

1. Rank the data set in increasing order

2. Find the middle term in a data set with n values. The value of this term is the median.


Median cont.

Value of Median for Ungrouped Data

$\text{Median} = \text{value of the } \left(\dfrac{n+1}{2}\right)\text{th term in a ranked data set}$

Example 6

The following data give the weight lost (in pounds) by a sample of five members of a health club at the end of two months of membership:

10 5 19 8 3

Find the median.


Solution 6

First, we rank the given data in increasing order as follows:

3 5 8 10 19

There are five observations in the data set. Consequently, n = 5 and

$\text{Position of the middle term} = \dfrac{n+1}{2} = \dfrac{5+1}{2} = 3$

Solution 6

Therefore, the median is the value of the third term in the ranked data.

3 5 8 10 19

The median weight loss for this sample of five members of this health club is 8 pounds.


Example 7

Table 8 lists the total revenue for the 12 top-grossing North American concert tours of all time.

Find the median revenue for these data.


Table 8

Tour                        Artist                   Total Revenue (millions of dollars)
Steel Wheels, 1989          The Rolling Stones       98.0
Magic Summer, 1990          New Kids on the Block    74.1
Voodoo Lounge, 1994         The Rolling Stones       121.2
The Division Bell, 1994     Pink Floyd               103.5
Hell Freezes Over, 1994     The Eagles               79.4
Bridges to Babylon, 1997    The Rolling Stones       89.3
Popmart, 1997               U2                       79.9
Twenty-Four Seven, 2000     Tina Turner              80.2
No Strings Attached, 2000   'N-Sync                  76.4
Elevation, 2001             U2                       109.7
Popodyssey, 2001            'N-Sync                  86.8
Black and Blue, 2001        The Backstreet Boys      82.1

Solution 7

First we rank the given data in increasing order, as follows:

74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

There are 12 values in this data set. Hence, n = 12 and

$\text{Position of the middle term} = \dfrac{n+1}{2} = \dfrac{12+1}{2} = 6.5$

Solution 7

Therefore, the median is given by the mean of the sixth and the seventh values in the ranked data.

74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

$\text{Median} = \dfrac{82.1 + 86.8}{2} = \dfrac{168.9}{2} = 84.45 \text{ million dollars}$

Thus, the median revenue for the 12 top-grossing North American concert tours of all time is $84.45 million.
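The two-step procedure (rank, then take the ((n + 1)/2)th term) can be sketched in Python and checked against Examples 6 and 7. When the position falls halfway between two terms, the sketch averages them, as Solution 7 does. The function name `median` is our own. Python's `statistics.median` implements the same rule.

```python
import math


def median(values):
    """Median of raw data via the ((n + 1) / 2)th term of the ranked data."""
    ranked = sorted(values)            # step 1: rank in increasing order
    n = len(ranked)
    if n == 0:
        raise ValueError("the median needs at least one value")
    position = (n + 1) / 2             # step 2: 1-based position of the middle term
    lower = ranked[math.floor(position) - 1]
    upper = ranked[math.ceil(position) - 1]
    return (lower + upper) / 2         # lower == upper when n is odd


weight_lost = [10, 5, 19, 8, 3]                       # Example 6
revenues = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,     # Example 7 (Table 8)
            79.9, 80.2, 76.4, 109.7, 86.8, 82.1]
print(median(weight_lost))            # 8.0
print(round(median(revenues), 2))     # 84.45
```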

Median - tabulated (ungrouped data)

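Steps:

• Order the observations.

• Calculate the cumulative frequency.

• Find the median position, (n + 1)/2, and read off the first value whose cumulative frequency reaches that position.

Note:

• Cumulative frequency is the number of items with a given value or less.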

Example

The number of working days lost by employees in the last quarter is tabulated below. Calculate the median number of working days lost.

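x       f       Cumulative frequency
0       410     410
1       430     840 = 410 + 430
2       290     1130 = 840 + 290
3       180     1310 = 1130 + 180
4       110     1420 = 1310 + 110
5       20      1440 = 1420 + 20
Total   1440

The position of the median is (n + 1)/2 = (1440 + 1)/2 = 720.5, i.e. between the 720th and 721st observations. The cumulative frequency first reaches 720.5 at x = 1 (cumulative frequency 840), so both observations equal 1 and the median is one day lost.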
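Here is a minimal Python sketch of the cumulative-frequency method. It assumes the table's values are already listed in increasing order, as they are here. The function name `tabulated_median` is illustrative. `itertools.accumulate` builds the cumulative frequency column.

```python
import math
from itertools import accumulate


def tabulated_median(values, freqs):
    """Median of an ungrouped frequency table whose values are in increasing order."""
    cumulative = list(accumulate(freqs))  # cumulative frequency column
    n = cumulative[-1]
    position = (n + 1) / 2

    def kth_value(k):
        # The k-th observation is the first value whose cumulative frequency reaches k.
        for x, cf in zip(values, cumulative):
            if cf >= k:
                return x
        raise IndexError(k)

    lower = kth_value(math.floor(position))
    upper = kth_value(math.ceil(position))
    return (lower + upper) / 2


days_lost = [0, 1, 2, 3, 4, 5]
employees = [410, 430, 290, 180, 110, 20]
print(list(accumulate(employees)))             # [410, 840, 1130, 1310, 1420, 1440]
print(tabulated_median(days_lost, employees))  # 1.0
```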

Advantages of using median

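The advantage of using the median as a measure of central tendency is that it is resistant to outliers. It depends only on the middle of the ranked data, so making an extreme value even more extreme leaves the median unchanged. Consequently, the median is preferred over the mean as a measure of central tendency for data sets that contain outliers.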
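To illustrate this resistance, the sketch below reuses the five Table 2 populations and a hypothetical variant in which California is ten times larger. The variant is made-up data for illustration only. The median stays the same while the mean grows. It uses Python's standard `statistics` module.

```python
from statistics import mean, median

# Table 2 populations (thousands), with California as the largest value
populations = [5894, 3421, 627, 1212, 33872]
# Hypothetical variant: California made ten times larger (not real data)
more_extreme = [5894, 3421, 627, 1212, 338720]

print(median(populations), median(more_extreme))  # 3421 3421
print(mean(populations) < mean(more_extreme))      # True
```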


Median for grouped data

• The median for grouped data is given by:

$\text{Median} = L + \left(\dfrac{\frac{n}{2} - F}{f_m}\right) c$

• L ≡ lower limit of the median class (the first class whose cumulative frequency reaches n/2)
• n ≡ number of observations
• F ≡ sum of the frequencies of all classes before the median class
• fm ≡ frequency of the median class
• c ≡ width of the class

Calculate the median of the grouped data below

Weight (oz)   Frequency (f)
19.2-19.4     1
19.5-19.7     2
19.8-20.0     8
20.1-20.3     4
20.4-20.6     3
20.7-20.9     2
Total         Σf = n = 20

Median

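• L = 19.8, n = 20, F = 3, fm = 8, c = 0.3

$\text{Median} = L + \left(\dfrac{\frac{n}{2} - F}{f_m}\right) c = 19.8 + \left(\dfrac{\frac{20}{2} - 3}{8}\right) 0.3 = 19.8 + \dfrac{7}{8}(0.3) = 19.8 + 0.2625 \approx 20.06 \text{ oz}$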
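Here is a minimal Python sketch of the grouped-median formula. It finds the median class as the first class whose cumulative frequency reaches n/2. Following the lecture, L is the stated lower limit (19.8 here). Some texts use the lower class boundary instead (19.75 here), which would give a slightly smaller result. The function and argument names are illustrative.

```python
def grouped_median(lower_limits, freqs, width):
    """Grouped-data median: L + ((n/2 - F) / fm) * c."""
    n = sum(freqs)
    half = n / 2
    F = 0  # frequencies of the classes before the current class
    for L, fm in zip(lower_limits, freqs):
        if F + fm >= half:  # first class whose cumulative frequency reaches n/2
            return L + (half - F) / fm * width
        F += fm
    raise ValueError("frequency table is empty")


lower_limits = [19.2, 19.5, 19.8, 20.1, 20.4, 20.7]  # lower limits as used in the lecture
frequencies = [1, 2, 8, 4, 3, 2]
print(round(grouped_median(lower_limits, frequencies, 0.3), 2))  # 20.06
```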

Mode

Definition

The mode is the value that occurs with the highest frequency in a data set.


Example 8

The following data give the speeds (in miles per hour) of eight cars that were stopped for speeding violations.

77 69 74 81 71 68 74 73

Find the mode.


Solution 8

In this data set, 74 occurs twice and each of the remaining values occurs only once. Because 74 occurs with the highest frequency, it is the mode. Therefore,

Mode = 74 miles per hour


Mode cont.

• A data set may have no mode or several modes, whereas it will have only one mean and only one median.
– A data set with only one mode is called unimodal.
– A data set with two modes is called bimodal.
– A data set with more than two modes is called multimodal.
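The definitions above can be sketched in Python by counting frequencies with `collections.Counter` and keeping every value tied for the highest count. The sketch assumes the common convention that a data set in which no value repeats has no mode. The function names and the bimodal sample are hypothetical.

```python
from collections import Counter


def modes(data):
    """All values tied for the highest frequency; [] if no value repeats (assumed convention)."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: treated here as "no mode"
    return [value for value, count in counts.items() if count == top]


def modality(data):
    """Classify a data set by its number of modes."""
    k = len(modes(data))
    return {0: "no mode", 1: "unimodal", 2: "bimodal"}.get(k, "multimodal")


speeds = [77, 69, 74, 81, 71, 68, 74, 73]  # Example 8
print(modes(speeds), modality(speeds))      # [74] unimodal
print(modality([1, 1, 2, 2, 3]))            # bimodal (hypothetical data)
```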

Different patterns for the mode


Mode - tabulated (ungrouped data)

The number of working days lost by employees in the last quarter is shown below. Calculate the modal number of working days lost.

Number of days (x)   Number of employees (f)
0                    410
1                    430
2                    290
3                    180
4                    110
5                    20
Total                1440

The mode corresponds to the value with the highest frequency. The highest frequency is 430, at x = 1, so the mode is one day lost.

Advantage of using the mode

One advantage of the mode is that it can be calculated for both quantitative and qualitative kinds of data, whereas the mean and median can be calculated for only quantitative data.


Example 12

The statuses of five students who are members of the student senate at a college are senior, sophomore, senior, junior, and senior. Find the mode.


Solution 12

Because senior occurs more frequently than the other categories, it is the mode for this data set.

We cannot calculate the mean and median for this data set.
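Because the mode only counts occurrences, it works directly on qualitative labels. Here is a minimal sketch using Python's `statistics.mode` and `collections.Counter` on the Example 12 data.

```python
from collections import Counter
from statistics import mode

# Example 12: class standing of five student-senate members
statuses = ["senior", "sophomore", "senior", "junior", "senior"]
print(Counter(statuses).most_common(1))  # [('senior', 3)]
print(mode(statuses))                    # senior
```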


Mode for tabulated grouped data

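• For grouped data, the mode is given by:

$\text{Mode} = L + \left(\dfrac{d_1}{d_1 + d_2}\right) c$

• L ≡ lower limit of the modal class (the class with the highest frequency)
• d1 ≡ frequency of the modal class minus the frequency of the preceding class
• d2 ≡ frequency of the modal class minus the frequency of the following class
• c ≡ width of the class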

Calculate the mode of the grouped data below

Weight (oz)   Frequency (f)
19.2-19.4     1
19.5-19.7     2
19.8-20.0     8
20.1-20.3     4
20.4-20.6     3
20.7-20.9     2
Total         Σf = n = 20

Mode

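• L = 19.8, d1 = 8 - 2 = 6, d2 = 8 - 4 = 4, c = 0.3

$\text{Mode} = L + \left(\dfrac{d_1}{d_1 + d_2}\right) c = 19.8 + \left(\dfrac{6}{6 + 4}\right) 0.3 = 19.8 + \dfrac{1.8}{10} = 19.8 + 0.18 = 19.98 \text{ oz}$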
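Here is a minimal Python sketch of the grouped-mode formula applied to the weight table. It assumes that a modal class at either end of the table has a neighbouring frequency of 0, and it raises an error when d1 + d2 = 0, where the formula is undefined. As in the lecture, L is the stated lower limit. The function name is illustrative.

```python
def grouped_mode(lower_limits, freqs, width):
    """Grouped-data mode: L + (d1 / (d1 + d2)) * c."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])  # modal class (first if tied)
    f_prev = freqs[i - 1] if i > 0 else 0               # assumption: missing neighbour = 0
    f_next = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1 = freqs[i] - f_prev
    d2 = freqs[i] - f_next
    if d1 + d2 == 0:
        raise ValueError("formula undefined: modal class ties with both neighbours")
    return lower_limits[i] + d1 / (d1 + d2) * width


lower_limits = [19.2, 19.5, 19.8, 20.1, 20.4, 20.7]
frequencies = [1, 2, 8, 4, 3, 2]
print(round(grouped_mode(lower_limits, frequencies, 0.3), 2))  # 19.98
```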

Relationships among the Mean, Median, and Mode

The relationship among the mean, median, and mode depends on the shape of the frequency distribution, that is, on its skewness.

1. In Figure 1, the distribution is symmetrical: the values of the mean, median, and mode are identical, and they lie at the center of the distribution.

Figure 1 Symmetrical (zero skewness)

Figure 2 Positively skewed


Positively Skewed

2. A histogram or frequency curve is positively skewed if its right tail is longer (Figure 2). In this case, typically

mean > median > mode

Notice that the mode always occurs at the peak point. The mean is the largest of the three because it is sensitive to outliers in the right tail, which pull the mean to the right.

Figure 3 Negatively skewed


Negatively Skewed

3. A histogram or frequency curve is negatively skewed if its left tail is longer (Figure 3). In this case, typically

mode > median > mean

The outliers in the left tail pull the mean to the left.
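To see these orderings on concrete numbers, the sketch below builds a small hypothetical data set with a long right tail and its mirror image (16 - x) with a long left tail. The data are made up for illustration. The orderings are typical patterns rather than guarantees for every skewed distribution. It uses Python's `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical data with a long right tail (not from the lecture)
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image (16 - x) of the same data, giving a long left tail
left_skewed = [16 - x for x in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(name, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```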
