agec 405 lecture iii

Uploaded by: antonioanthony-liverpool

Posted on 25-May-2015

TRANSCRIPT

Page 1: Agec 405 lecture iii

Slide 1.1

Analysing Data

Slide 1.2

Descriptive Statistics

Slide 1.3

Descriptive statistics

Descriptive statistics provides an objective way of describing and summarising data

Slide 1.4

Data description

Slide 1.5

Two key measures of data description

• Location – to show where the centre of the data is, giving some kind of typical or average value

• Dispersion (spread) – to show how spread out the data is around this centre, giving an idea of the range of values.

Slide 1.6

Measures of location

• Three basic measures of location are used:
– Arithmetic mean: the average value
– Median: the middle value
– Mode: the most frequent value

• Three data structures:
– Untabulated (raw data)
– Tabulated (ungrouped)
– Tabulated (grouped)

For use with Curwin & Slater, Quantitative Methods for Business Decisions, 6th Edition, ISBN: 9781844805747

Slide 1.7

Mean - Untabulated (raw data)

The mean for untabulated data is obtained by dividing the sum of all values by the number of values in the data set. Thus,

Mean for population data:

$$\mu = \frac{\sum x}{N}$$

Mean for sample data:

$$\bar{x} = \frac{\sum x}{n}$$

Slide 1.8

Example 1

The following are the ages of all eight employees of a small company:

53 32 61 27 39 44 49 57

Find the mean age of these employees.

Slide 1.9

Solution 1

$$\mu = \frac{\sum x}{N} = \frac{362}{8} = 45.25 \text{ years}$$

Thus, the mean age of all eight employees of this company is 45.25 years, or 45 years and 3 months.
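A minimal Python sketch of the same calculation, assuming the ages are held in a plain list (the variable names are illustrative only). The standard library's `statistics.fmean` is used as a cross-check of the hand-rolled "sum divided by count".

```python
from statistics import fmean

# Ages of all eight employees (Example 1); treated as a population, so N = len(ages)
ages = [53, 32, 61, 27, 39, 44, 49, 57]

total = sum(ages)   # sum of x
N = len(ages)       # number of values
mu = total / N      # population mean

print(total, N, mu)        # 362 8 45.25
assert mu == fmean(ages)   # the standard-library helper agrees
```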

Slide 1.10

Mean - tabulated (ungrouped data)

Sample mean of data:

$$\bar{x} = \frac{\sum fx}{n}$$

where x is the value of the observation, f is the frequency of the observation, and n = Σf is the total number of observations.

Slide 1.11

Example

The number of working days lost by employees in the last quarter is shown below. Calculate the average number of working days lost.

Number of days (x) | Number of employees (f)
0 | 410
1 | 430
2 | 290
3 | 180
4 | 110
5 | 20
Total | 1440

Slide 1.12

x | f | fx
0 | 410 | 0
1 | 430 | 430
2 | 290 | 580
3 | 180 | 540
4 | 110 | 440
5 | 20 | 100
Total | 1440 | 2090

$$\bar{x} = \frac{\sum fx}{n} = \frac{2090}{1440} = 1.451 \text{ days lost}$$
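The Σfx / n calculation can be sketched in Python as below, assuming the frequency table is stored as a dictionary mapping each value x to its frequency f. `tabulated_mean` is a hypothetical helper name, not a library function.

```python
# Frequency table from the working-days-lost example: x -> f
days_lost = {0: 410, 1: 430, 2: 290, 3: 180, 4: 110, 5: 20}

def tabulated_mean(table):
    """Mean of ungrouped tabulated data: sum(f * x) / sum(f)."""
    n = sum(table.values())                        # n = sum of f
    sum_fx = sum(x * f for x, f in table.items())  # sum of f * x
    return sum_fx / n, sum_fx, n

mean, sum_fx, n = tabulated_mean(days_lost)
print(sum_fx, n, round(mean, 3))  # 2090 1440 1.451
```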

Slide 1.13

Mean

• The mean can be strongly affected by outliers

Slide 1.14

Outliers

Definition: Values that are very small or very large relative to the majority of the values in a data set are called outliers or extreme values.

Slide 1.15

Example 3

Table 2 lists the 2000 populations (in thousands) of the five Pacific states.

State | Population (thousands)
Washington | 5,894
Oregon | 3,421
Alaska | 627
Hawaii | 1,212
California | 33,872 (an outlier)

Table 2

Slide 1.16

Solution 3

Now, to see the impact of the outlier on the value of the mean, we include the population of California and find the mean population of all five Pacific states. This mean is

$$\text{Mean} = \frac{5894 + 3421 + 627 + 1212 + 33{,}872}{5} = 9005.2 \text{ thousand}$$

Slide 1.17

Example 3

Notice that the population of California is very large compared to the populations of the other four states. Hence, it is an outlier. Show how the inclusion of this outlier affects the value of the mean.

Slide 1.18

Solution 3

If we do not include the population of California (the outlier) the mean population of the remaining four states (Washington, Oregon, Alaska, and Hawaii) is

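$$\text{Mean} = \frac{5894 + 3421 + 627 + 1212}{4} = 2788.5 \text{ thousand}$$

Including the single outlier (California) raises the mean from 2788.5 thousand to 9005.2 thousand.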
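The effect of the outlier can be checked with a short Python sketch, assuming the Table 2 figures (in thousands) are stored in a dictionary keyed by state. The `average` helper is a hypothetical illustration, not part of the lecture.

```python
# Table 2: 2000 populations of the five Pacific states, in thousands
populations = {
    "Washington": 5894,
    "Oregon": 3421,
    "Alaska": 627,
    "Hawaii": 1212,
    "California": 33872,
}

def average(values):
    """Arithmetic mean of an iterable of numbers (hypothetical helper)."""
    values = list(values)
    return sum(values) / len(values)

with_ca = average(populations.values())
without_ca = average(v for state, v in populations.items() if state != "California")

print(round(with_ca, 1), round(without_ca, 1))  # 9005.2 2788.5
```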

Slide 1.19

Mean - tabulated (grouped data)

Mean for population data:

$$\mu = \frac{\sum fx}{N}$$

Mean for sample data:

$$\bar{x} = \frac{\sum fx}{n}$$

where x is the midpoint and f is the frequency of a class.

Slide 1.20

Calculate the mean of the grouped data below.

Weight (oz) | Class midpoint (x) | Frequency (f) | fx
19.2-19.4 | 19.3 | 1 | 19.3
19.5-19.7 | 19.6 | 2 | 39.2
19.8-20.0 | 19.9 | 8 | 159.2
20.1-20.3 | 20.2 | 4 | 80.8
20.4-20.6 | 20.5 | 3 | 61.5
20.7-20.9 | 20.8 | 2 | 41.6
Total | | Σf = n = 20 | Σfx = 401.6

Slide 1.21

Mean

• n = 20
• Σfx = 401.6

$$\bar{x} = \frac{\sum fx}{n} = \frac{401.6}{20} = 20.08 \text{ oz}$$
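A minimal Python sketch of the grouped mean, assuming each class is stored as a (lower limit, upper limit, frequency) tuple and that the midpoint (lower + upper) / 2 stands in for x, as on the slide. `grouped_mean` is a hypothetical helper name.

```python
# Grouped weights (oz): (lower limit, upper limit, frequency)
classes = [
    (19.2, 19.4, 1),
    (19.5, 19.7, 2),
    (19.8, 20.0, 8),
    (20.1, 20.3, 4),
    (20.4, 20.6, 3),
    (20.7, 20.9, 2),
]

def grouped_mean(classes):
    """Approximate mean of grouped data, using each class midpoint as x."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return sum_fx / n

print(round(grouped_mean(classes), 2))  # 20.08
```

Because every observation is replaced by its class midpoint, the result is an approximation of the mean of the underlying raw data.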

Slide 1.22

Median

Definition: The median is the value of the middle term in a data set that has been ranked in increasing order.

Slide 1.23

Median cont.

The calculation of the median consists of the following two steps:

1. Rank the data set in increasing order

2. Find the middle term in a data set with n values. The value of this term is the median.

Slide 1.24

Median cont.

Value of Median for Ungrouped Data

$$\text{Median} = \text{value of the } \left(\frac{n+1}{2}\right)\text{th term in a ranked data set}$$

Slide 1.25

Example 6

The following data give the weight lost (in pounds) by a sample of five members of a health club at the end of two months of membership:

10 5 19 8 3

Find the median.

Slide 1.26

Solution 6

First, we rank the given data in increasing order as follows:

3 5 8 10 19

There are five observations in the data set. Consequently, n = 5 and

$$\text{Position of the middle term} = \frac{n+1}{2} = \frac{5+1}{2} = 3$$

Slide 1.27

Solution 6

Therefore, the median is the value of the third term in the ranked data.

3 5 8 10 19
    ↑ Median

The median weight loss for this sample of five members of this health club is 8 pounds.

Slide 1.28

Example 7

Table 8 lists the total revenue for the 12 top-grossing North American concert tours of all time.

Find the median revenue for these data.

Slide 1.29

Table 8

Tour | Artist | Total Revenue (millions of dollars)
Steel Wheels, 1989 | The Rolling Stones | 98.0
Magic Summer, 1990 | New Kids on the Block | 74.1
Voodoo Lounge, 1994 | The Rolling Stones | 121.2
The Division Bell, 1994 | Pink Floyd | 103.5
Hell Freezes Over, 1994 | The Eagles | 79.4
Bridges to Babylon, 1997 | The Rolling Stones | 89.3
Popmart, 1997 | U2 | 79.9
Twenty-Four Seven, 2000 | Tina Turner | 80.2
No Strings Attached, 2000 | ‘N-Sync | 76.4
Elevation, 2001 | U2 | 109.7
Popodyssey, 2001 | ‘N-Sync | 86.8
Black and Blue, 2001 | The Backstreet Boys | 82.1

Slide 1.30

Solution 7

First we rank the given data in increasing order, as follows:

74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

There are 12 values in this data set. Hence, n = 12 and

$$\text{Position of the middle term} = \frac{n+1}{2} = \frac{12+1}{2} = 6.5$$

Slide 1.31

Solution 7

Therefore, the median is given by the mean of the sixth and the seventh values in the ranked data.

74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

$$\text{Median} = \frac{82.1 + 86.8}{2} = 84.45$$

Thus, the median revenue for the 12 top-grossing North American concert tours of all time is $84.45 million.
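The two-step rule (rank, then take the ((n + 1)/2)th term, averaging the two middle terms when n is even) can be sketched in Python as follows. `median` here is a hypothetical helper written out for illustration; the standard library's `statistics.median` follows the same rule.

```python
def median(values):
    """Median of raw data: rank the values, then take the ((n + 1)/2)th term."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1)/2)th term sits at 0-based index n // 2
        return ranked[mid]
    # even n: (n + 1)/2 falls between the (n/2)th and (n/2 + 1)th terms, so average them
    return (ranked[mid - 1] + ranked[mid]) / 2

weight_lost = [10, 5, 19, 8, 3]                  # Example 6
revenue = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,
           79.9, 80.2, 76.4, 109.7, 86.8, 82.1]  # Table 8

print(median(weight_lost))        # 8
print(round(median(revenue), 2))  # 84.45
```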

Slide 1.32

Median - tabulated (ungrouped data)

Steps:

• Order the observations
• Calculate the cumulative frequency
• Locate the median at position (n + 1)/2 using the cumulative frequency

Note:

• Cumulative frequency is the number of items with a given value or less

Slide 1.33

Example

The number of working days lost by employees in the last quarter is shown below. Calculate the median number of working days lost.

Number of days (x) | Number of employees (f)
0 | 410
1 | 430
2 | 290
3 | 180
4 | 110
5 | 20
Total | 1440

Slide 1.34

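x | f | Cumulative frequency
0 | 410 | 410
1 | 430 | 840 = 410 + 430
2 | 290 | 1130 = 840 + 290
3 | 180 | 1310 = 1130 + 180
4 | 110 | 1420 = 1310 + 110
5 | 20 | 1440 = 1420 + 20
Total | 1440

The position of the median is (n + 1)/2 = (1440 + 1)/2 = 720.5, i.e. between the 720th and 721st observations. Both observations fall in the x = 1 row (cumulative frequencies 411 to 840), so the median is one day lost.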
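Locating the median through the cumulative frequencies can be sketched in Python as below, assuming the table maps each value x to its frequency f. `tabulated_median` and its inner `value_at` are hypothetical helper names.

```python
def tabulated_median(table):
    """Median of ungrouped tabulated data, located with cumulative frequencies.

    `table` maps each value x to its frequency f.
    """
    items = sorted(table.items())  # order the observations by x
    n = sum(f for _, f in items)

    def value_at(k):
        # Return the x whose cumulative-frequency range contains the k-th observation (1-based)
        cumulative = 0
        for x, f in items:
            cumulative += f
            if k <= cumulative:
                return x
        raise ValueError("position beyond total frequency")

    if n % 2 == 1:
        return value_at((n + 1) // 2)
    # even n: (n + 1)/2 lies between the (n/2)th and (n/2 + 1)th observations
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2

days_lost = {0: 410, 1: 430, 2: 290, 3: 180, 4: 110, 5: 20}
print(tabulated_median(days_lost))  # 1.0
```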

Slide 1.35

Advantages of using median

The advantage of using the median as a measure of central tendency is that it is not influenced by outliers: making an extreme value even more extreme leaves the median unchanged. Consequently, the median is preferred over the mean as a measure of central tendency for data sets that contain outliers.
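A quick Python check of this property, using the Table 2 populations with the standard library's `statistics.mean` and `statistics.median`. The `stretched` list is a hypothetical variation (California replaced by 100,000), made up purely to show that pushing the outlier further out moves the mean but not the median.

```python
from statistics import mean, median

# Table 2 populations (thousands); the last value (California) is the outlier
pops = [5894, 3421, 627, 1212, 33872]
# Hypothetical: make the outlier even more extreme
stretched = [5894, 3421, 627, 1212, 100000]

print(mean(pops), mean(stretched))      # 9005.2 22230.8
print(median(pops), median(stretched))  # 3421 3421
```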

Slide 1.36

Median for grouped data

• The median for grouped data is given by:

$$\text{Median} = L + \frac{\frac{n}{2} - F}{f_m} \times c$$

• L ≡ lower limit of the median class (the class containing the (n/2)th observation)
• n ≡ number of observations
• F ≡ cumulative frequency up to, but excluding, the median class
• fm ≡ frequency of the median class
• c ≡ width of the class

Slide 1.37

Calculate the median of the grouped data below.

Weight (oz) | Frequency (f)
19.2-19.4 | 1
19.5-19.7 | 2
19.8-20.0 | 8
20.1-20.3 | 4
20.4-20.6 | 3
20.7-20.9 | 2
Total | Σf = n = 20

Slide 1.38

Median

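• L ≡ 19.8, n ≡ 20, F ≡ 3, fm ≡ 8, c ≡ 0.3 (the median class is 19.8-20.0, where the cumulative frequency first reaches n/2 = 10; F = 1 + 2 = 3)

$$\text{Median} = L + \frac{\frac{n}{2} - F}{f_m} \times c = 19.8 + \frac{\frac{20}{2} - 3}{8} \times 0.3 = 19.8 + \frac{7}{8} \times 0.3 = 19.8 + 0.2625 \approx 20.06 \text{ oz}$$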
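A sketch of the grouped-median formula in Python, assuming each class is given as (lower limit, frequency) in increasing order and the class width c is passed separately. `grouped_median` is a hypothetical helper; like the slide, it uses the lower class limit (19.8) as L.

```python
def grouped_median(classes, width):
    """Median of grouped data: L + ((n/2 - F) / fm) * c.

    classes: list of (lower limit, frequency) in increasing order.
    width: class width c.
    """
    n = sum(f for _, f in classes)
    F = 0  # cumulative frequency of the classes before the median class
    for L, fm in classes:
        if F + fm >= n / 2:  # first class whose cumulative frequency reaches n/2
            return L + (n / 2 - F) / fm * width
        F += fm
    raise ValueError("empty table")

weights = [(19.2, 1), (19.5, 2), (19.8, 8), (20.1, 4), (20.4, 3), (20.7, 2)]
print(round(grouped_median(weights, 0.3), 4))  # 20.0625, i.e. about 20.06 oz
```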

Slide 1.39

Mode

Definition

The mode is the value that occurs with the highest frequency in a data set.

Slide 1.40

Example 8

The following data give the speeds (in miles per hour) of eight cars that were stopped for speeding violations.

77 69 74 81 71 68 74 73

Find the mode.

Slide 1.41

Solution 8

In this data set, 74 occurs twice and each of the remaining values occurs only once. Because 74 occurs with the highest frequency, it is the mode. Therefore,

Mode = 74 miles per hour
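Counting occurrences is all the mode needs, so a minimal Python sketch can lean on the standard library's `collections.Counter`; the variable names are illustrative only.

```python
from collections import Counter

speeds = [77, 69, 74, 81, 71, 68, 74, 73]  # Example 8

counts = Counter(speeds)  # value -> number of occurrences
value, frequency = counts.most_common(1)[0]
print(value, frequency)   # 74 2
```

Note that `most_common(1)` reports only one value even when several values tie for the highest count, so on its own it does not detect bimodal or multimodal data.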

Slide 1.42

Mode cont.

• A data set may have no mode or several modes, whereas it will have only one mean and only one median.
– A data set with only one mode is called unimodal.
– A data set with two modes is called bimodal.
– A data set with more than two modes is called multimodal.
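To handle data sets with no mode or several modes, a small helper can return every value that shares the highest frequency. This is a sketch under one stated assumption: following the usual textbook convention, a data set in which every value occurs only once is treated as having no mode. `modes` and `modality` are hypothetical helper names.

```python
from collections import Counter

def modes(data):
    """All values with the highest frequency; empty if no value repeats."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # assumption: every value occurs once -> no mode
    return [v for v, c in counts.items() if c == top]

def modality(data):
    """Label a data set by its number of modes."""
    k = len(modes(data))
    return {0: "no mode", 1: "unimodal", 2: "bimodal"}.get(k, "multimodal")

print(modality([77, 69, 74, 81, 71, 68, 74, 73]))  # unimodal
print(modality([1, 1, 2, 2, 3]))                   # bimodal
print(modality([1, 2, 3, 4]))                      # no mode
```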

Slide 1.43

Different patterns for the mode

Slide 1.44

Different patterns for the mode

Slide 1.45

Different patterns for the mode

Slide 1.46

Mode - tabulated (ungrouped data)

The number of working days lost by employees in the last quarter is shown below. Calculate the modal number of working days lost.

Number of days (x) | Number of employees (f)
0 | 410
1 | 430
2 | 290
3 | 180
4 | 110
5 | 20
Total | 1440

Slide 1.47

The mode corresponds to the most frequently occurring value: the highest frequency (430 employees) occurs at x = 1, so the mode is one day lost.

Number of days (x) | Number of employees (f)
0 | 410
1 | 430
2 | 290
3 | 180
4 | 110
5 | 20
Total | 1440
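For tabulated ungrouped data the mode is simply the x with the largest f. A minimal Python sketch, assuming the table is stored as a dictionary mapping x to f:

```python
days_lost = {0: 410, 1: 430, 2: 290, 3: 180, 4: 110, 5: 20}

# The mode is the value x whose frequency f is largest
mode = max(days_lost, key=days_lost.get)
print(mode, days_lost[mode])  # 1 430
```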

Slide 1.48

Advantage of using the mode

One advantage of the mode is that it can be calculated for both quantitative and qualitative kinds of data, whereas the mean and median can be calculated only for quantitative data.

Slide 1.49

Example 12

The statuses of five students who are members of the student senate at a college are senior, sophomore, senior, junior, and senior. Find the mode.

Slide 1.50

Solution 12

Because senior occurs more frequently than the other categories, it is the mode for this data set.

We cannot calculate the mean and median for this data set.

Slide 1.51

Mode for tabulated grouped data

• For grouped data, the mode is given by:

$$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c$$

• L ≡ lower limit of the modal class (the class with the highest frequency)
• d1 ≡ frequency of the modal class minus that of the previous class
• d2 ≡ frequency of the modal class minus that of the following class
• c ≡ width of the class

Slide 1.52

Calculate the mode of the grouped data below.

Weight (oz) | Frequency (f)
19.2-19.4 | 1
19.5-19.7 | 2
19.8-20.0 | 8
20.1-20.3 | 4
20.4-20.6 | 3
20.7-20.9 | 2
Total | Σf = n = 20

Slide 1.53

Mode

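• L ≡ 19.8, d1 ≡ 6, d2 ≡ 4, c ≡ 0.3 (the modal class is 19.8-20.0 with frequency 8, so d1 = 8 − 2 = 6 and d2 = 8 − 4 = 4)

$$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c = 19.8 + \frac{6}{6 + 4} \times 0.3 = 19.8 + \frac{1.8}{10} = 19.8 + 0.18 = 19.98 \text{ oz}$$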
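A sketch of the grouped-mode formula in Python, assuming classes are given as (lower limit, frequency) in increasing order and the class width is passed separately. `grouped_mode` is a hypothetical helper, and treating a missing neighbouring class as frequency 0 is an assumption of this sketch, not something the slide specifies.

```python
def grouped_mode(classes, width):
    """Mode of grouped data: L + d1 / (d1 + d2) * c.

    classes: list of (lower limit, frequency) in increasing order.
    Assumes a neighbouring class that does not exist has frequency 0.
    """
    freqs = [f for _, f in classes]
    i = freqs.index(max(freqs))  # modal class = first class with the highest frequency
    L, fm = classes[i]
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1, d2 = fm - before, fm - after
    return L + d1 / (d1 + d2) * width

weights = [(19.2, 1), (19.5, 2), (19.8, 8), (20.1, 4), (20.4, 3), (20.7, 2)]
print(round(grouped_mode(weights, 0.3), 2))  # 19.98
```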

Slide 1.54

Relationships among the Mean, Median, and Mode

The relationship among the mean, median, and mode depends on the shape of the frequency distribution, that is, on its skewness.

1. For a symmetric distribution (Figure 1), the values of the mean, median, and mode are identical, and they lie at the center of the distribution.

Slide 1.55

Figure 1 Symmetric distribution (zero skewness)

Slide 1.56

Figure 2 Positively skewed

Slide 1.57

Positively Skewed

2. A histogram or frequency curve is positively skewed if its right tail is longer (Figure 2). In this case, mean > median > mode.

Notice that the mode always occurs at the peak point. The mean is the largest of the three in this case because it is sensitive to outliers that occur in the right tail: outliers in the right tail pull the mean to the right.

Slide 1.58

Figure 3 Negatively skewed

Slide 1.59

Negatively Skewed

3. A histogram or frequency curve is negatively skewed if its left tail is longer (Figure 3). In this case, mode > median > mean, because the outliers in the left tail pull the mean to the left.
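The ordering of the three measures can be checked on small data sets with Python's standard `statistics` module. The two lists below are hypothetical, chosen only to illustrate the shapes: the first has a long right tail, and the second is its mirror image with a long left tail.

```python
from statistics import mean, median, mode

# Hypothetical data sets, chosen only to illustrate the two skewed shapes
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9]    # long right tail (9)
left_skewed = [10 - x for x in right_skewed]  # mirror image: long left tail (1)

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name, round(mean(data), 2), median(data), mode(data))
# right 3.44 3 2   -> mean > median > mode
# left 6.56 7 8    -> mode > median > mean
```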