AGEC 405 Lecture III
TRANSCRIPT
Slide 1.1
Analysing Data
Slide 1.2
Descriptive Statistics
Slide 1.3
Descriptive statistics
Descriptive statistics provides an objective way of describing and summarising data
Slide 1.4
Data description
Slide 1.5
Two key measures of data description
• Location – to show where the centre of the data is, giving some kind of typical or average value
• Dispersion (spread) – to show how spread out the data is around this centre, giving an idea of the range of values.
Slide 1.6
Measures of location
• Three basic measures of location are used:
– Arithmetic mean: the average value
– Median: the middle value
– Mode: the most frequent value
• Three data structures:
– Untabulated (raw data)
– Tabulated (ungrouped)
– Tabulated (grouped)
For use with Curwin & Slater, Quantitative Methods for Business Decisions, 6th Edition, ISBN: 9781844805747
Slide 1.7
Mean - Untabulated (raw data)
The mean for untabulated data is obtained by dividing the sum of all values by the number of values in the data set. Thus,
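Mean for population data: $\mu = \frac{\sum x}{N}$
Mean for sample data: $\bar{x} = \frac{\sum x}{n}$
where $\sum x$ is the sum of all values, $N$ is the population size, and $n$ is the sample size.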
Slide 1.8
Example 1
The following are the ages of all eight employees of a small company:
53 32 61 27 39 44 49 57
Find the mean age of these employees.
Slide 1.9
Solution 1
$\mu = \frac{\sum x}{N} = \frac{362}{8} = 45.25$ years
Thus, the mean age of all eight employees of this company is 45.25 years, or 45 years and 3 months.
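The arithmetic in Solution 1 can be checked with a few lines of Python. This is only an illustrative sketch; the names `ages`, `arithmetic_mean`, and `mu` are chosen here and are not part of the lecture.

```python
# Ages of all eight employees (population data from Example 1).
ages = [53, 32, 61, 27, 39, 44, 49, 57]


def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


mu = arithmetic_mean(ages)
print(f"Sum = {sum(ages)}, N = {len(ages)}, mean = {mu} years")
```

The same function applies to sample data; only the symbol changes (x̄ instead of μ), not the calculation.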
Slide 1.10
Mean - tabulated (ungrouped data)
Sample mean of data: $\bar{x} = \frac{\sum fx}{n}$
where x is the value of the observation, f is the frequency of the observation, and $n = \sum f$ is the total number of observations.
Slide 1.11
Example
The number of working days lost by employees in the last quarter (calculate the average number of working days lost):
Number of days (x)   Number of employees (f)
0                    410
1                    430
2                    290
3                    180
4                    110
5                    20
Total                1440
Slide 1.12
x       f      fx
0       410    0
1       430    430
2       290    580
3       180    540
4       110    440
5       20     100
Total   1440   2090
$\bar{x} = \frac{\sum fx}{n} = \frac{2090}{1440} \approx 1.451$ days lost
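The frequency-weighted mean for the working-days table can be reproduced as follows. This is a minimal sketch; the names `days_lost`, `employees`, and `frequency_mean` are assumptions made for the illustration.

```python
# Working days lost (x) and number of employees (f) from the example table.
days_lost = [0, 1, 2, 3, 4, 5]
employees = [410, 430, 290, 180, 110, 20]


def frequency_mean(values, freqs):
    """Mean of tabulated data: sum of f*x divided by n, where n is the sum of f."""
    n = sum(freqs)
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / n


mean_days = frequency_mean(days_lost, employees)
print(round(mean_days, 3))
```

Multiplying each value by its frequency avoids writing out all 1440 individual observations.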
Slide 1.13
Mean
• Mean can be affected by outliers
Slide 1.14
Outliers
Definition: Values that are very small or very large relative to the majority of the values in a data set are called outliers or extreme values.
Slide 1.15
Example 3
Table 2 lists the 2000 populations (in thousands) of the five Pacific states.
Table 2
State        Population (thousands)
Washington   5894
Oregon       3421
Alaska       627
Hawaii       1212
California   33,872 (an outlier)
Slide 1.16
Solution 3
Now, to see the impact of the outlier on the value of the mean, we include the population of California and find the mean population of all five Pacific states. This mean is
Mean $= \frac{5894 + 3421 + 627 + 1212 + 33{,}872}{5} = 9005.2$ thousand
Slide 1.17
Example 3
Notice that the population of California is very large compared to the populations of the other four states. Hence, it is an outlier. Show how the inclusion of this outlier affects the value of the mean.
Slide 1.18
Solution 3
If we do not include the population of California (the outlier) the mean population of the remaining four states (Washington, Oregon, Alaska, and Hawaii) is
Mean $= \frac{5894 + 3421 + 627 + 1212}{4} = 2788.5$ thousand
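The effect of the outlier can be seen by computing the mean with and without California. The sketch below uses the Table 2 figures; the dictionary layout and variable names are choices made for this illustration.

```python
# 2000 populations (in thousands) of the five Pacific states, from Table 2.
populations = {
    "Washington": 5894,
    "Oregon": 3421,
    "Alaska": 627,
    "Hawaii": 1212,
    "California": 33872,  # the outlier
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


mean_all = mean(populations.values())
mean_without = mean(v for state, v in populations.items() if state != "California")

print(f"All five states:    {mean_all} thousand")
print(f"Without California: {mean_without} thousand")
```

Including the single outlier more than triples the mean, which is why the mean is not always a good measure of location for data with extreme values.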
Slide 1.19
Mean - tabulated (grouped data)
Mean for population data: $\mu = \frac{\sum fx}{N}$
Mean for sample data: $\bar{x} = \frac{\sum fx}{n}$
where x is the midpoint and f is the frequency of a class.
Slide 1.20
Calculate the mean of the grouped data below
Weight (oz)   Class midpoint (x)   Frequency (f)   fx
19.2-19.4     19.3                 1               19.3
19.5-19.7     19.6                 2               39.2
19.8-20.0     19.9                 8               159.2
20.1-20.3     20.2                 4               80.8
20.4-20.6     20.5                 3               61.5
20.7-20.9     20.8                 2               41.6
Total                              n = Σf = 20     Σfx = 401.6
Slide 1.21
Mean
• n = 20
• Σfx = 401.6
$\bar{x} = \frac{\sum fx}{n} = \frac{401.6}{20} = 20.08$ oz
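For grouped data the only extra step is computing each class midpoint. The following sketch assumes each class is stored as a (lower limit, upper limit, frequency) tuple; that layout and the name `grouped_mean` are illustrative choices, not part of the lecture.

```python
# (lower limit, upper limit, frequency) for each weight class, in oz.
weight_classes = [
    (19.2, 19.4, 1),
    (19.5, 19.7, 2),
    (19.8, 20.0, 8),
    (20.1, 20.3, 4),
    (20.4, 20.6, 3),
    (20.7, 20.9, 2),
]


def grouped_mean(classes):
    """Mean of grouped data: sum of f * midpoint divided by n, where n is the sum of f."""
    n = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / n


x_bar = grouped_mean(weight_classes)
print(round(x_bar, 2))
```

Because every observation in a class is represented by the midpoint, the result is an approximation to the mean of the underlying raw data.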
Slide 1.22
Median
Definition: The median is the value of the middle term in a data set that has been ranked in increasing order.
Slide 1.23
Median cont.
The calculation of the median consists of the following two steps:
1. Rank the data set in increasing order
2. Find the middle term in a data set with n values. The value of this term is the median.
Slide 1.24
Median cont.
Value of Median for Ungrouped Data
Median = value of the $\left(\frac{n+1}{2}\right)$th term in a ranked data set
Slide 1.25
Example 6
The following data give the weight lost (in pounds) by a sample of five members of a health club at the end of two months of membership:
10 5 19 8 3
Find the median.
Slide 1.26
Solution 6
First, we rank the given data in increasing order as follows:
3 5 8 10 19
There are five observations in the data set. Consequently, n = 5 and
Position of the middle term $= \frac{n+1}{2} = \frac{5+1}{2} = 3$
Slide 1.27
Solution 6
Therefore, the median is the value of the third term in the ranked data.
3 5 8 10 19 (median: the third value, 8)
The median weight loss for this sample of five members of this health club is 8 pounds.
Slide 1.28
Example 7
Table 8 lists the total revenue for the 12 top-grossing North American concert tours of all time.
Find the median revenue for these data.
Slide 1.29
Table 8
Tour                        Artist                  Total Revenue (millions of dollars)
Steel Wheels, 1989          The Rolling Stones      98.0
Magic Summer, 1990          New Kids on the Block   74.1
Voodoo Lounge, 1994         The Rolling Stones      121.2
The Division Bell, 1994     Pink Floyd              103.5
Hell Freezes Over, 1994     The Eagles              79.4
Bridges to Babylon, 1997    The Rolling Stones      89.3
Popmart, 1997               U2                      79.9
Twenty-Four Seven, 2000     Tina Turner             80.2
No Strings Attached, 2000   ‘N-Sync                 76.4
Elevation, 2001             U2                      109.7
Popodyssey, 2001            ‘N-Sync                 86.8
Black and Blue, 2001        The Backstreet Boys     82.1
Slide 1.30
Solution 7
First we rank the given data in increasing order, as follows:
74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
There are 12 values in this data set. Hence, n = 12 and
Position of the middle term $= \frac{n+1}{2} = \frac{12+1}{2} = 6.5$
Slide 1.31
Solution 7
Therefore, the median is given by the mean of the sixth and the seventh values in the ranked data.
74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
Median $= \frac{82.1 + 86.8}{2} = \frac{168.9}{2} = 84.45$ million dollars
Thus the median revenue for the 12 top-grossing North American concert tours of all time is $84.45 million.
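The two-step procedure (rank the data, then take the term at position (n + 1)/2) covers both Example 6 (odd n) and Example 7 (even n). The sketch below is one way to write it; the function name `median_of` is an assumption, and the function expects a non-empty list.

```python
def median_of(values):
    """Median of raw data: rank the values, then take the middle term.

    For even n the position (n + 1) / 2 falls between two terms,
    so the mean of those two terms is returned.
    """
    ranked = sorted(values)          # step 1: rank in increasing order
    n = len(ranked)
    if n % 2 == 1:
        position = (n + 1) // 2      # 1-based position of the middle term
        return ranked[position - 1]
    lower = ranked[n // 2 - 1]       # term just below position (n + 1) / 2
    upper = ranked[n // 2]           # term just above position (n + 1) / 2
    return (lower + upper) / 2


weight_loss = [10, 5, 19, 8, 3]                       # Example 6
revenue = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,
           79.9, 80.2, 76.4, 109.7, 86.8, 82.1]       # Example 7

print(median_of(weight_loss))
print(round(median_of(revenue), 2))
```

Python's standard library offers `statistics.median`, which follows the same rule and can be used to cross-check the result.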
Slide 1.32
Median - tabulated (ungrouped data)
Steps:
• Order the observations
• Calculate the cumulative frequency
• Locate the observation at position (n + 1)/2 using the cumulative frequencies
Note:
• Cumulative frequency is the number of items with a given value or less
Slide 1.33
Example
The number of working days lost by employees in the last quarter (calculate the median number of working days lost):
Number of days (x)   Number of employees (f)
0                    410
1                    430
2                    290
3                    180
4                    110
5                    20
Total                1440
Slide 1.34
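x       f      Cumulative frequency
0       410    410
1       430    840 = 410 + 430
2       290    1130 = 840 + 290
3       180    1310 = 1130 + 180
4       110    1420 = 1310 + 110
5       20     1440 = 1420 + 20
Total   1440
The position of the median is (n + 1)/2 = (1440 + 1)/2 = 720.5, i.e. between the 720th and 721st observations. Both fall in the row x = 1, which covers the 411th to the 840th observations, so the median is one day.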
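The cumulative-frequency lookup can be sketched in Python as below. The helper names `value_at_position` and `tabulated_median` are hypothetical, and the code assumes the values are already listed in increasing order, as in the table.

```python
from itertools import accumulate

# Working days lost (x) and number of employees (f), in increasing order of x.
days_lost = [0, 1, 2, 3, 4, 5]
employees = [410, 430, 290, 180, 110, 20]


def value_at_position(values, cum_freqs, position):
    """Return the first value whose cumulative frequency reaches the 1-based position."""
    for x, cf in zip(values, cum_freqs):
        if position <= cf:
            return x
    raise ValueError("position exceeds the number of observations")


def tabulated_median(values, freqs):
    """Median of tabulated (ungrouped) data via cumulative frequencies."""
    cum_freqs = list(accumulate(freqs))
    n = cum_freqs[-1]
    if n % 2 == 1:
        return value_at_position(values, cum_freqs, (n + 1) // 2)
    # Even n: position (n + 1) / 2 lies between the n/2-th and (n/2 + 1)-th terms.
    lower = value_at_position(values, cum_freqs, n // 2)
    upper = value_at_position(values, cum_freqs, n // 2 + 1)
    return (lower + upper) / 2


print(list(accumulate(employees)))
median_days = tabulated_median(days_lost, employees)
print(median_days)
```

Here the 720th and 721st observations both equal 1, so averaging them leaves the median at one day.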
Slide 1.35
Advantages of using median
The advantage of using the median as a measure of central tendency is that it is not influenced by outliers. Consequently, the median is preferred over the mean as a measure of central tendency for data sets that contain outliers.
Slide 1.36
Median for grouped data
• The median for grouped data is given by:
$\text{median} = L + \left(\frac{\frac{n}{2} - F}{f_m}\right) c$
• L ≡ lower limit of the median class
• n ≡ number of observations
• F ≡ sum of the frequencies of all classes before (excluding) the median class
• $f_m$ ≡ frequency of the median class
• c ≡ width of the class
Slide 1.37
Calculate the median of the grouped data below
Weight (oz) Frequency (f)
19.2-19.4 1
19.5-19.7 2
19.8-20.0 8
20.1-20.3 4
20.4-20.6 3
20.7-20.9 2
Total n = Σf = 20
Slide 1.38
Median
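• L = 19.8, n = 20, F = 3, $f_m$ = 8, c = 0.3
$\text{median} = L + \left(\frac{\frac{n}{2} - F}{f_m}\right) c = 19.8 + \left(\frac{\frac{20}{2} - 3}{8}\right)(0.3) = 19.8 + \frac{7}{8}(0.3) = 19.8 + 0.2625 \approx 20.06$ oz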
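The grouped-median formula can be applied mechanically once the median class is found, i.e. the first class whose cumulative frequency reaches n/2. The sketch below follows the slide in using the lower class limit for L and a class width of 0.3; the function name and the (lower limit, frequency) layout are assumptions for this illustration.

```python
def grouped_median(classes, width):
    """Median of grouped data: L + ((n/2 - F) / f_m) * c.

    classes: list of (lower_limit, frequency) pairs in increasing order.
    width:   class width c.
    """
    n = sum(f for _, f in classes)
    cumulative = 0  # F: total frequency of the classes before the current one
    for lower, f in classes:
        if cumulative + f >= n / 2:  # current class is the median class
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("no classes supplied")


weight_classes = [(19.2, 1), (19.5, 2), (19.8, 8), (20.1, 4), (20.4, 3), (20.7, 2)]
med = grouped_median(weight_classes, width=0.3)
print(round(med, 4))
```

The cumulative frequencies are 1, 3, 11, ..., so the third class (19.8-20.0) is the first to reach n/2 = 10, giving F = 3 and f_m = 8 as on the slide.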
Slide 1.39
Mode
Definition
The mode is the value that occurs with the highest frequency in a data set.
Slide 1.40
Example 8
The following data give the speeds (in miles per hour) of eight cars that were stopped for speeding violations.
77 69 74 81 71 68 74 73
Find the mode.
Slide 1.41
Solution 8
In this data set, 74 occurs twice and each of the remaining values occurs only once. Because 74 occurs with the highest frequency, it is the mode. Therefore,
Mode = 74 miles per hour
Slide 1.42
Mode cont.
• A data set may have no mode, one mode, or several modes, whereas it will have only one mean and only one median.
– A data set with only one mode is called unimodal.
– A data set with two modes is called bimodal.
– A data set with more than two modes is called multimodal.
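Finding the mode amounts to counting how often each value occurs. The sketch below uses `collections.Counter`; the function names are hypothetical, and it adopts the convention (an assumption here) that a data set in which no value repeats has no mode. It expects a non-empty list.

```python
from collections import Counter


def find_modes(values):
    """Return all values that occur with the highest frequency (sorted).

    Convention assumed here: if no value occurs more than once,
    the data set is treated as having no mode.
    """
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


def describe(modes):
    """Name the pattern according to the number of modes."""
    if not modes:
        return "no mode"
    if len(modes) == 1:
        return "unimodal"
    if len(modes) == 2:
        return "bimodal"
    return "multimodal"


speeds = [77, 69, 74, 81, 71, 68, 74, 73]  # Example 8
modes = find_modes(speeds)
print(modes, describe(modes))
```

For the speeding data this reports a single mode of 74, so the data set is unimodal.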
Slide 1.43
Different patterns for the mode
Slide 1.44
Different patterns for the mode
Slide 1.45
Different patterns for the mode
Slide 1.46
Mode - tabulated (ungrouped data)
The number of working days lost by employees in the last quarter (find the modal number of working days lost):
Number of days (x)   Number of employees (f)
0                    410
1                    430
2                    290
3                    180
4                    110
5                    20
Total                1440
Slide 1.47
The mode corresponds to the value with the highest frequency, which is one day lost (430 employees).
Number of days (x)   Number of employees (f)
0                    410
1                    430 (highest frequency)
2                    290
3                    180
4                    110
5                    20
Total                1440
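With tabulated data the mode is simply the value paired with the largest frequency. A minimal sketch, with variable names chosen for this illustration:

```python
# Working days lost (x) and number of employees (f) from the table.
days_lost = [0, 1, 2, 3, 4, 5]
employees = [410, 430, 290, 180, 110, 20]

# Pick the (x, f) pair with the largest frequency; if two frequencies tie,
# max() returns the first such pair, so ties would need separate handling.
modal_days, modal_freq = max(zip(days_lost, employees), key=lambda pair: pair[1])

print(f"Mode = {modal_days} day lost ({modal_freq} employees)")
```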
Slide 1.48
Advantage of using the mode
One advantage of the mode is that it can be calculated for both quantitative and qualitative kinds of data, whereas the mean and median can be calculated for only quantitative data.
Slide 1.49
Example 12
The statuses of five students who are members of the student senate at a college are senior, sophomore, senior, junior, and senior. Find the mode.
Slide 1.50
Solution 12
Because senior occurs more frequently than the other categories, it is the mode for this data set.
We cannot calculate the mean and median for this data set.
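Counting categories works the same way for qualitative data as for numbers, which is why the mode is available here. A short sketch using the Example 12 data (variable names are illustrative):

```python
from collections import Counter

# Class status of the five student senate members (Example 12).
status = ["senior", "sophomore", "senior", "junior", "senior"]

counts = Counter(status)
# most_common(1) returns a list holding the single most frequent (category, count) pair.
mode_status, mode_count = counts.most_common(1)[0]

print(mode_status, mode_count)
```

No arithmetic is performed on the categories themselves, only on their counts.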
Slide 1.51
Mode for tabulated grouped data
• For grouped data, the mode is given by:
$\text{mode} = L + \left(\frac{d_1}{d_1 + d_2}\right) c$
• L ≡ lower limit of the modal class
• d1 ≡ frequency of the modal class minus the frequency of the previous class
• d2 ≡ frequency of the modal class minus the frequency of the following class
• c ≡ width of the class
Slide 1.52
Calculate the mode of the grouped data below
Weight (oz) Frequency (f)
19.2-19.4 1
19.5-19.7 2
19.8-20.0 8
20.1-20.3 4
20.4-20.6 3
20.7-20.9 2
Total n = Σf = 20
Slide 1.53
Mode
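• L = 19.8, d1 = 8 − 2 = 6, d2 = 8 − 4 = 4, c = 0.3
$\text{mode} = L + \left(\frac{d_1}{d_1 + d_2}\right) c = 19.8 + \left(\frac{6}{6 + 4}\right)(0.3) = 19.8 + \frac{1.8}{10} = 19.8 + 0.18 = 19.98$ oz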
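The grouped-mode formula can be sketched as below. The function name and data layout are hypothetical. Treating a missing neighbouring class as having frequency 0 (when the modal class is first or last) is an assumption of this sketch, not something stated on the slides.

```python
def grouped_mode(classes, width):
    """Mode of grouped data: L + (d1 / (d1 + d2)) * c.

    classes: list of (lower_limit, frequency) pairs in increasing order.
    width:   class width c.
    """
    freqs = [f for _, f in classes]
    i = freqs.index(max(freqs))  # modal class (the first one if frequencies tie)
    # Assumption: a missing neighbour (modal class at either end) counts as frequency 0.
    prev_f = freqs[i - 1] if i > 0 else 0
    next_f = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - prev_f
    d2 = freqs[i] - next_f
    return classes[i][0] + d1 / (d1 + d2) * width


weight_classes = [(19.2, 1), (19.5, 2), (19.8, 8), (20.1, 4), (20.4, 3), (20.7, 2)]
mode_weight = grouped_mode(weight_classes, width=0.3)
print(round(mode_weight, 2))
```

For the weight data the modal class is 19.8-20.0 with d1 = 6 and d2 = 4, matching the values used on the slide.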
Slide 1.54
Relationships among the Mean, Median, and Mode
1. The relationship among the three measures depends on the shape of the frequency distribution (its skewness).
For a symmetrical histogram and frequency curve with one peak (Figure 1), the values of the mean, median, and mode are identical, and they lie at the center of the distribution.
Slide 1.55
Figure 1 Zero Skewed (Symmetrical)
Slide 1.56
Figure 2 Positively skewed
Slide 1.57
Positively Skewed
2. A histogram and a frequency curve are positively skewed if the right tail is longer (Figure 2). In this case,
the value of the mean > median > mode
Notice that the mode always occurs at the peak point. The value of the mean is the largest in this case because it is sensitive to outliers that occur in the right tail. Outliers in the right tail pull the mean to the right.
Slide 1.58
Figure 3 Negatively skewed
Slide 1.59
Negatively Skewed
3. A histogram and a frequency curve are negatively skewed if the left tail is longer (Figure 3). In this case,
the value of the mode > median > mean
The outliers in the left tail pull the mean to the left.
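The ordering of the three measures under skewness can be illustrated with a small made-up data set. The numbers below are hypothetical, chosen only so that the right tail is long; they do not come from the lecture. The left-skewed set is obtained by reflecting the first one.

```python
from statistics import mean, median, mode

# Hypothetical data with a long right tail (positively skewed).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Reflecting the values (15 - x) gives a long left tail (negatively skewed).
left_skewed = [15 - x for x in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(name, "mean =", mean(data), "median =", median(data), "mode =", mode(data))
```

For the right-skewed set the mean is the largest of the three and the mode the smallest; the reflection reverses the order. Real data do not always follow this pattern exactly (for instance, the median and mode can coincide), so it is best read as a typical pattern rather than a strict rule.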