Chapter 3 Numerically Describing Data from One Variable
3.1 Measures of Central Tendency
1. A statistic is resistant if it is not sensitive to extreme data values. The median is resistant because it is a positional measure of central tendency and increasing the largest value or decreasing the smallest value does not affect the position of the center. The mean is not resistant because it is a function of the sum of the data values. Changing the magnitude of one value changes the sum of the values, and thus affects the mean. The mode is a resistant measure of center.
2. The mean and the median are approximately equal when the data are symmetric. If the mean is significantly greater than the median, the data are skewed right. If the mean is significantly less than the median, the data are skewed left.
3. Since the distribution of household incomes in the United States is skewed to the right, the mean is greater than the median. Thus, the mean household income is $55,263 and the median is $41,349.
4. HUD uses the median because the data are skewed. Explanations will vary. One possibility is that the price of homes has a distribution that is skewed to the right, so the median is more representative of the typical price of a home.
5. The mean will be larger because it will be influenced by the extreme data values that are to the right end (or high end) of the distribution.
6. \(\dfrac{10{,}000 + 1}{2} = 5000.5\). The median is between the 5000th and the 5001st ordered values.
7. The mode is used with qualitative data because the computations involved with the mean and median make no sense for qualitative data.
8. parameter; statistic
9. False. A data set may have multiple modes, or it may have no mode at all.
10. False. The formula \(\dfrac{n + 1}{2}\) gives the position of the median, not the value of the median.
11. \(\bar{x} = \dfrac{20 + 13 + 4 + 8 + 10}{5} = \dfrac{55}{5} = 11\)
12. \(\bar{x} = \dfrac{83 + 65 + 91 + 87 + 84}{5} = \dfrac{410}{5} = 82\)
13. \(\mu = \dfrac{3 + 6 + 10 + 12 + 14}{5} = \dfrac{45}{5} = 9\)
Chapter 3 Numerically Summarizing Data
14. \(\mu = \dfrac{1 + 19 + 25 + 15 + 12 + 16 + 28 + 13 + 6}{9} = \dfrac{135}{9} = 15\)
15. \(\dfrac{142}{59} \approx 2.4\). The mean price per ad slot is approximately $2.4 million.
16. Let x represent the missing value. Since there are 6 data values in the list, the median 26.5 is between the 3rd and 4th ordered values, which are 21 and x, respectively. Thus,
\[ \frac{21 + x}{2} = 26.5 \quad\Rightarrow\quad 21 + x = 53 \quad\Rightarrow\quad x = 32 \]
The missing value is 32.
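The algebra in Exercise 16 can be checked numerically. The sketch below assumes, as the solution does, that the unknown value is the upper of the two middle ordered values; the function name `missing_middle_value` is illustrative and does not come from the text.

```python
def missing_middle_value(known_middle, median):
    """Solve (known_middle + x) / 2 = median for x."""
    return 2 * median - known_middle

# Exercise 16: the 3rd ordered value is 21 and the median is 26.5.
x = missing_middle_value(21, 26.5)
print(x)
```

Substituting the result back, (21 + x) / 2 should reproduce the stated median.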
17. Mean \(= \dfrac{420 + 462 + 409 + 236}{4} = \dfrac{1527}{4} = \$381.75\)
Data in order: 236, 409, 420, 462
Median \(= \dfrac{409 + 420}{2} = \dfrac{829}{2} = \$414.50\)
No data value occurs more than once, so there is no mode.
18. Mean \(= \dfrac{35.34 + 42.09 + 39.43 + 38.93 + 43.39 + 49.26}{6} = \dfrac{248.44}{6} \approx \$41.41\)
Data in order: 35.34, 38.93, 39.43, 42.09, 43.39, 49.26
Median \(= \dfrac{39.43 + 42.09}{2} = \dfrac{81.52}{2} = \$40.76\)
No data value occurs more than once, so there is no mode.
19. Mean \(= \dfrac{3960 + 4090 + 3200 + 3100 + 2940 + 3830 + 4090 + 4040 + 3780}{9} = \dfrac{33{,}030}{9} = 3670\) psi
Data in order: 2940, 3100, 3200, 3780, 3830, 3960, 4040, 4090, 4090
Median = the 5th ordered data value = 3830 psi
Mode = 4090 psi (because it is the only data value to occur twice)
20. Mean \(= \dfrac{282 + 270 + 260 + 266 + 257 + 260 + 267}{7} = \dfrac{1862}{7} = 266\) minutes
Data in order: 257, 260, 260, 266, 267, 270, 282
Median = the 4th data value with the data in order = 266 minutes
Mode = 260 minutes (because it is the only data value to occur twice)
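The three measures of center in Exercises 19 and 20 can be reproduced with Python's standard `statistics` module. This is only a sketch for checking the hand computations; the variable names and the helper `center` are illustrative assumptions.

```python
from statistics import mean, median, multimode

# Data copied from the solutions to Exercises 19 (psi) and 20 (minutes).
strength_psi = [3960, 4090, 3200, 3100, 2940, 3830, 4090, 4040, 3780]
times_min = [282, 270, 260, 266, 257, 260, 267]

def center(data):
    """Return (mean, median, list of most frequent values)."""
    return mean(data), median(data), multimode(data)

print(center(strength_psi))
print(center(times_min))
```

Note that `multimode` returns every value when nothing repeats (as in Exercises 17 and 18), which corresponds to "no mode" in the sense used here.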
21. (a) The histogram is skewed to the right, suggesting that the mean is greater than the median. That is, \(\bar{x} > M\).
(b) The histogram is symmetric, suggesting that the mean is approximately equal to the median. That is, \(\bar{x} = M\).
(c) The histogram is skewed to the left, suggesting that the mean is less than the median. That is, \(\bar{x} < M\).
22. (a) IV because the distribution is symmetric (so mean ≈ median) and centered near 30. (b) III because the distribution is skewed to the right, so mean > median. (c) II because the distribution is skewed to the left, so mean < median. (d) I because the distribution is symmetric (so mean ≈ median) and centered near 40.
23. Los Angeles ATM fees:
Mean \(= \dfrac{2.00 + 1.50 + 1.50 + 1.00 + 1.50 + 2.00 + 0.00 + 2.00}{8} = \dfrac{11.50}{8} \approx \$1.44\)
Data in order: 0.00, 1.00, 1.50, 1.50, 1.50, 2.00, 2.00, 2.00
Median \(= \dfrac{1.50 + 1.50}{2} = \dfrac{3.00}{2} = \$1.50\)
Mode = $1.50 and $2.00 (because both values occur three times each)
New York City ATM fees:
Mean \(= \dfrac{1.50 + 1.00 + 1.00 + 1.25 + 1.25 + 1.50 + 1.00 + 0.00}{8} = \dfrac{8.50}{8} \approx \$1.06\)
Data in order: 0.00, 1.00, 1.00, 1.00, 1.25, 1.25, 1.50, 1.50
Median \(= \dfrac{1.00 + 1.25}{2} = \dfrac{2.25}{2} \approx \$1.13\)
Mode = $1.00 (because it occurs more often than the other values)
The ATM fees in Los Angeles appear to be higher in general than those in New York City. All three measures of center were higher for Los Angeles than for New York. Explanations will vary. Possibilities for the difference may be the number of ATMs available or the amount of ATM usage in each city.
24. Reaction Time to Blue:
Mean \(= \dfrac{0.582 + 0.481 + 0.841 + 0.267 + 0.685 + 0.45}{6} = \dfrac{3.306}{6} = 0.551\) sec.
Data in order: 0.267, 0.45, 0.481, 0.582, 0.685, 0.841
Median \(= \dfrac{0.481 + 0.582}{2} = \dfrac{1.063}{2} = 0.5315\) sec.
No data value occurs more than once, so there is no mode.
Reaction Time to Red:
Mean \(= \dfrac{0.408 + 0.407 + 0.542 + 0.402 + 0.456 + 0.533}{6} = \dfrac{2.748}{6} = 0.458\) sec.
Data in order: 0.402, 0.407, 0.408, 0.456, 0.533, 0.542
Median \(= \dfrac{0.408 + 0.456}{2} = \dfrac{0.864}{2} = 0.432\) sec.
No data value occurs more than once, so there is no mode.
There is a shorter reaction time to the red screen than to the blue screen. Explanations will vary. This information could be useful in designing warning screens for computer software controlling critical operations (such as nuclear power plants, for one example).
25. (a) \(\mu = \dfrac{76 + 60 + 60 + 81 + 72 + 80 + 80 + 68 + 73}{9} = \dfrac{650}{9} \approx 72.2\) beats per minute
(b) Samples and sample means will vary. (c) Answers will vary.
26. (a) \(\mu = \dfrac{39 + 21 + 9 + 32 + 30 + 45 + 11 + 12 + 39}{9} = \dfrac{238}{9} \approx 26.4\) minutes
(b) Samples and sample means will vary. (c) Answers will vary.
27. (a) \(\mu = \dfrac{0 + 0 + 0 + 4 + 10 + 1 + 10 + 10 + 19 + 9 + 18 + 20 + 13 + 13 + 2 + 7 + 8 + 13}{18} = \dfrac{157}{18} \approx 8.7\) goals per year
(b) Samples and sample means will vary. (c) Answers will vary.
28. (a) \(\mu_{\text{time}} = \dfrac{91.538 + 92.552 + 86.291 + 82.087 + 83.687 + 83.601 + 86.251}{7} = \dfrac{606.007}{7} \approx 86.572\) hours
Data in order: 82.087, 83.601, 83.687, 86.251, 86.291, 91.538, 92.552
Median = the 4th data value with the data in order = 86.251 hours
(b) \(\mu_{\text{distance}} = \dfrac{3687 + 3662 + 3453 + 3278 + 3427 + 3391 + 3593}{7} = \dfrac{24{,}491}{7} \approx 3499\) km
Data in order: 3278, 3391, 3427, 3453, 3593, 3662, 3687
Median = the 4th data value with the data in order = 3453 km
(c) \(\mu_{\text{margin}} = \dfrac{\sum x_i}{7} = \dfrac{39.667}{7} \approx 5.667\) minutes
Data in order: 1.017, 4.667, 6.033, 6.317, 6.733, 7.283, 7.617
Median = the 4th data value with the data in order = 6.317 minutes
(d) Mean winning speed: \(\mu = \dfrac{40.28 + 39.56 + 40.02 + 39.93 + 40.94 + 40.56 + 41.65}{7} = \dfrac{282.94}{7} = 40.420\) km/h
Winning speed: \(\dfrac{\sum \text{distance}}{\sum \text{time}} = \dfrac{24{,}491}{606.007} \approx 40.414\) km/h
Winning speed: \(\dfrac{\mu_{\text{distance}}}{\mu_{\text{time}}} = \dfrac{3499}{86.572} \approx 40.417\) km/h
The three results agree approximately. The differences are due to rounding.
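The comparison in part (d) can be sketched in Python using only the totals and speeds quoted above. The variable names are illustrative. One point the sketch makes visible: before rounding, the ratio of the means equals the ratio of the totals exactly, because the common factor 1/N cancels, so the gap between 40.414 and 40.417 comes from rounding the two means first. The mean of the yearly speeds is a slightly different quantity and need not match the other two exactly.

```python
# Values quoted in the solution to Exercise 28.
speeds_kmh = [40.28, 39.56, 40.02, 39.93, 40.94, 40.56, 41.65]
total_distance_km = 24491
total_time_h = 606.007
n = len(speeds_kmh)

mean_of_speeds = sum(speeds_kmh) / n
ratio_of_totals = total_distance_km / total_time_h
# Unrounded means: the factor 1/n cancels, so this matches ratio_of_totals.
ratio_of_means = (total_distance_km / n) / (total_time_h / n)
# Rounded means, as used in the hand computation.
ratio_of_rounded_means = 3499 / 86.572

print(mean_of_speeds, ratio_of_totals, ratio_of_means, ratio_of_rounded_means)
```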
29. The distribution is relatively symmetric as is evidenced by both the histogram and the fact that the mean and median are approximately equal. Therefore, the mean is the better measure of central tendency.
30. The distribution is skewed right, as is evidenced by both the histogram and the fact that the mean is significantly greater than the median. Therefore, the median is the better measure of central tendency.
31. (a) \(\bar{x} \approx 51.1\); \(M = 51\).
(b) The mean is approximately equal to the median, suggesting that the distribution is symmetric, and this is confirmed by the histogram.
32. (a) \(\bar{x} \approx 5.88\) million shares; \(M = 5.58\) million shares.
(b) The mean is greater than the median, suggesting that the distribution is skewed right, and this is confirmed by the histogram.
33. [Histogram: Weight of Plain M&Ms; horizontal axis: Weight (grams), 0.76 to 0.96; vertical axis: Frequency]
\(\bar{x} \approx 0.874\) grams; \(M = 0.88\) grams. The mean is approximately equal to the median, suggesting that the distribution is symmetric. This is confirmed by the histogram (though it does appear to be slightly skewed left). The mean is the better measure of central tendency.
34. [Histogram: Length of Eruptions; horizontal axis: Length (seconds), 90 to 125; vertical axis: Frequency]
\(\bar{x} \approx 104.1\) seconds; \(M = 104\) seconds. The mean is approximately equal to the median, suggesting that the distribution is symmetric. This is confirmed by the histogram. The mean is the better measure of central tendency.
35. [Histogram: Hours Worked per Week; horizontal axis: Hours, 0 to 45; vertical axis: Frequency]
\(\bar{x} = 22\) hours; \(M = 25\) hours. The mean is smaller than the median, suggesting that the distribution is skewed left. This is confirmed by the histogram. The median is the better measure of central tendency.
36. [Histogram: Car Dealer's Profit; horizontal axis: Dollars, 0 to 4500; vertical axis: Number of Sales]
\(\bar{x} \approx \$1392.83\); \(M = \$1177.50\). The mean is significantly greater than the median, suggesting that the distribution is skewed right. This is confirmed by the histogram. The median is the better measure of central tendency.
37. The highest frequency is 12,362, and so the mode region of birth is Central America.
38. The highest frequency is 131, and so the mode offense is Street or Highway.
39. The vote counts are: Bush = 21, Kerry = 17, Nader = 1, and Badnarik = 1. The mode candidate is Bush.
40. The frequencies are: Cancer = 1, Gunshot wound = 8, Assault = 1, Motor vehicle accident = 7, Fall = 2, and Congestive heart failure = 1. The mode diagnosis is Gunshot wound.
41. Sample size of 5:
All data recorded correctly: \(\bar{x} = 99.8\); \(M = 100\).
106 recorded as 160: \(\bar{x} = 110.6\); \(M = 100\).
Sample size of 12:
All data recorded correctly: \(\bar{x} \approx 100.4\); \(M = 101\).
106 recorded as 160: \(\bar{x} \approx 104.9\); \(M = 101\).
Sample size of 30:
All data recorded correctly: \(\bar{x} = 100.6\); \(M = 99\).
106 recorded as 160: \(\bar{x} = 102.4\); \(M = 99\).
For each sample size, the mean becomes larger while the median remains the same. As the sample size increases, the impact of the misrecorded data value on the mean decreases.
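The effect described in Exercise 41 can be illustrated with a small Python sketch. The textbook's data are not reproduced in this solution, so the five values below are a hypothetical sample, chosen only so that its mean (99.8) and median (100) match the summary values quoted above for the sample of size 5.

```python
from statistics import mean, median

# Hypothetical sample (NOT the textbook data), built to match the quoted
# summary values for n = 5: mean 99.8 and median 100.
correct = [94, 97, 100, 102, 106]
# Simulate the recording error: 106 entered as 160.
misrecorded = [160 if x == 106 else x for x in correct]

print(mean(correct), median(correct))
print(mean(misrecorded), median(misrecorded))
```

The mean shifts because the sum changes, while the median is unchanged because the middle position is unaffected.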
42. (a) \(\mu \approx 27.1\) years; \(M = 27\) years; Mode = 26 years
(b) \(\mu = 249.8\) lb; \(M = 245\) lb; Mode = 305 lb
(c) \(\mu \approx 4.6\) years; \(M = 4\) years; Mode = 3 years
(d) The frequency for Purdue is 3. The frequencies of all other colleges are lower than 3, so the mode college attended is Purdue.
(e) Samples and sample means will vary.
(f) Offensive guards: \(\mu = 306.4\) lb; \(M = 305\) lb; Mode = 305 lb
Running backs: \(\mu = 217.8\) lb; \(M = 220\) lb; Mode = 225 lb
Yes, there appear to be differences in the weights of offensive guards and running backs. All three measures of center indicate that offensive guards are significantly heavier than running backs. This is due to the nature of the positions. Offensive guards must be able to protect the quarterback while the running back must be able to run quickly.
(g) It does not make sense to compute the mean player number. The variable “player number” is qualitative, so the quantitative calculations will be meaningless.
43. Samples and sample means will vary.
44. NBA salaries are likely significantly skewed to the right. Therefore, since the median will be lower than the mean, the players would rather use the median salary to support the claim that the average player's salary needs to be increased. The negotiator for the owners would rather use the mean salary.
45. The amount of money lost per visitor is likely skewed to the right. Therefore, the median loss would be less than the mean because the mean amount would be inflated by those few visitors who lost very large amounts of money.
46. The sum of the nineteen readable scores is \(19 \cdot 84 = 1596\). The sum of all twenty scores is \(20 \cdot 82 = 1640\). Therefore, the unreadable score is \(1640 - 1596 = 44\).
47. The sum of the six numbers will be \(6 \cdot 34 = 204\).
48. (a) Median. Home prices are likely skewed right. (b) Mode. The variable “major” is qualitative. (c) Mean. The data are quantitative and symmetric. (d) Median. The data are quantitative and skewed. (e) Median. NFL salaries are likely skewed right. (f) Mode. The variable “requested song” is qualitative.
49. (a) Mean: \(\dfrac{30 + 30 + 45 + 50 + 50 + 50 + 55 + 55 + 60 + 75}{10} = \dfrac{500}{10} = 50\). The mean is $50,000.
Median: The ten data values are in order, so we average the two middle values. \(\dfrac{50 + 50}{2} = \dfrac{100}{2} = 50\). The median is $50,000.
Mode: The mode is $50,000 (the most frequent salary).
(b) Add $2500 ($2.5 thousand) to each salary to form the new data set.
New data set: 32.5, 32.5, 47.5, 52.5, 52.5, 52.5, 57.5, 57.5, 62.5, 77.5
Mean: \(\dfrac{32.5 + 32.5 + 47.5 + 52.5 + 52.5 + 52.5 + 57.5 + 57.5 + 62.5 + 77.5}{10} = \dfrac{525}{10} = 52.5\). The new mean is $52,500.
Median: The ten data values are in order, so we average the two middle values. \(\dfrac{52.5 + 52.5}{2} = \dfrac{105}{2} = 52.5\). The new median is $52,500.
Mode: The new mode is $52,500 (the most frequent new salary).
All three measures of central tendency increased by $2500, which was the amount of the raises.
(c) Multiply each original data value by 1.05 to generate the new data set.
New data set: 31.5, 31.5, 47.25, 52.5, 52.5, 52.5, 57.75, 57.75, 63, 78.75
Mean: \(\dfrac{31.5 + 31.5 + 47.25 + 52.5 + 52.5 + 52.5 + 57.75 + 57.75 + 63 + 78.75}{10} = \dfrac{525}{10} = 52.5\). The new mean is $52,500.
Median: The ten data values are in order, so we average the two middle values. \(\dfrac{52.5 + 52.5}{2} = \dfrac{105}{2} = 52.5\). The new median is $52,500.
Mode: The new mode is $52,500 (the most frequent new salary).
All three measures of central tendency increased by 5%, which was the amount of the raises.
(d) Add $25 thousand to the largest data value to form the new data set.
New data set: 30, 30, 45, 50, 50, 50, 55, 55, 60, 100
Mean: \(\dfrac{30 + 30 + 45 + 50 + 50 + 50 + 55 + 55 + 60 + 100}{10} = \dfrac{525}{10} = 52.5\). The new mean is $52,500.
Median: The ten data values are in order, so we average the two middle values. \(\dfrac{50 + 50}{2} = \dfrac{100}{2} = 50\). The new median is $50,000.
Mode: The new mode is $50,000 (the most frequent salary).
The mean was increased by $2500, but the median and mode remained unchanged.
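The three changes in Exercise 49 (a flat raise, a percentage raise, and a raise for the top salary only) can be compared side by side. The following is a minimal sketch using Python's `statistics` module; the helper `center` and the list names are illustrative, and the 5% case uses floating-point arithmetic, so its results may differ from 52.5 in the last decimal places.

```python
from statistics import mean, median, mode

# Salaries in thousands of dollars, from Exercise 49(a).
salaries = [30, 30, 45, 50, 50, 50, 55, 55, 60, 75]

def center(data):
    """Return (mean, median, mode) of a list of numbers."""
    return mean(data), median(data), mode(data)

flat_raise = [x + 2.5 for x in salaries]         # part (b): add $2.5 thousand
percent_raise = [x * 1.05 for x in salaries]     # part (c): 5% raise
top_raise = salaries[:-1] + [salaries[-1] + 25]  # part (d): largest value only

for label, data in [("original", salaries), ("flat", flat_raise),
                    ("percent", percent_raise), ("top only", top_raise)]:
    print(label, center(data))
```

Shifting or scaling every value moves all three measures together, whereas changing only the extreme value moves the mean alone.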
50. (a) \(\bar{x} = \dfrac{65 + 70 + 71 + 75 + 95}{5} = \dfrac{376}{5} = 75.2\)
(b) The five data values are in order, so the median is the middle value: \(M = 71\).
(c) The distribution is skewed right, so the median is the better measure of central tendency.
(d) Adding 4 to each score gives the following new data set: 69, 74, 75, 79, 99.
\(\bar{x} = \dfrac{69 + 74 + 75 + 79 + 99}{5} = \dfrac{396}{5} = 79.2\)
(e) The curved test score mean is 4 greater than the unadjusted test score mean. Adding 4 to each score increased the mean by 4.
51. The largest data value is 0.94 and the smallest is 0.76. The mean after deleting those two data values is 0.875 grams. (Note: The value 0.94 occurs twice, but we only remove one.) The trimmed mean is more resistant than the regular mean. Note in this case that the trimmed mean 0.875 grams is approximately equal to the median 0.88 grams.
52. Midrange \(= \dfrac{0.76 + 0.94}{2} = \dfrac{1.7}{2} = 0.85\) grams. The midrange is not resistant because it is computed using the two most extreme data values.
3.2 Measures of Dispersion
1. No. In comparing two populations, the larger the standard deviation, the more dispersed the distribution, provided that the variable of interest in both populations has the same unit of measurement. Since 5 inches \(\approx 5 \times 2.54 = 12.7\) centimeters, the distribution with a standard deviation of 5 inches is in fact more dispersed.
2. In the calculation of the sample variance, the degrees of freedom is \(n - 1\), and is used as the divisor in averaging the squared deviations about the mean.
3. All data values are used in computing the standard deviation, including extreme values. Since a statistic is resistant only if it is not influenced by extreme data values, the standard deviation is not resistant.
4. zero
5. A statistic is biased whenever that statistic consistently overestimates or underestimates a parameter.
6. range
7. The standard deviation is the square root of the variance.
8. mean; mean; spread
9. True
10. True
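11. From Section 3.1, Exercise 11, we know \(\bar{x} = 11\).

| Data, \(x_i\) | Sample Mean, \(\bar{x}\) | Deviations, \(x_i - \bar{x}\) | Squared Deviations, \((x_i - \bar{x})^2\) |
|---|---|---|---|
| 20 | 11 | 20 − 11 = 9 | 9² = 81 |
| 13 | 11 | 13 − 11 = 2 | 2² = 4 |
| 4 | 11 | 4 − 11 = −7 | (−7)² = 49 |
| 8 | 11 | 8 − 11 = −3 | (−3)² = 9 |
| 10 | 11 | 10 − 11 = −1 | (−1)² = 1 |
| | | \(\sum (x_i - \bar{x}) = 0\) | \(\sum (x_i - \bar{x})^2 = 144\) |

\(s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{144}{5 - 1} = 36\); \(s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{144}{5 - 1}} = \sqrt{36} = 6\).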
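The defining formula used in Exercise 11 (sum the squared deviations about the sample mean, then divide by n − 1) translates directly into code. The function name `sample_variance` is illustrative; this is a sketch for checking the table, not a replacement for the hand computation.

```python
from math import sqrt

def sample_variance(data):
    """Sample variance via the defining formula: sum((x - xbar)^2) / (n - 1)."""
    n = len(data)
    xbar = sum(data) / n
    squared_deviations = [(x - xbar) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)

data = [20, 13, 4, 8, 10]  # Exercise 11
s2 = sample_variance(data)
s = sqrt(s2)
print(s2, s)
```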
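12. From Section 3.1, Exercise 12, we know \(\bar{x} = 82\).

| Data, \(x_i\) | Sample Mean, \(\bar{x}\) | Deviations, \(x_i - \bar{x}\) | Squared Deviations, \((x_i - \bar{x})^2\) |
|---|---|---|---|
| 83 | 82 | 83 − 82 = 1 | 1² = 1 |
| 65 | 82 | 65 − 82 = −17 | (−17)² = 289 |
| 91 | 82 | 91 − 82 = 9 | 9² = 81 |
| 87 | 82 | 87 − 82 = 5 | 5² = 25 |
| 84 | 82 | 84 − 82 = 2 | 2² = 4 |
| | | \(\sum (x_i - \bar{x}) = 0\) | \(\sum (x_i - \bar{x})^2 = 400\) |

\(s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{400}{5 - 1} = 100\); \(s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{400}{5 - 1}} = \sqrt{100} = 10\).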
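13. From Section 3.1, Exercise 13, we know \(\mu = 9\).

| Data, \(x_i\) | Population Mean, \(\mu\) | Deviations, \(x_i - \mu\) | Squared Deviations, \((x_i - \mu)^2\) |
|---|---|---|---|
| 3 | 9 | 3 − 9 = −6 | (−6)² = 36 |
| 6 | 9 | 6 − 9 = −3 | (−3)² = 9 |
| 10 | 9 | 10 − 9 = 1 | 1² = 1 |
| 12 | 9 | 12 − 9 = 3 | 3² = 9 |
| 14 | 9 | 14 − 9 = 5 | 5² = 25 |
| | | \(\sum (x_i - \mu) = 0\) | \(\sum (x_i - \mu)^2 = 80\) |

\(\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N} = \dfrac{80}{5} = 16\); \(\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}} = \sqrt{\dfrac{80}{5}} = \sqrt{16} = 4\).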
14. From Section 3.1, Exercise 14, we know \(\mu = 15\).

| Data, \(x_i\) | Population Mean, \(\mu\) | Deviations, \(x_i - \mu\) | Squared Deviations, \((x_i - \mu)^2\) |
|---|---|---|---|
| 1 | 15 | 1 − 15 = −14 | (−14)² = 196 |
| 19 | 15 | 19 − 15 = 4 | 4² = 16 |
| 25 | 15 | 25 − 15 = 10 | 10² = 100 |
| 15 | 15 | 15 − 15 = 0 | 0² = 0 |
| 12 | 15 | 12 − 15 = −3 | (−3)² = 9 |
| 16 | 15 | 16 − 15 = 1 | 1² = 1 |
| 28 | 15 | 28 − 15 = 13 | 13² = 169 |
| 13 | 15 | 13 − 15 = −2 | (−2)² = 4 |
| 6 | 15 | 6 − 15 = −9 | (−9)² = 81 |
| | | \(\sum (x_i - \mu) = 0\) | \(\sum (x_i - \mu)^2 = 576\) |
\(\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N} = \dfrac{576}{9} = 64\); \(\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}} = \sqrt{\dfrac{576}{9}} = \sqrt{64} = 8\).
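15. \(\bar{x} = \dfrac{6 + 52 + 13 + 49 + 35 + 25 + 31 + 29 + 31 + 29}{10} = \dfrac{300}{10} = 30\).

| Data, \(x_i\) | Sample Mean, \(\bar{x}\) | Deviations, \(x_i - \bar{x}\) | Squared Deviations, \((x_i - \bar{x})^2\) |
|---|---|---|---|
| 6 | 30 | 6 − 30 = −24 | (−24)² = 576 |
| 52 | 30 | 52 − 30 = 22 | 22² = 484 |
| 13 | 30 | 13 − 30 = −17 | (−17)² = 289 |
| 49 | 30 | 49 − 30 = 19 | 19² = 361 |
| 35 | 30 | 35 − 30 = 5 | 5² = 25 |
| 25 | 30 | 25 − 30 = −5 | (−5)² = 25 |
| 31 | 30 | 31 − 30 = 1 | 1² = 1 |
| 29 | 30 | 29 − 30 = −1 | (−1)² = 1 |
| 31 | 30 | 31 − 30 = 1 | 1² = 1 |
| 29 | 30 | 29 − 30 = −1 | (−1)² = 1 |
| | | \(\sum (x_i - \bar{x}) = 0\) | \(\sum (x_i - \bar{x})^2 = 1764\) |

\(s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{1764}{10 - 1} = 196\); \(s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{1764}{9}} = \sqrt{196} = 14\).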
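16. \(\mu = \dfrac{4 + 10 + 12 + 12 + 13 + 21}{6} = \dfrac{72}{6} = 12\).

| Data, \(x_i\) | Population Mean, \(\mu\) | Deviations, \(x_i - \mu\) | Squared Deviations, \((x_i - \mu)^2\) |
|---|---|---|---|
| 4 | 12 | 4 − 12 = −8 | (−8)² = 64 |
| 10 | 12 | 10 − 12 = −2 | (−2)² = 4 |
| 12 | 12 | 12 − 12 = 0 | 0² = 0 |
| 12 | 12 | 12 − 12 = 0 | 0² = 0 |
| 13 | 12 | 13 − 12 = 1 | 1² = 1 |
| 21 | 12 | 21 − 12 = 9 | 9² = 81 |
| | | \(\sum (x_i - \mu) = 0\) | \(\sum (x_i - \mu)^2 = 150\) |

\(\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N} = \dfrac{150}{6} = 25\); \(\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}} = \sqrt{\dfrac{150}{6}} = \sqrt{25} = 5\).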
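Exercises 11 to 16 alternate between the population formulas (divide by N) and the sample formulas (divide by n − 1). Python's `statistics` module exposes both, which makes the distinction easy to check. The sketch below uses the Exercise 16 data; the sample variance is computed only for contrast and is not part of that exercise.

```python
from statistics import pvariance, pstdev, variance

population = [4, 10, 12, 12, 13, 21]  # Exercise 16

sigma_squared = pvariance(population)  # divides by N
sigma = pstdev(population)             # square root of the population variance
s_squared = variance(population)       # divides by n - 1 (for contrast only)

print(sigma_squared, sigma, s_squared)
```

With the same sum of squared deviations (150), the two divisors give different results, which is why it matters whether the data are treated as a population or as a sample.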
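17. Range = Largest Data Value – Smallest Data Value = 462 – 236 = $226. From Section 3.1, Exercise 17, we know \(\bar{x} = \$381.75\).

| Data, \(x_i\) | Sample Mean, \(\bar{x}\) | Deviations, \(x_i - \bar{x}\) | Squared Deviations, \((x_i - \bar{x})^2\) |
|---|---|---|---|
| 420 | 381.75 | 38.25 | 1463.0625 |
| 462 | 381.75 | 80.25 | 6440.0625 |
| 409 | 381.75 | 27.25 | 742.5625 |
| 236 | 381.75 | −145.75 | 21,243.0625 |
| | | \(\sum (x_i - \bar{x}) = 0\) | \(\sum (x_i - \bar{x})^2 = 29{,}888.75\) |

\(s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{29{,}888.75}{4 - 1} \approx 9{,}962.9\ \$^2\); \(s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{29{,}888.75}{4 - 1}} \approx \$99.81\)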
18. Range = Largest Data Value – Smallest Data Value = 49.26 – 35.34 = $13.92. To calculate the sample variance and the sample standard deviation, we use the computational formula:

| Data value, \(x_i\) | Data value squared, \(x_i^2\) |
|---|---|
| 35.34 | 1248.9156 |
| 42.09 | 1771.5681 |
| 39.43 | 1554.7249 |
| 38.93 | 1515.5449 |
| 43.39 | 1882.6921 |
| 49.26 | 2426.5476 |
| \(\sum x_i = 248.44\) | \(\sum x_i^2 = 10{,}399.9932\) |

\[ s^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{10{,}399.9932 - \dfrac{(248.44)^2}{6}}{6 - 1} \approx 22.584\ \$^2; \qquad s = \sqrt{\frac{10{,}399.9932 - \dfrac{(248.44)^2}{6}}{6 - 1}} \approx \$4.75 \]
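The computational formula used from Exercise 18 onward needs only two running totals, the sum of the values and the sum of their squares. The following minimal sketch applies it to the Exercise 18 prices; the function name `sample_variance_shortcut` is an illustrative assumption.

```python
from math import sqrt

def sample_variance_shortcut(data):
    """Sample variance via (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(data)
    sum_x = sum(data)
    sum_x_squared = sum(x * x for x in data)
    return (sum_x_squared - sum_x ** 2 / n) / (n - 1)

prices = [35.34, 42.09, 39.43, 38.93, 43.39, 49.26]  # Exercise 18
s2 = sample_variance_shortcut(prices)
s = sqrt(s2)
print(round(s2, 3), round(s, 2))
```

Algebraically this gives the same value as the defining formula. In floating-point arithmetic the subtraction of two large, nearly equal numbers can lose precision for data with a large mean and small spread, so the defining formula is often preferred in software.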
19. Range = Largest Data Value – Smallest Data Value = 4090 – 2940 = 1150 psi
To calculate the sample variance and the sample standard deviation, we use the computational formula:

| Data value, \(x_i\) | Data value squared, \(x_i^2\) |
|---|---|
| 3960 | 15,681,600 |
| 4090 | 16,728,100 |
| 3200 | 10,240,000 |
| 3100 | 9,610,000 |
| 2940 | 8,643,600 |
| 3830 | 14,668,900 |
| 4090 | 16,728,100 |
| 4040 | 16,321,600 |
| 3780 | 14,288,400 |
| \(\sum x_i = 33{,}030\) | \(\sum x_i^2 = 122{,}910{,}300\) |

\[ s^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{122{,}910{,}300 - \dfrac{(33{,}030)^2}{9}}{9 - 1} = 211{,}275\ \text{psi}^2; \qquad s = \sqrt{\frac{122{,}910{,}300 - \dfrac{(33{,}030)^2}{9}}{9 - 1}} \approx 459.6\ \text{psi} \]
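20. Range = Largest Data Value – Smallest Data Value = 282 – 257 = 25 minutes. From Section 3.1, Exercise 20, we know \(\bar{x} = 266\) minutes.

| Data, \(x_i\) | Sample Mean, \(\bar{x}\) | Deviations, \(x_i - \bar{x}\) | Squared Deviations, \((x_i - \bar{x})^2\) |
|---|---|---|---|
| 282 | 266 | 16 | 256 |
| 270 | 266 | 4 | 16 |
| 260 | 266 | −6 | 36 |
| 266 | 266 | 0 | 0 |
| 257 | 266 | −9 | 81 |
| 260 | 266 | −6 | 36 |
| 267 | 266 | 1 | 1 |
| | | \(\sum (x_i - \bar{x}) = 0\) | \(\sum (x_i - \bar{x})^2 = 426\) |

\(s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{426}{7 - 1} = 71\ \text{min}^2\); \(s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{426}{7 - 1}} \approx 8.4\) min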
21. Histogram (b) depicts a higher standard deviation because the data is more dispersed, with data values ranging from 30 to 75. Histogram (a)’s data values only range from 40 to 60.
22. (a) III, because it is centered between 52 and 57 and has the greatest amount of dispersion of the three histograms with mean = 53.
(b) I, because it is centered near 53 and its dispersion is consistent with 1.3s = but not with 0.12s = or 9s = .
(c) IV, because it is centered near 53 and it has the least dispersion of the three histograms with mean = 53.
(d) II, because it has a center near 60.
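23. Los Angeles ATM fees: Range = Largest Data Value – Smallest Data Value = 2.00 – 0.00 = $2.00.

| Data value, \(x_i\) | Data value squared, \(x_i^2\) |
|---|---|
| 2.00 | 4 |
| 1.50 | 2.25 |
| 1.50 | 2.25 |
| 1.00 | 1 |
| 1.50 | 2.25 |
| 2.00 | 4 |
| 0.00 | 0 |
| 2.00 | 4 |
| \(\sum x_i = 11.5\) | \(\sum x_i^2 = 19.75\) |

\[ s = \sqrt{\frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{19.75 - \dfrac{(11.5)^2}{8}}{8 - 1}} \approx \$0.68 \]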
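New York City ATM fees: Range = Largest Data Value – Smallest Data Value = 1.50 – 0.00 = $1.50.

| Data value, \(x_i\) | Data value squared, \(x_i^2\) |
|---|---|
| 1.50 | 2.25 |
| 1.00 | 1 |
| 1.00 | 1 |
| 1.25 | 1.5625 |
| 1.25 | 1.5625 |
| 1.50 | 2.25 |
| 1.00 | 1 |
| 0.00 | 0 |
| \(\sum x_i = 8.5\) | \(\sum x_i^2 = 10.625\) |

\[ s = \sqrt{\frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{10.625 - \dfrac{(8.5)^2}{8}}{8 - 1}} \approx \$0.48 \]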
Based on both the range and the standard deviation, ATM fees in Los Angeles have more dispersion than ATM fees in New York. Both the range and the standard deviation for Los Angeles are larger.
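24. Reaction Time to Blue: Range = Largest Data Value – Smallest Data Value = 0.841 – 0.267 = 0.574 sec.

| Data value, \(x_i\) | Data value squared, \(x_i^2\) |
|---|---|
| 0.582 | 0.338724 |
| 0.481 | 0.231361 |
| 0.841 | 0.707281 |
| 0.267 | 0.071289 |
| 0.685 | 0.469225 |
| 0.45 | 0.2025 |
| \(\sum x_i = 3.306\) | \(\sum x_i^2 = 2.02038\) |

\[ s = \sqrt{\frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{2.02038 - \dfrac{(3.306)^2}{6}}{6 - 1}} \approx 0.1994\ \text{sec.} \]

Reaction Time to Red: Range = Largest Data Value – Smallest Data Value = 0.542 – 0.402 = 0.140 sec.

| Data value, \(x_i\) | Data value squared, \(x_i^2\) |
|---|---|
| 0.408 | 0.166464 |
| 0.407 | 0.165649 |
| 0.542 | 0.293764 |
| 0.402 | 0.161604 |
| 0.456 | 0.207936 |
| 0.533 | 0.284089 |
| \(\sum x_i = 2.748\) | \(\sum x_i^2 = 1.279506\) |

\[ s = \sqrt{\frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{1.279506 - \dfrac{(2.748)^2}{6}}{6 - 1}} \approx 0.0647\ \text{sec.} \]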
Based on both the range and the standard deviation, the reaction times for blue have more variability than those for red. Both the range and the standard deviation for blue are larger.
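25. (a) We use the computational formula: \(\sum x_i = 650\); \(\sum x_i^2 = 47{,}474\); \(N = 9\);
\[ \sigma^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{N}}{N} = \frac{47{,}474 - \dfrac{(650)^2}{9}}{9} \approx 58.8\ (\text{beats/min})^2; \qquad \sigma = \sqrt{\frac{47{,}474 - \dfrac{(650)^2}{9}}{9}} \approx 7.7\ \text{beats/min} \]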
(b) Samples, sample variances, and sample standard deviations will vary. (c) Answers will vary.
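26. (a) We use the computational formula: \(\sum x_i = 238\); \(\sum x_i^2 = 7778\); \(N = 9\);
\[ \sigma^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{N}}{N} = \frac{7778 - \dfrac{(238)^2}{9}}{9} \approx 164.9\ \text{min}^2; \qquad \sigma = \sqrt{\frac{7778 - \dfrac{(238)^2}{9}}{9}} \approx 12.8\ \text{min} \]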
(b) Samples, sample variances, and sample standard deviations will vary. (c) Answers will vary.
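27. (a) We use the computational formula: \(\sum x_i = 157\); \(\sum x_i^2 = 2107\); \(N = 18\);
\[ \sigma^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{N}}{N} = \frac{2107 - \dfrac{(157)^2}{18}}{18} \approx 41.0\ \text{goals}^2; \qquad \sigma = \sqrt{\frac{2107 - \dfrac{(157)^2}{18}}{18}} \approx 6.4\ \text{goals} \]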
(b) Samples, sample variances, and sample standard deviations will vary. (c) Answers will vary.
28. (a) Range = Largest Data Value – Smallest Data Value = 92.552 – 82.087 = 10.465 hours. For the population variance and standard deviation, we use the computational formula:
\(\sum x_i = 606.007\); \(\sum x_i^2 = 52{,}561.3666\); \(N = 7\);
\[ \sigma^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{N}}{N} = \frac{52{,}561.3666 - \dfrac{(606.007)^2}{7}}{7} \approx 13.981\ \text{hours}^2; \qquad \sigma = \sqrt{\frac{52{,}561.3666 - \dfrac{(606.007)^2}{7}}{7}} \approx 3.739\ \text{hours} \]
(b) Range = Largest Data Value – Smallest Data Value = 3687 – 3278 = 409 km. For the population variance and standard deviation, we use the computational formula:
\(\sum x_i = 24{,}491\); \(\sum x_i^2 = 85{,}825{,}565\); \(N = 7\);
\[ \sigma^2 = \frac{85{,}825{,}565 - \dfrac{(24{,}491)^2}{7}}{7} \approx 19{,}793.3\ \text{km}^2; \qquad \sigma = \sqrt{\frac{85{,}825{,}565 - \dfrac{(24{,}491)^2}{7}}{7}} \approx 140.7\ \text{km} \]
(c) Range = Largest Data Value – Smallest Data Value = 7.617 – 1.017 = 6.600 min. For the population variance and standard deviation, we use the computational formula:
\(\sum x_i = 39.667\); \(\sum x_i^2 = 255.510823\); \(N = 7\);
\[ \sigma^2 = \frac{255.510823 - \dfrac{(39.667)^2}{7}}{7} \approx 4.390\ \text{min}^2; \qquad \sigma = \sqrt{\frac{255.510823 - \dfrac{(39.667)^2}{7}}{7}} \approx 2.095\ \text{min} \]
(d) Range = Largest Data Value – Smallest Data Value = 41.65 – 39.56 = 2.09 km/h. For the population variance and standard deviation, we use the computational formula:
\(\sum x_i = 282.94\); \(\sum x_i^2 = 11{,}439.397\); \(N = 7\);
\[ \sigma^2 = \frac{11{,}439.397 - \dfrac{(282.94)^2}{7}}{7} \approx 0.423\ (\text{km/h})^2; \qquad \sigma = \sqrt{\frac{11{,}439.397 - \dfrac{(282.94)^2}{7}}{7}} \approx 0.651\ \text{km/h} \]
29. (a) Ethan: \(\mu = \dfrac{\sum x_i}{N} = \dfrac{9 + 24 + 8 + 9 + 5 + 8 + 9 + 10 + 8 + 10}{10} = \dfrac{100}{10} = 10\) fish;
Range = Largest Data Value – Smallest Data Value = 24 – 5 = 19 fish
Drew: \(\mu = \dfrac{\sum x_i}{N} = \dfrac{15 + 2 + 3 + 18 + 20 + 1 + 17 + 2 + 19 + 3}{10} = \dfrac{100}{10} = 10\) fish;
Range = Largest Data Value – Smallest Data Value = 20 – 1 = 19 fish
Both fishermen have the same mean and range, so these values do not indicate any differences between their catches per day.
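(b) Ethan: \(\sum x_i = 100\); \(\sum x_i^2 = 1236\); \(N = 10\)
\[ \sigma = \sqrt{\frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{N}}{N}} = \sqrt{\frac{1236 - \dfrac{(100)^2}{10}}{10}} \approx 4.9\ \text{fish} \]
Drew: \(\sum x_i = 100\); \(\sum x_i^2 = 1626\); \(N = 10\)
\[ \sigma = \sqrt{\frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{N}}{N}} = \sqrt{\frac{1626 - \dfrac{(100)^2}{10}}{10}} \approx 7.9\ \text{fish} \]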
Yes, now there appears to be a difference in the two fishermen’s records. Ethan had a more consistent fishing record, which is indicated by the smaller standard deviation.
(c) Answers will vary. One possibility follows: The range is limited as a measure of dispersion because it does not take all of the data values into account. It is obtained by using only the two most extreme data values. Since the standard deviation utilizes all of the data values, it provides a better overall representation of dispersion.
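The point of Exercise 29, that two data sets can share a mean and a range yet differ in spread, can be verified with a short Python sketch. The catches are copied from part (a); the helper `data_range` is an illustrative name.

```python
from statistics import mean, pstdev

# Daily catches from Exercise 29(a), treated as populations.
ethan = [9, 24, 8, 9, 5, 8, 9, 10, 8, 10]
drew = [15, 2, 3, 18, 20, 1, 17, 2, 19, 3]

def data_range(data):
    """Largest data value minus smallest data value."""
    return max(data) - min(data)

for name, catches in [("Ethan", ethan), ("Drew", drew)]:
    print(name, mean(catches), data_range(catches), round(pstdev(catches), 1))
```

Only the standard deviation separates the two records, because it uses every data value rather than just the two extremes.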
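30. (a) Range = Largest Data Value – Smallest Data Value = 349 – 180 = 169 lb
\(\sum x_i = 8591\); \(\sum x_i^2 = 2{,}332{,}051\); \(N = 33\); \(\mu = \dfrac{\sum x_i}{N} = \dfrac{8591}{33} \approx 260.3\) lb
\[ \sigma = \sqrt{\frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{N}}{N}} = \sqrt{\frac{2{,}332{,}051 - \dfrac{(8591)^2}{33}}{33}} \approx 53.8\ \text{lb} \]
(b) Range = Largest Data Value – Smallest Data Value = 306 – 177 = 129 lb
\(\sum x_i = 5889\); \(\sum x_i^2 = 1{,}481{,}833\); \(N = 24\); \(\mu = \dfrac{\sum x_i}{N} = \dfrac{5889}{24} \approx 245.4\) lb
\[ \sigma = \sqrt{\frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{N}}{N}} = \sqrt{\frac{1{,}481{,}833 - \dfrac{(5889)^2}{24}}{24}} \approx 39.2\ \text{lb} \]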
(c) The weights of the offense have the greater dispersion. The offense has both the larger range and the larger standard deviation.
31. Range = Largest Data Value – Smallest Data Value = 73 – 28 = 45. For the sample variance and sample standard deviation, we use the computational formula:
\(\sum x_i = 2045\); \(\sum x_i^2 = 109{,}151\); \(n = 40\);
\[ s^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{109{,}151 - \dfrac{(2045)^2}{40}}{40 - 1} \approx 118.0; \qquad s = \sqrt{\frac{109{,}151 - \dfrac{(2045)^2}{40}}{40 - 1}} \approx 10.9 \]
32. Range = Largest Data Value – Smallest Data Value = 10.96 – 3.01 = 7.95 million shares. For the sample variance and sample standard deviation, we use the computational formula:
\(\sum x_i = 205.92\); \(\sum x_i^2 = 1355.6208\); \(n = 35\);
\[ s^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{1355.6208 - \dfrac{(205.92)^2}{35}}{35 - 1} \approx 4.238\ (\text{million shares})^2; \qquad s = \sqrt{\frac{1355.6208 - \dfrac{(205.92)^2}{35}}{35 - 1}} \approx 2.059\ \text{million shares} \]
33. (a) We use the computational formula: \(\sum x_i = 43.71\); \(\sum x_i^2 = 38.2887\); \(n = 50\);
\[ s = \sqrt{\frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{38.2887 - \dfrac{(43.71)^2}{50}}{50 - 1}} \approx 0.04\ \text{g} \]
(b) The histogram is approximately symmetric, so the Empirical Rule is applicable.
(c) Since 0.79 is exactly 2 standard deviations below the mean [0.79 = 0.87 – 2(0.04)] and 0.95 is exactly 2 standard deviations above the mean [0.95 = 0.87 + 2(0.04)], the Empirical Rule predicts that approximately 95% of the M&Ms will weigh between 0.79 and 0.95 grams.
(d) All except 1 of the M&Ms weigh between 0.79 and 0.95 grams. Thus, the actual percentage is 49/50 = 98%.
(e) Since 0.91 is exactly 1 standard deviation above the mean [0.91 = 0.87 + 0.04], the Empirical Rule predicts that 13.5% + 2.35% + 0.15% = 16% of the M&Ms will weigh more than 0.91 grams.
(f) Seven of the M&Ms weigh more than 0.91 grams (not including the ones that weigh exactly 0.91 grams). Thus, the actual percentage is 7/50 = 14%.
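The Empirical Rule reasoning in parts (c) and (e) can be sketched in Python. The coverage figures 68%, 95%, and 99.7% are the usual Empirical Rule approximations; the names `COVERAGE`, `interval`, and `one_tail` are illustrative assumptions, and the rule itself applies only to roughly bell-shaped distributions.

```python
# Approximate Empirical Rule coverage within k standard deviations of the mean.
COVERAGE = {1: 0.68, 2: 0.95, 3: 0.997}

def interval(mean, sd, k):
    """Endpoints of the interval mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd

def one_tail(k):
    """Approximate share of data beyond k standard deviations on ONE side."""
    return (1 - COVERAGE[k]) / 2

# Exercise 33: mean 0.87 g, standard deviation 0.04 g.
low, high = interval(0.87, 0.04, 2)
print(round(low, 2), round(high, 2), COVERAGE[2])  # part (c)
print(round(one_tail(1), 2))                       # part (e)
```

The one-tail value for k = 1 matches the 13.5% + 2.35% + 0.15% = 16% breakdown used in part (e).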
34. (a) We use the computational formula: \(\sum x_i = 4582\); \(\sum x_i^2 = 478{,}832\); \(n = 44\);
\[ s = \sqrt{\frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{478{,}832 - \dfrac{(4582)^2}{44}}{44 - 1}} \approx 6\ \text{sec.} \]
(b) The histogram is approximately symmetric, so the Empirical Rule is applicable.
(c) Since 92 is exactly 2 standard deviations below the mean [92 = 104 – 2(6)] and 116 is exactly 2 standard deviations above the mean [116 = 104 + 2(6)], the Empirical Rule predicts that approximately 95% of the eruptions should last between 92 and 116 sec.
(d) All except 3 of the observed eruptions lasted between 92 and 116 seconds. Thus, the actual percentage is \(41/44 \approx 93\%\).
(e) Since 98 is exactly 1 standard deviation below the mean [98 = 104 – 6], the Empirical Rule predicts that 13.5% + 2.35% + 0.15% = 16% of the eruptions will last less than 98 sec.
(f) Five of the observed eruptions lasted less than 98 seconds. Thus, the actual percentage is \(5/44 \approx 11\%\).
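35. Car 1: \(\sum x_i = 3352\); \(\sum x_i^2 = 755{,}712\); \(n = 15\)
Measures of Center: \(\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{3352}{15} \approx 223.5\) miles; Mode: none; \(M = 223\) miles (the 8th value in the ordered data)
Measures of Dispersion: Range = Largest Data Value – Smallest Data Value = 271 – 178 = 93 miles;
\[ s^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{755{,}712 - \dfrac{(3352)^2}{15}}{15 - 1} \approx 475.1\ \text{miles}^2; \qquad s = \sqrt{\frac{755{,}712 - \dfrac{(3352)^2}{15}}{15 - 1}} \approx 21.8\ \text{miles} \]
Car 2: \(\sum x_i = 3558\); \(\sum x_i^2 = 877{,}654\); \(n = 15\)
Measures of Center: \(\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{3558}{15} = 237.2\) miles; Mode: none; \(M = 230\) miles (the 8th value in the ordered data)
Measures of Dispersion: Range = Largest Data Value – Smallest Data Value = 326 – 160 = 166 miles;
\[ s^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{877{,}654 - \dfrac{(3558)^2}{15}}{15 - 1} \approx 2406.9\ \text{miles}^2; \qquad s = \sqrt{\frac{877{,}654 - \dfrac{(3558)^2}{15}}{15 - 1}} \approx 49.1\ \text{miles} \]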
The distribution for Car 1 is symmetric since the mean and median are approximately equal. The distribution for Car 2 is skewed right slightly since the mean is larger than the median. Both distributions have similar measures of center, but Car 2 has more dispersion which can be seen by its larger range, variance, and standard deviation. This means that the distance Car 1 can be driven on 10 gallons of gas is more consistent. Thus, Car 1 is probably the better car to buy.
36. Fund A: \(\sum x_i = 61\); \(\sum x_i^2 = 356.12\); \(n = 20\)
Measures of Center: \(\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{61}{20} = 3.05\); Mode: none; \(M = \dfrac{3.0 + 3.1}{2} = 3.05\);
Measures of Dispersion: Range = Largest Data Value – Smallest Data Value = \(8.6 - (-2.3) = 10.9\);
\[ s^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{356.12 - \dfrac{(61)^2}{20}}{20 - 1} \approx 8.95; \qquad s = \sqrt{\frac{356.12 - \dfrac{(61)^2}{20}}{20 - 1}} \approx 2.99 \]
Fund B: \(\sum x_i = 68.1\); \(\sum x_i^2 = 825.27\); \(n = 20\)
Measures of Center: \(\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{68.1}{20} \approx 3.41\); Mode = 4.3; \(M = \dfrac{3.5 + 3.8}{2} = 3.65\)
Measures of Dispersion: Range = Largest Data Value – Smallest Data Value = \(12.9 - (-6.7) = 19.6\);
\[ s^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{825.27 - \dfrac{(68.1)^2}{20}}{20 - 1} \approx 31.23; \qquad s = \sqrt{\frac{825.27 - \dfrac{(68.1)^2}{20}}{20 - 1}} \approx 5.59 \]
The distribution for Mutual Fund A is symmetric since the mean and median are equal. Likewise, the distribution for Mutual Fund B is approximately symmetric (but skewed left slightly since the mean is smaller than the median). Mutual Fund B has a larger measure of center and greater dispersion, which can be seen by its larger range, variance, and standard deviation. This means that the rate of return on Mutual Fund A is generally lower, but more consistent. The rate of return on Mutual Fund B is generally higher, but more dispersed.
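37. (a) Financial Stocks: \(\sum x_i = 502.9\); \(\sum x_i^2 = 9591.0556\); \(n = 32\)
\(\bar{x} = \frac{\sum x_i}{n} = \frac{502.9}{32} \approx 15.716\); \(M = \frac{15.92 + 16.26}{2} = 16.09\)
Energy Stocks: \(\sum x_i = 719.4\); \(\sum x_i^2 = 21{,}213.3104\); \(n = 32\)
\(\bar{x} = \frac{\sum x_i}{n} = \frac{719.4}{32} \approx 22.481\); \(M = \frac{19.50 + 19.67}{2} = 19.585\)
Energy Stocks have higher mean and median rates of return.
(b) Financial Stocks:
\[ s = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{9591.0556 - \frac{(502.9)^2}{32}}{32-1}} \approx 7.378 \]
Energy Stocks:
\[ s = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{21{,}213.3104 - \frac{(719.4)^2}{32}}{32-1}} \approx 12.751 \]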
Energy Stocks are riskier since they have a larger standard deviation.
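38. (a) American League: \(\sum x_i = 166.26\); \(\sum x_i^2 = 715.1876\); \(n = 40\)
\(\bar{x} = \frac{\sum x_i}{n} = \frac{166.26}{40} \approx 4.157\); \(M = \frac{4.18 + 4.21}{2} = 4.195\)
National League: \(\sum x_i = 149.93\); \(\sum x_i^2 = 576.4971\); \(n = 40\)
\(\bar{x} = \frac{\sum x_i}{n} = \frac{149.93}{40} \approx 3.748\); \(M = \frac{3.84 + 3.87}{2} = 3.855\)
The American League has both the higher mean and median earned-run average.
(b) American League:
\[ s = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{715.1876 - \frac{(166.26)^2}{40}}{40-1}} \approx 0.787 \]
National League:
\[ s = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{576.4971 - \frac{(149.93)^2}{40}}{40-1}} \approx 0.610 \]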
The American League has more dispersion.
39. (a) Since 70 is exactly 2 standard deviations below the mean [70 = 100 – 2(15)] and 130 is exactly 2 standard deviations above the mean [130 = 100 + 2(15)], the Empirical Rule predicts that approximately 95% of people have an IQ score between 70 and 130.
(b) Since about 95% of people have an IQ score between 70 and 130, approximately 5% of people have an IQ score either less than 70 or greater than 130.
(c) Approximately 5% / 2 = 2.5% of people have an IQ score greater than 130.
40. (a) Since 404 is exactly 1 standard deviation below the mean [404 = 518 – 114] and 632 is exactly 1 standard deviation above the mean [632 = 518 + 114], the Empirical Rule predicts that approximately 68% of SAT scores are between 404 and 632.
(b) Since about 68% of SAT scores are between 404 and 632, approximately 32% of SAT scores are either less than 404 or greater than 632.
(c) Since 746 is exactly 2 standard deviations above the mean [746 = 518 + 2(114)], the Empirical Rule predicts that approximately 2.5% of SAT scores are greater than 746.
41. (a) Approximately 95% of the data will be within 2 standard deviations of the mean. Now, 325 – 2(30) = 265 and 325 + 2(30) = 385. Thus, about 95% of pairs of kidneys will weigh between 265 and 385 grams.
(b) Since 235 is exactly 3 standard deviations below the mean [235 = 325 – 3(30)] and 415 is exactly 3 standard deviations above the mean [415 = 325 + 3(30)], the Empirical Rule predicts that about 99.7% of pairs of kidneys weigh between 235 and 415 grams.
(c) Since about 99.7% of pairs of kidneys weigh between 235 and 415 grams, about 0.3% of pairs of kidneys weigh either less than 235 or more than 415 grams.
(d) Since 295 is exactly 1 standard deviation below the mean [295 = 325 – 30] and 385 is exactly 2 standard deviations above the mean [385 = 325 + 2(30)], the Empirical Rule predicts that approximately 34% + 34% + 13.5% = 81.5% of pairs of kidneys weigh between 295 and 385 grams.
42. (a) Approximately 68% of the data will be within 1 standard deviation of the mean. Now, 4 – 0.007 = 3.993 and 4 + 0.007 = 4.007. Thus, about 68% of bolts manufactured will be between 3.993 and 4.007 inches long.
(b) Since 3.986 is exactly 2 standard deviations below the mean [3.986 = 4 – 2(0.007)] and 4.014 is exactly 2 standard deviations above the mean [4.014 = 4 + 2(0.007)], the Empirical Rule predicts that about 95% of bolts manufactured will be between 3.986 and 4.014 inches long.
(c) Since about 95% of bolts are between 3.986 and 4.014 inches, about 5% of bolts manufactured will be either shorter than 3.986 or longer than 4.014 inches. That is, about 5% of the bolts will be discarded.
(d) Since 4.007 is exactly 1 standard deviation above the mean [4.007 = 4 + 0.007] and 4.021 is exactly 3 standard deviations above the mean [4.021 = 4 + 3(0.007)], the Empirical Rule predicts that approximately 13.5% + 2.35% = 15.85% of bolts manufactured will be between 4.007 and 4.021 inches long.
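The band-by-band additions used in Problems 39 to 42 can be sketched in a few lines of Python. This is only an illustration under the Empirical Rule's usual approximations (34%, 13.5%, and 2.35% per one-standard-deviation band); the names `BANDS` and `empirical_rule_percent` are hypothetical.

```python
# Approximate percent of a bell-shaped distribution in each
# one-standard-deviation band, per the Empirical Rule.
BANDS = {(-3, -2): 2.35, (-2, -1): 13.5, (-1, 0): 34.0,
         (0, 1): 34.0, (1, 2): 13.5, (2, 3): 2.35}

def empirical_rule_percent(z_low, z_high):
    """Approximate percent of data between two integer z-scores in [-3, 3]."""
    return sum(p for (lo, hi), p in BANDS.items()
               if z_low <= lo and hi <= z_high)

print(empirical_rule_percent(-1, 2))  # kidneys between 295 and 385 grams
print(empirical_rule_percent(1, 3))   # bolts between 4.007 and 4.021 inches
```

Only whole-number z-scores are supported, which matches how the rule is applied in these solutions.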
43. (a) By Chebyshev’s inequality, at least \(\left(1 - \frac{1}{k^2}\right)\cdot 100\% = \left(1 - \frac{1}{3^2}\right)\cdot 100\% \approx 88.9\%\) of gas stations have prices within 3 standard deviations of the mean.
(b) By Chebyshev’s inequality, at least \(\left(1 - \frac{1}{k^2}\right)\cdot 100\% = \left(1 - \frac{1}{2.5^2}\right)\cdot 100\% = 84\%\) of gas stations have prices within k = 2.5 standard deviations of the mean. Now, 1.37 – 2.5(0.05) = 1.245 and 1.37 + 2.5(0.05) = 1.495. Thus, the gasoline prices that are within 2.5 standard deviations of the mean are from $1.245 to $1.495.
(c) Since 1.27 is exactly k = 2 standard deviations below the mean [1.27 = 1.37 – 2(0.05)] and 1.47 is exactly k = 2 standard deviations above the mean [1.47 = 1.37 + 2(0.05)], Chebyshev’s theorem predicts that at least \(\left(1 - \frac{1}{k^2}\right)\cdot 100\% = \left(1 - \frac{1}{2^2}\right)\cdot 100\% = 75\%\) of gas stations have prices between $1.27 and $1.47 per gallon.
44. (a) By Chebyshev’s inequality, at least \(\left(1 - \frac{1}{k^2}\right)\cdot 100\% = \left(1 - \frac{1}{2^2}\right)\cdot 100\% = 75\%\) of commuters in Boston have a commute time within 2 standard deviations of the mean.
(b) By Chebyshev’s inequality, at least \(\left(1 - \frac{1}{k^2}\right)\cdot 100\% = \left(1 - \frac{1}{1.5^2}\right)\cdot 100\% \approx 55.6\%\) of commuters in Boston have a commute time within 1.5 standard deviations of the mean. Now, 27.3 – 1.5(8.1) = 15.15 and 27.3 + 1.5(8.1) = 39.45. Thus, the commute times within 1.5 standard deviations of the mean are from 15.15 to 39.45 minutes.
(c) Since 3 is exactly k = 3 standard deviations below the mean [3 = 27.3 – 3(8.1)] and 51.6 is exactly k = 3 standard deviations above the mean [51.6 = 27.3 + 3(8.1)], Chebyshev’s theorem predicts that at least \(\left(1 - \frac{1}{k^2}\right)\cdot 100\% = \left(1 - \frac{1}{3^2}\right)\cdot 100\% \approx 88.9\%\) of commuters in Boston have a commute time between 3 and 51.6 minutes.
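The Chebyshev calculations in Problems 43 and 44 follow one pattern: a lower bound of \(\left(1 - 1/k^2\right)\cdot 100\%\) and an interval of the mean plus or minus k standard deviations. A minimal Python sketch, with hypothetical function names chosen for illustration:

```python
def chebyshev_min_percent(k):
    """Lower bound (in percent) on the share of data within k standard
    deviations of the mean; the bound is only informative for k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality requires k > 1")
    return (1 - 1 / k ** 2) * 100

def k_sd_interval(mean, sd, k):
    """Endpoints of the interval mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd

# Boston commute times from Problem 44: mean 27.3, standard deviation 8.1
print(round(chebyshev_min_percent(1.5), 1))
print(k_sd_interval(27.3, 8.1, 1.5))
```

Unlike the Empirical Rule, this bound makes no assumption about the shape of the distribution, which is why it is stated as "at least".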
45. When calculating the variability in team batting averages, we are finding the variability of means. When calculating the variability of all players, we are finding the variability of individuals. Since there is more variability among individuals than among means, the teams will have less variability.
46. (a) Range = Largest Data Value – Smallest Data Value = 75 – 30 = $45 thousand. For the population variance and standard deviation, we use the computational formula:
\(\sum x_i = 500\); \(\sum x_i^2 = 26{,}600\); \(N = 10\);
\[ \sigma^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N} = \frac{26{,}600 - \frac{(500)^2}{10}}{10} = 160 \text{ (thousand \$)}^2; \]
\[ \sigma = \sqrt{\frac{26{,}600 - \frac{(500)^2}{10}}{10}} \approx \$12.6 \text{ thousand} \]
(b) Add $2500 ($2.5 thousand) to each salary to form the new data set.
New data set: 32.5, 32.5, 47.5, 52.5, 52.5, 52.5, 57.5, 57.5, 62.5, 77.5
Range = Largest Data Value – Smallest Data Value = 77.5 – 32.5 = $45 thousand.
\(\sum x_i = 525\); \(\sum x_i^2 = 29{,}162.5\); \(N = 10\);
\[ \sigma^2 = \frac{29{,}162.5 - \frac{(525)^2}{10}}{10} = 160 \text{ (thousand \$)}^2; \]
\[ \sigma = \sqrt{\frac{29{,}162.5 - \frac{(525)^2}{10}}{10}} \approx \$12.6 \text{ thousand} \]
All three measures of variability remain the same.
(c) Multiply each original data value by 1.05 to generate the new data set.
New data set: 31.5, 31.5, 47.25, 52.5, 52.5, 52.5, 57.75, 57.75, 63, 78.75
Range = Largest Data Value – Smallest Data Value = 78.75 – 31.5 = $47.25 thousand.
\(\sum x_i = 525\); \(\sum x_i^2 = 29{,}326.5\); \(N = 10\);
\[ \sigma^2 = \frac{29{,}326.5 - \frac{(525)^2}{10}}{10} = 176.4 \text{ (thousand \$)}^2; \]
\[ \sigma = \sqrt{\frac{29{,}326.5 - \frac{(525)^2}{10}}{10}} \approx \$13.3 \text{ thousand} \]
All three measures of variability are larger than the original, showing greater dispersion of salaries. (Note that R and \(\sigma\) are each 5% larger than the original, and \(\sigma^2\) is 1.1025 times the original, which is \((1.05)^2\).)
(d) Add $25 thousand to the largest data value to form the new data set.
New data set: 30, 30, 45, 50, 50, 50, 55, 55, 60, 100
Range = Largest Data Value – Smallest Data Value = 100 – 30 = $70 thousand.
\(\sum x_i = 525\); \(\sum x_i^2 = 30{,}975\); \(N = 10\);
\[ \sigma^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N} = \frac{30{,}975 - \frac{(525)^2}{10}}{10} = 341.25 \text{ (thousand \$)}^2; \]
\[ \sigma = \sqrt{\frac{30{,}975 - \frac{(525)^2}{10}}{10}} \approx \$18.5 \text{ thousand} \]
All three measures of variability are significantly larger than original.
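The pattern in Problem 46 (adding a constant leaves the spread unchanged, multiplying by a constant rescales it) can be checked with Python's standard `statistics` module. In this sketch the original salary list is recovered by subtracting 2.5 from the part (b) data set; the helper name `spread` is an illustrative assumption.

```python
import statistics

# Original salaries in thousands of dollars (part (b) data minus 2.5)
salaries = [30, 30, 45, 50, 50, 50, 55, 55, 60, 75]

def spread(data):
    """Return (range, population variance, population standard deviation)."""
    return (max(data) - min(data),
            statistics.pvariance(data),
            statistics.pstdev(data))

shifted = [x + 2.5 for x in salaries]    # part (b): add $2.5 thousand
scaled = [x * 1.05 for x in salaries]    # part (c): 5% raise

for name, data in [("original", salaries), ("shifted", shifted), ("scaled", scaled)]:
    r, var, sd = spread(data)
    print(name, round(r, 2), round(var, 2), round(sd, 1))
```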
47. Sample size of 5: All data recorded correctly: s ≈ 5.3. 106 recorded incorrectly as 160: s ≈ 27.9.
Sample size of 12: All data recorded correctly: s ≈ 14.7. 106 recorded incorrectly as 160: s ≈ 22.7.
Sample size of 30: All data recorded correctly: s ≈ 15.9. 106 recorded incorrectly as 160: s ≈ 19.2.
As the sample size increases, the impact of the misrecorded data value on the standard deviation decreases.
48. We use the computational formula: \(\sum x_i = 312\); \(\sum x_i^2 = 24{,}336\); \(n = 4\);
\[ s = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{24{,}336 - \frac{(312)^2}{4}}{4-1}} = 0 \]
If all values in a data set are identical, then there is zero variance.
49. (a) The coefficient of variation for blood pressure before exercise is \(\frac{14.1}{121}\cdot 100\% \approx 11.65\%\), while the coefficient of variation for blood pressure after exercise is \(\frac{18.1}{135.9}\cdot 100\% \approx 13.32\%\). There is more variability in systolic blood pressure after exercise.
(b) The coefficient of variation for free calcium concentration in the group of people with normal blood pressure is \(\frac{16.1}{107.9}\cdot 100\% \approx 14.92\%\), while the coefficient of variation for free calcium concentration in the group of people with high blood pressure is \(\frac{31.7}{168.2}\cdot 100\% \approx 18.85\%\). There is more variability in free calcium concentration in the high blood pressure group.
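The coefficient of variation is simply the standard deviation expressed as a percentage of the mean. A short Python sketch reproducing the four values in Problem 49 (the function name is illustrative):

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

print(round(coefficient_of_variation(14.1, 121), 2))    # before exercise
print(round(coefficient_of_variation(18.1, 135.9), 2))  # after exercise
print(round(coefficient_of_variation(16.1, 107.9), 2))  # normal blood pressure
print(round(coefficient_of_variation(31.7, 168.2), 2))  # high blood pressure
```

Because the result is unitless, it allows the spreads of variables with different means or units to be compared.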
50. From Section 3.1, Exercise 17, we know \(\bar{x} = \$381.75\).
Data, x_i   Sample Mean, x̄   Deviations, x_i − x̄   Absolute Deviations, |x_i − x̄|
420         381.75           38.25                 38.25
462         381.75           80.25                 80.25
409         381.75           27.25                 27.25
236         381.75           −145.75               145.75
Σ(x_i − x̄) = 0; Σ|x_i − x̄| = 291.50
\(\text{MAD} = \frac{\sum |x_i - \bar{x}|}{n} = \frac{\$291.50}{4} = \$72.875\), which is somewhat less than the sample standard deviation of \(s \approx \$99.81\).
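The mean absolute deviation in Problem 50 can be reproduced directly from the four data values. A minimal Python sketch, with a hypothetical helper name:

```python
def mean_absolute_deviation(data):
    """Average of the absolute deviations from the sample mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)

costs = [420, 462, 409, 236]  # data values from Problem 50, in dollars
print(sum(costs) / len(costs))
print(mean_absolute_deviation(costs))
```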
51. (a) Skewness = \(\frac{3(50 - 40)}{10} = 3\). The distribution is skewed to the right.
(b) Skewness = \(\frac{3(100 - 100)}{15} = 0\). The distribution is perfectly symmetric.
(c) Skewness = \(\frac{3(400 - 500)}{120} = -2.5\). The distribution is skewed to the left.
(d) Skewness = \(\frac{3(0.8742 - 0.88)}{0.0397} \approx -0.44\). The distribution is slightly skewed to the left.
(e) Skewness = \(\frac{3(104.136 - 104)}{6.249} \approx 0.07\). The distribution is symmetric.
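Each part of Problem 51 evaluates the same expression, 3(mean − median) divided by the standard deviation. A small Python sketch of that calculation (the function name is an illustrative choice):

```python
def pearson_skewness(mean, median, sd):
    """Skewness index 3 * (mean - median) / sd, as used in Problem 51."""
    return 3 * (mean - median) / sd

print(pearson_skewness(50, 40, 10))
print(pearson_skewness(400, 500, 120))
print(round(pearson_skewness(0.8742, 0.88, 0.0397), 2))
print(round(pearson_skewness(104.136, 104, 6.249), 2))
```

A positive value indicates the mean lies above the median (right skew), and a negative value indicates left skew.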
52. (a) Reading from the graph, the average annual return for a portfolio that is 10% foreign is 14.9%. The level of risk is 14.7%.
(b) To best minimize risk, 30% should be invested in foreign stocks. According to the graph, a 30% investment in foreign stocks has the smallest standard deviation (level of risk) at about 14.3%.
(c) Answers will vary. One possibility follows: The risk decreases because a portfolio including foreign stocks is more diversified.
(d) According to Chebyshev’s theorem, at least 75% of returns are within k = 2 standard deviations of the mean. Thus, at least 75% of returns are between \(\bar{x} - ks = 15.8 - 2(14.3) = -12.8\%\) and \(\bar{x} + ks = 15.8 + 2(14.3) = 44.4\%\). By Chebyshev’s theorem, at least 88.9% of returns are within k = 3 standard deviations of the mean. Thus, at least 88.9% of returns are between \(\bar{x} - ks = 15.8 - 3(14.3) = -27.1\%\) and \(\bar{x} + ks = 15.8 + 3(14.3) = 58.7\%\). An investor should not be surprised if she has a negative rate of return. Chebyshev’s theorem indicates that a negative return is fairly common.
Consumer Reports®: Basement Waterproofing Coatings
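(a) \(\bar{x}_A = \frac{\sum x_i}{n} = \frac{546.2}{6} \approx 91.03\) g; \(M_A = \frac{90.9 + 91.2}{2} = \frac{182.1}{2} = 91.05\) g;
There are 2 modes: 90.8 g and 91.2 g (each value occurs twice).
(b) \(\sum x_i = 546.2\); \(\sum x_i^2 = 49{,}722.66\); \(n = 6\);
\[ s_A = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{49{,}722.66 - \frac{(546.2)^2}{6}}{6-1}} \approx 0.23 \text{ g} \]
(c) \(\bar{x}_B = \frac{\sum x_i}{n} = \frac{522.3}{6} = 87.05\) g; \(M_B = \frac{87.0 + 87.1}{2} = \frac{174.1}{2} = 87.05\) g
There are 2 modes: 87.0 g and 87.2 g (each value occurs twice).
(d) \(\sum x_i = 522.3\); \(\sum x_i^2 = 45{,}466.33\); \(n = 6\);
\[ s_B = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{45{,}466.33 - \frac{(522.3)^2}{6}}{6-1}} \approx 0.15 \text{ g} \]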
(e) Back-to-back stem-and-leaf plot (leaf unit = 0.1 g):
      A | Stem | B
        |  86  | 8
        |  87  | 0 0 1 2 2
        |  88  |
        |  89  |
  9 8 8 |  90  |
  3 2 2 |  91  |
Yes, there appears to be a difference in these two products’ ability to mitigate water seepage. All 6 of the measurements for product B are less than the measurements for product A. Although it is not clear whether there is any practical difference in these two products’ ability to mitigate water seepage, product B appears to do a better job.
3.3 Measures of Central Tendency and Dispersion from Grouped Data
1. When we approximate the mean and standard deviation from grouped data, we assume that all of the data points within each group can be approximated by the midpoint of that group.
2. \(\bar{x} = \frac{\sum x_i}{n}\) is a weighted average in which the value of each weight is one.
3.
Class     Midpoint, x_i       Frequency, f_i   x_i f_i   x̄         x_i − x̄     (x_i − x̄)² f_i
10 – 19   (10 + 20)/2 = 15    8                120       32.8333   −17.8333    2544.2127
20 – 29   (20 + 30)/2 = 25    16               400       32.8333   −7.8333     981.7694
30 – 39   35                  21               735       32.8333   2.1667      98.5864
40 – 49   45                  11               495       32.8333   12.1667     1628.3145
50 – 59   55                  4                220       32.8333   22.1667     1965.4504
Σf_i = 60; Σx_i f_i = 1970; Σ(x_i − x̄)² f_i = 7218.3334
\(\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{1970}{60} \approx 32.8333 \approx \$32.83\); \(s = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\left(\sum f_i\right) - 1}} = \sqrt{\frac{7218.3334}{60 - 1}} \approx \$11.06\)
4.
Class     Midpoint, x_i       Frequency, f_i   x_i f_i   μ         x_i − μ     (x_i − μ)² f_i
1 – 5     (1 + 6)/2 = 3.5     11               38.5      14.5714   −11.0714    1348.3349
6 – 10    (6 + 11)/2 = 8.5    0                0         14.5714   −6.0714     0
11 – 15   13.5                5                67.5      14.5714   −1.0714     5.7395
16 – 20   18.5                6                111       14.5714   3.9286      92.6034
21 – 25   23.5                1                23.5      14.5714   8.9286      79.7199
26 – 30   28.5                2                57        14.5714   13.9286     388.0118
31 – 35   33.5                1                33.5      14.5714   18.9286     358.2919
36 – 40   38.5                2                77        14.5714   23.9286     1145.1558
Σf_i = 28; Σx_i f_i = 408; Σ(x_i − μ)² f_i = 3417.8572
\(\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{408}{28} \approx 14.5714 \approx 14.6\) points; \(\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{3417.8572}{28}} \approx 11.0\) points
5.
Class     Midpoint, x_i       Frequency, f_i   x_i f_i   μ      x_i − μ   (x_i − μ)² f_i
0 – 9     (0 + 10)/2 = 5      31               155       17.3   −12.3     4689.99
10 – 19   (10 + 20)/2 = 15    39               585       17.3   −2.3      206.31
20 – 29   25                  17               425       17.3   7.7       1007.93
30 – 39   35                  6                210       17.3   17.7      1879.74
40 – 49   45                  4                180       17.3   27.7      3069.16
50 – 59   55                  2                110       17.3   37.7      2842.58
60 – 69   65                  1                65        17.3   47.7      2275.29
Σf_i = 100; Σx_i f_i = 1730; Σ(x_i − μ)² f_i = 15,971
\(\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{1730}{100} = 17.3\) days; \(\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{15{,}971}{100}} \approx 12.6\) days
6.
Class     Midpoint, x_i       Frequency, f_i   x_i f_i   x̄      x_i − x̄   (x_i − x̄)² f_i
0 – 9     (0 + 10)/2 = 5      24               120       21.6   −16.6     6613.44
10 – 19   (10 + 20)/2 = 15    14               210       21.6   −6.6      609.84
20 – 29   25                  39               975       21.6   3.4       450.84
30 – 39   35                  18               630       21.6   13.4      3232.08
40 – 49   45                  5                225       21.6   23.4      2737.8
Σf_i = 100; Σx_i f_i = 2160; Σ(x_i − x̄)² f_i = 13,644
\(\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{2160}{100} = 21.6\) hr/wk; \(s = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\left(\sum f_i\right) - 1}} = \sqrt{\frac{13{,}644}{100 - 1}} \approx 11.7\) hr/wk
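The grouped-data tables above all follow the same recipe: weight each class midpoint by its frequency, then divide the weighted squared deviations by Σf (population) or Σf − 1 (sample). The following Python sketch applies that recipe to the midpoints and frequencies of Problems 5 and 6; the function name and the `sample` flag are illustrative assumptions.

```python
import math

def grouped_mean_sd(midpoints, freqs, sample=False):
    """Approximate mean and standard deviation from class midpoints
    and frequencies. Uses sum(f) - 1 in the denominator when sample=True."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    ss = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs))
    return mean, math.sqrt(ss / (n - 1 if sample else n))

# Problem 5 (treated as a population)
mu, sigma = grouped_mean_sd([5, 15, 25, 35, 45, 55, 65],
                            [31, 39, 17, 6, 4, 2, 1])
# Problem 6 (treated as a sample)
xbar, s = grouped_mean_sd([5, 15, 25, 35, 45], [24, 14, 39, 18, 5],
                          sample=True)
print(round(mu, 1), round(sigma, 1), round(xbar, 1), round(s, 1))
```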
7.
Class     Midpoint, x_i       Frequency, f_i (in millions)   x_i f_i   μ         x_i − μ     (x_i − μ)² f_i
25 – 34   (25 + 35)/2 = 30    28.9                           867       44.4695   −14.4695    6050.6898
35 – 44   (35 + 45)/2 = 40    35.7                           1428      44.4695   −4.4695     713.1586
45 – 54   50                  35.1                           1755      44.4695   5.5305      1073.5837
55 – 64   60                  24.7                           1482      44.4695   15.5305     5957.5518
Σf_i = 124.4; Σx_i f_i = 5532; Σ(x_i − μ)² f_i = 13,794.9839
\(\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{5532}{124.4} \approx 44.4695 \approx 44.5\) yrs; \(\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{13{,}794.9839}{124.4}} \approx 10.5\) yrs
8.
Class       Midpt, x_i          Freq, f_i   x_i f_i   μ        x_i − μ    (x_i − μ)² f_i
0 – 0.9     (0 + 1)/2 = 0.5     539         269.5     2.7627   −2.2627    2759.5783
1.0 – 1.9   (1 + 2)/2 = 1.5     1           1.5       2.7627   −1.2627    1.5944
2.0 – 2.9   2.5                 1336        3340      2.7627   −0.2627    92.1991
3.0 – 3.9   3.5                 1363        4770.5    2.7627   0.7373     740.9422
4.0 – 4.9   4.5                 289         1300.5    2.7627   1.7373     872.2631
5.0 – 5.9   5.5                 21          115.5     2.7627   2.7373     157.3490
6.0 – 6.9   6.5                 2           13        2.7627   3.7373     27.9348
Σf_i = 3551; Σx_i f_i = 9810.5; Σ(x_i − μ)² f_i = 4651.8609
\(\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{9810.5}{3551} \approx 2.7627 \approx 2.8\); \(\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{4651.8609}{3551}} \approx 1.1\)
9. (a)
Class       Midpt, x_i          Freq, f_i   x_i f_i   μ         x_i − μ     (x_i − μ)² f_i
50 – 59     (50 + 60)/2 = 55    1           55        80.9350   −25.9350    672.6242
60 – 69     (60 + 70)/2 = 65    308         20,020    80.9350   −15.9350    78,208.6613
70 – 79     75                  1519        113,925   80.9350   −5.9350     53,505.5977
80 – 89     85                  1626        138,210   80.9350   4.0650      26,868.3900
90 – 99     95                  503         47,785    80.9350   14.0650     99,505.5851
100 – 109   105                 11          1155      80.9350   24.0650     6370.3665
Σf_i = 3968; Σx_i f_i = 321,150; Σ(x_i − μ)² f_i = 265,131.2248
\(\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{321{,}150}{3968} \approx 80.9350 \approx 80.9^\circ\)F; \(\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{265{,}131.2248}{3968}} \approx 8.2^\circ\)F
(b) [Histogram: High Temperatures in August in Chicago; horizontal axis: Temperature; vertical axis: Frequency]
(c) By the Empirical Rule, 95% of the observations will be within 2 standard deviations of the mean. Now, \(\mu - 2\sigma = 80.9 - 2(8.2) = 64.5\) and \(\mu + 2\sigma = 80.9 + 2(8.2) = 97.3\), so 95% of the days in August will have temperatures between 64.5°F and 97.3°F.
10. (a)
Class     Midpoint, x_i         Freq, f_i   x_i f_i   μ         x_i − μ     (x_i − μ)² f_i
20 – 24   (20 + 25)/2 = 22.5    4           90        37.8333   −15.3333    940.4404
25 – 29   (25 + 30)/2 = 27.5    15          412.5     37.8333   −10.3333    1601.6563
30 – 34   32.5                  27          877.5     37.8333   −5.3333     767.9904
35 – 39   37.5                  40          1500      37.8333   −0.3333     4.4436
40 – 44   42.5                  28          1190      37.8333   4.6667      609.7865
45 – 49   47.5                  15          712.5     37.8333   9.6667      1401.6763
50 – 54   52.5                  4           210       37.8333   14.6667     860.4484
55 – 59   57.5                  2           115       37.8333   19.6667     773.5582
Σf_i = 135; Σx_i f_i = 5107.5; Σ(x_i − μ)² f_i = 6960.0001
\(\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{5107.5}{135} \approx 37.8333 \approx 37.8\) in; \(\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{6960.0001}{135}} \approx 7.2\) in
(b) [Histogram: Annual Rainfall for St. Louis, MO; horizontal axis: Rainfall (inches); vertical axis: Frequency]
(c) By the Empirical Rule, 95% of the observations will be within 2 standard deviations of the mean. Now, \(\mu - 2\sigma = 37.8 - 2(7.2) = 23.4\) and \(\mu + 2\sigma = 37.8 + 2(7.2) = 52.2\), so 95% of annual rainfalls in St. Louis will be between 23.4 and 52.2 inches.
11. (a)
Class     Midpoint, x_i         Freq, f_i   x_i f_i    μ         x_i − μ     (x_i − μ)² f_i
15 – 19   (15 + 20)/2 = 17.5    93          1627.5     32.2721   −14.7721    20,293.99
20 – 24   (20 + 25)/2 = 22.5    511         11,497.5   32.2721   −9.7721     48,787.40
25 – 29   27.5                  1628        44,770     32.2721   −4.7721     37,074.34
30 – 34   32.5                  2832        92,040     32.2721   0.2279      147.09
35 – 39   37.5                  1843        69,112.5   32.2721   5.2279      50,370.92
40 – 44   42.5                  377         16,022.5   32.2721   10.2279     39,437.95
Σf_i = 7284; Σx_i f_i = 235,070; Σ(x_i − μ)² f_i = 196,111.69
\(\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{235{,}070}{7284} \approx 32.2721 \approx 32.3\) yr; \(\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{196{,}111.69}{7284}} \approx 5.2\) yr
(b) [Histogram: Number of Multiple Births in 2002; horizontal axis: Mother’s Age; vertical axis: Frequency]
(c) By the Empirical Rule, 95% of the observations will be within 2 standard deviations of the mean. Now, \(\mu - 2\sigma = 32.3 - 2(5.2) = 21.9\) and \(\mu + 2\sigma = 32.3 + 2(5.2) = 42.7\), so 95% of mothers of multiple births will be between 21.9 and 42.7 years of age.
12. (a)
Class       Midpoint, x_i          Freq, f_i   x_i f_i   μ          x_i − μ      (x_i − μ)² f_i
400 – 449   (400 + 450)/2 = 425    281         119,425   603.1482   −178.1482    8,918,035.5
450 – 499   (450 + 500)/2 = 475    577         274,075   603.1482   −128.1482    9,475,471.6
500 – 549   525                    840         441,000   603.1482   −78.1482     5,129,998.6
550 – 599   575                    1120        644,000   603.1482   −28.1482     887,399.7
600 – 649   625                    1166        728,750   603.1482   21.8518      556,766.4
650 – 699   675                    900         607,500   603.1482   71.8518      4,646,413.0
700 – 749   725                    518         375,550   603.1482   121.8518     7,691,192.1
750 – 800   775.5                  394         305,547   603.1482   172.3518     11,703,826.3
Σf_i = 5796; Σx_i f_i = 3,495,847; Σ(x_i − μ)² f_i = 49,009,103.2
\(\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{3{,}495{,}847}{5796} \approx 603.1482 \approx 603.1\); \(\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{49{,}009{,}103.2}{5796}} \approx 92.0\)
(b) [Histogram: SAT Verbal Scores, 2003; horizontal axis: Score; vertical axis: Frequency]
(c) By the Empirical Rule, 95% of the observations will be within 2 standard deviations of the mean. Now, \(\mu - 2\sigma = 603.1 - 2(92.0) = 419.1\) and \(\mu + 2\sigma = 603.1 + 2(92.0) = 787.1\), so 95% of ISACS college-bound seniors will have SAT Verbal scores between 419 and 787.
13.
Class     Midpt, x_i          Freq, f_i   x_i f_i   x̄       x_i − x̄   (x_i − x̄)² f_i
20 – 29   (20 + 30)/2 = 25    1           25        51.75   −26.75    715.5625
30 – 39   (30 + 40)/2 = 35    6           210       51.75   −16.75    1683.375
40 – 49   45                  10          450       51.75   −6.75     455.625
50 – 59   55                  14          770       51.75   3.25      147.875
60 – 69   65                  6           390       51.75   13.25     1053.375
70 – 79   75                  3           225       51.75   23.25     1621.6875
Σf_i = 40; Σx_i f_i = 2070; Σ(x_i − x̄)² f_i = 5677.5
\(\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{2070}{40} = 51.75 \approx 51.8\) (compared to 51.1 using the raw data);
\(s = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\left(\sum f_i\right) - 1}} = \sqrt{\frac{5677.5}{40 - 1}} \approx 12.1\) (compared to 10.9 using the raw data)
14.
Class       Midpoint, x_i     Frequency, f_i   x_i f_i   x̄   x_i − x̄   (x_i − x̄)² f_i
3 – 4.99    (3 + 5)/2 = 4     12               48        6   −2        48
5 – 6.99    (5 + 7)/2 = 6     14               84        6   0         0
7 – 8.99    8                 6                48        6   2         24
9 – 10.99   10                3                30        6   4         48
Σf_i = 35; Σx_i f_i = 210; Σ(x_i − x̄)² f_i = 120
\(\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{210}{35} = 6\) million shares (compared to 5.88 million shares using the raw data);
\(s = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\left(\sum f_i\right) - 1}} = \sqrt{\frac{120}{35 - 1}} \approx 1.879\) million shares (compared to 2.059 million shares using the raw data)
15. \(\text{GPA} = \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{5(3) + 3(4) + 4(4) + 3(2)}{5 + 3 + 4 + 3} = \frac{49}{15} \approx 3.27\)
16. \(\text{Course Average} = \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{5(100) + 10(93) + 60(86) + 25(85)}{5 + 10 + 60 + 25} = \frac{8715}{100} = 87.15\%\)
17. Cost per pound = \(\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{4(\$3.50) + 3(\$2.75) + 2(\$2.25)}{4 + 3 + 2} \approx \$2.97\)/lb
18. Cost per pound = \(\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{2.5(\$1.30) + 4(\$4.50) + 2(\$3.75)}{2.5 + 4 + 2} \approx \$3.38\)/lb
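Problems 15 to 18 all apply the weighted mean \(\bar{x}_w = \sum w_i x_i / \sum w_i\). A minimal Python sketch that reproduces the four answers; the function name and argument order are illustrative choices.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Problem 15: grade points weighted by credit hours
print(round(weighted_mean([3, 4, 4, 2], [5, 3, 4, 3]), 2))
# Problem 16: scores weighted by percentage of the course grade
print(round(weighted_mean([100, 93, 86, 85], [5, 10, 60, 25]), 2))
# Problems 17 and 18: price per pound weighted by pounds purchased
print(round(weighted_mean([3.50, 2.75, 2.25], [4, 3, 2]), 2))
print(round(weighted_mean([1.30, 4.50, 3.75], [2.5, 4, 2]), 2))
```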
19. (a)
Class     Midpt, x_i   Freq, f_i   x_i f_i   μ         x_i − μ     (x_i − μ)² f_i
0 – 9     5            20,225      101,125   35.6058   −30.6058    18,945,060.7
10 – 19   15           21,375      320,625   35.6058   −20.6058    9,075,803.5
20 – 29   25           20,437      510,925   35.6058   −10.6058    2,298,814.9
30 – 39   35           21,176      741,160   35.6058   −0.6058     7,771.5
40 – 49   45           22,138      996,210   35.6058   9.3942      1,953,700.5
50 – 59   55           16,974      933,570   35.6058   19.3942     6,384,515.4
60 – 69   65           10,289      668,785   35.6058   29.3942     8,889,891.4
70 – 79   75           6,923       519,225   35.6058   39.3942     10,743,824.4
80 – 89   85           3,053       259,505   35.6058   49.3942     7,448,669.7
90 – 99   95           436         41,420    35.6058   59.3942     1,538,064.6
Σf_i = 143,026; Σx_i f_i = 5,092,550; Σ(x_i − μ)² f_i = 67,286,116.6
\(\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{5{,}092{,}550}{143{,}026} \approx 35.6058 \approx 35.6\) yr; \(\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{67{,}286{,}116.6}{143{,}026}} \approx 21.7\) yr
(b)
Class     Midpt, x_i   Freq, f_i   x_i f_i     μ         x_i − μ     (x_i − μ)² f_i
0 – 9     5            19,319      96,595      38.0872   −33.0872    21,149,722.6
10 – 19   15           20,295      304,425     38.0872   −23.0872    10,817,616.6
20 – 29   25           19,459      486,475     38.0872   −13.0872    3,332,836.4
30 – 39   35           20,936      732,760     38.0872   −3.0872     199,536.9
40 – 49   45           22,586      1,016,370   38.0872   6.9128      1,079,312.8
50 – 59   55           17,864      982,520     38.0872   16.9128     5,109,868.6
60 – 69   65           11,563      751,595     38.0872   26.9128     8,375,067.1
70 – 79   75           9,121       684,075     38.0872   36.9128     12,427,862.3
80 – 89   85           5,367       456,195     38.0872   46.9128     11,811,751.5
90 – 99   95           1,215       115,425     38.0872   56.9128     3,935,466.2
Σf_i = 147,725; Σx_i f_i = 5,626,435; Σ(x_i − μ)² f_i = 78,239,041.0
\(\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{5{,}626{,}435}{147{,}725} \approx 38.0872 \approx 38.1\) yr; \(\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{78{,}239{,}041}{147{,}725}} \approx 23.0\) yr
(c) & (d) Females have both a higher mean age and more dispersion in age.
20. (a)
Class     Midpt, x_i   Freq, f_i   x_i f_i   μ         x_i − μ     (x_i − μ)² f_i
10 – 14   12.5         1.1         13.75     25.9996   −13.4996    200.4631
15 – 19   17.5         53.0        927.5     25.9996   −8.4996     3828.8896
20 – 24   22.5         115.1       2589.75   25.9996   −3.4996     1409.6527
25 – 29   27.5         112.9       3104.75   25.9996   1.5004      254.1605
30 – 34   32.5         61.9        2011.75   25.9996   6.5004      2615.5969
35 – 39   37.5         19.8        742.5     25.9996   11.5004     2618.7322
40 – 44   42.5         3.9         165.75    25.9996   16.5004     1061.8265
45 – 49   47.5         0.2         9.5       25.9996   21.5004     92.4534
Σf_i = 367.9; Σx_i f_i = 9565.25; Σ(x_i − μ)² f_i = 12,081.7749
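\(\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{9565.25}{367.9} \approx 25.9996 \approx 26.0\) yr; \(\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{12{,}081.7749}{367.9}} \approx 5.7\) yr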
(b)
Class     Midpt, x_i   Freq, f_i   x_i f_i   μ         x_i − μ    (x_i − μ)² f_i
10 – 14   12.5         0.7         8.75      27.6180   −15.118    159.9877
15 – 19   17.5         43.0        752.5     27.6180   −10.118    4402.0787
20 – 24   22.5         103.6       2331      27.6180   −5.118     2713.6905
25 – 29   27.5         113.6       3124      27.6180   −0.118     1.5818
30 – 34   32.5         91.5        2973.75   27.6180   4.882      2180.8040
35 – 39   37.5         41.4        1552.5    27.6180   9.882      4042.8725
40 – 44   42.5         8.3         352.75    27.6180   14.882     1838.2336
45 – 49   47.5         0.5         23.75     27.6180   19.882     197.6470
Σf_i = 402.6; Σx_i f_i = 11,119; Σ(x_i − μ)² f_i = 15,536.8952
\(\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{11{,}119}{402.6} \approx 27.6180 \approx 27.6\) yr; \(\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{15{,}536.8952}{402.6}} \approx 6.2\) yr
(c) & (d) The year 2002 has both the higher mean age of mothers and more dispersion in the age of mothers.
21.
Class     Frequency, f   Cumulative Frequency, CF
0 – 9     31             31
10 – 19   39             70
20 – 29   17             87
30 – 39   6              93
40 – 49   4              97
50 – 59   2              99
60 – 69   1              100
The total frequency is 100, so the position of the median is \(\frac{n}{2} = \frac{100}{2} = 50\), which is in the second class, 10 – 19. Then \(M = L + \frac{\frac{n}{2} - CF}{f}\cdot i = 10 + \frac{50 - 31}{39}(20 - 10) \approx 14.9\) days.
22.
Class     Frequency, f   Cumulative Frequency, CF
0 – 9     24             24
10 – 19   14             38
20 – 29   39             77
30 – 39   18             95
40 – 49   5              100
The total frequency is 100, so the position of the median is \(\frac{n}{2} = \frac{100}{2} = 50\), which is in the third class, 20 – 29. Then \(M = L + \frac{\frac{n}{2} - CF}{f}\cdot i = 20 + \frac{50 - 38}{39}(30 - 20) \approx 23.1\) hr/wk.
23.
Class     Frequency, f (millions)   Cumulative Frequency, CF (millions)
25 – 34   28.9                      28.9
35 – 44   35.7                      64.6
45 – 54   35.1                      99.7
55 – 64   24.7                      124.4
The total frequency is 124.4 (million), so the position of the median is \(\frac{n}{2} = \frac{124.4}{2} = 62.2\), which is in the second class, 35 – 44. Then \(M = L + \frac{\frac{n}{2} - CF}{f}\cdot i = 35 + \frac{62.2 - 28.9}{35.7}(45 - 35) \approx 44.3\) years.
24.
Class       Frequency, f   Cumulative Frequency, CF
0 – 0.9     539            539
1.0 – 1.9   1              540
2.0 – 2.9   1336           1876
3.0 – 3.9   1363           3239
4.0 – 4.9   289            3528
5.0 – 5.9   21             3549
6.0 – 6.9   2              3551
The total frequency is 3551, so the position of the median is \(\frac{n}{2} = \frac{3551}{2} = 1775.5\), which is in the third class, 2.0 – 2.9. Then \(M = L + \frac{\frac{n}{2} - CF}{f}\cdot i = 2.0 + \frac{1775.5 - 540}{1336}(3.0 - 2.0) \approx 2.9\).
25. From the table in Problem 5, the modal class (highest frequency class) is 10 – 19 days.
26. From the table in Problem 6, the modal class (highest frequency class) is 20 – 29 hr/wk.
27. From the table in Problem 7, the modal class (highest frequency class) is 35 – 44 years.
28. From the table in Problem 8, the modal class (highest frequency class) is 3.0 – 3.9.
29. (a) Answers will vary. One possibility follows: Many colleges do not permit students under age 16 to enroll in courses, so a reasonable midpoint to use would be 17.
(b) Answers will vary. One possibility follows: Since it is not likely that many students would be over 70 years old, a reasonable midpoint would be 60.
(c) Answers will vary depending on choices for midpoints in parts (a) and (b). Using the midpoints chosen above:
Class          Midpoint, x_i         Freq, f_i   x_i f_i
Less than 18   17                    139         2363
18 – 19        (18 + 20)/2 = 19      4089        77,691
20 – 21        (20 + 22)/2 = 21      3357        70,497
22 – 24        (22 + 25)/2 = 23.5    1661        39,033.5
25 – 29        (25 + 30)/2 = 27.5    470         12,925
30 – 34        (30 + 35)/2 = 32.5    145         4712.5
35 – 39        (35 + 40)/2 = 37.5    95          3562.5
40 – 49        (40 + 50)/2 = 45      117         5265
50 and above   60                    21          1260
Σf_i = 10,094; Σx_i f_i = 217,309.5
\(\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{217{,}309.5}{10{,}094} \approx 21.5\) years. This estimate is a little higher than the actual mean age of 20.9 years.
3.4 Measures of Position
1. Answers will vary. The kth percentile of a set of data is the value which divides the bottom k% of the data from the top (100 – k)% of the data. For example, if a data value lies at the 60th percentile, then approximately 60% of the data is below it and approximately 40% is above this value.
2. This can happen because the percentile is rounded to the nearest integer. For example, if there were 150 scores in the class then the percentile for the top score would be given by \(\frac{149}{150}\cdot 100 \approx 99.3\), which rounds to the 99th percentile, while the next score would correspond to a percentile of \(\frac{148}{150}\cdot 100 \approx 98.7\), which also rounds to the 99th percentile.
3. A four-star mutual fund is in the top 40% but not in the top 20% of its investment class. That is, it is above the bottom 60% but below the top 20% of the ranked funds.
4. Not necessarily. When an outlier is discovered it should be investigated to find its cause. Once the cause is determined, then it can be determined whether it should be removed from the data set.
5. To qualify for Mensa, one needs to have an IQ that is in the top 2% of people.
6. Comparing z-scores gives us a unitless comparison of standard deviations from the mean. They also take the relative size and variability of the data into account. This allows us to have a standard basis for comparison and also enables us to more easily detect possible outliers.
7. z-score for the 34-week gestation baby: \(z = \frac{x - \mu}{\sigma} = \frac{2400 - 2600}{670} \approx -0.30\)
z-score for the 40-week gestation baby: \(z = \frac{x - \mu}{\sigma} = \frac{3300 - 3500}{475} \approx -0.42\)
The weight of a 34-week gestation baby is 0.30 standard deviations below the mean, while the weight of a 40-week gestation baby is 0.42 standard deviations below the mean. Thus, the 40-week gestation baby weighs less relative to the gestation period.
8. z-score for the 34-week gestation baby: \(z = \frac{x - \mu}{\sigma} = \frac{3000 - 2600}{670} \approx 0.60\)
z-score for the 40-week gestation baby: \(z = \frac{x - \mu}{\sigma} = \frac{3900 - 3500}{475} \approx 0.84\)
The weight of a 34-week gestation baby is 0.60 standard deviations above the mean, while the weight of a 40-week gestation baby is 0.84 standard deviations above the mean. Thus, the 34-week gestation baby weighs less relative to the gestation period.
9. z-score for the 75-inch man: \(z = \frac{x - \mu}{\sigma} = \frac{75 - 69.6}{2.7} = 2\)
z-score for the 70-inch woman: \(z = \frac{x - \mu}{\sigma} = \frac{70 - 64.1}{2.6} \approx 2.27\)
The height of the 75-inch man is 2 standard deviations above the mean, while the height of a 70-inch woman is 2.27 standard deviations above the mean. Thus, the 70-inch woman is relatively taller than the 75-inch man.
10. z-score for the 68-inch man: \(z = \frac{x - \mu}{\sigma} = \frac{68 - 69.6}{2.7} \approx -0.59\)
z-score for the 62-inch woman: \(z = \frac{x - \mu}{\sigma} = \frac{62 - 64.1}{2.6} \approx -0.81\)
The height of the 68-inch man is 0.59 standard deviations below the mean, while the height of a 62-inch woman is 0.81 standard deviations below the mean. Thus, the 68-inch man is relatively taller than the 62-inch woman.
11. z-score for Jake Peavy: \(z = \frac{x - \mu}{\sigma} = \frac{2.27 - 4.198}{0.772} \approx -2.50\)
z-score for Johan Santana: \(z = \frac{x - \mu}{\sigma} = \frac{2.61 - 4.338}{0.785} \approx -2.20\)
Jake Peavy’s 2004 ERA was 2.50 standard deviations below the mean, while Johan Santana’s 2004 ERA was 2.20 standard deviations below the mean. Thus, Peavy had the better year relative to his peers.
12. z-score for Ted Williams: \(z = \frac{x - \mu}{\sigma} = \frac{0.406 - 0.28062}{0.03281} \approx 3.82\)
z-score for Ichiro Suzuki: \(z = \frac{x - \mu}{\sigma} = \frac{0.372 - 0.26992}{0.02154} \approx 4.74\)
Ted Williams’ 1941 batting average was 3.82 standard deviations above the mean, while Ichiro Suzuki’s 2004 batting average was 4.74 standard deviations above the mean. Thus, Suzuki had the better year relative to his peers.
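The z-score comparisons in Problems 7 to 12 all use \(z = (x - \mu)/\sigma\). A brief Python sketch reproducing the baseball comparisons from Problems 11 and 12 (the function name is illustrative):

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Problem 11: a lower ERA is better, so the more negative z-score wins
peavy = z_score(2.27, 4.198, 0.772)
santana = z_score(2.61, 4.338, 0.785)
# Problem 12: a higher batting average is better
williams = z_score(0.406, 0.28062, 0.03281)
suzuki = z_score(0.372, 0.26992, 0.02154)
print(round(peavy, 2), round(santana, 2), round(williams, 2), round(suzuki, 2))
```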
13. The data provided in Table 17 are already listed in ascending order.
(a) \(i = \left(\frac{k}{100}\right)(n + 1) = \left(\frac{40}{100}\right)(51 + 1) = 20.8\). Since i = 20.8 is not an integer, we average the 20th and 21st data values: \(P_{40} = \frac{325.5 + 333.2}{2} = 329.35\). This means that approximately 40% of the states have violent crime rates less than 329.35 crimes per 100,000 population, and approximately 60% of the states have violent crime rates more than this.
(b) \( i = \left(\frac{k}{100}\right)(n + 1) = \left(\frac{95}{100}\right)(51 + 1) = 49.4 \). Since i = 49.4 is not an integer, we average the 49th and 50th data values: \( P_{95} = \frac{730.2 + 793.5}{2} = 761.85 \). This means that approximately 95% of the states have violent crime rates less than 761.85 crimes per 100,000 population, and approximately 5% of the states have violent crime rates more than this.
(c) \( i = \left(\frac{k}{100}\right)(n + 1) = \left(\frac{10}{100}\right)(51 + 1) = 5.2 \). Since i = 5.2 is not an integer, we average the 5th and 6th data values: \( P_{10} = \frac{173.4 + 221.0}{2} = 197.2 \). This means that approximately 10% of the states have violent crime rates less than 197.2 crimes per 100,000 population, and approximately 90% of the states have violent crime rates more than this.
(d) Of the 51 states, 48 have a violent crime rate less than Florida’s violent crime rate. Percentile rank of Florida \( = \left(\frac{48}{51}\right) \cdot 100 \approx 94 \). Florida’s violent crime rate is at the 94th percentile. This means that approximately 94% of the states have violent crime rates that are less than that of Florida, and approximately 6% of the states have violent crime rates that are larger than that of Florida.
(e) Of the 51 states, 40 have a violent crime rate less than California’s violent crime rate. Percentile rank of California \( = \left(\frac{40}{51}\right) \cdot 100 \approx 78 \). California’s violent crime rate is at the 78th percentile. This means that approximately 78% of the states have violent crime rates that are less than that of California, and approximately 22% of the states have violent crime rates that are larger than that of California.
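The index rule and the percentile-rank calculation used in Problem 13 can be sketched in Python. This is only an illustration: the function names are hypothetical, and because Table 17 is not reproduced here, the short `sample` list is made-up data used solely to exercise the rule.

```python
def percentile_index(k, n):
    """Index i = (k/100)(n + 1) of the kth percentile among n ordered values."""
    return k * (n + 1) / 100

def kth_percentile(sorted_data, k):
    """kth percentile (integer k) by the (n + 1) index rule.

    If the index is an integer, take that data value; otherwise average the
    data values on either side of it (positions are 1-based).
    Assumes the index lies between 1 and n.
    """
    n = len(sorted_data)
    whole, remainder = divmod(k * (n + 1), 100)
    if remainder == 0:
        return sorted_data[whole - 1]
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2

def percentile_rank(count_below, n):
    """Percent of the n values that are less than x, rounded to an integer."""
    return round(count_below / n * 100)

print(percentile_index(40, 51))  # index for P40 with n = 51
print(percentile_rank(48, 51))   # Florida: 48 of 51 states below
print(percentile_rank(40, 51))   # California: 40 of 51 states below

sample = [2, 4, 6, 8, 10, 12, 14, 16, 18]  # hypothetical data, n = 9
print(kth_percentile(sample, 50))  # index 5, so the 5th value
print(kth_percentile(sample, 25))  # index 2.5, so average of 2nd and 3rd
```

Working with the integer product k(n + 1) before dividing avoids floating-point surprises when deciding whether the index is an integer.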
14. The data provided in Table 17 are already listed in ascending order.
(a) \( i = \left(\frac{k}{100}\right)(n + 1) = \left(\frac{30}{100}\right)(51 + 1) = 15.6 \). Since i = 15.6 is not an integer, we average the 15th and 16th data values: \( P_{30} = \frac{275.8 + 285.6}{2} = 280.7 \). This means that approximately 30% of the states have violent crime rates less than 280.7 crimes per 100,000 population, and approximately 70% of the states have violent crime rates more than this.
(b) \( i = \left(\frac{k}{100}\right)(n + 1) = \left(\frac{85}{100}\right)(51 + 1) = 44.2 \). Since i = 44.2 is not an integer, we average the 44th and 45th data values: \( P_{85} = \frac{646.3 + 658.0}{2} = 652.15 \). This means that approximately 85% of the states have violent crime rates less than 652.15 crimes per 100,000 population, and approximately 15% of the states have violent crime rates more than this.
(c) \( i = \left(\frac{k}{100}\right)(n + 1) = \left(\frac{5}{100}\right)(51 + 1) = 2.6 \). Since i = 2.6 is not an integer, we average the 2nd and 3rd data values: \( P_{5} = \frac{108.9 + 110.2}{2} = 109.55 \). This means that approximately 5% of the states have violent crime rates less than 109.55 crimes per 100,000 population, and approximately 95% of the states have violent crime rates more than this.
(d) Of the 51 states, 45 have a violent crime rate less than New Mexico’s violent crime rate. Percentile rank of New Mexico \( = \left(\frac{45}{51}\right) \cdot 100 \approx 88 \). New Mexico’s violent crime rate is at the 88th percentile. This means that approximately 88% of the states have violent crime rates that are less than that of New Mexico, and approximately 12% of the states have violent crime rates that are larger than that of New Mexico.
(e) Of the 51 states, 15 have a violent crime rate less than Rhode Island’s violent crime rate. Percentile rank of Rhode Island \( = \left(\frac{15}{51}\right) \cdot 100 \approx 29 \). Rhode Island’s violent crime rate is at the 29th percentile. This means that approximately 29% of the states have violent crime rates that are less than that of Rhode Island, and approximately 71% of the states have violent crime rates that are larger than that of Rhode Island.
15. (a) Computing the sample mean (\( \bar{x} \)) and sample standard deviation (s) for the data yields \( \bar{x} = 3.9935 \) inches and \( s \approx 1.7790 \) inches. Using these values as approximations for μ and σ, the z-score for x = 0.97 inches is \( z = \frac{x - \mu}{\sigma} \approx \frac{0.97 - 3.9935}{1.7790} \approx -1.70 \). The rainfall in 1971 (0.97 inches) is 1.70 standard deviations below the mean.
(b) The data provided are already listed in ascending order. There are n = 20 data points. The index for the first quartile is \( i = \left(\frac{25}{100}\right)(20 + 1) = 5.25 \). Since i = 5.25 is not an integer, we average the 5th and 6th data values: \( Q_1 = \frac{2.47 + 2.78}{2} = 2.625 \) inches. The index for the second quartile is \( i = \left(\frac{50}{100}\right)(20 + 1) = 10.5 \). Since i = 10.5 is not an integer, we average the 10th and 11th data values: \( Q_2 = \frac{3.97 + 4.0}{2} = 3.985 \) inches. The index for the third quartile is \( i = \left(\frac{75}{100}\right)(20 + 1) = 15.75 \). Since i = 15.75 is not an integer, we average the 15th and 16th data values: \( Q_3 = \frac{5.22 + 5.50}{2} = 5.36 \) inches.
(c) \( \text{IQR} = Q_3 - Q_1 = 5.36 - 2.625 = 2.735 \) inches
(d) Lower fence \( = Q_1 - 1.5(\text{IQR}) = 2.625 - 1.5(2.735) \approx -1.478 \) inches. Upper fence \( = Q_3 + 1.5(\text{IQR}) = 5.36 + 1.5(2.735) \approx 9.463 \) inches. According to this criterion, there are no outliers.
16. (a) Computing the sample mean (\( \bar{x} \)) and sample standard deviation (s) for the data yields \( \bar{x} = 10.08 \) g/dL and \( s \approx 1.8858 \) g/dL. Using these values as approximations for μ and σ, the z-score for x = 7.8 g/dL is \( z = \frac{x - \mu}{\sigma} \approx \frac{7.8 - 10.08}{1.8858} \approx -1.21 \). Blackie’s hemoglobin level (7.8 g/dL) is 1.21 standard deviations below the mean.
(b) The data provided are already listed in ascending order. There are n = 20 data points. The index for the first quartile is \( i = \left(\frac{25}{100}\right)(20 + 1) = 5.25 \). Since i = 5.25 is not an integer, we average the 5th and 6th data values: \( Q_1 = \frac{8.9 + 9.4}{2} = 9.15 \) g/dL. The index for the second quartile is \( i = \left(\frac{50}{100}\right)(20 + 1) = 10.5 \). Since i = 10.5 is not an integer, we average the 10th and 11th data values: \( Q_2 = \frac{9.9 + 10.0}{2} = 9.95 \) g/dL. The index for the third quartile is \( i = \left(\frac{75}{100}\right)(20 + 1) = 15.75 \). Since i = 15.75 is not an integer, we average the 15th and 16th data values: \( Q_3 = \frac{11.0 + 11.2}{2} = 11.1 \) g/dL.
(c) \( \text{IQR} = Q_3 - Q_1 = 11.1 - 9.15 = 1.95 \) g/dL
(d) Lower fence \( = Q_1 - 1.5(\text{IQR}) = 9.15 - 1.5(1.95) = 6.225 \) g/dL. Upper fence \( = Q_3 + 1.5(\text{IQR}) = 11.1 + 1.5(1.95) = 14.025 \) g/dL. The hemoglobin level 5.7 g/dL is an outlier because it is less than the lower fence.
17. (a) Computing the sample mean (\( \bar{x} \)) and sample standard deviation (s) for the data yields \( \bar{x} \approx 15.9227 \) mg/L and \( s \approx 7.3837 \) mg/L. Using these values as approximations for μ and σ, the z-score for x = 20.46 mg/L is \( z = \frac{x - \mu}{\sigma} \approx \frac{20.46 - 15.9227}{7.3837} \approx 0.61 \). The organic concentration of 20.46 mg/L is 0.61 standard deviations above the mean.
(b) There are n = 33 data points, and we must put them in ascending order:
5.2, 5.29, 5.3, 6.51, 7.4, 8.09, 8.81, 9.72, 10.3, 11.4, 11.9, 14, 14.86, 14.86, 14.9, 15.35, 15.42, 15.72, 15.91, 16.51, 16.87, 17.5, 17.9, 18.3, 19.8, 20.46, 20.46, 22.49, 22.74, 27.1, 29.8, 30.91, 33.67
The index for the first quartile is \( i = \left(\frac{25}{100}\right)(33 + 1) = 8.5 \). Since i = 8.5 is not an integer, we average the 8th and 9th data values: \( Q_1 = \frac{9.72 + 10.3}{2} = 10.01 \) mg/L. The index for the second quartile is \( i = \left(\frac{50}{100}\right)(33 + 1) = 17 \). Since i = 17 is an integer, the 17th data value is the second quartile: \( Q_2 = 15.42 \) mg/L. The index for the third quartile is \( i = \left(\frac{75}{100}\right)(33 + 1) = 25.5 \). Since i = 25.5 is not an integer, we average the 25th and 26th data values: \( Q_3 = \frac{19.8 + 20.46}{2} = 20.13 \) mg/L.
(c) \( \text{IQR} = Q_3 - Q_1 = 20.13 - 10.01 = 10.12 \) mg/L
(d) Lower fence \( = Q_1 - 1.5(\text{IQR}) = 10.01 - 1.5(10.12) = -5.17 \) mg/L. Upper fence \( = Q_3 + 1.5(\text{IQR}) = 20.13 + 1.5(10.12) = 35.31 \) mg/L. According to this criterion, there are no outliers.
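The quartile, IQR, and fence steps in Problem 17 can be reproduced with a short Python sketch. The helper name `quartile` is hypothetical; it implements the \( i = (k/100)(n + 1) \) index rule used in these solutions, which is not the default rule of most statistical software, so library results may differ slightly.

```python
def quartile(sorted_data, k):
    """kth percentile by the (n + 1) index rule (k = 25, 50, 75 for quartiles)."""
    n = len(sorted_data)
    whole, remainder = divmod(k * (n + 1), 100)
    if remainder == 0:
        return sorted_data[whole - 1]          # integer index: take that value
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2  # else average neighbors

# Organic carbon concentrations (mg/L) from Problem 17, in ascending order.
data = [5.2, 5.29, 5.3, 6.51, 7.4, 8.09, 8.81, 9.72, 10.3, 11.4, 11.9,
        14, 14.86, 14.86, 14.9, 15.35, 15.42, 15.72, 15.91, 16.51, 16.87,
        17.5, 17.9, 18.3, 19.8, 20.46, 20.46, 22.49, 22.74, 27.1, 29.8,
        30.91, 33.67]

q1, q2, q3 = (quartile(data, k) for k in (25, 50, 75))
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1={q1:.2f}, Q2={q2:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")
print(f"fences: ({lower_fence:.2f}, {upper_fence:.2f}), outliers: {outliers}")
```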
18. (a) Computing the sample mean (\( \bar{x} \)) and sample standard deviation (s) for the data yields \( \bar{x} \approx 10.0266 \) mg/L and \( s \approx 4.9789 \) mg/L. Using these values as approximations for μ and σ, the z-score for x = 17.99 mg/L is \( z = \frac{x - \mu}{\sigma} \approx \frac{17.99 - 10.0266}{4.9789} \approx 1.60 \). The organic concentration of 17.99 mg/L is 1.60 standard deviations above the mean.
(b) There are n = 47 data points, and we must put them in ascending order:
3.02, 3.79, 3.91, 3.99, 4.6, 4.71, 4.8, 4.85, 4.9, 5.5, 7, 7.11, 7.31, 7.45, 7.66, 7.85, 7.9, 7.92, 8.05, 8.37, 8.5, 8.5, 8.79, 9.1, 9.11, 9.29, 9.6, 9.81, 10.3, 10.47, 10.72, 10.89, 11.33, 11.56, 11.72, 11.72, 11.8, 11.97, 12.57, 12.89, 16.92, 17.9, 17.99, 21, 21.4, 21.82, 22.62
The index for the first quartile is \( i = \left(\frac{25}{100}\right)(47 + 1) = 12 \). Since i = 12 is an integer, the 12th data value is the first quartile: \( Q_1 = 7.11 \) mg/L. The index for the second quartile is \( i = \left(\frac{50}{100}\right)(47 + 1) = 24 \). Since i = 24 is an integer, the 24th data value is the second quartile: \( Q_2 = 9.1 \) mg/L. The index for the third quartile is \( i = \left(\frac{75}{100}\right)(47 + 1) = 36 \). Since i = 36 is an integer, the 36th data value is the third quartile: \( Q_3 = 11.72 \) mg/L.
(c) \( \text{IQR} = Q_3 - Q_1 = 11.72 - 7.11 = 4.61 \) mg/L
(d) Lower fence \( = Q_1 - 1.5(\text{IQR}) = 7.11 - 1.5(4.61) = 0.195 \) mg/L. Upper fence \( = Q_3 + 1.5(\text{IQR}) = 11.72 + 1.5(4.61) = 18.635 \) mg/L. The organic carbon concentrations 21, 21.4, 21.82, and 22.62 mg/L are outliers because they are higher than the upper fence.
19. The first and third quartiles are \( Q_1 = 433 \) minutes and \( Q_3 = 489.5 \) minutes. Upper fence \( = Q_3 + 1.5(\text{IQR}) = 489.5 + 1.5(489.5 - 433) = 574.25 \) minutes. The cutoff point is 574 minutes. If more minutes are used, the customer is contacted.
20. The first and third quartiles are \( Q_1 = \$84 \) and \( Q_3 = \$138 \). Upper fence \( = Q_3 + 1.5(\text{IQR}) = 138 + 1.5(138 - 84) = \$219 \). If daily charges exceed $219, the customer will be contacted.
21. (a) The first and third quartiles are \( Q_1 = \$67 \) and \( Q_3 = \$479 \).
Lower fence \( = Q_1 - 1.5(\text{IQR}) = 67 - 1.5(479 - 67) = -\$551 \)
Upper fence \( = Q_3 + 1.5(\text{IQR}) = 479 + 1.5(479 - 67) = \$1097 \). Therefore, $12,777 is an outlier because it is greater than the upper fence.
(b) [Boxplot]
(c) Answers will vary. One possibility is that a student may have provided his or her annual income instead of his or her weekly income.
22. (a) The first and third quartiles are \( Q_1 = \$21 \) and \( Q_3 = \$54 \).
Lower fence \( = Q_1 - 1.5(\text{IQR}) = 21 - 1.5(54 - 21) = -\$28.50 \)
Upper fence \( = Q_3 + 1.5(\text{IQR}) = 54 + 1.5(54 - 21) = \$103.50 \). Therefore, $115 and $1000 are outliers because they are greater than the upper fence.
(b) [Boxplot]
(c) Answers will vary. One possibility follows: It is possible that $115 is correct but simply an unusual situation. For the data value $1000, perhaps a student provided his or her annual expenditures for entertainment instead of his or her weekly expenditures.
23.
Pulse      z-score
76          0.49
60         −1.59
60         −1.59
81          1.14
72         −0.03
80          1.01
80          1.01
68         −0.55
73          0.10
μ  72.2     0.0 = mean of the z-scores
σ  7.671    1.00 = standard deviation of the z-scores
24.
Travel Time   z-score
39             0.98
21            −0.42
9             −1.36
32             0.43
30             0.28
45             1.44
11            −1.20
12            −1.12
39             0.98
μ  26.4        0.0 = mean of the z-scores
σ  12.842      1.000 = standard deviation of the z-scores
3.5 The Five-Number Summary and Boxplots
1. The median and interquartile range are better measures of central tendency and dispersion if the data are skewed or if the data contain outliers.
2. right
3. (a) The median is to the left of the center of the box and the right line is substantially longer than the left line, so the distribution is skewed right.
(b) Reading the boxplot, the five-number summary is approximately: 0, 1, 3, 6, 16.
4. (a) The median is near the center of the box and the horizontal lines are approximately the same in length, so the distribution is symmetric.
(b) Reading the boxplot, the five-number summary is approximately: −1, 2, 5, 8, 11.
5. The data in ascending order are as follows:
42, 43, 46, 46, 47, 48, 49, 49, 50, 50, 51, 51, 51, 51, 52, 52, 54, 54, 54, 54, 54, 55, 55, 55, 55, 56, 56, 56, 57, 57, 57, 57, 58, 60, 61, 61, 61, 62, 64, 64, 65, 68, 69
The smallest number (youngest president) in the data set is 42. The largest number in the data set is 69. The first quartile is \( Q_1 = 51 \) (the 11th data point). The median is \( M = 55 \) (the 22nd data point). The third quartile is \( Q_3 = 58 \) (the 33rd data point). The five-number summary is 42, 51, 55, 58, 69. The upper and lower fences are:
Lower fence \( = Q_1 - 1.5(\text{IQR}) = 51 - 1.5(58 - 51) = 40.5 \);
Upper fence \( = Q_3 + 1.5(\text{IQR}) = 58 + 1.5(58 - 51) = 68.5 \). Thus, 69 is an outlier.
The median is near the center of the box and the horizontal lines are approximately the same in length, so the distribution is symmetric.
6. The data in ascending order are as follows:
1, 2, 8, 8, 11, 11, 12, 15, 16, 16, 17, 23, 23, 23, 23, 28, 28, 31, 33, 33, 35, 40
The smallest number in the data set is 1. The largest number in the data set is 40. The first quartile is \( Q_1 = \frac{11 + 11}{2} = 11 \) (the mean of the 5th and 6th data points). The median is \( M = \frac{17 + 23}{2} = 20 \) (the mean of the 11th and 12th data points). The third quartile is \( Q_3 = \frac{28 + 28}{2} = 28 \) (the mean of the 16th and 17th data points). The five-number summary is 1, 11, 20, 28, 40. The upper and lower fences are:
Lower fence \( = Q_1 - 1.5(\text{IQR}) = 11 - 1.5(28 - 11) = -14.5 \);
Upper fence \( = Q_3 + 1.5(\text{IQR}) = 28 + 1.5(28 - 11) = 53.5 \). Thus, there are no outliers.
The median is near the center of the box and the horizontal lines are approximately the same in length, so the distribution is symmetric.
7. The data in ascending order are as follows:
1, 3, 3, 3, 3, 4, 4, 4, 5, 7, 7, 7, 9, 10, 10, 10, 12, 13, 14, 15, 16, 17, 17, 17, 17, 18, 19, 19, 21, 22, 23, 25, 27, 27, 29, 32, 35, 36, 45
The smallest number in the data set is 1. The largest number in the data set is 45. The first quartile is \( Q_1 = 7 \) (the 10th data point). The median is \( M = 15 \) (the 20th data point). The third quartile is \( Q_3 = 22 \) (the 30th data point). The five-number summary is 1, 7, 15, 22, 45. The upper and lower fences are:
Lower fence \( = Q_1 - 1.5(\text{IQR}) = 7 - 1.5(22 - 7) = -15.5 \);
Upper fence \( = Q_3 + 1.5(\text{IQR}) = 22 + 1.5(22 - 7) = 44.5 \). Thus, 45 is an outlier.
The median is to the left of the center of the box and the right line is substantially longer than the left line, so the distribution is skewed right.
8. The data in ascending order are as follows:
18, 19, 19, 19, 20, 21, 22, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 26, 27, 27, 28, 28, 29, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 34, 34, 34, 34, 34, 34, 34, 34, 35, 35, 38, 39, 46
The smallest number in the data set is 18. The largest is 46. The first quartile is \( Q_1 = 26 \) (the 16th data point). The median is \( M = 30 \) (the 32nd data point). The third quartile is \( Q_3 = 32 \) (the 48th data point). The five-number summary is 18, 26, 30, 32, 46. The upper and lower fences are:
Lower fence \( = Q_1 - 1.5(\text{IQR}) = 26 - 1.5(32 - 26) = 17 \);
Upper fence \( = Q_3 + 1.5(\text{IQR}) = 32 + 1.5(32 - 26) = 41 \). Thus, 46 is an outlier.
The median is to the right of the center of the box, so the distribution is skewed left.
9. The data in ascending order are as follows:
0.598, 0.600, 0.600, 0.601, 0.602, 0.603, 0.605, 0.605, 0.605, 0.606, 0.607, 0.607, 0.608, 0.608, 0.608, 0.608, 0.608, 0.609, 0.610, 0.610, 0.610, 0.610, 0.611, 0.611, 0.612
The smallest number in the data set is 0.598. The largest is 0.612. The first quartile is \( Q_1 = \frac{0.603 + 0.605}{2} = 0.604 \) (the mean of the 6th and 7th data points). The median is \( M = 0.608 \) (the 13th data point). The third quartile is \( Q_3 = \frac{0.610 + 0.610}{2} = 0.610 \) (the mean of the 19th and 20th data points). The five-number summary is 0.598, 0.604, 0.608, 0.610, 0.612. The upper and lower fences are:
Lower fence \( = Q_1 - 1.5(\text{IQR}) = 0.604 - 1.5(0.610 - 0.604) = 0.595 \);
Upper fence \( = Q_3 + 1.5(\text{IQR}) = 0.610 + 1.5(0.610 - 0.604) = 0.619 \). Thus, there are no outliers.
The median is to the right of the center of the box, so the distribution is skewed left.
Answers will vary concerning the source of variability in weight.
10. The data in ascending order are as follows:
421, 480, 581, 583, 598, 611, 616, 618, 643, 645, 646, 649, 653, 654, 660, 664, 666, 667, 669, 672, 675, 678, 679, 682, 683, 684, 688, 688, 692, 692, 698, 698, 704, 706, 707, 707, 711, 711, 713, 715, 726, 737, 740, 741, 787, 791, 802, 816, 821, 830, 971
The smallest number in the data set is 421. The largest number in the data set is 971. The first quartile is \( Q_1 = 653 \) (the 13th data point). The median is \( M = 684 \) (the 26th data point). The third quartile is \( Q_3 = 713 \) (the 39th data point). The five-number summary is 421, 653, 684, 713, 971. The upper and lower fences are:
Lower fence \( = Q_1 - 1.5(\text{IQR}) = 653 - 1.5(713 - 653) = 563 \);
Upper fence \( = Q_3 + 1.5(\text{IQR}) = 713 + 1.5(713 - 653) = 803 \). Thus, the data points 421, 480, 816, 821, 830, and 971 are outliers.
The median is near the center of the box. Though the left line is longer than the right line, when we consider the positions of the outliers, the distribution is relatively symmetric.
Answers will vary. Wyoming is very rural, resulting in the need to drive farther distances. New York is more urban with many mass transit systems, resulting in lower individual gasoline expenditures.
11. (a) The data in ascending order are as follows:
28, 32, 33, 35, 36, 38, 39, 44, 44, 45, 45, 46, 46, 48, 48, 48, 49, 50, 51, 51, 51, 52, 52, 53, 53, 54, 55, 56, 56, 58, 59, 60, 60, 62, 63, 66, 69, 70, 70, 73
The smallest number in the data set is 28. The largest number in the data set is 73. The first quartile is \( Q_1 = \frac{45 + 45}{2} = 45 \) (the mean of the 10th and 11th data points). The median is \( M = \frac{51 + 51}{2} = 51 \) (the mean of the 20th and 21st data points). The third quartile is \( Q_3 = \frac{58 + 59}{2} = 58.5 \) (the mean of the 30th and 31st data points). The five-number summary is 28, 45, 51, 58.5, 73.
(b) Lower fence \( = Q_1 - 1.5(\text{IQR}) = 45 - 1.5(58.5 - 45) = 24.75 \);
Upper fence \( = Q_3 + 1.5(\text{IQR}) = 58.5 + 1.5(58.5 - 45) = 78.75 \). There are no outliers.
(c) The median is near the center of the box and the horizontal lines are approximately equal in length, so the distribution is symmetric. This is confirmed by the histogram.
(d) Since the distribution is symmetric and contains no outliers, the mean and standard deviation should be reported as the measures of central tendency and dispersion.
12. (a) The data in ascending order are as follows:
3.01, 3.04, 3.25, 3.38, 3.38, 3.56, 3.78, 4.35, 4.43, 4.50, 4.74, 4.88, 5.00, 5.02, 5.32, 5.34, 5.53, 5.58, 5.64, 5.75, 6.06, 6.07, 6.23, 6.52, 6.57, 6.92, 7.16, 7.25, 7.57, 7.97, 8.40, 8.74, 9.70, 10.32, 10.96
The smallest number in the data set is 3.01. The largest number in the data set is 10.96. The first quartile is \( Q_1 = 4.43 \) (the 9th data point). The median is \( M = 5.58 \) (the 18th data point). The third quartile is \( Q_3 = 7.16 \) (the 27th data point). The five-number summary is 3.01, 4.43, 5.58, 7.16, 10.96.
(b) Lower fence \( = Q_1 - 1.5(\text{IQR}) = 4.43 - 1.5(7.16 - 4.43) = 0.335 \);
Upper fence \( = Q_3 + 1.5(\text{IQR}) = 7.16 + 1.5(7.16 - 4.43) = 11.255 \). There are no outliers.
(c) The median is to the left of the center of the box and the right line is substantially longer than the left line, so the distribution is skewed right. This is confirmed by the histogram.
(d) Since the distribution is skewed, the median and interquartile range should be reported as the measures of central tendency and dispersion.
13. (a) The data in ascending order are as follows:
0, 0, 0, 0, 0, 0, 0, 0.41, 0.62, 0.64, 0.67, 0.89, 0.94, 1.05, 1.06, 1.15, 1.22, 1.35, 1.68, 1.7, 1.7, 2.04, 2.07, 2.16, 2.38, 2.45, 2.59, 2.83
The smallest number in the data set is 0. The largest number in the data set is 2.83. The first quartile is \( Q_1 = \frac{0 + 0.41}{2} = 0.205 \) (the mean of the 7th and 8th data points). The median is \( M = \frac{1.05 + 1.06}{2} = 1.055 \) (the mean of the 14th and 15th data points). The third quartile is \( Q_3 = \frac{1.7 + 2.04}{2} = 1.87 \) (the mean of the 21st and 22nd data points). The five-number summary is 0, 0.205, 1.055, 1.87, 2.83.
(b) Lower fence \( = Q_1 - 1.5(\text{IQR}) = 0.205 - 1.5(1.87 - 0.205) = -2.2925 \);
Upper fence \( = Q_3 + 1.5(\text{IQR}) = 1.87 + 1.5(1.87 - 0.205) = 4.3675 \). Thus, there are no outliers.
(c) The right line is substantially longer than the left line, so the distribution is skewed right. This is confirmed by the histogram.
(d) Since the distribution is skewed, the median and interquartile range should be reported as the measures of central tendency and dispersion.
14. (a) The data in ascending order are as follows:
78, 107, 108, 161, 177, 225, 234, 237, 255, 262, 268, 274, 279, 285, 286, 291, 292, 311, 314, 343, 345, 351, 352, 352, 357, 375, 377, 402, 424, 444, 459, 470, 484, 496, 503, 539, 540, 553, 563, 579, 593, 599, 621, 638, 662, 717, 740, 770, 770, 822, 1633
The smallest number in the data set is 78. The largest is 1633. The first quartile is \( Q_1 = 279 \) (the 13th data point). The median is \( M = 375 \) (the 26th data point). The third quartile is \( Q_3 = 563 \) (the 39th data point). The five-number summary is 78, 279, 375, 563, 1633.
(b) Lower fence \( = Q_1 - 1.5(\text{IQR}) = 279 - 1.5(563 - 279) = -147 \);
Upper fence \( = Q_3 + 1.5(\text{IQR}) = 563 + 1.5(563 - 279) = 989 \). Thus, the data point 1633 is an outlier.
(c) The median is to the left of the center of the box, so the distribution is skewed right. This is confirmed by the histogram.
(d) Since the distribution is skewed, the median and interquartile range should be reported as the measures of central tendency and dispersion.
15. The data in ascending order are:
Keebler: 20, 20, 21, 21, 21, 22, 23, 24, 24, 24, 25, 25, 26, 28, 28, 28, 28, 29, 31, 32, 33
Store Brand: 16, 17, 18, 21, 21, 21, 23, 23, 24, 24, 24, 25, 26, 26, 27, 27, 28, 29, 30, 31, 33
Since both sets of data contain n = 21 data points, the quartiles are in the same positions for both sets. Namely, the first quartile is the mean of the 5th and 6th data points, the median is the 11th data point, and the third quartile is the mean of the 16th and 17th data points.
The five-number summaries are:
Keebler: 20, 21.5, 25, 28, 33
Store Brand: 16, 21, 24, 27.5, 33
The fences for Keebler Chips Deluxe Chocolate Chip Cookies are:
Lower fence \( = 21.5 - 1.5(28 - 21.5) = 11.75 \); Upper fence \( = 28 + 1.5(28 - 21.5) = 37.75 \)
The fences for the store brand chocolate chip cookies are:
Lower fence \( = 21 - 1.5(27.5 - 21) = 11.25 \); Upper fence \( = 27.5 + 1.5(27.5 - 21) = 37.25 \)
So, neither data set has any outliers.
Keebler appears to have both a higher number of chocolate chips per cookie and the more consistent number of chips per cookie.
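A side-by-side comparison like the one in Problem 15 can be sketched in Python. The helper names `position_value` and `five_number_summary` are hypothetical, and the quartiles follow the \( (n + 1) \) index rule used in these solutions rather than any particular library default.

```python
def position_value(sorted_data, k):
    """kth percentile by the (n + 1) index rule."""
    n = len(sorted_data)
    whole, remainder = divmod(k * (n + 1), 100)
    if remainder == 0:
        return sorted_data[whole - 1]
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    return (data[0], position_value(data, 25), position_value(data, 50),
            position_value(data, 75), data[-1])

keebler = [20, 20, 21, 21, 21, 22, 23, 24, 24, 24, 25, 25, 26, 28, 28,
           28, 28, 29, 31, 32, 33]
store = [16, 17, 18, 21, 21, 21, 23, 23, 24, 24, 24, 25, 26, 26, 27,
         27, 28, 29, 30, 31, 33]

for name, chips in (("Keebler", keebler), ("Store Brand", store)):
    low, q1, med, q3, high = five_number_summary(chips)
    print(f"{name}: summary={(low, q1, med, q3, high)}, "
          f"IQR={q3 - q1}, range={high - low}")
```

For these two lists the interquartile ranges coincide, so the difference in spread shows up in the ranges.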
16. The data in ascending order are:
Oklahoma: 18, 30, 40, 44, 47, 55, 61, 62, 64, 64, 73, 78, 79, 83, 145
Kansas: 42, 59, 62, 64, 68, 71, 73, 88, 91, 92, 95, 101, 113, 116, 122
Nebraska: 26, 28, 30, 55, 60, 61, 62, 63, 65, 69, 74, 81, 88, 102, 110
Since all three sets of data contain n = 15 data points, the quartiles are in the same positions for all three sets. Namely, the first quartile is the 4th data point, the median is the 8th data point, and the third quartile is the 12th data point.
The five-number summaries are:
Oklahoma: 18, 44, 62, 78, 145
Kansas: 42, 64, 88, 101, 122
Nebraska: 26, 55, 63, 81, 110
Oklahoma: Lower fence \( = 44 - 1.5(78 - 44) = -7 \); Upper fence \( = 78 + 1.5(78 - 44) = 129 \), so 145 is an outlier.
Kansas: Lower fence \( = 64 - 1.5(101 - 64) = 8.5 \); Upper fence \( = 101 + 1.5(101 - 64) = 156.5 \), so there are no outliers.
Nebraska: Lower fence \( = 55 - 1.5(81 - 55) = 16 \); Upper fence \( = 81 + 1.5(81 - 55) = 120 \), so there are no outliers.
Kansas appears to have a higher number of tornados per year.
17. The data in ascending order are:
McGwire: 340, 341, 350, 350, 360, 360, 360, 369, 370, 370, 370, 370, 377, 380, 380, 380, 380, 380, 385, 385, 388, 390, 390, 390, 390, 398, 400, 400, 409, 410, 410, 410, 410, 410, 420, 420, 420, 420, 420, 423, 425, 430, 430, 430, 430, 430, 430, 430, 440, 440, 440, 450, 450, 450, 450, 452, 458, 460, 460, 461, 470, 470, 470, 478, 480, 500, 510, 510, 527, 550
The smallest number in the data set is 340. The largest number is 550. The first quartile is \( Q_1 = 380 \) (the mean of the 17th and 18th data points). The median is \( M = 420 \) (the mean of the 35th and 36th data points). The third quartile is \( Q_3 = 450 \) (the mean of the 53rd and 54th data points). The five-number summary for Mark McGwire is 340, 380, 420, 450, 550.
Lower fence \( = 380 - 1.5(450 - 380) = 275 \); Upper fence \( = 450 + 1.5(450 - 380) = 555 \). Thus, there are no outliers.
Sosa: 340, 344, 350, 350, 350, 360, 364, 364, 365, 366, 368, 370, 370, 370, 370, 370, 371, 380, 380, 380, 380, 380, 380, 388, 390, 390, 400, 400, 400, 400, 400, 405, 410, 410, 410, 410, 410, 414, 415, 420, 420, 420, 420, 420, 420, 420, 420, 430, 430, 430, 430, 430, 430, 433, 433, 434, 434, 440, 440, 440, 450, 460, 480, 480, 482, 500
The smallest number in the data set is 340. The largest number is 500. The first quartile is \( Q_1 = 370.5 \) (the mean of the 16th and 17th data points). The median is \( M = 410 \) (the mean of the 33rd and 34th data points). The third quartile is \( Q_3 = 430 \) (the mean of the 50th and 51st data points). The five-number summary for Sammy Sosa is 340, 370.5, 410, 430, 500.
Lower fence \( = 370.5 - 1.5(430 - 370.5) = 281.25 \);
Upper fence \( = 430 + 1.5(430 - 370.5) = 519.25 \). Thus, there are no outliers. (Note: The TI-84 gives \( Q_1 = 371 \) because the calculator uses a different, but acceptable, procedure for determining the quartiles. In most cases, the different procedures produce the same results, but in this case, they differ slightly.)
Bonds: 320, 320, 347, 350, 360, 360, 360, 361, 365, 370, 370, 375, 375, 375, 375, 380, 380, 380, 380, 380, 385, 390, 390, 391, 394, 396, 400, 400, 400, 400, 404, 405, 410, 410, 410, 410, 410, 410, 410, 410, 410, 410, 411, 415, 415, 416, 417, 417, 420, 420, 420, 420, 420, 420, 420, 420, 429, 430, 430, 430, 430, 430, 435, 435, 436, 440, 440, 440, 440, 442, 450, 454, 488
The smallest number in the data set is 320. The largest number is 488. The first quartile is \( Q_1 = 380 \) (the mean of the 18th and 19th data points). The median is \( M = 410 \) (the 37th data point). The third quartile is \( Q_3 = 420 \) (the mean of the 55th and 56th data points). The five-number summary for Barry Bonds is 320, 380, 410, 420, 488.
Lower fence \( = 380 - 1.5(420 - 380) = 320 \); Upper fence \( = 420 + 1.5(420 - 380) = 480 \). Thus, 488 is an outlier.
Mark McGwire appears to have longer distances. Barry Bonds appears to have the most consistent distances.
Chapter 3 Review Exercises
1. (a) \( \bar{x} = \frac{\sum x}{n} = \frac{7925.1}{10} = 792.51 \) m/s; \( M = \frac{792.4 + 792.4}{2} = 792.4 \) m/s
Data in order: 789.6, 791.4, 791.7, 792.3, 792.4, 792.4, 793.1, 793.8, 794.0, 794.4
(b) Range = Largest Data Value − Smallest Data Value = 794.4 − 789.6 = 4.8 m/s.
Data, \( x_i \)   Sample Mean, \( \bar{x} \)   Deviations, \( x_i - \bar{x} \)   Squared Deviations, \( (x_i - \bar{x})^2 \)
793.8    792.51    793.8 − 792.51 = 1.29     \( 1.29^2 = 1.6641 \)
793.1    792.51    793.1 − 792.51 = 0.59     \( 0.59^2 = 0.3481 \)
792.4    792.51    792.4 − 792.51 = −0.11    \( (-0.11)^2 = 0.0121 \)
794.0    792.51    794.0 − 792.51 = 1.49     \( 1.49^2 = 2.2201 \)
791.4    792.51    791.4 − 792.51 = −1.11    \( (-1.11)^2 = 1.2321 \)
792.4    792.51    792.4 − 792.51 = −0.11    \( (-0.11)^2 = 0.0121 \)
791.7    792.51    791.7 − 792.51 = −0.81    \( (-0.81)^2 = 0.6561 \)
792.3    792.51    792.3 − 792.51 = −0.21    \( (-0.21)^2 = 0.0441 \)
789.6    792.51    789.6 − 792.51 = −2.91    \( (-2.91)^2 = 8.4681 \)
794.4    792.51    794.4 − 792.51 = 1.89     \( 1.89^2 = 3.5721 \)
\( \sum x_i = 7925.1 \); \( \sum (x_i - \bar{x}) = 0 \); \( \sum (x_i - \bar{x})^2 = 18.2290 \)
\( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{18.2290}{10 - 1} \approx 2.03 \ (\text{m/s})^2 \);
\( s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{18.2290}{10 - 1}} \approx 1.42 \) m/s.
126.8 beats/minx
nx = = =∑ ; 128 129
2128.5 beats/min.M +
= =
Data in order: 86, 96, 115, 120, 128, 129, 136, 143, 146, 169 (b) Range = Largest Data Value – Smallest Data Value = 169 86 83 beats/min.− =
( )2Data, Sample Mean, Deviations, Squared Deviations, 136 126.8 9.2 84.64169 126.8 42.2 1780.84120 126.8 6.8 46.24128 126.8 1.2 1.44129 126.8 2.2 4.84143 126.8 16.2 262.44115 126.8 11.8 139.24146 126.8 19.2 368.64
i i ix x x x x x− −
−
−
( ) ( )2
96 126.8 30.8 948.6486 126.8 40.8 1664.64
1268 0 5301.60i ix x x x x
−−
= − = − =∑ ∑ ∑
Chapter 3 Review Exercises
177
( )2
2 25301.601 10 1
589.1 (beats/min.)ix xn
s−
− −= = ≈∑ ;
( )25301.60
1 10 124.3 beats/min.ix x
ns
−
− −= = ≈∑
3. (a) 91, 6109
10,178.8889 $10,178.89x
nx = = ≈ ≈∑ ; $9,980M =
Data in order: 5500, 7200, 7889, 8998, 9980, 10995, 12999, 13999, 14050 (b) Range = Largest Data Value – Smallest Data Value 14,050 5,500 $8,550= − = .
( )2Data, Sample Mean, Deviations, Squared Deviations, 14,050 10,178.8889 3871.1111 14,985,501.113,999 10,178.8889 3820.1111 14,593,248.812,999 10,178.8889 2820.1111 7,953,026.610,995 10,178.8889 816.111
i i ix x x x x x− −
1 666,037.39,980 10,178.8889 198.8889 39,556.88,998 10,178.8889 1180.8889 1,394,498.67,889 10,178.8889 2289.8889 5,243,591.27,200 10,178.8889 2978.8889 8,873,779.15,550 10,178.8889 4678.8889 21,892,001.3
91,61x
−−−−−
=∑ ( ) ( )20 0 75,641,240.9i ix x x x− = − =∑ ∑
( )2
75, 641, 240.91 9 1
$3,074.92ix xn
s−
− −= = ≈∑ .
(c) 118, 6109
13,178.8889 $13,178.89x
nx = = ≈ ≈∑
Data in order: 5500, 7200, 7889, 8998, 9980, 10995, 12999, 13999, 41050 $9,980M = ; Range 41,050 5,500 $35,550= − = .
( )2Data, Sample Mean, Deviations, Squared Deviations, 41,050 13,178.8889 27,871.1111 776,798,833.913,999 13,178.8889 820.1111 672,582.212,999 13,178.8889 179.8889 32,360.010,995 13,178.8889 2183.8889 4
i i ix x x x x x− −
−− ,769,370.7
9,980 13,178.8889 3198.8889 10,232,890.18,998 13,178.8889 4180.8889 17,479,831.97,889 13,178.8889 5289.8889 27,982,924.57,200 13,178.8889 5978.8889 35,747,112.45,550 13,178.8889 7678.8889 58,965,334.7
−−−−−
( ) ( )2118,610 0 932,681,240.9i ix x x x x= − = − =∑ ∑ ∑
Chapter 3 Numerically Summarizing Data
178
( )2
932, 681, 240.91 9 1
$10,797.46ix xn
s−
− −= = ≈∑ .
The mean, range, and standard deviation are all changed considerably by the incorrectly entered data value. The median does not change. The median is resistant.
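The effect of the mistyped value in Problem 3 can be seen by summarizing both versions of the data. The following Python sketch uses the standard `statistics` module; the helper name `summarize` is an illustrative choice.

```python
import statistics

# Prices from Review Exercise 3; the second list has 14,050 mistyped as 41,050.
correct = [14050, 13999, 12999, 10995, 9980, 8998, 7889, 7200, 5500]
mistyped = [41050 if x == 14050 else x for x in correct]

def summarize(values):
    """Mean, median, range, and sample standard deviation of the values."""
    return {
        "mean": round(statistics.mean(values), 2),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": round(statistics.stdev(values), 2),
    }

before = summarize(correct)
after = summarize(mistyped)

for measure in before:
    changed = "changed" if before[measure] != after[measure] else "unchanged"
    print(f"{measure:>6}: {before[measure]} -> {after[measure]} ({changed})")
```

Only the median is reported as unchanged, which is what resistance means in this context.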
4. (a) \( \bar{x} = \frac{\sum x}{n} = \frac{2{,}071{,}024}{15} \approx 138{,}068.2667 \approx \$138{,}068 \)
Data in order: 99000, 115000, 124757, 128429, 135512, 136529, 136833, 136924, 138820, 140794, 149143, 149380, 153146, 157216, 169541
\( M = \$136{,}924 \)
(b) Range = Largest Data Value − Smallest Data Value = 169,541 − 99,000 = $70,541.
Data, \( x_i \)   Sample Mean, \( \bar{x} \)   Deviations, \( x_i - \bar{x} \)   Squared Deviations, \( (x_i - \bar{x})^2 \)
138,820    138,068.2667        751.7333          565,103
169,541    138,068.2667     31,472.7333      990,532,941
135,512    138,068.2667     −2,556.2667        6,534,499
149,143    138,068.2667     11,074.7333      122,649,717
140,794    138,068.2667      2,725.7333        7,429,622
153,146    138,068.2667     15,077.7333      227,338,041
99,000     138,068.2667    −39,068.2667    1,526,329,462
136,924    138,068.2667     −1,144.2667        1,309,346
136,833    138,068.2667     −1,235.2667        1,525,884
115,000    138,068.2667    −23,068.2667      532,144,928
124,757    138,068.2667    −13,311.2667      177,189,821
128,429    138,068.2667     −9,639.2667       92,915,463
157,216    138,068.2667     19,147.7333      366,635,690
149,380    138,068.2667     11,311.7333      127,955,310
136,529    138,068.2667     −1,539.2667        2,369,342
\( \sum x_i = 2{,}071{,}024 \); \( \sum (x_i - \bar{x}) = 0 \); \( \sum (x_i - \bar{x})^2 = 4{,}183{,}425{,}169 \)
\( s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{4{,}183{,}425{,}169}{15 - 1}} \approx \$17{,}286.30 \).
5. (a) \( \mu = \frac{\sum x}{N} = \frac{933}{16} \approx 58.3 \) years
Data in order: 44, 46, 51, 55, 56, 56, 56, 58, 59, 62, 62, 62, 64, 65, 68, 69
\( M = \frac{58 + 59}{2} = 58.5 \) years
The data are bimodal: 56 years and 62 years. Both have frequencies of 3.
(b) Range = 69 − 44 = 25 years
To calculate the population standard deviation, we use the computational formula:
\( \sigma = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N}} = \sqrt{\frac{55{,}169 - \frac{(933)^2}{16}}{16}} \approx 6.9 \) years
Data value, \( x_i \)   Data value squared, \( x_i^2 \)
44    1936
56    3136
51    2601
46    2116
59    3481
56    3136
58    3364
55    3025
65    4225
64    4096
68    4624
69    4761
56    3136
62    3844
62    3844
62    3844
\( \sum x_i = 933 \); \( \sum x_i^2 = 55{,}169 \)
(c) Answers will vary depending on samples selected.
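The computational formula in Problem 5 needs only the two column totals. The following Python sketch recomputes them from the table and cross-checks the summary values with the standard `statistics` module; the variable names are illustrative.

```python
import math
import statistics

# Data values from the table in Review Exercise 5, in table order.
ages = [44, 56, 51, 46, 59, 56, 58, 55, 65, 64, 68, 69, 56, 62, 62, 62]

N = len(ages)
sum_x = sum(ages)                   # sum of the data values
sum_x2 = sum(x * x for x in ages)   # sum of the squared data values

# Computational formula for the population standard deviation.
sigma = math.sqrt((sum_x2 - sum_x ** 2 / N) / N)

print(f"sum x = {sum_x}, sum x^2 = {sum_x2}")
print(f"mu = {sum_x / N:.1f}, median = {statistics.median(ages)}")
print(f"modes = {statistics.multimode(ages)}")
print(f"sigma = {sigma:.1f} years")
```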
6. (a) To find the mean, we find \( \sum x_i = 2846 \) and \( N = 16 \), so \( \mu = \frac{2846}{16} \approx 177.9 \) home runs. To find the median, we put the data in order and find the mean of the 8th and 9th data values: \( M = \frac{183 + 185}{2} = 184 \) home runs. The mode is the most frequent data value, which is 135 home runs.
(b) Range = 235 − 135 = 100 home runs. To find the standard deviation, we determine \( \sum x_i^2 = 521{,}902 \). So,
\( \sigma = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N}} = \sqrt{\frac{521{,}902 - \frac{(2846)^2}{16}}{16}} \approx 31.3 \) home runs.
(c) Answers will vary.
(d) The reporter is not lying because the mode is an “average”. He is being deceptive, however, because the word “average” is usually meant as the mean.
7. (a) To find the mean, we determine \( \sum x_i = 78 \) and \( n = 36 \), so \( \bar{x} = \frac{78}{36} \approx 2.2 \) children. To find the median, we put the data in order and find the mean of the 18th and 19th data values: \( M = \frac{2 + 3}{2} = 2.5 \) children.
(b) Range = 4 − 0 = 4 children. To find the standard deviation, we determine \( \sum x_i^2 = 224 \).
\( s = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{224 - \frac{(78)^2}{36}}{36 - 1}} \approx 1.3 \) children.
8. (a) To find the mean, we determine \( \sum x_i = 134 \) and \( n = 30 \), so \( \bar{x} = \frac{134}{30} \approx 4.5 \) cars. To find the median, we put the data in order and find the mean of the 15th and 16th data values: \( M = \frac{4 + 5}{2} = 4.5 \) cars.
(b) Range = 9 − 1 = 8 cars. To find the standard deviation, we determine \( \sum x_i^2 = 754 \).
\( s = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{754 - \frac{(134)^2}{30}}{30 - 1}} \approx 2.3 \) cars.
9. (a) By the Empirical Rule, approximately 99.7% of the data will be within 3 standard
deviations of the mean. Now, 600 – 3(53) = 441 and 600 + 3(53) = 759. Thus, about 99.7% of light bulbs have lifetimes between 441 and 759 hours.
(b) Since 494 is exactly 2 standard deviations below the mean [494 = 600 – 2(53)] and 706 is exactly 2 standard deviations above the mean [706 = 600 + 2(53)], the Empirical Rule predicts that approximately 95% of the light bulbs will have lifetimes between 494 and 706 hours.
(c) Since 547 is exactly 1 standard deviation below the mean [547 = 600 – 1(53)] and 706 is exactly 2 standard deviations above the mean [706 = 600 + 2(53)], the Empirical Rule predicts that approximately 34 + 47.5 = 81.5% of the light bulbs will have lifetimes between 547 and 706 hours.
(d) Since 441 hours is 3 standard deviations below the mean [441 = 600 – 3(53)], the Empirical Rule predicts that 0.15% of light bulbs will last less than 441 hours. Thus, the company should expect to replace about 0.15% of the light bulbs.
(e) By Chebyshev’s theorem, at least \( \left(1 - \dfrac{1}{k^2}\right) \cdot 100\% = \left(1 - \dfrac{1}{2.5^2}\right) \cdot 100\% = 84\% \) of all the light bulbs are within k = 2.5 standard deviations of the mean.
(f) Since 494 is exactly k = 2 standard deviations below the mean [494 = 600 – 2(53)] and 706 is exactly 2 standard deviations above the mean [706 = 600 + 2(53)], Chebyshev’s inequality indicates that at least \( \left(1 - \dfrac{1}{k^2}\right) \cdot 100\% = \left(1 - \dfrac{1}{2^2}\right) \cdot 100\% = 75\% \) of the light bulbs will have lifetimes between 494 and 706 hours.
10. (a) By the Empirical Rule, approximately 99.7% of the data will be within 3 standard deviations of the mean. Now, 4302 – 3(340) = 3282 and 4302 + 3(340) = 5322. Thus, about 99.7% of toner cartridges will print between 3282 and 5322 pages.
(b) Since 3622 is exactly 2 standard deviations below the mean [3622 = 4302 – 2(340)] and 4982 is exactly 2 standard deviations above the mean [4982 = 4302 + 2(340)], the Empirical Rule predicts that approximately 95% of the toner cartridges will print between 3622 and 4982 pages.
(c) Since 3622 is exactly 2 standard deviations below the mean [3622 = 4302 – 2(340)], the Empirical Rule predicts that 0.15 + 2.35 = 2.5% of the toner cartridges will print fewer than 3622 pages. Thus, the company should expect to replace about 2.5% of the toner cartridges.
(d) By Chebyshev’s theorem, at least \( \left(1 - \dfrac{1}{k^2}\right) \cdot 100\% = \left(1 - \dfrac{1}{1.5^2}\right) \cdot 100\% \approx 55.6\% \) of all the toner cartridges are within k = 1.5 standard deviations of the mean.
(e) Since 3282 is exactly k = 3 standard deviations below the mean [3282 = 4302 – 3(340)] and 5322 is exactly 3 standard deviations above the mean [5322 = 4302 + 3(340)], Chebyshev’s inequality indicates that at least \( \left(1 - \dfrac{1}{k^2}\right) \cdot 100\% = \left(1 - \dfrac{1}{3^2}\right) \cdot 100\% \approx 88.9\% \) of the toner cartridges will print between 3282 and 5322 pages.
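The interval endpoints and Chebyshev bounds used in Problems 9 and 10 can be checked with a short script. This is a sketch only; the function names `k_sigma_interval` and `chebyshev_percent` are hypothetical helpers introduced for illustration.

```python
def k_sigma_interval(mean: float, sd: float, k: float) -> tuple:
    """Endpoints of the interval lying within k standard deviations of the mean."""
    return (mean - k * sd, mean + k * sd)

def chebyshev_percent(k: float) -> float:
    """Chebyshev lower bound (in percent) on the data within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is informative only for k > 1")
    return (1 - 1 / k ** 2) * 100

# Problem 9: light bulbs with mean 600 hours and standard deviation 53 hours.
print(k_sigma_interval(600, 53, 3), round(chebyshev_percent(2.5), 1))

# Problem 10: toner cartridges with mean 4302 pages and standard deviation 340 pages.
print(k_sigma_interval(4302, 340, 2), round(chebyshev_percent(1.5), 1))
```

Chebyshev's bound holds for any distribution, which is why it is weaker than the Empirical Rule percentages (for example, at least 75% versus approximately 95% within 2 standard deviations); the Empirical Rule applies only to roughly bell-shaped data.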
11.
Class      Midpt, x_i   Freq, f_i   x_i f_i      μ         x_i − μ     (x_i − μ)² f_i
20 – 24    22.5         6035        135,787.5    42.2826   −19.7826    2,361,804.87
25 – 29    27.5         4352        119,680      42.2826   −14.7826    951,021.94
30 – 34    32.5         4083        132,697.5    42.2826   −9.7826     390,740.09
35 – 39    37.5         3933        147,487.5    42.2826   −4.7826     89,960.54
40 – 44    42.5         4194        178,245      42.2826   0.2174      198.22
45 – 49    47.5         3716        176,510      42.2826   5.2174      101,154.21
50 – 54    52.5         3005        157,762.5    42.2826   10.2174     313,707.76
55 – 59    57.5         2355        135,412.5    42.2826   15.2174     545,345.61
60 – 64    62.5         1664        104,000      42.2826   20.2174     680,148.79
65 – 69    67.5         1173        79,177.5     42.2826   25.2174     745,930.95
70 – 74    72.5         1025        74,312.5     42.2826   30.2174     935,918.54
75 – 79    77.5         895         69,362.5     42.2826   35.2174     1,110,037.41
80 – 84    82.5         744         61,380       42.2826   40.2174     1,203,374.81
Totals: Σf_i = 37,174; Σx_i f_i = 1,571,815; Σ(x_i − μ)² f_i = 9,429,343.76
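(a) \( \mu = \dfrac{\sum x_i f_i}{\sum f_i} = \dfrac{1{,}571{,}815}{37{,}174} \approx 42.2826 \approx 42.28 \) years
(b) \( \sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\dfrac{9{,}429{,}343.76}{37{,}174}} \approx 15.93 \) years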
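The grouped-data mean and standard deviation in Problem 11 can be reproduced from the class midpoints and frequencies alone. The sketch below assumes the midpoints 22.5, 27.5, ..., 82.5 from the table; the variable names are illustrative.

```python
from math import sqrt

# Class midpoints 22.5, 27.5, ..., 82.5 and the matching frequencies from the table.
midpoints = [22.5 + 5 * i for i in range(13)]
freqs = [6035, 4352, 4083, 3933, 4194, 3716, 3005,
         2355, 1664, 1173, 1025, 895, 744]

total_f = sum(freqs)
sum_xf = sum(x * f for x, f in zip(midpoints, freqs))

# Approximate mean from grouped data: each class is represented by its midpoint.
mu = sum_xf / total_f

# Approximate population standard deviation from grouped data.
sigma = sqrt(sum((x - mu) ** 2 * f for x, f in zip(midpoints, freqs)) / total_f)

print(round(mu, 2), round(sigma, 2))
```

Because the code carries the unrounded mean through the calculation while the table uses μ rounded to 42.2826, the weighted sum of squared deviations may differ slightly from the tabulated total; the difference does not affect the result at two decimal places.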
12.
Class      Midpt, x_i   Freq, f_i   x_i f_i     μ         x_i − μ     (x_i − μ)² f_i
20 – 24    22.5         1903        42,817.5    43.7136   −21.2136    856,382.02
25 – 29    27.5         1415        38,912.5    43.7136   −16.2136    371,976.37
30 – 34    32.5         1364        44,330      43.7136   −11.2136    171,515.94
35 – 39    37.5         1430        53,625      43.7136   −6.2136     55,210.62
40 – 44    42.5         1409        59,882.5    43.7136   −1.2136     2,075.21
45 – 49    47.5         1242        58,995      43.7136   3.7864      17,806.34
50 – 54    52.5         1008        52,920      43.7136   8.7864      77,818.43
55 – 59    57.5         784         45,080      43.7136   13.7864     149,040.82
60 – 64    62.5         599         37,437.5    43.7136   18.7864     211,404.37
65 – 69    67.5         415         28,012.5    43.7136   23.7864     234,804.02
70 – 74    72.5         482         34,945      43.7136   28.7864     399,412.59
75 – 79    77.5         456         35,340      43.7136   33.7864     520,533.50
80 – 84    82.5         372         30,690      43.7136   38.7864     559,631.15
Totals: Σf_i = 12,879; Σx_i f_i = 562,987.5; Σ(x_i − μ)² f_i = 3,627,581.38
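(a) \( \mu = \dfrac{\sum x_i f_i}{\sum f_i} = \dfrac{562{,}987.5}{12{,}879} \approx 43.7136 \approx 43.71 \) years
(b) \( \sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\dfrac{3{,}627{,}581.38}{12{,}879}} \approx 16.78 \) years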
(c) The mean age of a female involved in a traffic fatality is greater than the mean age of a male involved in a traffic fatality. Also, the ages of females involved in a traffic fatality are more dispersed. Answers will vary. One possibility is that an insurance company might use this information to help establish the rates it would charge for insuring drivers.
13. \( \text{GPA} = \bar{x}_w = \dfrac{\sum w_i x_i}{\sum w_i} = \dfrac{5(4) + 4(3) + 3(4) + 3(2)}{5 + 4 + 3 + 3} = \dfrac{50}{15} \approx 3.33 \)
14. Cost per pound \( = \bar{x}_w = \dfrac{\sum w_i x_i}{\sum w_i} = \dfrac{2(\$2.70) + 1(\$1.30) + \frac{1}{2}(\$1.80)}{2 + 1 + \frac{1}{2}} \approx \$2.17/\text{lb} \)
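Both weighted means in Problems 13 and 14 follow the same pattern: multiply each value by its weight, add the products, and divide by the total weight. A minimal sketch, with `weighted_mean` as a hypothetical helper name:

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Problem 13: grade points 4, 3, 4, 2 weighted by credit hours 5, 4, 3, 3.
gpa = weighted_mean([4, 3, 4, 2], [5, 4, 3, 3])

# Problem 14: prices per pound weighted by 2 lb, 1 lb, and 1/2 lb.
cost_per_lb = weighted_mean([2.70, 1.30, 1.80], [2, 1, 0.5])

print(round(gpa, 2), round(cost_per_lb, 2))
```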
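15. (a) Yankees: \( \sum x_i = 184{,}193{,}950 \) and \( n = 29 \), so \( \mu_{\text{Yankees}} = \dfrac{184{,}193{,}950}{29} \approx \$6{,}351{,}516 \).
Mets: \( \sum x_i = 96{,}660{,}970 \) and \( n = 28 \), so \( \mu_{\text{Mets}} = \dfrac{96{,}660{,}970}{28} \approx \$3{,}452{,}177 \).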
(b) Yankees: \( M_{\text{Yankees}} = \$3{,}100{,}000 \) (the 15th data value)
Mets: \( M_{\text{Mets}} = \dfrac{800{,}000 + 1{,}000{,}000}{2} = \$900{,}000 \) (the mean of the 14th and 15th values)
(c) In both cases, the mean is substantially larger than the median, so both distributions are skewed right.
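(d) Yankees: \( \sum x_i^2 \approx 2.3001 \times 10^{15} \), so
\[ \sigma_{\text{Yankees}} = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N}} = \sqrt{\frac{2.3001 \times 10^{15} - \frac{(184{,}193{,}950)^2}{29}}{29}} \approx \$6{,}242{,}767.0 \]
Mets: \( \sum x_i^2 \approx 9.37457 \times 10^{14} \), so
\[ \sigma_{\text{Mets}} = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N}} = \sqrt{\frac{9.37457 \times 10^{14} - \frac{(96{,}660{,}970)^2}{28}}{28}} \approx \$4{,}643{,}606.1 \]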
(e) Yankees: $301,400; $837,500; $3,100,000; $11,623,571.50; $22,000,000
Mets: $300,000; $318,750; $900,000; $4,666,666.50; $17,166,667
(f) Fences for the Yankees:
Lower fence = 837,500 − 1.5(11,623,571.50 − 837,500) = −$15,341,607.25
Upper fence = 11,623,571.50 + 1.5(11,623,571.50 − 837,500) = $27,802,678.75
The Yankees have no outliers.
Fences for the Mets:
Lower fence = 318,750 − 1.5(4,666,666.50 − 318,750) = −$6,203,124.75
Upper fence = 4,666,666.50 + 1.5(4,666,666.50 − 318,750) = $11,188,541.25
The data values $16,071,429 (Vaughn) and $17,166,667 (Piazza) are outliers.
Annotations will vary. One possibility is that the Mets’ salaries are clearly lower and less dispersed than the Yankees’ salaries.
(g) In both boxplots, the median is to the left of the center of the box and the right line is substantially longer than the left line, so both distributions are skewed right.
(h) For both distributions, the median is the better measure of central tendency since the distributions are skewed.
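The outlier check in Problem 15(f) uses fences placed 1.5 interquartile ranges beyond the first and third quartiles. The sketch below recomputes them from the quartiles reported in part (e); `fences` is a hypothetical helper name, and the two large Mets salaries are the ones named in the solution.

```python
def fences(q1: float, q3: float) -> tuple:
    """Lower and upper fences: 1.5 * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return (q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Quartiles taken from the five-number summaries in part (e).
yankees_low, yankees_high = fences(837_500, 11_623_571.50)
mets_low, mets_high = fences(318_750, 4_666_666.50)

# The Yankees' maximum is inside the fences, so no salary is flagged.
yankees_has_outlier = 22_000_000 > yankees_high

# The two largest Mets salaries named in the solution both exceed the upper fence.
mets_outliers = [x for x in (16_071_429, 17_166_667) if x > mets_high]

print(yankees_low, yankees_high, mets_low, mets_high)
print(yankees_has_outlier, mets_outliers)
```

A negative lower fence simply means no salary can be a low outlier, since salaries cannot fall below zero.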
16. (a) Material A: \( \sum x_i = 64.04 \) and \( n = 10 \), so \( \bar{x}_A = \dfrac{64.04}{10} = 6.404 \) million cycles.
Material B: \( \sum x_i = 113.32 \) and \( n = 10 \), so \( \bar{x}_B = \dfrac{113.32}{10} = 11.332 \) million cycles.
(b) Material A: \( M_A = \dfrac{5.69 + 5.88}{2} = 5.785 \) million cycles (the mean of the 5th and 6th values)
Material B: \( M_B = \dfrac{8.20 + 9.65}{2} = 8.925 \) million cycles (the mean of the 5th and 6th values)
(c) In both cases, the mean is substantially larger than the median, so both distributions are skewed right.
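(d) Material A: \( \sum x_i^2 \approx 472.177 \), so
\[ s_A = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{472.177 - \frac{(64.04)^2}{10}}{10 - 1}} \approx 2.626 \text{ million cycles} \]
Material B: \( \sum x_i^2 \approx 1597.4002 \), so
\[ s_B = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{1597.4002 - \frac{(113.32)^2}{10}}{10 - 1}} \approx 5.900 \text{ million cycles} \]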
(e) Material A: 3.17; 4.52; 5.785; 8.01; 11.92 million cycles
Material B: 5.78; 6.84; 8.925; 14.71; 24.37 million cycles
(f) Fences for Material A:
Lower fence = 4.52 − 1.5(8.01 − 4.52) = −0.715 million cycles
Upper fence = 8.01 + 1.5(8.01 − 4.52) = 13.245 million cycles
Material A has no outliers.
Fences for Material B:
Lower fence = 6.84 − 1.5(14.71 − 6.84) = −4.965 million cycles
Upper fence = 14.71 + 1.5(14.71 − 6.84) = 26.515 million cycles
Material B has no outliers.
[Figure: boxplots titled “Bearing Failures”]
(g) In both boxplots, the median is to the left of the center of the box and the right line is substantially longer than the left line, so both distributions are skewed right.
(h) For both distributions, the median is the better measure of central tendency since the distributions are skewed.
17. The data provided are already listed in ascending order.
(a) \( i = \left(\dfrac{k}{100}\right)(n + 1) = \left(\dfrac{40}{100}\right)(88 + 1) = 35.6 \). Since i = 35.6 is not an integer, we average the 35th and 36th data values: \( P_{40} = \dfrac{366{,}155 + 371{,}479}{2} = \$368{,}817 \). This means that approximately 40% of drivers in the 2004 Nextel Cup Series earned less than $368,817, and approximately 60% of drivers in the 2004 Nextel Cup Series earned more than $368,817.
(b) \( i = \left(\dfrac{k}{100}\right)(n + 1) = \left(\dfrac{95}{100}\right)(88 + 1) = 84.55 \). Since i = 84.55 is not an integer, we average the 84th and 85th data values: \( P_{95} = \dfrac{5{,}692{,}620 + 6{,}221{,}710}{2} = \$5{,}957{,}165 \). This means that approximately 95% of drivers in the 2004 Nextel Cup Series earned less than $5,957,165, and approximately 5% of drivers in the 2004 Nextel Cup Series earned more than $5,957,165.
(c) \( i = \left(\dfrac{k}{100}\right)(n + 1) = \left(\dfrac{10}{100}\right)(88 + 1) = 8.9 \). Since i = 8.9 is not an integer, we average the 8th and 9th data values: \( P_{10} = \dfrac{65{,}175 + 70{,}550}{2} = \$67{,}862.50 \). This means that approximately 10% of drivers in the 2004 Nextel Cup Series earned less than $67,862.50, and approximately 90% of drivers in the 2004 Nextel Cup Series earned more than $67,862.50.
(d) Of the 88 drivers in the 2004 Nextel Cup Series, 73 earned less than $4,117,750.
Percentile rank of $4,117,750 \( = \left(\dfrac{73}{88}\right) \cdot 100 \approx 83 \). Thus, $4,117,750 was at the 83rd percentile. This means that approximately 83% of drivers in the 2004 Nextel Cup Series earned less than $4,117,750, and approximately 17% of drivers in the 2004 Nextel Cup Series earned more than $4,117,750.
(e) Of the 88 drivers in the 2004 Nextel Cup Series, 13 earned less than $116,359.
Percentile rank of $116,359 \( = \left(\dfrac{13}{88}\right) \cdot 100 \approx 15 \). Thus, $116,359 was at the 15th percentile. This means that approximately 15% of drivers in the 2004 Nextel Cup Series earned less than $116,359, and approximately 85% of drivers in the 2004 Nextel Cup Series earned more than $116,359.
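The two calculations in Problem 17, locating the kth percentile and finding the percentile rank of a value, can be sketched as follows. The function names are hypothetical, and the data values are the ones quoted in the solution rather than the full list of 88 earnings.

```python
def percentile_index(k: float, n: int) -> float:
    """Position i = (k / 100)(n + 1) of the kth percentile in n ordered values."""
    return k / 100 * (n + 1)

def percentile_rank(count_below: int, n: int) -> int:
    """Percent of the n values that are below a given value, rounded to a whole number."""
    return round(count_below / n * 100)

n = 88
i40 = percentile_index(40, n)        # falls between the 35th and 36th values

# When i is not an integer, the solution averages the two neighbouring ordered values.
p40 = (366_155 + 371_479) / 2

rank_d = percentile_rank(73, n)      # part (d)
rank_e = percentile_rank(13, n)      # part (e)

print(i40, p40, rank_d, rank_e)
```

Note that other textbooks and software packages use different interpolation conventions for percentiles, so library routines may return slightly different values than this averaging rule.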
18. The data provided are already listed in ascending order.
(a) \( i = \left(\dfrac{k}{100}\right)(n + 1) = \left(\dfrac{30}{100}\right)(88 + 1) = 26.7 \). Since i = 26.7 is not an integer, we average the 26th and 27th data values: \( P_{30} = \dfrac{x_{26} + x_{27}}{2} = \$268{,}422.50 \). This means that approximately 30% of drivers in the 2004 Nextel Cup Series earned less than $268,422.50, and approximately 70% of drivers in the 2004 Nextel Cup Series earned more than $268,422.50.
(b) \( i = \left(\dfrac{k}{100}\right)(n + 1) = \left(\dfrac{90}{100}\right)(88 + 1) = 80.1 \). Since i = 80.1 is not an integer, we average the 80th and 81st data values: \( P_{90} = \dfrac{4{,}759{,}020 + 5{,}152{,}670}{2} = \$4{,}955{,}845 \). This means that approximately 90% of drivers in the 2004 Nextel Cup Series earned less than $4,955,845, and approximately 10% of drivers in the 2004 Nextel Cup Series earned more than $4,955,845.
(c) \( i = \left(\dfrac{k}{100}\right)(n + 1) = \left(\dfrac{5}{100}\right)(88 + 1) = 4.45 \). Since i = 4.45 is not an integer, we average the 4th and 5th data values: \( P_{5} = \dfrac{57{,}450 + 57{,}590}{2} = \$57{,}520 \). This means that approximately 5% of drivers in the 2004 Nextel Cup Series earned less than $57,520, and approximately 95% of drivers in the 2004 Nextel Cup Series earned more than $57,520.
(d) Of the 88 drivers in the 2004 Nextel Cup Series, 49 earned less than $1,333,520.
Percentile rank of $1,333,520 \( = \left(\dfrac{49}{88}\right) \cdot 100 \approx 56 \). Thus, $1,333,520 was at the 56th percentile. This means that approximately 56% of drivers in the 2004 Nextel Cup Series earned less than $1,333,520, and approximately 44% of drivers in the 2004 Nextel Cup Series earned more than $1,333,520.
(e) Of the 88 drivers in the 2004 Nextel Cup Series, 16 earned less than $139,614.
Percentile rank of $139,614 \( = \left(\dfrac{16}{88}\right) \cdot 100 \approx 18 \). Thus, $139,614 was at the 18th percentile. This means that approximately 18% of drivers in the 2004 Nextel Cup Series earned less than $139,614, and approximately 82% of drivers in the 2004 Nextel Cup Series earned more than $139,614.
19. z-score for the female: \( z = \dfrac{x - \mu}{\sigma} = \dfrac{160 - 156.5}{51.2} \approx 0.07 \)
z-score for the male: \( z = \dfrac{x - \mu}{\sigma} = \dfrac{185 - 183.4}{40} = 0.04 \)
The weight of the 160-pound female is 0.07 standard deviations above the mean, while the weight of the 185-pound male is 0.04 standard deviations above the mean. Thus, the 160-pound female is relatively heavier.
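The comparison in Problem 19 rests on converting each weight to a z-score so that the two values are measured on a common scale. A minimal sketch, with `z_score` as a hypothetical helper name:

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_female = z_score(160, 156.5, 51.2)
z_male = z_score(185, 183.4, 40)

# The larger z-score identifies the relatively heavier person.
relatively_heavier = "female" if z_female > z_male else "male"

print(round(z_female, 2), round(z_male, 2), relatively_heavier)
```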
20. (a) Reading the boxplot, the median crime rate is approximately 4050 per 100,000 population.
(b) Reading the boxplot, the 25th percentile crime rate is approximately 3100 per 100,000 population.
(c) Reading the boxplot, there is one outlier. It is approximately 8000.
(d) Reading the boxplot, the lowest crime rate is approximately 2200 per 100,000 population.
Case Study: Who Was “A Mourner”?
1. The table below gives the length of each word, line by line in the passage. A listing is also provided of the proper names, numbers, abbreviations, and titles that have been omitted from the data set.
3, 7, 8, 3, 7, 3, 3, 6, 2, 3, 3, 2, 3
4, 3, 8, 2, 3, 7, 4, 2, 11 (omitted Richardson and 22d)
6, 3, 4, 9, 3, 7, 4, 2, 4, 2, 6, 4, 3
7, 5, 2, 8, 2, 4 (omitted Frogg Lane, Liberty-Tree, and Monday)
4, 3, 3, 7, 2, 7, 3, 4, 2, 10, 2, 6
5, 4, 8, 2, 3, 7, 2, 4, 6, 4, 3, 5, 6, 2
3, 5, 5, 5, 5, 6, 5, 4, 8, 8
2, 3, 8, 7, 2, 3, 6, 3, 6, 2, 3, 9 (omitted appear’d)
3, 6, 4, 3, 3, 7, 3, 5, 2, 9, 3
8, 8, 2, 6, 4, 3, 4, 5, 2, 3, 3, 4, 2, 7
5, 6, 8, 4, 3, 7, 6, 6, 5, 2, 3
6, 12, 5, 6, 2 (omitted Wolfe’s Summit of human Glory)
5, 2, 3, 1, 7, 6, 3, 5, 4, 4, 1, 6, 3
2. Mean = 4.54; Median = 4; Mode = 3; standard deviation ≈ 2.21; sample variance ≈ 4.90; Range = 11; Minimum = 1; Maximum = 12; Sum = 649; Count = 143
Answers will vary. None of the provided authors match both the measures of central tendency and the measures of dispersion well. In other words, there is no clear-cut choice for the author based on the information provided. Based on measures of central tendency, James Otis or Samuel Adams would appear to be the more likely candidates for A MOURNER. Based on measures of dispersion, Tom Sturdy seems the more likely choice. Still, the unknown author’s mean word length differs considerably from that of Sturdy, and the unknown author’s standard deviation differs considerably from those of Otis and Adams.
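The summary statistics reported for the word lengths can be recomputed from the 13 lines of data listed in item 1. The following sketch uses Python's `statistics` module; the list layout mirrors the passage line by line, and the variable names are illustrative.

```python
from statistics import mean, median, mode, stdev, variance

# Word lengths, one inner list per line of the passage (omitted words excluded).
lines = [
    [3, 7, 8, 3, 7, 3, 3, 6, 2, 3, 3, 2, 3],
    [4, 3, 8, 2, 3, 7, 4, 2, 11],
    [6, 3, 4, 9, 3, 7, 4, 2, 4, 2, 6, 4, 3],
    [7, 5, 2, 8, 2, 4],
    [4, 3, 3, 7, 2, 7, 3, 4, 2, 10, 2, 6],
    [5, 4, 8, 2, 3, 7, 2, 4, 6, 4, 3, 5, 6, 2],
    [3, 5, 5, 5, 5, 6, 5, 4, 8, 8],
    [2, 3, 8, 7, 2, 3, 6, 3, 6, 2, 3, 9],
    [3, 6, 4, 3, 3, 7, 3, 5, 2, 9, 3],
    [8, 8, 2, 6, 4, 3, 4, 5, 2, 3, 3, 4, 2, 7],
    [5, 6, 8, 4, 3, 7, 6, 6, 5, 2, 3],
    [6, 12, 5, 6, 2],
    [5, 2, 3, 1, 7, 6, 3, 5, 4, 4, 1, 6, 3],
]
lengths = [x for line in lines for x in line]

summary = {
    "count": len(lengths),
    "sum": sum(lengths),
    "mean": round(mean(lengths), 2),
    "median": median(lengths),
    "mode": mode(lengths),
    "sample_sd": round(stdev(lengths), 2),       # divides by n - 1
    "sample_var": round(variance(lengths), 2),   # divides by n - 1
    "range": max(lengths) - min(lengths),
}
print(summary)
```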
3. Comparing the two Adams summaries, both the measures of center and the measures of variability differ considerably for the two documents. For example, the means differ by 0.09 and the standard deviations differ by 0.19, not to mention the differences in word counts and the maximum length. This calls into question the viability of word-length analysis as a tool for resolving disputed documents. Word-length may be a part of the analysis needed to determine unknown authors, but other variables should also be taken into consideration.
4. Other information that would be useful to identify A MOURNER would be the style of the
rhetoric, vocabulary choices, use of particular phrases, and the overall flow of the writing. In other words, identifying an unknown author requires qualitative analysis in addition to quantitative analysis.