

Chapter 3 Numerically Describing Data from One Variable

3.1 Measures of Central Tendency

1. A statistic is resistant if it is not sensitive to extreme data values. The median is resistant because it is a positional measure of central tendency and increasing the largest value or decreasing the smallest value does not affect the position of the center. The mean is not resistant because it is a function of the sum of the data values. Changing the magnitude of one value changes the sum of the values, and thus affects the mean. The mode is a resistant measure of center.

2. The mean and the median are approximately equal when the data are symmetric. If the mean is significantly greater than the median, the data are skewed right. If the mean is significantly less than the median, the data are skewed left.

3. Since the distribution of household incomes in the United States is skewed to the right, the mean is greater than the median. Thus, the mean household income is $55,263 and the median is $41,349.

4. HUD uses the median because the data are skewed. Explanations will vary. One possibility is that the price of homes has a distribution that is skewed to the right, so the median is more representative of the typical price of a home.

5. The mean will be larger because it will be influenced by the extreme data values that are to the right end (or high end) of the distribution.

6. (10,000 + 1)/2 = 5000.5. The median is between the 5000th and the 5001st ordered values.

7. The mode is used with qualitative data because the computations involved with the mean and median make no sense for qualitative data.

8. parameter; statistic

9. False. A data set may have multiple modes, or it may have no mode at all.

10. False. The formula (n + 1)/2 gives the position of the median, not the value of the median.

11. x̄ = (20 + 13 + 4 + 8 + 10)/5 = 55/5 = 11

12. x̄ = (83 + 65 + 91 + 87 + 84)/5 = 410/5 = 82

13. μ = (3 + 6 + 10 + 12 + 14)/5 = 45/5 = 9


14. μ = (1 + 19 + 25 + 15 + 12 + 16 + 28 + 13 + 6)/9 = 135/9 = 15

15. 142/59 ≈ 2.4. The mean price per ad slot is approximately $2.4 million.

16. Let x represent the missing value. Since there are 6 data values in the list, the median 26.5 is the mean of the 3rd and 4th ordered values, which are 21 and x, respectively. Thus,
(21 + x)/2 = 26.5
21 + x = 53
x = 32
The missing value is 32.

17. Mean = (420 + 462 + 409 + 236)/4 = 1527/4 = $381.75
Data in order: 236, 409, 420, 462
Median = (409 + 420)/2 = 829/2 = $414.50
No data value occurs more than once so there is no mode.

18. Mean = (35.34 + 42.09 + 39.43 + 38.93 + 43.39 + 49.26)/6 = 248.44/6 ≈ $41.41
Data in order: 35.34, 38.93, 39.43, 42.09, 43.39, 49.26
Median = (39.43 + 42.09)/2 = 81.52/2 = $40.76
No data value occurs more than once so there is no mode.

19. Mean = (3960 + 4090 + 3200 + 3100 + 2940 + 3830 + 4090 + 4040 + 3780)/9 = 33,030/9 = 3670 psi
Data in order: 2940, 3100, 3200, 3780, 3830, 3960, 4040, 4090, 4090
Median = the 5th ordered data value = 3830 psi
Mode = 4090 psi (because it is the only data value to occur twice)

20. Mean = (282 + 270 + 260 + 266 + 257 + 260 + 267)/7 = 1862/7 = 266 minutes
Data in order: 257, 260, 260, 266, 267, 270, 282
Median = the 4th data value with the data in order = 266 minutes
Mode = 260 minutes (because it is the only data value to occur twice)
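As a quick check of the arithmetic in Exercises 19 and 20, the same measures of center can be reproduced with Python's statistics module (a minimal sketch; the lists simply restate the data values used above).

```python
# Reproduce the measures of center from Exercises 19 and 20.
from statistics import mean, median, mode

psi = [3960, 4090, 3200, 3100, 2940, 3830, 4090, 4040, 3780]
print(mean(psi), median(psi), mode(psi))              # 3670 3830 4090

minutes = [282, 270, 260, 266, 257, 260, 267]
print(mean(minutes), median(minutes), mode(minutes))  # 266 266 260
```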

21. (a) The histogram is skewed to the right, suggesting that the mean is greater than the median. That is, x̄ > M.
(b) The histogram is symmetric, suggesting that the mean is approximately equal to the median. That is, x̄ = M.
(c) The histogram is skewed to the left, suggesting that the mean is less than the median. That is, x̄ < M.


22. (a) IV because the distribution is symmetric (so mean ≈ median) and centered near 30. (b) III because the distribution is skewed to the right, so mean > median. (c) II because the distribution is skewed to the left, so mean < median. (d) I because the distribution is symmetric (so mean ≈ median) and centered near 40.

23. Los Angeles ATM fees:
Mean = (2.00 + 1.50 + 1.50 + 1.00 + 1.50 + 2.00 + 0.00 + 2.00)/8 = 11.50/8 ≈ $1.44
Data in order: 0.00, 1.00, 1.50, 1.50, 1.50, 2.00, 2.00, 2.00
Median = (1.50 + 1.50)/2 = 3.00/2 = $1.50
Mode = $1.50 and $2.00 (because both values occur three times each)
New York City ATM fees:
Mean = (1.50 + 1.00 + 1.00 + 1.25 + 1.25 + 1.50 + 1.00 + 0.00)/8 = 8.50/8 ≈ $1.06
Data in order: 0.00, 1.00, 1.00, 1.00, 1.25, 1.25, 1.50, 1.50
Median = (1.00 + 1.25)/2 = 2.25/2 ≈ $1.13
Mode = $1.00 (because it occurs more often than any other value)
The ATM fees in Los Angeles appear to be higher in general than those in New York City. All three measures of center were higher for Los Angeles than for New York. Explanations will vary. Possibilities for the difference may be the number of ATMs available or the amount of ATM usage in each city.

24. Reaction Time to Blue:
Mean = (0.582 + 0.481 + 0.841 + 0.267 + 0.685 + 0.45)/6 = 3.306/6 = 0.551 sec.
Data in order: 0.267, 0.45, 0.481, 0.582, 0.685, 0.841
Median = (0.481 + 0.582)/2 = 1.063/2 = 0.5315 sec.
No data value occurs more than once so there is no mode.
Reaction Time to Red:
Mean = (0.408 + 0.407 + 0.542 + 0.402 + 0.456 + 0.533)/6 = 2.748/6 = 0.458 sec.
Data in order: 0.402, 0.407, 0.408, 0.456, 0.533, 0.542
Median = (0.408 + 0.456)/2 = 0.864/2 = 0.432 sec.
No data value occurs more than once so there is no mode.
There is a shorter reaction time to the red screen than to the blue screen. Explanations will vary. This information could be useful in designing warning screens for computer software controlling critical operations (such as nuclear power plants, for one example).


25. (a) μ = (76 + 60 + 60 + 81 + 72 + 80 + 80 + 68 + 73)/9 = 650/9 ≈ 72.2 beats per minute

(b) Samples and sample means will vary. (c) Answers will vary.

26. (a) μ = (39 + 21 + 9 + 32 + 30 + 45 + 11 + 12 + 39)/9 = 238/9 ≈ 26.4 minutes

(b) Samples and sample means will vary. (c) Answers will vary.

27. (a) μ = (0 + 0 + 0 + 4 + 10 + 1 + 10 + 10 + 19 + 9 + 18 + 20 + 13 + 13 + 2 + 7 + 8 + 13)/18 = 157/18 ≈ 8.7 goals per year

(b) Samples and sample means will vary. (c) Answers will vary.

28. (a) μ_time = (91.538 + 92.552 + 86.291 + 82.087 + 83.687 + 83.601 + 86.251)/7 = 606.007/7 ≈ 86.572 hours
Data in order: 82.087, 83.601, 83.687, 86.251, 86.291, 91.538, 92.552
Median = the 4th data value with the data in order = 86.251 hours
(b) μ_distance = (3687 + 3662 + 3453 + 3278 + 3427 + 3391 + 3593)/7 = 24,491/7 ≈ 3499 km
Data in order: 3278, 3391, 3427, 3453, 3593, 3662, 3687
Median = the 4th data value with the data in order = 3453 km
(c) μ_margin = (7.617 + 6.033 + 6.733 + 7.283 + 1.017 + 6.317 + 4.667)/7 = 39.667/7 ≈ 5.667 minutes
Data in order: 1.017, 4.667, 6.033, 6.317, 6.733, 7.283, 7.617
Median = the 4th data value with the data in order = 6.317 minutes
(d) Mean winning speed: μ = (40.28 + 39.56 + 40.02 + 39.93 + 40.94 + 40.56 + 41.65)/7 = 282.94/7 ≈ 40.420 km/h
Winning speed: Σdistance/Σtime = 24,491/606.007 ≈ 40.414 km/h
Winning speed: μ_distance/μ_time = 3499/86.572 ≈ 40.417 km/h
The three results agree approximately. The differences are due to rounding.

29. The distribution is relatively symmetric as is evidenced by both the histogram and the fact that the mean and median are approximately equal. Therefore, the mean is the better measure of central tendency.


30. The distribution is skewed right as is evidenced by both the histogram and the fact that the mean is significantly greater than the median. Therefore, the median is the better measure of central tendency.

31. (a) x̄ ≈ 51.1; M = 51.
(b) The mean is approximately equal to the median suggesting that the distribution is symmetric, and this is confirmed by the histogram.

32. (a) x̄ ≈ 5.88 million shares; M = 5.58 million shares.
(b) The mean is greater than the median suggesting that the distribution is skewed right, and this is confirmed by the histogram.

33. [Histogram: Weight of Plain M&Ms; horizontal axis Weight (grams), 0.76 to 0.96; vertical axis Frequency]
x̄ ≈ 0.874 grams; M = 0.88 grams. The mean is approximately equal to the median suggesting that the distribution is symmetric. This is confirmed by the histogram (though it does appear to be slightly skewed left). The mean is the better measure of central tendency.

34. [Histogram: Length of Eruptions; horizontal axis Length (seconds), 90 to 125; vertical axis Frequency]
x̄ ≈ 104.1 seconds; M = 104 seconds. The mean is approximately equal to the median suggesting that the distribution is symmetric. This is confirmed by the histogram. The mean is the better measure of central tendency.


35. [Histogram: Hours Worked per Week; horizontal axis Hours, 0 to 45; vertical axis Frequency]
x̄ = 22 hours; M = 25 hours. The mean is smaller than the median suggesting that the distribution is skewed left. This is confirmed by the histogram. The median is the better measure of central tendency.

36. [Histogram: Car Dealer's Profit; horizontal axis Dollars, 0 to 4500; vertical axis Number of Sales]
x̄ ≈ $1392.83; M = $1177.50. The mean is significantly greater than the median suggesting that the distribution is skewed right. This is confirmed by the histogram. The median is the better measure of central tendency.

37. The highest frequency is 12,362, and so the mode region of birth is Central America.

38. The highest frequency is 131, and so the mode offense is Street or Highway.

39. The vote counts are: Bush = 21, Kerry = 17, Nader = 1, and Badnarik = 1. The mode candidate is Bush.

40. The frequencies are: Cancer = 1, Gunshot wound = 8, Assault = 1, Motor vehicle accident = 7, Fall = 2, and Congestive heart failure = 1. The mode diagnosis is Gunshot wound.

41. Sample size of 5: All data recorded correctly: x̄ = 99.8; M = 100. 106 recorded at 160: x̄ = 110.6; M = 100.
Sample size of 12: All data recorded correctly: x̄ ≈ 100.4; M = 101. 106 recorded at 160: x̄ ≈ 104.9; M = 101.
Sample size of 30: All data recorded correctly: x̄ = 100.6; M = 99. 106 recorded at 160: x̄ = 102.4; M = 99.
For each sample size, the mean becomes larger while the median remains the same. As the sample size increases, the impact of the misrecorded data value on the mean decreases.
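The pattern in Exercise 41 (a misrecorded value drags the mean but not the median) can be illustrated with a small hypothetical data set; the numbers below are made up for illustration and are not the textbook's sample.

```python
# Hypothetical illustration: one misrecorded value moves the mean, not the median.
from statistics import mean, median

correct     = [98, 99, 100, 101, 106]
misrecorded = [98, 99, 100, 101, 160]   # 106 keyed in as 160
print(mean(correct), median(correct))          # 100.8 100
print(mean(misrecorded), median(misrecorded))  # 111.6 100
```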


42. (a) μ ≈ 27.1 years; M = 27 years; Mode = 26 years
(b) μ = 249.8 lb; M = 245 lb; Mode = 305 lb
(c) μ ≈ 4.6 years; M = 4 years; Mode = 3 years
(d) The frequency for Purdue is 3. The frequencies of all other colleges are lower than 3, so the mode college attended is Purdue.
(e) Samples and sample means will vary.
(f) Offensive guards: μ = 306.4 lb; M = 305 lb; Mode = 305 lb
Running backs: μ = 217.8 lb; M = 220 lb; Mode = 225 lb
Yes, there appears to be differences in the weights of offensive guards and running backs. All three measures of center indicate that offensive guards are significantly heavier than running backs. This is due to the nature of the positions. Offensive guards must be able to protect the quarterback while the running back must be able to run quickly.

(g) It does not make sense to compute the mean player number. The variable “player number” is qualitative, so the quantitative calculations will be meaningless.

43. Samples and sample means will vary.

44. NBA salaries are likely significantly skewed to the right. Therefore, since the median will be lower than the mean, the players would rather use the median salary to support the claim that the average player's salary needs to be increased. The negotiator for the owners would rather use the mean salary.

45. The amount of money lost per visitor is likely skewed to the right. Therefore, the median loss would be less than the mean because the mean amount would be inflated by those few visitors who lost very large amounts of money.

46. The sum of the nineteen readable scores is 19 · 84 = 1596. The sum of all twenty scores is 20 · 82 = 1640. Therefore, the unreadable score is 1640 − 1596 = 44.

47. The sum of the six numbers will be 6 · 34 = 204.

48. (a) Median. Home prices are likely skewed right. (b) Mode. The variable “major” is qualitative. (c) Mean. The data are quantitative and symmetric. (d) Median. The data are quantitative and skewed. (e) Median. NFL salaries are likely skewed right. (f) Mode. The variable “requested song” is qualitative.

49. (a) Mean: (30 + 30 + 45 + 50 + 50 + 50 + 55 + 55 + 60 + 75)/10 = 500/10 = 50. The mean is $50,000.
Median: The ten data values are in order, so we average the two middle values. (50 + 50)/2 = 100/2 = 50. The median is $50,000.
Mode: The mode is $50,000 (the most frequent salary).


(b) Add $2500 ($2.5 thousand) to each salary to form the new data set.
New data set: 32.5, 32.5, 47.5, 52.5, 52.5, 52.5, 57.5, 57.5, 62.5, 77.5
Mean: (32.5 + 32.5 + 47.5 + 52.5 + 52.5 + 52.5 + 57.5 + 57.5 + 62.5 + 77.5)/10 = 525/10 = 52.5. The new mean is $52,500.
Median: The ten data values are in order, so we average the two middle values. (52.5 + 52.5)/2 = 105/2 = 52.5. The new median is $52,500.
Mode: The new mode is $52,500 (the most frequent new salary).
All three measures of central tendency increased by $2500, which was the amount of the raises.
(c) Multiply each original data value by 1.05 to generate the new data set.
New data set: 31.5, 31.5, 47.25, 52.5, 52.5, 52.5, 57.75, 57.75, 63, 78.75
Mean: (31.5 + 31.5 + 47.25 + 52.5 + 52.5 + 52.5 + 57.75 + 57.75 + 63 + 78.75)/10 = 525/10 = 52.5. The new mean is $52,500.
Median: The ten data values are in order, so we average the two middle values. (52.5 + 52.5)/2 = 105/2 = 52.5. The new median is $52,500.
Mode: The new mode is $52,500 (the most frequent new salary).
All three measures of central tendency increased by 5%, which was the amount of the raises.
(d) Add $25 thousand to the largest data value to form the new data set.
New data set: 30, 30, 45, 50, 50, 50, 55, 55, 60, 100
Mean: (30 + 30 + 45 + 50 + 50 + 50 + 55 + 55 + 60 + 100)/10 = 525/10 = 52.5. The new mean is $52,500.
Median: The ten data values are in order, so we average the two middle values. (50 + 50)/2 = 100/2 = 50. The new median is $50,000.
Mode: The new mode is $50,000 (the most frequent salary).
The mean was increased by $2500, but the median and mode remained unchanged.

50. (a) x̄ = (65 + 70 + 71 + 75 + 95)/5 = 376/5 = 75.2
(b) The five data values are in order, so the median is the middle value: M = 71.
(c) The distribution is skewed right, so the median is the better measure of central tendency.
(d) Adding 4 to each score gives the following new data set: 69, 74, 75, 79, 99.
x̄ = (69 + 74 + 75 + 79 + 99)/5 = 396/5 = 79.2
(e) The curved test score mean is 4 greater than the unadjusted test score mean. Adding 4 to each score increased the mean by 4.


51. The largest data value is 0.94 and the smallest is 0.76. The mean after deleting those two data values is 0.875 grams. (Note: The value 0.94 occurs twice, but we only remove one.) The trimmed mean is more resistant than the regular mean. Note in this case that the trimmed mean 0.875 grams is approximately equal to the median 0.88 grams.

52. Midrange = (0.76 + 0.94)/2 = 1.7/2 = 0.85 grams. The midrange is not resistant because it is computed using the two most extreme data values.

3.2 Measures of Dispersion

1. No. In comparing two populations, the larger the standard deviation, the more dispersed the distribution, provided that the variable of interest in both populations has the same unit of measurement. Since 5 inches = 5 × 2.54 centimeters = 12.7 centimeters, the distribution with a standard deviation of 5 inches is in fact more dispersed.

2. In the calculation of the sample variance, the degrees of freedom is n − 1, and is used as the divisor in averaging the squared deviations about the mean.

3. All data values are used in computing the standard deviation, including extreme values. Since a statistic is resistant only if it is not influenced by extreme data values, the standard deviation is not resistant.

4. zero

5. A statistic is biased whenever that statistic consistently overestimates or underestimates a parameter.

6. range 7. The standard deviation is the square root of the variance.

8. mean; mean; spread 9. True 10. True

11. From Section 3.1, Exercise 11, we know x̄ = 11.
Data xᵢ:              20   13    4    8   10
Deviations xᵢ − x̄:     9    2   −7   −3   −1    (sum = 0)
Squared deviations:    81    4   49    9    1    (sum = 144)
s² = Σ(xᵢ − x̄)²/(n − 1) = 144/(5 − 1) = 36; s = √36 = 6.


12. From Section 3.1, Exercise 12, we know x̄ = 82.
Data xᵢ:              83   65   91   87   84
Deviations xᵢ − x̄:     1  −17    9    5    2    (sum = 0)
Squared deviations:     1  289   81   25    4    (sum = 400)
s² = Σ(xᵢ − x̄)²/(n − 1) = 400/(5 − 1) = 100; s = √100 = 10.

13. From Section 3.1, Exercise 13, we know μ = 9.
Data xᵢ:               3    6   10   12   14
Deviations xᵢ − μ:    −6   −3    1    3    5    (sum = 0)
Squared deviations:    36    9    1    9   25    (sum = 80)
σ² = Σ(xᵢ − μ)²/N = 80/5 = 16; σ = √16 = 4.

14. From Section 3.1, Exercise 14, we know μ = 15.
Data xᵢ:               1   19   25   15   12   16   28   13    6
Deviations xᵢ − μ:   −14    4   10    0   −3    1   13   −2   −9    (sum = 0)
Squared deviations:   196   16  100    0    9    1  169    4   81    (sum = 576)
σ² = Σ(xᵢ − μ)²/N = 576/9 = 64; σ = √64 = 8.

15. x̄ = (6 + 52 + 13 + 49 + 35 + 25 + 31 + 29 + 31 + 29)/10 = 300/10 = 30
Data xᵢ:               6   52   13   49   35   25   31   29   31   29
Deviations xᵢ − x̄:   −24   22  −17   19    5   −5    1   −1    1   −1    (sum = 0)
Squared deviations:   576  484  289  361   25   25    1    1    1    1    (sum = 1764)
s² = Σ(xᵢ − x̄)²/(n − 1) = 1764/(10 − 1) = 196; s = √196 = 14.

16. μ = (4 + 10 + 12 + 12 + 13 + 21)/6 = 72/6 = 12
Data xᵢ:               4   10   12   12   13   21
Deviations xᵢ − μ:    −8   −2    0    0    1    9    (sum = 0)
Squared deviations:    64    4    0    0    1   81    (sum = 150)
σ² = Σ(xᵢ − μ)²/N = 150/6 = 25; σ = √25 = 5.


17. Range = Largest Data Value – Smallest Data Value = 462 − 236 = $226. From Section 3.1, Exercise 17, we know x̄ = $381.75.
Data xᵢ:                420        462       409       236
Deviations xᵢ − x̄:     38.25      80.25     27.25   −145.75    (sum = 0)
Squared deviations:  1463.0625  6440.0625  742.5625  21,243.0625    (sum = 29,888.75)
s² = Σ(xᵢ − x̄)²/(n − 1) = 29,888.75/(4 − 1) ≈ 9962.9 $²; s = √[29,888.75/(4 − 1)] ≈ $99.81

18. Range = Largest Data Value – Smallest Data Value = 49.26 − 35.34 = $13.92. To calculate the sample variance and the sample standard deviation, we use the computational formula:
Data values xᵢ: 35.34, 42.09, 39.43, 38.93, 43.39, 49.26   (Σxᵢ = 248.44)
Squared values xᵢ²: 1248.9156, 1771.5681, 1554.7249, 1515.5449, 1882.6921, 2426.5476   (Σxᵢ² = 10,399.9932)
s² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1) = (10,399.9932 − 248.44²/6)/(6 − 1) ≈ 22.584 $²
s = √22.584 ≈ $4.75

19. Range = Largest Data Value – Smallest Data Value = 4090 − 2940 = 1150 psi
To calculate the sample variance and the sample standard deviation, we use the computational formula:
Data values xᵢ: 3960, 4090, 3200, 3100, 2940, 3830, 4090, 4040, 3780   (Σxᵢ = 33,030)
Squared values xᵢ²: 15,681,600; 16,728,100; 10,240,000; 9,610,000; 8,643,600; 14,668,900; 16,728,100; 16,321,600; 14,288,400   (Σxᵢ² = 122,910,300)
s² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1) = (122,910,300 − 33,030²/9)/(9 − 1) = 211,275 psi²
s = √211,275 ≈ 459.6 psi
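A minimal sketch of the computational formula used in Exercise 19, checked against Python's statistics module (the data are the nine concrete strengths listed above):

```python
from statistics import variance, stdev

x = [3960, 4090, 3200, 3100, 2940, 3830, 4090, 4040, 3780]
n = len(x)
s2 = (sum(v * v for v in x) - sum(x) ** 2 / n) / (n - 1)   # computational formula
print(round(s2, 1), round(s2 ** 0.5, 1))                   # 211275.0 459.6
print(round(variance(x), 1), round(stdev(x), 1))           # same results
```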


20. Range = Largest Data Value – Smallest Data Value = 282 − 257 = 25 minutes. From Section 3.1, Exercise 20, we know x̄ = 266 minutes.
Data xᵢ:              282  270  260  266  257  260  267
Deviations xᵢ − x̄:     16    4   −6    0   −9   −6    1    (sum = 0)
Squared deviations:    256   16   36    0   81   36    1    (sum = 426)
s² = Σ(xᵢ − x̄)²/(n − 1) = 426/(7 − 1) = 71 min²; s = √71 ≈ 8.4 min

21. Histogram (b) depicts a higher standard deviation because the data are more dispersed, with data values ranging from 30 to 75. Histogram (a)'s data values only range from 40 to 60.

22. (a) III, because it is centered between 52 and 57 and has the greatest amount of dispersion of the three histograms with mean = 53.

(b) I, because it is centered near 53 and its dispersion is consistent with s = 1.3 but not with s = 0.12 or s = 9.

(c) IV, because it is centered near 53 and it has the least dispersion of the three histograms with mean = 53.

(d) II, because it has a center near 60.

23. Los Angeles ATM fees:
Range = Largest Data Value – Smallest Data Value = 2.00 − 0.00 = $2.00
Data values xᵢ: 2.00, 1.50, 1.50, 1.00, 1.50, 2.00, 0.00, 2.00   (Σxᵢ = 11.5)
Squared values xᵢ²: 4, 2.25, 2.25, 1, 2.25, 4, 0, 4   (Σxᵢ² = 19.75)
s = √{[Σxᵢ² − (Σxᵢ)²/n]/(n − 1)} = √[(19.75 − 11.5²/8)/(8 − 1)] ≈ $0.68


New York City ATM fees:
Range = Largest Data Value – Smallest Data Value = 1.50 − 0.00 = $1.50
Data values xᵢ: 1.50, 1.00, 1.00, 1.25, 1.25, 1.50, 1.00, 0.00   (Σxᵢ = 8.5)
Squared values xᵢ²: 2.25, 1, 1, 1.5625, 1.5625, 2.25, 1, 0   (Σxᵢ² = 10.625)
s = √{[Σxᵢ² − (Σxᵢ)²/n]/(n − 1)} = √[(10.625 − 8.5²/8)/(8 − 1)] ≈ $0.48
Based on both the range and the standard deviation, ATM fees in Los Angeles have more dispersion than ATM fees in New York. Both the range and the standard deviation for Los Angeles are larger.

24. Reaction Time to Blue:
Range = Largest Data Value – Smallest Data Value = 0.841 − 0.267 = 0.574 sec.
Data values xᵢ: 0.582, 0.481, 0.841, 0.267, 0.685, 0.45   (Σxᵢ = 3.306)
Squared values xᵢ²: 0.338724, 0.231361, 0.707281, 0.071289, 0.469225, 0.2025   (Σxᵢ² = 2.02038)
s = √{[Σxᵢ² − (Σxᵢ)²/n]/(n − 1)} = √[(2.02038 − 3.306²/6)/(6 − 1)] ≈ 0.1994 sec.

Reaction Time to Red:
Range = Largest Data Value – Smallest Data Value = 0.542 − 0.402 = 0.140 sec.
Data values xᵢ: 0.408, 0.407, 0.542, 0.402, 0.456, 0.533   (Σxᵢ = 2.748)
Squared values xᵢ²: 0.166464, 0.165649, 0.293764, 0.161604, 0.207936, 0.284089   (Σxᵢ² = 1.279506)
s = √{[Σxᵢ² − (Σxᵢ)²/n]/(n − 1)} = √[(1.279506 − 2.748²/6)/(6 − 1)] ≈ 0.0647 sec.
Based on both the range and the standard deviation, the reaction times for blue have more variability than those for red. Both the range and the standard deviation for blue are larger.


25. (a) We use the computational formula: Σxᵢ = 650; Σxᵢ² = 47,474; N = 9;
σ² = [Σxᵢ² − (Σxᵢ)²/N]/N = (47,474 − 650²/9)/9 ≈ 58.8 beats²/min.²
σ = √[(47,474 − 650²/9)/9] ≈ 7.7 beats/min.
(b) Samples, sample variances, and sample standard deviations will vary.
(c) Answers will vary.

26. (a) We use the computational formula: Σxᵢ = 238; Σxᵢ² = 7778; N = 9;
σ² = [Σxᵢ² − (Σxᵢ)²/N]/N = (7778 − 238²/9)/9 ≈ 164.9 min.²
σ = √[(7778 − 238²/9)/9] ≈ 12.8 min.
(b) Samples, sample variances, and sample standard deviations will vary.
(c) Answers will vary.

27. (a) We use the computational formula: Σxᵢ = 157; Σxᵢ² = 2107; N = 18;
σ² = [Σxᵢ² − (Σxᵢ)²/N]/N = (2107 − 157²/18)/18 ≈ 41.0 goals²
σ = √[(2107 − 157²/18)/18] ≈ 6.4 goals
(b) Samples, sample variances, and sample standard deviations will vary.
(c) Answers will vary.

28. (a) Range = Largest Data Value – Smallest Data Value = 92.552 − 82.087 = 10.465 hours. For the population variance and standard deviation, we use the computational formula:
Σxᵢ = 606.007; Σxᵢ² = 52,561.3666; N = 7;
σ² = [Σxᵢ² − (Σxᵢ)²/N]/N = (52,561.3666 − 606.007²/7)/7 ≈ 13.981 hours²
σ = √[(52,561.3666 − 606.007²/7)/7] ≈ 3.739 hours


(b) Range = Largest Data Value – Smallest Data Value = 3687 − 3278 = 409 km. For the population variance and standard deviation, we use the computational formula:
Σxᵢ = 24,491; Σxᵢ² = 85,825,565; N = 7;
σ² = [Σxᵢ² − (Σxᵢ)²/N]/N = (85,825,565 − 24,491²/7)/7 ≈ 19,793.3 km²
σ = √[(85,825,565 − 24,491²/7)/7] ≈ 140.7 km

(c) Range = Largest Data Value – Smallest Data Value = 7.617 − 1.017 = 6.600 min. For the population variance and standard deviation, we use the computational formula:
Σxᵢ = 39.667; Σxᵢ² = 255.510823; N = 7;
σ² = [Σxᵢ² − (Σxᵢ)²/N]/N = (255.510823 − 39.667²/7)/7 ≈ 4.390 min²
σ = √[(255.510823 − 39.667²/7)/7] ≈ 2.095 min

(d) Range = Largest Data Value – Smallest Data Value = 41.65 − 39.56 = 2.09 km/h. For the population variance and standard deviation, we use the computational formula:
Σxᵢ = 282.94; Σxᵢ² = 11,439.397; N = 7;
σ² = [Σxᵢ² − (Σxᵢ)²/N]/N = (11,439.397 − 282.94²/7)/7 ≈ 0.423 (km/h)²
σ = √[(11,439.397 − 282.94²/7)/7] ≈ 0.651 km/h

29. (a) Ethan: μ = Σxᵢ/N = (9 + 24 + 8 + 9 + 5 + 8 + 9 + 10 + 8 + 10)/10 = 100/10 = 10 fish;
Range = Largest Data Value – Smallest Data Value = 24 − 5 = 19 fish
Drew: μ = Σxᵢ/N = (15 + 2 + 3 + 18 + 20 + 1 + 17 + 2 + 19 + 3)/10 = 100/10 = 10 fish;
Range = Largest Data Value – Smallest Data Value = 20 − 1 = 19 fish
Both fishermen have the same mean and range, so these values do not indicate any differences between their catches per day.


(b) Ethan: Σxᵢ = 100; Σxᵢ² = 1236; N = 10
σ = √{[Σxᵢ² − (Σxᵢ)²/N]/N} = √[(1236 − 100²/10)/10] ≈ 4.9 fish
Drew: Σxᵢ = 100; Σxᵢ² = 1626; N = 10
σ = √{[Σxᵢ² − (Σxᵢ)²/N]/N} = √[(1626 − 100²/10)/10] ≈ 7.9 fish
Yes, now there appears to be a difference in the two fishermen's records. Ethan had a more consistent fishing record, which is indicated by the smaller standard deviation.

(c) Answers will vary. One possibility follows: The range is limited as a measure of dispersion because it does not take all of the data values into account. It is obtained by using only the two most extreme data values. Since the standard deviation utilizes all of the data values, it provides a better overall representation of dispersion.

30. (a) Range = Largest Data Value – Smallest Data Value = 349 − 180 = 169 lb
Σxᵢ = 8591; Σxᵢ² = 2,332,051; N = 33; μ = Σxᵢ/N = 8591/33 ≈ 260.3 lb
σ = √{[Σxᵢ² − (Σxᵢ)²/N]/N} = √[(2,332,051 − 8591²/33)/33] ≈ 53.8 lb
(b) Range = Largest Data Value – Smallest Data Value = 306 − 177 = 129 lb
Σxᵢ = 5889; Σxᵢ² = 1,481,833; N = 24; μ = Σxᵢ/N = 5889/24 ≈ 245.4 lb
σ = √{[Σxᵢ² − (Σxᵢ)²/N]/N} = √[(1,481,833 − 5889²/24)/24] ≈ 39.2 lb

(c) The weights of the offense have the greater dispersion. The offense has both the larger range and the larger standard deviation.

31. Range = Largest Data Value – Smallest Data Value = 73 − 28 = 45. For the sample variance and sample standard deviation, we use the computational formula:
Σxᵢ = 2045; Σxᵢ² = 109,151; n = 40;
s² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1) = (109,151 − 2045²/40)/(40 − 1) ≈ 118.0
s = √[(109,151 − 2045²/40)/(40 − 1)] ≈ 10.9


32. Range = Largest Data Value – Smallest Data Value = 10.96 − 3.01 = 7.95 million shares. For the sample variance and sample standard deviation, we use the computational formula:
Σxᵢ = 205.92; Σxᵢ² = 1355.6208; n = 35;
s² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1) = (1355.6208 − 205.92²/35)/(35 − 1) ≈ 4.238 million shares²
s = √[(1355.6208 − 205.92²/35)/(35 − 1)] ≈ 2.059 million shares

33. (a) We use the computational formula: Σxᵢ = 43.71; Σxᵢ² = 38.2887; n = 50;
s = √{[Σxᵢ² − (Σxᵢ)²/n]/(n − 1)} = √[(38.2887 − 43.71²/50)/(50 − 1)] ≈ 0.04 g

(b) The histogram is approximately symmetric, so the Empirical Rule is applicable. (c) Since 0.79 is exactly 2 standard deviations below the mean [0.79 = 0.87 – 2(0.04)] and

0.95 is exactly 2 standard deviations above the mean [0.95 = 0.87 + 2(0.04)], the Empirical Rule predicts that approximately 95% of the M&Ms will weigh between 0.79 and 0.95 grams.

(d) All except 1 of the M&Ms weigh between 0.79 and 0.95 grams. Thus, the actual percentage is 49/50 = 98%.

(e) Since 0.91 is exactly 1 standard deviation above the mean [0.91 = 0.87 + 0.04], the Empirical Rule predicts that 13.5% + 2.35% + 0.15% = 16% of the M&Ms will weigh more than 0.91 grams.

(f) Seven of the M&Ms weigh more than 0.91 grams (not including the ones that weigh exactly 0.91 grams). Thus, the actual percentage is 7/50 = 14%.

34. (a) We use the computational formula: Σxᵢ = 4582; Σxᵢ² = 478,832; n = 44;
s = √{[Σxᵢ² − (Σxᵢ)²/n]/(n − 1)} = √[(478,832 − 4582²/44)/(44 − 1)] ≈ 6 sec.

(b) The histogram is approximately symmetric, so the Empirical Rule is applicable.
(c) Since 92 is exactly 2 standard deviations below the mean [92 = 104 – 2(6)] and 116 is exactly 2 standard deviations above the mean [116 = 104 + 2(6)], the Empirical Rule predicts that approximately 95% of the eruptions should last between 92 and 116 sec.

(d) All except 3 of the observed eruptions lasted between 92 and 116 seconds. Thus, the actual percentage is 41/ 44 93%≈ .

(e) Since 98 is exactly 1 standard deviation below the mean [98 = 104 – 6], the Empirical Rule predicts that 13.5% + 2.35% + 0.15% = 16% of the eruptions will last less than 98 sec.

(f) Five of the observed eruptions lasted less than 98 seconds. Thus, the actual percentage is 5 / 44 11%≈ .


35. Car 1: Σxᵢ = 3352; Σxᵢ² = 755,712; n = 15
Measures of Center: x̄ = Σxᵢ/n = 3352/15 ≈ 223.5 miles; Mode: none;
M = 223 miles (the 8th value in the ordered data)
Measures of Dispersion:
Range = Largest Data Value – Smallest Data Value = 271 − 178 = 93 miles;
s² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1) = (755,712 − 3352²/15)/(15 − 1) ≈ 475.1 miles²;
s = √[(755,712 − 3352²/15)/(15 − 1)] ≈ 21.8 miles
Car 2: Σxᵢ = 3558; Σxᵢ² = 877,654; n = 15
Measures of Center: x̄ = Σxᵢ/n = 3558/15 = 237.2 miles; Mode: none;
M = 230 miles (the 8th value in the ordered data)
Measures of Dispersion:
Range = Largest Data Value – Smallest Data Value = 326 − 160 = 166 miles;
s² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1) = (877,654 − 3558²/15)/(15 − 1) ≈ 2406.9 miles²;
s = √[(877,654 − 3558²/15)/(15 − 1)] ≈ 49.1 miles

The distribution for Car 1 is symmetric since the mean and median are approximately equal. The distribution for Car 2 is skewed right slightly since the mean is larger than the median. Both distributions have similar measures of center, but Car 2 has more dispersion which can be seen by its larger range, variance, and standard deviation. This means that the distance Car 1 can be driven on 10 gallons of gas is more consistent. Thus, Car 1 is probably the better car to buy.

36. Fund A: Σxᵢ = 61; Σxᵢ² = 356.12; n = 20
Measures of Center: x̄ = Σxᵢ/n = 61/20 = 3.05; Mode: none; M = (3.0 + 3.1)/2 = 3.05;


Measures of Dispersion:
Range = Largest Data Value – Smallest Data Value = 8.6 − (−2.3) = 10.9;
s² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1) = (356.12 − 61²/20)/(20 − 1) ≈ 8.95;
s = √[(356.12 − 61²/20)/(20 − 1)] ≈ 2.99
Fund B: Σxᵢ = 68.1; Σxᵢ² = 825.27; n = 20
Measures of Center: x̄ = Σxᵢ/n = 68.1/20 ≈ 3.41; Mode = 4.3; M = (3.5 + 3.8)/2 = 3.65
Measures of Dispersion:
Range = Largest Data Value – Smallest Data Value = 12.9 − (−6.7) = 19.6;
s² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1) = (825.27 − 68.1²/20)/(20 − 1) ≈ 31.23;
s = √[(825.27 − 68.1²/20)/(20 − 1)] ≈ 5.59
The distribution for Mutual Fund A is symmetric since the mean and median are equal. Likewise, the distribution for Mutual Fund B is approximately symmetric (but skewed left slightly since the mean is smaller than the median). Mutual Fund B has a larger measure of center and greater dispersion, which can be seen by its larger range, variance, and standard deviation. This means that the rate of return on Mutual Fund A is generally lower, but more consistent. The rate of return on Mutual Fund B is generally higher, but more dispersed.

37. (a) Financial Stocks: Σxᵢ = 502.9; Σxᵢ² = 9591.0556; n = 32
x̄ = Σxᵢ/n = 502.9/32 ≈ 15.716; M = (15.92 + 16.26)/2 = 16.09
Energy Stocks: Σxᵢ = 719.4; Σxᵢ² = 21,213.3104; n = 32
x̄ = Σxᵢ/n = 719.4/32 ≈ 22.481; M = (19.50 + 19.67)/2 = 19.585
Energy Stocks have higher mean and median rates of return.
(b) Financial Stocks: s = √{[Σxᵢ² − (Σxᵢ)²/n]/(n − 1)} = √[(9591.0556 − 502.9²/32)/(32 − 1)] ≈ 7.378
Energy Stocks: s = √{[Σxᵢ² − (Σxᵢ)²/n]/(n − 1)} = √[(21,213.3104 − 719.4²/32)/(32 − 1)] ≈ 12.751
Energy Stocks are riskier since they have a larger standard deviation.


38. (a) American League: Σxᵢ = 166.26; Σxᵢ² = 715.1876; n = 40
x̄ = Σxᵢ/n = 166.26/40 ≈ 4.157; M = (4.18 + 4.21)/2 = 4.195
National League: Σxᵢ = 149.93; Σxᵢ² = 576.4971; n = 40
x̄ = Σxᵢ/n = 149.93/40 ≈ 3.748; M = (3.84 + 3.87)/2 = 3.855
The American League has both the higher mean and median earned-run average.
(b) American League: s = √{[Σxᵢ² − (Σxᵢ)²/n]/(n − 1)} = √[(715.1876 − 166.26²/40)/(40 − 1)] ≈ 0.787
National League: s = √{[Σxᵢ² − (Σxᵢ)²/n]/(n − 1)} = √[(576.4971 − 149.93²/40)/(40 − 1)] ≈ 0.610
The American League has more dispersion.

39. (a) Since 70 is exactly 2 standard deviations below the mean [70 = 100 – 2(15)] and 130 is exactly 2 standard deviations above the mean [130 = 100 + 2(15)], the Empirical Rule predicts that approximately 95% of people have an IQ score between 70 and 130.
(b) Since about 95% of people have an IQ score between 70 and 130, approximately 5% of people have an IQ score either less than 70 or greater than 130.
(c) Approximately 5%/2 = 2.5% of people have an IQ score greater than 130.
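A tiny helper makes the Empirical Rule computations above easy to reproduce; the mean 100 and standard deviation 15 are the IQ values from Exercise 39, and the 68-95-99.7 percentages remain rule-of-thumb approximations for bell-shaped distributions.

```python
def empirical_rule_interval(mu, sigma, k):
    """Interval mu +/- k standard deviations (bell-shaped distributions only)."""
    return mu - k * sigma, mu + k * sigma

print(empirical_rule_interval(100, 15, 2))   # (70, 130): about 95% of IQ scores
print(empirical_rule_interval(100, 15, 3))   # (55, 145): about 99.7% of IQ scores
```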

40. (a) Since 404 is exactly 1 standard deviation below the mean [404 = 518 – 114] and 632 is exactly 1 standard deviation above the mean [632 = 518 + 114], the Empirical Rule predicts that approximately 68% of SAT scores is between 404 and 632.

(b) Since about 68% of SAT scores are between 404 and 632, approximately 32% of SAT scores are either less than 404 or greater than 632.

(c) Since 746 is exactly 2 standard deviations above the mean [746 = 518 + 2(114)], the Empirical Rule predicts that approximately 2.5% of SAT scores is greater than 746.

41. (a) Approximately 95% of the data will be within 2 standard deviations of the mean. Now, 325 – 2(30) = 265 and 325 + 2(30) = 385. Thus, about 95% of pairs of kidneys will be between 265 and 385 grams.

(b) Since 235 is exactly 3 standard deviations below the mean [235 = 325 – 3(30)] and 415 is exactly 3 standard deviations above the mean [415 = 325 + 3(30)], the Empirical Rule predicts that about 99.7% of pairs of kidneys weighs between 235 and 415 grams.

(c) Since about 99.7% of pairs of kidneys weighs between 235 and 415 grams, then about 0.3% of pairs of kidneys weighs either less than 235 or more than 415 grams.

(d) Since 295 is exactly 1 standard deviation below the mean [295 = 325 – 30] and 385 is exactly 2 standard deviations above the mean [385 = 325 + 2(30)], the Empirical Rule predicts that approximately 34% + 34% + 13.5% = 81.5% of pairs of kidneys weighs between 295 and 385 grams.


42. (a) Approximately 68% of the data will be within 1 standard deviation of the mean. Now, 4 – 0.007 = 3.993 and 4 + 0.007 = 4.007. Thus, about 68% of bolts manufactured will be between 3.993 and 4.007 inches long.

(b) Since 3.986 is exactly 2 standard deviations below the mean [3.986 = 4 – 2(0.007)] and 4.014 is exactly 2 standard deviations above the mean [4.014 = 4 + 2(0.007)], the Empirical Rule predicts that about 95% of bolts manufactured will be between 3.986 and 4.014 inches long.

(c) Since about 95% of bolts is between 3.986 and 4.014 inches, then about 5% of bolts manufactured will either be shorter than 3.986 or longer than 4.014 inches. That is, about 5% of the bolts will be discarded.

(d) Since 4.007 is exactly 1 standard deviation above the mean [4.007 = 4 + 0.007] and 4.021 is exactly 3 standard deviations above the mean [4.021 = 4 + 3(0.007)], the Empirical Rule predicts that approximately 13.5% + 2.35% = 15.85% of bolts manufactured will be between 4.007 and 4.021 inches long.

43. (a) By Chebyshev's inequality, at least (1 − 1/k²)·100% = (1 − 1/3²)·100% ≈ 88.9% of gasoline prices are within 3 standard deviations of the mean.
(b) By Chebyshev's inequality, at least (1 − 1/k²)·100% = (1 − 1/2.5²)·100% = 84% of gasoline prices are within k = 2.5 standard deviations of the mean. Now, 1.37 − 2.5(0.05) = 1.245 and 1.37 + 2.5(0.05) = 1.495. Thus, the gasoline prices that are within 2.5 standard deviations of the mean are from $1.245 to $1.495.
(c) Since 1.27 is exactly k = 2 standard deviations below the mean [1.27 = 1.37 – 2(0.05)] and 1.47 is exactly k = 2 standard deviations above the mean [1.47 = 1.37 + 2(0.05)], Chebyshev's theorem predicts that at least (1 − 1/2²)·100% = 75% of gas stations have prices between $1.27 and $1.47 per gallon.
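Chebyshev's bound from Exercise 43 is easy to verify numerically; this is a minimal sketch using the $1.37 mean and $0.05 standard deviation quoted in part (b).

```python
def chebyshev_bound(k):
    """At least this fraction of any data set lies within k standard deviations."""
    return 1 - 1 / k ** 2

print(round(chebyshev_bound(3) * 100, 1))    # 88.9
print(round(chebyshev_bound(2.5) * 100, 1))  # 84.0
print(round(chebyshev_bound(2) * 100, 1))    # 75.0

mu, sigma = 1.37, 0.05                       # gasoline prices, part (b)
print(round(mu - 2.5 * sigma, 3), round(mu + 2.5 * sigma, 3))   # 1.245 1.495
```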

44. (a) By Chebyshev's inequality, at least (1 − 1/k²)·100% = (1 − 1/2²)·100% = 75% of commuters in Boston have a commute time within 2 standard deviations of the mean.
(b) By Chebyshev's inequality, at least (1 − 1/k²)·100% = (1 − 1/1.5²)·100% ≈ 55.6% of commuters in Boston have a commute time within 1.5 standard deviations of the mean. Now, 27.3 − 1.5(8.1) = 15.15 and 27.3 + 1.5(8.1) = 39.45. Thus, the commute times within 1.5 standard deviations of the mean are from 15.15 to 39.45 minutes.
(c) Since 3 is exactly k = 3 standard deviations below the mean [3 = 27.3 – 3(8.1)] and 51.6 is exactly k = 3 standard deviations above the mean [51.6 = 27.3 + 3(8.1)], Chebyshev's theorem predicts that at least (1 − 1/3²)·100% ≈ 88.9% of commuters in Boston have commute times between 3 and 51.6 minutes.


45. When calculating the variability in team batting averages, we are finding the variability of means. When calculating the variability of all players, we are finding the variability of individuals. Since there is more variability among individuals than among means, the teams will have less variability.

46. (a) Range = Largest Data Value – Smallest Data Value = 75 − 30 = $45 thousand. For the population variance and standard deviation, we use the computational formula:
Σxᵢ = 500; Σxᵢ² = 26,600; N = 10;
σ² = [Σxᵢ² − (Σxᵢ)²/N]/N = (26,600 − 500²/10)/10 = 160 (thousand dollars)²
σ = √[(26,600 − 500²/10)/10] ≈ $12.6 thousand

(b) Add $2500 ($2.5 thousand) to each salary to form the new data set.
New data set: 32.5, 32.5, 47.5, 52.5, 52.5, 52.5, 57.5, 57.5, 62.5, 77.5
Range = Largest Data Value – Smallest Data Value = 77.5 − 32.5 = $45 thousand.
Σxᵢ = 525; Σxᵢ² = 29,162.5; N = 10;
σ² = [Σxᵢ² − (Σxᵢ)²/N]/N = (29,162.5 − 525²/10)/10 = 160 (thousand dollars)²
σ = √[(29,162.5 − 525²/10)/10] ≈ $12.6 thousand
All three measures of variability remain the same.

All three measures of variability remain the same. (c) Multiply each original data value by 1.05 to generate the new data set.

New data set: 31.5, 31.5, 47.25, 52.5, 52.5, 52.5, 57.75, 57.75, 63, 78.75 Range = Largest Data Value – Smallest Data Value = 78.75 – 31.5 = $47.25 thousand.

525ix =∑ ; 2 29,326.5ix =∑ ; 10N = ;

( ) ( )2 2

2

2 2

52529,326.5

10 176.4 thousand $10

ii

xx

NN

σ− −

= = =

∑∑;

( ) ( )2 2

2 52529,326.5

10 $13.3 thousand10

ii

xx

NN

σ− −

= = ≈

∑∑

All three measures of variability are larger than original, showing greater dispersion of salaries. (Note that R and σ are each 5% larger than original, and 2σ is 1.1025 times larger than original which is 2(1.05) .)


(d) Add $25 thousand to the largest data value to form the new data set.
New data set: 30, 30, 45, 50, 50, 50, 55, 55, 60, 100
Range = Largest Data Value – Smallest Data Value = 100 − 30 = $70 thousand.
Σxᵢ = 525; Σxᵢ² = 30,975; N = 10;
σ² = [Σxᵢ² − (Σxᵢ)²/N]/N = (30,975 − 525²/10)/10 = 341.25 (thousand dollars)²
σ = √[(30,975 − 525²/10)/10] ≈ $18.5 thousand
All three measures of variability are significantly larger than the original.

47. Sample size of 5: All data recorded correctly: s ≈ 5.3. 106 recorded incorrectly as 160: s ≈ 27.9. Sample size of 12: All data recorded correctly: s ≈ 14.7. 106 recorded incorrectly as 160: s ≈ 22.7. Sample size of 30: All data recorded correctly: s ≈ 15.9. 106 recorded incorrectly as 160: s ≈ 19.2. As the sample size increases, the impact of the misrecorded data value on the standard deviation decreases.

48. We use the computational formula: Σxᵢ = 312; Σxᵢ² = 24,336; n = 4;
s² = [Σxᵢ² − (Σxᵢ)²/n]/(n − 1) = (24,336 − 312²/4)/(4 − 1) = 0
If all values in a data set are identical, then there is zero variance.

49. (a) The coefficient of variation for blood pressure before exercise is (14.1/121)·100% = 11.65%, while the coefficient of variation for blood pressure after exercise is (18.1/135.9)·100% = 13.32%. There is more variability in systolic blood pressure after exercise.
(b) The coefficient of variation for free calcium concentration in the group of people with normal blood pressure is (16.1/107.9)·100% = 14.92%, while the coefficient of variation for free calcium concentration in the group of people with high blood pressure is (31.7/168.2)·100% = 18.85%. There is more variability in free calcium concentration in the high blood pressure group.
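A short sketch of the coefficient of variation used in Exercise 49; the (standard deviation, mean) pairs echo the values in the solution above.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

print(round(coefficient_of_variation(14.1, 121.0), 2))   # 11.65 (before exercise)
print(round(coefficient_of_variation(18.1, 135.9), 2))   # 13.32 (after exercise)
```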


50. From Section 3.1, Exercise 17, we know x̄ = $381.75.
Data xᵢ:                         420     462     409     236
Deviations xᵢ − x̄:              38.25   80.25   27.25  −145.75    (sum = 0)
Absolute deviations |xᵢ − x̄|:   38.25   80.25   27.25   145.75    (sum = 291.50)
MAD = Σ|xᵢ − x̄|/n = 291.50/4 = $72.875, which is somewhat less than the sample standard deviation of s ≈ $99.81.
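The mean absolute deviation in Exercise 50 can be reproduced directly from the four dollar amounts of Exercise 17 (a minimal sketch):

```python
from statistics import mean

x = [420, 462, 409, 236]
xbar = mean(x)                                   # 381.75
mad = sum(abs(v - xbar) for v in x) / len(x)
print(xbar, mad)                                 # 381.75 72.875
```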

51. (a) Skewness = 3(50 − 40)/10 = 3. The distribution is skewed to the right.
(b) Skewness = 3(100 − 100)/15 = 0. The distribution is perfectly symmetric.
(c) Skewness = 3(400 − 500)/120 = −2.5. The distribution is skewed to the left.
(d) Skewness = 3(0.8742 − 0.88)/0.0397 ≈ −0.44. The distribution is slightly skewed to the left.
(e) Skewness = 3(104.136 − 104)/6.249 ≈ 0.07. The distribution is symmetric.

52. (a) Reading from the graph, the average annual return for a portfolio that is 10% foreign is 14.9%. The level of risk is 14.7%.

(b) To best minimize risk, 30% should be invested in foreign stocks. According to the graph, a 30% investment in foreign stocks has the smallest standard deviation (level of risk) at about 14.3%.

(c) Answers will vary. One possibility follows: The risk decreases because a portfolio including foreign stocks is more diversified.

(d) According to Chebyshev's theorem, at least 75% of returns are within k = 2 standard deviations of the mean. Thus, at least 75% of returns are between x̄ − ks = 15.8 − 2(14.3) = −12.8% and x̄ + ks = 15.8 + 2(14.3) = 44.4%. By Chebyshev's theorem, at least 88.9% of returns are within k = 3 standard deviations of the mean. Thus, at least 88.9% of returns are between x̄ − ks = 15.8 − 3(14.3) = −27.1% and x̄ + ks = 15.8 + 3(14.3) = 58.7%. An investor should not be surprised if she has a negative rate of return. Chebyshev's theorem indicates that a negative return is fairly common.


Consumer Reports®: Basement Waterproofing Coatings
(a) x̄_A = Σxᵢ/n = 546.2/6 ≈ 91.03 g; M_A = (90.9 + 91.2)/2 = 182.1/2 = 91.05 g;
There are 2 modes: 90.8 g and 91.2 g (each value occurs twice).
(b) Σxᵢ = 546.2; Σxᵢ² = 49,722.66; n = 6;
s_A = √{[Σxᵢ² − (Σxᵢ)²/n]/(n − 1)} = √[(49,722.66 − 546.2²/6)/(6 − 1)] ≈ 0.23 g
(c) x̄_B = Σxᵢ/n = 522.3/6 = 87.05 g; M_B = (87.0 + 87.1)/2 = 174.1/2 = 87.05 g
There are 2 modes: 87.0 g and 87.2 g (each value occurs twice).
(d) Σxᵢ = 522.3; Σxᵢ² = 45,466.33; n = 6;
s_B = √{[Σxᵢ² − (Σxᵢ)²/n]/(n − 1)} = √[(45,466.33 − 522.3²/6)/(6 − 1)] ≈ 0.15 g
(e) [Back-to-back stem-and-leaf plot of the measurements for products A and B, stems 86 to 91]
Yes, there appears to be a difference in these two products' ability to mitigate water seepage. All 6 of the measurements for product B are less than the measurements for product A. Although it is not clear whether there is any practical difference in these two products' ability to mitigate water seepage, product B appears to do a better job.

3.3 Measures of Central Tendency and Dispersion from Grouped Data

1. When we approximate the mean and standard deviation from grouped data, we assume that all of the data points within each group can be approximated by the midpoint of that group.

2. x̄ = Σxᵢ/n is a weighted average in which the value of each weight is one.


3. Classes 10–19, 20–29, 30–39, 40–49, 50–59 have midpoints xᵢ = 15, 25, 35, 45, 55 and frequencies fᵢ = 8, 16, 21, 11, 4.
Σfᵢ = 60; Σxᵢfᵢ = 1970; Σ(xᵢ − x̄)²fᵢ = 7218.3334
x̄ = Σxᵢfᵢ/Σfᵢ = 1970/60 ≈ 32.83 ≈ $32.83; s = √[Σ(xᵢ − x̄)²fᵢ/(Σfᵢ − 1)] = √(7218.3334/59) ≈ $11.06
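A minimal sketch of the grouped-data formulas used in Problem 3, where each class midpoint stands in for every observation in its class:

```python
midpoints   = [15, 25, 35, 45, 55]
frequencies = [8, 16, 21, 11, 4]

n    = sum(frequencies)
xbar = sum(x * f for x, f in zip(midpoints, frequencies)) / n
s2   = sum(f * (x - xbar) ** 2 for x, f in zip(midpoints, frequencies)) / (n - 1)
print(round(xbar, 2), round(s2 ** 0.5, 2))   # 32.83 11.06
```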

4. Classes 1–5, 6–10, 11–15, 16–20, 21–25, 26–30, 31–35, 36–40 have midpoints xᵢ = 3.5, 8.5, 13.5, 18.5, 23.5, 28.5, 33.5, 38.5 and frequencies fᵢ = 11, 0, 5, 6, 1, 2, 1, 2.
Σfᵢ = 28; Σxᵢfᵢ = 408; Σ(xᵢ − μ)²fᵢ = 3417.8572
μ = Σxᵢfᵢ/Σfᵢ = 408/28 ≈ 14.6 points; σ = √[Σ(xᵢ − μ)²fᵢ/Σfᵢ] = √(3417.8572/28) ≈ 11.0 points


5. Classes 0–9, 10–19, 20–29, 30–39, 40–49, 50–59, 60–69 have midpoints xᵢ = 5, 15, 25, 35, 45, 55, 65 and frequencies fᵢ = 31, 39, 17, 6, 4, 2, 1.
Σfᵢ = 100; Σxᵢfᵢ = 1730; Σ(xᵢ − μ)²fᵢ = 15,971
μ = Σxᵢfᵢ/Σfᵢ = 1730/100 = 17.3 days; σ = √[Σ(xᵢ − μ)²fᵢ/Σfᵢ] = √(15,971/100) ≈ 12.6 days

6. Classes 0–9, 10–19, 20–29, 30–39, 40–49 have midpoints xᵢ = 5, 15, 25, 35, 45 and frequencies fᵢ = 24, 14, 39, 18, 5.
Σfᵢ = 100; Σxᵢfᵢ = 2160; Σ(xᵢ − x̄)²fᵢ = 13,644
x̄ = Σxᵢfᵢ/Σfᵢ = 2160/100 = 21.6 hr/wk; s = √[Σ(xᵢ − x̄)²fᵢ/(Σfᵢ − 1)] = √(13,644/99) ≈ 11.7 hr/wk


7. Classes 25–34, 35–44, 45–54, 55–64 have midpoints xᵢ = 30, 40, 50, 60 and frequencies (in millions) fᵢ = 28.9, 35.7, 35.1, 24.7.
Σfᵢ = 124.4; Σxᵢfᵢ = 5532; Σ(xᵢ − μ)²fᵢ = 13,794.9839
μ = Σxᵢfᵢ/Σfᵢ = 5532/124.4 ≈ 44.5 yrs; σ = √[Σ(xᵢ − μ)²fᵢ/Σfᵢ] = √(13,794.9839/124.4) ≈ 10.5 yrs

8. Classes 0–0.9, 1.0–1.9, 2.0–2.9, 3.0–3.9, 4.0–4.9, 5.0–5.9, 6.0–6.9 have midpoints xᵢ = 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5 and frequencies fᵢ = 539, 1, 1336, 1363, 289, 21, 2.
Σfᵢ = 3551; Σxᵢfᵢ = 9810.5; Σ(xᵢ − μ)²fᵢ = 4651.8609
μ = Σxᵢfᵢ/Σfᵢ = 9810.5/3551 ≈ 2.8; σ = √[Σ(xᵢ − μ)²fᵢ/Σfᵢ] = √(4651.8609/3551) ≈ 1.1


9. (a) Classes 50–59, 60–69, 70–79, 80–89, 90–99, 100–109 have midpoints xᵢ = 55, 65, 75, 85, 95, 105 and frequencies fᵢ = 1, 308, 1519, 1626, 503, 11.
Σfᵢ = 3968; Σxᵢfᵢ = 321,150; Σ(xᵢ − μ)²fᵢ = 265,131.2248
μ = Σxᵢfᵢ/Σfᵢ = 321,150/3968 ≈ 80.9°F; σ = √[Σ(xᵢ − μ)²fᵢ/Σfᵢ] = √(265,131.2248/3968) ≈ 8.2°F
(b) [Histogram: High Temperatures in August in Chicago; horizontal axis Temperature, 50 to 110; vertical axis Frequency]
(c) By the Empirical Rule, 95% of the observations will be within 2 standard deviations of the mean. Now, μ − 2σ = 80.9 − 2(8.2) = 64.5 and μ + 2σ = 80.9 + 2(8.2) = 97.3, so 95% of the days in August will have temperatures between 64.5°F and 97.3°F.


10. (a) Classes 20–24, 25–29, 30–34, 35–39, 40–44, 45–49, 50–54, 55–59 have midpoints xᵢ = 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5 and frequencies fᵢ = 4, 15, 27, 40, 28, 15, 4, 2.
Σfᵢ = 135; Σxᵢfᵢ = 5107.5; Σ(xᵢ − μ)²fᵢ = 6960.0001
μ = Σxᵢfᵢ/Σfᵢ = 5107.5/135 ≈ 37.8 in; σ = √[Σ(xᵢ − μ)²fᵢ/Σfᵢ] = √(6960.0001/135) ≈ 7.2 in
(b) [Histogram: Annual Rainfall for St. Louis, MO; horizontal axis Rainfall (inches), 20 to 65; vertical axis Frequency]
(c) By the Empirical Rule, 95% of the observations will be within 2 standard deviations of the mean. Now, μ − 2σ = 37.8 − 2(7.2) = 23.4 and μ + 2σ = 37.8 + 2(7.2) = 52.2, so 95% of annual rainfalls in St. Louis will be between 23.4 and 52.2 inches.


11. (a) Classes 15–19, 20–24, 25–29, 30–34, 35–39, 40–44 have midpoints xᵢ = 17.5, 22.5, 27.5, 32.5, 37.5, 42.5 and frequencies fᵢ = 93, 511, 1628, 2832, 1843, 377.
Σfᵢ = 7284; Σxᵢfᵢ = 235,070; Σ(xᵢ − μ)²fᵢ = 196,111.69
μ = Σxᵢfᵢ/Σfᵢ = 235,070/7284 ≈ 32.3 yr; σ = √[Σ(xᵢ − μ)²fᵢ/Σfᵢ] = √(196,111.69/7284) ≈ 5.2 yr
(b) [Histogram: Number of Multiple Births in 2002; horizontal axis Mother's Age, 15 to 45; vertical axis Frequency]
(c) By the Empirical Rule, 95% of the observations will be within 2 standard deviations of the mean. Now, μ − 2σ = 32.3 − 2(5.2) = 21.9 and μ + 2σ = 32.3 + 2(5.2) = 42.7, so 95% of mothers of multiple births will be between 21.9 and 42.7 years of age.


12. (a) Classes 400–449, 450–499, 500–549, 550–599, 600–649, 650–699, 700–749, 750–800 have midpoints xᵢ = 425, 475, 525, 575, 625, 675, 725, 775.5 and frequencies fᵢ = 281, 577, 840, 1120, 1166, 900, 518, 394.
Σfᵢ = 5796; Σxᵢfᵢ = 3,495,847; Σ(xᵢ − μ)²fᵢ = 49,009,103.2
μ = Σxᵢfᵢ/Σfᵢ = 3,495,847/5796 ≈ 603.1; σ = √[Σ(xᵢ − μ)²fᵢ/Σfᵢ] = √(49,009,103.2/5796) ≈ 92.0
(b) [Histogram: SAT Verbal Scores, 2003; horizontal axis Score, 400 to 800; vertical axis Frequency]
(c) By the Empirical Rule, 95% of the observations will be within 2 standard deviations of the mean. Now, μ − 2σ = 603.1 − 2(92.0) = 419.1 and μ + 2σ = 603.1 + 2(92.0) = 787.1, so 95% of ISACS college-bound seniors will have SAT Verbal scores between 419 and 787.


13. Classes 20–29, 30–39, 40–49, 50–59, 60–69, 70–79 have midpoints xᵢ = 25, 35, 45, 55, 65, 75 and frequencies fᵢ = 1, 6, 10, 14, 6, 3.
Σfᵢ = 40; Σxᵢfᵢ = 2070; Σ(xᵢ − x̄)²fᵢ = 5677.5
x̄ = Σxᵢfᵢ/Σfᵢ = 2070/40 = 51.75 ≈ 51.8 (compared to 51.1 using the raw data);
s = √[Σ(xᵢ − x̄)²fᵢ/(Σfᵢ − 1)] = √(5677.5/39) ≈ 12.1 (compared to 10.9 using the raw data)

14. Classes 3–4.99, 5–6.99, 7–8.99, 9–10.99 have midpoints xᵢ = 4, 6, 8, 10 and frequencies fᵢ = 12, 14, 6, 3.
Σfᵢ = 35; Σxᵢfᵢ = 210; Σ(xᵢ − x̄)²fᵢ = 120
x̄ = Σxᵢfᵢ/Σfᵢ = 210/35 = 6 million shares (compared to 5.88 million shares using the raw data);
s = √[Σ(xᵢ − x̄)²fᵢ/(Σfᵢ − 1)] = √(120/34) ≈ 1.879 million shares (compared to 2.059 million shares using the raw data)


15. $\text{GPA} = \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{5(3) + 3(4) + 4(4) + 3(2)}{5 + 3 + 4 + 3} = \frac{49}{15} \approx 3.27$

16. $\text{Course Average} = \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{5(100) + 10(93) + 60(86) + 25(85)}{5 + 10 + 60 + 25} = \frac{8715}{100} = 87.15\%$

17. Cost per pound $= \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{4(\$3.50) + 3(\$2.75) + 2(\$2.25)}{4 + 3 + 2} \approx \$2.97$/lb

18. Cost per pound $= \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{2.5(\$1.30) + 4(\$4.50) + 2(\$3.75)}{2.5 + 4 + 2} \approx \$3.38$/lb
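Problems 15–18 are all the same weighted-mean computation with different weights. A short Python sketch (ours; the helper name weighted_mean is an assumption, not from the text) reproduces the GPA in Problem 15 and the cost per pound in Problem 17:

def weighted_mean(weights, values):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Problem 15: credit hours are the weights, grade points the values.
print(round(weighted_mean([5, 3, 4, 3], [3, 4, 4, 2]), 2))        # 3.27
# Problem 17: pounds purchased are the weights, prices per pound the values.
print(round(weighted_mean([4, 3, 2], [3.50, 2.75, 2.25]), 2))     # 2.97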

19. (a)
Class     Midpt, $x_i$   Freq, $f_i$   $x_i f_i$   $\mu$     $x_i - \mu$   $(x_i-\mu)^2 f_i$
0 – 9     5              20,225        101,125     35.6058   −30.6058      18,945,060.7
10 – 19   15             21,375        320,625     35.6058   −20.6058      9,075,803.5
20 – 29   25             20,437        510,925     35.6058   −10.6058      2,298,814.9
30 – 39   35             21,176        741,160     35.6058   −0.6058       7,771.5
40 – 49   45             22,138        996,210     35.6058   9.3942        1,953,700.5
50 – 59   55             16,974        933,570     35.6058   19.3942       6,384,515.4
60 – 69   65             10,289        668,785     35.6058   29.3942       8,889,891.4
70 – 79   75             6,923         519,225     35.6058   39.3942       10,743,824.4
80 – 89   85             3,053         259,505     35.6058   49.3942       7,448,669.7
90 – 99   95             436           41,420      35.6058   59.3942       1,538,064.6
$\sum f_i = 143{,}026$,  $\sum x_i f_i = 5{,}092{,}550$,  $\sum (x_i-\mu)^2 f_i = 67{,}286{,}116.6$

$\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{5{,}092{,}550}{143{,}026} \approx 35.6058 \approx 35.6$ yr;  $\sigma = \sqrt{\frac{\sum (x_i-\mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{67{,}286{,}116.6}{143{,}026}} \approx 21.7$ yr


(b)
Class     Midpt, $x_i$   Freq, $f_i$   $x_i f_i$   $\mu$     $x_i - \mu$   $(x_i-\mu)^2 f_i$
0 – 9     5              19,319        96,595      38.0872   −33.0872      21,149,722.6
10 – 19   15             20,295        304,425     38.0872   −23.0872      10,817,616.6
20 – 29   25             19,459        486,475     38.0872   −13.0872      3,332,836.4
30 – 39   35             20,936        732,760     38.0872   −3.0872       199,536.9
40 – 49   45             22,586        1,016,370   38.0872   6.9128        1,079,312.8
50 – 59   55             17,864        982,520     38.0872   16.9128       5,109,868.6
60 – 69   65             11,563        751,595     38.0872   26.9128       8,375,067.1
70 – 79   75             9,121         684,075     38.0872   36.9128       12,427,862.3
80 – 89   85             5,367         456,195     38.0872   46.9128       11,811,751.5
90 – 99   95             1,215         115,425     38.0872   56.9128       3,935,466.2
$\sum f_i = 147{,}725$,  $\sum x_i f_i = 5{,}626{,}435$,  $\sum (x_i-\mu)^2 f_i = 78{,}239{,}041.0$

$\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{5{,}626{,}435}{147{,}725} \approx 38.0872 \approx 38.1$ yr;  $\sigma = \sqrt{\frac{\sum (x_i-\mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{78{,}239{,}041}{147{,}725}} \approx 23.0$ yr

(c) & (d) Females have both a higher mean age and more dispersion in age.

20. (a)
Class     Midpt, $x_i$   Freq, $f_i$   $x_i f_i$   $\mu$     $x_i - \mu$   $(x_i-\mu)^2 f_i$
10 – 14   12.5           1.1           13.75       25.9996   −13.4996      200.4631
15 – 19   17.5           53.0          927.5       25.9996   −8.4996       3828.8896
20 – 24   22.5           115.1         2589.75     25.9996   −3.4996       1409.6527
25 – 29   27.5           112.9         3104.75     25.9996   1.5004        254.1605
30 – 34   32.5           61.9          2011.75     25.9996   6.5004        2615.5969
35 – 39   37.5           19.8          742.5       25.9996   11.5004       2618.7322
40 – 44   42.5           3.9           165.75      25.9996   16.5004       1061.8265
45 – 49   47.5           0.2           9.5         25.9996   21.5004       92.4534
$\sum f_i = 367.9$,  $\sum x_i f_i = 9565.25$,  $\sum (x_i-\mu)^2 f_i = 12{,}081.7749$


$\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{9565.25}{367.9} \approx 25.9996 \approx 26.0$ yr;  $\sigma = \sqrt{\frac{\sum (x_i-\mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{12{,}081.7749}{367.9}} \approx 5.7$ yr

(b)
Class     Midpt, $x_i$   Freq, $f_i$   $x_i f_i$   $\mu$     $x_i - \mu$   $(x_i-\mu)^2 f_i$
10 – 14   12.5           0.7           8.75        27.6180   −15.118       159.9877
15 – 19   17.5           43.0          752.5       27.6180   −10.118       4402.0787
20 – 24   22.5           103.6         2331        27.6180   −5.118        2713.6905
25 – 29   27.5           113.6         3124        27.6180   −0.118        1.5818
30 – 34   32.5           91.5          2973.75     27.6180   4.882         2180.8040
35 – 39   37.5           41.4          1552.5      27.6180   9.882         4042.8725
40 – 44   42.5           8.3           352.75      27.6180   14.882        1838.2336
45 – 49   47.5           0.5           23.75       27.6180   19.882        197.6470
$\sum f_i = 402.6$,  $\sum x_i f_i = 11{,}119$,  $\sum (x_i-\mu)^2 f_i = 15{,}536.8952$

$\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{11{,}119}{402.6} \approx 27.6180 \approx 27.6$ yr;  $\sigma = \sqrt{\frac{\sum (x_i-\mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{15{,}536.8952}{402.6}} \approx 6.2$ yr

(c) & (d) The year 2002 has both the higher mean age of mothers and more dispersion in the age of mothers.

21.
Class     Frequency, f   Cumulative Frequency, CF
0 – 9     31             31
10 – 19   39             70
20 – 29   17             87
30 – 39   6              93
40 – 49   4              97
50 – 59   2              99
60 – 69   1              100

The total frequency is 100, so the position of the median is $\frac{n}{2} = \frac{100}{2} = 50$, which is in the second class, 10 – 19. Then $M = L + \frac{\frac{n}{2} - CF}{f} \cdot i = 10 + \frac{50 - 31}{39}(20 - 10) \approx 14.9$ days.
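The interpolation formula $M = L + \frac{n/2 - CF}{f}\cdot i$ can be automated by walking the cumulative frequencies until the median class is found. The sketch below is ours (not part of the original solution) and uses the Problem 21 table; it reproduces M ≈ 14.9 days:

# Median from grouped data (Problem 21): lower class limits, class width,
# and frequencies taken from the table above.
lower_limits = [0, 10, 20, 30, 40, 50, 60]
width        = 10
freqs        = [31, 39, 17, 6, 4, 2, 1]

n    = sum(freqs)
half = n / 2                                   # position of the median = n/2 = 50

cum = 0
for L, f in zip(lower_limits, freqs):
    if cum + f >= half:                        # this is the median class
        median = L + (half - cum) / f * width  # M = L + ((n/2 - CF)/f) * i
        break
    cum += f

print(round(median, 1))                        # 14.9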


22.
Class     Frequency, f   Cumulative Frequency, CF
0 – 9     24             24
10 – 19   14             38
20 – 29   39             77
30 – 39   18             95
40 – 49   5              100

The total frequency is 100, so the position of the median is $\frac{n}{2} = \frac{100}{2} = 50$, which is in the third class, 20 – 29. Then $M = L + \frac{\frac{n}{2} - CF}{f} \cdot i = 20 + \frac{50 - 38}{39}(30 - 20) \approx 23.1$ hr/wk.

23.
Class     Frequency, f (millions)   Cumulative Frequency, CF (millions)
25 – 34   28.9                      28.9
35 – 44   35.7                      64.6
45 – 54   35.1                      99.7
55 – 64   24.7                      124.4

The total frequency is 124.4 (million), so the position of the median is $\frac{n}{2} = \frac{124.4}{2} = 62.2$, which is in the second class, 35 – 44. Then $M = L + \frac{\frac{n}{2} - CF}{f} \cdot i = 35 + \frac{62.2 - 28.9}{35.7}(45 - 35) \approx 44.3$ years.

24.
Class       Frequency, f   Cumulative Frequency, CF
0 – 0.9     539            539
1.0 – 1.9   1              540
2.0 – 2.9   1336           1876
3.0 – 3.9   1363           3239
4.0 – 4.9   289            3528
5.0 – 5.9   21             3549
6.0 – 6.9   2              3551

The total frequency is 3551, so the position of the median is $\frac{n}{2} = \frac{3551}{2} = 1775.5$, which is in the third class, 2.0 – 2.9. Then $M = L + \frac{\frac{n}{2} - CF}{f} \cdot i = 2.0 + \frac{1775.5 - 540}{1336}(3.0 - 2.0) \approx 2.9$.


25. From the table in Problem 5, the modal class (highest frequency class) is 10 – 19 days.

26. From the table in Problem 6, the modal class (highest frequency class) is 20 – 29 hr/wk.

27. From the table in Problem 7, the modal class (highest frequency class) is 25 – 44 years.

28. From the table in Problem 8, the modal class (highest frequency class) is 3.0 – 3.9.

29. (a) Answers will vary. One possibility follows: Many colleges do not permit students under age 16 to enroll in courses, so a reasonable midpoint to use would be 17.

(b) Answers will vary. One possibility follows: Since it is not likely that many students would be over 70 years old, a reasonable midpoint would be 60.

(c) Answers will vary depending on choices for midpoints in parts (a) and (b). Using the midpoints chosen above:

Class          Midpoint, $x_i$      Freq, $f_i$   $x_i f_i$
Less than 18   17                   139           2363
18 – 19        (18 + 20)/2 = 19     4089          77,691
20 – 21        (20 + 22)/2 = 21     3357          70,497
22 – 24        (22 + 25)/2 = 23.5   1661          39,033.5
25 – 29        (25 + 30)/2 = 27.5   470           12,925
30 – 34        (30 + 35)/2 = 32.5   145           4712.5
35 – 39        (35 + 40)/2 = 37.5   95            3562.5
40 – 49        (40 + 50)/2 = 45     117           5265
50 and above   60                   21            1260
$\sum f_i = 10{,}094$,  $\sum x_i f_i = 217{,}309.5$

$\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{217{,}309.5}{10{,}094} \approx 21.5$ years. This estimate is a little higher than the actual mean age of 20.9 years.


3.4 Measures of Position

1. Answers will vary. The kth percentile of a set of data is the value which divides the bottom

k% of the data from the top (100–k)% of the data. For example, if a data value lies at the 60th percentile, then approximately 60% of the data is below it and approximately 40% is above this value.

2. This can happen because the percentile is rounded to the nearest integer. For example, if there were 150 scores in the class then the percentile for the top score would be given by $\frac{149}{150} \cdot 100 = 99.3$, which rounds to the 99th percentile, while the next score would correspond to a percentile of $\frac{148}{150} \cdot 100 = 98.7$, which also rounds to the 99th percentile.

3. A four-star mutual fund is in the top 40% but not in the top 20% of its investment class.

That is, it is above the bottom 60% but below the top 20% of the ranked funds. 4. Not necessarily. When an outlier is discovered it should be investigated to find its cause.

Once the cause is determined, then it can be determined whether it should be removed from the data set.

5. To qualify for Mensa, one needs to have an IQ that is in the top 2% of people. 6. Comparing z-scores gives us a unitless comparison of standard deviations from the mean.

They also take the relative size and variability of the data into account. This allows us to have a standard basis for comparison and also enables us to more easily detect possible outliers.

7. z-score for the 34-week gestation baby: $z = \frac{x - \mu}{\sigma} = \frac{2400 - 2600}{670} \approx -0.30$
z-score for the 40-week gestation baby: $z = \frac{x - \mu}{\sigma} = \frac{3300 - 3500}{475} \approx -0.42$
The weight of a 34-week gestation baby is 0.30 standard deviations below the mean, while the weight of a 40-week gestation baby is 0.42 standard deviations below the mean. Thus, the 40-week gestation baby weighs less relative to the gestation period.
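Problems 7–12 all use the same comparison: convert each value to a z-score and compare the results. A minimal Python helper (ours, not from the text) applied to the Problem 7 numbers:

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

print(round(z_score(2400, 2600, 670), 2))   # -0.3   (34-week gestation baby)
print(round(z_score(3300, 3500, 475), 2))   # -0.42  (40-week gestation baby)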

8. z-score for the 34-week gestation baby: $z = \frac{x - \mu}{\sigma} = \frac{3000 - 2600}{670} \approx 0.60$
z-score for the 40-week gestation baby: $z = \frac{x - \mu}{\sigma} = \frac{3900 - 3500}{475} \approx 0.84$
The weight of a 34-week gestation baby is 0.60 standard deviations above the mean, while the weight of a 40-week gestation baby is 0.84 standard deviations above the mean. Thus, the 34-week gestation baby weighs less relative to the gestation period.


9. z-score for the 75-inch man: $z = \frac{x - \mu}{\sigma} = \frac{75 - 69.6}{2.7} = 2$
z-score for the 70-inch woman: $z = \frac{x - \mu}{\sigma} = \frac{70 - 64.1}{2.6} \approx 2.27$
The height of the 75-inch man is 2 standard deviations above the mean, while the height of the 70-inch woman is 2.27 standard deviations above the mean. Thus, the 70-inch woman is relatively taller than the 75-inch man.

10. z-score for the 68-inch man: $z = \frac{x - \mu}{\sigma} = \frac{68 - 69.6}{2.7} \approx -0.59$
z-score for the 62-inch woman: $z = \frac{x - \mu}{\sigma} = \frac{62 - 64.1}{2.6} \approx -0.81$
The height of the 68-inch man is 0.59 standard deviations below the mean, while the height of the 62-inch woman is 0.81 standard deviations below the mean. Thus, the 68-inch man is relatively taller than the 62-inch woman.

11. z-score for Jake Peavy: $z = \frac{x - \mu}{\sigma} = \frac{2.27 - 4.198}{0.772} \approx -2.50$
z-score for Johann Santana: $z = \frac{x - \mu}{\sigma} = \frac{2.61 - 4.338}{0.785} \approx -2.20$
Jake Peavy's 2004 ERA was 2.50 standard deviations below the mean, while Johann Santana's 2004 ERA was 2.20 standard deviations below the mean. Thus, Peavy had the better year relative to his peers.

12. z-score for Ted Williams: $z = \frac{x - \mu}{\sigma} = \frac{0.406 - 0.28062}{0.03281} \approx 3.82$
z-score for Ichiro Suzuki: $z = \frac{x - \mu}{\sigma} = \frac{0.372 - 0.26992}{0.02154} \approx 4.74$
Ted Williams' 1941 batting average was 3.82 standard deviations above the mean, while Ichiro Suzuki's 2004 batting average was 4.74 standard deviations above the mean. Thus, Suzuki had the better year relative to his peers.

13. The data provided in Table 17 are already listed in ascending order.

(a) $i = \frac{k}{100}(n+1) = \frac{40}{100}(51+1) = 20.8$. Since i = 20.8 is not an integer, we average the 20th and 21st data values: $P_{40} = \frac{325.5 + 333.2}{2} = 329.35$. This means that approximately 40% of the states have violent crime rates less than 329.35 crimes per 100,000 population, and approximately 60% of the states have violent crime rates more than this.


(b) $i = \frac{k}{100}(n+1) = \frac{95}{100}(51+1) = 49.4$. Since i = 49.4 is not an integer, we average the 49th and 50th data values: $P_{95} = \frac{730.2 + 793.5}{2} = 761.85$. This means that approximately 95% of the states have violent crime rates less than 761.85 crimes per 100,000 population, and approximately 5% of the states have violent crime rates more than this.

(c) $i = \frac{k}{100}(n+1) = \frac{10}{100}(51+1) = 5.2$. Since i = 5.2 is not an integer, we average the 5th and 6th data values: $P_{10} = \frac{173.4 + 221.0}{2} = 197.2$. This means that approximately 10% of the states have violent crime rates less than 197.2 crimes per 100,000 population, and approximately 90% of the states have violent crime rates more than this.

(d) Of the 51 states, 48 have a violent crime rate less than Florida's violent crime rate. Percentile rank of Florida $= \frac{48}{51} \cdot 100 \approx 94$. Florida's violent crime rate is at the 94th percentile. This means that approximately 94% of the states have violent crime rates that are less than that of Florida, and approximately 6% of the states have violent crime rates that are larger than that of Florida.

(e) Of the 51 states, 40 have a violent crime rate less than California's violent crime rate. Percentile rank of California $= \frac{40}{51} \cdot 100 \approx 78$. California's violent crime rate is at the 78th percentile. This means that approximately 78% of the states have violent crime rates that are less than that of California, and approximately 22% of the states have violent crime rates that are larger than that of California.
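The percentile rule used in Problems 13 and 14 — locate the index $i = \frac{k}{100}(n+1)$ and average the two neighboring ordered values when i is not an integer — and the percentile-rank computation can both be written as short functions. The sketch below is ours; because Table 17 is not reproduced here, it is demonstrated on a small made-up data set rather than the state crime rates.

import math

def percentile(sorted_data, k):
    """kth percentile via i = (k/100)(n + 1); average the two neighboring
    ordered values when i is not an integer (the method used above)."""
    n = len(sorted_data)
    i = (k / 100) * (n + 1)
    if i.is_integer():
        return sorted_data[int(i) - 1]                    # 1-based position
    lo, hi = math.floor(i), math.ceil(i)
    return (sorted_data[lo - 1] + sorted_data[hi - 1]) / 2

def percentile_rank(sorted_data, x):
    """Approximate percentile rank: percentage of values below x, rounded."""
    below = sum(1 for v in sorted_data if v < x)
    return round(below / len(sorted_data) * 100)

# Illustration with hypothetical numbers (n = 11):
data = sorted([3, 7, 8, 12, 13, 14, 18, 21, 22, 25, 30])
print(percentile(data, 50))          # 14  (i = 6, an integer, so the 6th value)
print(percentile_rank(data, 21))     # 64  (7 of 11 values are below 21)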

14. The data provided in Table 17 are already listed in ascending order.

(a) $i = \frac{k}{100}(n+1) = \frac{30}{100}(51+1) = 15.6$. Since i = 15.6 is not an integer, we average the 15th and 16th data values: $P_{30} = \frac{275.8 + 285.6}{2} = 280.7$. This means that approximately 30% of the states have violent crime rates less than 280.7 crimes per 100,000 population, and approximately 70% of the states have violent crime rates more than this.

(b) $i = \frac{k}{100}(n+1) = \frac{85}{100}(51+1) = 44.2$. Since i = 44.2 is not an integer, we average the 44th and 45th data values: $P_{85} = \frac{646.3 + 658.0}{2} = 652.15$. This means that approximately 85% of the states have violent crime rates less than 652.15 crimes per 100,000 population, and approximately 15% of the states have violent crime rates more than this.


(c) $i = \frac{k}{100}(n+1) = \frac{5}{100}(51+1) = 2.6$. Since i = 2.6 is not an integer, we average the 2nd and 3rd data values: $P_{5} = \frac{108.9 + 110.2}{2} = 109.55$. This means that approximately 5% of the states have violent crime rates less than 109.55 crimes per 100,000 population, and approximately 95% of the states have violent crime rates more than this.

(d) Of the 51 states, 45 have a violent crime rate less than New Mexico's violent crime rate. Percentile rank of New Mexico $= \frac{45}{51} \cdot 100 \approx 88$. New Mexico's violent crime rate is at the 88th percentile. This means that approximately 88% of the states have violent crime rates that are less than that of New Mexico, and approximately 12% of the states have violent crime rates that are larger than that of New Mexico.

(e) Of the 51 states, 15 have a violent crime rate less than Rhode Island's violent crime rate. Percentile rank of Rhode Island $= \frac{15}{51} \cdot 100 \approx 29$. Rhode Island's violent crime rate is at the 29th percentile. This means that approximately 29% of the states have violent crime rates that are less than that of Rhode Island, and approximately 71% of the states have violent crime rates that are larger than that of Rhode Island.

15. (a) Computing the sample mean ($\bar{x}$) and sample standard deviation (s) for the data yields $\bar{x} = 3.9935$ inches and $s \approx 1.7790$ inches. Using these values as approximations for $\mu$ and $\sigma$, the z-score for x = 0.97 inches is $z = \frac{x - \mu}{\sigma} \approx \frac{0.97 - 3.9935}{1.7790} \approx -1.70$. The

rainfall in 1971 (0.97 inches) is 1.70 standard deviations below the mean. (b) The data provided are already listed in ascending order. There are n = 20 data points.

The index for the first quartile is $i = \frac{25}{100}(20+1) = 5.25$. Since i = 5.25 is not an integer, we average the 5th and 6th data values: $Q_1 = \frac{2.47 + 2.78}{2} = 2.625$ inches. The index for the second quartile is $i = \frac{50}{100}(20+1) = 10.5$. Since i = 10.5 is not an integer, we average the 10th and 11th data values: $Q_2 = \frac{3.97 + 4.0}{2} = 3.985$ inches. The index for the third quartile is $i = \frac{75}{100}(20+1) = 15.75$. Since i = 15.75 is not an integer, we average the 15th and 16th data values: $Q_3 = \frac{5.22 + 5.50}{2} = 5.36$ inches.

(c) $\text{IQR} = Q_3 - Q_1 = 5.36 - 2.625 = 2.735$ inches
(d) Lower fence $= Q_1 - 1.5(\text{IQR}) = 2.625 - 1.5(2.735) = -1.478$ inches.
Upper fence $= Q_3 + 1.5(\text{IQR}) = 5.36 + 1.5(2.735) = 9.463$ inches. According to this criterion, there are no outliers.
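Parts (c) and (d) depend only on the quartiles from part (b). A tiny Python check (ours) of the IQR and the outlier fences:

def fences(q1, q3):
    """Lower and upper outlier fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Quartiles found in part (b) of Problem 15 (rainfall, in inches):
lo, hi = fences(2.625, 5.36)
print(round(lo, 3), round(hi, 3))   # -1.478 9.463, matching part (d)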


16. (a) Computing the sample mean ($\bar{x}$) and sample standard deviation (s) for the data yields $\bar{x} = 10.08$ g/dL and $s \approx 1.8858$ g/dL. Using these values as approximations for $\mu$ and $\sigma$, the z-score for x = 7.8 g/dL is $z = \frac{x - \mu}{\sigma} \approx \frac{7.8 - 10.08}{1.8858} \approx -1.21$. Blackie's

hemoglobin level (7.8 g/dL) is 1.21 standard deviations below the mean. (b) The data provided are already listed in ascending order. There are n = 20 data points.

The index for the first quartile is $i = \frac{25}{100}(20+1) = 5.25$. Since i = 5.25 is not an integer, we average the 5th and 6th data values: $Q_1 = \frac{8.9 + 9.4}{2} = 9.15$ g/dL. The index for the second quartile is $i = \frac{50}{100}(20+1) = 10.5$. Since i = 10.5 is not an integer, we average the 10th and 11th data values: $Q_2 = \frac{9.9 + 10.0}{2} = 9.95$ g/dL. The index for the third quartile is $i = \frac{75}{100}(20+1) = 15.75$. Since i = 15.75 is not an integer, we average the 15th and 16th data values: $Q_3 = \frac{11.0 + 11.2}{2} = 11.1$ g/dL.

(c) $\text{IQR} = Q_3 - Q_1 = 11.1 - 9.15 = 1.95$ g/dL
(d) Lower fence $= Q_1 - 1.5(\text{IQR}) = 9.15 - 1.5(1.95) = 6.225$ g/dL.
Upper fence $= Q_3 + 1.5(\text{IQR}) = 11.1 + 1.5(1.95) = 14.025$ g/dL. The hemoglobin level 5.7 g/dL is an outlier because it is less than the lower fence.

17. (a) Computing the sample mean ($\bar{x}$) and sample standard deviation (s) for the data yields $\bar{x} \approx 15.9227$ mg/L and $s \approx 7.3837$ mg/L. Using these values as approximations for $\mu$ and $\sigma$, the z-score for x = 20.46 mg/L is $z = \frac{x - \mu}{\sigma} \approx \frac{20.46 - 15.9227}{7.3837} \approx 0.61$. The

organic concentration of 20.46 mg/L is 0.61 standard deviations above the mean. (b) There are n = 33 data points, and we must put them in ascending order: 5.2, 5.29, 5.3, 6.51, 7.4, 8.09, 8.81, 9.72, 10.3, 11.4, 11.9, 14, 14.86, 14.86,

14.9, 15.35, 15.42, 15.72, 15.91, 16.51, 16.87, 17.5, 17.9, 18.3, 19.8, 20.46, 20.46, 22.49, 22.74, 27.1, 29.8, 30.91, 33.67

The index for the first quartile is $i = \frac{25}{100}(33+1) = 8.5$. Since i = 8.5 is not an integer, we average the 8th and 9th data values: $Q_1 = \frac{9.72 + 10.3}{2} = 10.01$ mg/L. The index for the second quartile is $i = \frac{50}{100}(33+1) = 17$. Since i = 17 is an integer, the 17th data value is the second quartile: $Q_2 = 15.42$ mg/L. The index for the third


quartile is $i = \frac{75}{100}(33+1) = 25.5$. Since i = 25.5 is not an integer, we average the 25th and 26th data values: $Q_3 = \frac{19.8 + 20.46}{2} = 20.13$ mg/L.

(c) $\text{IQR} = Q_3 - Q_1 = 20.13 - 10.01 = 10.12$ mg/L
(d) Lower fence $= Q_1 - 1.5(\text{IQR}) = 10.01 - 1.5(10.12) = -5.17$ mg/L.
Upper fence $= Q_3 + 1.5(\text{IQR}) = 20.13 + 1.5(10.12) = 35.31$ mg/L. According to this criterion, there are no outliers.

18. (a) Computing the sample mean ($\bar{x}$) and sample standard deviation (s) for the data yields $\bar{x} \approx 10.0266$ mg/L and $s \approx 4.9789$ mg/L. Using these values as approximations for $\mu$ and $\sigma$, the z-score for x = 17.99 mg/L is $z = \frac{x - \mu}{\sigma} \approx \frac{17.99 - 10.0266}{4.9789} \approx 1.60$. The

organic concentration of 17.99 mg/L is 1.60 standard deviations above the mean. (b) There are n = 47 data points, and we must put them in ascending order: 3.02, 3.79, 3.91, 3.99, 4.6, 4.71, 4.8, 4.85, 4.9, 5.5, 7, 7.11, 7.31, 7.45, 7.66,

7.85, 7.9, 7.92, 8.05, 8.37, 8.5, 8.5, 8.79, 9.1, 9.11, 9.29, 9.6, 9.81, 10.3, 10.47, 10.72, 10.89, 11.33, 11.56, 11.72, 11.72, 11.8, 11.97, 12.57, 12.89, 16.92, 17.9, 17.99, 21, 21.4, 21.82, 22.62

The index for the first quartile is $i = \frac{25}{100}(47+1) = 12$. Since i = 12 is an integer, the 12th data value is the first quartile: $Q_1 = 7.11$ mg/L. The index for the second quartile is $i = \frac{50}{100}(47+1) = 24$. Since i = 24 is an integer, the 24th data value is the second quartile: $Q_2 = 9.1$ mg/L. The index for the third quartile is $i = \frac{75}{100}(47+1) = 36$. Since i = 36 is an integer, the 36th data value is the third quartile: $Q_3 = 11.72$ mg/L.
(c) $\text{IQR} = Q_3 - Q_1 = 11.72 - 7.11 = 4.61$ mg/L
(d) Lower fence $= Q_1 - 1.5(\text{IQR}) = 7.11 - 1.5(4.61) = 0.195$ mg/L.
Upper fence $= Q_3 + 1.5(\text{IQR}) = 11.72 + 1.5(4.61) = 18.635$ mg/L. The organic carbon concentrations 21, 21.4, 21.82, and 22.62 mg/L are outliers because they are higher than the upper fence.

19. The first and third quartiles are $Q_1 = 433$ minutes and $Q_3 = 489.5$ minutes.
Upper fence $= Q_3 + 1.5(\text{IQR}) = 489.5 + 1.5(489.5 - 433) = 574.25$ minutes. The cutoff point is 574 minutes. If more minutes are used, the customer is contacted.

20. The first and third quartiles are $Q_1 = \$84$ and $Q_3 = \$138$.
Upper fence $= Q_3 + 1.5(\text{IQR}) = 138 + 1.5(138 - 84) = \$219$. If daily charges exceed $219, the customer will be contacted.


21. (a) The first and third quartiles are $Q_1 = \$67$ and $Q_3 = \$479$.
Lower fence $= Q_1 - 1.5(\text{IQR}) = 67 - 1.5(479 - 67) = -\$551$
Upper fence $= Q_3 + 1.5(\text{IQR}) = 479 + 1.5(479 - 67) = \$1097$. Therefore, $12,777 is an outlier because it is greater than the upper fence.

(b)

(c) Answers will vary. One possibility is that a student may have provided his or her

annual income instead of his or her weekly income.

22. (a) The first and third quartiles are $Q_1 = \$21$ and $Q_3 = \$54$.
Lower fence $= Q_1 - 1.5(\text{IQR}) = 21 - 1.5(54 - 21) = -\$28.50$
Upper fence $= Q_3 + 1.5(\text{IQR}) = 54 + 1.5(54 - 21) = \$103.50$. Therefore, $115 and $1000 are outliers because they are greater than the upper fence.

(b)

(c) Answers will vary. One possibility follows: It is possible that $115 is correct but

simply an unusual situation. For the data value $1000, perhaps a student provided his or her annual expenditures for entertainment instead of his or her weekly expenditures.


23.
Pulse   z-score
76      0.49
60      −1.59
60      −1.59
81      1.14
72      −0.03
80      1.01
80      1.01
68      −0.55
73      0.10
μ       72.2    0.0 (= mean of the z-scores)
σ       7.671   1.00 (= standard deviation of the z-scores)

24.
Travel Time   z-score
39            0.98
21            −0.42
9             −1.36
32            0.43
30            0.28
45            1.44
11            −1.20
12            −1.12
39            0.98
μ             26.4     0.0 (= mean of the z-scores)
σ             12.842   1.000 (= standard deviation of the z-scores)

3.5 The Five-Number Summary and Boxplots

1. The median and interquartile range are better measures of central tendency and dispersion

if the data are skewed or if the data contain outliers. 2. right 3. (a) The median is to the left of the center of the box and the right line is substantially

longer than the left line, so the distribution is skewed right. (b) Reading the boxplot, the five-number summary is approximately: 0, 1, 3, 6, 16. 4. (a) The median is near the center of the box and the horizontal lines are approximately the

same in length, so the distribution is symmetric. (b) Reading the boxplot, the five-number summary is approximately: −1, 2, 5, 8, 11.


5. The data in ascending order are as follows: 42, 43, 46, 46, 47, 48, 49, 49, 50, 50, 51, 51, 51, 51, 52, 52, 54, 54, 54, 54, 54, 55, 55, 55, 55, 56, 56, 56, 57, 57, 57, 57, 58, 60, 61, 61, 61, 62, 64, 64, 65, 68, 69

The smallest number (youngest president) in the data set is 42. The largest number in the data set is 69. The first quartile is $Q_1 = 51$ (the 11th data point). The median is $M = 55$ (the 22nd data point). The third quartile is $Q_3 = 58$ (the 33rd data point). The five-number summary is 42, 51, 55, 58, 69. The upper and lower fences are:
Lower fence $= Q_1 - 1.5(\text{IQR}) = 51 - 1.5(58 - 51) = 40.5$;
Upper fence $= Q_3 + 1.5(\text{IQR}) = 58 + 1.5(58 - 51) = 68.5$. Thus, 69 is an outlier.

The median is near the center of the box and the horizontal lines are approximately the

same in length, so the distribution is symmetric. 6. The data in ascending order are as follows:

1, 2, 8, 8, 11, 11, 12, 15, 16, 16, 17, 23, 23, 23, 23, 28, 28, 31, 33, 33, 35, 40 The smallest number in the data set is 1. The largest number in the data set is 40. The first

quartile is $Q_1 = \frac{11 + 11}{2} = 11$ (the mean of the 5th and 6th data points). The median is $M = \frac{17 + 23}{2} = 20$ (the mean of the 11th and 12th data points). The third quartile is $Q_3 = \frac{28 + 28}{2} = 28$ (the mean of the 16th and 17th data points). The five-number summary is 1, 11, 20, 28, 40. The upper and lower fences are:
Lower fence $= Q_1 - 1.5(\text{IQR}) = 11 - 1.5(28 - 11) = -14.5$;
Upper fence $= Q_3 + 1.5(\text{IQR}) = 28 + 1.5(28 - 11) = 53.5$. Thus, there are no outliers.

The median is near the center of the box and the horizontal lines are approximately the

same in length, so the distribution is symmetric.


7. The data in ascending order are as follows: 1, 3, 3, 3, 3, 4, 4, 4, 5, 7, 7, 7, 9, 10, 10, 10, 12, 13, 14, 15, 16, 17, 17, 17, 17, 18, 19, 19, 21, 22, 23, 25, 27, 27, 29, 32, 35, 36, 45
The smallest number in the data set is 1. The largest number in the data set is 45. The first quartile is $Q_1 = 7$ (the 10th data point). The median is $M = 15$ (the 20th data point). The third quartile is $Q_3 = 22$ (the 30th data point). The five-number summary is 1, 7, 15, 22, 45. The upper and lower fences are:
Lower fence $= Q_1 - 1.5(\text{IQR}) = 7 - 1.5(22 - 7) = -15.5$;
Upper fence $= Q_3 + 1.5(\text{IQR}) = 22 + 1.5(22 - 7) = 44.5$. Thus, 45 is an outlier.

The median is to the left of the center of the box and the right line is substantially longer

than the left line, so the distribution is skewed right.

8. The data in ascending order are as follows:

18, 19, 19, 19, 20, 21, 22, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 26, 27, 27, 28, 28, 29, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 34, 34, 34, 34, 34, 34, 34, 34, 35, 35, 38, 39, 46

The smallest number in the data set is 18. The largest is 46. The first quartile is $Q_1 = 26$ (the 16th data point). The median is $M = 30$ (the 32nd data point). The third quartile is $Q_3 = 32$ (the 48th data point). The five-number summary is 18, 26, 30, 32, 46. The upper and lower fences are:
Lower fence $= Q_1 - 1.5(\text{IQR}) = 26 - 1.5(32 - 26) = 17$;
Upper fence $= Q_3 + 1.5(\text{IQR}) = 32 + 1.5(32 - 26) = 41$. Thus, 46 is an outlier.

The median is to the right of the center of the box, so the distribution is skewed left.


9. The data in ascending order are as follows: 0.598, 0.600, 0.600, 0.601, 0.602, 0.603, 0.605, 0.605, 0.605, 0.606, 0.607, 0.607, 0.608, 0.608, 0.608, 0.608, 0.608, 0.609, 0.610, 0.610, 0.610, 0.610, 0.611, 0.611, 0.612
The smallest number in the data set is 0.598. The largest is 0.612. The first quartile is $Q_1 = \frac{0.603 + 0.605}{2} = 0.604$ (the mean of the 6th and 7th data points). The median is $M = 0.608$ (the 13th data point). The third quartile is $Q_3 = \frac{0.610 + 0.610}{2} = 0.610$ (the mean of the 19th and 20th data points). The five-number summary is 0.598, 0.604, 0.608, 0.610, 0.612. The upper and lower fences are:
Lower fence $= Q_1 - 1.5(\text{IQR}) = 0.604 - 1.5(0.610 - 0.604) = 0.595$;
Upper fence $= Q_3 + 1.5(\text{IQR}) = 0.610 + 1.5(0.610 - 0.604) = 0.619$. Thus, there are no outliers.

The median is to the right of the center of the box, so the distribution is skewed left.

Answers will vary concerning the source of variability in weight.

10. The data in ascending order are as follows:
421, 480, 581, 583, 598, 611, 616, 618, 643, 645, 646, 649, 653, 654, 660, 664, 666, 667, 669, 672, 675, 678, 679, 682, 683, 684, 688, 688, 692, 692, 698, 698, 704, 706, 707, 707, 711, 711, 713, 715, 726, 737, 740, 741, 787, 791, 802, 816, 821, 830, 971
The smallest number in the data set is 421. The largest number in the data set is 971. The first quartile is $Q_1 = 653$ (the 13th data point). The median is $M = 684$ (the 26th data point). The third quartile is $Q_3 = 713$ (the 39th data point). The five-number summary is 421, 653, 684, 713, 971. The upper and lower fences are:
Lower fence $= Q_1 - 1.5(\text{IQR}) = 653 - 1.5(713 - 653) = 563$;
Upper fence $= Q_3 + 1.5(\text{IQR}) = 713 + 1.5(713 - 653) = 803$. Thus, the data points 421, 480, 816, 821, 830, and 971 are outliers.


The median is near the center of the box. Though the left line is longer than the right line,

when we consider the positions of the outliers, the distribution is relatively symmetric. Answers will vary. Wyoming is very rural resulting in the need to drive further distances. New York is more urban with many mass transit systems resulting in many individual gasoline expenditures.


11. (a) The data in ascending order are as follows: 28, 32, 33, 35, 36, 38, 39, 44, 44, 45, 45, 46, 46, 48, 48, 48, 49, 50, 51, 51, 51, 52, 52, 53, 53, 54, 55, 56, 56, 58, 59, 60, 60, 62, 63, 66, 69, 70, 70, 73
The smallest number in the data set is 28. The largest number in the data set is 73. The first quartile is $Q_1 = \frac{45 + 45}{2} = 45$ (the mean of the 10th and 11th data points). The median is $M = \frac{51 + 51}{2} = 51$ (the mean of the 20th and 21st data points). The third quartile is $Q_3 = \frac{58 + 59}{2} = 58.5$ (the mean of the 30th and 31st data points). The five-number summary is 28, 45, 51, 58.5, 73.

(b) Lower fence $= Q_1 - 1.5(\text{IQR}) = 45 - 1.5(58.5 - 45) = 24.75$;
Upper fence $= Q_3 + 1.5(\text{IQR}) = 58.5 + 1.5(58.5 - 45) = 78.75$. There are no outliers.

(c) The median is near the center of the box and the horizontal lines are approximately

equal in length, so the distribution is symmetric. This is confirmed by the histogram. (d) Since the distribution is symmetric and contains no outliers, the mean and standard

deviation should be reported as the measures of central tendency and dispersion.

12. (a) The data in ascending order are as follows:
3.01, 3.04, 3.25, 3.38, 3.38, 3.56, 3.78, 4.35, 4.43, 4.50, 4.74, 4.88, 5.00, 5.02, 5.32, 5.34, 5.53, 5.58, 5.64, 5.75, 6.06, 6.07, 6.23, 6.52, 6.57, 6.92, 7.16, 7.25, 7.57, 7.97, 8.40, 8.74, 9.70, 10.32, 10.96
The smallest number in the data set is 3.01. The largest number in the data set is 10.96. The first quartile is $Q_1 = 4.43$ (the 9th data point). The median is $M = 5.58$ (the 18th data point). The third quartile is $Q_3 = 7.16$ (the 27th data point). The five-number summary is 3.01, 4.43, 5.58, 7.16, 10.96.

(b) Lower fence $= Q_1 - 1.5(\text{IQR}) = 4.43 - 1.5(7.16 - 4.43) = 0.335$;
Upper fence $= Q_3 + 1.5(\text{IQR}) = 7.16 + 1.5(7.16 - 4.43) = 11.255$. There are no outliers.


(c) The median is to the left of the center of the box and the right line is substantially longer than the left line, so the distribution is skewed right. This is confirmed by the histogram.

(d) Since the distribution is skewed, the median and interquartile range should be reported as the measures of central tendency and dispersion.

13. (a) The data in ascending order are as follows:
0, 0, 0, 0, 0, 0, 0, 0.41, 0.62, 0.64, 0.67, 0.89, 0.94, 1.05, 1.06, 1.15, 1.22, 1.35, 1.68, 1.7, 1.7, 2.04, 2.07, 2.16, 2.38, 2.45, 2.59, 2.83
The smallest number in the data set is 0. The largest number in the data set is 2.83. The first quartile is $Q_1 = \frac{0 + 0.41}{2} = 0.205$ (the mean of the 7th and 8th data points). The median is $M = \frac{1.05 + 1.06}{2} = 1.055$ (the mean of the 14th and 15th data points). The third quartile is $Q_3 = \frac{1.7 + 2.04}{2} = 1.87$ (the mean of the 21st and 22nd data points). The five-number summary is 0, 0.205, 1.055, 1.87, 2.83.

(b) Lower fence $= Q_1 - 1.5(\text{IQR}) = 0.205 - 1.5(1.87 - 0.205) = -2.2925$;
Upper fence $= Q_3 + 1.5(\text{IQR}) = 1.87 + 1.5(1.87 - 0.205) = 4.3675$. Thus, there are no outliers.

(c) The right line is substantially longer than the left line, so the distribution is skewed

right. This is confirmed by the histogram. (d) Since the distribution is skewed, the median and interquartile range should be reported

as the measures of central tendency and dispersion.

14. (a) The data in ascending order are as follows:

78, 107, 108, 161, 177, 225, 234, 237, 255, 262, 268, 274, 279, 285, 286, 291, 292, 311, 314, 343, 345, 351, 352, 352, 357, 375, 377, 402, 424, 444, 459, 470, 484, 496, 503, 539, 540, 553, 563, 579, 593, 599, 621, 638, 662, 717, 740, 770, 770, 822, 1633

The smallest number in the data set is 78. The largest is 1633. The first quartile is $Q_1 = 279$ (the 13th data point). The median is $M = 375$ (the 26th data point). The third quartile is $Q_3 = 563$ (the 39th data point). The five-number summary is 78, 279, 375, 563, 1633.


(b) Lower fence $= Q_1 - 1.5(\text{IQR}) = 279 - 1.5(563 - 279) = -147$;
Upper fence $= Q_3 + 1.5(\text{IQR}) = 563 + 1.5(563 - 279) = 989$. Thus, the data point 1633 is an outlier.

(c) The median is to the left of the center of the box, so the distribution is skewed right.

This is confirmed by the histogram. (d) Since the distribution is skewed, the median and interquartile range should be reported

as the measures of central tendency and dispersion. 15. The data in ascending order are:

Keebler: 20, 20, 21, 21, 21, 22, 23, 24, 24, 24, 25, 25, 26, 28, 28, 28, 28, 29, 31, 32, 33 Store Brand: 16, 17, 18, 21, 21, 21, 23, 23, 24, 24, 24, 25, 26, 26, 27, 27, 28, 29, 30, 31, 33 Since both sets of data contain n = 21 data points, the quartiles are in the same positions for

both sets. Namely, the first quartile is the mean of the 5th and 6th data points, the median is the 11th data point, and the third quartile is the mean of the 16th and 17th data points.

The five-number summaries are: Keebler: 20, 21.5, 25, 28, 33

Store Brand: 16, 21, 24, 27.5, 33 The fences for Keebler Chips Deluxe Chocolate Chip Cookies are:

Lower fence $= 21.5 - 1.5(28 - 21.5) = 11.75$; Upper fence $= 28 + 1.5(28 - 21.5) = 37.75$
The fences for the store brand chocolate chip cookies are:
Lower fence $= 21 - 1.5(27.5 - 21) = 11.25$; Upper fence $= 27.5 + 1.5(27.5 - 21) = 37.25$
So, neither data set has any outliers.

Keebler appears to have both a higher number of chocolate chips per cookie and the more

consistent number of chips per cookie.


16. The data in ascending order are: Oklahoma: 18, 30, 40, 44, 47, 55, 61, 62, 64, 64, 73, 78, 79, 83, 145

Kansas: 42, 59, 62, 64, 68, 71, 73, 88, 91, 92, 95, 101, 113, 116, 122 Nebraska: 26, 28, 30, 55, 60, 61, 62, 63, 65, 69, 74, 81, 88, 102, 110 Since all three sets of data contain n = 15 data points, the quartiles are in the same positions

for all three sets. Namely, the first quartile is the 4th data point, the median is the 8th data point, and the third quartile is the 12th data point.

The five-number summaries are: Oklahoma: 18, 44, 62, 78, 145

Kansas: 42, 64, 88, 101, 122 Nebraska: 26, 55, 63, 81, 110

Oklahoma: Lower fence $= 44 - 1.5(78 - 44) = -7$; Upper fence $= 78 + 1.5(78 - 44) = 129$, so 145 is an outlier.
Kansas: Lower fence $= 64 - 1.5(101 - 64) = 8.5$; Upper fence $= 101 + 1.5(101 - 64) = 156.5$, so there are no outliers.
Nebraska: Lower fence $= 55 - 1.5(81 - 55) = 16$; Upper fence $= 81 + 1.5(81 - 55) = 120$, so there are no outliers.

Kansas appears to have a higher number of tornados per year. 17. The data in ascending order are: McGwire: 340, 341, 350, 350, 360, 360, 360, 369, 370, 370, 370, 370, 377, 380, 380, 380,

380, 380, 385, 385, 388, 390, 390, 390, 390, 398, 400, 400, 409, 410, 410, 410, 410, 410, 420, 420, 420, 420, 420, 423, 425, 430, 430, 430, 430, 430, 430, 430, 440, 440, 440, 450, 450, 450, 450, 452, 458, 460, 460, 461, 470, 470, 470, 478, 480, 500, 510, 510, 527, 550

The smallest number in the data set is 340. The largest number is 550. The first quartile is $Q_1 = 380$ (the mean of the 17th and 18th data points). The median is $M = 420$ (the mean of the 35th and 36th data points). The third quartile is $Q_3 = 450$ (the mean of the 53rd and 54th data points). The five-number summary for Mark McGwire is 340, 380, 420, 450, 550.
Lower fence $= 380 - 1.5(450 - 380) = 275$; Upper fence $= 450 + 1.5(450 - 380) = 555$. Thus, there are no outliers.


Sosa: 340, 344, 350, 350, 350, 360, 364, 364, 365, 366, 368, 370, 370, 370, 370, 370, 371, 380, 380, 380, 380, 380, 380, 388, 390, 390, 400, 400, 400, 400, 400, 405, 410, 410, 410, 410, 410, 414, 415, 420, 420, 420, 420, 420, 420, 420, 420, 430, 430, 430, 430, 430, 430, 433, 433, 434, 434, 440, 440, 440, 450, 460, 480, 480, 482, 500,

The smallest number in the data set is 340. The largest number is 500. The first quartile is $Q_1 = 370.5$ (the mean of the 16th and 17th data points). The median is $M = 410$ (the mean of the 33rd and 34th data points). The third quartile is $Q_3 = 430$ (the mean of the 50th and 51st data points). The five-number summary for Sammy Sosa is 340, 370.5, 410, 430, 500.
Lower fence $= 370.5 - 1.5(430 - 370.5) = 281.25$; Upper fence $= 430 + 1.5(430 - 370.5) = 519.25$. Thus, there are no outliers. (Note: The TI-84 gives $Q_1 = 371$ because the calculator uses a different, but acceptable, procedure for determining the quartiles. In most cases, the different procedures produce the same results, but in this case, they differ slightly.)

Bonds: 320, 320, 347, 350, 360, 360, 360, 361, 365, 370, 370, 375, 375, 375, 375, 380, 380, 380, 380, 380, 385, 390, 390, 391, 394, 396, 400, 400, 400, 400, 404, 405, 410, 410, 410, 410, 410, 410, 410, 410, 410, 410, 411, 415, 415, 416, 417, 417, 420, 420, 420, 420, 420, 420, 420, 420, 429, 430, 430, 430, 430, 430, 435, 435, 436, 440, 440, 440, 440, 442, 450, 454, 488

The smallest number in the data set is 320. The largest number is 488. The first quartile is $Q_1 = 380$ (the mean of the 18th and 19th data points). The median is $M = 410$ (the 37th data point). The third quartile is $Q_3 = 420$ (the mean of the 55th and 56th data points). The five-number summary for Barry Bonds is 320, 380, 410, 420, 488.
Lower fence $= 380 - 1.5(420 - 380) = 320$; Upper fence $= 420 + 1.5(420 - 380) = 480$. Thus, 488 is an outlier.

Mark McGwire appears to have longer distances. Barry Bonds appears to have the most

consistent distances.


Chapter 3 Review Exercises

1. (a) $\bar{x} = \frac{\sum x_i}{n} = \frac{7925.1}{10} = 792.51$ m/s;  $M = \frac{792.4 + 792.4}{2} = 792.4$ m/s
Data in order: 789.6, 791.4, 791.7, 792.3, 792.4, 792.4, 793.1, 793.8, 794.0, 794.4
(b) Range = Largest Data Value − Smallest Data Value = 794.4 − 789.6 = 4.8 m/s.

Data, $x_i$   Sample Mean, $\bar{x}$   Deviations, $x_i - \bar{x}$   Squared Deviations, $(x_i - \bar{x})^2$
793.8         792.51                   1.29                          1.6641
793.1         792.51                   0.59                          0.3481
792.4         792.51                   −0.11                         0.0121
794.0         792.51                   1.49                          2.2201
791.4         792.51                   −1.11                         1.2321
792.4         792.51                   −0.11                         0.0121
791.7         792.51                   −0.81                         0.6561
792.3         792.51                   −0.21                         0.0441
789.6         792.51                   −2.91                         8.4681
794.4         792.51                   1.89                          3.5721
$\sum x_i = 7925.1$;  $\sum (x_i - \bar{x}) = 0$;  $\sum (x_i - \bar{x})^2 = 18.2290$

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{18.2290}{10 - 1} \approx 2.03$ (m/s)$^2$;  $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{18.2290}{10 - 1}} \approx 1.42$ m/s.
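Since the raw data are listed in Problem 1, the summary statistics can also be checked with Python's statistics module. This is a verification sketch (ours), not the method shown above; statistics.variance and statistics.stdev use the n − 1 divisor, matching the sample formulas.

from statistics import mean, median, variance, stdev

# Muzzle velocities (m/s) from Problem 1, in the order given in the table above.
speeds = [793.8, 793.1, 792.4, 794.0, 791.4, 792.4, 791.7, 792.3, 789.6, 794.4]

print(round(mean(speeds), 2))               # 792.51
print(round(median(speeds), 1))             # 792.4
print(round(max(speeds) - min(speeds), 1))  # 4.8  (range)
print(round(variance(speeds), 2))           # 2.03 (sample variance)
print(round(stdev(speeds), 2))              # 1.42 (sample standard deviation)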

2. (a) $\bar{x} = \frac{\sum x_i}{n} = \frac{1268}{10} = 126.8$ beats/min;  $M = \frac{128 + 129}{2} = 128.5$ beats/min.
Data in order: 86, 96, 115, 120, 128, 129, 136, 143, 146, 169
(b) Range = Largest Data Value − Smallest Data Value = 169 − 86 = 83 beats/min.

Data, $x_i$   Sample Mean, $\bar{x}$   Deviations, $x_i - \bar{x}$   Squared Deviations, $(x_i - \bar{x})^2$
136           126.8                    9.2                           84.64
169           126.8                    42.2                          1780.84
120           126.8                    −6.8                          46.24
128           126.8                    1.2                           1.44
129           126.8                    2.2                           4.84
143           126.8                    16.2                          262.44
115           126.8                    −11.8                         139.24
146           126.8                    19.2                          368.64
96            126.8                    −30.8                         948.64
86            126.8                    −40.8                         1664.64
$\sum x_i = 1268$;  $\sum (x_i - \bar{x}) = 0$;  $\sum (x_i - \bar{x})^2 = 5301.60$


$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{5301.60}{10 - 1} \approx 589.1$ (beats/min)$^2$;  $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{5301.60}{10 - 1}} \approx 24.3$ beats/min.

3. (a) $\bar{x} = \frac{\sum x_i}{n} = \frac{91{,}610}{9} \approx 10{,}178.8889 \approx \$10{,}178.89$;  $M = \$9{,}980$
Data in order: 5500, 7200, 7889, 8998, 9980, 10995, 12999, 13999, 14050
(b) Range = Largest Data Value − Smallest Data Value = 14,050 − 5,500 = $8,550.

Data, $x_i$   Sample Mean, $\bar{x}$   Deviations, $x_i - \bar{x}$   Squared Deviations, $(x_i - \bar{x})^2$
14,050        10,178.8889              3871.1111                     14,985,501.1
13,999        10,178.8889              3820.1111                     14,593,248.8
12,999        10,178.8889              2820.1111                     7,953,026.6
10,995        10,178.8889              816.1111                      666,037.3
9,980         10,178.8889              −198.8889                     39,556.8
8,998         10,178.8889              −1180.8889                    1,394,498.6
7,889         10,178.8889              −2289.8889                    5,243,591.2
7,200         10,178.8889              −2978.8889                    8,873,779.1
5,500         10,178.8889              −4678.8889                    21,892,001.3
$\sum x_i = 91{,}610$;  $\sum (x_i - \bar{x}) = 0$;  $\sum (x_i - \bar{x})^2 = 75{,}641{,}240.9$

$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{75{,}641{,}240.9}{9 - 1}} \approx \$3{,}074.92$.

(c) $\bar{x} = \frac{\sum x_i}{n} = \frac{118{,}610}{9} \approx 13{,}178.8889 \approx \$13{,}178.89$
Data in order: 5500, 7200, 7889, 8998, 9980, 10995, 12999, 13999, 41050
$M = \$9{,}980$;  Range = 41,050 − 5,500 = $35,550.

Data, $x_i$   Sample Mean, $\bar{x}$   Deviations, $x_i - \bar{x}$   Squared Deviations, $(x_i - \bar{x})^2$
41,050        13,178.8889              27,871.1111                   776,798,833.9
13,999        13,178.8889              820.1111                      672,582.2
12,999        13,178.8889              −179.8889                     32,360.0
10,995        13,178.8889              −2183.8889                    4,769,370.7
9,980         13,178.8889              −3198.8889                    10,232,890.1
8,998         13,178.8889              −4180.8889                    17,479,831.9
7,889         13,178.8889              −5289.8889                    27,982,924.5
7,200         13,178.8889              −5978.8889                    35,747,112.4
5,500         13,178.8889              −7678.8889                    58,965,334.7
$\sum x_i = 118{,}610$;  $\sum (x_i - \bar{x}) = 0$;  $\sum (x_i - \bar{x})^2 = 932{,}681{,}240.9$


$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{932{,}681{,}240.9}{9 - 1}} \approx \$10{,}797.46$.

The mean, range, and standard deviation are all changed considerably by the incorrectly entered data value. The median does not change. The median is resistant.

4. (a) $\bar{x} = \frac{\sum x_i}{n} = \frac{2{,}071{,}024}{15} \approx 138{,}068.2667 \approx \$138{,}068$
Data in order: 99000, 115000, 124757, 128429, 135512, 136529, 136833, 136924, 138820, 140794, 149143, 149380, 153146, 157216, 169541
$M = \$136{,}924$
(b) Range = Largest Data Value − Smallest Data Value = 169,541 − 99,000 = $70,541.

Data, $x_i$   Sample Mean, $\bar{x}$   Deviations, $x_i - \bar{x}$   Squared Deviations, $(x_i - \bar{x})^2$
138,820       138,068.2667             751.7333                      565,103
169,541       138,068.2667             31,472.7333                   990,532,941
135,512       138,068.2667             −2,556.2667                   6,534,499
149,143       138,068.2667             11,074.7333                   122,649,717
140,794       138,068.2667             2,725.7333                    7,429,622
153,146       138,068.2667             15,077.7333                   227,338,041
99,000        138,068.2667             −39,068.2667                  1,526,329,462
136,924       138,068.2667             −1,144.2667                   1,309,346
136,833       138,068.2667             −1,235.2667                   1,525,884
115,000       138,068.2667             −23,068.2667                  532,144,928
124,757       138,068.2667             −13,311.2667                  177,189,821
128,429       138,068.2667             −9,639.2667                   92,915,463
157,216       138,068.2667             19,147.7333                   366,635,690
149,380       138,068.2667             11,311.7333                   127,955,310
136,529       138,068.2667             −1,539.2667                   2,369,342
$\sum x_i = 2{,}071{,}024$;  $\sum (x_i - \bar{x}) = 0$;  $\sum (x_i - \bar{x})^2 = 4{,}183{,}425{,}169$

$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{4{,}183{,}425{,}169}{15 - 1}} \approx \$17{,}286.30$.


5. (a) $\mu = \frac{\sum x_i}{N} = \frac{933}{16} \approx 58.3$ years
Data in order: 44, 46, 51, 55, 56, 56, 56, 58, 59, 62, 62, 62, 64, 65, 68, 69
$M = \frac{58 + 59}{2} = 58.5$ years
The data are bimodal: 56 years and 62 years. Both have frequencies of 3.

(b) Range = 69 − 44 = 25 years
To calculate the population standard deviation, we use the computational formula:
$\sigma = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{N}}{N}} = \sqrt{\frac{55{,}169 - \frac{(933)^2}{16}}{16}} \approx 6.9$ years

(c) Answers will vary depending on samples selected.

Data value, $x_i$   Data value squared, $x_i^2$
44                  1936
56                  3136
51                  2601
46                  2116
59                  3481
56                  3136
58                  3364
55                  3025
65                  4225
64                  4096
68                  4624
69                  4761
56                  3136
62                  3844
62                  3844
62                  3844
$\sum x_i = 933$;  $\sum x_i^2 = 55{,}169$

6. (a) To find the mean, we find $\sum x_i = 2846$ and n = 16, so $\mu = \frac{2846}{16} \approx 177.9$ home runs.
To find the median, we put the data in order and find the mean of the 8th and 9th data values: $M = \frac{183 + 185}{2} = 184$ home runs. The mode is the most frequent data value, which is 135 home runs.
(b) Range = 235 − 135 = 100 home runs. To find the standard deviation, we determine $\sum x_i^2 = 521{,}902$. So,
$\sigma = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{N}}{N}} = \sqrt{\frac{521{,}902 - \frac{(2846)^2}{16}}{16}} \approx 31.3$ home runs.

(c) Answers will vary. (d) The reporter is not lying because the mode is an “average”. He is being deceptive,

however, because the word “average” is usually meant as the mean.

7. (a) To find the mean, we determine $\sum x_i = 78$ and n = 36, so $\bar{x} = \frac{78}{36} \approx 2.2$ children.
To find the median, we put the data in order and find the mean of the 18th and 19th data values: $M = \frac{2 + 3}{2} = 2.5$ children.


(b) Range = 4 − 0 = 4 children. To find the standard deviation, we determine $\sum x_i^2 = 224$.
$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}} = \sqrt{\frac{224 - \frac{(78)^2}{36}}{36 - 1}} \approx 1.3$ children.

8. (a) To find the mean, we determine $\sum x_i = 134$ and n = 30, so $\bar{x} = \frac{134}{30} \approx 4.5$ cars.
To find the median, we put the data in order and find the mean of the 15th and 16th data values: $M = \frac{4 + 5}{2} = 4.5$ cars.
(b) Range = 9 − 1 = 8 cars. To find the standard deviation, we determine $\sum x_i^2 = 754$.
$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}} = \sqrt{\frac{754 - \frac{(134)^2}{30}}{30 - 1}} \approx 2.3$ cars.

9. (a) By the Empirical Rule, approximately 99.7% of the data will be within 3 standard

deviations of the mean. Now, 600 – 3(53) = 441 and 600 + 3(53) = 759. Thus, about 99.7% of light bulbs have lifetimes between 441 and 759 hours.

(b) Since 494 is exactly 2 standard deviations below the mean [494 = 600 – 2(53)] and 706 is exactly 2 standard deviations above the mean [706 = 600 + 2(53)], the Empirical Rule predicts that approximately 95% of the light bulbs will have lifetimes between 494 and 706 hours.

(c) Since 547 is exactly 1 standard deviation below the mean [547 = 600 – 1(53)] and 706 is exactly 2 standard deviations above the mean [706 = 600 + 2(53)], the Empirical Rule predicts that approximately 34 + 47.5 = 81.5% of the light bulbs will have lifetimes between 547 and 706 hours.

(d) Since 441 hours is 3 standard deviations below the mean [441 = 600 – 3(53)], the Empirical Rule predicts that 0.15% of light bulbs will last less than 441 hours. Thus, the company should expect to replace about 0.15% of the light bulbs.

(e) By Chebyshev's theorem, at least $\left(1 - \frac{1}{k^2}\right) \cdot 100\% = \left(1 - \frac{1}{2.5^2}\right) \cdot 100\% = 84\%$ of all the light bulbs are within k = 2.5 standard deviations of the mean.
(f) Since 494 is exactly k = 2 standard deviations below the mean [494 = 600 – 2(53)] and 706 is exactly 2 standard deviations above the mean [706 = 600 + 2(53)], Chebyshev's inequality indicates that at least $\left(1 - \frac{1}{k^2}\right) \cdot 100\% = \left(1 - \frac{1}{2^2}\right) \cdot 100\% = 75\%$ of the light bulbs will have lifetimes between 494 and 706 hours.

10. (a) By the Empirical Rule, approximately 99.7% of the data will be within 3 standard

deviations of the mean. Now, 4302 – 3(340) = 3282 and 4302 + 3(340) = 5322. Thus, about 99.7% of toner cartridges will print between 3282 and 5322 pages.


(b) Since 3622 is exactly 2 standard deviations below the mean [3622 = 4302 – 2(340)] and 4982 is exactly 2 standard deviations above the mean [4982 = 4302 + 2(340)], the Empirical Rule predicts that approximately 95% of the toner cartridges will print between 3622 and 4982 pages.
(c) Since 3622 is exactly 2 standard deviations below the mean [3622 = 4302 – 2(340)], the Empirical Rule predicts that 0.15 + 2.35 = 2.5% of the toner cartridges will print fewer than 3622 pages. Thus, the company should expect to replace about 2.5% of the toner cartridges.

(d) By Chebyshev's theorem, at least $\left(1 - \frac{1}{k^2}\right) \cdot 100\% = \left(1 - \frac{1}{1.5^2}\right) \cdot 100\% \approx 55.6\%$ of all the toner cartridges are within k = 1.5 standard deviations of the mean.
(e) Since 3282 is exactly k = 3 standard deviations below the mean [3282 = 4302 – 3(340)] and 5322 is exactly 3 standard deviations above the mean [5322 = 4302 + 3(340)], Chebyshev's inequality indicates that at least $\left(1 - \frac{1}{k^2}\right) \cdot 100\% = \left(1 - \frac{1}{3^2}\right) \cdot 100\% \approx 88.9\%$ of the toner cartridges will print between 3282 and 5322 pages.
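Chebyshev's bound $1 - 1/k^2$ appears in parts (d)–(f) of Problem 9 and parts (d)–(e) of Problem 10. A short Python check (ours) of the percentages used above, together with the μ ± 2σ endpoints from Problem 9:

def chebyshev_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    return 1 - 1 / k ** 2

for k in (2.5, 2, 1.5, 3):                         # 9(e), 9(f), 10(d), 10(e)
    print(k, round(chebyshev_bound(k) * 100, 1))   # 84.0, 75.0, 55.6, 88.9

mu, sigma = 600, 53                                # light-bulb lifetimes, Problem 9
print(mu - 2 * sigma, mu + 2 * sigma)              # 494 706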

11.
Class     Midpt, $x_i$   Freq, $f_i$   $x_i f_i$   $\mu$     $x_i - \mu$   $(x_i-\mu)^2 f_i$
20 – 24   22.5           6035          135,787.5   42.2826   −19.7826      2,361,804.87
25 – 29   27.5           4352          119,680     42.2826   −14.7826      951,021.94
30 – 34   32.5           4083          132,697.5   42.2826   −9.7826       390,740.09
35 – 39   37.5           3933          147,487.5   42.2826   −4.7826       89,960.54
40 – 44   42.5           4194          178,245     42.2826   0.2174        198.22
45 – 49   47.5           3716          176,510     42.2826   5.2174        101,154.21
50 – 54   52.5           3005          157,762.5   42.2826   10.2174       313,707.76
55 – 59   57.5           2355          135,412.5   42.2826   15.2174       545,345.61
60 – 64   62.5           1664          104,000     42.2826   20.2174       680,148.79
65 – 69   67.5           1173          79,177.5    42.2826   25.2174       745,930.95
70 – 74   72.5           1025          74,312.5    42.2826   30.2174       935,918.54
75 – 79   77.5           895           69,362.5    42.2826   35.2174       1,110,037.41
80 – 84   82.5           744           61,380      42.2826   40.2174       1,203,374.81
$\sum f_i = 37{,}174$,  $\sum x_i f_i = 1{,}571{,}815$,  $\sum (x_i-\mu)^2 f_i = 9{,}429{,}343.76$

(a) $\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{1{,}571{,}815}{37{,}174} \approx 42.2826 \approx 42.28$ years
(b) $\sigma = \sqrt{\frac{\sum (x_i-\mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{9{,}429{,}343.76}{37{,}174}} \approx 15.93$ years


12.
Class     Midpt, $x_i$   Freq, $f_i$   $x_i f_i$   $\mu$     $x_i - \mu$   $(x_i-\mu)^2 f_i$
20 – 24   22.5           1903          42,817.5    43.7136   −21.2136      856,382.02
25 – 29   27.5           1415          38,912.5    43.7136   −16.2136      371,976.37
30 – 34   32.5           1364          44,330      43.7136   −11.2136      171,515.94
35 – 39   37.5           1430          53,625      43.7136   −6.2136       55,210.62
40 – 44   42.5           1409          59,882.5    43.7136   −1.2136       2,075.21
45 – 49   47.5           1242          58,995      43.7136   3.7864        17,806.34
50 – 54   52.5           1008          52,920      43.7136   8.7864        77,818.43
55 – 59   57.5           784           45,080      43.7136   13.7864       149,040.82
60 – 64   62.5           599           37,437.5    43.7136   18.7864       211,404.37
65 – 69   67.5           415           28,012.5    43.7136   23.7864       234,804.02
70 – 74   72.5           482           34,945      43.7136   28.7864       399,412.59
75 – 79   77.5           456           35,340      43.7136   33.7864       520,533.50
80 – 84   82.5           372           30,690      43.7136   38.7864       559,631.15
$\sum f_i = 12{,}879$,  $\sum x_i f_i = 562{,}987.5$,  $\sum (x_i-\mu)^2 f_i = 3{,}627{,}581.38$

(a) $\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{562{,}987.5}{12{,}879} \approx 43.7136 \approx 43.71$ years
(b) $\sigma = \sqrt{\frac{\sum (x_i-\mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{3{,}627{,}581.38}{12{,}879}} \approx 16.78$ years

(c) The mean age of a female involved in a traffic fatality is greater than the mean age of a male involved in a traffic fatality. Also, the ages of females involved in a traffic fatality are more dispersed. Answers will vary. One possibility is that an insurance company might use this information in order to help establish the rates it would charge for insuring drivers.

13. $\text{GPA} = \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{5(4) + 4(3) + 3(4) + 3(2)}{5 + 4 + 3 + 3} = \frac{50}{15} \approx 3.33$

14. Cost per pound $= \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{2(\$2.70) + 1(\$1.30) + \frac{1}{2}(\$1.80)}{2 + 1 + \frac{1}{2}} \approx \$2.17$/lb

15. (a) Yankees: $\sum x_i = 184{,}193{,}950$ and N = 29, so $\mu_{\text{Yankees}} = \frac{184{,}193{,}950}{29} \approx \$6{,}351{,}516$.
Mets: $\sum x_i = 96{,}660{,}970$ and N = 28, so $\mu_{\text{Mets}} = \frac{96{,}660{,}970}{28} \approx \$3{,}452{,}177$.


(b) Yankees: $M_{\text{Yankees}} = \$3{,}100{,}000$ (the 15th data value)
Mets: $M_{\text{Mets}} = \frac{800{,}000 + 1{,}000{,}000}{2} = \$900{,}000$ (the mean of the 14th and 15th data values)

(c) In both cases, the mean is substantially larger than the median, so both distributions are skewed right.

(d) Yankees: $\sum x_i^2 \approx 2.3001 \times 10^{15}$, so
$\sigma_{\text{Yankees}} = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{N}}{N}} = \sqrt{\frac{2.3001 \times 10^{15} - \frac{(184{,}193{,}950)^2}{29}}{29}} \approx \$6{,}242{,}767.0$
Mets: $\sum x_i^2 \approx 9.37457 \times 10^{14}$, so
$\sigma_{\text{Mets}} = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{N}}{N}} = \sqrt{\frac{9.37457 \times 10^{14} - \frac{(96{,}660{,}970)^2}{28}}{28}} \approx \$4{,}643{,}606.1$

(e) Yankees: $301,400; $837,500; $3,100,000; $11,623,571.50; $22,000,000 Mets: $300,000; $318,750; $900,000; $4,666,666.50; $17,166,667

(f) Fences for the Yankees:
Lower fence $= 837{,}500 - 1.5(11{,}623{,}571.50 - 837{,}500) = -\$15{,}341{,}607.25$
Upper fence $= 11{,}623{,}571.50 + 1.5(11{,}623{,}571.50 - 837{,}500) = \$27{,}802{,}678.75$
The Yankees have no outliers.
Fences for the Mets:
Lower fence $= 318{,}750 - 1.5(4{,}666{,}666.50 - 318{,}750) = -\$6{,}203{,}124.75$
Upper fence $= 4{,}666{,}666.50 + 1.5(4{,}666{,}666.50 - 318{,}750) = \$11{,}188{,}541.25$
The data values $16,071,429 (Vaughn) and $17,166,667 (Piazza) are outliers.

Annotations will vary. One possibility is that the Mets’ salaries are clearly lower and

less dispersed than the Yankees’ salaries. (g) In both boxplots, the median is to the left of the center of the box and the right line is

substantially longer than the left line, so both distributions are skewed right. (h) For both distributions, the median is the better measure of central tendency since the

distributions are skewed.


16. (a) Material A: $\sum x_i = 64.04$ and n = 10, so $\bar{x}_A = \frac{64.04}{10} = 6.404$ million cycles.
Material B: $\sum x_i = 113.32$ and n = 10, so $\bar{x}_B = \frac{113.32}{10} = 11.332$ million cycles.
(b) Material A: $M_A = \frac{5.69 + 5.88}{2} = 5.785$ million cycles (the mean of the 5th and 6th values)
Material B: $M_B = \frac{8.20 + 9.65}{2} = 8.925$ million cycles (the mean of the 5th and 6th values)

(c) In both cases, the mean is substantially larger than the median, so both distributions are skewed right.

(d) Material A: $\sum x_i^2 \approx 472.177$, so
$s_A = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}} = \sqrt{\frac{472.177 - \frac{(64.04)^2}{10}}{10 - 1}} \approx 2.626$ million cycles
Material B: $\sum x_i^2 \approx 1597.4002$, so
$s_B = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}} = \sqrt{\frac{1597.4002 - \frac{(113.32)^2}{10}}{10 - 1}} \approx 5.900$ million cycles

(e) Material A: 3.17; 4.52; 5.785; 8.01; 11.92 million cycles
Material B: 5.78; 6.84; 8.925; 14.71; 24.37 million cycles
(f) Fences for Material A:
Lower fence $= 4.52 - 1.5(8.01 - 4.52) = -0.715$ million cycles
Upper fence $= 8.01 + 1.5(8.01 - 4.52) = 13.245$ million cycles
Material A has no outliers.
Fences for Material B:
Lower fence $= 6.84 - 1.5(14.71 - 6.84) = -4.965$ million cycles
Upper fence $= 14.71 + 1.5(14.71 - 6.84) = 26.515$ million cycles
Material B has no outliers.
[Boxplots: Bearing Failures]


(g) In both boxplots, the median is to the left of the center of the box and the right line is substantially longer than the left line, so both distributions are skewed right.

(h) For both distributions, the median is the better measure of central tendency since the distributions are skewed.

17. The data provided are already listed in ascending order.

(a) $i = \frac{k}{100}(n+1) = \frac{40}{100}(88+1) = 35.6$. Since i = 35.6 is not an integer, we average the 35th and 36th data values: $P_{40} = \frac{366{,}155 + 371{,}479}{2} = \$368{,}817$. This means that

approximately 40% of drivers in the 2004 Nextel Cup Series earned less than $368,817, and approximately 60% of drivers in the 2004 Nextel Cup Series earned more than $368,817.

(b) $i = \frac{k}{100}(n+1) = \frac{95}{100}(88+1) = 84.55$. Since i = 84.55 is not an integer, we average the 84th and 85th data values: $P_{95} = \frac{5{,}692{,}620 + 6{,}221{,}710}{2} = \$5{,}957{,}165$. This means

that approximately 95% of drivers in the 2004 Nextel Cup Series earned less than $5,957,165, and approximately 5% of drivers in the 2004 Nextel Cup Series earned more than $5,957,165.

(c) $i = \frac{k}{100}(n+1) = \frac{10}{100}(88+1) = 8.9$. Since i = 8.9 is not an integer, we average the 8th and 9th data values: $P_{10} = \frac{65{,}175 + 70{,}550}{2} = \$67{,}862.50$. This means that

approximately 10% of drivers in the 2004 Nextel Cup Series earned less than $67,862.50, and approximately 90% of drivers in the 2004 Nextel Cup Series earned more than $67,862.50.

(d) Of the 88 drivers in the 2004 Nextel Cup Series, 73 earned less than $4,117,750.

Percentile rank of $4,117,750 $= \frac{73}{88} \cdot 100 \approx 83$. Thus, $4,117,750 was at the 83rd

percentile. This means that approximately 83% of drivers in the 2004 Nextel Cup Series earned less than $4,117,750, and approximately 17% of drivers in the 2004 Nextel Cup Series earned more than $4,117,750.

(e) Of the 88 drivers in the 2004 Nextel Cup Series, 13 earned less than $116,359.

Percentile rank of $116,359 $= \frac{13}{88} \cdot 100 \approx 15$. Thus, $116,359 was at the 15th

percentile. This means that approximately 15% of drivers in the 2004 Nextel Cup Series earned less than $116,359, and approximately 85% of drivers in the 2004 Nextel Cup Series earned more than $116,359.


18. The data provided are already listed in ascending order.

(a) $i = \frac{k}{100}(n+1) = \frac{30}{100}(88+1) = 26.7$. Since i = 26.7 is not an integer, we average the 26th and 27th data values: $P_{30} = \$268{,}422.50$. This means that

approximately 30% of drivers in the 2004 Nextel Cup Series earned less than $268,422.50, and approximately 70% of drivers in the 2004 Nextel Cup Series earned more than $268,422.50.

(b) $i = \frac{k}{100}(n+1) = \frac{90}{100}(88+1) = 80.1$. Since i = 80.1 is not an integer, we average the 80th and 81st data values: $P_{90} = \frac{4{,}759{,}020 + 5{,}152{,}670}{2} = \$4{,}955{,}845$. This means that approximately 90% of drivers in the 2004 Nextel Cup Series earned less than $4,955,845, and approximately 10% of drivers in the 2004 Nextel Cup Series earned more than $4,955,845.

(c) $i = \frac{k}{100}(n+1) = \frac{5}{100}(88+1) = 4.45$. Since i = 4.45 is not an integer, we average the 4th and 5th data values: $P_{5} = \frac{57{,}450 + 57{,}590}{2} = \$57{,}520$. This means that

approximately 5% of drivers in the 2004 Nextel Cup Series earned less than $57,520, and approximately 95% of drivers in the 2004 Nextel Cup Series earned more than $57,520.

(d) Of the 88 drivers in the 2004 Nextel Cup Series, 49 earned less than $1,333,520. Percentile rank of $1,333,520 = (49/88) · 100 ≈ 56. Thus, $1,333,520 was at the 56th percentile. This means that approximately 56% of drivers earned less than $1,333,520, and approximately 44% earned more than $1,333,520.

(e) Of the 88 drivers in the 2004 Nextel Cup Series, 16 earned less than $139,614. Percentile rank of $139,614 = (16/88) · 100 ≈ 18. Thus, $139,614 was at the 18th percentile. This means that approximately 18% of drivers earned less than $139,614, and approximately 82% earned more than $139,614.

19. z-score for the female: z = (x − μ)/σ = (160 − 156.5)/51.2 ≈ 0.07

z-score for the male: z = (x − μ)/σ = (185 − 183.4)/40 = 0.04

The weight of the 160-pound female is 0.07 standard deviations above the mean, while the weight of the 185-pound male is 0.04 standard deviations above the mean. Thus, the 160-pound female is relatively heavier.
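Because the two weights are judged against different population means and standard deviations, the comparison runs through z-scores. A minimal sketch using the means and standard deviations quoted above (the helper name z_score is ours):

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the population mean."""
    return (x - mu) / sigma

# Population means and standard deviations quoted in this exercise (pounds)
female_z = z_score(160, 156.5, 51.2)
male_z = z_score(185, 183.4, 40)
print(f"female: z = {female_z:.2f}")   # about 0.07
print(f"male:   z = {male_z:.2f}")     # 0.04
# The larger z-score identifies the relatively heavier individual: the female.
```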


20. (a) Reading the boxplot, the median crime rate is approximately 4050 per 100,000 population.

(b) Reading the boxplot, the 25th percentile crime rate is approximately 3100 per 100,000 population.

(c) Reading the boxplot, there is one outlier. It is approximately 8000.

(d) Reading the boxplot, the lowest crime rate is approximately 2200 per 100,000 population.

Case Study: Who Was "A Mourner"?

1. The table below gives the length of each word, line by line in the passage. A listing is also provided of the proper names, numbers, abbreviations, and titles that have been omitted from the data set.
3, 7, 8, 3, 7, 3, 3, 6, 2, 3, 3, 2, 3
4, 3, 8, 2, 3, 7, 4, 2, 11 (omitted Richardson and 22d)
6, 3, 4, 9, 3, 7, 4, 2, 4, 2, 6, 4, 3
7, 5, 2, 8, 2, 4 (omitted Frogg Lane, Liberty-Tree, and Monday)
4, 3, 3, 7, 2, 7, 3, 4, 2, 10, 2, 6
5, 4, 8, 2, 3, 7, 2, 4, 6, 4, 3, 5, 6, 2
3, 5, 5, 5, 5, 6, 5, 4, 8, 8
2, 3, 8, 7, 2, 3, 6, 3, 6, 2, 3, 9 (omitted appear'd)
3, 6, 4, 3, 3, 7, 3, 5, 2, 9, 3
8, 8, 2, 6, 4, 3, 4, 5, 2, 3, 3, 4, 2, 7
5, 6, 8, 4, 3, 7, 6, 6, 5, 2, 3
6, 12, 5, 6, 2 (omitted Wolfe's Summit of human Glory)
5, 2, 3, 1, 7, 6, 3, 5, 4, 4, 1, 6, 3

2. Mean = 4.54; Median = 4; Mode = 3; standard deviation ≈ 2.21; sample variance ≈ 4.90; Range = 11; Minimum = 1; Maximum = 12; Sum = 649; Count = 143

Answers will vary. None of the provided authors matches both the measures of central tendency and the measures of dispersion well. In other words, there is no clear-cut choice for the author based on the information provided. Based on measures of central tendency, James Otis or Samuel Adams would appear to be the more likely candidates for A MOURNER. Based on measures of dispersion, Tom Sturdy seems the more likely choice. Still, the unknown author's mean word length differs considerably from that of Sturdy, and the unknown author's standard deviation differs considerably from those of Otis and Adams.
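These summary statistics can be reproduced with Python's statistics module, using the 143 word lengths tabulated in part 1. The sketch below simply transcribes that list; the rounded output should agree with the values quoted above.

```python
import statistics

# Word lengths from part 1, one row per line of the passage (143 values in all)
lengths = [
    3, 7, 8, 3, 7, 3, 3, 6, 2, 3, 3, 2, 3,
    4, 3, 8, 2, 3, 7, 4, 2, 11,
    6, 3, 4, 9, 3, 7, 4, 2, 4, 2, 6, 4, 3,
    7, 5, 2, 8, 2, 4,
    4, 3, 3, 7, 2, 7, 3, 4, 2, 10, 2, 6,
    5, 4, 8, 2, 3, 7, 2, 4, 6, 4, 3, 5, 6, 2,
    3, 5, 5, 5, 5, 6, 5, 4, 8, 8,
    2, 3, 8, 7, 2, 3, 6, 3, 6, 2, 3, 9,
    3, 6, 4, 3, 3, 7, 3, 5, 2, 9, 3,
    8, 8, 2, 6, 4, 3, 4, 5, 2, 3, 3, 4, 2, 7,
    5, 6, 8, 4, 3, 7, 6, 6, 5, 2, 3,
    6, 12, 5, 6, 2,
    5, 2, 3, 1, 7, 6, 3, 5, 4, 4, 1, 6, 3,
]

print("Count    =", len(lengths))                           # 143
print("Sum      =", sum(lengths))                           # 649
print("Mean     =", round(statistics.mean(lengths), 2))     # about 4.54
print("Median   =", statistics.median(lengths))             # 4
print("Mode     =", statistics.mode(lengths))               # 3
print("Std dev  =", round(statistics.stdev(lengths), 2))    # about 2.21 (sample)
print("Variance =", round(statistics.variance(lengths), 2)) # about 4.90 (sample)
print("Range    =", max(lengths) - min(lengths))            # 12 - 1 = 11
```

Here statistics.stdev and statistics.variance give the sample standard deviation and sample variance, which is what the summary above reports.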


3. Comparing the two Adams summaries, both the measures of center and the measures of variability differ considerably between the two documents. For example, the means differ by 0.09 and the standard deviations differ by 0.19, not to mention the differences in word counts and in maximum word length. This calls into question the viability of word-length analysis as a tool for resolving disputed documents. Word length may be part of the analysis needed to identify unknown authors, but other variables should also be taken into consideration.

4. Other information that would be useful in identifying A MOURNER includes the style of the rhetoric, vocabulary choices, use of particular phrases, and the overall flow of the writing. In other words, identifying an unknown author requires qualitative analysis in addition to quantitative analysis.