©2014 Cengage Learning. All Rights Reserved. May not be scanned, copied, or duplicated, or posted to a publicly accessible website, in whole or in part.

AP* SOLUTIONS

Chapter 3 Numerical Methods for Describing Data Distributions

Section 3.1 Exercise Set 1

3.1: The distribution is approximately symmetric with no outliers, so the mean and standard deviation should be used to describe the center and spread, respectively.

[Figure omitted; axis: amount (mL)]

3.2: The distribution is positively skewed with an outlier, so the median and interquartile range should be used to describe the center and spread, respectively.

[Figure omitted; axis: Tip Percent]

3.3: The distribution is positively skewed with a possible outlier, so the median and interquartile range should be used to describe center and spread, respectively.

[Figure omitted; axis: Defects per 100 cars]

*AP and Advanced Placement Program are registered trademarks of the College Entrance Examination Board, which was not involved in the production of, and does not endorse, this product.

3.4: The average may not be the best measure of a typical value for this data set because examination of the dotplot (reproduced below) indicates that the distribution is clearly skewed and may contain an outlier.

[Dotplot omitted; axis: minutes]

Section 3.1 Exercise Set 2

3.5:

The distribution of times between ordering and receiving coffee is roughly symmetric, so using the mean and standard deviation to describe center and spread, respectively, is appropriate.

3.6:

The distribution of APEAL ratings is roughly symmetric, so using the mean and standard deviation to describe center and spread, respectively, is appropriate.

3.7:

The distribution of male exercise times is positively skewed, so the median and interquartile range should be used to describe the center and spread, respectively.

3.8: The dotplot of average weekday circulation (reproduced below) shows that the distribution is strongly positively skewed. The mean is an appropriate description of a typical value only for roughly symmetric distributions, so it should not be used to describe the center of this distribution.

Additional Exercises for Section 3.1

3.9: The distribution is skewed, so median and interquartile range should be used to describe center and spread, respectively.

[Figure omitted; axis: Weekend Exercise Time (minutes)]

3.10:

The distribution of female exercise time is positively skewed, so the median and interquartile range should be used to describe center and spread, respectively.

3.11: The distribution is roughly symmetric with no obvious outliers, so the mean and standard deviation should be used to describe center and spread, respectively.

[Figure omitted; axis: Passive Knee Extension (degrees)]

Section 3.2 Exercise Set 1

3.12: The mean is \(\bar{x} = 51.33\) ounces. This is a typical or representative value for the amount of alcohol poured. The standard deviation is \(s = 15.22\) ounces, which represents how much, on average, the values in the data set spread out, or deviate, from the mean.

3.13: (a) \(\bar{x} = 59.23\) ounces and \(s = 16.71\) ounces. The mean represents a typical or representative value for the amount of alcohol poured and the standard deviation represents how much, on average, the values in the data set spread out, or deviate, from the mean. (b) Individuals pouring alcohol into short wide glasses pour, on average, more alcohol when pouring one shot than when pouring into tall, slender glasses.

3.14: (a) \(\bar{x} = 59.85\) hours, \(s = 14.78\) hours (b) \(\bar{x} = 56.67\) hours, \(s = 9.75\) hours. When Los Angeles was excluded from the data set, the mean and standard deviation both decreased. This suggests that using the mean and standard deviation as measures of center and spread for data sets with outliers present can be risky, because outliers seem to have a significant impact on those measures.

3.15: Answers will vary; here is one possible answer. The mean, $444, is large, and we can likely assume that some parents spend amounts close to zero. Thus it is likely that the amounts vary greatly, making the standard deviation large.

Section 3.2 Exercise Set 2

3.16: (a) \(\bar{x} = 448.30\), which is the typical number of speed-related fatalities of these 20 dates; \(s = 28.24\) is, on average, how much the number of speed-related fatalities deviates from the mean.

(b) It is not reasonable to generalize from the sample of 20 days to the other 345 days of the year because these days were not randomly selected. Rather, these are the 20 days that had the highest number of speed-related fatalities between 1994 and 2003.

3.17: \(\bar{x} = 49.40\) cents, which is the typical cost per serving (in cents) for this set of 15 high-fiber cereals rated very good or good by Consumer Reports; \(s = 16.10\) cents is, on average, how much the costs per serving deviate from the mean.

3.18: (a) \(\bar{x} = 152.1\) seconds; \(s = 74.6\) seconds

(b) \(\bar{x} = 139.4\) seconds; \(s = 51.6\) seconds. Deleting the observation of 380 had a profound impact on the mean and standard deviation. The mean decreased from 152.1 to 139.4 seconds, and the standard deviation decreased from 74.6 to 51.6 seconds. This suggests that using the mean and standard deviation to measure center and spread when outliers are present can give a misleading perception of the distribution.

3.19: The standard deviation is a reasonable measure of volatility because it measures how much, on average, individual asset returns deviate from the mean return of the portfolio. A smaller standard deviation indicates smaller deviations (on average) from the mean return, and therefore less risk.

Additional Exercises for Section 3.2

3.20: (a) \(\bar{x} = 9.625\) mg/ounce (b) The caffeine concentrations of Coca-Cola and Pepsi Cola are quite a bit lower than those of the energy drinks. In fact, the average caffeine concentration of the energy drinks is more than 3 times the caffeine concentration of Coca-Cola and Pepsi Cola, and some of the individual energy drinks have even more than 3 times the caffeine concentration.

3.21: (a) \(\bar{x} = 287.714\); the table below shows the deviations from the mean. (b) The table below shows the sum of the deviations (at the bottom of column 2).

Data value \(x_i\)    Deviation from mean \(x_i - \bar{x}\)    Squared deviation \((x_i - \bar{x})^2\)
497                   497 – 287.714 = 209.286                  209.286² = 43,800.630
193                   193 – 287.714 = -94.714                  (-94.714)² = 8970.742
328                   328 – 287.714 = 40.286                   40.286² = 1622.962
155                   155 – 287.714 = -132.714                 (-132.714)² = 17,613.006
326                   326 – 287.714 = 38.286                   38.286² = 1465.818
245                   245 – 287.714 = -42.714                  (-42.714)² = 1824.486
270                   270 – 287.714 = -17.714                  (-17.714)² = 313.786
Sum                   \(\sum (x_i - \bar{x}) = 0.002\)         \(\sum (x_i - \bar{x})^2 = 75{,}611.43\)

(c) To calculate the variance and standard deviation, the squared deviations and sum of the squared deviations are needed. The third column contains these values. The variance is computed using the formula

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{75{,}611.43}{7 - 1} = 12{,}601.905. \]

The standard deviation is \(s = \sqrt{s^2} = \sqrt{12{,}601.905} = 112.258\).
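The hand calculation in 3.21 can be checked with a short Python sketch. It uses only the seven data values listed in the table; the variable names are illustrative, and the final line cross-checks the result against the standard library's `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

# Data values from the table in 3.21.
data = [497, 193, 328, 155, 326, 245, 270]

n = len(data)
mean = sum(data) / n

# Deviations from the mean and their squares (columns 2 and 3 of the table).
deviations = [x - mean for x in data]
squared_deviations = [d ** 2 for d in deviations]

# The sample variance divides the sum of squared deviations by n - 1.
variance = sum(squared_deviations) / (n - 1)
std_dev = math.sqrt(variance)

print(f"mean = {mean:.3f}")
print(f"sum of squared deviations = {sum(squared_deviations):.2f}")
print(f"variance = {variance:.3f}")
print(f"standard deviation = {std_dev:.3f}")

# Cross-check against the standard library's sample variance.
assert math.isclose(variance, statistics.variance(data))
```

Computed without rounding the mean first, the deviations sum to zero up to floating-point error; the 0.002 in the table arises from rounding the mean to three decimal places before subtracting.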

3.22: (a) \(\bar{x} = 48.36\) cm. This is a typical distance (in centimeters) at which a bat first detects a nearby insect. (b) \(s^2 = 327.05\) cm², \(s = 18.08\) cm. The variance is the mean squared deviation from the mean distance at which a bat first detects a nearby insect, in square centimeters. The standard deviation represents, on average, how much a distance at which a bat first detects a nearby insect deviates from the mean, in centimeters.

3.23: The mean found after subtracting 10 from each sample observation is \(\bar{x} = 38.36\) cm. The table below shows the original sample observations, the values after subtracting 10, and the deviations from the new mean.

Original Sample Observation    Sample Observation minus 10    Deviation from the new mean
62                             52                             13.64
23                             13                             -25.36
27                             17                             -21.36
56                             46                             7.64
52                             42                             3.64
34                             24                             -14.36
42                             32                             -6.36
40                             30                             -8.36
68                             58                             19.64
45                             35                             -3.36
83                             73                             34.64

The deviations for the data set obtained by subtracting 10 from each sample observation are exactly the same as the corresponding deviations from the mean for the original data set. Since the deviations are the same, the new variance (\(s^2\)) and standard deviation (\(s\)) are also the same as the old variance and standard deviation. Subtracting or adding the same number to every value in a data set does not change the variance (\(s^2\)) or standard deviation (\(s\)).

3.24: The standard deviation of the original data is s = 18.08 cm. After multiplying each data value by 10, the new standard deviation is s = 180.8 cm. In general, if each observation is multiplied by a positive constant c, the standard deviation s is also multiplied by c.
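The two properties described in 3.23 and 3.24 can be illustrated with a minimal Python sketch. It uses the eleven original sample observations from the 3.23 table and the standard library's `statistics.stdev`; the variable names are illustrative only.

```python
import statistics

# Original sample observations from the table in 3.23 (distances in cm).
distances = [62, 23, 27, 56, 52, 34, 42, 40, 68, 45, 83]

shifted = [x - 10 for x in distances]  # subtract a constant from every value (3.23)
scaled = [x * 10 for x in distances]   # multiply every value by a positive constant (3.24)

s_original = statistics.stdev(distances)
s_shifted = statistics.stdev(shifted)
s_scaled = statistics.stdev(scaled)

print(f"original: mean = {statistics.mean(distances):.2f}, s = {s_original:.2f}")
print(f"shifted:  mean = {statistics.mean(shifted):.2f}, s = {s_shifted:.2f}")
print(f"scaled:   mean = {statistics.mean(scaled):.2f}, s = {s_scaled:.2f}")
```

Shifting moves the mean by the same constant but leaves the standard deviation alone, while scaling by 10 multiplies both the mean and the standard deviation by 10.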

Section 3.3 Exercise Set 1

3.25: (a) There is an even number of observations (n = 20), so the median is the average of the two middle values: \(\text{median} = \frac{438{,}722 + 427{,}771}{2} = 433{,}246.5\). This value, 433,246.5, is the value that divides the ordered data set into two halves. This tells us that half of the values in our data set had average weekday circulations of less than 433,246.5, and the other half had average weekday circulations of more than 433,246.5. (b) The median is preferable to the mean for describing the center for this data set because the distribution is positively skewed and contains outliers. (c) It is not reasonable to generalize from this sample to the population of daily newspapers in the United States because these newspapers were not randomly selected. Rather, they are the top 20 newspapers in average weekday circulation.

3.26: Lower quartile = 10,478; upper quartile = 11,778. The lower quartile of 10,478 mg/kg is the value such that 25% of the catsups have sodium contents lower than this value, and 75% are higher. The upper quartile of 11,778 mg/kg is the value such that 75% of the catsups have sodium contents lower than this value, and 25% are higher. The interquartile range is \(\text{iqr} = 11{,}778 - 10{,}478 = 1300\). The interquartile range of 1300 mg/kg is the range of the middle 50% of the catsup sodium contents. It tells us how spread out the middle 50% of the data values are.

3.27: Because n = 25, the median is the value in the middle of the ordered list. Therefore, the median is 142. The lower quartile is 0, and the upper quartile is 195. The interquartile range is \(\text{iqr} = 195 - 0 = 195\). Half of the values of number of minutes used in cell phone calls in one month are less than or equal to 142 minutes, and half are greater than or equal to 142 minutes. The middle 50% of the data values have a range of 195 minutes.

3.28: The median tipping percentage is 21%. The lower quartile is 10.75%, and the upper quartile is 35.6%. The interquartile range is 35.6 – 10.75 = 24.85%. The median tipping percentage of 21% indicates that half of the tips were below 21%, and the remaining half were above 21%. The interquartile range indicates that the middle 50% of tips had a range of 24.85%.

Section 3.3 Exercise Set 2

3.29: (a) The median repair cost is $1,688. The median is the middle value in the ordered list of repair costs, so half the repair costs are less than or equal to $1,688, and half the repair costs are greater than or equal to $1,688.

(b) The median is preferable to the mean because the distribution of repair costs is positively skewed (see dotplot below).

3.30: The lower quartile is 49.5 hours, which is the number of extra hours that divides the lower 25% of values from the upper 75%. The upper quartile is 65.5 hours, which is the number of extra hours that divides the lower 75% of values from the upper 25%. The interquartile range is \(\text{iqr} = 65.5 - 49.5 = 16\), which is the range of the middle 50% of the data values.

3.31: The median exercise time for this set of 20 males is 31.5. The median value of 31.5 is the middle value in the ordered list of exercise times, so half the values are less than or equal to 31.5 and half the values are greater than or equal to 31.5. The lower quartile is 3.75, and the upper quartile is 67.5. Therefore, the interquartile range is \(\text{iqr} = 67.5 - 3.75 = 63.75\), which is the range of the middle 50% of the exercise times.

3.32: (a) The median exercise time for this set of 20 females is 7.5, which represents the middle value of the ordered female exercise times, so half the exercise times are less than or equal to 7.5, and half the exercise times are greater than or equal to 7.5. The lower quartile is 1.0, and the upper quartile is 49.5, so the interquartile range is \(\text{iqr} = 49.5 - 1.0 = 48.5\). The middle 50% of the data has a spread of 48.5.

(b) The median male exercise time is much greater than the median female exercise time. In addition, the male interquartile range is greater than the female interquartile range, which indicates that there is more variability in the middle 50% of exercise times for males than for females.

Additional Exercises for Section 3.3

3.33: The large difference between the mean and median indicates that there were some parents who spent large amounts of money on school supplies, while the amounts for the lowest spenders were closer to the median value. These large values pull the mean toward them, while the median is largely unaffected.

3.34: The median is the measure of center that determines this salary, and is $4,286. The other measure of center is the mean, and its value for this data set is \(\bar{x}\) = $3,969. The value of the mean is less than that of the median, which makes the mean not as favorable to the San Luis Obispo County supervisors.

3.35: (a) The dotplot is relatively symmetric, with a possible outlier at the high end of the scale. As such, the mean and median will be relatively close to each other, with the mean being greater than the median.

[Dotplot omitted; axis: time (s)]

(b) \(\bar{x} = 370.69\) seconds, \(\text{median} = \frac{369 + 370}{2} = 369.5\) seconds. (c) The largest time could be increased by any amount and not affect the sample median because the position of the middle value will not change if the largest value is increased. The largest time could be decreased to 370 seconds without changing the value of the median.

Section 3.4 Exercise Set 1

3.36: Minimum = 0, lower quartile = 14, median = 33.5, upper quartile = 63, maximum = 151.

3.37:

[Boxplot omitted; axis: manufacturing defects]

The boxplot shows that there is one outlier (170 defects), and the value of the largest non-outlier is 146 defects. The middle 50% of the data values range between about 106 and 126 defects. The distribution is positively skewed. The median is not centered in the middle 50% of the data values, further indicating the skewed nature of the distribution.

3.38: (a) No, they are not outliers. Any values greater than upper quartile + 1.5(iqr) or less than lower quartile – 1.5(iqr) are considered outliers. For this data set, values are outliers if they are greater than \(32.3 + 1.5(32.3 - 20) = 50.75\) cents or less than \(20 - 1.5(32.3 - 20) = 1.55\) cents. The largest value is not greater than 50.75, and the smallest value is not less than 1.55, so these values are not outliers.
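The outlier boundaries in 3.38(a) follow directly from the two quartiles. A minimal Python sketch of that arithmetic is below; the function name `outlier_fences` is an illustrative choice, not something taken from the text.

```python
def outlier_fences(lower_quartile, upper_quartile):
    """Return (lower_fence, upper_fence) using the 1.5 * iqr rule."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - 1.5 * iqr, upper_quartile + 1.5 * iqr


# Quartiles quoted in 3.38 (gasoline tax, cents per gallon).
lower_fence, upper_fence = outlier_fences(20, 32.3)
print(f"outliers lie below {lower_fence:.2f} or above {upper_fence:.2f} cents")
```

Any observation strictly outside the two returned values would be marked as an outlier on a modified boxplot.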

(b)

[Boxplot omitted; axis: gasoline tax per gallon (cents)]

The boxplot is positively skewed. The median is not located at the center of the middle 50%, further indicating a skewed distribution.

3.39: (a) lower quartile = 16.05 inches, upper quartile = 21.93 inches. \(\text{iqr} = 21.93 - 16.05 = 5.88\) inches.

(b) Any values greater than upper quartile + 1.5(iqr) or less than lower quartile – 1.5(iqr) are considered outliers. For this data set, values are outliers if they are greater than \(21.93 + 1.5(5.88) = 30.75\) inches or less than \(16.05 - 1.5(5.88) = 7.23\) inches. The value 31.57 inches is an outlier.

(c)

[Modified boxplot omitted; axis: rainfall (inches)]

The modified boxplot shows one outlier at the high end of the scale. The distribution of inches of rainfall is slightly positively skewed.

3.40:

[Figure omitted; groups: Tall Slender, Short Wide; axis: amount of alcohol poured (mL)]

Both distributions (short wide and tall slender) are skewed, although the direction of skew is different for the two distributions. The distribution of amount of alcohol poured into short wide glasses is positively skewed, and the distribution of amount of alcohol poured into tall slender glasses is negatively skewed. The amount of alcohol poured into short wide glasses tends to be more than the amount poured into tall slender glasses. Specifically, the five-number-summary values for short wide glasses are all greater than the corresponding values for tall slender glasses. For example, the maximum amount of alcohol poured into short wide glasses (92.4 mL) is much greater than the maximum amount of alcohol poured into tall slender glasses (73.5 mL). In addition, the median amount of alcohol poured into short wide glasses (60.4 mL) is greater than the median amount of alcohol poured into tall slender glasses.

Section 3.4 Exercise Set 2

3.41: Minimum: 28.8; Lower Quartile: 35.7; Median: 37.3; Upper Quartile: 38.5; Maximum: 42.2

3.42:

[Figure omitted; axis: Waiting time (seconds)]

The distribution of waiting times is nearly symmetric, with a median of 120 seconds. The times range from a minimum of 40 to a maximum of 200 seconds. The middle 50% of waiting times range between 85 and 160 seconds.

3.43: (a) Lower quartile: 11.1; Upper quartile: 13.4; interquartile range = 13.4 – 11.1 = 2.3. Any observations smaller than \(11.1 - 1.5(2.3) = 7.65\) or larger than \(13.4 + 1.5(2.3) = 16.85\) are considered outliers. Vermont’s data value (9.5) is not an outlier because it is not smaller than 7.65. Mississippi’s data value (18.0) is an outlier because it is larger than 16.85.

(b) The boxplot shows the one large outlier (Mississippi). Excluding the outlier, the boxplot is relatively symmetric. The median is 12.3, and the upper and lower quartiles agree with the values stated in part (a). Clearly, Mississippi has an unusually large value for the percent of premature births in 2008.

[Boxplot omitted; axis: Premature percent]

3.44: (a) Lower quartile: 81.5; Upper quartile: 94; Interquartile range = 94 – 81.5 = 12.5. Outliers are observations that are smaller than \(81.5 - 1.5(12.5) = 62.75\) or larger than \(94 + 1.5(12.5) = 112.75\). The farmer’s observation (43) is an outlier, and the student’s observation (152) is an outlier.
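As a quick check of the classifications in 3.44(a), the sketch below tests individual observations against the 1.5(iqr) boundaries. The helper name `is_outlier` and the dictionary layout are illustrative assumptions; the quartiles and observations are the ones quoted in 3.43 and 3.44.

```python
def is_outlier(value, lower_quartile, upper_quartile):
    """True if value lies more than 1.5 * iqr beyond either quartile."""
    iqr = upper_quartile - lower_quartile
    return (value < lower_quartile - 1.5 * iqr
            or value > upper_quartile + 1.5 * iqr)


# Accidents per 1,000 (3.44): lower quartile 81.5, upper quartile 94.
accident_flags = {
    "farmer": is_outlier(43, 81.5, 94),
    "student": is_outlier(152, 81.5, 94),
}

# Percent of premature births (3.43): lower quartile 11.1, upper quartile 13.4.
premature_flags = {
    "Vermont": is_outlier(9.5, 11.1, 13.4),
    "Mississippi": is_outlier(18.0, 11.1, 13.4),
}

print(accident_flags)
print(premature_flags)
```

Both accident observations are flagged, while among the premature-birth values only Mississippi is, matching the written answers.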

(b)

[Figure omitted; axis: Accidents per 1,000]

(c) Answers may vary. One possible answer is to offer a professional discount on auto insurance to the professions below the lower quartile for accidents (law enforcement, physical therapist, veterinarian, clerical (secretary), clergy, homemaker, politician, pilot, firefighter, and farmer).

3.45:

[Figure omitted; groups: Female, Male; axis: Exercise Time]

The distributions of exercise times for males and females are both positively skewed. The distribution of the middle 50% of the male observations is approximately symmetric, but the distribution of the middle 50% of the female observations is positively skewed. The values of the lower quartile (3.75), median (31.5), and upper quartile (67.5) for the males are all larger than the corresponding values for female exercise times (1.0, 7.5, and 49.5, respectively). The male distribution has one large outlier, while the female distribution has no outliers.

Additional Exercises for Section 3.4

3.46: The boxplot is shown below. No, the boxplot is not approximately symmetric; it is positively skewed.

[Boxplot omitted; axis: Maximum Annual Wind Speed]

3.47: (a)

[Figure omitted; groups: West, Middle States, East; axis: Wireless %]

(b) The most noticeable difference between the wireless percent for the three geographical regions is that the Middle States region is negatively skewed, and has a smaller interquartile range than the East and West regions. The Eastern region has the smallest median (11.4%), and the Middle States and Western regions have medians that are much closer to each other (16.9% and 16.3%, respectively).

3.48: (a) Lower quartile: 44; Upper quartile: 53; Interquartile range: 53 – 44 = 9. Observations smaller than \(44 - 1.5(9) = 30.5\) or larger than \(53 + 1.5(9) = 66.5\) are outliers. There are no observations smaller than 30.5 or larger than 66.5, so there are no outliers in this data set.

(b) The boxplot is shown below. As indicated in part (a), there are no outliers. The median of this data set is 46%. The entire data set ranges between a minimum of 33% and a maximum of 60%. The middle 50% of observations range between 44% and 53%. The middle 50% is also asymmetric, with the lower half ranging between 44% and 46% and the upper half ranging between 46% and 53%.

[Boxplot omitted; axis: Juice Lost After Thawing (%)]

3.49: The fact that the mean is so much higher than the median indicates that the distribution is positively skewed. There were undoubtedly some very high punitive damage awards, which pulled the mean up toward the large values.

Section 3.5 Exercise Set 1

3.50: First national aptitude test: \(z = \frac{625 - 475}{100} = 1.5\). Second national aptitude test: \(z = \frac{45 - 30}{8} = 1.875\). The student performed better on the second national aptitude test relative to the other test takers because the z-score for the second test is higher than that for the first test.
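The comparison in 3.50 can be sketched in a few lines of Python. The function name `z_score` and the variable names are illustrative; the scores, means, and standard deviations are the ones used in the solution above.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / std_dev


z_first = z_score(625, 475, 100)  # first national aptitude test
z_second = z_score(45, 30, 8)     # second national aptitude test

# The larger z-score marks the better performance relative to other test takers.
better_test = "second" if z_second > z_first else "first"
print(f"z (first) = {z_first}, z (second) = {z_second}, better: {better_test}")
```

Because z-scores are unit-free, they allow a comparison even though the two tests are scored on very different scales.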

3.51: (a) 40 minutes is 1 standard deviation above the mean; 30 minutes is 1 standard deviation below the mean. The values that are 2 standard deviations away from the mean are 25 and 45 minutes. (b) Approximately 95% of times are between 25 and 45 minutes; approximately 0.3% of times are less than 20 minutes or greater than 50 minutes; approximately 0.15% of times are less than 20 minutes.
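The intervals and percentages in 3.51 can be reproduced with a small Python sketch of the empirical rule. The mean of 35 minutes and standard deviation of 5 minutes are not stated in the solution; they are inferred here from part (a), where 30 and 40 minutes are one standard deviation either side of the mean. The names `EMPIRICAL_RULE` and `intervals` are illustrative.

```python
# Inferred from 3.51(a): 30 and 40 minutes are 1 standard deviation from the mean.
mean, std_dev = 35, 5

# Approximate share of observations within k standard deviations of the mean.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

intervals = {
    k: (mean - k * std_dev, mean + k * std_dev) for k in EMPIRICAL_RULE
}

for k, (low, high) in intervals.items():
    share = EMPIRICAL_RULE[k]
    print(f"within {k} sd: {low} to {high} minutes, about {share:.1%}")

# By symmetry, half of what falls outside 3 sd lies in the lower tail.
below_three_sd = (1 - EMPIRICAL_RULE[3]) / 2
print(f"below {intervals[3][0]} minutes: about {below_three_sd:.2%}")
```

The last step is the symmetry argument behind the 0.15% figure: the 0.3% outside three standard deviations is split evenly between the two tails.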

3.52: The 10th percentile of $0 indicates that 10% of students have $0 or less of student debt. The 25th percentile (which is the lower quartile) indicates that 25% of students have $0 or less of student debt. The 50th percentile (the median) indicates that 50% of students have $11,000 or less of student debt. The 75th percentile (the upper quartile) indicates that 75% of students have $24,600 or less of student debt. The 90th percentile indicates that 90% of students have $39,300 or less of student debt.

3.53: (a)

[Figure omitted; horizontal axis: Bus Travel Times (minutes); vertical axis: Frequency]

(b) (i) 86th percentile is approximately 21 minutes; (ii) 15th percentile is approximately 18 minutes; (iii) 90th percentile is approximately 21.5 minutes; (iv) 95th percentile is approximately 25.5 minutes; (v) 10th percentile is approximately 17.5 minutes

Section 3.5 Exercise Set 2

3.54: (a) The z-score tells us that the score is 2.2 standard deviations above the mean. Because the distribution was mound-shaped and symmetric, the empirical rule applies, and this z-score corresponds to a score slightly above the 97.5th percentile, which means the score is greater than or equal to approximately 97.5% of all the scores.

(b) The z-score tells us that the score is 0.4 standard deviations above the mean. Because the distribution was mound-shaped and symmetric, the empirical rule applies, and this z-score tells us that the score is in the upper half of all scores.

(c) The z-score tells us that the score is 1.8 standard deviations above the mean. Because the distribution was mound-shaped and symmetric, the empirical rule applies, and this z-score corresponds to a little below the 97.5th percentile.

(d) The z-score tells us that the score is 1.0 standard deviation above the mean. Because the distribution was mound-shaped and symmetric, the empirical rule applies, and this z-score corresponds to approximately the 84th percentile, which means the score is greater than or equal to approximately 84% of all the scores.

(e) The z-score of 0 indicates my score was equal to the mean and median.

3.55: (a) Given that the distribution is symmetric and mound-shaped, we can apply the empirical rule. Twenty-seven mph is 1 standard deviation below the mean, and 57 mph is 1 standard deviation above the mean. The empirical rule tells us that approximately 68% of the vehicle speeds lie within one standard deviation of the mean, or between 27 and 57 mph.

(b) Given that the distribution is symmetric and mound-shaped, we can apply the empirical rule. Fifty-seven mph is 1 standard deviation above the mean. Therefore, by the empirical rule, 84% of the vehicle speeds lie below 1 standard deviation above the mean, so 16% of the observations will lie above 1 standard deviation above the mean.
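The 84% and 16% figures in 3.55(b) come from combining the lower half of a symmetric distribution with half of the central 68%. A minimal Python sketch of that arithmetic follows; the mean and standard deviation are inferred from the 27 and 57 mph endpoints rather than quoted in the solution, and the variable names are illustrative.

```python
low, high = 27, 57          # 1 standard deviation below / above the mean (3.55)
mean = (low + high) / 2     # the interval is symmetric, so the mean is its midpoint
std_dev = (high - low) / 2

within_one_sd = 0.68        # empirical rule, approximate

# Everything below the mean (50%) plus the upper half of the central 68%.
below_high = 0.5 + within_one_sd / 2
above_high = 1 - below_high

print(f"mean = {mean} mph, standard deviation = {std_dev} mph")
print(f"below {high} mph: about {below_high:.0%}; above: about {above_high:.0%}")
```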

3.56: The 83rd percentile indicates that her score was greater than or equal to 83% of all scores on the verbal section of the test. Additionally, she scored greater than or equal to 94% of all scores on the math section.

3.57: (a) The frequency distribution is shown in the table below.

Expenditures (per capita)    Frequency
0 - < 2                      13
2 - < 4                      18
4 - < 6                      10
6 - < 8                      5
8 - < 10                     1
10 - < 12                    2
12 - < 14                    0
14 - < 16                    0
16 - < 18                    2

[Histogram: Frequency versus Expenditures (per capita)]


(b) (i) The 50th percentile is between per capita expenditures of 2 and 4.

(ii) The 70th percentile is between per capita expenditures of 4 and 6.

(iii) The 10th percentile is between per capita expenditures of 0 and 2.

(iv) The 90th percentile is between per capita expenditures of 7 and 8.

(v) The 40th percentile is between per capita expenditures of 2 and 4.
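The class containing each percentile in part (b) can be located by accumulating the frequencies from part (a) until the running percentage reaches the target. The Python sketch below does this; the function name `percentile_class` is illustrative, and the approach assumes only the frequency table is available, so it identifies a class interval rather than an exact value.

```python
# Class boundaries and frequencies from the table in part (a).
class_edges = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
frequencies = [13, 18, 10, 5, 1, 2, 0, 0, 2]


def percentile_class(p, edges, freqs):
    """Return (lower, upper) edges of the first class whose cumulative
    relative frequency (in percent) reaches p."""
    total = sum(freqs)
    running = 0
    for i, count in enumerate(freqs):
        running += count
        if 100 * running / total >= p:
            return (edges[i], edges[i + 1])
    raise ValueError("p must be between 0 and 100")


for p in (50, 70, 10, 90, 40):
    print(p, percentile_class(p, class_edges, frequencies))
```

With 51 observations, the cumulative percentages at the upper class boundaries 2, 4, 6, and 8 are roughly 25.5%, 60.8%, 80.4%, and 90.2%, which is why the 90th percentile falls near the top of the 6 - < 8 class.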

Additional Exercises for Section 3.5

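3.58: (a) $z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{0 - 1{,}650}{750} = -2.2$

(b) $z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{10{,}000 - 1{,}650}{750} = 11.133$

(c) $z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{4{,}500 - 1{,}650}{750} = 3.8$

(d) $z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{300 - 1{,}650}{750} = -1.8$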
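As a quick arithmetic check on 3.58, the z-score formula can be written as a short Python function. The name `z_score` is illustrative; the mean of 1,650 and standard deviation of 750 are the values used in the solution.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


# Data values from parts (a) through (d), with mean 1,650 and standard deviation 750.
for value in (0, 10_000, 4_500, 300):
    print(value, round(z_score(value, 1_650, 750), 3))
```

A negative result means the data value is below the mean, as in parts (a) and (d).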

3.59: (a) 1100 gallons; (b) 1400 gallons; (c) 1700 gallons

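3.60: (a) $z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{320 - 450}{70} = -1.857$

(b) $z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{475 - 450}{70} = 0.357$

(c) $z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{420 - 450}{70} = -0.429$

(d) $z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{610 - 450}{70} = 2.286$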

3.61: (a) 120; (b) 20; (c) $z = \frac{90 - 100}{20} = -0.5$; (d) 97.5; (e) Since a score of 40 is 3 standard deviations below the mean, that corresponds to a percentile of 0.15%. Therefore, there were relatively few scores below 40.


Chapter 3: Are You Ready to Move On?

3.62: (a) The distribution of female weekend exercise time is positively skewed (see the boxplot below), so the median and interquartile range should be used to describe center and spread, respectively.

[Boxplot: Female Weekend Exercise Time]

(b) The distribution of amount of alcohol poured is negatively skewed (see the boxplot below), so the median and interquartile range should be used to describe center and spread, respectively.

[Boxplot: Amount of Alcohol (mL)]

(c) The distribution of wait times is positively skewed with a large outlier (see the boxplot below), so the median and interquartile range should be used to describe center and spread, respectively.


[Boxplot: wait time (sec)]

3.63: The mean APEAL rating is $\bar{x} = 792.03$, which is a typical or representative value for the APEAL ratings in the sample. The standard deviation is $s = 36.70$ and represents how much, on average, the values in the data set spread out, or deviate, from the mean APEAL rating.

3.64: The high-caffeine energy drinks show much more variability in caffeine per ounce. This can be seen in the comparative boxplots below. In addition, since both distributions are reasonably symmetric, the standard deviation is an appropriate measure of variability. The standard deviation for the caffeine content per ounce in the energy drinks is s = 0.667, and the standard deviation for the caffeine content per ounce in the high-caffeine energy drinks is s = 8.31.

[Comparative boxplots: High Caffeine Energy Drink and Top Selling Energy Drink; axis: Caffeine per Ounce]

3.65: (a) The mean tipping percent is $\bar{x} = 27.31\%$, and the standard deviation is $s = 23.83\%$. (b) After removing the 105% tip, the new mean and standard deviation are $\bar{x}_{new} = 23.23\%$ and $s_{new} = 15.70\%$. These values are much smaller than the mean and standard deviation computed with 105 included. This suggests that the mean and standard deviation can change dramatically when outliers are present in (or removed from) the data set, and, therefore, are probably not the best measures of center and spread to use in this situation.

3.66: The mean repair cost is $2,119, and the median repair cost is $1,688. These values are so different because the distribution is positively skewed, and the mean tends to be pulled toward larger values in positively skewed distributions, whereas the median is more resistant. Therefore, the median is preferable to the mean because the distribution of repair costs is positively skewed (see dotplot below).

3.67: (a) The median is 140 seconds, and the interquartile range is $iqr = 200 - 100 = 100$ seconds. The median divides the ordered list into two equal halves, with half the values less than 140 seconds and half the values greater than 140 seconds. The interquartile range of 100 seconds indicates that the middle 50% of the data values have a range of 100 seconds. (b) Due to the presence of the large outlier (the value 380 seconds), the median and interquartile range are the appropriate summary measures to describe center and spread for this data set.


3.68: (a) Median = 58, lower quartile = 53.5, upper quartile = 64.4

(b) Outliers at the low end of the distribution are values less than lower quartile – 1.5(iqr). The iqr = 64.4 – 53.5 = 10.9, so values less than 53.5 – 1.5(10.9) = 37.15 are considered outliers. Since the values for Alaska (28.2) and Wyoming (35.7) are both less than 37.15, they are outliers.

(c) The distribution is negatively skewed with two outliers on the low end of the scale. The median is 58%, and the lower and upper quartiles are 53.5% and 64.4%, respectively. The middle 50% of the data values lie between these quartiles and are approximately symmetric. Excluding the outliers, the distribution of the remaining data values is also approximately symmetric.

[Boxplot: Percent Still Living in the State]
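The outlier check in 3.68(b) follows the usual 1.5(iqr) rule, which can be sketched in Python as follows. The function name `outlier_fences` is illustrative; the quartiles and the two state values are taken from the solution, and no other data values are assumed.

```python
def outlier_fences(lower_quartile, upper_quartile):
    """Return (low, high) cutoffs: values below low or above high are outliers."""
    iqr = upper_quartile - lower_quartile
    return (lower_quartile - 1.5 * iqr, upper_quartile + 1.5 * iqr)


low, high = outlier_fences(53.5, 64.4)
print(round(low, 2), round(high, 2))

# The two values singled out in part (b).
candidates = {"Alaska": 28.2, "Wyoming": 35.7}
low_outliers = [state for state, pct in candidates.items() if pct < low]
print(low_outliers)
```

Both Alaska and Wyoming fall below the lower cutoff of 37.15, matching the conclusion in part (b).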


3.69: (a) Median = 8 grams/serving; lower quartile = 7 grams/serving; upper quartile = 12 grams/serving; interquartile range = 12 – 7 = 5 grams/serving

(b) Median = 10 grams/serving; lower quartile = 6 grams/serving; upper quartile = 13 grams/serving; interquartile range = 13 – 6 = 7 grams/serving

(c) There are no outliers in the sugar content data set because there are no values more than 1.5(iqr) above the upper quartile or more than 1.5(iqr) below the lower quartile.

(d) The minimum value and lower quartile are the same because the smallest five values in the data set are all equal to 7.

(e) [Comparative boxplots: Sugar and Fiber; axis: Content (grams/serving)]

The sugar content in grams/serving is much more variable than the fiber content in grams/serving. The range in sugar content (19 grams/serving) is greater than the range in fiber content (7 grams/serving). The boxplot of fiber content shows that the minimum and lower quartile are equal to each other, which is not observed in the sugar content. The distribution of sugar content values is approximately symmetric, which is different from the skewed fiber distribution.

3.70: Use z-scores to make comparisons between the two different stimuli. For stimulus 1, $z = \frac{4.2 - 6.0}{1.2} = -1.5$, and for stimulus 2, $z = \frac{1.8 - 3.6}{0.8} = -2.25$. The z-scores indicate that your reaction time for stimulus 1 is 1.5 standard deviations below the mean, and your reaction time for stimulus 2 is 2.25 standard deviations below the mean. Therefore, compared to other people, you are reacting to stimulus 2 more quickly.
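The comparison in 3.70 can be reproduced with a few lines of Python. This is only a sketch: the dictionary layout and variable names are illustrative, while the reaction times, means, and standard deviations are the values used in the solution.

```python
# (your reaction time, mean, standard deviation) for each stimulus
stimuli = {
    "stimulus 1": (4.2, 6.0, 1.2),
    "stimulus 2": (1.8, 3.6, 0.8),
}

# z-score = (data value - mean) / standard deviation
z_scores = {
    name: (time - mean) / sd for name, (time, mean, sd) in stimuli.items()
}

# A more negative z-score means a faster reaction relative to other people.
relatively_fastest = min(z_scores, key=z_scores.get)

for name, z in z_scores.items():
    print(name, round(z, 2))
print(relatively_fastest)
```

Putting both reaction times on the z-score scale is what makes the comparison fair, since the two stimuli have different means and standard deviations.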

3.71: (a) The 25th percentile indicates that 25% of full-time female workers age 25 or older with an Associate degree earn $26,800 or less. The 50th percentile indicates that 50% of full-time female workers age 25 or older with an Associate degree earn $36,800 or less. The 75th percentile indicates that 75% of full-time female workers age 25 or older with an Associate degree earn $51,100 or less. (b) The 25th, 50th, and 75th percentile values for men are all greater than the corresponding percentiles for female workers, indicating that full-time employed men age 25 or older with an Associate degree, in general, earn more than full-time employed women age 25 or older with an Associate degree.