chapter 1 section 1.2 describing distributions with numbers
![Page 1: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/1.jpg)
Chapter 1, Section 1.2
Describing Distributions with Numbers
![Page 2: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/2.jpg)
Parameter -
Fixed value about a population
Typically unknown
![Page 3: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/3.jpg)
Statistic -
Value calculated from a sample
![Page 4: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/4.jpg)
Measures of Central Tendency
Mean - the arithmetic average
Use $\mu$ to represent a population mean (parameter)
Use $\bar{x}$ to represent a sample mean (statistic)
Formula: $\bar{x} = \dfrac{\sum x}{n}$
$\Sigma$ is the capital Greek letter sigma – it means to sum the values that follow
This is on the formula sheet, so you do not have to memorize it.
![Page 5: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/5.jpg)
Measures of Central Tendency
Median - the middle of the data; 50th percentile
Observations must be in numerical order
Is the single middle value if n is odd
Is the average of the middle two values if n is even
NOTE: n denotes the sample size
![Page 6: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/6.jpg)
Measures of Central Tendency
Mode – the observation that occurs the most often
Can be more than one mode
If all values occur only once – there is no mode
Not used as often as mean & median
![Page 7: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/7.jpg)
Measures of Central Tendency
Range – the difference between the largest and smallest observations.
This is only one number! Not "3 to 8" but 5
![Page 8: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/8.jpg)
Suppose we are interested in the number of lollipops that are bought at a certain store. A sample of 5 customers buys the following number of lollipops. Find the median.
2 3 4 8 12
The numbers are in order & n is odd – so find the
middle observation.
The median is 4 lollipops!
![Page 9: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/9.jpg)
Suppose we have a sample of 6 customers that buy the following number of lollipops. The median is …
2 3 4 6 8 12
The numbers are in order & n is even – so find the middle two observations (4 and 6).
Now, average these two values: $\dfrac{4 + 6}{2} = 5$
The median is 5 lollipops!
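The two median examples above can be sketched in a few lines of Python. This is only an illustration; the function name `median` and the error handling are assumptions, not part of the original slides.

```python
def median(values):
    """Middle value if n is odd, average of the two middle values if n is even."""
    ordered = sorted(values)  # observations must be in numerical order
    n = len(ordered)          # n denotes the sample size
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # n odd: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: average the middle two


five_customers = [2, 3, 4, 8, 12]
six_customers = [2, 3, 4, 6, 8, 12]
print(median(five_customers))  # 4
print(median(six_customers))   # 5.0
```

Sorting inside the function means the data do not have to be entered in order.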
![Page 10: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/10.jpg)
Suppose we have a sample of 6 customers that buy the following number of lollipops. Find the mean.
2 3 4 6 8 12
To find the mean number of lollipops, add the observations and divide by n.
$\bar{x} = \dfrac{2 + 3 + 4 + 6 + 8 + 12}{6} = \dfrac{35}{6} \approx 5.833$
![Page 11: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/11.jpg)
What would happen to the median & mean if the 12 lollipops were 20?
2 3 4 6 8 20
The median is . . . 5
The mean is . . . $\dfrac{2 + 3 + 4 + 6 + 8 + 20}{6} = \dfrac{43}{6} \approx 7.17$
What happened?
![Page 12: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/12.jpg)
What would happen to the median & mean if the 20 lollipops were 50?
2 3 4 6 8 50
The median is . . . 5
The mean is . . . $\dfrac{2 + 3 + 4 + 6 + 8 + 50}{6} = \dfrac{73}{6} \approx 12.17$
What happened?
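One way to see what happened is to recompute both measures while only the largest observation changes. The sketch below uses Python's standard `statistics` module; the helper name `center_summary` is made up for this illustration.

```python
from statistics import mean, median


def center_summary(largest):
    """Mean (rounded to 2 places) and median of the lollipop data with a chosen largest value."""
    data = [2, 3, 4, 6, 8, largest]
    return round(mean(data), 2), median(data)


for largest in (12, 20, 50):
    avg, med = center_summary(largest)
    print(f"largest = {largest}: mean = {avg}, median = {med}")
```

The median stays at 5 each time, while the mean moves from about 5.83 to 7.17 to 12.17 as the largest value grows.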
![Page 13: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/13.jpg)
Resistant -
Statistics that are not affected by outliers
Is the median resistant? YES
Is the mean resistant? NO
![Page 14: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/14.jpg)
Look at the following data set. Find the mean.
22 23 24 25 25 26 29 30
$\bar{x} = 25.5$
Now find how each observation deviates from the mean.
$x - \bar{x}$ is the deviation from the mean.
What is the sum of the deviations from the mean?
$\sum (x - \bar{x}) = 0$
Will this sum always equal zero? YES
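A short derivation shows why the deviations always sum to zero. It is a sketch that assumes $n$ observations $x_1, \dots, x_n$ with $\bar{x} = \frac{1}{n}\sum x_i$; the subscript notation is added here for illustration and is not on the slides.

$$
\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0
$$

The middle step uses $\sum x_i = n\bar{x}$, which is the definition of the mean rearranged.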
![Page 15: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/15.jpg)
Look at the following data set.
21 23 23 24 25 25 26 26 26 27
27 27 27 28 30 30 30 31 32 32
Create a histogram with the data. (use x-scale of 2) Then find the mean and median.
Mean = 27
Median = 27
Look at the placement of the mean and median in this symmetrical distribution.
![Page 16: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/16.jpg)
Look at the following data set.
22 29 28 22 24 25 28 21 25
23 24 23 26 36 38 62 23
Create a histogram with the data. (use x-scale of 8) Then find the mean and median.
Mean = 28.176
Median = 25
Look at the placement of the mean and median in this right skewed distribution.
![Page 17: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/17.jpg)
Look at the following data set.
21 46 54 47 53 60 55 55 60
56 58 58 58 58 62 63 64
Create a histogram with the data. Then find the mean and median.
Mean = 54.588
Median = 58
Look at the placement of the mean and median in this skewed left distribution.
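The three data sets can be compared side by side with a small script. This is a minimal sketch using Python's standard `statistics` module; the dictionary keys and the helper `describe` are names chosen for this example.

```python
from statistics import mean, median

datasets = {
    "symmetrical": [21, 23, 23, 24, 25, 25, 26, 26, 26, 27,
                    27, 27, 27, 28, 30, 30, 30, 31, 32, 32],
    "skewed right": [22, 29, 28, 22, 24, 25, 28, 21, 25,
                     23, 24, 23, 26, 36, 38, 62, 23],
    "skewed left": [21, 46, 54, 47, 53, 60, 55, 55, 60,
                    56, 58, 58, 58, 58, 62, 63, 64],
}


def describe(data):
    """Return (mean rounded to 3 places, median) for one data set."""
    return round(mean(data), 3), median(data)


for shape, data in datasets.items():
    avg, med = describe(data)
    # Positive difference: mean above median; negative: mean below median.
    print(f"{shape}: mean = {avg}, median = {med}, mean - median = {round(avg - med, 3)}")
```

In the right-skewed set the mean is above the median, and in the left-skewed set it is below, which is the pull toward the tail described in these slides.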
![Page 18: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/18.jpg)
Go to java view
![Page 19: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/19.jpg)
Recap:
In a symmetrical distribution, the mean and median are equal.
In a skewed distribution, the mean is pulled in the direction of the skewness.
In a symmetrical distribution, you should report the mean!
In a skewed distribution, the median should be reported as the measure of center!
![Page 20: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/20.jpg)
Quartiles
Arrange the observations in increasing order and locate the median M in the ordered list of observations.
The first quartile Q1 is the median of the 1st half of the observations.
The third quartile Q3 is the median of the 2nd half of the observations.
![Page 21: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/21.jpg)
16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73
Q1 = 25, median = 34, Q3 = 41
![Page 22: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/22.jpg)
What if there is an odd number of observations?
16 19 24 25 25 33 33 34 34
median = 25
When dividing the data in half, forget about the middle number.
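The quartile rule on these slides (median of each half, leaving out the middle number when n is odd) can be sketched as follows. The function names are hypothetical, and statistical software often uses other quartile conventions, so its answers may differ slightly from this rule.

```python
def median_sorted(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    # When n is odd, the middle observation belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median_sorted(lower), median_sorted(ordered), median_sorted(upper)


even_case = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
odd_case = [16, 19, 24, 25, 25, 33, 33, 34, 34]
print(quartiles(even_case))  # (25.0, 34.0, 41.0)
print(quartiles(odd_case))   # (21.5, 25, 33.5)
```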
![Page 23: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/23.jpg)
The interquartile range (IQR)
The distance between the first and third quartiles.
IQR = Q3 – Q1
Always positive
![Page 24: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/24.jpg)
Outlier: We call an observation an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile.
Let’s look back at the same data:
16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73
Q1 = 25, Q3 = 41
IQR = 41 - 25 = 16
Lower cutoff: 25 - 1.5 x 16 = 1
Upper cutoff: 41 + 1.5 x 16 = 65
![Page 25: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/25.jpg)
Since 73 is above the upper cutoff, we will call it an outlier.
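The 1.5 x IQR check can be written as a short sketch. The names `outlier_cutoffs` and `find_outliers` are assumptions made for this example, and the quartiles are passed in as already computed (Q1 = 25, Q3 = 41 from the slide).

```python
def outlier_cutoffs(q1, q3, k=1.5):
    """Lower and upper cutoffs: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3):
    """Observations that fall below the lower cutoff or above the upper cutoff."""
    lower, upper = outlier_cutoffs(q1, q3)
    return [x for x in values if x < lower or x > upper]


data = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
print(outlier_cutoffs(25, 41))      # (1.0, 65.0)
print(find_outliers(data, 25, 41))  # [73]
```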
![Page 26: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/26.jpg)
Five-number summary
Minimum
Q1
Median
Q3
Maximum
![Page 27: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/27.jpg)
If you plot these five numbers on a graph, we have a ………
![Page 28: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/28.jpg)
Advantages of boxplots
ease of construction
convenient handling of outliers
construction is not subjective (like histograms)
used with medium or large size data sets (n > 10)
useful for comparative displays
![Page 29: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/29.jpg)
Disadvantages of boxplots
does not retain the individual observations
should not be used with small data sets (n < 10)
![Page 30: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/30.jpg)
How to construct
find five-number summary: Min, Q1, Med, Q3, Max
draw box from Q1 to Q3
draw median as center line in the box
extend whiskers to min & max
![Page 31: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/31.jpg)
Modified boxplots
display outliers
fences mark off the outliers
whiskers extend to largest (smallest) data value inside the fence
ALWAYS use modified boxplots in this class!!!
![Page 32: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/32.jpg)
Modified Boxplot
Interquartile Range (IQR) – is the range (length) of the box: Q3 - Q1
Fences: Q1 – 1.5IQR and Q3 + 1.5IQR
These are called the fences and should not be seen.
Any observation outside this fence is an outlier! Put a dot for the outliers.
![Page 33: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/33.jpg)
Modified Boxplot . . .
Draw the “whisker” from each quartile (Q1 and Q3) to the farthest observation that is within the fence!
![Page 34: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/34.jpg)
A report from the U.S. Department of Justice gave the following percent increase in federal prison populations in 20 northeastern & mid-western states in 1999.
5.9 1.3 5.0 5.9 4.5 5.6 4.1 6.3 4.8 6.9
4.5 3.5 7.2 6.4 5.5 5.3 8.0 4.4 7.2 3.2
Create a modified boxplot. Describe the distribution.
Use the calculator to create a modified boxplot.
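As a cross-check on the calculator, the numbers behind a modified boxplot can be computed with a short script. This is a sketch under the median-of-halves quartile rule used earlier in these slides; the function and variable names are invented for the example, and a calculator that uses a different quartile convention could give slightly different values.

```python
def median_sorted(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def modified_boxplot_summary(values, k=1.5):
    """Quartiles, fences, whisker ends, and outliers for a modified boxplot."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    q1 = median_sorted(ordered[:half])
    q3 = median_sorted(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median_sorted(ordered),
        "q3": q3,
        "fences": (low_fence, high_fence),
        # Whiskers stop at the most extreme observations inside the fences.
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }


percent_increase = [5.9, 1.3, 5.0, 5.9, 4.5, 5.6, 4.1, 6.3, 4.8, 6.9,
                    4.5, 3.5, 7.2, 6.4, 5.5, 5.3, 8.0, 4.4, 7.2, 3.2]
summary = modified_boxplot_summary(percent_increase)
for name, value in summary.items():
    print(name, value)
```

Under these assumptions the quartiles come out near 4.45 and 6.35 with a median near 5.4, and the only observation outside the fences is 1.3, so the whiskers run from 3.2 to 8.0.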
![Page 35: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/35.jpg)
Evidence suggests that a high indoor radon concentration might be linked to the development of childhood cancers. The data that follows is the radon concentration in two different samples of houses. The first sample consisted of houses in which a child was diagnosed with cancer. Houses in the second sample had no recorded cases of childhood cancer.
(see data on note page)
Create parallel boxplots. Compare the distributions.
![Page 36: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/36.jpg)
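[Figure: parallel boxplots of radon concentration for the Cancer and No Cancer groups; horizontal axis labeled Radon, marked at 100 and 200]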
The median radon concentration for the no cancer group is lower than the median for the cancer group. The range of the cancer group is larger than the range for the no cancer group. Both distributions are skewed right. The cancer group has outliers at 39, 45, 57, and 210. The no cancer group has outliers at 55 and 85.
![Page 37: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/37.jpg)
Assignment 1.2
![Page 38: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/38.jpg)
![Page 39: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/39.jpg)
Why is the study of variability important?
Allows us to distinguish between usual & unusual values
In some situations, want more/less variability
scores on standardized tests
time bombs
medicine
![Page 40: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/40.jpg)
Measures of Variability
range (max-min)
interquartile range (Q3-Q1)
deviations: $x - \bar{x}$
variance: $\sigma^2$
standard deviation: $\sigma$ (lower case Greek letter sigma)
![Page 41: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/41.jpg)
Suppose that we have these data values:
24 34 26 30 37 16 28 21 35 29
Find the mean.
Find the deviations: $x - \bar{x}$
What is the sum of the deviations from the mean?
![Page 42: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/42.jpg)
24 34 26 30 37 16 28 21 35 29
Square the deviations: $(x - \bar{x})^2$
Find the average of the squared deviations: $\dfrac{\sum (x - \bar{x})^2}{n}$
![Page 43: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/43.jpg)
The average of the deviations squared is called the variance.
Population: $\sigma^2$ (parameter)
Sample: $s^2$ (statistic)
![Page 44: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/44.jpg)
Calculation of variance of a sample
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
The denominator $n - 1$ is the degrees of freedom (df).
![Page 45: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/45.jpg)
A standard deviation is a measure of the average deviation from the mean.
![Page 46: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/46.jpg)
Calculation of standard deviation
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
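The sample variance and standard deviation formulas can be tried on the ten data values from the earlier slide. This is a minimal sketch; the function names are chosen for this example, and the divisor is n - 1 as in the sample formulas.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [24, 34, 26, 30, 37, 16, 28, 21, 35, 29]
xbar = sum(data) / len(data)            # mean of the data
deviations = [x - xbar for x in data]   # each x minus the mean
print(xbar, sum(deviations))            # 28.0 0.0
print(round(sample_variance(data), 3))  # 42.667
print(round(sample_sd(data), 3))        # 6.532
```

Dividing the same sum of squares (384) by n = 10 instead would give 38.4, the plain average of the squared deviations.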
![Page 47: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/47.jpg)
Degrees of Freedom (df)
n deviations contain (n - 1) independent pieces of information about variability
![Page 48: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/48.jpg)
Which measure(s) of variability is/are
resistant?
![Page 49: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/49.jpg)
Activity (worksheet)
![Page 50: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/50.jpg)
Linear transformation rule
When multiplying by or adding a constant to a random variable, the mean and median change by both.
When multiplying by or adding a constant to a random variable, the standard deviation changes only by multiplication.
Formulas:
$\mu_{a+bx} = a + b\mu_x$
$\sigma_{a+bx} = |b|\,\sigma_x$
![Page 51: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/51.jpg)
An appliance repair shop charges a $30 service call to go to a home for a repair. It also charges $25 per hour for labor. From past history, the average length of repairs is 1 hour 15 minutes (1.25 hours) with standard deviation of 20 minutes (1/3 hour). Including the charge for the service call, what are the mean and standard deviation for the charges for labor?
$\mu = 30 + 25(1.25) = \$61.25$
$\sigma = 25\left(\tfrac{1}{3}\right) \approx \$8.33$
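The repair-shop calculation follows the linear transformation rule directly. Here is a small sketch; the function name `linear_transform` and its parameter names are assumptions for this example, with `a` the added constant ($30) and `b` the multiplier ($25 per hour).

```python
def linear_transform(mean_x, sd_x, a, b):
    """Mean and standard deviation of a + b*x, given the mean and SD of x."""
    new_mean = a + b * mean_x   # the mean changes by both adding and multiplying
    new_sd = abs(b) * sd_x      # the SD changes only by multiplication
    return new_mean, new_sd


mean_charge, sd_charge = linear_transform(mean_x=1.25, sd_x=1 / 3, a=30, b=25)
print(f"mean = ${mean_charge:.2f}, sd = ${sd_charge:.2f}")  # mean = $61.25, sd = $8.33
```

The $30 service call shifts every bill by the same amount, so it moves the mean but leaves the spread unchanged.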
![Page 52: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/52.jpg)
Rules for Combining two variables
To find the mean for the sum (or difference), add (or subtract) the two means
To find the standard deviation of the sum (or differences), ALWAYS add the variances, then take the square root.
Formulas:
$\mu_{a+b} = \mu_a + \mu_b$
$\mu_{a-b} = \mu_a - \mu_b$
$\sigma_{a \pm b} = \sqrt{\sigma_a^2 + \sigma_b^2}$ (if the variables are independent)
![Page 53: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/53.jpg)
Bicycles arrive at a bike shop in boxes. Before they can be sold, they must be unpacked, assembled, and tuned (lubricated, adjusted, etc.). Based on past experience, the times for each setup phase are independent with the following means & standard deviations (in minutes). What are the mean and standard deviation for the total bicycle setup times?

| Phase | Mean | SD |
|---|---|---|
| Unpacking | 3.5 | 0.7 |
| Assembly | 21.8 | 2.4 |
| Tuning | 12.3 | 2.7 |

$\mu_T = 3.5 + 21.8 + 12.3 = 37.6$ minutes
$\sigma_T = \sqrt{0.7^2 + 2.4^2 + 2.7^2} \approx 3.680$ minutes
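The bicycle setup totals can be checked with a short sketch of the combining rules. The function name `combine_independent` is made up for this example, and the calculation is only valid under the stated assumption that the phases are independent.

```python
import math


def combine_independent(means, sds):
    """Mean and SD of a sum of independent variables: add means, add variances."""
    total_mean = sum(means)
    total_sd = math.sqrt(sum(sd ** 2 for sd in sds))  # add variances, then take the square root
    return total_mean, total_sd


phase_means = [3.5, 21.8, 12.3]  # unpacking, assembly, tuning (minutes)
phase_sds = [0.7, 2.4, 2.7]
total_mean, total_sd = combine_independent(phase_means, phase_sds)
print(f"mean = {total_mean:.1f} minutes, sd = {total_sd:.3f} minutes")
```

Adding the standard deviations directly (0.7 + 2.4 + 2.7 = 5.8) would overstate the spread; the variances are what add.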
![Page 54: Chapter 1 Section 1.2 Describing Distributions with Numbers](https://reader038.vdocuments.site/reader038/viewer/2022102805/551695b7550346f0208b486c/html5/thumbnails/54.jpg)
Assignment 1.2B