Confidence intervals for population mean
web2.slc.qc.ca/pfoth/qm2/presentations/09_estimate_mu...
TRANSCRIPT
Confidence Intervals for Population Mean
Quantitative Methods II
Plan
• Inferential Statistics
• Point and Interval Estimates
• Confidence Intervals
• Estimating the required sample size
• Examples
Inferential Statistics
• Goal = use information obtained from a sample to increase our knowledge about the population from which the sample was taken (i.e., to estimate or make inferences about the population)
• 2 types:
– Estimating the value of a population parameter
– Testing a hypothesis
• Using the Sampling Distribution of the Sample Mean (SDSM) is key
Estimating a population mean
• One of the purposes of randomly sampling a population is to get an estimate of the mean of the population
• Usually, the best estimate of a population mean is the sample mean. Example: the mean SEL test score for a group of 64 students is 77.4, thus 77.4 is the best estimate of the mean score for the population of all students who take the SEL test
• The logic behind it is that you are more likely to get a sample mean of 77.4 from a population with a mean of 77.4 than from a population with any other mean: this is a point estimate
Point and Interval Estimates
• Point estimate is when you estimate a specific value of a population parameter
– Accuracy of the point estimate = SD of the distribution of sample means (how much the sample means in this distribution typically vary)
• Interval estimate is when you estimate a range in which the population parameter is likely to fall
– You can do this because the distribution of means is generally a normal curve, thus you know the percentage of sample means that lie in a given area of the distribution: about 68% of all sample means lie within ± 1 SD of the mean
Terminology
• Point estimate: a single number designed to estimate a quantitative parameter of a population, usually the corresponding sample statistic
• Interval estimate: an interval bounded by two values that is calculated from the sample and that is used to estimate the value of a population parameter
• Confidence interval: an interval estimate with a specified level of confidence
• Level of confidence \(1 - \alpha\): the proportion of all interval estimates that include the parameter being estimated
– usually 90%, 95%, 98% or 99%
Example
Take a city, like Trenton, NJ. We want to know how much time it takes workers living in Trenton to get to work and back: the commuting time
• Sample = 36 workers from Trenton
• Mean = 49 minutes
• This mean becomes the point estimate for the population of all Trenton workers
• σ = 15 minutes
Example: continued
• This mean should be close to the population mean, μ
• SDSM and the CLT tell us how close this mean, a point estimate, is to the population mean, μ
• Recall: with a large enough sample the SDSM will be close to normally distributed
Recall: the Empirical Rule
Example: continued
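If we knew the value of \(\mu\), the population mean, then we could calculate an interval within which approximately 95% of the sample mean commuting times should fall:
from \(\mu - 2\sigma_{\bar{x}}\) to \(\mu + 2\sigma_{\bar{x}}\), i.e.
from \(\mu - 2\cdot\frac{\sigma}{\sqrt{n}}\) to \(\mu + 2\cdot\frac{\sigma}{\sqrt{n}}\), i.e.
from \(\mu - 2\cdot\frac{15}{\sqrt{36}}\) to \(\mu + 2\cdot\frac{15}{\sqrt{36}}\), i.e.
from \(\mu - 5\) to \(\mu + 5\) minutes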
Sampling Distribution of \(\bar{x}\)'s, unknown μ
In algebraic terms: \(P(\mu - 5 < \bar{x} < \mu + 5) \approx 95\%\)
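The probability statement above can be checked numerically. The following is a minimal sketch, assuming the sample mean is modelled as exactly normal with standard deviation \(\sigma/\sqrt{n}\); the value `mu = 50.0` is a hypothetical placeholder (the true μ is unknown, and the result does not depend on it).

```python
from math import sqrt
from statistics import NormalDist

sigma = 15   # population standard deviation in minutes (from the example)
n = 36       # sample size (from the example)
mu = 50.0    # hypothetical population mean; any value gives the same probability

standard_error = sigma / sqrt(n)                 # 15 / 6 = 2.5 minutes
sdsm = NormalDist(mu=mu, sigma=standard_error)   # normal model for the sample mean

half_width = 2 * standard_error                  # 2 standard errors = 5 minutes
prob = sdsm.cdf(mu + half_width) - sdsm.cdf(mu - half_width)

print(f"P(mu - 5 < x_bar < mu + 5) = {prob:.4f}")  # prints 0.9545
```

The result, about 95.45%, is the "approximately 95%" of the Empirical Rule for ± 2 standard deviations.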
Interval Estimates
• Interval estimate: an interval bounded by two values that is calculated from the sample and that is used to estimate the value of a population parameter
• Level of confidence 1 − 𝛼: the proportion of all interval estimates that include the parameter being estimated
• Confidence interval: an interval estimate with a specified level of confidence
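The phrase "proportion of all interval estimates that include the parameter" can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal, `mu = 50.0` is a hypothetical true mean chosen for the simulation, and `coverage` is an illustrative helper name, not something from the slides.

```python
import random
from math import sqrt

def coverage(mu, sigma, n, z, trials, seed=0):
    """Fraction of simulated intervals x_bar ± z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    half_width = z * sigma / sqrt(n)     # same half-width for every sample
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1
    return hits / trials

# Commuting-time setting: sigma = 15, n = 36, intervals of x_bar ± 5 minutes (z = 2).
rate = coverage(mu=50.0, sigma=15.0, n=36, z=2.0, trials=2000)
print(f"share of intervals containing mu: {rate:.3f}")
```

With many repeated samples the printed share should settle near 0.95: each individual interval either contains μ or does not, and the level of confidence describes the long-run proportion that do.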
Example: continued
What are the bounds of the interval centered at \(\bar{x} = 49\) minutes?
From \(\bar{x} - 2\sigma_{\bar{x}}\) to \(\bar{x} + 2\sigma_{\bar{x}}\), i.e.
from 49 − 5 to 49 + 5 minutes
This means that the 95.44% confidence interval for μ is from 44 to 54 minutes.
Confidence Intervals
Summary : Calculating Confidence Intervals
• Sample Mean: \(\bar{x}\)
• Sample Size: n
• Population standard deviation: σ
• Level of confidence we wish to have: 1 − 𝛼
\((1 - \alpha) \cdot 100\%\) expresses how confident you can be that the population mean falls within this interval
0.95 · 100% = 95%: you are 95% confident that the population mean falls within this interval
Estimation of Mean μ (σ known)
Assumption: either the population has a bell-shaped, symmetric (normal) distribution, or the sample size is at least 25.
Step by step
Confidence Coefficient \(z(\alpha/2)\)
Constructing a Confidence Interval
• Step 1: Set-Up
– Describe the population parameter of interest
• Step 2: The Confidence Interval Criteria
– Check the assumptions
– Identify the probability distribution and the formula to be used
– State the level of confidence 𝟏 − 𝜶
• Step 3: The Sample Evidence
– Collect the sample information
Constructing a Confidence Interval
• Step 4: The Confidence Interval
– Determine the confidence coefficient \(z(\alpha/2)\)
– Find the error bound for a population mean: \(EBM = z(\alpha/2) \cdot \frac{\sigma}{\sqrt{n}}\)
– Find the lower and upper confidence limits
• Step 5: State the confidence interval
from \(\bar{x} - EBM\) to \(\bar{x} + EBM\) (units)
The confidence coefficient
• Some useful numbers from the table:
If \(1 - \alpha = 0.80\) (80%), then \(z(\alpha/2) = 1.28\)
If \(1 - \alpha = 0.90\) (90%), then \(z(\alpha/2) = 1.645\)
If \(1 - \alpha = 0.94\) (94%), then \(z(\alpha/2) = 1.88\)
If \(1 - \alpha = 0.95\) (95%), then \(z(\alpha/2) = 1.96\)
If \(1 - \alpha = 0.96\) (96%), then \(z(\alpha/2) = 2.055\)
If \(1 - \alpha = 0.98\) (98%), then \(z(\alpha/2) = 2.33\)
If \(1 - \alpha = 0.99\) (99%), then \(z(\alpha/2) = 2.575\)
Check for yourself!
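One way to check these numbers without a printed table is to invert the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `confidence_coefficient` is an illustrative choice, not notation from the slides.

```python
from statistics import NormalDist

def confidence_coefficient(level):
    """Return z(alpha/2): the z-value with area alpha/2 in the upper tail."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.94, 0.95, 0.96, 0.98, 0.99):
    z = confidence_coefficient(level)
    print(f"1 - alpha = {level:.2f}  ->  z(alpha/2) = {z:.3f}")
```

The computed values agree with the table to within table rounding; small differences in the third decimal (for example 2.326 against the tabulated 2.33, or 2.576 against 2.575) come from reading and interpolating a printed table.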
Example: textbook cost
A random sample of 60 students from X University has revealed that their average annual textbook spending is $928. From previous studies, it is known that the standard deviation for annual textbook costs can be taken as $230. Find a 95% confidence interval for the mean annual textbook costs for all students at X University.
Example: textbook costs
Step 1: What is the population parameter of interest?
Step 2: σ = $230 is known. Is a sample of 60 students good enough? Yes: n = 60 is at least 25, so the sampling distribution is approximately normal and we will use the standard normal distribution; the level of confidence is \(1 - \alpha = 0.95\) (95%)
Step 3: \(n = 60\), \(\bar{x}\) = $928
Example: textbook costs
Step 4: 0.95/2 = 0.475, \(z(\alpha/2) = 1.96\) (table)
\[EBM = z(\alpha/2) \cdot \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{230}{\sqrt{60}} = 58.2\]
\(\bar{x} - EBM = 869.8\), \(\bar{x} + EBM = 986.2\)
Step 5: The 95% confidence interval for the population mean 𝜇 is:
from $870 to $986
(same precision as the data)
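Steps 4 and 5 of this example can be reproduced in a few lines. This is a minimal sketch, assuming σ is known and the normal approximation holds; `z_interval` is an illustrative name, and the confidence coefficient is computed from the standard normal distribution rather than read from the table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, level):
    """Confidence interval for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # confidence coefficient z(alpha/2)
    ebm = z * sigma / sqrt(n)                       # error bound for the mean
    return x_bar - ebm, x_bar + ebm

# Textbook-cost example: n = 60, x_bar = $928, sigma = $230, 95% confidence.
lower, upper = z_interval(x_bar=928, sigma=230, n=60, level=0.95)
print(f"from ${lower:.0f} to ${upper:.0f}")   # prints: from $870 to $986
```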
How to decrease the error?
• To decrease the value of EBM (and thus to narrow the confidence interval for μ) there are two possibilities:
(A) Decrease the confidence level. A smaller confidence level results in a smaller \(z(\alpha/2)\) and thus a smaller EBM.
(B) Increase the sample size. A larger value of n means a larger value of \(\sqrt{n}\) in the denominator and thus a smaller EBM.
• Tradeoffs: (A) less certain, (B) more costly
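Both options can be seen numerically. The sketch below reuses σ = $230 and n = 60 from the textbook-cost example; the other confidence levels and sample sizes are hypothetical values chosen only for illustration, and `ebm` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

def ebm(sigma, n, level):
    """Error bound for the mean: z(alpha/2) * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sigma / sqrt(n)

sigma = 230  # from the textbook-cost example

# (A) Fixed n = 60: lowering the confidence level shrinks EBM.
for level in (0.99, 0.95, 0.90):
    print(f"n = 60, level = {level:.2f}: EBM = {ebm(sigma, 60, level):.1f}")

# (B) Fixed 95% level: a larger sample shrinks EBM.
for n in (60, 240, 960):
    print(f"level = 0.95, n = {n}: EBM = {ebm(sigma, n, 0.95):.1f}")
```

Because EBM is proportional to \(1/\sqrt{n}\), the sample size has to be quadrupled to cut the error bound in half, which is why option (B) is the costly one.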
Example: practice
A survey by Future Shop involving 35 households in the area revealed mean spending of $850 on home electronics during the last year. Construct a 98% confidence interval for the mean annual spending on home electronics for all households in the area, if the population standard deviation is known to be $300.
Answer: from $732 to $968.
Estimating the sample size
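• If we wish the error EBM to be smaller than a certain value, \(\varepsilon\), but the confidence level is fixed at \(1 - \alpha\), we can choose the necessary sample size:
\[\varepsilon > EBM = z(\alpha/2) \cdot \frac{\sigma}{\sqrt{n}}\]
Solving for n gives \(\sqrt{n} > \frac{z(\alpha/2) \cdot \sigma}{\varepsilon}\). Thus,
\[n > \left(\frac{z(\alpha/2) \cdot \sigma}{\varepsilon}\right)^2\]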
Estimating the sample size
• The number \(\left(\frac{z(\alpha/2) \cdot \sigma}{\varepsilon}\right)^2\), rounded up to the nearest integer, is denoted by \(n_{min}\): the minimum required sample size.
• Example: a supermarket manager needs to estimate the average weekly grocery spending by his customers at a 90% level of confidence and with an error not exceeding $10. What is the minimum sample size needed, if he knows that the population standard deviation is $60?
Example: grocery shopping
• Solution.
• Given: 1 − 𝛼 = 0.9, 𝜎 = $60, 𝜀 = $10
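• Find: \(n_{min}\)
• First, we have \(z(\alpha/2) = 1.645\)
• Now, we compute:
\[\left(\frac{z(\alpha/2) \cdot \sigma}{\varepsilon}\right)^2 = \left(\frac{1.645 \cdot 60}{10}\right)^2 = 97.4\]
Thus, the minimum required sample size is \(n_{min} = 98\) customers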
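The same calculation can be written as a short function. This is a sketch only: `min_sample_size` is an illustrative name, and the confidence coefficient is computed from the standard normal distribution instead of being read from the table.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(sigma, max_error, level):
    """Smallest n with z(alpha/2) * sigma / sqrt(n) no larger than max_error."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # confidence coefficient
    return ceil((z * sigma / max_error) ** 2)       # round up to a whole unit

# Grocery example: 90% confidence, sigma = $60, error not exceeding $10.
n_min = min_sample_size(sigma=60, max_error=10, level=0.90)
print(n_min)   # prints 98
```

Rounding up rather than to the nearest integer matters: a sample of 97 customers would leave the error bound slightly above $10.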
Example: practice
An insurance company wants to estimate the average distance driven per week by residents of Hamilton, so that the error does not exceed 20 km at the 99% level of confidence. From other studies they know that the population standard deviation can be taken as 100 km. Estimate the sample size needed for this study.
Answer: 166 drivers