Business Statistics:
Lecture 8: Introduction to Estimation & Hypothesis Testing
Agenda
Introduction to Estimation
Point estimation
Interval estimation
Introduction to Hypothesis Testing
Concepts and terminology of hypothesis testing
Null and alternative hypothesis
Type I and type II error
Concepts of Estimation
Objective: determine the approximate value of a population
parameter on the basis of a sample statistic, e.g.
sample mean ($\bar{X}$) is used to estimate the population mean (μ)
sample proportion ($\hat{P}$) is used to estimate the population proportion (p)
Estimator: formula (statistic) that provides the guess of the
population parameter (denoted by uppercase letters: $\bar{X}$, $\hat{P}$)
Estimate: numerical outcome of the estimator once the sample
has been drawn (denoted by lowercase letters: $\bar{x}$, $\hat{p}$)
Two types of estimators:
Point estimator: provides a single value
Interval estimator: provides an interval with a stated level of confidence
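To make the estimator/estimate distinction concrete, here is a small Python sketch: the two functions play the role of the estimators $\bar{X}$ and $\hat{P}$ (rules that work for any sample), and the numbers they return for one drawn sample are the estimates $\bar{x}$ and $\hat{p}$. The population used to generate the sample (mean height 180 cm, σ = 10 cm, proportion 0.68) is hypothetical and only for illustration.

```python
import random
import statistics

def sample_mean(sample):
    """Estimator X̄: a rule that turns any sample into a guess for μ."""
    return statistics.fmean(sample)

def sample_proportion(outcomes):
    """Estimator P̂: share of True outcomes ('successes') in the sample."""
    return sum(outcomes) / len(outcomes)

# Hypothetical population used only to generate one sample.
random.seed(1)
heights = [random.gauss(180, 10) for _ in range(50)]     # assumed μ = 180 cm, σ = 10 cm
has_phone = [random.random() < 0.68 for _ in range(50)]  # assumed p = 0.68

# Once the sample is drawn, the estimators yield numbers: the estimates.
x_bar = sample_mean(heights)
p_hat = sample_proportion(has_phone)
print(f"x̄ = {x_bar:.1f} cm, p̂ = {p_hat:.2f}")
```

Drawing a different sample changes `x_bar` and `p_hat` but not the functions: the estimator is fixed, the estimate varies.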
Point Estimators
Point Estimator: draws inferences about a population by
estimating the value of an unknown parameter using a single value
Example:
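Consider the adult male inhabitants (of the Netherlands, as in the later example); we are interested in their mean height μ in cm. The standard deviation σ is assumed to be 10 cm and the population is assumed to be normal
Draw a random sample $X_1, X_2, \dots, X_n$ from the population, so that $E(X_i) = \mu$ for $i = 1, \dots, n$
Consider two point estimators:
Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Sample median: M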
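One way to compare the two point estimators is by simulation. The sketch below assumes a hypothetical true mean μ = 180 cm, σ = 10 cm and samples of n = 25, and records both estimates for many samples. For a normal population both estimators are centred on μ, but the sample mean varies less from sample to sample; this matches the known large-sample result that the variance of the median is about π/2 times σ²/n, the variance of $\bar{X}$.

```python
import random
import statistics

random.seed(42)
MU, SIGMA, N, REPS = 180.0, 10.0, 25, 5000  # MU is a hypothetical "true" mean

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))     # point estimate from X̄
    medians.append(statistics.median(sample))  # point estimate from M

# Both estimators centre on MU; compare their spread across samples.
print(f"X̄: average {statistics.fmean(means):.2f}, sd {statistics.stdev(means):.2f}")
print(f"M:  average {statistics.fmean(medians):.2f}, sd {statistics.stdev(medians):.2f}")
```

The standard deviation of the simulated sample means should be close to σ/√n = 2 cm, with the medians noticeably more spread out.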
Interval Estimators
Drawbacks of a point estimator:
It is virtually certain that the estimate is wrong: $P(\bar{X} = \mu) = 0$
We need to know how close the estimator is to the parameter of interest: $P(|\bar{X} - \mu| < \varepsilon) = \,?$ for a chosen margin ε
Therefore we use an interval estimator: it draws
inferences about a population by estimating the value of an unknown parameter using an interval
The width of the interval is related to the confidence (probability) that the interval includes the true parameter
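If $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$ (the assumption used in the derivation of the interval estimator), the probability asked for above equals $2\Phi(\varepsilon\sqrt{n}/\sigma) - 1$, where Φ is the standard normal CDF. The sketch below evaluates it with Python's `statistics.NormalDist`; the helper name `prob_within` and the margin ε = 1 cm are hypothetical, while σ = 10 cm and n = 400 come from the height example. Setting ε = 0 reproduces $P(\bar{X} = \mu) = 0$.

```python
from math import sqrt
from statistics import NormalDist

def prob_within(eps, sigma, n):
    """P(|X̄ - μ| < eps) when X̄ ~ N(μ, σ/√n)."""
    se = sigma / sqrt(n)   # standard deviation of X̄
    z = eps / se           # margin expressed in standard errors
    return 2 * NormalDist().cdf(z) - 1

p = prob_within(eps=1.0, sigma=10.0, n=400)
print(round(p, 4))  # 0.9545
```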
Interval Estimator for μ (σ known)
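Derivation:
Suppose that X ~ N(μ, σ) or n > 30 (CLT)
Then it follows (exactly, or approximately by the CLT) that $\bar{X} \sim N(\mu_{\bar{X}}, \sigma_{\bar{X}}) = N(\mu, \sigma/\sqrt{n})$, so
$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(-z_{\alpha/2}\,\sigma/\sqrt{n} < \bar{X} - \mu < z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$$
$$P\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$$
The interval $[\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}]$ is a random interval
It includes (covers) the parameter μ with probability 1 – α
Example: 1 – α = 0.95 gives $z_{\alpha/2} = z_{.025} = 1.96$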
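Critical values such as $z_{.025} = 1.96$ can also be computed instead of read from a table: $z_{\alpha/2}$ is the point with upper-tail area α/2 under the standard normal curve, i.e. the inverse CDF evaluated at 1 – α/2. A minimal sketch using `statistics.NormalDist().inv_cdf` (Python 3.8+); the helper name `z_critical` is ours, not from the lecture.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{α/2}: the value with upper-tail area α/2 under N(0, 1)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%}: z = {z_critical(conf):.3f}")
```

This prints the familiar values 1.645, 1.960 and 2.576 for 90%, 95% and 99% confidence.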
Interval Estimator for μ (σ known)
Interpretation: ‘with repeated sampling from this population, the proportion of
values of $\bar{X}$ for which the interval includes the population mean μ is equal to 1 – α’
The interval $[\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}]$ is called the (1 – α)×100% confidence interval (CI) estimator of μ
1 – α is the confidence level (the probability that the interval estimator covers μ)
$\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n}$: lower confidence limit (LCL)
$\bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}$: upper confidence limit (UCL)
If we replace $\bar{X}$ by the observed value $\bar{x}$, we get the (1 – α)×100% confidence interval estimate for μ
Alternative notation: $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ (estimator) and $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ (estimate)
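The repeated-sampling interpretation can be checked by simulation. The sketch below assumes a hypothetical population (μ = 180, σ = 10) and samples of n = 25; it draws many samples, builds the 95% interval $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ from each, and counts how often the interval contains μ. The share should be close to 0.95, even though any single interval either covers μ or does not.

```python
import random
import statistics
from math import sqrt

random.seed(7)
MU, SIGMA, N, REPS = 180.0, 10.0, 25, 10_000  # hypothetical population and design
z = statistics.NormalDist().inv_cdf(0.975)    # z_{.025} ≈ 1.96 for a 95% CI
half_width = z * SIGMA / sqrt(N)

covered = 0
for _ in range(REPS):
    x_bar = statistics.fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    # Each sample gives its own interval; count those that contain MU.
    if x_bar - half_width <= MU <= x_bar + half_width:
        covered += 1

coverage = covered / REPS
print(f"share of intervals covering μ: {coverage:.3f}")
```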
Interval Estimator for μ (σ known)
Example (mean height of adult male inhabitants of the Netherlands – cont.):
Suppose n = 400 and $\bar{x}$ = 182 cm (σ = 10 cm)
Question: compute the 95.44% CI estimate for μ
Solution:
(1 – α)×100% CI estimate: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
$\bar{x} = 182$; $\sigma/\sqrt{n} = 10/\sqrt{400} = 0.5$
1 – α = 0.9544 ⇒ α/2 = 0.0228 ⇒ $z_{\alpha/2}$ = 2.0 (Table 4)
95.44% CI estimate for μ: 182 ± 2.0×0.5 = 182 ± 1 cm
LCL = 181 cm; UCL = 183 cm
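The same calculation as a short Python sketch (the function name `z_interval` is hypothetical). It computes $z_{\alpha/2}$ from the confidence level instead of reading Table 4, so $z_{.0228}$ comes out as about 1.999 rather than the rounded 2.0; the limits still round to 181 and 183 cm.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """(1 – α)·100% CI estimate for μ with σ known: x̄ ± z_{α/2}·σ/√n."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{α/2}
    margin = z * sigma / sqrt(n)             # half-width of the interval
    return x_bar - margin, x_bar + margin

lcl, ucl = z_interval(x_bar=182, sigma=10, n=400, confidence=0.9544)
print(f"LCL = {lcl:.2f} cm, UCL = {ucl:.2f} cm")
```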
The Error of Estimation
Sampling error can be defined as the difference between an estimator (e.g. $\bar{X}$) and a parameter (e.g. μ); it is also called the error of estimation
Hypothesis & Hypothesis Testing
Hypothesis
Answer to a research question or assumption made about
a population parameter (Not a sample estimate!)
population mean
population proportion
Example: The mean monthly cell phone bill in this city is μ = $42
Example: The proportion of adults in this city with cell phones is p = .68
Hypothesis Testing
Determine whether there is enough statistical evidence in
favor of a certain belief or hypothesis about a parameter
Concepts of Hypothesis Testing
Overview of critical concepts in hypothesis testing:
1. There are two hypotheses
H0 (null hypothesis) & H1 (alternative hypothesis)
2. Testing procedure starts from assumption that H0 is true
3. Goal of the process is to determine whether there is enough evidence in favor of H1
4. There are two possible decisions:
‘there is enough evidence to reject H0 in favor of H1’
‘there is not enough evidence to reject H0 in favor of H1’
5. There are two possible errors:
Type I error: Reject a true H0; P(Type I error) = α
Type II error: Do not reject a false H0; P(Type II error) = β
The Null Hypothesis, H0
States the assumption (numerical) to be tested
Example: The average number of TV sets in U.S. homes is at
least three (H0: μ ≥ 3)
Is always about a population parameter, not about a sample statistic
The Null Hypothesis, H0 (continued)
Begin with the assumption that the null hypothesis is true
Always contains the “=”, “≤” or “≥” sign
May or may not be rejected
The Alternative Hypothesis, HA (also denoted H1)
Is the opposite of the null hypothesis
e.g.: The average number of TV sets in U.S. homes is less than 3 (HA: μ < 3)
Never contains the “=”, “≤” or “≥” sign
May or may not be accepted
HA is generally the hypothesis that is believed (or needs to be supported) by the researcher
Hypothesis Testing Process
Claim: the population mean age is 50 (null hypothesis H0: μ = 50)
Now select a random sample from the population
Suppose the sample mean age is 20: $\bar{x}$ = 20
Is $\bar{x}$ = 20 likely if μ = 50?
If not likely, REJECT the null hypothesis
Reason for Rejecting H0
[Figure: sampling distribution of $\bar{X}$, centred at $\mu_{\bar{X}} = \mu = 50$ if H0 is true, with $\bar{x}$ = 20 far in the lower tail]
It is unlikely that we would get a sample mean of this value (20) ...
... if in fact 50 were the population mean ...
... then we reject the null hypothesis that μ = 50.
How far is the value of the sample statistic from the
population value under H0?
We choose a critical value (cut-off value) on the
sampling distribution beyond which the sample statistic is considered very far from the null hypothesis value and
thus not likely.
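"How far" can be measured in a standard unit: the number of standard errors between $\bar{x}$ and the H0 value μ0, $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$, assuming σ is known as in the estimation part of the lecture. The slide gives neither σ nor n, so σ = 40 and n = 16 below are purely hypothetical. Under these assumptions a sample mean of 20 lies 3 standard errors below μ0 = 50, and a deviation at least that large in either direction has a probability of only about 0.0027 if H0 is true.

```python
from math import sqrt
from statistics import NormalDist

# From the slide: H0 value and observed sample mean.
mu_0, x_bar = 50, 20
# Hypothetical inputs: σ and n are NOT given on the slide.
sigma, n = 40, 16

se = sigma / sqrt(n)        # standard error of X̄ under H0
z = (x_bar - mu_0) / se     # distance from mu_0 in standard errors
# Probability of a sample mean at least this far from mu_0 (either side) if H0 is true
p_two_sided = 2 * NormalDist().cdf(-abs(z))
print(f"z = {z:.1f}, probability = {p_two_sided:.4f}")
```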
Level of Significance, α
In statistics, a critical value is the value corresponding
to a given significance level
Level of significance defines unlikely values of sample
statistic if null hypothesis is true
Defines rejection region of the sampling distribution
Is designated by α (level of significance)
Typical values are .01, .05, or .10
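Level of Significance and the Rejection Region
Level of significance = α
[Figure: three sampling distributions with the rejection region shaded beyond the critical value(s)]
Lower tail test: H0: μ ≥ 3 vs HA: μ < 3; rejection region of area α in the lower tail, below the critical value
Upper tail test: H0: μ ≤ 3 vs HA: μ > 3; rejection region of area α in the upper tail, above the critical value
Two tailed test: H0: μ = 3 vs HA: μ ≠ 3; rejection regions of area α/2 in each tail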
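The three rejection regions translate into simple decision rules once the sample result is standardized as $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$, i.e. a z test assuming σ is known. A minimal sketch; the function `reject_h0` and the test statistic z = −1.8 are hypothetical.

```python
from statistics import NormalDist

def reject_h0(z, alpha, tail):
    """Decide using the rejection region; tail is 'lower', 'upper' or 'two'."""
    std_normal = NormalDist()
    if tail == "lower":   # H0: μ ≥ μ0 vs HA: μ < μ0, area α in the lower tail
        return z < std_normal.inv_cdf(alpha)
    if tail == "upper":   # H0: μ ≤ μ0 vs HA: μ > μ0, area α in the upper tail
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "two":     # H0: μ = μ0 vs HA: μ ≠ μ0, area α/2 in each tail
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

# A hypothetical test statistic z = -1.8 at α = 0.05:
for tail in ("lower", "upper", "two"):
    print(tail, reject_h0(-1.8, 0.05, tail))
```

With z = −1.8 and α = 0.05 only the lower tail test rejects H0: −1.8 lies below −1.645, but |−1.8| does not exceed 1.96. The same data can thus lead to different decisions depending on HA.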
Errors in Making Decisions
Type I Error
Reject a true null hypothesis
The probability of Type I Error is α
Called level of significance of the test
Set by researcher in advance
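Because α is fixed in advance, a test applied when H0 is actually true should wrongly reject it in roughly a fraction α of samples. The simulation sketch below checks this for a two tailed z test at α = 0.05; μ0 = 3, σ = 1 and n = 30 are hypothetical values.

```python
import random
import statistics
from math import sqrt

random.seed(3)
MU_0, SIGMA, N, REPS = 3.0, 1.0, 30, 10_000  # hypothetical; samples drawn with H0 true
Z_CRIT = statistics.NormalDist().inv_cdf(0.975)  # two tailed test at α = 0.05

rejections = 0
for _ in range(REPS):
    x_bar = statistics.fmean([random.gauss(MU_0, SIGMA) for _ in range(N)])
    z = (x_bar - MU_0) / (SIGMA / sqrt(N))
    if abs(z) > Z_CRIT:
        rejections += 1  # a Type I error: rejecting a true H0

type_i_rate = rejections / REPS
print(f"estimated P(Type I error) ≈ {type_i_rate:.3f}")
```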
Errors in Making Decisions (continued)
Type II Error
Fail to reject a false null hypothesis
The probability of Type II Error is β
Outcomes and Probabilities
Possible Hypothesis Test Outcomes (Key: Outcome (Probability))
Decision            | State of Nature: H0 True | State of Nature: H0 False
Do Not Reject H0    | No error (1 – α)         | Type II Error (β)
Reject H0           | Type I Error (α)         | No Error: Power (1 – β)
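Unlike α, β (and hence the power 1 – β) is not set by the researcher; it depends on how far the true mean lies from the H0 value. A sketch for the lower tail TV example, H0: μ ≥ 3 vs HA: μ < 3, under hypothetical assumptions that are not taken from the slides: σ = 0.8, n = 64, α = 0.05 and a true mean of 2.8. The helper name `beta_lower_tail` is ours.

```python
from math import sqrt
from statistics import NormalDist

def beta_lower_tail(mu_0, mu_true, sigma, n, alpha):
    """P(Type II error) for H0: μ ≥ mu_0 vs HA: μ < mu_0 with σ known."""
    se = sigma / sqrt(n)
    # H0 is rejected when x̄ falls below this cut-off value
    x_crit = mu_0 + NormalDist().inv_cdf(alpha) * se
    # β = P(X̄ ≥ x_crit) when the true mean is mu_true (so H0 is false)
    return 1 - NormalDist(mu_true, se).cdf(x_crit)

beta = beta_lower_tail(mu_0=3.0, mu_true=2.8, sigma=0.8, n=64, alpha=0.05)
print(f"β ≈ {beta:.3f}, power ≈ {1 - beta:.3f}")
```

Under these assumptions β is about 0.36 and the power about 0.64; a true mean further below 3, or a larger n, would lower β.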