hypothesis tests and confidence intervals€¦ · johan a. elkink (ucd) hypothesis tests 16 october...
TRANSCRIPT
Hypothesis tests andconfidence intervals
Johan A. Elkink
University College Dublin
16 October 2014
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 1 / 31
Outline
1 Confidence intervals
2 Hypothesis tests
3 Sample size and errors
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 2 / 31
Confidence intervals
Outline
1 Confidence intervals
2 Hypothesis tests
3 Sample size and errors
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 3 / 31
Confidence intervals
Standard error and z-score
From the previous two lectures:
zi =xi − x̄
σx
The z-distribution is a normal distribution with
mean 0 and variance 1.
S .E . = σx̄ =σx√n
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 4 / 31
Confidence intervals
Margin of error
m = zc · σx̄ = zc · σx√n
zc is a relevant critical value of z , for example,
to get a 95% confidence interval, zc = 1.96.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 5 / 31
Confidence intervals
Confidence interval
CI = x̄ ±m = x̄ ± zcσx√n
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 6 / 31
Confidence intervals
Example
A sample of 100 respondents shows an average
sympathy score for politician A of 35 (on a
0-100 scale), with a variance of 10. What is the
95% confidence interval?
CI = x̄ ± zcσx√n
= 35± 1.96 ·√
10√100
= 35± 0.62
Thus the confidence interval is [34.38; 35.62].
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 7 / 31
Confidence intervals
Example
A sample of 100 respondents shows an average
sympathy score for politician A of 35 (on a
0-100 scale), with a variance of 10. What is the
95% confidence interval?
CI = x̄ ± zcσx√n
= 35± 1.96 ·√
10√100
= 35± 0.62
Thus the confidence interval is [34.38; 35.62].
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 7 / 31
Confidence intervals
CI of a proportion: example
Say, we want to look at the voter support for
Vladimir Putin. 66% of 1600 respondents in a
survey say they will vote for Putin.
SE (p̂) = σp̂ =√
p̂(1− p̂)/(n − 1) =√
(.66 · .34)/1599 = .012
CI95% = [p̂ − 1.96σp̂); p̂ + 1.96σp̂] = [.637; .683]
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 8 / 31
Confidence intervals
CI of a proportion: example
Say, we want to look at the voter support for
Vladimir Putin. 66% of 1600 respondents in a
survey say they will vote for Putin.
SE (p̂) = σp̂ =√
p̂(1− p̂)/(n − 1) =√
(.66 · .34)/1599 = .012
CI95% = [p̂ − 1.96σp̂); p̂ + 1.96σp̂] = [.637; .683]
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 8 / 31
Confidence intervals
CI of a proportion: example
Say, we want to look at the voter support for
Vladimir Putin. 66% of 1600 respondents in a
survey say they will vote for Putin.
SE (p̂) = σp̂ =√
p̂(1− p̂)/(n − 1) =√
(.66 · .34)/1599 = .012
CI95% = [p̂ − 1.96σp̂); p̂ + 1.96σp̂] = [.637; .683]
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 8 / 31
Confidence intervals
CI of a proportion: example
0.62 0.64 0.66 0.68 0.70
05
10
15
20
25
30
35
Support Putin
Density
Figure: Confidence interval of support for Putin
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 9 / 31
Confidence intervals
CI: caution
Note that the confidence interval only takes into
account the sampling error.
All other errors are ignored, such as
measurement error or non-response bias.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 10 / 31
Confidence intervals
Exercise
(Verzani 7.6) A student wishes to find the
proportion of left-handed people. She surveys
100 fellow students and finds that only 5 are
left-handed. Does a 95% confidence interval
contain the value of p = 110?
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 11 / 31
Hypothesis tests
Outline
1 Confidence intervals
2 Hypothesis tests
3 Sample size and errors
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 12 / 31
Hypothesis tests
Hypotheses
The research hypothesis, or alternative hypothesis, isalways contrasted to the null hypothesis.
Null hypothesis (H0): states the assumption that thereis no effect or difference (depending on the researchquestion).
Alternative hypothesis (H1 or Ha): states the research
hypothesis which we will test assuming the null
hypothesis.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 13 / 31
Hypothesis tests
Hypotheses
The research hypothesis, or alternative hypothesis, isalways contrasted to the null hypothesis.
Null hypothesis (H0): states the assumption that thereis no effect or difference (depending on the researchquestion).
Alternative hypothesis (H1 or Ha): states the research
hypothesis which we will test assuming the null
hypothesis.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 13 / 31
Hypothesis tests
Hypotheses
The research hypothesis, or alternative hypothesis, isalways contrasted to the null hypothesis.
Null hypothesis (H0): states the assumption that thereis no effect or difference (depending on the researchquestion).
Alternative hypothesis (H1 or Ha): states the research
hypothesis which we will test assuming the null
hypothesis.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 13 / 31
Hypothesis tests
Null hypothesis: example
Null hypothesis: men have the same thermometer scorefor Bertie Ahern as do women.
Alternative hypothesis: women have a higherthermometer score for Bertie Ahern than do men.
H0 : µwomen = µmen
H1 : µwomen > µmen
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 14 / 31
Hypothesis tests
Null hypothesis: example
Null hypothesis: men have the same thermometer scorefor Bertie Ahern as do women.
Alternative hypothesis: women have a higherthermometer score for Bertie Ahern than do men.
H0 : µwomen = µmen
H1 : µwomen > µmen
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 14 / 31
Hypothesis tests
Null hypothesis: example
Null hypothesis: men have the same thermometer scorefor Bertie Ahern as do women.
Alternative hypothesis: women have a higherthermometer score for Bertie Ahern than do men.
H0 : µwomen = µmen
H1 : µwomen > µmen
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 14 / 31
Hypothesis tests
Null hypothesis: example
Null hypothesis: men have the same thermometer scorefor Bertie Ahern as do women.
Alternative hypothesis: women have a higherthermometer score for Bertie Ahern than do men.
H0 : µwomen = µmen
H1 : µwomen > µmen
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 14 / 31
Hypothesis tests
Null hypothesis as basis
Why do we need to assume one of the two
hypotheses to test the other? ...
... Because we need to have some estimate of
the sampling distribution to determine how likely
our outcome is. For this estimate, we use the
null hypothesis.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 15 / 31
Hypothesis tests
Null hypothesis as basis
Why do we need to assume one of the two
hypotheses to test the other? ...
... Because we need to have some estimate of
the sampling distribution to determine how likely
our outcome is. For this estimate, we use the
null hypothesis.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 15 / 31
Hypothesis tests
Hypothesis testing
Given the sampling distribution, depending on
the distribution and test, we can calculate a test
statistic.
We can then calculate the probability of
observing the observed value or more extreme, in
the direction of the alternative hypothesis, given
the assumptions of the null hypothesis.
This is the p-value.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 16 / 31
Hypothesis tests
Hypothesis testing
Given the sampling distribution, depending on
the distribution and test, we can calculate a test
statistic.
We can then calculate the probability of
observing the observed value or more extreme, in
the direction of the alternative hypothesis, given
the assumptions of the null hypothesis.
This is the p-value.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 16 / 31
Hypothesis tests
Hypothesis testing
Given the sampling distribution, depending on
the distribution and test, we can calculate a test
statistic.
We can then calculate the probability of
observing the observed value or more extreme, in
the direction of the alternative hypothesis, given
the assumptions of the null hypothesis.
This is the p-value.Johan A. Elkink (UCD) hypothesis tests 16 October 2014 16 / 31
Hypothesis tests
Hypothesis testing: old-fashionedWe used to calculate the critical value given the teststatistic and the sampling distribution where we wouldreject the null hypothesis.
This critical value you can find in tables for the specificprobability distribution.
It’s easier to just calculate p-values.
Note that you can reject a null hypothesis, but
you can never “accept” a null hypothesis. You
either reject or you do not.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 17 / 31
Hypothesis tests
Hypothesis testing: old-fashionedWe used to calculate the critical value given the teststatistic and the sampling distribution where we wouldreject the null hypothesis.
This critical value you can find in tables for the specificprobability distribution.
It’s easier to just calculate p-values.
Note that you can reject a null hypothesis, but
you can never “accept” a null hypothesis. You
either reject or you do not.Johan A. Elkink (UCD) hypothesis tests 16 October 2014 17 / 31
Hypothesis tests
Hypothesis testing: p-values
Density
Figure: Calculating p-values
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 18 / 31
Hypothesis tests
Hypothesis testing: critical values
The α-level refers to critical value, the maximumacceptable probability that you reject a null hypothesisthat is correct.
Typical alpha values:
α = .05: most common in the social sciences
α = .01: more common in pharmaceutical research
α = .10: less picky social scientists
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 19 / 31
Hypothesis tests
Hypothesis testing: critical values
The α-level refers to critical value, the maximumacceptable probability that you reject a null hypothesisthat is correct.
Typical alpha values:
α = .05: most common in the social sciences
α = .01: more common in pharmaceutical research
α = .10: less picky social scientists
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 19 / 31
Hypothesis tests
t-distribution
Note that the z-test assumes that σ2x is known.
The t-distribution is used instead of the
z-distribution when the variance σ2x is estimated.
This distribution converges to the z-distribution
as n gets larger.
The Student t-distribution is invented in 1908 by William S. Gosset,
a Guinness employee, who published under pseudonym Student.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 20 / 31
Hypothesis tests
t-distribution
Note that the z-test assumes that σ2x is known.
The t-distribution is used instead of the
z-distribution when the variance σ2x is estimated.
This distribution converges to the z-distribution
as n gets larger.
The Student t-distribution is invented in 1908 by William S. Gosset,
a Guinness employee, who published under pseudonym Student.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 20 / 31
Hypothesis tests
Significance test for a mean
H0 : µ = 0
H1 : µ 6= 0
t =x̄ − µ0
σ̂x̄=
x̄ − 0
σ̂x/√n
tc(α=.05) = ±1.96 (for large n)
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 21 / 31
Hypothesis tests
Significance test for a mean
H0 : µ = 0
H1 : µ 6= 0
t =x̄ − µ0
σ̂x̄=
x̄ − 0
σ̂x/√n
tc(α=.05) = ±1.96 (for large n)
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 21 / 31
Hypothesis tests
Significance test for a mean
H0 : µ = 0
H1 : µ 6= 0
t =x̄ − µ0
σ̂x̄=
x̄ − 0
σ̂x/√n
tc(α=.05) = ±1.96 (for large n)
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 21 / 31
Sample size and errors
Outline
1 Confidence intervals
2 Hypothesis tests
3 Sample size and errors
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 22 / 31
Sample size and errors
Hypothesis testing: mistakes
Type-I error: null hypothesis is falsely rejected.
Type-II error: null hypothesis should have been
rejected, but was not.
We usually try to be conservative and prefer the
risk of Type-II errors over that of Type-I errors.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 23 / 31
Sample size and errors
Hypothesis testing: mistakes
Type-I error: null hypothesis is falsely rejected.
Type-II error: null hypothesis should have been
rejected, but was not.
We usually try to be conservative and prefer the
risk of Type-II errors over that of Type-I errors.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 23 / 31
Sample size and errors
Hypothesis testing: mistakes
Type-I error: null hypothesis is falsely rejected.
Type-II error: null hypothesis should have been
rejected, but was not.
We usually try to be conservative and prefer the
risk of Type-II errors over that of Type-I errors.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 23 / 31
Sample size and errors
Power
Power: probability of rejecting a hypothesis that
is false
1− P(Type II error)
The power of a test increases when:
the true value is further from the null
hypothesis value;
the variance is lower;
the sample size is larger.
(Davidson & MacKinnon 1999: 126)
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 24 / 31
Sample size and errors
Power
Power: probability of rejecting a hypothesis that
is false
1− P(Type II error)
The power of a test increases when:
the true value is further from the null
hypothesis value;
the variance is lower;
the sample size is larger.
(Davidson & MacKinnon 1999: 126)
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 24 / 31
Sample size and errors
p-value
The p-value is the probability of a Type I error
when rejecting the null hypothesis.
You can say a test is “statistically significant” if
p < α, but the p-value contains more
information by itself.(Davidson & MacKinnon 1999: 128)
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 25 / 31
Sample size and errors
p-value
The p-value is the probability of a Type I error
when rejecting the null hypothesis.
You can say a test is “statistically significant” if
p < α, but the p-value contains more
information by itself.(Davidson & MacKinnon 1999: 128)
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 25 / 31
Sample size and errors
α = .05Note that his value is absolutely arbitrary and just habit since thepublication of Fisher (1923).
The p-value is, one could argue, just a complicated way ofmeasuring the sample size.
“Another interesting example (...) is the propensity for published
studies to contain a disproportionally large number of Type I errors;
studies with statistically significant results tend to get published,
whereas those with insignificant results do not.” (Kennedy 2008:
61)
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 26 / 31
Sample size and errors
α = .05Note that his value is absolutely arbitrary and just habit since thepublication of Fisher (1923).
The p-value is, one could argue, just a complicated way ofmeasuring the sample size.
“Another interesting example (...) is the propensity for published
studies to contain a disproportionally large number of Type I errors;
studies with statistically significant results tend to get published,
whereas those with insignificant results do not.” (Kennedy 2008:
61)
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 26 / 31
Sample size and errors
α = .05Note that his value is absolutely arbitrary and just habit since thepublication of Fisher (1923).
The p-value is, one could argue, just a complicated way ofmeasuring the sample size.
“Another interesting example (...) is the propensity for published
studies to contain a disproportionally large number of Type I errors;
studies with statistically significant results tend to get published,
whereas those with insignificant results do not.” (Kennedy 2008:
61)
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 26 / 31
Sample size and errors
α = .05
(Gerber & Malhotra 2006)
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 27 / 31
Sample size and errors
Optimal sample size
Given a margin of error m, we can calculate the
minimum required sample size as:
n∗ =
(zcσxm
)2
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 28 / 31
Sample size and errors
Exercise
Suppose a test evaluating students’ motivation, attitudetowards study, and study habits, produces a score on a0-200 scale. The mean score for Irish students is 115.
A teacher expects older students to score higher and
takes a sample of 25 student who are 30 or older. He
finds a mean score x̄ = 127.8 and a variance of 30.
State the hypotheses and calculate the p-value for a
one-sided t-test.
(Moore, McCabe & Craig 2012: 376)
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 29 / 31
Sample size and errors
Exercise
The owner of a car which has a board computer that calculates theefficiency of the engine in km/l performs, in addition, manualcalculation by dividing the distance driven by the liters used eachtime she fills her tank. After 20 times, she has the followingdifferences between her estimates and those of her car:
5.0 6.5 -0.6 1.7 3.7 4.5 8.0 2.2 4.9 3.04.4 0.1 3.0 1.1 1.1 5.0 2.1 3.7 -0.6 -4.2
State the hypotheses and perform a t-test whether her estimates
differ significantly from those of her car.
(Moore, McCabe & Craig 2012: 377)
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 30 / 31
Sample size and errors
Exercise
Open bes.dta in SPSS.
Perform a t-test whether the mean of lr is
different from 5 (the center of the scale).
In the 2005 British parliamentary elections,
Labour won 40.7% of the vote. Test
whether the percentage in the sample
(labour) differs.
Johan A. Elkink (UCD) hypothesis tests 16 October 2014 31 / 31