Sociology 5811: Lecture 9: CI / Hypothesis Tests
Copyright © 2005 by Evan Schofer
Do not copy or distribute without permission
Announcements
• Problem Set #3 due next week
• Problem set posted on course website
• We are a bit ahead of reading assignments in Knoke book
• Try to keep up; read ahead if necessary
Review: Confidence Intervals
• General formula for Confidence Interval:
C.I.: $\bar{Y} \pm Z_{\alpha/2}(\sigma_{\bar{Y}})$
• Where:
• Y-bar is the sample mean
• Sigma sub-Y-bar is the standard error of the mean
• Z (alpha/2) is the critical Z-value for a given level of confidence
– If you want 90%, look up Z for 45% (90%/2, the area on each side of the mean)
– See Knoke, Figure 3.5 on page 87 for info
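The general formula can be sketched in Python using only the standard library. This is an illustrative sketch, not part of the lecture: the function name and the input values (mean of 50, standard error of 2) are hypothetical, and the critical Z comes from `statistics.NormalDist` rather than a printed table.

```python
from statistics import NormalDist

def z_confidence_interval(mean, std_error, confidence=0.95):
    """Return (lower, upper) bounds of a large-N confidence interval."""
    alpha = 1.0 - confidence
    # The critical Z leaves alpha/2 in each tail of the standard normal curve.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z_crit * std_error
    return mean - margin, mean + margin

# Hypothetical inputs, chosen only for illustration.
low, high = z_confidence_interval(mean=50.0, std_error=2.0, confidence=0.90)
print(f"90% CI: {low:.2f} to {high:.2f}")
```

For a 90% interval the critical Z is about 1.645, so the margin is about 1.645 standard errors on each side of the mean.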
Small N Confidence Intervals
• Issue: What if N is not large?
• The sampling distribution may not be normal
• Z-distribution probabilities don’t apply…
• Standard CI formula doesn’t work
• Solution: Use the “T-Distribution”
• A different curve that accurately approximates the shape of the sampling distribution for small N
• Result: We can look up values in a “t-table” to determine probabilities associated with a # of standard deviations from the mean.
Confidence Intervals for Small N
• Small N C.I. Formula:
C.I.: $\bar{Y} \pm t_{\alpha/2}(\hat{\sigma}_{\bar{Y}})$
• Yields accurate results, even if N is not large
• Again, the standard error can be estimated by the sample standard deviation:
C.I.: $\bar{Y} \pm t_{\alpha/2}\,\frac{s}{\sqrt{N}}$
T-Distributions
• The T-distribution is a “family” of distributions
• In a T-Distribution table, you’ll find many T-distributions to choose from
– Basically, the shape of the sampling distribution varies with the size of your sample
• You need a specific t-distribution depending on sample size
• One t-distribution for each “degree of freedom”
– Also called “df” or “DofF”
• Which T-distribution should you use?
• For confidence intervals: Use T-distribution for df = N - 1
• Ex: If N = 15, then look at T-distribution for df = 14.
Looking Up T-Tables
[Figure: t-table lookup. Choose the correct df (N-1), choose the desired probability for α/2, then find the t-value in the correct row and column.]
• Interpretation is just like a Z-score.
• 2.145 = number of standard errors for C.I.!
Answering Questions…
• Knowledge of the standard error allows us to begin answering questions about populations
• Example: National educational standard requires all schools to maintain a test score average of 60
• You observe that a sample (N=16, s=6) has a mean of 62
• Question: Are you confident that the school population is above the national standard?
• We know Y-bar for the sample, but what about for the whole school?
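• Are we confident that μ > 60?
Question: Is μ > 60?
• Strategy 1: Construct a confidence interval around Y-bar
• And, see if the bounds fall above 60
• Visually: Confident that μ > 60:
[Figure: confidence interval around Y-bar on a scale from 58 to 66, lying entirely above 60]
• Visually: μ might be 60 or less
[Figure: confidence interval around Y-bar on a scale from 58 to 66, reaching down to 60 or below]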
Question: Is μ > 60?
• Strategy 1: Construct a confidence interval around Y-bar
– Let’s choose a desired confidence level of .95
– N of 16 is “small”… we must use the t-distribution, not the Z-distribution
– Look up t-value for 15 degrees of freedom (N-1).
Looking Up T-Tables
[Figure: t-table lookup. Choose the correct df (N-1) = 15, choose the desired probability for α/2, then find the t-value in the correct row and column.]
Result: t = 2.131
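Question: Is μ > 60?
• Strategy 1: Construct a confidence interval around Y-bar
C.I.: $\bar{Y} \pm t_{\alpha/2}(\hat{\sigma}_{\bar{Y}}) = \bar{Y} \pm t_{\alpha/2}\,\frac{s}{\sqrt{N}}$
$= 62 \pm 2.131\left(\frac{6}{\sqrt{16}}\right) = 62 \pm 2.131(1.5) = 62 \pm 3.20$
[Figure: the confidence interval around Y-bar = 62 on a scale from 58 to 66]
• CI is 58.80 to 65.20! We aren’t confident μ > 60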
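The Strategy 1 calculation can be reproduced with a short Python sketch. It uses the values stated in the example (Y-bar = 62, s = 6, N = 16) and the tabled t = 2.131 for df = 15; the function name is illustrative only.

```python
import math

def t_confidence_interval(mean, s, n, t_crit):
    """Small-N confidence interval: mean +/- t_crit * s / sqrt(n)."""
    std_error = s / math.sqrt(n)      # estimated standard error of the mean
    margin = t_crit * std_error       # half-width of the interval
    return mean - margin, mean + margin

# Inputs from the school example; t_crit is the t-table value for df = 15.
low, high = t_confidence_interval(mean=62.0, s=6.0, n=16, t_crit=2.131)
print(f"95% CI: {low:.2f} to {high:.2f}")
print("Interval lies entirely above 60:", low > 60)
```

Because the lower bound falls below 60, the interval does not support the claim that μ > 60 at the 95% level.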
Question: Is μ > 60?
• Note #1: Results would change if we used a different confidence level
• 95% and 50% CIs yield different conclusions:
• Idea: Wouldn’t it be nice to know exactly which CI would describe the distance from Y-bar to μ?
• i.e., to calculate the exact probability of Y-bar falling a certain distance from μ?
[Figure: confidence intervals around Y-bar on a scale from 58 to 66]
Question: Is μ > 60?
• Note #2: We typically draw CIs around Y-bar
– But, we can also get the same result focusing on our comparison point (Y = 60)
• Example: If 60 is outside of CI around Y-bar
• Then, Y-bar is outside of the CI around 60
[Figure: two number lines from 58 to 66, one with the CI drawn around Y-bar and one with the CI drawn around 60]
Question: Is μ > 60?
• The critical issue is: How far is Y-bar from 60?
– Is it “far” compared to the width of the sampling distribution?
• Ex: Y-bar is more than 2 Standard Errors from 60?
• In which case, the school probably exceeds the standard
– Or, is it relatively close?
• Ex: Y-bar is only .5 Standard Errors from 60
• In which case we aren’t confident…
– Note: If we know the sampling distribution is normal (or t-distributed), we can convert SE’s to a probability
Question: Is μ > 60?
• Strategy 2: Determine the probability of Y-bar = 62, if μ is really 60 or less
• Procedure:
– 1. Use Y=60 as a reference point
– 2. Determine how far Y-bar is from 60, measured in Standard Errors
• Which we can convert to a probability
– 3. Issue: Is it likely to observe a Y-bar as high as 62?
• If this is common to observe, even when μ = 60 (or less), then we can’t be confident that μ > 60!
• But, if that is a rare event, we can be confident that μ > 60!
Question: Is μ > 60?
• Strategy 2: Look at sampling distribution
• Confident that μ is not 60 or less:
[Figure: sampling distribution on a scale from 58 to 66. Caption: μ is unlikely to really be 60… because Y-bar usually falls near the center of the sampling distribution!]
• Visually: μ might easily be <60
[Figure: sampling distribution on a scale from 58 to 66. Caption: In this case, it is common to get Y-bars of 62 or even higher]
Question: Is μ > 60?
• Issue: How do we tell where Y-bar falls within the sampling distribution?
• Strategy: Compute a Z-score
• Recall: Z-scores help locate the position of a case within a distribution
• It can tell us how far a Y-bar falls from the center of the sampling distribution
• In units of “standard errors”!
• Probability can be determined from a Z-table
• Note: for small N, we call it a t-score, look up in a t-table.
Question: Is μ > 60?
• Note: We use a slightly modified Z formula
“Old” formula: $Z_i = \frac{Y_i - \bar{Y}}{s_Y}$
New formula: $Z = \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}}$
• “Old” formula calculates # standard deviations a case falls from the sample mean
• From Y-sub-i to Y-bar
• New formula tells the number of standard errors a mean estimate falls from the population mean
• Distance from Y-bar to μ in the sampling distribution
• In this case we compare to hypothetical μ = 60.
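Question: Is μ > 60?
• Let’s calculate how far Y-bar falls from μ
– Since N is small, we call it a “t-score” or “t-value”
$t = \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}} = \frac{62 - 60}{\hat{\sigma}_{\bar{Y}}} = \frac{2}{\hat{\sigma}_{\bar{Y}}}$
$\hat{\sigma}_{\bar{Y}} = \frac{s}{\sqrt{N}} = \frac{6}{\sqrt{16}} = \frac{6}{4} = 1.5$
$t = \frac{2}{\hat{\sigma}_{\bar{Y}}} = \frac{2}{1.5} = 1.333$
• Y-bar is 1.33 standard errors above μ!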
Question: Is μ > 60?
• Question: What is the probability of t > 1.33?
• i.e., Y-bar falling 1.333 or more standard errors from μ?
• Result: p = about .105
• Note: Knoke t-table doesn’t contain this range… have to look it up elsewhere or use SPSS to calculate probability.
[Figure: sampling distribution on a scale from 58 to 66, with the tail area beyond Y-bar marked. Caption: This area reflects the probability]
Question: Is μ > 60?
• Result: p = .105
• In other words, if μ = 60, we will observe Y-bar of 62 or greater about 10% of the time
• Conclusion: It is plausible that μ is 60 or lower
• We are not 95% confident that μ > 60
• Conclusion matches result from confidence interval
• We have just tested a claim using inferential statistics!
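As an alternative to a printed t-table or SPSS, the Strategy 2 calculation can be sketched in plain Python. This is a rough sketch under stated assumptions: the helper names are hypothetical, and the tail probability is obtained by numerically integrating the Student's t density with Simpson's rule rather than by calling a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The distribution is symmetric, so half of its area lies above zero.
    return 0.5 - area_0_to_t

# Values from the school example: Y-bar = 62, hypothetical mu = 60, s = 6, N = 16.
std_error = 6 / math.sqrt(16)
t_score = (62 - 60) / std_error
p_value = t_upper_tail(t_score, df=16 - 1)
print(f"t = {t_score:.3f}, one-tailed p = {p_value:.3f}")
```

The result should land close to .10, in line with the approximate figure quoted above, and well above the .05 needed for 95% confidence.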
Hypothesis Testing
• Hypothesis Testing:
• A formal language and method for examining claims using inferential statistics
– Designed for use with probabilistic empirical assessments
• Because of the probabilistic nature of inferential statistics, we cannot draw conclusions with absolute certainty
– We cannot “prove” our claims are “true”
– However improbable, we will occasionally draw an un-representative sample, even if it is random
Hypothesis Testing
• The logic of hypothesis testing:
• We cannot “prove” anything
• Instead, we will cast doubt on other claims, thus indirectly supporting our own
• Strategy:
• 1. We first state an “opposing” claim
• The opposite of what we want to claim
• 2. If we can cast sufficient doubt on it, we are forced (grudgingly) to accept our own claim.
Hypothesis Testing
• Example: Suppose we wish to argue that our school is above the national standard
• First we state the opposite:
• “Our school is not above the national standard”
• Next we state our alternative:
• “Our school is above the national standard”
• If our statistical analysis shows that the first claim is highly improbable, we can “reject” it, in favor of the second claim
• …“accepting” the claim that our school is doing well.
Hypothesis Testing: Jargon
• Hypotheses: Claims we wish to test
• Typically, these are stated in a manner specific enough to test directly with statistical tools
– We typically do not test hypotheses such as “Marx was right” / “Marx was wrong”
– Rather: The mean years of education for Americans is/is not above 18 years.
Hypothesis Testing: Jargon
• The hypothesis we hope to find support for is referred to as the alternate hypothesis
• The hypothesis counter to our argument is referred to as the null hypothesis
• Null and alternative hypotheses are denoted as:
• H0: School does not exceed the national standard
• H-zero indicates null hypothesis
• H1: School does exceed national standard
• H-1 indicates alternate hypothesis
• Sometimes called: “Ha”
Hypothesis Testing: More Jargon
• If evidence suggests that the null hypothesis is highly improbable, we “reject” it
• Instead, we “accept” the alternative hypothesis
• So, typically we:
• Reject H0, accept H1
– Or:
• Fail to reject H0, do not find support for H1
• That was what happened in our example earlier today…
Hypothesis Testing
• In order to conduct a test to evaluate hypotheses, we need two things:
• 1. A statistical test which reflects on the probability of H0 being true rather than H1
• Here, we used a z-score/t-score to determine the probability of H0 being true
• 2. A pre-determined level of probability below which we feel safe in rejecting H0 (α)
• In the example, we wanted to be 95% confident… α = .05
• But, the probability was .10, so we couldn’t conclude that the school met the national standard!
Hypothesis Test for the Mean
• Example: Laundry Detergent
• Suppose we work at the Tide factory
• We know the “cleaning power” of Tide detergent, exactly: It is 73 on a continuous scale.
• “Cleaning Power” of Tide = 73
• You conduct a study of a competitor. You buy 50 bottles of generic detergent and observe a mean cleaning power of 65
• H0: Tide is no better than competitor (μ >= 73)
• H1: Tide is better than competitor (μ < 73)
Hypothesis Test: Example
• It looks like Tide is better:
• Cleaning power is 73, versus 65 for a sample of the competition
• Question: Can we reject the null hypothesis and accept the alternate hypothesis?
• Answer: No! It is possible that we just drew an atypical sample of generic detergent. The true population mean for generics may be higher.
Hypothesis Test: Example
• We need to use our statistical knowledge to determine:
• What is the probability of drawing a sample (N=50) with mean of 65 from a population of mean 73 (the mean for Tide)
• If that is a probable event, we can’t draw very strong conclusions…
• But, if the event is very improbable, it is hard to believe that the population of generics is as high as that of Tide…
• We have grounds for rejecting the null hypothesis.
Hypothesis Test: Example
• How would we determine the probability (given an observed mean of 65) that the population mean of generic detergent is really 73?
• Answer: We apply the Central Limit Theorem to determine the shape of the sampling distribution
• And then calculate a Z-value or T-value based on it
• Suppose we choose an alpha (α) of .05
• If we observe a t-value with probability of only .0023, then we can reject the null hypothesis.
• If we observe a t-value with probability of .361, we cannot reject the null hypothesis
Hypothesis Test: Steps
• 1. State the research hypothesis (“alternate hypothesis”), H1
• 2. State the null hypothesis, H0
• 3. Choose an α-level (alpha-level)
– Typically .05, sometimes .10 or .01
• 4. Look up value of test statistic corresponding to the α-level (called the “critical value”)
• Example: find the “critical” t-value associated with α = .05
Hypothesis Test: Steps
• 5. Use statistics to calculate a relevant test statistic.
– T-value or Z-value
– Soon we will learn additional ones
• 6. Compare test statistic to “critical value”
– If test statistic is greater, we reject H0
– If it is smaller, we cannot reject H0
Hypothesis Test: Steps
• Alternate steps:
• 3. Choose an alpha-level
• 4. Get software to conduct relevant statistical test.
– Software will compute test statistic and provide a probability… the probability of observing a test statistic of a given size.
– If this is lower than alpha, reject H0
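Both versions of the final step reduce to a simple comparison, sketched below in Python. The function names are hypothetical; the inputs are the probabilities and test statistics quoted in this lecture. The critical-value version compares magnitudes, which assumes a two-tailed test.

```python
def decide_by_p_value(p_value, alpha=0.05):
    """Reject H0 only when the observed probability falls below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

def decide_by_critical_value(test_statistic, critical_value):
    """Two-tailed comparison: a large negative statistic is also extreme."""
    if abs(test_statistic) > abs(critical_value):
        return "reject H0"
    return "fail to reject H0"

# Probabilities mentioned in the lecture, judged against alpha = .05.
for p in (0.0023, 0.361, 0.105):
    print(f"p = {p}: {decide_by_p_value(p)}")

# Observed t of -24 against a critical t of 3.3 (alpha = .001).
print(decide_by_critical_value(-24.0, 3.3))
```

Note that alpha is fixed before looking at the data; choosing it afterward would defeat the purpose of a pre-determined threshold.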
Hypothesis Test: Errors
• Due to the probabilistic nature of such tests, there will be periodic errors.
• Sometimes the null hypothesis will be true, but we will reject it
– Our alpha-level determines the probability of this
• Sometimes we do not reject the null hypothesis, even though it is false
Hypothesis Test: Errors
• When we falsely reject H0, it is called a Type I error
• When we falsely fail to reject H0, it is called a Type II error
• In general, we are most concerned about Type I errors… we try to be conservative.
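The link between the alpha-level and Type I errors can be illustrated with a small simulation. This is a sketch under assumptions not in the lecture: a hypothetical normal population with μ = 60 and a known σ = 6 (so a Z-test applies), samples of N = 50, and a two-tailed test. Since H0 is true by construction, every rejection is a Type I error.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=50, trials=5000, seed=0):
    """Share of random samples that wrongly reject a true H0: mu = 60."""
    rng = random.Random(seed)
    mu, sigma = 60.0, 6.0                          # hypothetical population
    std_error = sigma / n ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical Z
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        z = (sample_mean - mu) / std_error
        if abs(z) > z_crit:
            rejections += 1                        # H0 is true, so this is a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should hover near the chosen alpha, which is why a smaller alpha is the more conservative choice.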
Hypothesis Tests About a Mean
• What sorts of hypothesis tests can one do?
• 1. Test the hypothesis that a population mean is NOT equal to a certain value
– Null hypothesis is that the mean is equal to that value.
• 2. Population mean is higher than a value
– Null hypothesis: mean is equal to or less than a value
• 3. Population mean is lower than a value
– Null hypothesis: mean is equal to or greater than a value
• Question: What are examples of each?
Hypothesis Tests About Means
• Example: Bohrnstedt & Knoke, section 3.93, pp. 108-110. N = 1015, Y-bar = 2.91, s = 1.45
• H0: Population mean = 4
• H1: Population mean not = 4
• Strategy:
• 1. Choose Alpha (let’s use .001)
• 2. Determine the Standard Error
• 3. Use S.E. to determine the range in which the sample mean (Y-bar) is likely to fall 99.9% of time, IF the population mean is 4.
• 4. If observed mean is outside range, reject H0
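Example: Is μ = 4?
• Let’s determine how far Y-bar is from hypothetical μ = 4
• In units of standard errors
$\hat{\sigma}_{\bar{Y}} = \frac{s}{\sqrt{N}} = \frac{1.45}{\sqrt{1015}} = .046$
$t = \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}} = \frac{2.91 - 4}{\hat{\sigma}_{\bar{Y}}} = \frac{-1.09}{\hat{\sigma}_{\bar{Y}}} = \frac{-1.09}{.046} \approx -24.0$
• Y-bar is 24 standard errors below 4.0!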
Hypothesis Tests About a Mean
• A Z-table (if N is large) or a T-table will tell us probabilities of Y-bar falling Z (or T) standard errors from μ
• In this example, the desired α = .001
• Which corresponds to t = 3.3 (taken from t-table)
– That is: .001 (i.e., .1%) of samples (of size 1015) fall beyond 3.29 standard errors of the population mean
– 99.9% fall within 3.29 S.E.’s.
Hypothesis Tests About a Mean
• There are two ways to finish the “test”
• 1. Compare “critical t” to “observed t”
– Critical t is 3.3, observed t = -24
• We reject H0: t of +/-24 is HUGE, very improbable
• It is highly unlikely that μ = 4
• 2. Actually calculate the probability of observing a t-value of 24, compare to pre-determined α
• If observed probability is below α, reject H0
– In this case, probability of t = 24 is .0000000000000…
• Very improbable. Reject H0!
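The whole test can be retraced in a few lines of Python using the figures from the Bohrnstedt & Knoke example. This is a sketch, not the textbook's procedure: because N is large it takes the critical value from the standard normal curve via `statistics.NormalDist`, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Figures from the example: N = 1015, Y-bar = 2.91, s = 1.45, H0: mu = 4.
n, y_bar, s = 1015, 2.91, 1.45
mu_0 = 4.0
alpha = 0.001

std_error = s / math.sqrt(n)                   # estimated standard error
t_obs = (y_bar - mu_0) / std_error             # distance from mu_0 in standard errors
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value, large N

# Range in which Y-bar should fall 99.9% of the time if H0 is true.
lower = mu_0 - z_crit * std_error
upper = mu_0 + z_crit * std_error
reject_h0 = abs(t_obs) > z_crit

print(f"standard error = {std_error:.4f}, observed t = {t_obs:.2f}")
print(f"critical value = {z_crit:.2f}, range = {lower:.2f} to {upper:.2f}")
print("Reject H0:", reject_h0)
```

Carrying unrounded values gives an observed t just under 24 in magnitude, which matches the hand calculation once the standard error is rounded to .046.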
Two-Tail Tests
• Visually: Most Y-bars should fall near μ
• 99.9% CI: –3.3 < t < 3.3, or 3.85 to 4.15
[Figure: Sampling Distribution of the Mean, centered at 4, with cutoffs at 3.85 (Z = -3.3) and 4.15 (Z = +3.3). Caption: Mean of 2.91 (t = -24) is far into the red area (beyond edge of graph)]
Hypothesis Tests About a Mean
• Note: This test was set up as a “two-tailed test”
• Meaning that we reject H0 if observed Y-bar falls in either tail of the sampling distribution
• Ex: Very high Y-bar or very low Y-bar means reject H0
– Not all tests are done that way… Sometimes you only reject H0 if Y-bar falls in one particular tail.
Hypothesis Testing
• Definition: Two-tailed test: A hypothesis test in which the α-area of interest falls in both tails of a Z or T distribution.
• Example: H0: μ = 4; H1: μ ≠ 4
• Definition: One-tailed test: A hypothesis test in which the α-area of interest falls in just one tail of a Z or T distribution.
• Example: H0: μ > or = 4; H1: μ < 4
• This is called a “directional” hypothesis test.
Hypothesis Tests About Means
• A one-tailed test: H1: μ < 4
• Entire α-area is on left, as opposed to half (α/2) on each side. Also, critical t-value changes.
[Figure: sampling distribution centered at 4, with the α-area in the left tail]
Hypothesis Tests About Means
• T-value changes because the alpha area (e.g., 5%) is all concentrated in one side of the distribution, rather than split half and half.
• One tail vs. Two-tail:
Hypothesis Tests About Means
• Use one-tailed tests when you have a directional hypothesis
– e.g., μ > 5
• Otherwise, use 2-tailed tests
• Note: In many instances, you are more likely to reject the null hypothesis when utilizing a one-tailed test
– Concentrating the alpha area in one tail reduces the critical T-value needed to reject H0
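The effect of concentrating alpha in one tail can be seen by computing both critical values. The sketch below uses the standard normal curve (a large-N assumption) and a hypothetical helper name; for small N the same comparison would use t-table values instead.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Critical Z for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test puts alpha/2 in each tail; a one-tailed test puts
    # the whole alpha area in a single tail.
    return NormalDist().inv_cdf(1 - alpha / tails)

one_tail = critical_z(0.05, tails=1)
two_tail = critical_z(0.05, tails=2)
print(f"alpha = .05: one-tailed Z = {one_tail:.3f}, two-tailed Z = {two_tail:.3f}")
```

For α = .05 this gives roughly 1.645 for one tail versus 1.960 for two tails, so a smaller observed statistic is enough to reject H0 in the one-tailed case.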
Tests for Differences in Means
• A more useful and interesting application of these same ideas…
• Hypothesis tests about the means of two different groups
– Up until now, we’ve focused on a single mean for a homogeneous group
– It is more interesting to begin to compare groups
– Are they the same? Different?
• We’ll do that next class!