
Sociology 5811: Lecture 9: CI / Hypothesis Tests

Copyright © 2005 by Evan Schofer

Do not copy or distribute without permission

Announcements

• Problem Set #3 due next week

• Problem set posted on course website

• We are a bit ahead of reading assignments in Knoke book

• Try to keep up; read ahead if necessary

Review: Confidence Intervals

• General formula for Confidence Interval:

C.I.: $\bar{Y} \pm Z_{\alpha/2}\,(\sigma_{\bar{Y}})$

• Where:

• Y-bar is the sample mean

• Sigma sub-Y-bar is the standard error of the mean

• Z (alpha/2) is the critical Z-value for a given level of confidence

– If you want 90%, look up the Z that encloses 45% of the area on each side of the mean (leaving α/2 = 5% in each tail)

– See Knoke, Figure 3.5 on page 87 for info
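The table lookup can also be done in software. A minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the helper names `z_critical` and `z_interval` are illustrative, not from the course materials:

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical Z: leaves alpha/2 in each tail of the standard normal."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def z_interval(ybar, std_error, confidence):
    """Large-N confidence interval: ybar +/- Z(alpha/2) * standard error."""
    z = z_critical(confidence)
    return ybar - z * std_error, ybar + z * std_error

print(round(z_critical(0.90), 3))  # 1.645
```

For a 95% interval the same call returns roughly 1.96, the familiar "about two standard errors".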

Small N Confidence Intervals

• Issue: What if N is not large?

• The sampling distribution may not be normal

• Z-distribution probabilities don’t apply…

• Standard CI formula doesn’t work

• Solution: Use the “T-Distribution”

• A different curve that accurately approximates the shape of the sampling distribution for small N

• Result: We can look up values in a “t-table” to determine probabilities associated with a # of standard deviations from the mean.

Confidence Intervals for Small N

• Small N C.I. Formula:

• Yields accurate results, even if N is not large

C.I.: $\bar{Y} \pm t_{\alpha/2}\,(\hat{\sigma}_{\bar{Y}})$

• Again, the standard error can be estimated from the sample standard deviation:

C.I.: $\bar{Y} \pm t_{\alpha/2}\,\frac{s}{\sqrt{N}}$

T-Distributions

• The T-distribution is a “family” of distributions

• In a T-Distribution table, you’ll find many T-distributions to choose from

– Basically, the shape of sampling distribution varies with the size of your sample

• You need a specific t-distribution depending on sample size

• One t-distribution for each “degree of freedom”

– Also called “df” or “DofF”

• Which T-distribution should you use?

• For confidence intervals: Use T-distribution for df = N - 1

• Ex: If N = 15, then look at T-distribution for df = 14.

Looking Up T-Tables

[Figure: t-table excerpt with callouts]

• Choose the correct df (N-1)

• Choose the desired probability for α/2

• Find the t-value in the correct row and column

• Interpretation is just like a Z-score: 2.145 = number of standard errors for the C.I.!
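If no t-table is at hand, the same critical value can be approximated numerically. The following is a rough sketch using only the Python standard library: it integrates the t density with Simpson's rule and then searches for the t-value by bisection. The function names are hypothetical, and a statistics package would normally do this for you.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: Simpson's rule on [0, t], then use symmetry."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(tail_prob, df):
    """Bisection for the t-value that leaves tail_prob in the upper tail."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail_prob:
            lo = mid   # too much area remains in the tail: move right
        else:
            hi = mid
    return (lo + hi) / 2

# N = 15, so df = 14; a 95% interval puts alpha/2 = .025 in each tail
print(round(t_critical(0.025, 14), 3))  # 2.145
```

The result matches the table value of 2.145 for df = 14.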

Answering Questions…

• Knowledge of the standard error allows us to begin answering questions about populations

• Example: National educational standard requires all schools to maintain a test score average of 60

• You observe that a sample (N=16, s=6) has a mean of 62

• Question: Are you confident that the school population is above the national standard?

• We know Y-bar for the sample, but what about for the whole school?

• Are we confident that μ > 60?

Question: Is μ > 60?

• Strategy 1: Construct a confidence interval around Y-bar

• And, see if the bounds fall above 60

• Visually: Confident that μ > 60:

[Figure: confidence interval around Y-bar on an axis from 58 to 66]

• Visually: μ might be 60 or less

[Figure: confidence interval around Y-bar on an axis from 58 to 66]

Question: Is μ > 60?

• Strategy 1: Construct a confidence interval around Y-bar

– Let’s choose a desired confidence level of .95

– N of 16 is “small”… we must use the t-distribution, not the Z-distribution

– Look up t-value for 15 degrees of freedom (N-1).

Looking Up T-Tables

[Figure: t-table excerpt with callouts]

• Choose the correct df: (N-1) = 15

• Choose the desired probability for α/2

• Find the t-value in the correct row and column

• Result: t = 2.131

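Question: Is μ > 60?

• Strategy 1: Construct a confidence interval around Y-bar

C.I.: $\bar{Y} \pm t_{\alpha/2}\,(\hat{\sigma}_{\bar{Y}}) = \bar{Y} \pm t_{\alpha/2}\,\frac{s}{\sqrt{N}}$

$= 62 \pm 2.131\,\frac{6}{\sqrt{16}} = 62 \pm 2.131\,(1.5) = 62 \pm 3.20$

[Figure: confidence interval around Y-bar = 62 on an axis from 58 to 66]

• CI is 58.80 to 65.20! We aren’t confident that μ > 60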
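As a check on the arithmetic, here is a small sketch that plugs the slide's inputs (Y-bar = 62, s = 6, N = 16, t = 2.131) into the small-N formula. The function name `t_interval` is illustrative, and the critical t is passed in by hand rather than looked up. With a standard error of 6/4 = 1.5, the margin works out to 2.131 × 1.5, or about 3.20.

```python
import math

def t_interval(ybar, s, n, t_crit):
    """C.I. = ybar +/- t_crit * s / sqrt(n); t_crit comes from a t-table (df = n - 1)."""
    std_error = s / math.sqrt(n)
    margin = t_crit * std_error
    return ybar - margin, ybar + margin

lower, upper = t_interval(ybar=62, s=6, n=16, t_crit=2.131)
print(round(lower, 2), round(upper, 2))
print("60 inside interval:", lower <= 60 <= upper)
```

Because 60 falls inside the interval, the sample does not rule out a school mean of 60 at the 95% level.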

Question: Is μ > 60?

• Note #1: Results would change if we used a different confidence level

• 95% and 50% CIs yield different conclusions:

• Idea: Wouldn’t it be nice to know exactly which CI would describe the distance from Y-bar to μ?

• i.e., to calculate the exact probability of Y-bar falling a certain distance from μ?

[Figure: confidence intervals around Y-bar on an axis from 58 to 66]

Question: Is μ > 60?

• Note #2: We typically draw CIs around Y-bar

– But, we can also get the same result focusing on our comparison point (Y = 60)

• Example: If 60 is outside of CI around Y-bar

• Then, Y-bar is outside of the CI around 60

[Figure: two axes from 58 to 66, one with the C.I. around Y-bar and one with the C.I. around 60]

Question: Is μ > 60?

• The critical issue is: How far is Y-bar from 60?

– Is it “far” compared to the width of the sampling distribution?

• Ex: Y-bar is more than 2 Standard Errors from 60?

• In which case, the school probably exceeds the standard

– Or, is it relatively close?

• Ex: Y-bar is only .5 Standard Errors from 60

• In which case we aren’t confident…

– Note: If we know the sampling distribution is normal (or t-distributed), we can convert SE’s to a probability

Question: Is μ > 60?

• Strategy 2: Determine the probability of Y-bar = 62, if μ is really 60 or less

• Procedure:

– 1. Use Y=60 as a reference point

– 2. Determine how far Y-bar is from 60, measured in Standard Errors

• Which we can convert to a probability

– 3. Issue: Is it likely to observe a Y-bar as high as 62?

• If this is common to observe, even when μ = 60 (or less), then we can’t be confident that μ > 60!

• But, if that is a rare event, we can be confident that μ > 60!

Question: Is μ > 60?

• Strategy 2: Look at sampling distribution

• Confident that μ is not 60 or less:

[Figure: sampling distribution of Y-bar on an axis from 58 to 66] μ is unlikely to really be 60… because Y-bar usually falls near the center of the sampling distribution!

• Visually: μ might easily be <60

[Figure: sampling distribution of Y-bar on an axis from 58 to 66] In this case, it is common to get Y-bars of 62 or even higher

Question: Is μ > 60?

• Issue: How do we tell where Y-bar falls within the sampling distribution?

• Strategy: Compute a Z-score

• Recall: Z-scores help locate the position of a case within a distribution

• It can tell us how far a Y-bar falls from the center of the sampling distribution

• In units of “standard errors”!

• Probability can be determined from a Z-table

• Note: for small N, we call it a t-score, look up in a t-table.

Question: Is μ > 60?

• Note: We use a slightly modified Z formula

“Old” formula: $Z_i = \frac{Y_i - \bar{Y}}{s_Y}$

New formula: $Z = \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}}$

• “Old” formula calculates # standard deviations a case falls from the sample mean

• From Y-sub-i to Y-bar

• New formula tells the number of standard errors a mean estimate falls from the population mean μ

• Distance from Y-bar to μ in the sampling distribution

• In this case we compare to hypothetical μ = 60.

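Question: Is μ > 60?

• Let’s calculate how far Y-bar falls from μ

– Since N is small, we call it a “t-score” or “t-value”

$t = \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}} = \frac{62 - 60}{\hat{\sigma}_{\bar{Y}}} = \frac{2}{\hat{\sigma}_{\bar{Y}}}$

$\hat{\sigma}_{\bar{Y}} = \frac{s}{\sqrt{N}} = \frac{6}{4} = 1.5$

$t = 2/\hat{\sigma}_{\bar{Y}} = 2/1.5 = 1.333$

• Y-bar is 1.33 standard errors above μ!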
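The calculation above is short enough to script. A minimal sketch, with `t_score` as an illustrative helper name, reproducing the school example (Y-bar = 62, hypothetical μ = 60, s = 6, N = 16):

```python
import math

def t_score(ybar, mu0, s, n):
    """Number of estimated standard errors between ybar and the hypothesized mean mu0."""
    std_error = s / math.sqrt(n)
    return (ybar - mu0) / std_error

t = t_score(ybar=62, mu0=60, s=6, n=16)
print(round(t, 3))  # 1.333
```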

Question: Is μ > 60?

• Question: What is the probability of t > 1.33?

• i.e., Y-bar falling 1.333 or more standard errors from μ?

• Result: p = about .105

• Note: Knoke t-table doesn’t contain this range… have to look it up elsewhere or use SPSS to calculate probability.

[Figure: sampling distribution of Y-bar on an axis from 58 to 66] This area reflects the probability
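When the printed table does not cover a t-value, the tail area can be approximated directly. The sketch below is one way to do it with only the Python standard library, integrating the t density by Simpson's rule. The function names are hypothetical, and this is not how SPSS computes the value; the result should land near .10, in line with the slide's figure.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """One-tailed probability P(T > t) for t >= 0, by Simpson's rule plus symmetry."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# School example: t = 2 / 1.5 with df = N - 1 = 15
p = t_upper_tail(2 / 1.5, df=15)
print(round(p, 3))
```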

Question: Is μ > 60?

• Result: p = .105

• In other words, if μ = 60, we will observe Y-bar of 62 or greater about 10% of the time

• Conclusion: It is plausible that μ is 60 or lower

• We are not 95% confident that μ > 60

• Conclusion matches result from confidence interval

• We have just tested a claim using inferential statistics!

Hypothesis Testing

• Hypothesis Testing:

• A formal language and method for examining claims using inferential statistics

– Designed for use with probabilistic empirical assessments

• Because of the probabilistic nature of inferential statistics, we cannot draw conclusions with absolute certainty

– We cannot “prove” our claims are “true”

– However improbable, we will occasionally draw an un-representative sample, even if it is random

Hypothesis Testing

• The logic of hypothesis testing:

• We cannot “prove” anything

• Instead, we will cast doubt on other claims, thus indirectly supporting our own

• Strategy:

• 1. We first state an “opposing” claim

• The opposite of what we want to claim

• 2. If we can cast sufficient doubt on it, we are forced (grudgingly) to accept our own claim.

Hypothesis Testing

• Example: Suppose we wish to argue that our school is above the national standard

• First we state the opposite:

• “Our school is not above the national standard”

• Next we state our alternative:

• “Our school is above the national standard”

• If our statistical analysis shows that the first claim is highly improbable, we can “reject” it, in favor of the second claim

• …“accepting” the claim that our school is doing well.

Hypothesis Testing: Jargon

• Hypotheses: Claims we wish to test

• Typically, these are stated in a manner specific enough to test directly with statistical tools

– We typically do not test hypotheses such as “Marx was right” / “Marx was wrong”

– Rather: The mean years of education for Americans is/is not above 18 years.

Hypothesis Testing: Jargon

• The hypothesis we hope to find support for is referred to as the alternate hypothesis

• The hypothesis counter to our argument is referred to as the null hypothesis

• Null and alternative hypotheses are denoted as:

• H0: School does not exceed the national standard

• H-zero indicates null hypothesis

• H1: School does exceed national standard

• H-1 indicates alternate hypothesis

• Sometimes called: “Ha”

Hypothesis Testing: More Jargon

• If evidence suggests that the null hypothesis is highly improbable, we “reject” it

• Instead, we “accept” the alternative hypothesis

• So, typically we:

• Reject H0, accept H1

– Or:

• Fail to reject H0, do not find support for H1

• That was what happened in our example earlier today…

Hypothesis Testing

• In order to conduct a test to evaluate hypotheses, we need two things:

• 1. A statistical test which reflects on the probability of H0 being true rather than H1

• Here, we used a z-score/t-score to determine the probability of H0 being true

• 2. A pre-determined level of probability below which we feel safe in rejecting H0 (α)

• In the example, we wanted to be 95% confident… α = .05

• But, the probability was .10, so we couldn’t conclude that the school met the national standard!

Hypothesis Test for the Mean

• Example: Laundry Detergent

• Suppose we work at the Tide factory

• We know the “cleaning power” of Tide detergent, exactly: It is 73 on a continuous scale.

• “Cleaning Power” of Tide = 73

• You conduct a study of a competitor. You buy 50 bottles of generic detergent and observe a mean cleaning power of 65

• H0: Tide is no better than competitor (μ >= 73)

• H1: Tide is better than competitor (μ < 73)

Hypothesis Test: Example

• It looks like Tide is better:

• Cleaning power is 73, versus 65 for a sample of the competition

• Question: Can we reject the null hypothesis and accept the alternate hypothesis?

• Answer: No! It is possible that we just drew an atypical sample of generic detergent. The true population mean for generics may be higher.

Hypothesis Test: Example

• We need to use our statistical knowledge to determine:

• What is the probability of drawing a sample (N=50) with mean of 65 from a population of mean 73 (the mean for Tide)

• If that is a probable event, we can’t draw very strong conclusions…

• But, if the event is very improbable, it is hard to believe that the population of generics is as high as that of Tide…

• We have grounds for rejecting the null hypothesis.

Hypothesis Test: Example

• How would we determine the probability (given an observed mean of 65) that the population mean of generic detergent is really 73?

• Answer: We apply the Central Limit Theorem to determine the shape of the sampling distribution

• And then calculate a Z-value or T-value based on it

• If we chose an alpha (α) of .05:

• If we observe a t-value with probability of only .0023, then we can reject the null hypothesis.

• If we observe a t-value with probability of .361, we cannot reject the null hypothesis
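The decision rule itself is a single comparison. A tiny sketch, with `decide` as an illustrative name, applied to the two probabilities mentioned above:

```python
def decide(p_value, alpha=0.05):
    """Reject H0 only when the observed probability falls below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

for p in (0.0023, 0.361):
    print(p, "->", decide(p))
```

Note that alpha is fixed before looking at the data; only the p-value comes from the sample.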

Hypothesis Test: Steps

• 1. State the research hypothesis (“alternate hypothesis”), H1

• 2. State the null hypothesis, H0

• 3. Choose an α-level (alpha-level)

– Typically .05, sometimes .10 or .01

• 4. Look up value of test statistic corresponding to the α-level (called the “critical value”)

• Example: find the “critical” t-value associated with α = .05

• 5. Use statistics to calculate a relevant test statistic.

– T-value or Z-value

– Soon we will learn additional ones

• 6. Compare test statistic to “critical value”

– If test statistic is greater, we reject H0

– If it is smaller, we cannot reject H0


• Alternate steps:

• 3. Choose an alpha-level

• 4. Get software to conduct relevant statistical test.

– Software will compute test statistic and provide a probability… the probability of observing a test statistic of a given size.

– If this is lower than alpha, reject H0
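The critical-value version of the steps can be sketched for the earlier school example (H1: μ > 60). This is an illustration under stated assumptions: the function name is hypothetical, and the one-tailed critical t of 1.753 for α = .05 with 15 degrees of freedom is taken from a standard t-table, not from these slides.

```python
import math

def upper_tail_t_test(ybar, mu0, s, n, t_crit):
    """Steps 5 and 6 for H1: mu > mu0. Returns (observed t, whether H0 is rejected)."""
    std_error = s / math.sqrt(n)         # estimated standard error of the mean
    t_obs = (ybar - mu0) / std_error     # step 5: test statistic
    return t_obs, t_obs > t_crit         # step 6: compare with the critical value

# Assumed critical value: one-tailed, alpha = .05, df = 15 (standard t-table)
t_obs, reject = upper_tail_t_test(ybar=62, mu0=60, s=6, n=16, t_crit=1.753)
print(round(t_obs, 3), "reject H0" if reject else "fail to reject H0")
```

The observed t of about 1.33 falls short of the critical value, so H0 is not rejected.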

Hypothesis Test: Errors

• Due to the probabilistic nature of such tests, there will be periodic errors.

• Sometimes the null hypothesis will be true, but we will reject it

– Our alpha-level determines the probability of this

• Sometimes we do not reject the null hypothesis, even though it is false

Hypothesis Test: Errors

• When we falsely reject H0, it is called a Type I error

• When we falsely fail to reject H0, it is called a Type II error

• In general, we are most concerned about Type I errors… we try to be conservative.
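A short simulation can make the link between alpha and Type I errors concrete. This sketch invents its own setup for illustration: a population with true mean 0 and known standard deviation 1, samples of N = 30, and a two-tailed Z test. Since H0 is true by construction, every rejection is a Type I error, and the share of rejections should come out close to alpha.

```python
import random
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, n=30, trials=4000, seed=0):
    """Share of samples that reject a TRUE H0 (mu = 0, sigma = 1) in a two-tailed Z test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = 1 / n ** 0.5              # sigma / sqrt(n) with sigma = 1
    rejections = 0
    for _ in range(trials):
        ybar = sum(rng.gauss(0, 1) for _ in range(n)) / n
        if abs(ybar / std_error) > z_crit:
            rejections += 1               # H0 is true here, so this is a Type I error
    return rejections / trials

print(type_one_error_rate())
```

Lowering alpha to .01 in the call should shrink the rejection share accordingly.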

Hypothesis Tests About a Mean

• What sorts of hypothesis tests can one do?

• 1. Test the hypothesis that a population mean is NOT equal to a certain value

– Null hypothesis is that the mean is equal to that value.

• 2. Population mean is higher than a value

– Null hypothesis: mean is equal to or less than a value

• 3. Population mean is lower than a value

– Null hypothesis: mean is equal to or greater than a value

• Question: What are examples of each?

Hypothesis Tests About Means

• Example: Bohrnstedt & Knoke, section 3.93, pp. 108-110. N = 1015, Y-bar = 2.91, s = 1.45

• H0: Population mean = 4

• H1: Population mean not = 4

• Strategy:

• 1. Choose Alpha (let’s use .001)

• 2. Determine the Standard Error

• 3. Use S.E. to determine the range in which the sample mean (Y-bar) is likely to fall 99.9% of the time, IF the population mean is 4.

• 4. If observed mean is outside range, reject H0

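Example: Is μ = 4?

• Let’s determine how far Y-bar is from hypothetical μ = 4

• In units of standard errors

$\hat{\sigma}_{\bar{Y}} = \frac{s_Y}{\sqrt{N}} = \frac{1.45}{\sqrt{1015}} = .046$

$t = \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}} = \frac{2.91 - 4}{\hat{\sigma}_{\bar{Y}}} = \frac{-1.09}{\hat{\sigma}_{\bar{Y}}} = -1.09/.046 \approx -24.0$

• Y-bar is 24 standard errors below 4.0!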


• A Z-table (if N is large) or a T-table will tell us probabilities of Y-bar falling Z (or T) standard errors from μ

• In this example, the desired α = .001

• Which corresponds to t = 3.3 (taken from t-table)

– That is: .001 (i.e., .1%) of samples (of size 1015) fall beyond 3.29 standard errors of the population mean

– 99.9% fall within 3.29 S.E.’s.


• There are two ways to finish the “test”

• 1. Compare “critical t” to “observed t”

– Critical t is 3.3, observed t = -24

• We reject H0: t of +/-24 is HUGE, very improbable

• It is highly unlikely that μ = 4

• 2. Actually calculate the probability of observing a t-value of 24, compare to pre-determined α

• If observed probability is below α, reject H0

– In this case, probability of t=24 is .0000000000000…

• Very improbable. Reject H0!
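The first way of finishing the test can be reproduced in a few lines. A sketch using the example's numbers (N = 1015, Y-bar = 2.91, s = 1.45, hypothetical μ = 4, α = .001); because N is large, it takes the critical value from the standard normal via Python's `statistics.NormalDist`, which is an assumption of this sketch rather than a step in the textbook example.

```python
import math
from statistics import NormalDist

n, ybar, s, mu0, alpha = 1015, 2.91, 1.45, 4.0, 0.001

std_error = s / math.sqrt(n)                   # estimated standard error of the mean
t_obs = (ybar - mu0) / std_error               # distance from mu0 in standard errors
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
reject = abs(t_obs) > z_crit

print(round(std_error, 3), round(t_obs, 1), round(z_crit, 2), reject)
```

The observed statistic is roughly 24 standard errors below 4, far beyond the critical value of about 3.29.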

Two-Tail Tests

• Visually: Most Y-bars should fall near μ

• 99.9% CI: –3.3 < t < 3.3, or 3.85 to 4.15

[Figure: Sampling Distribution of the Mean, centered at 4, with 3.85 (Z = -3.3) and 4.15 (Z = +3.3) marked]

Mean of 2.91 (t = -24) is far into the red area (beyond edge of graph)


• Note: This test was set up as a “two-tailed test”

• Meaning, that we reject H0 if observed Y-bar falls in either tail of the sampling distribution

• Ex: Very high Y-bar or very low Y-bar means reject H0

– Not all tests are done that way… Sometimes you only reject H0 if Y-bar falls in one particular tail.

Hypothesis Testing

• Definition: Two-tailed test: A hypothesis test in which the α-area of interest falls in both tails of a Z or T distribution.

• Example: H0: μ = 4; H1: μ ≠ 4

• Definition: One-tailed test: A hypothesis test in which the α-area of interest falls in just one tail of a Z or T distribution.

• Example: H0: μ > or = 4; H1: μ < 4

• This is called a “directional” hypothesis test.

Hypothesis Tests About Means

• A one-tailed test: H1: μ < 4

• Entire α-area is on left, as opposed to half (α/2) on each side. Also, critical t-value changes.

[Figure: sampling distribution centered at 4, with the α-area in the left tail]

• T-value changes because the alpha area (e.g., 5%) is all concentrated in one side of the distribution, rather than split half and half.

• One tail vs. Two-tail:


• Use one-tailed tests when you have a directional hypothesis

– e.g., μ > 5

• Otherwise, use 2-tailed tests

• Note: In many instances, you are more likely to reject the null hypothesis when utilizing a one-tailed test

– Concentrating the alpha area in one tail reduces the critical T-value needed to reject H0
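To see how much the critical value drops, the sketch below compares one-tailed and two-tailed cutoffs at α = .05. It uses the standard normal (large N) through Python's `statistics.NormalDist`, and `critical_z` is an illustrative name; for small N the same comparison would use a t-table.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Critical Z with the alpha area in one tail (tails=1) or split across two (tails=2)."""
    return NormalDist().inv_cdf(1 - alpha / tails)

print(round(critical_z(0.05, 1), 3), round(critical_z(0.05, 2), 3))  # 1.645 1.96
```

An observed statistic of, say, 1.8 in the hypothesized direction would be rejected under the one-tailed rule but not under the two-tailed one.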

Tests for Differences in Means

• A more useful and interesting application of these same ideas…

• Hypothesis tests about the means of two different groups

– Up until now, we’ve focused on a single mean for a homogeneous group

– It is more interesting to begin to compare groups

– Are they the same? Different?

• We’ll do that next class!
