confidence intervals, hypothesis testing


Upload: shyla

Post on 24-Feb-2016


Page 1: Confidence Intervals, Hypothesis Testing

Confidence Intervals, Hypothesis Testing

Page 2: Confidence Intervals, Hypothesis Testing

Example 1

You manage a large educational nonprofit and want to estimate the deductions your teachers claim so that you can comment to the media (teachers can write off up to $250 for supplies annually on a federal tax return). Your assistant randomly samples 50 employees. The mean write-off was $150 with an SD of $55. What is the probability that the mean write-off is between $140 and $160?

Page 3: Confidence Intervals, Hypothesis Testing

What can we say about the question before we even start calculations?

• We know we can use the z distribution because of the sample size

• In asking for a range of values (between 140 and 160), we can convert the endpoints to standardized scores (z scores: the number of standard errors from the mean) and take the area under the curve between them

• This is an example of how one would apply the confidence interval concept

Page 4: Confidence Intervals, Hypothesis Testing

Step one

• Calculate what you can:

• Standard error = sigma/sqrt(n) = 55/sqrt(50) = 7.78

• Z score = (160-150)/7.78 = 1.29

Page 5: Confidence Intervals, Hypothesis Testing

Visualized

Page 6: Confidence Intervals, Hypothesis Testing

Find z score

Page 7: Confidence Intervals, Hypothesis Testing

Put it in plain English

• Multiply the area under the curve between the mean and your z score (about 0.40) by 2 to get about 0.80

• The probability that the mean write-off is within $10 of $150 is about 80%

– Comment on these results

– Think of the alphas we commonly use in class: 0.05 and 0.1
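The arithmetic in Example 1 can be checked with a short Python sketch. It uses only the standard library's `statistics.NormalDist`; the variable names are illustrative, and it keeps the unrounded z score, so the result can differ slightly from a table lookup.

```python
from math import sqrt
from statistics import NormalDist

n = 50             # sample size
sample_mean = 150  # mean write-off in dollars
sd = 55            # sample standard deviation in dollars

se = sd / sqrt(n)                 # standard error of the mean
z = (160 - sample_mean) / se      # upper bound, measured in standard errors

# Area under the standard normal curve between -z and +z
std_normal = NormalDist()
prob = std_normal.cdf(z) - std_normal.cdf(-z)

print(f"s.e. = {se:.2f}, z = {z:.2f}, probability = {prob:.3f}")
```

Because the interval is symmetric around the mean, subtracting the two cumulative probabilities is the same as doubling the one-sided table area.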

Page 8: Confidence Intervals, Hypothesis Testing

What is a confidence interval?

• Definition: the best estimate for a range of a population value (parameter) that we can come up with given a sample (sample statistic)

• The general formula for n>30: X bar plus or minus (critical value * s.e.)

• Here is a list of critical values at the most common confidence levels
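The critical values for the common confidence levels can also be recomputed rather than looked up. The sketch below assumes two-tailed intervals on the z distribution and uses the standard library's `NormalDist.inv_cdf`; the choice of 90%, 95%, and 99% as the "common" levels is an assumption.

```python
from statistics import NormalDist

# Two-tailed critical z values: leave (1 - level) / 2 in each tail
levels = [0.90, 0.95, 0.99]
critical = {level: NormalDist().inv_cdf(1 - (1 - level) / 2) for level in levels}

for level, z in critical.items():
    print(f"{level:.0%} confidence: z = {z:.3f}")
```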

Page 9: Confidence Intervals, Hypothesis Testing

T versus z

The General formula for n<30:

• If your n is less than 30, look up the critical value in the t table at the intersection of the degrees of freedom (df = n - 1) and the significance level, using the column for a one tailed or two tailed test as appropriate.

• Thus the critical value changes depending on your sample size n and the confidence level that you desire

Page 10: Confidence Intervals, Hypothesis Testing

Example 2

Example: we know the test scores for a sample of 20 people out of a class of 300. The sample mean is 82 and the sample standard deviation is 15. With what confidence level can we say the class mean is between 75 and 89?

Page 11: Confidence Intervals, Hypothesis Testing

What can we say about the question before we even start calculations?

• We know we can use the t distribution because of the sample size

• In asking for a range of values (between 75 and 89), we can take the t scores (number of standard errors from the mean) and take that area under the curve

• This is an example of how one would apply the confidence interval concept

Page 12: Confidence Intervals, Hypothesis Testing

Calculate what you can:

• Standard error = sigma/sqrt(n) = 15/sqrt(20) = 3.35

• t score = (89-82)/3.35 = 2.0896

• Don’t want to use the t table?

– http://stattrek.com/online-calculator/t-distribution.aspx

– Be careful, it is a cumulative probability here

• OR want an easy t table?

– http://www.medcalc.org/manual/t-distribution.php

Page 13: Confidence Intervals, Hypothesis Testing

T table

Page 14: Confidence Intervals, Hypothesis Testing

Visualize your answer

Page 15: Confidence Intervals, Hypothesis Testing

Put your answer in words

• We are 94.96% confident that the population mean of exam scores is between 75 and 89
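For readers without a t table at hand, here is a minimal sketch that recomputes Example 2. The standard library has no t distribution, so the helper functions `t_pdf` and `central_area` are written here for illustration and integrate the t density numerically with Simpson's rule. The sketch keeps the unrounded standard error, so t comes out slightly below 2.0896 and the confidence lands just under 95%.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def central_area(t, df, steps=2000):
    """Area under the t curve between -t and +t, via Simpson's rule on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * (h / 3) * total         # double the one-sided area (symmetry)

n, sample_mean, s = 20, 82, 15
se = s / math.sqrt(n)                  # standard error, about 3.35
t = (89 - sample_mean) / se            # t score of the upper bound
confidence = central_area(t, df=n - 1)

print(f"t = {t:.3f}, confidence = {confidence:.1%}")
```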

Page 16: Confidence Intervals, Hypothesis Testing

Hypothesis Testing

• A null hypothesis: nothing has changed or happened

– Change has not occurred; the effect has not been realized

– A statement of no difference

• Always refers to the population, so it cannot be tested directly; it is an implied hypothesis

• The null hypothesis is a statement of equality

• The purpose of the null: acts as a starting point or benchmark against which the actual outcomes of a study can be measured

• Until you have evidence that there is a difference, you assume there is no difference

Page 17: Confidence Intervals, Hypothesis Testing

Research hypothesis

Definition: a definite statement that there is a relationship between variables

• They posit a relationship between variables, not an equality

• They always refer to the sample, not the population

Page 18: Confidence Intervals, Hypothesis Testing

One tailed v. two tailed

Non-directional: says two variables are different

Directional: specifies if one is more than or less than the other

One tailed tests: reflect a directional hypothesis. Greater use than a two tailed test

Two tailed tests: reflect a non-directional hypothesis. There is a difference but in no particular direction

Example one tailed test: The arrest rate is higher after a crackdown on prostitution

Example two tailed test: The arrest rate after the crackdown does not equal the arrest rate before

Page 19: Confidence Intervals, Hypothesis Testing

Steps to work through a CI

General steps to take to test a null hypothesis:

1. State the null hypothesis

2. Set the level of risk (alpha) associated with the null hypothesis

3. Select the appropriate test statistic (z or t score, depends on n)

4. Compute the test statistic

5. Determine the value needed for rejection of the null based on a table of critical values for that particular statistic. Each test statistic has a critical value: the cutoff the statistic would rarely exceed by chance if the null were true

6. If the obtained value is more extreme than the critical value, reject the null; that is, chance alone is not the best explanation of the events

7. If the obtained value doesn’t exceed the critical value, you do not reject the null

Page 20: Confidence Intervals, Hypothesis Testing

Example

There is a series of complaints made to the local police department on prostitution. Before the crackdown, there were 3.4 arrests per day. The chief wants to show that the crackdown has worked. What are the null and research hypotheses?

Page 21: Confidence Intervals, Hypothesis Testing

Hypotheses

• H_0: Following the crackdown arrests after=arrests before; arrests after=3.4

• H_A: Following the crackdown, arrests after > 3.4

Page 22: Confidence Intervals, Hypothesis Testing

Here is the random sample of arrests per day

Day    Prostitution Arrests
1      3
2      5
3      7
4      2
5      3
6      6
7      4
8      3
9      6
10     1

Step 1: Estimate the population mean with the sample mean, and compute the sample standard deviation. There are a lot of sites out there that do this: http://www.miniwebtool.com/sample-standard-deviation-calculator/

Page 23: Confidence Intervals, Hypothesis Testing

Use these estimates to calculate the standard error

• Sample mean: 4

• Sample SD (hint: divide by n-1) = 1.94

• S.e. = s/sqrt(n) = 1.94/sqrt(10) = 0.61
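These estimates can be reproduced from the ten daily arrest counts in the sample. This is a small sketch using Python's standard `statistics` module, whose `stdev` divides by n-1 as the hint suggests; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

arrests = [3, 5, 7, 2, 3, 6, 4, 3, 6, 1]   # arrests per day, days 1-10

sample_mean = mean(arrests)
sample_sd = stdev(arrests)                  # sample SD: divides by n - 1
se = sample_sd / sqrt(len(arrests))         # standard error of the mean

print(f"mean = {sample_mean}, SD = {sample_sd:.2f}, s.e. = {se:.2f}")
```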

Page 24: Confidence Intervals, Hypothesis Testing

Test the hypothesis with these numbers

• When you’re told to test a hypothesis, this is asking you to get the probability of drawing a random sample of 10 with a mean of 4.0 or higher if the population mean is actually 3.4

• Get the t score for 4.0

Page 25: Confidence Intervals, Hypothesis Testing

T score

• (4.0-3.4) / 0.61 = 0.98

• Look up the t score

Page 26: Confidence Intervals, Hypothesis Testing

Interpret the t score

• You can see the one tailed probability is in between 0.15 and 0.2

• (the computer shows it is 0.176)

• This means that the probability of drawing a sample of 10 with a mean of 4 or more if the population mean is really 3.4 is between 0.15 and 0.2; should we reject the null?

Page 27: Confidence Intervals, Hypothesis Testing

Why is the t score not enough?

• Typically we’d reject the null with 95% confidence; the one tailed critical value there (df = 9) is 1.833, and we only got 0.98
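The comparison can be sketched in Python as follows. The critical value 1.833 is taken from the t table as quoted above; the helpers `t_pdf` and `upper_tail` are illustrative, written here because the standard library has no t distribution, and they approximate the one tailed probability by numerical integration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: one half minus the Simpson's-rule area on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - (h / 3) * total

sample_mean, null_mean, se, df = 4.0, 3.4, 0.61, 9
critical_t = 1.833                     # one tailed, alpha = 0.05, df = 9 (t table)

t = (sample_mean - null_mean) / se     # obtained t score
p_value = upper_tail(t, df)            # chance of a mean this high if the null is true
reject_null = t > critical_t

print(f"t = {t:.2f}, one tailed p = {p_value:.3f}, reject null: {reject_null}")
```

Comparing t with the critical value and comparing the p value with alpha are two views of the same decision, so both lead to not rejecting the null here.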

Page 28: Confidence Intervals, Hypothesis Testing

Significance levels (alpha)

• The risk that what you observe is not due to the treatment

• Also, the risk you’re willing to take that you’ll reject a null hypothesis when it is actually true

• Example: the increase in test scores is by chance, not due to the after school program

• If an article reports significance at the 0.05 level, this means that a result this extreme would turn up by chance alone no more than 1 time in 20 if the treatment they hypothesized had no effect

• The researcher picks this value (the risk they’re willing to accept)

Page 29: Confidence Intervals, Hypothesis Testing

How sure must you be?

• If the t score you generate EXCEEDS the t score that is associated with the alpha, we can reject the null hypothesis and accept the research hypothesis

• The alpha is the probability that you SELECT in order to reject the null

• When our alpha is 0.05, this is the threshold it takes to reject the null: if our t score exceeds the t score associated with 0.05 (at the df) then we reject the null, but we accept a 5% chance of rejecting a null that is actually true

Page 30: Confidence Intervals, Hypothesis Testing

In the previous example

• Returning to the problem above, the t score is 0.98

• If our alpha is 0.05 (df=9), the critical t score is 1.833. 0.98 does not exceed 1.833, so we cannot reject the null.

• If the null were true, there would still be a ~17% chance of drawing a sample mean this high, and that’s too high to rule out chance

Page 31: Confidence Intervals, Hypothesis Testing

Language in psets

“Evaluate your hypothesis at alpha=0.1 and at alpha=0.05”

• This is asking you to see if the t/z score that you calculated exceeds the t/z score at the chosen level of confidence (alpha = 0.05 is the same as 95% confidence)

• If you’re using the t dist., make sure you determine the correct value at the proper degrees of freedom

Page 32: Confidence Intervals, Hypothesis Testing

Handy graphic for errors

Page 33: Confidence Intervals, Hypothesis Testing

Interpreting the previous graphic

• The null can either be true or false; you’ll never know for certain because you’re not testing the whole population

• You can either accept it or reject it

• Type I: The value associated with a Type I error is the risk you’re willing to take, and it is conventionally between 0.01 and 0.05

• If it is at 0.05, there is a 5% chance you’ll reject the null when it is actually true

• Reduce the chance of committing a Type I error by using smaller and smaller alphas

• But lowering the alpha increases the chance you commit a Type II error!

Page 34: Confidence Intervals, Hypothesis Testing

Interpreting the previous graphic

Type II: you accepted a null by mistake, and concluded there are no differences when there actually are

Reduce your likelihood of committing a Type II error by increasing the sample size
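The trade-off between the two error types can be illustrated numerically. The sketch below approximates the Type II risk (beta) of a one tailed test with the z distribution; the function `type_ii_risk` and its default `effect` and `sd` values are hypothetical, chosen only to show the direction of the change.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_risk(alpha, n, effect=0.6, sd=1.94):
    """Approximate beta for a one tailed z test.

    effect (true difference from the null mean) and sd are hypothetical
    illustration values, not results from any study.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection cutoff, in standard errors
    shift = effect / (sd / sqrt(n))          # true effect, in standard errors
    return std_normal.cdf(z_crit - shift)    # chance the statistic misses the cutoff

for alpha in (0.10, 0.05, 0.01):
    print(f"n = 10, alpha = {alpha:.2f}: beta = {type_ii_risk(alpha, 10):.2f}")

for n in (10, 40, 160):
    print(f"alpha = 0.05, n = {n}: beta = {type_ii_risk(0.05, n):.2f}")
```

Under these assumptions, beta rises as alpha shrinks and falls as n grows, which matches the two pieces of advice on the slides.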

Page 35: Confidence Intervals, Hypothesis Testing

SAMPLE SIZE

• When you test a hypothesis with a small sample, the critical t scores for a given alpha will be higher than those for larger samples

• This is because when you estimate a population with a small sample, it contains more error

• As the number of df goes up, the t values for rejecting the null go down

• If n is bigger than 30, use the normal distribution

Page 36: Confidence Intervals, Hypothesis Testing

FORMULA TO DETERMINE SAMPLE SIZE

How to determine the sample size:

n = [(t * s) / E]^2, where t is the critical value (e.g. 1.96), s is the standard deviation, and E is the error we can tolerate

Page 37: Confidence Intervals, Hypothesis Testing

Example

We need to determine for the welfare office the average income for all residents who receive welfare. They want to be 95% confident that the estimate of average income is within $100 of the actual average. How large a sample do we need in order to reduce the error to $100 (the SD is $442)?

Page 38: Confidence Intervals, Hypothesis Testing

Solve by plugging in

Step 1: we know that to build a 95% confidence interval we take 1.96 (that is the t score/critical value that we want)

Step 2: n = [(1.96 * 442)/100]^2 = 75.05, which rounds up to 76

In English: the sample should include at least 76 respondents
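The same plug-in calculation as a minimal Python sketch. It derives the 95% critical value from the standard library's `NormalDist` instead of hard-coding 1.96; the variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist

confidence = 0.95
sd = 442       # standard deviation of income, in dollars
error = 100    # error we can tolerate, in dollars

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
n_raw = (z * sd / error) ** 2
n = ceil(n_raw)   # round up: a fraction of a respondent is not possible

print(f"z = {z:.2f}, n = {n_raw:.2f}, rounded up to {n}")
```

Rounding up rather than to the nearest integer keeps the error at or below the tolerated amount.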

Page 39: Confidence Intervals, Hypothesis Testing

Example

We are testing the effect of a drug by injecting 100 people with it and recording their response time. The mean response time for those who did not get the drug was 1.2 seconds, and the mean response time for those who were injected with the drug is 1.05 seconds. The sample standard deviation is 0.5 seconds. Do you think the drug affects the response time?

Page 40: Confidence Intervals, Hypothesis Testing

Step one

• Set the hypotheses:

Null: the response time is equal between those injected and those not injected (the drug has no effect)

Research hypothesis: The response time for those injected is less than those not injected (mu _(with drug) < 1.2 seconds)

Page 41: Confidence Intervals, Hypothesis Testing

Step 2

• If the null were true, what is the probability we would have gotten this result with the sample? (If that probability is really small then we can reject the null.)

• We know that n>30, so the test can use the critical value from the z distribution

Page 42: Confidence Intervals, Hypothesis Testing

Next steps

• Step 3: Estimate the s.e. = s/sqrt(n) = 0.5/sqrt(100) = 0.5/10 = 0.05

• Step 4: Get the test statistic

• Conceptualize the problem by drawing it out: 1.2 is the mean; how many standard errors is 1.05 s away from 1.2 s? Get the z score for 1.05 to find out.

Page 43: Confidence Intervals, Hypothesis Testing

Get the z score

• Get the z score using this formula: z = (1.2-1.05)/0.05 = 3

• In English this means that 1.05 seconds is 3 standard errors away from the mean

• So in setting up this test, you’re asking what the odds are of getting a sample mean 3 standard errors from the mean (1.05 s) completely by chance. Since it is far out there in the tails, intuition says it is low.

• Given we set our hypothesis up this way, we are only testing to see if the drug lowers response time

• This calls for a one tailed test

Page 44: Confidence Intervals, Hypothesis Testing

Draw it out to help

Page 45: Confidence Intervals, Hypothesis Testing

Look at the z table

You look at the z table and see that 3.0 has .4986 between mu and the score. Thus if we add .5 to .4986 we get .9986, and the odds of getting this score by chance are 1-.9986 or .0014
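The table lookup can be cross-checked in Python with the standard library's `NormalDist`. This is a sketch of the one tailed z test for the drug example; the alpha of 0.05 is an assumption (one of the values commonly used in class), since the example itself does not fix one.

```python
from math import sqrt
from statistics import NormalDist

n = 100
null_mean = 1.2      # mean response time without the drug, in seconds
sample_mean = 1.05   # mean response time with the drug, in seconds
s = 0.5              # sample standard deviation, in seconds
alpha = 0.05         # assumed significance level

se = s / sqrt(n)                       # standard error
z = (null_mean - sample_mean) / se     # distance from the null mean, in standard errors
p_value = 1 - NormalDist().cdf(z)      # one tailed probability of a result this extreme
reject_null = p_value < alpha

print(f"z = {z:.2f}, one tailed p = {p_value:.4f}, reject null: {reject_null}")
```

The computed tail area is a little under .0014 because printed z tables round the area between the mean and z = 3.0.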

How to put this into plain English?

Page 46: Confidence Intervals, Hypothesis Testing

Estimating population proportions

Page 47: Confidence Intervals, Hypothesis Testing

Proportions

• You can set up confidence intervals around them just like we did with means

• Here are the steps:

1. Estimate the proportion

2. Take the SD with this formula: s = sqrt(p * (1-p))

3. Find the s.e. with this formula: s / sqrt(n)

4. Set up the confidence interval with this formula: proportion plus or minus t * s.e.

Page 48: Confidence Intervals, Hypothesis Testing

Example

The warden wants to estimate how many re-admits he is getting following a new job training program taking place in the jail. He takes a sample of 100 inmates who went through the program and finds that 68 became inmates again. Give a 95% confidence interval around this population proportion.

Page 49: Confidence Intervals, Hypothesis Testing

Calculate what we can

Step 1: estimate the population proportion: 68/100 = 0.68 become re-admitted each year

Step 2: get the sample standard deviation using this formula: s = sqrt(p * (1-p)) = sqrt(0.68 * 0.32) = 0.47

Page 50: Confidence Intervals, Hypothesis Testing

Next steps

Step 3: Use this in order to find the standard error: s.e. = s / sqrt(n) = 0.47 / sqrt(100) = 0.047

Step 4: What are the 95% confidence limits of the proportion? Since n is bigger than 30, the normal curve can be used. Set up a confidence interval using this formula: proportion plus or minus t * s.e.

= 0.68 + or - 1.96 * 0.047

= 0.68 + or - 0.092

= 0.59 to 0.77

Page 51: Confidence Intervals, Hypothesis Testing

In English

• We are 95% confident that the population proportion readmitted to prison is between .59 and .77.
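The four steps for a proportion can be sketched in a few lines of Python. The variable names are illustrative, and 1.96 is the 95% critical value used in the example.

```python
from math import sqrt

n = 100
readmitted = 68
z = 1.96                      # 95% critical value from the normal curve

p = readmitted / n            # step 1: estimate the proportion
sd = sqrt(p * (1 - p))        # step 2: standard deviation of a proportion
se = sd / sqrt(n)             # step 3: standard error
margin = z * se               # step 4: half-width of the interval
lower, upper = p - margin, p + margin

print(f"SD = {sd:.2f}, s.e. = {se:.3f}, 95% CI = {lower:.2f} to {upper:.2f}")
```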

Page 52: Confidence Intervals, Hypothesis Testing

Is the program working?

• Why or why not?

Page 53: Confidence Intervals, Hypothesis Testing

Example

Your boss wants to estimate how many welfare recipients own cars. She wants to know the proportion within 2% and wants to be 95% certain. What is the sample size she needs in order to do this?

Page 54: Confidence Intervals, Hypothesis Testing

Steps to solve

Step one: the sample size formula is n = [(z * sigma) / error]^2, where error is the amount of error we can tolerate. Since she can deal with 2% error, this becomes: n = [(1.96 * sigma) / 0.02]^2

Step two: We insert 0.5 for sigma, since the largest possible standard deviation of a proportion is 0.5, which occurs at a proportion of 0.5. Thus if we don’t know the population proportion and need to estimate a sample size, 0.5 is the best proportion estimate to use (it has an SD of 0.5). n = [(1.96 * 0.5) / 0.02]^2 = 2401

Step 3: she needs to sample 2401 welfare recipients
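A minimal Python sketch of this last calculation, under the same worst-case assumption of a 0.5 proportion. The variable names are illustrative, and the `round` call is a guard added here against floating-point noise, not part of the formula.

```python
from math import ceil, sqrt

z = 1.96                          # 95% critical value
p = 0.5                           # worst-case proportion (largest SD)
sigma = sqrt(p * (1 - p))         # SD of a proportion: 0.5 when p is 0.5
error = 0.02                      # tolerable error (2 percentage points)

n_raw = (z * sigma / error) ** 2
n = ceil(round(n_raw, 6))         # round first so float noise cannot bump the ceiling

print(f"sigma = {sigma}, required sample size = {n}")
```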