r What is r? What does it tell us about the relationship between two variables?

Upload: elfrieda-lawrence

Post on 02-Jan-2016


Page 1: R What is r? What does it tell us about the relationship between two variables?

r

• What is r?

• What does it tell us about the relationship between two variables?


Coincidence?

• Calculate the day of the year on which your birthday falls.

• Draw a number.

• Record the numbers in the spreadsheet on the board.

• We’ll run a regression to determine the relationship between your day and the random number you drew.
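
The regression in this exercise comes down to the correlation coefficient r. Here is a minimal sketch of how r could be computed. The `pearson_r` helper and the simulated `birthdays` and `draws` lists are illustrative stand-ins, not the class data from the slides:

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Simulated class data (assumption): 30 students, each with a birthday
# (day of the year) and an independently drawn random number.
random.seed(1)
birthdays = [random.randint(1, 366) for _ in range(30)]
draws = [random.randint(1, 366) for _ in range(30)]

r = pearson_r(birthdays, draws)
print(f"r = {r:.3f}")  # independent draws should usually give r near 0
```

Because the two lists are generated independently, any nonzero r here is coincidence, which is the point of the exercise.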


Vietnam Draft

• A similar exercise was done with the Vietnam war draft lottery. The correlation r between day of the year and draft number was found to be -0.226. What does this mean?

• Statisticians found that the probability of this happening by chance was less than .001. So there was strong evidence that the lottery was unfair.


Introduction to Inference

Estimating with Confidence


Statistical Inference

• Statistical inference provides methods for drawing conclusions about a population from sample data.

• If we believe something to be true and we design an experiment to test it, how likely is our experiment to give us true results?


Statistical Inference

• Confidence intervals: estimate values

• Tests of significance: assess the evidence for a claim about a population

• In this chapter we use oversimplified examples to understand the reasoning.


Estimating with Confidence

• In 2000, 1,260,278 college-bound seniors took the SAT. The mean math score was 514 with a standard deviation of 113. (Verbal: mean 505, standard deviation 111.)

• In California about 49% of students take the SAT – many take the ACT and others are not college-bound.

• How could we estimate the SAT score of all California seniors?


California Confidence

• Suppose you arrange to give the test to an SRS of 500 California seniors. The mean of the sample is 461.

• How would the sample mean vary if we took many samples of seniors from the same population?


Central Limit Theorem

• CLT tells us that xbar has a distribution that is close to normal.

• The mean of the sampling distribution is the same as the mean of the entire population.

• The standard deviation of xbar would be σ/√500, where σ is the population standard deviation.


Going on

• Somehow we know that the true standard deviation is 100. (Next chapter we’ll deal with not knowing the true standard deviation.) The standard deviation of xbar is then 100/√500 ≈ 4.5. (VERIFY)

• If we collect another sample we would expect a different mean.
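
The "(VERIFY)" prompt can be checked with one line of arithmetic. A small sketch, using only the values given on the slides (the variable names are illustrative):

```python
import math

sigma = 100   # population standard deviation, taken as known on the slide
n = 500       # size of the SRS of California seniors

# Standard deviation of the sampling distribution of xbar
sd_xbar = sigma / math.sqrt(n)
print(round(sd_xbar, 2))  # about 4.47, which the slide rounds to 4.5
```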


Statistical Confidence

As we collect different samples and calculate different means, we would expect the sample mean to fall within two standard deviations (of xbar) of the true but unknown mean 95% of the time.

What would the range of values be if we go with our sample data with xbar = 461?


Confidence Intervals

• We say that we are 95% confident that the interval (452, 470) contains the true mean of the population: intervals built this way capture the true mean in 95% of all samples.

• Our margin of error is ±9.
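
The interval and margin of error follow from the "two standard deviations" rule. A quick sketch using the rounded value 4.5 for the standard deviation of xbar, as the slides do:

```python
xbar = 461      # sample mean from the SRS of 500 seniors
sd_xbar = 4.5   # standard deviation of xbar, as rounded on the slides

margin = 2 * sd_xbar                     # "two standard deviations" rule for 95%
interval = (xbar - margin, xbar + margin)
print(margin, interval)                  # 9.0 (452.0, 470.0)
```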


Confidence Intervals

A level C confidence interval for a parameter has two parts.

• An interval calculated from the data, usually of the form estimate ± margin of error

• A confidence level C, which gives the probability that the interval will capture the true parameter value in repeated samples.


Conditions for constructing a confidence interval for a mean

• Data come from an SRS from the population of interest.

• The sampling distribution of xbar is approximately normal.

– If the population distribution is normal, then the sampling distribution of xbar will be normal.

– From the CLT, the sampling distribution is approximately normal if the sample size is large enough. Unless the population distribution is strongly skewed, n ≥ 15 is usually adequate.


• One-sample confidence interval on µ (σ known):

estimate ± margin of error, or xbar ± z* · σ/√n

• CONDITIONS: The sample must be reasonably random. The sampling distribution of xbar is approximately normal.

• z*, or the z critical value, is the number of standard deviations on either side of the mean necessary to have the given confidence level. It can be found on the last two lines of Table C.


Inference Toolbox

To construct a confidence interval:

• Step 1: Identify the population of interest and the parameter you want to draw conclusions about.

• Step 2: Choose the appropriate inference procedure. Verify the conditions for using the selected procedure.

• Step 3: If the conditions are met, carry out the inference procedure.

• Step 4: Interpret your results in the context of the problem.


z critical value

• Aka z*

• The number of standard deviations we must go out on either side of the mean to catch the central probability.

• Comes from the standard normal table.
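
Instead of a table, the critical value can also be computed. A minimal sketch using Python's standard-library `statistics.NormalDist`; the `z_star` helper name is illustrative:

```python
from statistics import NormalDist

def z_star(confidence):
    """z critical value that leaves `confidence` area in the middle of N(0, 1)."""
    tail = (1 - confidence) / 2          # area in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_star(level):.3f}")
```

This should print values close to the familiar 1.645, 1.960, and 2.576.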


The t table

Most of the mysteries of the t-table will be revealed in Chapter 11. For now, go to the t-table in either your formula chart or inside the back cover of the book.

Look at the last row.

Pick your confidence level at the bottom, and go up one row to find z*.


Video Screen Tension p. 546

• A manufacturer of high-resolution video terminals must control the tension on the mesh of fine wires that lies behind the surface of the viewing screen.

• Here are the tension readings from an SRS of 20 screens from a single day’s production.


269.5  297.0  269.6  283.3  304.8
280.4  233.5  257.4  317.5  327.4
264.7  307.7  310.0  343.3  328.1
342.6  338.8  340.1  374.6  336.1

Find xbar.
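
A short sketch of the calculation, using the 20 readings listed on the slide (the `tensions` name is illustrative):

```python
from statistics import mean

# Tension readings from the SRS of 20 screens
tensions = [
    269.5, 297.0, 269.6, 283.3, 304.8,
    280.4, 233.5, 257.4, 317.5, 327.4,
    264.7, 307.7, 310.0, 343.3, 328.1,
    342.6, 338.8, 340.1, 374.6, 336.1,
]

xbar = mean(tensions)
print(f"n = {len(tensions)}, xbar = {xbar:.2f}")  # xbar is about 306.32
```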


Construct a 90% confidence interval for the mean tension µ of all the screens produced on this day.

Step 1: Identify the population of interest and the parameter you want to draw conclusions about. The population of interest is all of the video terminals produced on the day in question. We want to estimate µ, the mean tension for all of these screens.


Step 2: Choose the appropriate inference procedure. Verify the conditions for using the selected procedure. (Only one inference procedure so far – confidence interval with known standard deviation.) We were given that the sample was an SRS. Let’s look at a stemplot and a normal probability plot to see if there is any reason to doubt the normality of the sampling distribution. (20 might be large…)


[Figures: stemplot and normal probability plot of the tension readings]


Step 3: If the conditions are met, carry out the inference procedure.

xbar ± z* · σ/√n


Step 4: Interpret your results in the context of the problem. We are 90% confident that the true mean tension in the entire batch of video terminals produced that day is between 290.4 and 322.1 mV.


Movie Theaters

• A survey of 81 movie theaters showed that the average length of a feature film was 98 minutes. Past studies indicate that σ = 12 minutes. Determine a 95% confidence interval for estimating the mean length of all feature films. Interpret the interval in the context of the problem.
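
One way to carry out the calculation is sketched below. It treats the 12 minutes from past studies as the known population standard deviation and assumes the conditions for a z interval are met; the `z_interval` helper is illustrative, not from the slides:

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for a mean when the population sd is known."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # z critical value
    margin = z * sigma / math.sqrt(n)                # margin of error
    return xbar - margin, xbar + margin

low, high = z_interval(xbar=98, sigma=12, n=81, confidence=0.95)
print(f"({low:.1f}, {high:.1f}) minutes")  # roughly (95.4, 100.6)
```

The interpretation in context is still up to you: the code only produces the endpoints.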


What your calculator can do for you!

• A survey of 81 movie theaters showed that the average length of a feature film was 98 minutes. Past studies indicate that σ = 12 minutes. Determine a 95% confidence interval for estimating the mean length of all feature films. Interpret the interval in the context of the problem.


Tests of significance

• Confidence intervals are used to estimate a population parameter.

• Tests of significance assess the evidence provided by data about some claim concerning a population.


Basketball

• Leah claims that she makes 80% of her basketball free throws.

• In her next 20 throws she makes only 8. (40%)

• Caleb says “Aha! Someone who really makes 80% of their free throw shots would very rarely make only 8 out of 20. I don’t believe your claim!”

• In fact, if Leah’s claim is true, the probability of her making 8 or fewer of 20 shots is about 0.0001. This small probability convinces us that while her claim is possible, it is very unlikely.
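
The quoted probability can be reproduced with a binomial calculation. This sketch assumes the throws are independent with an 80% success chance each, and reads the slide's figure as the chance of 8 or fewer makes in 20 attempts (both are assumptions of this example):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Chance of making 8 or fewer of 20 free throws if Leah really shoots 80%
p_value = binom_cdf(8, 20, 0.8)
print(f"{p_value:.4f}")  # about 0.0001
```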


Hypotheses

• In order to do an inference test you must have some question or claim.

– Is Leah’s average 80%?

– Does this drug reduce blood pressure?

– Does this treatment reduce blood pressure?

– Does this product cause whiter teeth?

– Does this cola lose sweetness over time?

– Is the diameter of these ball bearings within the customer’s tolerance?


Warmup

• Weekly sales of regular ground coffee at a supermarket have in the recent past varied according to a normal distribution with mean µ = 354 units per week and standard deviation σ = 33 units. The store reduces the price by 5%. Sales in the next three weeks are 405, 378, and 411. Is this good evidence that average sales are now higher? Write the hypotheses.


Soda Sweetness

• Diet colas use artificial sweeteners which may lose their sweetness over time.

• Trained testers sip cola and score the cola on a sweetness scale.

• Cola is then stored for a month at high temperature to imitate 4 months storage.

• Testers again rate the colas.


Sweetness losses

• Here are sweetness losses as judged by 10 tasters:

• 2.0 0.4 0.7 2.0 -0.4 2.2 -1.3 1.2 1.1 2.3

• (A negative indicates the taster thought it gained sweetness.)

• Mean is 1.02

• That’s not a large loss. If we had another group try they would have different numbers. Is there good evidence that the cola lost sweetness?


Hypotheses

• Our test asks

– Does the sample result xbar = 1.02 reflect a real loss of sweetness?

– OR

– Could we easily get the outcome xbar = 1.02 just by chance?


Null hypothesis

• The null hypothesis says that there is no effect or no change in the population. Status quo.

• ALWAYS stated in terms of a parameter.

• Generally written as Ho and referred to as H-nought.

• For our cola problem:

– Ho: µ = 0


Alternate Hypothesis

• However, we suspect that cola does lose its sweetness. So our alternate hypothesis is that the difference will be positive:

– HA: µ > 0

Alternate hypothesis may be one-sided or two-sided.


Steps in Inference Testing

• First step is always to define your population and parameter of interest. (This should sound familiar. Yes, you have to do it every time.)

• Second step is to state your hypotheses


Practice:

• A car dealer advertises that its new subcompact model gets 47 mpg. You assume the dealer will not underrate the car, but you are suspicious about the claim.

• Complete the first two steps of inference procedure.


Questions on hw?


Back to cola example

• 10 tasters; xbar = 1.02; standard deviation for individual tasters is known to be σ = 1.

• Ho: µ = 0

• HA: µ > 0

• So, the sampling distribution of xbar from 10 tasters is then normal with mean µ = 0 (if there is no change in sweetness) and standard deviation 1/√10.


Sample Distribution


• So the value xbar = 1.02 seems unlikely. The probability of getting a sample mean this large when the true mean is 0 is about 6/10000. An outcome this unlikely convinces us that the true mean is really more than 0.

• The probability is called the p-value. A large p-value fails to give evidence against the null hypothesis. A small p-value is evidence against the null hypothesis.
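
The "about 6/10000" figure can be checked with a short sketch. It uses the ten sweetness losses listed earlier, a known standard deviation of 1 for individual tasters, and the one-sided alternative that the mean loss is greater than 0 (variable names are illustrative):

```python
import math
from statistics import NormalDist, mean

# Sweetness losses from the 10 tasters
losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]
sigma = 1.0   # known standard deviation for individual tasters
mu_0 = 0.0    # mean loss under the null hypothesis

xbar = mean(losses)
z = (xbar - mu_0) / (sigma / math.sqrt(len(losses)))   # test statistic
p_value = 1 - NormalDist().cdf(z)                      # area to the right of z

print(f"xbar = {xbar:.2f}, z = {z:.2f}, p = {p_value:.4f}")
```

The p-value comes out near 0.0006, in line with the slide.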


Outline of a test

• State population, parameter and hypotheses.

• Choose appropriate procedure. Verify conditions.

• If conditions met, calculate test statistic and find the probability (p-value) that your statistic could have occurred by chance.

• Interpret your results in context.


Back to cola

• State the hypotheses:

• Find the value of the test statistic (what is it?)

• Sketch the normal curve of the test statistic when Ho is true. Why is the sampling distribution normal?

• Find the p-value:

• Is the result significant at the α = 0.05 level?


Molly JeraelPaulo Alex U

Bryan CheyanneKatie W

Nathan Alex BChelsea Emily

AJ MacCole Michael

Denver JonnaKatie S Braxton

Ramon EricaRichard


Interpreting results

• The final step of an inference test is to interpret the results in the context of the problem. It should consist of 2 sentences.

• “Because p is _____________ we reject/fail to reject the null hypothesis. There is/is not evidence that …….”


2nd Period


• How can you tell from a problem when we use a confidence interval and when a test?


Steps for inference testing

• See Inference Toolbox on page 571


Single or two-sided test

• If our alternate hypothesis is < or > we have a one-sided test. P is the area under the normal curve to the left or right of the test statistic.

• If our alternate hypothesis is ≠ then the p value is twice the area to the left or right of the test statistic.
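
The two rules above can be sketched as a single helper for a z test statistic. The function name, the `alternative` labels, and the example value z = 1.5 are all illustrative choices, not from the slides:

```python
from statistics import NormalDist

def p_value(z, alternative):
    """p-value for a z test statistic under the given alternative hypothesis."""
    std_normal = NormalDist()
    if alternative == "less":          # Ha: parameter < null value
        return std_normal.cdf(z)
    if alternative == "greater":       # Ha: parameter > null value
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":     # Ha: parameter != null value
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

z = 1.5  # hypothetical test statistic
print(p_value(z, "greater"), p_value(z, "two-sided"))
```

For the same test statistic, the two-sided p-value is double the one-sided value in the direction of the statistic.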


10.38: Pressing Pills

Pill hardness values: enter into calculator.

The target value for the hardness is µ = 11.5.

The standard deviation is known to be σ = 0.2. Is there significant evidence at the 5% level that the mean hardness of the tablets is different from the target value? Use the Inference Toolbox.


Fixed significance level

• Sometimes our problem specifies an α with which to compare p. If p is less than α then the statistic is significant at level α.


• 10.27, 10.33, 10.28, 10.34, 10.38


• How do we know when to use a confidence interval and when to use a significance test?

• What does 95% confident mean?

• If we wanted to establish a 95% confidence interval with a margin of error of 8 for the problem in 10.27, 10.33, what size sample would we need?


Practice now

• 10.55,54 (p. 585)

• HW: 10.51, 52


Which is worse – to condemn an innocent person or to let a guilty person go free?


Errors

• Since our testing procedure calls for us to reject a null hypothesis based on a probability, there will be times we will make a mistake!

• We classify our errors into two types – imaginatively named Type I and Type II


Juries

                         The reality:
                         Innocent      Guilty
Jury finds:  Guilty
             Innocent


Errors

• If we reject Ho when it is in fact true we commit a Type I error.

• If we fail to reject Ho when it is in fact false we commit a Type II error.


Potato Chips

• When a batch of potato chips is produced, a sample is tested to see if it meets standards. If it does not, the distributor refuses to take the chips. (Acceptance testing.)

• Ho: the batch of potato chips meets standards

• Ha: the potato chips do not meet standards.

• What are Type I and Type II errors and their consequences?


Probabilities

• What is the probability of committing a Type I error?

• What is the probability of committing a Type II error?


Type II error

• The probability of a Type II error can only be calculated for a specific value of the parameter.

• For instance, in our potato chips, salt content is supposed to be 2 mg. σ = 0.1 mg.

• Assume α = 0.05. Let us also assume that the true mean of the sodium values is 2.05.
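
A sketch of how the Type II error probability could be computed for these values. The slides do not state the sample size or whether the test is one- or two-sided, so this example ASSUMES n = 50 and a two-sided test; change those assumptions and the answer changes:

```python
import math
from statistics import NormalDist

mu_0 = 2.0       # target salt content (mg) under Ho
sigma = 0.1      # known standard deviation (mg)
alpha = 0.05     # significance level
mu_true = 2.05   # specific alternative value of the true mean
n = 50           # ASSUMPTION: sample size is not given on the slides

se = sigma / math.sqrt(n)                       # standard deviation of xbar
z_star = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided critical value (assumed)

# We fail to reject Ho when xbar lands between these two cutoffs
lower = mu_0 - z_star * se
upper = mu_0 + z_star * se

# Type II error: failing to reject Ho when the true mean is really mu_true
sampling_dist = NormalDist(mu_true, se)
beta = sampling_dist.cdf(upper) - sampling_dist.cdf(lower)
power = 1 - beta

print(f"P(Type II error) = {beta:.3f}, power = {power:.3f}")
```

Under these assumptions the Type II error probability is roughly 0.06.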


Pictures:


Power

What is it?


[Figure: regions labeled "fail to reject Ho" and "reject Ho", with areas labeled "type II error" and "power"]


[Figure: regions labeled "fail to reject Ho" and "reject Ho", with areas labeled "type II error" and "power"]

What if we increased alpha?

and the power increases!


[Figure: regions labeled "fail to reject Ho" and "reject Ho", with areas labeled "type II error" and "power"]

What if we increased sample size?

Well, the standard deviation of xbar would get smaller.

What happened to power?

To a type II error?


[Figure: regions labeled "fail to reject Ho" and "reject Ho", with areas labeled "type II error" and "power"]

What if the alternative (truth?) were further from Ho?

type II error is near 0

power is near 1


So it’s a balancing act.

What is most important in the context of the problem?

Which error would be

most costly or dangerous?


To raise the power of a test

(that is, to increase the probability of correctly rejecting Ho if it is NOT true)

1. increase sample size

2. decrease variation

3. move alternative further from Ho

4. increase alpha (prob. of type I error)
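
The four levers above can be illustrated with a small sketch for a one-sided z test (Ha: µ > µ0). The baseline numbers reuse the potato chip values, except the sample size n = 10, which is a hypothetical choice made for this illustration, as are the function and variable names:

```python
import math
from statistics import NormalDist

def power_one_sided(mu_0, mu_alt, sigma, n, alpha):
    """Power of a one-sided z test (Ha: mu > mu_0) when the true mean is mu_alt."""
    se = sigma / math.sqrt(n)
    cutoff = mu_0 + NormalDist().inv_cdf(1 - alpha) * se   # reject Ho if xbar > cutoff
    return 1 - NormalDist(mu_alt, se).cdf(cutoff)

# Baseline scenario (n = 10 is hypothetical)
base = dict(mu_0=2.0, mu_alt=2.05, sigma=0.1, n=10, alpha=0.05)

scenarios = {
    "baseline": base,
    "1. larger sample (n = 40)": {**base, "n": 40},
    "2. less variation (sigma = 0.05)": {**base, "sigma": 0.05},
    "3. alternative further from Ho (2.10)": {**base, "mu_alt": 2.10},
    "4. larger alpha (0.10)": {**base, "alpha": 0.10},
}

powers = {label: power_one_sided(**kwargs) for label, kwargs in scenarios.items()}
for label, value in powers.items():
    print(f"{label}: power = {value:.3f}")
```

Each of the four changes should give a power above the baseline, with the fourth one paid for by a higher probability of a Type I error.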