confidence intervals


Upload: bluebaby-orange

Post on 22-Oct-2014


Page 1: Confidence Intervals


Confidence Intervals

Page 2: Confidence Intervals

Inferential Statistics

Based on a sample, inferential statistics is all about making some type of statement concerning the possible value of the population parameter.
Statements are made in a probabilistic sense, i.e., we can never say “I am absolutely sure that the true value of the population parameter is ….”

Page 3: Confidence Intervals

Types of statements

Point estimate: your best guess as to the value of the population parameter
Confidence interval
» Based on a sample, make a statement like “I am 90% sure that the true value of the population mean is BETWEEN 65 and 72” (here 65 is a lower bound and 72 is an upper bound)
Hypothesis testing
» Assume some value for the population parameter, e.g., “I think the true mean is at least 85.” Then take a sample and see if the evidence supports or refutes this claim

Page 4: Confidence Intervals

Remember, we are generally making statements concerning two population parameters, the population mean ($\mu$) and the population proportion ($p$), and we use the sample mean ($\bar{X}$) and the sample proportion ($\hat{p}$), respectively, to estimate them.

Parameter | Estimate | When to use
$\mu$ | $\bar{X}$ | When each individual trial has many different possible outcomes
$p$ | $\hat{p}$ | When each individual trial has only 2 possible outcomes

Page 5: Confidence Intervals

Point Estimate

This is the “best guess” of the value of the population parameter, given your sample information.
Recall that the sample mean is normally distributed with a mean of $\mu$, which means that, on average, the sample mean equals the true population mean (the sample mean is an unbiased estimator of the population mean).
Therefore, the sample mean $\bar{X}$ obtained from the sample is your point estimate of $\mu$.
Likewise for $p$: the sample proportion $\hat{p}$ is the point estimate of $p$.

Page 6: Confidence Intervals

Point Estimate Example

The signing bonuses for 30 new players in the PBL are used to estimate the mean bonus for all new players. The sample mean is P130,000 with a standard deviation of P25,500. What is the point estimate of the mean signing bonus for all new PBL players?
Answer: the sample mean is P130,000, so this is your point estimate of the population mean, $\mu$.
Note that this is a “mean” problem because in our sample we are looking at the signing bonus, which can take many different values.

Page 7: Confidence Intervals

Point Estimate Example

In a random sample of 100 students at a particular college, 60 indicated that they favored having the option of receiving a pass-fail grade for elective courses. What is the point estimate of the proportion of ALL students who favor having the option of receiving a pass-fail grade?
Answer: the sample proportion is $\hat{p}$ = 60/100 = 0.6, so this is your point estimate of the population proportion, $p$.
Note that this is a “proportion” problem because in our sample we are looking at whether a student favors having the pass/fail option or not. There are only 2 outcomes: they either favor having the option or they don’t.

Page 8: Confidence Intervals

Confidence Interval

A “confidence interval” consists of a range in which the population parameter may fall and a confidence level.
The range is a lower and upper bound between which you think the population parameter lies.
The confidence level is how sure you are that the parameter is within this range.

Interpretation: a 95% confidence interval means that 95% of similarly constructed intervals will contain the population parameter.

Page 9: Confidence Intervals

Understanding Confidence Intervals

Remember that by the Central Limit Theorem, the sample mean is normally distributed with a mean of $\mu$. Also recall that by the empirical rule, if a variable is normally distributed then 95.5% of all possible values lie within 2 standard deviations of the mean.
Therefore, 95.5% of all possible sample means lie within 2 standard deviations of the true mean.
Looking at it another way: for 95.5% of samples, the population mean lies within 2 standard deviations of the sample mean.
Bottom line: if you take $\bar{X}$ and add and subtract 2 standard deviations, you get a 95.5% confidence interval.
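The 95.5% figure from the empirical rule can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `prob_within` is made up for this illustration and is not part of the slides.

```python
from statistics import NormalDist


def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


# Within 2 standard deviations: the empirical-rule value quoted on the slide
print(round(prob_within(2), 4))     # 0.9545
# Within 1.96 standard deviations: the usual 95% level
print(round(prob_within(1.96), 4))  # 0.95
```

The same helper works for any number of standard deviations, which is the idea the later slides generalize with the confidence coefficient.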

Page 10: Confidence Intervals

Understanding Confidence Intervals

Suppose that you know the true population mean number of customers a restaurant has, $\mu$ = 85, and suppose you know the standard deviation of the number of customers, $\sigma$ = 35. Suppose we take a sample of 100 nights and calculate the sample mean number of customers the restaurant has.

The distribution of the sample mean is drawn on the next page.

Page 11: Confidence Intervals

Distribution of the Sample Mean

[Figure: normal curve for $\bar{X}$, with $\mu_{\bar{X}} = 85$ and $\sigma_{\bar{X}} = 35/\sqrt{100} = 3.5$]

Page 12: Confidence Intervals

Understanding Confidence Intervals

If we move up 2 standard deviations and down 2 standard deviations from the true mean of 85, we get a range from 85 − 2(3.5) to 85 + 2(3.5), that is, (78, 92).

The Empirical Rule tells us that 95.5% of all possible sample means lie within 2 standard deviations of the mean, so 95.5% of all sample means lie within the range (78, 92).
» If you go out and take a sample of size 100 and calculate the sample mean,
– You may get a value of 86: that number lies within this range
– You may get a value of 89: that number lies within this range
– You may get a value of 79: that number lies within this range
– You may get a value of 75: that number DOES NOT lie within this range, but it is still a possible sample mean

Page 13: Confidence Intervals

In 95.5% of the samples of size 100, you will get a sample mean between 78 and 92.
95.5% of the time, you will get a sample mean that is WITHIN 2 standard deviations of the true mean.
4.5% of the time, you will get a sample mean that is NOT WITHIN 2 standard deviations of the true mean.
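The restaurant numbers can be reproduced in a few lines. This is only an illustrative sketch using the values stated on the slides (true mean 85, standard deviation 35, sample size 100); the function name `sampling_range` is hypothetical.

```python
import math


def sampling_range(mu: float, sigma: float, n: int, k: float = 2.0):
    """Range that holds sample means lying within k standard deviations of the true mean."""
    std_of_mean = sigma / math.sqrt(n)  # standard deviation of the sample mean
    return mu - k * std_of_mean, mu + k * std_of_mean


low, high = sampling_range(mu=85, sigma=35, n=100)
print(low, high)  # 78.0 92.0

# The four sample means mentioned on the slide
for xbar in (86, 89, 79, 75):
    print(xbar, "inside" if low <= xbar <= high else "outside")
```

Only 75 falls outside the range, matching the slide's list.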

Page 14: Confidence Intervals

Turn this situation around: in 95.5% of the samples of size 100, the true mean is WITHIN 2 standard deviations of the sample mean.
So suppose we take a sample of size n = 100 and get a sample mean of $\bar{X}$ = 82, and suppose the standard deviation of the sample is s = 35.
To calculate the 95.5% confidence interval, move 2 standard deviations below the sample mean and 2 standard deviations above the sample mean:

$$\left(82 - 2\,\frac{35}{\sqrt{100}},\ 82 + 2\,\frac{35}{\sqrt{100}}\right) = (75,\ 89)$$
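Working the arithmetic directly: the standard deviation of the sample mean is 35/√100 = 3.5, so the bounds are 82 − 7 = 75 and 82 + 7 = 89. A minimal check, with the hypothetical helper name `two_sd_interval`:

```python
import math


def two_sd_interval(xbar: float, s: float, n: int):
    """95.5% interval: sample mean plus and minus 2 standard deviations of the sample mean."""
    std_of_mean = s / math.sqrt(n)
    return xbar - 2 * std_of_mean, xbar + 2 * std_of_mean


print(two_sd_interval(xbar=82, s=35, n=100))  # (75.0, 89.0)
```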

Page 15: Confidence Intervals

Important Notes

When we say move 2 standard deviations up and down from the sample mean, we mean 2 STANDARD DEVIATIONS OF $\bar{X}$, where

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

We may not know the true population standard deviation, $\sigma$, which we need in order to calculate the standard deviation of the sample mean, but we can use the sample standard deviation, s, as an estimate of $\sigma$. This is what we did in the previous problem, where we supposed the sample standard deviation is 35 and used that number in the calculation of the standard deviation of the sample mean.

Page 16: Confidence Intervals

General Confidence Intervals

The most frequently used confidence levels are 80%, 90%, 95%, and 99%.

Suppose we want to calculate a confidence interval based on a confidence level of L% (where L could be 80, 90, 95, 99, or any number). The book uses $1 - \alpha$ notation for the confidence level.

Page 17: Confidence Intervals

Calculating a Confidence Interval

First, go out and take a sample of size n, and calculate the sample mean, $\bar{X}$, and the standard deviation of the sample, s.

To form the confidence interval, add and subtract a certain number of standard deviations from the sample mean, where a “standard deviation” is

$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$

The number of standard deviations you move is called the confidence coefficient, $Z_L$, and is based on the confidence level L. The confidence interval is

$$\left(\bar{X} - Z_L \frac{s}{\sqrt{n}},\ \bar{X} + Z_L \frac{s}{\sqrt{n}}\right)$$

Page 18: Confidence Intervals

Confidence Coefficient

If your confidence level is L%, then to calculate the appropriate confidence coefficient:
Take L/2 (as a decimal, e.g., 0.95/2 = 0.475)
Look this number up as a probability in the standard normal table (meaning, find the number closest to it in the body of the table, because the numbers in the body of the table are probabilities whereas the numbers on the left and top are Z’s)
Find the Z that this probability corresponds to
This Z is your confidence coefficient

Page 19: Confidence Intervals


Intuition of Confidence Intervals

The basic idea of a confidence interval is that you want to start with the sample estimate (mean or proportion) and then move up and down a certain number of standard deviations so that you cover 95% (or 90% or 99% - depending on your confidence level) of the area.

The number of standard deviations you have to move depends on the confidence level. For a 95% confidence level you must move 1.96 standard deviations up and down so that 0.4750 is between 0 and 1.96 standard deviations and so 0.95 (2*0.475) is between –1.96 and 1.96 standard deviations

Page 20: Confidence Intervals

95% Confidence Interval

[Figure: standard normal curve (Z). P(−1.96 < Z < 0) = 0.4750 and P(0 < Z < 1.96) = 0.4750, so 0.95 of the area lies between −1.96 and 1.96]

Page 21: Confidence Intervals

Example

Suppose you want to find the confidence coefficient for a confidence level of 90% (L = 0.90)
» Take 0.90/2 = 0.45
» Try to find the number in the body of the table as close to 0.45 as you can
» Note that you see .4495 and .4505, and these are as close to 0.45 as you can get (it doesn’t matter which of these two you choose, but we will go with .4495)
» The .4495 number corresponds to Z = 1.64, so the confidence coefficient for a 90% confidence interval is 1.64

If the confidence level is 95% then the confidence coefficient is 1.96

If the confidence level is 99% then the confidence coefficient is 2.57 (or 2.58)
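The table lookup can also be done numerically. The sketch below is not the slides' method; it swaps the printed table for the inverse normal CDF in Python's standard library. The function name `confidence_coefficient` is an assumption made for illustration. Because the area between 0 and $Z_L$ is L/2, the cumulative area to the left of $Z_L$ is 0.5 + L/2.

```python
from statistics import NormalDist


def confidence_coefficient(level: float) -> float:
    """Z value such that the area between -Z and +Z under the standard normal equals `level`."""
    # Area between 0 and Z is level/2, so the area to the left of Z is 0.5 + level/2.
    return NormalDist().inv_cdf(0.5 + level / 2)


for level in (0.80, 0.90, 0.95, 0.99):
    print(level, round(confidence_coefficient(level), 3))
# 0.8 1.282
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```

The exact 90% value, 1.645, sits halfway between the two table entries (1.64 and 1.65), which is why either choice is acceptable; likewise 2.576 explains the "2.57 (or 2.58)" for 99%.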

Page 22: Confidence Intervals

Standard-Normal Distribution (entries in the body are P(0 < Z < z); the row gives z to one decimal place, the column gives the second decimal place)

Z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1  0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2  0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3  0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4  0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5  0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6  0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7  0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8  0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9  0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0  0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1  0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2  0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3  0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4  0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5  0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6  0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7  0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8  0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9  0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0  0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1  0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2  0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3  0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4  0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5  0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6  0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7  0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8  0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9  0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0  0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

Page 23: Confidence Intervals

Example

Dell Publishing samples 48 shipments to estimate the mean postal cost. The sample mean is $25.36 with a standard deviation of $4.80. Calculate the 98% confidence interval for the mean postal cost.
Note that the sample size is > 30, so the Central Limit Theorem applies and the sample mean is approximately normally distributed.
$\bar{X}$ = 25.36, s = 4.80, n = 48, $Z_{.98}$ = 2.33

The 98% confidence interval for the mean postal cost is

$$\left(\bar{X} - Z_L \frac{s}{\sqrt{n}},\ \bar{X} + Z_L \frac{s}{\sqrt{n}}\right) = \left(25.36 - 2.33\,\frac{4.80}{\sqrt{48}},\ 25.36 + 2.33\,\frac{4.80}{\sqrt{48}}\right) = (23.75,\ 26.97)$$
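The large-sample interval for a mean is short enough to express as a function. This is a sketch under the slide's assumptions (n > 30, s used in place of the population standard deviation, Z taken from the table); `mean_interval` is a name chosen here for illustration.

```python
import math


def mean_interval(xbar: float, s: float, n: int, z: float):
    """Large-sample confidence interval for a mean: xbar +/- z * s / sqrt(n)."""
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# Postal-cost example: 48 shipments, mean 25.36, standard deviation 4.80, Z = 2.33 for 98%
low, high = mean_interval(xbar=25.36, s=4.80, n=48, z=2.33)
print(f"({low:.2f}, {high:.2f})")  # (23.75, 26.97)
```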

Page 24: Confidence Intervals

Confidence Intervals for Proportions

Suppose we want to form a confidence interval for the population proportion, $p$.

Recall the distribution of the sample proportion $\hat{p}$:

$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$

Page 25: Confidence Intervals

The confidence interval for the population proportion is calculated in a similar manner to that for the population mean:

$$\left(\hat{p} - Z_L \sigma_{\hat{p}},\ \hat{p} + Z_L \sigma_{\hat{p}}\right)$$

where

$$\sigma_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Note that in calculating the standard deviation of the sample proportion, we are using our sample proportion, $\hat{p}$, and not the population proportion, $p$.
» This is because we DON’T KNOW WHAT THE POPULATION PROPORTION IS. WE ARE TRYING TO FORM AN INTERVAL IN WHICH WE THINK THE POPULATION PROPORTION IS
» So we use our best guess of $p$, which is $\hat{p}$

Page 26: Confidence Intervals

Example

In a sample of 1200 tourists, 840 plan to repeat their trips the following year. Calculate the 95% confidence interval for the proportion of travelers who expect to repeat their trips.
$\hat{p}$ = 840/1200 = 0.70, n = 1200, $Z_{.95}$ (calculated the same way as with confidence intervals for means) = 1.96, and

$$\sigma_{\hat{p}} = \sqrt{\frac{0.70(1 - 0.70)}{1200}} = 0.0132$$

The 95% confidence interval for the population proportion is

$$\hat{p} \pm Z_L \sigma_{\hat{p}} = (0.70 - 1.96 \times 0.0132,\ 0.70 + 1.96 \times 0.0132) = (0.674,\ 0.726)$$

We are 95% sure that the true proportion of individuals who expect to repeat their trips is between .674 and .726.
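The proportion interval follows the same pattern as the interval for a mean. The sketch below reproduces the tourist example; the function name `proportion_interval` and its parameters are illustrative choices, not part of the slides.

```python
import math


def proportion_interval(successes: int, n: int, z: float):
    """Large-sample confidence interval for a proportion, using p_hat in the standard deviation."""
    p_hat = successes / n
    std_p_hat = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * std_p_hat
    return p_hat - margin, p_hat + margin


# 840 of 1200 tourists plan to repeat their trip; Z = 1.96 for 95%
low, high = proportion_interval(successes=840, n=1200, z=1.96)
print(f"({low:.3f}, {high:.3f})")  # (0.674, 0.726)
```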

Page 27: Confidence Intervals

Margin of Error or Sampling Error

Margin of Error or Sampling Error is the distance between the estimate and the true parameter:
Margin of Error = | estimate − parameter |

As a byproduct of a confidence interval, we can also calculate our maximum margin of error (once again, this “maximum” is in a probabilistic sense; for example, we are 95% sure that our maximum margin of error is a certain amount).

Confidence interval: $\hat{p} \pm Z_L \sigma_{\hat{p}}$

Margin of error: $Z_L \sigma_{\hat{p}}$

Page 28: Confidence Intervals

Margin of Error

Suppose the true proportion is 0.71.
If your estimate is .69 then your margin of error is .02.
If your estimate is .74 then your margin of error is .03.

With a population proportion of 0.71, the standard deviation of the sample proportion is (assume a sample size of n = 200)

$$\sigma_{\hat{p}} = \sqrt{\frac{0.71(1 - 0.71)}{200}} = 0.032$$

Page 29: Confidence Intervals

Note that 90% of the sample proportions are within 1.64 standard deviations of the true proportion.
90% are between 0.71 − (1.64)(0.032) and 0.71 + (1.64)(0.032).
90% are between 0.658 and 0.763.
The margin of error is less than (1.64)(.032) = 0.053 for 90% of the sample proportions.

If you form a 90% confidence interval, then you are also 90% sure that your margin of error is no larger than the part that you add and subtract from your estimate.
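As a quick check of the 0.053 figure, the maximum margin of error is simply the confidence coefficient times the standard deviation of the sample proportion. The helper name `max_margin_of_error` is hypothetical.

```python
import math


def max_margin_of_error(p: float, n: int, z: float) -> float:
    """Largest distance between the estimate and p at the confidence level implied by z."""
    return z * math.sqrt(p * (1 - p) / n)


# True proportion 0.71, sample size 200, Z = 1.64 for 90%
margin = max_margin_of_error(p=0.71, n=200, z=1.64)
print(round(margin, 3))  # 0.053

# The two estimates from the previous slide both fall inside that margin
for estimate in (0.69, 0.74):
    print(estimate, round(abs(estimate - 0.71), 2))
```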

Page 30: Confidence Intervals

Calculating the Appropriate Sample Size: Means

In some applications, you may want to know how large a sample is necessary in order for your margin of error to be no larger than a certain amount.
Recall that the standardized value of x is found as

$$Z = \frac{x - \mu}{\sigma}$$

For the sample mean:

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

Rearranging and solving for n yields

$$n = \frac{Z^2 s^2}{(\bar{X} - \mu)^2}$$

» Where Z is determined by the confidence level, s is the standard deviation of the sample, and $\bar{X} - \mu$ is the maximum margin of error

Page 31: Confidence Intervals

Calculating the Appropriate Sample Size: Means

As a restaurant owner, you need to decide how much food to prepare each night. In a sample of 100 nights, the mean number of customers is 85 with a standard deviation of 38. How large must your sample be if you want to be 99% sure that your sampling error is no larger than 4?
$\bar{X} - \mu$ = 4, s = 38, Z = 2.57
» Therefore, plugging these numbers into the formula from the last page:

$$n = \frac{Z^2 s^2}{(\bar{X} - \mu)^2} = \frac{(2.57)^2 (38)^2}{4^2} \approx 596.1$$

» Interpretation: rounding up, you have to take a sample of 597 nights in order to be 99% sure that your maximum margin of error is no larger than 4 customers
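The sample-size formula for means can be evaluated directly: (2.57)² × (38)² / 4² = 6.6049 × 1444 / 16 ≈ 596.1, which rounds up to 597. The sketch below assumes the result should be rounded up to the next whole observation; `sample_size_for_mean` is a name invented for this example.

```python
import math


def sample_size_for_mean(z: float, s: float, max_error: float) -> int:
    """Smallest n with z * s / sqrt(n) <= max_error, i.e. n = z^2 s^2 / max_error^2 rounded up."""
    raw = (z * s / max_error) ** 2
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(raw, 6))


# Restaurant example: Z = 2.57 for 99%, s = 38, margin of error at most 4 customers
print(sample_size_for_mean(z=2.57, s=38, max_error=4))  # 597
```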

Page 32: Confidence Intervals

Calculating the Appropriate Sample Size: Proportions

In some applications, you may want to know how large a sample is necessary in order for your margin of error to be no larger than a certain amount.
Recall that the standardized value of x is found as

$$Z = \frac{x - \mu}{\sigma}$$

For the sample proportion:

$$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}}$$

» Note that we have used $\hat{p}$ in calculating the standard deviation of the sample proportion.

Page 33: Confidence Intervals

Calculating the Appropriate Sample Size: Proportions

Rearranging and solving for n yields

$$n = \frac{Z^2\, \hat{p}(1 - \hat{p})}{(\hat{p} - p)^2}$$

» Where Z is determined by the confidence level, and $\hat{p} - p$ is the maximum margin of error
» Problem: we are trying to figure out how large a sample to take, so we haven’t taken a sample yet and we don’t know what $\hat{p}$ is
» Solution:
– Use 0.5 for the value of $\hat{p}$
– Or take a pilot survey/sample (which is like a preliminary sample) to get an idea of the estimate you are likely to get when you take your real sample

Page 34: Confidence Intervals

Calculating the Appropriate Sample Size: Proportions

In a survey, CNN wants to estimate the proportion of Americans who plan to travel this Christmas. If they want to be 95% sure that their estimate is off by no more than 2%, how many people must they survey?
They want $\hat{p} - p$ = .02, and to be 95% sure they need to move 1.96 standard deviations (so Z = 1.96). Therefore, using the formula on the previous page with $\hat{p}$ = 0.5:

$$n = \frac{(1.96)^2 (0.5)^2}{(.02)^2} = 2401$$

Interpretation: if CNN takes a sample of 2401 then they can be 95% sure that their estimate is no more than 2% from the true population proportion.
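The same calculation in code, as a sketch. The default of 0.5 for the unknown proportion follows the slide's suggestion; it is also the value that maximizes p(1 − p), so it gives the most conservative (largest) sample size. The function name `sample_size_for_proportion` is an assumption for illustration.

```python
import math


def sample_size_for_proportion(z: float, max_error: float, p_guess: float = 0.5) -> int:
    """n = z^2 * p(1 - p) / max_error^2, rounded up to a whole observation."""
    raw = z ** 2 * p_guess * (1 - p_guess) / max_error ** 2
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(raw, 6))


# CNN example: 95% confidence (Z = 1.96), estimate off by no more than 0.02
print(sample_size_for_proportion(z=1.96, max_error=0.02))  # 2401
```

If a pilot survey suggests a proportion other than 0.5, passing it as `p_guess` gives a smaller required sample.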

Page 35: Confidence Intervals

Small Sample Confidence Intervals

The previous section on constructing confidence intervals is valid if the sample size is > 30.
The Central Limit Theorem approximation is relied on only when the sample size is > 30.
If the sample size is < 30, the sample mean is not treated as approximately normally distributed. Instead, assuming the population itself is roughly normal, the sample mean standardized with s has a STUDENT-T distribution.
» The Student-t distribution looks like the normal distribution, but it has more area in the tails of the distribution
» The ONLY difference when constructing confidence intervals for small samples versus large samples is that for small samples you use a “t” number instead of a Z number.

Page 36: Confidence Intervals

In the formula

$$\left(\bar{X} - Z_L \frac{s}{\sqrt{n}},\ \bar{X} + Z_L \frac{s}{\sqrt{n}}\right)$$

use a “t” number instead of a $Z_L$ number.

Page 37: Confidence Intervals

Choosing a “t”

Which t number should you use?
There is a different “t-distribution” for every “degree of freedom”
» DEGREES OF FREEDOM = n − 1
» The numbers in the body of the t-distribution table are t values (NUMBERS OF STANDARD DEVIATIONS), not probabilities
» If your sample size is 28 then you have 28 − 1 = 27 degrees of freedom.

As the degrees of freedom get larger, the t-distribution looks more and more like the normal distribution.

Page 38: Confidence Intervals

If you want to form a 90% confidence interval (and you have 27 degrees of freedom), choose the column headed by “.05” and the row for 27 degrees of freedom. You should find a value of 1.703.
» t numbers are subscripted by the degrees of freedom and by how much area is beyond a certain point
– In this example, we would have $t_{27,.05}$ = 1.703
» We will also start doing this with our Z numbers, subscripting the Z’s with the amount of area BEYOND a certain point
» If we had a LARGE sample and wanted to form a 95% confidence interval, the area beyond the point is .025, so we would use $Z_{.025}$ = 1.96
» Notice that the column headings are the areas in the upper TAIL of the t-distribution
» Therefore, the area between 0 and a given number of standard deviations is .5 minus the area in the upper tail

Page 39: Confidence Intervals

Interpretation: for a t-distribution with 27 degrees of freedom, you need to move 1.703 standard deviations from the mean to have .45 between 0 and 1.703.
And therefore, .90 between −1.703 and 1.703.
See the next slide for a graph.
Note: if you have a small sample, you cannot use the table to form a confidence interval of just any confidence level. You can only look up confidence levels of 60%, 80%, 90%, 95%, 98%, and 99%.

Page 40: Confidence Intervals

90% Confidence Interval with Small Samples

[Figure: Student-t curve. P(−1.703 < t < 0) = 0.45, P(0 < t < 1.703) = 0.45, and P(t > 1.703) = 0.05]

Page 41: Confidence Intervals

Finding t-Numbers

Suppose you have 21 observations and you want to form a 95% confidence interval: $t_{20,.025}$ = 2.086

Suppose you have 30 observations and you want to form an 80% confidence interval: $t_{29,.10}$ = 1.311

Suppose you have 16 observations and you want to form a 99% confidence interval: $t_{15,.005}$ = 2.947
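Finding the right table cell comes down to two small calculations: degrees of freedom = n − 1, and upper-tail area = (1 − confidence level)/2. The sketch below computes the row and column to look up; the dictionary holds only the three t values quoted on this slide, not a full t table, and the names `t_table_cell` and `SLIDE_T_VALUES` are made up for this illustration.

```python
def t_table_cell(n: int, level: float):
    """Return (degrees of freedom, upper-tail area) identifying the cell of a t table."""
    df = n - 1
    tail_area = round((1 - level) / 2, 3)  # area beyond the t value in one tail
    return df, tail_area


# Only the three values quoted on the slide; a real t table has many more entries.
SLIDE_T_VALUES = {(20, 0.025): 2.086, (29, 0.1): 1.311, (15, 0.005): 2.947}

for n, level in ((21, 0.95), (30, 0.80), (16, 0.99)):
    cell = t_table_cell(n, level)
    print(n, level, cell, SLIDE_T_VALUES[cell])
```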

Page 42: Confidence Intervals

Example

Dell Publishing samples 18 shipments to estimate the mean postal cost. The sample mean is $25.36 with a standard deviation of $4.80. Calculate the 98% confidence interval for the mean postal cost.
Note that the sample size is < 30, so we do not rely on the Central Limit Theorem and use the Student-t distribution instead.
$\bar{X}$ = 25.36, s = 4.80, n = 18, $t_{17,.01}$ = 2.567

The 98% confidence interval for the mean postal cost is

$$\left(\bar{X} - t_{17,.01} \frac{s}{\sqrt{n}},\ \bar{X} + t_{17,.01} \frac{s}{\sqrt{n}}\right) = \left(25.36 - 2.567\,\frac{4.80}{\sqrt{18}},\ 25.36 + 2.567\,\frac{4.80}{\sqrt{18}}\right) = (22.46,\ 28.26)$$
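The small-sample calculation is identical to the large-sample one except that the multiplier comes from the t table. The sketch below takes the t value as an input (2.567, read from the table for 17 degrees of freedom and .01 in the tail) rather than computing it, since Python's standard library has no t distribution; `small_sample_interval` is a hypothetical name.

```python
import math


def small_sample_interval(xbar: float, s: float, n: int, t_value: float):
    """Small-sample confidence interval for a mean: xbar +/- t * s / sqrt(n)."""
    margin = t_value * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# 18 shipments, mean 25.36, standard deviation 4.80, t(17, .01) = 2.567 from the table
low, high = small_sample_interval(xbar=25.36, s=4.80, n=18, t_value=2.567)
print(f"({low:.2f}, {high:.2f})")  # (22.46, 28.26)
```

Compared with the 48-shipment version of this example, the interval is wider both because n is smaller and because the t multiplier (2.567) is larger than the Z multiplier (2.33).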