Chapter 8: Confidence Intervals · Section 8.3: Confidence Interval for a Population Proportion


Intro to Statistics Class Notes – Professor Tran Page 1

Problem: In a survey of 1007 people, the report tells us that 85% of the adult population knows what Twitter is.

Question: There were 325.7 million people in the U.S. in 2017. How can 1007 respondents reflect the whole U.S. population?

Goal: Use sample data to make inferences about a population – inferential statistics. For this

section: use a sample proportion to make an inference for a population proportion.

Population proportion is a parameter that describes a percentage value associated with a

population.

Point Estimate

A single value used to approximate a population parameter.

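$\hat{p}$: sample proportion.

$p$: population proportion.

$\hat{p}$ is the best point estimate of the population proportion $p$.

$\hat{p} = \frac{x}{n}$, where x = number of subjects in the sample with the characteristic of interest and n = sample size.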

Example: A survey conducted at a local high school found that 700 of 1000 students use

Facebook. Based on the result, find the point estimate of the proportion for all students

who use Facebook.

$\hat{p} = \frac{x}{n} = \frac{700}{1000} = 0.7$

Confidence Intervals(CI)

An interval of numbers, along with a measure of the likelihood that the interval

contains the unknown parameter based on the point estimate.

Confidence Level

The probability that the confidence interval actually does contain the population

parameter, assuming that the estimation process is repeated a large number of

times. The notation we use is $(1-\alpha)\cdot 100\%$ for the confidence level.

Section 8.3: Confidence Interval for a Population Proportion

Chapter 8: Confidence Intervals


Most Common Confidence Levels       Corresponding Values of $\alpha$
90% (or 0.90) confidence level      $\alpha = 0.10$
95% (or 0.95) confidence level      $\alpha = 0.05$
99% (or 0.99) confidence level      $\alpha = 0.01$

$\alpha$ is the probability that the confidence interval will not contain the true parameter value.

For $\alpha = 0.01$, the confidence level is $(1-\alpha)\cdot 100\% = (1-0.01)\cdot 100\% = 99\%$. It tells us that if 100 different confidence intervals are constructed, each based on a different sample from the same population, then we expect about 99 of the intervals to contain the true parameter and about 1 not to contain it.

95% confidence level is the most common because it provides a good balance

between precision and reliability.

Critical Values: $z_{\alpha/2}$

Recall that a z-score is the number of standard deviations a data point lies from the mean.

$z_{\alpha/2}$: the z-score on the standard normal distribution with an area of $\alpha/2$ to its right. This number is on the borderline separating sample statistics that are likely to occur from those that are unlikely.

Example: Find the critical value $z_{\alpha/2}$ corresponding to a 95% confidence level.

A 95% confidence level corresponds to $\alpha = 0.05$, and thus $\alpha/2 = 0.025$. This is the red area in the right tail, and thus the cumulative area to its left must be $1 - 0.025 = 0.975$.


Thus, $z_{\alpha/2} = 1.96$.

To find the critical values using TI-83/84 Plus:

1. 2nd then VARS (which is DISTR) and then go to invNorm

2. Enter invNorm(area to the left) = invNorm(0.9750) = 1.959963986 ≈ 1.96

Note: DISTR = Distributions. invNorm( ) computes the inverse cumulative normal distribution function for a given area under the normal distribution curve.
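As a cross-check away from the calculator, the same lookup can be sketched in Python. This is only an illustration: it assumes Python 3.8+ so that `statistics.NormalDist` is available, and the helper name `z_critical` is made up for this example rather than taken from the notes.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Return z_{alpha/2} for a two-sided confidence level such as 0.95."""
    alpha = 1 - confidence_level
    # Like invNorm, inv_cdf takes the cumulative area to the LEFT of the value.
    return NormalDist().inv_cdf(1 - alpha / 2)


# Print the critical value for each of the three common confidence levels.
for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

The 90% and 95% values agree with 1.645 and 1.96. For 99% the computed value is about 2.5758, which many printed tables list as 2.575.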


Confidence Level    $\alpha$    Critical Value $z_{\alpha/2}$
90%                 0.10        1.645
95%                 0.05        1.96
99%                 0.01        2.575

Margin of Error (E):

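The maximum likely difference between the observed sample proportion $\hat{p}$ and the true value of the population proportion $p$.

$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, where $\hat{q} = 1 - \hat{p}$.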

Example: for a 95% confidence level, $\alpha = 0.05$, so there is a probability of 0.05 that

the sample proportion will be in error by more than E.

Constructing Confidence Interval (CI) for p:

Requirements for constructing a confidence interval for p from a sample of size n drawn from a population of size N:

1. The sample is a simple random sample (a subset of the population in which each member of the population has an equal probability of being chosen).

2. The conditions for a binomial distribution are satisfied.

3. The normal approximation to the binomial is appropriate: there are at least 5 successes and at least 5 failures, or $n\hat{p}(1-\hat{p}) \ge 10$.

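Confidence Interval: $\hat{p} \pm E$ or $(\hat{p} - E,\ \hat{p} + E)$

Lower bound: $\hat{p} - E = \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Upper bound: $\hat{p} + E = \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$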


Example: A Pew Research Center poll of 1007 randomly selected U.S. adults showed that

85% of the respondents know what Twitter is. The sample results are $n = 1007$ and $\hat{p} = 0.85$.

a) Find the margin of error E that corresponds to a 95% confidence level.

b) Find the 95% confidence interval estimate of the population proportion p.

c) Based on the results, can we safely conclude that more than 75% of adults know

what Twitter is?

Answers:


Conclusion: We are 95% confident that the proportion of adults who know what Twitter is lies between 0.828 and 0.872.
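The arithmetic behind parts (a) and (b) can be sketched in Python as a check on hand or calculator work. The function name `proportion_ci` is illustrative only, and the critical value 1.96 is taken from the table of common confidence levels rather than computed.

```python
import math


def proportion_ci(p_hat, n, z):
    """Return (margin of error, lower bound, upper bound) for a proportion."""
    e = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return e, p_hat - e, p_hat + e


# Twitter poll: n = 1007, p-hat = 0.85, z = 1.96 for a 95% confidence level.
e, lower, upper = proportion_ci(0.85, 1007, 1.96)
print(f"E = {e:.3f}, CI = ({lower:.3f}, {upper:.3f})")
```

For part (c), the whole interval lies above 0.75, which supports the claim that more than 75% of adults know what Twitter is.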

Determining Sample Size:

How many sample units n must be obtained?

When an estimate $\hat{p}$ is known: $n = \frac{z_{\alpha/2}^2\,\hat{p}\hat{q}}{E^2}$

When no estimate $\hat{p}$ is known: $n = \frac{z_{\alpha/2}^2 \cdot 0.25}{E^2}$

Round-off rule: always round n up to the next larger whole number if n is not a whole number.

Example: Gap, Banana Republic, J. Crew, Yahoo, and America Online are just a few of the many

companies interested in knowing the percentage of adults who buy clothing online. How many

adults must be surveyed in order to be 95% confident that the sample percentage is in error by no more than three percentage points?

a) Using this recent result from the Census Bureau: 66% of adults buy clothing online

b) Assume that we have no prior information suggesting a possible value of proportion.
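Both parts of this example can be worked with a short Python sketch of the two sample-size formulas. The function name `sample_size_proportion` is hypothetical, z = 1.96 comes from the critical-value table, and E = 0.03 is three percentage points written as a proportion.

```python
import math


def sample_size_proportion(z, margin, p_hat=None):
    """Sample size needed to estimate a proportion to within `margin`."""
    # With no prior estimate, p-hat * q-hat is replaced by its maximum, 0.25.
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    # Round-off rule: always round up to the next whole number.
    return math.ceil(z ** 2 * pq / margin ** 2)


n_a = sample_size_proportion(1.96, 0.03, p_hat=0.66)  # part (a)
n_b = sample_size_proportion(1.96, 0.03)              # part (b)
print(n_a, n_b)
```

Under these assumptions part (a) needs 958 adults and part (b) needs 1068. Having no prior estimate costs a larger sample because 0.25 is the largest value the product $\hat{p}\hat{q}$ can take.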


Finding the Point Estimate and E from a Confidence Interval

$\hat{p} = \frac{(\text{upper confidence interval limit}) + (\text{lower confidence interval limit})}{2}$

$E = \frac{(\text{upper confidence interval limit}) - (\text{lower confidence interval limit})}{2}$
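As a small illustration, the two formulas can be applied to the Twitter interval (0.828, 0.872) found earlier. The function name `estimate_and_margin` is invented for this sketch.

```python
def estimate_and_margin(lower, upper):
    """Recover the point estimate and margin of error from interval limits."""
    point_estimate = (upper + lower) / 2   # midpoint of the interval
    margin = (upper - lower) / 2           # half the interval width
    return point_estimate, margin


p_hat, e = estimate_and_margin(0.828, 0.872)
print(round(p_hat, 3), round(e, 3))
```

This recovers $\hat{p} = 0.85$ and $E = 0.022$, the values the interval was built from.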


TI-84 Plus Calculator Applications

Example: A Pew Research Center poll of 1007 randomly selected U.S. adults showed that

85% of the respondents know what Twitter is. The sample results are $n = 1007$ and $\hat{p} = 0.85$.

Find the 95% confidence interval estimate of the population proportion $p$.

On TI 83/84 Calculator:

1. Press STAT, click on right arrow to TESTS, move down to 1-PropZInt, press Enter.

2. Input given information:

X (the number observed): 856

N (sample size): 1007

C-Level (confidence level): 0.95

3. Arrow down to Calculate, press Enter to run.

We can see that the confidence interval is (0.828, 0.8721), which matches what we got earlier.

Note: 1-PropZInt is the one-proportion z interval. It calculates a confidence interval for a population proportion at a specified confidence level. For example, if the confidence level is 95%, you are 95% confident that the proportion lies within the interval you get. The command assumes that the sample is large enough that the normal approximation to the binomial distribution is valid.


Problem: We would like to estimate the mean GPA of US senior high school students using

our class data (assuming it is a representative random sample). Moreover, we wish to

determine the margin of error for our estimate to have a measure of its precision.

Goal: In the last section, we used sample proportion to estimate population proportion.

Now, we use sample mean to make inference about the value of the population mean.

Part I: Estimating a Population Mean When Standard Deviation Is Not Known

Student’s t-distribution is used when the population standard deviation $\sigma$ is not known and the sample size is small (n < 30). If $\sigma$ is known, then using the normal distribution is correct.

Student t Distribution

Let $\mu$ = population mean, $\bar{x}$ = sample mean, n = number of sample values taken from a population, E = margin of error, s = sample standard deviation.

If a population has a normal distribution, then the statistic

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

computed from a sample of size n follows Student’s t-distribution with df = n - 1 degrees of freedom.

If the original population is not itself normally distributed, we use the condition

n>30 for justifying use of the normal distribution.

We can think of the t-distribution as the z distribution but with an adjusted standard

deviation that increases for smaller sample sizes to account for a larger margin of

error.

Degree of freedom df = n-1: the number of values in the final calculation of a

statistic that are free to vary. For instance, if 10 test scores have the restriction that

their mean is 80, then their sum must be 800, and we can freely assign values to the

first 9 scores, but the 10th score would then be determined.

The critical value $t_{\alpha/2}$ can be found using technology (TI 83/84) or Table A-3.

Sections 8.1 & 8.2: Confidence Interval for a Population Mean


Example: A simple random sample of size n = 12 is selected from a normally distributed population. Find the critical value $t_{\alpha/2}$ corresponding to a 95% confidence level.

Answer: n = 12, thus df = 12 – 1 = 11.

A 95% confidence level gives $\alpha = 0.05$, so $\alpha/2 = 0.025$.

We use the command:

invT(area to the left of value, degrees of freedom)

On TI 83/84 press: 2nd, VARS (DISTR), arrow down to invT, press Enter.

Please note that invT( ) is not available on the TI-83; you need to download the program to your TI-83.

Or we can use table A-3 as in the next page:


Constructing Confidence Interval (CI) for $\mu$:

Requirements:

1. The sample is a simple random sample.

2. Either or both of these conditions is satisfied: The population is normally

distributed or n>30.

Confidence Interval: $\bar{x} - E < \mu < \bar{x} + E$

Let $\mu$ = population mean, $\bar{x}$ = sample mean, n = number of sample values taken from a population, E = margin of error, s = sample standard deviation.

Lower bound: $\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}}$

Upper bound: $\bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$

Note: As the level of confidence increases, the margin of error increases and the interval becomes wider. As the sample size increases, the margin of error decreases and the interval becomes narrower.


Example: Constructing a confidence interval for Highway Speeds.

The data below lists the speeds (mi/h) measured from southbound traffic on I-280 near

Cupertino, California. This simple random sample was obtained at 3:30pm on a weekday.

The speed limit for this road is 65 mi/h. Use the sample data to construct a 95% confidence

interval for the mean speed. What does the confidence interval suggest about the speed

limit?

62 61 61 57 61 54 59 58 59 69 60 67

Answers:

Requirements check: This is a simple random sample, and we can use the dotplot to verify

that the speeds have a distribution that is not dramatically different from a normal

distribution.

Conclusion: It appears that the mean speed is below the speed limit of 65 mph.

To find sample mean and standard deviation using TI 83/84:

1. STAT , choose Edit… , enter the data in column L1

2. Go back to STAT, right arrow to CALC, press 1-Var Stats, 2nd 1 (L1), hit Enter

3. $\bar{x}$ is the sample mean, and Sx is the sample standard deviation
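The highway-speed interval can also be reproduced with a short Python sketch using only the standard library. One value is assumed rather than computed: the critical value $t_{\alpha/2} = 2.201$ for df = 11 at 95% confidence is read from a standard t table, because the standard library has no inverse t function.

```python
import math
import statistics

speeds = [62, 61, 61, 57, 61, 54, 59, 58, 59, 69, 60, 67]
n = len(speeds)
x_bar = statistics.mean(speeds)
s = statistics.stdev(speeds)  # sample standard deviation (divides by n - 1)

# Assumption: t_{alpha/2} = 2.201 for df = 11 at 95% confidence, from a t table.
t_crit = 2.201
e = t_crit * s / math.sqrt(n)
lower, upper = x_bar - e, x_bar + e
print(f"x-bar = {x_bar:.2f}, s = {s:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```

The interval comes out near (58.1, 63.3) mi/h. Its upper limit is below 65, which is consistent with the conclusion that the mean speed is below the speed limit.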


Finding a Point Estimate and E from a Confidence Interval

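Similar to Section 8.3:

$\bar{x} = \frac{(\text{upper confidence interval limit}) + (\text{lower confidence interval limit})}{2}$

$E = \frac{(\text{upper confidence interval limit}) - (\text{lower confidence interval limit})}{2}$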

Determining Sample Size

How many sample units n must be obtained?

$n = \left[\frac{z_{\alpha/2}\,\sigma}{E}\right]^2$, where $\sigma$ = population standard deviation, and we use z-statistics, not t-statistics.

Usually $\sigma$ is unknown. We can use the range rule of thumb, $\sigma \approx \frac{\text{range}}{4}$, or we can use the first several values of the sample to calculate the sample standard deviation s and use it in place of $\sigma$, so the formula usually becomes $n = \left[\frac{z_{\alpha/2}\,s}{E}\right]^2$.

Example: Assume that we want to estimate the mean IQ score for the population of statistics students. How many statistics students must be randomly selected for IQ tests if we want 95% confidence that the sample mean is within 3 IQ points of the population mean? Assume s = 15.

Answer: Because we want the sample mean to be within 3 IQ points of $\mu$, the margin of error is E = 3.
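The remaining arithmetic for this example can be sketched in Python. The function name `sample_size_mean` is made up for illustration, and z = 1.96 is the 95% critical value from the earlier table.

```python
import math


def sample_size_mean(z, sigma, margin):
    """Sample size needed to estimate a mean to within `margin`."""
    # Round-off rule: always round up to the next whole number.
    return math.ceil((z * sigma / margin) ** 2)


# IQ example: z = 1.96, s = 15 used in place of sigma, E = 3.
n_iq = sample_size_mean(1.96, 15, 3)
print(n_iq)
```

Here $(1.96 \cdot 15 / 3)^2 = 96.04$, which rounds up to 97 students.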


Part II: Estimating a Population Mean When Standard Deviation Is Known

It is very rare in the real world that we know the standard deviation of the population. If we do, we can construct the confidence interval using the standard normal distribution instead of Student’s t-distribution.

Lower bound: $\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

Upper bound: $\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

Example: Constructing a confidence interval for Highway Speeds.

The data below lists the speeds (mi/h) measured from southbound traffic on I-280 near

Cupertino, California. This simple random sample was obtained at 3:30pm on a weekday.

The speed limit for this road is 65 mi/h. Use the sample data to construct a 95% confidence

interval for the mean speed, assuming that $\sigma = 4.1$. What does the confidence interval

suggest about the speed limit?

62 61 61 57 61 54 59 58 59 69 60 67

Answer: We already checked the requirements in the earlier example.

It appears that the mean speed is still below the speed limit of 65 mph.
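A minimal Python sketch of the known-$\sigma$ interval for the same data is given below. It uses $\sigma = 4.1$ as stated in the example and z = 1.96 from the critical-value table. The variable names are illustrative.

```python
import math
import statistics

speeds = [62, 61, 61, 57, 61, 54, 59, 58, 59, 69, 60, 67]
sigma = 4.1    # population standard deviation, given in the example
z_crit = 1.96  # z_{alpha/2} for a 95% confidence level

x_bar = statistics.mean(speeds)
e = z_crit * sigma / math.sqrt(len(speeds))
lower, upper = x_bar - e, x_bar + e
print(f"E = {e:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```

The interval is roughly (58.3, 63.0) mi/h. It is slightly narrower than the t-based interval for the same data because 1.96 is smaller than the t critical value for 11 degrees of freedom.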


TI-84 Plus Calculator Applications

Example: Constructing a confidence interval for Highway Speeds.

The data below lists the speeds (mi/h) measured from southbound traffic on I-280 near Cupertino,

California. This simple random sample was obtained at 3:30pm on a weekday. The speed limit for

this road is 65 mi/h. Use the sample data to construct a 95% confidence interval for the mean

speed.

62 61 61 57 61 54 59 58 59 69 60 67

On TI 83/84 Calculator:

1. Input data in a list: press STAT and, under EDIT, choose Edit…. Then enter all the data in column L1, L2, or L3.

2. Press STAT, right arrow to TESTS, and pick TInterval.

3. On the screen, input the details:

Inpt: Data (choose Data, hit Enter)

List: pick the list (L1, L2, or L3) in which you stored the data.

Freq: 1

C-Level: (confidence level) = 0.95 for 95%

4. Arrow down to Calculate, hit Enter.

The TInterval command calculates a confidence interval for the mean value of a population at a specified confidence level: for example, if the confidence level is 95%, you are 95% confident that the mean lies within the interval you get. Use TInterval when you have a single variable to analyze and don't know the population standard deviation. TInterval assumes that your distribution is normal, but it will work for other distributions if the sample size is large enough.


Goal: In Section 8.3, we used the sample proportion $\hat{p}$ to estimate the population proportion $p$. In Sections 8.1 & 8.2, we used the sample mean $\bar{x}$ to estimate the population mean $\mu$. In this section, we use the sample standard deviation $s$ (or sample variance $s^2$) to estimate the population standard deviation $\sigma$ (or population variance $\sigma^2$).

Point Estimate:

The sample variance $s^2$ is the best point estimate for the population variance $\sigma^2$, but the sample standard deviation $s$ is not the best point estimate (it is a biased estimator) for the population standard deviation $\sigma$.

Chi-Square Distribution ($\chi^2$):

We use the chi-square ($\chi^2$) distribution to construct the confidence interval for this section.

Sample statistic: $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$

We need a normally distributed population.

$\sigma^2$: population variance, n: sample size, $s^2$: sample variance.

n - 1: number of degrees of freedom (df).

It’s not symmetrical like the normal and Student’s t distributions.

Since it’s not symmetrical, the critical values are different for the left tail ($\chi^2_L$) and the right tail ($\chi^2_R$). Use technology or Table A-4 to find them.

As the number of degrees of freedom increases, the chi-square distribution becomes more symmetric.

$\chi^2$ can be zero or positive (not negative).

Section 8.4: Confidence Interval for a Standard Deviation or Variance


Example: Find Critical Values of $\chi^2$

A simple random sample of 22 IQ scores is obtained. Construction of a confidence interval for the population standard deviation $\sigma$ requires the left and the right critical values of $\chi^2$ corresponding to a confidence level of 95% and a sample size of n = 22. Find the critical value of $\chi^2$ separating an area of 0.025 in the left tail, and find the critical value of $\chi^2$ separating an area of 0.025 in the right tail.

Answer:

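n = 22, thus df = n - 1 = 21.

Note: Table A-4 gives ONLY areas to the right of each critical value! The right critical value is therefore looked up under an area of 0.025, and the left critical value under an area of 1 - 0.025 = 0.975.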
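For readers without the table at hand, the two critical values can be approximated with a standard-library Python sketch. This is an illustration under stated assumptions, not the method used in the notes: `chi2_cdf` and `chi2_critical` are hypothetical helper names, the left-tail area is evaluated with the usual series for the regularized incomplete gamma function, and the critical value is then located by bisection.

```python
import math


def chi2_cdf(x, df):
    """Area to the LEFT of x under the chi-square curve with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, h = df / 2, x / 2
    # Series: P(a, h) = h^a e^(-h) / Gamma(a) * sum_k h^k / (a (a+1) ... (a+k))
    term = total = 1 / a
    k = 1
    while term > 1e-16 * total:
        term *= h / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(h) - h - math.lgamma(a))


def chi2_critical(left_area, df):
    """Value with `left_area` to its left, found by bisection on the CDF."""
    lo, hi = 0.0, 10 * df + 50  # generous upper bracket for the search
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < left_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


chi2_left = chi2_critical(0.025, 21)   # area 0.025 in the left tail
chi2_right = chi2_critical(0.975, 21)  # area 0.025 in the right tail
print(round(chi2_left, 3), round(chi2_right, 3))
```

Note that this sketch works with areas to the left, whereas Table A-4 is indexed by areas to the right, so `chi2_critical(0.025, 21)` corresponds to the table column for 0.975.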


Confidence Interval:

Requirements:

Simple random sample.

The population must have normally distributed values (even if the sample is large).

The requirement of a normal distribution is much stricter here than in earlier

sections, so departures from normal distribution can result in large errors.

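Confidence Interval for the Population Variance $\sigma^2$: $\frac{(n-1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_L}$

Confidence Interval for the Population Standard Deviation $\sigma$: $\sqrt{\frac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_L}}$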


Example: The table below shows the sale price of 12 randomly selected 6-year-old Chevy

Corvettes. Construct a 90% confidence interval for the population variance and standard

deviation of the price of a 6-year-old Chevy Corvette.

$41,844 $41,500 $39,995 $36,995 $40,990 $37,995 $41,995 $38,900 $42,995 $36,995 $43,995 $35,950

Check the requirements: This is a simple random sample, and a normal probability

plot below suggests that the price could be normally distributed (since the plot has a

linear shape).


To find sample standard deviation s, use TI 83/84:

Press STAT, under EDIT choose Edit…, and enter the data into column L1.

Press STAT, right arrow to CALC, choose 1-Var Stats, enter L1 for the List, arrow down to Calculate, and hit ENTER.


Conclusion: we are 90% confident that the population standard deviation of the price of all 6-year-old Chevy Corvettes is between $1955 and $4055.
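The Corvette interval can be checked with a short Python sketch. The critical values are an assumption taken from a standard chi-square table for df = 11 at 90% confidence ($\chi^2_L = 4.575$, $\chi^2_R = 19.675$), not computed here. The variable names are illustrative.

```python
import math
import statistics

prices = [41844, 41500, 39995, 36995, 40990, 37995,
          41995, 38900, 42995, 36995, 43995, 35950]
n = len(prices)
s2 = statistics.variance(prices)  # sample variance (divides by n - 1)

# Assumption: table values for df = 11 with 0.05 in each tail.
chi2_left, chi2_right = 4.575, 19.675

# Dividing by the larger critical value gives the lower limit.
var_lower = (n - 1) * s2 / chi2_right
var_upper = (n - 1) * s2 / chi2_left
sd_lower, sd_upper = math.sqrt(var_lower), math.sqrt(var_upper)
print(f"variance: ({var_lower:.0f}, {var_upper:.0f})")
print(f"std dev:  ({sd_lower:.0f}, {sd_upper:.0f})")
```

The standard-deviation limits agree with the $1955 and $4055 stated in the conclusion. The variance limits are simply the squares of those values.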