Post on 05-Jan-2016
6.1 Inference for a Single Proportion
Statistical confidence
Confidence intervals
How confidence intervals behave

Sampling Distribution of a Sample Proportion
After we have selected a sample, we know the responses of the individuals in the sample. However, the reason for taking the sample is to infer from that data some conclusion about the wider population represented by the sample.

Statistical Inference
Statistical inference provides methods for drawing conclusions about a population from sample data.
Population to sample: collect data from a representative sample. Sample to population: make an inference about the population.
Methods for drawing conclusions about a population from sample data are called statistical inference. So we'll use data to make these inferences; i.e., draw conclusions about populations from data in our samples or from our experiments.
We'll consider two types of inference: confidence interval estimation and tests of significance. In both cases, we'll treat our data either as a random sample from a population or as data from a randomized experiment.
Start with estimation. There are two situations we'll consider: estimating the mean μ of a population of measurements, and estimating the proportion p of successes (S's) in a population of successes and failures (S's and F's).
In either case, we'll construct a confidence interval of the form estimate ± M.O.E., where M.O.E. is the margin of error of the estimator. The MOE tells us how good the estimate is through the variation in the estimator (its standard error) and through the level of confidence in the interval (via a tabulated value). The standard error of an estimator is its estimated standard deviation (treating the estimator as a statistic with a sampling distribution).
The best estimator of μ is the sample mean x-bar, and we will learn that x-bar is approximately Normal with mean μ and standard deviation σ/√n. The best estimator of p is p-hat, and we've learned that p-hat is approximately Normal with mean p and standard deviation √(p(1 − p)/n). We'll start here.
For inference, we'll try to make sure that n is fairly large; this assures approximate normality of the sampling distribution of p-hat. The mean and standard deviation of p-hat are given by these formulas:
$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
We did a simulation using Table B and can use our results to show that the formulas make sense.
I've modified Example 6.4 on page 320:
Assume p = 0.60; i.e., that 60% of the population are Successes. We will simulate drawing a random sample of size 20 from the population.
We can imitate the population by Table B, with each entry standing for a person. Six of the 10 digits (say 0 to 5) stand for people who are Successes. The remaining four digits, 6 to 9, stand for Failures. Because all digits in a random number table are equally likely, this assignment produces a population proportion of Successes equal to p = 0.60. We then imitate an SRS of 20 people from the population by taking 20 consecutive digits from Table B. The statistic p-hat is the proportion of digits 0 to 5 in the sample of size n = 20.
Here are the first 100 entries in Table B, with digits 0 to 5 highlighted. What are the first 5 p-hats? Continue with JMP.
These samples show the sampling variability of p-hat: because the samples are random, we don't expect to get the same proportion of S's in each sample of n = 20. But notice that the variability in the p-hats can be characterized as approximately normal. I used the Random -> Binomial formula in JMP and divided by 20.
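The simulation described above can also be sketched outside JMP. The following is a minimal Python sketch, not the JMP procedure from the slides: the function name `simulate_p_hats`, the number of repetitions, and the seed are assumptions made for illustration. It draws many samples of size n = 20 with p = 0.60 and compares the mean and standard deviation of the simulated p-hats with the values p and √(p(1 − p)/n).

```python
import random
import statistics
from math import sqrt

def simulate_p_hats(p=0.60, n=20, reps=1000, seed=1):
    """Return `reps` sample proportions, each from a random sample of size n.

    reps and seed are illustrative choices, not values from the slides.
    """
    rng = random.Random(seed)
    p_hats = []
    for _ in range(reps):
        # Each draw is a Success with probability p,
        # playing the role of a digit 0 to 5 in Table B.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

p_hats = simulate_p_hats()
theory_sd = sqrt(0.60 * 0.40 / 20)

print("mean of simulated p-hats:", round(statistics.mean(p_hats), 3))
print("theoretical mean (p):    ", 0.60)
print("sd of simulated p-hats:  ", round(statistics.stdev(p_hats), 3))
print("theoretical sd:          ", round(theory_sd, 3))
```

With enough repetitions, the simulated mean and standard deviation should land close to the theoretical values, which is the sense in which the formulas "make sense."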
Large-Sample Confidence Interval for a Proportion
To construct a confidence interval for an unknown population proportion p, we'll use our best estimator p-hat and construct the CI as estimate ± M.O.E. Here the MOE is (value from table) × (SE of estimator).
How do we find the critical value for our confidence interval?
If the Normal condition is met, we can use a Normal curve. To find a level C confidence interval, we need to catch the central area C under the standard Normal curve. For example, to find a 95% confidence interval, we use a critical value of 2 based on the 68-95-99.7 rule. Using a standard Normal table or a calculator, we can get a more accurate critical value: z* is actually 1.96 for a 95% confidence level.
Once we find the critical value z*, our confidence interval for the population proportion p is:
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
What does the CI for p actually mean? Here's a picture (Figure 6.7 on page 327) of 25 confidence intervals computed from 25 samples of the same size. Note that they vary quite a bit, but only 1 out of the 25 actually misses the true proportion p. Approximately 95% of the confidence intervals computed this way should capture p.
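The long-run behavior of the intervals can be checked with a small simulation. This is a hedged Python sketch rather than a reproduction of Figure 6.7: the function name `coverage`, the values p = 0.60 and n = 100, the number of repetitions, and the seed are all assumptions chosen for illustration. It builds many 95% large-sample intervals and reports the fraction that capture the true p.

```python
import random
from math import sqrt

def coverage(p=0.60, n=100, reps=2000, z_star=1.96, seed=2):
    """Fraction of large-sample intervals p_hat +/- z* SE that contain p.

    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw one random sample of size n and compute p-hat.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = successes / n
        # Margin of error = z* times the standard error of p-hat.
        moe = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= p <= p_hat + moe:
            hits += 1
    return hits / reps

rate = coverage()
print("fraction of intervals capturing p:", rate)
```

The reported fraction should be close to 0.95, though not exactly equal to it, because the Normal approximation is only approximate and the simulation is itself random.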
Example
It is claimed that 50% of the beads in a container are red. A random sample of 251 beads is selected, of which 107 are red. Calculate and interpret a 90% confidence interval for the proportion of red beads in the container. Use your interval to comment on the claim that 50% of the beads in the container are red.
The sample proportion is p-hat = 107/251 ≈ 0.426. For a 90% confidence level, z* = 1.645, so the interval is
$0.426 \pm 1.645\sqrt{\frac{(0.426)(0.574)}{251}} = 0.426 \pm 0.051 = (0.375,\ 0.477)$
We are 90% confident that the interval from 0.375 to 0.477 captures the actual proportion of red beads in the container. Since this interval gives a range of plausible values for p and since 0.5 is not contained in the interval, we have reason to doubt the claim.
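The bead example can be reproduced in a few lines of code. The sketch below is illustrative only; the helper name `proportion_ci` is hypothetical and is not a JMP or textbook function. It applies the large-sample formula p-hat ± z* × SE to the counts given in the example.

```python
from math import sqrt

def proportion_ci(successes, n, z_star):
    """Large-sample confidence interval for a proportion.

    Returns (lower, upper) for p_hat +/- z_star * sqrt(p_hat (1 - p_hat) / n).
    """
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of p-hat
    moe = z_star * se                   # margin of error
    return p_hat - moe, p_hat + moe

# 107 red beads out of 251, 90% confidence (z* = 1.645 as in the example).
low, high = proportion_ci(107, 251, 1.645)
print(f"90% CI: {low:.3f} to {high:.3f}")
print("0.5 inside interval?", low <= 0.5 <= high)
```

Because the code carries full precision through every step, the upper endpoint can differ from the slide's value in the third decimal place; the slide rounds p-hat and the margin of error before adding them. The conclusion about the claimed 50% is unaffected.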
Varying confidence levels
Confidence intervals contain the population proportion p in C% of samples, in the long run. Different areas under the curve give different confidence levels C. Example: for an 80% confidence level, 80% of the normal curve's area is contained in the interval from −z* to z*.
Practical use of z*: z* is related to the chosen confidence level C. C is the area under the standard normal curve between −z* and z*.
How do we find specific z* values?
We can use a table of z (Table A) or t values (Table D). In Table D, for a particular confidence level C, the appropriate z* value is just above it.
We can use software. In JMP: create a new column, choose Edit Formula, and choose Normal Quantile( p ) under Probability, where p = (1 − C)/2 is the area to the left of −z*. Since we want the middle C probability, the tail probability we require is (1 − C)/2.
Example: for a 98% confidence level, Normal Quantile(.01) = −2.326349 (= −z*), so z* = 2.326.
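The same lookup can be done in other software. As a hedged illustration (this is not the JMP formula editor), the Python standard library's `statistics.NormalDist().inv_cdf` plays the role of Normal Quantile; the wrapper name `z_star` is hypothetical.

```python
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* for a given confidence level C (e.g. 0.95)."""
    tail = (1 - confidence) / 2          # area to the left of -z*
    # inv_cdf(tail) is the negative critical value, so flip the sign.
    return -NormalDist().inv_cdf(tail)

for c in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"C = {c:.2f}  z* = {z_star(c):.3f}")
```

The last digits may differ slightly between software packages, so values should be compared after rounding to three decimals.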
Link between confidence level and margin of error
The confidence level C determines the value of z* (in Table A or D). The margin of error m also depends on z*. Higher confidence C implies a larger margin of error m (thus less precision in our estimates).
A lower confidence level C produces a smaller margin of error m (thus better precision in our estimates).
The margin of error is smaller when:
z* (and thus the confidence level C) gets smaller
p(1 − p) is smaller
n is larger. This is the usual way to decrease the MOE: increase the sample size!
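The three effects listed above can be seen numerically. The following minimal sketch uses a hypothetical helper `margin_of_error` and illustrative inputs (p-hat = 0.5, and sample sizes 100, 400, 1600) that are assumptions, not values from the slides.

```python
from math import sqrt

def margin_of_error(p_hat, n, z_star):
    """Margin of error z* * sqrt(p_hat (1 - p_hat) / n) for a proportion."""
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

# Effect of sample size at 95% confidence (z* = 1.96).
for n in (100, 400, 1600):
    print("n =", n, " MOE =", round(margin_of_error(0.5, n, 1.96), 4))

# Effect of confidence level at n = 100.
for label, z in (("90%", 1.645), ("95%", 1.96), ("98%", 2.326)):
    print("C =", label, " MOE =", round(margin_of_error(0.5, 100, z), 4))
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error, so extra precision gets progressively more expensive.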
Properties of Confidence Intervals
The user chooses the confidence level C, and hence z*.
The margin of error follows from this choice as (z*)(SE of estimator).
We want a high level of confidence and a small margin of error.
Interpretation of Confidence Intervals
Conditions under which an inference method is valid are never fully met in practice. Exploratory data analysis and judgment should be used when deciding whether or not to use a statistical procedure.
Any individual confidence interval either will or will not contain the true population proportion, p. It is wrong to say that the probability is 95% that the true proportion falls in the confidence interval. The correct interpretation of a 95% confidence interval is that we are 95% confident that the true proportion falls within the interval. The confidence interval was calculated by a method that gives correct results in ~95% of all possible samples. (See slide #13 above!) In other words, if many such confidence intervals were constructed, ~95% of these intervals would contain the true proportion.

HW: Read Introduction to Chapter 6 and Sections 6.1 - 6.1.6; do #6.3, 6.5-6.9
Previous HW: Read Section 5.5; omit Section 5.6. Do Exercises #5.85, 5.87-5.90, 5.93-5.95, 5.99, 5.100, 5.102, 5.144