
Chapter 6: Point and Interval Estimation

Any numerical feature of a population, such as its mean or variance, is called a parameter. Statistical inference deals with drawing conclusions about population parameters by analyzing the information contained in sample data. Studying the whole population is usually impractical, which is why we study part of it: a sample.

Statistical Inference Theory may be broadly classified into three categories:

Point Estimation: providing a single guess, or estimate, of the unknown true parameter value based on the data.

Interval Estimation: providing an interval of plausible values for the parameter, which also indicates the accuracy of the procedure.

Hypothesis Testing: helping decide whether the parameter value equals some pre-specified value.

School of Economics, SMU Chapter 6

STAT151, Term II 11-12 1 Zhenlin Yang, SMU

Learning Objectives

Point Estimation

⎯ Method of Moments

⎯ Method of Maximum Likelihood

⎯ Method of Least Squares

Confidence Intervals

⎯ Case of one population

⎯ Case of two populations

Sample Size Estimation



6.1. Point Estimation

Definition 6.1. Let θ denote the population parameter(s) of interest, and let X1, X2, . . . , Xn be a random sample of size n taken from the population. Point estimation of θ amounts to finding a statistic whose value, computed from the sample data, reflects the value of θ as closely as possible. Such a statistic is called an estimator of θ, denoted by θ̂. The specific value of θ̂ computed from the observed sample data is called an estimate of θ.

Example 6.1. Suppose that a firm wants to estimate μ, the mean hourly wage of skilled workers. A random sample of 16 workers is chosen, and their wages are: $6.80, 9.40, 7.90, 12.10, 10.84, 7.74, 8.84, 9.60, 8.30, 11.60, 10.48, 9.00, 10.20, 12.50, 8.00, 9.20. The sample mean is

\[ \bar{X} = \frac{\$152.50}{16} = \$9.53. \]



Thus we could say that the mean hourly wage of skilled Seattle workers is estimated to be $9.53.
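The arithmetic of Example 6.1 can be reproduced with Python's standard `statistics` module. This is a minimal sketch, and the variable names are illustrative only:

```python
import statistics

# Hourly wages (in dollars) of the 16 sampled skilled workers in Example 6.1
wages = [6.80, 9.40, 7.90, 12.10, 10.84, 7.74, 8.84, 9.60,
         8.30, 11.60, 10.48, 9.00, 10.20, 12.50, 8.00, 9.20]

total = sum(wages)               # sum of the observations
x_bar = statistics.mean(wages)   # sample mean: the point estimate of mu
print(f"sum = {total:.2f}, sample mean = {x_bar:.2f}")
```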

We must recognize, however, that if we had taken a different set of 16 observations, we would probably have obtained a different estimate.

Indeed, if we repeated the sampling again and again, we would obtain a sequence of different values of X̄, such as $9.83, 9.07, 10.09, . . . .

Hence, when the sample mean X̄ is used to estimate the population mean μ, the particular sample mean we happen to observe will almost certainly be a bit high or a bit low. The questions that need to be addressed are:

• On average, will X̄ be on target, i.e., equal to μ?

• How close will the value of X̄ be to μ?

• Is μ equal to a prescribed value, say $10.00?



The three questions relate, respectively, to point estimation, confidence intervals and hypothesis testing. The answers depend on the sampling distribution of X̄ or, more generally, on the sampling distribution of the estimator involved. How do we choose an estimator? There are several fundamental principles for estimating population parameters, or model parameters in general:

Analogy principle, or method of moments: use a sample quantity to estimate the corresponding population quantity.

Likelihood principle: from a given family of distributions indexed by some parameter (θ, say), find the member (value of θ) most likely to have produced the sample data.

Least squares principle: minimize the sum of squared differences between the observations and their population means (expected values).



In many simple situations such as:

• Estimating the population mean

• Estimating the population proportion

• Estimating the difference between two population means,

• Estimating the difference between two population proportions, etc.,

the three principles produce identical estimates. In many other situations, however, they give different estimates. The method of maximum likelihood requires knowledge of the population distribution, whereas the method of moments and the method of least squares do not. On the other hand, the method of moments and the method of least squares may fail to provide a solution for complicated models. We introduce the three principles using the following simple example.



Example 6.2. Let X1, X2, . . . , Xn be a random sample of size n taken from a normal population with mean μ and variance 1. We are interested in estimating the population mean μ.

Method of Moments Estimator (MME). The first sample moment is the sample mean X̄, so the MME of the population mean μ is X̄.

Maximum Likelihood Estimator (MLE): Since Xi ~ N(μ, 1), its pdf is

\[ f(x_i, \mu) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2}(x_i - \mu)^2 \right), \quad i = 1, 2, \ldots, n. \]

• Since X1, X2, . . . , Xn are independent and have the same distribution, their joint distribution is

\[ f(x_1, \mu) \cdots f(x_n, \mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2}(x_i - \mu)^2 \right) = \left( \frac{1}{\sqrt{2\pi}} \right)^{n} \exp\left( -\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2 \right). \]



• The function

\[ L(\mu) = f(x_1, \mu) \cdots f(x_n, \mu) = \left( \frac{1}{\sqrt{2\pi}} \right)^{n} \exp\left( -\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2 \right), \]

now viewed as a function of μ, is called the likelihood function of μ.

• Its natural logarithm is called the log-likelihood function of μ:

\[ \ell(\mu) = \ln[L(\mu)] = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\sum_{i=1}^{n} (x_i - \mu)^2. \]

• Maximizing L(μ), or equivalently ℓ(μ), gives μ̂. This value maximizes the likelihood that the observed data x1, x2, . . . , xn came from the population N(μ̂, 1), or equivalently that N(μ̂, 1) produced the data.

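• Finally, differentiating ℓ(μ) and setting the derivative to zero gives the likelihood equation

\[ \sum_{i=1}^{n} (x_i - \mu) = 0. \]

Its solution, μ̂ = x̄, is the maximum likelihood estimate of μ. Since \( \ell''(\mu) = -n < 0 \), this solution is indeed a maximum.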
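As a numerical sanity check, the sketch below maximizes ℓ(μ) directly and compares the maximizer with the sample mean. It is a minimal Python illustration that assumes a small hypothetical data set, treated as drawn from N(μ, 1), and uses a simple ternary search for a concave function. Neither the data nor the search method comes from the notes.

```python
import math

def log_likelihood(mu, xs):
    """Log-likelihood l(mu) of an N(mu, 1) sample xs."""
    n = len(xs)
    return -n / 2 * math.log(2 * math.pi) - 0.5 * sum((x - mu) ** 2 for x in xs)

def argmax_concave(f, lo, hi, tol=1e-9):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1   # the maximizer cannot lie in [lo, m1)
        else:
            hi = m2   # the maximizer cannot lie in (m2, hi]
    return (lo + hi) / 2

xs = [1.2, -0.4, 0.9, 2.1, 0.3]   # hypothetical data, assumed drawn from N(mu, 1)
mu_hat = argmax_concave(lambda mu: log_likelihood(mu, xs), min(xs), max(xs))
x_bar = sum(xs) / len(xs)
print(f"numerical MLE = {mu_hat:.6f}, sample mean = {x_bar:.6f}")
```

The numerical maximizer agrees with x̄ up to the search tolerance, as the likelihood equation predicts.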



In general, μ̂ = X̄, the mean of X1, X2, . . . , Xn before being 'observed', is called the Maximum Likelihood Estimator, or MLE, of μ.

Least Squares Estimator (LSE): Use the result E[Xi] = μ and consider the deviations (Xi − μ), i = 1, 2, . . . , n. Every observation may differ from its expectation. Define the sum of squared deviations

\[ Q(\mu) = \sum_{i=1}^{n} (X_i - \mu)^2. \]

Setting \( Q'(\mu) = -2\sum_{i=1}^{n}(X_i - \mu) = 0 \) shows that Q(μ) is minimized at μ̂ = X̄. This is called the least squares estimator, or LSE, of μ.
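The minimization can also be seen from the algebraic identity Q(μ) = Q(X̄) + n(X̄ − μ)². The identity holds because the deviations Xi − X̄ sum to zero, and it shows that Q is smallest exactly at μ = X̄. A short Python check of the identity, on hypothetical data:

```python
def Q(mu, xs):
    """Sum of squared deviations Q(mu) = sum of (x_i - mu)^2."""
    return sum((x - mu) ** 2 for x in xs)

xs = [1.2, -0.4, 0.9, 2.1, 0.3]   # hypothetical observations
n = len(xs)
x_bar = sum(xs) / n

# Q(mu) = Q(x_bar) + n * (x_bar - mu)^2, so Q is smallest at mu = x_bar
for mu in [-1.0, 0.0, 0.5, 1.5, 3.0]:
    decomposed = Q(x_bar, xs) + n * (x_bar - mu) ** 2
    print(f"mu = {mu:5.2f}: Q(mu) = {Q(mu, xs):.4f}, "
          f"Q(x_bar) + n(x_bar - mu)^2 = {decomposed:.4f}")
```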



Example 6.3. Let X1, X2, . . . , Xn be a random sample of size n taken from a normal population with mean μ and variance σ². Find the MMEs and MLEs of μ and σ². Solution:

MME: The kth sample moment is defined as

\[ m_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k. \]

There are two parameters to be estimated, so we need the first two sample moments:

\[ m_1 = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad m_2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2. \]

The first two population moments are:

E(X) = μ and E(X²) = σ² + μ².

Set m1 = E(X) and m2 = E(X²), and solve for μ and σ². That is,

\[ \frac{1}{n}\sum_{i=1}^{n} X_i = \mu \quad \text{and} \quad \frac{1}{n}\sum_{i=1}^{n} X_i^2 = \sigma^2 + \mu^2, \]



which gives the method of moments estimators

\[ \hat{\mu} = \bar{X} \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2. \]

MLE: The likelihood function now becomes

\[ L(\mu, \sigma^2) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^{n} \exp\left( -\frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2 \right). \]

The log-likelihood is

\[ \ell(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2. \]

Differentiating ℓ(μ, σ²) with respect to μ and then with respect to σ², and setting the two partial derivatives to zero, we have

\[ \sum_{i=1}^{n} (x_i - \mu) = 0 \quad \text{and} \quad -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n} (x_i - \mu)^2 = 0. \]

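Solving for μ and σ², we obtain the MLEs:

\[ \hat{\mu} = \bar{X} \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2. \]

Hence, for a normal population, the MMEs and the MLEs of μ and σ² coincide.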
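The two-moment recipe can be sketched in a few lines of Python on a hypothetical sample. The standard library's `statistics.pvariance` uses divisor n, which matches σ̂², while `statistics.variance` uses divisor n − 1 (the sample variance s² that appears in Example 6.4). The data values are assumptions for illustration only.

```python
import statistics

xs = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]   # hypothetical sample, assumed from N(mu, sigma^2)
n = len(xs)

m1 = sum(xs) / n                     # first sample moment
m2 = sum(x ** 2 for x in xs) / n     # second sample moment

mu_hat = m1                          # MME (and MLE) of mu
sigma2_hat = m2 - m1 ** 2            # MME (and MLE) of sigma^2, divisor n
sigma2_direct = sum((x - mu_hat) ** 2 for x in xs) / n

print(mu_hat, sigma2_hat, sigma2_direct)
print(statistics.pvariance(xs))      # divisor n, same as sigma2_hat
print(statistics.variance(xs))       # sample variance s^2, divisor n - 1
```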



Which estimator should be considered appropriate?

The common criteria for judging an estimator are: (i) whether it is unbiased, and (ii) whether it has a small variance or a small mean square error.

Definition 6.2. (Unbiasedness) If E(θ̂ ) = θ, then θ̂ is called an unbiased estimator of θ. Otherwise, it is a biased estimator of θ with Bias = E(θ̂ ) – θ.

Unbiasedness means that, although a specific value of θ̂ may be a bit off, on average it will be on target. A nonzero bias means that the values of θ̂ will be off target on average.

When choosing among several unbiased estimators of the same parameter, the one with the smallest variance is preferred.

When choosing among several estimators, biased or unbiased, the one with the smallest mean square error, E[(θ̂ − θ)²], is preferred.



Example 6.4. Continuing Example 6.3, one can show that:

(a) the sample mean \( \hat{\mu} = n^{-1}\sum_{i=1}^{n} X_i \) is an unbiased estimator of μ;

(b) the sample variance \( s^2 = (n-1)^{-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 \) is an unbiased estimator of σ²;

(c) the MME or MLE \( \hat{\sigma}^2 = n^{-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 \) is a biased estimator of σ².

Solution: On White Board
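The proof itself is done on the whiteboard, but a Monte Carlo sketch can make parts (b) and (c) plausible. It relies on the standard fact that E(σ̂²) = (n − 1)σ²/n. The population values μ = 0, σ = 2, n = 5, the number of replications and the random seed are arbitrary choices made only for this illustration.

```python
import random

def average_variance_estimates(mu=0.0, sigma=2.0, n=5, reps=100_000, seed=2012):
    """Monte Carlo averages of s^2 (divisor n - 1) and sigma_hat^2 (divisor n)."""
    rng = random.Random(seed)
    total_s2 = 0.0
    total_sigma2_hat = 0.0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(xs) / n
        ss = sum((x - x_bar) ** 2 for x in xs)   # sum of squared deviations
        total_s2 += ss / (n - 1)
        total_sigma2_hat += ss / n
    return total_s2 / reps, total_sigma2_hat / reps

mean_s2, mean_sigma2_hat = average_variance_estimates()
# Theory: E(s^2) = sigma^2 = 4.0 and E(sigma_hat^2) = (n - 1)/n * sigma^2 = 3.2
print(f"average s^2 = {mean_s2:.3f}, average sigma_hat^2 = {mean_sigma2_hat:.3f}")
```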



6.2. Interval Estimation for One Population

A point estimator provides only a single number as an estimate of the parameter. Its accuracy needs to be assessed through its standard error or, more generally, through a confidence interval, i.e., an interval of values that is likely to contain the true value of the parameter.

Definition 6.3. Let X1, X2, . . . , Xn be a random sample and θ be an unknown population parameter. Let (1–α) be a specified high probability and L(X) and U(X) be functions of X = (X1, X2, . . . , Xn), such that

P[L(X) < θ < U(X)] = 1 – α

Then the interval {L(X), U(X)} is called a 100(1−α)% confidence interval (C.I.) for θ. L(X) is called the lower confidence limit, and U(X) is called the upper confidence limit. (1−α) is called the confidence coefficient (level).



Z–Interval for the Mean of a Normal Population

Suppose we want to construct a confidence interval for the mean μ of a normal population N(μ, σ²) with σ known, based on a random sample X1, X2, . . . , Xn. Since X̄ ~ N(μ, σ²/n), we have

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1). \]

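Hence,

\[ P\left\{ -1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96 \right\} = 0.95 \]

\[ \Leftrightarrow\; P\left\{ \bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}} \right\} = 0.95. \]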

Following Definition 6.3, the interval

\[ \left\{ \bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{X} + 1.96\frac{\sigma}{\sqrt{n}} \right\} \]

forms a 95% C.I. for μ.



In general, let \( Z_{\alpha/2} \) denote the upper α/2 point on the standard normal curve, as shown on the right. We have the following general result.

[Figure: standard normal curve with the points \( -Z_{\alpha/2} \) and \( Z_{\alpha/2} \) marked]

Result 6.1. Let X1, X2, . . . , Xn be a random sample from N(μ, σ²) with σ known. A 100(1−α)% C.I. for μ is

\[ \left\{ \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right\}. \]

• The above CI is called the Z–interval since it is based on the Z–statistic.

• It is a random interval and covers the fixed μ with probability (1−α).



• Once the sample data are obtained, the lower and upper limits can be calculated to give an interval estimate for μ.

Example 6.5. To estimate the average amount that families of four spend on groceries each week at a certain supermarket, a sample of 15 families was taken. The sample mean was calculated as $27.85. The standard deviation σ was assumed known and equal to 1.50 (obtained from other similar studies), and the data can safely be assumed to be normal. Calculate a 99% C.I. for the average weekly amount spent on groceries at the supermarket by families of four in the population.

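Solution: i) X̄ = 27.85, σ = 1.5, n = 15, α = 0.01, \( Z_{\alpha/2} = 2.58 \);

ii) \( Z_{\alpha/2}\,\sigma/\sqrt{n} = 2.58 \times 1.50/\sqrt{15} = 1.00 \);

iii) \( \bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n} = 27.85 - 1.00 = 26.85 \), \( \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n} = 27.85 + 1.00 = 28.85 \);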



iv) The 99% C.I. for “average weekly amount spent on groceries at the supermarket by families of four in the population” is {$26.85, $28.85}.

This is interpreted as saying that, although the average amount spent is unknown, we are 99% confident that it lies in {$26.85, $28.85}.

• We must never write P{26.85 ≤ μ ≤ 28.85} = 0.99, or anything similar, because no random variable is involved in the expression;

• The interval {26.85, 28.85} either covers μ or does not, and we will never know which is the case;

• If you draw another sample, you will end up with a different interval. By repeated sampling you therefore obtain a "population of intervals": some intervals cover μ and some do not. The percentage of intervals that cover μ is 99%; that is, the long-run frequency of obtaining a correct interval is 99%.
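The Z–interval of Result 6.1 and its long-run interpretation can be illustrated with a minimal Python sketch. It uses `statistics.NormalDist().inv_cdf` to get \( Z_{\alpha/2} \). This gives the exact value 2.5758 rather than the table value 2.58, so the endpoints agree with the text only after rounding. The coverage simulation assumes, purely for illustration, a normal population whose true mean is 27.85 with σ = 1.5. The function names `z_interval` and `coverage` are hypothetical.

```python
import math
import random
from statistics import NormalDist

def z_interval(x_bar, sigma, n, conf):
    """100*conf% Z-interval for mu when sigma is known (Result 6.1)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # upper alpha/2 point Z_{alpha/2}
    half = z * sigma / math.sqrt(n)
    return x_bar - half, x_bar + half

# Example 6.5: x_bar = 27.85, sigma = 1.5, n = 15, 99% confidence
lo, hi = z_interval(27.85, 1.5, 15, 0.99)
print(f"99% C.I.: ({lo:.2f}, {hi:.2f})")

def coverage(mu, sigma, n, conf, reps=20_000, seed=151):
    """Fraction of simulated Z-intervals that cover the true mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        lo_i, hi_i = z_interval(x_bar, sigma, n, conf)
        hits += lo_i <= mu <= hi_i   # True counts as 1
    return hits / reps

print(f"long-run coverage: {coverage(27.85, 1.5, 15, 0.99):.3f}")
```

The simulated coverage should be close to 0.99. This is the "long-run frequency of obtaining a correct interval" described above.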



[Figure: 95% C.I.]

Z–Interval for the Mean of a Non-Normal Population

Let now X1, X2, . . . , Xn represent a random sample from an arbitrary population with mean μ and standard deviation σ.

The interval given in Result 6.1 is valid for a normal population whether the sample size n is large or small. It is also approximately valid for a non-normal population if the sample size is large; this is guaranteed by the CLT. The larger the n, the better the approximation.



Example 6.6. Given a random sample of 100 observations from a population for which μ is unknown and σ = 8, suppose that the sample mean is found to be 42.7. Provide a 95% confidence interval for μ.

Solution: i) Here the distribution is unknown, but sample size n = 100 is large, so that the normal approximation to the distribution of X is justified.

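ii) X̄ = 42.7, σ = 8, n = 100, α = 0.05, \( Z_{\alpha/2} = 1.96 \);

iii) \( \bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n} = 42.7 - 1.96 \times 8/\sqrt{100} = 41.1 \),

\( \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n} = 42.7 + 1.96 \times 8/\sqrt{100} = 44.3 \);

iv) The 95% C.I. for μ is {41.1, 44.3}.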

This is interpreted as saying that although the true population mean is unknown, we are 95% confident that it lies in the interval {41.1, 44.3}.



When σ is unknown, we can replace σ by s, the sample standard deviation, in the C.I. given by Result 6.1. This is justified because the LLN says that, when n is large, s is close to σ. The resulting interval

\[ \left\{ \bar{X} \pm Z_{\alpha/2}\,\frac{s}{\sqrt{n}} \right\} \]

is an approximate 100(1−α)% C.I. for μ, called the Large Sample Z C.I.

Example 6.7. To estimate the average weekly income of restaurant waiters and waitresses in a large city, an investigator collects weekly income data from a sample of 75 restaurant workers. The mean and the standard deviation are found to be $127 and $15, respectively. Compute (a) 90% and (b) 80% C.I.'s for the mean weekly income.

Solution: i) Since n =75 is large, use of the large sample Z C.I. is appropriate;



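ii) X̄ = 127, s = 15; for α = 0.1 and 0.2, \( Z_{\alpha/2} = 1.64 \) and 1.28, respectively;

iii) 90% C.I.: \( \{ \bar{X} \pm Z_{\alpha/2}\, s/\sqrt{n} \} = \{ 127 \pm 1.64 \times 15/\sqrt{75} \} = \{127 \pm 2.84\} \),

80% C.I.: \( \{ \bar{X} \pm Z_{\alpha/2}\, s/\sqrt{n} \} = \{ 127 \pm 1.28 \times 15/\sqrt{75} \} = \{127 \pm 2.22\} \);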

Comparing the two results, the 80% C.I. is shorter than the 90% C.I. A shorter interval seems to locate μ more precisely, but it has a lower long-run frequency of being correct.
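One formula, X̄ ± \( Z_{\alpha/2} \) × (standard deviation)/√n, covers both Example 6.6 (σ known, n large) and Example 6.7 (s used in place of σ). The Python sketch below is an illustration under that assumption; the helper name is hypothetical. With the exact \( Z_{0.05} = 1.645 \), the 90% half-width comes out as 2.85 rather than the table-based 2.84.

```python
import math
from statistics import NormalDist

def large_sample_z_interval(x_bar, sd, n, conf):
    """Approximate 100*conf% C.I. for mu: x_bar +/- Z_{alpha/2} * sd / sqrt(n).

    sd is sigma when it is known (Example 6.6) or the sample s otherwise (Example 6.7).
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * sd / math.sqrt(n)
    return x_bar - half, x_bar + half, half

# Example 6.6: sigma = 8 known, n = 100, 95% confidence
lo, hi, _ = large_sample_z_interval(42.7, 8, 100, 0.95)
print(f"Example 6.6, 95% C.I.: ({lo:.1f}, {hi:.1f})")

# Example 6.7: s = 15, n = 75, 90% and 80% confidence
for conf in (0.90, 0.80):
    _, _, half = large_sample_z_interval(127, 15, 75, conf)
    print(f"Example 6.7, {conf:.0%} C.I.: 127 +/- {half:.2f}")
```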



Z–Interval for a Population Proportion

A special non-normal distribution is the Binomial. Let Y ~ Binomial(n, π). To estimate π, the population proportion of successes, we use π̂ = Y/n, the sample proportion. It can be shown that π̂ is both the MME and the MLE of π.

By the CLT, π̂ is approximately normally distributed with mean π and standard deviation \( \sqrt{\pi(1-\pi)/n} \), provided that both nπ ≥ 5 and n(1−π) ≥ 5.

The situation is a bit more complicated because π, the parameter of interest, also appears in the expression for the standard deviation. However, the LLN allows us to estimate \( \sqrt{\pi(1-\pi)/n} \) by \( \sqrt{\hat{\pi}(1-\hat{\pi})/n} \), provided that nπ̂ ≥ 5 and n(1−π̂) ≥ 5.

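Result 6.2. Consider Y ~ Binomial(n, π). Define π̂ = Y/n. If nπ̂ ≥ 5 and n(1−π̂) ≥ 5, then an approximate 100(1−α)% C.I. for π is

\[ \left\{ \hat{\pi} - Z_{\alpha/2}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}},\; \hat{\pi} + Z_{\alpha/2}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} \right\}. \]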



• A Binomial r.v. represents the total number of successes in n trials. Hence the sample proportion π̂ is a sample average (of n zero-one outcomes), which is why the CLT can be applied to it directly.

• Other distributions, such as the Exponential and the Uniform, resemble the Binomial in that the variance depends on the mean. The treatment is similar.

Example 6.8. A government agency wishes to assess the prevailing rate of unemployment in a particular country. It is correctly felt that this assessment must be made quickly and effectively by sampling a small fraction of labor force in the country and counting the number of persons currently unemployed. Suppose that a random sample of 500 persons is interviewed and that 41 are found to be unemployed. Compute a 95% confidence interval for the rate of unemployment in the country.



Solution:

• Since nπ̂ = 41 ≥ 5 and n(1−π̂) = 459 ≥ 5, the use of the C.I. in Result 6.2 is justified.

• α/2 = 0.025, \( Z_{0.025} = 1.96 \), π̂ = 41/500 = .082.

• \( Z_{\alpha/2}\sqrt{\hat{\pi}(1-\hat{\pi})/n} = 1.96\sqrt{.082 \times .918/500} = 1.96 \times .012 = .024 \).

• Therefore, a 95% C.I. for the rate of unemployment in the country is

.082 ± .024 = {0.058, 0.106}, or {5.8%, 10.6%}.
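Result 6.2, including its nπ̂ ≥ 5 and n(1 − π̂) ≥ 5 check, can be sketched in Python as follows. The function name is hypothetical, and raising an error when the check fails is a design choice of this sketch, not something the notes prescribe.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, conf):
    """Approximate 100*conf% C.I. for a binomial proportion (Result 6.2)."""
    p_hat = successes / n
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("normal approximation not justified: "
                         "need n*p_hat >= 5 and n*(1-p_hat) >= 5")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

# Example 6.8: 41 unemployed out of 500 interviewed, 95% confidence
lo, hi = proportion_interval(41, 500, 0.95)
print(f"95% C.I.: ({lo:.3f}, {hi:.3f})")
```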



t–Interval for the Mean of a Normal Population

Suppose that we want to construct a C.I. for the mean μ of a normal population with an unknown variance σ2.

Recall Student's t–distribution from Result 5.7. Suppose X1, X2, . . . , Xn is a random sample from N(μ, σ²) with μ and σ² both unknown, and X̄ and s² are the sample mean and sample variance, respectively. Then

\[ t_{n-1} = \frac{\bar{X} - \mu}{s/\sqrt{n}} \]

has a Student's t–distribution with n−1 degrees of freedom (d.f.).

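Result 6.3. (t–Interval) Let X1, X2, . . . , Xn be a random sample from N(μ, σ²) with σ unknown. A 100(1−α)% C.I. for μ is

\[ \left\{ \bar{X} - t_{n-1}(\alpha/2)\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1}(\alpha/2)\frac{s}{\sqrt{n}} \right\}, \]

where \( t_{n-1}(\alpha/2) \) is the upper 100(α/2)% point of the \( t_{n-1} \) distribution.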



This result holds for any n as long as the population is normal, so it can be applied to small samples. Repeating the reasoning used for the Z–interval, we can 'prove' the above result.

Example 6.9. Reconsider Example 6.5, and suppose that σ is unknown (perhaps data from other similar studies were not available). From the sample of size 15 with mean $27.85, the sample standard deviation was calculated to be $2.00. Calculate 95% and 99% C.I.s for μ, the population weekly average bill of a family of four at the supermarket.

Solution:

• X̄ = 27.85, s = 2, n = 15, d.f. = n−1 = 14.

• For the 95% C.I., \( t_{n-1}(.025) = 2.145 \);

For the 99% C.I., \( t_{n-1}(.005) = 2.977 \).

• \( t_{n-1}(.025)\, s/\sqrt{n} = (2.145)(2.00)/\sqrt{15} = 1.11 \);

\( t_{n-1}(.005)\, s/\sqrt{n} = (2.977)(2.00)/\sqrt{15} = 1.54 \).

• \( \bar{X} \pm t_{n-1}(.025)\, s/\sqrt{n} = 27.85 \pm 1.11 \);

\( \bar{X} \pm t_{n-1}(.005)\, s/\sqrt{n} = 27.85 \pm 1.54 \).

• The 95% C.I. is between $26.74 and $28.96,

while the 99% C.I. is between $26.31 and $29.39.
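The Python standard library has no t quantile function. This sketch of Result 6.3 therefore takes the critical value as an input, here the table values from Example 6.9. If SciPy is available, `scipy.stats.t.ppf(0.975, 14)` should return approximately 2.145. The helper name `t_interval` is hypothetical.

```python
import math

def t_interval(x_bar, s, n, t_crit):
    """C.I. x_bar +/- t_crit * s / sqrt(n) (Result 6.3).

    t_crit is the upper alpha/2 point t_{n-1}(alpha/2), taken here from a t table.
    """
    half = t_crit * s / math.sqrt(n)
    return x_bar - half, x_bar + half

# Example 6.9: x_bar = 27.85, s = 2.00, n = 15 (d.f. = 14)
for label, t_crit in (("95%", 2.145), ("99%", 2.977)):   # t_14(.025), t_14(.005)
    lo, hi = t_interval(27.85, 2.00, 15, t_crit)
    print(f"{label} C.I.: ({lo:.2f}, {hi:.2f})")
```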



Chi-Square Intervals for σ2 or σ

Now we are interested in finding a C.I. for the variance σ² or the standard deviation σ of a normal population. The requirement of normality is much more important here than when estimating the mean, because estimates of the mean are not as 'sensitive' to departures from normality as estimates of the variance and standard deviation. The point estimate of σ² is

\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2, \]

and that of the standard deviation σ is

\[ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}. \]

Recall from Result 5.5: if the sample is from a normal population, then \( (n-1)s^2/\sigma^2 \sim \chi^2_{n-1} \), a chi-square distribution with n−1 d.f.



Result 6.4. (Chi-Square C.I.) Let X1, X2, . . . , Xn be a random sample of size n from N(μ, σ²) with both μ and σ unknown. Then

\[ \left\{ \frac{(n-1)s^2}{\chi^2_{n-1}(\alpha/2)},\; \frac{(n-1)s^2}{\chi^2_{n-1}(1-\alpha/2)} \right\} \]

is a 100(1−α)% confidence interval for σ², and

\[ \left\{ \sqrt{\frac{(n-1)s^2}{\chi^2_{n-1}(\alpha/2)}},\; \sqrt{\frac{(n-1)s^2}{\chi^2_{n-1}(1-\alpha/2)}} \right\} \]

is a 100(1−α)% confidence interval for σ. Here \( \chi^2_{n-1}(1-\alpha/2) \) and \( \chi^2_{n-1}(\alpha/2) \) are, respectively, the lower and upper 100(α/2)% percentage points of \( \chi^2_{n-1} \).

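To see this result, note that

\[ P\left\{ \chi^2_{n-1}(1-\alpha/2) \le \chi^2_{n-1} \le \chi^2_{n-1}(\alpha/2) \right\} = 1-\alpha \]

\[ \Leftrightarrow\; P\left\{ \chi^2_{n-1}(1-\alpha/2) \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{n-1}(\alpha/2) \right\} = 1-\alpha \]

\[ \Leftrightarrow\; P\left\{ \frac{(n-1)s^2}{\chi^2_{n-1}(\alpha/2)} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{n-1}(1-\alpha/2)} \right\} = 1-\alpha. \]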



Example 6.10. A precision watchmaker wishes to learn about the variability of his product. A random sample of 10 watches is selected, and the deviations from a standard clock are recorded at the end of one month. The resulting mean and standard deviation are 0.7 seconds and 0.4 seconds, respectively. Suppose that the deviations are normally distributed. Find 90% confidence intervals for the variance and standard deviation of the population.

Solution: i) X̄ = 0.7, s = 0.4, n = 10, d.f. = n−1 = 9;

\( \chi^2_9(.95) = 3.325 \), \( \chi^2_9(.05) = 16.919 \).

ii) \( (n-1)s^2/\chi^2_{n-1}(\alpha/2) = (9)(.16)/16.919 = .0851 \);

\( (n-1)s^2/\chi^2_{n-1}(1-\alpha/2) = (9)(.16)/3.325 = .4331 \).

iii) 90% C.I. for σ²: {0.0851, 0.4331},

90% C.I. for σ: {0.2917, 0.6581}.
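A Python sketch of Result 6.4 follows, with the chi-square percentage points passed in from the table used in Example 6.10. If SciPy is available, `scipy.stats.chi2.ppf(0.95, 9)` should give approximately 16.919. The helper name is hypothetical.

```python
import math

def variance_interval(s, n, chi2_upper, chi2_lower):
    """C.I.s for sigma^2 and sigma based on (n-1)s^2 and chi-square points (Result 6.4).

    chi2_upper = chi^2_{n-1}(alpha/2) and chi2_lower = chi^2_{n-1}(1 - alpha/2).
    """
    ss = (n - 1) * s ** 2
    var_ci = (ss / chi2_upper, ss / chi2_lower)
    sd_ci = (math.sqrt(var_ci[0]), math.sqrt(var_ci[1]))
    return var_ci, sd_ci

# Example 6.10: s = 0.4, n = 10, 90% confidence; chi^2_9(.05) = 16.919, chi^2_9(.95) = 3.325
var_ci, sd_ci = variance_interval(0.4, 10, 16.919, 3.325)
print(f"90% C.I. for sigma^2: ({var_ci[0]:.4f}, {var_ci[1]:.4f})")
print(f"90% C.I. for sigma:   ({sd_ci[0]:.4f}, {sd_ci[1]:.4f})")
```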



Estimation of Sample Size

In the planning stage of an investigation, an important decision must be made about the sample size required to achieve the desired protection against imprecision in the estimation procedure.

Increasing the sample size reduces the error in estimating the mean. On the other hand, increasing the sample size is more costly and time consuming, both in the sampling operation and in processing the data.

Plots on the left show the pdf of X̄ under different sample sizes n. The larger the n, the taller and narrower the curve. This means smaller variability and more precise estimation of the population mean by the sample mean.



Case 1: Normal population with σ known

Suppose we want to determine the sample size required to achieve a certain precision when estimating the mean μ of a normal population with a known population standard deviation σ.

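For this situation, Result 6.1 gives a 100(1−α)% C.I. for μ as

\[ \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \]

which says that, with probability 1−α, the error of estimation satisfies

\[ |\bar{X} - \mu| \le Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}. \]

Thus, to be 100(1−α)% sure that the error \( |\bar{X} - \mu| \) does not exceed an amount E, we must have \( Z_{\alpha/2}\,\sigma/\sqrt{n} \le E \), or

\[ n \ge \left( \frac{Z_{\alpha/2}\,\sigma}{E} \right)^2. \]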



Example 6.11. A limnologist wishes to estimate the mean phosphate content per unit volume of certain lake water. Studies in previous years show that the standard deviation has a fairly stable value, σ = 4. How many water samples must the limnologist analyze to be 90% certain that the error of estimation does not exceed 0.8?

Solution: It is given that σ = 4, α = 0.1, \( Z_{.05} = 1.64 \), E = 0.8. Thus,

\[ n \ge \left( \frac{Z_{\alpha/2}\,\sigma}{E} \right)^2 = \left( \frac{1.64 \times 4}{0.8} \right)^2 = 67.24. \]

The required sample size is n = 68.

• The above formula can also be used for a non-normal population, provided the resulting n is large enough for the CLT to apply.

• When σ is completely unknown, a preliminary study is needed to obtain an estimate of σ for use in the formula.



Example 6.12. We wish to estimate the amount of money that a family of four spent weekly at a supermarket. It is assumed that the amount spent follows a normal distribution, and an earlier study yielded an estimate of the standard deviation as $2.00. We would like to use a 95% C.I. and we would prefer the length of the interval to be no longer than 1.75. What is the minimum sample size required?

Solution:

• Since the length of the interval is twice the estimation error, we have

• E = 1.75/2 = 0.875; α = 0.05, \( Z_{.025} = 1.96 \), σ ≈ 2.00.

• \[ n \ge \left( \frac{Z_{\alpha/2}\,\sigma}{E} \right)^2 = \left( \frac{1.96 \times 2}{0.875} \right)^2 = 20.07. \]

• Rounding up, the required sample size is 21.
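Because n must be an integer that satisfies n ≥ (\( Z_{\alpha/2} \)σ/E)², the bound is rounded up. A minimal Python sketch follows; the helper name is hypothetical, and the z values are the table values used in Examples 6.11 and 6.12.

```python
import math

def sample_size_mean(z, sigma, error):
    """Smallest integer n with n >= (z * sigma / error)^2."""
    return math.ceil((z * sigma / error) ** 2)

print(sample_size_mean(1.64, 4, 0.8))       # Example 6.11: 68
print(sample_size_mean(1.96, 2.00, 0.875))  # Example 6.12 (E = 1.75 / 2): 21
```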



Case 2: Sample size required for estimating a binomial π.

For the Binomial, σ² = π(1−π). Putting this into the above formula gives

\[ n \ge \pi(1-\pi)\left( \frac{Z_{\alpha/2}}{E} \right)^2, \]

where π is unknown! If π is known to be around π*, then take

\[ n \ge \pi^*(1-\pi^*)\left( \frac{Z_{\alpha/2}}{E} \right)^2 \]

as an approximate value for n. Otherwise, take the maximum

\[ n \ge \frac{1}{4}\left( \frac{Z_{\alpha/2}}{E} \right)^2, \]

since the maximum of π(1−π) is 1/4 (attained at π = 1/2).



Example 6.13. A public health survey is to be designed to estimate the proportion π of a population having defective vision. How many persons should be examined if the public health commissioner wishes to be 98% certain that the error of estimation is within ±.05 when

(a) there is no knowledge about the value π?

(b) π is known to be about 0.3?

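Solutions: For 98% certainty, \( Z_{.01} = 2.3263 \); the error bound is E = .05.

(a) \[ n \ge \frac{1}{4}\left( \frac{Z_{\alpha/2}}{E} \right)^2 = \frac{1}{4}\left( \frac{2.3263}{.05} \right)^2 = 541.17, \] i.e., n ≥ 542.

(b) \[ n \ge \pi^*(1-\pi^*)\left( \frac{Z_{\alpha/2}}{E} \right)^2 = 0.3(1-0.3)\left( \frac{2.3263}{.05} \right)^2 = 454.58, \] i.e., n ≥ 455.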
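The two cases for a binomial π, with and without a prior guess π*, can be sketched in one Python helper (hypothetical name). It falls back to the worst case π(1 − π) = 1/4 when no guess is supplied.

```python
import math

def sample_size_proportion(z, error, p_guess=None):
    """Smallest integer n with n >= p(1-p) * (z / error)^2.

    With no prior guess for p, the worst case p(1-p) = 1/4 is used.
    """
    pq = 0.25 if p_guess is None else p_guess * (1 - p_guess)
    return math.ceil(pq * (z / error) ** 2)

print(sample_size_proportion(2.3263, 0.05))        # Example 6.13(a): 542
print(sample_size_proportion(2.3263, 0.05, 0.3))   # Example 6.13(b): 455
```

A reasonable prior guess for π reduces the required sample size, here from 542 to 455.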



6.3. Interval Estimation for Two Populations

In virtually every area of human activity, there is a continuous search for new modes of action and for ways to modify and revise existing techniques. The new methods or techniques need to be compared with the old ones to see whether they are really "better". For example,

• Agricultural field trials: to see whether a new strain of seed produces a higher yield per acre than the current major variety.

• Drug evaluation: to see whether a new drug is more effective in curing a disease.

• Effect of an advertising campaign: to see whether the campaign has an effect on daily sales of a certain product.

The comparison can be made by interval estimation or by hypothesis testing. The former is often more appealing and is considered first.



Z-interval for Two Population Means

Case I: Two normal populations with known variances

Result 6.5. Suppose X̄1 is the mean of a sample of size n1 from a first population N(μ1, σ1²), and X̄2 is the mean of a sample of size n2 from a second population N(μ2, σ2²) that is independent of the first. Then

\[ \bar{X}_1 - \bar{X}_2 \sim N\left( \mu_1 - \mu_2,\; \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \right), \]

and a 100(1−α)% C.I. for μ1 − μ2 of the two independent normal populations is the Z-interval

\[ \bar{X}_1 - \bar{X}_2 \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}. \]

Example 6.14. In one industry, workers' wages are normally distributed with variance 0.50. In a second industry, workers' wages are normally distributed with variance 0.25. From the first industry, 20 workers are selected randomly and their mean wage is calculated to be $5.00. From the second industry, 10 workers are selected randomly and their mean wage is calculated to be $4.00. Find a 95% confidence interval for the difference in the mean wages of the two industries.

Solution: i) n1 = 20, n2 = 10, X̄1 = 5.00, X̄2 = 4.00, σ1² = 0.50, σ2² = 0.25, α = .05, \( Z_{.025} = 1.96 \).

ii) \( Z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} = 1.96\sqrt{.50/20 + .25/10} = 0.4383 \).

iii) X̄1 − X̄2 ± 0.4383 = 1.00 ± 0.4383 = {0.5617, 1.4383}.

iv) The 95% C.I. is {0.5617, 1.4383}. The true difference in the mean wages of the two industries is unknown, but we are 95% confident that it lies in the interval {0.5617, 1.4383}.



Having constructed a C.I. for μ1 − μ2, we can check whether the interval contains 0. If it does not, we can conclude with 100(1−α)% confidence that the two means are different.

In the above example, the C.I. contains only positive values, so we conclude with 95% confidence that the mean wage in industry 1 is higher than that in industry 2.

Case II: Two non-normal populations

If the normality assumption in Case I is removed, then, when n1 ≥ 30 and n2 ≥ 30,

\[ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \]

is approximately N(0, 1). In this case the C.I. in Result 6.5 is approximately valid.



Example 6.15. A study is made comparing the prices asked for existing one-family homes in two adjacent communities. In College Heights, the mean asking price for a random sample of 50 homes is $142,000. In University Gardens, the mean asking price for a random sample of 35 homes is $168,000. The standard deviations of asking prices of the two communities were known to be $30,000 for College Heights and $40,000 for University Gardens. Calculate a 98% C.I. for the difference in mean asking prices.

Solution: i) Since both sample sizes > 30, use of the Z–interval is justified.

ii) n1 = 50, n2 = 35, X̄1 = 142,000, X̄2 = 168,000, σ1 = 30,000, σ2 = 40,000,

\( Z_{0.01} = 2.3263 \).

iii) \( Z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} = 2.3263\,(1000)\sqrt{30^2/50 + 40^2/35} = 18{,}569 \),

iv) 98% C.I.: X̄1 − X̄2 ± 18,569 = −26,000 ± 18,569 = {−44,569, −7,431}.
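Result 6.5 with known variances can be checked on Examples 6.14 and 6.15 using a small Python helper (hypothetical name). The z values passed in are the table values from the text. The Example 6.15 half-width comes out at about 18,569.

```python
import math

def two_sample_z_interval(x1, x2, var1, var2, n1, n2, z):
    """(x1 - x2) +/- z * sqrt(var1/n1 + var2/n2) with known variances (Result 6.5)."""
    half = z * math.sqrt(var1 / n1 + var2 / n2)
    diff = x1 - x2
    return diff - half, diff + half, half

# Example 6.14: variances 0.50 and 0.25 known, 95% confidence
lo, hi, half = two_sample_z_interval(5.00, 4.00, 0.50, 0.25, 20, 10, 1.96)
print(f"Example 6.14: half-width {half:.4f}, C.I. ({lo:.4f}, {hi:.4f})")

# Example 6.15: sigma1 = 30,000 and sigma2 = 40,000 known, 98% confidence
lo, hi, half = two_sample_z_interval(142_000, 168_000, 30_000 ** 2, 40_000 ** 2, 50, 35, 2.3263)
print(f"Example 6.15: half-width {half:,.0f}, C.I. ({lo:,.0f}, {hi:,.0f})")
```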



When both variances are unknown, they can be replaced by the corresponding sample variances s1² and s2² when both n1 ≥ 30 and n2 ≥ 30.

The approximate C.I. for μ1 − μ2 is

\[ \bar{X}_1 - \bar{X}_2 \pm Z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}. \]

Example 6.16. A study is conducted comparing average starting salaries offered to new B.A. recipients at two universities. A sample of 42 students from one school was offered an average of $1360 per month with a standard deviation of $320. A sample of 48 students from the other school was offered an average of $1320 with a standard deviation of $375. Construct a 95% C.I. for the difference in the mean starting salaries.

Solution:

i) Since both sample sizes > 30, we can use normal approximations.



ii) n1 = 42, n2 = 48, X̄1 = 1360, X̄2 = 1320, s1 = 320, s2 = 375, α = 0.05, \( Z_{0.025} = 1.96 \).

iii) \( Z_{\alpha/2}\sqrt{s_1^2/n_1 + s_2^2/n_2} = 1.96\sqrt{320^2/42 + 375^2/48} = 143.60 \).

iv) X̄1 − X̄2 ± 143.60 = 40 ± 143.60 = {−103.60, 183.60}.
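The large-sample version with s1 and s2 in place of σ1 and σ2 can be sketched in Python as follows. The n ≥ 30 guard mirrors the rule of thumb above; the helper name and the choice to raise an error are assumptions of this sketch. The exact z (1.95996) shifts the endpoints by well under a cent relative to the text.

```python
import math
from statistics import NormalDist

def large_sample_diff_interval(x1, x2, s1, s2, n1, n2, conf):
    """Approximate C.I. for mu1 - mu2 with sample SDs in place of sigmas."""
    if n1 < 30 or n2 < 30:
        raise ValueError("large-sample approximation needs n1 >= 30 and n2 >= 30")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = x1 - x2
    return diff - half, diff + half

# Example 6.16: starting salaries at two universities, 95% confidence
lo, hi = large_sample_diff_interval(1360, 1320, 320, 375, 42, 48, 0.95)
print(f"95% C.I.: ({lo:.2f}, {hi:.2f})")
```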



Compare Two Binomial Proportions π1 and π2

Suppose we have two independent binomial populations, X1 ~ Bin(n1, π1) and X2 ~ Bin(n2, π2), and we want to compare π1 and π2; that is, inference concerns π1 − π2. The two sample proportions are π̂1 = X1/n1 and π̂2 = X2/n2.

Extending the discussion for the case of one binomial proportion, it is easy to see that π̂1 − π̂2, the difference between the two sample proportions, is an unbiased estimator of π1 − π2, i.e., E(π̂1 − π̂2) = π1 − π2.

Further, it can be shown that

\[ \mathrm{Var}(\hat{\pi}_1 - \hat{\pi}_2) = \frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}, \]

and

\[ \frac{(\hat{\pi}_1 - \hat{\pi}_2) - (\pi_1 - \pi_2)}{\sqrt{\pi_1(1-\pi_1)/n_1 + \pi_2(1-\pi_2)/n_2}} \sim N(0, 1), \text{ approximately,} \]

provided that n1π1 ≥ 5, n1(1−π1) ≥ 5, n2π2 ≥ 5 and n2(1−π2) ≥ 5.



The variance involves the unknown π1 and π2, and it can be estimated by substituting π̂1 and π̂2 into the variance expression.

Result 6.6. Consider two independent binomial populations X1 ~ Bin(n1, π1) and X2 ~ Bin(n2, π2), and let π̂1 = X1/n1 and π̂2 = X2/n2 be the sample proportions. Then

\[ Z = \frac{(\hat{\pi}_1 - \hat{\pi}_2) - (\pi_1 - \pi_2)}{\sqrt{\hat{\pi}_1(1-\hat{\pi}_1)/n_1 + \hat{\pi}_2(1-\hat{\pi}_2)/n_2}} \]

is approximately N(0, 1), and an approximate 100(1−α)% C.I. for π1 − π2 is given by

\[ (\hat{\pi}_1 - \hat{\pi}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}}, \]

provided that n1π̂1, n1(1−π̂1), n2π̂2 and n2(1−π̂2) are all ≥ 5.



Example 6.17. In a political poll, 42 out of 100 randomly selected men surveyed preferred candidate Smith, and 92 out of 200 randomly selected women preferred Smith. Construct a 95% C.I. to see if there is any difference in proportions of men and women who preferred Smith.

Solution:

i) n1π̂1 = 42 > 5, n1(1−π̂1) = 58 > 5, n2π̂2 = 92 > 5, n2(1−π̂2) = 108 > 5.

Application of Result 6.6 is justified.

ii) \( \sqrt{\hat{\pi}_1(1-\hat{\pi}_1)/n_1 + \hat{\pi}_2(1-\hat{\pi}_2)/n_2} = \sqrt{.42 \times .58/100 + .46 \times .54/200} = .0606 \).

iii) π̂1 − π̂2 ± 1.96 × .0606 = −.04 ± .1188 = {−0.1588, 0.0788}.

iv) We cannot conclude that there is a difference, since the 95% C.I. covers 0.
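Result 6.6 and the "does the interval cover 0?" check can be sketched in Python for Example 6.17. The helper name is hypothetical, and the rule-of-thumb check raises an error by assumption.

```python
import math
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, conf):
    """Approximate C.I. for pi1 - pi2 (Result 6.6)."""
    p1, p2 = x1 / n1, x2 / n2
    for count in (n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)):
        if count < 5:
            raise ValueError("normal approximation not justified")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Example 6.17: 42 of 100 men and 92 of 200 women prefer Smith
lo, hi = two_proportion_interval(42, 100, 92, 200, 0.95)
print(f"95% C.I.: ({lo:.4f}, {hi:.4f})", "covers 0" if lo <= 0 <= hi else "excludes 0")
```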



t-Interval for Two Normal Population Means

Consider again two normal populations N(μ1, σ1²) and N(μ2, σ2²). Independent samples of size n1 and n2 are drawn from the two populations, resulting in sample means X̄1 and X̄2 and sample variances s1² and s2².

When the sample sizes n1 and n2 are small and the population variances are unknown, the Z–interval in Result 6.5 is no longer valid. As in the one-sample case, we seek a t–interval. Assuming σ1 = σ2 = σ, we have

\[ \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n_1 + 1/n_2}} \sim N(0, 1). \]

Since the two populations have the same variance, we pool the information in the two samples to obtain a more accurate (and unbiased) estimator of this common variance. This is called the pooled estimator of the common σ²:

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}. \]



From Result 5.8, we have

\[ \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2}, \]

which is called the pooled two-sample t statistic.

Result 6.7. Suppose a sample of size n1 is drawn from a first population N(μ1, σ1²), and a sample of size n2 is drawn from a second population N(μ2, σ2²) that is independent of the first, where σ1² = σ2². Then a 100(1−α)% confidence interval for μ1 − μ2 is

\[ (\bar{X}_1 - \bar{X}_2) \pm t_{n_1+n_2-2}(\alpha/2)\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \]

where \( t_{n_1+n_2-2}(\alpha/2) \) is the upper α/2 point of \( t_{n_1+n_2-2} \).

• The pooled t–statistic is a powerful tool for comparing population means when its assumptions are satisfied.



Example 6.18. Suppose you wish to compare a new method of teaching reading to "slow learners" with the current standard method. You decide to base this comparison on the results of a reading test given at the end of a 6-month learning period. Of a random sample of 20 slow learners, 8 are taught by the new method and 12 by the standard method. The results are summarized as follows: new method: X̄1 = 76.9, s1 = 4.85; standard method: X̄2 = 72.7, s2 = 6.35. Estimate the true mean difference between the test scores for the new method and the standard method using a 90% C.I. What assumptions must be made for the estimate to be valid?

Solution: i) Assume: a) two population test scores are normally distributed; b) variances of the two populations are the same; c) two samples are independent.

ii) n1 = 8, n2 = 12, X̄1 = 76.9, X̄2 = 72.7,

s1 = 4.85, s2 = 6.35, d.f. = n1 + n2 − 2 = 18.

For α = 0.10, \( t_{n_1+n_2-2}(\alpha/2) = t_{18}(.05) = 1.734 \).

iii) \[ s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{7 \times 4.85^2 + 11 \times 6.35^2}{8+12-2} = 33.7892. \]

iv) 90% C.I.:

\[ (\bar{X}_1 - \bar{X}_2) \pm t_{n_1+n_2-2}(\alpha/2)\, s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}} = (76.9 - 72.7) \pm 1.734\sqrt{33.7892\left(\frac{1}{8}+\frac{1}{12}\right)} = 4.20 \pm 4.60, \]

or {−0.40, 8.80}.

v) Since the interval contains 0, the data do not provide sufficient evidence of a difference between the two mean test scores.
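The arithmetic in steps iii) and iv) can be checked with a few lines of Python. This sketch uses only the summary statistics given in the example and takes the table value 1.734 as an input, since the Python standard library has no t quantile function.

```python
from math import sqrt

# Summary statistics from Example 6.18 (new method = sample 1)
n1, xbar1, s1 = 8, 76.9, 4.85
n2, xbar2, s2 = 12, 72.7, 6.35
t_crit = 1.734  # t_{0.05}(18), read from a t table as in step ii)

# Pooled variance, step iii)
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

# Margin of error and interval, step iv)
margin = t_crit * sqrt(sp2 * (1 / n1 + 1 / n2))
diff = xbar1 - xbar2
lower, upper = diff - margin, diff + margin

print(f"s_p^2 = {sp2:.4f}")  # s_p^2 = 33.7892
print(f"90% C.I.: {diff:.2f} +/- {margin:.2f} = ({lower:.2f}, {upper:.2f})")
# 90% C.I.: 4.20 +/- 4.60 = (-0.40, 8.80)
```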



F-interval for Variances of Two Normal Populations

To compare the variances σ1² and σ2² of two normal populations, we construct a C.I. for σ1²/σ2². Recall from Result 5.6: if s1² and s2² are the sample variances of two independent normal samples, then

$$F = \frac{s_1^2}{s_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} \sim F_{n_1 - 1,\, n_2 - 1},$$

the F–distribution with n1–1 and n2–1 degrees of freedom.

Result 6.8. (F–interval) A 100(1–α)% confidence interval for σ1²/σ2² is

$$\frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{\alpha/2}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{1-\alpha/2}},$$

where $F_{1-\alpha/2}$ and $F_{\alpha/2}$ are the lower and upper α/2 percentage points of $F_{n_1 - 1,\, n_2 - 1}$.
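Result 6.8 comes from inverting the pivot F of Result 5.6. A short derivation sketch follows; the second line is the standard reciprocal identity for F percentage points, which is what allows a lower point to be read from a table of upper points.

```latex
\begin{aligned}
1-\alpha &= P\left(F_{1-\alpha/2} \le \frac{s_1^2}{s_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} \le F_{\alpha/2}\right)
 = P\left(\frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{\alpha/2}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{1-\alpha/2}}\right), \\
F_{1-\alpha/2}(\nu_1, \nu_2) &= \frac{1}{F_{\alpha/2}(\nu_2, \nu_1)}
 \quad\text{since } 1/F \sim F_{\nu_2, \nu_1} \text{ when } F \sim F_{\nu_1, \nu_2}.
\end{aligned}
```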



Example 6.19. In Example 6.18, we used the pooled t statistic to compare the mean reading scores of two groups of slow learners who had been taught to read by two different methods. The pooled t was based on the assumption that the population variances of the test scores were equal for the two methods. Check this assumption using α = 0.10.

Solution: i) n1 = 8, n2 = 12, s1 = 4.85, s2 = 6.35; the d.f. are 7 and 11.

For α = 0.10, $F_{\alpha/2}(7, 11) = 3.01$ and $F_{1-\alpha/2}(7, 11) = 1/F_{\alpha/2}(11, 7) = 1/3.605$.

ii) $$\frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{\alpha/2}} = \frac{4.85^2}{6.35^2}\cdot\frac{1}{3.01} = 0.1938; \qquad \frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{1-\alpha/2}} = \frac{4.85^2}{6.35^2}\times 3.605 = 2.103.$$

iii) The 90% C.I. for σ1²/σ2² is (0.1938, 2.103).

iv) Since the interval contains 1, the value corresponding to equal variances, there is not sufficient evidence against the assumption of equal variances.
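A similar check of Example 6.19 in Python, assuming the two F table values quoted in step i); the reciprocal 1/3.605 plays the role of the lower point $F_{1-\alpha/2}(7, 11)$.

```python
# Sample standard deviations from Example 6.19 (same samples as Example 6.18)
s1, s2 = 4.85, 6.35
f_upper = 3.01       # F_{0.05}(7, 11), upper alpha/2 point, from an F table
f_upper_rev = 3.605  # F_{0.05}(11, 7), from an F table

ratio = s1**2 / s2**2
f_lower = 1 / f_upper_rev  # lower alpha/2 point F_{0.95}(7, 11)

lower = ratio / f_upper
upper = ratio / f_lower

print(f"90% C.I. for sigma1^2/sigma2^2: ({lower:.4f}, {upper:.3f})")
# 90% C.I. for sigma1^2/sigma2^2: (0.1938, 2.103)
print("contains 1:", lower <= 1 <= upper)  # contains 1: True
```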



Interval Estimation for Two Dependent Populations

Two samples are dependent when their observations are paired, typically as "before and after" measurements. Comparing means based on such samples is called paired comparison. For example:

1. 100 m run times of the same individuals "before and after" taking steroids.

2. Heights of the same boys at age 16 and at age 25.

3. Starting salaries of male and female college graduates, paired so that each pair has the same major and a similar grade-point average.

In general, the ith pair of observations (Xi, Yi) comes from the same ith object, or from the ith pair of objects matched to have similar features; hence Xi and Yi are dependent. The n pairs (X1, Y1), (X2, Y2), . . . , (Xn, Yn) are randomly chosen and hence are i.i.d.



Let di = Xi – Yi, i = 1, 2, . . . , n. Then d1, d2, . . . , dn are like a random sample from a population with mean μd (= μ1 – μ2) and variance σd². Define the sample mean and the sample variance as

$$\bar d = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad s_d^2 = \frac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar d)^2.$$

The problem of comparing μ1 with μ2 now reduces to making inferences about μd based on d1, d2, . . . , dn.

Result 6.9. (t–interval for paired comparison) Assume that the differences d1, d2, . . . , dn are independent with a N(μd, σd²) distribution. Then a 100(1−α)% C.I. for μd is given by

$$\bar d \pm t_{\alpha/2}\,\frac{s_d}{\sqrt{n}}, \qquad \text{d.f.} = n - 1.$$



Example 6.20. A medical researcher wishes to determine whether a pill has the undesirable side effect of reducing the blood pressure of the user. The study involves recording the initial blood pressure of 15 college-age women. After they have used the pill for six months, their blood pressures are recorded again. The researcher wishes to draw inferences about the effect of the pill on blood pressure from the observations given below.

Subject        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Before (x)    70  80  72  76  76  76  72  78  82  64  74  92  74  68  84
After (y)     68  72  62  70  58  66  68  52  64  72  74  60  74  72  74
d = x – y      2   8  10   6  18  10   4  26  18  –8   0  32   0  –4  10

Solution: i) Assuming the paired differences constitute a random sample from N(μd, σd²), use of the C.I. in Result 6.9 is valid. Now n = 15, d.f. = 14, and $t_{0.025} = 2.145$.



ii) $$\bar d = \frac{1}{15}\sum_{i=1}^{15} d_i = 8.80, \qquad s_d = \sqrt{\frac{1}{14}\sum_{i=1}^{15}(d_i - \bar d)^2} = 10.98.$$

iii) The 95% C.I. for μd is computed as

$$\bar d \pm t_{0.025}\,\frac{s_d}{\sqrt{n}} = 8.80 \pm 6.08,$$

or (2.72, 14.88). Since the C.I. contains only positive values (recall d = x – y = before – after), a reduction in blood pressure is strongly indicated by the data.
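The whole of Example 6.20 can be reproduced from the table of before and after readings. This Python sketch uses `statistics.stdev` (divisor n − 1, matching s_d) and takes t0.025 = 2.145 from the t table rather than computing it.

```python
from math import sqrt
from statistics import mean, stdev

# Blood pressure readings from Example 6.20
before = [70, 80, 72, 76, 76, 76, 72, 78, 82, 64, 74, 92, 74, 68, 84]
after = [68, 72, 62, 70, 58, 66, 68, 52, 64, 72, 74, 60, 74, 72, 74]
t_crit = 2.145  # t_{0.025}(14), from a t table as in step i)

# Paired differences d_i = x_i - y_i reduce the problem to one sample
d = [x - y for x, y in zip(before, after)]
n = len(d)
d_bar = mean(d)
s_d = stdev(d)  # sample standard deviation with divisor n - 1
margin = t_crit * s_d / sqrt(n)

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}")  # d_bar = 8.80, s_d = 10.98
print(f"95% C.I.: ({d_bar - margin:.2f}, {d_bar + margin:.2f})")
# 95% C.I.: (2.72, 14.88)
```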


