
Post on 22-Dec-2015


Inference on averages

Data are collected to learn about certain numerical characteristics of a process or phenomenon that in most cases are unknown.

Example: A study was conducted to analyze women’s bone health. Data on the daily intakes of calcium (in milligrams) were collected for 36 women between the ages of 18 and 24. What is the estimated average calcium intake for women in this age range?

The sample average is an estimate of the average calcium intake for women between the ages of 18 and 24.

Population = all women aged 18-24 years.
Sample = 36 women aged 18-24 years, selected at random.

Estimating the population average

To estimate the population average:
• Select a simple random sample of size n from the population of interest, so that each unit in the population has the same probability of being selected.
• Collect data from the sample.
• Compute the sample average and the standard deviation.

The sample average $\bar{x}$ is an estimate of the population average.

How accurate is such an estimate?

A measure of the accuracy is given by the standard error (S.E.) of the sample average:

$$\mathrm{S.E.}(\bar{x}) = \frac{s}{\sqrt{n}}$$

where s is the standard deviation of the observations. The larger the sample, the more accurate the average is as an estimate of the population average.
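The standard error formula can be sketched in a few lines of Python. This is only an illustration: the function name `standard_error` and the five data values are made up here, while s = 422 and n = 36 are the summary values of the calcium example used in these slides.

```python
import math
import statistics


def standard_error(s, n):
    """Standard error of the sample average: s / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return s / math.sqrt(n)


# Summary statistics from the calcium example (s = 422, n = 36).
se_calcium = standard_error(422, 36)
print(round(se_calcium, 2))

# With raw data, s is the sample standard deviation of the observations.
# The five values below are hypothetical, used only to show the calls.
data = [810.0, 1120.0, 640.0, 990.0, 905.0]
se_data = standard_error(statistics.stdev(data), len(data))
print(round(se_data, 2))
```

Because n appears under a square root, the sample size has to be multiplied by four to cut the standard error in half.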

What is the distribution of the sample average?

If the investigators take several samples of size n and compute the average of each sample, then all the sample averages will be somewhere around the population average.

sample average = population average + sampling error

The sampling error can be positive or negative; its likely size is measured by the S.E. of $\bar{x}$.

What is the shape of the sampling distribution?

If the sample size n is large (n>50), the sample average is approximately normal with mean equal to the population mean and standard deviation equal to the standard error of the sample average.

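$$\bar{X} \text{ is approximately } N\left(\mu, \frac{s}{\sqrt{n}}\right)$$

where μ is the population mean and $s/\sqrt{n}$ is the standard error of the sample average.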

The larger the sample, the more accurate the normal approximation is. If the distribution of the population is not symmetric, the normal approximation is less accurate, and you need a larger sample.
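A small simulation can make the sampling distribution concrete. The sketch below assumes a hypothetical, strongly skewed population (exponential with mean 900); none of these numbers come from the study. It draws many samples of size 36 and checks that the sample averages center on the population mean with a spread close to sd/sqrt(n). In practice the population standard deviation is unknown, and s from the sample is used in its place.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population: exponential with mean 900
# (for an exponential distribution the standard deviation equals the mean).
POP_MEAN = 900.0
POP_SD = 900.0
N = 36          # size of each sample
REPS = 2000     # number of repeated samples


def sample_average(n):
    """Draw one random sample of size n and return its average."""
    return statistics.fmean(rng.expovariate(1 / POP_MEAN) for _ in range(n))


averages = [sample_average(N) for _ in range(REPS)]

center = statistics.fmean(averages)   # should be near POP_MEAN
spread = statistics.stdev(averages)   # should be near POP_SD / sqrt(N)
theory = POP_SD / math.sqrt(N)

print(f"mean of sample averages: {center:.1f} (population mean {POP_MEAN})")
print(f"sd of sample averages:   {spread:.1f} (sd/sqrt(n) = {theory:.1f})")
```

Plotting a histogram of `averages` would show a roughly bell-shaped curve even though the population itself is far from symmetric.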

Confidence Intervals for averages

Problem: We want to estimate the unknown population mean μ.

Answer: We compute a confidence interval for μ, that is, the set of plausible values for μ in the light of the data.

A 95% confidence interval for μ is defined as

sample average ± margin of error

where the margin of error indicates how accurate our estimate is.

Confidence Intervals

In samples of size n, a level C confidence interval for the population average is

sample average ± t*S.E. = $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$

where $t_{\alpha/2}$ is the critical value such that the area between $-t_{\alpha/2}$ and $t_{\alpha/2}$ under the curve of the t-distribution with n-1 degrees of freedom is C = 1 - α.

The value of $t_{\alpha/2}$ is computed using the Excel function

=TINV(α, df)

where df = sample size - 1.

[Figure: t-distribution curve with area 0.95 between -t and t.]
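For readers without Excel, the critical value can be approximated in plain Python. The sketch below is one possible way to do it, not the method used in the slides: it integrates the t density numerically and finds the critical value by bisection. The names `t_pdf`, `central_area` and `t_critical` are made up for this illustration, and `t_critical(alpha, df)` is meant to play the role of `TINV(alpha, df)`.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def central_area(t, df, steps=2000):
    """Area under the t curve between -t and t (Simpson's rule)."""
    h = t / steps  # integrate over [0, t], then double by symmetry
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def t_critical(alpha, df):
    """Two-tailed critical value: central area 1 - alpha between -t and t."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    target = 1 - alpha
    lo, hi = 0.0, 1.0
    while central_area(hi, df) < target:  # expand until the root is bracketed
        hi *= 2
    for _ in range(60):                   # bisection on the central area
        mid = (lo + hi) / 2
        if central_area(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.05, 35), 2))  # 95% confidence, n = 36
```

If SciPy is installed, `scipy.stats.t.ppf(1 - alpha / 2, df)` returns the same quantity and is the more usual choice; the hand-written version above is only meant to show what the function computes.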

Example

Data on the daily intakes of calcium (in milligrams) were collected for 36 women between the ages of 18 and 24.

The sample average is $\bar{x} = 898.44$.
The standard deviation is s = 422.
The sample size is n = 36.
The standard error is S.E. = 422/sqrt(36) = 70.33.

The 95% confidence interval is

(898.44 - t_{0.025}*70.33, 898.44 + t_{0.025}*70.33)

The value of t_{0.025} is 2.03, thus a 95% C.I. for μ is (755.66 mg, 1041.23 mg).

We are 95% confident that the “true” average calcium intake is a value between 755.66 mg and 1041.23 mg.

Excel formulas used:
=COUNT(data)            n
=B4/SQRT(B5)            stdev/sqrt(n)
=B5-1                   n-1
=TINV((1-B6), B10)      TINV(alpha, df)
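The same calculation can be written as a short Python sketch that mirrors the spreadsheet steps. The function name `confidence_interval` is made up for this illustration; the inputs are the summary values of the calcium example, and the critical value 2.03 is taken from the example rather than computed.

```python
import math


def confidence_interval(mean, s, n, t_crit):
    """Return (lower, upper) for mean ± t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin


# Values from the calcium example; t_crit = 2.03 is the critical value
# quoted in the example for 95% confidence and 35 degrees of freedom.
lower, upper = confidence_interval(mean=898.44, s=422, n=36, t_crit=2.03)
print(f"95% C.I.: ({lower:.2f} mg, {upper:.2f} mg)")
```

Because the inputs here are rounded, the last digit of the result can differ slightly from the interval quoted in the example.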

Understanding a 95% confidence interval

For about 95 out of 100 samples, the population average lies in the associated 95% confidence interval. Suppose we take 25 samples of 36 women between 18 and 24 years of age, and for each sample we compute the sample average and the 95% C.I.

[Figure: distribution of sample averages, with the 95% C.I. computed from each of the 25 samples.]

Why do the intervals move around?

How many intervals contain the true value μ?

In the long run, 95% of all the samples will produce an interval that contains the true value μ.

Be careful though: it might happen that the C.I. computed with the sample collected in the study DOES NOT contain the true average value!
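The long-run interpretation can be checked with a simulation. The sketch below assumes a hypothetical normal population (mean 900, standard deviation 420); these values are invented for the experiment and are not estimates from the study. It draws repeated samples of 36, builds a 95% C.I. from each one using the critical value 2.03 for 35 degrees of freedom, and counts how many intervals contain the true mean.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population, chosen only for the simulation.
TRUE_MEAN = 900.0
TRUE_SD = 420.0
N = 36
T_CRIT = 2.03   # t critical value for 95% confidence, 35 d.f.
REPS = 2000


def interval_from_sample(n):
    """Draw one sample of size n and return its 95% C.I. (lower, upper)."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return xbar - T_CRIT * se, xbar + T_CRIT * se


intervals = [interval_from_sample(N) for _ in range(REPS)]
hits = sum(lo <= TRUE_MEAN <= hi for lo, hi in intervals)
coverage = hits / REPS
print(f"{hits} of {REPS} intervals contain the true mean ({coverage:.1%})")
```

The intervals move around because each sample has its own average and its own standard deviation; the true mean stays fixed, and roughly 5% of the intervals miss it.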

What is the t-distribution?

The t-distribution with n-1 degrees of freedom is a symmetric distribution with center at 0. For large n, the t-distribution is close to the standard normal distribution.

Comparing the t-distribution curve and the standard normal curve

[Figure: three panels comparing the t-distribution curve with the standard normal curve, for d.f. = 5, d.f. = 15 and d.f. = 30. Horizontal axis: t; vertical axis: relative frequency.]

The t-distribution curve has “fatter” tails. For d.f. around 30, the t-distribution curve is very similar to the standard normal curve.
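The comparison can also be made numerically. The sketch below tabulates the standard normal density next to the t density for 5, 15 and 30 degrees of freedom; the helper name `t_pdf` is made up for this illustration and uses the standard formula for the t density.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


z = NormalDist()  # standard normal: mean 0, standard deviation 1

print("  x  normal    df=5   df=15   df=30")
for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    row = [z.pdf(x)] + [t_pdf(x, df) for df in (5, 15, 30)]
    print(f"{x:3.0f}  " + "  ".join(f"{v:.4f}" for v in row))
```

Near 0 the t curves sit slightly below the normal curve, and far from 0 they sit above it, which is what "fatter tails" means. The gap shrinks as the degrees of freedom increase.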

A different confidence level

Suppose we want to compute a 90% confidence interval for the average calcium intake.

We will use the same formula, with a different critical value $t_{\alpha/2}$.

The sample average is $\bar{x} = 898.44$.
The standard deviation is s = 422.
The sample size is n = 36.
The standard error is S.E. = 422/sqrt(36) = 70.33.
The confidence level is C = 0.90, so α = 1 - C = 0.10.

The 90% confidence interval is

(898.44 - t_{0.05}*70.33, 898.44 + t_{0.05}*70.33)

The critical value for 35 degrees of freedom is t_{0.05} = 1.690.

The C.I. is (898.44 - 1.690*70.33, 898.44 + 1.690*70.33) = (779.58 mg, 1017.30 mg).

With a 90% confidence level, we state that the average calcium intake is between 779.58 mg and 1017.30 mg.

Approximate Confidence Intervals

The normal approximation can be used to compute approximate confidence intervals if the sample size is large (n>30).

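Confidence level              Margin of error
90% Confidence Interval       1.64 S.E.
95% Confidence Interval       1.96 S.E.
99% Confidence Interval       2.57 S.E.

The interval is $\bar{x}$ ± margin of error.

[Figure: normal curve centered at $\bar{x}$; the area under the curve between $\bar{x}$ - 1.96 S.E. and $\bar{x}$ + 1.96 S.E. is 95%.]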

Expressions for C.I.’s

The 90% C.I. for the population mean: $\bar{x} \pm 1.64 \frac{s}{\sqrt{n}}$

The 95% C.I. for the population mean: $\bar{x} \pm 1.96 \frac{s}{\sqrt{n}}$

The 99% C.I. for the population mean: $\bar{x} \pm 2.57 \frac{s}{\sqrt{n}}$

$\bar{x}$ is the sample average of the n observations in a simple random sample of size n, where n is large (>30).

s is the standard deviation of the n observations.
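The multipliers in these expressions are quantiles of the standard normal curve, and they can be recovered with Python's standard library. The sketch below is only an illustration; the names `z_critical` and `approx_interval` are made up here. To three decimals the quantiles are 1.645, 1.960 and 2.576, which the expressions above shorten to two decimals.

```python
from statistics import NormalDist


def z_critical(confidence):
    """z such that the area between -z and z under the normal curve
    equals the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def approx_interval(xbar, s, n, confidence):
    """Approximate C.I. xbar ± z * s / sqrt(n), intended for large n."""
    margin = z_critical(confidence) * s / n ** 0.5
    return xbar - margin, xbar + margin


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

For example, `approx_interval(xbar, s, n, 0.95)` returns the pair of endpoints given by the 95% expression, for whatever sample average, standard deviation and sample size are passed in.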

General remarks on C.I.’s

The purpose of a C.I. is to estimate an unknown parameter with an indication of how accurate the estimate is and of how confident we are that the result is correct.

The methods used here rely on the assumption that the sample is randomly selected.

Any confidence interval has two parts: estimate ± margin of error

The confidence level states the probability that the method will give a correct answer, i.e. the confidence interval contains the “true” value of the parameter.

The margin of error of a confidence interval decreases as
1. The confidence level decreases
2. The sample size n increases

Remarks:

1. Notice the trade-off between the margin of error and the confidence level. The greater the confidence you want to place in your estimate, the larger the margin of error is (and hence the less informative your interval becomes).

2. A C.I. gives the range of values for the unknown population average that are plausible, in the light of the observed sample average. The confidence level says how plausible.

3. A C.I. is defined for the population parameter, NOT the sample statistic.

4. To make a margin of error smaller, you can take a larger sample!

5. Use the t-distribution in small samples (n<30). For large samples, the t-distribution is very close to the standard normal distribution.
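The effect of the confidence level and the sample size on the margin of error can be tabulated with a short sketch. It uses the large-sample formula z * s / sqrt(n) with s = 422 from the calcium example; the sample sizes 144 and 576 are hypothetical and chosen only to show the pattern, and the function name `margin_of_error` is made up for this illustration.

```python
import math
from statistics import NormalDist


def margin_of_error(s, n, confidence):
    """Large-sample margin of error: z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / math.sqrt(n)


S = 422  # standard deviation from the calcium example

print("     n     90%     95%     99%")
for n in (36, 144, 576):
    row = [margin_of_error(S, n, c) for c in (0.90, 0.95, 0.99)]
    print(f"{n:6d}  " + "  ".join(f"{m:6.1f}" for m in row))
```

Reading across a row, the margin grows with the confidence level. Reading down a column, each fourfold increase in n cuts the margin in half.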

Testing hypotheses

The recommended daily allowance (RDA) of calcium for women between 18 and 24 years of age is 1300 milligrams. A health organization claims that, on average, women in this age range consume less calcium than the RDA level.

Using the collected data, what can we conclude regarding the claim of the health organization?

Testing hypotheses

Confidence intervals can be used to test conjectures or hypotheses about a certain characteristic of interest.

A trucking firm doubts the claim that the average lifetime of certain tires is at least 28,000 miles. To check the claim, the firm puts 80 of these tires on its trucks and gets an average lifetime of 27,563 miles with a standard deviation of 1,348 miles. What can you conclude from the data?

We can construct a confidence interval and check whether it contains the value of 28,000 miles. If it does, we can conclude that 28,000 is plausible in the light of the data!

A 95% C.I. for the average lifetime is (Are we using the t-distribution or the normal curve? The sample is large, n = 80, so we use the normal curve.)

$$\bar{x} \pm 1.96 \frac{s}{\sqrt{n}}$$

27,563 ± 1.96*1,348/sqrt(80) = 27,563 ± 295.39 miles = (27,267.61, 27,858.39).

The entire confidence interval lies below 28,000 miles, so 28,000 is not a plausible value for the average lifetime in the light of the data. It is more likely that the tires last a shorter time than claimed.
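The check on the tire claim can be written as a short Python sketch. The function name `normal_interval` is made up for this illustration; the numbers are those of the trucking example, and the multiplier 1.96 is the normal-curve value for 95% confidence.

```python
import math


def normal_interval(xbar, s, n, z=1.96):
    """Approximate 95% C.I. xbar ± z * s / sqrt(n) for large n."""
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin


CLAIMED = 28_000  # claimed average lifetime in miles

lower, upper = normal_interval(xbar=27_563, s=1_348, n=80)
print(f"95% C.I.: ({lower:.2f}, {upper:.2f}) miles")

if lower <= CLAIMED <= upper:
    print("28,000 miles is plausible in the light of the data.")
else:
    print("28,000 miles is outside the interval, so it is not plausible.")
```

The same pattern applies to any claimed value: build the interval from the data, then ask whether the claimed value falls inside it.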

Testing hypotheses