Statistical Techniques I EXST7005 Distribution of Sample Means

Upload: cornelius-carpenter

Post on 29-Dec-2015


Statistical Techniques I EXST7005

Distribution of Sample Means

OBJECTIVES

Usually we will be testing hypotheses about means. We will need some additional information about the nature of means of samples in order to do hypothesis tests.

Distribution of Sample Means

Means are the basis for testing hypotheses about $\mu$, the most common type of hypothesis test.

Imagine a POPULATION from which we are drawing samples.

Population size = $N$
Mean = $\mu$
Variance = $\sigma^2$

Parent population values are:

– $Y_i = Y_1, Y_2, Y_3, \ldots, Y_N$

Distribution of Sample Means (continued)

The samples of size n form a DERIVED POPULATION

There are $N^n$ possible samples of size $n$ that can be drawn from a population of size $N$ (sampling WITH replacement).

For each sample we calculate a mean $\bar{Y}_k = \sum Y_i / n$, where $k = 1, 2, 3, \ldots, N^n$.

Distribution of Sample Means (continued)

The Derived Population of Means of samples of size n

Population size = $N^n$
Mean = $\mu_{\bar{Y}}$
Variance = $\sigma^2_{\bar{Y}}$

Derived population values are:

– $\bar{Y}_k = \bar{Y}_1, \bar{Y}_2, \bar{Y}_3, \ldots, \bar{Y}_{N^n}$

Distribution of Sample Means (continued)

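Mean of the DERIVED POPULATION: $\mu_{\bar{Y}} = \sum \bar{Y}_k / N^n$, where $k = 1, 2, 3, \ldots, N^n$

Variance of the DERIVED POPULATION: $\sigma^2_{\bar{Y}} = \sum (\bar{Y}_k - \mu_{\bar{Y}})^2 / N^n$, where $k = 1, 2, 3, \ldots, N^n$

Here $n$ is the sample size, $N$ is the size of the parent population, and $N^n$ is the size of the derived population.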

Example of a Derived Population

Parent Population: $Y_i = 0, 1, 2, 3$

$\mu = \sum Y_i / N = 6/4 = 1.5$

$\sigma^2 = \sum (Y_i - \mu)^2 / N = [(0-1.5)^2 + (1-1.5)^2 + (2-1.5)^2 + (3-1.5)^2]/4 = 5/4 = 1.25$

$\sigma = \sqrt{1.25} = 1.12$
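The arithmetic for the parent population can be checked with a short sketch. The variable names are illustrative only; the values 0, 1, 2, 3 are treated as the whole population, so the divisor is N rather than N-1.

```python
from math import sqrt

# Parent population from the example (the whole population, so divide by N).
population = [0, 1, 2, 3]
N = len(population)

mu = sum(population) / N                               # population mean
sigma2 = sum((y - mu) ** 2 for y in population) / N    # population variance
sigma = sqrt(sigma2)                                   # population standard deviation

print(mu, sigma2, round(sigma, 2))   # 1.5 1.25 1.12
```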

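Original Population

[Figure: relative frequency (r.f.) of each value in the original population; the values 0, 1, 2, 3 each have relative frequency 0.25.]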

The Derived Population, where $n = 2$ and $N^n = 4^2 = 16$

Draw all possible samples of size 2 from the Parent Population (sampling with replacement, so that values will occur more than once), and calculate $\bar{Y}$ for each of the $N^n$ samples.

The Derived Population

Sample    Mean
0, 0      0.0
0, 1      0.5
0, 2      1.0
0, 3      1.5
1, 0      0.5
1, 1      1.0
1, 2      1.5
1, 3      2.0
2, 0      1.0
2, 1      1.5
2, 2      2.0
2, 3      2.5
3, 0      1.5
3, 1      2.0
3, 2      2.5
3, 3      3.0

The Derived Population

Frequency table of the Derived Population

Means    Frequency    Relative Freq
0.0      1            0.0625
0.5      2            0.1250
1.0      3            0.1875
1.5      4            0.2500
2.0      3            0.1875
2.5      2            0.1250
3.0      1            0.0625

Sum      16           1.0000
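The derived population and its frequency table can be reproduced by enumerating every ordered sample. The following is a minimal sketch, assuming ordered samples drawn with replacement as described above; `itertools.product` does the enumeration and `Fraction` keeps the means exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [0, 1, 2, 3]
n = 2

# Every ordered sample of size n, drawn with replacement: N**n of them.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Tally how often each mean occurs and print mean, frequency, relative frequency.
freq = Counter(means)
for m in sorted(freq):
    print(f"{float(m):.1f}  {freq[m]:>2}  {freq[m] / len(means):.4f}")
```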

Histogram of the Derived Population

[Figure: histogram of the derived population; relative frequency (0.00 to 0.30) against sample mean (0.0 to 3.0).]

Note that the derived population is shaped more like the normal distribution than the original population.

Probability statement from the two distributions

Find $P(1 \le Y \le 2)$.

For the original population, $P(1 \le Y \le 2) = 0.5000$

For the derived population, $P(1 \le \bar{Y} \le 2) = 0.6250$
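Both probabilities follow from counting equally likely outcomes, as this short sketch shows. It assumes the example population 0, 1, 2, 3 and samples of size 2; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

population = [0, 1, 2, 3]

# Original population: each of the 4 values is equally likely.
p_original = Fraction(sum(1 for y in population if 1 <= y <= 2), len(population))

# Derived population for n = 2: each of the 16 ordered samples is equally likely.
means = [Fraction(a + b, 2) for a, b in product(population, repeat=2)]
p_derived = Fraction(sum(1 for m in means if 1 <= m <= 2), len(means))

print(float(p_original), float(p_derived))   # 0.5 0.625
```

More of the derived population falls between 1 and 2 because the means concentrate around 1.5.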

[Figure: relative-frequency histograms of the original population (values 0 to 3) and of the derived population (means 0.0 to 3.0).]

THEOREM on the distribution of sample means

Given a population with mean $\mu$ and variance $\sigma^2$, if we draw all possible samples of size $n$ (with replacement) from the population and calculate $\bar{Y}$, then the derived population of all possible sample means will have

Mean: $\mu_{\bar{Y}} = \mu$
Variance: $\sigma^2_{\bar{Y}} = \sigma^2/n$
Standard deviation: $\sigma_{\bar{Y}} = \sigma/\sqrt{n} = \sqrt{\sigma^2/n}$
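The theorem can be checked by brute force on the example population. The sketch below is illustrative and not part of the original slides: the helper name `derived_moments` is an assumption. It enumerates all ordered samples for several sample sizes and compares the mean and variance of the sample means with the population mean and with the population variance divided by n.

```python
from fractions import Fraction
from itertools import product

population = [0, 1, 2, 3]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((y - mu) ** 2 for y in population) / N

def derived_moments(n):
    """Mean and variance of all N**n sample means (sampling with replacement)."""
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    mean_of_means = sum(means) / len(means)
    var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)
    return mean_of_means, var_of_means

# For each n, print the moments and whether they match the theorem exactly.
for n in (1, 2, 3, 4):
    m, v = derived_moments(n)
    print(n, m, v, m == mu, v == sigma2 / n)
```

Exact fractions are used so the equalities hold without floating-point tolerance.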

THEOREM on the distribution of sample means (continued)

Notice that the variance and standard deviation of the mean have "n" in the denominator. As a result, the variance of the derived population becomes smaller as the sample size increases regardless of the value of the population variance.

CENTRAL LIMIT THEOREM

AS THE SAMPLE SIZE (n) INCREASES, THE DISTRIBUTION OF SAMPLE MEANS OF ALL POSSIBLE SAMPLES, OF A GIVEN SIZE FROM A GIVEN POPULATION, APPROACHES A NORMAL DISTRIBUTION IF THE VARIANCE IS FINITE. If the base distribution is normal, then the means are normal regardless of n.
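One way to watch this convergence on the example population is to track a shape measure as n grows. The sketch below uses excess kurtosis, which is 0 for a normal distribution; this measure is chosen here for illustration and is not one the slides use. The function name is also an assumption.

```python
from fractions import Fraction
from itertools import product

population = [0, 1, 2, 3]

def excess_kurtosis_of_means(n):
    """Excess kurtosis of the derived population of means for samples of size n."""
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    count = len(means)
    center = sum(means) / count
    m2 = sum((m - center) ** 2 for m in means) / count   # second central moment
    m4 = sum((m - center) ** 4 for m in means) / count   # fourth central moment
    return m4 / m2 ** 2 - 3                               # 0 for a normal distribution

# The value moves toward 0 as the sample size increases.
for n in (1, 2, 4, 6):
    print(n, float(excess_kurtosis_of_means(n)))
```

Because the example population is symmetric, skewness is already 0 for every n, so kurtosis is the more informative measure here.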

CENTRAL LIMIT THEOREM (continued)

Why is this important? (and it is very important!) If we are more interested in the MEANS (and therefore the distribution of the means) than the original distribution, then normality is a more reasonable assumption.

Often, perhaps even USUALLY, we will be MORE INTERESTED in the MEANS of the DISTRIBUTION THAN IN THE DISTRIBUTIONS of the INDIVIDUALS.

NOTES on the distribution of sample means

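– As $n$ increases, $\sigma^2_{\bar{Y}}$ and $\sigma_{\bar{Y}}$ decrease.
– $\mu_{\bar{Y}} = \mu$ for any $n$.
– $\sigma_{\bar{Y}} < \sigma$ for any $n > 1$.
– As $n$ increases and $\sigma_{\bar{Y}}$ becomes smaller, the distribution of $\bar{Y}$'s becomes closer to $\mu_{\bar{Y}}$ (i.e. we get a better estimate).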

Some new terms

Reliability (as a statistical concept) - the closer the estimate of $\mu$ to the actual value of $\mu$, the more "reliable" the estimate.

Accuracy (as a statistical concept) - this term refers to the lack of bias in the estimate, and not how small the variance is. An estimate may be very accurate, but have a great deal of scatter about the mean.

Estimation of $\sigma^2_{\bar{Y}}$ and $\sigma_{\bar{Y}}$

In practice we cannot draw all possible samples. Recall that $E(S^2) = \sigma^2$, so $S^2_{\bar{Y}} = S^2/n$ is an estimate of $\sigma^2_{\bar{Y}}$, where

– $E(S^2_{\bar{Y}}) = \sigma^2_{\bar{Y}}$
– $S^2_{\bar{Y}}$ is an estimate of the variance of sample means of size $n$
– $S^2$ is the estimate of the variance of observations
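The expectation statements can be illustrated on the example population 0, 1, 2, 3 by averaging the sample variance over all 16 samples of size 2. This is a minimal sketch; the helper name `sample_variance` is an assumption. It uses the n-1 divisor for each sample.

```python
from fractions import Fraction
from itertools import product

population = [0, 1, 2, 3]
n = 2

def sample_variance(sample):
    """Sample variance S^2 with the n-1 divisor."""
    ybar = Fraction(sum(sample), len(sample))
    return sum((y - ybar) ** 2 for y in sample) / (len(sample) - 1)

samples = list(product(population, repeat=n))
s2_values = [sample_variance(s) for s in samples]

expected_s2 = sum(s2_values) / len(s2_values)   # E(S^2) over all 16 samples
expected_s2_mean = expected_s2 / n              # E(S^2 / n)

print(expected_s2, expected_s2_mean)   # 5/4 5/8
```

The averages equal the population variance 1.25 and the variance of the means 1.25/2 = 0.625, matching the derived population computed earlier.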

Estimation of $\sigma^2_{\bar{Y}}$ and $\sigma_{\bar{Y}}$ (continued)

$S_{\bar{Y}} = \sqrt{S^2/n} = S/\sqrt{n}$ is called the STANDARD ERROR to distinguish it from the standard deviation of the observations. It is also called the standard deviation OF THE MEANS.

NOTE: this division is by "n" for both populations and samples, not by "n-1" as with the calculation of variance for samples.
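In practice the standard error is computed from a single sample. The sketch below uses made-up data (the eight values are hypothetical, for illustration only) and the standard library's `statistics.stdev`, which uses the n-1 divisor for the sample standard deviation; the division that turns it into a standard error is by n.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of n = 8 observations (illustrative values only).
sample = [4.1, 5.3, 3.8, 4.9, 5.6, 4.4, 5.0, 4.7]
n = len(sample)

ybar = mean(sample)
s = stdev(sample)               # sample standard deviation, n-1 divisor
standard_error = s / sqrt(n)    # S_Ybar = sqrt(S^2 / n): the division here is by n

print(round(ybar, 3), round(standard_error, 3))
```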

Estimation of $\sigma^2_{\bar{Y}}$ and $\sigma_{\bar{Y}}$ (continued)

$S^2_{\bar{Y}}$ is a measure of RELIABILITY of the sample means as an estimate of the true population mean, i.e. the smaller $S^2_{\bar{Y}}$, the more reliable $\bar{Y}$ is as an estimate of $\mu$.

Ways of increasing RELIABILITY

Basically, anything that decreases our estimate of $\sigma_{\bar{Y}}$ makes our estimate more reliable.

Estimation of $\sigma^2_{\bar{Y}}$ and $\sigma_{\bar{Y}}$ (continued)

How do we decrease our estimate of $\sigma_{\bar{Y}}$?

Increase the sample size; if $n$ increases then $\sigma_{\bar{Y}}$ decreases.

Decrease the variance; if our estimate of $\sigma^2$ decreases then $\sigma_{\bar{Y}}$ decreases.

– This can sometimes be done by refining our measurement techniques
– or by finding a more homogeneous population to measure
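The two routes to greater reliability can be compared numerically. This is a small sketch using the variance 1.25 of the example population 0, 1, 2, 3; the function name `sd_of_means` is illustrative.

```python
from math import sqrt

sigma2 = 1.25   # variance of the example parent population 0, 1, 2, 3

def sd_of_means(variance, n):
    """Standard deviation of sample means, sigma / sqrt(n)."""
    return sqrt(variance / n)

# Larger samples shrink the spread of the means: quadrupling n halves it.
for n in (1, 4, 16, 64):
    print(n, round(sd_of_means(sigma2, n), 4))

# Halving the variance also helps, by a factor of sqrt(2).
print(round(sd_of_means(sigma2 / 2, 4), 4))
```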

The Z transformation for a DERIVED POPULATION

We will use the Z transformation for two purposes, individuals and means.

For individuals we will use

– $Z_i = (Y_i - \mu)/\sigma$

For means we will use

– $Z_i = (\bar{Y}_i - \mu_{\bar{Y}})/\sigma_{\bar{Y}}$
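The difference between the two transformations shows up when the same value is standardized both ways. The sketch below assumes the example population (mean 1.5, variance 1.25) and samples of size 2; the function names are illustrative.

```python
from math import sqrt

mu, sigma2 = 1.5, 1.25      # parent population 0, 1, 2, 3 from the example
n = 2

def z_individual(y):
    """Z score of one observation."""
    return (y - mu) / sqrt(sigma2)

def z_mean(ybar):
    """Z score of a sample mean; mu_Ybar = mu, sigma_Ybar = sigma / sqrt(n)."""
    return (ybar - mu) / sqrt(sigma2 / n)

print(round(z_individual(3), 3), round(z_mean(3), 3))   # 1.342 1.897
```

A sample mean of 3 is further out in its distribution than a single observation of 3, because the means are less variable.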

Summary

Most testing of hypotheses will concern tests of a derived population of means.

The mean of the derived population of sample means is $\mu_{\bar{Y}} = \mu$

The variance of the derived population of sample means is $\sigma^2_{\bar{Y}} = \sigma^2/n$

Summary (continued)

The CENTRAL LIMIT THEOREM is an important aspect of hypothesis testing because it states that sample means tend to be more nearly normally distributed than the parent population.

Reliability and accuracy are statistical concepts relating to variability and lack of bias, respectively.