statistical techniques i exst7005 distribution of sample means
TRANSCRIPT
OBJECTIVES
Usually we will be testing hypotheses about means. We will need some additional information about the nature of means of samples in order to do hypothesis tests.
Distribution of Sample Means
Means are the basis for testing hypotheses about , the most common types of hypothesis tests.
Imagine a POPULATION from which we are drawing samples. Population size = NMean = Variance = 2Parent population values are:
– Yi = Y1, Y2, Y3 , ... , YN
Distribution of Sample Means (continued)
The samples of size n form a DERIVED POPULATION
There are Nn possible samples of size n that can be drawn from a population of size N (sampling WITH replacement).
for each sample we calculate a mean Yk = Yi/nwhere k = 1, 2, 3, ... , Nn
Distribution of Sample Means (continued)
The Derived Population of Means of samples of size n
Population size = Nn Mean = Y Variance = Y Derived population values
Yk = Y1, Y2, Y3, ... , YNn
Distribution of Sample Means (continued)
Mean of the DERIVED POPULATION Y = Yk/Nnwhere k = 1, 2, 3, ... , Nn
Variance of the DERIVED POPULATION 2Y = Yk-)2/Nnwhere k = 1, 2, 3, ... , Nnn = the sample sizeN = the population size Population size = Nn
Example of a Derived Population
Parent Population: Yi = 0, 1, 2, 3 = Yi/N = 6/4=1.5 2 = Yi-)2/N = [(0-1.5)2+(1-1.5)2+(2-1.5)2+(3-
1.5)2]/4 = 5/4 = 1.25= 1.12
Original Population
0.00
0.25
0 1 2 3
r.f.
The Derived Populationwhere n = 2 and Nn = 42 = 16
Draw all possible samples of size 2 from the Parent Population (sampling with replacement, so that values will occur more than once), and
calculate Y for each sample (Nn).
The Derived Population
Sample Mean0, 0 0.00, 1 0.50, 2 1.00, 3 1.51, 0 0.51, 1 1.01, 2 1.51, 3 2.02, 0 1.02, 1 1.52, 2 2.02, 3 2.53, 0 1.53, 1 2.03, 2 2.5
The Derived Population
Frequency table of the Derived Population
Means Frequency Relative Freq0.0 1 0.06250.5 2 0.12501.0 3 0.18751.5 4 0.25002.0 3 0.18752.5 2 0.12503.0 1 0.0625
Sum = 16 1
Derived Population
0.00
0.05
0.10
0.15
0.20
0.25
0.30
0.0 0.5 1.0 1.5 2.0 2.5 3.0
Histogram of the Derived Population
Note that the derived population is shaped more like the normal distribution than the original population.
Probability statement from the two distributions
Find P(1Y2) For the original population, P(1Y2)=0.5000
For the derived population, P(1Y2)=0.6250
0.00
0.25
0 1 2 3
r.f.
0.000.050.100.150.200.250.30
0.0 0.5 1.0 1.5 2.0 2.5 3.0
Given a population with mean and variance 2, if we draw all possible samples of size n (with replacement) from the population and calculateY, then the derived population of all possible sample means will haveMean: Y = Variance: Y = 2/nStandard deviation: Y = /n = 2/n
THEOREM on the distribution of sample means
THEOREM on the distribution of sample means (continued)
Notice that the variance and standard deviation of the mean have "n" in the denominator. As a result, the variance of the derived population becomes smaller as the sample size increases regardless of the value of the population variance.
CENTRAL LIMIT THEOREM
AS THE SAMPLE SIZE (n) INCREASES, THE DISTRIBUTION OF SAMPLE MEANS OF ALL POSSIBLE SAMPLES, OF A GIVEN SIZE FROM A GIVEN POPULATION, APPROACHES A NORMAL DISTRIBUTION IF THE VARIANCE IS FINITE. If the base distribution is normal, then the means are normal regardless of n.
CENTRAL LIMIT THEOREM (continued)
Why is this important? (and it is very important!) If we are more interested in the MEANS (and therefore the distribution of the means) than the original distribution, then normality is a more reasonable assumption.
Often, perhaps even USUALLY, we will be MORE INTERESTED in the MEANS of the DISTRIBUTION THAN IN THE DISTRIBUTIONS of the INDIVIDUALS.
NOTES on the distribution of sample means
as n increases, Y and Y decrease. Y for any n Y for any n > 1 as n increases and Y becomes smaller, the distribution of Y's becomes closer to Y. (i.e. we get a better estimate).
Some new terms
Reliability (as a statistical concept) - the closer the estimate of to the actual value of , the more "reliable" the estimate.
Accuracy (as a statistical concept) - this term refers to the lack of bias in the estimate, and not how small the variance is. An estimate may be very accurate, but have a great deal of scatter about the mean.
Estimation of Y and Y In practice we cannot draw all possible samples. Recall that E(S2) = so, S2Y = S2/n is an estimate of Y
where;– E(S2Y) = Y – and;– S2Y is an estimate of the variance of sample
means of size n– S2 is the estimate of the variance of
observations
Estimation of Y and Y (continued)
S2Y = S2/n is called the STANDARD ERROR to distinguish it from the Standard deviation it is also called the Standard Deviation OF THE
MEANS NOTE: that this division is by "n" for both
populations and samples, not by "n-1" as with the calculation of variance for samples.
Estimation of Y and Y (continued)
S2Y is a measure of RELIABILITY of the sample means as an estimate of the true population mean. i.e. the smaller S2Y , the more reliableY as an
estimate of Ways of increasing RELIABILITY
Basically, anything that decreases our estimate of Y makes our estimate more reliable.
Estimation of Y and Y (continued)
How do we decrease our estimate of Y?Increase the sample size; if n increases then Y decreases.
Decrease the variance; if our estimate of decreases then Y decreases.
– This can sometimes be done by; refining our measurement techniques finding a more homogeneous population to measure
The Z transformation for a DERIVED POPULATION
We will use the Z transformation for two purposes, individuals and means. for individuals use
– Zi = (Yi - )/for means we will use
– Zi = (Yi - Y)/Y
Summary
Most testing of hypotheses will concern tests of a derived population of means.
The mean of the derived population of sample means is Y
The Variance of the derived population of sample means is Y
Summary (continued)
The CENTRAL LIMIT THEOREM is an important aspect of hypothesis testing because it states that sample means tend to be more nearly normally distributed than the parent population.
Reliability and accuracy are statistical concepts relating to variability and lack of bias, respectively.