estimating the standard deviation © christine crisp “teach a level maths” statistics 2
TRANSCRIPT
Estimating the Standard Deviation
© Christine Crisp
“Teach A Level Maths”
Statistics 2
The formula for the standard error ( the standard deviation of the sample means ) is
standard error (s.e.) = $\dfrac{\sigma}{\sqrt{n}}$
where $\sigma$ is the population standard deviation and n is the sample size.
However, we may not know the population standard deviation, so we must estimate it from our sample. The obvious quantity to use is the sample standard deviation, but it can be shown that this tends to be too small, so we need to make an adjustment.
When we are estimating in Statistics, we talk about biased and unbiased estimators.
An unbiased estimator is one that on average gives the value we are estimating. So, for example, if the value we wanted to estimate was equal to 2, and all possible samples gave us these values:
1.8, 1.9, 2, 2.1, 2.2
the statistic giving the values would be unbiased: its mean is 2. ( We mustn’t worry that 4 of the 5 values are wrong. That isn’t the point. We must be right on average. )
If we had 1.8, 2, 2, 2, 2.3 our estimator is biased since the average, 2.02, is not correct, even though more individual values are correct.
We want unbiased estimators.
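The two sets of values above can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are invented here, and the lists are the example values from the text.

```python
from statistics import fmean

target = 2  # the value we are trying to estimate

# Values produced by each statistic over "all possible samples" (from the text)
estimates_a = [1.8, 1.9, 2, 2.1, 2.2]
estimates_b = [1.8, 2, 2, 2, 2.3]

mean_a = fmean(estimates_a)
mean_b = fmean(estimates_b)

# Count how many individual values hit the target exactly
exact_hits_a = sum(1 for v in estimates_a if v == target)
exact_hits_b = sum(1 for v in estimates_b if v == target)

print(round(mean_a, 2), round(mean_b, 2))  # prints: 2.0 2.02
print(exact_hits_a, exact_hits_b)          # prints: 1 3
```

The second list has more exact hits but its average misses the target, which is why it counts as biased.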
[Figure: two panels, “Population and 1000 sample means” and “Population, 1st sample and mean of 1st sample”.]
Let’s look at the hens’ eggs again. We’ve met 3 different standard deviations (s.ds.) so we need to be clear which s.d. we are talking about.
The 1st s.d. is the population standard deviation, $\sigma$ ( the one we want to estimate ).
The 2nd s.d. is the standard error, $\dfrac{\sigma}{\sqrt{n}}$, or standard deviation of all sample means. It is also unknown as it depends on the unknown $\sigma$.
We are left with the 3rd s.d., the standard deviation, s, of our one sample ( for example, s could be the standard deviation of the 1st sample ). It can be shown ( although we don’t need to do it ) that this is a biased estimator. However, we can tweak it to change it into an unbiased estimator.
The unbiased estimator of $\sigma$ is $\sqrt{\dfrac{n}{n-1}}\; s$, where s is the standard deviation of a sample.
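The effect of the adjustment can be seen on a tiny example by listing every possible sample. The sketch below is not from the slides: the population `[2, 4, 9]`, the sample size 2 and the choice to sample with replacement are all assumptions made for illustration. It works with variances, using exact fractions, and compares the average of $s^2$ and of $\frac{n}{n-1}s^2$ over all samples with the population variance.

```python
from fractions import Fraction
from itertools import product

def variance_divisor_n(values):
    """Variance with divisor n (this is s^2 when applied to a sample)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / n

population = [2, 4, 9]  # hypothetical population, chosen for illustration
n = 2                   # sample size (assumed)

sigma_squared = variance_divisor_n(population)

# Every possible ordered sample of size n, drawn with replacement
samples = list(product(population, repeat=n))

mean_s2 = sum(variance_divisor_n(s) for s in samples) / len(samples)
mean_S2 = sum(Fraction(n, n - 1) * variance_divisor_n(s) for s in samples) / len(samples)

print(sigma_squared, mean_s2, mean_S2)  # prints: 26/3 13/3 26/3
```

On average $s^2$ comes out too small, while the adjusted value averages exactly the population variance in this example.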
The unbiased estimator of $\sigma^2$
In your formula book you will find the unbiased estimator of $\sigma^2$, the population variance, written as
$S^2 = \dfrac{\sum (X_i - \bar{X})^2}{n-1}$
To use this, replace the capital $X_i$ by $x_i$ ( the sample data ) and $\bar{X}$ by $\bar{x}$ ( the sample mean ).
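A one-line rearrangement connects this formula to the sample variance. As a sketch, assuming $s^2$ denotes the variance of the sample calculated with divisor n ( the usage elsewhere in these slides ):

```latex
\begin{align*}
S^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}
    = \frac{n}{n-1}\cdot\frac{\sum (x_i - \bar{x})^2}{n}
    = \frac{n}{n-1}\, s^2,
\qquad \text{where } s^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{\sum x_i^2}{n} - \bar{x}^2 .
\end{align*}
```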
The formula-book version gives the same result as
$S^2 = \dfrac{n}{n-1}\, s^2$
( For standard deviation, just square root. )
However, you’ll probably be using calculator functions rather than a formula, and your calculator gives the unbiased estimator of the population standard deviation as well as the sample standard deviation. Try the following:
Enter the following data in your calculator:
1, 3, 5
Select the list of statistics and you should find the values
...154701   ( $S = \sqrt{\dfrac{n}{n-1}}\; s$ )
...942800   ( $s$ )
The 1st ( larger ) of these is the unbiased estimator of $\sigma$. The other is the standard deviation of the sample.
Ignore the symbols used by the calculator. They are not the ones we use.
If you are not sure which to use, think about whether you are making estimates from a sample. If so, use the 1st ( larger ) value.
One further point: if n is large, $\dfrac{n}{n-1}$ is very close to 1, so the biased and unbiased estimators are nearly the same.
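To get a feel for how quickly $\frac{n}{n-1}$ approaches 1, the factor can be tabulated for a few sample sizes. This is a small sketch; the helper name `correction_factors` and the sample sizes chosen are illustrative only.

```python
import math

def correction_factors(n):
    """Return (n/(n-1), sqrt(n/(n-1))) for a sample of size n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    variance_factor = n / (n - 1)
    return variance_factor, math.sqrt(variance_factor)

for n in (2, 5, 10, 100, 1000):
    variance_factor, sd_factor = correction_factors(n)
    print(f"n = {n:>4}: n/(n-1) = {variance_factor:.4f}, square root = {sd_factor:.4f}")
```

The first factor multiplies the sample variance and its square root multiplies the sample standard deviation; for large samples both make very little difference.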
SUMMARY
To estimate the variance of a population we use the unbiased estimator, $S^2$, where
$S^2 = \dfrac{n}{n-1}\, s^2$
and $s^2$ is the variance of a sample of size n.
An unbiased estimator is one where the average of all possible values equals the quantity being estimated.
$S^2$ can also be found from
$S^2 = \dfrac{\sum (x - \bar{x})^2}{n-1}$
where x represents each data item and $\bar{x}$ is the sample mean.
Calculators give the values of both s, the sample standard deviation, and S, the unbiased estimator of the population standard deviation, but we must ignore the calculator notation.
e.g. 1. Six people in a factory were selected and asked how long they took to get to work. The results, in minutes, were as follows:
7, 12, 13, 20, 30, 35
Calculate the mean and variance of the times in the sample and hence find unbiased estimates of the mean and variance of the times for all the workers.
Solution:
Although I’m showing the formulae in the solution, I’m using the calculator functions to find each answer.
Sample mean, $\bar{x} = \dfrac{\sum x}{n}$ = 19·5
This is the unbiased estimate of the population mean, $\mu$.
Sample variance, $s^2 = \dfrac{\sum x^2}{n} - \bar{x}^2$
The calculator gives the sample s.d. as 10·0457 . . . so we need to square to find the variance. I’ve written down 3 s.f. but will use the more exact calculator value.
$s^2$ = (10·0457 . . .)² = 101 (3 s.f.)
The unbiased estimate of the variance, $\sigma^2$, is
$S^2 = \dfrac{n}{n-1}\, s^2$ = (6/5) × (10·0457 . . .)² = 121 (3 s.f.)
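The figures in e.g. 1 can be reproduced with Python's standard `statistics` module instead of a calculator. This is a sketch with variable names chosen here. As with calculators, the library's naming differs from ours: `pvariance` uses divisor n ( our $s^2$ ), while `variance` uses divisor n − 1 ( our $S^2$ ).

```python
import math
import statistics

times = [7, 12, 13, 20, 30, 35]  # minutes, from e.g. 1
n = len(times)

sample_mean = statistics.fmean(times)            # x-bar
sample_variance = statistics.pvariance(times)    # s^2, divisor n
unbiased_variance = statistics.variance(times)   # S^2, divisor n - 1

# The same unbiased estimate via the adjustment S^2 = n/(n-1) * s^2
via_adjustment = n / (n - 1) * sample_variance

print(f"mean = {sample_mean}")                    # mean = 19.5
print(f"s    = {math.sqrt(sample_variance):.4f}") # s    = 10.0457
print(f"s^2  = {sample_variance:.4f}")            # s^2  = 100.9167
print(f"S^2  = {unbiased_variance:.4f}")          # S^2  = 121.1000
```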
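e.g. 2. The following sets of data are from samples, each from a different Normal population. Find unbiased estimates of the mean, $\mu$, and standard deviation, $\sigma$, of each of the populations.
(a) 17, 24, 25, 31, 42
(b) $\sum x = 422$, $\sum x^2 = 18002$, $n = 10$
(c) $\sum x = 330$, $\sum (x - \bar{x})^2 = 828$, $n = 5$
Solutions:
(a) Using calculator functions, for the sample: $\bar{x}$ = 27·8, s = 8·38
Unbiased estimates of population parameters are: 27·8 for $\mu$ and 9·36 for $\sigma$
(b) $\sum x = 422$, $\sum x^2 = 18002$, $n = 10$
Sample mean, $\bar{x} = \dfrac{\sum x}{n}$ = 42·2
Unbiased estimate of mean, $\mu$, is 42·2
Sample variance, $s^2 = \dfrac{\sum x^2}{n} - \bar{x}^2$ = 18002/10 − (42·2)² = 19·36
Estimate of population variance: $S^2 = \dfrac{n}{n-1}\, s^2$ = (10 × 19·36)/9 = 21·5 (3 s.f.)
Unbiased estimate of population standard deviation, $\sigma$, is S = √(21·5 . . .) = 4·64 (3 s.f.)
(c) $\sum x = 330$, $\sum (x - \bar{x})^2 = 828$, $n = 5$
Sample mean, $\bar{x} = \dfrac{\sum x}{n}$ = 66
Unbiased estimate of mean, $\mu$, is 66
$S^2 = \dfrac{\sum (x - \bar{x})^2}{n-1}$ = 828/4 = 207
Unbiased estimate of $\sigma$ is S where S = √207 = 14·4 (3 s.f.)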
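Parts (b) and (c) start from summary totals rather than raw data, so each uses a different route to $S^2$. The two routes can be sketched as small functions; the function and parameter names below are made up for illustration, and the inputs are the totals given in e.g. 2.

```python
import math

def estimates_from_sums(n, sum_x, sum_x2):
    """Given n, sum of x and sum of x^2: return (mean, S^2, S)."""
    mean = sum_x / n
    s2 = sum_x2 / n - mean ** 2   # sample variance, divisor n
    S2 = n / (n - 1) * s2         # unbiased estimate of population variance
    return mean, S2, math.sqrt(S2)

def estimates_from_deviations(n, sum_x, sum_sq_dev):
    """Given n, sum of x and sum of (x - mean)^2: return (mean, S^2, S)."""
    mean = sum_x / n
    S2 = sum_sq_dev / (n - 1)     # formula-book version
    return mean, S2, math.sqrt(S2)

part_b = estimates_from_sums(10, 422, 18002)
part_c = estimates_from_deviations(5, 330, 828)

print([round(v, 2) for v in part_b])
print([round(v, 2) for v in part_c])
```

Rounded to 3 s.f., the results agree with the worked solutions: 42·2 and 4·64 for (b), 66 and 14·4 for (c).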
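Exercise
The following sets of data are from samples, each from a different Normal population. Find unbiased estimates of the mean, $\mu$, and standard deviation, $\sigma$, of each of the populations.
(a) 5·2, 7·9, 8·1, 9·3
(b) $\sum x = 678$, $\sum x^2 = 52302$, $n = 10$
(c) $\sum x = 282$, $\sum (x - \bar{x})^2 = 1046$, $n = 5$
Solutions:
(a) Sample: $\bar{x}$ = 7·625, s = 1·50
Unbiased population estimates: 7·63 (3 s.f.) for $\mu$ and 1·73 for $\sigma$
(b) $\sum x = 678$, $\sum x^2 = 52302$, $n = 10$
Sample mean, $\bar{x} = \dfrac{\sum x}{n}$ = 67·8
Unbiased estimate of mean, $\mu$, is 67·8
Sample variance, $s^2 = \dfrac{\sum x^2}{n} - \bar{x}^2$ = 52302/10 − (67·8)² = 633·36
$S^2 = \dfrac{n}{n-1}\, s^2$ = 704 (3 s.f.)
Unbiased estimate of $\sigma$ is S = √(704 . . .) = 26·5 (3 s.f.)
(c) $\sum x = 282$, $\sum (x - \bar{x})^2 = 1046$, $n = 5$
Sample mean, $\bar{x} = \dfrac{\sum x}{n}$ = 56·4
Unbiased estimate of mean, $\mu$, is 56·4
$S^2 = \dfrac{\sum (x - \bar{x})^2}{n-1}$ = 261·5
Unbiased estimate of $\sigma$ is S where S = √261·5 = 16·2 (3 s.f.)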