sampling theory determining the distribution of sample statistics
TRANSCRIPT
![Page 1: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/1.jpg)
Sampling Theory
Determining the distribution of Sample statistics
![Page 2: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/2.jpg)
Sampling Theorysampling distributions
• It is important that we model this and use it to assess accuracy of decisions made from samples.
• A sample is a subset of the population.
• In many instances it is too costly to collect data from the entire population.
Note:It is important to recognize the dissimilarity (variability) we should expect to see in various samples from the same population.
![Page 3: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/3.jpg)
Statistics and Parameters
A statistic is a numerical value computed from a sample. Its value may differ for different samples. e.g. sample mean , sample standard deviation s, and sample proportion .
A parameter is a numerical value associated with a population. Considered fixed and unchanging. e.g. population mean , population standard deviation , and population proportion p.
xp̂
![Page 4: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/4.jpg)
Observations on a measurement X
x1, x2, x3, … , xn
taken on individuals (cases) selected at random from a population are random variables prior to their observation.The observations are numerical quantities whose values are determined by the outcome of a random experiment (the choosing of a random sample from the population).
![Page 5: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/5.jpg)
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0 10 20 30 40 50 60
The probability distribution of the observations x1, x2, x3, … , xn
is sometimes called the population.
This distribution is the smooth histogram of the the variable X for the entire population
![Page 6: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/6.jpg)
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0 10 20 30 40 50 60
the population is unobserved (unless all observations in the population have been observed)
![Page 7: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/7.jpg)
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0 10 20 30 40 50 60
A histogram computed from the observations x1, x2, x3, … , xn
Gives an estimate of the population.
![Page 8: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/8.jpg)
A statistic computed from the observations
x1, x2, x3, … , xn
is also a random variable prior to observation of the sample.
A statistic is also a numerical quantity whose value is determined by the outcome of a random experiment (the choosing of a random sample from the population).
![Page 9: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/9.jpg)
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0 10 20 30 40 50 60
The probability distribution of statistic computed from the observations
x1, x2, x3, … , xn
is sometimes called its sampling distribution.This distribution describes the random behaviour of the statistic
![Page 10: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/10.jpg)
It is important to determine the sampling distribution of a statistic.
It will describe its sampling behaviour.
The sampling distribution will be used the assess the accuracy of the statistic when used for the purpose of estimation.
Sampling theory is the area of Mathematical Statistics that is interested in determining the sampling distribution of various statistics
![Page 11: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/11.jpg)
Many statistics have a normal distribution.
This quite often is true if the population is Normal
It is also sometimes true if the sample size is reasonably large. (reason – the Central limit theorem, to be mentioned later)
![Page 12: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/12.jpg)
Combining Random Variables
![Page 13: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/13.jpg)
Combining Random Variables
Quite often we have two or more random variables
X, Y, Z etc
We combine these random variables using a mathematical expression.
Important question
What is the distribution of the new random variable?
![Page 14: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/14.jpg)
Example 1: Suppose that one performs two independent tasks (A and B):
X = time to perform task A (normal with mean 25 minutes and standard deviation of 3 minutes.)
Y = time to perform task B (normal with mean 15 minutes and std dev 2 minutes.)
Let T = X + Y = total time to perform the two tasks
What is the distribution of T?
What is the probability that the two tasks take more than 45 minutes to perform?
![Page 15: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/15.jpg)
Example 2:
Suppose that a student will take three tests in the next three days
1. Mathematics (X is the score he will receive on this test.)
2. English Literature (Y is the score he will receive on this test.)
3. Social Studies (Z is the score he will receive on this test.)
![Page 16: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/16.jpg)
Assume that
1. X (Mathematics) has a Normal distribution with mean = 90 and standard deviation = 3.
2. Y (English Literature) has a Normal distribution with mean = 60 and standard deviation = 10.
3. Z (Social Studies) has a Normal distribution with mean = 70 and standard deviation = 7.
![Page 17: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/17.jpg)
Graphs
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0 20 40 60 80 100
X (Mathematics) = 90, = 3.
Y (English Literature) = 60, = 10.
Z (Social Studies) = 70 , = 7.
![Page 18: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/18.jpg)
Suppose that after the tests have been written an overall score, S, will be computed as follows:
S (Overall score) = 0.50 X (Mathematics) + 0.30 Y (English Literature) + 0.20 Z (Social Studies) + 10 (Bonus marks)
What is the distribution of the overall score, S?
![Page 19: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/19.jpg)
Sums, Differences, Linear Combinations of R.V.’s
A linear combination of random variables, X, Y, . . . is a combination of the form:
L = aX + bY + … + c (a constant)
where a, b, etc. are numbers – positive or negative.
Most common:Sum = X + Y Difference = X – Y
Others
Averages = 1/3 X + 1/3 Y + 1/3 Z
Weighted averages = 0.40 X + 0.25 Y + 0.35 Z
![Page 20: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/20.jpg)
Sums, Differences, Linear Combinations of R.V.’s
A linear combination of random variables, X, Y, . . . is a combination of the form:
L = aX + bY + … + c (a constant)
where a, b, etc. are numbers – positive or negative.
Most common:Sum = X + Y Difference = X – Y
Others
Averages = 1/3 X + 1/3 Y + 1/3 Z
Weighted averages = 0.40 X + 0.25 Y + 0.35 Z
![Page 21: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/21.jpg)
Means of Linear Combinations
The mean of L is:
Mean(L) = a Mean(X) + b Mean(Y) + … + c
L = a X + b Y + … + c
Most common:
Mean( X + Y) = Mean(X) + Mean(Y)
Mean(X – Y) = Mean(X) – Mean(Y)
If L = aX + bY + … + c
![Page 22: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/22.jpg)
Variances of Linear Combinations
If X, Y, . . . are independent random variables and
L = aX + bY + … + c then
Variance(L) = a2 Variance(X) + b2 Variance(Y) + …
Most common:
Variance( X + Y) = Variance(X) + Variance(Y)
Variance(X – Y) = Variance(X) + Variance(Y)
2 2 2 2 2L X Ya b
The constant c has no effect on the variance
![Page 23: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/23.jpg)
Example: Suppose that one performs two independent tasks (A and B):
X = time to perform task A (normal with mean 25 minutes and standard deviation of 3 minutes.)
Y = time to perform task B (normal with mean 15 minutes and std dev 2 minutes.)
X and Y independent so T = X + Y = total time is normal with
6.323 deviation standard
401525 mean
22
0823.39.16.3
404545
ZPZPTP
What is the probability that the two tasks take more than 45 minutes to perform?
![Page 24: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/24.jpg)
Example 2: A student will take three tests in the next three days
1. X (Mathematics) has a Normal distribution with mean = 90 and standard deviation = 3.
2. Y (English Literature) has a Normal distribution with mean = 60 and standard deviation = 10.
3. Z (Social Studies) has a Normal distribution with mean = 70 and standard deviation = 7.
Overall score, S = 0.50 X (Mathematics) + 0.30 Y (English Literature) + 0.20 Z (Social Studies) + 10 (Bonus marks)
![Page 25: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/25.jpg)
Graphs
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0 20 40 60 80 100
X (Mathematics) = 90, = 3.
Y (English Literature) = 60, = 10.
Z (Social Studies) = 70 , = 7.
![Page 26: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/26.jpg)
Determine the distribution of
S = 0.50 X + 0.30 Y + 0.20 Z + 10
S has a normal distribution with Mean S = 0.50 X + 0.30 Y + 0.20 Z + 10
= 0.50(90) + 0.30(60) + 0.20(70) + 10
= 45 + 18 + 14 +10 = 87
2 2 22 2 20.5 0.3 0.2s X Y Z
2 2 22 2 20.5 3 0.3 10 0.2 7
2.25 9 1.96 13.21 3.635
![Page 27: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/27.jpg)
Graph
0
0.02
0.04
0.06
0.08
0.1
0.12
0 20 40 60 80 100
distribution of
S = 0.50 X + 0.30 Y + 0.20 Z + 10
![Page 28: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/28.jpg)
Sampling Theory
Determining the distribution of Sample statistics
![Page 29: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/29.jpg)
Combining Random Variables
![Page 30: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/30.jpg)
Sums, Differences, Linear Combinations of R.V.’s
A linear combination of random variables, X, Y, . . . is a combination of the form:
L = aX + bY + … + c (a constant)
where a, b, etc. are numbers – positive or negative.
Most common:Sum = X + Y Difference = X – Y
Others
Averages = 1/3 X + 1/3 Y + 1/3 Z
Weighted averages = 0.40 X + 0.25 Y + 0.35 Z
![Page 31: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/31.jpg)
Means of Linear Combinations
The mean of L is:
Mean(L) = a Mean(X) + b Mean(Y) + … + c
L = a X + b Y + … + c
Most common:
Mean( X + Y) = Mean(X) + Mean(Y)
Mean(X – Y) = Mean(X) – Mean(Y)
If L = aX + bY + … + c
![Page 32: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/32.jpg)
Variances of Linear Combinations
If X, Y, . . . are independent random variables and
L = aX + bY + … + c then
Variance(L) = a2 Variance(X) + b2 Variance(Y) + …
Most common:
Variance( X + Y) = Variance(X) + Variance(Y)
Variance(X – Y) = Variance(X) + Variance(Y)
2 2 2 2 2L X Ya b
The constant c has no effect on the variance
![Page 33: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/33.jpg)
Normality of Linear Combinations
If X, Y, . . . are independent Normal random variables and
L = aX + bY + … + c
then L is Normal with
mean
and standard deviation
cba YXL
2222XXL ba
2
![Page 34: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/34.jpg)
In particular:
X + Y is normal with
X – Y is normal with
22 deviation standard
mean
YX
YX
22 deviation standard
mean
YX
YX
![Page 35: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/35.jpg)
The distribution of the sample mean
![Page 36: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/36.jpg)
The distribution of averages (the mean)
• Let x1, x2, … , xn denote n independent random variables each coming from the same Normal distribution with mean and standard deviation .
• Let
11 2
1 1 1
n
ii
n
xx x x x
n n n n
What is the distribution of ?x
![Page 37: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/37.jpg)
The distribution of averages (the mean)
Because the mean is a “linear combination”
1 2
1 1 1nx x x xn n n
and
1 1 1 1n
n n n n
1 2
2 2 22 2 2 21 1 1
nx x x xn n n
2 2 2 2 22 2 2
2
1 1 1n
n n n n n
![Page 38: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/38.jpg)
Thus if x1, x2, … , xn denote n independent random variables each coming from the same Normal distribution with mean and standard deviation .
Then
11 2
1 1 1
n
ii
n
xx x x x
n n n n
has Normal distribution with
mean andx 2
2variance x n
standard deviation xn
![Page 39: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/39.jpg)
Graphs
0
0.02
0.04
0.06
0.08
150 170 190 210 230 250 270 290 310
The probability distribution of individual observations
The probability distribution of the mean
n
![Page 40: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/40.jpg)
Summary
• The distribution of the sample mean is Normal.• The distribution of the sample mean has exactly the
same mean as the population (). • The distribution of the sample mean has a smaller
standard deviation then the population.
• Averaging tends to decrease variability• An Excel file illustrating the distribution of the
sample mean
compared to n
x
x
![Page 41: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/41.jpg)
Example
• Suppose we are measuring the cholesterol level of men age 60-65
• This measurement has a Normal distribution with mean = 220 and standard deviation = 17.
• A sample of n = 10 males age 60-65 are selected and the cholesterol level is measured for those 10 males.
• x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, are those 10 measurements
Find the probability distribution of
Compute the probability that is between 215 and 225
?xx
![Page 42: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/42.jpg)
Solution
Find the probability distribution of x
Normal with 220x 17
and 5.37610
xn
215 225P x
215 220 220 225 220
5.376 5.376 5.376
xP
0.930 0.930 0.648P z
![Page 43: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/43.jpg)
The Central Limit Theorem
The Central Limit Theorem (C.L.T.) states that if n is sufficiently large, the sample means of random samples from any population with mean and finite standard deviation are approximately normally distributed with mean and standard deviation .
Technical Note: The mean and standard deviation given in the CLT hold for any sample size; it is only the “approximately normal” shape that requires n to be sufficiently large.
n
![Page 44: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/44.jpg)
Graphical Illustration of the Central Limit Theorem
Original Population
x10 3020
10 x
Distribution of x: n = 10
x
Distribution of x:n = 30
10 20
x
Distribution of x: n = 2
10 3020
![Page 45: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/45.jpg)
Implications of the Central Limit Theorem
• The Conclusion that the sampling distribution of the sample mean is Normal, will to true if the sample size is large (>30). (even though the population may be non- normal).
• When the population can be assumed to be normal, the sampling distribution of the sample mean is Normal, will to true for any sample size.
• Knowing the sampling distribution of the sample mean allows to answer probability questions related to the sample mean.
![Page 46: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/46.jpg)
Example
Example: Consider a normal population with = 50 and = 15.
Suppose a sample of size 9 is selected at random. Find:
P x( )45 60
P x( . )47 5
1)
2)
Solutions: Since the original population is normal, the distribution of the sample mean is also (exactly) normal
1) x 50
x n 15 9 15 3 52)
![Page 47: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/47.jpg)
5045 60 x0 1.00 2.00 z
Example
P x P
P z
( )
(
. . .
45 6045 50
560 50
5
1.00 2.00)
0 8413 0 0228 08185
zz = ;x - n
![Page 48: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/48.jpg)
5047.5 x0-0.50 z
0 3085.
Example
P x Px
P z
( . ).
( . )
. . .
47 550
547 5 50
5
5
05000 01915 0 3085
z = ;x - n
![Page 49: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/49.jpg)
ExampleExample: A recent report stated that the average cost of a hotel room in Toronto is $109/day. Suppose this figure is correct and that the standard deviation is known to be $20.
1) Find the probability that a sample of 50 hotel rooms selected at random would show a mean cost of $105 or less per day.
2) Suppose the actual sample mean cost for the sample of 50 hotel rooms is $120/day. Is there any evidence to refute the claim of $109 presented in the report?
Solution:
• The shape of the original distribution is unknown, but the sample size, n, is large. The CLT applies.
• The distribution of is approximately normal
x
x n 109 20 50 2 83 .x
![Page 50: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/50.jpg)
Example
xP P
P z
( ).
( . )
105105 109
2 83
141
.0 0793
z = ;x - n
z
109105 x0 141. z
0 0793.
1)
![Page 51: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/51.jpg)
• To investigate the claim, we need to examine how likely an observation is the sample mean of $120
• There is evidence (the sample) to suggest the claim of = $109 is likely wrong
• Since the probability is so small, this suggests the observation of $120 is very rare (if the mean cost is really $109)
• Consider how far out in the tail of the distribution of the sample mean is $120
P x P
P z
( ).
( . )
120 120 109
2 83389
1.0000 - 0.9999 = 0.0001
z = ;x - n
z
2)
![Page 52: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/52.jpg)
Summary
• The distribution of is (exactly) normal when the original population is normal
• The CLT says: the distribution of is approximately normal regardless of the shape of the original distribution, when the sample size is large enough!
• The mean of the sampling distribution of is equal to the mean of the original population:
xx
x
x
• The standard deviation of the sampling distribution of (also called the standard error of the mean) is equal to the standard deviation of the original population divided by the square root of the sample size:
x
xn
![Page 53: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/53.jpg)
Sampling Distribution of a Sample Proportion
![Page 54: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/54.jpg)
Sampling Distribution for Sample Proportions
Let p = population proportion of interest or binomial probability of success.
Let
trialsbimomial of no.
succeses of no.ˆ
n
Xp
p̂ ofon distributi sampling Then the
pp ˆmean n
ppp
)1(ˆ
is approximately a normal distribution with
= sample proportion or proportion of successes.
![Page 55: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/55.jpg)
LogicRecall X = the number of successes in n trials has a Binomial distribution with parameters n and p (the probability of success). Also X has approximately a Normal distribution with
mean = np and standard deviation
1ˆThen the sampling distribution of
Xp X
n n
ˆ
1 1mean p np p
n n
ˆ
(1 )(1 )
1 1and p
p pnp p
nn n
is a normal distribution with
(1 )npq np p
![Page 56: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/56.jpg)
0
5
10
15
20
25
30
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
c
p̂ ofon distributi Sampling
p̂ p
ˆ
1p
p p
n
![Page 57: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/57.jpg)
Example Sample Proportion Favoring a Candidate
Suppose 20% all voters favor Candidate A. Pollsters take a sample of n = 600 voters. Then the sample proportion who favor A will have approximately a normal distribution with
20.0mean ˆ pp
01633.0600
)80.0(20.0)1(ˆ
n
ppp
![Page 58: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/58.jpg)
0
5
10
15
20
25
30
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
c
p̂ ofon distributi Sampling
![Page 59: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/59.jpg)
Determine the probability that the sample proportion will be between 0.18 and 0.22
ˆi.e. the probability, 0.18 0.22 P p
Using the Sampling distribution:Suppose 20% all voters favor Candidate A. Pollsters take a sample of n = 600 voters.
![Page 60: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/60.jpg)
01633.0600
)80.0(20.0)1(ˆ
n
ppp
Solution:
20.0 Recall ˆ pp
ˆ0.18 0.20 0.20 0.22 0.20ˆ0.18 0.22
0.1633 0.1633 0.1633
pP p P
7794.01103.08897.0 225.1225.1 zP
01633.0 01633.0 01633.0
![Page 61: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/61.jpg)
Sampling Theory - SummaryThe distribution of sample statistics
![Page 62: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/62.jpg)
Distribution for Sample Mean
the sampling distribution of x
mean and standard deviation x xn
is a normal distribution with
If data is collected from a Normal distribution with mean and standard deviation then:
![Page 63: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/63.jpg)
The Central Limit Thereom
the sampling distribution of x
mean and standard deviation x xn
is a approximately normal (for n > 30) with
If data is collected from a distribution (possibly non Normal) with mean and standard deviation then:
![Page 64: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/64.jpg)
Distribution for Sample Proportions
Let p = population proportion of interest or binomial probability of success.
Let
trialsbimomial of no.
succeses of no.ˆ
n
Xp
p̂ ofon distributi sampling Then the
pp ˆmean n
ppp
)1(ˆ
is approximately a normal distribution with
= sample proportion or proportion of successes.
![Page 65: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/65.jpg)
Sampling distribution of a differences
![Page 66: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/66.jpg)
Sampling distribution of a difference in two Sample means
![Page 67: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/67.jpg)
If X, Yare independent normal random variables, then :
X – Y is normal with
Recall
22 deviation standard
mean
YX
YX
![Page 68: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/68.jpg)
Comparing Means
Situation• We have two normal populations (1 and 2)• Let 1 and 1 denote the mean and standard deviation of
population 1.• Let 2 and 2 denote the mean and standard deviation of
population 2.• Let x1, x2, x3 , … , xn denote a sample from a normal
population 1.• Let y1, y2, y3 , … , ym denote a sample from a normal
population 2.• Objective is to compare the two population means
![Page 69: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/69.jpg)
We know that:
is Normal with meanD x y
11
is Normal with mean and x xx
n
22
and
is Normal with mean and y yy
m
Thus
1 2 -x y x y
2 22 2 1 2= x y x y n m
![Page 70: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/70.jpg)
Example
Consider measuring Heart rate two minutes after a twenty minute exercise program.
There are two groups of individuals
1. Those who performed exercise program A (considered to be heavy).
2. Those who performed exercise program B (considered to be light).
The average Heart rate for those who performed exercise program A was 1 = 110 with standard deviation, 1 = 7.3, while the average Heart rate for those who performed exercise program B was 2 = 95 with standard deviation, 2 = 4.5.
![Page 71: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/71.jpg)
-0.01
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
80 90 100 110 120 130
Heart rate for program B
Heart rate for program A
![Page 72: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/72.jpg)
Situation• Suppose we observe the heart rate of n = 15 subjects on
program A.• Let x1, x2, x3 , … , x15 denote these observations.• We also observe the heart rate of m = 20 subjects on
program B.• Let y1, y2, y3 , … , y20 denote these observations.• What is the probability that the sample mean heart rate for
Program A is at least 8 units higher than the sample mean heart rate for Program B?
![Page 73: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/73.jpg)
We know that:
is Normal with meanD x y
7.3 is Normal with mean 110 and
15x xx
and
4.5 is Normal with mean 95 and
20y yy
and
110 -95 15x y x y
2 2 2 22 2 1 2 7.3 4.5
= 2.136615 20x y x y n m
![Page 74: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/74.jpg)
-0.05
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
80 90 100 110 120 130
distn of sample mean for program B distn of
sample mean program A
![Page 75: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/75.jpg)
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0.2
0 5 10 15 20 25 30
distn of difference in sample means, D
![Page 76: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/76.jpg)
• What is the probability that the sample mean heart rate for Program A is at least 8 units higher than the sample mean heart rate for Program B?
Solution
want 8 8 8P x y P x y P D
15 8 15 3.28
2.1366 2.1366
DP P z
1 0.0005 0.9995
![Page 77: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/77.jpg)
Sampling distribution of a difference in two Sample proportions
![Page 78: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/78.jpg)
Comparing ProportionsSituation• Suppose we have two Success-Failure experiments• Let p1 = the probability of success for experiment 1.• Let p2 = the probability of success for experiment 2.• Suppose that experiment 1 is repeated n1 times and
experiment 2 is repeated n2 • Let x1 = the no. of successes in the n1 repititions of
experiment 1, x2 = the no. of successes in the n2 repititions of experiment 2.
1 21 2
1 2
ˆ ˆ= and =x x
p pn n
1 21 2
1 2
ˆ ˆWhat is the distribution of = ?x x
D p pn n
![Page 79: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/79.jpg)
We know that:
1 2ˆ ˆ is Normal with meanD p p
1
1ˆ1 1
1
ˆ = is Normal with mean p
xp p
n
Thus
1 2 1 2ˆ ˆ ˆ ˆ 1 2 -p p p p p p
1 2 1 2
1 1 2 22 2ˆ ˆ ˆ ˆ
1 2
1 1= p p p p
p p p p
n n
1
1 1ˆ
1
1- and p
p p
n
2
2ˆ2 2
2
ˆAlso = is Normal with mean p
xp p
n
2
2 2ˆ
2
1- and p
p p
n
![Page 80: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/80.jpg)
Example
The Globe and Mail carried out a survey to investigate the “State of the Baby Boomers”. (June 2006)
Two populations in the study
1. Baby Boomers (age 40 – 59) (n1 = 664)
2. Generation X (age 30 – 39) (n2 = 342)
![Page 81: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/81.jpg)
One of questions“Are you close to your parents? – Yes or No”
Suppose that the proportions in the two populations were:• Baby Boomers – 40% yes (p1 = 0.40)• Generation X – 20% yes (p2 = 0.20)
What is the probability that this would be observed in the samples to a certain degree?
What is P[p1 – p2 ≥ 0.15]?^ ^
![Page 82: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/82.jpg)
Solution:
1
1ˆ1 1
1
ˆ = is Normal with mean 0.40p
xp p
n
1
1 1ˆ
1
1- and p
p p
n
2
2ˆ2 2
2
ˆAlso = is Normal with mean 0.20p
xp p
n
2
2 2ˆ
2
1- and p
p p
n
0.40 1- 0.40 0.019012
664
0.20 1- 0.20 0.02163
342
![Page 83: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/83.jpg)
0
5
10
15
20
25
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
distn of sample proportion for Gen X
distn of sample proportion for Baby Boomers
![Page 84: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/84.jpg)
1 2ˆ ˆ is Normal with meanD p p Now
1 2 1 2ˆ ˆ ˆ ˆ 1 2 - 0.4 0.2 0.2D p p p p p p
1 2 1 2
1 1 2 22 2ˆ ˆ ˆ ˆ
1 2
1 1= D p p p p
p p p p
n n
0.4 1 0.4 0.2 1 0.2
664 3420.028797
![Page 85: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/85.jpg)
0
2
4
6
8
10
12
14
16
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
1 2ˆ ˆD p p Distribution of
![Page 86: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/86.jpg)
1 2ˆ ˆ 0.15 0.15P p p P D
Now
0.15 0.2
0.028797
1.74 1 0.0409 .9591
D
D
DP
P z
![Page 87: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/87.jpg)
Sampling distributions
Summary
![Page 88: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/88.jpg)
Distribution for Sample Mean
the sampling distribution of x
mean and standard deviation x xn
is a normal distribution with
If data is collected from a Normal distribution with mean and standard deviation then:
![Page 89: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/89.jpg)
The Central Limit Thereom
the sampling distribution of x
mean and standard deviation x xn
is a approximately normal (for n > 30) with
If data is collected from a distribution (possibly non Normal) with mean and standard deviation then:
![Page 90: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/90.jpg)
Distribution for Sample Proportions
Let p = population proportion of interest or binomial probability of success.
Let
trialsbimomial of no.
succeses of no.ˆ
n
Xp
p̂ ofon distributi sampling Then the
pp ˆmean n
ppp
)1(ˆ
is approximately a normal distribution with
= sample proportion or proportion of successes.
![Page 91: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/91.jpg)
Distribution of a difference in two sample Means
is Normal with meanD x y
1 2 -x y x y
2 22 2 1 2= x y x y n m
![Page 92: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/92.jpg)
Distribution of a difference in two sample proportions
1 2ˆ ˆ is Normal with meanD p p
1 2 1 2ˆ ˆ ˆ ˆ 1 2 -p p p p p p
1 2 1 2
1 1 2 22 2ˆ ˆ ˆ ˆ
1 2
1 1= p p p p
p p p p
n n
![Page 93: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/93.jpg)
The Chi-square (2) distribution
![Page 94: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/94.jpg)
The Chi-squared distribution with
degrees of freedom
Comment: If z1, z2, ..., z are independent
random variables each having a standard normal distribution then
U =
has a chi-squared distribution with degrees of freedom.
222
21 zzz
![Page 95: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/95.jpg)
0
0.06
0.12
0.18
0 10 20
The Chi-squared distribution with
degrees of freedom
- degrees of freedom
![Page 96: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/96.jpg)
2 4 6 8 10 12 14
0.1
0.2
0.3
0.4
0.52 d.f.
3 d.f.
4 d.f.
![Page 97: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/97.jpg)
Statistics that have the Chi-squared distribution:
2
2 2
1 1 1 1
1. c r c r
ij ij
ijj i j iij
x Er
E
The statistic used to detect independence between two categorical variables
d.f. = (r – 1)(c – 1)
![Page 98: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/98.jpg)
Let x1, x2, … , xn denote a sample from the normal distribution with mean and standard deviation , then
2
12
2.
r
ii
x xU
has a chi-square distribution with d.f. = n – 1.
2
2
( 1)n s
![Page 99: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/99.jpg)
Suppose that x1, x2, … , x10 is a sample of size n = 10 from the normal distribution with mean =100 and standard deviation =15.
2
1
1
r
ii
x xs
n
Suppose that
Example
is the sample standard deviation.
Find 10 20 .P s
![Page 100: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/100.jpg)
2
12
r
ii
x xU
has a chi-square distribution with
d.f. = n – 1 = 9
2
2
( 1)n s
Note
210 20 100 400P s P s
2
2
(9)
(15)
s
2
2 2 2
9 100 9 4009
15 15 15
sP
4 16P U
![Page 101: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/101.jpg)
chi-square distribution with d.f. = n – 1 = 9
4 16P U
We do not have tables to compute this area
![Page 102: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/102.jpg)
4 16P U
The excel function CHIDIST(x,df) computes P x U
x
P x U
![Page 103: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/103.jpg)
4 16 CHIDIST(4,9)-CHIDIST(16,9)P U
= 0.91141 - 0.06688 = 0.84453
![Page 104: Sampling Theory Determining the distribution of Sample statistics](https://reader035.vdocuments.site/reader035/viewer/2022062308/56649d8c5503460f94a74999/html5/thumbnails/104.jpg)
Statistical Inference