8_two samples test- updated- 1 july 14.pdf
TRANSCRIPT
-
Two sample Test
A. Ramesh
Department of Management Studies
Indian Institute of Technology Roorkee
-
Two-Sample Tests Overview
The means of two independent populations
The means of two related populations
The proportions of two independent populations
The variances of two independent populations
-
Two-Sample Tests Overview
Two-sample tests, with an example of each:
Independent population means: Group 1 vs. Group 2
Means, related populations: Same group before vs. after treatment
Independent population proportions: Proportion 1 vs. Proportion 2
Independent population variances: Variance 1 vs. Variance 2
-
Two-Sample Tests
Independent population means:
$\sigma_1$ and $\sigma_2$ known
$\sigma_1$ and $\sigma_2$ unknown
Goal: Test a hypothesis or form a confidence interval for the difference between two population means, $\mu_1 - \mu_2$
The point estimate for this difference is the difference between the sample means: $\bar{X}_1 - \bar{X}_2$
-
Sampling Distribution of the Difference Between
Two Sample Means
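[Figure: a sample of size $n_1$ is drawn from Population 1 and a sample of size $n_2$ from Population 2. Each sample mean is computed as $\bar{x} = \sum x / n$, giving $\bar{X}_1$ and $\bar{X}_2$; the statistic of interest is $\bar{X}_1 - \bar{X}_2$.]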
-
Sampling Distribution of the Difference
between Two Sample Means
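The statistic $\bar{X}_1 - \bar{X}_2$ has mean and standard deviation:
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$$
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$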
-
Z Formula for the Difference in Two Sample
Means
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Conditions: $n_1 \ge 30$, $n_2 \ge 30$, and independent samples
-
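Confidence Interval to Estimate $\mu_1 - \mu_2$ When
$n_1$ and $n_2$ are large and $\sigma_1$, $\sigma_2$ are unknown
$$(\bar{X}_1 - \bar{X}_2) - Z\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{X}_1 - \bar{X}_2) + Z\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
$$\text{Prob}\left[(\bar{X}_1 - \bar{X}_2) - Z\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{X}_1 - \bar{X}_2) + Z\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}\right] = 1 - \alpha$$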
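The interval above can be sketched in a few lines of Python. This is only an illustration: the function name `mean_diff_ci` is hypothetical, and the inputs are the summary figures from the wage example later in these slides (sample means 70.700 and 62.187, variances 264.164 and 166.411, sample sizes 32 and 34).

```python
import math
from statistics import NormalDist


def mean_diff_ci(x1, var1, n1, x2, var2, n2, confidence=0.95):
    """Large-sample confidence interval for mu1 - mu2 (hypothetical helper)."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of the difference between the two sample means.
    se = math.sqrt(var1 / n1 + var2 / n2)
    diff = x1 - x2
    return diff - z * se, diff + z * se


lower, upper = mean_diff_ci(70.700, 264.164, 32, 62.187, 166.411, 34)
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

With these inputs the interval does not contain 0, which is consistent with rejecting a two-tailed null hypothesis of no difference at the 5% level.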
-
Two-Sample Tests
Independent Populations
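Two Independent Populations, Comparing Means
Lower-tail test:
H0: $\mu_1 \ge \mu_2$
H1: $\mu_1 < \mu_2$
i.e.,
H0: $\mu_1 - \mu_2 \ge 0$
H1: $\mu_1 - \mu_2 < 0$
Upper-tail test:
H0: $\mu_1 \le \mu_2$
H1: $\mu_1 > \mu_2$
i.e.,
H0: $\mu_1 - \mu_2 \le 0$
H1: $\mu_1 - \mu_2 > 0$
Two-tail test:
H0: $\mu_1 = \mu_2$
H1: $\mu_1 \ne \mu_2$
i.e.,
H0: $\mu_1 - \mu_2 = 0$
H1: $\mu_1 - \mu_2 \ne 0$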
-
Two-Sample Tests
Independent Populations
Two Independent Populations, Comparing Means
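Lower-tail test:
H0: $\mu_1 - \mu_2 \ge 0$
H1: $\mu_1 - \mu_2 < 0$
Reject H0 if $Z < -Z_{\alpha}$ (rejection region of area $\alpha$ in the lower tail)
Upper-tail test:
H0: $\mu_1 - \mu_2 \le 0$
H1: $\mu_1 - \mu_2 > 0$
Reject H0 if $Z > Z_{\alpha}$ (rejection region of area $\alpha$ in the upper tail)
Two-tail test:
H0: $\mu_1 - \mu_2 = 0$
H1: $\mu_1 - \mu_2 \ne 0$
Reject H0 if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$ (rejection region of area $\alpha/2$ in each tail)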
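The three rejection rules can be written as one small function. This is a sketch, not part of the original slides: the name `reject_h0` and the `tail` labels are hypothetical, and the critical values come from the standard normal distribution in Python's `statistics` module.

```python
from statistics import NormalDist


def reject_h0(z, alpha, tail):
    """Apply the lower-, upper- or two-tail rejection rule to a Z statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "lower":
        return z < std_normal.inv_cdf(alpha)            # Z < -Z_alpha
    if tail == "upper":
        return z > std_normal.inv_cdf(1 - alpha)        # Z > Z_alpha
    if tail == "two":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)  # |Z| > Z_{alpha/2}
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


# A Z value of 1.70 is significant in an upper-tail test at alpha = .05
# but not in a two-tail test at the same level.
print(reject_h0(1.70, 0.05, "upper"), reject_h0(1.70, 0.05, "two"))
```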
-
Problem 1: Two Sample Z test
A random sample of 32 advertising managers from across the United States is taken. The advertising managers are contacted by telephone and asked what their annual salary is.
A similar random sample is taken of 34 auditing managers. The resulting salary data are listed in the table below, along with the sample means, the standard deviations, and the variances.
-
Hypothesis Testing for Differences Between
Means: The Wage Example
Advertising Managers
74.256 57.791 71.115
96.234 65.145 67.574
89.807 96.767 59.621
93.261 77.242 62.483
103.030 67.056 69.319
74.195 64.276 35.394
75.932 74.194 86.741
80.742 65.360 57.351
39.672 73.904
45.652 54.270
93.083 59.045
63.384 68.508
$n_1 = 32$, $\bar{X}_1 = 70.700$, $S_1 = 16.253$, $S_1^2 = 264.164$
Auditing Managers
69.962 77.136 43.649
55.052 66.035 63.369
57.828 54.335 59.676
63.362 42.494 54.449
37.194 83.849 46.394
99.198 67.160 71.804
61.254 37.386 72.401
73.065 59.505 56.470
48.036 72.790 67.814
60.053 71.351 71.492
66.359 58.653
61.261 63.508
$n_2 = 34$, $\bar{X}_2 = 62.187$, $S_2 = 12.900$, $S_2^2 = 166.411$
-
Hypothesis Testing for Differences Between Means:
The Wage Example
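$$H_0: \mu_1 - \mu_2 = 0$$
$$H_a: \mu_1 - \mu_2 \ne 0$$
[Figure: sampling distribution of $\bar{X}_1 - \bar{X}_2$ with a non-rejection region in the centre and, beyond each critical value, a rejection region of area $\alpha/2 = .025$.]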
-
Hypothesis Testing for Differences Between Means:
The Wage Example
If $Z < -1.96$ or $Z > 1.96$, reject $H_0$.
If $-1.96 \le Z \le 1.96$, do not reject $H_0$.
[Figure: standard normal curve centred at 0 with critical values $Z_c = -1.96$ and $Z_c = 1.96$; each rejection region has area $\alpha/2 = .025$.]
-
Hypothesis Testing for Differences between
Means: The Wage Example
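$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} = \frac{(70.700 - 62.187) - 0}{\sqrt{\frac{264.164}{32} + \frac{166.411}{34}}} = 2.35$$
If $Z < -1.96$ or $Z > 1.96$, reject $H_0$.
If $-1.96 \le Z \le 1.96$, do not reject $H_0$.
Since $Z = 2.35 > 1.96$, reject $H_0$.
[Figure: standard normal curve centred at 0 with rejection regions of area $\alpha/2 = .025$ beyond the critical values $\pm 1.96$; the observed $Z = 2.35$ lies in the upper rejection region.]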
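The wage-example calculation can be reproduced with a short script. This is a sketch: the function name `two_sample_z` is hypothetical, and the inputs are the summary statistics listed for the two groups of managers.

```python
import math


def two_sample_z(x1, var1, n1, x2, var2, n2, hypothesized_diff=0.0):
    """Z statistic for the difference between two sample means."""
    se = math.sqrt(var1 / n1 + var2 / n2)  # standard error of X1bar - X2bar
    return ((x1 - x2) - hypothesized_diff) / se


# Advertising managers (group 1) vs. auditing managers (group 2).
z = two_sample_z(70.700, 264.164, 32, 62.187, 166.411, 34)
print(f"Z = {z:.2f}")

# Two-tail rule at alpha = .05: reject H0 when |Z| > 1.96.
print("reject H0" if abs(z) > 1.96 else "do not reject H0")
```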
-
Problem 2: Two Sample Z test
Greystone Department Stores, Inc., operates two stores in
Buffalo, New York: One is in the inner city and the other is in
a suburban shopping center.
The regional manager noticed that products that sell well in
one store do not always sell well in the other.
The manager believes this situation may be attributable to
differences in customer demographics at the two locations.
Customers may differ in age, education, income, and so on.
Suppose the manager asks us to investigate the difference
between the mean ages of the customers who shop at the two
stores.
-
Data
$\sigma_1 = 10$ and $\sigma_2 = 10$
$\alpha = .05$
$n_1 = 30$
$n_2 = 40$
$\bar{X}_1 = 82$ and $\bar{X}_2 = 78$
-
Solution
The margin of error is 4.06 years and the 95%
confidence interval estimate of the difference
between the two population means is 5 - 4.06 = 0.94
years to 5 + 4.06 = 9.06 years.
Do not reject H0.
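As a cross-check, the test statistic and interval can be computed directly from the values on the Data slide. This is a minimal sketch that assumes those values (population standard deviations of 10, sample sizes 30 and 40, sample means 82 and 78) are the ones intended. With them the point estimate is 82 - 78 = 4, so the margin of error and interval printed here differ from the 4.06 and 5 quoted in the Solution. The Z statistic of about 1.66 lies inside the critical values of -1.96 and 1.96, which agrees with the decision not to reject H0.

```python
import math
from statistics import NormalDist

# Values taken from the Data slide.
sigma1, sigma2 = 10.0, 10.0
n1, n2 = 30, 40
xbar1, xbar2 = 82.0, 78.0
alpha = 0.05

diff = xbar1 - xbar2                             # point estimate of mu1 - mu2
se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # standard error
z_stat = diff / se                               # test statistic under H0: mu1 - mu2 = 0
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96
margin = z_crit * se
interval = (diff - margin, diff + margin)

print(f"Z = {z_stat:.2f}, margin of error = {margin:.2f}")
print(f"95% CI: ({interval[0]:.2f}, {interval[1]:.2f})")
print("reject H0" if abs(z_stat) > z_crit else "do not reject H0")
```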
-
Two-Sample Tests
Independent Populations: $\sigma_1$ and $\sigma_2$ unknown
Independent population means:
$\sigma_1$ and $\sigma_2$ known
$\sigma_1$ and $\sigma_2$ unknown
Assumptions:
Samples are randomly and independently drawn
Populations are normally distributed
Population variances are unknown but assumed equal
-
Two-Sample Tests
Independent Populations
Independent population means:
$\sigma_1$ and $\sigma_2$ known
$\sigma_1$ and $\sigma_2$ unknown
Forming interval estimates:
The population variances are assumed equal, so use the two sample standard deviations and pool them to estimate $\sigma$
The test statistic is a t value with $(n_1 + n_2 - 2)$ degrees of freedom
-
The t Test for Differences in Population Means
Each of the two populations is normally distributed.
The two samples are independent.
At least one of the samples is small, n < 30.
The values of the population variances are unknown.
The variances of the two populations are equal: $\sigma_1^2 = \sigma_2^2$
-
t Formula to Test the Difference in Means
Assuming $\sigma_1^2 = \sigma_2^2$
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
-
Problem 1: Independent Populations and $\sigma_1$ and $\sigma_2$ unknown and equal
At the Hernandez Manufacturing Company, an application of this test arises.
New employees are expected to attend a three-day seminar to learn about the company. At the end of the seminar, they are tested to measure their knowledge about the company.
The traditional training method has been lecture and a question-and-answer session. Management decided to experiment with a different training procedure, which processes new employees in two days by using DVDs and having no question-and-answer session.
If this procedure works, it could save the company thousands of dollars over a period of several years. However, there is some concern about the effectiveness of the two-day method, and company managers would like to know whether there is any difference in the effectiveness of the two training methods.
-
Hernandez Manufacturing Company: Test
Scores for New Employees After Training
Training Method A
56 51 45
47 52 43
42 53 52
50 42 48
47 44 44
$n_1 = 15$, $\bar{X}_1 = 47.73$, $S_1^2 = 19.495$
Training Method B
59 52 53 54
57 56 55 64
53 65 53 57
$n_2 = 12$, $\bar{X}_2 = 56.5$, $S_2^2 = 18.273$
-
Hernandez Manufacturing Company
$$H_0: \mu_1 - \mu_2 = 0$$
$$H_a: \mu_1 - \mu_2 \ne 0$$
$\alpha/2 = .05/2 = .025$
$df = n_1 + n_2 - 2 = 15 + 12 - 2 = 25$
$t_{.025,25} = 2.060$
If $t < -2.060$ or $t > 2.060$, reject $H_0$.
If $-2.060 \le t \le 2.060$, do not reject $H_0$.
[Figure: t distribution centred at 0 with critical values $-t_{.025,25} = -2.060$ and $t_{.025,25} = 2.060$; each rejection region has area $\alpha/2 = .025$.]
-
Hernandez Manufacturing Company
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{(47.73 - 56.50) - 0}{\sqrt{\frac{(19.495)(14) + (18.273)(11)}{15 + 12 - 2}} \sqrt{\frac{1}{15} + \frac{1}{12}}} = -5.20$$
If $t < -2.060$ or $t > 2.060$, reject $H_0$.
If $-2.060 \le t \le 2.060$, do not reject $H_0$.
Since $t = -5.20 < -2.060$, reject $H_0$.
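The Hernandez result can be checked from the raw test scores. The sketch below uses only Python's standard library; the function name `pooled_t` is hypothetical, and the critical value 2.060 is the table value quoted on the slide rather than something the code computes.

```python
import math
from statistics import mean, variance

# Test scores from the Hernandez Manufacturing Company table.
method_a = [56, 51, 45, 47, 52, 43, 42, 53, 52, 50, 42, 48, 47, 44, 44]
method_b = [59, 52, 53, 54, 57, 56, 55, 64, 53, 65, 53, 57]


def pooled_t(sample1, sample2, hypothesized_diff=0.0):
    """Pooled-variance t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # variance() is the sample variance (divides by n - 1).
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    se = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    t = ((mean(sample1) - mean(sample2)) - hypothesized_diff) / se
    return t, df


t_stat, df = pooled_t(method_a, method_b)
T_CRIT = 2.060  # t_{.025,25}, taken from the slide

print(f"t = {t_stat:.2f} with {df} degrees of freedom")
print("reject H0" if abs(t_stat) > T_CRIT else "do not reject H0")
```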
-
Confidence Interval to Estimate $\mu_1 - \mu_2$
with Small Samples and $\sigma_1^2 = \sigma_2^2$
$$(\bar{X}_1 - \bar{X}_2) \pm t \sqrt{\frac{S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where $df = n_1 + n_2 - 2$
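As an illustration, the same interval can be evaluated for the Hernandez training data using the summary statistics from the earlier slides. This is a sketch under stated assumptions: the helper name `pooled_t_interval` is hypothetical, and the critical value $t_{.025,25} = 2.060$ is passed in from the table because the standard library has no t distribution.

```python
import math


def pooled_t_interval(x1, var1, n1, x2, var2, n2, t_crit):
    """Confidence interval for mu1 - mu2 assuming equal population variances."""
    df = n1 + n2 - 2
    pooled_var = (var1 * (n1 - 1) + var2 * (n2 - 1)) / df
    se = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - t_crit * se, diff + t_crit * se


# Method A: n = 15, mean 47.73, variance 19.495
# Method B: n = 12, mean 56.50, variance 18.273
lower, upper = pooled_t_interval(47.73, 19.495, 15, 56.50, 18.273, 12, t_crit=2.060)
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

Both limits are negative, so the interval excludes 0, in line with the rejection of H0 in the hypothesis test.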
-
Problem 2: Independent Populations and $\sigma_1$ and $\sigma_2$ unknown and equal
You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE and NASDAQ (National Association of Securities Dealers Automated Quotations)? You collect the following data:
                 NYSE    NASDAQ
Number           21      25
Sample mean      3.27    2.53
Sample std dev   1.30    1.16
Assuming both populations are approximately normal with equal
variances, is there a difference in average yield ($\alpha = 0.05$)?
-
Solution
H0: $\mu_1 - \mu_2 = 0$, i.e. ($\mu_1 = \mu_2$)
H1: $\mu_1 - \mu_2 \ne 0$, i.e. ($\mu_1 \ne \mu_2$)
-
Two-Sample Tests
Independent Populations
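The pooled variance is:
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(21 - 1)\,1.30^2 + (25 - 1)\,1.16^2}{(21 - 1) + (25 - 1)} = 1.5021$$
The test statistic is:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021 \left(\frac{1}{21} + \frac{1}{25}\right)}} = 2.040$$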
-
Two-Sample Tests
Independent Populations
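H0: $\mu_1 - \mu_2 = 0$, i.e. ($\mu_1 = \mu_2$)
H1: $\mu_1 - \mu_2 \ne 0$, i.e. ($\mu_1 \ne \mu_2$)
$\alpha = 0.05$
df = 21 + 25 - 2 = 44
Critical Values: t = ±2.0154
Test Statistic: t = 2.040
[Figure: t distribution centred at 0 with rejection regions of area .025 below -2.0154 and above 2.0154; the test statistic 2.040 falls in the upper rejection region.]
Decision: Reject H0 at $\alpha = 0.05$
Conclusion: There is evidence of a difference in the means.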
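The dividend-yield calculation can be reproduced from the summary table. This is a sketch: the function name `pooled_t_from_summary` is hypothetical, and the critical value 2.0154 is the one quoted on the slide.

```python
import math


def pooled_t_from_summary(x1, s1, n1, x2, s2, n2):
    """Return (pooled variance, t statistic, df) from means and std devs."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    t = (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t, df


# NYSE: n = 21, mean 3.27, std dev 1.30; NASDAQ: n = 25, mean 2.53, std dev 1.16
sp2, t_stat, df = pooled_t_from_summary(3.27, 1.30, 21, 2.53, 1.16, 25)
T_CRIT = 2.0154  # two-tail critical value for df = 44, alpha = 0.05 (from the slide)

print(f"pooled variance = {sp2:.4f}, t = {t_stat:.3f}, df = {df}")
print("reject H0" if abs(t_stat) > T_CRIT else "do not reject H0")
```

The statistic exceeds the critical value only narrowly, so the decision is sensitive to rounding in the inputs.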
-
Two-Sample Tests: Dependent Samples
Before and After Measurements on the same individual
Studies of twins
Studies of spouses
Individual   Before   After
1            32       39
2            11       15
3            21       35
4            17       13
5            30       41
6            38       39
7            14       22
-
Two-Sample Tests
Related Populations
Tests Means of 2 Related Populations
Paired or matched samples
Repeated measures (before/after)
Use difference between paired values: $D = X_1 - X_2$
Assumptions:
Both populations are normally distributed
-
Two-Sample Tests
Related Populations
The ith paired difference is $D_i$, where $D_i = X_{1i} - X_{2i}$
The point estimate for the population mean paired difference is $\bar{D}$:
$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$$
-
Two-Sample Tests
Related Populations
Suppose the population standard deviation of the difference scores, $\sigma_D$, is known.
The test statistic for the mean difference is a Z value:
$$Z = \frac{\bar{D} - \mu_D}{\sigma_D / \sqrt{n}}$$
Where
$\mu_D$ = hypothesized mean difference
$\sigma_D$ = population standard deviation of differences
$n$ = the sample size (number of pairs)
-
Two-Sample Tests
Related Populations
If $\sigma_D$ is unknown, you can estimate the unknown population standard deviation with a sample standard deviation:
$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
-
Two-Sample Tests
Related Populations
The test statistic for $\mu_D$ is now a t statistic:
$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}}$$
where t has $n - 1$ d.f. and $S_D$ is:
$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
-
Two-Sample Tests
Related Populations
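Lower-tail test:
H0: $\mu_D \ge 0$
H1: $\mu_D < 0$
Reject H0 if $t < -t_{\alpha}$ (rejection region of area $\alpha$ in the lower tail)
Upper-tail test:
H0: $\mu_D \le 0$
H1: $\mu_D > 0$
Reject H0 if $t > t_{\alpha}$ (rejection region of area $\alpha$ in the upper tail)
Two-tail test:
H0: $\mu_D = 0$
H1: $\mu_D \ne 0$
Reject H0 if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ (rejection region of area $\alpha/2$ in each tail)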
-
Problem 1: Two-Sample Tests
Related Populations
Assume you send your salespeople to a customer service training workshop. Has the training made a difference in the number of complaints? You collect the following data:
Salesperson   Number of Complaints      Difference, $D_i$
              Before (1)   After (2)    (2 - 1)
C.B.          6            4            -2
T.F.          20           6            -14
M.H.          3            2            -1
R.K.          0            0            0
M.O.          4            0            -4
-
Two-Sample Tests
Related Populations Example
$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n} = -4.2$$
$$S_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n - 1}} = 5.67$$
-
Two-Sample Tests
Related Populations Example
Has the training made a difference in the number of complaints (at the $\alpha = 0.01$ level)?
H0: $\mu_D = 0$
H1: $\mu_D \ne 0$
Critical Value = ±4.604
d.f. = n - 1 = 4
Test Statistic:
$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} = \frac{-4.2 - 0}{5.67 / \sqrt{5}} = -1.66$$
-
Two-Sample Tests
Related Populations Example
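[Figure: t distribution with rejection regions of area $\alpha/2$ below -4.604 and above 4.604; the test statistic -1.66 falls in the non-rejection region.]
Decision: Do not reject H0 (t statistic is not in the reject region)
Conclusion: There is no evidence of a significant change in the number of complaints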
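The paired test for the complaints data can be reproduced from the before and after counts. This is a sketch using only the standard library; the variable names are illustrative, and the critical value 4.604 is taken from the slide.

```python
import math
from statistics import mean, stdev

# Complaints per salesperson: C.B., T.F., M.H., R.K., M.O.
before = [6, 20, 3, 0, 4]
after = [4, 6, 2, 0, 0]

# Differences are taken as After - Before, matching the (2 - 1) column.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

d_bar = mean(diffs)   # point estimate of the mean paired difference
s_d = stdev(diffs)    # sample standard deviation (divides by n - 1)
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

T_CRIT = 4.604        # two-tail critical value for 4 d.f. at alpha = 0.01 (from the slide)
print(f"D-bar = {d_bar:.1f}, S_D = {s_d:.2f}, t = {t_stat:.2f}")
print("reject H0" if abs(t_stat) > T_CRIT else "do not reject H0")
```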
-
Two-Sample Tests
Related Populations
The confidence interval for $\mu_D$ ($\sigma_D$ known) is:
$$\bar{D} \pm Z \frac{\sigma_D}{\sqrt{n}}$$
Where
$n$ = the sample size (number of pairs in the paired sample)
-
Two-Sample Tests
Related Populations
The confidence interval for $\mu_D$ ($\sigma_D$ unknown) is:
$$\bar{D} \pm t_{n-1} \frac{S_D}{\sqrt{n}}$$
where
$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
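As an illustration, the interval for the case where the standard deviation of the differences is unknown can be evaluated for the complaints data from the earlier example. This sketch assumes a 99% confidence level, so that the table value $t_4 = 4.604$ quoted in that example applies; the helper name `paired_t_interval` is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_interval(diffs, t_crit):
    """Confidence interval for the mean paired difference."""
    n = len(diffs)
    d_bar = mean(diffs)
    margin = t_crit * stdev(diffs) / math.sqrt(n)
    return d_bar - margin, d_bar + margin


# Differences (After - Before) from the complaints example.
diffs = [-2, -14, -1, 0, -4]
lower, upper = paired_t_interval(diffs, t_crit=4.604)
print(f"99% CI for the mean difference: ({lower:.2f}, {upper:.2f})")
```

The interval contains 0, which matches the decision not to reject H0 in the two-tail test at the 0.01 level.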
-
Sampling Distribution of Differences
in Sample Proportions
For large samples
1. $n_1 \hat{p}_1 > 5$,
2. $n_1 \hat{q}_1 > 5$,
3. $n_2 \hat{p}_2 > 5$, and
4. $n_2 \hat{q}_2 > 5$, where $\hat{q} = 1 - \hat{p}$
the difference in sample proportions is normally distributed with
$$\mu_{\hat{p}_1 - \hat{p}_2} = P_1 - P_2$$
and
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}$$
-
Z Formula for the Difference
in Two Population Proportions
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}}$$
$\hat{p}_1$ = proportion from sample 1
$\hat{p}_2$ = proportion from sample 2
$n_1$ = size of sample 1
$n_2$ = size of sample 2
$P_1$ = proportion from population 1
$P_2$ = proportion from population 2
$Q_1 = 1 - P_1$
$Q_2 = 1 - P_2$
-
Z Formula to Test the Difference
in Population Proportions
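$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\bar{P}\,\bar{Q} \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
$$\bar{P} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$
$$\bar{Q} = 1 - \bar{P}$$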
-
Two Population Proportions
Hypothesis for Population Proportions
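Lower-tail test:
H0: $\pi_1 \ge \pi_2$
H1: $\pi_1 < \pi_2$
i.e.,
H0: $\pi_1 - \pi_2 \ge 0$
H1: $\pi_1 - \pi_2 < 0$
Upper-tail test:
H0: $\pi_1 \le \pi_2$
H1: $\pi_1 > \pi_2$
i.e.,
H0: $\pi_1 - \pi_2 \le 0$
H1: $\pi_1 - \pi_2 > 0$
Two-tail test:
H0: $\pi_1 = \pi_2$
H1: $\pi_1 \ne \pi_2$
i.e.,
H0: $\pi_1 - \pi_2 = 0$
H1: $\pi_1 - \pi_2 \ne 0$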
-
Two Population Proportions
Hypothesis for Population Proportions
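Lower-tail test:
H0: $\pi_1 - \pi_2 \ge 0$
H1: $\pi_1 - \pi_2 < 0$
Reject H0 if $Z < -Z_{\alpha}$ (rejection region of area $\alpha$ in the lower tail)
Upper-tail test:
H0: $\pi_1 - \pi_2 \le 0$
H1: $\pi_1 - \pi_2 > 0$
Reject H0 if $Z > Z_{\alpha}$ (rejection region of area $\alpha$ in the upper tail)
Two-tail test:
H0: $\pi_1 - \pi_2 = 0$
H1: $\pi_1 - \pi_2 \ne 0$
Reject H0 if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$ (rejection region of area $\alpha/2$ in each tail)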
-
Two Independent Population
Proportions: Example
Is there a significant difference between the proportion of men and the proportion of women who will vote Yes on Proposition A?
In a random sample of 72 men, 36 indicated they would vote Yes and, in a random sample of 50 women, 31 indicated they would vote Yes.
Test at the .05 level of significance.
-
Two Independent Population
Proportions: Example
H0: $\pi_1 - \pi_2 = 0$ (the two proportions are equal)
H1: $\pi_1 - \pi_2 \ne 0$ (there is a significant difference between proportions)
The sample proportions are:
Men: $p_1 = 36/72 = .50$
Women: $p_2 = 31/50 = .62$
The pooled estimate for the overall proportion is:
$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{36 + 31}{72 + 50} = \frac{67}{122} = .549$$
-
Two Independent Population
Proportions: Example
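The test statistic for $\pi_1 - \pi_2$ is:
$$z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1 - \bar{p}) \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.50 - .62) - 0}{\sqrt{.549(1 - .549) \left(\frac{1}{72} + \frac{1}{50}\right)}} = -1.31$$
Critical Values = ±1.96 for $\alpha = .05$
[Figure: standard normal curve with rejection regions of area .025 below -1.96 and above 1.96; the test statistic -1.31 falls in the non-rejection region.]
Decision: Do not reject H0
Conclusion: There is no evidence of a
significant difference in proportions who
will vote yes between men and women.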
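The Proposition A calculation can be checked with a short script. This is a sketch: the function name `two_proportion_z` is hypothetical, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled-proportion Z statistic for H0: pi1 - pi2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return p_bar, (p1 - p2) / se


# Men: 36 of 72 would vote Yes; women: 31 of 50 would vote Yes.
p_bar, z = two_proportion_z(36, 72, 31, 50)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # about 1.96

print(f"pooled proportion = {p_bar:.3f}, z = {z:.2f}")
print("reject H0" if abs(z) > z_crit else "do not reject H0")
```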
-
Two Independent Population
Proportions
The confidence interval for $\pi_1 - \pi_2$ is:
$$(p_1 - p_2) \pm Z \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$$
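As an illustration, this interval can be evaluated for the voting data from the preceding example (36 of 72 men and 31 of 50 women). It is a sketch that assumes a 95% confidence level; the function name `proportion_diff_ci` is hypothetical. Unlike the test statistic, the interval uses the two sample proportions separately rather than the pooled estimate.

```python
import math
from statistics import NormalDist


def proportion_diff_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for pi1 - pi2 using unpooled sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


lower, upper = proportion_diff_ci(36, 72, 31, 50)
print(f"95% CI for pi1 - pi2: ({lower:.3f}, {upper:.3f})")
```

The interval contains 0, consistent with the decision not to reject H0 in the two-tail test.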
-
F Test for Two Population Variances
$$F = \frac{S_1^2}{S_2^2}$$
$$df_{numerator} = \nu_1 = n_1 - 1$$
$$df_{denominator} = \nu_2 = n_2 - 1$$
-
F Distribution with $\nu_1 = 10$ and $\nu_2 = 8$
[Figure: density curve of the F distribution; horizontal axis from 0.00 to 6.00, vertical axis from 0 to 0.8.]
-
A Portion of the F Distribution Table
for $\alpha = 0.025$
Columns: Numerator Degrees of Freedom
Rows: Denominator Degrees of Freedom (first value in each row)
Highlighted entry: $F_{.025,9,11} = 3.59$
1 2 3 4 5 6 7 8 9
1 647.79 799.48 864.15 899.60 921.83 937.11 948.20 956.64 963.28
2 38.51 39.00 39.17 39.25 39.30 39.33 39.36 39.37 39.39
3 17.44 16.04 15.44 15.10 14.88 14.73 14.62 14.54 14.47
4 12.22 10.65 9.98 9.60 9.36 9.20 9.07 8.98 8.90
5 10.01 8.43 7.76 7.39 7.15 6.98 6.85 6.76 6.68
6 8.81 7.26 6.60 6.23 5.99 5.82 5.70 5.60 5.52
7 8.07 6.54 5.89 5.52 5.29 5.12 4.99 4.90 4.82
8 7.57 6.06 5.42 5.05 4.82 4.65 4.53 4.43 4.36
9 7.21 5.71 5.08 4.72 4.48 4.32 4.20 4.10 4.03
10 6.94 5.46 4.83 4.47 4.24 4.07 3.95 3.85 3.78
11 6.72 5.26 4.63 4.28 4.04 3.88 3.76 3.66 3.59
12 6.55 5.10 4.47 4.12 3.89 3.73 3.61 3.51 3.44
-
Testing Population Variances
Purpose: To determine if two independent
populations have the same variability.
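Two-tail test:
H0: $\sigma_1^2 = \sigma_2^2$
H1: $\sigma_1^2 \ne \sigma_2^2$
Lower-tail test:
H0: $\sigma_1^2 \ge \sigma_2^2$
H1: $\sigma_1^2 < \sigma_2^2$
Upper-tail test:
H0: $\sigma_1^2 \le \sigma_2^2$
H1: $\sigma_1^2 > \sigma_2^2$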
-
Suppose a machine produces metal sheets that are specified to be 22 millimeters thick.
Because of the machine, the operator, the raw material, the manufacturing environment, and other factors, there is variability in the thickness.
Two machines produce these sheets. Operators are concerned about the consistency of the two machines. To test consistency, they randomly sample 10 sheets produced by machine 1 and 12 sheets produced by machine 2.
The thickness measurements of sheets from each machine are given in the table on the following page. Assume sheet thickness is normally distributed in the population.
How can we test to determine whether the variance from each sample comes from the same population variance (population variances are equal) or from different population variances (population variances are not equal)?
-
Sheet Metal Example: Hypothesis Test for
Equality of Two Population Variances
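$$H_0: \sigma_1^2 = \sigma_2^2$$
$$H_a: \sigma_1^2 \ne \sigma_2^2$$
$\alpha = 0.05$, $n_1 = 10$, $n_2 = 12$
$$F = \frac{S_1^2}{S_2^2}, \quad df_{numerator} = \nu_1 = n_1 - 1, \quad df_{denominator} = \nu_2 = n_2 - 1$$
$F_{.025,9,11} = 3.59$
$$F_{.975,11,9} = \frac{1}{F_{.025,9,11}} = \frac{1}{3.59} = 0.28$$
If $F < 0.28$ or $F > 3.59$, reject $H_0$.
If $0.28 \le F \le 3.59$, do not reject $H_0$.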
-
Sheet metal Manufacturer
[Figure: F distribution with a rejection region in each tail; the critical values $F_{.975,11,9} = 0.28$ and $F_{.025,9,11} = 3.59$ bound the non-rejection region.]
If $F < 0.28$ or $F > 3.59$, reject $H_0$.
If $0.28 \le F \le 3.59$, do not reject $H_0$.
-
Sheet Metal Example
Machine 1
22.3 21.8 22.2
21.8 21.9 21.6
22.3 22.4
21.6 22.5
$n_1 = 10$, $S_1^2 = 0.1138$
Machine 2
22.0 22.1 21.8 21.9
22.2 22.0 21.7 21.9
22.0 22.1 21.9 22.1
$n_2 = 12$, $S_2^2 = 0.0202$
$$F = \frac{S_1^2}{S_2^2} = \frac{0.1138}{0.0202} = 5.63$$
Since $F = 5.63 > F_c = 3.59$, reject $H_0$.
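The sheet metal result can be checked from the raw thickness measurements. This is a sketch using only the standard library; the critical values 0.28 and 3.59 are the table values quoted on the slides. Computed from unrounded variances the ratio comes out at about 5.62, marginally below the 5.63 obtained from the rounded variances, and the decision is the same.

```python
from statistics import variance

# Thickness measurements in millimeters.
machine_1 = [22.3, 21.8, 22.2, 21.8, 21.9, 21.6, 22.3, 22.4, 21.6, 22.5]
machine_2 = [22.0, 22.1, 21.8, 21.9, 22.2, 22.0, 21.7, 21.9, 22.0, 22.1, 21.9, 22.1]

s1_sq = variance(machine_1)  # sample variance, divides by n - 1
s2_sq = variance(machine_2)
f_stat = s1_sq / s2_sq

df_num = len(machine_1) - 1  # numerator degrees of freedom
df_den = len(machine_2) - 1  # denominator degrees of freedom

F_LOWER, F_UPPER = 0.28, 3.59  # critical values from the slides

print(f"S1^2 = {s1_sq:.4f}, S2^2 = {s2_sq:.4f}, F = {f_stat:.2f}")
print(f"df = ({df_num}, {df_den})")
print("reject H0" if f_stat < F_LOWER or f_stat > F_UPPER else "do not reject H0")
```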