8_two samples test- updated- 1 july 14.pdf

Two-Sample Tests

A. Ramesh, Department of Management Studies

Indian Institute of Technology Roorkee

Two-Sample Tests Overview

The means of two independent populations

The means of two related populations

The proportions of two independent populations

The variances of two independent populations

Examples of each two-sample test:

Independent population means: Group 1 vs. Group 2

Related population means: Same group before vs. after treatment

Independent population proportions: Proportion 1 vs. Proportion 2

Independent population variances: Variance 1 vs. Variance 2

Two-Sample Tests: Independent Population Means

Two cases: $\sigma_1$ and $\sigma_2$ known, or $\sigma_1$ and $\sigma_2$ unknown

Goal: Test a hypothesis or form a confidence interval for the difference between two population means, $\mu_1 - \mu_2$

The point estimate for the difference is the difference between the sample means: $\bar{X}_1 - \bar{X}_2$

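Sampling Distribution of the Difference Between Two Sample Means

Population 1: take a sample of size $n_1$ and compute $\bar{X}_1 = \sum X / n_1$

Population 2: take a sample of size $n_2$ and compute $\bar{X}_2 = \sum X / n_2$

The differences $\bar{X}_1 - \bar{X}_2$ form a sampling distribution with mean and standard deviation

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$$

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$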

Z Formula for the Difference in Two Sample Means

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

$n_1 \ge 30$, $n_2 \ge 30$, and Independent Samples

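Confidence Interval to Estimate $\mu_1 - \mu_2$ When $n_1$ and $n_2$ Are Large and $\sigma_1$, $\sigma_2$ Are Unknown

$$(\bar{X}_1 - \bar{X}_2) - Z\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{X}_1 - \bar{X}_2) + Z\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

$$\text{Prob}\left[(\bar{X}_1 - \bar{X}_2) - Z\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{X}_1 - \bar{X}_2) + Z\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}\right] = 1 - \alpha$$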

Two-Sample Tests: Two Independent Populations, Comparing Means

Lower-tail test:
H0: $\mu_1 \ge \mu_2$, i.e., $\mu_1 - \mu_2 \ge 0$
H1: $\mu_1 < \mu_2$, i.e., $\mu_1 - \mu_2 < 0$
Reject H0 if $Z < -Z_{\alpha}$

Upper-tail test:
H0: $\mu_1 \le \mu_2$, i.e., $\mu_1 - \mu_2 \le 0$
H1: $\mu_1 > \mu_2$, i.e., $\mu_1 - \mu_2 > 0$
Reject H0 if $Z > Z_{\alpha}$

Two-tail test:
H0: $\mu_1 = \mu_2$, i.e., $\mu_1 - \mu_2 = 0$
H1: $\mu_1 \ne \mu_2$, i.e., $\mu_1 - \mu_2 \ne 0$
Reject H0 if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$

Problem 1: Two Sample Z test

A random sample of 32 advertising managers from across the United States is taken. The advertising managers are contacted by telephone and asked what their annual salary is.

A similar random sample is taken of 34 auditing managers. The resulting salary data are listed in the table below, along with the sample means, the population standard deviations, and the population variances.

Hypothesis Testing for Differences Between Means: The Wage Example

Advertising Managers

 74.256   57.791   71.115
 96.234   65.145   67.574
 89.807   96.767   59.621
 93.261   77.242   62.483
103.030   67.056   69.319
 74.195   64.276   35.394
 75.932   74.194   86.741
 80.742   65.360   57.351
 39.672   73.904
 45.652   54.270
 93.083   59.045
 63.384   68.508

$n_1 = 32$, $\bar{X}_1 = 70.700$, $S_1 = 16.253$, $S_1^2 = 264.164$

Auditing Managers

 69.962   77.136   43.649
 55.052   66.035   63.369
 57.828   54.335   59.676
 63.362   42.494   54.449
 37.194   83.849   46.394
 99.198   67.160   71.804
 61.254   37.386   72.401
 73.065   59.505   56.470
 48.036   72.790   67.814
 60.053   71.351   71.492
 66.359   58.653
 61.261   63.508

$n_2 = 34$, $\bar{X}_2 = 62.187$, $S_2 = 12.900$, $S_2^2 = 166.411$

Hypothesis Testing for Differences Between Means: The Wage Example

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$

Two-tailed test with a rejection region of area $\alpha/2 = .025$ in each tail of the distribution of $\bar{X}_1 - \bar{X}_2$. Critical values: $Z_c = -1.96$ and $Z_c = 1.96$.

If $Z < -1.96$ or $Z > 1.96$, reject $H_0$.
If $-1.96 \le Z \le 1.96$, do not reject $H_0$.

Hypothesis Testing for Differences between Means: The Wage Example

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \frac{(70.700 - 62.187) - 0}{\sqrt{\dfrac{264.164}{32} + \dfrac{166.411}{34}}} = 2.35$$

If $Z < -1.96$ or $Z > 1.96$, reject $H_0$.
If $-1.96 \le Z \le 1.96$, do not reject $H_0$.

Since $Z = 2.35 > 1.96$, reject $H_0$.
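The arithmetic of the wage example can be re-checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `two_sample_z` is illustrative and not part of the original slides, and the inputs are the summary statistics listed for the two samples.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Z statistic for the difference between two independent sample means."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2 - hypothesized_diff) / standard_error


# Summary statistics from the wage example (advertising vs. auditing managers).
z = two_sample_z(70.700, 264.164, 32, 62.187, 166.411, 34)

# Two-tailed critical value for alpha = .05 (area .025 in each tail).
z_critical = NormalDist().inv_cdf(1 - 0.05 / 2)
reject_h0 = abs(z) > z_critical

print(f"Z = {z:.2f}, critical value = {z_critical:.2f}, reject H0: {reject_h0}")
```

The computed statistic rounds to 2.35, which falls in the upper rejection region, in agreement with the decision above.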

Problem 2: Two Sample Z test

Greystone Department Stores, Inc., operates two stores in Buffalo, New York: One is in the inner city and the other is in a suburban shopping center. The regional manager noticed that products that sell well in one store do not always sell well in the other. The manager believes this situation may be attributable to differences in customer demographics at the two locations. Customers may differ in age, education, income, and so on. Suppose the manager asks us to investigate the difference between the mean ages of the customers who shop at the two stores.

Data

$\sigma_1 = 10$ and $\sigma_2 = 10$

$\alpha = .05$

$n_1 = 30$

$n_2 = 40$

$\bar{X}_1 = 82$ and $\bar{X}_2 = 78$

Solution

The margin of error is 4.06 years, and the 95% confidence interval estimate of the difference between the two population means is 5 - 4.06 = 0.94 years to 5 + 4.06 = 9.06 years.

Do not reject $H_0$.
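As a cross-check of the test decision, the sketch below runs a two-tailed Z test on the values listed under Data, taken at face value. The hypothesized difference of 0 is an assumption, since the slide does not state the hypotheses. These inputs give a point estimate of 82 - 78 = 4 rather than 5, so the sketch is not expected to reproduce the 4.06 margin of error quoted in the Solution; it only checks whether $H_0$ would be rejected.

```python
from math import sqrt
from statistics import NormalDist

# Values as listed under "Data" (population standard deviations assumed known).
sigma1, sigma2 = 10.0, 10.0
n1, n2 = 30, 40
xbar1, xbar2 = 82.0, 78.0
alpha = 0.05

# Standard error of the difference between the two sample means.
standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Z statistic under the assumed null hypothesis mu1 - mu2 = 0.
z = (xbar1 - xbar2) / standard_error

# Two-tailed p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value < alpha

print(f"Z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

With these inputs the statistic stays inside the interval from -1.96 to 1.96, which is consistent with "Do not reject $H_0$".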

Two-Sample Tests: Independent Populations, $\sigma_1$ and $\sigma_2$ Unknown

Assumptions:
Samples are randomly and independently drawn
Populations are normally distributed
Population variances are unknown but assumed equal

Forming interval estimates:
The population variances are assumed equal, so use the two sample standard deviations and pool them to estimate $\sigma$
The test statistic is a t value with $(n_1 + n_2 - 2)$ degrees of freedom

The t Test for Differences in Population Means

Each of the two populations is normally distributed.

The two samples are independent.

At least one of the samples is small, n < 30.

The values of the population variances are unknown.

The variances of the two populations are equal: $\sigma_1^2 = \sigma_2^2$

t Formula to Test the Difference in Means Assuming $\sigma_1^2 = \sigma_2^2$

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

Problem 1: Independent Populations, $\sigma_1$ and $\sigma_2$ Unknown and Equal

At the Hernandez Manufacturing Company, an application of this test arises.

New employees are expected to attend a three-day seminar to learn about the company. At the end of the seminar, they are tested to measure their knowledge about the company.

The traditional training method has been lecture and a question-and-answer session. Management decided to experiment with a different training procedure, which processes new employees in two days by using DVDs and having no question-and-answer session.

If this procedure works, it could save the company thousands of dollars over a period of several years. However, there is some concern about the effectiveness of the two-day method, and company managers would like to know whether there is any difference in the effectiveness of the two training methods.

Hernandez Manufacturing Company: Test Scores for New Employees After Training

Training Method A

56   51   45
47   52   43
42   53   52
50   42   48
47   44   44

$n_1 = 15$, $\bar{X}_1 = 47.73$, $S_1^2 = 19.495$

Training Method B

59   52   53   54
57   56   55   64
53   65   53   57

$n_2 = 12$, $\bar{X}_2 = 56.5$, $S_2^2 = 18.273$

Hernandez Manufacturing Company

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$

$\alpha/2 = .05/2 = .025$

$df = n_1 + n_2 - 2 = 15 + 12 - 2 = 25$

$t_{.025,25} = 2.060$

If $t < -2.060$ or $t > 2.060$, reject $H_0$.
If $-2.060 \le t \le 2.060$, do not reject $H_0$.

Two-tailed test: a rejection region of area $\alpha/2 = .025$ lies beyond each critical value, $t_{.025,25} = -2.060$ and $t_{.025,25} = 2.060$.

Hernandez Manufacturing Company

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{(47.73 - 56.50) - 0}{\sqrt{\dfrac{(19.495)(14) + (18.273)(11)}{15 + 12 - 2}} \sqrt{\dfrac{1}{15} + \dfrac{1}{12}}} = -5.20$$

If $t < -2.060$ or $t > 2.060$, reject $H_0$.
If $-2.060 \le t \le 2.060$, do not reject $H_0$.

Since $t = -5.20 < -2.060$, reject $H_0$.
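The Hernandez calculation can be reproduced from the raw test scores. The following is a minimal sketch using only the Python standard library; the helper name `pooled_t` is illustrative, and the critical value 2.060 is the table value quoted in the slides rather than something computed here.

```python
from math import sqrt
from statistics import mean, variance

# Test scores for new employees after training (from the table above).
method_a = [56, 51, 45, 47, 52, 43, 42, 53, 52, 50, 42, 48, 47, 44, 44]
method_b = [59, 52, 53, 54, 57, 56, 55, 64, 53, 65, 53, 57]


def pooled_t(sample1, sample2):
    """Pooled-variance t statistic and degrees of freedom for H0: mu1 - mu2 = 0."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # statistics.variance uses the n - 1 denominator (sample variance).
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / standard_error, df


t_stat, df = pooled_t(method_a, method_b)
T_CRITICAL = 2.060  # t(.025, 25), table value quoted in the slides
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```

Working from the raw scores avoids rounding the means and variances first; the result still rounds to -5.20.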

Confidence Interval to Estimate $\mu_1 - \mu_2$ with Small Samples and $\sigma_1^2 = \sigma_2^2$

$$(\bar{X}_1 - \bar{X}_2) \pm t \sqrt{\frac{S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where $df = n_1 + n_2 - 2$
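To illustrate the interval formula, the sketch below applies it to the Hernandez training scores. The 95% confidence level is an assumption chosen to match the $t_{.025,25} = 2.060$ table value used in the test; the slides do not work this interval out, and the helper name `pooled_ci` is illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Hernandez Manufacturing Company test scores.
method_a = [56, 51, 45, 47, 52, 43, 42, 53, 52, 50, 42, 48, 47, 44, 44]
method_b = [59, 52, 53, 54, 57, 56, 55, 64, 53, 65, 53, 57]


def pooled_ci(sample1, sample2, t_critical):
    """Confidence interval for mu1 - mu2 assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    margin = t_critical * sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean(sample1) - mean(sample2)
    return diff - margin, diff + margin


# t(.025, 25) = 2.060 is taken from the t table, as in the hypothesis test.
low, high = pooled_ci(method_a, method_b, t_critical=2.060)
print(f"95% CI for mu1 - mu2: {low:.2f} to {high:.2f}")
```

The interval lies entirely below zero, which agrees with rejecting $H_0$ in the two-tailed test at $\alpha = .05$.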

Problem 2: Independent Populations, $\sigma_1$ and $\sigma_2$ Unknown and Equal

You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE and NASDAQ (National Association of Securities Dealers Automated Quotations)? You collect the following data:

                  NYSE    NASDAQ
Number            21      25
Sample mean       3.27    2.53
Sample std dev    1.30    1.16

Assuming both populations are approximately normal with equal variances, is there a difference in average yield ($\alpha = 0.05$)?

Solution

H0: $\mu_1 - \mu_2 = 0$, i.e., $\mu_1 = \mu_2$

H1: $\mu_1 - \mu_2 \ne 0$, i.e., $\mu_1 \ne \mu_2$

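Two-Sample Tests

The pooled variance is:

$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(21 - 1)\,1.30^2 + (25 - 1)\,1.16^2}{(21 - 1) + (25 - 1)} = 1.5021$$

The test statistic is:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021 \left(\dfrac{1}{21} + \dfrac{1}{25}\right)}} = 2.040$$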

Two-Sample Tests

H0: $\mu_1 - \mu_2 = 0$, i.e., $\mu_1 = \mu_2$

H1: $\mu_1 - \mu_2 \ne 0$, i.e., $\mu_1 \ne \mu_2$

$\alpha = 0.05$

df = 21 + 25 - 2 = 44

Critical Values: $t = \pm 2.0154$

Test Statistic: $t = 2.040$

Rejection regions: $t < -2.0154$ or $t > 2.0154$, with area .025 in each tail.

Decision: Reject H0 at $\alpha = 0.05$, since 2.040 > 2.0154.

Conclusion: There is evidence of a difference in the means.

Two-Sample Tests: Dependent Samples

Before and after measurements on the same individual

Studies of twins

Studies of spouses

Individual   Before   After
1            32       39
2            11       15
3            21       35
4            17       13
5            30       41
6            38       39
7            14       22

Two-Sample Tests: Related Populations

Tests Means of 2 Related Populations
Paired or matched samples
Repeated measures (before/after)
Use difference between paired values: $D_i = X_{1i} - X_{2i}$

Assumptions:
Both populations are normally distributed

The ith paired difference is $D_i$, where $D_i = X_{1i} - X_{2i}$

The point estimate for the population mean paired difference is $\bar{D}$:

$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$$

Two-Sample Tests: Related Populations

Suppose the population standard deviation of the difference scores, $\sigma_D$, is known.

The test statistic for the mean difference is a Z value:

$$Z = \frac{\bar{D} - \mu_D}{\sigma_D / \sqrt{n}}$$

Where
$\mu_D$ = hypothesized mean difference
$\sigma_D$ = population standard deviation of differences
$n$ = the sample size (number of pairs)

Two-Sample Tests: Related Populations

If $\sigma_D$ is unknown, you can estimate the unknown population standard deviation with a sample standard deviation:

$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$

The test statistic for $\bar{D}$ is now a t statistic:

$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}}$$

where t has $n - 1$ d.f. and $S_D$ is the sample standard deviation defined above.

Two-Sample Tests: Related Populations

Lower-tail test:
H0: $\mu_D \ge 0$
H1: $\mu_D < 0$
Reject H0 if $t < -t_{\alpha}$

Upper-tail test:
H0: $\mu_D \le 0$
H1: $\mu_D > 0$
Reject H0 if $t > t_{\alpha}$

Two-tail test:
H0: $\mu_D = 0$
H1: $\mu_D \ne 0$
Reject H0 if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$

Problem 1: Two-Sample Tests, Related Populations

Assume you send your salespeople to a customer service training workshop. Has the training made a difference in the number of complaints? You collect the following data:

              Number of Complaints
Salesperson   Before (1)   After (2)   Difference, $D_i$ (2 - 1)
C.B.          6            4           -2
T.F.          20           6           -14
M.H.          3            2           -1
R.K.          0            0           0
M.O.          4            0           -4

Two-Sample Tests: Related Populations Example

$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n} = -4.2$$

$$S_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n - 1}} = 5.67$$

Two-Sample Tests

Has the training made a difference in the number of complaints (at the $\alpha = 0.01$ level)?

H0: $\mu_D = 0$

H1: $\mu_D \ne 0$

Critical Value = $\pm 4.604$

d.f. = n - 1 = 4

Test Statistic:

$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} = \frac{-4.2 - 0}{5.67 / \sqrt{5}} = -1.66$$

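Two-Sample Tests

Rejection regions: $t < -4.604$ or $t > 4.604$, with area $\alpha/2$ in each tail. The test statistic, $t = -1.66$, lies between the critical values.

Decision: Do not reject H0 (t statistic is not in the reject region)

Conclusion: There is no evidence of a significant change in the number of complaints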
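The paired-difference calculation for the complaints example can be sketched in Python as follows. Only the standard library is used; the variable names are illustrative, and the critical value 4.604 is the table value quoted in the slides for $\alpha = 0.01$ with 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Number of complaints per salesperson (C.B., T.F., M.H., R.K., M.O.).
before = [6, 20, 3, 0, 4]
after = [4, 6, 2, 0, 0]

# Paired differences, After (2) minus Before (1), as in the table.
differences = [a - b for b, a in zip(before, after)]
n = len(differences)

d_bar = mean(differences)   # point estimate of the mean paired difference
s_d = stdev(differences)    # sample standard deviation (n - 1 denominator)

# t statistic for H0: mu_D = 0, with n - 1 degrees of freedom.
t_stat = (d_bar - 0) / (s_d / sqrt(n))

T_CRITICAL = 4.604  # two-tailed critical value quoted in the slides
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"D-bar = {d_bar:.1f}, S_D = {s_d:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The statistic is well inside the interval from -4.604 to 4.604, so the sketch reaches the same decision as the slides.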

Two-Sample Tests: Related Populations

The confidence interval for $\mu_D$ ($\sigma_D$ known) is:

$$\bar{D} \pm Z \frac{\sigma_D}{\sqrt{n}}$$

Where
n = the sample size (number of pairs in the paired sample)

Two-Sample Tests: Related Populations

The confidence interval for $\mu_D$ ($\sigma_D$ unknown) is:

$$\bar{D} \pm t_{n-1} \frac{S_D}{\sqrt{n}}$$

where

$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$

Sampling Distribution of Differences in Sample Proportions

For large samples

1. $n_1 \hat{p}_1 > 5$,
2. $n_1 \hat{q}_1 > 5$,
3. $n_2 \hat{p}_2 > 5$, and
4. $n_2 \hat{q}_2 > 5$, where $\hat{q} = 1 - \hat{p}$

the difference in sample proportions is normally distributed with

$$\mu_{\hat{p}_1 - \hat{p}_2} = P_1 - P_2$$

and

$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}$$

Z Formula for the Difference in Two Population Proportions

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}}$$

$\hat{p}_1$ = proportion from sample 1
$\hat{p}_2$ = proportion from sample 2
$n_1$ = size of sample 1
$n_2$ = size of sample 2
$P_1$ = proportion from population 1
$P_2$ = proportion from population 2
$Q_1 = 1 - P_1$
$Q_2 = 1 - P_2$

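Z Formula to Test the Difference in Population Proportions

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\bar{P}\,\bar{Q} \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

$$\bar{P} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$

$$\bar{Q} = 1 - \bar{P}$$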

Two Population Proportions: Hypothesis for Population Proportions

Lower-tail test:
H0: $\pi_1 \ge \pi_2$, i.e., $\pi_1 - \pi_2 \ge 0$
H1: $\pi_1 < \pi_2$, i.e., $\pi_1 - \pi_2 < 0$
Reject H0 if $Z < -Z_{\alpha}$

Upper-tail test:
H0: $\pi_1 \le \pi_2$, i.e., $\pi_1 - \pi_2 \le 0$
H1: $\pi_1 > \pi_2$, i.e., $\pi_1 - \pi_2 > 0$
Reject H0 if $Z > Z_{\alpha}$

Two-tail test:
H0: $\pi_1 = \pi_2$, i.e., $\pi_1 - \pi_2 = 0$
H1: $\pi_1 \ne \pi_2$, i.e., $\pi_1 - \pi_2 \ne 0$
Reject H0 if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$

Two Independent Population

Proportions: Example

Is there a significant difference between the proportion of men and the proportion of women who will vote Yes on Proposition A?

In a random sample of 72 men, 36 indicated they would vote Yes and, in a sample of 50 women, 31 indicated they would vote Yes

Test at the .05 level of significance



H0: $\pi_1 - \pi_2 = 0$ (the two proportions are equal)

H1: $\pi_1 - \pi_2 \ne 0$ (there is a significant difference between proportions)

The sample proportions are:

Men: $p_1$ = 36/72 = .50

Women: $p_2$ = 31/50 = .62

The pooled estimate for the overall proportion is:

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{36 + 31}{72 + 50} = \frac{67}{122} = .549$$



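The test statistic for $\pi_1 - \pi_2$ is:

$$z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1 - \bar{p}) \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(.50 - .62) - 0}{\sqrt{.549(1 - .549) \left(\dfrac{1}{72} + \dfrac{1}{50}\right)}} = -1.31$$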

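For $\alpha = .05$, Critical Values = $\pm 1.96$

Rejection regions: $z < -1.96$ or $z > 1.96$, with area .025 in each tail. The test statistic, $z = -1.31$, lies between the critical values.

Decision: Do not reject H0

Conclusion: There is no evidence of a significant difference in proportions who will vote yes between men and women.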
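The Proposition A example can be re-computed with a short Python sketch. It uses only the standard library; the function name `two_proportion_z` is illustrative and not from the slides, and the null hypothesis is taken to be $\pi_1 - \pi_2 = 0$ as stated above.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z statistic for H0: pi1 - pi2 = 0, plus the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error, pooled


# 36 of 72 men and 31 of 50 women indicated they would vote Yes.
z, pooled = two_proportion_z(36, 72, 31, 50)

# Two-tailed critical value for alpha = .05.
z_critical = NormalDist().inv_cdf(1 - 0.05 / 2)
reject_h0 = abs(z) > z_critical

print(f"pooled p = {pooled:.3f}, z = {z:.2f}, reject H0: {reject_h0}")
```

The pooled proportion and the statistic round to .549 and -1.31, and the decision is not to reject $H_0$, as in the slides.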


Proportions

The confidence interval for $\pi_1 - \pi_2$ is:

$$(p_1 - p_2) \pm Z \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$$
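As an illustration of the confidence interval for $\pi_1 - \pi_2$, the sketch below applies it to the Proposition A counts (36 of 72 men, 31 of 50 women). The 95% confidence level is an assumption made for this example; the slides do not work this interval out, and the function name `diff_proportions_ci` is illustrative. Unlike the test statistic, the interval uses the unpooled standard error.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for pi1 - pi2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - margin, (p1 - p2) + margin


low, high = diff_proportions_ci(36, 72, 31, 50)
print(f"95% CI for pi1 - pi2: {low:.3f} to {high:.3f}")
```

The interval contains zero, which is consistent with not rejecting the hypothesis of equal proportions at the .05 level.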

F Test for Two Population Variances

$$F = \frac{S_1^2}{S_2^2}$$

$$df_{numerator} = \nu_1 = n_1 - 1$$

$$df_{denominator} = \nu_2 = n_2 - 1$$

[Figure: F Distribution with $\nu_1 = 10$ and $\nu_2 = 8$]

A Portion of the F Distribution Table for $\alpha = 0.025$

Columns: Numerator Degrees of Freedom. Rows: Denominator Degrees of Freedom.

        1       2       3       4       5       6       7       8       9
 1   647.79  799.48  864.15  899.60  921.83  937.11  948.20  956.64  963.28
 2    38.51   39.00   39.17   39.25   39.30   39.33   39.36   39.37   39.39
 3    17.44   16.04   15.44   15.10   14.88   14.73   14.62   14.54   14.47
 4    12.22   10.65    9.98    9.60    9.36    9.20    9.07    8.98    8.90
 5    10.01    8.43    7.76    7.39    7.15    6.98    6.85    6.76    6.68
 6     8.81    7.26    6.60    6.23    5.99    5.82    5.70    5.60    5.52
 7     8.07    6.54    5.89    5.52    5.29    5.12    4.99    4.90    4.82
 8     7.57    6.06    5.42    5.05    4.82    4.65    4.53    4.43    4.36
 9     7.21    5.71    5.08    4.72    4.48    4.32    4.20    4.10    4.03
10     6.94    5.46    4.83    4.47    4.24    4.07    3.95    3.85    3.78
11     6.72    5.26    4.63    4.28    4.04    3.88    3.76    3.66    3.59
12     6.55    5.10    4.47    4.12    3.89    3.73    3.61    3.51    3.44

$F_{.025,9,11} = 3.59$ (numerator df = 9, denominator df = 11)

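Testing Population Variances

Purpose: To determine if two independent populations have the same variability.

Two-tail test:
H0: $\sigma_1^2 = \sigma_2^2$
H1: $\sigma_1^2 \ne \sigma_2^2$

Lower-tail test:
H0: $\sigma_1^2 \ge \sigma_2^2$
H1: $\sigma_1^2 < \sigma_2^2$

Upper-tail test:
H0: $\sigma_1^2 \le \sigma_2^2$
H1: $\sigma_1^2 > \sigma_2^2$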

Suppose a machine produces metal sheets that are specified to be 22 millimeters thick.

Because of the machine, the operator, the raw material, the manufacturing environment, and other factors, there is variability in the thickness.

Two machines produce these sheets. Operators are concerned about the consistency of the two machines. To test consistency, they randomly sample 10 sheets produced by machine 1 and 12 sheets produced by machine 2.

The thickness measurements of sheets from each machine are given in the table on the following page. Assume sheet thickness is normally distributed in the population.

How can we test to determine whether the variance from each sample comes from the same population variance (population variances are equal) or from different population variances (population variances are not equal)?

Sheet Metal Example: Hypothesis Test for Equality of Two Population Variances

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$

$\alpha = 0.05$, $n_1 = 10$, $n_2 = 12$

$$F = \frac{S_1^2}{S_2^2}, \quad df_{numerator} = \nu_1 = n_1 - 1, \quad df_{denominator} = \nu_2 = n_2 - 1$$

$F_{.025,9,11} = 3.59$

$$F_{.975,11,9} = \frac{1}{F_{.025,9,11}} = \frac{1}{3.59} = 0.28$$

If $F < 0.28$ or $F > 3.59$, reject $H_0$.
If $0.28 \le F \le 3.59$, do not reject $H_0$.

Sheet Metal Manufacturer: Rejection Regions

Critical values: $F_{.975,11,9} = 0.28$ and $F_{.025,9,11} = 3.59$. The rejection regions are $F < 0.28$ and $F > 3.59$; the nonrejection region is $0.28 \le F \le 3.59$.

Sheet Metal Example

Machine 1

22.3   21.8   22.2
21.8   21.9   21.6
22.3   22.4
21.6   22.5

$n_1 = 10$, $S_1^2 = 0.1138$

Machine 2

22.0   22.1   21.8   21.9
22.2   22.0   21.7   21.9
22.0   22.1   21.9   22.1

$n_2 = 12$, $S_2^2 = 0.0202$

$$F = \frac{S_1^2}{S_2^2} = \frac{0.1138}{0.0202} = 5.63$$

Since $F = 5.63 > F_c = 3.59$, reject $H_0$.
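The sheet metal F ratio can be re-computed from the raw thickness measurements. This is a minimal sketch with the Python standard library; the critical values 3.59 and 0.28 are copied from the slides rather than computed, and the variable names are illustrative. With unrounded variances the ratio comes out at about 5.62; the 5.63 shown above results from rounding the two variances before dividing.

```python
from statistics import variance

# Sheet thickness measurements in millimeters.
machine_1 = [22.3, 21.8, 22.2, 21.8, 21.9, 21.6, 22.3, 22.4, 21.6, 22.5]
machine_2 = [22.0, 22.1, 21.8, 21.9, 22.2, 22.0, 21.7, 21.9, 22.0, 22.1, 21.9, 22.1]

# Sample variances (n - 1 denominator) and their ratio.
s1_sq = variance(machine_1)
s2_sq = variance(machine_2)
f_stat = s1_sq / s2_sq

# Degrees of freedom for numerator and denominator.
df_num, df_den = len(machine_1) - 1, len(machine_2) - 1

# Two-tailed critical values for alpha = .05, as quoted in the slides.
F_UPPER, F_LOWER = 3.59, 0.28
reject_h0 = f_stat > F_UPPER or f_stat < F_LOWER

print(f"S1^2 = {s1_sq:.4f}, S2^2 = {s2_sq:.4f}, F = {f_stat:.2f}, "
      f"df = ({df_num}, {df_den}), reject H0: {reject_h0}")
```

Either way the ratio is well above the upper critical value, so the conclusion that the two machines differ in variability does not depend on the rounding.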
