two-sample problems – means


Upload: kosey

Post on 22-Feb-2016


TRANSCRIPT

Page 1: Two-Sample Problems – Means

Two-Sample Problems – Means
1. Comparing two (unpaired) populations
2. Assume: 2 SRSs, independent samples, Normal populations

Make an inference for their difference: $\mu_1 - \mu_2$

Sample from population 1: $n_1, \bar{x}_1, s_1$

Sample from population 2: $n_2, \bar{x}_2, s_2$

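S.E. – standard error in the two-sample process

Confidence Interval: Estimate ± margin of error

$(\bar{x}_1 - \bar{x}_2) \pm t^* \cdot \mathrm{S.E.}$

Significance Test: $H_0: \mu_1 - \mu_2 = 0$ (or $\mu_1 = \mu_2$), where $\mu_1, \mu_2$ are the population means being tested

Test stat: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\mathrm{S.E.}}$

$\mathrm{S.E.} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

$df = \min(n_1 - 1,\ n_2 - 1)$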
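A minimal Python sketch of the two-sample t procedure above, using only the standard library. Python's standard library has no Student t distribution, so the hypothetical helper `t_cdf` approximates it by integrating the t density with Simpson's rule, and `t_star` finds the critical value by bisection. `two_sample_t` (also an illustrative name) then applies the S.E., the test statistic, and the conservative df = min(n1 − 1, n2 − 1) from the slide.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom.

    Computed as 0.5 plus the area under the t density from 0 to x,
    using Simpson's rule (steps must be even)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    area = pdf(0) + pdf(x) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 + area * h / 3

def t_star(conf, df):
    """Critical value t* leaving central area conf, found by bisection on t_cdf."""
    lo, hi = 0.0, 1000.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 0.5 + conf / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def two_sample_t(n1, xbar1, s1, n2, xbar2, s2, conf=0.95):
    """Return S.E., df, t statistic, two-sided p-value, and confidence interval."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of xbar1 - xbar2
    df = min(n1 - 1, n2 - 1)                  # conservative degrees of freedom
    t = ((xbar1 - xbar2) - 0) / se            # test stat under H0: mu1 - mu2 = 0
    p_two_sided = 2 * (1 - t_cdf(abs(t), df))
    estimate = xbar1 - xbar2
    me = t_star(conf, df) * se                # margin of error
    return se, df, t, p_two_sided, (estimate - me, estimate + me)
```

A calculator's 2-SampTTest or 2-SampTInt with Pooled: No uses a larger (Welch) df, so its p-values come out a little smaller and its intervals a little narrower than this conservative version.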

Using the Calculator – Confidence Interval:

On calculator: STAT, TESTS, 0:2-SampTInt…

Given data, need to enter: list locations, C-Level

Given stats, need to enter, for each sample: x̄, s, n, and then C-Level

Select input (Data or Stats), enter the appropriate info, then Calculate

Using the Calculator – Significance Test:

On calculator: STAT, TESTS, 4:2-SampTTest…

Given data, need to enter: list locations, Ha

Given stats, need to enter, for each sample: x̄, s, n, and then Ha

Select input (Data or Stats), enter the appropriate info, then Calculate or Draw

Output: Test stat, p-value

Ex 1. Is one model of camp stove any different from another at boiling water, at the 5% significance level?

Model 1: $n_1 = 10$, $\bar{x}_1 = 11.4$, $s_1 = 2.5$
Model 2: $n_2 = 12$, $\bar{x}_2 = 9.9$, $s_2 = 3.0$

H0:
Ha:

Test stat t:

p-value:
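A sketch of the Ex 1 calculation in Python, assuming the summary statistics read from the slide are Model 1: n = 10, x̄ = 11.4, s = 2.5 and Model 2: n = 12, x̄ = 9.9, s = 3.0. Ha is two-sided because the question asks whether the models are "any different". The `t_cdf` helper is a hypothetical Simpson's-rule approximation of the t distribution, since the standard library does not provide one.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: 0.5 plus the Simpson's-rule area from 0 to x."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    area = pdf(0) + pdf(x) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 + area * h / 3

n1, xbar1, s1 = 10, 11.4, 2.5   # Model 1
n2, xbar2, s2 = 12, 9.9, 3.0    # Model 2

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
t = (xbar1 - xbar2) / se
df = min(n1 - 1, n2 - 1)               # conservative df
p = 2 * (1 - t_cdf(abs(t), df))        # two-sided, Ha: mu1 != mu2
print(f"t = {t:.3f}, df = {df}, p-value = {p:.3f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

With these numbers t is about 1.28 on 9 df, and the two-sided p-value lands between 0.2 and 0.3, far above 0.05.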

Ex 2. Is there evidence that children get more REM sleep than adults at the 1% significance level?

Children: $n_1 = 11$, $\bar{x}_1 = 2.8$, $s_1 = 0.5$
Adults: $n_2 = 13$, $\bar{x}_2 = 2.1$, $s_2 = 0.7$

H0:
Ha:

Test stat t:

p-value:
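The same approach for Ex 2, assuming the slide's values are children: n = 11, x̄ = 2.8, s = 0.5 and adults: n = 13, x̄ = 2.1, s = 0.7. Here Ha: μ1 > μ2 is one-sided, so the p-value is the upper tail only. `t_cdf` is again a hypothetical Simpson's-rule helper.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: 0.5 plus the Simpson's-rule area from 0 to x."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    area = pdf(0) + pdf(x) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 + area * h / 3

n1, xbar1, s1 = 11, 2.8, 0.5   # children
n2, xbar2, s2 = 13, 2.1, 0.7   # adults

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
t = (xbar1 - xbar2) / se
df = min(n1 - 1, n2 - 1)        # conservative df
p = 1 - t_cdf(t, df)            # one-sided, Ha: mu1 > mu2
print(f"t = {t:.3f}, df = {df}, p-value = {p:.4f}")
print("reject H0" if p < 0.01 else "fail to reject H0")
```

Under the conservative df = 10, t is about 2.85 and the p-value falls just under 0.01.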

Ex 3. Create a 98% C.I. for estimating the mean difference in petal lengths (in cm) for two species of iris.

Iris virginica: $n_1 = 35$, $\bar{x}_1 = 5.48$, $s_1 = 0.55$
Iris setosa: $n_2 = 38$, $\bar{x}_2 = 1.49$, $s_2 = 0.21$

margin of error:

t-Interval:
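A sketch of the 98% interval for Ex 3, assuming the slide's values are Iris virginica: n = 35, x̄ = 5.48, s = 0.55 and Iris setosa: n = 38, x̄ = 1.49, s = 0.21 (cm). The hypothetical helpers `t_cdf` and `t_star` stand in for a t table, using Simpson's rule and bisection.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: 0.5 plus the Simpson's-rule area from 0 to x."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    area = pdf(0) + pdf(x) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 + area * h / 3

def t_star(conf, df):
    """Critical value with central area conf, by bisection on t_cdf."""
    low, high = 0.0, 1000.0
    for _ in range(50):
        mid = (low + high) / 2
        if t_cdf(mid, df) < 0.5 + conf / 2:
            low = mid
        else:
            high = mid
    return (low + high) / 2

n1, xbar1, s1 = 35, 5.48, 0.55   # Iris virginica, petal length (cm)
n2, xbar2, s2 = 38, 1.49, 0.21   # Iris setosa

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
df = min(n1 - 1, n2 - 1)                 # conservative df
margin = t_star(0.98, df) * se           # margin of error
estimate = xbar1 - xbar2
lo, hi = estimate - margin, estimate + margin
print(f"margin of error = {margin:.3f}, t-interval = ({lo:.3f}, {hi:.3f}) cm")
```

With df = 34, t* is about 2.44, so the margin of error is roughly 0.24 cm and the interval runs from roughly 3.75 to 4.23 cm.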

Ex 4. Is one species of iris any different from another in petal length, at the 2% significance level?

Iris virginica: $n_1 = 35$, $\bar{x}_1 = 5.48$, $s_1 = 0.55$
Iris setosa: $n_2 = 38$, $\bar{x}_2 = 1.49$, $s_2 = 0.21$

H0:
Ha:

Test stat t:

[Figure: axis marked from −4 to 4]

p-value:
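For Ex 4 the same iris summaries (as read from the slide) feed a two-sided test. A sketch, again with the hypothetical Simpson's-rule helper `t_cdf`:

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: 0.5 plus the Simpson's-rule area from 0 to x."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    area = pdf(0) + pdf(x) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 + area * h / 3

n1, xbar1, s1 = 35, 5.48, 0.55   # Iris virginica
n2, xbar2, s2 = 38, 1.49, 0.21   # Iris setosa

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
t = (xbar1 - xbar2) / se
df = min(n1 - 1, n2 - 1)
# Two-sided p-value; clamp at 0 because 1 - cdf can round to a tiny negative number
p = max(0.0, 2 * (1 - t_cdf(abs(t), df)))
print(f"t = {t:.1f}, df = {df}, p-value = {p:.2g}")
```

The statistic is around 40, so the p-value is effectively 0, far below 0.02. A two-sided test at the 2% level and a 98% confidence interval built with the same df always agree: the interval excludes 0 exactly when the test rejects H0.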

Two-Sample Problems – Proportions

Make an inference for their difference: $p_1 - p_2$

Sample from population 1: $n_1, x_1, \hat{p}_1$

Sample from population 2: $n_2, x_2, \hat{p}_2$

Using the Calculator – Confidence Interval:

On calculator: STAT, TESTS, B:2-PropZInt…

Need to enter: $n_1, x_1$, $n_2, x_2$, C-Level
Enter the appropriate info, then Calculate.

$(\hat{p}_1 - \hat{p}_2) \pm z^* \cdot \mathrm{S.E.}$

Estimate ± margin of error

Using the Calculator – Significance Test:

$H_0: p_1 - p_2 = 0$ (or $p_1 = p_2$), where $p_1, p_2$ are the population proportions being tested

On calculator: STAT, TESTS, 6:2-PropZTest…

Need to enter: $n_1, x_1$, $n_2, x_2$, and then Ha

Enter the appropriate info, then Calculate or Draw

Output: Test stat, p-value

Ex 5. Create a 95% C.I. for the difference in proportions of eggs hatched.

Nesting boxes apart/hidden: $n_1 = 478$, $x_1 = 270$ (hatched)

Nesting boxes close/visible: $n_2 = 805$, $x_2 = 270$ (hatched)

margin of error:

z-Interval:
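A Python sketch of Ex 5 using the standard unpooled standard error for a difference in proportions, sqrt(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2) (this formula is not written on the slide), with z* taken from the standard library's `statistics.NormalDist`. The counts are the slide's: 270 of 478 hatched for apart/hidden boxes and 270 of 805 for close/visible boxes.

```python
import math
from statistics import NormalDist

n1, x1 = 478, 270   # apart/hidden: eggs, hatched
n2, x2 = 805, 270   # close/visible: eggs, hatched
p1, p2 = x1 / n1, x2 / n2

# Unpooled standard error, used for confidence intervals
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_star = NormalDist().inv_cdf(0.975)        # 95% confidence -> about 1.96
me = z_star * se                            # margin of error
lo, hi = (p1 - p2) - me, (p1 - p2) + me
print(f"margin of error = {me:.4f}, z-interval = ({lo:.4f}, {hi:.4f})")
```

This gives a margin of error of about 0.055 and an interval of roughly 0.174 to 0.285, so the hidden boxes appear to hatch a higher proportion of eggs.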

Ex 6. Split 1100 potential voters into two groups: those who get a reminder to register and those who do not.

Of the 600 who got reminders, 332 registered. Of the 500 who got no reminders, 248 registered.

Is there evidence at the 1% significance level that the proportion of potential voters who registered was greater in the group that received reminders?

Group 1: $n_1 = 600$, $x_1 = 332$

Group 2: $n_2 = 500$, $x_2 = 248$

Ex 6. (continued)

H0:
Ha:

Test stat z:

p-value:
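A sketch of the Ex 6 test. For a test of H0: p1 = p2 the usual standard error pools the two samples, p̂ = (x1 + x2)/(n1 + n2); this is also what the calculator's 2-PropZTest uses, though the pooled formula itself is not on the slide. Ha: p1 > p2 is one-sided, so the p-value is the upper tail.

```python
import math
from statistics import NormalDist

n1, x1 = 600, 332   # got reminders
n2, x2 = 500, 248   # no reminders
p1, p2 = x1 / n1, x2 / n2

p_pool = (x1 + x2) / (n1 + n2)   # pooled proportion under H0: p1 = p2
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 1 - NormalDist().cdf(z)   # one-sided, Ha: p1 > p2
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.01 else "fail to reject H0")
```

Here z is about 1.90 and the p-value about 0.029: suggestive, but not significant at the 1% level.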

Ex 7. “Can people be trusted?” Among 250 18–25 year olds, 45 said “yes”. Among 280 35–45 year olds, 72 said “yes”.

Does this indicate that the proportion of trusting people is higher in the older population? Use a significance level of α = .05.

Group 1: $n_1 = 250$, $x_1 = 45$

Group 2: $n_2 = 280$, $x_2 = 72$

Ex 7. (continued)

H0:
Ha:

Test stat z:

p-value:
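The same pooled z test for Ex 7, as a sketch. Group 1 is the younger group, so "higher in the older population" means Ha: p1 < p2 and the p-value is the lower tail.

```python
import math
from statistics import NormalDist

n1, x1 = 250, 45   # 18-25 year olds who said "yes"
n2, x2 = 280, 72   # 35-45 year olds who said "yes"
p1, p2 = x1 / n1, x2 / n2

p_pool = (x1 + x2) / (n1 + n2)   # pooled proportion under H0: p1 = p2
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = NormalDist().cdf(z)    # one-sided, Ha: p1 < p2
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The sketch gives z ≈ −2.14 and a p-value near 0.016, which is below α = .05.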

Scatterplots & Correlation

Each individual in the population/sample will have two characteristics looked at, instead of one.

Goal: to be able to make accurate predictions for one variable in terms of another, based on a data set of paired values.

Variables

Explanatory (independent) variable, x, is used to predict a response.
Response (dependent) variable, y, will be the outcome from a study or experiment.

Examples: height vs. weight, age vs. memory, temperature vs. sales

Scatterplots: A plot of paired values helps to determine if a relationship exists.

Ex: variables – height (in), weight (lb)
Height Weight
72 171
65 150
68 180
70 180
72 185
66 165

[Figure: scatterplot of weight (lb) against height (in)]

Scatterplots – Features
Direction: negative, positive
Form: line, parabola, wave (sine)
Strength: how closely the points follow a pattern

[Figure: the height vs. weight scatterplot]

Direction:
Form:
Strength:

Scatterplots – Temp vs Oil used

[Figure: scatterplot of temperature vs. oil used]

Direction:
Form:
Strength:

Correlation
Correlation, r, measures the strength of the linear relationship between two variables.

r > 0: positive direction
r < 0: negative direction

Close to +1:
Close to −1:
Close to 0:
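A minimal sketch of computing r in Python, applied to the height/weight data from the scatterplot example. It uses the standard definition r = (1/(n − 1)) Σ z_x·z_y with sample standard deviations; the function name `correlation` is just an illustrative choice.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r: sum of products of z-scores, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)   # sample standard deviations
    n = len(xs)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

heights = [72, 65, 68, 70, 72, 66]          # x, inches
weights = [171, 150, 180, 180, 185, 165]    # y, pounds
r = correlation(heights, weights)
print(f"r = {r:.2f}")   # prints r = 0.75
```

A value of about 0.75 indicates a positive, fairly strong linear relationship for this small sample.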

[Figure: scatterplots labeled with correlations r = .85, −.02, .13, −.79]

Lines - Review

y = a + bx

[Figure: graph of an example line for x = 1 to 4]

a:

b:

Regression

Looking at a scatterplot, if the form seems linear, then use a linear model, or regression line, to describe how a response variable y changes as an explanatory variable x changes.

Regression models are often used to predict the value of a response variable for a given value of the explanatory variable.

Least-Squares Regression Line

The line that best fits the data: $\hat{y} = a + bx$

where: $b = r\dfrac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$
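A sketch of these formulas in Python, fitted to the height/weight data from the scatterplot example: b = r·s_y/s_x and a = ȳ − b·x̄. The function name `lsrl` is illustrative.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Least-squares line y-hat = a + b*x using b = r*sy/sx and a = ybar - b*xbar."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    r = sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)
    b = r * sy / sx    # slope
    a = my - b * mx    # intercept: the line passes through (xbar, ybar)
    return a, b

heights = [72, 65, 68, 70, 72, 66]
weights = [171, 150, 180, 180, 185, 165]
a, b = lsrl(heights, weights)
print(f"y-hat = {a:.2f} + {b:.3f}x")
```

For these six people the line is roughly ŷ = −50.5 + 3.23x: each extra inch of height goes with about 3.2 lb more predicted weight. The negative intercept has no practical meaning, since a height of 0 is far outside the data.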

Example: Fat and calories for 11 fast food chicken sandwiches

Fat: $\bar{x} = 20.6$, $s_x = 9.8$
Calories: $\bar{y} = 472.7$, $s_y = 144.2$
$r = .947$

[Figure: scatterplot of calories against fat for the 11 chicken sandwiches]

Example – continued: $\hat{y} = 185.65 + 13.93x$

What is the slope and what does it mean?

What is the intercept and what does it mean?

How many calories would you predict a sandwich with 40 grams of fat has?
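A quick check of the continued example in Python, using only the summary statistics from the chicken sandwich slide. It recomputes the slope and intercept from b = r·s_y/s_x and a = ȳ − b·x̄, then evaluates the line at 40 g of fat.

```python
r, xbar, sx, ybar, sy = 0.947, 20.6, 9.8, 472.7, 144.2   # fat (g) and calories

b = r * sy / sx          # slope: predicted calories per additional gram of fat
a = ybar - b * xbar      # intercept: predicted calories at 0 g of fat
y_hat_40 = a + b * 40    # prediction for a sandwich with 40 g of fat
print(f"b = {b:.2f}, a = {a:.2f}, y-hat(40) = {y_hat_40:.0f}")
```

The slope is about 13.93 calories per additional gram of fat, the intercept about 185.65 calories (the predicted value at 0 g of fat), and the prediction at 40 g is about 743 calories.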

Why “Least-squares”? The least-squares line is the line that minimizes the sum of the squared residuals.

Residual: difference between the actual and predicted values, $y - \hat{y}$

[Figure: scatterplot with a fitted line, showing the residuals]

x y ŷ y − ŷ
1 10 14 −4
3 25 24 1
… … … …
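A small Python sketch of residuals and of what "least squares" means. It computes y − ŷ for the two rows in the table, then fits a line to the height/weight data with the equivalent slope formula b = Sxy/Sxx (sums of cross-products and squared deviations; this notation is not used on the slides) and checks that nudging the line away from the fit increases the sum of squared residuals. The helper names are illustrative.

```python
from statistics import mean

# (x, y, y-hat) rows from the residual table
rows = [(1, 10, 14), (3, 25, 24)]
print([y - y_hat for _, y, y_hat in rows])   # residual = actual - predicted: [-4, 1]

def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Intercept and slope that minimize sse, with b = Sxy / Sxx."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

heights = [72, 65, 68, 70, 72, 66]
weights = [171, 150, 180, 180, 185, 165]
a, b = least_squares(heights, weights)
best = sse(heights, weights, a, b)

# Any other line (here: small shifts in intercept or slope) has a larger SSE
nudged = [sse(heights, weights, a + da, b + db)
          for da, db in [(1, 0), (-1, 0), (0, 0.1), (0, -0.1)]]
print(f"least-squares SSE = {best:.1f}", all(s > best for s in nudged))
```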

Scatterplots – Residuals

To double-check the appropriateness of using a linear regression model, plot the residuals against the explanatory variable.

If the residual plot shows no unusual pattern, a linear model is a good fit.

Other things to look for: The squared correlation, r², gives the percent of variation in y explained by the regression line.

Chicken data: $r = .947$
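A sketch of the r² calculation for the chicken data (r = .947). It also checks, on the height/weight data, the identity behind the "percent of variation explained" reading: r² = 1 − SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares of y about ȳ.

```python
from statistics import mean

r_chicken = 0.947
print(f"chicken data: r^2 = {r_chicken ** 2:.3f}")   # prints 0.897, i.e. about 89.7%

# Check r^2 = 1 - SSE/SST on the height/weight data
heights = [72, 65, 68, 70, 72, 66]
weights = [171, 150, 180, 180, 185, 165]
mx, my = mean(heights), mean(weights)
sxy = sum((x - mx) * (y - my) for x, y in zip(heights, weights))
sxx = sum((x - mx) ** 2 for x in heights)
syy = sum((y - my) ** 2 for y in weights)            # SST
r = sxy / (sxx * syy) ** 0.5
b = sxy / sxx
a = my - b * mx
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(heights, weights))
print(f"r^2 = {r ** 2:.4f}, 1 - SSE/SST = {1 - sse / syy:.4f}")
```

For the chicken sandwiches, r² ≈ 0.897, so about 89.7% of the variation in calories is explained by the regression on fat.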

Other things to look for: Influential observations:

Prediction vs. Causation: x and y are linked (associated) somehow, but we don’t say “x causes y to occur”. Other forces may be causing the relationship (lurking variables).

Extrapolation: using the regression line to make a prediction outside the range of values of the explanatory variable.

age weight
20 180
25 190
32 190
36 200
36 225
40 215
47 220

[Figure: scatterplot of weight against age (axes: age 15–50, weight 160–230) with linear trendline f(x) = 1.6126x + 148.4887, R² = 0.7157]
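A Python sketch that refits the age/weight data and flags extrapolation. The fitted coefficients should match the chart's f(x) = 1.6126x + 148.4887; the `predict` helper is hypothetical and simply reports whether the requested age lies outside the observed range of 20 to 47.

```python
from statistics import mean

ages = [20, 25, 32, 36, 36, 40, 47]
weights = [180, 190, 190, 200, 225, 215, 220]

mx, my = mean(ages), mean(weights)
b = sum((x - mx) * (y - my) for x, y in zip(ages, weights)) / sum((x - mx) ** 2 for x in ages)
a = my - b * mx
print(f"y-hat = {a:.4f} + {b:.4f}x")   # matches the chart's trendline

def predict(x):
    """Predicted weight at age x, plus a flag that is True when x is outside the data."""
    return a + b * x, not (min(ages) <= x <= max(ages))

for age in (30, 80):
    y_hat, extrapolated = predict(age)
    print(age, round(y_hat, 1), "extrapolation!" if extrapolated else "within data range")
```

A prediction at age 80 (about 277.5) is computed just as easily as one at age 30 (about 196.9), but only the second is supported by the data.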

On calculator

Set up: 2nd 0 (CATALOG), x⁻¹ (D), scroll down to “DiagnosticOn”, Enter, Enter

Scatterplots: 2nd Y= (STAT PLOT), 1, On, select Type and list locations for the x values and y values. Then ZOOM, 9 (ZoomStat)

Regression: STAT, CALC, 8: LinReg(a+bx), Enter, list location for x, list location for y, Enter

Graph: Y=, enter the line into Y1

Examples:

Animal: Cat Chick Dog Duck Goat Lion Bird Pig Bunny Squirrel
x (incubation, days): 63 22 63 28 151 108 18 115 31 44
y (lifespan, years): 11 7.5 11 10 12 10 8 10 7 9

x (age, years): 2 5 2 5 4 5 1 1 4 2 6 1
y (resale, thousands $): 16 11 17 10 12 11 20 19 10 16 11 20
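A sketch that runs both example data sets through the least-squares formulas, which can serve as a check on calculator LinReg(a+bx) output. The `fit` helper is illustrative; the data are copied from the slide.

```python
from statistics import mean

def fit(xs, ys):
    """Return (r, a, b) for the least-squares line y-hat = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    return sxy / (sxx * syy) ** 0.5, my - b * mx, b

incubation = [63, 22, 63, 28, 151, 108, 18, 115, 31, 44]   # days
lifespan = [11, 7.5, 11, 10, 12, 10, 8, 10, 7, 9]          # years
age = [2, 5, 2, 5, 4, 5, 1, 1, 4, 2, 6, 1]                 # years
resale = [16, 11, 17, 10, 12, 11, 20, 19, 10, 16, 11, 20]  # thousands of $

for name, xs, ys in [("lifespan vs incubation", incubation, lifespan),
                     ("resale vs age", age, resale)]:
    r, a, b = fit(xs, ys)
    print(f"{name}: r = {r:.3f}, y-hat = {a:.2f} + {b:.3f}x")
```

Lifespan and incubation come out positively associated (r ≈ 0.73), while resale value drops with age (r ≈ −0.95).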

Contingency Tables

• Contingency tables summarize all outcomes
– Row variable: one row for each possible value
– Column variable: one column for each possible value
– Each cell (i, j) describes the number of individuals with those values for the respective variables.

Making comparisons between two categorical variables

Age\Income <15 15-30 >30 Total
<21 5 3 1 9
21-25 4 9 6 19
>25 2 2 8 12
Total 11 14 15 40

• Info from the table
– # who are over 25 and make under $15,000:
– % who are over 25 and make under $15,000:
– % who are over 25:
– % of the over-25 group who make under $15,000:

Age\Income <15 15-30 >30 Total
<21 5 3 1 9
21-25 4 9 6 19
>25 2 2 8 12
Total 11 14 15 40
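The four questions above can be answered directly from the counts. A sketch in Python, with the table stored as a nested dictionary (income categories taken to be in thousands of dollars, as the $15,000 in the questions suggests):

```python
# Age/income counts from the table (income categories in thousands of $)
table = {
    "<21": {"<15": 5, "15-30": 3, ">30": 1},
    "21-25": {"<15": 4, "15-30": 9, ">30": 6},
    ">25": {"<15": 2, "15-30": 2, ">30": 8},
}
total = sum(sum(row.values()) for row in table.values())   # grand total: 40
over25_under15 = table[">25"]["<15"]                        # joint count
over25 = sum(table[">25"].values())                         # row total for >25

print(over25_under15)                      # prints 2
print(f"{over25_under15 / total:.1%}")     # prints 5.0% (of everyone)
print(f"{over25 / total:.1%}")             # prints 30.0% (are over 25)
print(f"{over25_under15 / over25:.1%}")    # prints 16.7% (of the over-25 group)
```

The last two questions differ only in the denominator: the whole sample versus the over-25 row.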

Marginal Distributions
– Look to the margins of the table for each individual variable’s distribution
– Marginal distribution for age:
– Marginal distribution for income:

Age\Income <15 15-30 >30 Total
<21 5 3 1 9
21-25 4 9 6 19
>25 2 2 8 12
Total 11 14 15 40

Age Freq. Rel. Freq.
<21 9
21-25 19
>25 12
Total 40

Income <15 15-30 >30 Total
Freq. 11 14 15 40
Rel. Freq.
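A sketch of the marginal distributions: each row total (age) and each column total (income) divided by the grand total of 40.

```python
table = {
    "<21": {"<15": 5, "15-30": 3, ">30": 1},
    "21-25": {"<15": 4, "15-30": 9, ">30": 6},
    ">25": {"<15": 2, "15-30": 2, ">30": 8},
}
incomes = ["<15", "15-30", ">30"]
n = sum(sum(row.values()) for row in table.values())

# Marginal distribution of age: row totals / n
age_marginal = {age: sum(row.values()) / n for age, row in table.items()}
# Marginal distribution of income: column totals / n
income_marginal = {inc: sum(row[inc] for row in table.values()) / n for inc in incomes}

print(age_marginal)      # {'<21': 0.225, '21-25': 0.475, '>25': 0.3}
print(income_marginal)   # {'<15': 0.275, '15-30': 0.35, '>30': 0.375}
```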

Conditional Distributions
– Look at one variable’s distribution given another
– How does income vary over the different age groups?
– Consider each age group as a separate population and compute relative frequencies:

Age\Income <15 15-30 >30 Total
<21 5 3 1 9
21-25 4 9 6 19
>25 2 2 8 12

Age\Income <15 15-30 >30 Total
<21
21-25
>25
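A sketch of the conditional distribution of income within each age group, dividing each count by its row total:

```python
table = {
    "<21": {"<15": 5, "15-30": 3, ">30": 1},
    "21-25": {"<15": 4, "15-30": 9, ">30": 6},
    ">25": {"<15": 2, "15-30": 2, ">30": 8},
}

# Conditional distribution of income given age: count / row total
conditional = {
    age: {inc: count / sum(row.values()) for inc, count in row.items()}
    for age, row in table.items()
}
for age, dist in conditional.items():
    print(age, {inc: round(p, 3) for inc, p in dist.items()})
```

For example, 5/9 ≈ .556 of the under-21 group make under $15,000, but only 2/12 ≈ .167 of the over-25 group do.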

Independence Revisited

Two variables are independent if knowledge of one does not affect the chances of the other.

In terms of contingency tables, this means that the conditional distribution of one variable is (almost) the same for all values of the other variable.

In the age/income example, the conditional distributions are not even close. These variables are not independent: there is some association between age and income.

Test for Independence

Is there an association between two variables?
– H0: The variables are ____ (the two variables are ____)
– Ha: The variables ____ (the two variables are ____)

Assuming independence:
– Expected number in each cell (i, j): (% of value i for variable 1) × (% of value j for variable 2) × (sample size) =

Example of Computing Expected Values

Observed:
Rh\Blood A B AB O Total
+ 176 28 22 198 424
− 30 12 4 30 76
Total 206 40 26 228 500

Expected number in cell (A, +):

Expected:
Rh\Blood A B AB O Total
+ __ __ 22.048 193.344 424
− __ __ 3.952 34.656 76
Total 206 40 26 228 500
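Since (% of value i) × (% of value j) × (sample size) = (row total / n) × (column total / n) × n = row total × column total / n, the whole expected table takes a few lines. A sketch for the blood type data:

```python
observed = [
    [176, 28, 22, 198],   # Rh +  (columns: A, B, AB, O)
    [30, 12, 4, 30],      # Rh -
]
row_totals = [sum(row) for row in observed]          # 424, 76
col_totals = [sum(col) for col in zip(*observed)]    # 206, 40, 26, 228
n = sum(row_totals)                                  # 500

# Expected count for cell (i, j) under independence
expected = [[r * c / n for c in col_totals] for r in row_totals]
for row in expected:
    print([round(e, 3) for e in row])
```

For cell (A, +) this gives 424 × 206 / 500 = 174.688; the other cells follow the same pattern and match the 22.048, 193.344, 3.952, and 34.656 already in the table.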

Chi-square statistic

To measure the difference between the observed table and the expected table, we use the chi-square test statistic:

$\chi^2 = \sum \dfrac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$

where the summation occurs over each cell in the table.

1. Skewed right
2. df = (r − 1)(c − 1), for r rows and c columns
3. Right-tailed test
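A standard-library sketch of the χ² statistic and its right-tailed p-value, applied to the blood type table. Python's standard library has no chi-square distribution, so the hypothetical helper `chi2_sf` computes the upper tail from the power series of the regularized incomplete gamma function; `chi_square_test` is also an illustrative name.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(X2 >= x) for a chi-square distribution with df degrees of freedom.

    Uses 1 - P(df/2, x/2), with the lower regularized gamma P from its power series."""
    s, z = df / 2, x / 2
    if z <= 0:
        return 1.0
    term = math.exp(s * math.log(z) - z - math.lgamma(s + 1))
    total, k = term, 1
    while term > 1e-15 * total:
        term *= z / (s + k)
        total += term
        k += 1
    return max(0.0, 1.0 - total)

def chi_square_test(observed):
    """Return (chi-square statistic, df, right-tailed p-value) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = sum(
        (obs - r * c / n) ** 2 / (r * c / n)      # (observed - expected)^2 / expected
        for row, r in zip(observed, row_totals)
        for obs, c in zip(row, col_totals)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)

stat, df, p = chi_square_test([[176, 28, 22, 198], [30, 12, 4, 30]])
print(f"X2 = {stat:.2f}, df = {df}, p-value = {p:.3f}")
```

For these data χ² ≈ 7.60 with df = (2 − 1)(4 − 1) = 3, and the p-value is about 0.055, just above 0.05.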

Test for Independence – Steps

State the variables being tested.

State hypotheses: H0, the null hypothesis: the variables are independent. Ha, the alternative: the variables are not independent.

Compute the test statistic: if the null hypothesis is true, where does the sample fall? Test stat = χ² score

Compute the p-value: what is the probability of seeing a test stat as extreme as that, or more extreme?

Conclusion: small p-values give strong evidence against H0.

Significance Test – on the calculator

On calculator: STAT, TESTS, C:χ²-Test…, Observed: [A], Expected: [B]

Enter the observed info into matrix A, then perform the test with Calculate or Draw.

Output: Test stat, p-value, df

To enter the observed info into matrix A: 2nd, x⁻¹ (MATRIX), EDIT, 1: [A], change the dimensions, enter the info in each cell.

Ex. Test whether blood type and Rh factor are independent at a 5% significance level.

H0:
Ha:

Test stat χ²:
p-value:

Conclusion:

Ex. Test whether age and stance on marijuana legalization are associated.

H0:
Ha:

Test stat χ²:
p-value:
Conclusion:

stance\age 18-29 30-49 50+ Total
for 172 313 258 743
against 52 103 119 274
Total 224 416 377 1017
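The same χ² sketch applied to the marijuana legalization table, with the hypothetical helpers `chi2_sf` and `chi_square_test` repeated so the block runs on its own. The slide does not state a significance level; the comparison below uses 5% only as an example.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(X2 >= x), via the power series of the regularized gamma function."""
    s, z = df / 2, x / 2
    if z <= 0:
        return 1.0
    term = math.exp(s * math.log(z) - z - math.lgamma(s + 1))
    total, k = term, 1
    while term > 1e-15 * total:
        term *= z / (s + k)
        total += term
        k += 1
    return max(0.0, 1.0 - total)

def chi_square_test(observed):
    """Return (chi-square statistic, df, right-tailed p-value) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for obs, c in zip(row, col_totals)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)

observed = [
    [172, 313, 258],   # for:     18-29, 30-49, 50+
    [52, 103, 119],    # against
]
stat, df, p = chi_square_test(observed)
print(f"X2 = {stat:.2f}, df = {df}, p-value = {p:.3f}")
print("associated at the 5% level" if p < 0.05 else "no evidence of association at 5%")
```

χ² ≈ 6.68 on df = 2 gives a p-value of about 0.035.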

Additional Examples

personality\college Health Science Lib Arts Educator
extrovert 68 56 62 47
introvert 94 81 45 66

Job grade\marital status Single Married Divorced
1 58 874 15
2 222 3450 60
3 74 1204 93

City size\practice status Government Judicial Private Salaried
<250,000 30 44 258 36
250-500,000 79 102 651 90
>500,000 22 34 127 23