reading: course-pack chapters 17 – 18, 23 - 26 –sampling distribution for proportions and means...


Upload: marian-gibson

Post on 12-Jan-2016


Reading: Course-Pack Chapters 17 – 18, 23 - 26

– SAMPLING DISTRIBUTION FOR PROPORTIONS AND MEANS

– CONFIDENCE INTERVALS FOR PROPORTIONS AND MEANS

– HYPOTHESIS TESTING FOR PROPORTIONS AND MEANS


SAMPLING DISTRIBUTION MODELS

• SAMPLING DISTRIBUTION MODEL FOR A PROPORTION

PROBLEM FORMULATION: SUPPOSE THAT p IS AN UNKNOWN PROPORTION OF ELEMENTS OF A CERTAIN TYPE S IN A POPULATION.

EXAMPLES

• PROPORTION OF LEFT-HANDED PEOPLE;

• PROPORTION OF HIGH SCHOOL STUDENTS WHO ARE FAILING A READING TEST;

• PROPORTION OF VOTERS WHO WILL VOTE FOR MR. X.


ESTIMATION OF p

• TO ESTIMATE p, WE SELECT A SIMPLE RANDOM SAMPLE (SRS), OF SIZE SAY, n = 1000, AND COMPUTE THE SAMPLE PROPORTION.

• SUPPOSE THE NUMBER OF ELEMENTS OF THE TYPE WE ARE INTERESTED IN, IN THIS SAMPLE OF n = 1000, IS x = 437. THEN THE SAMPLE PROPORTION IS COMPUTED USING THE FORMULA

$\hat{p} = \dfrac{x}{n}$

IN THE EXAMPLE ABOVE

$\hat{p} = \dfrac{437}{1000} = 43.7\%$


WHAT IS THE ERROR OF ESTIMATION?

• THAT IS, WHAT IS $\hat{p} - p$?

• WHAT MODEL CAN HELP US FIND THE BEST ESTIMATE OF THE TRUE PROPORTION p?

• LET’S START THE ANALYSIS BY FIRST ANSWERING THE SECOND QUESTION.


APPROACH

• SUPPOSE THAT WE TAKE A SECOND SAMPLE OF SIZE 1000 AND COMPUTE P(HAT); THE NEW ESTIMATE WILL MOST LIKELY BE DIFFERENT FROM 0.437. NOW, TAKE A THIRD SAMPLE, A FOURTH SAMPLE, AND SO ON UNTIL THE TWO-THOUSANDTH (2000TH) SAMPLE, EACH OF SIZE 1000. WE WILL OBTAIN TWO THOUSAND P(HATS), MANY OF THEM DIFFERENT FROM ONE ANOTHER, AS ILLUSTRATED IN THE TABLE BELOW.


TABLE OF 2000 SAMPLES, EACH OF SIZE n = 1000, AND THEIR CORRESPONDING P(HATS)

SAMPLE (OF SIZE n)    P(HAT)
1                     $\hat{p}_1$
2                     $\hat{p}_2$
…                     …
2000                  $\hat{p}_{2000}$


WHAT DO WE DO WITH THE DATA FOR P(HATS)?

• WE CONSTRUCT A HISTOGRAM OF THESE 2000 P(HATS).

[FIGURE: HISTOGRAM OF THE P(HATS); VERTICAL AXIS: # OF SAMPLES; HORIZONTAL AXIS: P(HATS), WITH p MARKED]


• THE HISTOGRAM ABOVE IS AN EXAMPLE OF WHAT WE WOULD GET IF WE COULD SEE ALL THE PROPORTIONS FROM ALL POSSIBLE SAMPLES. THAT DISTRIBUTION HAS A SPECIAL NAME. IT IS CALLED THE SAMPLING DISTRIBUTION OF THE PROPORTIONS.

• OBSERVE THAT THE HISTOGRAM IS UNIMODAL, ROUGHLY SYMMETRIC, AND CENTERED AT p.
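The repeated-sampling idea above can be tried out by simulation. The sketch below is only an illustration: it assumes a hypothetical true proportion p = 0.437 (in practice p is unknown), and the function name `simulate_sample_proportions` and the seed are made up for this example.

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n; return the list of sample proportions."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        # Each draw is a "success" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats


# Hypothetical "true" proportion, assumed only so that we can simulate.
p_hats = simulate_sample_proportions(p=0.437, n=1000, num_samples=2000)
center = statistics.mean(p_hats)
spread = statistics.stdev(p_hats)
print(f"mean of p-hats = {center:.4f}, SD of p-hats = {spread:.4f}")
```

If the simulation behaves as the slides describe, the mean of the simulated P(HATS) should sit close to the assumed p, and a histogram of `p_hats` should look unimodal and roughly symmetric.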


WHAT MODEL DOES THE SHAPE OF THE HISTOGRAM SUGGEST FOR SAMPLE PROPORTIONS?

• ANSWER: IT IS AMAZING AND FORTUNATE THAT A NORMAL MODEL IS JUST THE RIGHT ONE FOR THE HISTOGRAMS OF SAMPLE PROPORTIONS.

• HOW GOOD IS THE NORMAL MODEL?

– IT IS GOOD IF THE FOLLOWING ASSUMPTIONS AND CONDITIONS HOLD.


ASSUMPTIONS AND CONDITIONS

• ASSUMPTIONS

• INDEPENDENCE ASSUMPTION: THE SAMPLED VALUES MUST BE INDEPENDENT OF EACH OTHER.

• SAMPLE SIZE ASSUMPTION: THE SAMPLE SIZE, n, MUST BE LARGE ENOUGH.

• REMARK: ASSUMPTIONS ARE HARD, OFTEN IMPOSSIBLE, TO CHECK. THAT’S WHY WE ASSUME THEM. FORTUNATELY, SOME CONDITIONS MAY PROVIDE INFORMATION ABOUT THE ASSUMPTIONS.


CONDITIONS

• RANDOMIZATION CONDITION: THE DATA VALUES MUST BE SAMPLED RANDOMLY. IF POSSIBLE, USE SIMPLE RANDOM SAMPLING DESIGN TO SAMPLE THE POPULATION OF INTEREST.

• 10% CONDITION: THE SAMPLE SIZE, n, MUST BE NO LARGER THAN 10% OF THE POPULATION OF INTEREST.

• SUCCESS/FAILURE CONDITION: THE SAMPLE SIZE HAS TO BE BIG ENOUGH SO THAT WE EXPECT AT LEAST 10 SUCCESSES AND AT LEAST 10 FAILURES. THAT IS,

$np \geq 10$ (SUCCESS)

$nq \geq 10$ (FAILURE)


THE CENTRAL LIMIT THEOREM FOR THE SAMPLING DISTRIBUTION OF A PROPORTION

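• FOR A LARGE SAMPLE SIZE n, THE SAMPLING DISTRIBUTION OF P(HAT) IS APPROXIMATELY

$N\left(p, \sqrt{\dfrac{pq}{n}}\right)$

THAT IS, P(HAT) IS APPROXIMATELY NORMAL WITH

MEAN: $E(\hat{p}) = p$

STANDARD DEVIATION: $SD(\hat{p}) = \sqrt{\dfrac{pq}{n}}$, WHERE $q = 1 - p$.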


EXAMPLE 1

• ASSUME THAT 30% OF STUDENTS AT A UNIVERSITY WEAR CONTACT LENSES

• (A) WE RANDOMLY PICK 100 STUDENTS. LET P(HAT) REPRESENT THE PROPORTION OF STUDENTS IN THIS SAMPLE WHO WEAR CONTACTS. WHAT’S THE APPROPRIATE MODEL FOR THE DISTRIBUTION OF P(HAT)? SPECIFY THE NAME OF THE DISTRIBUTION, THE MEAN, AND THE STANDARD DEVIATION. BE SURE TO VERIFY THAT THE CONDITIONS ARE MET.

• (B) WHAT’S THE APPROXIMATE PROBABILITY THAT MORE THAN ONE THIRD OF THIS SAMPLE WEAR CONTACTS?


SOLUTION TO EXAMPLE 1
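The transcript does not carry the worked solution, so here is one possible way to check the numbers for Example 1. It is a sketch that assumes the Normal model with mean p and standard deviation sqrt(pq/n); the variable names are chosen for this illustration only.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.30, 100
q = 1 - p

# Success/failure condition: expect at least 10 successes and 10 failures.
conditions_ok = n * p >= 10 and n * q >= 10

# (A) Normal model for p-hat: mean p, standard deviation sqrt(pq/n).
sd_p_hat = sqrt(p * q / n)
model = NormalDist(mu=p, sigma=sd_p_hat)

# (B) P(p-hat > 1/3) is the upper-tail area beyond 1/3.
prob_more_than_third = 1 - model.cdf(1 / 3)

print(f"conditions met: {conditions_ok}")
print(f"SD(p-hat) = {sd_p_hat:.4f}")
print(f"P(p-hat > 1/3) = {prob_more_than_third:.4f}")
```

The randomization and 10% conditions still have to be argued from the context (a random sample of 100 students from a university that is presumably much larger than 1000 students); the code only checks the success/failure condition.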


EXAMPLE 2

• INFORMATION ON A PACKET OF SEEDS CLAIMS THAT THE GERMINATION RATE IS 92%. WHAT’S THE PROBABILITY THAT MORE THAN 95% OF THE 160 SEEDS IN THE PACKET WILL GERMINATE? BE SURE TO DISCUSS YOUR ASSUMPTIONS AND CHECK THE CONDITIONS THAT SUPPORT YOUR MODEL.

• SOLUTION
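The solution itself is not in the transcript. A minimal sketch of the calculation, assuming the seeds in the packet behave like a random sample from a much larger production run (an assumption that has to be justified separately), could look like this:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.92, 160  # claimed germination rate and packet size
q = 1 - p

# Success/failure condition: np and nq should both be at least 10.
expected_successes = n * p
expected_failures = n * q
conditions_ok = expected_successes >= 10 and expected_failures >= 10

# Normal model for the sample proportion of germinating seeds.
sd_p_hat = sqrt(p * q / n)
model = NormalDist(mu=p, sigma=sd_p_hat)

# Probability that more than 95% of the 160 seeds germinate.
prob_over_95 = 1 - model.cdf(0.95)

print(f"np = {expected_successes:.1f}, nq = {expected_failures:.1f}")
print(f"P(p-hat > 0.95) = {prob_over_95:.4f}")
```

Note that nq is only a little above 10 here, so the Normal approximation is usable but not generous.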


SAMPLING DISTRIBUTION OF THE SAMPLE MEAN

APPROACH FOR ESTIMATING $\bar{X}$

SAME AS FOR THE SAMPLING DISTRIBUTION FOR PROPORTIONS ILLUSTRATED ABOVE.

RECALL THAT $\bar{X} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$

ASSUMPTIONS AND CONDITIONS

• ASSUMPTIONS

• INDEPENDENCE ASSUMPTION: THE SAMPLED VALUES MUST BE INDEPENDENT OF EACH OTHER.

• SAMPLE SIZE ASSUMPTION: THE SAMPLE SIZE MUST BE SUFFICIENTLY LARGE.

• REMARK: WE CANNOT CHECK THESE DIRECTLY, BUT WE CAN THINK ABOUT WHETHER THE INDEPENDENCE ASSUMPTION IS PLAUSIBLE.


CONDITIONS

• RANDOMIZATION CONDITION: THE DATA VALUES MUST BE SAMPLED RANDOMLY, OR THE CONCEPT OF A SAMPLING DISTRIBUTION MAKES NO SENSE. IF POSSIBLE, USE A SIMPLE RANDOM SAMPLING DESIGN TO OBTAIN THE SAMPLE.

• 10% CONDITION: WHEN THE SAMPLE IS DRAWN WITHOUT REPLACEMENT (AS IS USUALLY THE CASE), THE SAMPLE SIZE, n, SHOULD BE NO MORE THAN 10% OF THE POPULATION.

• LARGE ENOUGH SAMPLE CONDITION: IF THE POPULATION IS UNIMODAL AND SYMMETRIC, EVEN A FAIRLY SMALL SAMPLE IS OKAY. IF THE POPULATION IS STRONGLY SKEWED, IT CAN TAKE A PRETTY LARGE SAMPLE TO ALLOW USE OF A NORMAL MODEL TO DESCRIBE THE DISTRIBUTION OF SAMPLE MEANS


CENTRAL LIMIT THEOREM FOR THE SAMPLING DISTRIBUTION FOR MEANS

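• FOR A LARGE ENOUGH SAMPLE SIZE, n, THE SAMPLING DISTRIBUTION OF THE SAMPLE MEAN $\bar{X}$ IS APPROXIMATELY

$N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$

• THAT IS, NORMAL WITH

MEAN: $E(\bar{x}) = \mu$ (POPULATION MEAN)

STANDARD DEVIATION: $SD(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$, WHERE $\sigma$ IS THE POPULATION STANDARD DEVIATION.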


EXAMPLE 3

• SUPPOSE THE MEAN ADULT WEIGHT, $\mu$, IS 175 POUNDS WITH STANDARD DEVIATION, $\sigma$, OF 25 POUNDS. AN ELEVATOR HAS A WEIGHT LIMIT OF 10 PERSONS OR 2000 POUNDS. WHAT IS THE PROBABILITY THAT 10 PEOPLE WHO GET ON THE ELEVATOR OVERLOAD ITS WEIGHT LIMIT?

• SOLUTION
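The worked solution is missing from the transcript. One way to set it up, sketched under the assumption that the 10 riders act like a random sample of adults and that adult weights are close enough to Normal for a sample of only 10, is to note that a total above 2000 pounds is the same event as a sample mean above 200 pounds:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 175.0, 25.0   # population mean and SD of adult weight (pounds)
n = 10                    # riders on the elevator
limit_total = 2000.0      # weight limit (pounds)

# Total > 2000 pounds  <=>  mean weight > 2000 / 10 = 200 pounds.
limit_mean = limit_total / n

# Sampling distribution of the mean: N(mu, sigma / sqrt(n)).
sd_mean = sigma / sqrt(n)
model = NormalDist(mu=mu, sigma=sd_mean)

prob_overload = 1 - model.cdf(limit_mean)
z = (limit_mean - mu) / sd_mean

print(f"SD of the sample mean = {sd_mean:.3f}")
print(f"z = {z:.3f}, P(overload) = {prob_overload:.5f}")
```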


EXAMPLE 4

• STATISTICS FROM CORNELL’S NORTHEAST REGIONAL CLIMATE CENTER INDICATE THAT ITHACA, NY, GETS AN AVERAGE OF 35.4 INCHES OF RAIN EACH YEAR, WITH A STANDARD DEVIATION OF 4.2 INCHES. ASSUME THAT A NORMAL MODEL APPLIES.

• (A) DURING WHAT PERCENTAGE OF YEARS DOES ITHACA GET MORE THAN 40 INCHES OF RAIN?

• (B) LESS THAN HOW MUCH RAIN FALLS IN THE DRIEST 20% OF ALL YEARS?

• (C) A CORNELL UNIVERSITY STUDENT IS IN ITHACA FOR 4 YEARS. LET y (bar) REPRESENT THE MEAN AMOUNT OF RAIN FOR THOSE 4 YEARS. DESCRIBE THE SAMPLING DISTRIBUTION MODEL OF THIS SAMPLE MEAN, y (bar).

• (D) WHAT’S THE PROBABILITY THAT THOSE 4 YEARS AVERAGE LESS THAN 30 INCHES OF RAIN?


SOLUTION TO EXAMPLE 4
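The slide with the solution did not survive extraction. The following sketch shows one way to compute the four parts, taking the stated Normal model at face value and treating the 4 years as independent (an assumption):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 35.4, 4.2            # yearly rainfall in inches
yearly = NormalDist(mu=mu, sigma=sigma)

# (A) Fraction of years with more than 40 inches of rain.
prob_over_40 = 1 - yearly.cdf(40)

# (B) Cutoff below which the driest 20% of years fall.
driest_20_cutoff = yearly.inv_cdf(0.20)

# (C) Sampling distribution of the 4-year mean: N(mu, sigma / sqrt(4)).
n = 4
mean_model = NormalDist(mu=mu, sigma=sigma / sqrt(n))

# (D) Probability that the 4-year mean is below 30 inches.
prob_mean_under_30 = mean_model.cdf(30)

print(f"(A) {prob_over_40:.4f}")
print(f"(B) {driest_20_cutoff:.2f} inches")
print(f"(C) N({mean_model.mean}, {mean_model.stdev:.2f})")
print(f"(D) {prob_mean_under_30:.4f}")
```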


CONFIDENCE INTERVALS FOR PROPORTIONS

ESTIMATION

POINT ESTIMATION PRODUCES A NUMBER (AN ESTIMATE) THAT IS BELIEVED TO BE CLOSE TO THE VALUE OF THE UNKNOWN PARAMETER.

FOR EXAMPLE: A CONCLUSION MAY BE THAT “THE PROPORTION p OF LEFT-HANDED STUDENTS AT MSU IS APPROXIMATELY 0.46.”


SOME POINT ESTIMATORS

PARAMETER                        ESTIMATOR
PROPORTION $p$                   $\hat{p}$
MEAN $\mu$                       $\bar{X}$
STANDARD DEVIATION $\sigma$      $S$


INTERVAL ESTIMATION

• PRODUCES AN INTERVAL THAT CONTAINS THE ESTIMATED PARAMETER WITH A PRESCRIBED CONFIDENCE.

• A CONFIDENCE INTERVAL OFTEN HAS THE FORM:

POINT ESTIMATE $\pm$ MARGIN OF ERROR (ME)


DEFINITION

• GIVEN A CONFIDENCE LEVEL C%, THE CRITICAL VALUE IS THE NUMBER $C^*$ SO THAT THE AREA UNDER THE PROPER CURVE AND BETWEEN $-C^*$ AND $C^*$ IS C (IN DECIMALS).


SOME CRITICAL VALUES FOR STANDARD NORMAL DISTRIBUTION

C% CONFIDENCE LEVEL    CRITICAL VALUE $z^*$
80%                    1.282
90%                    1.645
95%                    1.960
98%                    2.326
99%                    2.576
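These critical values can be reproduced from the definition: the area between $-z^*$ and $z^*$ is C, so the area to the left of $z^*$ is (1 + C)/2. A small sketch using Python's standard library (the helper name `z_critical` is made up for this illustration):

```python
from statistics import NormalDist


def z_critical(confidence):
    """Return z* such that the area between -z* and z* equals `confidence`."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z* = {z_critical(level):.3f}")
```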


WHAT DOES C% CONFIDENCE REALLY MEAN?

• FORMALLY, WHAT WE MEAN IS THAT C% OF SAMPLES OF THIS SIZE WILL PRODUCE CONFIDENCE INTERVALS THAT CAPTURE THE TRUE PROPORTION.

• C% CONFIDENCE MEANS THAT, ON AVERAGE, IN C OUT OF 100 ESTIMATIONS, THE INTERVAL WILL CONTAIN THE TRUE VALUE OF THE ESTIMATED PARAMETER.

• E.G., 95% CONFIDENCE MEANS THAT, ON AVERAGE, IN 95 OUT OF 100 ESTIMATIONS, THE INTERVAL WILL CONTAIN THE TRUE VALUE OF THE ESTIMATED PARAMETER.


CONFIDENCE INTERVAL FOR PROPORTION P [ONE-PROPORTION Z-INTERVAL]

ASSUMPTIONS AND CONDITIONS

• RANDOMIZATION CONDITION

• 10% CONDITION

• SAMPLE SIZE ASSUMPTION OR SUCCESS/FAILURE CONDITION

• INDEPENDENCE ASSUMPTION

• NOTE: PROPER RANDOMIZATION CAN HELP ENSURE INDEPENDENCE.


CONSTRUCTING CONFIDENCE INTERVALS

ESTIMATOR: SAMPLE PROPORTION $\hat{p}$

STANDARD ERROR: $SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

C% MARGIN OF ERROR: $ME(\hat{p}) = z^* \cdot SE(\hat{p})$

C% CONFIDENCE INTERVAL: $\hat{p} \pm ME(\hat{p})$


SAMPLE SIZE NEEDED TO PRODUCE A CONFIDENCE INTERVAL WITH A GIVEN MARGIN OF ERROR, ME

$ME(\hat{p}) = z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

SOLVING FOR n GIVES

$n = \dfrac{(z^*)^2 \hat{p}\hat{q}}{(ME)^2}$

WHERE $\hat{p}$ AND $\hat{q}$ ARE REASONABLE GUESSES. IF WE CANNOT MAKE A GUESS, WE TAKE $\hat{p} = \hat{q} = 0.5$.
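As an illustration of the sample-size formula, the sketch below computes n for a hypothetical target margin of error of 3 percentage points at 95% confidence with no prior guess (so the guess 0.5 is used). The target values and the helper name are assumptions made for this example; the result is rounded up because n must be a whole number.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, confidence, p_guess=0.5):
    """Smallest n with z* * sqrt(p*q/n) <= margin_of_error for the guessed p."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    n = z_star**2 * p_guess * (1 - p_guess) / margin_of_error**2
    return ceil(n)


# Hypothetical target: ME = 0.03 at 95% confidence, no prior guess for p.
n_needed = sample_size_for_proportion(margin_of_error=0.03, confidence=0.95)
print(n_needed)
```

Using 0.5 as the guess is the cautious choice, since the product of the guessed proportion and its complement is largest there and any other guess yields a smaller n.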


EXAMPLE 1

A MAY 2002 GALLUP POLL FOUND THAT ONLY 8% OF A RANDOM SAMPLE OF 1012 ADULTS APPROVED OF ATTEMPTS TO CLONE A HUMAN.

(A) FIND THE MARGIN OF ERROR FOR THIS POLL IF WE WANT 95% CONFIDENCE IN OUR ESTIMATE OF THE PERCENT OF AMERICAN ADULTS WHO APPROVE OF CLONING HUMANS.

(B) EXPLAIN WHAT THAT MARGIN OF ERROR MEANS.

(C) IF WE ONLY NEED TO BE 90% CONFIDENT, WILL THE MARGIN OF ERROR BE LARGER OR SMALLER? EXPLAIN.

(D) FIND THAT MARGIN OF ERROR.

(E) IN GENERAL, IF ALL OTHER ASPECTS OF THE SITUATION REMAIN THE SAME, WOULD SMALLER SAMPLES PRODUCE SMALLER OR LARGER MARGINS OF ERROR?


SOLUTION
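The solution slide is empty in the transcript. A hedged sketch of the numerical parts (A), (C), (D) and (E) of the Gallup example follows; the helper name `margin_of_error` and the smaller sample size of 253 used for part (E) are illustrative assumptions, not part of the original.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence):
    """z* times the standard error sqrt(p_hat * q_hat / n)."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return z_star * sqrt(p_hat * (1 - p_hat) / n)


p_hat, n = 0.08, 1012

me_95 = margin_of_error(p_hat, n, 0.95)   # part (A)
me_90 = margin_of_error(p_hat, n, 0.90)   # parts (C) and (D)

# Part (E): same p-hat and confidence, but a hypothetical smaller sample.
me_95_small_sample = margin_of_error(p_hat, 253, 0.95)

print(f"ME at 95%: {me_95:.4f}")
print(f"ME at 90%: {me_90:.4f}")
print(f"ME at 95% with n = 253: {me_95_small_sample:.4f}")
```

Lower confidence means a smaller critical value and therefore a smaller margin of error, while a smaller sample inflates the standard error and therefore the margin of error.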


EXAMPLE 2

DIRECT MAIL ADVERTISERS SEND SOLICITATIONS (a.k.a. “junk mail”) TO THOUSANDS OF POTENTIAL CUSTOMERS IN THE HOPE THAT SOME WILL BUY THE COMPANY’S PRODUCT. THE RESPONSE RATE IS USUALLY QUITE LOW. SUPPOSE A COMPANY WANTS TO TEST THE RESPONSE TO A NEW FLYER, AND SENDS IT TO 1000 PEOPLE RANDOMLY SELECTED FROM THEIR MAILING LIST OF OVER 200,000 PEOPLE. THEY GET ORDERS FROM 123 OF THE RECIPIENTS.

(A) CREATE A 90% CONFIDENCE INTERVAL FOR THE PERCENTAGE OF PEOPLE THE COMPANY CONTACTS WHO MAY BUY SOMETHING.

(B) EXPLAIN WHAT THIS INTERVAL MEANS.

(C) EXPLAIN WHAT “90% CONFIDENCE” MEANS.

(D) THE COMPANY MUST DECIDE WHETHER TO NOW DO A MASS MAILING. THE MAILING WON’T BE COST-EFFECTIVE UNLESS IT PRODUCES AT LEAST A 5% RETURN. WHAT DOES YOUR CONFIDENCE INTERVAL SUGGEST? EXPLAIN.


SOLUTION
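No solution text survives here, so the following is only a sketch of parts (A) and (D), using the one-proportion z-interval with 123 orders out of 1000 mailings. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_interval(successes, n, confidence):
    """Return (low, high) for p-hat +/- z* * sqrt(p-hat * q-hat / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    me = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me


# (A) 90% confidence interval for the response rate.
low, high = one_proportion_z_interval(successes=123, n=1000, confidence=0.90)

# (D) The mailing needs at least a 5% return; compare with the whole interval.
whole_interval_above_5_percent = low > 0.05

print(f"90% CI: ({low:.3f}, {high:.3f})")
print(f"entire interval above 5%: {whole_interval_above_5_percent}")
```

The conditions are plausible in this setting: the recipients were randomly selected, 1000 is well under 10% of a list of over 200,000, and there are 123 successes and 877 failures.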


EXAMPLE 3

IN 1998 A SAN DIEGO REPRODUCTIVE CLINIC REPORTED 49 BIRTHS TO 207 WOMEN UNDER THE AGE OF 40 WHO HAD PREVIOUSLY BEEN UNABLE TO CONCEIVE.

(A) FIND A 90% CONFIDENCE INTERVAL FOR THE SUCCESS RATE AT THIS CLINIC.

(B) INTERPRET YOUR INTERVAL IN THIS CONTEXT.

(C) EXPLAIN WHAT “90% CONFIDENCE” MEANS.

(D) WOULD IT BE MISLEADING FOR THE CLINIC TO ADVERTISE A 25% SUCCESS RATE? EXPLAIN.

(E) THE CLINIC WANTS TO CUT THE STATED MARGIN OF ERROR IN HALF. HOW MANY PATIENTS’ RESULTS MUST BE USED?

(F) DO YOU HAVE ANY CONCERNS ABOUT THIS SAMPLE? EXPLAIN.


SOLUTION
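The original solution is not included in the transcript. As a rough sketch of the numerical parts (A), (D) and (E), assuming the 207 patients can be treated as a random sample (which part (F) asks the reader to question):

```python
from math import sqrt
from statistics import NormalDist

births, n = 49, 207
p_hat = births / n

# (A) 90% confidence leaves 5% in each tail, so z* is the 95th percentile.
z_star = NormalDist().inv_cdf(0.95)
me = z_star * sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - me, p_hat + me

# (D) Is a 25% success rate plausible? Check whether it lies in the interval.
claim_is_plausible = low <= 0.25 <= high

# (E) ME is proportional to 1/sqrt(n), so halving ME takes about 4 times
# as many patients (assuming p-hat stays about the same).
n_for_half_me = 4 * n

print(f"p-hat = {p_hat:.3f}, 90% CI: ({low:.3f}, {high:.3f})")
print(f"25% inside the interval: {claim_is_plausible}")
print(f"patients needed to halve ME: about {n_for_half_me}")
```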


INFERENCES ABOUT MEANS


ASSUMPTIONS AND CONDITIONS

• INDEPENDENCE ASSUMPTION: THE DATA VALUES SHOULD BE INDEPENDENT. THERE’S REALLY NO WAY TO CHECK INDEPENDENCE OF THE DATA BY LOOKING AT THE SAMPLE, BUT WE SHOULD THINK ABOUT WHETHER THE ASSUMPTION IS REASONABLE.

• RANDOMIZATION CONDITION: THE DATA SHOULD ARISE FROM A RANDOM SAMPLE OR A SUITABLY RANDOMIZED EXPERIMENT.


ASSUMPTIONS AND CONDITIONS

• 10% CONDITION: THE SAMPLE IS NO MORE THAN 10% OF THE POPULATION.

• NORMAL POPULATION ASSUMPTION OR NEARLY NORMAL CONDITION: THE DATA COME FROM A DISTRIBUTION THAT IS UNIMODAL AND SYMMETRIC. REMARK: CHECK THIS CONDITION BY MAKING A HISTOGRAM OR NORMAL PROBABILITY PLOT.


CONSTRUCTING CONFIDENCE INTERVALS FOR MEANS

• POINT ESTIMATOR: $\bar{x}$

• STANDARD ERROR: $SE(\bar{x}) = \dfrac{s}{\sqrt{n}}$

• C% MARGIN OF ERROR: $ME = t^*_{n-1} \cdot SE(\bar{x})$

WHERE $t^*_{n-1}$ IS A CRITICAL VALUE FOR STUDENT’S t-MODEL WITH n – 1 DEGREES OF FREEDOM THAT CORRESPONDS TO THE C% CONFIDENCE LEVEL.

SOLVING FOR n GIVES

$n = \dfrac{(t^*_{n-1})^2 s^2}{(ME)^2}$

REMARK


ILLUSTRATIVE PICTURE


FINDING CRITICAL t - VALUES

• Using t tables (Table T) and/or calculator, find or estimate the

• 1. critical value t7* for 90% confidence level if number of degrees of freedom is 7

• 2. one tail probability if t = 2.56 and number of degrees of freedom is 7

• 3. two tail probability if t = 2.56 and number of degrees of freedom is 7

• NOTE: If t has a Student's t-distribution with degrees of freedom, df, then TI-83 function tcdf(a,b,df) , computes the area under the t-curve and between a and b.
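For readers without a t table or a TI-83, the three quantities above can be approximated with a short standard-library sketch. This is a stand-in written for illustration, not the calculator's algorithm: it integrates the Student's t density numerically with Simpson's rule and finds the critical value by bisection. The function names `t_pdf`, `t_cdf` and `t_critical` are made up here.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(confidence, df):
    """Bisection for t* with area `confidence` between -t* and t*."""
    target = (1 + confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t_star = t_critical(0.90, 7)          # item 1
one_tail = 1 - t_cdf(2.56, 7)         # item 2
two_tail = 2 * one_tail               # item 3
print(f"t7* = {t_star:.3f}, one tail = {one_tail:.4f}, two tail = {two_tail:.4f}")
```

The one-tail value plays the role of tcdf(2.56, b, 7) with a very large upper bound b.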


EXAMPLES FROM PRACTICE EXERCISES SHEET 7


TESTING HYPOTHESES ABOUT PROPORTIONS

• PROBLEM

• SUPPOSE WE TOSSED A COIN 100 TIMES AND WE OBTAINED 38 HEADS AND 62 TAILS. IS THE COIN BIASED?

• THERE IS NO WAY TO SAY YES OR NO WITH 100% CERTAINTY. BUT WE MAY EVALUATE THE STRENGTH OF SUPPORT FOR THE HYPOTHESIS THAT “THE COIN IS BIASED.”


TESTING

• HYPOTHESES

NULL HYPOTHESIS $H_0$

– ESTABLISHED FACT;

– A STATEMENT THAT WE EXPECT DATA TO CONTRADICT;

– NO CHANGE OF PARAMETERS.

ALTERNATIVE HYPOTHESIS $H_A$

– NEW CONJECTURE;

– YOUR CLAIM;

– A STATEMENT THAT NEEDS STRONG SUPPORT FROM DATA TO CLAIM IT;

– CHANGE OF PARAMETERS.


IN OUR PROBLEM

$H_0$: COIN IS FAIR; $p = 0.5$

$H_A$: COIN IS BIASED; $p \neq 0.5$

WHERE p IS THE PROBABILITY THAT THE COIN TURNS “HEADS.”


EXAMPLE

• WRITE THE NULL AND ALTERNATIVE HYPOTHESES YOU WOULD USE TO TEST EACH OF THE FOLLOWING SITUATIONS.

• (A) IN THE 1950s ONLY ABOUT 40% OF HIGH SCHOOL GRADUATES WENT ON TO COLLEGE. HAS THE PERCENTAGE CHANGED?

• (B) 20% OF CARS OF A CERTAIN MODEL HAVE NEEDED COSTLY TRANSMISSION WORK AFTER BEING DRIVEN BETWEEN 50,000 AND 100,000 MILES. THE MANUFACTURER HOPES THAT REDESIGN OF A TRANSMISSION COMPONENT HAS SOLVED THIS PROBLEM.

• (C) WE FIELD TEST A NEW FLAVOR SOFT DRINK, PLANNING TO MARKET IT ONLY IF WE ARE SURE THAT OVER 60% OF THE PEOPLE LIKE THE FLAVOR.


ATTITUDE

• ASSUME THAT THE NULL HYPOTHESIS $H_0$ IS TRUE AND UPHOLD IT, UNLESS DATA STRONGLY SPEAKS AGAINST IT.


TEST MECHANICS

• FROM DATA, COMPUTE THE VALUE OF A PROPER TEST STATISTIC, THAT IS, THE z-STATISTIC.

• IF IT IS FAR FROM WHAT IS EXPECTED UNDER THE NULL HYPOTHESIS ASSUMPTION, THEN WE REJECT THE NULL HYPOTHESIS.


COMPUTATION OF THE z-STATISTIC OR PROPER TEST STATISTIC

$z = \dfrac{\hat{p} - p_0}{SD(\hat{p})}$, WHERE $SD(\hat{p}) = \sqrt{\dfrac{p_0 q_0}{n}}$.


CONSIDERING THE EXAMPLE AT THE BEGINNING:

$\hat{p} = 0.38$, $p_0 = 0.5$, $SD(\hat{p}) = \sqrt{\dfrac{0.5(0.5)}{100}} = 0.05$,

AND $z = \dfrac{0.38 - 0.50}{0.05} = -2.4$


THE P – VALUE AND ITS COMPUTATION

• THE P – VALUE IS THE PROBABILITY THAT, IF THE NULL HYPOTHESIS IS CORRECT, THE TEST STATISTIC TAKES THE OBSERVED VALUE OR A MORE EXTREME ONE.

• P – VALUE MEASURES THE STRENGTH OF EVIDENCE AGAINST THE NULL HYPOTHESIS. THE SMALLER THE P – VALUE, THE STRONGER THE EVIDENCE AGAINST THE NULL HYPOTHESIS.


THE WAY THE ALTERNATIVE HYPOTHESIS IS WRITTEN IS HELPFUL IN COMPUTING THE P - VALUE

$H_A$                   P-VALUE
$H_A: p > p_0$          $P(z > z_0)$
$H_A: p < p_0$          $P(z < z_0)$
$H_A: p \neq p_0$       $2P(z > |z_0|)$

[FIGURE: NORMAL CURVE FOR EACH ALTERNATIVE]


IN OUR EXAMPLE,

• P – VALUE = P( z < - 2.4) = 0.0082

• INTERPRETATION: IF THE COIN IS FAIR, THEN THE PROBABILITY OF OBSERVING 38 OR FEWER HEADS IN 100 TOSSES IS 0.0082
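The coin example can be reproduced numerically. The sketch below recomputes the z-statistic and the lower-tail probability quoted above; it also shows, as an aside that is not on the slide, the doubled value that a two-sided alternative would use.

```python
from math import sqrt
from statistics import NormalDist

n, heads, p0 = 100, 38, 0.5

p_hat = heads / n
sd_p_hat = sqrt(p0 * (1 - p0) / n)   # SD under the null hypothesis
z = (p_hat - p0) / sd_p_hat

standard_normal = NormalDist()
lower_tail = standard_normal.cdf(z)            # P(z < -2.4)
two_sided = 2 * standard_normal.cdf(-abs(z))   # both tails, for comparison

print(f"z = {z:.2f}")
print(f"P(z < z0) = {lower_tail:.4f}")
print(f"two-sided p-value = {two_sided:.4f}")
```

Either way, the value falls below the usual 0.05 level, so the conclusion about the coin does not depend on which tail convention is used in this example.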


CONCLUSION: GIVEN SIGNIFICANCE LEVEL $\alpha$ = 0.05

• WE REJECT THE NULL HYPOTHESIS IF THE P – VALUE IS LESS THAN THE SIGNIFICANCE LEVEL OR ALPHA LEVEL.

• WE FAIL TO REJECT THE NULL HYPOTHESIS (I.E. WE RETAIN THE NULL HYPOTHESIS) IF THE P – VALUE IS GREATER THAN THE SIGNIFICANCE LEVEL OR ALPHA LEVEL.


ASSUMPTIONS AND CONDITIONS

• RANDOMIZATION

• INDEPENDENT OBSERVATIONS

• 10% CONDITION

• SUCCESS/FAILURE CONDITION


EXAMPLE 1

• THE NATIONAL CENTER FOR EDUCATION STATISTICS MONITORS MANY ASPECTS OF ELEMENTARY AND SECONDARY EDUCATION NATIONWIDE. THEIR 1996 NUMBERS ARE OFTEN USED AS A BASELINE TO ASSESS CHANGES. IN 1996, 31% OF STUDENTS REPORTED THAT THEIR MOTHERS HAD GRADUATED FROM COLLEGE. IN 2000, RESPONSES FROM 8368 STUDENTS FOUND THAT THIS FIGURE HAD GROWN TO 32%. IS THIS EVIDENCE OF A CHANGE IN EDUCATION LEVEL AMONG MOTHERS?


EXAMPLE 1 CONT’D

• (A) WRITE APPROPRIATE HYPOTHESES.

• (B) CHECK THE ASSUMPTIONS AND CONDITIONS.

• (C) PERFORM THE TEST AND FIND THE P – VALUE.

• (D) STATE YOUR CONCLUSION.

• (E) DO YOU THINK THIS DIFFERENCE IS MEANINGFUL? EXPLAIN.


SOLUTION
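The worked solution is absent from the transcript. A sketch of the mechanics for part (C), assuming a two-sided alternative (the question asks about "a change") and that the 8368 students can be treated as a random sample:

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.31      # 1996 baseline proportion
p_hat = 0.32   # proportion observed in 2000
n = 8368

# Success/failure condition under the null hypothesis.
conditions_ok = n * p0 >= 10 and n * (1 - p0) >= 10

sd_p_hat = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / sd_p_hat

# Two-sided p-value: area in both tails beyond |z|.
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

With a sample this large, even a one-point change can produce a small p-value, which is why part (E) asks separately whether the difference is meaningful.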


EXAMPLE 2

• IN THE 1980s IT WAS GENERALLY BELIEVED THAT CONGENITAL ABNORMALITIES AFFECTED ABOUT 5% OF THE NATION’S CHILDREN. SOME PEOPLE BELIEVE THAT THE INCREASE IN THE NUMBER OF CHEMICALS IN THE ENVIRONMENT HAS LED TO AN INCREASE IN THE INCIDENCE OF ABNORMALITIES. A RECENT STUDY EXAMINED 384 CHILDREN AND FOUND THAT 46 OF THEM SHOWED SIGNS OF AN ABNORMALITY. IS THIS STRONG EVIDENCE THAT THE RISK HAS INCREASED? ( WE CONSIDER A P – VALUE OF AROUND 5% TO REPRESENT STRONG EVIDENCE.)


EXAMPLE 2 CONT’D

• (A) WRITE APPROPRIATE HYPOTHESES.

• (B) CHECK THE NECESSARY ASSUMPTIONS.

• (C) PERFORM THE MECHANICS OF THE TEST. WHAT IS THE P – VALUE?

• (D) EXPLAIN CAREFULLY WHAT THE P – VALUE MEANS IN THIS CONTEXT.

• (E) WHAT’S YOUR CONCLUSION?

• (F) DO ENVIRONMENTAL CHEMICALS CAUSE CONGENITAL ABNORMALITIES?


SOLUTION
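The solution slide is empty in the transcript. As a sketch of the mechanics for part (C), assuming an upper-tail alternative (the claim is that the rate has increased) and that the 384 children behave like a random sample:

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.05            # rate believed in the 1980s
n, affected = 384, 46
p_hat = affected / n

# Success/failure condition under the null hypothesis.
conditions_ok = n * p0 >= 10 and n * (1 - p0) >= 10

sd_p_hat = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / sd_p_hat

# Upper-tail p-value; cdf(-z) avoids the rounding loss of 1 - cdf(z).
p_value = NormalDist().cdf(-z)

print(f"p-hat = {p_hat:.4f}, z = {z:.2f}, p-value = {p_value:.2e}")
```

A tiny p-value is evidence that the rate is higher than 5%, but it says nothing about the cause, which is the point of part (F).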


INFERENCES ABOUT MEANS

• TESTING HYPOTHESES ABOUT MEANS

• ONE – SAMPLE t – TEST FOR MEANS

• PROBLEM

• TEST $H_0: \mu = \mu_0$

ASSUMPTIONS AND CONDITIONS

• INDEPENDENCE ASSUMPTION

• RANDOMIZATION CONDITION

• 10% CONDITION

• NEARLY NORMAL CONDITION OR LARGE SAMPLE


STEPS IN TESTING

• NULL HYPOTHESIS

• $H_0: \mu = \mu_0$

• ALTERNATIVE HYPOTHESIS

• $H_A: \mu > \mu_0$

• or $H_A: \mu < \mu_0$

• or $H_A: \mu \neq \mu_0$

ATTITUDE: Assume that the null hypothesis HO is true and uphold it, unless data strongly speaks against it.

• STANDARD ERROR: $SE(\bar{x}) = \dfrac{s}{\sqrt{n}}$

• TEST STATISTIC: $t = \dfrac{\bar{x} - \mu_0}{SE(\bar{x})}$

• t HAS STUDENT’S t – DISTRIBUTION WITH n – 1 DEGREES OF FREEDOM.

P-VALUE: LET $t_0$ BE THE OBSERVED VALUE OF THE TEST STATISTIC.

$H_A$                       P-VALUE
$H_A: \mu > \mu_0$          $P(t > t_0)$
$H_A: \mu < \mu_0$          $P(t < t_0)$
$H_A: \mu \neq \mu_0$       $P(t > |t_0|) + P(t < -|t_0|)$

[FIGURE: SHAPE OF DISTRIBUTION FOR EACH ALTERNATIVE]
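To make the one-sample t-test steps concrete, here is a small sketch. The summary statistics (sample mean 52.3, sample SD 6.0, n = 25, hypothesized mean 50) are hypothetical numbers invented for illustration, and the t-distribution areas are approximated by numerical integration rather than taken from a table; the function names are likewise made up.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def one_sample_t(x_bar, s, n, mu0):
    """Return the observed t and the p-value for each form of the alternative."""
    se = s / sqrt(n)                 # standard error of the sample mean
    t0 = (x_bar - mu0) / se          # test statistic
    df = n - 1
    upper = 1 - t_cdf(t0, df)        # H_A: mu > mu0
    lower = t_cdf(t0, df)            # H_A: mu < mu0
    both = 2 * (1 - t_cdf(abs(t0), df))  # H_A: mu != mu0
    return t0, {"greater": upper, "less": lower, "two-sided": both}


# Hypothetical summary statistics, for illustration only.
t0, p_values = one_sample_t(x_bar=52.3, s=6.0, n=25, mu0=50.0)
print(f"t0 = {t0:.3f}")
for alternative, p in p_values.items():
    print(f"{alternative}: p-value = {p:.4f}")
```

The decision rule is then the same as for proportions: reject the null hypothesis when the p-value for the chosen alternative is below the significance level.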

CONCLUSION
