stat 13, intro. to statistical methods for the life and ...frederic/13/sum17/day06.pdf · stat 13,...

Upload: phungkhanh

Post on 19-Mar-2018

Page 1: Stat 13, Intro. to Statistical Methods for the Life and ...frederic/13/sum17/day06.pdf · Stat 13, Intro. to Statistical Methods for the Life and Health ... , since in that explanation,

Stat 13, Intro. to Statistical Methods for the Life and Health Sciences.

1. Collect hw2.
2. More problems with studies, coverage, adherer bias and clofibrate example.
3. More about confounding factors.
4. Confounding and lefties example.
5. Comparing two proportions using numerical and visual summaries, good or bad year example.
6. Comparing 2 proportions with CIs + testing using simulation, dolphin example.
7. Comparing 2 props. with theory-based testing, smoking and gender example.
8. Five number summary, IQR, and geysers.
9. Comparing two means with simulations and bicycling to work example.

Read ch5 and 6. The midterm will be on ch 1-6. http://www.stat.ucla.edu/~frederic/13/sum17 .
Bring a PENCIL and CALCULATOR and any books or notes you want to the midterm and final.

HW3 4.CE.10, 5.3.28, 6.1.17, and 6.3.14.
4.CE.10 starts out "Studies have shown that children in the U.S. who have been spanked have a significantly lower IQ score on average...."
5.3.28 starts out "Recall the data from the Physicians' Health Study: Of the 11,034 physicians who took the placebo ...."
6.1.17 starts out "The graph below displays the distribution of word lengths ...."
6.3.14 starts out "In an article titled 'Unilateral Nostril Breathing Influences Lateralized Cognitive Performance' that appeared ...."

1. Hand in HW2!

2. More problems with studies, coverage, adherer bias and Clofibrate.

Surveys are observational.
• Coverage is a common issue. Coverage is the extent to which the people you sampled from represent the overall population. A survey at a fancy research hospital in a wealthy neighborhood may yield patients with higher incomes, higher education, etc.

• Non-response bias is another common problem. Poor coverage means the people getting the survey do not represent the general population. Non-response bias means that out of the people you gave the survey to, the people actually filling it out and submitting it are different from the people who did not.

• Same exact issues in web surveys.

More problems with studies, and Clofibrate example.

Non-response bias is similar to adherer bias, in experiments. A drug called clofibrate was tested on 3,892 middle-aged men with heart trouble. It was supposed to prevent heart attacks. 1,103 assigned at random to take clofibrate, 2,789 to placebo (lactose) group. Subjects were followed for 5 years. Is this an experiment or an observational study?

Clofibrate      patients who died during followup
adherers        15%
non-adherers    25%
total           20%

It is an experiment. Does Clofibrate work?

Clofibrate      patients who died during followup
adherers        15%
non-adherers    25%
total           20%

Clofibrate      patients who died during followup
adherers        15%
non-adherers    25%
total           20%
--------------------------------------------------
Placebo
adherers        15%
non-adherers    28%
total           21%

Those who took clofibrate did much better than those who didn't keep taking clofibrate. Does this mean clofibrate works?

Those who adhered to placebo also did much better than those who stopped adhering.

All in all there was little difference between the two groups.

Adherers did better than non-adherers, not because of clofibrate, but because they were healthier in general. Why?
• adherers are the type to engage in healthier behavior.
• sick patients are less likely to adhere.

3. More about confounding factors.
• By a confounding factor, we mean an alternative explanation that could explain the apparent relationship between the two variables, even if they are not causally related. Typically this is done by finding another difference between the treatment and control group. For instance, different studies have examined smokers and non-smokers and have found that smokers have higher rates of liver cancer. One explanation would be that smoking causes liver cancer. But is there any other, alternative explanation?

• One alternative would be that the smokers tend to drink more alcohol, and it is the alcohol, not the smoking, that causes liver cancer.

3. More about confounding factors.
• Another plausible explanation is that the smokers are probably older on average than the non-smokers, and older people are more at risk for all sorts of cancer than younger people.

• Another might be that smokers engage in other unhealthy activities more than non-smokers.

• Note that if one said that “smoking makes you want to drink alcohol which causes liver cancer,” that would not be a valid confounding factor, since in that explanation, smoking effectively is causally related to liver cancer risk.

3. More about confounding factors.
• A confounding factor must be plausibly linked to both the explanatory and response variables. So for instance saying “perhaps a higher proportion of the smokers are men” would not be a very convincing confounding factor, unless you have some reason to think gender is strongly linked to liver cancer.

• Another example: left-handedness and age at death. Psychologists Diane Halpern and Stanley Coren looked at 1,000 death records of those who died in Southern California in the late 1980s and early 1990s and contacted relatives to see if the deceased were right-handed or left-handed. They found that the average age at death of the left-handed was 66, and for the right-handed it was 75. Their results were published in prestigious scientific journals, Nature and the New England Journal of Medicine.

More about confounding factors.

All sorts of causal conclusions were made about how this shows that the stress of being left-handed in our right-handed world leads to premature death.

More about confounding factors.
• Is this an observational study or an experiment? It is an observational study.
• Are there plausible confounding factors you can think of?

More about confounding factors.
• A confounding factor is the age of the two populations in general. Lefties in the 1980s were on average younger than righties. Many old lefties were converted to righties at infancy, in the early 20th century, but this practice has subsided. Thus in the 1980s and 1990s, there were relatively few old lefties but many young lefties in the overall population. This alone explains the discrepancy.

Unit 2. Comparing Two Groups

• In Unit 1, we learned the basic process of statistical inference using tests and confidence intervals. We did all this by focusing on a single proportion.

• In Unit 2, we will take these ideas and extend them to comparing two groups. We will compare two proportions, two independent means, and paired data.

5. Comparing two proportions using numerical and visual summaries, and the good or bad year example.

Section 5.1

Example 5.1: Positive and Negative Perceptions

• Consider these two questions:
– Are you having a good year?
– Are you having a bad year?

• Do people answer each question in such a way that would indicate the same answer? (e.g. Yes for the first one and No for the second.)

Positive and Negative Perceptions

• Researchers questioned 30 students (randomly giving them one of the two questions).

• They then recorded if a positive or negative response was given.

• They wanted to see if the wording of the question influenced the answers.

Positive and negative perceptions

• Observational units
– The 30 students

• Variables
– Question wording (good year or bad year)
– Perception of their year (positive or negative)

• Which is the explanatory variable and which is the response variable?

• Is this an observational study or experiment?

Raw Data in a Spreadsheet

Individual  Type of Question  Response    Individual  Type of Question  Response
1           Good Year         Positive    16          Good Year         Positive
2           Good Year         Negative    17          Bad Year          Positive
3           Bad Year          Positive    18          Good Year         Positive
4           Good Year         Positive    19          Good Year         Positive
5           Good Year         Negative    20          Good Year         Positive
6           Bad Year          Positive    21          Bad Year          Negative
7           Good Year         Positive    22          Good Year         Positive
8           Good Year         Positive    23          Bad Year          Negative
9           Good Year         Positive    24          Good Year         Positive
10          Bad Year          Negative    25          Bad Year          Negative
11          Good Year         Negative    26          Good Year         Positive
12          Bad Year          Negative    27          Bad Year          Negative
13          Good Year         Positive    28          Good Year         Positive
14          Bad Year          Negative    29          Bad Year          Positive
15          Good Year         Positive    30          Bad Year          Negative

Two-Way Tables

• A two-way table organizes data
– Summarizes two categorical variables
– Also called contingency table

• Are students more likely to give a positive response if they were given the good year question?

                    Good Year   Bad Year   Total
Positive response   15          4          19
Negative response   3           8          11
Total               18          12         30

Two-Way Tables

• Conditional proportions will help us better determine if there is an association between the question asked and the type of response.

• We can see that the subjects with the positive question were more likely to respond positively.

                    Good Year       Bad Year       Total
Positive response   15/18 ≈ 0.83    4/12 ≈ 0.33    19
Negative response   3               8              11
Total               18              12             30
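
As a rough illustration of the conditional proportions above, the following Python sketch recomputes them from the four cell counts. The dictionary layout and the function name `conditional_proportions` are assumptions made for this example; they are not part of the course materials.

```python
# Cell counts from the two-way table, grouped by the question each student received.
counts = {
    "Good Year": {"Positive": 15, "Negative": 3},
    "Bad Year": {"Positive": 4, "Negative": 8},
}


def conditional_proportions(table):
    """Return, for each question group, the proportion giving each response."""
    result = {}
    for group, responses in table.items():
        group_total = sum(responses.values())
        result[group] = {resp: n / group_total for resp, n in responses.items()}
    return result


props = conditional_proportions(counts)
for group, responses in props.items():
    print(group, {resp: round(p, 2) for resp, p in responses.items()})
```

Conditioning on the column (the question asked) rather than the row is what lets the two groups be compared even though they have different sizes (18 and 12).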

Segmented Bar Graphs

• We can also use segmented bar graphs to see this association between the "good year" question and a positive response.

Statistic

                    Good Year   Bad Year   Total
Positive response   15 (83%)    4 (33%)    19
Negative response   3           8          11
Total               18          12         30

• The statistic we will mainly use to summarize this table is the difference in proportions of positive responses, which is 0.83 − 0.33 = 0.50.

Another Statistic

                    Good Year   Bad Year   Total
Positive response   15 (83%)    4 (33%)    19
Negative response   3           8          11
Total               18          12         30

• Another statistic that is often used, called relative risk, is the ratio of the proportions: 0.83/0.33 = 2.5.

• We can say that those who were given the good year question were 2.5 times as likely to give a positive response.
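
The two summary statistics for this table, the difference in proportions and the relative risk, can be checked with a few lines of Python. This is only a sketch; the helper name `diff_and_relative_risk` and its argument order are assumptions for illustration.

```python
def diff_and_relative_risk(successes_1, n_1, successes_2, n_2):
    """Return (difference in proportions, relative risk) for group 1 vs. group 2."""
    p1 = successes_1 / n_1
    p2 = successes_2 / n_2
    return p1 - p2, p1 / p2


# Group 1: good year question (15 of 18 positive).
# Group 2: bad year question (4 of 12 positive).
diff, rr = diff_and_relative_risk(15, 18, 4, 12)
print(f"difference = {diff:.2f}, relative risk = {rr:.1f}")
# prints: difference = 0.50, relative risk = 2.5
```

The difference is on an additive scale (50 percentage points), while the relative risk is a ratio, so the two can give quite different impressions when the proportions are both small.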

No Association

• For data to show no association, the proportions of positive responses should be the same for those getting each question type.

• Since the overall positive response was 19/30 (63%), if there is no association we should have 63% of the 18 that got the good year question with a positive response (11.4) and 63% of the 12 that got the bad year question should give a positive response (7.6). The following table is the closest possible.

                    Good Year   Bad Year   Total
Positive response   11 (61%)    8 (67%)    19
Negative response   7           4          11
Total               18          12         30
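
The counts expected under no association (11.4 and 7.6 above) come from applying the overall response proportions to each column total. A minimal Python sketch, assuming the margins are stored in plain dictionaries (the name `expected_counts` is hypothetical):

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under no association: row total * column total / grand total."""
    grand_total = sum(row_totals.values())
    return {
        row: {col: r_tot * c_tot / grand_total for col, c_tot in col_totals.items()}
        for row, r_tot in row_totals.items()
    }


row_totals = {"Positive": 19, "Negative": 11}
col_totals = {"Good Year": 18, "Bad Year": 12}

expected = expected_counts(row_totals, col_totals)
for row, cells in expected.items():
    print(row, {col: round(v, 1) for col, v in cells.items()})
```

Because real counts must be whole numbers, the table with 11 and 8 positive responses is the nearest achievable table to these expected values.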

6. Comparing two proportions with CIs and testing using simulation, dolphin example.

Section 5.2

Swimming with Dolphins

Example 5.2

Swimming with Dolphins

Is swimming with dolphins therapeutic for patients suffering from clinical depression?

• Researchers Antonioli and Reveley (2005), in British Medical Journal, recruited 30 subjects aged 18-65 with a clinical diagnosis of mild to moderate depression

• Discontinued antidepressants and psychotherapy 4 weeks prior to and throughout the experiment

• 30 subjects went to an island near Honduras where they were randomly assigned to two treatment groups

Swimming with Dolphins
• Both groups engaged in one hour of swimming and snorkeling each day
• One group swam in the presence of dolphins and the other group did not
• Participants in both groups had identical conditions except for the dolphins
• After two weeks, each subject’s level of depression was evaluated, as it had been at the beginning of the study
• The response variable is whether or not the subject achieved substantial reduction in depression

Swimming with Dolphins

Null hypothesis: Dolphins do not help.
– Swimming with dolphins is not associated with substantial improvement in depression

Alternative hypothesis: Dolphins help.
– Swimming with dolphins increases the probability of substantial improvement in depression symptoms

Swimming with Dolphins
• The parameter is the (long-run) difference between the probability of improving when receiving dolphin therapy and the prob. of improving with the control (𝜋dolphins − 𝜋control)

• So we can write our hypotheses as:
H0: 𝜋dolphins − 𝜋control = 0.
Ha: 𝜋dolphins − 𝜋control > 0.
or
H0: 𝜋dolphins = 𝜋control
Ha: 𝜋dolphins > 𝜋control

(Note: we are not saying our parameters equal any certain number.)

Swimming with Dolphins

Results:

                  Dolphin group   Control group   Total
Improved          10 (66.7%)      3 (20%)         13
Did Not Improve   5               12              17
Total             15              15              30

The difference in proportions of improvers is: $\hat{p}_d - \hat{p}_c = 0.667 - 0.20 = 0.467$.

Swimming with Dolphins

• There are two possible explanations for an observed difference of 0.467.
– A tendency to be more likely to improve with dolphins (alternative hypothesis)
– The 13 subjects were going to show improvement with or without dolphins and random chance assigned more improvers to the dolphins (null hypothesis)

Swimming with Dolphins

• If the null hypothesis is true (no association between dolphin therapy and improvement) we would have 13 improvers and 17 non-improvers regardless of the group to which they were assigned.

• Hence the assignment doesn’t matter and we can just randomly assign the subjects’ results to the two groups to see what would happen under a true null hypothesis.

Swimming with Dolphins
• We can simulate this with cards
– 13 cards to represent the improvers
– 17 cards represent the non-improvers

• Shuffle the cards
– put 15 in one pile (dolphin therapy)
– put 15 in another (control group)

Swimming with Dolphins
• Compute the proportion of improvers in the Dolphin Therapy group

• Compute the proportion of improvers in the Control group

• The difference in these two proportions is what could just as well have happened under the assumption there is no association between swimming with dolphins and substantial improvement in depression.
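
The card-shuffling procedure described above can be sketched in Python as follows. This is a minimal sketch, assuming the cards are coded 1 (improver) and 0 (non-improver); the function name `shuffle_once` and its parameters are hypothetical, and the seed is fixed only so the run is repeatable.

```python
import random


def shuffle_once(n_improvers=13, n_non_improvers=17, group_size=15, rng=None):
    """Shuffle the 30 cards, deal two piles, and return the difference in proportions.

    The difference is (dolphin pile) minus (control pile), matching the slides.
    """
    rng = rng or random.Random()
    cards = [1] * n_improvers + [0] * n_non_improvers  # 1 = improver, 0 = non-improver
    rng.shuffle(cards)
    dolphin = cards[:group_size]   # first pile: dolphin therapy
    control = cards[group_size:]   # second pile: control group
    return sum(dolphin) / len(dolphin) - sum(control) / len(control)


diff = shuffle_once(rng=random.Random(0))
print(round(diff, 3))
```

Each call produces one simulated difference; with 13 improvers split between two piles of 15, the possible values step by 2/15 ≈ 0.133 (for example −0.067, 0.067, 0.200).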

[Figure: cards labeled Improver or Non-improver dealt into Dolphin Therapy and Control piles. Observed data: 66.7% improvers (Dolphin Therapy), 20.0% improvers (Control). After one shuffle: 40.0% improvers (Dolphin Therapy), 46.7% improvers (Control).]

Difference in Simulated Proportions: 0.400 − 0.467 = −0.067

[Figure: another shuffle of the cards. Dolphin Therapy: 53.3% improvers; Control: 33.3% improvers.]

Difference in Simulated Proportions: 0.533 − 0.333 = 0.200

[Figure: another shuffle of the cards. Dolphin Therapy: 46.7% improvers; Control: 40.0% improvers.]

Difference in Simulated Proportions: 0.467 − 0.400 = 0.067

More Simulations

[Dot plot: Difference in Simulated Proportions, with simulated differences ranging from −0.333 to 0.467.]

Only one simulated statistic out of 30 was as large or larger than our observed difference in proportions of 0.467, hence our p-value for this null distribution is 1/30 ≈ 0.03.

Swimming with Dolphins
• We did 1000 repetitions to develop a null distribution.

Swimming with Dolphins

• 13 out of 1000 results had a difference of 0.467 or higher (p-value = 0.013).

• 0.467 is $\frac{0.467 - 0}{0.185} \approx 2.52$ SD above zero.

• Using either the p-value or standardized statistic, we have strong evidence against the null and can conclude that the improvement due to swimming with dolphins was statistically significant.
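
A simulated p-value and standardized statistic like the ones above could be reproduced along these lines. This is a sketch, not the applet used in class: the function names, the seed, and the small tolerance in the comparison are assumptions. The exact hypergeometric calculation at the end is not in the slides; it is included only as a cross-check on the simulation.

```python
import math
import random
import statistics


def simulate_null(reps=1000, n_improvers=13, n_total=30, group_size=15, seed=1):
    """Return `reps` shuffled differences in proportions (dolphin minus control)."""
    rng = random.Random(seed)
    cards = [1] * n_improvers + [0] * (n_total - n_improvers)
    diffs = []
    for _ in range(reps):
        rng.shuffle(cards)
        in_dolphin = sum(cards[:group_size])
        in_control = n_improvers - in_dolphin
        diffs.append(in_dolphin / group_size - in_control / (n_total - group_size))
    return diffs


def exact_p_value(observed_in_dolphin=10, n_improvers=13, n_total=30, group_size=15):
    """Exact chance of at least this many improvers landing in the dolphin group."""
    ways = sum(
        math.comb(n_improvers, k) * math.comb(n_total - n_improvers, group_size - k)
        for k in range(observed_in_dolphin, min(n_improvers, group_size) + 1)
    )
    return ways / math.comb(n_total, group_size)


observed = 10 / 15 - 3 / 15
diffs = simulate_null()
# Small tolerance guards against floating-point ties with the observed value.
p_sim = sum(d >= observed - 1e-9 for d in diffs) / len(diffs)
sd_null = statistics.pstdev(diffs)

print(f"simulated p-value = {p_sim:.3f}")
print(f"null SD = {sd_null:.3f}, standardized statistic = {observed / sd_null:.2f}")
print(f"exact p-value = {exact_p_value():.4f}")
```

The simulated values vary a little from run to run (and with the seed), so they should be close to, but not necessarily equal to, the 0.013 and 2.52 reported above.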

Swimming with Dolphins

• A 95% confidence interval for the difference in the probability using the standard deviation from the null distribution is 0.467 ± 2(0.185) = 0.467 ± 0.370 or (0.097 to 0.837)

• We are 95% confident that when allowed to swim with dolphins, the probability of improving is between 0.097 and 0.837 higher than when no dolphins are present.

• How does this interval back up our conclusion from the test of significance?
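
The 2SD interval above is simple arithmetic, shown here as a short Python sketch (the helper name `two_sd_interval` is an assumption for illustration). Note that the resulting interval lies entirely above zero, which is consistent with the small p-value from the test.

```python
def two_sd_interval(statistic, sd):
    """Approximate 95% interval: statistic plus or minus 2 standard deviations."""
    margin = 2 * sd
    return statistic - margin, statistic + margin


low, high = two_sd_interval(0.467, 0.185)
print(f"({low:.3f}, {high:.3f})")
# prints: (0.097, 0.837)
print("interval excludes 0:", low > 0)
```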

Swimming with Dolphins

• Can we say that the presence of dolphins caused this improvement?
– Since this was a randomized experiment, and assuming everything was identical between the groups, we have strong evidence that dolphins were the cause

• Can we generalize to a larger population?
– Maybe mild to moderately depressed 18-65 year old patients willing to volunteer for this study
– We have no evidence that random selection was used to find the 30 subjects. "Outpatients, recruited through announcements on the internet, radio, newspapers, and hospitals."

7. Comparing two proportions: Theory-Based Approach, and smoking and gender example.

Section 5.3

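Introduction

• Just as with a single proportion, we can often predict results of a simulation using a theory-based approach.

• The theory-based approach also gives a simpler way to generate confidence intervals.

• The main new mathematical fact to use is that the SE for the difference between two proportions is

$$\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)},$$

where $\hat{p}$ is the overall (pooled) sample proportion and $n_1$, $n_2$ are the two group sizes.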

Parents’ Smoking Status and their Babies’ Gender

Example 5.3

Smoking and Gender

• How does parents’ behavior affect the gender of their children?

• Fukuda et al. (2002) found the following in Japan.
– Out of 565 births where both parents smoked more than a pack a day, 255 were boys. This is 45.1% boys.
– Out of 3602 births where both parents did not smoke, 1975 were boys. This is 54.8% boys.
– In total, out of 4167 births, 2230 were boys, which is 53.5%.
• Other studies have shown a reduced male to female birth ratio where high concentrations of other environmental chemicals are present (e.g. industrial pollution, pesticides)

Smoking and Gender
• A segmented bar graph and 2-way table
• Let’s compare the proportions to see if the difference is statistically significant.

         Both Smoked    Neither Smoked
Boy      255 (45.1%)    1,975 (54.8%)
Girl     310            1,627
Total    565            3,602

Smoking and Gender

Null Hypothesis:
• There is no association between smoking status of parents and sex of child.

• The probability of having a boy is the same for parents who smoke and don’t smoke.

• 𝜋smoking − 𝜋nonsmoking = 0

Smoking and Gender

Alternative Hypothesis:
• There is an association between smoking status of parents and sex of child.

• The probability of having a boy is not the same for parents who smoke and don’t smoke

• 𝜋smoking − 𝜋nonsmoking ≠ 0

Smoking and Gender

• What are the observational units in the study?
• What are the variables in this study?
• Which variable should be considered the explanatory variable and which the response variable?
• What is the parameter of interest?
• Can you draw cause-and-effect conclusions for this study?

Smoking and Gender

Using the 3S Strategy to assess the strength

1. Statistic:
• The proportion of boys born to nonsmokers minus the proportion of boys born to smokers is 0.548 − 0.451 = 0.097.

Smoking and Gender

2. Simulate:
• Many repetitions of shuffling the 2230 boys and 1937 girls to the 565 smoking and 3602 nonsmoking parents

• Calculate the difference in proportions of boys between the groups for each repetition.

• Shuffling simulates the null hypothesis of no association

Smoking and Gender

3. Strength of evidence:
• Nothing as extreme as our observed statistic (≥ 0.097 or ≤ −0.097) occurred in 5000 repetitions.

• How many SDs is 0.097 above the mean? Z = 0.097/0.023 = 4.22 using simulations. What about using the theory-based approach?
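
A shuffle-based null distribution for this example might be simulated as below. This is a sketch under stated assumptions: births are coded 1 (boy) and 0 (girl), the function name `simulate_smoking_null` and the seed are made up for illustration, and the demo uses 2000 repetitions rather than the 5000 in the slides only to keep the run short.

```python
import random
import statistics


def simulate_smoking_null(reps=5000, boys=2230, girls=1937, n_smoking=565, seed=13):
    """Return shuffled differences in proportion of boys (nonsmoking minus smoking)."""
    rng = random.Random(seed)
    births = [1] * boys + [0] * girls            # 1 = boy, 0 = girl
    n_nonsmoking = boys + girls - n_smoking      # 3602 with the default counts
    diffs = []
    for _ in range(reps):
        # Randomly pick which births go to the smoking group; the rest are nonsmoking.
        boys_smoking = sum(rng.sample(births, n_smoking))
        boys_nonsmoking = boys - boys_smoking
        diffs.append(boys_nonsmoking / n_nonsmoking - boys_smoking / n_smoking)
    return diffs


observed = 1975 / 3602 - 255 / 565
diffs = simulate_smoking_null(reps=2000)
sd_null = statistics.pstdev(diffs)
n_extreme = sum(abs(d) >= observed for d in diffs)

print(f"null SD = {sd_null:.3f}")
print(f"standardized statistic = {observed / sd_null:.2f}")
print(f"repetitions at least as extreme as observed: {n_extreme}")
```

The null SD should land near the 0.023 quoted above, so the standardized statistic comes out a little above 4, though the exact digits depend on the seed and the number of repetitions.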

Smoking and Gender

• Notice the null distribution is centered at zero and is bell-shaped.

• This can be approximated by the normal distribution.

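Formulas

• The theory-based approach yields z = 4.30.

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

• Here $z = \frac{0.548 - 0.451}{\sqrt{0.535(1-0.535)\left(\frac{1}{3602} + \frac{1}{565}\right)}} = 4.30$.

• p-value is 2*(1-pnorm(4.30)) = 0.00171%.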
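
As a check on the arithmetic, the same z and p-value can be computed by hand in R from the raw counts. This is a minimal sketch; the variable names are illustrative and not from the slides.

```r
# Theory-based two-proportion z-test, computed by hand from the counts.
boys_non <- 1975; n_non <- 3602   # births where neither parent smoked
boys_smk <- 255;  n_smk <- 565    # births where both parents smoked

p_non  <- boys_non / n_non
p_smk  <- boys_smk / n_smk
p_pool <- (boys_non + boys_smk) / (n_non + n_smk)   # combined proportion of boys

# Under the null hypothesis both groups share one proportion, so the SE is pooled.
se_null <- sqrt(p_pool * (1 - p_pool) * (1 / n_non + 1 / n_smk))
z       <- (p_non - p_smk) / se_null
p_value <- 2 * (1 - pnorm(abs(z)))   # two-sided p-value from the normal curve

round(z, 2)
p_value
```

R's built-in `prop.test(c(1975, 255), c(3602, 565), correct = FALSE)` reports the equivalent chi-square statistic, which is the square of this z.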


Smoking and Gender

• Fukuda et al. (2002) found the following in Japan.
– Out of 3602 births where neither parent smoked, 1975 were boys. This is 54.8% boys.
– Out of 565 births where both parents smoked more than a pack a day, 255 were boys. This is 45.1% boys.
– In total, out of 4167 births, 2230 were boys, which is 53.5% boys.


Formulas

• How do we find the margin of error for the difference in proportions?

$$\text{Multiplier} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

• The multiplier is dependent upon the confidence level.
– 1.645 for 90% confidence
– 1.96 for 95% confidence
– 2.576 for 99% confidence
• We can write the confidence interval in the form:
– statistic ± margin of error.


Smoking and Gender

• Our statistic is the observed sample difference in proportions, 0.097.
• Plugging in $1.96 \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = 0.044$, we get 0.097 ± 0.044 as our 95% CI.
• We could also write this interval as (0.053, 0.141).
• We are 95% confident that the probability of a boy baby where neither parent smokes minus the probability of a boy baby where both parents smoke is between 0.053 and 0.141.


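A clarification on the formulas

• The margin of error for the difference in proportions is Multiplier × SE, where

$$SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

• In testing, the null hypothesis is no difference between the two groups, so we used the SE

$$\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_2}}$$

where $\hat{p}$ is the proportion in both groups combined.
• But in CIs, we use the formula $\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$ because we are not assuming $\pi_1 = \pi_2$ with CIs.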


Smoking and Gender

• How would the interval change if the confidence level was 99%?
• The SE = $\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$ = 0.0225.
• Previously, for a 95% CI, it was 0.097 ± 1.96 × 0.0225 = 0.097 ± 0.044.
• For a 99% CI, it is 0.097 ± 2.576 × 0.0225 = 0.097 ± 0.058.
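
The two intervals can be reproduced with a short R sketch. The helper `ci()` and the variable names are illustrative assumptions; `qnorm` supplies the multipliers (about 1.96 and 2.576) rather than hard-coding them.

```r
# Confidence intervals for the difference in proportions (nonsmokers minus smokers).
p_non <- 1975 / 3602; n_non <- 3602
p_smk <- 255 / 565;   n_smk <- 565

diff_hat <- p_non - p_smk
# Unpooled SE: each group keeps its own sample proportion.
se <- sqrt(p_non * (1 - p_non) / n_non + p_smk * (1 - p_smk) / n_smk)

ci <- function(level) {
  multiplier <- qnorm(1 - (1 - level) / 2)   # about 1.96 for 95%, 2.576 for 99%
  diff_hat + c(-1, 1) * multiplier * se
}

round(se, 4)
round(ci(0.95), 3)   # 95% interval
round(ci(0.99), 3)   # 99% interval, wider than the 95% interval
```

Raising the confidence level only changes the multiplier, so the interval widens while its center stays at the observed difference.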


Smoking and Gender

• Written as the statistic ± margin of error, the 99% CI for the difference between the two proportions is 0.097 ± 0.058.
• Margin of error
– 0.058 for the 99% confidence interval
– 0.044 for the 95% confidence interval


Smoking and Gender

• How would the 95% confidence interval change if we were estimating $\pi_{\text{smoker}} - \pi_{\text{nonsmoker}}$ instead of $\pi_{\text{nonsmoker}} - \pi_{\text{smoker}}$?


Smoking and Gender

• (−0.141, −0.053) or −0.097 ± 0.044 instead of
• (0.053, 0.141) or 0.097 ± 0.044.
• The negative signs indicate the probability of a boy born to smoking parents is lower than that for nonsmoking parents.


Smoking and Gender

Validity Conditions of Theory-Based Approach
• Same as with a single proportion.
• Should have at least 10 observations in each of the cells of the 2x2 table.

          Smoking Parents   Non-smoking Parents   Total
Male            255                1975            2230
Female          310                1627            1937
Total           565                3602            4167
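
A quick way to check the validity condition is to enter the table in R and test every cell. This is a small sketch with illustrative names; only the four cell counts come from the slide.

```r
# 2x2 table of counts: rows are the baby's sex, columns the parents' smoking status.
tab <- matrix(c(255, 310, 1975, 1627), nrow = 2,
              dimnames = list(Sex = c("Male", "Female"),
                              Parents = c("Smoking", "Non-smoking")))

rowSums(tab)     # totals by sex
colSums(tab)     # totals by smoking status
all(tab >= 10)   # TRUE when every cell has at least 10 observations
```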


Smoking and Gender

• The strong significant result in this study yielded quite a bit of press when it came out.
• Soon other studies came out which found no relationship between smoking and gender (Parazzini et al. 2004, Obel et al. 2003).
• James (2004) argued that confounding variables like social factors, diet, environmental exposure or stress were the reason for the association between smoking and gender of the baby. These are all possible confounders since it was an observational study. Different studies could easily have had different levels of these confounding factors.


8. Five number summary, IQR, and geysers.

6.1: Comparing Two Groups: Quantitative Response
6.2: Comparing Two Means: Simulation-Based Approach
6.3: Comparing Two Means: Theory-Based Approach


Section 6.1: Exploring Quantitative Data


Quantitative vs. Categorical Variables

• Categorical
– Values for which arithmetic does not make sense.
– Gender, ethnicity, eye color…
• Quantitative
– You can add or subtract the values, etc.
– Age, height, weight, distance, time…


Graphs for a Single Variable

• Categorical: Bar Graph
• Quantitative: Dot Plot


Comparing Two Groups Graphically

[Figure panels: Categorical; Quantitative]


Notation Check

Statistics
• $\bar{x}$: sample mean
• $\hat{p}$: sample proportion

Parameters
• $\mu$: population mean
• $\pi$: population proportion or probability

Statistics summarize a sample and parameters summarize a population.


Quartiles

• Suppose 25% of the observations lie below a certain value x. Then x is called the lower quartile (or 25th percentile).
• Similarly, if 25% of the observations are greater than x, then x is called the upper quartile (or 75th percentile).
• The lower quartile can be calculated by finding the median, and then determining the median of the values below the overall median. Similarly the upper quartile is median{x_i : x_i > overall median}.


IQR and Five-Number Summary

• The difference between the quartiles is called the inter-quartile range (IQR), another measure of variability along with standard deviation.
• The five-number summary for the distribution of a quantitative variable consists of the minimum, lower quartile, median, upper quartile, and maximum.
• Technically the IQR is not the interval (25th percentile, 75th percentile), but the difference 75th percentile − 25th percentile.
• Different software use different conventions, but we will use the convention that, if there is a range of possible quantiles, you take the middle of that range.
• For example, suppose data are 1, 3, 7, 7, 8, 9, 12, 14.
• M = 7.5, 25th percentile = 5, 75th percentile = 10.5. IQR = 5.5.


IQR and Five-Number Summary

• For medians and quartiles, we will use the convention, if there is a range of possibilities, take the middle of the range.
• In R, this is type=2. type=1 means take the minimum.
• x = c(1,3,7,7,8,9,12,14)
• quantile(x,.25,type=2) ## 5
• IQR(x,type=2) ## 5.5
• IQR(x,type=1) ## 6. Can you see why?
• For example, suppose data are 1, 3, 7, 7, 8, 9, 12, 14.
• M = 7.5, 25th percentile = 5, 75th percentile = 10.5. IQR = 5.5.
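
The median-of-halves recipe for the quartiles can be written out directly in R for the example data. This is a sketch with illustrative variable names. Note that it drops values equal to the overall median, so for other data sets it will not always agree with R's type=2 quantiles.

```r
x <- c(1, 3, 7, 7, 8, 9, 12, 14)

m     <- median(x)           # overall median
lower <- median(x[x < m])    # median of the values below the overall median
upper <- median(x[x > m])    # median of the values above the overall median

# Five-number summary and IQR under the class convention
c(min = min(x), Q1 = lower, median = m, Q3 = upper, max = max(x))
upper - lower

# The same quartiles from R's built-in type = 2 convention
quantile(x, c(0.25, 0.75), type = 2)
```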


Geyser Eruptions

Example 6.1


Old Faithful Inter-Eruption Times

• How do the five-number summary and IQR differ for inter-eruption times between 1978 and 2003?


Old Faithful Inter-Eruption Times

• 1978 IQR = 81 − 58 = 23
• 2003 IQR = 98 − 87 = 11


Boxplots

[Figure: boxplot with labels Min, Q_lower, Med, Q_upper, Max]


Boxplots (Outliers)

• A data value that is more than 1.5 × IQR above the upper quartile or below the lower quartile is considered an outlier.
• When these occur, the whiskers on a boxplot extend out to the farthest value not considered an outlier and outliers are represented by a dot or an asterisk.
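
The 1.5 × IQR rule can be sketched in R as below. The data vector `y` is hypothetical: it is the earlier example data with a value of 30 appended purely to produce an outlier. The quartiles use the type=2 convention from class.

```r
# Hypothetical data: the earlier example values plus 30, added to create an outlier.
y <- c(1, 3, 7, 7, 8, 9, 12, 14, 30)

q   <- unname(quantile(y, c(0.25, 0.75), type = 2))   # lower and upper quartiles
iqr <- q[2] - q[1]

lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

# Values beyond the fences are flagged as outliers.
outliers <- y[y < lower_fence | y > upper_fence]

# Whiskers stop at the most extreme values that are not outliers.
whisker_ends <- range(y[y >= lower_fence & y <= upper_fence])

outliers
whisker_ends
```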


Cancer Pamphlet Reading Levels

• Short et al. (1995) compared reading levels of cancer patients and readability levels of cancer pamphlets. What is the:
– Median reading level?
– Mean reading level?
• Are the data skewed one way or the other?


• Skewed a bit to the right
• Mean to the right of median


Comparing Two Means: Simulation-Based Approach and bicycling to work example.

Section 6.2


Comparison with proportions.

• We will be comparing means, much the same way we compared two proportions using randomization techniques.
• The difference here is that the response variable is quantitative (the explanatory variable is still binary though). So if cards are used to develop a null distribution, numbers go on the cards instead of words.


Bicycling to Work

Example 6.2


Bicycling to Work

• Does bicycle weight affect commute time?
• British Medical Journal (2010) presented the results of a randomized experiment done by Jeremy Groves, who wanted to know if bicycle weight affected his commute to work.
• For 56 days (January to July) Groves tossed a coin to decide if he would bike the 27 miles to work on his carbon frame bike (20.9 lbs) or steel frame bicycle (29.75 lbs).
• He recorded the commute time for each trip.


Bicycling to Work

• What are the observational units?
– Each trip to work on the 56 different days.
• What are the explanatory and response variables?
– Explanatory is which bike Groves rode (categorical, binary)
– Response variable is his commute time (quantitative)


Bicycling to Work

• Null hypothesis: Commute time is not affected by which bike is used.
• Alternative hypothesis: Commute time is affected by which bike is used.


Bicycling to Work

• In chapter 5 we used the difference in proportions of "successes" between the two groups.
• Now we will compare the difference in averages between the two groups.
• The parameters of interest are:
– $\mu_{\text{carbon}}$ = long-term average commute time with the carbon-framed bike
– $\mu_{\text{steel}}$ = long-term average commute time with the steel-framed bike.


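Bicycling to Work

• µ is the population mean. It is a parameter.
• Using the symbols $\mu_{\text{carbon}}$ and $\mu_{\text{steel}}$, we can restate the hypotheses.
• H0: $\mu_{\text{carbon}} = \mu_{\text{steel}}$
• Ha: $\mu_{\text{carbon}} \neq \mu_{\text{steel}}$.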


Bicycling to Work

Remember:
• The hypotheses are about the long-term association between commute time and bike used, not just his 56 trips.
• Hypotheses are always about populations or processes, not the sample data.


Bicycling to Work

               Sample size   Sample mean   Sample SD
Carbon frame        26        108.34 min    6.25 min
Steel frame         30        107.81 min    4.89 min


Bicycling to Work

• The sample average and variability for commute time were both higher for the carbon frame bike.
• Does this indicate a tendency?
• Or could a higher average just come from the random assignment? Perhaps the carbon frame bike was randomly assigned to days where traffic was heavier or weather slowed down Dr. Groves on his way to work?


Bicycling to Work

• Is it possible to get a difference of 0.53 minutes if commute time isn't affected by the bike used?
• The same type of question was asked in Chapter 5 for categorical response variables.
• The same answer. Yes it's possible, how likely though?


Bicycling to Work

• The 3S Strategy
Statistic:
• Choose a statistic:
• The observed difference in average commute times $\bar{x}_{\text{carbon}} - \bar{x}_{\text{steel}}$ = 108.34 − 107.81 = 0.53 minutes


Bicycling to Work

Simulation:
• We can imagine simulating this study with index cards.
– Write all 56 times on 56 cards.
• Shuffle all 56 cards and randomly redistribute into two stacks:
– One with 26 cards (representing the times for the carbon-frame bike)
– Another with 30 cards (representing the times for the steel-frame bike)


Bicycling to Work

Simulation (continued):
• Shuffling assumes the null hypothesis of no association between commute time and bike.
• After shuffling we calculate the difference in the average times between the two stacks of cards.
• Repeat this many times to develop a null distribution.
• Let's see what this looks like.
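
The card-shuffling procedure can be sketched in R. Important assumption: the 56 recorded commute times are not reproduced in these notes, so the sketch draws hypothetical stand-in times from normal distributions matching the reported sample means and SDs. Its numerical output therefore will not match the class results; only the procedure is the point.

```r
set.seed(13)

# Hypothetical stand-in data (NOT the real times): drawn to resemble the
# reported summaries, 26 carbon-frame trips and 30 steel-frame trips.
carbon <- rnorm(26, mean = 108.34, sd = 6.25)
steel  <- rnorm(30, mean = 107.81, sd = 4.89)

times    <- c(carbon, steel)             # all 56 "cards"
observed <- mean(carbon) - mean(steel)   # carbon minus steel

# One repetition: deal 26 cards to the carbon stack, the other 30 to steel.
one_shuffle <- function() {
  idx <- sample(length(times), 26)
  mean(times[idx]) - mean(times[-idx])
}

null_diffs <- replicate(1000, one_shuffle())

sd(null_diffs)                                       # SE estimate from the simulation
p_value <- mean(abs(null_diffs) >= abs(observed))    # two-sided simulated p-value
p_value
```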


[Figure: the commute-time cards shuffled into a Carbon Frame stack and a Steel Frame stack. Observed means: 108.34 and 107.81. Shuffled means: 108.27 and 107.87, so the shuffled difference is 108.27 − 107.87 = 0.40, plotted on the "Shuffled Differences in Means" dot plot.]


[Figure: a second shuffle of the cards into Carbon Frame and Steel Frame stacks. Shuffled means: 107.69 and 108.37, so the shuffled difference is 107.69 − 108.37 = −0.68, added to the "Shuffled Differences in Means" dot plot.]


[Figure: a third shuffle of the cards into Carbon Frame and Steel Frame stacks. Shuffled means: 107.97 and 108.13, so the shuffled difference is 107.97 − 108.13 = −0.16, added to the "Shuffled Differences in Means" dot plot.]


More Simulations

[Figure: "Shuffled Differences in Means" dot plot, with additional shuffled differences labeled −2.11, −1.20, −1.21, −1.93, −1.53, −1.11, 0.71, −0.52, 1.79, 0.02, 2.53, 1.90, −0.98, 0.81, 0.55, 1.89, −0.31, −2.50, 0.38, −1.51, 0.22, 1.50, 0.13, 0.44, 1.46, −0.64, −1.10]

Nineteen of our 30 simulated statistics were as or more extreme than our observed difference in means of 0.53, hence our estimated p-value for this null distribution is 19/30 = 0.63.


Bicycling to Work

• Using 1000 simulations, we obtain a p-value of 72%.
• What does this p-value mean?
• If mean commute times for the bikes are the same in the long run, and we repeated random assignment of the lighter bike to 26 days and the heavier to 30 days, a difference as extreme as 0.53 minutes or more would occur in about 72% of the repetitions.
• Therefore, we do not have strong evidence that the commute times for the two bikes will differ in the long run. The difference observed by Dr. Groves is not statistically significant.


Bicycling to Work

• Have we proven that the bike Groves chooses is not associated with commute time? (Can we conclude the null?)
– No, a large p-value is not "strong evidence that the null hypothesis is true."
– It suggests that the null hypothesis is plausible.
– There could be a small long-term difference. But there also could be no difference.


Bicycling to Work

• Imagine we want to generate a 95% confidence interval for the long-run difference in average commuting time.
– Sample difference in means ± 1.96 × SE for the difference between the two means
• From simulations, the SE = standard deviation of the differences = 1.47.
• 0.53 ± 1.96(1.47) = 0.53 ± 2.88
• −2.35 to 3.41.
• What does this mean?
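
The interval arithmetic above can be checked in a few lines of R. This is a minimal sketch: the SE of 1.47 is taken from the slide's simulation, and the variable names are illustrative.

```r
observed_diff <- 108.34 - 107.81   # carbon minus steel, 0.53 minutes
se_sim        <- 1.47              # SD of the shuffled differences, from the slide

margin <- 1.96 * se_sim
ci     <- observed_diff + c(-1, 1) * margin

round(margin, 2)
round(ci, 2)
```

Since the interval runs from a negative to a positive value, it contains 0.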


Bicycling to Work

• We are 95% confident that the true long-term difference (carbon − steel) in average commuting times is between −2.35 and 3.41 minutes. The carbon-framed bike is between 2.35 minutes faster and 3.41 minutes slower than the steel-framed bike.
• Does it make sense that the interval contains 0, based on our p-value?


Bicycling to Work

Scope of conclusions
• Can we generalize our conclusion to a larger population?
• Two key questions:
– Was the sample randomly obtained and representative of the overall population of interest?
– Was this an experiment? Were the observational units randomly assigned to treatments?


Bicycling to Work

• Was the sample representative of an overall population?
• What about the population of all days Dr. Groves might bike to work?
– No, Groves commuted on consecutive days in this study and did not include all seasons.
• Was this an experiment? Were the observational units randomly assigned to treatments?
– Yes, he flipped a coin for the bike.
– We can probably draw cause-and-effect conclusions here.


Bicycling to Work

• We cannot generalize beyond Groves and his two bikes.
• A limitation is that this study is not double-blind.
– The researcher and the subject (which happened to be the same person here) were not blind to which treatment was being used.
– Dr. Groves knew which bike he was riding, and this might have affected his state of mind or his choices while riding.