Sample size in health sciences: basics and selected examples
Sample size estimation: Basics & selected examples
Dr. S. A. Rizwan, M.D.
Public Health Specialist
SBCM, Joint Program – Riyadh
Ministry of Health, Kingdom of Saudi Arabia
Learning objectives
Demystifying statistics! SBCM, Joint Program – Riyadh
• Importance of sample size estimation
• Basic concepts in sample size calculation
• How sample size relates to study results
• Sample size calculation in specific situations
Books and software
• Books
  • Sample Size Determination in Health Studies: A Practical Manual (Lwanga & Lemeshow)
  • Sample Size Calculations in Clinical Research (Shein-Chung Chow, Hansheng Wang, Jun Shao)
• Software
  • Epitools, online calculators, StatCalc in Epi Info, G*Power
  • PASS, nMaster, StatsDirect, Stata
  • Many others
Obligatory opening joke!
Rethinking Aesop's fables…
Let's play a game!
Sample size: Basic concepts
Prerequisites for this class
• Understanding of the following basic concepts
  • Types of study designs
  • Measures of association
  • Mean/SD
  • Proportion
  • Standard error
  • Hypothesis testing and its types
  • Confidence intervals
Some related terms
• Significance level
• Power
• Effect size
• Variability
• Precision

Confidence level     Zα
95% (two-sided)      1.960
95% (one-sided)      1.645
99% (two-sided)      2.576
99% (one-sided)      2.326

Power     Zβ
90%       1.282
85%       1.036
80%       0.842
75%       0.674
70%       0.524
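The Zα and Zβ values listed above are quantiles of the standard normal distribution, so they do not need to be memorised. A minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+) regenerates them; the helper names `z_alpha` and `z_beta` are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def z_alpha(confidence: float, two_sided: bool = True) -> float:
    """Critical value Z_alpha for a confidence level such as 0.95 (hypothetical helper)."""
    alpha = 1.0 - confidence
    tail = alpha / 2 if two_sided else alpha  # split alpha across both tails if two-sided
    return _STD_NORMAL.inv_cdf(1.0 - tail)


def z_beta(power: float) -> float:
    """Z_beta for a given power such as 0.80, i.e. the power-th quantile (hypothetical helper)."""
    return _STD_NORMAL.inv_cdf(power)


if __name__ == "__main__":
    for conf, two in [(0.95, True), (0.95, False), (0.99, True), (0.99, False)]:
        side = "two-sided" if two else "one-sided"
        print(f"{conf:.0%} ({side}): {z_alpha(conf, two):.3f}")
    for pw in (0.90, 0.85, 0.80, 0.75, 0.70):
        print(f"power {pw:.0%}: {z_beta(pw):.3f}")
```

Computing the quantiles directly also avoids the small rounding differences found in some printed tables.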
Sample size & statistical inference
• Two methods of statistical inference
  • Hypothesis testing
  • Confidence interval estimation
Two aspects of a good sample
• The sample size
  • If adequate, then good internal validity
• The sampling method
  • If representative, then good external validity
Why calculate sample size?
• Stating the assumptions and parameters before the start of the study increases the validity of the statistical conclusions made after the study
• Post-hoc analyses and their results are considered merely exploratory
Thought exercise
• I am applying for a job, and in my resume I have stated that my typing speed is very fast.
• My friend is applying for the same job, and in his resume he has stated that his typing speed is 60 words/min.
Which candidate are you more likely to assess in a valid manner?
Why calculate sample size? (contd.)
• Funds and time constraints
• Studying the entire population is usually unnecessary (an ethical problem!)
• Small samples are unable to detect clinically relevant differences
• If a study with a small sample finds non-significant results, what does it mean?
Thought exercise
• Study 1: A study of an anti-hypertensive drug in 10,000 people showed a statistically significant fall in BP of 1 mm Hg over 3 months.
• Study 2: A 30% reduction in mortality due to propranolol was found among MI patients, but it was not statistically significant. 66 cases and 64 controls were studied.
Comment on each of the above scenarios.
Thought exercise
Then, how large should the sample size (SS) be?
• Neither too small nor too large
Sample size is estimated for the primary objective
• Sample size is calculated for the 'primary outcome variable'
• If there is more than one primary outcome, the sample size is calculated for each outcome and the largest is chosen
Common scenarios for sample size
• There are hundreds of scenarios for calculating sample size
• Descriptive:
  • Proportion, mean/SD
• Analytical:
  • Two proportions, two means/SDs
  • Also risk difference, OR & RR, incidence density
Uncommon scenarios for sample size
• Survival
• Regression
• Correlation
• Quality assurance
• Diagnostic test studies
• And many more
Further considerations for sample size
• Study design
  • Cluster design
  • Crossover
  • Matched/paired
• Type of hypothesis (inequality, equivalence, non-inferiority & superiority)
• Fixed follow-up duration
• Ratio of controls to cases
• Hypothesis testing or CI estimation?
How to approach an SS problem?
1. Convert the research question into a statistical problem statement
2. Determine the formula or software command & determine the inputs needed
3. Select the sources for the inputs
4. Substitute the values in the formula or enter them in the software
5. Factor in the non-response/drop-out rate
How to approach an SS problem?
• First: Convert the research question into a statistical problem statement
• For example,
  • To estimate the mean birth weight of neonates born to mothers with anaemia in the eastern sector of Riyadh
  • Estimation of a single mean with stated precision
How to approach an SS problem?
• Second: Find the formula or the software command appropriate for this problem
• For example,
  • Estimation of a single mean with stated precision
$$N = \frac{Z_{\alpha}^{2} S^{2}}{L^{2}}$$
where Zα is the critical value for the chosen confidence level, S is the expected standard deviation and L is the precision (the allowable error).
How to approach an SS problem?
• Second (contd.): Determine the ingredients you need to input in the formula
  • Expected proportion, incidence
  • Expected SD
  • Expected RR or OR
  • Power, precision
  • Confidence level
  • Others (design effect (DE), intracluster correlation coefficient (ICC), coefficient of variation (COV), cluster size)
• For example,
  • Estimate of SD, alpha & precision
How to approach an SS problem?
• Third: Select the sources for the inputs
  • Match the location as closely as possible
  • Match the study population as closely as possible
  • Match the study setting as closely as possible
  • Match the statistic as closely as possible
  • Or conduct a pilot study
• For example,
  • Another sector in Riyadh → some other city in KSA → Middle East → any developing country → anywhere
How to approach an SS problem?
• Third (contd.): What sources to use?
• From where?
  • Published literature
  • Pilot study
  • Experts in the field
  • Educated guess (gut feeling)
This raises the question: if we already know these inputs, why conduct the study in the first place?
How to approach an SS problem?
• Third (contd.): an example of an appropriate source
How to approach an SS problem?
• Fourth: Substitute the values in the formula or enter them in the software
$$N = \frac{Z_{\alpha}^{2} S^{2}}{L^{2}} = \frac{1.96^{2} \times 600^{2}}{100^{2}} \approx 138.3$$
N is rounded up to 140
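The arithmetic above can be checked with a short Python sketch. It implements N = Zα²S²/L² using the exact two-sided normal quantile instead of the rounded 1.96; the helper name `n_single_mean` is hypothetical, and S = 600 and L = 100 are the example values from the slide.

```python
import math
from statistics import NormalDist


def n_single_mean(sd: float, precision: float, confidence: float = 0.95) -> float:
    """N = Z_alpha^2 * S^2 / L^2 for estimating one mean to within +/- L (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return (z ** 2) * (sd ** 2) / (precision ** 2)


raw = n_single_mean(sd=600, precision=100)
print(f"raw N = {raw:.1f}, rounded up = {math.ceil(raw)}")
```

Rounding up to the next whole participant gives 139; the slide rounds up further, to 140, for convenience.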
How to approach an SS problem?
• Fifth: Factor in the non-response/drop-out rate
$$\text{Final sample size} = \frac{\text{Calculated sample size}}{1 - \text{non-response rate}}$$
• For example,
  • For a non-response rate of 20%
  • Final sample size = 140/0.80 = 175
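The same adjustment as a small Python sketch. The helper name `inflate_for_attrition` is hypothetical. It divides by (1 − rate) and rounds up, because a fraction of a participant cannot be recruited.

```python
import math


def inflate_for_attrition(n: float, loss_rate: float) -> int:
    """Final sample size = n / (1 - loss_rate), rounded up (hypothetical helper)."""
    if not 0 <= loss_rate < 1:
        raise ValueError("loss_rate must be in [0, 1)")
    # round() first so floating-point noise (e.g. 175.00000000000003) cannot bump ceil() up by one
    return math.ceil(round(n / (1 - loss_rate), 6))


print(inflate_for_attrition(140, 0.20))
```

If the same rounding-up rule is applied to the later scenarios, 716/0.90 gives 796 and 1625/0.90 gives 1806.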
Sample size: Some selected scenarios
Sample size in specific situations
1. Dr. Nariman
   • Original research question: What is the proportion of patients who quit smoking in a tobacco cessation program?
   • Simplified problem statement: Estimation of a single proportion for a special group
2. Dr. Ghadeer
   • Original research question: What is the incidence of DM in obese hypertensives, and what is the incidence of DM in non-obese hypertensives, during a five-year follow-up period?
   • Simplified problem statement: Comparison of incidence rates in two groups in a cohort study
3. Dr. Rahma
   • Original research question: What is the proportion of low birth weight (LBW) neonates born to sickle cell mothers, and what is the proportion of LBW neonates born to normal mothers, in a cohort of mothers?
   • Simplified problem statement: Comparison of two proportions in a cohort study
4. Dr. Abrar
   • Original research question: What is the proportion of students absent due to influenza-like illness (ILI) in the handwashing schools, and what is the proportion of ILI-absent students in the control schools? Here, schools are the units of randomisation.
   • Simplified problem statement: Comparison of two proportions in a two-group cluster RCT
Scenario 1: Estimating a single proportion
Scenario 1 – Step 1
• What is the proportion of patients who quit smoking in a tobacco cessation program?
• Specifically, what is the proportion of patients with DM and HTN who quit smoking in a tobacco cessation program?
• It is a cross-sectional study based on secondary data analysis
• Estimating a single proportion
Scenario 1 – Step 2
• SS formula for estimating a single proportion with stated precision
• Inputs required are the expected proportion of quitting, precision & confidence level
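For reference, the standard formula for estimating a single proportion p to within an absolute precision L, written in the same notation as the single-mean formula, is shown below. It is stated here as an assumption about what the software computes, not as a transcription of the slide:

$$N = \frac{Z_{\alpha}^{2}\, p\,(1-p)}{L^{2}}$$

where p is the expected proportion (here, the expected proportion of quitters) and L is expressed on the same scale as p (e.g. 0.05 for ±5 percentage points).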
Scenario 1 – Step 3
• A thorough literature review and preliminary data analysis showed a wide variation in the expected proportion, from 10% to 50%
Scenario 1 – Step 4
• Substituting the values for a number of scenarios in the software
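A "number of scenarios" table like the one produced by the software can be sketched in Python. This is a minimal sketch, assuming the usual single-proportion formula N = Zα²p(1 − p)/L² with absolute precision L. The expected proportions span the 10% to 50% range from Step 3, but the precision values are hypothetical, and the sketch does not attempt to reproduce the software output shown in the deck.

```python
import math
from statistics import NormalDist


def n_single_proportion(p: float, precision: float, confidence: float = 0.95) -> int:
    """N = Z_alpha^2 * p * (1 - p) / L^2, L = absolute precision (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / precision ** 2)


expected_p = [0.10, 0.20, 0.30, 0.40, 0.50]  # range reported in Step 3
precisions = [0.01, 0.02, 0.03, 0.05]        # hypothetical absolute precisions

print("p     " + "".join(f"L={prec:<8}" for prec in precisions))
for p in expected_p:
    print(f"{p:<6}" + "".join(f"{n_single_proportion(p, prec):<10}" for prec in precisions))
```

For a fixed absolute precision the requirement peaks at p = 0.5. Tightening L from 0.05 to 0.01 multiplies N by 25, which is why a wide range of plausible inputs gives such a wide range of sample sizes.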
Scenario 1 – Step 5
• The concept of drop-outs or loss to follow-up is not applicable in this case because it is a secondary data analysis
• So the sample size should be > 400 but need not be > 3500
• The final decision will depend on feasibility
Scenario 2: Comparison of incidence rates in a two-group cohort study
Scenario 2 – Step 1
• Hypothesis: the risk of developing DM (incidence) will be higher in obese hypertensive patients than in non-obese hypertensive patients during a 5-year follow-up period
• It is a cohort study with two groups
  • Exposed: obese hypertensives
  • Non-exposed: non-obese hypertensives
  • Outcome: incidence of DM
• Estimating a difference between two incidence rates in a cohort study
Scenario 2 – Step 1
• This problem can be visualised in a number of ways:
1. Comparing two incidence rates in a cohort study (relative risk – hypothesis test)
2. Comparing two incidence rates in a cohort study (relative risk – stated precision)
3. Comparing two incidence rates in a cohort study with a small proportion and fixed study duration (risk difference – hypothesis test)
4. Comparing two proportions (risk difference – hypothesis test)
Scenario 2 – Step 2
• Method 1: SS formula for estimating RR with stated precision
  • Inputs required are the expected proportion of disease among exposed & unexposed, RR, precision, confidence level
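A minimal Python sketch of Method 1, assuming the common large-sample formula for estimating a relative risk in a cohort study to within a relative precision ε: N = Zα²[(1 − P1)/P1 + (1 − P2)/P2] / [ln(1 − ε)]² per group, where P1 and P2 are the expected risks in the exposed and unexposed groups. The helper name and the example risks (12.1% vs 2.7%) and ε values are illustrative assumptions, not the inputs behind the software output in this deck.

```python
import math
from statistics import NormalDist


def n_rr_precision(p_exposed: float, p_unexposed: float,
                   rel_precision: float, confidence: float = 0.95) -> int:
    """Per-group N to estimate RR within relative precision epsilon (hypothetical helper).

    Based on the large-sample variance of ln(RR): (1-P1)/(n*P1) + (1-P2)/(n*P2).
    """
    if not 0 < rel_precision < 1:
        raise ValueError("rel_precision must be between 0 and 1")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    var_term = (1 - p_exposed) / p_exposed + (1 - p_unexposed) / p_unexposed
    return math.ceil(z ** 2 * var_term / math.log(1 - rel_precision) ** 2)


# Illustrative (assumed) risks over follow-up: 12.1% exposed vs 2.7% unexposed
for eps in (0.25, 0.50):
    print(eps, n_rr_precision(0.121, 0.027, eps))
```

The rarer outcome dominates the variance term (here 0.973/0.027 ≈ 36), so a low risk among the unexposed drives the sample size up.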
Scenario 2 – Step 2
• Method 2: SS formula for hypothesis testing of RR
  • Inputs required are the expected proportion of disease among exposed & unexposed, RR, power, confidence level
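Method 2 can be sketched in Python by setting P1 = RR × P2 and applying the usual normal-approximation test formula for two proportions with equal group sizes. The helper name is hypothetical. The example inputs (P2 = 0.027, RR = 24.2/5.4) are only illustrative, derived by multiplying the Step 3 rates by 5 years, and the sketch does not reproduce the 716 per group used in Step 5, which depends on the settings chosen in the software.

```python
import math
from statistics import NormalDist


def n_rr_test(p_unexposed: float, rr: float, power: float = 0.80,
              confidence: float = 0.95) -> int:
    """Per-group N to test H0: RR = 1 in a cohort study, two-sided (hypothetical helper)."""
    p1 = rr * p_unexposed  # expected risk among the exposed
    p2 = p_unexposed
    if not 0 < p1 < 1:
        raise ValueError("rr * p_unexposed must lie between 0 and 1")
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - (1 - confidence) / 2)
    z_b = nd.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


print(n_rr_test(p_unexposed=0.027, rr=24.2 / 5.4))
```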
Scenario 2 – Step 2
• The SS formula for the difference between two proportions (aka risk difference) can also be used for this scenario
  • Risk difference between 2 proportions
  • Risk difference between 2 incidence rates with fixed study duration
Scenario 2 – Step 2
Scenario 2 – Step 3
• A brief literature review showed that the risk of DM was about 5 times higher among obese HTN patients than among non-obese HTN patients; the incidence was 5.4 per 1000 person-years among the non-obese and 24.2 per 1000 person-years among the obese
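Formulas based on proportions need a risk over the follow-up period, not a rate per person-year. A minimal Python sketch, assuming a constant incidence rate over the 5 years (an exponential model, risk = 1 − exp(−rate × t)), converts the Step 3 rates. The helper name is hypothetical, and the deck does not state how its software inputs were derived.

```python
import math


def cumulative_risk(rate_per_1000_py: float, years: float) -> float:
    """Cumulative risk over `years`, assuming a constant incidence rate (hypothetical helper)."""
    rate = rate_per_1000_py / 1000  # events per person-year
    return 1 - math.exp(-rate * years)


risk_non_obese = cumulative_risk(5.4, 5)  # about 0.0266
risk_obese = cumulative_risk(24.2, 5)     # about 0.114
print(f"{risk_non_obese:.4f} {risk_obese:.4f} RR={risk_obese / risk_non_obese:.2f}")
```

The 5-year risk ratio (about 4.3) is a little below the rate ratio (24.2/5.4 ≈ 4.5), because cumulative risk grows more slowly than linearly when the rate is high.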
Scenario 2 – Step 4
• Methods 1 & 2: Substituting the values for a number of scenarios in the software
Scenario 2 – Step 5
• Considering a loss to follow-up of 10%
• Final sample size = 716/0.90 ≈ 796 per group (795.6 rounded up)
Scenario 3: Comparison of proportions in a two-group cohort study
Scenario 3 – Step 1
• Hypothesis: the proportion of LBW neonates will be higher among mothers with sickle cell disease than among mothers without sickle cell disease
• It is a cohort study with two groups
  • Exposed: mothers with sickle cell disease
  • Non-exposed: normal mothers
  • Outcome: proportion of LBW
• Estimating a difference between two proportions in a cohort study
Scenario 3 – Step 1
• This problem can be visualised in a number of ways:
1. Comparing two incidence rates in a cohort study (relative risk – hypothesis test)
2. Comparing two incidence rates in a cohort study (relative risk – stated precision)
3. Comparing two incidence rates in a cohort study with a small proportion and fixed study duration (risk difference – hypothesis test)
4. Comparing two proportions (risk difference – hypothesis test)
Scenario 3 – Step 2
• Method 1: SS formula for estimating a risk difference (hypothesis test)
  • Inputs required are the expected proportion of disease among exposed & unexposed, power, confidence level
Difference between 2 proportions
Scenario 3 – Step 3
• A literature review showed that the proportion of LBW was 16.5% among SCD mothers and 8.3% among normal mothers, with an RR of ~2
Scenario 3 – Step 4
• Substituting the values for a number of scenarios in the software
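A Python sketch of the same "number of scenarios" exercise for the Step 3 inputs (16.5% vs 8.3%). It uses the normal-approximation formula for two proportions with equal groups and, optionally, Fleiss' continuity correction n/4 × (1 + √(1 + 4/(n·|P1 − P2|)))². The helper name and the chosen power levels are assumptions. This sketch does not attempt to reproduce the 331 per group used in Step 5, which depends on the exact settings chosen in the software.

```python
import math
from statistics import NormalDist


def n_two_proportions(p1: float, p2: float, power: float = 0.80,
                      confidence: float = 0.95, continuity: bool = False) -> int:
    """Per-group N to detect p1 vs p2 with a two-sided test, equal groups (hypothetical helper)."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - (1 - confidence) / 2)
    z_b = nd.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    delta = abs(p1 - p2)
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    if continuity:
        # Fleiss' continuity correction for equal group sizes
        n = n / 4 * (1 + math.sqrt(1 + 4 / (n * delta))) ** 2
    return math.ceil(n)


# Step 3 inputs: 16.5% LBW among SCD mothers vs 8.3% among normal mothers
for pw in (0.80, 0.85, 0.90):
    print(pw, n_two_proportions(0.165, 0.083, pw),
          n_two_proportions(0.165, 0.083, pw, continuity=True))
```

Small changes in power or in the choice of correction move the answer by tens of participants per group, so the exact settings should be reported with the final number.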
Scenario 3 – Step 5
• Considering a loss to follow-up of 10%
• Final sample size = 331/0.90 = 368 per group
Scenario 4: Comparison of proportions in a two-group cluster RCT
Scenario 4 – Step 1
• Hypothesis: the proportion of students absent due to ILI will be higher in the control schools than in the schools implementing the handwashing program during a follow-up period of 6 weeks
• It is a cluster RCT with two groups
  • Exposed: handwashing program
  • Non-exposed: no handwashing program
  • Outcome: proportion of ILI absenteeism
  • School is the unit of randomisation
• Estimating a difference between two proportions in a cluster RCT
Scenario 4 – Step 1
• This problem can be visualised in a number of ways:
1. Comparing two proportions (risk difference – hypothesis test using ICC)
2. Comparing two proportions (risk difference – hypothesis test using design effect)
3. Comparing two proportions (risk difference – hypothesis test using coefficient of variation)
Scenario 4 – Step 2
• Method 1: SS formula for comparison of proportions using the design effect
  • Inputs required are the proportion of the outcome in the exp. group & control group, cluster size, DE, power, confidence level
Scenario 4 – Step 2
• Method 2: SS formula for comparison of proportions using the intracluster correlation coefficient
  • Inputs required are the proportion of the outcome in the exp. group & control group, cluster size, ICC, power, confidence level
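Methods 1 and 2 are linked by the usual relation DE = 1 + (m − 1) × ICC for clusters of equal size m: the individually randomised sample size is multiplied by DE and then divided by m to get the number of clusters. A minimal Python sketch, assuming equal cluster sizes. The Step 3 proportions (0.043 vs 0.070) and the cluster size of 40 come from this scenario, while the ICC values and all helper names are hypothetical.

```python
import math
from statistics import NormalDist


def n_individual(p1: float, p2: float, power: float = 0.80, confidence: float = 0.95) -> float:
    """Per-group N for two proportions under individual randomisation (hypothetical helper)."""
    nd = NormalDist()
    z_a, z_b = nd.inv_cdf(1 - (1 - confidence) / 2), nd.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    return (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p1 - p2) ** 2


def design_effect(cluster_size: int, icc: float) -> float:
    """DE = 1 + (m - 1) * ICC for clusters of equal size m."""
    return 1 + (cluster_size - 1) * icc


def cluster_trial_size(n_ind: float, cluster_size: int, icc: float):
    """Return (participants, clusters) per group after inflating n_ind by the design effect."""
    n = math.ceil(round(n_ind * design_effect(cluster_size, icc), 6))
    clusters = math.ceil(n / cluster_size)  # partial clusters cannot be recruited
    return n, clusters


n0 = n_individual(0.043, 0.070)  # Step 3 inputs
for icc in (0.005, 0.01, 0.02):  # hypothetical ICC values
    print(icc, cluster_trial_size(n0, cluster_size=40, icc=icc))
```

Even a small ICC matters when clusters are large. With m = 40, an ICC of 0.01 already inflates the sample size by 39%.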
Scenario 4 – Step 3
• A literature review showed that the incidence of ILI absenteeism was 0.043 in the exp. group and 0.070 in the control group
Scenario 4 – Step 4
• Substituting the values for a number of scenarios in the software
Scenario 4
• Method 3: SS formula for comparison of incidence rates (person-time)
  • Inputs required are the incidence rates (person-time) in the exp. group & control group, coefficient of variation, power, confidence level
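For Method 3, one widely used formula for unmatched cluster trials comparing incidence rates is the one given by Hayes and Bennett: c = 1 + (Zα + Zβ)² [(λ0 + λ1)/y + k²(λ0² + λ1²)] / (λ0 − λ1)² clusters per arm, where y is the person-time per cluster and k is the between-cluster coefficient of variation. The Python sketch below assumes that this formula matches the slide's method. It treats the Step 3 values as rates per student-week, with 40 students followed for 6 weeks per school, and uses k = 0.25. The units, k and the helper name are all hypothetical.

```python
import math
from statistics import NormalDist


def clusters_per_arm_rates(rate0: float, rate1: float, person_time_per_cluster: float,
                           cv: float, power: float = 0.80, confidence: float = 0.95) -> int:
    """Clusters per arm for comparing two incidence rates (hypothetical helper).

    c = 1 + (z_a + z_b)^2 * [(r0 + r1)/y + k^2 (r0^2 + r1^2)] / (r0 - r1)^2
    """
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - (1 - confidence) / 2) + nd.inv_cdf(power)
    variance_part = ((rate0 + rate1) / person_time_per_cluster
                     + cv ** 2 * (rate0 ** 2 + rate1 ** 2))
    return math.ceil(1 + z_sum ** 2 * variance_part / (rate0 - rate1) ** 2)


# Hypothetical units: rates per student-week, 40 students x 6 weeks of follow-up per school
print(clusters_per_arm_rates(0.070, 0.043, person_time_per_cluster=40 * 6, cv=0.25))
```

Setting cv = 0 removes the between-cluster term, which shows how much of the requirement is due to clustering rather than to Poisson variation within schools.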
Scenario 4 – Step 5
• Considering a loss to follow-up of 10%
• Final sample size = 1625/0.90 ≈ 1806 per group (1805.6 rounded up)
• No. of clusters required = 1806/40 ≈ 45.2, i.e. 46 clusters per group (rounded up)
Review
• Why is sample size calculation important?
• What are the five steps to calculate the SS?
• What are some of the common inputs required for sample size formulae?
• How will you select an appropriate source for the inputs of the SS formula?
• How will you relate the SS of your study to its results?
Take-home messages
• A priori sample size calculation is crucial for making valid conclusions
• Follow the stepwise approach
• Sample size estimation does not need to be very accurate, only adequate
• In case of non-significant findings in a study, calculate the power for a deeper understanding
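The advice to calculate power can be sketched by rearranging the two-proportion sample size formula to solve for power. This is a minimal Python sketch under the usual normal approximation. `power_two_proportions` is a hypothetical helper, and the example (20% vs 10% with 60 per group) is invented for illustration. Power computed from the observed effect adds little beyond the p-value. It is usually more informative to plug in the clinically relevant difference that the study was designed to detect.

```python
import math
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n_per_group: int,
                          confidence: float = 0.95) -> float:
    """Approximate power of a two-sided test of p1 vs p2 with n per group (hypothetical helper)."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - (1 - confidence) / 2)
    p_bar = (p1 + p2) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar))          # spread under H0 (pooled)
    se_alt = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))     # spread under H1
    z_beta = (abs(p1 - p2) * math.sqrt(n_per_group) - z_a * se_null) / se_alt
    return nd.cdf(z_beta)  # ignores the negligible opposite tail


# Hypothetical study: 20% vs 10% with 60 participants per group
print(f"power = {power_two_proportions(0.20, 0.10, 60):.2f}")
```

Because this is the sample size formula solved for Zβ, a study sized at exactly N for a given difference has the planned power (e.g. 80%) for that difference.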