anova test assumptions

PROPHET StatGuide: Do your data violate one-way ANOVA assumptions?

http://www.basic.northwestern.edu/statguidefiles/oneway_anova_ass_viol.html

If the populations from which data to be analyzed by a one-way analysis of variance (ANOVA) were sampled violate one or more of the one-way ANOVA test assumptions, the results of the analysis may be incorrect or misleading. For example, if the assumption of independence is violated, then the one-way ANOVA is simply not appropriate, although another test (perhaps a blocked one-way ANOVA) may be appropriate. If the assumption of normality is violated, or outliers are present, then the one-way ANOVA may not be the most powerful test available, and this could mean the difference between detecting a true difference among the population means or not. A nonparametric test or employing a transformation may result in a more powerful test. A potentially more damaging assumption violation occurs when the population variances are unequal, especially if the sample sizes are not approximately equal (unbalanced). Often, the effect of an assumption violation on the one-way ANOVA result depends on the extent of the violation (such as how unequal the population variances are, or how heavy-tailed one or another population distribution is). Some small violations may have little practical effect on the analysis, while other violations may render the one-way ANOVA result uselessly incorrect or uninterpretable. In particular, small or unbalanced sample sizes can increase vulnerability to assumption violations.

Potential assumption violations include:

- Implicit factors: lack of independence within a sample
- Lack of independence: lack of independence between samples
- Outliers: apparent nonnormality by a few data points
- Nonnormality: nonnormality of entire samples
- Unequal population variances
- Patterns in plots of data: detecting assumption violations graphically
- Special problems with small sample sizes
- Special problems with unbalanced sample sizes
- Multiple comparisons: effects of assumption violations on multiple comparison tests

Implicit factors: A lack of independence within a sample is often caused by the existence of an implicit factor in the data. For example, values collected over time may be serially correlated (here time is the implicit factor). If the data are in a particular order, consider the possibility of dependence. (If the row order of the data reflects the order in which the data were collected, an index plot of the data [data value plotted against row number] can reveal patterns in the plot that could suggest possible time effects.)
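A numeric companion to the index plot is the lag-1 autocorrelation of the values taken in row order. The sketch below is only an illustration: the function name and both data sets are hypothetical, and a value far from zero is a hint of serial dependence, not a formal test.

```python
from statistics import mean

def lag1_autocorrelation(values):
    """Lag-1 sample autocorrelation of values taken in collection order."""
    m = mean(values)
    denom = sum((v - m) ** 2 for v in values)
    if denom == 0:
        raise ValueError("values are constant; autocorrelation is undefined")
    # Products of each deviation with the deviation of the next value in order.
    numer = sum((a - m) * (b - m) for a, b in zip(values, values[1:]))
    return numer / denom

# Hypothetical measurements, listed in the order they were collected.
drifting = [10.1, 10.3, 10.2, 10.6, 10.8, 10.7, 11.0, 11.2, 11.1, 11.4]
alternating = [10, 12, 10, 12, 10, 12, 10, 12]

r_drift = lag1_autocorrelation(drifting)
r_alt = lag1_autocorrelation(alternating)
print(f"drifting: {r_drift:.2f}, alternating: {r_alt:.2f}")
```

A steady drift gives a clearly positive value and a strict alternation gives a clearly negative one; either pattern suggests that row order (and so possibly time) is acting as an implicit factor.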

Lack of independence: Whether the samples are independent of each other is generally determined by the structure of the experiment from which they arise. Obviously correlated samples, such as a set of observations over time on the same subjects, are not independent, and such data would be more appropriately tested by a one-way blocked ANOVA or a repeated measures ANOVA. If you are unsure whether your samples are independent, you may wish to consult a statistician or someone who is knowledgeable about the data collection scheme you are using.



Outliers: Values may not be identically distributed because of the presence of outliers. Outliers are anomalous values in the data. Outliers tend to increase the estimate of sample variance, thus decreasing the calculated F statistic for the ANOVA and lowering the chance of rejecting the null hypothesis. They may be due to recording errors, which may be correctable, or they may be due to the sample not being entirely from the same population. Apparent outliers may also be due to the values being from the same, but nonnormal, population. The boxplot and normal probability plot (normal Q-Q plot) may suggest the presence of outliers in the data.
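Boxplots commonly mark as suspected outliers any points lying more than 1.5 interquartile ranges beyond the quartiles. The sketch below applies that convention numerically. The 1.5 multiplier, the quartile method, the function name, and the data are all assumptions made for illustration, and a flagged point is only a candidate for closer inspection.

```python
from statistics import quantiles

def boxplot_outliers(sample, k=1.5):
    """Return values lying more than k interquartile ranges beyond the quartiles."""
    q1, _, q3 = quantiles(sample, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in sample if x < lower or x > upper]

# Hypothetical sample with one anomalous value.
sample = [4.1, 4.4, 4.6, 4.8, 5.0, 5.1, 5.3, 5.5, 9.7]
flagged = boxplot_outliers(sample)
print(flagged)
```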

The F statistic is based on the sample means and the sample variances, each of which is sensitive to outliers. (In other words, neither the sample mean nor the sample variance is resistant to outliers, and thus, neither is the F statistic.) In particular, a large outlier can inflate the overall variance, decreasing the F statistic and thus perhaps eliminating a significant difference. A nonparametric test may be a more powerful test in such a situation. If you find outliers in your data that are not due to correctable errors, you may wish to consult a statistician as to how to proceed.
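The effect of a single outlier on the F statistic can be seen in a small calculation. The sketch below computes the usual one-way F ratio (between-group mean square over within-group mean square) for hypothetical data, once as recorded and once with one value replaced by an anomalous reading. The function name and the numbers are illustrative only.

```python
from statistics import mean

def one_way_f(groups):
    """One-way ANOVA F statistic: between-group MS divided by within-group MS."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical samples; the second set differs only in its last value.
clean = [[10, 11, 12, 11, 10], [13, 14, 15, 14, 13], [16, 17, 18, 17, 16]]
with_outlier = [[10, 11, 12, 11, 10], [13, 14, 15, 14, 13], [16, 17, 18, 17, 40]]

f_clean = one_way_f(clean)
f_outlier = one_way_f(with_outlier)
print(f"F without outlier: {f_clean:.2f}, F with outlier: {f_outlier:.2f}")
```

In this example the outlier inflates the within-group mean square far more than it raises the between-group mean square, so the F statistic falls sharply.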

Nonnormality: The values in a sample may indeed be from the same population, but not from a normal one. Signs of nonnormality are skewness (lack of symmetry) or light-tailedness or heavy-tailedness. The boxplot, histogram, and normal probability plot (normal Q-Q plot), along with the normality test, can provide information on the normality of the population distribution. However, if there are only a small number of data points, nonnormality can be hard to detect. If there are a great many data points, the normality test may detect statistically significant but trivial departures from normality that will have no real effect on the F statistic.

For data sampled from a normal distribution, normal probability plots should approximate straight lines, and boxplots should be symmetric (median and mean together, in the middle of the box) with no outliers.
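How straight a normal probability plot is can be summarized by the correlation between the sorted data and the matching standard normal quantiles. The sketch below assumes the plotting positions (i - 0.5) / n, which is one common convention among several; the helper names and both data sets are hypothetical.

```python
from statistics import NormalDist, mean

def qq_points(sample):
    """Pair each sorted value with a standard normal quantile."""
    n = len(sample)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n are an assumed convention.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(sample)))

def correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: evenly spaced normal quantiles, and a strongly right-skewed set.
symmetric = [NormalDist(50, 5).inv_cdf(p / 21) for p in range(1, 21)]
skewed = [x ** 3 for x in range(1, 21)]

r_symmetric = correlation(qq_points(symmetric))
r_skewed = correlation(qq_points(skewed))
print(f"normal-like: {r_symmetric:.3f}, skewed: {r_skewed:.3f}")
```

A correlation very close to 1 corresponds to a nearly straight plot; the skewed data give a visibly lower value. This is a descriptive aid and does not replace looking at the plot itself.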

The one-way ANOVA's F test will not be much affected even if the population distributions are skewed, but the F test can be sensitive to population skewness if the sample sizes are seriously unbalanced. If the sample sizes are not unbalanced, the F test will not be seriously affected by light-tailedness or heavy-tailedness, unless the sample sizes are small (less than 5), or the departure from normality is extreme (kurtosis less than -1 or greater than 2).
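Sample skewness and kurtosis can be computed directly from central moments. The sketch below assumes that the kurtosis referred to above is excess kurtosis (0 for a normal distribution), which the text does not state explicitly; it uses simple moment estimators without small-sample corrections, and the data are hypothetical.

```python
from statistics import mean

def skewness_and_excess_kurtosis(sample):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(sample)
    m = mean(sample)
    m2 = sum((x - m) ** 2 for x in sample) / n
    m3 = sum((x - m) ** 3 for x in sample) / n
    m4 = sum((x - m) ** 4 for x in sample) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # subtract 3 so a normal distribution scores 0
    return skew, excess_kurtosis

# Hypothetical samples: a flat (light-tailed) one and a heavy-tailed one.
flat = list(range(1, 11))
heavy = [-10, -1, -1, 0, 0, 0, 0, 0, 0, 1, 1, 10]

flat_skew, flat_kurt = skewness_and_excess_kurtosis(flat)
heavy_skew, heavy_kurt = skewness_and_excess_kurtosis(heavy)
print(f"flat:  skew={flat_skew:.2f}, excess kurtosis={flat_kurt:.2f}")
print(f"heavy: skew={heavy_skew:.2f}, excess kurtosis={heavy_kurt:.2f}")
```

With only a handful of points these estimates are noisy, so they are best read alongside the plots, not as a decision rule.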

Robust statistical tests operate well across a wide variety of distributions. A test can be robust for validity, meaning that it provides P values close to the true ones in the presence of (slight) departures from its assumptions. It may also be robust for efficiency, meaning that it maintains its statistical power (the probability that a true violation of the null hypothesis will be detected by the test) in the presence of those departures. The one-way ANOVA's F test is robust for validity against nonnormality, but it may not be the most powerful test available for a given nonnormal distribution, although it is the most powerful test available when its test assumptions are met. In the case of nonnormality, a nonparametric test or employing a transformation may result in a more powerful test.

Unequal population variances: The inequality of the population variances can be assessed by examination of the relative size of the sample variances, either informally (including graphically), or by a robust variance test such as Levene's test. (Bartlett's test is even more sensitive to nonnormality than the one-way ANOVA's F test, and thus should not be used for such testing.) The effect of inequality of variances is mitigated when the sample sizes are equal: The F test is fairly robust against inequality of variances if the sample sizes are equal, although the chance increases of incorrectly reporting a significant difference in the means when none exists. This chance of incorrectly rejecting the null hypothesis is greater when the population variances are very different from each other, particularly if there is one sample variance very much larger than the others.
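A Levene-type statistic can be sketched as a one-way F ratio computed on the absolute deviations of each value from its group's center. The version below centers on the group median (often called the Brown-Forsythe variant), which is a choice made here and not something the text specifies. It returns only the statistic; a P value would require the F distribution with k - 1 and N - k degrees of freedom, which tested statistical libraries provide. All names and data are hypothetical.

```python
from statistics import mean, median, variance

def one_way_f(groups):
    """One-way ANOVA F statistic: between-group MS divided by within-group MS."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def levene_statistic(groups, center=median):
    """F ratio on absolute deviations from each group's center (median by default)."""
    deviations = [[abs(x - center(g)) for x in g] for g in groups]
    return one_way_f(deviations)

# Hypothetical samples: equal spreads, then spreads that grow with the mean.
similar = [[10, 11, 12, 13, 14], [20, 21, 22, 23, 24], [30, 31, 32, 33, 34]]
spread = [[10, 11, 12, 13, 14], [20, 22, 24, 26, 28], [30, 35, 40, 45, 50]]

print("sample variances:", [variance(g) for g in spread])
w_similar = levene_statistic(similar)
w_spread = levene_statistic(spread)
print(f"similar spreads: {w_similar:.2f}, unequal spreads: {w_spread:.2f}")
```

The informal check mentioned above, comparing the sample variances directly, is the first printed line; the statistic summarizes the same information in a form less sensitive to nonnormality than a ratio of variances.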

The effect of inequality of the variances is most severe when the sample sizes are unequal. If the larger samples are associated with the populations with the larger variances, then the F statistic will tend to be smaller than it should be, reducing the chance that the test will correctly identify a significant difference between the means (i.e., making the test conservative). On the other hand, if the smaller samples are associated with the populations with the larger variances, then the F statistic will tend to be greater than it should be, increasing the risk of incorrectly reporting a significant difference in the means when none exists. This chance of incorrectly rejecting the null hypothesis in the case of unbalanced sample sizes can be substantial even when the population variances are not very different from each other.
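The direction of these effects follows from how the within-group mean square, the denominator of the F statistic, pools the sample variances: each variance is weighted by its degrees of freedom, n - 1. The sketch below uses hypothetical sizes and variances to show the pooled value being pulled toward the variance of the larger sample.

```python
def pooled_variance(sizes, variances):
    """Within-group mean square: sample variances weighted by (n - 1)."""
    numerator = sum((n - 1) * v for n, v in zip(sizes, variances))
    return numerator / (sum(sizes) - len(sizes))

variances = (1.0, 9.0)  # hypothetical sample variances of two groups

large_with_large = pooled_variance((5, 20), variances)   # larger sample has larger variance
small_with_large = pooled_variance((20, 5), variances)   # smaller sample has larger variance
balanced = pooled_variance((10, 10), variances)

print(f"larger sample, larger variance:  {large_with_large:.2f}")
print(f"smaller sample, larger variance: {small_with_large:.2f}")
print(f"balanced:                        {balanced:.2f}")
```

A larger denominator shrinks F (the conservative case), while a smaller denominator inflates it; with equal sizes the pooled value is simply the average of the variances.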

Although the effect of unbalanced sample sizes and unequal population variances increases for smaller sample sizes, it does not decrease substantially if the sample sizes are increased without changing the lack of balance in the sample sizes. For this reason, and because equal sample sizes mitigate the effect of unequal population variances, the best course is to keep the sample sizes as equal as possible.

If both nonnormality and unequal variances are present, employing a transformation may be preferable. A nonparametric test like the Kruskal-Wallis test still assumes that the population variances are comparable.
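For reference, the Kruskal-Wallis statistic replaces the data with their ranks in the pooled sample and compares the rank sums of the groups. The sketch below computes the basic H statistic without the correction for ties, which is a simplification assumed here; tied values still receive average ranks. Function names and data are hypothetical, and a P value would need the chi-square distribution with k - 1 degrees of freedom (or exact tables for small samples).

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical samples; the second set replaces the largest value with an extreme one.
h_plain = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
h_extreme = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 900]])
print(f"H = {h_plain:.2f}, H with extreme value = {h_extreme:.2f}")
```

Because only ranks enter the calculation, making the largest value more extreme leaves H unchanged, which is why such a test can be more powerful than the F test when outliers are present.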

Patterns in plot of data: The plot of each sample's values against its mean (or its sample ID) will consist of vertical "stacks" of data points, one stack for each unique sample mean value. If the assumptions for the samples' population distributions are correct, the stacks should be about the same length. Outliers may appear as anomalous points in the graph.

A fan pattern like the profile of a megaphone, with a noticeable flare either to the right or to the left as shown in the picture (one or more of the "stacks" of data points is much longer than the others), suggests that the variance in the values increases in the direction the fan pattern widens (usually as the sample mean increases), and this in turn suggests that a transformation may be needed.
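When the spread grows roughly in proportion to the mean, a logarithmic transformation is one common candidate. The text does not name a specific transformation, so the choice of log base 10 below is an assumption, and the data are hypothetical. The sketch compares the group standard deviations before and after transforming.

```python
import math
from statistics import stdev

# Hypothetical samples whose spread grows in proportion to their mean.
groups = {
    "low": [8, 10, 12],
    "mid": [80, 100, 120],
    "high": [800, 1000, 1200],
}

raw_sd = {name: stdev(values) for name, values in groups.items()}
log_sd = {name: stdev([math.log10(x) for x in values]) for name, values in groups.items()}

for name in groups:
    print(f"{name:>4}: raw SD = {raw_sd[name]:8.2f}, log10 SD = {log_sd[name]:.4f}")
```

On the raw scale the standard deviations differ by a factor of 100; on the log scale they are essentially equal, so the fan pattern would disappear. The transformation only applies to positive values, and conclusions then refer to the transformed scale.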

Side-by-side boxplots of the samples can also reveal lack of homogeneity of variances if some boxplots are much longer than others, and reveal suspected outliers.



Special problems with small sample sizes: If one or more of the sample sizes is small, it may be difficult to detect assumption violations. With small samples, assumption violations such as nonnormality or inequality of variances are difficult to detect even when they are present. Also, with small sample size(s) the one-way ANOVA's F test offers less protection against violation of assumptions.

Even if none of the test assumptions are violated, a one-way ANOVA with small sample sizes may not have sufficient power to detect any significant difference among the samples, even if the means are in fact different. The power depends on the error variance, the selected significance (alpha) level of the test, and the sample size. Power decreases as the variance increases, decreases as the significance level is decreased (i.e., as the test is made more stringent), and increases as the sample size increases. With very small samples, even samples from populations with very different means may not produce a significant one-way ANOVA F test statistic unless the sample variance is small. If a statistical significance test with small sample sizes produces a surprisingly nonsignificant P value, then a lack of power may be the reason. The best time to avoid such problems is in the design stage of an experiment, when appropriate minimum sample sizes can be determined, perhaps in consultation with a statistician, before data collection begins.
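The dependence on variance and sample size can be made concrete through the noncentrality parameter. For a balanced design with n observations per group, common error variance sigma^2, and true means mu_i, the F statistic follows a noncentral F distribution with noncentrality n * sum((mu_i - mu_bar)^2) / sigma^2, and for a fixed significance level a larger value means greater power. The sketch below computes only this parameter for hypothetical inputs; turning it into a power value requires the noncentral F distribution from a statistical library.

```python
def noncentrality(means, error_variance, n_per_group):
    """Noncentrality parameter of the one-way ANOVA F test (balanced design)."""
    grand_mean = sum(means) / len(means)
    effect = sum((m - grand_mean) ** 2 for m in means)
    return n_per_group * effect / error_variance

true_means = (10.0, 12.0, 14.0)  # hypothetical population means

baseline = noncentrality(true_means, error_variance=4.0, n_per_group=5)
more_data = noncentrality(true_means, error_variance=4.0, n_per_group=20)
more_noise = noncentrality(true_means, error_variance=16.0, n_per_group=5)

print(f"baseline: {baseline}, larger samples: {more_data}, larger variance: {more_noise}")
```

Quadrupling the sample size quadruples the parameter, and quadrupling the error variance divides it by four, matching the qualitative statements above.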

Special problems with unbalanced sample sizes: The one-way ANOVA test is not too sensitive to inequality of variances if the sample sizes are equal. If the sample sizes are not approximately equal, and especially if the larger sample variances are associated with the smaller sample sizes, then the calculated F statistic may be dominated by the sample variances for the larger samples, so that the test is less likely to correctly identify significant differences in the means if the larger samples are associated with the larger population variances, and more likely to report nonexistent differences in the means if the smaller samples are associated with the larger population variances. Unbalanced sample sizes also increase any effect due to nonnormality, and require adjustments to be made in calculating multiple comparisons tests.

Multiple comparisons: In general, the multiple comparisons tests will be robust in those situations when the one-way ANOVA's F test is robust, and will be subject to the same potential problems with unequal variances, particularly when the sample sizes are unequal. As with the one-way ANOVA itself, the best protection against the effects of possible assumption violations is to employ equal sample sizes. Unequal variances may make individual comparisons of means inaccurate, because the multiple comparison techniques rely on a pooled estimate for the variance, based on the assumption that the sample variances are equal.

Ideally, the sample sizes will be equal for all pairwise multiple comparison tests. When they are not, an adjustment must be made to the calculations. The Tukey-Kramer adjustment (based on the harmonic mean of each pair's sample sizes), which Prophet uses, may be conservative (that is, it may be less likely to flag means as different than the nominal significance level would suggest), but in general performs well. An alternative procedure is to use the harmonic mean of all the sample sizes for all the pairwise comparisons. This has the disadvantage that the actual significance level of the test is more often different from the nominal significance level than is the case with the Tukey-Kramer adjustment; worse, the actual significance level of the test may be greater than the nominal significance level, meaning that the test is more likely to incorrectly flag a mean difference as significant.
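The two adjustments differ only in which effective sample size enters the denominator of the studentized range statistic, sqrt(MSE / n_eff), where MSE is the within-group mean square. With n_eff set to the harmonic mean of a pair's sizes this equals the Tukey-Kramer form sqrt((MSE / 2) * (1/n_i + 1/n_j)). The sketch below compares the per-pair and overall choices; the group labels, sizes, and MSE are hypothetical, and this is not a reproduction of Prophet's implementation.

```python
from itertools import combinations
from math import sqrt
from statistics import harmonic_mean

def studentized_denominator(ms_within, n_effective):
    """Denominator of the studentized range statistic for one comparison."""
    return sqrt(ms_within / n_effective)

sizes = {"A": 4, "B": 10, "C": 25}  # hypothetical unbalanced sample sizes
ms_within = 6.0                      # hypothetical within-group mean square

overall_n = harmonic_mean(list(sizes.values()))
per_pair = {}
overall = {}
for a, b in combinations(sizes, 2):
    pair_n = harmonic_mean([sizes[a], sizes[b]])
    per_pair[(a, b)] = studentized_denominator(ms_within, pair_n)
    overall[(a, b)] = studentized_denominator(ms_within, overall_n)
    print(f"{a}-{b}: per-pair {per_pair[(a, b)]:.3f}, overall {overall[(a, b)]:.3f}")
```

For the pair with the two smallest samples, the overall harmonic mean gives a smaller denominator than the per-pair one, so the same mean difference yields a larger test statistic. That is consistent with the warning above that the alternative procedure can flag differences too readily.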


Last modified: March 17, 1997

© 1996 BBN Corporation. All rights reserved.
