stat 13, intro. to statistical methods for the life and health …frederic/13/sum17/day09.pdf ·...
TRANSCRIPT
Stat 13, Intro. to Statistical Methods for the Life and Health Sciences.
0.Handinhw4.1.Calculatingcorrelation.2.Slopeoftheregressionline.3.Goodnessoffit.4.Commonproblemswithregression.
a.Inferringcausation.b.Extrapolation.c.Curvature.
5.Testingsignificanceofcorrelationorslope.6.ANOVAandF-test,ch9.7.Reviewandexamples.
ThefinalisThuinclassandwillbeonch 1-7,10,andatmost1questiononch9.BringaPENCILandCALCULATORandanybooksornotesyouwant.Nocomputers.http://www.stat.ucla.edu/~frederic/13/sum17.
1
1.Calculatingcorrelation,r.
ρ =rho=correlationofthepopulation.SupposethereareNpeopleinthepopulation,X=temperature,Y=heartrate,themeanandsd oftempinthepop.areµ" and𝜎" ,andthepop.meanandsd ofheartrateareµ$and𝜎$.
ρ =*+∑ "-./0
10+23*
$-./414
.
Givenasampleofsizen,weestimateρ using
r= *5.*
∑"-.060
523*
$-.464
.
ThisisinAppendixA.
2
2.Slopeofregressionline.• Suppose𝑦8 =a+bx istheregressionline.
• Theslopeboftheregressionlineisb=r6460.
Thisisusuallythethingofprimaryinteresttointerpret,asthepredictedincreaseinyforeveryunitincreaseinx.• Bewareofassumingcausationthough,esp.withobservationalstudies.Bewaryofextrapolationtoo.
• Theintercepta=$ - b " .
• TheSDoftheresidualsis 1 − 𝑟<� 𝑠$.Thisisagoodestimateofhowmuchtheregression
predictionswilltypicallybeoffby.
3.Howwelldoesthelinefit?• 𝑟< isameasureoffit.Itindicatestheamountofscatteraroundthebestfittingline.
• 1 − 𝑟<� 𝑠$isusefulasameasureofhowfaroffpredictionswouldhavebeenonaverage.• Residualplotscanindicatecurvature,outliers,orheteroskedasticity.
• Notethatregressionresidualshavemeanzero,whetherthelinefitswellorpoorly.
-2 -1 0 1 2 3 4 5
-4-2
02
46
8
x
y
-2 -1 0 1 2 3 4 5-10
-8-6
-4-2
02
x
residual
• Heteroskedasticity:whenthevariabilityinyisnotconstantasxvaries.
4.Commonproblemswithregression.• a.Correlationisnotcausation.Especiallywithobservationaldata.
Commonproblemswithregression.
Commonproblemswithregression.Holmes and Willett (2004) reviewed all prospective studies on fat consumption and breast cancer with at least 200 cases of breast cancer. "Not one study reported a significant positive association with total fat intake.... Overall, no association was observed between intake of total, saturated, monounsaturated, or polyunsaturated fat and risk for breast cancer."
They also state "The dietary fat hypothesis is largely based on the observation that national per capita fat consumption is highly correlated with breast cancer mortality rates. However, per capita fat consumption is highly correlated with economic development. Also, low parity and late age at first birth, greater body fat, and lower levels of physical activity are more prevalent in Western countries, and would be expected to confound the association with dietary fat."
Commonproblemswithregression.• b.Extrapolation.
Commonproblemswithregression.• b.Extrapolation.• Oftenresearchersextrapolatefromhighdosestolow.
Commonproblemswithregression.• b.Extrapolation.Therelationshipcanbenonlinearthough.Researchersalsooftenextrapolatefromanimalstohumans.
Zaichkina etal.(2004)onhamsters
Commonproblemswithregression.• c.Curvature.Thebestfittinglinemightfitpoorly.Portetal.(2005).
Commonproblemswithregression.• c.Curvature.Thebestfittinglinemightfitpoorly.Wongetal.(2011).
Commonproblemswithregression.• d.Statisticalsignificance.Couldtheobservedcorrelationjustbeduetochancealone?
5.InferencefortheRegressionSlope:Theory-BasedApproachSection10.5
Dostudentswhospendmoretimeinnon-academicactivitiestendtohavelowerGPAs?Example10.4
Dostudentswhospendmoretimeinnon-academicactivitiestendtohavelowerGPAs?
• Thesubjectswere34undergraduatestudentsfromtheUniversityofMinnesota.
• Theywereaskedquestionsabouthowmuchtimetheyspentinactivitieslikework,watchingTV,exercising,non-academiccomputeruse,etc.aswellaswhattheircurrentGPAwas.
• WearegoingtotesttoseeifthereisanegativeassociationbetweenthenumberofhoursperweekspentonnonacademicactivitiesandGPA.
Hypotheses• NullHypothesis:ThereisnoassociationbetweenthenumberofhoursstudentsspendonnonacademicactivitiesandstudentGPAinthepopulation.
• AlternativeHypothesis:ThereisanegativeassociationbetweenthenumberofhoursstudentsspendonnonacademicactivitiesandstudentGPAinthepopulation.
DescriptiveStatistics• GPAU = 3.60 − 0.0059(nonacademichours).• Istheslopsignificantlydifferentfrom0?
ShuffletoDevelopNullDistribution
• Wearegoingtoshufflejustaswedidwithcorrelationtodevelopanulldistribution.
• Theonlydifferenceisthatwewillbecalculatingtheslopeeachtimeandusingthatasourstatistic.
• atestofassociationbasedonslopeisequivalenttoatestofassociationbasedonacorrelationcoefficient.
Betavs Rho• Testingtheslopeoftheregressionlineisequivalenttotestingthecorrelation (samep-value,butobviouslydifferentconfidenceintervalssincethestatisticsaredifferent)
• Hencethesehypothesesareequivalent.• Ho:β =0Ha:β <0(Slope)• Ho:ρ =0Ha:ρ < 0(Correlation)
• Sampleslope(b)Population(β:beta)• Samplecorrelation(r)Population(ρ:rho)
• Whenwedothetheorybasedtest,wewillbeusingthet-statisticwhichcanbecalculatedfromeithertheslopeorcorrelation.
Introduction• Ournulldistributionsareagainbell-shapedandcenteredat0(foreithercorrelationorslopeasourstatistic).
The book on p549 finds a p value of 3.3% by simulation.
ValidityConditions• Undercertainconditions,theory-basedinferenceforcorrelationorslopeoftheregressionlineuset-distributions.
• Wecouldusesimulationsorthetheory-basedmethodsfortheslopeoftheregressionline.
• Wewouldgetthesamep-valueifweusedcorrelationasourstatistic.
PredictingHeartRatefromBodyTemperatureExample10.5A
HeartRateandBodyTemp• Earlierwelookedattherelationshipbetweenheartrateandbodytemperaturewith130healthyadults
• PredictedHeartRate = −166.3 + 2.44 Temp• r=0.257
HeartRateandBodyTemp
• Wetestedtoseeifwehadconvincingevidencethatthereisapositiveassociationbetweenheartrateandbodytemperatureinthepopulationusingasimulation-basedapproach.(Wewillmakeit2-sidedthistime.)
• NullHypothesis:Thereisnoassociationbetweenheartrateandbodytemperatureinthepopulation. β=0
• AlternativeHypothesis:Thereisanassociationbetweenheartrateandbodytemperatureinthepopulation. β≠0
HeartRateandBodyTemp
Wegetaverysmallp-value(0.0036).Anythingasextremeasourobservedslopeof2.44happeningbychanceisveryrare
HeartRateandBodyTemp
• Wecanalsoapproximatea95%confidenceintervalobservedstatistic+ multiplierx SE2.44± 1.96x 0.842=0.790to4.09
• Whatdoesthismean?We’re95%confidentthat,inthepopulationofhealthyadults,each1° increaseinbodytempisassociatedwithanincreaseinheartrateofbetween0.790to4.09beatsperminute.
HeartRateandBodyTemp
• Thetheory-basedapproachshouldworkwellsincethedistributionoftheslopeshasanicebellshape
• Alsocheckthescatterplot
HeartRateandBodyTemp
• Wewillusethet-statistictogetourtheory-basedp-value.
• Wewillfindatheory-basedconfidenceintervalfortheslope.
• Onp554,thebooknotestheformulat= cdefgheg
�.
• Herethetstatisticis2.97.• Thep-valueis0.36%.Sothecorrelationisstatisticallysignificantlygreaterthanzero.
SmokingandDrinkingExample10.5B
ValidityConditionsRememberourvalidityconditionsfortheory-basedinferenceforslopeoftheregressionequation.
1. Thescatterplotshouldfollowalineartrend.2. Thereshouldbeapproximatelythesamenumber
ofpointsaboveandbelowtheregressionline(symmetry).
3. Thevariabilityofverticalslicesofthepointsshouldbesimilar.Thisiscalledhomoskedasticity.
ValidityConditions• Let’slookatsomescatterplotsthatdonotmeettherequirements.
SmokingandDrinkingTherelationshipbetweennumberofdrinksandcigarettesperweekforarandomsampleofstudentsatHopeCollege.
Thedotat(0,0)represents524students
Aretheconditionsmet?Hardtosay.Thebooksaysno.
SmokingandDrinking• Whentheconditionsarenotmet,applyingsimulation-basedinferenceispreferabletotheory-basedt-testsandCIs.
ValidityConditions
• Whatdoyoudowhenvalidityconditionsaren’tmetfortheory-basedinference?• Usethesimulated-basedapproach.
• Anotherstrategyisto“transform”thedataonadifferentscalesoconditionsaremet.• Thelogarithmicscaleiscommon.
• Onecanalsofitadifferentcurve,notnecessarilyaline.
6.ANOVAandF-test.
Section9.2
ANOVA• ANOVAstandsforANalysis OfVAriance.• Usefulwhencomparingmorethan2means.• IfIhave2 meanstocompare,I justlookattheirdifferencetomeasurehowfaraparttheyare.
• SupposeI wantedtocomparethreemeans.IhavethemeanforgroupA,themeanforgroupB,andthemeanforgroupC.
Fteststatistic
• TheanalysisofvarianceF teststatisticis:
groupsy within variabilitgroupsbetween y variabilit
=F
• Thisissimilartothet-statisticwhenwewerecomparingjusttwomeans. 𝑡 = "̅d."̅g
kdg
hdlkd
g
hg
�
RecallingAmbiguousProseExample9.2
ComprehensionExample
(Don’t follow along in your book or look ahead on the PowerPoint until after I read you the passage.) • Studentswerereadanambiguousprosepassageunderoneofthefollowingconditions:• Studentsweregivenapicturethatcouldhelptheminterpretthepassagebefore theyheardit.
• Studentsweregiventhepictureafter theyheardthepassage.
• Studentswerenot shownanypicturebeforeorafterhearingthepassage.
• Theywerethenaskedtoevaluatetheircomprehensionofthepassageona1to7scale.
ComprehensionExample• ThisexperimentisapartialreplicationdoneatHopeCollegeofastudy
donebyBransfordandJohnson(1972).• Studentswererandomlyassignedtooneofthe3groups.• Listentothepassageandseeifitmakessense.Wouldapicturehelp?Iftheballoonspopped,thesoundwouldn’tbeabletocarrysinceeverythingwouldbetoofarawayfromthecorrectfloor.Aclosedwindowwouldalsopreventthesoundfromcarrying,sincemostbuildingstendtobewellinsulated.Sincethewholeoperationdependsonasteadyflowofelectricity,abreakinthemiddleofthewirewouldalsocauseproblems.Ofcourse,thefellowcouldshout,butthehumanvoiceisnotloudenoughtocarrythatfar.Anadditionalproblemisthatastringcouldbreakontheinstrument.Thentherecouldbenoaccompanimenttothemessage.Itisclearthatthebestsituationwouldinvolvelessdistance.Thentherewouldbefewerpotentialproblems.Withfacetofacecontact,theleastnumberofthingscouldgowrong.
Hypotheses
• Null:Inthepopulationthereisnoassociationbetweenwhetherorwhenapicturewasshownandcomprehensionofthepassage
• Alternative:Inthepopulationthereisanassociationbetweenwhetherandwhenapicturewasshownandcomprehensionofthepassage
Hypotheses
• Null: All three of the long term mean comprehension scores are the same.
µnopicture =µpicturebefore =µpictureafter
• Alternative: At least one of the mean comprehension scores is different.
RecallScore• Studentsratedtheircomprehension,andtheresearchersalsohadthestudentsrecallasmanyideasfromthepassageastheycould.Theywerethengradedonwhattheycouldrecallandtheresultsareshown.
ValidityConditions• Justaswiththesimulation-basedmethod,weareassumingwehaveindependentgroups.
• TwoextraconditionsmustbemettousetraditionalANOVA:• Normality:Ifsamplesizesaresmallwithineachgroup,datashouldn’tbeveryskewed.Ifitis,usesimulationapproach.(Samplesizesofatleast20isagoodguidelineformeans.)
• Equalvariation:Largeststandarddeviationshouldbenomorethantwicethevalueofthesmallest.
ANOVAOutput• ThisisthekindofoutputyouwouldseeinmoststatisticspackageswhendoingANOVA.
• Thevariabilitybetweenthegroupsismeasuredbythemeansquaretreatment(40.02).
• Thevariabilitywithinthegroupsismeasuredbythemeansquareerror(3.16).
• TheFstatisticis40.02/3.16=12.67.
Conclusion• Sincewehaveasmallp-valuewehavestrongevidenceagainstthenullandcanconcludeatleastoneofthelong-runmeanrecallscoresisdifferent.
Reviewlist.1.MeaningofSD. 19.Randomsamplingandrandom2.Parametersandstatistics. assignment.3.Zstatisticforproportions. 20.TwoproportionCIsandtesting.4.Simulationandmeaningofpvalues. 21.IQRand5number summaries.5.SEforproportions. 22.CIs for 2means andtesting.6.Whatinfluencespvalues. 23.Paired data.7.CLTandvalidityconditionsfortests. 24.Placeboeffect,adhererbias,8.1-sidedand2-sidedtests. andnonresponsebias.9.Rejectthenullvs.acceptthealternative. 25.Predictionandcausation.10.Samplingandbias. 26.Multipletestingandpublicationbias11.Significancelevel. 27.Regression.12.TypeI,typeIIerrors,andpower. 28.Correlation.13.CIsforaproportion. 29.Calculate&interpreta&b.14.CIsforamean. 30.Goodnessoffitforregression.15.Marginoferror. 31.Commonregressionproblems16.Practicalsignificance. (causation,extrapolation,curvature,heterskedasticity).17.Confounding. 32.ANOVAandF-test.18.Observationalstudiesandexperiments.
exampleproblems.
Supposethatamongasampleof100adultsinagiventown,thecorrelationbetweenheight(inches)andweight(lbs.)is0.82,andthemeanheightis65inches,thesd ofheightis5inches,themeanweightis160lbs.,andthesd ofweightis40lbs.
1.Whatdoesthecorrelationof0.82imply?a.82%ofthevariationinweightisexplainedbyheight.b.Thetypicalvariationinpeople'sheightsis82%aslargeasthetypicalvariationintheirweights.c.Thereisstrongassociationbetweenheightandweightinthissample.d.Foreveryinchofincreaseinone'sheight,wewouldpredicta0.82lb.increaseinweight.e.Ifapersonweighs100pounds,thenwetypicallywouldexpectthepersontobeabout82inchestall.
exampleproblems.
Supposethatamongasampleof100adultsinagiventown,thecorrelationbetweenheight(inches)andweight(lbs.)is0.82,andthemeanheightis65inches,thesd ofheightis5inches,themeanweightis160lbs.,andthesd ofweightis40lbs.
1.Whatdoesthecorrelationof0.82imply?a.82%ofthevariationinweightisexplainedbyheight.b.Thetypicalvariationinpeople'sheightsis82%aslargeasthetypicalvariationintheirweights.c.Thereisstrongassociationbetweenheightandweightinthissample.d.Foreveryinchofincreaseinone'sheight,wewouldpredicta0.82lb.increaseinweight.e.Ifapersonweighs100pounds,thenwetypicallywouldexpectthepersontobeabout82inchestall.
exampleproblems.
Supposethatamongasampleof100adultsinagiventown,thecorrelationbetweenheight(inches)andweight(lbs.)is0.82,andthemeanheightis65inches,thesd ofheightis5inches,themeanweightis160lbs.,andthesd ofweightis40lbs.
2.Whatistheestimatedslope,inlbs/inch,oftheregressionlineforpredictingweightfromheight?a.6.56.b.7.12.c.8.04.d.9.92.e.10.2.f.11.4.
exampleproblems.
Supposethatamongasampleof100adultsinagiventown,thecorrelationbetweenheight(inches)andweight(lbs.)is0.82,andthemeanheightis65inches,thesd ofheightis5inches,themeanweightis160lbs.,andthesd ofweightis40lbs.
2.Whatistheestimatedslope,inlbs/inch,oftheregressionlineforpredictingweightfromheight?a.6.56.b.7.12.c.8.04.d.9.92.e.10.2.f.11.4.
rsy/sx =.82x 40/5=6.56.
exampleproblems.
Supposethatamongasampleof100adultsinagiventown,thecorrelationbetweenheight(inches)andweight(lbs.)is0.82,andthemeanheightis65inches,thesd ofheightis5inches,themeanweightis160lbs.,andthesd ofweightis40lbs.
3.Howmuchwouldapredictionusingthisregressionlinetypicallybeoffby?a.12.7lbs. b.13.5lbs. c.14.4lbs. d.20.2lbs.e.22.9lbs.
exampleproblems.
Supposethatamongasampleof100adultsinagiventown,thecorrelationbetweenheight(inches)andweight(lbs.)is0.82,andthemeanheightis65inches,thesd ofheightis5inches,themeanweightis160lbs.,andthesd ofweightis40lbs.
3.Howmuchwouldapredictionusingthisregressionlinetypicallybeoffby?a.12.7lbs. b.13.5lbs. c.14.4lbs. d.20.2lbs.e.22.9lbs.
√(1-r2)sy =√(1-.822)x40=22.9.
exampleproblems.
Supposethatamongasampleof100adultsinagiventown,thecorrelationbetweenheight(inches)andweight(lbs.)is0.82,andthemeanheightis65inches,thesd ofheightis5inches,themeanweightis160lbs.,andthesd ofweightis40lbs.
4.Ifweweretorandomlytakeoneadultfromthissample,howmuchwouldhis/herheighttypicallydifferfrom65by?a.0.05in.b.0.1in.c.0.5in.d.1.0in.e.2.5in.f.5.0in.
exampleproblems.
Supposethatamongasampleof100adultsinagiventown,thecorrelationbetweenheight(inches)andweight(lbs.)is0.82,andthemeanheightis65inches,thesd ofheightis5inches,themeanweightis160lbs.,andthesd ofweightis40lbs.
4.Ifweweretorandomlytakeoneadultfromthissample,howmuchwouldhis/herheighttypicallydifferfrom65by?a.0.05in.b.0.1in.c.0.5in.d.1.0in.e.2.5in. f.5.0in.
Supposethatamongasampleof100adultsinagiventown,thecorrelationbetweenheight(inches)andweight(lbs.)is0.82,andthemeanheightis65inches,themedianheightis64.5inches,thesd ofheightis5inches,themeanweightis160lbs.,andthesd ofweightis40lbs.
5.Whyshouldn'tonetrustthisregressionlinetopredicttheweightofsomeonewhois25inchestall?a.Thesamplesizeisinsufficientlylarge.b.ThesampleSDofweightistoosmall.c.Thevalueof25inchesistoofaroutsidetherangeofmostobservations.d.ThecorrelationoftheANOVAisat-testconfidenceintervalwithstatisticalsignificance.e.Thedatacomefromanobservationalstudy,sotheremaybeconfoundingfactors.f.Theheightvaluesareheavilyrightskewed,sothepredictionerrorsarelarge.
Supposethatamongasampleof100adultsinagiventown,thecorrelationbetweenheight(inches)andweight(lbs.)is0.82,andthemeanheightis65inches,themedianheightis64.5inches,thesd ofheightis5inches,themeanweightis160lbs.,andthesd ofweightis40lbs.
5.Whyshouldn'tonetrustthisregressionlinetopredicttheweightofsomeonewhois25inchestall?a.Thesamplesizeisinsufficientlylarge.b.ThesampleSDofweightistoosmall.c.Thevalueof25inchesistoofaroutsidetherangeofmostobservations.d.ThecorrelationoftheANOVAisat-testconfidenceintervalwithstatisticalsignificance.e.Thedatacomefromanobservationalstudy,sotheremaybeconfoundingfactors.f.Theheightvaluesareheavilyrightskewed,sothepredictionerrorsarelarge.
Supposethatamongasampleof100adultsinagiventown,thecorrelationbetweenheight(inches)andweight(lbs.)is0.82,andthemeanheightis65inches,themedianheightis64.5inches,thesd ofheightis5inches,themeanweightis160lbs.,andthesd ofweightis40lbs.Theestimatedslopeinpredictingweightfromheightis6.56lbs/in.
6.Howshouldoneinterprettheestimatedslopeof6.56?a.Eachextrainchyougrowcausesyoutoincreaseyourweightby6.56lbs onaverage.b.Eachextralb.youweighcausesyoutogrow6.56inches.c.TheamountofweightAmericansaverageis6.56standarderrorsabovethemean.d.TheZ-scorecorrespondingtothecorrelationbetweenheightandweightis6.56.e.Foreachextrainchtalleryouare,yourpredictedweightincreasesby6.56lbs.f.Theproportionofvarianceinweightexplainedbytheregressionequationis6.56%.
Supposethatamongasampleof100adultsinagiventown,thecorrelationbetweenheight(inches)andweight(lbs.)is0.82,andthemeanheightis65inches,themedianheightis64.5inches,thesd ofheightis5inches,themeanweightis160lbs.,andthesd ofweightis40lbs.Theestimatedslopeinpredictingweightfromheightis6.56lbs/in.
6.Howshouldoneinterprettheestimatedslopeof6.56?a.Eachextrainchyougrowcausesyoutoincreaseyourweightby6.56lbs onaverage.b.Eachextralb.youweighcausesyoutogrow6.56inches.c.TheamountofweightAmericansaverageis6.56standarderrorsabovethemean.d.TheZ-scorecorrespondingtothecorrelationbetweenheightandweightis6.56.e.Foreachextrainchtalleryouare,yourpredictedweightincreasesby6.56lbs.f.Theproportionofvarianceinweightexplainedbytheregressionequationis6.56%.
Supposearesearcherstudyingacne,oroilyskin,takesasimplerandomsampleof400Americansage21-30andcallsthemgroupA,andasimplerandomsampleof200Americansage31-40andcallsthemgroupB.IngroupA,88peoplehaveacne,andingroupB,30peoplehaveacne.
7.Whatisthepooledsamplepercentagewithacne,inbothgroupscombined?a.16.1%.b.17.2%.c.18.4%.d.19.7%.e.21.2%.f.23.0%.
Supposearesearcherstudyingacne,oroilyskin,takesasimplerandomsampleof400Americansage21-30andcallsthemgroupA,andasimplerandomsampleof200Americansage31-40andcallsthemgroupB.IngroupA,88peoplehaveacne,andingroupB,30peoplehaveacne.
7.Whatisthepooledsamplepercentagewithacne,inbothgroupscombined?a.16.1%.b.17.2%.c.18.4%.d.19.7%. e.21.2%.f.23.0%.
118/600=19.7%.
Supposearesearcherstudyingacne,oroilyskin,takesasimplerandomsampleof400Americansage21-30andcallsthemgroupA,andasimplerandomsampleof200Americansage31-40andcallsthemgroupB.IngroupA,88peoplehaveacne,andingroupB,30peoplehaveacne.
8.Usingthispooledsamplepercentage,underthenullhypothesisthatthetwogroupshavethesameacnerate,whatisthestandarderrorforthedifferencebetweenthetwopercentages?a.1.42%. b.1.88%. c.2.02%. d.2.99%.e.3.08%. f.3.44%.
Supposearesearcherstudyingacne,oroilyskin,takesasimplerandomsampleof400Americansage21-30andcallsthemgroupA,andasimplerandomsampleof200Americansage31-40andcallsthemgroupB.IngroupA,88peoplehaveacne,andingroupB,30peoplehaveacne.
8.Usingthispooledsamplepercentage,underthenullhypothesisthatthetwogroupshavethesameacnerate,whatisthestandarderrorforthedifferencebetweenthetwopercentages?a.1.42%. b.1.88%. c.2.02%. d.2.99%.e.3.08%. f.3.44%.
√(.197*(1-.197)/400+.197*(1-.197)/200)=.0344.
Supposearesearcherstudyingacne,oroilyskin,takesasimplerandomsampleof400Americansage21-30andcallsthemgroupA,andasimplerandomsampleof200Americansage31-40andcallsthemgroupB.IngroupA,88peoplehaveacne,andingroupB,30peoplehaveacne.
9.WhatistheZstatisticforthedifferencebetweenthetwogrouppercentages?a.1.02. b.1.23. c.1.55. d.1.88. e.2.03. f.2.43.
Supposearesearcherstudyingacne,oroilyskin,takesasimplerandomsampleof400Americansage21-30andcallsthemgroupA,andasimplerandomsampleof200Americansage31-40andcallsthemgroupB.IngroupA,88peoplehaveacne,andingroupB,30peoplehaveacne.
9.WhatistheZstatisticforthedifferencebetweenthetwogrouppercentages?a.1.02. b.1.23. c.1.55. d.1.88. e.2.03. f.2.43.
(88/400– 30/200)÷ 0.0344=2.03.
Supposearesearcherstudyingacne,oroilyskin,takesasimplerandomsampleof400Americansage21-30andcallsthemgroupA,andasimplerandomsampleof200Americansage31-40andcallsthemgroupB.IngroupA,88peoplehaveacne,andingroupB,30peoplehaveacne.
10.Usingtheunpooled standarderrorforthedifferencebetweenthetwopercentages,finda95%confidenceintervalforthepercentagewithacneamongthoseage21-30minusthepercentageofacneamongthoseage31-40.a.7%+/- 5.02%.b.7%+/- 5.54%.c.7%+/- 5.92%.d.7%+/- 6.03%.e.7%+/- 6.41%.
Supposearesearcherstudyingacne,oroilyskin,takesasimplerandomsampleof400Americansage21-30andcallsthemgroupA,andasimplerandomsampleof200Americansage31-40andcallsthemgroupB.IngroupA,88peoplehaveacne,andingroupB,30peoplehaveacne.
10.Usingtheunpooled standarderrorforthedifferencebetweenthetwopercentages,finda95%confidenceintervalforthepercentagewithacneamongthoseage21-30minusthepercentageofacneamongthoseage31-40.a.7%+/- 5.02%.b.7%+/- 5.54%.c.7%+/- 5.92%.d.7%+/- 6.03%.e.7%+/- 6.41%.
88/400– 30/200+/- 1.96*√(88/400*(1-88/400)/400+30/200*(1-30/200)/200)=7%+/- 6.41%.
Supposearesearcherstudyingacne,oroilyskin,takesasimplerandomsampleof400Americansage21-30andcallsthemgroupA,andasimplerandomsampleof200Americansage31-40andcallsthemgroupB.IngroupA,88peoplehaveacne,andingroupB,30peoplehaveacne.
11.Whatcanweconcludefromthe95%CIintheproblemabove?a.Thereisnostatisticallysignificantdifferenceinthepercentagewithacneamongthoseage21-30andthoseage31-40.b.Thereisstatisticallysignificantcorrelationbetweenthesamplesizeandtheeffectsizefortheconfoundingfactorsofarandomizedcontrolledexperiment.c.Ofthosewithacne,thereisnostatisticallysignificantdifferencebetweenthepercentagewhoareage21-30andthepercentagewhoare31-40.d.Astatisticallysignificantlyhigherpercentageofpeopleage21-30haveacnethanthoseage31-40.e.Thesamplesizesaretoosmallforaconclusiontobevalidhere.
Supposearesearcherstudyingacne,oroilyskin,takesasimplerandomsampleof400Americansage21-30andcallsthemgroupA,andasimplerandomsampleof200Americansage31-40andcallsthemgroupB.IngroupA,88peoplehaveacne,andingroupB,30peoplehaveacne.
11.Whatcanweconcludefromthe95%CIintheproblemabove?a.Thereisnostatisticallysignificantdifferenceinthepercentagewithacneamongthoseage21-30andthoseage31-40.b.Thereisstatisticallysignificantcorrelationbetweenthesamplesizeandtheeffectsizefortheconfoundingfactorsofarandomizedcontrolledexperiment.c.Ofthosewithacne,thereisnostatisticallysignificantdifferencebetweenthepercentagewhoareage21-30andthepercentagewhoare31-40.d.Astatisticallysignificantlyhigherpercentageofpeopleage21-30haveacnethanthoseage31-40.e.Thesamplesizesaretoosmallforaconclusiontobevalidhere.
Forthefollowingtwoproblems,supposearesearcherstudiesatreatmentforacertaincondition.Theresearcherdivides230subjectsrandomlyintothreegroups.GroupAreceivesaplacebo.GroupBreceivesthetreatmentatalowdose,andgroupCreceivesthetreatmentatahighdose.ThetablebelowshowstheoutputfromanANOVAF-testonthemeanlevelsoftheconditioninthe3groups.
AnalysisofVarianceTable
Response:outcomeDf SumSq MeanSq Fvalue Pr(>F)
group 20.467 0.233 0.221 0.803Residuals 22728.5 1.06
12.Inthetable,whatnumberisameasureofthevariabilitybetweengroups?a.2.b.0.233c.0.221d.0.803e.28.5 f.1.06.
Forthefollowingtwoproblems,supposearesearcherstudiesatreatmentforacertaincondition.Theresearcherdivides230subjectsrandomlyintothreegroups.GroupAreceivesaplacebo.GroupBreceivesthetreatmentatalowdose,andgroupCreceivesthetreatmentatahighdose.ThetablebelowshowstheoutputfromanANOVAF-testonthemeanlevelsoftheconditioninthe3groups.
AnalysisofVarianceTable
Response:outcomeDf SumSq MeanSq Fvalue Pr(>F)
group 20.467 0.233 0.221 0.803Residuals 22728.5 1.06
12.Inthetable,whatnumberisameasureofthevariabilitybetweengroups?a.2.b.0.233 c.0.221d.0.803e.28.5 f.1.06.
Forthefollowingtwoproblems,supposearesearcherstudiesatreatmentforacertaincondition.Theresearcherdivides230subjectsrandomlyintothreegroups.GroupAreceivesaplacebo.GroupBreceivesthetreatmentatalowdose,andgroupCreceivesthetreatmentatahighdose.ThetablebelowshowstheoutputfromanANOVAF-testonthemeanlevelsoftheconditioninthe3groups.
AnalysisofVarianceTableResponse:outcome
Df SumSq MeanSq Fvalue Pr(>F)group 20.467 0.233 0.221 0.803Residuals 22728.5 1.06
13.WhatcanweconcludefromtheresultoftheFtest?a.Thetreatmenthasastatisticallysignificanteffect,thoughwecannotbesurewhichdoseisattributabletothissignificanteffectorinwhichdirectiontheeffectgoes.b.Thetreatmentseemstohavesignificantlygreatereffectatlargedosesthanatsmalldoses.c.Thetreatmentdoesnotseemtohaveastatisticallysignificanteffect.d.Wefailtorejectthenullhypothesisthatthetreatmenthasasignificanteffectonthecondition.e.Thecorrelationbetweendoseandoutcomeappearstobestatisticallysignificant.
Forthefollowingtwoproblems,supposearesearcherstudiesatreatmentforacertaincondition.Theresearcherdivides230subjectsrandomlyintothreegroups.GroupAreceivesaplacebo.GroupBreceivesthetreatmentatalowdose,andgroupCreceivesthetreatmentatahighdose.ThetablebelowshowstheoutputfromanANOVAF-testonthemeanlevelsoftheconditioninthe3groups.
AnalysisofVarianceTableResponse:outcome
Df SumSq MeanSq Fvalue Pr(>F)group 20.467 0.233 0.221 0.803Residuals 22728.5 1.06
13.WhatcanweconcludefromtheresultoftheFtest?a.Thetreatmenthasastatisticallysignificanteffect,thoughwecannotbesurewhichdoseisattributabletothissignificanteffectorinwhichdirectiontheeffectgoes.b.Thetreatmentseemstohavesignificantlygreatereffectatlargedosesthanatsmalldoses.c.Thetreatmentdoesnotseemtohaveastatisticallysignificanteffect.d.Wefailtorejectthenullhypothesisthatthetreatmenthasasignificanteffectonthecondition.e.Thecorrelationbetweendoseandoutcomeappearstobestatisticallysignificant.