Introduction to the Theory of Statistics (Mood, Graybill, and Boes)

To HARRIET        To my GRANDCHILDREN        To JOAN, LISA, and KARIN
A.M.M.            F.A.G.                     D.C.B.

Library of Congress Cataloging in Publication Data

Mood, Alexander McFarlane, 1913-
Introduction to the theory of statistics.
(McGraw-Hill series in probability and statistics)
Bibliography: p.
1. Mathematical statistics. I. Graybill, Franklin A., joint author. II. Boes, Duane C., joint author. III. Title.
QA276.M67 1974   519.5   73-292
ISBN 0-07-042864-6

INTRODUCTION TO THE THEORY OF STATISTICS

Copyright © 1963, 1974 by McGraw-Hill, Inc. All rights reserved. Copyright 1950 by McGraw-Hill, Inc. All rights reserved. Printed in the United States of America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.

6789 10 KPKP 7832109

This book was set in Times Roman. The editors were Brete C. Harrison and Madelaine Eichberg; the cover was designed by Nicholas Krenitsky; and the production supervisor was Ted Agrillo. The drawings were done by Oxford Illustrators Limited. The printer and binder was Kingsport Press, Inc.

CONTENTS

Preface to the Third Edition   xiii
Excerpts from the First and Second Edition Prefaces   xv

I  Probability   1
   1  Introduction and Summary   1
   2  Kinds of Probability   2
      2.1  Introduction   2
      2.2  Classical or a Priori Probability   3
      2.3  A Posteriori or Frequency Probability   5
   3  Probability-Axiomatic   8
      3.1  Probability Models   8
      3.2  An Aside-Set Theory   9
      3.3  Definitions of Sample Space and Event   14
      3.4  Definition of Probability   19
      3.5  Finite Sample Spaces   25
      3.6  Conditional Probability and Independence   32

II  Random Variables, Distribution Functions, and Expectation   51
   1  Introduction and Summary   51
   2  Random Variable and Cumulative Distribution Function   52
      2.1  Introduction   52
      2.2  Definitions   53
   3  Density Functions   57
      3.1  Discrete Random Variables   57
      3.2  Continuous Random Variables   60
      3.3  Other Random Variables   62
   4  Expectations and Moments   64
      4.1  Mean   64
      4.2  Variance   67
      4.3  Expected Value of a Function of a Random Variable   69
      4.4  Chebyshev Inequality   71
      4.5  Jensen Inequality   72
      4.6  Moments and Moment Generating Functions   72

III  Special Parametric Families of Univariate Distributions   85
   1  Introduction and Summary   85
   2  Discrete Distributions   86
      2.1  Discrete Uniform Distribution   86
      2.2  Bernoulli and Binomial Distributions   87
      2.3  Hypergeometric Distribution   91
      2.4  Poisson Distribution   93
      2.5  Geometric and Negative Binomial Distributions   99
      2.6  Other Discrete Distributions   103
   3  Continuous Distributions   105
      3.1  Uniform or Rectangular Distribution   105
      3.2  Normal Distribution   107
      3.3  Exponential and Gamma Distributions   111
      3.4  Beta Distribution   115
      3.5  Other Continuous Distributions   116
   4  Comments   119
      4.1  Approximations   119
      4.2  Poisson and Exponential Relationship   121
      4.3  Contagious Distributions and Truncated Distributions   122

IV  Joint and Conditional Distributions, Stochastic Independence, More Expectation   129
   1  Introduction and Summary   129
   2  Joint Distribution Functions   130
      2.1  Cumulative Distribution Function   130
      2.2  Joint Density Functions for Discrete Random Variables   133
      2.3  Joint Density Functions for Continuous Random Variables   138
   3  Conditional Distributions and Stochastic Independence   143
      3.1  Conditional Distribution Functions for Discrete Random Variables   143
      3.2  Conditional Distribution Functions for Continuous Random Variables   146
      3.3  More on Conditional Distribution Functions   148
      3.4  Independence   150
   4  Expectation   153
      4.1  Definition   153
      4.2  Covariance and Correlation Coefficient   155
      4.3  Conditional Expectations   157
      4.4  Joint Moment Generating Function and Moments   159
      4.5  Independence and Expectation   160
      4.6  Cauchy-Schwarz Inequality   162
   5  Bivariate Normal Distribution   162
      5.1  Density Function   162
      5.2  Moment Generating Function and Moments   164
      5.3  Marginal and Conditional Densities   167

V  Distributions of Functions of Random Variables   175
   1  Introduction and Summary   175
   2  Expectations of Functions of Random Variables   176
      2.1  Expectation Two Ways   176
      2.2  Sums of Random Variables   178
      2.3  Product and Quotient   180
   3  Cumulative-distribution-function Technique   181
      3.1  Description of Technique   181
      3.2  Distribution of Minimum and Maximum   182
      3.3  Distribution of Sum and Difference of Two Random Variables   185
      3.4  Distribution of Product and Quotient   187
   4  Moment-generating-function Technique   189
      4.1  Description of Technique   189
      4.2  Distribution of Sums of Independent Random Variables   192
   5  The Transformation Y = g(X)   198
      5.1  Distribution of Y = g(X)   198
      5.2  Probability Integral Transform   202
   6  Transformations   203
      6.1  Discrete Random Variables   203
      6.2  Continuous Random Variables   204

VI  Sampling and Sampling Distributions   219
   1  Introduction and Summary   219
   2  Sampling   220
      2.1  Inductive Inference   220
      2.2  Populations and Samples   222
      2.3  Distribution of Sample   224
      2.4  Statistic and Sample Moments   226
   3  Sample Mean   230
      3.1  Mean and Variance   231
      3.2  Law of Large Numbers   231
      3.3  Central-limit Theorem   233
      3.4  Bernoulli and Poisson Distributions   236
      3.5  Exponential Distribution   237
      3.6  Uniform Distribution   238
      3.7  Cauchy Distribution   238
   4  Sampling from the Normal Distributions   239
      4.1  Role of the Normal Distribution in Statistics   239
      4.2  Sample Mean   240
      4.3  The Chi-square Distribution   241
      4.4  The F Distribution   246
      4.5  Student's t Distribution   249
   5  Order Statistics
      5.1  Definition and Distributions
      5.2  Distribution of Functions of Order Statistics   254
      5.3  Asymptotic Distributions   256
      5.4  Sample Cumulative Distribution Function   264

VII  Parametric Point Estimation   271
   1  Introduction and Summary   271
   2  Methods of Finding Estimators   273
      2.1  Methods of Moments   274
      2.2  Maximum Likelihood   276
      2.3  Other Methods   286
   3  Properties of Point Estimators   288
      3.1  Closeness   288
      3.2  Mean-squared Error   291
      3.3  Consistency and BAN   294
      3.4  Loss and Risk Functions   297
   4  Sufficiency   299
      4.1  Sufficient Statistics   300
      4.2  Factorization Criterion   307
      4.3  Minimal Sufficient Statistics   311
      4.4  Exponential Family   312
   5  Unbiased Estimation   315
      5.1  Lower Bound for Variance   315
      5.2  Sufficiency and Completeness   321
   6  Location or Scale Invariance   331
      6.1  Location Invariance   332
      6.2  Scale Invariance   336
   7  Bayes Estimators   339
      7.1  Posterior Distribution   340
      7.2  Loss-function Approach   343
      7.3  Minimax Estimator   350
   8  Vector of Parameters   351
   9  Optimum Properties of Maximum-likelihood Estimation   358

VIII  Parametric Interval Estimation   372
   1  Introduction and Summary   372
   2  Confidence Intervals   373
      2.1  An Introduction to Confidence Intervals   373
      2.2  Definition of Confidence Interval   377
      2.3  Pivotal Quantity   379
   3  Sampling from the Normal Distribution   381
      3.1  Confidence Interval for the Mean   381
      3.2  Confidence Interval for the Variance   382
      3.3  Simultaneous Confidence Region for the Mean and Variance   384
      3.4  Confidence Interval for Difference in Means   386
   4  Methods of Finding Confidence Intervals   387
      4.1  Pivotal-quantity Method   387
      4.2  Statistical Method   389
   5  Large-sample Confidence Intervals   393
   6  Bayesian Interval Estimates   396

IX  Tests of Hypotheses   401
   1  Introduction and Summary   401
   2  Simple Hypothesis versus Simple Alternative   409
      2.1  Introduction   409
      2.2  Most Powerful Test   410
      2.3  Loss Function   414
   3  Composite Hypotheses   418
      3.1  Generalized Likelihood-ratio Test   419
      3.2  Uniformly Most Powerful Tests   421
      3.3  Unbiased Tests   425
      3.4  Methods of Finding Tests   425
   4  Tests of Hypotheses-Sampling from the Normal Distribution   428
      4.1  Tests on the Mean   428
      4.2  Tests on the Variance   431
      4.3  Tests on Several Means   432
      4.4  Tests on Several Variances   438
   5  Chi-square Tests   440
      5.1  Asymptotic Distribution of Generalized Likelihood-ratio   440
      5.2  Chi-square Goodness-of-fit Test   442
      5.3  Test of the Equality of Two Multinomial Distributions and Generalizations   448
      5.4  Tests of Independence in Contingency Tables   452
   6  Tests of Hypotheses and Confidence Intervals   461
   7  Sequential Tests of Hypotheses   464
      7.1  Introduction   464
      7.2  Definition of Sequential Probability Ratio Test   466
      7.3  Approximate Sequential Probability Ratio Test   468
      7.4  Approximate Expected Sample Size of Sequential Probability Ratio Test   470

X  Linear Models   482
   1  Introduction and Summary   482
   2  Examples of the Linear Model   483
   3  Definition of Linear Model   484
   4  Point Estimation-Case A   487
   5  Confidence Intervals-Case A   491
   6  Tests of Hypotheses-Case A   494
   7  Point Estimation-Case B   498

XI  Nonparametric Methods   504
   1  Introduction and Summary   504
   2  Inferences Concerning a Cumulative Distribution Function   506
      2.1  Sample or Empirical Cumulative Distribution Function   506
      2.2  Kolmogorov-Smirnov Goodness-of-fit Test   508
      2.3  Confidence Bands for Cumulative Distribution Function   511
   3  Inferences Concerning Quantiles   512
      3.1  Point and Interval Estimates of a Quantile   512
      3.2  Tests of Hypotheses Concerning Quantiles   514
   4  Tolerance Limits   515
   5  Equality of Two Distributions   518
      5.1  Introduction   518
      5.2  Two-sample Sign Test   519
      5.3  Run Test   519
      5.4  Median Test   521
      5.5  Rank-sum Test   522

Appendix A.  Mathematical Addendum   527
   1  Introduction   527
   2  Noncalculus   527
      2.1  Summation and Product Notation   527
      2.2  Factorial and Combinatorial Symbols and Conventions   528
      2.3  Stirling's Formula   530
      2.4  The Binomial and Multinomial Theorems   530
   3  Calculus   531
      3.1  Preliminaries   531
      3.2  Taylor Series   533
      3.3  The Gamma and Beta Functions   534

Appendix B.  Tabular Summary of Parametric Families of Distributions   537
   1  Introduction   537
   Table 1.  Discrete Distributions   538
   Table 2.  Continuous Distributions   540

Appendix C.  References and Related Reading   544
   Mathematics Books   544
   Probability Books   544
   Probability and Statistics Books   545
      Advanced (more advanced than MGB)   545
      Intermediate (about the same level as MGB)   545
      Elementary (less advanced than MGB, but calculus prerequisite)   546
   Special Books   546
   Papers   546
   Books of Tables   547

Appendix D.  Tables   548
   1  Description of Tables   548
   Table 1.  Ordinates of the Normal Density Function   548
   Table 2.  Cumulative Normal Distribution   548
   Table 3.  Cumulative Chi-square Distribution   549
   Table 4.  Cumulative F Distribution   549
   Table 5.  Cumulative Student's t Distribution   550

Index   557

PREFACE TO THE THIRD EDITION

The purpose of the third edition of this book is to give a sound and self-contained (in the sense that the necessary probability theory is
included) introduction toclassicalormainstreamstatisticaltheory.Itisnotastatistical-methods-cookbook,noracompendiumof statisticaltheories,norisitamathematics book.Thebook isintended tobeatextbook,aimed forusein thetraditional full-yearupper-divisionundergraduatecourseinprobabilityandstatistics, or foruseasatextinacoursedesigned for first-year graduatestudents.The lattercourseisoftena"servicecourse,"offeredtoavarietyofdisciplines. Nopreviouscourseinprobabilityor statisticsisneededinordertostudy thebook.The mathematical preparation required isthe conventional full-year calculuscourse which includes seriesexpansion,mUltipleintegration,andpar-tialdifferentiation.Linearalgebraisnotrequired.Anattempthasbeen made totalk tothe reader.Also, we have retainedtheapproachof presenting the theory with some connection to practical problems.The book is not mathe-maticallyrigorous.Proofs,and evenexactstatements of results,areoftennot given.Instead,wehavetriedtoimpart a"feel" forthetheory. The book isdesigned to be used in either thequarter systemor the semester system.Inaquartersystem,Chaps.IthroughV couldbecovered inthe first xivPREFACETOTHETHIRDEDITION quarter,Chaps.VIthrough part of VIIIthe secondquarter,andthe restof the bookthethirdquarter.Inasemestersystem,Chaps.IthroughVIcouldbe coveredthefirstsemesterandtheremainingchaptersthesecondsemester. Chapter VI is a "bridging" chapter; it can be considered to be a part of" proba-bility" or apart of" statistics."Severalsections or subsections can be omitted withoutdisruptingthecontinuityofpresentation.Forexample,anyofthe following could beomitted:Subsec.4.5of Chap.II; Subsecs., 2.6,3.5,4.2,and 4.3of Chap.III;Subsec.5.3of Chap.VI;Subsecs.2.3,3.4,4.3andSecs.6 through 9 of Chap. VII; Secs.5 and 6 of Chap. VIII; Secs.6 and 7 of Chap. IX; and all or part of Chaps. X and XI.Subsection 5.3 of Chap VI on extreme-value theoryissomewhat more difficultthan therestof that chapter.In Chap.VII, Subsec.7.1on Bayesestimation canbetaught withoutSubsec.3.4on lossand risk functions but Subsec.7.2 cannot.Parts of Sec. 8 of Chap. VII utilize matrix notation.Themanyproblemsareintendedtobeessentialforlearningthe material inthebook.Someof the more difficult problemshavebeenstarred. ALEXANDERM.MOOD FRANKLINA.GRAYBILL DUANEC.BOES EXCERPTSFROMTHEFIRST ANDSECONDEDITIONPREFACES This book developed froma set of notes which I prepared in 1945.At that time there wasnomoderntextavailable specificallydesignedforbeginningstudents of matpematicalstatistics.Since thenthe situationhasbeen relievedconsider-ably,andhad Iknowninadvancewhatbookswereinthemakingitislikely that Ishouldnothaveembarkedonthisvolume.However,itseemedsuffi-ciently differentfromotherpresentationstogiveprospectiveteachersandstu-dentsausefulalternativechoice. The aforementioned notes were used as text material for three years at Iowa StateCollegeinacourseofferedtoseniorandfirst-yeargraduatestudents. The only prerequisite for the coursewasoneyearof calculus,andthisrequire-ment indicates the levelof the book.(The calculus classat Iowa State met four hours per weekand included goodcoverage of Taylor series,partialdifferentia-tion,and multiple integration.) Nopreviousknowledgeof statistics isassumed. 
This isastatisticsbook,notamathematicsbook,asanymathematician willreadilysee.Littlemathematicalrigoristobefoundinthederivations simply because it would be boring and largely awaste of time atthislevel.Of course rigorous thinking isquite essentialtogoocfstatistics,and Ihavebeenat some painstomake ashowof rigorandtoinstillan appreciationforrigorby pointingout variouspitfallsof loosearguments. XVIEXCERPTSFROMTHEFIRSTANDSECONDEDITIONPREFACES Whilethistextisprimarilyconcernedwiththetheoryofstatistics,full cognizancehasbeentakenof thosestudentswhofearthatamomentmaybe wastedinmathematicalfrivolity.Allnewsubjectsaresuppliedwithalittle sceneryfrompracticalaffairs,and,moreimportant,aseriousefforthasbeen made in theproblemstoillustratethe variety of waysin whichthetheorymay beapplied. The arean essentialpart of thebook.Theyrange fromsimple numericalexamplestotheoremsneededinsubsequent chapters.They include important subjects which could easily take precedence over material in thetext; the relegation of subjects to problems wasbasedrather on the feasibilityof such aprocedurethanonthepriorityof thesubject.For example,thematterof correlation isdealtwithalmostentirely intheproblems.It seemedtomein-efficienttocover multivariate situations twiceindetail,i.e.,withtheregression modelandwiththecorrelationmodel.The emphasisinthetextproper ison themore generalregression model. Theauthorofatextbookisindebtedtopracticallyeveryonewhohas touched the field,and Iherebowto allstatisticians.However,in giving credit to contributors one must draw the line somewhere,and Ihave simplified matters by drawing it veryhigh;onlythemosteminentcontributorsarementionedin thebook. IamindebtedtoCatherineThompsonandMaxineMerrington,andto E.S.Pearson, editor of Biometrika,forpermissiontoinclude Tables IIIandV, whichareabridgedversionsof tablespublishedin Biometrika.Iamalsoin-debtedtoProfessorsR.A.Fisher and Frank Yates,and toMessrs.Oliver and Boyd,Ltd.,Edinburgh,forpermissiontoreprintTableIVfromtheirbook " Statistical TablesforUseinBiological,AgriculturalandMedicalResearch." Sincethe firstedition of thisbook waspublished in1950manynewstatis-ticaltechniques have been made available and many techniques that were only in thedomain of the mathematicalstatisticianarenowusefulanddemandedby the appliedstatistician.To include some of thismaterial wehave hadtoelim-inate other material, else the book would have come toresembleacompendium. Thegeneralapproach of presenting thetheorywithsomeconnectiontoprac-ticalproblemsapparentlycontributedsignificantlytothesuccessof thefirst edition and we havetried tomaintain that feature in thepresentedition. I PROBABILITY 1INTRODUCTIONANDSUMMARY The purpose of thischapter isto define probability and discuss some of its prop-erties.Section2 isabrief essayonsomeof thedifferentmeaningsthathave beenattachedtoprobabilityandmaybeomittedbythosewhoareinterested onlyinmathematical(axiomatic)probability,whichisdefinedinSec.3and usedthroughouttheremainderof thetext.Section3issubdividedintosix subsections.The first,Subsec.3.1,discusses the concept of probability models. 
Itprovidesareal-worldsettingfortheeventualmathematicaldefinitionof probability.Areviewof some of the settheoreticalconceptsthat arerelevant toprobabilityisgiveninSubsec.3.2.Samplespaceandeventspaceare defined in Subsec.3.3.Subsection 3.4 commences with a recallof the definition of a function.Such a definition isusefulsincemany of the words to be defined inthisandcomingchapters(e.g.,probability,randomvariable,distribution, etc.)aredefinedasparticularfunctions.Theindicatorfunction,tobeused extensivelyinlater chapters,isdefinedhere.The probabilityaxiomsarepre-sented, and the probability function is defined.Several properties of this prob-ability functionare stated.The culmination of thissubsectionisthe definition of aprobabilityspace.Subsection3.5isdevoted to examplesof probabilities 2PROBABIUTYI definedonfinitesamplespaces.Therelatedconceptsofindependenceofu events and conditional probability are discussed in the sixth and finalsubsection. Bayes'theorem,themUltiplicationrule,andthetheoremof totalprobabilities are provedor derived,and examplesof eachare given. Of thethreemainsectionsincluded inthischapter,onlySec.3,whichis byfarthelongest,isvital.Thedefinitionsof probability, Pfobabiijly space, conditionalprobability,andindependence,alongwithfamiliaritywiththe properties of probability,conditionalandunconditionalandrelatedformulas, are the essence of this chapter.This chapter isabackground chapter; -itintro-ducesthelanguageof probabilitytobeusedindeveloping distributiontheory, which isthebackbone of the theoryof statistics. 2KINDSOFPROBABILITY 2.1Introduction Oneof thefundamentaltoolsof statisticsisprobability,whichhaditsformal beginningswithgamesof chance in the seventeenthcentury. Gamesof chance,asthename implies,include suchactionsasspinninga roulette wheel,throwing dice,tossingacoin, drawingacard,etc.,inwhicht h ~ outcome of a trial isuncertain.However, it isrecognized that even thoughthe outcomeof anyparticulartrialmaybeuncertain,thereisapredictable' ibng-termoutcome.Itisknown,forexample,thatinmanythrowsof anideal (balanced,symmetrical)coinaboutone-halfof thetrialswillresultinheads. It is this long-term, predictable regularity that enablesgaming houses to engage inthebusiness. Asimilartypeof uncertaintyandlong-termregularityoftenoccursin experimentalscience.Forexample,inthescienceof geneticsitisuncertain whetheranoffspringwillbemaleorfemale,butinthelongrunit isknown approximatelywhat percentof offspringwillbemaleand whatpercen'twillbe female.Alifeinsurance company cannot predict which persons in theUnited States willdieat age50,but it canpredictquite satisfactorily howmany people in the UnitedStates willdieat that age.~ First weshalldiscuss the classical,or apriori, theoryof probability; then we shall discuss the frequencytheory.Development of the axiomatic approach willbedeferreduntilSec.3. 2KINDSOFPROBAMUl'Y3 2.2Classicalor APriori Probability Aswestatedintheprevioussubsection,thetheoryof probabilityinitsearly stages was closely associated with games of chance.This association prompted the classicaldefinition.For example,suppose that wewant the probability of the even.t1hat an ideal coin will turn up heads.We argue in this manner:Since _>-a( there areonly two waysthat the coin can fall,heads or tails,andsince the coin iswellbalanced,one wouldexpect that the coin is just aslikelytofallheadsas tails; theprobabilityof theeventof aheadwillbegiventhevalue t Thiskindofreasoningpromptedthefollowingclassicaldefinitionof prob-ability. 
Definition1Classical probabilityIf a random experiment can resul t innmutuallyexclusiveandequallylikelyoutcomesandif n A.of these outcomeshaveanattributeA,thentheprobabilityof Aisthefraction nA./n.1/1/ We shall apply this definition to a fewexamples in order to illustrate its meaning. If an ordinary die (one ofa pair of dice) is tossed-there are six possible out-comes-anyone of thesixnumberedfacesmayturn up.Thesesixoutcomes aremutuallyexclusivesincetwoormorefacescannot turn upsimultaneously. And if the die is fair, or true, the six outcomes are equally likely; i.e., it is expected that eachfacewillappear withaboutequalrelativefrequencyinthelongrun. blowsupposethat wewanttheprobability that the resultof atossbean even number.Threeof thesixpossibleoutcomeshavethisattribute.Theprob-abilityan even number willappear when adie istossed is therefore i, or t. -l;imilarly,theprobabilitythata5willappearwhenadieistossedis1.The probability that the resultof atosswillbe greater than 2 isi-To consider another example, suppose that a card is drawn at random from anordinarydeckof playingcards.Theprobabilityofdrawingaspadeis readilyseentobe or iThe probabilityof drawinganumberbetween5 andinclusive, is;ort'3 The application of the definition isstraightforward enough in these simple cases,but itisnotalwayssoobvious.Carefulattentionmustbepaid tothe "mutually exclusive," Hequally likely," and Hrandom."Suppose thatonewishestocomputetheprobabilityof gettingtwoheadsif acoinis tossed twice.Hemightreasonthattherearethreepossibleoutcomesforthe twotosses:two "heads,twotails,or one headandone tai 1.One of thesethree . 4PROBABIUTY' I outcomeshasthedesired attribute, i.e., twoheads;therefore the probability is -1.Thisreasoningisfaultybecausethethreegivenoutcomesarenotequally likely.Thethirdoutcome,oneheadandonetail,canOCCUrintwoways sincetheheadmayappearonthefirsttossand thetailon thesecondor the headmayappearon thesecondtossandthetailon the first.Thus thereare fourequallylikelyoutcomes:HH,HT,TH,andTT.Thefirstofthesehas the desired attribute, while theothers do not.The correct probability is there-forei.Theresultwouldbethesameif twoidealcoinsweretossedsimul-taneously. Again,supposethatonewishedtocomputetheprobabilitythatacard drawn froman ordinary well-shuffieddeck willbe anaceor aspade.Inenu-merating the favorableoutcomes,onemightcount4acesand13spadesand reasonthatthereare17outcomeswiththedesiredattribute.Thisisclearly incorrect because these17outcomes are not mutually exclusive sincethe aceof spades isboth an ace and a spade.There are16 outcomes that are favorable to an ace or aspade, and sothe correct probability is~~ ,or143' Wenotethatbytheclassicaldefinitiontheprobabilityof eventAisa number between 0 and 1 inclusive.The ratio nAln must beless than or equalto 1sincethetotalnumberofpossibleoutcomescannotbesmallerthanthe number of outcomes with a specified attribute.If an event is certain to happen, itsprobability is1;if it iscertain not to happen, its probability is O.Thus, the probabilityof obtainingan8intossingadieisO.Theprobabilitythat the number showing whenadie istossed islessthan 10is equal to1. The probabilities determined by the classicaldefinitionarecalleda priori probabilities.Whenonestatesthattheprobabilityof obtainingaheadin tossingacoinis!-,hehasarrivedat thisresultpurelybydeductivereasoning. 
The result doesnot require that any coin betossed or evenbe at hand.We say that if the coin is true,the probability of ahead is!, but this islittle more than saying thesamethingintwodifferentways.Nothingissaidabouthowone can determine whether or not aparticular coin istrue. Thefactthatweshalldealwithidealobjectsindevelopingatheoryof probability willnot troub1e us because that is acommon requirement of mathe-maticalsystems.Geometry,forexample,dealswithconceptuallyperfect circles,lineswithzerowidth,andsoforth,butitisausefulbranchof knowl-edge,which can be applied to diverse practical problems. There aresome rather troublesome limitations in the classical, or a priori, approach.It isobvious,forexample,thatthe definitionof probabilitymust bemodifiedsomehowwhenthe totalnumberof possibleoutcomesisinfinite. One mightseek, forexample,the probability that an integerdrawnatrandom fromthepositiveintegersbeeven.Theintuitiveanswer tothisquestionist. 2KINDSOFPROBABIUTY5 If one were pressedto justify thisresulton thebasisof the definition,hemight reasonasfol1ows:Supposethatwelimitourselvestothefirst20integers;10 of theseare even sothat theratio of favorableoutcomes tothe totalnumber is t%,or 1Again,if the first200 integers are considered,100of theseare even, and the r a ~ i ois also 1_In genera!, the first 2N integers contain Neven integers; if we form the ratio N/2N and let Nbecome infinite so as to encompass the whole setof positive integers, the ratio remains1_Theabove argumentisplausible, andtheanswerisplausible,butitisnosimplematter tomaketheargument standup.It depends,forexample,onthenaturalorderingof thepositive integers,andadifferentordering couldproduceadifferentresult.Thus,one could just aswellorder the integers in this way:1,3,2; 5,7,4;9,11,6;___, takingthefirstpairof oddintegersthenthefirsteveninteger,thesecondpair of odd integers then the second eveninteger,and soforth.Withthisordering, onecouldarguethattheprobabilityof drawinganevenintegeris1-.The integers can alsobeordered sothat theratiowilloscillateandneverapproach anydefinite value asNincreases. Thereisanotherdifficultywiththeclassicalapproachtothetheoryof probabilitywhichisdeepereventhanthatarisinginthecaseof aninfinite num berofoutcomes.Supposethatwetossacoinknowntobebiasedin favorof heads(itisbentsothataheadismorelikelytoappearthanatail). The two possibleoutcomesof tossingthe coinarenot equallylikely.What is the probability of a head?The classical definition leaves us completely helpless here. Stillanother difficulty withthe classicalapproach isencountered whenwe trytoanswerquestionssuchasthefollowing:Whatisthe probabilitythat a childborninChicagowillbeaboy?Or whatisthe probabilitythatamale willdiebeforeage50?Orwhatistheprobabilitythatacookieboughtata certainbakery willhavelessthan threepeanutsinit?Allthesearelegitimate questionswhichwewantto bring into the realm of probability theory.However, notions of Hsymmetry,""equally likely," etc., cannotbeutilizedasthey could bein games of chance.Thus weshallhave toalter or extendour definitionto bringproblemssimilartotheaboveintotheframeworkof thetheory.This more widely applicable probability is called a posteriori probability, or!requency, andwillbediscussedinthenextsubsection. 
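A short Python sketch, purely illustrative and not part of the text, makes the counting in the last two examples explicit: it lists equally likely outcomes and takes the ratio n_A/n, which also exposes the two pitfalls just mentioned, unequal likelihood and double counting.

```python
from itertools import product

# Two tosses of a coin: the equally likely outcomes are the ordered pairs HH, HT, TH, TT,
# not the three attributes "two heads / two tails / one of each".
tosses = list(product("HT", repeat=2))
print(sum(o == ("H", "H") for o in tosses) / len(tosses))      # 0.25, not 1/3

# One card from an ordinary deck: count the distinct cards that are an ace or a spade,
# so the ace of spades is counted once rather than twice.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spade", "heart", "diamond", "club"]
deck = list(product(ranks, suits))
favorable = [card for card in deck if card[0] == "A" or card[1] == "spade"]
print(len(favorable), len(favorable) / len(deck))              # 16 and 16/52 = 4/13
```

Enumerating distinct, mutually exclusive outcomes (ordered pairs of tosses, individual cards) is what keeps the outcomes equally likely and prevents counting any outcome twice.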
2.3 A Posteriori or Frequency Probability

A coin which seemed to be well balanced and symmetrical was tossed 100 times, and the outcomes recorded in Table 1. The important thing to notice is that the relative frequency of heads is close to 1/2. This is not unexpected since the coin was symmetrical, and it was anticipated that in the long run heads would occur about one-half of the time. For another example, a single die was thrown 300 times, and the outcomes recorded in Table 2. Notice how close the relative frequency of a face with a 1 showing is to 1/6; similarly for a 2, 3, 4, 5, and 6. These results are not unexpected since the die which was used was quite symmetrical and balanced; it was expected that each face would occur with about equal frequency in the long run. This suggests that we might be willing to use this relative frequency in Table 1 as an approximation for the probability that the particular coin used will come up heads, or we might be willing to use the relative frequencies in Table 2 as approximations for the probabilities that various numbers on this die will appear. Note that although the relative frequencies of the different outcomes are predictable, the actual outcome of an individual throw is unpredictable.

In fact, it seems reasonable to assume for the coin experiment that there exists a number, label it p, which is the probability of a head. Now if the coin appears well balanced, symmetrical, and true, we might use Definition 1 and state that p is approximately equal to 1/2. It is only an approximation to set p equal to 1/2 since for this particular coin we cannot be certain that the two cases, heads and tails, are exactly equally likely. But by examining the balance and symmetry of the coin it may seem quite reasonable to assume that they are. Alternatively, the coin could be tossed a large number of times, the results recorded as in Table 1, and the relative frequency of a head used as an approximation for p. In the experiment with a die, the probability p2 of a 2 showing could be approximated by using Definition 1 or by using the relative frequency in Table 2. The important thing is that we postulate that there is a number p which is defined as the probability of a head with the coin or a number p2 which is the probability of a 2 showing in the throw of the die. Whether we use Definition 1 or the relative frequency for the probability seems unimportant in the examples cited.

Table 1   RESULTS OF TOSSING A COIN 100 TIMES

Outcome   Frequency   Observed relative frequency   Long-run expected relative frequency of a balanced coin
H         56          .56                           .50
T         44          .44                           .50
Total     100         1.00                          1.00

Suppose, as described above, that the coin is unbalanced so that we are quite certain from an examination that the two cases, heads and tails, are not equally likely to happen. In these cases a number p can still be postulated as the probability that a head shows, but the classical definition will not help us to find the value of p. We must use the frequency approach or possibly some physical analysis of the unbalanced coin.

In many scientific investigations, observations are taken which have an element of uncertainty or unpredictability in them. As a very simple example, suppose that we want to predict whether the next baby born in a certain locality will be a male or a female. This is individually an uncertain event, but the results of groups of births can be dealt with satisfactorily. We find that a certain long-run regularity exists which is similar to the long-run regularity of the frequency ratio of a head when a coin is thrown. If, for example, we find upon examination of records that about 51 percent of the births are male, it might be reasonable to postulate that the probability of a male birth in this locality is equal to a number p and take .51 as its approximation.
To make this idea more concrete, we shall assume that a series of observations (or experiments) can be made under quite uniform conditions. That is, an observation of a random experiment is made; then the experiment is repeated under similar conditions, and another observation taken. This is repeated many times, and while the conditions are similar each time, there is an uncontrollable variation which is haphazard or random so that the observations are individually unpredictable. In many of these cases the observations fall into certain classes wherein the relative frequencies are quite stable. This suggests that we postulate a number p, called the probability of the event, and approximate p by the relative frequency with which the repeated observations satisfy the event.

Table 2   RESULTS OF TOSSING A DIE 300 TIMES

Outcome   Frequency   Observed relative frequency   Long-run expected relative frequency of a balanced die
1         51          .170                          .1667
2         54          .180                          .1667
3         48          .160                          .1667
4         51          .170                          .1667
5         49          .163                          .1667
6         47          .157                          .1667
Total     300         1.000                         1.000

For instance, suppose that the experiment consists of sampling the population of a large city to see how many voters favor a certain proposal. The outcomes are "favor" or "do not favor," and each voter's response is unpredictable, but it is reasonable to postulate a number p as the probability that a given response will be "favor." The relative frequency of "favor" responses can be used as an approximate value for p.

As another example, suppose that the experiment consists of sampling transistors from a large collection of transistors. We shall postulate that the probability of a given transistor being defective is p. We can approximate p by selecting several transistors at random from the collection and computing the relative frequency of the number defective.

The important thing is that we can conceive of a series of observations or experiments under rather uniform conditions. Then a number p can be postulated as the probability of the event A happening, and p can be approximated by the relative frequency of the event A in a series of experiments.

3 PROBABILITY-AXIOMATIC

3.1 Probability Models

One of the aims of science is to predict and describe events in the world in which we live. One way in which this is done is to construct mathematical models which adequately describe the real world. For example, the equation s = (1/2)gt^2 expresses a certain relationship between the symbols s, g, and t. It is a mathematical model. To use the equation s = (1/2)gt^2 to predict s, the distance a body falls, as a function of time t, the gravitational constant g must be known. The latter is a physical constant which must be measured by experimentation if the equation s = (1/2)gt^2 is to be useful. The reason for mentioning this equation is that we do a similar thing in probability theory; we construct a probability model which can be used to describe events in the real world. For example, it might be desirable to find an equation which could be used to predict the sex of each birth in a certain locality. Such an equation would be very complex, and none has been found. However, a probability model can be constructed which, while not very helpful in dealing with an individual birth, is quite useful in dealing with groups of births. Therefore, we can postulate a number p which represents the probability that a birth will be a male. From this fundamental probability we can answer questions such as: What is the probability that in ten births at least three will be males? Or what is the probability that there will be three consecutive male births in the next five? To answer questions such as these and many similar ones, we shall develop an idealized probability model.
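The long-run regularity in Tables 1 and 2, and the kind of question the probability model is meant to answer, are easy to explore by simulation. The following sketch is illustrative only and not part of the text (a seed is fixed merely to make the run repeatable): it tosses a simulated balanced coin and die many times, then approximates the probability of at least three males in ten births taking p = .51 as above.

```python
import random

random.seed(1)
n = 100_000

# Relative frequency of heads for a simulated balanced coin (compare Table 1).
heads = sum(random.random() < 0.5 for _ in range(n))
print("heads:", heads / n)                                # settles near .50

# Relative frequencies of the six faces of a simulated balanced die (compare Table 2).
faces = [0] * 6
for _ in range(n):
    faces[random.randrange(6)] += 1
print("faces:", [round(c / n, 4) for c in faces])         # each near .1667

# With p = .51 for a male birth, approximate P[at least 3 males in 10 births]
# by the long-run relative frequency over many simulated groups of ten births.
groups = 100_000
hits = sum(sum(random.random() < 0.51 for _ in range(10)) >= 3 for _ in range(groups))
print("P[at least 3 males in 10 births] is roughly", hits / groups)   # about 0.95
```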
3PROBABILlTY-AXIOMATIC9 Thetwogeneraltypesof probability(aprioriandaposteriori)defined abovehaveoneimportantthingincommon:Theybothaconceptual experimentinwhichthevariousoutcomes canoccurundersomewhatuniform conditions.For example,repeatedtossingof acoinfortheaprioricase,and repeatedbirthfortheaposterioricase.However,wemightliketobringinto the realmof probability theory situations whichcannotconceivablyfitintothe frameworkofrepeatedoutcomesundersomewhatsimilarconditions.For example, wemight liketo answer questions suchas:What isthe probability my wifelovesme?Or what isthe probability that World War III willstart before January1,1985?Thesetypesof problemsarecertainlyalegitimatepartof generalprobabilitytheoryandareincludedinwhatisreferredtoassubjective probability.Weshallnotdiscusssubjectiveprobabilitytoanygreatextentin thisbook,weremark that the axiomsof probability from which wedevelop probabilitytheoryarerichenoughtoincludeaprioriprobability,aposteriori probability,andsubjective probability. To start, werequire that everypossibleoutcome of the experiment under study can be enumerated.For example, in the coin-tossing experiment there are twopossibleoutcomes:headsandtails.Weshallassociateprobabilitiesonly withtheseoutcomesor withcollectionsof theseoutcomes.We add,however, that even if a particular outcome is impossible, it can be included (its probability is0).Themainthingtorememberisthateveryoutcomewhichcanoccur must be included. Eachconceivableoutcomeof theconceptual experiment understudy will be definedasasamplepoint,and thetotalityof conceivableoutcomes(orsample points)will be defined as the sample space. Our .object,of course,istoassessthe probabilityof certainoutcomesor collectionsofoutcomesof theexperiment.Discussionof suchprobabilities isconvenientlycouchedinthelanguageofsettheory,anoutlineofwhich appearsinthenextsubsection.WeshaHreturntoformaldefinitionsand examplesof sample space,event,and probability. 3.2An Aside-Set Theory Webeginwithacollectionof objects.Eachobjectinourcollectionwillbe calledapointorelement.Weassumethatour collectionof objectsislarge enoughtoincludeallthepointsunderconsiderationinagivendiscussion. Thetotalityof allthesepointsiscalledthespace,universe,oruniversalset. Wewillcall it the space (anticipating that it willbecome the sample space when we speak of probability) and denote itbyn.Let (1)denote an element or point i"nn.Althoughasetcanbedefinedasanycollectionof objects,weshall 10PROBABIliTYI assume, unlessotherwise stated, that allthe sets mentioned in a given discussion consistof points in the spacen. EXAMPLE1Q=R2,where R2 is the collection of points coin the plane and co= (x,y)isany pair of realnumbers xand y.IIII EXAMPLE2n = {allUnitedStates citizens}. IIII WeshallusuallyusecapitalLatinlettersfromthebeginningofthe alphabet, with or without subscripts, to denote sets.If cois a pointor element belongingtothesetA,weshallwritecoEA;if coisnotanelementofA,we shallwriteco;A. 
Definition 2SubsetIf everyelementof a setAisalsoan elementof a setB,thenAisdefinedtobeasubset of B, and weshallwrite AcBor B=>A; read" Aiscontained in B" or " BcontainsA."II1/ Definition 3Equivalent setsTwo sets Aand B are defined to be equiva-lent,or equal,ifA cBandBcA..Thiswillbeindicatedbywriting A=B./111 Definition 4EmptysetIf asetAcontainsnopoints, it\'villbe called the null set,or empty set,and denoted by./1// Definition SComplementThecomplementof asetAwithrespectto the space n, denoted byA, AC,or n - A, isthe set of all points that are in Qbut notinA.1//1 Definition 6UnionLetAandBbeanytwosubsetsof n;thenthe setthatconsistsof allpointsthatare inAor Bor bothisdefinedtobe the unionof Aand Band writtenAvB.1/// Definition 7IntersectionLetAandBbeanytwosubsets of n; then thesetthat consistsof allpoints that are inbothAand Bisdefinedtobe the intersectionof AandBand iswrittenA(\ Bor AB./ / // Definition 8Set differenceLet A and B be any two subsetsof n.The setof allpointsinAthatarenotinBwHlbedenotedbyA- Bandis defined as set difference.II // 3PROBABILITY-AXIOMATIC11 EXAMPLE3Letn = {(x,y):0 Ed. 1111 Theorem13If Al and A2Ed, then Ai() A2Ed. PROOFAlandA2Ed; henceAlUA2,and(AIUA2) Ed,but (-=A:--1 -U-A='2)= Al () 12 = Al () A2byDe Morgan's law.1111 nn Theorem14If AI' A2, ... , AnEd, thenU Ai andn Ai E d. 1=1I1 PROOFFollows by induction. 1111 Wewillalwaysassumethatour collectionof eventsdisanalgebra-whichpartiallyjustifiesour useof dasournotation forit.In practice,one mighttakethatcollectionof eventsof interestinagivenconsiderationand enlarge the collection, if necessary,to include (i) the sure event, (ii)allcomple-mentsof eventsalreadyincluded,and(iii)allfiniteunionsand intersectionsof eventsalreadyincluded,andthISwillbean algebrad.Thus far, we have not explained why dcannot alwaysbe taken to be the collection of allsubsets of Q. Such explanation will be given when we define probability in the next subsection. 3 P:aOBABILITY-AXIOMATIC19 3.4Definitionof Probability In thissectionwegivethe axiomaticdefinitionof probability.Althoughthis formaldefinitionof probabilitywillnotinitself allow us to achieve our goal :>f assigning actual probabilities to events consisting of certain outcomes of random experiments,itisanother inaseriesof definitionsthatwillultimatelyleadto thatgoal.Sinceprobability,aswellasforthcomingconcepts,isdefinedasa particular function,webeginthissubsectionwithareviewof thenotionof a function. The definition of afunctionThefollowingterminologyisfrequentlyused to describe a function:Afunction,say f('), is arule (law,formula,recipe)that associates each point in one set of points with one and only one point in another set of points.The firstcollection of points, sayA, iscaned the domain,and the second collection, sayB,the counterdomain. Definition13FunctionAfunction,sayf( .), with domain Aand coun-terdomain B, is a collection of ordered pairs, say (a,b),satisfying (i)a EA and b EB; tii) eacha EAoccurs as the firstelementof some ordered pair in the collection(each bE Bisnot necessarilythe secondelement of some orderedpair);and(iii)notwo(distinct)orderedpairsinthecollection have thesame firstelement.IIII If (a,b) Ef( .),wewriteb = f(a)(read" b equalsfof a")andcall f(a) thevalueof f()at a.For anya EA, f(a)isan element of B; whereasf( -)is asetof Qrderedpairs.Thesetof allvaluesof f( .) 
iscalled the rangeof f( ); i.e.,therange of f(')= {bEB: b = f(a) for some a EA} and is always a subset of the counterdomain Bbut isnot necessarily equal to it.f(a) is also called the imageof a under f( +),and a is called the pre image of /(a). EXAMPLE12Let.ft(-)andf2(')bethetwofunctions,havingtherealline fortheir domain and counterdomain,definedby fi ( .) = {(x, y) : y= x3 + X+ 1,- 00< x 0and 0< PCB] 0; then P[A1A2 An]= P[AdP[A21 AtlP[A31 A1A2] PlAn IAl... An-d PROOFTheproofcanbeattainedbyemployingmathematical induction andisleft asan exercise. If/I Aswiththetwoprevioustheorems,themultiplicationruleisprimarily usefulforexperimentsdefinedintermsof stages.Suppose the experiment has nstagesandAJ isan eventdefinedintermsof stage jof theexperiment;then AIA2 .. , Aj-d istheconditionalprobabilityofaneventdescribedin . termsofwhathappensonstagejconditionedonwhathappensonstages 1,2,... , j- 1.ThemultiplicationrulegivesPlAt A 1.. An]intermsof the natural conditional probabilities P[AjIA1A2'"Aj-tl forj= 2,... , n. EXAMPLE25Therearefiveurns,andtheyarenumberedIto5.Each urn contains10balls.Urn i hasi defectiveballsand10- i nondefective balls,i =1,2,... ,5.For instance,urn3 hasthreedefectiveballsand sevennondefectiveballs.Considerthefollowingrandomexperiment: Firstan urn isselectedatrandom,andthen itballisselectedatrandom fromthe selectedurn.(The experimenter doesnot knowwhichurnwas selected.)Letusasktwoquestions:(i)Whatistheprobabilitythata defectiveballwillbeselected?(ii)If wehavealreadyselectedtheball andnotedthatitisdefective,whatistheprobabilitythatitcamefrom urn 5? SOLUTIONLetAdenotetheeventthatadefectiveballisselectedand Bt theeventthaturniisselected,i =I, ... ,5.NotethatP[B,l=i1,... ,5, andP[AIBi ]=i/lO, i= 1,... , 5.Question (i)asks, What is P[A]?Usingthetheoremof totalprobabilities,wehave 55ill 5.1563 peA] == 10'"5 =50= 50"2 = 10' 38PROBABILITY I Note that there isatotalof 50 ballsof which15are defective!Question (ii)asks,WhatisP[Bsl A]?Sinceurn5hasmoredefectiveballsthan anyof theotherurnsandweselectedadefectiveball,wesuspectthat P[Bsl A] > P[Bi IA]fori =1,2,3,or4.Infact,wesuspectP[Bsl A] > P[B41 A] >... > P[BII A].EmployingBayes' formula,wefind Similarly, P[BsIA] =;[AIBs]P[Bs] IP[A IBtJP[BtJTo3 i= I P[BIA] = (kilO). t= !5... k15' 10 k= 1,... ,5, substantiating our suspicion.Note thatunconditionally allthe B/swere equallylikelywhereas,conditionally (conditionedonoccurrenceof event A),theywerenot.Also,notethat sskI s156 IP[BkIA] =I-=- Ik=--=l.IIII k=lk=11515k=l152 EXAMPLE26Assume thatastudent istaking amultiple-choice test.On a givenquestion,thestudenteitherknowstheanswer,inwhichcasehe answersitcorrectly,orhedoesnotknowtheanswer,inwhichcasehe guesseshopingtoguesstherightanswer.Assumethattherearefive multiple-choicealternatives,asisoftenthecase.Theinstructoriscon-frontedwiththisproblem:Havingobservedthatthestudentgotthe correct answer,he ,wishesto know what isthe probability that the student knewthe answer.Let pbe the probability that the stydent willknow the answer and 1 - P the probability that the student guesses.Let usassume thattheprobabilitythatthestudentgetstherightanswergiventhathe guesses ist.(This may not be a realistic assumption since even though the student doesnot know the rightanswer,he often would know that certain alternativesarewrong, in whichcase hisprobability of guessing correctly should be better thanLet Adenote the event that the student got the right answer and B denote the event that the student knew the right answer. 
We areseeking P[BI A].UsingBayes'formula,wehave P[A IB]P[B]1 . p P[BI A] = P[A IB]P[B] + P[A I B]P[B]=1 . P + t(1- p)" Note that IIII 3PROBABlUTY-AXIOMATIC39 EXAMPLE27Anurncontainstenballsof whichthreeare black andseven arewhite.The followinggame isplayed:At eachtrialaballisselected at random, itscolor isnoted,and it isreplacedalongwithtwoadditional ballsofthesamecolor.Whatistheprobabilitythatablackballis selectedineachof thefirstthreetrials?LetBidenotetheeventthata black ball isselected on the ith trial.We are seeking P[B1B2 B3]'By the mul tiplication r"ule, P[B1B2 B3]= P[B1JP[B2I'BdP [B3IB1B2]= 130/2174=/6'IIII \' EXAMPLE28SupposeanurncontainsMballsof whichKareblackand M- Karewhite.Asampleof sizenisdrawn.Findtheprobability thatthe jthballdrawnisblackgiventhatthesamplecontainskblack balls.(Weintuitively expecttheanswertobekin.)Wehavetocon-sider sampling (i) with replacementand (ii)without replacement. SOLUTIONLetAkdenotetheeventthatthesamplecontainsexactly kblackballsandBjdenotetheeventthatthe jthballdrawnisblack. Weseek P[Bjl Ak]'Consider (i)first. (n)Kk(M - K)"-k(n- 1)Kk-1(M _K)"-k P[AkJ=andP[Ak I BJ.J=1 kM(Ik- 1M"-byEq.(3)of Subsec.3.5.Sincethe ballsarereplaced, P[Bj]= KIM for any j.Hence, For case (ii), ( ~ ) ( ~~~ ) P[A.J=( '::) and (K- 1)(M - K) P[AI B.J=k- 1n- k kJ(M _ 1) n- 1 j-1 byEq.(5)of Subsec.3.5.P[Bj ]=L P[Bjl CdP[Ci],whereCi denotes i ==0 the event of exactlyiblack b a ~ l sin the firstj - 1 draws.Note that 40PROBABILITY and andso Finally, K-i P[B., C.]= ---JtM-j+l' ,[(K - l)(M - K)j(M - 1)]K PCB _IA]'P[AkI Bj]P[BJ =k- 1n- kn- 1Ai =~ . JkP[Akl( ~ ) ( ' ~=f) j ( ~ ) n . I Thusweobtainthesameanswerundereither methodof sampling. "t-'/IIF," Independence of eventsIf P[A IB]doesnotdependonevem:B,thatis, P[A, B] =P[A], thenit wouldseemnaturaltosaythat eventAisindependent of event B.This isgiveninthe followingdefinition. Definition19IndependenteventsForagivenprobabilityspace (0, &/, P[]),letAandBbetwoeventsin.!il.EventsAandBare defined to be independent if and only if anyone of the following conditions issatisfied: (i)P[AB] =P[A]P[B]. (ii)P[A, B]= P[A] if P[B]> O. (iii)P[B IA]P[B] if PtA]> O. IIII RemarkSomeauthorsuse" statisticallyindependent,"or "stochasti-callyindependent,"instead of" independent."/111 To argue the equivalence of the above three conditions, it sufficesto show that (i)implies (il),Oi)implies(iii),and (iii) implies(i).If P[AB]P[A]P[B], then P[A, B]= P[AB]IP[B] =P[A]P[B]fP[B] = P[A] for P[B] > 0; so (i) implies (ii).If PtA I B]= P[A],thenP[BI A] = P[A, B]P[B]IP[A] =P[A]P[B]IP[A] = P[B]for P[A] >0andP[B]> 0;so(ii)implies(iii).Andif P[BI A] = P[B], thenP[AB] = P[B,A]P[A] = P[B]P[A]forP[A]>O.ClearlyP[AB] = P[A]P[B] if P[A] =0 or P[B] =o. 3PROBABILITY-AXIOMATIC41 EXAMPLE29Consider the experiment of tossing two dice.Let A denote the eventof anoddtotal,Btheeventof anaceonthefirstdie,andC the event of atotalof seven.We posethree problems: (i)Are AandBindependent? Oi)Are AandC independent? (iii)Are Band C independent? WeobtainP[A I B] =1 = P[A],P[A I C]=I:# P[A] = 1,andP[CI B] = 1;=P[C] =!; soAandBareindependent,AisnotindependentofC, andBand C are independent.IIII The propertyof independence of twoeventsAandB and the property that AandBaremutuallyexclusivearedistinct,thoughrelated,properties.For example,twomutuallyexclusive eventsAandBare independent if andonlyif P[A]P[B] =0,whichistrueif andonlyif either AorBhaszeroprobability. 
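Before taking up examples of independence, the arithmetic of Example 25 can be checked directly. The sketch below is only an illustration, not part of the text; it applies the theorem of total probabilities and Bayes' formula to the five-urn setup.

```python
# Example 25: urn i (i = 1, ..., 5) holds i defective and 10 - i good balls,
# and each urn is selected with probability 1/5.
p_urn = [1 / 5] * 5
p_def_given_urn = [i / 10 for i in range(1, 6)]

# Theorem of total probabilities: P[A] = sum over i of P[A | B_i] * P[B_i].
p_def = sum(pa * pb for pa, pb in zip(p_def_given_urn, p_urn))
print(p_def)                                  # 0.3 = 15/50, as noted in the text

# Bayes' formula: P[B_k | A] = P[A | B_k] * P[B_k] / P[A].
posterior = [pa * pb / p_def for pa, pb in zip(p_def_given_urn, p_urn)]
print(posterior)                              # k/15 for k = 1, ..., 5, so P[B_5 | A] = 1/3
```

The posterior probabilities k/15 sum to 1 and increase with k, substantiating the suspicion stated in the example that the defective ball most likely came from urn 5.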
Or if P[A] :# 0and P[B]=F0,thenAand Bindependentimpliesthattheyare not mutually exclusive, and A and B mutually exclusive implies that they are not independent.Independenceof AandBimplies independenceof other events as wel1. Theorem 32If AandBare two independent events defined on agiven probability space (0, d, P[]),thenAand13areindependent,A andB areindependent, and.Ifand 13are independent. PROOF P[ABJ =P[A] - P[AB]= P[A] - P[A]P[B]= P[A](l - P[BD = P[A]P[B]. Similarly for the others.IIII Thenotionofindependenteventsmaybeextendedtomorethantwo events. Definition 20IndependenceofseveraleventsFor agivenprobability space(0, d, Pl']), letA.,A2 , ,Anbeneventsind.EventsAb A2, ,All are defined to be independent if and only if P[A,Aj]= P [ A i ] P [ A ~ ]fori :# j P[Ai1jAd =P[Af ]P[Aj ]P[Aklfor i:# j,j:# k, i=Fk . . P[.OI AI]= tIl PtA,]. 1111 42PROBABILITYI Onemightinquirewhetheralltheaboveconditionsarerequiredinthe definition.For instance,doesP[AIA2 A3] = P[AtlP[A2]P[A3]implyP[AIA2] = P[AtlP[A2]?Obviouslynot,since P[A1A2 A3]=P[AdP[A2]P[A3]if P[A3] =0,but P[AIA2]#:P[AdP[A2]if AtandA2arenotindependent.Ordoes pairwiseindependenceimplyindependence?Againtheanswerisnegative, as the followingexample shows. EXAMPLE30Pairwise independence doesnot imply independence.LetAl denote the event of an odd faceon the firstdie, A2the event of an odd face on the seconddie,andA3the eventof an oddtotal in therandom experi-mentthatconsistsoftossingtwodice.P[AdP[A2]= !. != P[A1A2], P[AtlP[A3]=1 '1 =P[A31 AtlP[Ad = P[AIA3],andP[A2 A3]= i = P[A2]P[A3];soAbA2,andA3arepairwiseindependent,However P[A1A2 A3]= 0#:! = P[AdP[A2]P[A3];soAhA2,andA3arenot independent. IIII In one sense, independence and conditional probability are each used to find the same thing, namely, P[AB], for P[AB] = P[A]P[B] under independence and P[AB] =P[A IB]P[B] under nonindependence.The nature of the eventsA and Bmaymakecalculationsof P[A], P[B],andpossiblyP[A IB]easy,but direct calculationof P[AB]difficult,inwhichcaseour formulasforindependenceor conditionalprobabilitywouldallowustoavoidthedifficultdirectcalculation of P[AB].We mightnote that P[AB] =P[A IB]P[B] isvalidwhether or not A isindependentof B provided that P[A I B] isdefined. The definition of independence is usednot only to check if two given events areindependentbutalsotomodelexperiments.Forinstance,foragiven experimentthenature of the eventsAandBmightbesuchthat we are willing to assume that A and B are independent; then the definition of independence gives theprobabilityof theeventAnBintermsof P[A]andP[B].Similarlyfor more thantwoevents. EXAMPLE31Considertheexperimentof samplingwithreplacementfrom anurn containi ngMballs of whichK are black and MK white.Since balls are being replaced after each draw, it seems reasonable to assume that theoutcomeof theseconddrawisindependentof theoutcomeof the first.ThenP[twoblacksinfirsttwodraws]= P[blackonfirstdraw]P[black on second draw]=(KIM)2,IIII PROBLEMS43 PROBLEMS Tosolvesomeof theseproblemsitmaybenecessarytomakecertainassumptions, suchassamplepointsareequallylikely,ortrialsareindependent,etc.,whensuch assumptionsarenotexplicitlystated.Someof the more difficultproblems,or those that require'special knowledge,are markedwith an *. lOne urn containsoneblackballand one goldball.Asecondurn containsone white and one goldball.One ballisselectedatrandomfromeachurn. (a)Exhibita sample space forthisexperiment. (b)Exhibitthe eventspace. (c)Whatisthe probability thatbothballswillbeof the same color? (d)What isthe probabilitythat one ballwillbegreen? 
2Oneurncontainsthreeredballs,twowhiteballs,andoneblueball.Asecond urn contains one redball,twowhite balls,and threeblue balls. (a)One ballisselectedatrandomfromeach urn. (i)Describeasample space forthisexperiment. (ii)Find the probabilitythatboth ballswillbe of the same color. (iii)Istheprobabilitythatbothballswillberedgreaterthantheprob-ability thatbothwillbe white? (b)The balls in the two urns are mixed together in a single urn, and then a sample of threeisdrawn.Find the probability that all three colors are represented, when (i) sampling with replacement and (ii) withoutreplacement. 3If Aand Bare disjointevents, P[A] =.5, and P[A uB] =.6,what isP[B]? 4Anurncontains fiveballsnumbered1 to5 of whichthe firstthree areblack and thelasttwoaregold.Asampleof size2isdrawnwithreplacement:LetBl denote the event that the firstball drawn isblack and B2denote the eventthat the secondballdrawnisblack. (a)Describeasamplespacefortheexperiment,and exhibittheeventsB1,B2, andB1B2 (b)Find P[B1],P[B2],and P[B1B2]' (c)Repeatparts (a)and (b)forsampling withoutreplacement. 5Acarw i t ~sixsparkplugsisknowntohavetwomalfunctioningsparkplugs. If twoplugsarepulledatrandom,whatistheprobabilityof gettingbothof the malfunctioning plugs ? 6Inanassembly-lineoperation,1of theitemsbeingproducedaredefective.If three itemsare picked at random and tested,what isthe probability: (a)That exactlyone of themwillbedefective? (b)That at leastone of themwillbe defective? 7In acertain game aparticipant isallowed three attempts at scoring ahit.In the threeattemptshemustalternatewhichhandisused;thushehastwopossible strategies:righthand,lefthand,righthand;or lefthand,righthand,lefthand. Hischanceof scoring ahitwithhisrighthandis.8,whileitisonly.5withhis left hand.If he issuccessful at the game provided that he scores at least two hits inarow,whatstrategygivesthebetterchanceof success?Answerthesame 44PROBABILITYI questionif .8isreplacedby PIand.5by P2.Doesyouranswerdependon PI and P2?. 8(a)SupposethatAandBaretwoequallystrongteams.Isitmoreprobabfe thatAwillbeatB inthreegamesoutof fouror infivegamesout of seven? ,-(b)Supposenowthat the probabilitythatAbeats Binan individualgameis p. Answer part (a).Does youranswer dependon p? 9If P[A] =tand P[B] =!, canAand Bbe disjoint?Explain. 10Prove or disprove; If P[A] =P[B] =p, thenP[AB] 1 - IX-fl. 15Prove properties (i) to(iv)of indicator functions. 16Prove the more generalstatementinTheorem19. 17Exhibit (if such exists) a probability space, denoted by (0, d, P[ D, which satisfies. the following.For AlandA2members ofd, if P[Atl =P[A2],then Al =A2. 18Four drinkers(sayI,II,III,andIV)aretorankthreedifferentbrandsof b&r (sayA,B,andC)inablindfoldtest.Eachdrinkerranksthethreebeersas.-l. (forthepeer he likesbest),2,and3,andthentheassignedranksof eachbrand of beer,are summed.Assume that the drinkers really cannot discriminate between beerssothat eachisassigninghisrankingsat random. (a)What isthe probabilitythat beer Awillreceive atotal score of 4? (b)Whatistheprobabilitythat somebeer willreceiveatotal score of (c)What isthe probability that some beer willreceive atotal score of 5 or less? 19The followingare three of theclassicalproblemsin probability. (a)Comparetheprobabilityof atotalof 9withatotalof 10whentrrreefair dice are tossedonce (Galileoand Duke of Tuscany). (b)Compare theprobabilityof atleastone6in4tossesof afairdiewitlithe, ..,. probabilityof atleastonedouble-6in24tossesof twofairdice(Chevalier deMere). 
(c)Compare theprobabiJityof atleastone6 whensixdicearerolledwiti- the probability of at least two 6s when twelvediceare rolled(PepystoNewton). 20A seller has a dozen small electric motors, two of whichare faulty .. A is interestedinthedozenmotors.The seller can crate the motors with all twelve in,/L one box or withsixin each of twoboxes;he knowsthat the customer willinspect twOof the twelve motors if they are all crated in one box and one motor from each' of thetwosmallerboxesif theyare cratedsixeachtotwosmallerboxes.He hasthreestrategiesinhisattempttosellthefaultymotors:(i)cratealltwelve inone box; (ii)put one faultymotor in each of the two smaller or (iii)put both of the faultymotors in one of the smaller boxesandnofaulty inthe other.Whatisthe probabilitythatthe customer willnotinspectafaultymotor under eachof thethree strategies? ". PROBLEMS45 21Asample of fiveobjectsisdrawnfromalarger population of Nobjects (N5). LetNw orNwo denotethenumberofdifferentsamplesthatcouldbedrawn depending, respectively,on whether sampling is done with or without replacement. Give the values for Nw and Nwo Show that when N is very large, these two values are approximatelyequalinthesensethattheirratioiscloseto1 butnotinthe sensethat their difference isclose to O. 22 Out of a .!oup of 25persons, what isthe probability that all 25willhave different birthdays?(Assume a365-day year and that alldays are equally likely.) 23Abridgeplayerknowsthathistwoopponentshaveexactlyfiveheartsbetween thetwoof them.Eachopponenthasthirteencards.Whatistheprobability thatthereisathree-twosplitonthehearts(thatis,oneplayerhasthreehearts andthe other two)? 24(a)If rballsarerandomlyplacedintonurns(eachballhavingprobabilitylin of goingintothefirsturn),whatistheprobabilitythatthefirsturnwill contain exactlykballs? "(b)Letn -'1-Cl)andr -7 Xlwhiler/n=mremainsconstant.Showthatthe ",-W.probapility you calculated approaches e .. lllmk/k!. ,}S')(:{biased coin hasprobability pof landing heads.Ace,Bones,and Clod toss the coinsuccessively,Acetossingfirst,untilaheadoccurs.The personwhotosses the firsthead wins.Find the probability of winning for each. *26It is told that in certain rural areas of Russia marital fortunes were once told in the following way: A girl would hold six strings in her hand with the ends protruding abQveand below; afriendwouldtietogether the sixupper ends inpairs and then tietogetherthesixlowerendsin pairs.If itturned out thatthe friendhadtied sixstringsintoatleastonering,thiswassupposedtoindicatethatthegirl ''Wouldgetmarriedwithin ayear.Whatistheprobabilitythat asingleringwill ,beformedwhenthe stringsaretiedatrandom?Whatisthe probability that at 'leastonering willbeformed?Generalizetheproblemto2nstrings. 27Mr.Bandit, a well-knownrancher and not so well-knownpart-time cattle rustler, llstwentyheadof cattlereadyformarket.Sixteenof thesecattle are hisown tfiilndconsequently bear hisown brand.The other four bear foreignbrands.Mr. Banditknowsthatthebrandinspector at themarketplace checksthebrands of .20 percent of the cattle in any shipment.He has twotrucks, one which willhaul alltwenty for allx. IX) (ii)J f(x) dx =1. III/ -IX) With this definitionwecan speak of probability density functionswithout referenceto random variables.We might note that aprobability density func-tionof acontinuousrandomvariableasdefinedinDefinition8doesindeed possessthetwoproperties inthe abovedefinition. 
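As an illustration of the two defining properties of a probability density function, the following sketch (not from the text) checks them numerically for one candidate density, f(x) = lam * exp(-lam * x) for x >= 0 and 0 otherwise, with an arbitrarily chosen rate lam = 2: nonnegativity holds by construction, and a crude Riemann sum approximates the integral over the whole line.

```python
import math

lam = 2.0                      # an arbitrary illustrative rate parameter

def f(x):
    # Candidate density: lam * exp(-lam * x) for x >= 0, and 0 otherwise.
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Property (i): f(x) >= 0 for all x -- true here by construction.
# Property (ii): the integral of f over the whole line equals 1; approximate it with
# a crude Riemann sum over [0, 40], far enough out that the neglected tail is negligible.
dx = 1e-3
approx_integral = sum(f(k * dx) for k in range(int(40 / dx))) * dx
print(approx_integral)         # approximately 1.0
```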
3.3 Other Random Variables

Not all random variables are either continuous or discrete; that is, not all cumulative distribution functions are either absolutely continuous or discrete.

[FIGURE 5]

EXAMPLE 8 Consider the experiment of recording the delay that a motorist encounters at a one-way traffic stop sign. Let X be the random variable that represents the delay that the motorist experiences after making the required stop. There is a certain probability that there will be no opposing traffic, so that the motorist will be able to proceed with no delay. On the other hand, if the motorist has to wait, he may have to wait for any of a continuum of possible times. This experiment could be modeled by assuming that X has a cumulative distribution function given by F_X(x) = (1 − p e^{−λx}) I_{[0,∞)}(x). This F_X(x) has a jump of 1 − p at x = 0 but is continuous for x > 0. See Fig. 5. ////

Many practical examples of cumulative distribution functions that are partly discrete and partly absolutely continuous can be given. Yet there are still other types of cumulative distribution functions. There are continuous cumulative distribution functions, called singular continuous, whose derivative is 0 at almost all x. …

… (i) ℓ(t; θ) ≥ 0 for all possible estimates t and all θ in Θ, and (ii) ℓ(t; θ) = 0 for t = τ(θ). ℓ(t; θ) equals the loss incurred if one estimates τ(θ) to be t when θ is the true parameter value. ////

In a given estimation problem one would have to define an appropriate loss function for the particular problem under study. It is a measure of the error and presumably would be greater for large error than for small error. We would want the loss to be small; or, stated another way, we want the error in estimation to be small, or we want the estimate to be close to what it is estimating.

EXAMPLE 16 Several possible loss functions are:
(i) ℓ₁(t; θ) = [t − τ(θ)]².
(ii) ℓ₂(t; θ) = |t − τ(θ)|.
(iii) ℓ₃(t; θ) = A if |t − τ(θ)| > ε, and 0 if |t − τ(θ)| ≤ ε, where A > 0.
(iv) ℓ₄(t; θ) = ρ(θ)|t − τ(θ)|^r for ρ(θ) > 0 and r > 0.
ℓ₁ is called the squared-error loss function, and ℓ₂ is called the absolute-error loss function. Note that both ℓ₁ and ℓ₂ increase as the error t − τ(θ) increases in magnitude. ℓ₃ says that you lose nothing if the estimate t is within ε units of τ(θ) and otherwise you lose amount A. ℓ₄ is a general loss function that includes both ℓ₁ and ℓ₂ as special cases. ////

We assume now that an appropriate loss function has been defined for our estimation problem, and we think of the loss function as a measure of error or loss. Our object is to select an estimator T = t(X₁, …, Xₙ) that makes this error or loss small. (Admittedly, we are not considering a very important, substantive problem by assuming that a suitable loss function is given. In general, selection of an appropriate loss function is not trivial.) The loss function in its first argument depends on the estimate t, and t is a value of the estimator T; that is, t = t(x₁, …, xₙ). Thus, our loss depends on the sample x₁, …, xₙ. We cannot hope to make the loss small for every possible sample, but we can try to make the loss small on the average. Hence, if we alter our objective of picking that estimator that makes the loss small to picking that estimator that makes the average loss small, we can remove the dependence of the loss on the sample x₁, …, xₙ. This notion is embodied in the following definition.

Definition 12 Risk function For a given loss function ℓ(·; ·), the risk function, denoted by ℛ_T(θ), of an estimator T = t(X₁, …, Xₙ) is defined to be ℛ_T(θ) = E_θ[ℓ(T; θ)]. ////

… (iv) ρ(θ) E_θ[|T − τ(θ)|^r]. ////

Our object now is to select an estimator that makes the average loss (risk) small and ideally select an estimator that has the smallest risk. To help meet this objective, we use the concept of admissible estimators.
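Risk functions can also be approximated by simulation. The sketch below is an added illustration, not from the text: it estimates, under squared-error loss, the risks of two estimators of a normal mean (the sample mean and, purely for contrast, half the sample mean) and shows the typical situation in which the two risk functions cross, so that neither estimator is better for every θ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, reps = 10, 1.0, 20000

def risk(estimator, theta):
    """Monte Carlo approximation of R_T(theta) = E_theta[(T - theta)^2]."""
    samples = rng.normal(theta, sigma, size=(reps, n))
    t = estimator(samples)
    return np.mean((t - theta) ** 2)

mean_est = lambda x: x.mean(axis=1)          # T1 = sample mean
shrunk_est = lambda x: 0.5 * x.mean(axis=1)  # T2 = half the sample mean (illustrative only)

for theta in (0.0, 0.5, 1.0, 2.0):
    print(theta, round(risk(mean_est, theta), 3), round(risk(shrunk_est, theta), 3))
# T2 has smaller risk near theta = 0 but larger risk for large theta:
# the two risk functions cross, so neither estimator is better for all theta.
```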
Definition 13 Admissible estimator For two estimators T₁ = t₁(X₁, …, Xₙ) and T₂ = t₂(X₁, …, Xₙ), estimator T₁ is defined to be a better estimator than T₂ if and only if ℛ_{T₁}(θ) ≤ ℛ_{T₂}(θ) for all θ in Θ, and ℛ_{T₁}(θ) < ℛ_{T₂}(θ) for at least one θ in Θ. An estimator T = t(X₁, …, Xₙ) is defined to be admissible if and only if there is no better estimator. ////

In general, given two estimators T₁ and T₂, neither is better than the other; that is, their respective risk functions, as functions of θ, cross. We observed this same phenomenon when we studied the mean-squared error. Here, as there, there will not, in general, exist an estimator with uniformly smallest risk. The problem is the dependence of the risk function on θ. What we might do is average out θ, just as we averaged out the dependence on x₁, …, xₙ when going from the loss function to the risk function. The question then is: Just how should θ be averaged out? We will consider just this problem in Sec. 7 on Bayes estimators. Another way of removing the dependence of the risk function on θ is to replace the risk function by its maximum value and compare estimators by looking at their respective maximum risks, naturally preferring that estimator with smallest maximum risk. Such an estimator is said to be minimax.

Definition 14 Minimax An estimator T* is defined to be a minimax estimator if and only if sup_θ ℛ_{T*}(θ) ≤ sup_θ ℛ_{T}(θ) for every estimator T. ////

… For n = 1, I_{{0}}(X₁) is an unbiased estimator which is a function of the complete sufficient statistic X, and hence I_{{0}}(X₁) itself is the UMVUE of e^{−λ}. The reader may want to derive the mean and variance of this UMVUE and compare them with the mean and variance of the estimator (1/n) Σ_{i=1}^{n} I_{{0}}(Xᵢ) given in Example 30. ////

EXAMPLE 35 Let X₁, …, Xₙ be a random sample from f(x; θ) = θe^{−θx} I_{(0,∞)}(x). Our object is to find the UMVUE of each of the following functions of the parameter θ: θ, 1/θ, and e^{−Kθ} = P[X > K] for given K. Since θe^{−θx} I_{(0,∞)}(x) is a member of the exponential class (see Example 24), the statistic S = Σ_{i=1}^{n} Xᵢ is complete and sufficient.

X̄ₙ = (1/n) Σ_{i=1}^{n} Xᵢ, which is a function of the complete sufficient statistic S = Σ Xᵢ, is an unbiased estimator of 1/θ; hence by Theorem 10, X̄ₙ is the UMVUE of 1/θ.

To find the UMVUE of θ, one might suspect that the estimator is of the form c/Σ Xᵢ, where c is a constant which may depend on n. Now E_θ[c/Σ Xᵢ] = cθ/(n − 1) for n > 1. So E_θ[c/Σ Xᵢ] = θ when c = n − 1; hence (n − 1)/Σ Xᵢ is the UMVUE of θ for n > 1. The variance of (n − 1)/Σ Xᵢ is given by θ²/(n − 2) for n > 2.

Although one might be able to guess which function of S = Σ Xᵢ is an unbiased estimator of e^{−Kθ}, let us derive the desired estimator by starting with the following simple unbiased estimator of e^{−Kθ}: I_{(K,∞)}(X₁). Note that E_θ[I_{(K,∞)}(X₁)] = 0 · P[X₁ ≤ K] + 1 · P[X₁ > K] = P[X₁ > K] = e^{−Kθ}; so I_{(K,∞)}(X₁) is indeed an unbiased estimator of e^{−Kθ}, and therefore by Theorems 8 and 10, E_θ[I_{(K,∞)}(X₁) | S] is the UMVUE of e^{−Kθ}. Now E_θ[I_{(K,∞)}(X₁) | S = s] = P[I_{(K,∞)}(X₁) = 1 | S = s] = P[X₁ > K | S = s]. In order to obtain P[X₁ > K | S = s], we will first find the conditional distribution of X₁ given S = s. It is

f_{X₁|S}(x₁ | s) = [Γ(n)/Γ(n − 1)] (s − x₁)^{n−2}/s^{n−1} = (n − 1)(s − x₁)^{n−2}/s^{n−1}   for 0 < x₁ < s;

hence

P[X₁ > K | S = s] = ∫_K^s f_{X₁|S}(x₁ | s) dx₁ = ∫_K^s (n − 1)(s − x₁)^{n−2}/s^{n−1} dx₁ = ∫_0^{s−K} (n − 1) y^{n−2}/s^{n−1} dy = [(s − K)/s]^{n−1}

for s > K and n > 1, where the substitution y = s − x₁ was made. Hence

[(S − K)/S]^{n−1} I_{(K,∞)}(S)

is the UMVUE for e^{−Kθ} for n > 1. (Actually the estimator is applicable for n = 1 as well.) It may be of interest and would serve as a check to verify directly that this estimator is unbiased; integrating [(s − K)/s]^{n−1} against the gamma density of S, with the substitution u = s − K, again yields e^{−Kθ}. ////
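The unbiasedness check can also be done by simulation. The following Python sketch is an added illustration, not from the text; the values n = 5, θ = 2, and K = 0.4 are arbitrary, and the sketch also compares the UMVUE with the naive unbiased estimator I_{(K,∞)}(X₁).

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta, K, reps = 5, 2.0, 0.4, 200000

# X_i ~ theta * exp(-theta * x); numpy parameterizes the exponential by scale = 1/theta.
x = rng.exponential(scale=1.0 / theta, size=(reps, n))
s = x.sum(axis=1)

umvue = np.where(s > K, ((s - K) / s) ** (n - 1), 0.0)  # E[I(X1 > K) | S], the UMVUE
naive = (x[:, 0] > K).astype(float)                     # the simple unbiased estimator I_(K,inf)(X1)

print(np.exp(-K * theta))           # target e^{-K theta} (about 0.449 here)
print(umvue.mean(), naive.mean())   # both sample means are close to the target...
print(umvue.var(), naive.var())     # ...but the UMVUE has much smaller variance
```

Both averages come out near e^{−Kθ}, while the UMVUE shows a much smaller variance, as the theory predicts.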
In closing this section on unbiased estimation, we make several remarks.

Remark For some functions of the parameter there is no unbiased estimator. For example, in a sample of size 1 from a binomial density there is no unbiased estimator of 1/θ. Suppose there were; let T = t(X) denote it. Then E_θ[T] = Σ_{x=0}^{n} t(x) \binom{n}{x} θ^x (1 − θ)^{n−x} = 1/θ, which says that an nth-degree polynomial in θ is identical to 1/θ, which cannot be. ////

Remark We mentioned in Subsec. 5.1 that the Cramér-Rao lower bound is not necessarily the best lower bound. For example, the Cramér-Rao lower bound for the variance of unbiased estimators of θ in sampling from the negative exponential distribution is given by θ²/n (see Example 29), and the variance of the UMVUE of θ is given by θ²/(n − 2) (see Example 35). θ²/(n − 2) is necessarily the best lower bound. ////

Remark For some estimation problems there is an unbiased estimator but no UMVUE. Consider the following example. ////

EXAMPLE 36 Let X₁, …, Xₙ be a random sample from the uniform density over the interval (θ, θ + 1]. We want to estimate θ. X̄ₙ − ½ and (Y₁ + Yₙ)/2 − ½ are unbiased estimators of θ, yet there is no UMVUE of θ. For fixed θ₀ …

… A number of the estimators that we have considered are scale-invariant, including X̄ₙ, √S², (Y₁ + Yₙ)/2, and Yₙ − Y₁. Our discussion of scale-invariant estimators will be limited to problems concerning estimation of scale parameters, defined below.

Definition 27 Scale parameter Let {f(·; θ), θ > 0} be a family of densities indexed by a real parameter θ. The parameter θ is defined to be a scale parameter if and only if the density f(x; θ) can be written as (1/θ)h(x/θ) for some density h(·). Equivalently, θ is a scale parameter for the density f_X(x; θ) of a random variable X if and only if the distribution of X/θ is independent of θ. ////

Note that if θ is a scale parameter for the family of densities {f(·; θ), θ > 0}, then the density h(·) of the definition is given by h(x) = f(x; 1).

EXAMPLE 40 We give several examples of scale parameters. If f(x; λ) = (1/λ)e^{−x/λ} I_{(0,∞)}(x), then λ is a scale parameter since e^{−y} I_{(0,∞)}(y) is a density. Note that this parameterization of the negative exponential distribution is not the parameterization that we have used previously.

If f(x; σ) = φ_{0,σ}(x) = [1/(√(2π) σ)] exp[−½(x/σ)²], then σ is a scale parameter since (1/√(2π)) exp(−½y²) is a density.

If f(x; θ) = (1/θ)I_{(0,θ)}(x) = (1/θ)I_{(0,1)}(x/θ), then θ is a scale parameter since I_{(0,1)}(y) is a density.

If f(x; θ) = (1/θ)I_{(θ,2θ)}(x) = (1/θ)I_{(1,2)}(x/θ), then θ is a scale parameter since I_{(1,2)}(y) is a density. ////

Our sole result for scale invariance, a result that is comparable to the result of Theorem 11 on location invariance, requires a slightly different framework. Instead of measuring error with the squared-error loss function, we measure it with the loss function ℓ(t; θ) = (t − θ)²/θ² = (t/θ − 1)². If |t − θ| represents error, then 100|t − θ|/θ can be thought of as percent error, and then (t − θ)²/θ² is proportional to percent error squared. We state the following theorem, also from Pitman [41], without proof.

Theorem 12 Let X₁, …, Xₙ be a random sample from the density f(·; θ), where θ > 0 is a scale parameter. Assume that f(x; θ) = 0 for x < 0; that is, the random variables Xᵢ assume only positive values. Within the class of scale-invariant estimators, the estimator

t*(x₁, …, xₙ) = ∫₀^∞ (1/θ²) Π_{i=1}^{n} f(xᵢ; θ) dθ / ∫₀^∞ (1/θ³) Π_{i=1}^{n} f(xᵢ; θ) dθ   (18)

has uniformly smallest risk for the loss function ℓ(t; θ) = (t − θ)²/θ². ////

Definition 28 Pitman estimator for scale The estimator given in Eq. (18) is defined to be the Pitman estimator for scale. ////

Remark The Pitman estimator for scale is a function of sufficient statistics. ////
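Equation (18) is easy to evaluate numerically for a given sample. The sketch below is an added illustration, not part of the text: it carries out the two integrals for the exponential scale density (1/θ)e^{−x/θ} of Example 40 with a made-up sample; for this particular family a short side calculation (not in the text) reduces the estimator to Σxᵢ/(n + 1), which the last line uses as a check.

```python
import numpy as np
from scipy import integrate

x = np.array([0.8, 1.7, 0.4, 2.9, 1.1])   # made-up data
n, s = len(x), x.sum()

def joint(theta):
    """Product of the n exponential scale densities (1/theta) exp(-x_i/theta)."""
    return theta ** (-n) * np.exp(-s / theta)

num, _ = integrate.quad(lambda th: joint(th) / th ** 2, 0, np.inf)  # numerator of Eq. (18)
den, _ = integrate.quad(lambda th: joint(th) / th ** 3, 0, np.inf)  # denominator of Eq. (18)

print(num / den)    # Pitman estimator for scale, evaluated numerically
print(s / (n + 1))  # closed form for this family; the two values agree
```

For the uniform density of the next example the same two integrals give [(n + 2)/(n + 1)]yₙ, the value derived analytically there.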
EXAMPLE 41 Let X₁, …, Xₙ be a random sample from the density f(x; θ) = (1/θ)I_{(0,θ)}(x). The Pitman estimator for the scale parameter θ is

t* = ∫₀^∞ (1/θ²) Π_{i=1}^{n} (1/θ)I_{(0,θ)}(xᵢ) dθ / ∫₀^∞ (1/θ³) Π_{i=1}^{n} (1/θ)I_{(0,θ)}(xᵢ) dθ = ∫_{yₙ}^∞ θ^{−(n+2)} dθ / ∫_{yₙ}^∞ θ^{−(n+3)} dθ = {1/[(n + 2) − 1]} yₙ^{−(n+2)+1} / ({1/[(n + 3) − 1]} yₙ^{−(n+3)+1}) = [(n + 2)/(n + 1)] yₙ,

where yₙ is the largest observation. ////

… E[U] = Σᵢ Σⱼ p = mnp, where p = P[Xᵢ > Yⱼ] = ∫ P[Y < x | X = x] f_X(x) dx = ∫ F_Y(x) f_X(x) dx. If ℋ₀ is true,

p = ∫ F_X(x) f_X(x) dx = ∫₀¹ u du = ½.

Similarly, the variance of U can be found. The derivation is somewhat more complicated since one needs the expected value of U². From the mean and variance of U, the mean and variance of T_X can be obtained. If ℋ₀ is true, they are given by

E[T_X] = m(m + n + 1)/2   (22)

and

var[T_X] = mn(m + n + 1)/12.   (23)

The exact distribution of T_X turns out to be a very troublesome problem for large m and n. However, Mann and Whitney have calculated the distribution for small m and n, have shown that T_X is approximately normally distributed for large m and n, and have demonstrated that the normal approximation is quite accurate when m and n are larger than 7. Thus for samples of reasonable size one can use the normal approximation with mean and variance given by Eqs. (22) and (23) to find a critical region for testing ℋ₀: F_X(z) = F_Y(z) for all z versus ℋ₁: F_X(z) ≠ F_Y(z). The test would be the following:

Reject ℋ₀ if |T_X − E[T_X]| is large;

that is,

Reject ℋ₀ if and only if |T_X − E[T_X]| ≥ k,

where k is determined by fixing the size of the test and using the asymptotic normal distribution of T_X.

EXAMPLE 5 Find the exact distribution of T_X under ℋ₀ for m = 3 and n = 2. Each of the following arrangements is equally likely if ℋ₀ is true:

x x x y y, x x y x y, x x y y x, x y x x y, x y x y x, x y y x x, y x x x y, y x x y x, y x y x x, y y x x x.

The corresponding T_X values are, respectively, 6, 7, 8, 8, 9, 10, 9, 10, 11, 12; so P[T_X = 6] = P[T_X = 7] = 1/10, P[T_X = 8] = P[T_X = 9] = P[T_X = 10] = 2/10, and P[T_X = 11] = P[T_X = 12] = 1/10. ////
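The enumeration in Example 5 is easy to mechanize. The following sketch is an added illustration, not from the text: it lists every placement of the m X-ranks among the m + n combined ranks and tabulates the resulting null distribution of T_X; rerunning it with other small m and n gives the distributions asked for in Prob. 14 below.

```python
from itertools import combinations
from collections import Counter

def tx_distribution(m, n):
    """Exact null distribution of T_X, the sum of the ranks of the m X's
    among all m + n observations, when every arrangement is equally likely."""
    N = m + n
    counts = Counter(sum(ranks) for ranks in combinations(range(1, N + 1), m))
    total = sum(counts.values())
    return {t: counts[t] / total for t in sorted(counts)}

print(tx_distribution(3, 2))
# {6: 0.1, 7: 0.1, 8: 0.2, 9: 0.2, 10: 0.2, 11: 0.1, 12: 0.1}, matching Example 5
```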
PROBLEMS

1 Show that T = (1/n) Σ_{i=1}^{n} I_B(Xᵢ) is an unbiased estimator of P[X ∈ B]. Find var[T], and show that T is a mean-squared-error consistent estimator of P[X ∈ B].
2 Define Fₙ(Bⱼ) = (1/n) Σ_{i=1}^{n} I_{Bⱼ}(Xᵢ) for j = 1, 2. Find cov[Fₙ(B₁), Fₙ(B₂)].
3 Let Y₁, …, Yₙ be the order statistics corresponding to a random sample of size n from a continuous c.d.f. F(·).
(a) Find the density of F(Yⱼ).
(b) Find the joint density of F(Yᵢ) and F(Yⱼ).
(c) Find the density of [F(Yₙ) − F(Y₂)]/[F(Yₙ) − F(Y₁)].
4 Let X₁, …, Xₙ be independent and identically distributed random variables having common continuous c.d.f. F(·). Let Y₁ … .90.
11 Test as many ways as you know how at the 5 percent level that the following two samples came from the same population:
x: 1.3 1.4 1.4 1.5 1.7 1.9 1.9
y: 1.6 1.8 2.0 2.1 2.1 2.2 2.3
12 Let X₁, …, X₅ denote a random sample of size 5 from the density f(x; θ) = I_{(θ−½, θ+½)}(x). Consider estimating θ.
(a) Determine the confidence coefficient of the confidence interval (Y₁, Y₅).
(b) Find a confidence interval for θ that has the same confidence coefficient as in part (a) using the pivotal quantity (Y₁ + Y₅)/2 − θ.
(c) Compare the expected lengths of the confidence intervals of parts (a) and (b).
13 Find var[U] when F_X(·) ≡ F_Y(·). See Eq. (20).
14 Equation (21) shows that U and T_X are linearly related. Find the exact distribution of U or T_X when ℋ₀ is true for small sample sizes. For example, take m = 1, n = 2; m = 1, n = 3; m = 2, n = 1; m = 3, n = 1; and m = n = 2.
15 We saw that E[U] = mnp. Is U/mn an unbiased estimator of p = P[Xᵢ > Yⱼ] whether or not ℋ₀ is true? Is U a consistent estimator of p?
16 A common measure of association for random variables X and Y is the rank correlation, or Spearman's correlation. The X values are ranked, and the observations are replaced by their ranks; similarly, the Y observations are replaced by their ranks. For example, for a sample of size 5 the observations
x: 20.4 19.7 21.8 20.1 20.7
y:  9.2  8.9 11.4  9.4 10.3
are replaced by
r(x): 3 1 5 2 4
r(y): 2 1 5 3 4
Let r(Xᵢ) denote the rank of Xᵢ and r(Yᵢ) the rank of Yᵢ. Using these paired ranks, the ordinary sample correlation is computed:

Spearman's correlation = S = Σ[r(Xᵢ) − r̄(X)][r(Yᵢ) − r̄(Y)] / √(Σ[r(Xᵢ) − r̄(X)]² Σ[r(Yᵢ) − r̄(Y)]²),

where r̄(X) = Σ r(Xᵢ)/n and r̄(Y) = Σ r(Yᵢ)/n.
(a) Show that S = 1 − 6 Σ Dᵢ²/(n³ − n), where Dᵢ = r(Xᵢ) − r(Yᵢ).
(b) Compute the ordinary correlation and Spearman's correlation for the above data. (A computational sketch follows Prob. 18.)
17 Argue that the distribution of S in Prob. 16 is independent of the form of the distributions of X and Y provided that X and Y are continuous and independently distributed random variables. Hence S can be used as a test statistic in a nonparametric test of the null hypothesis of independence.
18 Show that the mean and variance of S (in Prob. 17) under the hypothesis of independence are 0 and 1/(n − 1), respectively.
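The quantities in Probs. 16 to 18 are easy to compute by machine. The sketch below is an added illustration, not part of the text: it evaluates S for the data of Prob. 16 both as the ordinary correlation of the ranks and from the formula of part (a), and then approximates the null mean and variance of S by randomly permuting one set of ranks; the permutation count is arbitrary.

```python
import numpy as np
from scipy import stats

x = np.array([20.4, 19.7, 21.8, 20.1, 20.7])
y = np.array([9.2, 8.9, 11.4, 9.4, 10.3])

rx, ry = stats.rankdata(x), stats.rankdata(y)
d = rx - ry
n = len(x)

S_formula = 1 - 6 * np.sum(d ** 2) / (n ** 3 - n)   # the formula of part (a)
S_corr = np.corrcoef(rx, ry)[0, 1]                  # ordinary correlation of the paired ranks
print(S_formula, S_corr)                            # both equal 0.9 for these data

# Null behavior of S (Probs. 17 and 18): permute one ranking at random.
rng = np.random.default_rng(2)
sims = np.array([np.corrcoef(rx, rng.permutation(ry))[0, 1] for _ in range(20000)])
print(sims.mean(), sims.var(), 1 / (n - 1))         # mean near 0, variance near 1/(n-1) = 0.25
```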
APPENDIX A
MATHEMATICAL ADDENDUM

1 INTRODUCTION

The purpose of this appendix is to provide the reader with a ready reference to some mathematical results that are used in the book. This appendix is divided into two main sections: The first, Sec. 2 below, gives results that are, for the most part, combinatorial in nature, and the last gives results from calculus. No attempt is made to prove these results, although sometimes a method of proof is indicated.

2 NONCALCULUS

2.1 Summation and Product Notation

A sum of terms such as n₃ + n₄ + n₅ + n₆ + n₇ is often designated by the symbol Σ_{i=3}^{7} nᵢ. Σ is the capital Greek letter sigma, and in this connection it is often called the summation sign. The letter i is called the summation index. The term following Σ is called the summand. The "i = 3" below Σ indicates that the first term of the sum is obtained by putting i = 3 in the summand. The "7" above the Σ indicates that the final term of the sum is obtained by putting i = 7 in the summand. The other terms of the sum are obtained by giving i the integral values between the limits 3 and 7. Thus

Σ_{j=2}^{5} (−1)^{j−2} j x^{2j} = 2x⁴ − 3x⁶ + 4x⁸ − 5x^{10}.

An analogous notation for a product is obtained by substituting the capital Greek letter Π for Σ. In this case the terms resulting from substituting the integers for the index are multiplied instead of added. Thus, for example, Π_{i=3}^{7} nᵢ = n₃n₄n₅n₆n₇.

EXAMPLE 1 Some useful formulas involving summations are listed below. They can be proved using mathematical induction.

Σ_{i=1}^{n} i = n(n + 1)/2.   (1)
Σ_{i=1}^{n} i² = n(n + 1)(2n + 1)/6.   (2)
Σ_{i=1}^{n} i³ = n²(n + 1)²/4.   (3)
Σ_{i=1}^{n} i⁴ = n(n + 1)(2n + 1)(3n² + 3n − 1)/30.   (4)

Equation (1) can be used to derive the following formula for an arithmetic series or progression:

Σ_{j=1}^{n} [a + (j − 1)d] = na + d n(n − 1)/2.   (5)

A companion series, the finite geometric series, or progression, is given by

Σ_{j=0}^{n−1} a r^j = a (1 − rⁿ)/(1 − r).   (6)   ////

2.2 Factorial and Combinatorial Symbols and Conventions

A product of a positive integer n by all the positive integers smaller than it is usually denoted by n! (read "n factorial"). Thus

n! = n(n − 1)(n − 2) · ⋯ · 1 = Π_{j=0}^{n−1} (n − j).   (7)

0! is defined to be 1.

A product of a positive integer n by the next k − 1 smaller positive integers is usually denoted by (n)_k. Thus

(n)_k = n(n − 1) · ⋯ · (n − k + 1) = Π_{j=1}^{k} (n − j + 1).   (8)

Note that there are k terms in the product in Eq. (8).

Remark (n)_k = n!/(n − k)!, and (n)_n = n!/0! = n!. ////

The combinatorial symbol \binom{n}{k} is defined as follows:

\binom{n}{k} = (n)_k/k! = n!/[(n − k)! k!].   (9)

\binom{n}{k} is read "combination of n things taking k at a time" or more briefly as "n pick k"; it is also called a binomial coefficient. Define \binom{n}{0} = \binom{n}{n} = 1.

Remark \binom{n}{k} = \binom{n}{n−k}, and \binom{n+1}{k} = \binom{n}{k} + \binom{n}{k−1} if k ≤ n. ////

…

2 CONTINUOUS DISTRIBUTIONS

Logistic: F(x; α, β) = [1 + e^{−(x−α)/β}]^{−1}, −∞ < x < ∞; −∞ < α < ∞, β > 0.
Pareto: f(x; x₀, θ) = θ x₀^θ / x^{θ+1} I_{(x₀,∞)}(x); θ > 0; mean θx₀/(θ − 1) for θ > 1.
Gumbel (or extreme value): F(x; α, β) = exp(−e^{−(x−α)/β}), −∞ < x < ∞; −∞ < α < ∞, β > 0.
F distribution: f(x; m, n) = {Γ[(m + n)/2]/[Γ(m/2)Γ(n/2)]} (m/n)^{m/2} x^{(m−2)/2} [1 + (m/n)x]^{−(m+n)/2} I_{(0,∞)}(x); m, n = 1, 2, …; mean n/(n − 2) for n > 2.

Chi-square distribution: f(x; k) = {1/[Γ(k/2) 2^{k/2}]} x^{k/2−1} e^{−x/2} I_{(0,∞)}(x); k = 1, 2, ….

Moments: μ'_r = E[X^r]; variance or central moments: μ_r = E[(X − μ)^r], σ² = E[(X − μ)²]; moment generating function E[e^{tX}] and/or cumulants κ_r.

Weibull: μ'_r = E[X^r] = a^{−r/b} Γ(1 + r/b); σ² = a^{−2/b}[Γ(1 + 2b^{−1}) − Γ²(1 + b^{−1})].
Logistic: σ² = π²β²/3; E[e^{tX}] = e^{αt} πβt csc(πβt).
Pareto: μ'_r = θx₀^r/(θ − r) for θ > r (the rth moment does not exist otherwise); σ² = θx₀²/[(θ − 1)²(θ − 2)] for θ > 2; the moment generating function does not exist.
Gumbel: σ² = π²β²/6; κ_r = (−β)^r ψ^{(r−1)}(1) for r ≥ 2, where ψ is the digamma function; E[e^{tX}] = e^{αt} Γ(1 − βt) for t < 1/β.
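As a quick numerical cross-check, the sketch below (an added illustration, not part of the table) integrates the chi-square density given above and confirms that it integrates to 1 and has mean k and variance 2k, the standard values; k = 5 is an arbitrary choice.

```python
import numpy as np
from scipy import integrate, special

k = 5  # arbitrary degrees of freedom

def chi2_pdf(x):
    """Chi-square density with k degrees of freedom."""
    return x ** (k / 2 - 1) * np.exp(-x / 2) / (2 ** (k / 2) * special.gamma(k / 2))

total, _ = integrate.quad(chi2_pdf, 0, np.inf)
mean, _ = integrate.quad(lambda x: x * chi2_pdf(x), 0, np.inf)
second, _ = integrate.quad(lambda x: x ** 2 * chi2_pdf(x), 0, np.inf)

print(total)                       # ~ 1
print(mean, k)                     # mean equals k
print(second - mean ** 2, 2 * k)   # variance equals 2k
```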