statistical methods for hazards and health (bishop y., 1977)


  • 7/27/2019 Statistical Methods for Hazards and Health (Bishop Y., 1977)


    Statistical Methods for Hazards and Health

    Author(s): Yvonne M. M. Bishop
    Source: Environmental Health Perspectives, Vol. 20 (Oct., 1977), pp. 149-157
    Published by: Brogan & Partners
    Stable URL: http://www.jstor.org/stable/3428653


    Environmental Health Perspectives Vol. 20, pp. 149-157, 1977

    Statistical Methods for Hazards and Health

    by Yvonne M. M. Bishop*

    The objective of this article is to document the need for further development of statistical methodology, training of more statisticians, and improved communication between statisticians and the many other disciplines engaged in environmental research. Discussion of the adequacy of the current statistical methodology requires the use of examples, which will hopefully not be offensive to the authors. References are made to recent developments, and areas of unsolved problems are delineated in three broad areas: enumeration data and adjusted rates; time series; and multiple regression. A brief outline of the ideas behind current methods of analyzing discrete data is followed by a demonstration of their utility using an example of the effects of exposure, sex, and education on bronchitis rates. Examples are listed of the ubiquity of the time component when relating pollution effects to each other and to health effects. An artificial example is used to emphasize the effects of time-dependent autocorrelations, trends, and cycles. References are given to a variety of new developments in time-series analysis. Discussion of the pitfalls in multiple regression analysis, and of possible alternative approaches, is largely based on two recent reviews and includes references to recent developments of robust techniques.

    Introduction

    Dramatic episodes of fog or smog accompanied by notably increased mortality and morbidity have convinced us that polluted air affects health (1-3). Now we must determine more precisely how much pollution and what type of pollution causes disability. Both the exposure variable "air quality" and the outcome variable "health effects" are hard to define and measure. Much discussion centers on the reliability and validity of specific measures; increasingly, attention is being paid to numerous ancillary factors or covariates that influence postulated relationships. All these issues are of crucial importance in designing good studies and point to the need for interdisciplinary input when studies are being designed. If a study is poorly designed, no amount of subsequent statistical legerdemain will produce meaningful results. Conversely, even the best designed studies can lead to misleading conclusions if the data are inadequately analyzed. We need both good design and good analysis.

    This paper addresses only the issue of data analysis and ignores study design, except insofar as improvements of analytic techniques will reflect on design requirements. As the need for better methodology cannot be appreciated unless the deficiencies of the present state of the art are considered, examples will be given where the information obtained from the available data is not optimum. Examples for this purpose have been taken from a Chess monograph (4). In some instances the state of the art has improved since this work was done; in other areas many deficiencies still exist. The purpose of using these examples is not to criticize but to demonstrate the importance of improving our analytic techniques.

    The introductory overview to the Chess monograph cites two statistical methodologies, general linear regression for quantitative variables and general linear models for categorical responses (44). The similarity of the two methods is stressed. Below we show how the emphasis on this similarity has led the authors to report their analyses of categorical models inappropriately and, generally, to exploit inadequately the strengths of the analytic technique. We discuss the problems of time series and why linear regression techniques are inappropriate for their analysis. Some of the modern advances in fitting linear and nonlinear models to quantitative variables are mentioned briefly. We conclude that the 1970 task force recommendations should be stressed once again.

    *Harvard School of Public Health, Boston, Massachusetts 02115.


    Enumeration Data and Adjusted Rates

    What Is a Log-Linear Model?

    In recent years there has been much development in the handling of discrete data that have many categorical variables. Most authors agree that the interactions between the variables can best be determined by fitting models that are linear in the logarithmic scale.

    Suppose we are interested in the effect of the three variables sex, age, and exposure area on the prevalence of bronchitis. The most complex model states that each of the three variables has a proportional effect on the bronchitis rate, that each pair of variables may modify the effect of the other, and indeed that all three variables may have a joint effect. This is equivalent to saying that the effect of age on the bronchitis rate is not the same for each sex, and that the magnitude of this interaction varies between exposure areas. We say that this model includes the four-factor interaction bronchitis-age-sex-area. At the other extreme, the simplest model states that the bronchitis rate is constant for every sex-age-area combination. Between the most complex and the simplest model we can choose from a large variety of intermediate models, each postulating different combinations of simple proportional main effects and interaction effects. Each main or interaction effect is represented by a term in the log-linear model. Analysis consists of determining which intermediate model fits the data well and is not appreciably improved by adding more terms.

    How Do We Choose a Model?

    Although most authors are agreed upon the general utility of the log-linear model approach, there is some disagreement over the methods of obtaining estimates under a specific model and determining how well these estimates fit the observed data. Most of the proposed methods, such as maximum likelihood, least squares, or minimum chi-square, usually yield comparable if not identical estimates, and the probability levels associated with the goodness-of-fit statistics are in general very close. Thus, although we can choose from a variety of techniques for fitting models to a particular data set, the final selection of a suitable model is not dependent on the choice of technique. Further discussion of comparisons between techniques has been given elsewhere (7, 8).

    A well-fitting model is selected by a process of trial and error, and it includes those main effects and interactions which are large. The main effects and interactions that do not improve the goodness-of-fit are discarded. We often declare that the effects that are included are "significant" and those that are discarded are "not significant." Indeed, we may finish up with a table resembling an analysis of variance table. Such a table will list effects of importance, and give an indication of how the overall goodness-of-fit would be changed if each effect is excluded from the model. The degrees of freedom associated with these measure-of-fit statistics are determined from the number of categories in the relevant variables. The most commonly used measures are asymptotically distributed according to the chi-square distribution, and so the probability of observing a value as large as or larger than the tabulated value may be readily obtained.

    How Does This Help Us?

    Fitting models may be helpful in two ways: (a) we can determine which effects are of importance, and (b) we can use the fitted estimates obtained under the model in order to obtain meaningful summary statistics. In our example above, meaningful summary statistics might be bronchitis rates for each exposure area adjusted for differences in the sex and age distributions in the areas.

    The models can be extended to include many variables. As an example of the type of situation where they are of value, we include Tables 1-3, which are taken from the Rocky Mountain studies (4). Inspection of the first two tables, Tables 1 and 2, indicates that we have the following five variables: bronchitis, two categories, yes or no; sex, two categories; education, three categories; age, four categories; exposure area, two categories. Multiplying together the numbers of categories tells us that each person is distributed into one of 96 cells. It is difficult to interpret Table 3 because sufficient information on which model was fitted is not given. If we assume (a) that sex, education, and age are related to bronchitis rates, (b) that exposure area has no effect on bronchitis rates, (c) that the numbers of persons in each sex-education-age category differ by exposure area, and (d) that no multifactor effects are present, then the model fitted would have the terms shown in Table 4, each with its associated degrees of freedom, one for each parameter.
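The degrees-of-freedom bookkeeping for such a model can be checked mechanically. The sketch below assumes, as our own reading rather than anything stated in the Chess monograph, that assumptions (a)-(d) correspond to the hierarchical log-linear model generated by the terms BS, BE, BA and SEAR (B = bronchitis, S = sex, E = education, A = age, R = exposure area). Under that reading, the SEAR term lets the sex-education-age distribution differ by area, and the absence of a BR term encodes assumption (b). Each term contributes the product of (number of categories - 1) over its variables, and a hierarchical model also contains every lower-order relative of each term. The function names are hypothetical.

```python
from itertools import combinations
from math import prod


def hierarchical_terms(generators):
    """All terms implied by a hierarchical log-linear model.

    A model containing a term such as SEAR must also contain every
    lower-order relative (SEA, SE, S, ...) and the grand mean, which is
    represented here by the empty frozenset.
    """
    terms = set()
    for gen in generators:
        for k in range(len(gen) + 1):
            for subset in combinations(gen, k):
                terms.add(frozenset(subset))
    return terms


def model_dimensions(levels, generators):
    """Return (cells, parameters, residual degrees of freedom)."""
    cells = prod(levels.values())
    # Each term contributes prod(levels - 1) independent parameters;
    # the grand mean (empty term) contributes prod of nothing, which is 1.
    params = sum(
        prod(levels[v] - 1 for v in term)
        for term in hierarchical_terms(generators)
    )
    return cells, params, cells - params


# B = bronchitis, S = sex, E = education, A = age, R = exposure area.
model = ["BS", "BE", "BA", "SEAR"]
full_levels = {"B": 2, "S": 2, "E": 3, "A": 4, "R": 2}
two_level = {v: 2 for v in "BSEAR"}

print(model_dimensions(full_levels, model))         # (96, 55, 41)
print(model_dimensions(two_level, model))           # (32, 20, 12)
print(model_dimensions(two_level, model + ["BR"]))  # (32, 21, 11)
```

Under the stated assumption, the full 96-cell layout gives 55 parameters and 41 residual degrees of freedom. Collapsing every variable to two categories reproduces the 32 cells, 20 parameters and 12 degrees of freedom quoted in the discussion of Table 3. Adding the BR term leaves 11.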


    Table 1. Smoking- and sex-specific prevalence rates (percent) for chronic bronchitis by education and age (a, b)

                      Nonsmokers         Ex-smokers         Smokers
    Category          Mothers  Fathers   Mothers  Fathers   Mothers  Fathers
    Education:
      High school      1.36     1.95      2.51     2.13     10.85    19.46
    Age:
      ≤29              1.09     0.00      2.63     0.00     13.07    14.55
      30-39            1.31     0.68      3.86     2.72     11.17    15.59
      40-49            2.63     4.10      2.38     3.24     14.95    21.00
      ≥50              2.61     6.25      0.00     5.06      7.69    28.41

    (a) Data from Chess monograph (4).
    (b) Chronic bronchitis rates are equivalent to crude rates for symptom severities 6 and 7.

    Table 2. Prevalence of chronic bronchitis in nonindustrially exposed parents: individual and pooled community rates (percent) by sex and smoking status (a)

                                                                            Sex- and age-adjusted rates
                  Nonsmokers         Ex-smokers         Smokers             Non-      Ex-
    Community     Mothers  Fathers   Mothers  Fathers   Mothers  Fathers    smokers   smokers   Smokers
    Pooled low     1.08     1.25      3.12     1.45     11.78    17.05      1.08      2.46      14.00
    Low I          1.36     1.10      3.05     0.00      8.72    12.44
    Low II         0.48     0.00      4.55     5.31     14.15    20.86
    Low III        1.06     4.88      0.00     0.00     13.68    20.00
    Pooled high    2.54     3.47      2.80     4.82     12.88    18.63      3.12      3.56      15.72
    High I         3.56     4.90      1.79     4.72     13.83    18.40
    High II        1.50     2.00      3.92     4.95     11.75    18.85

    (a) Data from Chess monograph (4).
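The adjusted rates in Table 2 are summary statistics of the kind the text has in mind. The argument above is that they could be built from fitted values under a model rather than from crude specific rates. A minimal sketch of direct standardization follows: each stratum-specific rate is weighted by the size of that stratum in a chosen standard population. The function name is hypothetical, and the standard population below is invented purely for illustration. The same function accepts crude rates or model-fitted rates unchanged.

```python
def directly_adjusted_rate(stratum_rates, standard_counts):
    """Directly standardized rate.

    stratum_rates: rate in each stratum (e.g. age group) for the group being
        summarized; these may be crude specific rates or fitted values
        from a model.
    standard_counts: persons in each stratum of the chosen standard
        population, in the same order as stratum_rates.
    """
    if len(stratum_rates) != len(standard_counts):
        raise ValueError("need one standard count per stratum")
    total = sum(standard_counts)
    if total <= 0:
        raise ValueError("the standard population must be non-empty")
    weighted = sum(r * n for r, n in zip(stratum_rates, standard_counts))
    return weighted / total


# Age-specific rates (percent) for nonsmoking mothers, from the four age rows of Table 1.
rates = [1.09, 1.31, 2.63, 2.61]
# Hypothetical standard population for the same four age groups.
standard = [250, 300, 250, 200]

print(round(directly_adjusted_rate(rates, standard), 3))  # 1.845
```

Changing the standard population changes the adjusted figure, so comparisons between areas are only meaningful when the same standard is used for each.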

    Table 3. Analysis of variance for health observations in smokers and nonsmokers, chronic bronchitis (b)

                  Degrees of   Smokers                   Nonsmokers (a)
    Factor        freedom      χ²       Probability (p)  χ²       Probability (p)
    Sex           1            14.70
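Comparing χ² values across the smokers and nonsmokers columns of Table 3 is hazardous because a χ² statistic grows with sample size even when the strength of association is fixed. A minimal sketch shows this with the Pearson statistic on a hypothetical 2 x 2 table. Doubling every count leaves the proportions and the odds ratio unchanged but doubles χ². The table and function names are illustrative assumptions.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def odds_ratio(table):
    """Cross-product ratio of a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Hypothetical 2 x 2 table: rows = two groups, columns = bronchitis yes / no.
small = [[20, 80], [10, 90]]
large = [[2 * v for v in row] for row in small]  # same proportions, twice the people

print(round(pearson_chi2(small), 2), round(odds_ratio(small), 2))  # 3.92 2.25
print(round(pearson_chi2(large), 2), round(odds_ratio(large), 2))  # 7.84 2.25
```

A measure of association such as the odds ratio, or an explicit smoking term in a single model, avoids confounding effect size with sample size.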


    to two. With this reduction we would have 32 cells and be fitting 20 parameters, giving 12 degrees of freedom. The addition of the effect of exposure on bronchitis would bring us to 11 degrees of freedom, as given in Table 3. This example has been cited laboriously to illustrate the importance of specifying which model was fitted.

    There were further problems in understanding Table 3. Apparently two separate models were fitted, one to smokers and the other to nonsmokers. If we look at the first line of the table we see that the χ² values for sex and education are larger for smokers than for nonsmokers. We might suspect that smoking had a synergistic influence and enhanced the effects of sex and education. Such a suspicion would be unjustified if the sample of smokers was larger than the sample of nonsmokers. We cannot draw this inference because χ² values increase with larger sample sizes, even when the interaction effect they reflect remains constant. We could readily evaluate the possibility of smoking affecting other interactions by the simple procedure of adding smoking as a sixth variable to the other five variables already in the model. Then we could determine the magnitude of possible three-factor effects, one relating smoking-sex-bronchitis and the other relating smoking-education-bronchitis.

    If we turn to the second purpose of model fitting, to enable us to adjust rates for several underlying variables simultaneously, we find that this strength of the procedure has been ignored. All the rates given are either crude rates, or adjusted for at most two variables using crude specific rates.

    What Improvements Are Needed?

    In conclusion, the full strengths of the methodology were not used: (1) variables were reduced to two categories, thus losing information; (2) smoking was not included as a variable, so its effect cannot be assessed from the results given; (3) the particular model fitted could only be inferred, so its goodness-of-fit statistics are of no value; (4) the fitted values were not used to compute adjusted rates. Some of the difficulties noted above stem from the attempt to present the results in a table format that resembles analysis of variance for continuous data. Although there are similarities, in that models are being fitted, it is important to distinguish between the strengths of the different methodologies appropriate for different types of data (9). Thus the inadequacies were largely due to a lack of understanding of the methodology. This indicates a need for better training and communication.

    Since 1970, further advances in technology have been made, notably methods for dealing with ordered categories (10-14) and methods for computing variances for certain types of estimates. There is still need for further development of methods suitable for a mixture of discrete and continuous variables.

    Time Series

    Why Do We Need to Look at Them?

    The following are examples of situations where the relationships between two or more series of data collected over time are of current interest: (1) assessing the performance of a new pollution-measuring device compared with that of a standard device in the field; (2) determining whether adjacent stations monitoring the air in a city are giving comparable data or whether there are real differences in air quality in neighboring regions; (3) determining whether central monitoring stations give a true picture of individual exposure by comparing their readings with personal dosimeter readings; (4) relating fluctuations in indices of disease, such as deaths, hospital visits, or exacerbation of symptoms, to measures of air quality; (5) assessing the extent to which different pollutants increase and decrease simultaneously or with a consistent lag between peaks; (6) predicting the future levels of a given series so that the effects of intervention may be assessed. Thus the relationship of various time series is central to relating environmental and health effects.

    Why Is a Simple Correlation Not Informative?

    In each of the situations cited above, attempts have been made to use simple correlations as measures of the association between two time series. This approach can be criticized on several levels.

    Range of Observations. If each serial measurement could be regarded as independent of all preceding measurements (which is usually untrue) and was taken from a normal distribution, then correlation would be a reasonable approach. However, when observing natural phenomena, the strength of the association will depend on the range of values that occurred during the observation period.

    As an illustration, consider Figures 1a and 1b. In Figure 1a, two lines, marked A and B, connect a series of points. The points were obtained from a table of random normal deviates (15). Thus the points are independent observations from a normal distribution with mean of zero and variance of one unit. Theoretically the two series of independent observations have a correlation of zero. By chance we have an observed value of r = 0.37. In Figure 1b we have introduced linear trends by adding to these random deviates a difference of 0.2 between successive measurements on line A, and differences of 0.1 for line B. The correlation we now compute is increased to r = 0.43. If we were to introduce steeper trends by adding larger constants, we would get larger values of r.

    Clearly, in periods of relative stability of the underlying phenomena, the values we obtain represent noise about the constant true value, as in Figure 1a, and the correlation between the two series will not differ significantly from zero. If we measure both phenomena during a period when both are subject to a seasonal trend, as in Figure 1b, we will increase our apparent correlation. If we measure during an interval that contains both a period of stability and a period when both phenomena have a trend, we will obtain an intermediate value for r. Before computing a correlation coefficient, it is necessary to consider whether the series have common large shifts, whether we need to distinguish short-term association from general seasonal trends, and in fact to consider carefully the hypothetical model we are evaluating.

    [Figure 1: series A and B plotted against days 1-15. (a) Independent random normal deviates, r = 0.37. (b) The same deviates with linear trends added, r = 0.43. A further panel carries the value 0.66.]
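The behavior described for Figure 1 can be reproduced by simulation. This is a minimal sketch that assumes Python's seeded pseudo-random generator in place of the Rand table of normal deviates (15), so the particular values will not match r = 0.37 and r = 0.43. What should carry over is that steeper common trends push r toward 1 even though the underlying deviates are independent. The function names are hypothetical.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation coefficient of two equal-length series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def add_trend(series, step):
    """Add a linear trend: observation t is shifted by step * t."""
    return [value + step * t for t, value in enumerate(series)]


rng = random.Random(1977)  # fixed seed so the simulation is reproducible
days = 15
series_a = [rng.gauss(0.0, 1.0) for _ in range(days)]  # independent N(0, 1) deviates
series_b = [rng.gauss(0.0, 1.0) for _ in range(days)]

# Per-day increments added to A and B; (0.2, 0.1) mirrors the Figure 1b construction.
for step_a, step_b in [(0.0, 0.0), (0.2, 0.1), (1.0, 0.5), (10.0, 5.0)]:
    r = pearson_r(add_trend(series_a, step_a), add_trend(series_b, step_b))
    print(f"trend steps {step_a}/{step_b}: r = {r:.2f}")
```

With the largest increments the correlation is dominated by the shared trend rather than by any day-to-day association between the two series.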

    [...] the time series data of interest cannot be regarded as independent observations, as we did in the preceding section. We have only to consider a familiar measure such as minimum 24-hr temperature to appreciate that the possible values for a particular day fall within a range determined by knowing the time of year, and can be defined even more closely by knowing the values for immediately preceding days. Thus the series are autocorrelated; the values for day t are related to those for day t - 1, and so on. This autocorrelation invalidates the use of regression or multiple regression techniques designed for independent observations. The effect of autocor-


    tocorrelation, and most will have cyclic patterns of varying length.

    Bloomfield (16, 17) has investigated the use of spectrum analysis as a tool for determining whether the aggravation of asthma symptoms is related to daily minimum temperature or to atmospheric SOx levels. He explains: "The spectrum may be regarded as a decomposition of the variance of the data into components associated with different frequencies." Frequency in this context means the number of cycles per day; thus an annual effect would theoretically be at the frequency of 1/365 cycle per day, but in fact the smoothing of the data (which was a necessary preliminary step) spreads the effect over a wider band. Bloomfield also computes the coherence between series, which he explains as "the frequency-dependent measure of correlation between series." Thus he has a series of correlations that show the extent to which the cyclic patterns of the series correspond. He concludes, "the series are essentially unrelated at frequencies above 0.25 cycles per day, which correspond to a period of four days. However, at lower frequencies, which correspond to longer periods, there is substantial coherence. This is a warning that the impact of these two series on the health series may be complex and hard to disentangle." He also investigates partial coherence, namely the frequency-dependent partial correlation between asthma and sulfur oxide after correction for the effect of minimum temperature. Throughout his paper he warns us about the assumptions underlying the analysis: that the series are "stationary" in the sense that the covariances between time periods are constant throughout the series, that the relationships between the variables are linear, and finally that the tentative conclusions reached may be reversed following subsequent analysis. Thus we conclude that this is a very promising approach, but that care must be taken to recognize the importance of the underlying assumptions.

    Stressing the limitations of a particular model is not intended to indicate that the approach is poor; rather, it is to stress that analysis of time series is not simply a matter of running the data through a computer program. The situation is described by Box et al. (18): "The obtaining of sample estimates of the autocorrelation function and the spectrum are non-structural approaches, analogous to the representation of an empirical distribution function by a histogram . . . They provide a first step . . . pointing the way to some parametric model on which subsequent analyses will be based." Box and other authors (19-21) have been developing such specific models for carbon monoxide in Los Angeles to study the effect of changes in methods of instrument calibration and the effect of various control measures.

    The noise inherent in any system, together with the limited lengths of the series, usually requires that some form of smoothing be carried out during the analysis. Researchers at Princeton have been making rapid advances in the development of these techniques and are conducting Monte Carlo simulations to evaluate different approaches. Thus again the research is in progress, but much needs to be done before the relative advantages of different strategies are fully understood (22-24).

    Multiple Regression

    When Are Least-Squares Fits a Poor Choice?

    Pitfalls in the interpretation of linear least-squares regression relating two variables are well known; they include nonnormality of the distribution of variables, nonlinearity of the relationship between the variables, lack of independence between observations, and the presence of outliers. When the number of variables increases, so do the problems: the list must be enlarged to include multicollinearity of the variables, and it is no longer possible to detect these problems by simple plots of the data. Even when the problems are detected, the optimum method of analyzing data with one or more types of departure from the assumptions underlying least-squares regression is not readily apparent. Recent developments deal both with methods of detecting particular types of departure and with data analysis in the presence of such departures. Increasingly these methods are being applied to the analysis of environmental data, but they are apparently not well known to all investigators.

    Directions of Current Development

    In a recent review, Hocking (25) suggests that "the role of the developers of regression methodology is to provide the less skilled user with techniques that are robust while easy to use and understand." Much effort has gone into the development of techniques that are "robust," or, in other words, are relatively insensitive to departures from the usual assumptions underlying least-squares regression. Gnanadesikan et al. (26) have been particularly concerned with the detection of outliers. Andrews (27, 28) has re-analyzed data originally analyzed by Daniel and Wood (29), using newer techniques that he believes are resistant to a small number of gross outliers. He warns that his iterative technique is more expensive than least squares, but that in addition to producing stable estimates it will detect outliers. Andrews reaches the same conclusions regarding his sample data set as Daniel and Wood, and this has led Hocking (25) to observe that these skilled analysts, using repeated inspection of residual plots, were in fact using a robust procedure. Diaconis (30) has applied resistant analysis of variance techniques to air pollution data. Brown et al. (31) observed a reduction in mortality rates in two California counties and suggested that this might be a reflection of reduced air pollution consequent upon the 1974 fuel crisis. Diaconis was unable to find a parallel reduction in CO or NO2. Thus the question remains open whether the observed reduction in mortality was due to other causes, to chance fluctuations, or to interactions among air pollutants that have not yet been investigated.

    The problem of multicollinearity has been tackled by a variety of approaches. Schwing and McDonald (32) have compared least-squares and ridge regression, and have applied both ridge regression and a sign-restricted least-squares method to the analysis of the association between mortality rates, natural ionizing radiation, and some air pollutants. They show that the two latter approaches yield comparable results that differ from those obtained by using least squares (32, 33). The implications of order restrictions have also been investigated (34).
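Two tools mentioned in this discussion can be sketched together: inspecting the eigenvalues of the predictors' correlation matrix, which Hocking (25) recommends for spotting redundancies, and ridge regression, one of the approaches compared by Schwing and McDonald (32). This is a minimal sketch restricted to two predictors so that both have closed forms. It is not the Schwing-McDonald analysis, the data are hypothetical, and the function names are our own. The ridge constant k is applied to the correlation-scaled normal equations, so k = 0 reproduces least squares in standardized units.

```python
from statistics import mean, pstdev


def standardize(values):
    """Center and scale to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def corr(u, v):
    """Correlation of two already-standardized series."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


def correlation_eigenvalues(x1, x2):
    """Eigenvalues of the 2 x 2 correlation matrix [[1, r], [r, 1]]: 1 + |r| and 1 - |r|."""
    r12 = corr(standardize(x1), standardize(x2))
    return 1 + abs(r12), 1 - abs(r12)


def ridge_two_predictors(x1, x2, y, k):
    """Ridge coefficients (standardized units) for two predictors.

    Solves (R + k I) b = c, where R is the predictors' correlation matrix and
    c holds each predictor's correlation with y; k = 0 is least squares.
    """
    z1, z2, zy = standardize(x1), standardize(x2), standardize(y)
    r12, c1, c2 = corr(z1, z2), corr(z1, zy), corr(z2, zy)
    det = (1 + k) ** 2 - r12 ** 2
    b1 = ((1 + k) * c1 - r12 * c2) / det
    b2 = ((1 + k) * c2 - r12 * c1) / det
    return b1, b2


# Hypothetical, nearly collinear predictors (x2 is x1 plus a small wobble).
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9, 7.1, 7.9]
y = [3.0, 4.1, 6.2, 7.9, 10.1, 12.2, 13.8, 16.1]

print("eigenvalues:", correlation_eigenvalues(x1, x2))  # the smaller one is close to 0
for k in (0.0, 0.1, 0.5):
    print("k =", k, "coefficients:", ridge_two_predictors(x1, x2, y, k))
```

A near-zero eigenvalue flags the redundancy. Increasing k shrinks the coefficient vector and stabilizes it, at the cost of bias, which is why the choice of k remains a judgment call.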
    In the conclusion of his review Hocking (25) states that "the multicollinearity problem seems to have been given too little attention in the statistics literature." He recommends that eigenvalues should always be inspected to determine possible redundancies, but notes that when near-singularities exist the method of handling them is not clear.

    The problem of more complex relationships between variables has received much attention. In a recent review, Gallant (35) concentrates on methods of fitting nonlinear functions rather than on the detection of such functional relationships in the data. Other authors, such as Anscombe (36), Wilk (37), and Cleveland and Kleiner (38), have developed sophisticated plotting techniques for the detection of characteristics of the data. Gnanadesikan and Kettenring (26) review many of these.

    All of these endeavors point to the complexities that may be encountered in multivariate data. In view of these complexities, it is unlikely that a least-squares fit of a simple "hockey stick" function will prove to be an adequate method of determining "threshold" levels of pollutants, as has been done (Fig. 2). This method may be useful in an experimental situation such as that described by McNeil (39), because other sources of variation are controlled. Certainly it is misleading to present point estimates obtained by this method without indicating their variability, and without reporting any attempt to investigate alternate models.
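One common form of the hockey-stick model is y = a + b·max(0, x - θ): flat below the threshold θ and linear above it. We assume that form here; the Chess analyses may have parameterized it differently. The sketch fits it by least squares with a grid search over θ. It returns the whole residual-sum-of-squares profile rather than only the minimizing θ, which is one simple way to see how sharply the data determine the threshold. A broad, flat profile signals a poorly determined threshold. It deliberately ignores temperature and pollutant interactions. Data and function names are hypothetical.

```python
def fit_line(z, y):
    """Least-squares intercept and slope of y on z."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((v - mz) ** 2 for v in z)
    if szz == 0:  # z is constant, so only an intercept can be estimated
        return my, 0.0
    slope = sum((u - mz) * (v - my) for u, v in zip(z, y)) / szz
    return my - slope * mz, slope


def hockey_stick_profile(x, y, thresholds):
    """Fit y = a + b * max(0, x - theta) for each candidate theta.

    Returns (theta, residual sum of squares, a, b) for every candidate, so the
    whole profile, not just its minimum, can be inspected.
    """
    profile = []
    for theta in thresholds:
        z = [max(0.0, xi - theta) for xi in x]
        a, b = fit_line(z, y)
        rss = sum((yi - a - b * zi) ** 2 for zi, yi in zip(z, y))
        profile.append((theta, rss, a, b))
    return profile


def best_threshold(profile):
    """Candidate with the smallest residual sum of squares."""
    return min(profile, key=lambda row: row[1])


# Hypothetical noise-free data with a break at x = 5.
x = list(range(11))
y = [1.0 + 2.0 * max(0.0, xi - 5) for xi in x]
profile = hockey_stick_profile(x, y, thresholds=range(10))
for theta, rss, _, _ in profile:
    print(f"theta = {theta}: RSS = {rss:.2f}")
print("best:", best_threshold(profile)[0])  # 5
```

With real, noisy data the profile near the minimum is often shallow, and reporting it (or an interval derived from it) addresses the variability that a single point estimate hides.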

    [Figure 2]

    FIG. 2. Examples of the use of a hockey-stick function where no attempt is made to indicate reliability or to assess the interaction effects of different pollutants (4). The plots show temperature-specific threshold estimates for symptom aggravation by sulfur dioxide, total suspended particulates (TSP), and suspended sulfates (SS).


    In the example reproduced in Figure 2, the effect of temperature was held constant, but three different pollutants were each treated separately, with no attempt being made to consider how they would affect symptom aggravation when present in different combinations. Similar observations were made by the discussants of a paper by Nelson et al. (40).

    Conclusions

    The report of the task force on research planning in Environmental Health Sciences (41) recommended in 1970 that further development of efficient statistical techniques be undertaken. In at least three of the five areas of concern (contingency tables, time series, and multivariate methods), theoretical advances have been made. In some areas these advances have been well documented; in others progress has only reached the stage of verbal reporting and unpublished manuscripts. Much needs to be done, both in terms of development of theory and in making readily accessible computer programs, with adequate documentation, for carrying out the techniques proposed.

    In spite of this developmental activity, review of recent literature reveals relatively few instances where the newer techniques are being employed. Partly this is because the stage of development is such that they are not readily available, and partly because of lack of communication. Thus the need for training recommended in 1970 still exists.

    A satellite symposium on Statistical Aspects of Pollution Problems was sponsored by IASPS in 1971 (42). In the published report, Van Belle noted the danger that "producers" of statistical analyses will base their product on arguments of dubious validity. He cites four areas; the first two were: (1) "The use of a linear regression model to approximate a cause-effect link is questionable" and (2) "The use of elasticity coefficients is misleading when the variables are measured in arbitrary units." He also cautions about the indiscriminate accumulation of large bodies of data and about the tendency to place too much faith in "indices." These problems are still with us.
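Van Belle's second caution can be made concrete with one mechanism, assumed here for illustration: when a variable's zero point is arbitrary, as with temperature, the point elasticity (dy/dx)(x/y) for the same observation depends on the scale chosen, although the relationship itself is unchanged. The relation and numbers below are hypothetical.

```python
def point_elasticity(slope, x, y):
    """Elasticity of y with respect to x at (x, y): (dy/dx) * (x / y)."""
    return slope * x / y


# Hypothetical linear relation y = 2 * t_c + 10, with t_c a temperature in degrees Celsius.
slope_c, t_c = 2.0, 20.0
y = slope_c * t_c + 10.0          # 50.0

# The same relation and the same day, with temperature recorded in degrees Fahrenheit.
t_f = 1.8 * t_c + 32.0            # 68.0
slope_f = slope_c / 1.8           # dy/dt_f, by the chain rule

print(round(point_elasticity(slope_c, t_c, y), 3))  # 0.8
print(round(point_elasticity(slope_f, t_f, y), 3))  # 1.511
```

Both lines describe the same fitted relationship at the same observation; only the temperature scale differs, yet the elasticity nearly doubles.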

    The author was supported in part by grant ES 01108 from the U.S. Public Health Service. Many thanks go to Drs. B. Ferris and F. Speizer for introduction to these problems. This material is drawn from a Background Document prepared by the author for the NIEHS Second Task Force for Research Planning in Environmental Health Science. The Report of the Task Force is an independent and collective report which has been published by the Government Printing Office under the title "Human Health and Environment: Some Research Needs." Copies of the original material for this Background Document, as well as others prepared for the report, can be secured from the National Technical Information Service, U.S. Department of Commerce, 5285 Port Royal Road, Springfield, Virginia 22161.

    REFERENCES

    1. Glaser, M., Greenberg, L., and Field, F. Mortality and morbidity during a period of high levels of air pollution, Nov. 23-25, 1966. Arch. Environ. Health 15: 684 (1967).
    2. Schrenk, H. H., et al. Air pollution in Donora, Pa.: Epidemiology of an unusual smog episode of October, 1948. P. H. Bulletin 306, Fed. Sec. Agency, Div. Ind. Hyg., PHS, U.S. Dept. HEW, Washington, D.C., 1949.
    3. Scott, J. A. The London fog of December 1966. Med. Office 109: 250 (1966).
    4. Health Consequences of Sulfur Oxides: A Report from Chess, 1970-1971. U.S. Environmental Protection Agency, Research Triangle Park, N.C., 1974.
    5. Graybill, F. An Introduction to Linear Statistical Models, Vol. 1. McGraw-Hill, New York, 1961.
    6. Grizzle, J. E., Starmer, C. F., and Koch, G. G. Analysis of categorical data by linear models. Biometrics 25: 489 (1969).
    7. Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, 1975.
    8. Johnson, W. D., and Koch, G. G. A note on the weighted least-squares analysis of the Ries-Smith contingency table data. Technometrics 13: 438 (1971).
    9. Wermuth, N. Analogies between multiplicative models in contingency tables and covariance selection. Biometrics 32: 95 (1976).
    10. Bock, R. D. Multivariate analysis of qualitative data. In: Multivariate Statistical Methods in Behavioral Research. McGraw-Hill, New York, 1975.

    11. Clayton, D. G. Some Odds Ratio Statistics for the Analysisof Ordered Categorical Data. Biometrika 61: 525 (1974).12. Haberman, S. J. Log-linear models for frequency tables withordered classifications. Biometrics 30: 589 (1974).13. Simon, G. Alternative Analyses for the singly-ordered con-tingency table. J. Amer. Statist. Assoc. 69: 971 (1974).14. Williams, O. C., and Grizzle, J. E. Analysis of contingencytables having ordered response categories. J. Amer. Statist.Assoc.67: 55 (1972).15. Rand Corporation. A Million Random Digits with 100,000Normal Deviates. Glencoe, Free Press, 1975.16. Bloomfield, P. Spectrum analysis of epidemiological data.Paper presented at Fourth Symposium on Statistics and theEnvironment, Washington, D.C., 1976.17. Bloomfield, P. Fourier Analysis of Time Series: An Intro-duction. Wiley, New York, 1976.18. Boxs G. E. P., and Jenkins, G. M. Time Series AnalysisForecasting and Control. Holden-Day, San Francisco,1970.19. Box, G. E. P., and Tiao, G. C. Intervention analysis andapplications to economic and environmental problems. J.Amer. Statist. Assoc. 70: 70 (1975).20. Tiao, G. C., Box G. E. P., and Hamming, W. J. Analysisof Los Angeles photochemical smog data: a statistical over-view. J. Air Pollut. Control Assoc. 25:260 (1975).21. Tiao, G. C., Box, G. E. P., and Hamming, W. J. A statisti-cal analysis of the Los Angeles ambient carbon monoxidedata 1955-1972. J. Air Pollut. Control Assoc., 25: 1129(1975).

    22. Rabiner, L. R., Sambur, M. R., and Schmidt, C. E. Applications of a nonlinear smoothing algorithm to speech processing. IEEE Trans. Acoustics, Speech, Signal Proc. 23 (No. 6): 552 (1975).
    23. Velleman, P. F. Robust Non-Linear Data Smoothing. Tech. Report No. 89, Series 2, Dept. of Statistics, Princeton Univ., 1975.
    24. Beaton, A. E., and Tukey, J. W. The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data. Technometrics 16: 147 (1974).
    25. Hocking, R. R. The analysis and selection of variables in linear regression. Biometrics 32: 1 (1976).
    26. Gnanadesikan, R., and Kettenring, J. R. Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28: 81 (1972).
    27. Andrews, D. F. A robust method for multiple linear regression. Technometrics 16: 523 (1974).
    28. Andrews, D. F., et al. Robust Estimates of Location: Survey and Advances. Princeton Univ. Press, Princeton, N.J., 1972.
    29. Daniel, C., and Wood, F. S. Fitting Equations to Data. Wiley, New York, 1971.
    30. Diaconis, P. Measuring the effect of the fuel crisis. Paper given at AAAS meeting, Boston, Mass., 1976.
    31. Brown, S. M., et al. Effect on mortality of the 1974 fuel crisis. Nature 257: 306 (1975).
    32. Schwing, R. C., and McDonald, G. C. Measures of association of some air pollutants, natural ionizing radiation and cigarette smoking with mortality rates. Paper presented at International Symposium on Recent Advances in the Assessment of the Health Effects of Environmental Pollution, Paris, France, 1974.
    33. McDonald, G. C., and Schwing, R. C. Instabilities of regression estimates relating air pollution to mortality. Technometrics 15: 463 (1973).
    34. Barlow, R. E., et al. Statistical Inference under Order Restrictions. Wiley, New York, 1972.
    35. Gallant, A. R. Nonlinear regression. Amer. Statistician 29: 73 (1975).
    36. Anscombe, F. J. Graphs in statistical analysis. Amer. Statistician 27: 17 (1973).
    37. Wilk, M. B., and Gnanadesikan, R. Probability plotting methods for the analysis of data. Biometrika 55: 1 (1968).
    38. Cleveland, W. S., and Kleiner, B. Enhancing scatter plots with curves of moving statistics. Technometrics 17: 447 (1975).
    39. McNeil, D. R. Statistical models and stochastic models. Proceedings of a SIMS Conference on Epidemiology, Alta, Utah, 1974.
    40. Nelson, W. C., Hasselblad, V., and Lowrimore, G. R. Statistical aspects of a community health and environmental surveillance system. VIth Berkeley Symposium, Vol. 6, 1972, p. 125.
    41. NIEHS. Man's health and the environment: some research needs. First Task Force on Research Planning in Environmental Health Science. U.S. DHEW, Washington, D.C., 1970.
    42. Pratt, J. W. Statistical and Mathematical Aspects of Pollution Problems. Statistics Textbooks and Monographs, Vol. 6. Dekker, New York, 1974, p. 384.