A Tool for the Spatio-Temporal Screening of AirBase Datasets for Abnormal Values

Example of PM10 datasets of AT, CZ, DE, ES, FR, IT, UK and NL (2006-2007)

Oliver Kracht, Hannes I. Reuter and Michel Gerboles

2013

Report EUR 25787 EN


    European Commission
    Joint Research Centre
    Institute for Environment and Sustainability

    Contact information
    Oliver Kracht, Michel Gerboles
    Address: Joint Research Centre, Via Enrico Fermi 2749, TP 442, 21027 Ispra (VA), Italy
    E-mail: [email protected]
    Tel.: +39 0332 785652
    Fax: +39 0332 789931

    http://ies.jrc.ec.europa.eu/
    http://www.jrc.ec.europa.eu/

    Legal Notice
    Neither the European Commission nor any person acting on behalf of the Commission is responsible for the use which might be made of this publication.

    Europe Direct is a service to help you find answers to your questions about the European Union
    Freephone number (*): 00 800 6 7 8 9 10 11
    (*) Certain mobile telephone operators do not allow access to 00 800 numbers or these calls may be billed.

    A great deal of additional information on the European Union is available on the Internet. It can be accessed through the Europa server http://europa.eu/.

    JRC78437
    EUR 25787 EN
    ISBN 978-92-79-28286-7 (PDF)
    ISSN 1831-9424 (online)
    doi:10.2788/81552

    Luxembourg: Publications Office of the European Union, 2013
    © European Union, 2013
    Reproduction is authorised provided the source is acknowledged.
    Printed in 2013

    Summary

    In order to provide scientifically sound information for regulatory purposes and environmental impact assessment, long term meso- to large-scale datasets of ambient air quality provide an important means for air pollution monitoring, evaluation and validation. However, the collection of high quality datasets with suitable spatial coverage for air pollution management and decision support poses many challenges. It is thus critical to establish expedient tools for the efficient assessment and data quality control of air pollution measurements in large scale national and international monitoring networks.

    The European Environment Agency collects, in the Air Quality Database named AirBase, measurements of ambient air pollution at more than 6000 monitoring stations from over 30 countries. The quality of these data depends on the chosen method of measurement and the QA/QC procedures applied by each country. We present a novel methodology to automatically screen the AirBase records for internal consistency and to detect spatio-temporal outliers nested in the data.

    We implemented a spatio-temporal toolset for screening abnormal values which considers both attribute values and spatial relationships. The algorithms are based on an adaptation of the “Smooth Spatial Attribute method” that was first developed for the identification of outliers in traffic sensors. The method relies on the definition of a neighbourhood for each air pollutant measurement, corresponding to a spatio-temporal domain limited in time (e.g., +/- 2 days) and distance (e.g., +/- 1 degree) around location x. It is assumed that within a given spatio-temporal domain in which the attribute values of neighbours have a relationship due to the emission, transport and reaction of air pollutants, abnormal values can be detected by extreme values of their attributes compared to the attribute values of their neighbours.

    The application of this method is demonstrated by a comprehensive simulation and data analysis study based on the 2006 and 2007 AirBase background station records of daily PM10 values for a selection of 8 countries (AT, CZ, DE, ES, FR, GB, IT and NL). These datasets covered a range of different country sizes and comprised between 35561 and 166436 records each. From these, the content of abnormal data points identified ranged between 2% and 4.1% of the individual country datasets.

    However, not all records fulfilled the selection criteria for being included in the computations. Furthermore, the setting up of the abnormal values test can also lead to some mathematical dead ends restricting the verifiability of individual records. In consequence a certain percentage of the data records (between 9% and 40% of the records per individual country) had to be flagged as non-verifiable. Those data points had to be excluded from the investigation and from the screening for irregularities for the safety of the conclusions.

    The implemented method can be of interest as the basis of a data quality screening system when countries report their measurements to the European Environment Agency. Beyond this, it can also provide a simple solution to investigate the accuracy of station classification in AirBase. Seen from another viewpoint, it can as well be used as a tool to detect irregular air pollution emission events (e.g. the influence of fires, wind erosion events, or other accidental situations).

    Contents

    1 Introduction
    2 Airbase
    3 Methodology
    4 Robustness, sensitivity and optimisation of the screening tool
        4.1 Normality of datasets and log transformation
        4.2 Optimisation of the parameters used in the abnormal value screening
            4.2.1 Spatio-temporal limits of the neighbourhood
            4.2.2 Test threshold for z-test
            4.2.3 Limit value for including zi in the computation of θ
            4.2.4 Window width for the computation of θ
        4.3 Manual calculations
    5 Results
    Annex: Z(Sx) 2006/2007 time series and abnormal data points identification summaries


    1 Introduction

    The European Commission has worked intensively on the implementation of a harmonized programme for the monitoring of air pollutants. The harmonization programme relies on the adopted European Directives 2008/50/EC and 2004/107/EC [1, 2]. These directives define limit and target values for air pollution that should not be exceeded. Exceedances of these limits may have legal consequences that trigger mitigation plans. To avoid measurement artefacts triggering such measures, the Directives endeavour to improve the quality of the measurements by defining data quality objectives (DQOs) that represent the highest allowed relative expanded uncertainty of measurements. The reference methods have been standardized by the European Committee for Standardization (CEN). These standards describe the methodology to be applied for the estimation of the measurement uncertainty. This estimation of the uncertainty of measurements is a long and tedious procedure that may require considerable experimental work.

    From another perspective, it is possible to derive the uncertainty of spatially referenced measurements from the nugget effect of variogram analysis. The nugget effect represents fluctuations of the measurements on a very small scale (tending towards 0). Gerboles et al. [3] have shown the possibility to automatically derive the uncertainty of measurements of ambient air pollutants using an innovative method based on geostatistical analysis. During this study, it became clear that abnormal values influence the geostatistical calculation. Therefore, a detection module was developed in order to exclude abnormal value stations responsible for high discrepancies from the geostatistical evaluations. When the method was presented at the meeting of the AQUILA Network of National Air Quality Reference Laboratories (Ispra, June 2010), Member States representatives and the European Environment Agency officer considered the abnormal value module a valuable tool able to supply important information.

    This report gives details about a consolidated screening method for the detection of abnormal values, and an example of warnings on abnormal values for the 2006-2007 time series of PM10 datasets in AirBase.

    This report is intended for the following stakeholders:

    - Local authorities that may use the indicator to check the consistency of their stations' measurement system or classification.
    - The European Environment Agency (EEA), to take into account the robustness of station outcomes when estimating trends and statistics about air pollution in Europe.
    - Researchers and scientists using AirBase data, in particular modellers in charge of the validation of models compared to field measurements. They could use the quality indicators provided by our method to better understand differences between air pollution estimation and field measurements.

    1 Directive 2004/107/EC of the European Parliament and of the Council of 15 December 2004 relating to arsenic, cadmium, mercury, nickel and polycyclic aromatic hydrocarbons in ambient air. Official Journal L 23, 26/01/2005.
    2 Directive 2008/50/EC of the European Parliament and the Council of 21 May 2008 on Ambient Air Quality and Cleaner Air for Europe, Official Journal of the European Union L 152/1 of 11.6.2008.
    3 M. Gerboles and H. I. Reuter, Estimation of the measurement uncertainty of ambient air pollution datasets using geostatistical analysis, EUR 24475 EN, ISBN 978-92-79-16358-6, ISSN 1018-5593, DOI 10.2788/44902, 2010.

    Due to the envisioned group of final users, a free and extensible simulation platform was considered an important point to start from. All computer codes were created in the R environment which is freely available under the GNU General Public License [4].

    2 Airbase

    The European Environment Agency (EEA) maintains a database on behalf of the participating countries throughout Europe, the EIONET network. Member States (MS) are due to report on the basis of the Council Decision 97/101/EC [5], with amendments 2001/752/EC [6]. For 2006 and 2007, more than 6738 stations are in this database, each providing different components of multi-annual time series of air quality measurements starting in 1981. Geographically, the stations are spread all over Europe with data collected in 36 different countries, including 27 European Union Member States.

    The measuring stations of the EIONET network are generally clustered, owing to the nature of the measuring network. About 155 parameters are reported in AirBase, ranging from the concentrations of inorganic/organic gases and particulate matter concentrations to wet and dry deposition with their speciation. In 2008, about 66% of all values in AirBase came from four parameters: O3 (21.2%), NO2 (17.2%), NO (8.2%) and SO2 (18.8%). Further contributions came from carbon monoxide (9.4%), particulate matter (PM10 9.0%, PM2.5 0.5%, black smoke 1.1%, Total Suspended Particulate 2.9%) and Pb/Cd/As/Ni (1.5%).

    The quality of the data depends on the chosen measurement method and the QA/QC procedures applied by each country. The data in AirBase have undergone additional quality control performed during the upload of the data from the MS to the EEA's database using a specifically designed software called DEM (Data Exchange Module). The European Topic Centre on Air and Climate Change (ETC/ACC) is also involved in data quality checking.

    3 Methodology

    The abnormal value procedure was implemented based on existing literature. Chang-Tien Lu et al. [7] have outlined and classified several algorithms [8, 9, 10, 11, 12, 13, 14, 15, 16] as summarised in Figure 1. Two families of abnormal value detection methods can be distinguished. The first family comprises the methods which calculate statistics of the distribution of a pollutant in one dimension and ignore geographical location [9, 11]. The second family, the spatial-set abnormal value detection methods, considers both attribute values and spatial relationships. From within this family we used the “Smooth Spatial Attribute method” [7] that was first developed for the identification of abnormal values in traffic sensors. This method is thought to be fit for the identification of abnormal values in a given homogeneous dataset of air quality data that represents in a similar way a quantity measured in time and space.

    4 R Development Core Team (2011): R: A language and environment for statistical computing. http://www.R-project.org/
    5 Council Decision 97/101/EC of 27 January 1997 establishing a reciprocal exchange of information and data from networks and individual stations measuring ambient air pollution within the Member States, Official Journal L 035, 05/02/1997 P. 0014 - 0022.
    6 Commission Decision 2001/752/EC of 17 October 2001 amending the Annexes to Council Decision 97/101/EC establishing a reciprocal exchange of information and data from networks and individual stations measuring ambient air pollution within the Member States.
    7 Chang-Tien Lu, Dechang Chen, Yufeng Kou, "Detecting Spatial Outliers with Multiple Attributes," ictai, pp. 122, 15th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'03), 2003.
    8 M. Ankerst, M. Breunig, H. Kriegel and J. Sander. Optics: Ordering points to identify the clustering structure. In Proceedings of the 1999 ACM SIGMOD Int. Conf. on Management of Data, Philadelphia, Pennsylvania, USA, pages 49-60, 1999.
    9 V. Barnett and T. Lewis. Outliers in Statistical Data. John Wiley, New York, 3rd Ed. 1994.
    10 M. Breunig, H. Kriegel, R. T. Ng and J. Sander. OPTICS-OF: Identifying Local Outliers. In Proc. of PKDD ’99, Prague, Czech Republic, Lecture Notes in Computer Science (LNAI 1704), pp 262-270, Springer Verlag, 1999.
    11 R. Johnson. Applied Multivariate Statistical Analysis, Prentice Hall, 1992.
    12 E. Knorr and R. Ng. Algorithms for Mining Distance-Based Outliers in Large Datasets. In Proc. 24th VLDB Conference, 1998.
    13 M. Kraak and F. Ormeling. Cartography: Visualization of Spatial Data. Longman, 1996.
    14 F. Preparata and M. Shamos. Computational Geometry: An Introduction. Springer Verlag, 1998.
    15 I. Ruts and P. Rousseeuw. Computing Depth Contours Of Bivariate Point Clouds. In Computational Statistics and Data Analysis, 23:153-168, 1996.
    16 D. Yu, G. Sheikholeslami and A. Zhang. Findout: Finding Outliers in Very Large Datasets. In Department of Computer Science and Engineering State University of New York at Buffalo, Technical report 99-03, http://www.cse.buffalo.edu/tech-reports/, 1999.

    The Smooth Spatial Attribute method relies on the definition of a neighbourhood for each air pollutant measurement. It corresponds to a spatio-temporal domain limited in time (+/- 2 days) and distance (+/- 1 spherical degree) around a location x. The neighbourhood is better understood by observing the diagram in Figure 2. We hypothesise that within a given spatio-temporal domain the non-spatial attribute values (air pollutants) of neighbours have a relationship due to the distribution, transport, emission and reaction of air pollution. The objective of the method is that abnormal values will be detected by extreme values of their attribute value compared to the attribute values of their neighbours. The main computation cost of the method is dominated by the large number of calculations of statistical properties per neighbourhood. An important constraint of the method is the normality of the distribution of the attribute values of neighbours.

    Taking into account the high spatial variability of PM10 concentrations around industrial and traffic stations, it was decided to apply the screening method for detection of possible abnormal values only to stations of background type, but for all area types (urban, suburban and rural).

    In the following text, we use x to denote a spatial object whose attributes are (i) the concentration of a pollutant, and (ii) its location. Within each neighbourhood of x, several measurements xn,i of the same compound, performed at different locations and different times, are available. Equation 1 allows the computation of a weighted average of all available measurements (non-spatial attributes of xn,i) within each neighbourhood. The weights wi correspond to the inverse distance in space and time between xn,i and x. Note that our algorithm allows for a dynamic expansion of up to five times the base neighbourhood extent in time, in case of insufficient data being retrieved with the base settings.

    Figure 1: Several methods to detect abnormal values in multi dimensional datasets ([7])

    Figure 2: Definition of a spatial-temporal neighborhood of sampling site x with an interval of selection of ± 1 spherical degree and ± 1 day

    The weighting factors wi are calculated using two different methods: (A) the inverse squared normalized Euclidean distance, and (B) the inverse squared Mahalanobis distance. Both parameters characterize the distance in space and time between the central station (x) and each neighbourhood station (xn,i). The spatial attributes of x and xn,i are defined as 3-dimensional multivariate vectors:

    - The spatio-temporal position of x (the central station) is defined by (x1, x2, x3), where x1 is the longitude in decimal degrees, x2 is the latitude in decimal degrees, and x3 is the time in days.
    - The spatio-temporal positions of the xn,i (the neighbourhood stations) are defined by (xn,i,1, xn,i,2, xn,i,3), where xn,i,1 is the longitude in decimal degrees, xn,i,2 is the latitude in decimal degrees, and xn,i,3 is the time in days.

    The normalized Euclidean distance is computed using Equation 2 where sj are the standard deviations of the attribute values xn,i,1, xn,i,2 and xn,i,3 over the neighbourhood set of n stations (excluding the central station x to avoid division by zero for the weight of spatio-temporal distance zero). sj2 is an unbiased estimator of the population variance of the attribute variable j, estimated using the sample variance of independent and identically distributed observations (hence using the denominator n – 1) (Equation 3). The Mahalanobis distance is computed using Equation 4 where S is the covariance matrix of the xn,i of the whole neighbourhood set (excluding the central station x).

    As a simple control step, the normalized Euclidean weighting factors should be symmetrical around the point in time of observation. This cannot necessarily be expected for the Mahalanobis distance based weighting factors, which makes them more difficult to check. One may notice that if the covariance matrix S is diagonal, the Mahalanobis distance reduces to the normalized Euclidean distance. For control purposes, we set up the code for the computation of the Mahalanobis distance in a way that it can be modified by artificially setting all non-diagonal elements of S to zero.

    The weighting factors wi can finally be calculated by computing the inverse of the square of the normalized Euclidean distance or of the square of the Mahalanobis distance.
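The neighbourhood selection described above, including the dynamic expansion in time, might be sketched as follows. This is only an illustration in Python and not the authors' R code: the `Record` type, the function names, the integer expansion steps and the reuse of the 20-point minimum as the trigger for expansion are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    lon: float    # longitude, decimal degrees
    lat: float    # latitude, decimal degrees
    day: float    # time, days
    value: float  # (transformed) PM10 concentration


def neighbourhood(center, records, d_deg=1.0, d_days=2.0):
    """Records within +/- d_deg and +/- d_days of center, excluding any
    record at zero spatio-temporal distance (the central record itself)."""
    return [
        r for r in records
        if abs(r.lon - center.lon) <= d_deg
        and abs(r.lat - center.lat) <= d_deg
        and abs(r.day - center.day) <= d_days
        and (r.lon, r.lat, r.day) != (center.lon, center.lat, center.day)
    ]


def neighbourhood_with_expansion(center, records, d_deg=1.0, d_days=2.0,
                                 min_points=20, max_factor=5):
    """Widen the time window in integer steps (assumed), up to max_factor
    times the base extent, until min_points neighbours are found.
    Returns (neighbours, factor), or (None, max_factor) if never reached."""
    for factor in range(1, max_factor + 1):
        found = neighbourhood(center, records, d_deg, d_days * factor)
        if len(found) >= min_points:
            return found, factor
    return None, max_factor
```

A record for which `neighbourhood_with_expansion` returns `None` would fall into the non-verifiable category rather than being tested.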

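    $$\bar{x}_n = \frac{\sum_{i=1}^{n} w_i \, x_{n,i}}{\sum_{i=1}^{n} w_i}$$    (Equation 1)

    $$D_{normalized\ Euclidean}(x, x_{n,i}) = \sqrt{\sum_{j=1}^{3} \frac{(x_{n,i,j} - x_j)^2}{s_j^2}}$$    (Equation 2)

    $$s_j^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_{n,i,j} - \bar{x}_{n,j})^2$$    (Equation 3)

    with $\bar{x}_{n,j}$ the mean of the $x_{n,i,j}$ over the n neighbourhood stations.

    $$D_{Mahalanobis}(x, x_{n,i}) = \sqrt{(x_{n,i} - x)^T \, S^{-1} \, (x_{n,i} - x)}$$    (Equation 4)

    $$Sx = f(x) - f(\bar{x}_n)$$    (Equation 5)

    $$z = \frac{Sx - \overline{Sx}_n}{s_{Sx_n}}$$    (Equation 6)

    $$\overline{Sx}_n = \frac{\sum_{i=1}^{n} w_i \, Sx_{n,i}}{\sum_{i=1}^{n} w_i}$$    (Equation 7)

    $$s_{Sx_n} = \sqrt{\frac{\sum_{i=1}^{n} w_i \, (Sx_{n,i} - \overline{Sx}_n)^2}{\frac{n'-1}{n'} \sum_{i=1}^{n} w_i}}$$    (Equation 8)

    $$\theta - 1.96 \le z_i \le \theta + 1.96$$    (Equation 9)

    After a log-transformation of non-Gaussian data, we compute the weighted average $\bar{x}_n$ according to Equation 1 and the differences Sx between the non-spatial attribute value f(x) (pollutant concentration) at location x and the average attribute value of its neighbours according to Equation 5.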
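As an illustration of Equations 1, 2, 3 and 5, the inverse squared normalized Euclidean weighting can be sketched in a few lines. The function names and the tuple layout (lon, lat, day) are hypothetical, and the Mahalanobis variant (Equation 4) is left out because it needs a matrix inversion.

```python
import math


def sample_std(values):
    """Unbiased sample standard deviation, denominator n - 1 (Equation 3)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def inverse_square_weights(center, neighbours):
    """Weights w_i = 1 / D^2, with D the normalized Euclidean distance
    (Equation 2). center and neighbours are (lon, lat, day) tuples.
    Assumes no neighbour coincides with center and that every coordinate
    varies across the neighbours (otherwise s_j would be zero)."""
    s = [sample_std([p[j] for p in neighbours]) for j in range(3)]
    weights = []
    for p in neighbours:
        d2 = sum(((p[j] - center[j]) / s[j]) ** 2 for j in range(3))
        weights.append(1.0 / d2)
    return weights


def weighted_mean(weights, values):
    """Weighted average of the neighbour attribute values (Equation 1)."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def s_x(center_value, weights, neighbour_values):
    """Difference between the central attribute value and the weighted
    neighbourhood average (Equation 5)."""
    return center_value - weighted_mean(weights, neighbour_values)
```

With three neighbours at equal normalized distance the weights are equal, so `s_x` reduces to the difference between the central value and the plain mean of the neighbours.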

    Within each neighbourhood, the Sx values are normalised to centre the data at 0 with a standard deviation of 1 using Equation 6. In this equation, $\overline{Sx}_n$ and $s_{Sx_n}$ are the weighted average and the weighted standard deviation of all Sxi attribute values calculated over all stations within the neighbourhood of x [17]. They are calculated using Equation 7 and Equation 8, where n' is the number of non-zero weights within the wi vector of length n. Note that, because the weights are calculated from the inverse of the squared spatio-temporal distances, the wi are always non-zero and therefore n = n' in our application.

    Another approach could have been to estimate the average and the standard deviation s over the whole dataset [18]. However, since air pollution time series exhibit a strong seasonality effect, applying such a method would have led to an overestimation of $\overline{Sx}_n$ and $s_{Sx_n}$, resulting in a number of undetected possible abnormal values (false negatives) when applying the abnormal value test (Equation 9).

    Finally, the test for detecting an abnormal value, given in Equation 9, searches for zi values exceeding a limit value consisting of θ, the moving average of five consecutive zi values, plus or minus a predefined threshold of 1.96, corresponding to a confidence interval in which 95% of zi values should lie. Some limitations were applied:

    - If |zi| exceeded a value of 1.96, zi was not taken into account for the calculation of the moving average.
    - If θ would have been estimated from fewer than three zi values, a moving average was not calculated. Thus the abnormal value test was not performed at this position.

    As a further restriction, outliers were only flagged when the reference point neighbourhood contained a minimum number of data points (threshold set to 20 data points).

    In contrast to the paper by Lu [7], we deliberately did not use an absolute value of the z-transformation. Indeed the sign of the abnormal value is of interest to us as we want to understand if a station is measuring too low or too high quantities compared to its neighbourhood stations with the same classification (background stations). By comparing the zi against the moving average of the zi plus/minus the threshold value, abnormal values can be identified.
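A minimal sketch of the z normalisation (Equations 6 to 8) and of the moving-average test (Equation 9) with the limitations listed above. The window is assumed here to be centred on the tested value (two values on each side), missing values are represented by `None`, and the function names are invented for the example; the report itself does not fix these details.

```python
import math


def weighted_mean_std(weights, values):
    """Weighted mean and weighted standard deviation (Equations 7 and 8)."""
    n_prime = sum(1 for w in weights if w != 0)  # number of non-zero weights
    w_sum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / w_sum
    ss = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    std = math.sqrt(ss / ((n_prime - 1) / n_prime * w_sum))
    return mean, std


def z_value(sx, weights, neighbour_sx):
    """Normalised difference z (Equation 6)."""
    mean, std = weighted_mean_std(weights, neighbour_sx)
    return (sx - mean) / std


def screen(z, threshold=1.96, width=5, min_valid=3):
    """Label each z_i 'normal', 'abnormal' or 'non-verifiable' (Equation 9).
    theta is the mean of the z values in a centred window (assumption),
    ignoring missing values and values with |z| above the threshold."""
    half = width // 2
    flags = []
    for i, zi in enumerate(z):
        if zi is None or i < half or i + half >= len(z):
            flags.append("non-verifiable")
            continue
        window = [v for v in z[i - half:i + half + 1]
                  if v is not None and abs(v) <= threshold]
        if len(window) < min_valid:
            flags.append("non-verifiable")
            continue
        theta = sum(window) / len(window)
        inside = theta - threshold <= zi <= theta + threshold
        flags.append("normal" if inside else "abnormal")
    return flags
```

Keeping the sign of `zi` in the comparison, instead of testing `abs(zi)`, follows the choice described above: a value can be flagged as abnormally low as well as abnormally high.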

    4 Robustness, sensitivity and optimisation of the screening tool

    Among AT, CZ, DE, ES, FR, GB, IT and NL, a few negative values were observed in the AIRBASE_2007 PM10 datasets (70 values for France out of 168153 records and 9 for GB out of 48872 records). These values were discarded because they disturb the process of transformation of datasets for normalisation.

    The design of the outlier test implies some limitations and can lead to mathematical dead ends:

    - Lack of the minimum of 20 data points in the neighbourhood.
    - The Mahalanobis distance calculation requires an inversion of the S-matrix. The S-matrix, however, proved to be non-invertible for some data cases. For this reason, the use of the normalized Euclidean distance was introduced as a first alternative solution.
    - Other statistical parameters might as well not be retrievable in case of collinearities in the spatial structure of the neighbourhood of a data point.
    - The first and last days of a time series cannot be tested because θ values cannot be computed.
    - More generally, when fewer than 3 zi values are available to calculate θ, computation stops and abnormal data point thresholding cannot be performed for this data point. We however observed a considerable number of |zi| values suspected to be higher than 1.96, which are accepted for the safety of the conclusions.

    17 Ref: Shekhar et al “A Unified approach to detecting spatial outliers” page 141, Example 1
    18 Dissertation of Yufeng Kou – “Abnormal Pattern Recognition in Spatial Data”, page 19, lines 4 to 8

    All these cases are summarized under the data category “non-verified data”. However it is possible that a considerable part of these unverified values corresponds to abnormal values. This might especially be the case when calculations stop because several zi values exceeding the threshold value are discarded, which in consequence can prevent the continuous computation of the 5-day moving average θ.

    In AirBase a few stations report PM10 values for more than one measurement method. For example, a few stations may use one manual method integrated over 24 hours and an automatic method producing hourly values. In some cases, stations report values from two automatic methods. However, it was checked that within the table of daily values only one unique method was used per station and per day, making it unnecessary to check the robustness of the screening tool at stations with multiple measuring methods.

    4.1 Normality of datasets and log transformation

    Our test for abnormal values loosely assumes that the PM10 datasets are normally distributed. A significant violation of the assumption of normality could increase the chances of unreliable detections, consisting of either a Type I (false positive) or Type II (false negative) error, depending on the non-normality. The non-normality of PM10 datasets is a real and easily observed feature due to the nature of the air pollutant (see Figure 3), rather than being caused by data entry errors, missing values or the presence of outlier values. Misclassification of stations might also be a source of skewness, e.g. traffic or industrial stations wrongly classified as background stations. Visual inspection of Figure 3 shows right-skewed distributions (mean value higher than the mode value) with skewness coefficients of 2.51 (DE), 2.38 (FR), 2.25 (GB) and 1.87 (IT).

    Figure 3: Density of PM10 datasets in Airbase for DE, FR, GB and IT in 2006-2007

    A common transformation for normalising data is the so-called square root transformation. The square root of every value was taken after discarding the few negative PM10 values nested in some datasets (see annex 1). A constant equal to (1 – minimum PM10 per country) was added to move the minimum value of the distribution to 1, in order to avoid numbers between 0 and 1 becoming larger while numbers above 1 become smaller. Figure 4 shows that the distributions of square-root transformed PM10 datasets still show some skewness (0.95 for DE, 0.97 for FR, 0.86 for GB and 0.78 for IT).
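The shifted square root transformation and the skewness check can be illustrated with a small self-contained sketch. The skewness estimator used in the report is not stated; the moment coefficient below is an assumption, and the function names are hypothetical.

```python
import math


def skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5 (assumed estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def shifted_sqrt(values):
    """Discard negative values, add (1 - minimum) so that the smallest
    remaining value becomes 1, then take the square root."""
    kept = [v for v in values if v >= 0]
    shift = 1 - min(kept)
    return [math.sqrt(v + shift) for v in kept]
```

For a right-skewed sample such as `[4, 9, 16, 25, 100]` the transformed values have a smaller, but still positive, skewness, which is the behaviour reported for the country datasets.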

    Figure 4: Density of square-root transformed PM10 datasets for DE, FR, GB and IT in 2006-2007

    Since a simple square root transformation was ineffective in completely removing the skewness of the initial distributions of PM10 values, more sophisticated techniques could be applied, for example logarithmic or inverse transformation. A natural logarithm of PM10 data plus 0.5 was applied (see Figure 5). As the logarithm of any null or negative number is undefined, such PM10 values were discarded prior to transformation. Additionally, and for the same reason as for the square root transformation, we moved the minimum value of the distribution to 1 by adding a constant equal to (1 – minimum PM10 per country).

    Moreover, we have investigated the use of another class of transformations called power transformations or Box-Cox transformation [19]. Power transformations are transformations that raise numbers to an exponent (see Equation 10 where λ ≠ 0). The Box-Cox transformation is a generalisation of a group of other transformations which includes the square root, logarithmic and inverse transformation. For example, a square root transformation can be characterized as x^(1/2), inverse transformations can be characterized as x^(-1) and so forth. The values of λ have been set by an optimization algorithm able to minimize the skewness of each distribution. The following λ values were obtained: 0.093, 0.10, 0.13 and 0.14 for DE, FR, GB and IT respectively. Consequently the skewness of DE, FR, GB and IT decreased to 0.01, 0.17, 0.03 and 0.01, respectively. These values can be compared to the skewness figures of the log transformation of -0.03, 0.141, -0.38 and -0.16 for DE, FR, GB and IT, respectively.

    19 Osborne, J. W. “Improving Your Data Transformations: Applying the Box-Cox Transformation.” Practical Assessment, Research & Evaluation 15, no. 12 (2010): 1–9.

    $$PM_{10}' = \frac{PM_{10}^{\lambda} - 1}{\lambda}$$    (Equation 10)

    Figure 5: Density of Box-Cox transformed PM10 datasets in Airbase for Germany, France, Great Britain and Italy in 2006-2007

    Comparing the skewness of log transformed and Box-Cox transformed values suggests that both transformations successfully reach the goal of producing symmetrical distributions of the PM10 datasets.

    However, as shown in Figure 6, a symmetrical distribution for the whole 2006-2007 dataset does not ensure that each individual neighbourhood dataset will present a Gaussian distribution as well. A log-transformation within each neighbourhood is impossible because Equation 7 and Equation 8 require that the Sx values between neighbourhoods are consistent. Anyhow, it is likely that breaching the normality assumption does not jeopardize the implementation of the z-test, provided that the threshold value 1.96 set in Equation 9 remains effective in detecting abnormal values.
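To make the Box-Cox step of Equation 10 concrete, the following sketch transforms a sample and picks the λ with the smallest absolute skewness. The report only mentions "an optimization algorithm"; the plain grid search, the search range and the skewness estimator used here are stand-ins chosen for the example, not the authors' procedure.

```python
import math


def skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5 (assumed estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def box_cox(values, lam):
    """Box-Cox transformation (Equation 10) of strictly positive values;
    the limiting case lam -> 0 is the natural logarithm."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def best_lambda(values, lo=-2.0, hi=2.0, steps=401):
    """Grid search (assumed) for the lambda minimising the absolute skewness."""
    candidates = [lo + k * (hi - lo) / (steps - 1) for k in range(steps)]
    return min(candidates, key=lambda lam: abs(skewness(box_cox(values, lam))))
```

A λ close to 0, like the values of about 0.1 reported for the four countries, means that the optimal power transformation is close to a logarithm, which is consistent with the similar skewness figures of the two transformations.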

    Figure 6: Histogram of PM10 values and of their logarithmic transformation of selected stations on 01-02-2006 (DE and FR) and 01/02/2007 (GB and IT) in their neighbourhood

    4.2 Optimisation of the parameters used in the abnormal value screening

    The choice of different functional parameters that affect the outcome of the abnormal value screening has been investigated. This includes the temporal/spatial limits of the neighbourhood (initially ± 2 days, ± 1º longitude and ± 1º latitude), the threshold value 1.96 set in Equation 9, the test value for accepting values in the moving average θ and the width of the window used to calculate the moving average θ (5 consecutive zi allowing for 2 missing values). The sensitivity of the screening results to these values was investigated by simulations using PM10 datasets. The findings from this sensitivity analysis allow for an optimized selection of parameter values, and for a validation of parameter selection.

    4.2.1 Spatio-temporal limits of the neighbourhood

    For these simulations, the neighbourhood domain was systematically adjusted in time and space. We tested all combinations of neighbourhood sizes from ± 1 to ± 4 days in time and from ± 1 to ± 4 degrees in longitude and latitude. By extending the limits of the neighbourhood outside the given station conditions, these simulations increased the probability of false detection of abnormal values.

    Validation of the neighbourhood limits was performed for all selected countries (AT, CZ, DE, ES, FR, GB, IT and NL) for all background stations of all area types (rural, urban and suburban) using the PM10 datasets of 2006 to 2007. The results of these simulations are given in Table 1 and Figure 7. Note that for these simulations no dynamic expansion of the time and spatial limits of the neighbourhood was allowed.

    Contrary to initial expectations, the selection of the time and spatial limits of the neighbourhood does not have a strong effect on the number of detected abnormal values. In fact, the relative standard deviations within the response surface values, which appear to be independent of the total number of abnormal values, are 10% (AT), 11% (CZ), 14% (ES), 4% (FR), 6% (GB), 10% (IT), and 15% (NL), respectively. Table 1 shows that between the smallest and largest neighbourhood, the total number of abnormal values is only twice as big for NL. It can be concluded that the weighting algorithms presented in chapter 3 make the method reasonably independent of the preselected extent of the neighbourhood. The effect of the weighting factors is much stronger than that of the preselected limitations of the spatio-temporal neighbourhood boundaries.

    An absolute definition of abnormal values is not feasible. Consequently, we do not have reference data for the optimum number of abnormal values to be compared to the output of the screening tool. Only expert judgement or rational indicators (i.e. lack of continuity of the total number of abnormal values) can be used to select the best combination of spatial limits and time limits. Since the screening tool could be used as a warning system for doubtful values by various stakeholders, a combination of limits producing reasonably high figures should be selected. At the same time, the extent of the neighbourhood should be as parsimonious as possible to save on CPU time of the computations and in order to produce z’ indicators that are characteristic of measurements in the vicinity of tested stations.

    As mentioned above, all combinations of time and space limits produce comparable numbers of abnormal values. However, the variations along the time and spatial dimensions are different. A multiple analysis of variance showed [20] that country is the main influence affecting the number of abnormal values, while the time window had twice the effect of the space window. Moreover, one may observe several steep decreases of the number of abnormal values occurring at a time limit of 1 day and 1 spherical degree. Consequently, it was decided to select a time window of two days (with additional possibility of expansion) to avoid closeness to the steep gradient.

    For AT and IT, one can also observe that the total number of abnormal values fluctuates more along the space dimension. It is likely that orography, characterised by a rapid change between mountains and valleys for these two countries, produces these fluctuations. Following this observation and in order to limit possible false positives and false negatives, it was decided to set the spatial limits of the neighbourhood to the smallest space dimension without the possibility of expansion. These figures represent, in our view, the best equilibrium between avoiding unverified data, obtaining a high number of detected abnormal values, avoiding the extreme figures characterised by a lack of continuity of the number of abnormal values, and limiting the CPU time needed to perform these calculations.

    20 Note that FR was discarded from this analysis because it gave a high number of abnormal values,

    Table 1: Effect of changing the spatial and temporal limits on the detection of abnormal values for Germany for the background - urban - 2007 - PM10 out of 236797 total records - constant threshold for the z value and constant value for the rolling mean value

Time window [days]   Spatial window [°]     AT     CZ     DE    ES     FR    GB     IT    NL
±1                   ±1                    611   1240   2899   506   4959   471    837   146
±1                   ±2                    594   1227   2693   844   5508   570    926   248
±1                   ±3                    579   1190   2444   892   5473   569    939   238
±1                   ±4                    582   1170   2321   885   5388   584   1141   236
±2                   ±1                    714   1214   3058   688   5750   553    825   227
±2                   ±2                    566   1127   2586   803   5821   523    917   304
±2                   ±3                    546   1054   2227   773   5704   515    937   280
±2                   ±4                    564   1020   2082   771   5769   522   1071   278
±3                   ±1                    661   1100   2939   726   5883   511    809   316
±3                   ±2                    544   1030   2396   756   5854   535    933   294
±3                   ±3                    503    961   2100   720   5616   530    913   266
±3                   ±4                    543    919   1930   714   5755   507   1022   266
±4                   ±1                    611   1014   2707   713   5552   492    788   293
±4                   ±2                    509    937   2240   694   5568   543    913   277
±4                   ±3                    482    911   1999   665   5375   511    871   257
±4                   ±4                    528    897   1864   658   5465   491    967   255
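The relative standard deviations quoted in the text for the response surface can be checked directly against the counts in Table 1. The following is a minimal sketch for AT and NL only; it assumes that the sample standard deviation was used, which the report does not state, and the names `TABLE1` and `relative_std_percent` are illustrative.

```python
from statistics import mean, stdev

# Counts of detected abnormal values copied from Table 1 (16 combinations of
# time window ±1..±4 days and spatial window ±1..±4 degrees), AT and NL only.
TABLE1 = {
    "AT": [611, 594, 579, 582, 714, 566, 546, 564,
           661, 544, 503, 543, 611, 509, 482, 528],
    "NL": [146, 248, 238, 236, 227, 304, 280, 278,
           316, 294, 266, 266, 293, 277, 257, 255],
}


def relative_std_percent(counts):
    """Sample standard deviation divided by the mean, in percent."""
    return 100.0 * stdev(counts) / mean(counts)


for country, counts in TABLE1.items():
    print(country, round(relative_std_percent(counts), 1))
```

Under this assumption, the values obtained for both countries lie close to the 10 % (AT) and 15 % (NL) reported in the text.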

Figure 7: Influence of time and spatial extent in the determination of abnormal values for PM10 datasets for AT, CZ, DE, FR, GB, IT and NL in 2006-2007. Note the different axis orientation per graph. Note also that the spatial extent and temporal extent are given in extension around a centerpoint. For example, a spatial extent of 1 describes a length of the edge of ± 1, thus 2° in longitude and 2° in latitude.
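The neighbourhood definition used in these simulations (a window of ± days in time and ± degrees in longitude and latitude around a centerpoint) can be sketched as a simple filter. This is only an illustration, not the tool's implementation: the `Record` type and the `neighbourhood` function are hypothetical, and excluding every record of the base station is our reading of the report's remark that the base station is not part of the selection.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    station: str   # station code, e.g. "AT0002R"
    day: date      # day of the daily PM10 value
    lon: float     # longitude in decimal degrees
    lat: float     # latitude in decimal degrees
    pm10: float    # daily PM10 value in µg/m³


def neighbourhood(base, records, days=2, degrees=1.0):
    """Records within ±days and ±degrees (longitude and latitude) of base.

    Records of the base station itself are excluded (assumption, see lead-in).
    """
    return [
        r for r in records
        if r.station != base.station
        and abs((r.day - base.day).days) <= days
        and abs(r.lon - base.lon) <= degrees
        and abs(r.lat - base.lat) <= degrees
    ]
```

With `days=2` and `degrees=1.0` the window spans 5 days, 2° in longitude and 2° in latitude, which matches the convention stated in the caption of Figure 7.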

4.2.2 Test threshold for z-test

The test threshold to detect abnormal values should, from a statistical point of view, be around 1.96 for a simple z-test. However, the experiments have shown that this value might be too conservative. We ran a series of experiments using the results of screenings for AT, CZ, DE, ES, FR, GB, IT and NL to further investigate this parameter. We observed that for thresholds higher than 3 the number of identified abnormal points rapidly converges towards zero. Over the whole range of threshold values, the number of unverified values remains constant. Figure 8 shows that the test threshold strongly affects the output of the screening tool regarding the number of abnormal values. However, as for the optimisation of the limits of the neighbourhood, without reference values for the number of abnormal values we cannot easily decide which threshold to use. Further investigations are needed to find rules and mechanisms to set this parameter. Furthermore, the selection of this parameter will strongly depend on the specific objectives of the intended application.
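To put the value of 1.96 into perspective, the share of values that a simple two-sided z-test would flag for an ideal standard normal variable can be computed for any threshold. The sketch below is a reference calculation only. The function names are illustrative, and the z values produced by the screening tool are weighted and transformed, so the observed percentages need not follow these figures.

```python
import math


def two_sided_tail(threshold):
    """P(|Z| > threshold) for a standard normal variable Z."""
    return math.erfc(threshold / math.sqrt(2.0))


def fraction_flagged(z_values, threshold):
    """Share of z values whose absolute value exceeds the threshold."""
    flagged = sum(1 for z in z_values if abs(z) > threshold)
    return flagged / len(z_values)


for t in (1.0, 1.96, 3.0):
    print(t, round(100 * two_sided_tail(t), 2), "% expected under N(0, 1)")
```

For a threshold of 1.96 the expected share is about 5 %, and it drops below 0.3 % for a threshold of 3, which is in line with the rapid decrease of identified abnormal points described above.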

Figure 8: Percentage of abnormal values with respect to different choices for the z-test threshold

[Eight panels, 2006 - 2007: Austria, Czech Republic, Germany, Spain, France, United Kingdom, Italy, Netherlands. X-axis: threshold value for z test (0 to 6). Y-axes: % of abnormals in verified records and % of unverified records.]

4.2.3 Limit value for including zi in the computation of θ

The z-test for detecting abnormal values (Equation 9) is based on the computation of θ, a moving average of 5 consecutive zi values. zi values are included in the moving average provided that they do not exceed a predefined threshold, which is currently set to 1.96. All |zi| exceeding a value of 1.96 are discarded from the computation of the moving average. This produces unverified records when several consecutive zi are rejected, hence preventing a continuous calculation of θ. Figure 9 shows the influence of the threshold for accepting zi values. Tuning this parameter in the direction of “strict” values (low threshold) causes a large number of unverified records in the evaluation.

The influence on the number of identified abnormal points is complex and indicates the superimposition of two or more effects. First, the reduction of the number of unverified records (by using less strict threshold values) seems to be directly connected to an increase of identified abnormal records (examples of ES, FR and IT). This indicates that a large proportion of abnormal records have been hidden within the non-verifiables. Second, however, towards higher threshold values the effect can also be opposite (decrease of identified abnormal records in the examples of DE, GB and NL). As another important observation, it is not feasible to choose a limit value that yields both the highest number of abnormal values and the lowest number of unverified records.
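The interplay between the acceptance limit and the number of unverified records can be illustrated with a small moving-average sketch. It is not the report's Equation 9: the function name `moving_theta`, the centred window, the truncation of the window at the ends of the series and the use of `None` for missing or unverified values are all assumptions made for illustration.

```python
def moving_theta(z, width=5, limit=1.96, min_valid=3):
    """Centred moving average of z that skips missing or rejected values.

    z is a list of floats, with None marking a missing value. Values with
    |z| > limit are rejected. Positions whose window holds fewer than
    min_valid accepted values yield None (an "unverified" record).
    """
    half = width // 2
    theta = []
    for i in range(len(z)):
        window = z[max(0, i - half): i + half + 1]
        accepted = [v for v in window if v is not None and abs(v) <= limit]
        if len(accepted) >= min_valid:
            theta.append(sum(accepted) / len(accepted))
        else:
            theta.append(None)
    return theta


# Made-up z series with one missing value and several large values.
z = [0.5, 2.5, -0.5, None, 1.0, 3.0, 2.2]
strict = moving_theta(z, limit=1.96)
relaxed = moving_theta(z, limit=3.0)
print(strict.count(None), relaxed.count(None))
```

With `width=5` and `min_valid=3` the sketch tolerates 2 missing or rejected values per window, as described in section 4.2. A stricter limit rejects more zi values and therefore leaves more positions without a valid θ.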

Figure 9: Effect of the upper limit value (currently 1.96) for including zi-values into the moving average computation of θ

[Eight panels, 2006 - 2007: Austria, Czech Republic, Germany, Spain, France, United Kingdom, Italy, Netherlands. X-axis: limit value for accepting zi values (0 to 6). Y-axes: % of abnormals in verified records and % of unverified records.]

4.2.4 Window width for the computation of θ

The effect of the width of the time window of the moving average (θ) on the number of detected abnormal values was studied for the results of the screening tool for AT, CZ, DE, ES, FR, GB, IT and NL. In these calculations, the required minimum share of valid zi for partial calculations of the moving average was set to 60 % for any window width.

Figure 10 indicates that the time window of the moving average should not be set to values lower than 4 days, to avoid a strong decrease of the percentage of detected abnormal values. Conversely, for time window widths over 5 days, only a slight increase of the percentage of abnormal values takes place. This latter effect might be due to instability of weather conditions over longer time spans; the time window should therefore be rather short. We therefore chose a value of 5 days, as this seems to be a good compromise between stable thresholding and not flagging too many abnormal values due to false positives. This parameter does not seem to influence the percentage of unverified records, although some noise can be observed for time windows of less than 10 days.

Figure 10: Influence of the moving windows width used for the moving average computation of θ

[Eight panels, 2006 - 2007: Austria, Czech Republic, Germany, Spain, France, United Kingdom, Italy, Netherlands. X-axis: moving-window width [days] (0 to 45). Y-axes: % of abnormals in verified records and % of unverified records.]

4.3 Manual calculations

Checks by manual calculation were performed for a set of stations in different countries, including stations FR34032 and FR34052 on day 2006-02-01, DETH025 on day 2006-02-01, GB0643A on day 2007-02-01 and IT1186A on day 2007-02-01. The checks were carried out using two versions of AirBase, one with datasets ending in 2007 and another with datasets ending in 2010.

The check consisted in confirming the list of extracted stations within neighbourhoods for the selected dates, within the spatial limits of the neighbourhood and for the correct combination of station type (background) and area type (all area types: urban, suburban and rural). Equation 1 to Equation 9 were computed for the manual calculations and their results agreed with the results of the screening tool.

A few differences were observed between AirBase_2007 and AirBase_2010, mainly consisting of a few stations present in AirBase_2010 that were missing in AirBase_2007. Moreover, the values of stations in the neighbourhood of DETH025 were slightly different in AirBase_2010 (about half of the values differed by less than 0.2 µg/m³, without changing the output of the screening tools). Station GB0788A had PM10 values of 36, 24, 31 and 35 µg/m³ in AirBase_2007 and 34, 22, 20 and 25 µg/m³ in AirBase_2010. Moreover, when extracting the neighbourhood from AirBase_2007, the following stations were missing:
- DEHE055 and DETH042 for the neighbourhood of DETH025
- IT0940A and IT1672A for the neighbourhood of IT1186A
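A comparison of this kind between two database versions can be sketched as a set difference of station codes plus a value-by-value difference. The helper names are hypothetical and `STATION_A` is a placeholder for a neighbour present in both versions; only the station codes DEHE055, DETH042 and the GB0788A values are taken from the text above.

```python
def missing_stations(neighbours_old, neighbours_new):
    """Station codes present in the newer extraction but absent in the older."""
    return sorted(set(neighbours_new) - set(neighbours_old))


def value_differences(values_old, values_new):
    """Element-wise differences (old minus new) between two value lists."""
    return [old - new for old, new in zip(values_old, values_new)]


# Neighbourhood of DETH025; STATION_A is a placeholder, not a real code.
airbase_2007 = ["STATION_A"]
airbase_2010 = ["STATION_A", "DEHE055", "DETH042"]
print(missing_stations(airbase_2007, airbase_2010))

# PM10 values of GB0788A in µg/m³ as quoted in the text.
print(value_differences([36, 24, 31, 35], [34, 22, 20, 25]))
```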

    5 Results

A complete set of time series plots of daily PM10 abnormal values for the background stations of AT, CZ, DE, ES, FR, GB, IT and NL is given in Annex 1. The graphs in Annex 1 are considered to be useful for local authorities in order to question the consistency of the detected abnormal values of their stations. Modellers can use this information when estimating the performance of models compared to field measurements.

Table 2 summarizes the outcome of the screening tool applied per country. From a theoretical perspective, a screening procedure that looks at extreme values within normalized distributions implies that a certain percentage of abnormal value detections should be expected. However, because of the different data transformations employed, we cannot anticipate a detection of 5 % of abnormal values corresponding to the selected level of confidence. In fact, taking all countries into consideration, the percentages of abnormal value identifications range between 1.5 and 4.1 %. However, once the matter of unverified data is settled, the number of abnormal values per station may increase when a larger number of extreme zi values are accepted in the estimation of θ.

We have looked at the correlation between the percentages of abnormal values per country and different variables in Table 2. To our surprise, the highest percentages of unverified data were not correlated with the density of monitoring stations of each country, nor with the homogeneity of the PM10 measurement method (gravimetry, TEOM or β-ray) per country, nor with the homogeneity of area types of stations per country (urban, suburban or rural). At first glance, one may observe that the percentage of abnormal values is generally higher for the countries reporting the highest number of records. Finally, by visual inspection of the graphs of the annex, rural sites appear to produce more abnormal values than urban or suburban areas, indicating that the presence of rural stations in the “All background” category should be further studied.

The above conclusions are somewhat premature. We would like to emphasize that the reported figures are somewhat dependent on the parameter values chosen in the tools, and that these are still going to be fine-tuned further. For the next developments of the method, we want to give a definitive evaluation of what can be achieved with the screening tools.

Our short term objectives consist of:
- Investigate if unverified records partly represent abnormal values; decrease the percentage of unverified records by modification of the calculation of the θ moving average (e.g. by applying a Kolmogorov-Zurbenko type of filter).
- Compare the current screening tool using the normalised Euclidean distance with the findings using the Mahalanobis distance. Investigate which power of the inverse distance (currently 2) is best suited to estimate the weighting factors. In fact, stations may have one very close neighbour. The resulting problem is that the weighting factors for this one close neighbour become very large, and the neighbourhood mean depends too much on this single attribute value.
- Validate currently optimised parameter values (neighbourhood limits, averaging windows for θ, threshold value for the z-test and for accepting zi values) by spiking PM10 datasets to artificially produce outliers. Study the possibility to improve the tool by setting its parameters per individual day.
- Currently, spatial distances are in decimal degrees, but should rather be evaluated in kilometres. Therefore we will implement a geodetic projection procedure for coordinate transformations.
- Currently the base station is not part of the selection for the calculation of neighbourhood statistics. This limitation is a consequence of inverse distances for weighting factor calculations becoming undefined otherwise. We will try to circumvent or improve this calculation limitation.
- Study if including rural stations in the “All background” category of tested stations is appropriate, as this type of area produces too many abnormal values in that category. Evaluate the possibility to run the screening separately for the urban, suburban and rural area types and for the traffic and industrial types of stations.
- Evaluate the feasibility of an iterative procedure where, once an abnormal value is detected, immediate corrections are made, such as replacing the attribute value of this abnormal data point by the average attribute value of its neighbours and updating the subsequent computation. The effect of these corrections is to avoid normal points close to the true abnormal points being claimed as possible abnormal points, too.
- Determine abnormal values for all PM10 datasets and for the latest version of AirBase over the last 10 available years for all countries having sufficient PM10 records.
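The concern about one very close neighbour dominating the neighbourhood mean can be made concrete with a small inverse-distance weighting sketch. It is an illustration under assumptions, not the weighting of the screening tool: the function names, the distances (in decimal degrees) and the PM10 values are made up.

```python
def inverse_distance_weights(distances, power=2.0):
    """Normalised weights proportional to 1 / distance**power."""
    if any(d <= 0 for d in distances):
        # A zero distance (e.g. the base station itself) has no defined weight.
        raise ValueError("distances must be strictly positive")
    raw = [1.0 / d ** power for d in distances]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean(values, weights):
    """Weighted mean for weights that already sum to one."""
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical neighbourhood: one very close neighbour (0.01 degrees) and
# three neighbours at 0.5 to 0.9 degrees, with made-up PM10 values in µg/m³.
distances = [0.01, 0.5, 0.7, 0.9]
pm10 = [60.0, 25.0, 27.0, 24.0]
for p in (1.0, 2.0):
    w = inverse_distance_weights(distances, power=p)
    print(p, round(w[0], 3), round(weighted_mean(pm10, w), 1))
```

In this made-up case the closest neighbour carries nearly all of the weight for both powers, and more so for power 2 than for power 1. The `ValueError` branch mirrors the reason given above for excluding the base station: its distance of zero has no defined inverse.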

Our middle term objectives are:
- List and map stations continually producing z indicators higher or lower than the other stations in their neighbourhood, in order to check station classifications.
- Apply the screening tools to NO2 and O3 datasets, if found feasible.
- Investigate transboundary effects on PM10 records; the cluster effect will be evaluated by including stations belonging to more than one country in the neighbourhood of stations near borders.
- Re-evaluate the measurement uncertainty for PM10, according to the method developed in Gerboles and Reuter, 2010 [3] and taking advantage of the consolidated screening tool.

Finally, our long term objectives are linked with investigations like:
- Apply the screening tool for checking data quality in the framework of near-real-time data reporting.
- Evaluate the perspectives and feasibility of developing the screening tool into an online application for operational use and accessibility by individual station managers.

Table 2: Summary of the output of the screening tool per country including numbers and density of background stations, total number of records, percentages of unverified records and detected abnormal values, types of measuring methods and area type of stations

Country   Backgr. stations   Density [stations / 10³ km²]   Records   Unverified records   Abnormal data*   Affected stations
AT         63                0.75                            40471     5697 (14%)           722 (2.1%)       57 (90%)
CZ         96                1.22                            64996     6545 (10%)          1214 (2.1%)       87 (91%)
DE        240                0.67                           160083    16575 (10%)          3070 (2.1%)      224 (93%)
ES        134                0.26                            59668    24980 (42%)           729 (2.1%)       81 (60%)
FR        286                0.52                           165443    49385 (30%)          6306 (5.4%)      259 (91%)
GB         56                0.24                            35561    12342 (35%)           600 (2.6%)       41 (73%)
IT        108                0.36                            49656    18527 (37%)           871 (2.8%)       82 (76%)
NL         24                0.58                            16135     3004 (19%)           227 (1.7%)       22 (92%)

Country   Measuring methods (Gravimetry, TEOM, Beta ray, Unknown and others)   Urban area   Suburban area   Rural area
AT        20 %, 56 %, 22 %, 1 %, Reflect. 1 %                                   31%          35%             34%
CZ        30 %, 70 %, 0 %                                                       32%          24%             45%
DE        29 %, 8 %, 40 %, 22 %, Chrom. 1 %                                     42%          31%             27%
ES        39 %, 3 %, 30 %, 0 %, DOAS 9 %, AAS 20 %                              33%          33%             34%
FR        85 %, 15 %                                                            59%          35%              6%
GB        5 %, 94 %, 1 %, 0 %                                                   84%           6%             10%
IT        20 %, 8 %, 59 %, 4 %, Cond. 1 %, Neph. 6 %                            59%          26%             14%
NL        100 %, 0 %                                                            37%          32%             32%

*Percentages of the verified records. TEOM: tapered element oscillating microbalance. Cond.: conductimetry. Neph.: nephelometry. Chrom.: chromatography. DOAS: differential optical absorption spectrometry. AAS: atomic absorption spectrometry. Reflect.: reflectometry.
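The percentages in Table 2 follow from the counts in the same row: unverified records are expressed relative to all records, and abnormal values relative to the verified records only, as the table footnote indicates. A short sketch of this arithmetic, with an illustrative function name and three rows copied from the table:

```python
def percentages(records, unverified, abnormal):
    """Return (% unverified of all records, % abnormal of verified records)."""
    verified = records - unverified
    return 100.0 * unverified / records, 100.0 * abnormal / verified


# Records, unverified records and abnormal values copied from Table 2.
TABLE2 = {
    "AT": (40471, 5697, 722),
    "FR": (165443, 49385, 6306),
    "NL": (16135, 3004, 227),
}

for country, row in TABLE2.items():
    unverified_pct, abnormal_pct = percentages(*row)
    print(country, round(unverified_pct), round(abnormal_pct, 1))
```

For AT, for example, 722 abnormal values out of 40471 - 5697 = 34774 verified records give about 2.1 %, and 5697 unverified records out of 40471 give about 14 %, in agreement with the table.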

ANNEX:

Z(Sx) 2006 / 2007 time series and abnormal datapoint identification summaries

Austria

[Figure: z(sx) time series, Jan 2006 to Jan 2008, for station AT0002R (background, rural), long = 16.766 deg E, lat = 47.77 deg N. Legend: normal point, threshold limits, abnormal point, non-verifiable.]

    number of datapoints investigated for AT0002R: 728

    identified abnormal datapoints: 17

    abnormal datapoints content: 2.34 %

    abnormal datapoints station ranking = 17 within a total of 63 stations investigated for AT

    non verifiable datapoints: 0

[Figure: z(sx) time series, Jan 2006 to Jan 2008, for station AT0003A (background, urban), long = 14.678 deg E, lat = 47.179 deg N. Legend: normal point, threshold limits, abnormal point, non-verifiable.]

    number of datapoints investigated for AT0003A: 725

    identified abnormal datapoints: 3

    abnormal datapoints content: 0.41 %

    abnormal datapoints station ranking = 49 within a total of 63 stations investigated for AT

    non verifiable datapoints: 1

[Figure: z(sx) time series, Jan 2006 to Jan 2008, for station AT0005R (background, rural), long = 12.972 deg E, lat = 46.68 deg N. Legend: normal point, threshold limits, abnormal point, non-verifiable.]

    number of datapoints investigated for AT0005R: 682

    identified abnormal datapoints: 12

    abnormal datapoints content: 1.76 %

    abnormal datapoints station ranking = 25 within a total of 63 stations investigated for AT

    non verifiable datapoints: 526

[Figure: z(sx) time series, Jan 2006 to Jan 2008, for station AT0012A (background, urban), long = 14.036 deg E, lat = 48.165 deg N. Legend: normal point, threshold limits, abnormal point, non-verifiable.]

    number of datapoints investigated for AT0012A: 726

    identified abnormal datapoints: 1

    abnormal datapoints content: 0.14 %

    abnormal datapoints station ranking = 56 within a total of 63 stations investigated for AT

    non verifiable datapoints: 1

    [Figure panel: AT0016A (background, suburban), long = 14.239 deg E, lat = 48.225 deg N.
    Time series of z(s_x), Jan 2006 to Jan 2008; y-axis range −3 to 4.
    Legend: normal point, threshold limits, abnormal point, non-verifiable]

    number of datapoints investigated for AT0016A: 724

    identified abnormal datapoints: 1

    abnormal datapoints content: 0.14 %

    abnormal datapoints station ranking = 55 within a total of 63 stations investigated for AT

    non verifiable datapoints: 2

    [Figure panel: AT0020A (background, suburban), long = 16.303 deg E, lat = 48.236 deg N.
    Time series of z(s_x), Jan 2006 to Jan 2008; y-axis range −8 to 4.]