accuracy of thematic maps / implications of choropleth symbolization



ACCURACY OF THEMATIC MAPS / IMPLICATIONS OF CHOROPLETH SYMBOLIZATION

ALAN M MACEACHREN

University of Colorado / Boulder / Colorado

ABSTRACT Accuracy of thematic maps is identified as having two components: planimetric accuracy and data representation accuracy. The first is of only minor concern for thematic maps. Data representation accuracy, however, is at least as significant to map effectiveness as are the perceptual and cognitive aspects of map reading that have been given so much attention in recent years. The focus of the study presented is on data representation accuracy, considering the specific case of choropleth maps. It is demonstrated that three factors (enumeration unit size, enumeration unit compactness, and variability of the distributions mapped) are significant in determining enumeration unit aggregate value accuracy and, therefore, map accuracy. Based on experimental results, it is suggested that potential accuracy of choropleth maps could be predicted through measurement of these characteristics. A second experiment is used to demonstrate the applicability of this method to predicting overall choropleth map accuracy as well as the geographic distribution of any error present in the map.

In recent years considerable attention has been directed toward the accuracy with which thematic maps communicate spatial information. Less attention, in contrast, has been given to accuracy of the maps themselves. This is a serious omission. The purpose of the present paper, therefore, is to identify aspects of thematic map accuracy deserving attention and, more specifically, to consider implications of the choropleth technique for accuracy of data representation.

INTRODUCTION

Many of us have not given accuracy of thematic maps much thought because of the concept that thematic maps are not meant to communicate specific details, but rather, patterns and associations. If, however, we consider the influence that data classification is known to have on thematic map appearance, we can see that accurate communication of these patterns and associations will be dependent upon an accurate representation of the distribution mapped.

It seems probable, given results of data classification comparisons, that the magnitude of error inherent in a transfer of information from geographic reality to a typical thematic map may be as great as or greater than that arising during transfer of information from the map to the map reader. In many cases we seem to have lost sight of the fact that maps are intended to communicate something about geographic reality. We have instead limited our attention to evaluating the reader's ability to interpret the mapped representation of that reality. To evaluate 'communication effectiveness' of a thematic map, we must first know the underlying accuracy of that map.

A concern with representation accuracy is evident in proposals for and experimentation with N-class maps. Recognizing the amount of information lost by classifying values into a small number of categories, Tobler (1973) suggested the alternative of representing each value with a shade of gray that is directly proportional to the value involved. As with other aspects of thematic mapping, however, most research following from this suggestion has focused on the perceptual problems of discriminating among the large number of tones or the effect of

ALAN M MACEACHREN is Assistant Professor in the Department of Geography, University of Colorado, Boulder, Colorado, USA. Research reported in this paper was supported in part by core project grants from the research division of Virginia Polytechnic Institute and State University. The author also wishes to express appreciation to Dana Fairchild for her efforts in generating test surfaces used here as well as for her input into the development of the problems investigated. MS submitted August 1984

CARTOGRAPHICA Vol 22 No 1 1985 pp 38-58


statistical implications of symbolization choice on accuracy of quantitative thematic maps. One spatial factor relates to the level of detail that can be represented. Dot maps and dasymetric maps, for example, result in a differentiation of patterns within units for which data are available. Graduated circle maps, choropleth maps, and others, in contrast, can show only more general (and therefore less accurate) aggregate totals for each unit.

A second spatial factor involves correspondence between dimensionality of the phenomenon represented and symbolization used to represent it. Phenomena concentrated at points are more accurately represented by point symbols, while those distributed across areas are best represented with area symbols. Within the subset of area symbolization, an additional question is the extent to which the phenomenon is distributed in discrete areas that correspond to enumeration units versus distribution in a continuous fashion. In the first case, choropleth maps will represent the phenomenon more accurately, while isopleth maps will be more accurate in the second.

The second aspect of accuracy related to symbolization is statistical accuracy. Here the major distinction is whether data values are represented directly or whether some derivation or manipulation is first required. When raw values, such as total population, are represented no further effect on accuracy occurs. Point symbols are typically used to represent such data. If, however, derived values such as densities, averages, or ratios are represented, a further generalization and, therefore, decrease in accuracy results. Choropleth, isopleth, and other area symbolization techniques are among those techniques subject to this additional effect on accuracy. It is maps using area symbolization methods, therefore, that have the greatest potential for error of any quantitative thematic maps.

The specific research presented here focuses on this last category: maps that represent derived quantities calculated for enumeration units. Choropleth maps are by far the most common of these maps and as a result have been selected for study here. The specific problem to be addressed is the effect on choropleth map accuracy of data manipulation and aggregation procedures inherent in the choropleth method.

IMPLICATIONS OF THE CHOROPLETH TECHNIQUE

Choropleth map accuracy is, to a large degree, a function of methods by which data are organized. Data for choropleth maps consist of derived values for enumeration units such as states, counties, or census tracts. Consider, for example, a typical geographic distribution that could represent population density, average income, or cancer rates per 1000 persons (Figure 1). The process of producing a choropleth map of this distribution involves two data manipulation steps before actual map production can take place. The original distribution is first transformed to derived or aggregate values for each enumeration unit. For this example, these aggregate values are averages of values at grid locations occurring within each unit. Aggregate values thus obtained are then assigned to categories prior to mapping. Here the four categories represented were derived using Jenks' (1977) optimization procedure (Figure 2).
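The two data manipulation steps just described can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the grid values, unit labels, and class breaks are hypothetical, and fixed breaks stand in for the Jenks optimization procedure used to produce Figure 2.

```python
from bisect import bisect_right
from statistics import mean

def aggregate_to_units(grid_values, unit_of_cell):
    """Step 1: average the grid values that fall inside each enumeration unit."""
    members = {}
    for value, unit in zip(grid_values, unit_of_cell):
        members.setdefault(unit, []).append(value)
    return {unit: mean(values) for unit, values in members.items()}

def classify(unit_values, breaks):
    """Step 2: assign each unit value to a class (0 = lowest) using class breaks."""
    return {unit: bisect_right(breaks, v) for unit, v in unit_values.items()}

# Hypothetical grid values and the (hypothetical) unit each grid node falls in
grid_values = [2.0, 4.0, 6.0, 20.0, 22.0, 50.0, 54.0, 90.0]
unit_of_cell = ["A", "A", "A", "B", "B", "C", "C", "D"]

unit_means = aggregate_to_units(grid_values, unit_of_cell)
# Assumed fixed breaks giving four classes; not the result of an optimization
classes = classify(unit_means, breaks=[10.0, 30.0, 60.0])
```

Each step discards information: the first replaces all values inside a unit with one mean, and the second replaces each mean with a class label.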


the method upon a user's ability to regionalize patterns represented (e.g., Muller 1979; Peterson 1979; and Carstensen 1982). Only Peterson (1979) has explicitly mentioned the relationship between error inherent in classifying values versus the error in value estimation for unclassed values.

The present paper is directed to accuracy of quantitative thematic maps giving specific attention to choropleth symbolization. For any quantitative thematic map, accuracy will be a function of four factors:

1 map production procedures;
2 data collection methods;
3 data classification strategies;
4 symbolization techniques.

The first factor, map production, includes the skill of and care exercised by the cartographer and the level of generalization employed. These are determinants of the map's planimetric accuracy. 'Map accuracy' has traditionally been used to refer primarily to planimetric accuracy of this sort. For most thematic maps, cartographers have correctly concluded that accuracy in a planimetric sense is probably not of great significance. For example, it could probably be demonstrated, in at least some cases, that a planimetrically less 'accurate' but simpler thematic map can result in more 'accurate' communication than a map of the same area and topic having greater planimetric accuracy. Planimetric accuracy is probably of relevance to small scale thematic maps only to the extent that major blunders (e.g., printing a mirror image) would result in confusion.

The remaining three factors do not reflect planimetric accuracy. Instead, they relate to accuracy with which the map theme is portrayed. As such, they are each at least as critical to effectiveness of thematic maps as is the user's ability to extract information. The second factor, data collection, would include consideration of the relative reliability of procedures such as a mail census versus door to door enumeration, as well as the accuracy of details provided by respondents. In some cases, this aspect of accuracy can be estimated statistically or determined through a resampling procedure.

Data classification, the third factor, can also be evaluated on a statistical basis. One example is Jenks' (1971) tabular accuracy index (TAI). This index is based on the goal of minimum variation within categories and maximum variation between categories. The procedure varies depending on the exact method used to evaluate 'variation' (e.g., variance around mean values, absolute deviation around medians, etc.). It also can be modified to include consideration of other cartographic variables such as homogeneity of regions on the resulting map (Monmonier 1975). In any case, use of the procedure provides a direct statistical measure of the accuracy with which values in each class are represented.
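As a rough illustration of such an index, the following Python sketch computes one common formulation of a tabular accuracy index: one minus the ratio of summed absolute deviations around class means to summed absolute deviations around the overall mean. The choice of absolute deviations around means is an assumption made for this example (the text notes that the measure of 'variation' can differ), and the sample values are hypothetical.

```python
from statistics import mean

def tabular_accuracy_index(classes):
    """Return 1 - (within-class deviation / total deviation).

    `classes` is a list of lists, one list of unit values per class.
    Deviations are absolute deviations around means (an assumed variant).
    """
    all_values = [v for members in classes for v in members]
    overall = mean(all_values)
    total_dev = sum(abs(v - overall) for v in all_values)
    within_dev = sum(abs(v - mean(members))
                     for members in classes for v in members)
    return 1.0 - within_dev / total_dev

# Hypothetical unit values grouped into two classes
tai_two_classes = tabular_accuracy_index([[1.0, 3.0], [9.0, 11.0]])
# The same values left in a single class: no variation is "explained"
tai_one_class = tabular_accuracy_index([[1.0, 3.0, 9.0, 11.0]])
```

Values near one indicate classes that are internally homogeneous relative to the data as a whole; a single all-inclusive class yields zero.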

In contrast to the first three factors, implications of symbolization method for thematic map accuracy have been given less consideration and few procedures are available for evaluating this aspect of accuracy. There exist both spatial and


FIGURE 1. An example of a geographic distribution such as that found for population density, average income, etc.

FIGURE 2. A choropleth representation of the distribution in Figure 1.


An underlying assumption of choropleth mapping is that data within each unit are of equal value and evenly distributed across the unit. This assumption is implied by a single shade of gray applied across each unit. In relation to this assumption, then, choropleth accuracy will be a function of (a) the variation of data within the unit from the derived value representing that unit and (b) the variation of unit values within data classes from the mean or median of the class. Due to these two generalizations, the final choropleth map presented here results in a very different impression of the distribution than is apparent in the more accurate three-dimensional representation.

Although the effect of data classification procedures on the accuracy with which unit values are represented has been considered (Jenks and Caspall 1971; and Monmonier 1973), the influence of the preceding step of data aggregation to enumeration units has been given much less attention. A theoretical study by Coulson (1978) of the effect of enumeration unit size and shape on accuracy of aggregated values is one of the few exceptions.

One explanation for a lack of attention to aggregate data accuracy may be a perceived inability to manipulate the geographic variables involved. While shading patterns or data classification procedures can be controlled, the cartographer has little or no ability to control size and shape of enumeration units or the nature of the geographic distribution mapped. Whether or not all variables can be controlled, however, a responsibility exists to evaluate the potential for error on maps produced. In some cases, this evaluation may result in a decision that available aggregate data will not produce a sufficiently accurate map or that an alternative form of symbolization should be used. In other cases, when it is decided that the map is to be constructed, map users could be provided with a measure of overall map accuracy or alerted to regions of the map where, due to questionable data, caution should be taken in interpretation.

HYPOTHESIS

The overall problem considered in the present study is to develop a method by which error potential of individual choropleth maps (and consequently other maps that represent derived quantities) can be determined. The specific aspect of this problem addressed here is the correspondence between aggregate values and data they represent. It is hypothesized that unit value accuracy is largely a function of three geographic factors: enumeration unit size, enumeration unit compactness, and variability of the data distribution.

Overall data distribution variability is expected to be indicative of data variation within each unit. To the extent this assumption is correct, accuracy will increase as distribution variability decreases (Figure 3).

Size and compactness of units are expected to influence unit value accuracy because of their direct correspondence to distances among individual data elements. The larger a unit is, the farther apart individual locations within the unit will be and, consequently, the more likely it is that their characteristics will vary. A person's income, for example, is likely to be more similar to that of his next door neighbor than to that of some individual selected at random in another part of the


FIGURE 3. The hypothesized relationship between enumeration unit value accuracy and distribution variability (axes: surface variation, accuracy).

city. It is hypothesized, therefore, that accuracy of derived values for units will increase as unit size decreases (Figure 4).

A similar relationship is hypothesized in terms of unit compactness. The more compact a unit, the shorter the average distance between locations within the unit and, therefore, the more similar the values are expected to be. Accuracy of unit values should exhibit an increase with increasing compactness (Figure 5).

Of the three variables, only two, unit size and compactness, have been considered in the literature as they relate to choropleth mapping. Coulson (1978), in a comprehensive theoretical treatment, considered appropriate methods for the measurement of each. He then devised the concept of a potential for variation in values aggregated to enumeration units. In relation to the choropleth assumption of homogeneity within units, he developed a model of the likely relationship between variation within units and the size and compactness of those units.

Coulson's (1978) findings suggest that size and compactness of units have an equal influence on accuracy of derived unit values, and hence, on resulting choropleth maps. Theoretically, this conclusion is reasonable. In practice, however, the relative impact of size and compactness on aggregate value accuracy will


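FIGURE 4. The hypothesized relationship between enumeration unit value accuracy and enumeration unit size (axes: size, accuracy).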

be a function of variation in each for the units involved. It has been demonstrated, at least for a randomly selected set of political units in the US, that variation in unit size is much greater than that for unit compactness (MacEachren 198;). It is hypothesized here, therefore, that in practice, enumeration unit size will have significantly greater influence on aggregate value accuracy than will compactness.

Only in relation to isopleth mapping have all three factors cited here been considered as they relate to map accuracy. Hsu and Robinson (1970) conducted an empirical study to determine the relationships among variability of distributions mapped, variability in size and shape of units to which distributions are aggregated, and the accuracy with which isopleth maps generated from those aggregate values represent the distributions upon which values are based. They concluded that size of units and characteristics of the distribution were more important factors in isopleth accuracy than was unit compactness. Beyond this and other general findings, their conclusions are difficult to assess for several reasons. One problem was that no standard measures for any of the three variables were developed. Another limitation was that data were initially aggregated to a regular hexagonal grid before values thus obtained were further aggregated to the units involved.


FIGURE 5. The hypothesized relationship between enumeration unit value accuracy and enumeration unit compactness (axes: compactness, accuracy).

In spite of these limitations, one conclusion reached by Hsu and Robinson (1970) is expected to apply to the present investigation of choropleth map accuracy. This conclusion is that the most significant factor in determining accuracy will be variability of the distribution itself. With the choropleth assumption, this conclusion is even more likely to be verified than for isopleth maps tested by Hsu and Robinson. It is, therefore, hypothesized that of the three factors being considered, distribution variability will provide the greatest contribution to an explanation of variation in unit value accuracy.

METHODOLOGY

The focus of this study is on one aspect of choropleth map accuracy - accuracy of derived unit values to be mapped. To examine this topic, the study was organized into two stages. The first stage compared the relative influence on aggregate value accuracy of distribution variability, enumeration unit size, and enumeration unit compactness. The second stage addressed applicability of these initial findings to predicting both overall choropleth map accuracy and the spatial distribution of that accuracy.


FIGURE 6. Sample enumeration units: (a) randomly selected counties from the eastern United States, (b) randomly selected counties from the western United States.


Experimental Design: Stage One

For purposes identified in stage one, a set of contiguous enumeration units was not essential. What was essential was an adequate range in size and shape of units. For this reason, units selected consisted of a stratified random sample of six counties from each of nine regions of the US. Actual units varied in size by a ratio of about 10 to 1. For convenience of illustration, all units are scaled to the same area (Figure 6a and 6b).

For the influence of data distribution variability on derived value accuracy to be examined, distributions representing a range in variability were necessary. At this stage of the research, the extent to which specific geographic variables such as population density or average farm size are discretely versus continuously distributed is not known. This aspect of the problem was, therefore, controlled by selecting four distributions known to be continuous (Figure 7).

The first distribution (Figure 7a) is a simple linear mathematically derived surface that decreases in value along a diagonal. This is considered the simplest of the four. The remaining distributions were derived from topographic surfaces. As illustrated, they represent a roughly conic surface (Figure 7b), an undulating linear surface (Figure 7c), and a highly variable surface (Figure 7d).

Each surface was generated from a set of control point values by the Surface II Graphics System (Sampson 1975). This system generates a square grid matrix of z-values from which an isoline map or perspective plot can be created. In this case, each matrix consisted of 112 rows and 75 columns or 8400 grid values.


FIGURE 8. Example of a unit positioned on a portion of the grid matrix of z-values representing a surface. The actual positioning and calculation of aggregate values was performed using a computer search procedure to locate the unit, determine grid nodes within the unit, retrieve their values, and make the necessary calculations.

In terms of the choropleth assumption of homogeneity within units, accuracy can be measured as the variance of values occurring within a unit around the mean used to represent that unit. To obtain this measure for each unit, the unit was positioned at a random location and orientation on the grid matrix representing a distribution (Figure 8). Points of the matrix falling within the unit were then determined. Depending on unit size, between 30 and 300 points occurred within each unit. The mean value for points within a unit was determined. A standard deviation around each mean was then calculated and the coefficient of variation for this standard deviation was computed. This procedure was repeated thirty times for each unit and the mean coefficient of variation was used as the measure of aggregate or derived value accuracy for each unit on each surface.
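The procedure can be sketched in Python as follows. This is a simplified illustration rather than the search procedure actually used: the unit is assumed to be a rectangle instead of a county outline, the surface is a hypothetical linear function standing in for the test surfaces, and the population form of the standard deviation is assumed.

```python
import math
import random
from statistics import mean, pstdev

ROWS, COLS = 112, 75  # grid dimensions reported in the text

def linear_surface(row, col):
    """Hypothetical stand-in surface: values decrease along a diagonal."""
    return 200.0 - (row + col)

def cells_in_unit(cx, cy, half_w, half_h, angle):
    """Grid nodes inside a rectangle centred at (cx, cy) and rotated by angle."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    inside = []
    for row in range(ROWS):
        for col in range(COLS):
            dx, dy = col - cx, row - cy
            # Rotate the node into the rectangle's own coordinate frame
            u = dx * cos_a + dy * sin_a
            v = -dx * sin_a + dy * cos_a
            if abs(u) <= half_w and abs(v) <= half_h:
                inside.append((row, col))
    return inside

def mean_cv(surface, half_w, half_h, trials=30, rng=None):
    """Mean coefficient of variation over random placements of the unit."""
    rng = rng or random.Random(0)
    reach = math.hypot(half_w, half_h)  # keeps the whole unit on the grid
    cvs = []
    for _ in range(trials):
        cx = rng.uniform(reach, COLS - 1 - reach)
        cy = rng.uniform(reach, ROWS - 1 - reach)
        angle = rng.uniform(0.0, math.pi)
        values = [surface(r, c)
                  for r, c in cells_in_unit(cx, cy, half_w, half_h, angle)]
        cvs.append(pstdev(values) / mean(values))
    return mean(cvs)

small_unit_cv = mean_cv(linear_surface, half_w=3, half_h=2)
large_unit_cv = mean_cv(linear_surface, half_w=9, half_h=6)
```

A higher mean coefficient of variation indicates that the unit mean represents its constituent values less accurately.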

Variable Measurement

Examining the influence of the three geographic factors identified (size and compactness of units and variability of the distribution) requires procedures to measure each accurately. Various methods have been proposed for measurement of enumeration unit compactness. Although a number of simple measures based on the perimeter or area of the unit exist, the method demonstrated to be the most accurate is the relative distance standard deviation (MacEachren i~$-i<?).


FIGURE 9. Calculation of the relative distance standard deviation measure of compactness (dA: element of area of unit).

This measure, which deals with the unit as a whole rather than with individual parameters of that unit, appears to have been developed independently by a number of authors (e.g., Blair and Biss 1973; Bachi 1976). With the relative distance standard deviation, each enumeration unit is considered to be composed of a series of infinitesimally small elements of area (Figure 9). Variation in location of these elements in relation to the unit's centroid is the basis of the measure. It is calculated as the sum of the variance in x and y locations of the elements. Values are adjusted so that they range from zero to one, the latter being the value for a circle, the most compact shape.
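A minimal Python sketch of a measure of this kind is given below. The unit is approximated by equal-area square cells, and the adjustment to a zero-to-one scale is an assumption made for this example: the standard distance of a circle of equal area is divided by the standard distance of the unit, so that a circle scores one. The exact normalization used in the study is not stated in the text.

```python
import math

def compactness(cells, cell_area=1.0):
    """Relative-distance style compactness for a unit made of square cells.

    `cells` holds (x, y) centres of equal-area square cells. The summed
    variance of x and y around the centroid is compared with that of a
    circle of the same area (assumed normalization).
    """
    n = len(cells)
    cx = sum(x for x, _ in cells) / n
    cy = sum(y for _, y in cells) / n
    var_sum = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in cells) / n
    # Add the spread of area inside each square cell (side**2 / 12 per axis)
    var_sum += cell_area / 6.0
    area = n * cell_area
    # For a circle of area A, var(x) + var(y) = A / (2 * pi)
    return math.sqrt(area / (2.0 * math.pi * var_sum))

# Hypothetical shapes built from unit cells
disc = [(x, y) for x in range(-20, 21) for y in range(-20, 21)
        if x * x + y * y <= 400]
square = [(x, y) for x in range(10) for y in range(10)]
strip = [(x, 0) for x in range(40)]

disc_index = compactness(disc)      # close to 1
square_index = compactness(square)  # sqrt(3 / pi), about 0.977
strip_index = compactness(strip)    # elongated, much lower
```

Elongated or fragmented units spread their area farther from the centroid and therefore receive lower values.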

While compactness can be determined in relation to the most compact possible shape, there is no similar standard to which size can be compared. Any measure of size devised, therefore, must be a relative measure - meaningful only in relation to a specific range of sizes. A practical measure of size can be devised by calculating a ratio between the size of each unit and some selected standard. Convenient standards are the largest, smallest, mean, or median size of unit involved. Largest and smallest units have the advantage of resulting in a scale from zero to one.

Coulson (1978) has advocated use of the smallest unit as a standard in order to produce a scale comparable to that for compactness. Values would range from zero to one with unit size decreasing as the index increases. A high value on either the size or compactness scale would be expected to correspond to accurate unit values.

The drawback to this approach is that the size ratio is dependent on only one, possibly extreme, value. Although not as easily interpreted, median unit size will provide a more stable standard and is used here. The median is preferable to the mean because the distribution of enumeration unit size is likely to be highly skewed with a small number of very large units that would exert unwarranted influence on the mean.
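The effect of the choice of standard can be seen in a small Python example. The areas are hypothetical and chosen only to mimic a skewed set of units with one very large member.

```python
from statistics import mean, median

def relative_sizes(areas, standard=median):
    """Express each unit's area as a ratio to a chosen standard (median by default)."""
    reference = standard(areas)
    return [area / reference for area in areas]

# Hypothetical unit areas: four similar units and one very large unit
areas = [400.0, 500.0, 600.0, 700.0, 9000.0]

by_median = relative_sizes(areas)                 # typical unit scores about 1
by_mean = relative_sizes(areas, standard=mean)    # large unit drags the standard up
```

With the median as standard, the typical unit receives a ratio near one. With the mean, the single large unit inflates the standard, so four of the five units fall well below one.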

The third factor considered, surface variability, can be measured in a number of ways. An important consideration in selecting a measure is that it reflect variability of the distribution at a frequency corresponding to enumeration unit size. For example, a distribution that exhibits extreme variability on a continental scale may exhibit little or no variation across a possible mapping unit.

The measure of variability developed for use here is based on a comparison of z-values in the grid matrix representing each distribution. Specifically, spatial autocorrelation of grid z-values is calculated. Spatial autocorrelation has been used successfully in measuring surfaces in the context of a variety of geographic problems (e.g., Gatrell 1977, Marchand 1973, Olson 1973). Procedures for its calculation are described in detail by Cliff and Ord (1973). In this case, autocorrelation of grid values is calculated at a lag or spacing equal to the average longest axis of the units examined. The measure, therefore, should reflect the maximum likely variation within an average mapping unit.
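A lagged autocorrelation measure of this kind can be sketched in Python. The text does not say which statistic was used; Moran's I is assumed here, computed over pairs of grid nodes separated by the chosen lag along rows or columns, and both test grids are hypothetical.

```python
def morans_i_at_lag(grid, lag):
    """Moran's I for pairs of grid nodes `lag` cells apart along rows or columns."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    zbar = sum(values) / n
    denom = sum((v - zbar) ** 2 for v in values)
    cross, weight = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            # Each unordered pair is visited once; counting both directions
            # would double numerator and weight alike and leave I unchanged.
            for dr, dc in ((0, lag), (lag, 0)):
                r2, c2 = r + dr, c + dc
                if r2 < rows and c2 < cols:
                    cross += (grid[r][c] - zbar) * (grid[r2][c2] - zbar)
                    weight += 1
    return (n / weight) * cross / denom

# Hypothetical surfaces: a smooth gradient and a maximally variable checkerboard
gradient = [[float(r + c) for c in range(20)] for r in range(20)]
checker = [[float((r + c) % 2) for c in range(20)] for r in range(20)]

gradient_lag1 = morans_i_at_lag(gradient, 1)
gradient_lag5 = morans_i_at_lag(gradient, 5)
checker_lag1 = morans_i_at_lag(checker, 1)
```

Values near one indicate that nodes a lag apart are alike (little variation at that spacing); values near minus one indicate strong variation at that spacing.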

Analysis - Stage One

The initial step in the analysis was to examine, for each distribution, the relative importance of unit size and compactness on aggregate unit value accuracy. For each distribution, multiple regression was calculated with accuracy (the coefficients of variation) as the dependent variable and size and compactness as independent variables (Table 1).

In all cases, size, as hypothesized, provides a greater contribution to explanation of variation in aggregate value accuracy than does compactness. While compactness demonstrates a definite influence, that influence is largely redundant with that of size. Compactness, therefore, provides a statistically significant, but minor contribution to the explanation of aggregate value accuracy.

To examine the relative influence of all three factors on accuracy, multiple regression was conducted between the coefficients of variation for each unit on each surface (the accuracy measure) and the measures of unit size, unit compactness, and distribution variability. Multiple regression results indicate that all three factors explain a statistically significant proportion of variation in unit value accuracy (Table 2). Data distribution variability, as measured by spatial autocorrelation, provides the greatest contribution. As expected, with increasing variation at a frequency corresponding to enumeration unit size, there is a decrease in aggregate value accuracy.
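For readers unfamiliar with the technique, a multiple regression of this form can be sketched in Python with ordinary least squares solved through the normal equations. All predictor and accuracy values below are hypothetical and are not the study's data; the sketch shows only the mechanics of fitting and of computing r square.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(rows, y, coef):
    """Proportion of variation in y explained by the fitted equation."""
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical predictors per unit: (relative size, compactness, autocorrelation)
predictors = [(0.5, 0.9, 0.8), (1.0, 0.7, 0.6), (1.5, 0.8, 0.4),
              (2.0, 0.5, 0.7), (3.0, 0.6, 0.3), (4.0, 0.4, 0.5)]
# Hypothetical accuracy measure (coefficient of variation) per unit
cv = [0.05, 0.09, 0.14, 0.13, 0.22, 0.24]

coef = fit_ols(predictors, cv)
fit = r_squared(predictors, cv, coef)
```

The change in r square reported for each variable corresponds to refitting with one more predictor and taking the difference between successive `r_squared` values.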

Experimental Design: Stage Two For this rather limited example, the three variables examined account for nearly all variability in accuracy of derived unit values. This suggests that the quality of a geographic data base and choropleth maps derived from it can be estimated from characteristics of the data base itself. A second experiment was, therefore, conducted to demonstrate how this might be accomplished.


Table 1 CORRELATION OF ENUMERATION UNIT ACCURACY WITH SIZE AND COMPACTNESS OF UNITS

Surface            A        B        C        D
Size (r)          -0.921   -0.9?6   -0.93?    0.92?
Compactness (r)    0.450   -0.402   -0.412   -0.4?8
Multiple r         0.96?    0.969    0.971    0.976
r square           0.92?    0.940    0.942    0.952

Table 2 MULTIPLE REGRESSION OF ACCURACY WITH SIZE, COMPACTNESS, AND SURFACE VARIABILITY

Variables           Multiple r   r square   r square change   simple r
Surface variation   0.76         0.58       0.58               0.76
Size                0.94         0.89       0.32               0.??
Compactness         0.97         0.93       0.04              -0.20

As identified previously, if we assume that initial data collection provides accurate information, there are two procedures inherent in choropleth map generation that act to generalize this original information and, thus, decrease the accuracy with which it is portrayed. These factors, aggregation of values to units and classification of unit values, are really two levels of one procedure - data aggregation.

In the case of data classification the original unit values from which the classification is derived are available. As cited previously, it is a simple matter to calculate a statistical measure of accuracy for classified values. At least one classification system, that developed by Jenks (1977), actually bases class breaks on such a calculation.

For aggregation of original values to enumeration units, however, values upon which aggregate values are based are often not readily accessible. In the case of the smallest census units, they are not available at all. Results outlined thus far, therefore, might be used to estimate or predict aggregate value accuracy. Such a prediction can provide a means for evaluating, not only the overall accuracy of a choropleth map produced, but the geographical distribution of any error that is present.

To evaluate the potential of such a procedure, a typical set of contiguous enumeration units for which a choropleth map might be generated was selected - the counties of Oregon. This example was chosen because it has a fairly large range of unit size and shape. Oregon's county boundaries were superimposed on each of the four distributions used in the initial experiment (and for which surface variability measures were already available). Within each county, an aggregate value was calculated as before. The coefficient of variation of values around this mean or aggregate value again served as the measure of unit value error.
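The unit error measure just described can be sketched in a few lines. The sketch assumes that each unit is represented by the list of grid cells falling inside it and that the population form of the standard deviation is used; neither detail is stated in the text. The function name and the small surface are hypothetical.

```python
from statistics import mean, pstdev


def unit_error(surface, unit_cells):
    """Return (aggregate value, coefficient of variation) for one unit.

    surface    -- grid of z-values indexed as surface[row][col]
    unit_cells -- (row, col) grid cells that fall inside the unit
    """
    values = [surface[row][col] for row, col in unit_cells]
    aggregate = mean(values)
    # Coefficient of variation: spread of cell values relative to the
    # aggregate (mean) value assigned to the whole unit.
    return aggregate, pstdev(values) / aggregate


# Hypothetical 2 x 3 surface and two hypothetical units.
surface = [[8, 12, 20],
           [8, 12, 20]]
homogeneous_unit = [(0, 0), (1, 0)]
mixed_unit = [(0, 0), (0, 1), (1, 0), (1, 1)]

homogeneous_result = unit_error(surface, homogeneous_unit)
mixed_result = unit_error(surface, mixed_unit)
```

A unit whose cells all share one value has a coefficient of variation of zero, which is the situation in which the aggregate value represents the unit without error.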

After calculating values for size and shape of each county, a predicted error value was calculated for each county. This value was based on parameters of the regression equation derived in experiment one. For each distribution, measured and predicted error were compared both cartographically and statistically.
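The prediction and comparison steps can be sketched as follows. The regression parameters and unit measures below are placeholders, not the values derived in experiment one, and all function names are hypothetical. The statistical comparison is illustrated with a product-moment correlation and the Wilcoxon signed-rank statistic for paired samples; the sketch returns only the rank-sum statistic and leaves the significance lookup to a table or a statistics package.

```python
from math import sqrt


def predict_unit_error(intercept, coefficients, unit_measures):
    """Apply a fitted linear regression equation to one unit's measures."""
    return intercept + sum(b * x for b, x in zip(coefficients, unit_measures))


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def wilcoxon_signed_rank_statistic(measured, predicted):
    """Smaller of the positive and negative signed-rank sums.

    Zero differences are dropped and tied absolute differences share
    the mean of the ranks they occupy.
    """
    diffs = [m - p for m, p in zip(measured, predicted) if m != p]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        shared_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(positive, negative)


# Placeholder regression parameters: intercept, then (size, compactness).
intercept, coefficients = 0.002, (0.0001, -0.004)
# Hypothetical (size, compactness) measures and measured errors for five units.
units = [(20.0, 0.9), (45.0, 0.7), (80.0, 0.5), (30.0, 0.8), (60.0, 0.6)]
measured = [0.0005, 0.0040, 0.0075, 0.0020, 0.0052]

predicted = [predict_unit_error(intercept, coefficients, u) for u in units]
unit_correlation = pearson_r(measured, predicted)
rank_statistic = wilcoxon_signed_rank_statistic(measured, predicted)
```

Mapping `measured`, `predicted`, and their difference unit by unit gives the cartographic side of the comparison.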


FIGURE 10. Statistical and cartographic comparison of measured and predicted error for individual enumeration units - Surface A. Panels: measured error in aggregate values; predicted error in aggregate values; difference between measured and predicted error. Map legend: coefficient of variation.

Analysis: Stage Two As expected, surface A, the simplest distribution, results in the least error for derived values. A mean coefficient of variation of 0.0091 was predicted. The actual mean value proved to be slightly less (0.0084). A Wilcoxon test for paired samples of the difference between means indicates that this difference was not significant at the 0.05 confidence level. Comparison between measured and predicted error on a unit by unit basis also indicates little difference, with a correlation of 0.88 (Figure 10). For several counties, there is virtually no difference between predicted and measured error. The larger differences occur in the larger counties in the southeast, as would be anticipated.

Error for surface B is somewhat greater than for surface A. The discrepancy between predicted and actual error is slightly larger as well, with mean values of 0.0183 and 0.0165 respectively. The Wilcoxon test, however, indicates no significant difference between these values. Although overall error differs little from the prediction, estimation of error on a unit by unit basis was less successful (Figure 11). A few large discrepancies in the larger counties account for a correlation of 0.68. These differences are in regions where the distribution is considerably less variable than is true of the overall distribution. Aggregate values in these areas, therefore, are quite accurate, while size and shape of units lead to predictions of error that are higher than that found.

FIGURE 11. Statistical and cartographic comparison of measured and predicted error for individual enumeration units - Surface B. Panels: measured error in aggregate values; predicted error in aggregate values; difference between measured and predicted error. Map legend: coefficient of variation.

Surface C proved to be the most variable at a frequency corresponding to unit size. Error of derived values, therefore, is considerably higher than for the other surfaces. In this case, predicted mean error was significantly higher than that actually measured, with values of 0.02?? and 0.0171 respectively. Again there is a positive correspondence between measured and predicted error for specific map units (r = 0.64) (Figure 12). Surface C varies more consistently from place to place than does surface B. As a result, error is distributed across the map more evenly. Largest errors are distributed along the peaks or ridges of the distribution.


FIGURE 12. Statistical and cartographic comparison of measured and predicted error for individual enumeration units - Surface C. Panels: measured error in aggregate values; predicted error in aggregate values; difference between measured and predicted error. Map legend: coefficient of variation.

Surface D, in spite of its visual appearance, was only the second most variable at the frequency measured. As a result, derived values are somewhat more accurate than for surface C, with a coefficient of variation of 0.0136. This corresponds closely with the predicted value of 0.0144 and the difference is not statistically significant. In this case, the correlation of measured with predicted values is somewhat stronger than for surfaces B and C, with a value of 0.72 (Figure 13). As with surface B, the discrepancy between measured and predicted error is concentrated in four or five counties.

In general, results indicate that measures of unit size, unit compactness, and distribution variability can be used to predict overall choropleth map error. The error of derived values for specific enumeration units and consequently the spatial distribution of error on choropleth maps were somewhat less successfully predicted. It appears that factors other than those tested are involved as well. Orientation of the units to the distribution is one potential factor that could explain discrepancies between measured and predicted error for individual units that have resulted here.

FIGURE 13. Statistical and cartographic comparison of measured and predicted error for individual enumeration units - Surface D. Panels: measured error in aggregate values; predicted error in aggregate values; difference between measured and predicted error. Map legend: coefficient of variation.

In the initial experiment, orientation was controlled by using the average coefficient of variation for thirty placements of each unit on a surface. For an actual map, of course, there will be only one orientation of a unit to the distribution mapped. If the unit is non-compact, it will be longer in one direction than another. Its orientation in relation to the steepest slopes of the distribution mapped could be a significant factor beyond the general effect of compactness tested here. If the longest axis of a unit was oriented in the direction of the distribution's steepest slope, variability of values within the unit would be greater than if the long axis was oriented perpendicular to that slope.
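The orientation effect can be illustrated with a deliberately simple case. The sketch below assumes a hypothetical planar distribution whose steepest slope runs along the x axis and compares the same elongated 12 by 2 cell unit in two orientations. The surface, its parameters, and the function names are invented for illustration and are not taken from the experiment.

```python
from statistics import mean, pstdev


def surface(x, y, base=100.0, slope=2.0):
    """Hypothetical planar distribution; z rises along x and is constant along y."""
    return base + slope * x


def unit_cv(x_cells, y_cells):
    """Coefficient of variation of surface values over a rectangular unit."""
    values = [surface(x, y) for x in x_cells for y in y_cells]
    return pstdev(values) / mean(values)


# Long axis parallel to the steepest slope: 12 cells along x, 2 along y.
cv_along_slope = unit_cv(range(12), range(2))
# Long axis perpendicular to the steepest slope: 2 cells along x, 12 along y.
cv_across_slope = unit_cv(range(2), range(12))
```

Both units have identical size and compactness, so a regression on those two measures would predict the same error for each, yet the unit aligned with the slope spans a much wider range of values.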

Findings here suggest that orientation of non-compact units has a significant influence on accuracy of aggregate unit values. Examination of the four 'difference' maps (Figures 10-13) reveals that many of the larger discrepancies between measured and predicted error are in elongated counties. Comparison of the principal trend of a distribution with a unit's longest axis might be one method by which this factor could be taken into account.

DISCUSSION

The focus here has been the relative influence of enumeration unit size, enumeration unit compactness, and data distribution variability on accuracy of derived values for units represented on choropleth maps. All three factors have been shown to make a significant contribution to explaining variability of enumeration unit value accuracy. Overall, the three factors provide an adequate, if not complete, explanation of this variation.

In the present example, data characteristics have exhibited a greater influence on unit value accuracy than have unit size or compactness. In actual applications, however, the influence of distribution variability will be a function of the extent to which the distribution mapped actually meets the choropleth assumption of homogeneity within units. Geographic phenomena such as population density are likely to vary in a relatively continuous fashion from place to place. For such distributions, results found here will be applicable.

At the other extreme are those phenomena that actually satisfy the choropleth assumption of homogeneity within units. An example would be residential property tax rates per thousand dollars valuation. These rates are generally constant throughout individual counties. No matter how much variation there was in the overall distribution across a state, therefore, all unit values would be 100% accurate. In this case the only aspect of cartographic method influencing map accuracy would be the nature of data classification procedures used.

An ultimate goal of research initiated here is to develop procedures for production of reliability maps that can illustrate error potential in maps they represent. This is not a new idea. It was suggested at least as early as the 1940s by John K. Wright (1942). In his case, concern was not with thematic maps, but with reference maps for which sources for different sections varied in reliability. He suggested that relative reliability diagrams should accompany all such maps. Unfortunately, few if any have followed this suggestion.

If the concept of reliability maps is to be carried to thematic maps as indicated here, a single measure of distribution variability such as spatial autocorrelation will not be sufficient. An aspect of variability that obviously requires further investigation is the extent to which specific kinds of geographic information collected for enumeration units represent discrete versus continuous distributions. It may be possible, for example, to categorize geographic phenomena on this basis. Such a categorization would allow estimates, prior to map construction, of potential accuracy with which a choropleth map could represent a specific phenomenon. Relative accuracy of portraying the data with a choropleth map versus an isopleth map could also be estimated before a symbolization choice was made. Any procedure for generating potential reliability maps, therefore, will depend upon a categorization of the extent to which different potential map topics are discretely versus continuously distributed.


REFERENCES

BACHI, R. 1973. Geostatistical analysis of territories. Bulletin: International Statistical Institute (Proceedings of the 39th Session), 45, 1, 121-131.

BLAIR, D.J. and T.H. BISS. 1967. Measurement of shape in geography: an appraisal of methods and techniques. Bulletin of Quantitative Data for Geographers, No. 11.

CARSTENSEN, L.W. 1982. A continuous shading scheme for two-variable mapping. Cartographica, 19, 33-50.

CLIFF, A.D. and J.K. ORD. 1973. Spatial autocorrelation. London: Pion Limited.

COULSON, M.R.C. 1978. Potential for variation: a concept for measuring the significance of variation in size and shape of areal units. Geografiska Annaler, 60B, 48-64.

GATRELL, A.C. 1977. Complexity and redundancy in binary maps. Geographical Analysis, 9, 29-41.

HSU, M.L. and A.H. ROBINSON. 1970. Fidelity of isopleth maps. Minneapolis: University of Minnesota Press.

JENKS, G.F. and F.C. CASPALL. 1971. Error on choropleth maps: definition, measurement, reduction. Annals of the Association of American Geographers, 61, 217-244.

MACEACHREN, ALAN M. 1985. Compactness of geographic shape: comparison and evaluation of measures. Geografiska Annaler, 67B (in press).

MARCHAND, B. 1975. On the information content of regional maps: the concept of geographical redundancy. Economic Geography, 51, 117-27.

MONMONIER, MARK S. 1975. Class intervals to enhance the visual correlation of choropleth maps. Canadian Cartographer, 12, 2, 161-178.

MULLER, J.C. 1979. Perception of continuously shaded maps. Annals of the Association of American Geographers, 69, 240-49.

OLSON, J. 1975. Autocorrelation and visual map complexity. Annals of the Association of American Geographers, 65, 189-204.

PETERSON, M.T. 1979. An evaluation of unclassed crossed-line choropleth mapping. The American Cartographer, 6, 21-37.

SAMPSON, R.J. 1975. Surface II graphics system. Lawrence: Kansas Geological Survey.

TOBLER, W. 1973. Choropleth maps without class intervals. Geographical Analysis, 5, 262-265.

WRIGHT, J.K. 1942. Map makers are human: comments on the subjective in maps. The Geographical Review, 32, 527-544.

RÉSUMÉ Accuracy of thematic maps is defined as having two components: planimetric accuracy and accuracy of data representation. The first component is of little importance for thematic maps. Accuracy of data representation, by contrast, is at least as significant to the value of the map as the perceptual and cognitive aspects of map reading to which so much attention has been given in recent years. The study concentrates on accuracy of data representation in the specific case of choropleth maps. It is demonstrated that three factors, the size of the enumeration unit, the compactness of that unit, and the variability of the distributions mapped, are significant in determining the accuracy of the aggregate value of the enumeration unit, and therefore of the map itself. On the basis of experimental data, it is suggested that the potential accuracy of choropleth maps could be predicted by measuring these characteristics. A second experiment demonstrates that this method could serve to predict the overall accuracy of choropleth maps as well as the geographic distribution of any error present on the map.

ZUSAMMENFASSUNG The accuracy of thematic maps is determined by two components: planimetric accuracy and accuracy of data representation. The former is of only minor importance for thematic maps. Accuracy of data representation, however, is at least as significant to the effectiveness of the map as the perceptual and cognitive aspects of map reading that have received so much attention in recent years. The focus of the present report is on accuracy of data representation, using the specific example of choropleth maps. It is demonstrated that three factors, namely the size of the enumeration unit, the compactness of the enumeration unit, and the variability of the distributions mapped, are significant in determining the accuracy of aggregate enumeration unit values and therefore map accuracy. The author notes, on the basis of experimental results, that the potential accuracy of choropleth maps can be determined by measuring these characteristics. The author uses a further experiment to demonstrate the applicability of this method to determining the overall accuracy of choropleth maps as well as the geographic distribution of any error present in the map.

RESUMEN Accuracy of thematic maps is identified as having two components: planimetric accuracy and accuracy of data representation. The first is of only minor importance for thematic maps. Accuracy of data representation, however, is at least as important to the effectiveness of a map as the perceptual and cognitive aspects of map reading that have received so much attention in recent years. The study presented concentrates on accuracy of data representation, considering the specific case of choropleth maps. It is demonstrated that three factors, the size of the enumeration unit, the compactness of the enumeration unit, and the variability of the distributions shown, are significant in determining the accuracy of the aggregate value of the enumeration unit, and therefore the accuracy of the map. On the basis of experimental results, it is suggested that the potential accuracy of choropleth maps could be predicted by measuring these characteristics. A second experiment is used to demonstrate the applicability of this method to predicting the accuracy of the choropleth map in general as well as the geographic distribution of any error in the map.