“world population in a grid of spherical quadrilaterals ...kclarke/g232/tobler_1997.pdf · world...

23
World Population in a Grid of Spherical Quadrilaterals Waldo Tobler *, Uwe Deichmann 2 , Jon Gottsegen 1 , and Kelly Maloy 3 1 National Center for Geographic Information and Analysis, Geography Department, University of California, Santa Barbara, CA 93106-4060, USA 2 Statistical Division of the United Nations, DC2-1764, New York, NY 10017, USA 3 ESRI, Redlands, CA 92373, USA ABSTRACT We report on a project that converted subnational population data to a raster of cells on the earth. We note that studies using satellites as collection devices yield results indexed by latitude and longitude. Thus it makes sense to assemble the terrestrial arrangement of people in a compatible manner. This alternative is explored here, using latitude/longitude quadrilaterals as bins for population information. This format also has considerable advantages for analytical studies. Ways of achieving the objective include, among others, simple centroid sorts, interpolation, or gridding of polygons. The results to date of putting world boundary coordinates together with estimates of the number of people are described. The estimated 1994 population of 219 countries, subdivided into 19,032 polygons, has been assigned to over six million five minute by five minute quadrilaterals covering the world. These results are available over the Internet. The grid extends from latitude 57°S to 72°N, and covers 360° of longitude. Just under 31% of the (1548 by 4320) grid cells are populated. The number of people in these countries is estimated to be 5.6 billion, spread over 132 million km 2 of land. Extensions needed include continuous updating, additional social variables, improved interpolation methods, correlation with global change studies, and more detailed information for some parts of the world. © 1997 John Wiley & Sons, Ltd. Received 10 August 1996; revised 30 November 1996; accepted 16 December 1996 Int. J. Popul. Geogr. 3, 203–225 (1997) No. of Figures: 5 No. of Tables: 1 No. of Refs: 58 Keywords: world population; raster; five- minute quadrilaterals INTRODUCTION I nformation on the world’s population is usually provided on a national basis. But we know that countries are ephemeral phenomena, and administrative partitionings of a country are irrelevant to much scientific work. As an alternative scheme one might consider ecological zones rather than nation states, yet there is no agreement as to what these zones should be. By way of contrast global environ- mental studies using satellites as collection devices yield results indexed by latitude and longitude. Thus it makes sense to assemble information on the terrestrial arrangement of people in a compatible manner. A recent pilot study demonstrated some practical advantages of gridded population data (Clarke and Rhind, 1992), including reporting the potential impact of sea level rise on inhabitants of the coastal region of a Scandinavian country. The project described here extends the compilation to much of the entire Earth, using latitude/longitude quad- rilaterals as bins for population information. In addition to its compatibility with environmental information this data format has considerable advantages for analytical studies, and the data * Correspondence to: W. Tobler. E-mail: [email protected] Contract/grant sponsor: California Space Institute. Contract/grant sponsor: CIESIN through NASA grant; Con- tract/grant number: NAGW-2901. Contract/grant sponsor: ESRI. Contract/grant sponsor: NCGIA. INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) CCC 1077–3509/97/030203–23 $17.50 © 1997 John Wiley & Sons, Ltd.

Upload: dinhcong

Post on 19-Jun-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

World Population in a Grid ofSpherical QuadrilateralsWaldo Tobler†*, Uwe Deichmann2, Jon Gottsegen1, and Kelly Maloy3

1 National Center for Geographic Information and Analysis, Geography Department,University of California, Santa Barbara, CA 93106-4060, USA2 Statistical Division of the United Nations, DC2-1764, New York, NY 10017, USA3 ESRI, Redlands, CA 92373, USA

ABSTRACT

We report on a project that convertedsubnational population data to a raster of cellson the earth. We note that studies usingsatellites as collection devices yield resultsindexed by latitude and longitude. Thus itmakes sense to assemble the terrestrialarrangement of people in a compatiblemanner. This alternative is explored here,using latitude/longitude quadrilaterals as binsfor population information. This format alsohas considerable advantages for analyticalstudies. Ways of achieving the objectiveinclude, among others, simple centroid sorts,interpolation, or gridding of polygons. Theresults to date of putting world boundarycoordinates together with estimates of thenumber of people are described. Theestimated 1994 population of 219 countries,subdivided into 19,032 polygons, has beenassigned to over six million five minute by fiveminute quadrilaterals covering the world.These results are available over the Internet.The grid extends from latitude 57°S to 72°N,and covers 360° of longitude. Just under 31%of the (1548 by 4320) grid cells are populated.The number of people in these countries isestimated to be 5.6 billion, spread over 132million km2 of land. Extensions neededinclude continuous updating, additional socialvariables, improved interpolation methods,

correlation with global change studies, andmore detailed information for some parts ofthe world. © 1997 John Wiley & Sons, Ltd.

Received 10 August 1996; revised 30 November 1996; accepted16 December 1996Int. J. Popul. Geogr. 3, 203–225 (1997)No. of Figures: 5 No. of Tables: 1 No. of Refs: 58

Keywords: world population; raster; five-minute quadrilaterals

INTRODUCTION

Information on the world’s population isusually provided on a national basis. Butwe know that countries are ephemeral

phenomena, and administrative partitionings ofa country are irrelevant to much scientific work.As an alternative scheme one might considerecological zones rather than nation states, yetthere is no agreement as to what these zonesshould be. By way of contrast global environ-mental studies using satellites as collectiondevices yield results indexed by latitude andlongitude. Thus it makes sense to assembleinformation on the terrestrial arrangement ofpeople in a compatible manner. A recent pilotstudy demonstrated some practical advantagesof gridded population data (Clarke and Rhind,1992), including reporting the potential impact ofsea level rise on inhabitants of the coastal regionof a Scandinavian country. The project describedhere extends the compilation to much of theentire Earth, using latitude/longitude quad-rilaterals as bins for population information. Inaddition to its compatibility with environmentalinformation this data format has considerableadvantages for analytical studies, and the data

* Correspondence to: W. Tobler.E-mail: [email protected]/grant sponsor: California Space Institute.Contract/grant sponsor: CIESIN through NASA grant; Con-tract/grant number: NAGW-2901.Contract/grant sponsor: ESRI.Contract/grant sponsor: NCGIA.

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997)

CCC 1077–3509/97/030203–23 $17.50 © 1997 John Wiley & Sons, Ltd.

can be converted to alternative formats withrelative ease. The results of the study included anestimate of the population of over 6.5 million fiveminute by five minute quadrilaterals. This infor-mation was obtained by conversion frompopulation censuses and boundary coordinates.The gridded population data are publicly availa-ble on the Internet in machine readable mediafor convenient use1. The complete report can beobtained from the National Center for Geo-graphic Information and Analysis2.

WHY

Many global environmental problems are humaninduced or affect human activities. One reasonfor the small number of studies that incorporatehuman factors in global change modelling is thelack of suitable population related data. Ingeneral terms these data encompass demo-graphic variables (population numbers,components and dynamics), spatial arrange-ment, social factors (such as attitudes, education,income, health, etc.), and information relating tothe economic activities of people. On the nationallevel, many of these variables are collected andcompiled on a regular basis by various agenciesand institutions (e.g. the World Bank, WorldResources Institute, United Nations, etc.), andmacro-level economic and social analysis hasbeen made easier by the increasing availability ofthese data in digital form either as a genericdatabase or in a geographical information sys-tem (GIS) format.

On the sub-national level, however, efforts atcompiling global socio-economic data have beenscarce. Indeed, to date no consistent worldwidedata-sets of socio-economic indicators for sub-national administrative units exist, other thanthe one we report on here. Partly this is due tothe fact that data collection is rarely coordinatedbetween countries. While great efforts have beenmade for national level data – for example, thenational systems of product and incomeaccounts designed by the UN Industrial Devel-opment Organization (UNIDO), or thestandardized medical reporting encouraged bythe World Health Organization (WHO) – thereare no generic reporting schemes for the sameinformation on a sub-national scale. A secondfactor is that the collection of socio-economicvariables is a process which is often far more

tedious than the collection of physical data. Forinstance, large scale monitoring techniques suchas remote sensing have no equivalent in thehuman domain. Data collection therefore has torely on expensive and time-consuming censuses(for a comprehensive overview of these prob-lems, see Clarke and Rhind, 1992). Yet the rangeof potential applications for global socio-eco-nomic data is clearly very wide. The lack ofconsistent population related information wasseen as one of the major impediments forstrategic planning and targeting of developmentstrategies for large areas at the recent meeting ofthe Coordinative Group for International Agri-cultural Research (CGIAR) and Global ResourceInformation Database of the UN EnvironmentProgramme (UNEP/GRID) on digital datarequirements in the agricultural research com-munity (Martin, 1992). socio-economic data on aregional or a continental level are used by theCGIAR centres for the selection of representativesurvey sample sites, for the generalization ofrelationships determined in micro level analysisto larger areas, and for the forecasting of futuretarget areas. These efforts are all based on therelationship of population pressure to foodavailability. Further applications can be antici-pated in global-level migration studies, largescale epidemiological research (e.g. Thomas,1990, 1992), or global economic modelling (Hick-man, 1983).

Many of the human impacts on the naturalenvironment are directly related to the numberand arrangement of people and the associatedindustrial metabolism. The project reported onhere has attempted to address a small part of thisproblem by collecting population totals at amodest level of resolution, but for the entireEarth. There are two primary advantages tohaving global demographic data referenced tolatitude and longitude. The first of these is theanalytical capability which it provides. Whengiven in this form, geographic observations arerendered susceptible to the same kinds of objec-tive analyses as are performed on time seriesdata, as pointed out by the Swedish geographerHagerstrand in 1955. Viewing spatial data as akind of (two-dimensional) regional series, wemay undertake a variety of analyses. For exam-ple, population pressure might be definedanalytically as the absolute value of the gradientof population density, and this is easily com-

W. Tobler et al.204

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

puted from gridded data (Muehrcke, 1966). Thusanalytical and simulation models of severaltypes are more conveniently formulated whenone works with cellular geography.

The second advantage is the ability to relatedemographic data to other global data. The focusof global change studies is often to link environ-mental data to socio-economic variables for thestudy of human induced degradation processes.A complement to this is the study of the socialimpact of environmental degradation. In manyinstances, such factors (e.g. vegetation cover orclimatic variables) are stored in gridded form.Global monitoring systems using satellite sen-sors report data by pixels – small, usually square,pieces of the terrestrial surface. Population dataare typically available only for political units,generally countries and their political subdivi-sions. There is thus a mismatch between the twospatial data collection and reporting schemes.The higher the resolution of the satellite system,the greater the mismatch. This discrepancy ren-ders more difficult the inclusion of populationinformation into global modelling efforts. Minoradditional advantage accrues to spherical quad-rilaterals because these are essentially invariantcollection boxes. They have a permanence thatdoes not apply to political units, since these arelikely to change. A corresponding disadvantageis that political action is generally only availableon a nation state or regional basis. There havealso been suggestions that ecological zones aremore appropriate units of study than countriesor other political units. Unfortunately there doesnot seem to be agreement on which set ofecological units to use. Such units are also likelyto change over time, although probably not asrapidly as political units. Having the informationin small latitude/longitude boxes allows easyreaggregation to arbitrary partionings of theEarth’s surface, including ecological zones.Watersheds, or a hierarchy of watersheds, havealso been proposed as a useful way of organizinginformation about the Earth.

The idea of using the Earth grid to assembledata is not particularly new. In 1851 Maurypublished a map of whale sightings by fivedegree quadrilaterals. Of course the oceans werenot then partitioned politically, so he had to usethis method. In Sweden, population counts in1855, 1917 and 1965 are known to have beenassembled in 10310 km cells. Since the introduc-

tion of computerized geographical storage andprocessing systems in the 1960s the process hasaccelerated. Many of these systems work on araster or grid basis and much local data is nowcollected in this manner. Minnesota, for example,has an information system which uses thesquare-like Public Land Survey system of Sec-tions, Townships and Ranges for manyobservations. A recent British Census reporteddata by kilometre squares on their national grid(Rhind et al, 1980). The recent Population Atlas ofChina (Chengrui et al, 1987) contains severalmaps with population density shown within1’15" by 1’52.5" quadrilaterals. Japanese socialdata are now sold, for marketing studies, by acommercial firm on the basis of 5003500 metrecells for a portion of the country.

We must also address the question of whichdemographic data. The typical demographerwishes population totals, further broken downby gender, age, occupation, education, and so on.Ideally the number of births, by age of mother,and the number of deaths, by age groups, shouldalso be known. Completing the demographicequation requires knowledge of the number ofimmigrants and of emigrants, optimally also byage group and social, occupational and educa-tional status. In the migration category, refugeesshould also be included. The economist wouldwant information on energy consumption, capi-tal investment, and so on. The political scientistis interested in attitudes. The geographer wouldwant tables of trade or of information exchange.The list seems endless, and all of the informationshould pertain to the same time period. Oureffort represents only a very modest beginningand one of the concerns is the necessity tointroduce mechanisms by which other groups,(international, national, and/or private) can beinduced to continue to maintain and improve thedatabase established. In order that this be done itis necessary to overcome the reluctance of manynations to release information that may beembarrassing or considered of value in potentialconflict, including economic domination. It isimportant to emphasize that the indirect envi-ronmental impacts of population are even moredifficult to assess, with even greater data lacu-nae, than simple population quantities (Stern etal, 1992; National Research Council, 1994). Socio-political data and attitudes may be as importantas socio-economic or demographic information.

World Population Grid 205

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

The United Nations publishes a considerablequantity of statistics, and there are now severalsources of machine-readable world populationdata. The number of countries in the worldranges from 165 to 311, depending on the source,and is partly a matter of definition (Freedman,1993). Besides the United Nations and someother official (i.e. bureaucratic, governmental)public agencies, there are also several private(e.g. atlas publishers) and non-profit (PopulationReference Bureau, etc.) organizations that haveassembled such data. Generally they do this onlyby political unit or subunit, or in some cases bycity. In addition there are several publicationsthat focus on global problems, sponsored by theWorld Resources Institute, the Worldwatch Insti-tute, the World Bank, and several others. Theseprovide tabular data by country, and citesources, including national and internationaldocuments. Recent books include Keyfitz andFlieger (1990), Willett (1988), the WorldResources Institute (1990), the United NationsDevelopment Programme (1990), and Zachariahand Vu (1988). A missing component in virtuallyall of these information assemblages is thegeography, i.e. the precise specification of thecoordinates of the boundaries of the units used.This is where geographers and geographicalinformation and analysis systems can providevaluable assistance. Furthermore, the quality ofthe data given by these suppliers is extremelyheterogeneous and of unknown validity. Internalstrife often renders all census data inadequate.One need only recall what is happening inRwanda, Yugoslavia, the former Soviet Union,Somalia, the Sudan and Angola to see thedifficulty. On occasion, political considerationsbias ‘official’ statistics to reflect favourably on thegroup in authority, or to justify action. At a laterstage in this type of work, a necessary compo-nent should be studies to evaluate the problemof data accuracy. Sub-national data are usuallythe most difficult to obtain at a distance, andoften may be less reliable.

HOW

Several methods exist to make compatible thedisparate data bases. One can aggregate theenvironmental pixel data to the country orregional scale, which has the advantage thatdecisions are often made at this level, but this

also often results in meaningless averages. Con-sider, as an example, one average temperaturereading for the entire United States. Mixing theAlaskan climate with that of Florida hardlymakes sense. Alternatively one can attempt toreallocate the national data to finer levels ofresolution. This is what is described here, andhas involved correspondence with many inter-national and national agencies. We attempted toassemble global population counts (neglectingrelated parameters) and to convert them tospherical quadrilaterals: small latitude and lon-gitude cells. Using one degree as the bins wewould get 1803360 such cells. If only land areasare included, about 16,000 cells would result,each of approximately 7800 km2. This contrastswith about 185 countries, with an average size ofover 700,000 km2. The resolution of these twoschemes, the one degree cells or the countries, isvery coarse for global modelling purposes. Finerdata are generally available for political subdivi-sions of countries, although difficult to obtainoutside of the individual nations. In the US thereare 50 states, 3141 counties, and some 40,000enumeration areas. Data are even available at thecity block level. In France there are data for over32,000 communes, in Switzerland there are 26Cantons and 3029 Gemeinde, India has 337districts, Mexico over 2400 Municipio, Italyrecords information on about 60 million peoplein 8090 municipalities, etc. The resolution alsovaries within different parts of each country. Inthe US the counties increase in size as one moveswestward and across less densely occupiedterritory. The reason for this is a mixture ofhistory, physical terrain, climate and technology.Of course, the resolution used by data collectionagencies throughout the world to some extentreflects their perception of national problems, orpopulation density, and is a type of adaptivegrid. Some partial differential equation solversuse variable resolution grids, and these wouldseem to be possibilities for global modelling too.Forcing everything into a fixed grid carries someobvious disadvantages. But now it is time forsome definitions.

For the purposes of the present study wedefine several levels of political partitioning. Thenation state or country we refer to as the zerolevel. Then comes the first level, comparable inthe US hierarchy to states. Below this is thesecond level, equivalent to US counties. In many

W. Tobler et al.206

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

countries, including the US, there are furtherlevels. We have attempted to obtain data andboundary coordinates to only the second level,but have not been successful for the entire world.The second level for a country such as Switzer-land consists of 3,029 Gemeinde, very nearly thesame number as the 3142 units available for theUS. Switzerland is a country of approximately41,150 km2 to the US at 9,282,840 km2. The samenumber of units at the second level clearly yieldsa different level of spatial resolution in differentparts of the world. We define the mean spatialresolution in kilometres by the following for-mula:

Average resolution = Î Area (km2)No. of observations

(1)

Here the number of observations is the numberof data collection units. Resolution measured inthis manner is clearly the same as the length ofthe side of a square whose area is that of theaverage observation unit. Thus at the secondlevel the Swiss data yield an average resolutionof 3.69 km, whereas for the second level UScounties one gets an average resolution of54.45 km. A useful additional parameter wouldbe the variance of this quantity for each set ofunits for which the resolution is calculated. Forthe sake of comparison the resolution (in metres)of a map is conveniently calculated by dividingthe map scale by 2000. From the Nyquistsampling theorem we know that a spatial patterncan only be detected if it is greater in size thantwice the resolution. This assumes data withouterror. When there is error in the data (is thereever no error?) then patterns or features can onlybe detected if they are several times this size.Using map scale again as a convenient analogy,the optimistically sized pattern that can bedetected on a map is, in metres, the map scaledivided by 1000. Thus on a map at a scale of1:1,000,000 objects greater than 1 km in sizemight be detected. This neglects the fudgingdone by cartographers which confuses the situa-tion.

The ultimate resolution needed for globalpopulation modelling is problem dependent, buta simple calculation shows that having data by1° quadrilaterals of latitude and longitude is theequivalent of working with maps at an average

scale of approximately 1/178,000,000. The meanquadrilateral size is then 7900 km2. A sixty-foldreduction would result in quadrilaterals of onesquare minute of arc, roughly 1.5 km on a side.About 37 million cells of this resolution sizewould capture the world population, taking 3/4of the Earth’s surface as unoccupied water withAntarctica making up an additional 9% of theland area. This is a large amount of data andcorresponds to a global map scale of approx-imately 1/3,000,000. A more ambitious effortwould attempt 20" by 20" quadrilaterals, finerthan the largest scale (1/1,000,000) world mapfor which a digital version now exists (Danko,1992). A map coverage of the entire world at ascale of 1/250,000 would correspond to a demo-graphic database of 3.4 trillion 5" by 5"quadrilaterals, with cells averaging roughly125 m on a side. This is obviously impractical atpresent, and since human populations aremobile, hardly useful in any event. The averagedaily activity space of individuals is dependentupon culture, environment, social, and urban–rural status, but averages more than 15 km inwestern societies. This argues strongly for amacrogeographic approach, along the lines pio-neered by Warntz (1965). In this paradigmpopulations are modulated by their distanceapart, to yield a ‘potential’ field covering theearth. It seems reasonable that population poten-tials based on raw counts of people, or on births,deaths, migrations, or on income weighted orenergy-usage weighted populations, could use-fully be related to global models. This ‘potential’approach is particularly useful when eventsdepend on many neighbouring places, so thatplace-specific data capture only part of animpact. The population potentials (computed ateach place as the sum of geographically dis-counted values) are more easily calculated frompopulation data given by spherical quadrilat-erals. Geodemographic models of the accountingtype (i.e. future population at every geographiclocation equals current population plus birthsless deaths plus immigration less emigration) arealso more easily modelled as systems of partialdifferential equations (Dorigo and Tobler, 1983)when each of these components is available in asystem of regular tessellations rather than bypolitical units. This approach makes use of thefinite difference form of these equations, arelatively easy task on modern high speed

World Population Grid 207

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

computers.At this time it is not possible to specify the

optimum resolution for global demography.Tables and maps in the full report indicate whatwe have achieved. Of course, the resolutionmight have to vary in different parts of the worldand for different purposes. Our method has beento collect world population data by politicalunits from a variety of sources. We have thenattempted to convert the population figures byone of the several possible methods detailedbelow. The basic data are the numbers of people,but the components of change (births, deaths,immigration, emigration), age distribution andenergy consumption are also important; severalof these are already available as estimates bycountry (or subunits). There are many problemswith these data, synchronicity being the mostobvious, and one might wish a time series foreach latitude/longitude cell in order to dospace–time modelling, simulations and anima-tions. Data definition problems are alsowidespread.

Geographical locations can be specified inmany ways. Different names given to the sameplace are referred to as aliases. The processing ofgeographical information often requires conver-sion between these aliases. A few of thegeographic conversions which occur in practiceare given in Table 1. Conversion of informationgiven by political unit to a grid of latitude/longitude cells is only one such conversion task.Eventually one would expect nearly all of theseconversions to be available on demand, using asinput any data source with the output formattailored to the expected usage. It is also desirableto develop analysis methods that are invariantunder alternative locational naming conventions(Tobler, 1990).

A simple method of converting national datato spherical quadrilaterals is to assign the aver-age national density to all of the lat/lon cellscovering the country, with simple prorating forpartial cells. This yields a piecewise continuousfunction for the world, with abrupt jumps at theedges of countries, and a constant value withineach country or unit. This method is clearly notappropriate for large countries, but must be usedfor such aggregates as gross national product(GNP), by definition only applying to the polit-ical unit, and a context variable for otherproperties. An alternative method of approxima-

tion assigns all of the data to the centroid of thecountry (or subunit) polygon. One then simplyadds all of the values for those centroids that fallinto each cell, a rather simple binning procedure.

Table 1. A few examples of geographictransformations.

Point>Point(All aliases for the same place)

Street address < > State plane coordinatesLat/Lon < > UTM coordinates

Digitizer x, y > Lat/LonRectangular x, y < > Polar (distance, direction)

Point > LineHighway Accident < > Highway segment

Point > AreaStreet address > Census tract

Place name > Bingo coordinatesLat/Lon < > Public Land Survey

Point > FieldSpot elevation < > Contour mapDot map < > Density contours

Area > PointCensus tract < > Centroid

Public Land Survey < > Lat/Lon

Area > LineCensus tract > Street names

Political district < > Address list

Area > AreaCensus tract < > School district

Country < > Grid cellPolitical Unit < > Ecologic zone

Area > FieldCounty population < > Contour map

Census data > Choropleth map

Field > PointDensity contours < > Dot map

Elevation contours < > Spot height

Field > AreaContours > Polygon total

Field > FieldContour map < > Gradient map

Contour map < > Smoothed contours

W. Tobler et al.208

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

For this, all that is needed is the centroid of eachcollection unit. This technique was used byHaaland and Heath (1974) with enumerationdistricts for the US and by Tobler (1992) for theentire world using five degrees of latitude andlongitude. In this method there may result somecells with no population, and others with exces-sively large values. A light smoothing of thiscrude assignment helps to redistribute the pop-ulation.

Another alternative is to interpolate from theknown observations to the spherical quadri-laterals. Given the densities at centroids one caninterpolate from these to a lattice or triangula-tion, making the usual assumptions needed ininterpolation from data given at points. This isthe method used by Bracken and Martin (1989,1991) for areas in Great Britain. A better methodis to use smooth pycnophylactic redistribution toreassign populations from one arbitrary set ofregions to another (Tobler, 1975; Rylander, 1986;Park, 1990). This reassignment method has theadvantage of always correctly summing toregional totals, using regional boundaries ratherthan centroids, and taking into account values ingeographically adjacent areas. Coordinates forthe boundaries of political units are more diffi-cult to obtain than coordinates for centroids, butmany such boundaries, at the national level, arenow available in computerizedmap drawingsystems (e.g. Taylor, 1989). The computationalload for this method, involving a finite differenceapproximation to the solution of a partial differ-ential equation, is greater than the summationmethods previously cited, although the accuracyof the resulting interpolation should be better.Since this method has actually been used in thisproject it is summarized in the Appendix. Anappendix in the full report gives details on howto adapt the published version (Tobler, 1979) totake into account the shape of the Earth,assumed spherical. In this approach it is notreasonable to use demographic rates; one mustuse counts (e.g. of people, births, deaths, migra-tion, or diseases, etc). This also holds true for theratio known as population density since, as withall such rates, the value obtained is highlydependent on the spatial extent of the chosendenominator. Rates can later be obtained bydivision of one cellular map (i.e. computer gridarray) by another.

One can also make measurements on popula-

tion density maps, where these are available (cf.Rhind et al, 1980; Liu, 1987). Using this methodrequires digitization of the isopleths followed byinterpolation of the density values from theisopleths to a grid, then the use of a mathemat-ical integration to get the ‘volume’ containedwithin each grid cell, taking into account themap projection used. This is rather like the ‘cutand fill’ computation used by the civil engineerwhen designing road construction. Urban pop-ulations on these maps are often represented bycircles whose area is related to the population,and this can be inverted if class intervals (a formof discretization) have not been used. Populationdensity maps are usually at a small scale, so thatthis technique can only result in a data file at acoarse resolution.

It is also well known that it is possible to makepopulation estimates from aerial photographsand satellite images, although the accuracy ofsuch estimates is disputed (Dureau et al., 1989).The procedures are well established for land useand agricultural crop estimates but also work fornucleated settlements (Tobler, 1975; Lee, 1989).Such imagery can also be used to mask emptydistricts. The method is best suited to highresolution estimates covering small areas, and isnot well suited for complete world coverage in ashort time; no one argues that birth and deathrates, and similar important demographic com-ponents, can be obtained in this manner.

WORLD POPULATION AT MEDIUMRESOLUTION

As a first step towards a global sociopolitical–demographic–economic database indexed bylatitude and longitude, we have produced aconsistent data set of population numbers for theworld. This required the collection of two majorcomponents. The first component was spatialboundary information describing administrativeregions within the countries of the world, fol-lowed by assembly of the correspondingenumerations (or estimates) of the total popula-tion residing in each of these regions. In thisinitial attempt only information on the totalnumber of people was collected. From these twocomponents we have derived a gridded data setof population numbers for the globe at a modestresolution. We refer to this as modest or medium

World Population Grid 209

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

resolution because, as described earlier, finerdata exist for several parts of the world. But weare not concerned here with data at the city blocklevel, and we recognize that simply knowing thegeographic arrangement of the world’s popula-tion is not sufficient to determine its two-wayinteraction with the environment.

Information on 219 countries (or units;included are all major countries, but also smallislands such as Aruba and Guam) has beenobtained from various agencies and institutions.In some cases we have been able to assemblesecond level administrative unit boundaries butnot second level populations. In other cases theconverse situation holds. The exact details are inlarge tables in the full report. The primaryproduct, an approximation to a globally con-tinuous surface of population, requiredoverlaying a grid on the polygons, followed byan assignment of the population to the cells ofthis grid. Essentially the problem is to distributethe published population count for a givenadministrative unit over the grid cells that fallwithin the unit, taking into account the popula-tion values in the adjacent units.

BOUNDARY DATA

The key to producing a spatial demographicdatabase is the geographic boundaries that areused to reference socio-economic information.Virtually every country of the world has insti-tuted an administrative hierarchy that partitionsthe country into units; often an attempt is madeto have more or less homogeneous units, basedon some criteria. The size of these units obvi-ously varies greatly by country. Most first-levelunits in Zaire, for example, are still larger thanthe neighbouring countries of Rwanda andBurundi. For that reason, objective measuressuch as the average size of a unit at a given level(i.e. mean resolution as previously defined) orthe number of people per unit in a countryprovide a basis for cross-national comparison.These measures are generally more appropriatethan the position of the unit in the politicalhierarchy. Details for each country are given inthe full report.

We attempted to obtain second-level bounda-ries, corresponding to counties or districts,

except in the case of very small countries orislands. Owing to the limited time and resourcesavailable, the global demography project reliedmostly on existing spatial boundary datasources. The primary producers of such data arenational or supra-national public data collectingagencies (e.g. US Census Bureau, Eurostat),international institutions (e.g. UN-FAO, UNEP,CGIAR), universities engaged in research activ-ities around the world, and private companiesproviding spatially referenced socio-economicdata, chiefly for marketing applications. For afew countries for which no existing boundarydata could be obtained, administrative bounda-ries were digitized at NCGIA using maps foundin University of California libraries. We haveincluded a detailed list of data sources by regionin the full report.

The global database assembled at NCGIAcontains 19,032 administrative units for some 219countries of the world. The largest number ofunits were available for the US (3142), China(2422), Mexico (2402), and Brazil (2224). For verysmall countries, including some of the islandstates of the Caribbean and the South Pacific, noattempt was made to obtain subnational bounda-ries. Since the boundary information came froma wide range of sources, considerable effort hadto be spent on rendering these data compatible inorder to create a consistent global database. Thistypically involved coordinate transformationand projection change as the first step. Individ-ual coverages for countries or groups ofcountries were merged into one of severalregional coverages. The world map (Fig. 1)shows the boundaries of the administrative unitsused. This map also gives a good indication ofthe accuracy of the final result, since the redis-tribution of the total population within a unitwill be more accurate if the units are very small.

In order to merge the individual regionalcoverages, the international boundaries neededto be matched to avoid sliver polygons orpolygon overlaps caused by differences in datasources. These regional coverages formed thebasis for setting up the population data andrunning the gridding routines. While this wholeprocess is conceptually straightforward, severalproblems had to be dealt with, including thefollowing.

d Many of the boundary data contained no (or

W. Tobler et al.210

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

Figure 1. Map of 19,030 administrative districts. The blackened areas result from very fine units.

World P

opulation Grid

211

INT

ER

NA

TIO

NA

L JO

UR

NA

L O

F POPU

LA

TIO

N G

EO

GR

APH

Y, VO

L. 3, 203–225 (1997)

©1997 John W

iley & Sons, L

td.

only minimal) documentation. Thus informa-tion on the scale and reference of the sourcemaps, the time period to which the boundariesrefer, or the accuracy of the information wasoften unavailable. For a number of boundaries,even the projection of the maps was notknown. Administrative boundaries are rarelyavailable on maps with precise geodetic refer-encing; particularly in developing countries,one needs to work with whatever is available,including ‘cartoon maps’, often page size, orhand-drawn sketch maps in census publica-tions. Such information may be better than noinformation at all, and may indeed reflect thefact that at this level administrative bounda-ries are often not well defined. In cases wherethe boundaries were not spatially referenced,rubber sheeting using standard GIS tools wasrequired to make the coverages compatiblewith the global database. The area of thepolygonal units was obtained by convertingthe latitude and longitude coordinates of theboundary points to a local equal area mapprojection (by Carl Mollweide) with a sub-sequent polygon area computing routine. Thisintroduces an error which is small as long asthe units are small. The calculations used asphere of radius 6,371.007178 km, equal insurface area to the World Geodetic Systemellipsoid of 1984. The maximum differencesbetween the sphere and ellipsoid for regions ofthis size do not exceed 0.5 km2, and amount toless than one percent. Thus the sphericalapproximation is adequate for the presentpurpose. Summing all of the polygon areasyields an estimated 132,306,314 km2 of occu-pied land, 25.94% of the world’s surface area.These areal estimates may differ from pub-lished figures because some countries includeor exclude internal water bodies, or havediffering definitions of how large a lake mustbe to be excluded. Despite these source limita-tions we feel that the database is sufficientlyaccurate for global or regional scale applica-tions, but not for high resolution applicationswhich require more detailed boundary infor-mation.

d The source scale and the resolution at whichadministrative boundaries were digitized var-ied a great deal. Our estimate is that theboundary data sources ranged in scale fromabout 1:25,000 to about 1:5,000,000. This is an

estimate because in many cases scale informa-tion was not indicated on the source maps.

d Due to the limited time available, no effort wasmade to make the international boundariescompatible with ‘standard’ global countrydatabases such as the Digital Chart of theWorld at 1 :1 million scale, or the WorldBoundary Database (WBDII) at 1 :3 millionscale. In practice, two neighbouring countrieswere merged into the same coverage and theresulting sliver polygons were eliminatedusing standard GIS tools. In retrospect this wasa mistake, and in future it would be desirableto standardize the database by replacing allinternational boundaries with a standard dataset. In this manner it would be possible toavoid problems encountered in combiningnations or regions. An international effort maybe required to achieve this. We have ignoredproblems of de jure versus de facto boundariesand of disputed boundaries.

d The spatial resolution of the administrativeunits acquired for different countries variesgreatly. We normally used the most detailedlevel available to us. For a few countries moredetail is available (e.g. block level data for theUS), but the data volume involved thenbecomes impractically large. The average reso-lution by country is included in tables and ona PC diskette in the full report. Also listedthere is the range in census unit sizes for eachcountry, giving an impression of the resolutionranges. Taking into account the total occupiedland area and the number of units, we calcu-late an average resolution for the world landarea of 83.4 km, which is directly comparableto the size of the largest five minute sphericalquadrilaterals used. A five minute quadrangleis aproximately 9.3 km wide at the equatorwith an area of 85.5 km2. At 50° of latitude it is5.95 km wide with an area of 55.3 km2.

d Boundary data for several countries (e.g.,Australia, Canada, Community of Independ-ent States, Mexico, New Zealand) came fromcommercial sources or from public agenciesthat put restrictions on further distribution ofthe data. This boundary information is copy-righted and consequently cannot bedistributed freely. The database documenta-tion lists all sources and contact addresses forthese data sets. Interested parties can contactthese organizations directly.

W. Tobler et al.212

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

POPULATION DATA

For each of the administrative units weattempted to obtain a recent estimate of theresident number of people. For many analyticalpurposes it would be useful to have otherdemographic indicators as well (e.g. populationby age and sex, rural versus urban, and eco-nomic involvement, or time series data ofpopulation figures). The compilation of suchdata, in particular the effort involved in ensuringcross-national definitional compatibility, wasbeyond the scope of the project. In several casesthe agencies from whom population and/orboundary data were obtained do also have thesesorts of additional demographic or economicdata.

Population figures were typically taken fromthe latest available census. A list of censuses bycountry is compiled by the UN Statistical Divi-sion and continually updated. Additionalsources of demographic data are civil registra-tion systems or estimates based on samplesurveys. For this project we used the most recentenumerated or estimated population figurereleased by the national statistical offices, andpublished in official census publications, year-books or gazetteers, that we were able to locategiven our resource limitations.

For most countries, data were available fromthe 1980s or early 1990s. Often there is aconsiderable time lag between a census andpublic dissemination of the data. Thus the mostrecently collected figures may not yet have beenpublished. All limitations associated with pub-lished demographic data, which are well knownin demographic research (e.g. Chapter 3 of UNPopulation Division, 1993), obviously apply toour collection of population figures as well. Themain problem faced in compiling the data relatesto synchronicity. Firstly, census dates vary. In ourdatabase, the reference years range from 1979 to1994. We decided to derive estimates for 1994 foradministrative units using a standard growthmodel based on average annual growth rates bycountry provided by the United Nations Popula-tion Division (1993):

Pt = Pc exp SOt21

i=c

ri

100D (2)

(see Rogers, 1985) where Pt and Pc are the

regional populations in the target year (1994) andin the census year respectively, and ri is theannual percentage growth rate in year i for theparticular country. While this method is admit-tedly simplistic (it assumes uniform growth in allregions of a country), the error introducedshould be small for most countries that have hada recent census. Furthermore, the error is likelyto be minor in relation to the overall uncertaintyassociated with census information. Ideally wewould like to use historical census data to derivegrowth rates that are specific to each admin-istrative unit in order to capture recenturbanization processes, migration, etc. This wasnot possible with our limited resources. We havehowever compared our estimates with the latestUN estimates and the disparity between thesetwo suggests the magnitude of the possibleerrors; as another comparison the 1995 WorldAlmanac lists 1994 estimates of the population ofthe major countries of the world. These numbers,which differ from the two cited lists, wereallegedly obtained from the US Bureau of theCensus – see the full report for details. The total1994 population estimated for our 19,032 units is5,617,519,139 people. This works out to a worldaverage of 42.5 persons per square km2 ofoccupied land.

The second problem related to synchronicityderives from the fact that boundary and popula-tion data often did not come from the samesource. Administrative boundaries change fre-quently, and in some cases, boundaries wereonly available for a previous census, and did notmatch the units for which population figureswere defined. In other cases, the opposite wastrue. In the few cases where a mismatchoccurred, the population data had to be aggre-gated or disaggregated using simpleproportional areal weighting, although morecomplex areal interpolation techniques could beapplied (e.g. Goodchild et al., 1993). Attemptingto compile geographical time series exacerbatesthe problem of boundary consistency and it maybe necessary to use pycnophylactic areal redis-tribution between changed zonations to makethe data compatible over time.

GRIDS

The primary product, an approximation to aglobally continuous surface of population,

World Population Grid 213

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

required the gridding of the polygons followedby an assignment of the population to the cells ofthis grid. Essentially the problem is to convertthe published population count for a givenadministrative unit into a population within thegrid cells that cover the unit. This grid assign-ment is followed by a smoothing to overcomethe jumps between boundaries of the admin-istrative units by taking into account thepopulation values in the adjacent units. Thesmoothing technique used is the smooth pycno-phylactic reallocation described earlier in thispaper, modified to take into account the spher-ical shape of the planet. The routine has beenimplemented in the C programming languageand is executed from within an Arc/Info macro.Thus, Arc/Info is used to do the basic tasks likethe polygon to raster conversion and the estab-lishment of a list of population totals by region.The Arc/Info export format has been used toprepare and deliver the compressed files. Copy-right restrictions do not permit us to release all ofthe actual boundary coordinates.

While the preparation and interpolation of theraster data is conceptually fairly simple, severallogistical problems arose as we implemented thegridding process in Arc/Info’s GRID program.The two major problems – in the sense that theyrequired a great deal of time – were:

(1) very small administrative units disappearedin the polygon to raster process, and werenot assigned to any grid cell. Some of thesehad a large population; and

(2) gaps appeared between regional grids due tosmall differences in the country boundariesobtained from different data sources.

The reason for the omission of small admin-istrative units is attributable to the discretizingeffect which occurs in any polygon to rasterprocedure. It is generally impractical to choose agrid sufficiently fine to avoid this. If more thanone polygon falls into a cell, the value for thepolygon which takes up the largest areal propor-tion of the cell is taken as the value for the entirecell. The remaining polygons are ignored. Allvector polygon to raster programs will have thisproblem. It is important that it is stated whensmall polygons are missed in this manner.

In the present instance the areas of the fiveminute quadrilaterals range from 85 km2 at theequator to 22 km2 at the extreme latitudes of our

data. But Macao is only 9 km2 in size, and inregions with fairly high resolution adminis-trative units (e.g. some of North America andespecially Central America), several census unitsare under 20 km2 in area. These units do notoccupy enough area in a cell for their value to beincluded in the gridded data. Since many ofthese units have high populations (i.e. the urbandistricts), it was necessary to derive a method forincluding them in the grids.

The first step is to grid the vector data at asmaller cell size. We chose 2.5 minutes for thisprocess, twice the final resolution, because thissize was small enough to include most of theunits omitted at five minutes but still largeenough so as not to be unwieldy. For example, inNorth America, 35 units are omitted at the fiveminute cell size but only 15 units are omitted atthe 2.5 minute cell size. It is then possible tocreate a separate grid containing only the miss-ing units. In these grids each of the unitsoccupied only a part of a cell, but the data aretaken to apply to the entire cell. The populationsof the small units are then added to the maingrid so that all of the population for everyadministrative unit is included. This process isfairly simple using the relational database capa-bilities of the program used. Since the averageresolution of the original polygon units was closeto 85 km2 the 2.5 minute grid size was consideredunrealistically precise and not appropriate forthe final distribution.

Many gaps between regions occurred becausethe administrative boundaries came from a widerange of sources. When we assembled the coun-tries into regions, or received data for an entireregion, we addressed these mismatches. How-ever, when we converted the polygons to gridsfor several regions at once, we had to ensure thatthere were no gaps or slivers when adjacentregions were joined. We identified where theboundaries did not match, and where empty oroverlapping cells occurred. A certain amount ofminor fudging was then necessary in theseproblem areas. The process seems simple but infact required several tedious steps, detailed inthe full report.

SURFACES

In the previous section we described somedifficulties in the gridding process. Here we note

W. Tobler et al.214

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

some difficulties in the reallocation process usedto create the spatially continuous populationsurface. For the smoothed redistribution, thepycnophylactic program performed the inter-polative smoothing at the 2.5 minute resolution;the program then aggregated the resulting pop-ulations to the five minute cell size. The results ofthis pycnophylactic interpolation formed thefinal smoothed product.The difference betweenthe smoothed and unsmoothed surfaces show onthe maps for West Africa (Fig. 2) and southeastAsia (Fig. 3). Several limitations of the sourceinformation are apparent from these images. Theresolution of the administrative units used isobviously dependent on availability. Conse-quently the resolution sometimes variesdramatically between, as well as within, coun-tries; for example see Brazil on the world map.We tried to address this problem by varying thenumber of iterations in the smoothing realloca-tion algorithm, using the average resolution ofthe computational regions and a smoothingconvergence criterion. The number of iterationsvaried from only 30 for Central America to 100for the former Soviet Union and the Middle East,where the units are larger. Within many regionsthe variance in resolution was still too large toovercome this problem completely. In Brazil verylarge units are adjacent to many small ones andno single iteration count was able to yieldentirely satisfactory results. A modification of thealgorithm to be adaptive in the amount ofsmoothing might be more successful. This mightrequire a grid size adapted to the resolution,with a corresponding small change in the com-puter program, followed by reaggregation to auniform grid. A further unanticipated resultappeared in the population reallocation method.When the units are relatively homogeneous inresolution the method works well and is appro-priate. In cases where small units abut largeones, or the population density changes dras-tically from one administrative unit to the next,certain anomalies occur. For example, whereurban districts are located next to rural areas, theobjective of achieving maximum smoothnesswill ‘pull’ nearly all of the population of the ruralregion tightly towards the boundary with theurban region. This is apparent in the maps (e.g.in Sumatra and central India), but also occurredin Australia where it is probably very realistic.The problem seems exacerbated in areas where

the resolution variance is greatest.The smoothing redistribution used to create

the population surface maintains the correctpopulation totals for each polygon and includesthe geographic insight that neighbouring placeshave similar values. But it is rather mechanicaland, as previously noted, also has the curiousconsequence of pulling virtually all of the pop-ulation of a sparsely populated region to theborder adjacent to a densely populated unit,leaving the remainder of the sparsely populatedunit completely devoid of population. Such aneffect may represent the true situation, but itlooks strange on density maps – see, for exam-ple, the region around Hyderabad in centralIndia (Fig. 4). Thus the method used for thespatial disaggregation of the population needs tobe improved. One approach is to obtain furtherguidance from geographic theory: for example,the cited interaction potential of population andthe literature on urban density gradients. Thuswe believe that the reallocation can be greatlyimproved by using additional data on factorsthat influence the population arrangement.These factors would include the location and sizeof towns and cities, major infrastructure (roads,railroads), and natural features (rivers, uninhab-itable areas, protected and wilderness areas,etc.). The feasibility of this approach has beenshown in a study by UNEP/GRID in which apopulation density surface for the African con-tinent was produced and subsequentlypublished in UNEP’s Global Atlas of Desertifica-tion; see Deichmann and Eklundh (1991) for adescription of the methodology. In that project,efficient use was made of the capabilities of GISto integrate heterogeneous data within a con-sistent framework. This approach wassubsequently adopted and modified for produc-ing experimental population maps of the BalticStates at 1 km2 resolution (Sweitzer and Langaas,1994) and for the European continent at 10’resolution (Veldhuizen et al., 1995). To apply thisto the entire world requires overlays of theinfrastructure and physical geography, and thesehave not been used in the present project. Butthis work could now be facilitated by theavailability of a consistent global base map,namely the Digital Chart of the World or thedigital world topography (at 5’ by 5’), both nowavailable on compact discs (CDs). Taking intoaccount secondary information is referred to in

World Population Grid 215

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

Figure 2. Unsmoothed and smoothed population density in West Africa.

W. Tobler et al.216

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

Figure 3. Unsmoothed and smoothed population density in southeast Asia.

World Population Grid 217

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

some interpolation literature as co-Kriging, andis an obvious approach as long as the additionalinformation is available and its relationship tothe primary variable is known. Thus satellitescould be used to delineate areas of sparsepopulation, or sensors capable of recordingvisible emissions at night (used judiciously) may

offer a means of detecting, correcting or calibrat-ing global population concentrations.

RESULTS

The final useful products from the severalprocessing steps include:

Figure 4. Smoothed population in India, obtained by pycnophylatic interpolation. Five minute cell size map using 150 iterationsof smoothing.

W. Tobler et al.218

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

d A piecewise continuous population surface,without any smoothing (i.e. a constant popula-tion within each 5’ by 5’ quadrilateral) takinginto account the Earth’s shape and the con-vergence of the meridians. Cells nearer theequator can have a slightly larger populationbecause of the Earth’s shape. This gives aconstant density to the unit.

d Smoothly reallocated population values, by5’35’ quadrilateral.

d Gridded population density surfaces, obtainedby dividing the value in each cell in thepreceding two files by the spherical area, inkm2. Deflation of the true density values mayoccur in this process since the cells couldcontain ocean or other water bodies as well asoccupied land. No correction for this wasapplied.

d The administrative unit name(s) for each gridcell. This is particularly useful if one wishes toincorporate additional data by subdivisions ofa country and automatically assign the valuesto the quadrilaterals, thus avoiding a newinterpolation. Financial or nutritional informa-tion could thus be assigned to the five minutegrid from sub-national observations.

d The centroid coordinates, in latitude and longi-tude, for all of the 19,032 polygons used. Thisform of the data does not allow recovery of thecopyrighted administrative unit boundarieswith any precision, but is useful for moregeneralized tasks. Included with the coor-dinates are the estimated population andcalculated area for each polygon, and theadministrative unit name(s). Population den-sity at spot locations can therefore becalculated. This is a small data set of 19,032records (928 kbytes) and has been put onto anIBM PC diskette accompanying the originalreport, along with a program to display thepoints on a PC screen. The display can be usedto investigate the resolution of the assembleddata in greater detail.

d Several regional maps are given in the fullreport which further illustrate our results. Themore important result is the digital data whichcan be used for analysis, correlation with otherquadrilaterized data,for analytical modeling,or from which the user can produce alternateoutputs and aggregations. It is easy, for exam-ple, to summarize and cumulate population bylatitude or longitude strip, or to produce

population maps at medium scales.Gridding the entire world by five minutes of

latitude and longitude yields a raster of1803123360312 which is 2160 rows by 4320columns, or 9,331,200 cells. But less than a thirdof the Earth is people-occupied land. Conse-quently we have truncated our grid at 72°N and57°S since there are essentially no permanentresidents at higher latitudes. The resulting grid is1548 rows by 4320 columns of four bytes each(before compression). Approximately 30.9% ofthe 6,687,360 cells contain people. The finalglobal rasters range in size from 35.6 Mb to 46.3Mb before compression. Since a good proportionof the world is unoccupied, compression reducesthe storage volume by over 89%.

CONCLUSION AND EXTENSIONS

The global gridded population informationdescribed here should be considered an explora-tory reconnaissance designed to stimulateinterest worldwide. Now that this preliminarybut basic database has been established it can bewidely circulated and its usefulness determined.It will hopefully lead to follow-up projects aimedat improving and expanding the current effort.Improvements can be made by incorporatingmore up-to-date information as it becomes avail-able, and by filling in areas for which we havenot been able to obtain sufficient detail. Higherresolution data are desirable for several parts ofthe world, particularly the former Soviet Unionwhere we have obtained only oblast data, andseveral countries in South America. A criticalcomponent is a maintenance organization. Onemodel would be to have local authorities orresearchers add data to the 5’ by 5’ cells and tokeep them up to date, using a worldwide systemof social observatories.

We describe here several extensions which wehave considered. These include error estimation,the incorporation of urban populations, exten-sion to include movement modelling, and datacompaction schemes. We do not comment on theneed for correlation with other items of globalconcern such as environmental degradation: etc.this need is rather obvious.

Data-Set Comparisons

Given the difficulties of obtaining populationdata at sub-national levels in many countries of

World Population Grid 219

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

the world, we have made no attempt to estimatethe magnitude of the likely errors. But during thecourse of this study we have discovered a fewalternative collections of demographic datawhich might help alleviate some of these con-cerns. We believe that comparison of these withthose we have obtained might shed light on thereliability of our methodology. In particular, theCenter for International Research in the USBureau of the Census has recently made availa-ble their estimates of population by 10’ by 20’cells for some 20 countries. Our data can easilybe aggregated to this level. Of course theproblem of synchronicity remains. In addition,the Goddard Institute for Space Systems (Mat-thews, 1983) has assembled world population byone degree quadrilaterals. Some enhancementsto this tabulation have been performed by theOak Ridge National Laboratory, and these areavailable, but nearly a decade old. Severalcountries, including the US, have populationdata at a much finer resolution than we haveattempted. Some countries have demographicdata by dwelling unit, recorded in coordinates tothe metre or decametre level! These figures couldbe aggregated to the medium resolution we haveassembled and used as an estimate of the errorsintroduced by our redistribution. A few coun-tries even already have such aggregations andthese can be compared directly with our assem-blage in future work. A useful investigationwould be to examine ways in which dataassembled at such micro levels have been used,and whether this is warranted for larger areas ofthe world.

Urban Populations

In some cases it may be possible to distinguishbetween urban and rural populations, treatingthem separately. When an urban populationcount is available by city and if the latitude andlongitude of the centre of a city is known, then itis possible to make an estimate of the populationat various distances from the centre of the city.Thus the people can be assigned to the cells of alattice around the city, after which the ruralpopulation portion can be allocated separately.This should provide a better estimate of thepopulation arrangement than simply using thecity centroid as a point of large population. Itwould be even better to have the urban area

digitized as an individual polygon; this was infact often the case in our second-level data. Thebad news is that we have been unable to discoverany source of comprehensive worldwide citydemographic information with latitude and lon-gitude coordinates for our use in those caseswhere this would have been helpful. We havefound one file, apparently based on UnitedNations data, pre-dating 1992, of 2763 cities withpopulations and coordinates; the file is on thediskette accompanying the full report. The pop-ulation in these cities sums to 1,334,783,002people. We have not used these data in ourcompilation because much of the same informa-tion is captured in the administrative units. Amore comprehensive file, if it existed, could beused to compare with our assemblage, or even toenhance it. The good news is that there isempirical evidence telling us how this might bedone. It seems that relating the population of acity to the geographic area which it covers can bedone with some degree of confidence. Althoughthe correlations are high this must still be usedwith caution since the functional city and thepolitical city often do not coincide. The land areaestimate pertains to the functional city, butpopulation counts are most often based onpolitical cities.

The two empirical regularities on which thefollowing discussion is based pertain to citysizes, as a function of population, and to thearrangement of people within a city. An ide-alized circular city size can be based on theobservation that the land area of a city can bepredicted with considerable accuracy by know-ing the number of its residents: city area=a (population)b, or taking logarithms, ln area=ln a+b ln P. From this an assumed circular citysize of equivalent area can be computed. Thetotal population density of a city, in persons/km2, can also be estimated from this empiricalformula for city area. We also find that thechange in area, or radius assuming a circular city,due to an increase of one person is dA=a dpb ordR=u dPg, where the new coefficients a, b, u, andg can easily be determined from the aboveformula. It is then seen that the average densityincreases with city size, but that the influence ofan individual on the total area or radius of a citydecreases with increasing city size. Perhaps theindividual’s political influence also varies in thismanner.

W. Tobler et al.220

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

For the US, using P for the total populationand R to denote a city radius in km, the empiricalrelationship is R=0.035 P0.44. In some other cul-tures the scaling coefficient, a, is generallysmaller, resulting in more compact towns, butthe exponent, b, appears stable. The empiricalobservations from which this relation wasinferred cover settlement sizes ranging from 150people to over a million persons with very highdegrees of correlation. A similar rule is used, ininverted form, by some archaeologists to esti-mate the former population of excavation sites.The constant of proportionality, a, may alsochange with time. In the US it appears to havechanged since 1945 when the impact of theautomobile affected suburbanization (Tobler,1975).

The next step is to choose a model of thearrangement of the population around the centreof the city, assumed circular. Once this is donethe population can be reallocated to the geo-graphic area of the city. The density at the centreof the urban area can also be estimated from themodels by setting the distance from the centre tozero. The models suggested include the expo-nential decay model suggested by Clark (1951)and the Gaussian model suggested by others.There is disagreement in the literature as towhich model is most appropriate, and, as mod-els, none can be expected to be perfect. Nor arecities really circular, although the equationspredict the land area and total density quite well.Other models including gamma functions andmore complicated types yielding lower popula-tions in the city centre exist (e.g. Vaughn, 1987);some versions result in different urban densitiesin different azimuths (Medvedkov, 1965; Haynesand Rube, 1973) and with variations over time(Bussiere, 1972; Newling, 1973). To get the totalpopulation numbers it is necessary to convert thedensities to a count, inverting the model equa-tions. Using these models one can assign theurban population to a restricted (circular) subsetof the area and spread the rural population overthe remainder. Adjustments may need to bemade at political boundaries, and overlaying acoastline file would be useful in avoiding theassignment of people to water areas and mightrequire a corresponding adjustment to obtain thetotal land area suggested by the model. In thesecases the theoretical radius might also suggestareas for potential future city growth by infill-

ing.This method of reallocating urban population,

using a kernel function to convolve the data, isquite analogous to the one used in statistics toassign two dimensional densities to scatteredobservations (e.g. Silverman, 1986; Scott, 1992;Wand and Jones, 1995). Examples used to distrib-ute urban populations are available in the workof Honeycutt and Wojcik (1990) for the US, andBracken and Martin (1989) for a region in GreatBritain. The kernel is generally chosen to mimicthe (theoretical) distribution of populationwithin a city using one of the above models.Often the urban centroid coordinates are notavailable, although they may be estimated withsufficient accuracy by examining maps, or bychoosing a dominant city within each region.

Movement Data

The immigration–emigration component ofdemography has not been included in theforegoing discussion. Since most change requiresmovement (of information, of people, of credit,or of material) more stress should be placed onthe global measurement of movement. Mostinternational agency data collections are of statevariables rather than of flow variables, to use thelanguage of systems analysis. That is, they givethe sums of the volume of movement but not theactual path of the movements. Complete from–totables for the world by country (circa 200 by 200countries in size) are very difficult to obtain, inpart because of definitional differences. Andcountries usually do not reconize subdivisions offoreign countries. Using sub-national data onmovement at the same level as the data nowassembled for population would require a tableof 19,032 rows by 19,032 columns, although itwould be quite sparse. Tables of industrial oragricultural commodity trade, when available,can be treated in a manner similar to migrationtables, as can more general communicationinformation. To be consistent with the presentreport the information should be given bylatitude/longitude quadrilaterals (using the6,687,360 five minute lat/lon cells would requirea whopping 4.4731013 two-way table entries). Asmentioned in the introduction, partial differ-ential equation models of movement could thenbe used and, as abstract models, should work forrefugees and for global trade too. Realistically,

World Population Grid 221

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

given the computer and data constraints, mete-orologists today are able to work only with cellsabout half a degree on a side. But the geographicmovement of people or global monetary flow, isas important as wind and weather movements.Getting the data promptly and continuously isthe hard part. Aside from this the details are notdifficult.

Data Compaction

Verified theory is the best data compactionmethod, but for numerical data the classicalmethod used by geophysicists for describingcontinuous variables distributed over the surfaceof the Earth is that of spherical surface har-monics, the spherical equivalent of Fourierseries. This has also been reported for thearrangement of population on the Earth (Tobler,1992). Non-scalar demographic change compo-nents, such as migration, may be represented byvector spherical harmonic fields, and similartechniques may be used for trade data (Raskin,1994). But it is not apparent that the arrangementof population density satisfies the harmonicassumption. Consequently, less restrictive spher-ical wavelets have been investigated (Tobler,unpublished). Spherical splines have also beenused to fit spherical distributions, includingpopulation (Wahba, 1981; Dierckx, 1984; Slater,1993). This remains an area for further develop-ment and research applied to human relations.

ACKNOWLEDGEMENTS

Financial support by the California Space Insti-tute, the Consortium for International EarthScience Information Network (CIESIN) throughNational Aeronautics and Space Administration(NASA) grant NAGW-2901, the EnvironmentalSystems Research Institute (ESRI), and theNational Center for Geographic Information andAnalysis (NCGIA) is gratefully acknowledged.Substantive support has also been received frommany cooperative individuals and from over 18agencies, generally in the form of data. Theopinions, conclusions and recommendations arethose of the authors and do not necessarilyrepresent the views of any of the sponsoringorganizations.

NOTES

(1) The Center for International Earth Science Infor-mation Network (CIESIN) makes thequadrilaterized population data publicly availableat no cost on an ftp site, ftp.ciesin.org, placing thedata in a directory called pub/data. For informa-tion contact CIESIN at the addresses below. TheEnvironmental Systems Research Institute (ESRI)also distributes the data on their ArcWorld 1996CD supplement. CIESIN, ESRI, and Microsofthave also published population density mapsbased on the gridded world population estimate.The NCGIA site [email protected] mayalso be visited for further discussion.

CIESIN: 2250 Pierce Road, University Center, MI48710, US. (517) 797-2700; fax: (517) 797-2622;e-mail: [email protected]

(2) Technical Report TR-95-6, The Global DemographyProject (75 pp+PC diskette), can be obtained for$25 from the National Center for GeographicInformation and Analysis, Geography Depart-ment, University of California, Santa Barbara, CA93106-4060, US. E-mail: [email protected]

APPENDIX

Mass Preserving Reallocation from Areal Data

First define the primary condition for mass preserva-tion. This is the required invertibility condition for anymethod of areal information redistribution:

ERi

E f (x, y) dx dy=Vi for all i

where Vi denotes the value (population in the presentcontext) in region Ri (a sub-national polygon).

Next constrain the resulting surface to be smooth byrequiring neighbouring places to have similar values.This is an assumption about spatial demographicprocesses, a form of geographic insight capturing thenotion that most people are gregarious and con-gregate. Densities in neighbouring areas thereforetend to resemble each other unless there is a physicalbarrier. Thus there should be no abrupt jumpsbetween densities in adjacent administrative units.Laplacian smoothness, the simplest kind, is obtainedby minimizing:

ERE S∂ f 2

∂x+

∂ f 2

∂yD dx dy

where R is the set of all regions. The boundary

W. Tobler et al.222

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

condition generally used is

∂ f∂h

=0

This says that the gradient normal to the edge of theregion is flat; it is one of several possible choices. Moredetail is given in the original paper (Tobler, 1979). Alsosee the appendix to the full report for the sphericalversion. The procedure is illustrated graphically inFig. 5.

REFERENCES

Bracken, I. and Martin, D. (1989) The generation ofspatial population distributions from census cen-

troid data, Environment and Planning A 21: 537–43Bussiere, R. (ed.) (1972) Modeles Mathematiques de

Repartion des Populations Urbaines (Paris: Centre deRecherche d’Urbanisme)

Chengrui, L., et al., (eds) (1987) The Population Atlas ofChina (Hong Kong: Oxford University Press)

Clark, C. (1951) Urban population densities, Journal ofthe Royal Statistical Society A, 114:490–96.

Clarke, J. and Rhind, D. (1992) Population Data andGlobal Environmental Change, International SocialScience

Council, Programme on Human Dimensions of GlobalEnvironmental Change ( Paris: UNESCO)

Danko, D. (1992) The digital chart of the World,GeoInfo Systems, 2:29–36

Deichmann, U. and Eklundh, L. (1991) Global digitaldatasets for land degradation studies: a GIS

Figure 5. Illustration of mass preserving reallocation method for 1990 population of county for Kansas (data courtesy of T.Slocum).

World Population Grid 223

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

approach, GRID Case Studies Series No. 4, GlobalResource Information Database (Nairobi: UnitedNations Environment Programme)

Dierckx, P. (1984) Algorithms for smoothing data onthe sphere with tensor product splines, Computing32:319–42.

Dorigo, G. and Tobler, W. (1983) Push–Pull migrationlaws, Annals of the Association of American Geogra-phers 73:1–17

Dureau, F., Barbary, O. Michel, A. and Lortic, B. (1989)Area Sampling from Satellite Images for Socio-Demo-graphic Surveys in Urban Environments (Paris: InstitutFrancais de Recherche Scientific pour Developpe-ment et Cooperation)

Freedman, P. (1993) Counting countries: what makescountries count? The Geographical Magazine, March:10–13.

Goodchild, M., Anselin, L. and Deichmann, U. (1993)A framework for the areal interpolation of socio-economic data, Environment and Planning A,25:383–97.

Haaland, C. and Heath, J. (1974) Mapping of popula-tion density, Demography, 11:321–36.

Hagerstrand, T. (1955) Statistika Primaruppgifter, flyg-kartering och ‘dataprocessing’-maskiner, Ettkombineringsprojeckt, Svensk Geografisk Arbok(Lund University)

Haynes, K. and Rube, M. (1973) Directional bas inurban population density, Annals of the Association ofAmerican Geographers 63(1):40–47

Hickman, B. (1983) Global International Economic Mod-els: Selected Papers from an IIASA Conference(Amsterdam: North Holland)

Honeycutt, D. and Wojcik, J. (1990) Development of apopulation density surface for the coterminousUnited States, Proceedings GIS/LIS 1:484–96

Keyfitz, N. and Flieger, W. (1990) World PopulationGrowth and Aging (Chicago: Chicago UniversityPress)

Lee, Y. (1989) An allometric analysis of the US urbansystem 1960–1980, Environment and Planning A,21:463–76.

Liu, Y. (1987) National Population Atlas of China, 13thInternational Cartographic Conference Proceedings III:481–92

Martin, D. and Bracken, I. (1991) Techniques formodelling population-related raster databases,Environment and Planning A, 23:1069–75

Martin, G. (1992) Final Report of the CGIAR/NOR-AGRIC/UNEP Meeting on Digital Data Requirementsfor GIS Activities in the CGIAR, September 21–24(Arendal, Norway)

Matthews, E. (1983) Global vegetation and land use:new high-resolution data bases for climate studies,Journal of Climate and Applied Meteorology 22:474–87.

Maury, M. (1851) Whale Chart (Washington, DC:

National Observatory)Medvedkov, Y. (1965) Ekonomgeograficheskaya Uzy-

chennosty Rainov Kapitalisticheskogo Mira,Prilojeniya Matematiki v Ekonomucheskoi Geographii 2(Moscow: Institute for Scientific Information), pp110–120.

Muehrcke, P. (1966) Population Slope Maps, MA thesis(Ann Arbor: University of Michigan, Department ofGeography)

National Research Council (1994) Science Priorities forthe Human Dimensions of Global Change (Washington,DC: National Academy Press)

Newling, B. (1973) Urban growth and spatial struc-ture: mathematical models and empirical evidence,Ekistics 36:291–97.

Park, A. (1990) Smooth Pycnophylactic Areal Interpolationand the Estimation of Missing Data, MSc thesis(Leicester: Leicester University, Department ofGeography)

Raskin, R. (1994) Spatial Analysis on the Sphere: AReview, Technical Report 94–7 (Santa Barbara:NCGIA) 1–44

Rhind, D., Evans, I. and Visvalingam, M. (1980)Making a national atlas of population by computer,Cartographic Journal 17:1–11

Rogers, A. (1985) Regional Population Projection Models(Beverly Hills: Sage)

Rylander, G. (1986) Areal Data Reaggregation, MA thesis(East Lansing: Michigan State University, Depart-ment of Geography)

Scott, D. (1992) Multivariate Density Estimation (NewYork: John Wiley)

Silverman, B. (1986) Density Estimation for Statistics andData Analysis (New York: Chapman & Hall)

Slater, P. (1993) World population distribution, AppliedMathematics and Computation, 53:207–23.

Stern, P., Young, O. and Druckman, D. (eds) (1992)Global Environmental Change: Understanding theHuman Dimension (Washington, DC: National Acad-emy Press)

Sweitzer, J. and Langaas, S. (1994) Modelling PopulationDensity in the Baltic States using the Digital Chart of theWorld and Other Small Scale Data Sets (Stockholm:Beijer International Institute of Ecological Econom-ics)

Taylor, J. (1989) The World Maps Set, diskettes (IndianaUniversity of Pennsylvania: Department of Geog-raphy and Regional Planning)

Thomas, R. (1990) Spatial Epidemiology (London: Pion)Thomas, R. (1992) Geomedical Systems: Intervention and

Control (London: Routledge)Tobler, W. (1975) City Sizes, Morphology, and Inter-

action, Internal Report WP-75-18 (Laxenburg:IIASA)

Tobler, W. (1979) Smooth pycnophylactic interpolationfor geographical regions, J. Am. Stat. Assn.

W. Tobler et al.224

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.

74:519–30Tobler, W. (1990) Frame independent spatial analysis,

in M. Goodchild and S. Gopal (eds) Accuracy ofSpatial Data Bases (London: Taylor and Francis)115–122

Tobler, W. (1992) Preliminary representation of worldpopulation by spherical harmonics, Proceedings ofthe National Academy of Sciences of the United States ofAmerica, 89:6262–64

United Nations Development Programme (1990)Human Development Report 1990 (New York: OxfordUniversity Press)

United Nations Population Division (1993) WorldPopulation Prospects (1992 Revision) (New York:United Nations)

Vaughn, R. (1987) Urban Spatial Traffic Patterns (Lon-don: Pion)

Veldhuizen, J. van, van de Velde, R. and van Woerden,J. (1995) Population mapping to support environ-mental monitoring: some experiences at a European

scale, Proceedings, Sixth European Conference onGeographical Information Systems (Utrecht: EGISFoundation)

Wahba, G. (1981) Spline interpolation and smoothingon the sphere, SIAM Journal of Scientific and Statis-tical Computing. 2:5–16

Wand, M. and Jones, M., (1995) Kernel Smoothing(London: Chapman & Hall)

Warntz, W. (1965) Macrogeography and Income Fronts(Philadelphia: Regional Science Research Institute)89–113

Willett, B. (ed.) (1988) The New Geographical Digest(London: George Philip)

World Almanac and Book of Facts (1995) (Mahwah: Funkand Wagnalls) 840–41

World Resources Institute (1990) World Resources1990–1991 (New York: Oxford University Press)

Zachariah, K. and Vu, M. (1988) World PopulationProjections (Baltimore: World Bank and Johns Hop-kins University Press)

World Population Grid 225

INTERNATIONAL JOURNAL OF POPULATION GEOGRAPHY, VOL. 3, 203–225 (1997) © 1997 John Wiley & Sons, Ltd.