andreas forø tollefsen - national bureau of economic research grid c… · tollefsen, andreas...




PRIO-GRID Codebook1

Andreas Forø Tollefsen

Peace Research Institute Oslo (PRIO)

Version 1.01 Last updated March 15, 2012

Please cite: Tollefsen, Andreas Forø; Håvard Strand & Halvard Buhaug (2012) PRIO-GRID: A unified spatial data structure. Journal of Peace Research 49(2): 363-374.

1 PRIO-GRID is a product of the Advanced Conflict Data Catalogue (ACDC) project, funded by the Research Council of Norway. PRIO-GRID was initiated by Andreas Forø Tollefsen and Halvard Buhaug in 2008 and is maintained by Andreas Forø Tollefsen and administered by ACDC project leader Håvard Strand at PRIO. Halvard Buhaug and Gerdis Wischnath have contributed to the finalizing and testing of the data. We thank numerous colleagues at PRIO, Uppsala University, ETH Zurich, University of Colorado Boulder, Columbia University, and elsewhere for crucial feedback during the development of the data project. We also appreciate the providers of the source data that are integrated into the PRIO-GRID framework. The PRIO-GRID website is: http://www.prio.no/CSCW/Datasets/PRIO-Grid/. Questions and comments should be addressed to Andreas Forø Tollefsen: [email protected].


1. Introduction

This document describes the development and content of the PRIO-GRID dataset – a standardized spatial grid structure with global coverage at a resolution of 0.5 x 0.5 decimal degrees. See Tollefsen, Strand & Buhaug (2012) for additional information on the background, motivation, and application of PRIO-GRID, and the respective source documentation for third-party data that are included in PRIO-GRID.

PRIO-GRID consists of three components. The first is a set of data files (*.dta; *.csv) that contain spatially disaggregated data at the grid cell level. This also includes geographic information systems (GIS) shapefiles containing the polygon grid and the point grid. Many of the data files contain time-series data with one grid representation per calendar year. The second component includes open-source replication scripts that were used to generate the PRIO-GRID data set. These files facilitate replication, modification, and extension of the original files, including joining of additional geo-referenced data. The third component is the documentation consisting of the journal article presenting PRIO-GRID (Tollefsen, Strand & Buhaug 2012) and this codebook.

PRIO-GRID is a versioned dataset, meaning that any changes to the data are released with a new version number. Higher version numbers indicate more recent data. All files, scripts, and documentation should reflect these version changes.

2. The development of PRIO-GRID

PRIO-GRID is developed in a relational database management system (RDBMS): PostgreSQL, with the spatial PostGIS extension supplying the geometric functionality of the Structured Query Language (SQL) database. In addition, the Python scripting language is used to automate the development process. Each Python script connects to the SQL database using the psycopg2 Python module. The combination of Python and PostgreSQL/PostGIS gives the PRIO-GRID project the ability to modify spatial data and read most available spatial data formats. The Python scripts also make it possible to loop through data at different ranges and time dimensions, which is crucial since PRIO-GRID has a temporal component (represented as annual grids). PRIO-GRID is released with a 0.5 x 0.5 decimal degree cell resolution. This corresponds to a cell of roughly 55 x 55 kilometers at the equator (an area of about 3,025 square kilometers). Cell area decreases with higher latitudes.
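To illustrate how cell area shrinks towards the poles, the following minimal sketch computes the total area of a 0.5 x 0.5 degree cell on a sphere. It is not the procedure used to build PRIO-GRID: the spherical approximation, the mean Earth radius of 6,371 km, and the function name cell_area_km2 are assumptions made for illustration. Note also that the released cellarea variable measures land area only, so it will often be smaller than the value returned here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def cell_area_km2(lat_south, size_deg=0.5):
    """Approximate total area (km^2) of a size_deg x size_deg cell.

    lat_south is the latitude of the cell's southern edge in decimal degrees.
    Uses the spherical formula R^2 * dlon * (sin(lat_north) - sin(lat_south)).
    """
    lat1 = math.radians(lat_south)
    lat2 = math.radians(lat_south + size_deg)
    dlon = math.radians(size_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(lat2) - math.sin(lat1))
```

Under these assumptions a cell at the equator comes out slightly above the rounded 55 x 55 km figure, and a cell at 60 degrees north covers roughly half that area.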

The grid structure has its starting point in the south-western corner (90S and 180W) and is represented using the WGS84 geographic reference system. The cell identifier starts at 1 in the south-western corner (column 1 and row 1) and increases by 1 for each column, until reaching 720 (column 720 and row 1). The numbering then continues on the next row, beginning at 721 (column 1 and row 2). The full grid at 0.5 x 0.5 degrees resolution contains 259,200 cells (720 x 360). A majority of these cells cover water and other uninhabited areas (notably the Arctic and Antarctica) and are of little relevance in most applications. To limit file size, the released PRIO-GRID only includes terrestrial grid cells, although the full grid is maintained and is available on request.
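The numbering scheme described above can be sketched in a few lines of Python. The function names are hypothetical and are not taken from the PRIO-GRID replication scripts; the arithmetic follows directly from the row-by-row numbering and the 0.5 degree cell size.

```python
N_COLS = 720      # columns per row (360 degrees / 0.5)
N_ROWS = 360      # rows in the full grid (180 degrees / 0.5)
CELL_DEG = 0.5    # cell size in decimal degrees


def gid_from_col_row(col, row):
    """Cell identifier for a 1-based column and row (row 1 is southernmost)."""
    return (row - 1) * N_COLS + col


def col_row_from_gid(gid):
    """Inverse of gid_from_col_row: returns (col, row), both 1-based."""
    row_index, col_index = divmod(gid - 1, N_COLS)
    return col_index + 1, row_index + 1


def centroid(col, row):
    """Centroid (xcoord, ycoord) of a cell in decimal degrees."""
    x = -180.0 + (col - 0.5) * CELL_DEG
    y = -90.0 + (row - 0.5) * CELL_DEG
    return x, y
```

For example, column 222 in row 69 maps to gid 49182 with centroid (-69.25, -55.75), which agrees with the first row of Table 1.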

The current version of PRIO-GRID consists of one grid per calendar year for the period 1946–2008. Each cell can be assigned to one and only one country in each yearly file. To determine country ownership, PRIO-GRID draws on the cShapes dataset (Weidmann et al. 2008). cShapes is a time-series dataset that contains geographic data on the outline of countries (represented as polygons) since 1946, based on the Gleditsch & Ward (1999) list of independent states. Grid cells that fall completely within the territory of an independent state are assigned the corresponding Gleditsch & Ward country code (gwno). Grid cells that cover the territory of two or more independent states (i.e. the cell intersects with multiple country polygons) are assigned to the country that covers the largest share of the cell’s area (Figure 1). Note that while all terrestrial cells are included in all yearly files, country codes are assigned to cells only in those years in which the host country is a member of the Gleditsch & Ward international system (Figure 2). In case of territorial transfer (e.g., from East Pakistan to Bangladesh in 1971), a cell is given the country code that applies to the status at the end of the year, 31 December.
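The majority rule can be expressed compactly once the overlap between a cell and each country polygon is known. The sketch below assumes those overlap areas have already been computed (for instance in PostGIS) and are passed in as a dictionary; the function name and input layout are illustrative assumptions. The codebook does not say how exact ties are resolved, so the tie behaviour here (first entry wins) is arbitrary.

```python
def assign_country(overlap_km2):
    """Return the gwno code covering the largest share of a cell.

    overlap_km2 maps gwno -> area of the cell covered by that country.
    Returns None when the cell overlaps no independent state in the
    given year, which corresponds to a missing gwno.
    """
    if not overlap_km2:
        return None
    # Ties are not described in the codebook; max() keeps the first maximum.
    return max(overlap_km2, key=overlap_km2.get)
```

A cell split between two countries, with hypothetical overlaps of 1,200 and 900 square kilometers, would be assigned to the first of the two.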

Figure 1. Country code allocation based on majority rule

Figure 2. Grid cells allocated to independent states by year

Note: The graph shows the number of cells with valid country codes. This increases as states become independent according to the Gleditsch & Ward (1999) list of independent states (solid line). However, each yearly grid file contains 64,818 cells (dashed line).


3. Data

This section contains a brief presentation of all variables in the PRIO-GRID files and how they were imported and modified to fit into the PRIO-GRID data structure. Each file, or attribute table, contains the same grid cell identifier (gid), and the temporal data tables also include a year variable. This means that gid + year create a unique identifier in the time-series data whereas gid constitutes the unique identifier in the static files.

3.1. PRIO-GRID

The PRIO-GRID file is the main attribute table in the PRIO-GRID project. This table contains yearly observations of all terrestrial grid cells (excluding Antarctica and Greenland) for all calendar years between 1946 and 2008. In total, the table contains 64,818 cells x 63 years = 4,083,534 observations (cell years). The PRIO-GRID table contains eight variables:

gid is the grid cell identifier, a unique id code for each cell in the grid. Since we only include the terrestrial cells from the full grid, the gid starts at 49182 and ends at 249344.

col denotes column number for the grid cell; column 1 is the westernmost column in the grid, between 180 and 179.5 decimal degrees W. With one column per half degree, there are 720 columns in PRIO-GRID.

row denotes the row number for the grid cell; while row 1 is the southernmost row (between 90 and 89.5 degrees S) and row 360 is the northernmost row in the full grid in the underlying data, the released data file contains observations for cells between row 69 and 347.

xcoord denotes the longitude coordinate (decimal degrees) for the centroid of the grid cell. Negative coordinates are located west of the Prime Meridian (Greenwich) at 0 degrees longitude.

ycoord denotes the latitude coordinate (decimal degrees) for the centroid of the grid cell. Negative coordinates are located south of the equator at 0 degrees latitude.

year gives the calendar year of observation.

gwno denotes the numerical country code for the country to which the cell is allocated, based on the Gleditsch & Ward (1999) system membership list. Missing values imply non-independent territory. The country code reflects the status as of 31 December of each year.

cellarea gives the area of the land territory of the grid cell, given in square kilometers.


Table 1. Attribute table of PRIO-GRID

gid    col  row  xcoord  ycoord  year  gwno  cellarea
49182  222  69   -69.25  -55.75  1946  155   4.132284
49183  223  69   -68.75  -55.75  1946  155   3.950606
49184  224  69   -68.25  -55.75  1946  155   268.3174
49185  225  69   -67.75  -55.75  1946  155   242.042
49186  226  69   -67.25  -55.75  1946  155   310.7517
49898  218  70   -71.25  -55.25  1946  155   3.401011
49899  219  70   -70.75  -55.25  1946  155   346.9918
49900  220  70   -70.25  -55.25  1946  155   179.8895
49901  221  70   -69.75  -55.25  1946  155   806.2632
49902  222  70   -69.25  -55.25  1946  155   1098.728

3.2. Distance

The Distance attribute table includes measures of the location of each cell centroid relative to other entities of interest. It includes seven variables:

gid is the grid cell identifier, a unique id code for each cell in the grid.

year gives the calendar year of observation.

bdist1 gives the distance (in kilometers) from the cell centroid to the border of the nearest contiguous neighboring country (based on the Gleditsch & Ward system membership list). This implies that distances for cells in e.g. Northern Denmark are measured to the border with Germany even if the straight-line distance to Norway (across international waters) is shorter. Cells belonging to island states with no contiguous neighboring country (e.g. New Zealand) are coded as missing.

bdist2 gives the distance (in kilometers) from the cell centroid to the border of the nearest neighboring country, regardless of whether the nearest country is located across international waters. Hence, for cells belonging to island states (e.g. New Zealand), bdist2 gives the shortest distance to the nearest land territory of another state.

bdist3 gives the distance (in kilometers) from the cell centroid to the territorial outline of the country the cell belongs to. For cells located along a coast and for cells of island states (e.g. New Zealand), bdist3 measures the shortest straight-line distance to international waters. By definition, bdist3 can never have higher values than the two other border distance indicators, and for 44 % of the cell years all three border distance estimates are identical.

capdist gives the distance (in kilometers) from the cell centroid to the national capital city in the corresponding country (gwno). Geographical coordinates for the capital cities were derived from the cShapes dataset (Weidmann et al. 2008) and capture changes over time wherever relevant. Figure 3 visualizes these straight-line distances.


ttime gives the estimated cell-average travel time (in minutes) by land transportation from the cell to the nearest major city with more than 50,000 inhabitants. The values are extracted from a global high-resolution raster map of accessibility (Nelson 2008) where ttime reflects the average pixel value within each PRIO-GRID grid cell. Unlike the other distance indicators, ttime does not contain time-varying information.

Figure 3. Cell-capital distance calculations

Note: The map visualizes straight-line distance calculations from the centroid of each grid cell to the capital city (year 2008).
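Users who want to compute comparable straight-line distances for their own points of interest can start from a great-circle calculation such as the sketch below. The codebook does not state which distance algorithm was used to produce capdist and the border distances, so the haversine formula, the spherical Earth radius, and the function name are assumptions; results may differ somewhat from the released values.

```python
import math


def great_circle_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Haversine distance in kilometers between two lon/lat points.

    Coordinates are decimal degrees, e.g. a cell centroid (xcoord, ycoord)
    and the coordinates of a capital city. radius_km is an assumed mean
    Earth radius.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2)
    return 2.0 * radius_km * math.asin(math.sqrt(a))
```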

3.3. Conflict

The Conflict attribute table includes information on the outbreak and incidence of armed intrastate conflict, derived from the UCDP/PRIO Armed Conflict Dataset (Gleditsch et al. 2002; Themnér & Wallensteen 2011) and adapted to PRIO-GRID through two versions of the PRIO Conflict Site dataset (Raleigh et al. 2006; Dittrich Hallberg 2012). In these datasets, all conflicts are assigned circular conflict polygons that encompass all reported locations of fighting in the given calendar year (Figure 4). Only cells assigned to the host country of the armed conflict (indicated by the gwnoloc indicator in the UCDP/PRIO Armed Conflict Dataset) are coded in conflict. See source material for the Armed Conflict Dataset and the Conflict Site codebook for definitions and further information on armed conflict and the geo-referencing of the conflict zones.


Figure 4. Circular conflict polygons represented in PRIO-GRID

Note: The figure shows the circular conflict site polygon for the 1990 conflict in eastern Mali. All cells that intersect the conflict polygon and belong to the conflict-affected country (i.e. Mali; shaded cells) are coded as being in conflict.

The Conflict table contains 13 variables:

gid is the grid cell identifier, a unique id code for each cell in the grid.

year gives the calendar year of observation.

conf is a simple dummy variable that indicates whether the grid cell was located in a conflict zone in the given year. This indicator is based on the updated v.3 of the Conflict Site coding (Dittrich Hallberg 2012) and contains time-varying values for the period 1989–2008.

civconf is a dummy variable indicating whether the conflict coded in conf is of type 3 or type 4, that is, intrastate or internationalized intrastate conflict. See Themnér & Wallensteen (2011) for more information on the type variable.

confold is a similar conflict incidence dummy based on the older v.2 of the Conflict Site coding (see Raleigh et al. 2006), initially developed by Buhaug & Gates (2002). This indicator contains valid values for the period 1946–2005, though users should be aware that the grid-specific incidence coding in many earlier conflicts varies little during the course of conflict due to poor information on the exact location of battle events (meaning that cells that were part of the conflict zone in one or more years are coded in conflict in all years of the conflict).

civcfold is a dummy variable indicating whether the conflict coded in confold is of type 3 or type 4, that is, intrastate or internationalized intrastate conflict. See Themnér & Wallensteen (2011) for more information on the type variable.

onset is a dummy that identifies the grid cell hosting the initial battle location for each intrastate conflict, based on coding from Holtermann (n.d.).

conflag1 is a spatial conflict lag indicator that gives the share of ongoing conflict (conf=1) among contiguous cells in the same country in the given year. This indicator takes on values between 0 and 1. For cells with no contiguous cells in conflict, conflag1 is coded 0 whereas a cell with 3 of 8 contiguous cells in conflict is assigned the value 3/8=0.375.

conflag3 is a spatial conflict lag indicator that gives the share of ongoing conflict among cells within three orders of proximity in the same country in the given year.

conflag5 is a spatial conflict lag indicator that gives the share of ongoing conflict among cells within five orders of proximity in the same country in the given year.

cfoldlag1 is a spatial conflict lag indicator that gives the share of ongoing conflict (confold = 1) among contiguous cells in the same country in the given year. This indicator takes on values between 0 and 1. For cells with no contiguous cells in conflict, cfoldlag1 is coded 0 whereas a cell with 3 of 8 contiguous cells in conflict is assigned the value 3/8=0.375.

cfoldlag3 is a spatial conflict lag indicator that gives the share of ongoing conflict among cells within three orders of proximity in the same country in the given year.

cfoldlag5 is a spatial conflict lag indicator that gives the share of ongoing conflict among cells within five orders of proximity in the same country in the given year.
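The first-order spatial lag described above can be sketched as follows. This is an illustrative reading of the definition, not the original script: the function name, the dictionary inputs keyed by (col, row), and the decision to return None for cells without any same-country neighbor are assumptions. The sketch also ignores wrap-around at the 180th meridian.

```python
def conflict_lag(cell, conflict, country):
    """Share of contiguous same-country cells that are in conflict.

    cell     -- (col, row) of the focal cell
    conflict -- dict mapping (col, row) -> 0/1 conflict dummy for one year
    country  -- dict mapping (col, row) -> gwno for the same year
    """
    col, row = cell
    home = country[cell]
    # The eight cells sharing an edge or a corner with the focal cell.
    neighbours = [
        (col + dc, row + dr)
        for dc in (-1, 0, 1)
        for dr in (-1, 0, 1)
        if (dc, dr) != (0, 0)
    ]
    same_country = [n for n in neighbours if n in country and country[n] == home]
    if not same_country:
        return None  # assumption: undefined when no same-country neighbor exists
    in_conflict = sum(conflict.get(n, 0) for n in same_country)
    return in_conflict / len(same_country)
```

With eight same-country neighbors of which three are in conflict, the function returns 3/8 = 0.375, matching the example given for conflag1.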

In some cases the conflict zone polygons in the Conflict Site datasets (Raleigh et al. 2006; Dittrich Hallberg 2012) overlap, implying that several grid cells host more than one intrastate armed conflict in a calendar year. Providing information on the conflict IDs for all conflicts represented in PRIO-GRID while maintaining a 1:1 relationship with the other PRIO-GRID files (i.e. one observation per grid cell per calendar year) would require multiple conflict ID indicators with mostly missing or redundant information. Instead, we offer information on the identity of the conflicts in PRIO-GRID through separate files (Confsitegrid and Confsiteoldgrid) that contain one observation per conflict per cell year.

3.4. Confsitegrid

The Confsitegrid attribute table provides the conflict IDs for the conflicts that are indicated by the conf variable, and corresponds to v4-2010 of the UCDP/PRIO Armed Conflict Dataset. In cases of multiple overlapping conflict polygons, each cell year is listed once per conflict ID. Georeferenced conflict data are based on the Conflict Site dataset v.3 (Dittrich Hallberg 2012). The Confsitegrid table has valid values for the period 1989–2008, reflecting the temporal coverage of v.3 of the PRIO Conflict Site dataset. This file contains the following four variables:

gid is the grid cell identifier, a unique id code for each cell in the grid.

year gives the calendar year of observation.

confid indicates the identity of the conflict(s) for cell years where conf=1. This indicator is identical to the “ID” variable of the UCDP/PRIO Armed Conflict Dataset. Additional variables from the UCDP/PRIO dataset (incompatibility, type, etc) may be imported into PRIO-GRID by joining on the confid.

type indicates the type of the conflict. The UCDP/PRIO Armed Conflict Dataset identifies four types of conflict. Type 1 conflicts are extrasystemic conflicts between a state and a non-state group outside its own territory. Typically, type 1 conflicts are referred to as colonial conflicts. Type 2 conflicts are interstate conflicts between two or more states. Type 3 conflicts are internal armed conflicts between a state and one or more opposition groups. Type 4 conflicts are internationalized internal armed conflicts between a state and one or more opposition groups with intervention from other states on one or both sides.
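Because Confsitegrid holds one row per conflict per cell year, it usually has to be collapsed before it can be merged 1:1 with the other tables. The sketch below shows one way to do this. The function name, the tuple layout, and the n_conflicts summary are illustrative assumptions, and treating a cell year as civil conflict when any of its overlapping conflicts is of type 3 or 4 is one plausible reading of civconf rather than a documented rule.

```python
from collections import defaultdict


def collapse_to_cell_years(records):
    """Collapse (gid, year, confid, type) rows to one entry per cell year.

    Returns a dict keyed by (gid, year) with a conflict dummy, a civil
    conflict dummy (type 3 or 4), and the number of distinct conflicts.
    """
    conflicts_by_cell_year = defaultdict(set)
    for gid, year, confid, conflict_type in records:
        conflicts_by_cell_year[(gid, year)].add((confid, conflict_type))

    collapsed = {}
    for key, conflicts in conflicts_by_cell_year.items():
        collapsed[key] = {
            "conf": 1,
            "civconf": int(any(t in (3, 4) for _, t in conflicts)),
            "n_conflicts": len(conflicts),
        }
    return collapsed
```

Cell years that do not appear in the input carry no conflict and would be coded 0 after merging with the full grid.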

3.5. Confsiteoldgrid

The Confsiteoldgrid table contains the same information as Confsitegrid, but is based on the older v.2 of the PRIO Conflict Site dataset. This table has valid values for the period 1946–2005, reflecting the temporal coverage of v.2 of the PRIO Conflict Site parent dataset (Buhaug & Gates 2002; Raleigh et al. 2006). This file corresponds to the confold coding and contains the following four variables:

gid is the grid cell identifier, a unique id code for each cell in the grid.

year gives the calendar year of observation.

confid indicates the identity of the conflict(s) for cell years where confold=1. This indicator is identical to the “ID” variable of the UCDP/PRIO Armed Conflict Dataset. Additional variables from the UCDP/PRIO dataset (incompatibility, type, etc.) may be imported into PRIO-GRID by joining on the confid.

type indicates the type of the conflict. The UCDP/PRIO Armed Conflict Dataset identifies four types of conflict. Type 1 conflicts are extrasystemic conflicts between a state and a non-state group outside its own territory. Typically, type 1 conflicts are referred to as colonial conflicts. Type 2 conflicts are interstate conflicts between two or more states. Type 3 conflicts are internal armed conflicts between a state and one or more opposition groups. Type 4 conflicts are internationalized internal armed conflicts between a state and one or more opposition groups with intervention from other states on one or both sides.

3.6. Socioeco

The Socioeco attribute table includes variables related to the demographic and socio-economic characteristics of the cell. Note that while the table contains observations of all terrestrial grid cells in all years, the variables only contain data for certain years (see below for details). This file contains four variables:

gid is the grid cell identifier, a unique id code for each cell in the grid.

year gives the calendar year of observation.

pop gives the population size for each populated cell in the grid, extracted from the Gridded Population of the World, version 3 (CIESIN 2005). Population estimates are available for 1990, 1995, 2000, and 2005. The unit is number of people in the cell; to obtain true population density estimates this variable should be divided by the cellarea variable contained in the PRIO-GRID table.


imr indicates cell-specific infant mortality rate, based on raster data from the SEDAC Global Poverty Mapping project (Storeygard et al. 2008). The imr variable is the average pixel value inside the grid cell. The value unit is the number of children per 10,000 that die before reaching their first birthday. This indicator is available for the year 2000 only.
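As noted for the pop variable, population density is obtained by dividing pop by cellarea from the PRIO-GRID table. A minimal sketch of that join is given below. The function names and the dictionary inputs keyed by (gid, year) are assumptions, as is the choice to return None where population is missing or land area is not positive.

```python
def population_density(pop, cellarea):
    """People per square kilometer of land, or None if not computable."""
    if pop is None or cellarea is None or cellarea <= 0:
        return None
    return pop / cellarea


def density_by_cell_year(socioeco, priogrid):
    """Join pop and cellarea on the (gid, year) key.

    socioeco -- dict mapping (gid, year) -> pop (None where missing)
    priogrid -- dict mapping (gid, year) -> cellarea in square kilometers
    """
    return {
        key: population_density(pop, priogrid.get(key))
        for key, pop in socioeco.items()
    }
```

Since pop is only available for 1990, 1995, 2000, and 2005, all other cell years yield None in this sketch.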

3.7. Physclimate

The Physclimate attribute table contains variables that tap physical characteristics and climatic conditions for all cells in PRIO-GRID. The file contains nine variables:

gid is the grid cell identifier, a unique id code for each cell in the grid.

year gives the calendar year of observation.

mnt gives the proportion (i.e., average pixel value, in percent) of mountainous terrain within each cell. This indicator is based on high-resolution mountain raster data that were developed for UNEP’s Mountain Watch Report (WCMC-UNEP 2002). This indicator does not vary over time.

frst gives the percentage of the cell area covered by forest, extracted from the Globcover 2009 dataset. To compute the frst variable we aggregate the Globcover 2009 classes with land class numbers 40, 50, 60, 70, 90, 100, 110, 160, and 170.

irri gives the proportion (i.e., average pixel value, in percent) of area equipped for irrigation within each cell. The data is taken from the FAO Aquastat irrigation raster (Siebert et al. 2007), which indicates pixelated data on areas equipped for irrigation as of year 2000. This indicator does not vary over time.

prec gives the yearly total amount of precipitation (in millimeter) in the cell, based on monthly meteorological statistics from the University of Delaware (NOAA 2011). This indicator contains data for all years 1946–2008.

temp gives the yearly mean temperature (in degrees Celsius) in the cell, based on monthly meteorological statistics from the University of Delaware (NOAA 2011). This indicator contains data for all years 1946–2008.

spi6 is an aggregated yearly Standardized Precipitation Index that indicates within-year deviations in precipitation based on monthly data. For each month, the monthly SPI6 index measures deviation from long-term normal rainfall during the six preceding months. The values are standardized where deviation estimates less than 1 standard deviation indicate near normal rainfall; 1 to 1.49 standard deviations from mean indicate moderately dry conditions; 1.5 to 1.99 indicate severely dry conditions; and values in excess of 2 standard deviations indicate extremely dry conditions. See Guttmann (1999) and McKee et al. (1993) for more on the SPI6 calculation and drought measurement. The monthly data were then aggregated to a yearly format and categorized to indicate anomalous years. The annualized spi6 is coded 1 if there were at least three consecutive months with SPI6 ≥ 1 in the given grid cell during the year (moderate drought); spi6 = 1.5 if SPI6 was ≥ 1.5 for at least two consecutive months (severe drought); and spi6 = 2.5 if both of the above criteria are met (extreme drought). A spi6 value of 0 indicates that no drought event occurred during that year. The SPI6 data are extracted from NetCDF data provided by the International Research Institute for Climate and Society, derived from the Global Precipitation Climatology Centre (GPCC); see Rudolf et al. (2010) for more information. Cells without spatial or temporal SPI6 coverage are coded as missing.

spi6dist measures the distance (in kilometers) from the cell centroid to the nearest cell with a spi6 value above 0 in the current year. For cells with a spi6 value above 0, spi6dist is coded 0 kilometers by default. The search is limited to a 500-kilometer radius around the cell centroid; if no cell with spi6 above 0 lies within 500 kilometers, spi6dist is set to missing.
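The annual categorization of spi6 can be illustrated with the sketch below, which applies the three coding rules to twelve monthly values. It assumes that the monthly input is expressed as dryness magnitude, so that larger values mean drier conditions, as the thresholds in the definition suggest; the sign convention of the source data may differ. The function names and the treatment of missing months (they break a run) are likewise assumptions.

```python
def longest_run(values, threshold):
    """Length of the longest run of consecutive values >= threshold."""
    best = 0
    run = 0
    for value in values:
        if value is not None and value >= threshold:
            run += 1
            best = max(best, run)
        else:
            run = 0  # a missing or sub-threshold month ends the run
    return best


def annual_spi6(monthly):
    """Annualized drought code from twelve monthly SPI6 dryness values."""
    if len(monthly) != 12 or all(v is None for v in monthly):
        return None  # no coverage: coded as missing
    moderate = longest_run(monthly, 1.0) >= 3   # three or more months >= 1
    severe = longest_run(monthly, 1.5) >= 2     # two or more months >= 1.5
    if moderate and severe:
        return 2.5
    if severe:
        return 1.5
    if moderate:
        return 1.0
    return 0.0
```

Runs are evaluated within the calendar year only, so a dry spell that straddles December and January is split between two years in this sketch.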

3.8. GeoEPR

The GeoEPR attribute table contains information on the identity of spatially defined, politically relevant ethnic groups settled in the grid cell, derived from the GeoEPR-ETH v.2 dataset (Wucherpfennig et al. 2011). Note that when more than one group exists inside a cell, this file contains one row per group per cell year, where gid + year + grpid constitute the unique observation identifier. Five variables are included:

gid is the grid cell identifier, a unique id code for each cell in the grid.

year gives the calendar year of observation.

grpid denotes the ID of the ethnic group in the cell. This ID variable corresponds to the cowgroup variable of the GeoEPR-ETH and EPR-ETH datasets, which enables importing additional group-level information on e.g. political status and population size.

grptype refers to the settlement pattern of the coded ethnic group. Users should be aware that type = 5 refers to both large (often dominant) groups that are settled throughout a country and small, dispersed groups that (perhaps unrealistically) are coded with the entire country as their settlement area due to lack of more precise information. See codebook and additional documentation for the GeoEPR-ETH dataset for further information.

grparea measures the coded group’s settlement area as proportion (percent) of the cell’s total land area (cellarea).

3.9. Globcover

The Globcover attribute table includes land cover information within each cell extracted from the Globcover 2009 project (Bontemps et al. 2009). To preserve as much information as possible from the original land cover raster dataset, the table includes an area calculation for each land cover class within each cell. In cells containing several land classes, each type is listed in a separate row. Accordingly, gid + lclass constitute the unique observation identifier. See Bontemps et al. (2009: 52) for more information on the classes. This file contains three variables:


gid is the grid cell identifier, a unique id code for each cell in the grid.

lclass gives the land class code.

lclasspct gives the proportion (percent) of the cell area covered by the given land class.
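The long format of the Globcover table makes it straightforward to build custom land cover aggregates. As an example, the sketch below sums lclasspct over the forest classes that section 3.7 lists for the frst variable. The function name and the tuple layout are assumptions, and the result is not guaranteed to reproduce the released frst values exactly.

```python
# Globcover 2009 land class codes treated as forest for the frst variable.
FOREST_CLASSES = {40, 50, 60, 70, 90, 100, 110, 160, 170}


def forest_share(rows):
    """Percentage of each cell covered by forest classes.

    rows is an iterable of (gid, lclass, lclasspct) tuples, one per land
    class per cell. Cells without any forest class get 0.0.
    """
    share = {}
    for gid, lclass, lclasspct in rows:
        share.setdefault(gid, 0.0)
        if lclass in FOREST_CLASSES:
            share[gid] += lclasspct
    return share
```

The same pattern works for any other grouping of land classes by swapping in a different set of codes.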

3.10. Nordhaus

The Nordhaus attribute table contains information on local economic activity. This file is based on the G-Econ dataset (Nordhaus 2006), which is released at a 1 x 1 decimal degree resolution. Hence, each of the G-Econ cells contains four 0.5 x 0.5 decimal degree cells. This has to be taken into account when using the data together with the other PRIO-GRID variables. The Nordhaus file includes the economic gross cell product (GCP) variables from G-Econ, indicating the level of local economic activity for five-year intervals since 1990. In border areas, the G-Econ 1 x 1 decimal degree cells might overlap with PRIO-GRID cells allocated to a neighboring country. To minimize bias, PRIO-GRID only extracts G-Econ data for cells that have the same country code as the G-Econ cell represents. See the original source documentation for further details on the data. Ten variables are included:

gid is the grid cell identifier, a unique id code for each cell in the grid.

gcppc90 indicates the gross cell product per capita in 1990, measured in USD. This variable is calculated by multiplying the gcp90 value with one billion divided by the population of the G-Econ 1 x 1 decimal degree cell (gcp90 *1,000,000,000 / gpw90).

gcppc95 indicates the gross cell product per capita in 1995, measured in USD.

gcppc00 indicates the gross cell product per capita in 2000, measured in USD.

gcppc05 indicates the gross cell product per capita in 2005, measured in USD.

gcp90 indicates gross cell product in 1990, measured in Billion USD.

gcp95 indicates gross cell product in 1995, measured in Billion USD.

gcp00 indicates gross cell product in 2000, measured in Billion USD.

gcp05 indicates gross cell product in 2005, measured in Billion USD.

gcpqual indicates the quality of the GCP values. Quality is a measure of the quality of the economic data. Quality = 1 for countries for which the data are consistent, but it does not capture the quality of the underlying country statistics. In general, quality < 1 indicates that there are major inconsistencies in one of the underlying data inputs into GCP. See the G-Econ definition table, available at http://gecon.yale.edu/.
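The per capita calculation given for gcppc90 (gcp90 * 1,000,000,000 / gpw90) can be written as a small helper. The function name is hypothetical, and returning None for missing or zero population is an assumption about how such cells should be handled.

```python
def gcp_per_capita(gcp_billion_usd, population):
    """Gross cell product per capita in USD.

    gcp_billion_usd -- gross cell product of the 1 x 1 degree G-Econ cell,
                       in billion USD (e.g. gcp90)
    population      -- population of the same G-Econ cell (e.g. gpw90)
    """
    if gcp_billion_usd is None or not population:
        return None  # assumption: undefined without a positive population
    return gcp_billion_usd * 1_000_000_000 / population
```

Because both inputs refer to the 1 x 1 degree G-Econ cell, the four PRIO-GRID cells nested inside it share the same per capita value.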

4. Adding additional data using the provided shapefile

In addition to the data available in the tables explained above, we provide two shapefiles that make it possible for users to add their own data: one shapefile with the cell geometry and one with the centroid geometry. These files may be used in GIS software to extract, join or overlay with other spatial data. The shapefiles may be joined to the various attribute tables using the gid variable. We provide two examples of how to combine data with PRIO-GRID using ArcGIS.

The first example relates to cases where one wants to combine vector data with the PRIO-GRID data. In these cases one would use the spatial join function in ArcGIS. If the relationship is 1:1, use the PRIO-GRID cell shapefile as the target layer and the third-party vector shapefile as the join features. If many vector features overlap each cell, one would typically use the third-party vector shapefile as the target layer and the PRIO-GRID cell shapefile as the join features. This creates a one-to-many join in which every third-party feature is assigned the gid of the PRIO-GRID cell that it overlaps.

The second example relates to cases where one wants to aggregate third-party raster data to the PRIO-GRID cells. In these cases one would use the zonal statistics function in ArcGIS. The PRIO-GRID cell shapefile serves as the zonal features and the third-party raster as the input raster. One would also select the gid variable as the feature id. The gid makes it easy to merge the output table from the zonal statistics with the rest of the PG data. Please refer to your GIS software user manual for more information.
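For readers without ArcGIS, the logic of the two workflows can be sketched in plain Python. This is a rough illustration only, not the procedure used to build PRIO-GRID. The cell extents, gid values, sample values and function names are all hypothetical, and a real analysis would read the cell geometry from the provided shapefile with a GIS library.

```python
from collections import defaultdict

# Hypothetical cells: gid -> (xmin, ymin, xmax, ymax) in decimal degrees.
# Real extents would come from the PRIO-GRID cell shapefile.
cells = {
    1: (0.0, 0.0, 0.5, 0.5),
    2: (0.5, 0.0, 1.0, 0.5),
}


def find_gid(lon, lat, cells):
    """Return the gid of the cell containing the point, or None if there is none."""
    for gid, (xmin, ymin, xmax, ymax) in cells.items():
        if xmin <= lon < xmax and ymin <= lat < ymax:
            return gid
    return None


def zonal_mean(samples, cells):
    """Average (lon, lat, value) samples within each cell, keyed by gid."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, value in samples:
        gid = find_gid(lon, lat, cells)
        if gid is not None:  # samples outside every cell are skipped
            totals[gid] += value
            counts[gid] += 1
    return {gid: totals[gid] / counts[gid] for gid in totals}


def merge_on_gid(attributes, new_values, name):
    """Copy the attribute rows and add new_values under the given column name."""
    merged = {gid: dict(row) for gid, row in attributes.items()}
    for gid, value in new_values.items():
        merged.setdefault(gid, {})[name] = value
    return merged


# Invented third-party samples; the last one lies outside both cells.
samples = [(0.1, 0.1, 2.0), (0.3, 0.4, 4.0), (0.7, 0.2, 10.0), (5.0, 5.0, 99.0)]
attributes = {1: {"gcp90": 0.25}, 2: {"gcp90": 0.40}}

means = zonal_mean(samples, cells)
print(merge_on_gid(attributes, means, "sample_mean"))
```

The point-in-cell lookup plays the role of the spatial join, the per-cell average plays the role of the zonal statistics, and the final merge shows why carrying the gid through each step makes it easy to combine the result with the other PG tables.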

5. Using the Python scripts

Information to be added in future updates. Please contact the author with any questions regarding the Python scripts.

6. References

Bontemps, Sophie; Pierre Defourny & Eric Van Bogaert (2009) Globcover 2009. Products Description and Validation Report. European Space Agency. (http://globcover.s3.amazonaws.com/LandCover2009/GLOBCOVER2009_Validation_Report_1.0.pdf).

Buhaug, Halvard & Scott Gates (2002) The geography of civil war. Journal of Peace Research 39(4): 417-433.

CIESIN - Center for International Earth Science Information Network, Columbia University; and Centro Internacional de Agricultura Tropical (CIAT) (2005) Gridded Population of the World Version 3 (GPWv3): Population Grids. Palisades, NY: Socioeconomic Data and Applications Center (SEDAC), Columbia University. (http://sedac.ciesin.columbia.edu/gpw/).

Dittrich Hallberg, Johan (2012) PRIO Conflict Site 1989-2008: A geo-referenced dataset on armed conflict. Conflict Management and Peace Science (In press).

Gleditsch, Kristian S & Michael D Ward (1999) Interstate system membership: A revised list of the independent states since 1816. International Interactions 25: 393-413.

Gleditsch, Nils Petter; Peter Wallensteen, Mikael Eriksson, Margareta Sollenberg & Håvard Strand (2002) Armed Conflict 1946–2001: A new dataset. Journal of Peace Research 39(5): 615–637.

Guttmann, Nathaniel B (1999) Accepting the Standardized Precipitation Index: A calculation algorithm. Journal of the American Water Resources Association 35(2): 311-322.

Holtermann, Helge (n.d.) Location of armed conflict onsets: Documentation of coding decisions. Unpublished.

McKee, Thomas B; Nolan J Doesken & John Kleist (1993) The relationship of drought frequency and duration to time scales. Proceedings of the 8th Conference of Applied Climatology, 17-22 January, Anaheim, CA. American Meteorological Society, Boston, MA: 179-184.

National Oceanic and Atmospheric Administration (2011) Delaware climate data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA. (http://www.esrl.noaa.gov/psd/).

Nordhaus, William D (2006) Geography and macroeconomics: New data and new findings. Proceedings of the National Academy of Sciences of the USA 103(10): 3510-3517.

Rudolf, Bruno; Andreas Becker, Udo Schneider, Anja Meyer-Christoffer & Markus Ziese (2010) GPCC Status Report, December 2010. Global Precipitation Climatology Centre. (http://gpcc.dwd.de).

Raleigh, Clionadh; David Cunningham, Lars Wilhelmsen & Nils Petter Gleditsch (2006) Conflict Sites 1946-2005. (http://www.prio.no/sptrans/1140767671/conflict%20site%20codebook%20v2.pdf).

Siebert, Stefan; Petra Döll, Jippe Hoogeveen, J-M Faurès, Karen Frenken & Sebastian Feick (2007) Development and validation of the global map of irrigation areas. Hydrology and Earth System Sciences 9: 535-547.

Storeygard, Adam; Deborah Balk, Marc Levy & Glenn Deane (2008) The global distribution of infant mortality: A subnational spatial view. Population, Space and Place 14(3): 209-229.

Themnér, Lotta & Peter Wallensteen (2011) Armed conflict, 1946–2010. Journal of Peace Research 48(4): 525–536.

Tollefsen, Andreas Forø; Håvard Strand & Halvard Buhaug (2012) PRIO-GRID: A unified spatial data structure. Journal of Peace Research 49(2): 363-374.

UNEP-WCMC World Conservation Monitoring Centre (2002) Mountain Watch 2002. (http://www.unep-wcmc.org/mountains/mountain_watch/pdfs/WholeReport.pdf).

Weidmann, Nils B; Doreen Kuse & Kristian Skrede Gleditsch (2008) The geography of the international system: The CShapes Dataset. International Interactions: Empirical and Theoretical Research in International Relations 36(1): 86-106.

Weidmann, Nils B; Jan Ketil Rød & Lars-Erik Cederman (2010) Representing ethnic groups in space: A new dataset. Journal of Peace Research 47(4): 491-499.

Wucherpfennig, Julian; Nils B Weidmann, Luc Girardin, Lars-Erik Cederman & Andreas Wimmer (2011) Politically relevant ethnic groups across space and time: Introducing the GeoEPR dataset. Conflict Management and Peace Science 20(10): 1-15.