Page 1: Reverse Geocoding – A New Method of Enumeration Listingweb.ccs.miami.edu/~nkumar/Manuscripts/NK_Reverse_Geocoding.pdf1 Reverse Geocoding – A New Method of Enumeration Listing Naresh

Reverse Geocoding – A New Method of Enumeration Listing

Naresh Kumar1, Jin Chen1 and Dong Liang1

Abstract: We have entered a new era of the geo-information revolution. A vast amount of the data we generate is georeferenced with the aid of geospatial technologies, and some of these data are publicly available. The availability of these georeferenced datasets not only affects our day-to-day lives, such as in finding an address or an optimal route, but also influences the way we administer social surveys and conduct social science research. This article demonstrates the use of reverse geocoding and web-mining for developing a georeferenced enumeration list of residential units. The approach not only offers researchers an additional opportunity to develop enumeration lists in a cost-effective manner, but also extends the scope of spatial sampling to social surveys. Unlike traditional methods of sampling, spatial sampling for social surveys involves sampling locations/areas first and then sampling respondents at/around those locations. Thus, it ensures spatial coverage and population representation. Four different methods of spatial sampling were employed to draw four sets of samples in the Chicago Metropolitan Statistical Area (MSA). An enumeration list of residential addresses at/around each sample site was developed with the aid of reverse geocoding and web-mining using publicly available datasets; a residential address was selected randomly from this list. The sampled residential addresses were compared with the US Postal Service’s address database. Our analysis suggests that 98.7% of the addresses that we generated using reverse geocoding and web-mining were valid residential addresses. These results suggest that the availability of georeferenced data provides a unique and unprecedented opportunity to develop an enumeration list of residential addresses in a cost-effective manner, and to augment the scope of spatial sampling and multi-level social science research.
Keywords: reverse geocoding, spatial sampling, geo-information revolution.

1 Department of Geography, University of Iowa, IA – 52242. Corresponding Email: [email protected] NOTE: Please do not quote or cite.

INTRODUCTION

Modern geospatial technologies georeference a vast amount of the (non-geographic) data we collect, ranging from bank transactions to phone calls. For example, we can locate an airplane or predict the weather at a given location in real time, identify the location from which a phone call originates, and trace a person’s interactions across geographic space and time. Most of these data are stored and maintained digitally, and some are publicly available. With the availability of these georeferenced data, we have entered a new era of the geo-information revolution. This, in turn, has a phenomenal impact on our day-to-day lives, ranging from finding our neighbors’ names and phone numbers to planning a trip. It is also changing the way we collect survey data and conduct social science research. Using two publicly available datasets, namely Google Earth and White Pages, this article demonstrates how such data can be used to develop an enumeration list of georeferenced addresses and extend the scope of spatial sampling to social surveys.

While the major focus of social surveys has been on population representation, spatial sampling focuses on spatial coverage as well as representation of the population in question. Spatial sampling has been used extensively for sampling natural resources, but its application to social surveys has been limited because a complete sampling frame of the human population with locational (georeferenced) information is unavailable. Traditionally, a pseudo-sampling frame of geographic areas is created to implement spatial sampling, because a finite and complete sampling frame of natural resources is difficult to construct, especially for large geographic areas. Such a sampling frame is constructed by overlaying a geometric grid over the area of interest, referred to as the sampling domain. Once the sampling frame is constructed and characterized, any classical sampling method (such as systematic random or stratified random) can be employed to draw a sample of locations or areas. The selected locations or areas, however, do not necessarily represent people, residential units, or households. In this paper, we demonstrate the application of reverse geocoding coupled with web-mining to construct a georeferenced enumeration list of residential units at and around sample locations. This, in turn, extends the application of spatial sampling to social surveys.

Spatial sampling (and its integration with reverse geocoding) offers several advantages over traditional (non-spatial) social survey methodology. First, it ensures spatial coverage and representation of the population distribution across geographic space. Human population is not distributed homogeneously, and its density can vary significantly from rural to urban areas and within urban areas. Traditional population data are aggregated, analyzed, and made available at higher-order geographic units (such as census blocks in the US) due to confidentiality issues, and the shapes and sizes of these units vary greatly. Integration of satellite remote sensing with existing secondary datasets, such as the US Census, allows us to develop population estimates at a very fine spatial resolution; LandScan population concentration data at 90m spatial resolution for the continental US, and at 400m and 1km spatial resolution worldwide, are examples of such datasets.1 The availability of these data forms a basis for drawing a spatially representative sample of the human population.

Second, spatial sampling can capture spatial heterogeneity in the distribution of socio-economic and demographic characteristics, which is critically important for social surveys. The distribution of human population and its socio-economic characteristics exhibits significant spatial disparity and segregation.2 Spatial analytical methods, such as local spatial autocorrelation and semivariance, can be used to quantify and characterize geographic space by socio-economic and demographic attributes, and to construct strata that are homogeneous in terms of socio-economic characteristics. This, in turn, can ensure socio-economic representation of the population across geographic space.

Third, spatial sampling is likely to ensure better population representation than the traditional methods used for social surveys. A finite and complete sampling frame of the population is needed to draw a representative sample, but an up-to-date and complete frame is rarely available. For spatial sampling, however, we rely on a sampling frame of areas, which can be updated frequently and constructed at any spatial resolution. Once the sampled areas and/or locations are identified, the reverse geocoding presented in this paper can generate an enumeration list of addresses within the selected areas and/or at or around the sample sites.

Fourth, reverse geocoding adds a locational reference (georeference) to the selected sample. This, in turn, is important for attaching multi-level socio-physical environmental contexts to the selected sample. With the increasing importance of place and space, these contextual data are becoming important for studying social, behavioral, and health outcomes, because people’s long-term exposure to their immediate place-specific socio-physical and chemical environment is likely to influence their attitudes, behavior, social and economic outcomes, and health.3-5 Integration of survey data with multi-level socio-physical contextual data from multiple sources is likely to augment the scope of the survey data and advance interdisciplinary, multi-level social science research.

Utilizing four sets of sample locations drawn using four different methods of spatial sampling, this article demonstrates the use of reverse geocoding for developing an enumeration list around the sampled sites. This list is used to select a residential unit around each site and to select a respondent for administering a pilot General Social Survey (GSS) in the urban Chicago MSA. The remainder of this article is organized into three sections. The first section describes the study area, data used, and methods employed. The second section presents the implementation of reverse geocoding with the spatial sampling design, and the final section discusses the findings of this research along with their implications and limitations.

METHODS AND MATERIALS

Study Area: The pilot survey was conducted in the urban Chicago MSA. Because the area has a significant gradient in socio-physical and chemical environment, it is well suited for this experiment: it demonstrates how effectively the proposed method produces an enumeration list in an area whose population is very diverse in terms of socio-economic characteristics and population concentration.

Data: The data for this research come from a variety of sources, namely Google Earth, White Pages, the US Census, and Oak Ridge National Laboratory. The latter two datasets were used to construct a pseudo-sampling frame of the human population and characterize it by socio-economic characteristics. The Google Earth dataset was used for reverse geocoding to develop a valid list of addresses at/around the selected sites. The White Pages database was used to extract a list of residential units from the valid addresses obtained from Google Earth.

Method: Four different methods of sampling, namely clustered random (adopted for GSS),6 optimized clustered random, spatially random, and optimized spatial, were employed to draw four different sets of sample sites. The first two methods were implemented in two stages: first, census tracts were drawn, and then random points were simulated within the selected census tracts. The first set of census tracts was selected using the GSS design, in which census tracts were sorted on socio-economic characteristics and clusters were then drawn using systematic random sampling. The second set of census tracts was selected using an optimization method, such that the selected census tracts captured the maximum variability in a composite index (comprising population concentration and socio-economic status (SES)) while local spatial autocorrelation was minimized. Once the clusters were selected, 250 random points were simulated within the selected clusters in proportion to the population size of these clusters. For the spatial random sampling, the study domain (urban Chicago MSA) was partitioned into 400m pixels, and these pixels were selected randomly while avoiding spatial autocorrelation. In the optimized design, however, the selected sample set maximized the variance in the composite index and minimized spatial autocorrelation (Fig 1).7, 8

IMPLEMENTATION

Unlike traditional social surveys, a spatial social survey samples locations/areas first and then samples respondents at/around the sampled locations/areas (Fig 2). Spatial sampling integrated with reverse geocoding and web-mining is implemented in four steps, as illustrated in Fig 3: selection of sample sites using a spatial sampling method; development of a list of valid addresses at/around the sample sites; extraction of residential units at the selected addresses and sampling of residential units; and validation of the sampled residential addresses.
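As an illustration of the first of these steps, the allocation of 250 random points to the selected clusters in proportion to their population (described under Method) might be sketched as follows. This is a minimal sketch and not the authors' implementation: the function name `allocate_points`, the cluster labels, and the largest-remainder rounding rule are assumptions introduced here so that the counts add up exactly to the requested total.

```python
def allocate_points(cluster_population, total_points=250):
    """Split total_points across clusters in proportion to population.

    Largest-remainder rounding (an assumption, not taken from the paper)
    keeps the integer counts summing exactly to total_points.
    """
    total_population = sum(cluster_population.values())
    if total_population <= 0:
        raise ValueError("total population must be positive")

    # Exact (fractional) share of points for every cluster.
    quotas = {
        cluster: total_points * population / total_population
        for cluster, population in cluster_population.items()
    }
    # Start from the integer part of each quota.
    counts = {cluster: int(quota) for cluster, quota in quotas.items()}

    # Hand the still-unassigned points to the largest fractional remainders.
    leftover = total_points - sum(counts.values())
    by_remainder = sorted(
        quotas, key=lambda c: quotas[c] - counts[c], reverse=True
    )
    for cluster in by_remainder[:leftover]:
        counts[cluster] += 1
    return counts


# Hypothetical tract populations, for illustration only.
print(allocate_points({"tract_A": 5000, "tract_B": 3000, "tract_C": 2000}))
```

Each cluster would then receive its allotted number of simulated random points within its boundary.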
Step 1: Selection of sample sites using a spatial sampling design: Since population representation is an important requirement for social surveys, a sample of locations/areas that represents the population distribution is selected in the first step. A spatially organized sampling frame of the human population, however, is rarely available. We used high-resolution LandScan1 data to construct a pseudo-sampling frame of population (organized at 400 x 400m spatial resolution), and attached socio-economic and demographic attributes from the 2000 US Census9 to this frame. This results in a pseudo-sampling frame of population with socio-economic characteristics. The four methods of spatial sampling described above were employed to draw four different sets of sample sites using this sampling frame (Fig 1).

Step 2: Developing an enumeration list of addresses at/around the selected sample sites: A selected sample site does not necessarily correspond to a residential address. Using Google Earth, we construct a list of valid addresses at/around the sample sites. The process of locating an address on the earth’s surface is called geocoding.10 For spatial social surveys, however, we have the locations and need to extract addresses at/around these locations. This process is called reverse geocoding.11 It involves two important decisions: what geographic extent to search for addresses around a sample site, and how many addresses to extract. The first decision should be dictated by the domain/geographic area a sample site represents. If a sample site is a 400x400m pixel, this is the geographic extent within which we should search for valid addresses. If the goal is to sample residential units, this can be problematic for commercial and rural areas: in commercial areas all addresses may be commercial, and in rural areas there may not be enough valid addresses within a small geographic extent. With these constraints in mind, we integrated reverse geocoding with web-mining, which verifies whether valid residential units are present within the chosen geographic extent. If the required number of residential units is not present within the selected extent, the extent is expanded and a valid list of residential units is built iteratively.

Ideally, all addresses present within the domain of a sample site should be included in the enumeration list from which respondents are drawn. This process, however, can be computationally intensive, especially in a densely populated area. The number of residential addresses to include in the enumeration list must therefore be chosen with care, depending on the number of residential units to be drawn around each sample site and the available computational time. The number could be set to 10-20 times the number of respondents to be drawn from the list. In this study, we set it to 20 valid residential addresses, from which one address is drawn.

We use the Google Application Programming Interface (API) for reverse geocoding. This application allows us to pass in a location (longitude and latitude) and returns an address at/around that location. For example, passing in a location with longitude ~ -91.53615 and latitude ~ 41.661335 returned 35 E. Jefferson St., Iowa City, IA 52245 (Fig 4). In addition, this application can return ten different levels of information at/around a location, including Street Name, City, County, State, and Country.
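A reverse geocoding request of this kind might be sketched as follows. The sketch assumes the present-day Google Geocoding web service (JSON endpoint, `latlng` and `key` parameters, `status`/`results`/`formatted_address`/`types` response fields), which may differ from the API version used in this study; the helper names `build_reverse_geocode_url` and `first_street_address` and the key `DEMO_KEY` are hypothetical. No network call is made here: the code only builds the request URL and parses a response that has already been decoded from JSON.

```python
from urllib.parse import urlencode

# Assumed endpoint of the current Google Geocoding web service.
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_reverse_geocode_url(latitude, longitude, api_key):
    """Return the request URL for reverse geocoding one location."""
    # The service expects "latitude,longitude" in the latlng parameter.
    query = urlencode({"latlng": f"{latitude},{longitude}", "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"


def first_street_address(payload):
    """Return the first street-level address in a decoded JSON response.

    Returns None when the request failed or no street address was found.
    """
    if payload.get("status") != "OK":
        return None
    for result in payload.get("results", []):
        if "street_address" in result.get("types", []):
            return result.get("formatted_address")
    return None


# The example location quoted in the text above (Iowa City).
url = build_reverse_geocode_url(41.661335, -91.53615, "DEMO_KEY")
```

The URL would be fetched with any HTTP client and the decoded JSON passed to `first_street_address`; coarser results (city, county, state, country) appear as further entries in the same `results` list.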
Since the location can be a business, an institution, or a residential building (such as a house, an apartment complex, or a condominium), it is important to determine whether at least one valid address is present at this location. The White Pages database allows us to determine whether an address has residential units. Since we need multiple addresses around a sample site and each site represents a 400x400m pixel, it is important that the addresses within the domain of a sample site are selected randomly. We simulate a random point within the pixel, find an address at/around the simulated point, validate the address, and then pass it on to White Pages to fetch valid residential units at this address. Iteratively, we develop a list of twenty residential addresses around each sample site. In some cases, residential addresses are not present within the domain of a sample site, because the area around the site is predominantly commercial or uninhabited. In such cases, we expand the geographic extent around the sample site in increments of 0.001 degree (~ 100m), simulate random points within the expanded extent, and repeat the procedure until a list of twenty valid addresses is complete.

Step 3: Selecting the required sample from the enumeration list: Any conventional method of sampling can be employed to select the desired number of residential units from the enumeration list around each site. The decision about the number of residential units to select can be influenced by a number of factors, including response rate, importance of a sample site, and mode of survey. If the response rate is 10% (with a standard deviation of 2%), selecting 14 addresses from each list gives approximately 95% assurance that at least one respondent participates around each sample site.
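Steps 2 and 3 can be sketched together as a loop that simulates random points, reverse geocodes them, keeps residential addresses, and widens the search extent when too few are found, followed by a simple random draw. This is a minimal sketch under stated assumptions, not the authors' code: `reverse_geocode` and `is_residential` are hypothetical callables standing in for the Google and White Pages lookups, the initial half-width of 0.002 degree approximates a 400m pixel using the text's 0.001 degree ~ 100m, and the limits `tries_per_extent` and `max_expansions` are illustrative safeguards that the paper does not specify.

```python
import random


def build_enumeration_list(site_lon, site_lat, reverse_geocode, is_residential,
                           half_width=0.002, step=0.001, target=20,
                           tries_per_extent=200, max_expansions=10, rng=None):
    """Collect up to `target` distinct residential addresses around a site.

    reverse_geocode(lon, lat) -> address string or None   (hypothetical)
    is_residential(address)   -> bool                      (hypothetical)
    """
    rng = rng or random.Random()
    addresses = []      # residential addresses found so far, in order
    checked = set()     # every address already looked up, to skip repeats
    for _ in range(max_expansions + 1):
        for _ in range(tries_per_extent):
            # Simulate a random point inside the current square extent.
            lon = rng.uniform(site_lon - half_width, site_lon + half_width)
            lat = rng.uniform(site_lat - half_width, site_lat + half_width)
            address = reverse_geocode(lon, lat)
            if address is None or address in checked:
                continue
            checked.add(address)
            if is_residential(address):
                addresses.append(address)
                if len(addresses) == target:
                    return addresses
        # Too few residential addresses: widen the extent and try again.
        half_width += step
    return addresses


def draw_addresses(enumeration_list, n=1, rng=None):
    """Step 3: simple random draw of n addresses from the list."""
    rng = rng or random.Random()
    return rng.sample(enumeration_list, min(n, len(enumeration_list)))
```

Passing the lookups in as functions keeps the loop independent of any particular provider and makes it easy to exercise with stand-in data before spending rate-limited requests.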

Step 4: Validation of the selected sample using the US Postal Service database: To demonstrate the validity of reverse geocoding, we created a list of twenty addresses around each of 951 sample sites (chosen using the four different methods of spatial sampling described above), and one residential unit was drawn randomly from each list. We validated these addresses using AccuZIP6, a comprehensive mailing list management system.12 AccuZIP6 is certified by the United States Postal Service (USPS) under the Coding Accuracy Support System (CASS) and Presort Accuracy Validation and Evaluation (PAVE) programs. Of these 951 addresses, 938 (98%) were valid addresses. The thirteen addresses that were not valid are shown in Fig 5. As evident from this figure, these addresses are distributed randomly and no systematic bias is evident. Further investigation of these addresses provided insight into the limitations of reverse geocoding. The thirteen invalid addresses failed to match for four different reasons, namely, outdated data on dismantled buildings (Fig 6a), wrong nomenclature for building type (apartment building listed as a town house) (Fig 6b), mismatch in the address format between the USPS database and the White Pages database (Fig 6c), and duplicated listing of an address in multiple towns (Fig 6d). Although the number of invalid addresses retrieved using reverse geocoding is small, it will be important to validate all addresses and evaluate the reasons for invalid addresses. Doing so serves dual purposes: it provides a sense of the completeness of addresses in the list, and helps us understand the reasons for invalid addresses.

DISCUSSION

A finite and complete enumeration listing of the population in question is critically important for social surveys. Field listing and existing database lists (such as the US Census and USPS addresses) are widely used and accepted listing methods. Using publicly available datasets, this paper presents a novel method of constructing an enumeration list. The proposed method uses two different datasets, Google Earth13 and White Pages,14 and can develop a list of addresses, residential units, and household units by geographic location. Although the proposed method can develop an enumeration list for any geographic and/or administrative unit and can be used for traditional social surveys, its unique benefits are realized when it is used with spatial sampling methods.

The proposed method has several important implications for social surveys. First, it can help overcome the problems of over- and under-coverage of the population that the traditional methods of listing suffer from. Over-coverage involves listing units that are not in the area of interest; under-coverage involves missing residential units that are of interest for the study. If residential units of interest are missing or undesired units are included unintentionally, sampling bias can result. The field listing method has been accepted as a standard method for survey research. It involves training an enumerator to identify residential units in the study area. The enumerator is instructed to count a variety of residential units, which can include identifying multi-family dwelling units and counting mailboxes, door bells, or utility meters. The field-listed data are usually validated against census data. Under-coverage is a well-known issue associated with the field listing method, which is also expensive and time consuming. Thus researchers have recently begun to use the database listing method. For example, the United States Postal Service (USPS) maintains a database of all the mail delivery points in the country. Commercial firms are licensed to use these databases.12, 15

The National Opinion Research Center (NORC) uses these data to draw samples for the GSS. These services can provide a list of mailing addresses (with names in many cases) by different census units, including the census block. Metadata about the type of each address are also available. Although a census block is a fine spatial unit, limited locational accuracy in geocoding these data can cause under-coverage, because addresses that cannot be geocoded are eliminated.10 In addition, the cost of acquiring these data can be an important issue.

As discussed earlier, the proposed method of enumeration listing augments the scope of spatial sampling for social surveys. Spatial sampling can ensure better spatial coverage and population representation than the traditional methods of sampling. Since the enumeration list is drawn using a geographic extent, it can help assess the population (by SES and demographic characteristics) that an enumeration list represents. The availability of precise locations also makes it possible to contextualize the selected sample by socio-physical environment, such as the number of recreational facilities, fast food stores, schools, or pharmacies within the identified spatial extent of a sample site. This, in turn, can attract interest in these data from researchers in a wide variety of disciplines, including public health, economics, ecology, and sociology.

Although reverse geocoding integrated with White Pages offers a unique and unprecedented opportunity to develop a location-based enumeration list, this methodology has several limitations. First, the enumeration list is developed based on a location, a geographic extent, or a boundary. This means that locations or geographic units are needed for developing an enumeration list. There are two potential ways to address this: use pre-defined census or administrative units, or use a set of locations selected using a spatial sampling method.
Extraction of an enumeration list can be computationally intensive for large geographic areas. This problem can be addressed by simulating random locations within the selected geographic extent and developing lists around the simulated locations, or by using a spatial sampling method to draw a set of sample sites. The reliability of the enumeration units developed using the proposed method is largely dictated by the quality and completeness of the georeferenced address dataset. If these data are not updated, the absence of new addresses or the presence of outdated addresses could pose problems of under-coverage and over-coverage, respectively. Although 98% of the samples were valid residential addresses, a few were not valid because some buildings (or addresses) had been dismantled. In addition, there are certain limitations on access to these publicly available datasets: for example, Google Earth allows extraction of 15,000 addresses/day, and White Pages allows two addresses/second and 200 addresses/day. Nonetheless, a commercial license to these datasets may help overcome these problems.

The validation of results presented in this study applies to the Chicago MSA. It is likely that the quality and completeness of the georeferenced data used vary geographically. Future research will be geared towards implementation and validation of this methodology of enumeration listing nationally. Intensifying competition is increasingly creating pressure among commercial firms to maintain complete and reliable datasets and to develop more advanced features, such as API services, to attract customers and businesses. A few years ago, Google Earth and Mapquest were the only publicly available datasets of georeferenced addresses. In recent years, however, Microsoft and Yahoo! have also ventured into this area and are making similar types of datasets available. Since the lack of georeferenced address datasets is still problematic in many other countries, especially developing countries, the proposed method of listing may be of little use for these countries at present. With modern advances in inexpensive geospatial technologies, however, it is likely that georeferenced address databases can be developed for these countries and the application of the proposed method extended to them.

Acknowledgement: This work was supported by NSF Grant 0825588.

References

1. ORNL. LandScanTM Global Population Database. 2008 [cited; Available from: http://www.ornl.gov/landscan/.

2. Osborne, T. and N. Rose, Do the social sciences create phenomena?: the example of public opinion research. British Journal of Sociology, 1999. 50(3): p. 367-396.

3. Frumkin, H., Healthy places: Exploring the evidence. American Journal of Public Health, 2003. 93(9): p. 1451-1456.

4. Caughy, M.O. and P.J. O'Campo, Neighborhood poverty, social capital, and the cognitive development of African American preschoolers. Am J Community Psychol, 2006. 37(1-2): p. 141-54.

5. Chen, C., H.M. Gong, and R. Paaswell, Role of the built environment on mode choice decisions: additional evidence on the impact of density. Transportation, 2008. 35(3): p. 285-299.

6. Harter, R., et al., eds. Applied Sampling for Large-Scale Multi-Stage Area Probability Designs. Handbook of Survey Research, ed. P. Marsden and J. Wright. 2010, Emerald Group Publishing Limited: Bingley, UK.

7. Kumar, N., Spatial Sampling for a Demography and Health Survey. Population Research and Policy Review, 2007. 26(5-6): p. 581-99.

8. Kumar, N., An optimal spatial sampling design for intra-urban population exposure assessment. Atmospheric Environment, 2009. 43(5): p. 1153-1155.

9. U.S. Census. 2000.

10. Rushton, G., et al., Geocoding in cancer research - A review. American Journal of Preventive Medicine, 2006. 30(2): p. S16-S24.

11. Curtis, A., J.W. Mills, and M. Leitner, Spatial confidentiality and GIS: re-engineering mortality locations from published maps about Hurricane Katrina. International Journal of Health Geographics, 2006. 5(44): p. .

12. AccuZIP. AccuZIP6 5.0. 2010 [cited 2010 08/16/2010]; Available from: http://www.accuzip.com/.

13. Google, Google Map. 2010.

14. WhitePages, WhitePages.com. 2010.

15. Valassis, Valassis. 2010.

Fig 1: Sample sites selected using four spatial sampling methods in urban Chicago, MSA

Fig 2: A contrast between social survey and spatial social surveys

[Flowchart. Box labels: Social Survey; Spatial Social Survey; Defining and characterizing population; Characterizing Sampling Domain; Sampling Frame; Pseudo Sampling Frame; Sampling Method; A sample of locations; Reverse Geocoding; Enumeration list of addresses at/around sample locations; A sample of Individuals/Addresses/Households; A Sample of Addresses/Buildings/]

Fig 3: Schematics of integrating reverse geocoding and white pages.

[Flowchart. Box labels: Sampling Method; Contextualized Spatial Sampling Pseudo Frame; A sample of geo-referenced locations; Reverse Geocoding Application; Location Evaluation; Invalid Location; Location adjustment; Valid Street Addresses at/around Sample Sites; Street Address Mining Application; Publicly available address database (white pages/yellow pages); Addresses validation and filtering; List of residential addresses at/around sample sites; Final sample of residential addresses around sample sites]

Fig 4: An example of residential address listing around a sample site.

Fig 5: Spatial distribution of invalid addresses in Urban Chicago, MSA.

a. Dismantled building: “468 Duane St, Glen Ellyn, IL, 60137” is a valid address, but it represents a dismantled building.

b. Wrong nomenclature: The address above seems to be an apartment building, but it is listed as a house.

c. Address format: The address is “1 N640 State Route 59, West Chicago, IL, 60185”. In Google Maps the address is “1N640 Rte 59, West Chicago, IL 60185”. The WhitePages address is “1 N640 State Route 59 West Chicago, IL 60185”.

d. Double listing: some addresses are listed in two towns, e.g. “43 Ickenham Ln, Elgin, IL 60124” and “43 Ickenham Ln, Campton Hills, IL 60124”.

Fig 6: Some examples of invalid addresses.