reverse geocoding and implicaons for geospaal privacy

26
Reverse geocoding and implica1ons for geospa1al privacy Paul Zandbergen Department of Geography University of New Mexico

Upload: others

Post on 03-Feb-2022

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Reverse geocoding and implicaons for geospaal privacy

Reversegeocodingandimplica1onsforgeospa1alprivacy

PaulZandbergenDepartmentofGeographyUniversityofNewMexico

Page 2: Reverse geocoding and implicaons for geospaal privacy

Outline

•  Geospatial privacy •  Geocoding / reverse geocoding •  Experimental design •  Results and Conclusion

Page 3: Reverse geocoding and implicaons for geospaal privacy

Geospa1alPrivacy

•  Underincreasingthreatfromnewtechnologies,highresolu1ondataandeasy‐to‐usetools

Page 4: Reverse geocoding and implicaons for geospaal privacy

How easy is it to hack this map? 

Page 5: Reverse geocoding and implicaons for geospaal privacy

Geocoding

•  Widelyemployed

•  Wellunderstood•  Substan1alerrors

Page 6: Reverse geocoding and implicaons for geospaal privacy

Toolofahacker:Reversegeocoding

•  Geocodinginreverse•  Rela1veeasy,rela1velynew•  Keytoolfor“hacking”publishedmaps•  Notwellunderstood

ID Address ZIP 101 123 Main St 12345 102 456 Central Ave 12346 … … …

Page 7: Reverse geocoding and implicaons for geospaal privacy

Howtoprotectspa1alconfiden1ality?

•  Geographicmasking

•  Buthowtodothismosteffec1vely?•  NeedbeRerunderstandingofreversegeocoding

randompointwithincircle

randompointoncircle

randomdistanceanddirec1on

donutmasking

donutmaskingwithexclusion

Page 8: Reverse geocoding and implicaons for geospaal privacy

ReviewoftheStateoftheArt

Panel of confidentiality issues arising from the integration of remotely sensed and self-identifying data

“No known technical strategy [….] for managing linked spatial-social data adequately resolves conflicts among the objectives of data linkage, open

access, data quality, and confidentiality protection across datasets and across

uses.” (Conclusion 3)

National Research Council, 2007

Page 9: Reverse geocoding and implicaons for geospaal privacy

ResearchQues1ons

What are the capabili5es of reverse geocoding to iden5fy individuals from published loca5ons? 

How does this vary with the methods employed for geocoding and reverse geocoding? 

How does this vary with popula5on density? 

Page 10: Reverse geocoding and implicaons for geospaal privacy

ExperimentalDesign

•  Actualbuildingloca1onswithknownaddresses–  TravisCounty,TX(Aus1n)–  Sampleof2,500residen1alloca1ons

–  Stra1fiedacross5popula1ondensityclasses•  Geocodeusing5differentgeocoders•  Reversegeocodeusing3differenttechniques•  Determineaccuracyofreversegeocoding

Page 11: Reverse geocoding and implicaons for geospaal privacy

AddressPoints

Page 12: Reverse geocoding and implicaons for geospaal privacy

Geocoders

•  TeleAtlasAddressPoints(commercial)•  TeleAtlasStreets(commercial)

•  GoogleMaps(freeAPI)

•  StreetMapUSAPro2007inArcGIS

•  Geoly1cs2007(usingTIGER2007data)

Page 13: Reverse geocoding and implicaons for geospaal privacy

StudyArea–TravisCounty,TX

Popula1onDensityZones Sampleofresiden1aladdresspoints

Page 14: Reverse geocoding and implicaons for geospaal privacy

ReverseGeocoding&Accuracy

•  Originalresiden1aladdresspoints–  Snaptonearestresiden1albuilding

•  TeleAtlasreversestreetgeocoding–  Submitforcommercialprocessing

•  GoogleMapsreversegeocoding–  FreeAPI

•  Accuracyofreversematches1.  Perfectmatch(streetnameandnumber)

2.  Closematch(numberwithin10)

3.  Samestreetonly

Page 15: Reverse geocoding and implicaons for geospaal privacy

Results–MatchRates(%)

GeocodingTechnique Popula1ondensity(people/km2)

<50 50to250 250to1000 1000to2500 >2500 Total

TeleAtlasAP 43.6 70.2 91.8 92.8 93.6 78.4

TeleAtlasStreet 93.2 92.8 98.0 99.8 96.8 96.1

GoogleMaps 92.2 95.6 98.2 99.0 96.2 96.2

StreetMapPro 81.2 83.6 95.8 99.0 95.8 91.1

Geoly1cs 77.0 80.2 92.4 96.2 90.2 87.2

Combined 31.4 59.0 86.0 89.6 86.8 70.6

Page 16: Reverse geocoding and implicaons for geospaal privacy

Results–SameStreetMatches

ReverseGeocodingTechnique

Aus1nAP GoogleMaps TeleAtlasStreet

GeocodingTechnique

Aus1nAP 100.0 96.7 90.1

TeleAtlasAP 99.5 92.0 89.6

GoogleMaps 99.2 32.0 90.5

TeleAtlasStreet 88.2 92.5 99.5

StreetMapPro 89.1 76.8 82.5

Geoly1cs 54.5 54.7 56.9

Page 17: Reverse geocoding and implicaons for geospaal privacy

Results–CloseReverseMatches

ReverseGeocodingTechnique

Aus1nAP GoogleMaps TeleAtlasStreet

GeocodingTechnique

Aus1nAP 100.0 95.5 23.1

TeleAtlasAP 99.1 91.8 24.3

GoogleMaps 98.8 28.6 24.3

TeleAtlasStreet 69.8 63.2 92.5

StreetMapPro 60.8 42.9 44.0

Geoly1cs 33.8 27.4 29.3

Page 18: Reverse geocoding and implicaons for geospaal privacy

Results–PerfectReverseMatches

ReverseGeocodingTechnique

Aus1nAP GoogleMaps TeleAtlasStreet

GeocodingTechnique

Aus1nAP 100.0 94.2 9.1

TeleAtlasAP 97.9 91.8 9.0

GoogleMaps 97.2 27.8 9.1

TeleAtlasStreet 18.7 9.4 55.0

StreetMapPro 17.0 7.3 8.2

Geoly1cs 7.1 2.3 2.6

Page 19: Reverse geocoding and implicaons for geospaal privacy

EffectofPopula1onDensity–PercentPerfectMatches

Geocoding Reverse Popula1ondensity(people/km2)

<50 50to250 250to1000 1000to2500 >2500

StreetMapPro Aus1nAP 22.9 16.9 15.3 16.1 17.3

Geoly1cs Aus1nAP 14.0 7.1 7.0 5.4 6.5

Aus1nAP TeleAtlasStreet 2.5 6.8 11.6 11.4 8.3

TeleAtlasStreet TeleAtlasStreet 51.6 63.7 56.7 53.8 49.8

GoogleMaps TeleAtlasStreet 4.5 6.4 12.3 11.2 7.4

TeleAtlasStreet GoogleMaps 9.6 9.8 8.8 9.6 9.4

Geoly1cs GoogleMaps 1.9 2.0 2.6 2.5 2.3

Page 20: Reverse geocoding and implicaons for geospaal privacy

ResultsSummary

•  Accuracyofreversegeocodingvariesgreatly•  Buildinglevel(reverse)geocodingistypicallymostaccurate•  Streetgeocodingisquitenoisy

–  Easytogettherightstreet–  Veryfewperfectmatches

•  Accuracyissubstan1allyimprovedifknowledgeoftheoriginalgeocodingtechniqueisavailable!

•  NoclearpaRernwithpopula1ondensity

Page 21: Reverse geocoding and implicaons for geospaal privacy

RoopopGeocodinginGoogleMapsandVirtualEarth

Page 22: Reverse geocoding and implicaons for geospaal privacy

Commercial Address Points - TeleAtlas

Source: TeleAtlas, 2009

Licensed to Google, Virtual Earth, ArcGIS Business Analyst, Pitney Bowes / Group 1 / MapInfo

51 million address points in the US

Page 23: Reverse geocoding and implicaons for geospaal privacy

ReverseGeocoding

ReversegeocodingnowsupportedinGoogleMapsandMicrosopVirtualEarth

AlsosupportedinlatestversionofArcGIS–requiressomecustomiza1onorArcWebservices

Numerousfreeeasy‐to‐useonlineu1li1es

Page 24: Reverse geocoding and implicaons for geospaal privacy

Conclusions

•  Accuracyofreversegeocoding–  Variesgreatlywithgeocoder/reversegeocodercombina1on

–  Between2%and98%perfectreversematches

•  Knowledgeoforiginalgeocodingmethodiscri1cal–  noisyresultsfromstreetgeocodingcanbereversecoded

•  Trends:–  Addresspointsarethenewstandardingeocoding–  Reversegeocodingisrela1velyeasy

•  Techniquestoprotectprivacymayneedtoassumeaworst‐casescenario:veryhighresolu1onaddressdata

Page 25: Reverse geocoding and implicaons for geospaal privacy

FutureResearch

•  Replicateinotherstudyareas•  Examineurban/ruralgradientsmoreclosely•  Experimentwithdifferentmaskingtechniques•  Developaframeworkforspa1alκ‐anonymity

Page 26: Reverse geocoding and implicaons for geospaal privacy

Acknowledgements

•  Na1onalScienceFounda1on•  UNMResearchAlloca1onCommiRee•  AmericanCivilLiber1esUnion