geocoding overview
OpenCage FOSSGIS 2015
Overview
I. place name disambiguation (homonyms)
– with & without spellcheck
II. Nominatim
III. other (open data) geocoders
– 2015 trends
– opportunities to share data, config, tests
IV. shared ranking/scoring data
“eiffel tower”
=> one in Paris
=> replicas around the world
=> restaurants around the world
Nominatim
● OSM data, minutely updates
● + UK postal codes, TIGER
● 1TB PostGIS
● import in C, setup scripts in PHP, Postgres stored procedures, PHP frontend, Python & PHP test suite
● autocomplete if you add Photon geocoder
● no spellcheck
other geocoders
Closed source | Open source, high resources | Open source, low resources
Google Maps | Mapzen “Pelias” | OpenStreetMap “Nominatim”
Bing/Yahoo | Mapbox “Carmen” | OpenCage (multiple)
Mapquest | Mapquest open (Nominatim) | geonames
ESRI/ArcGIS Online | Foursquare “Quattroshapes” | geocod.io (Tiger data)
Baidu | Scout | Photon (Nominatim)
Yandex | Cloudmade | geo.io (Nominatim)
TomTom DSTK (Tiger, geonames)
Amazon (Android only) SmartyStreets
Telenav ...
Nokia/Ovi/Here
Apple (iOS only)
...
trends
● SSD
● Add commercial sources
● Full builds, downloadable index
● High parallel (map/reduce, nodejs), cloud scaling, noSQL
● Community building, guidelines
● Test suites
typical features to improve
● horizontal scaling
● autocomplete
● spellcheck
● improve text parsing (App 3, 111-113b)
● crossings (Main & 2nd N, New Orleans)
● “4km north of $cityname on the N6”
● tests for non-latin alphabets
● postal code boundaries
● localsearch/POIs
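A query such as “4km north of $cityname on the N6” needs to be split into a distance, a direction, a place, and optionally a road before anything can be searched. The following is a minimal sketch of that parsing step, not how any of the geocoders listed here does it: the regular expression, the function names, and the flat 111.32 km per degree of latitude are assumptions made for illustration, and the road is only extracted, not used to snap the result.

```python
import math
import re
from typing import Optional, Tuple

# Hypothetical pattern for queries like "4km north of <place> on the <road>".
OFFSET_QUERY = re.compile(
    r"^(?P<km>\d+(?:\.\d+)?)\s*km\s+"
    r"(?P<direction>north|south|east|west)\s+of\s+"
    r"(?P<place>.+?)"
    r"(?:\s+on\s+the\s+(?P<road>.+))?$",
    re.IGNORECASE,
)

# Approximate length of one degree of latitude (assumption: spherical earth).
KM_PER_DEGREE_LAT = 111.32


def parse_offset_query(query: str) -> Optional[Tuple[float, str, str, Optional[str]]]:
    """Return (km, direction, place, road) or None if the query has another shape."""
    match = OFFSET_QUERY.match(query.strip())
    if match is None:
        return None
    return (
        float(match["km"]),
        match["direction"].lower(),
        match["place"],
        match["road"],  # None when no "on the <road>" suffix is present
    )


def apply_offset(lat: float, lon: float, km: float, direction: str) -> Tuple[float, float]:
    """Shift a coordinate by km in one of the four cardinal directions."""
    if direction in ("north", "south"):
        sign = 1 if direction == "north" else -1
        return lat + sign * km / KM_PER_DEGREE_LAT, lon
    sign = 1 if direction == "east" else -1
    # A degree of longitude shrinks with the cosine of the latitude.
    km_per_degree_lon = KM_PER_DEGREE_LAT * math.cos(math.radians(lat))
    return lat, lon + sign * km / km_per_degree_lon
```

The place part would then go through the normal geocoding path, and `apply_offset` would be applied to its coordinate.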
what should be shared
● aka. don't reinvent everything
● standard test suite to compare geocoders
● hierarchy data
● address parsing
● address formatting
● language configuration
● data parsing, e.g. OSM tags
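A standard test suite to compare geocoders could be as small as a list of queries, expected coordinates, and a distance tolerance. The sketch below is hypothetical: `GeocodeCase`, `run_suite`, and the two stand-in geocoder functions are names invented for illustration and do not belong to any existing project. Real geocoders would be wrapped in functions with the same signature.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

LatLon = Tuple[float, float]
Geocoder = Callable[[str], Optional[LatLon]]


@dataclass
class GeocodeCase:
    query: str
    expected: LatLon
    tolerance_km: float


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def run_suite(geocoder: Geocoder, cases: List[GeocodeCase]) -> float:
    """Return the fraction of cases where the result lies within tolerance."""
    passed = 0
    for case in cases:
        result = geocoder(case.query)
        if result is not None and haversine_km(result, case.expected) <= case.tolerance_km:
            passed += 1
    return passed / len(cases)


# One shared case: "eiffel tower" should resolve to the one in Paris.
CASES = [GeocodeCase("eiffel tower", (48.8584, 2.2945), tolerance_km=1.0)]


def geocoder_a(query: str) -> Optional[LatLon]:
    """Stand-in geocoder that returns the tower in Paris."""
    return (48.8584, 2.2945) if query == "eiffel tower" else None


def geocoder_b(query: str) -> Optional[LatLon]:
    """Stand-in geocoder that returns a replica in Las Vegas instead."""
    return (36.1126, -115.1728) if query == "eiffel tower" else None
```

Running `run_suite` with the same `CASES` against each geocoder gives one comparable pass rate per geocoder.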
openaddresses.io
● 110m addresses
● 10GB of text files
1174 SMITH CREEK WAY, BRASSFIELD, WAKE FOREST, NC 27587
732 STEWARTS ROAD, LANEXA, VA 23124
address formatting
https://github.com/lokku/address-formatting/
– configuration
– test cases for 33 countries
– reference implementation in Perl
{ country_code: 'dk', village: 'Ærøskøbing', county: 'Ærø Municipality', house_number: '17A', neighbourhood: 'Paradiset', postcode: '5970', road: 'Baggårde', state: 'Region of Southern Denmark' }
Baggårde 17A, 5970 Ærøskøbing, Denmark
Adama Asnyka 1, 59-700 Bolesławiec, Poland
CAI, Cerrito 1250, Retiro, C1010AAZ Buenos Aires, Argentina
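The idea of formatting from per-country configuration can be sketched in a few lines. This is a simplified illustration, not the address-formatting project's template format or its Perl reference implementation: the `TEMPLATES` and `COUNTRY_NAMES` tables, the `place` fallback, and the `format_address` function are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical per-country line templates; unknown fields render as empty.
TEMPLATES: Dict[str, List[str]] = {
    "dk": ["{road} {house_number}", "{postcode} {place}", "{country}"],
    "default": ["{house_number} {road}", "{place} {postcode}", "{country}"],
}

# Hypothetical lookup from country code to display name.
COUNTRY_NAMES: Dict[str, str] = {"dk": "Denmark"}


def format_address(components: Dict[str, str]) -> str:
    """Render address components using the template for their country."""
    code = components.get("country_code", "")
    fields = defaultdict(str, components)
    # Use the most specific settlement type that is present.
    fields["place"] = next(
        (components[key] for key in ("city", "town", "village") if key in components),
        "",
    )
    fields["country"] = COUNTRY_NAMES.get(code, "")
    lines = []
    for template in TEMPLATES.get(code, TEMPLATES["default"]):
        # Collapse the gaps left behind by missing fields.
        text = " ".join(template.format_map(fields).split())
        if text:
            lines.append(text)
    return ", ".join(lines)


EXAMPLE = {
    "country_code": "dk", "village": "Ærøskøbing", "county": "Ærø Municipality",
    "house_number": "17A", "neighbourhood": "Paradiset", "postcode": "5970",
    "road": "Baggårde", "state": "Region of Southern Denmark",
}
print(format_address(EXAMPLE))  # Baggårde 17A, 5970 Ærøskøbing, Denmark
```

Components that a country's template does not mention (here county, neighbourhood, state) are simply left out, which is what keeps the Danish result short.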
core geocoding logic
1. tokenize
2. filter
• fixed bounding box, browser window, country
• OSM tags/POI search
• min-max admin
3. search
4. rank
• country bias
• language bias (client, explicit)
• location boost (client, explicit, history)
• maybe: spellcheck
• maybe: retry/failover/remove phrases
• importance boost
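The four steps can be sketched as a toy pipeline over an in-memory list. Everything here is an assumption for illustration: the `Place` record, the three sample entries, their importance values, and the score weights are invented, and only a country filter, a country bias, and an importance boost are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Place:
    name: str
    country: str
    kind: str
    importance: float  # hypothetical value between 0 and 1


# Invented sample data for the "eiffel tower" example.
PLACES = [
    Place("Eiffel Tower", "fr", "attraction", 0.9),
    Place("Eiffel Tower", "us", "attraction", 0.4),  # a replica
    Place("Eiffel Tower Restaurant", "us", "restaurant", 0.2),
]


def tokenize(text: str) -> List[str]:
    """Step 1: lowercase and split on whitespace."""
    return text.lower().split()


def filter_places(places: List[Place], country: Optional[str]) -> List[Place]:
    """Step 2: hard filter, here only by country code."""
    return [p for p in places if country is None or p.country == country]


def search(places: List[Place], tokens: List[str]) -> List[Place]:
    """Step 3: keep places whose name contains every query token."""
    return [p for p in places if all(t in tokenize(p.name) for t in tokens)]


def rank(candidates: List[Place], tokens: List[str],
         country_bias: Optional[str]) -> List[Place]:
    """Step 4: order by importance plus boosts (weights are arbitrary)."""
    def score(place: Place) -> float:
        total = place.importance
        if tokenize(place.name) == tokens:
            total += 0.5  # exact name match
        if country_bias is not None and place.country == country_bias:
            total += 0.3  # soft preference, unlike the hard filter above
        return total
    return sorted(candidates, key=score, reverse=True)


def geocode(query: str, country: Optional[str] = None,
            country_bias: Optional[str] = None) -> List[Place]:
    tokens = tokenize(query)
    return rank(search(filter_places(PLACES, country), tokens), tokens, country_bias)
```

With these numbers `geocode("eiffel tower")` puts the Paris entry first, while `geocode("eiffel tower", country="us")` only ever sees the two US entries.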
map to hierarchy (ranks)
http://wiki.openstreetmap.org/wiki/Nominatim/Development_overview
name is one of many factors
ranking examples:
● Altona
– type: suburb vs train station vs town in US/Canada
● Germany
– admin_level=2 (country) vs island
● Mt Everest
– importance: viewpoint vs peak vs island
● Oktoberfest
– actually an alt_name of Theresienwiese
● Königsberg
– 10x a peak, 1x old_name of Kaliningrad
● Hitlerberg
– old_name:1934-1945 of Heigelkopf
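The examples above suggest that which tag matched (name, alt_name, old_name) has to be weighed against how important the object is. A minimal sketch of such a score follows; the tag weights, the importance values, and the function names are assumptions chosen for illustration and are not taken from Nominatim.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical weights: a current name counts more than an alternative or old one.
NAME_TAG_WEIGHTS = {"name": 1.0, "alt_name": 0.8, "old_name": 0.6}

# (OSM-style tags, invented importance between 0 and 1)
CANDIDATES: List[Tuple[Dict[str, str], float]] = [
    ({"name": "Kaliningrad", "old_name": "Königsberg", "place": "city"}, 0.7),
    ({"name": "Königsberg", "natural": "peak"}, 0.2),
    ({"name": "Theresienwiese", "alt_name": "Oktoberfest"}, 0.5),
    ({"name": "Heigelkopf", "old_name:1934-1945": "Hitlerberg", "natural": "peak"}, 0.1),
]


def match_score(tags: Dict[str, str], importance: float, query: str) -> float:
    """Best name-tag weight for an exact match, scaled by importance."""
    wanted = query.casefold()
    best = 0.0
    for key, value in tags.items():
        # "old_name:1934-1945" is treated like "old_name".
        weight = NAME_TAG_WEIGHTS.get(key.split(":")[0])
        if weight is not None and value.casefold() == wanted:
            best = max(best, weight)
    return best * importance


def best_match(query: str) -> Optional[str]:
    """Return the name of the highest-scoring candidate, or None."""
    scored = [(match_score(tags, imp, query), tags["name"]) for tags, imp in CANDIDATES]
    score, name = max(scored)
    return name if score > 0 else None
```

With these invented numbers the city wins for “Königsberg” even though it only matches on old_name, because its importance outweighs the weaker tag.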
status on wikipedia_articles.bin
● version 1: wikipedia pageview logs
– https://en.wikipedia.org/wiki/Wikipedia:Notability
● version 2 (current): parsing wikipedia articles and counting links
– last updated 2013
– 80m wikipedia entries + 15m redirects
– 0.6m places in OSM have wikipedia tag set (2013: 0.4m)
● version 3 (TBD): parsing wikipedia geo exports
– http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Georeferenzierung/Hauptseite/Wikipedia-World/en
– 3.4m entries, more languages, regular dumps, new documentation
● version 4 (?)
– used wikidata exports
– used by multiple geocoders
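Link counts span several orders of magnitude, so they are usually compressed before being used as a ranking signal. The sketch below shows one plausible way to do that with a logarithmic scale; the formula and the function name are assumptions for illustration, and the slides do not state how wikipedia_articles.bin actually derives its values.

```python
import math


def importance_from_links(link_count: int, max_link_count: int) -> float:
    """Map an incoming-link count to a 0..1 importance on a log scale.

    Assumption: the most-linked article defines the top of the scale.
    """
    if link_count <= 0 or max_link_count <= 1:
        return 0.0
    return min(1.0, math.log(link_count) / math.log(max_link_count))


# Invented counts, only to show the shape of the curve.
for count in (10, 1_000, 100_000, 1_000_000):
    print(count, round(importance_from_links(count, 1_000_000), 2))
```

On this scale a hundredfold increase in links adds the same amount of importance wherever it happens, which keeps a handful of very heavily linked articles from dwarfing everything else.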
what can mappers do?
● add wikipedia tags
● fix administrative levels
● don't add wrong names (typos)
● file bugs (github)
http://nominatim.openstreetmap.org/