(almost) everything you ever wanted to know about geo (with woeids)
Post on 11-Sep-2014
DESCRIPTION
"(Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)"; presented on March 10th, 2010 at the London Twitter DevNest 7, at the Sun Customer Briefing Centre in London.
TRANSCRIPT
London Twitter #devnest 7, March 2010
(Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)…
Gary Gale, Yahoo! Geo Technologies
the agenda
louisvolant on Flickr : http://www.flickr.com/photos/27048731@N03/4003756731/
the agenda
• the hello
• the WOEIDs
• the WTF?
• the background
• the geocoding and the geoparsing
• the frustration
• the WOEIDs redux
• the APIs
• the demo
• the goodbye
KELLYLEEBARRETT on Flickr : http://www.flickr.com/photos/kellylee/4177529745/
Gary Gale on Flickr : http://www.flickr.com/photos/vicchi/4414198544/
WOEIDs
stevefaeembra on Flickr : http://www.flickr.com/photos/stevefaeembra/3567750853/
1258934244418
David Armano on Flickr : http://www.flickr.com/photos/7855449@N02/3158864420/
some background
blakophoto on Flickr : http://www.flickr.com/photos/cleveralias/3158810304/
let’s talk about geocoding
inF! on Flickr : http://www.flickr.com/photos/nathanbarrow/3339245753/
geocoding is the process of finding associated geographic coordinates (often expressed as latitude and longitude) from other geographic data, such as street addresses, or zip codes (postal codes).
reverse geocoding is the process of back (reverse) coding of a point location (latitude, longitude) to a readable address or place name.
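The two definitions above can be sketched against a toy gazetteer. This is only an illustration: the function names and the table are hypothetical, London's centroid is the one shown later in the slides, and the Paris coordinates are an approximate value added for the example. Reverse geocoding is reduced here to "nearest known centroid", which a real service would refine considerably.

```php
<?php
// Toy gazetteer: name => [latitude, longitude] (illustrative data only).
const TOY_GAZETTEER = [
    'London' => [51.5063, -0.12714],
    'Paris'  => [48.8566, 2.3522],
];

// Geocoding: place name in, coordinates out (null if unknown).
function geocode(string $name): ?array
{
    return TOY_GAZETTEER[$name] ?? null;
}

// Great-circle distance in kilometres (haversine formula).
function haversine_km(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadiusKm = 6371.0;
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    return 2 * $earthRadiusKm * asin(min(1.0, sqrt($a)));
}

// Reverse geocoding: coordinates in, name of the nearest known place out.
function reverse_geocode(float $lat, float $lon): string
{
    $best = '';
    $bestKm = INF;
    foreach (TOY_GAZETTEER as $name => [$placeLat, $placeLon]) {
        $km = haversine_km($lat, $lon, $placeLat, $placeLon);
        if ($km < $bestKm) {
            $bestKm = $km;
            $best = $name;
        }
    }
    return $best;
}
```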
noway on Flickr : http://www.flickr.com/photos/noway/78606643/
what? where?
what? (maybe) where? (maybe)
this is not geocoding, this is geoparsing
szim90 on Flickr : http://www.flickr.com/photos/szim90/272670479/
geoparsing is the process of assigning geographic identifiers (e.g., codes or geographic coordinates expressed as latitude-longitude) to textual words and phrases that occur in unstructured content.
cheap flights from london to paris in october
“I’m sorry dave; I can’t find that place”
Jamison Judd on Flickr : http://www.flickr.com/photos/jamisonjudd/2433102356/
web servers
51° 30' 50.0868", 0° 7' 42.8514" (125 Shaftesbury Avenue, London, UK)
163.1.117.210 (Oxford, UK)
20442/6015 (Brest, France)
#C5243B212 (Wilmington, Delaware, USA)
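The first of these identifiers is a latitude and longitude in degrees, minutes and seconds. A minimal sketch of converting it to the signed decimal degrees that most APIs expect follows. The function name is hypothetical, and the slide gives no hemispheres, so north and west are assumed here because they match a central London address.

```php
<?php
// Convert degrees/minutes/seconds to signed decimal degrees.
// $hemisphere is 'N', 'S', 'E' or 'W'; south and west come out negative.
function dms_to_decimal(float $deg, float $min, float $sec, string $hemisphere): float
{
    $decimal = $deg + $min / 60 + $sec / 3600;
    return in_array(strtoupper($hemisphere), ['S', 'W'], true) ? -$decimal : $decimal;
}

// The slide's coordinates; the N and W hemispheres are an assumption.
$lat = dms_to_decimal(51, 30, 50.0868, 'N');
$lon = dms_to_decimal(0, 7, 42.8514, 'W');
printf("%.6f, %.6f\n", $lat, $lon);
```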
National Library NZ on The Commons on Flickr : http://www.flickr.com/photos/nationallibrarynz_commons/3326203787/
web surfers
The West End
Downtown
The Shops
The High Street
The Online World
Formal, normalised, structured, regular
The Offline World
Informal, eccentric, bizarre, irregular
The Real World
“We Are Here”
cheap flights from london to paris in october
London
Paris
1) Tokenize
2) Remove common words
3) Remove words not in gazetteer
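The three steps can be sketched as a naive geoparser. Everything here is a toy: the function name, the stop-word list and the four-entry gazetteer are assumptions made for illustration. The gazetteer deliberately includes "in" and "to", since short tokens like these can also be place names or country codes.

```php
<?php
// Toy gazetteer of lower-cased names and codes (illustrative, not real data).
const GAZETTEER = ['london', 'paris', 'in', 'to'];
// Toy list of common words (assumed).
const STOP_WORDS = ['cheap', 'from', 'to', 'in', 'the', 'a'];

function naive_geoparse(string $text, array $gazetteer, array $stopWords): array
{
    // 1) Tokenize: split on anything that is not a letter.
    //    strtolower() only folds ASCII, which is enough for this example.
    $tokens = preg_split('/[^\p{L}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    // 2) Remove common words.
    $tokens = array_diff($tokens, $stopWords);
    // 3) Remove words not in the gazetteer.
    return array_values(array_intersect($tokens, $gazetteer));
}

$query = 'cheap flights from london to paris in october';
print_r(naive_geoparse($query, GAZETTEER, STOP_WORDS));
// Skipping step 2 lets "to" and "in" through as candidate places.
print_r(naive_geoparse($query, GAZETTEER, []));
```

The second call shows why this approach is fragile: whether "in" is a preposition or a place depends on context that a token filter cannot see.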
“in”… India?
bodhitjal on Flickr : http://www.flickr.com/photos/bodhithaj/361857780/
“in”… Indiana?
OZinOH on Flickr : http://www.flickr.com/photos/75905404@N00/505688957/
“to”… Tonga?
j_buswell on Flickr : http://www.flickr.com/photos/j_buswell/3683814556/
language
Jovike on Flickr : http://www.flickr.com/photos/jvk/19894053/
Thé? a town in Burgundy, France
To? a town in Ibaraki prefecture, Japan
AND? ISO 3166-1 Alpha-3 for Andorra
Å? a town in Nordland Fylke, Norway
IN? ISO 3166-1 Alpha-2 for India
Is? another town in Burgundy, France
IT? ISO 3166-1 Alpha-2 for Italy
You? a town in Yatenga, Burkina Faso
That? a town in Rajasthan, India
may cause frustration
paloaltosoftware on Flickr : http://www.flickr.com/photos/paloalto/3038701605/
disambiguation
Koen Vereeken on Flickr : http://www.flickr.com/photos/koenvereeken/2088902012/
this is peru …
and so is this (in argentina)
and so is this (in bolivia)
semantics required
dullhunk on Flickr : http://www.flickr.com/photos/dullhunk/3525013547/
Hilton, Paris vs. Paris Hilton
London vs. Jack London
Panama vs. Panama Hats
who uses official names anyway?
takomabibelot on Flickr : http://www.flickr.com/photos/takomabibelot/234301712/
MOMA NYC
paula moya on Flickr : http://www.flickr.com/photos/40351463@N00/745012335/
Museum of Modern Art, New York
Millennium Wheel
hismith83 on Flickr : http://www.flickr.com/photos/hismith83/200701961/
London Eye
San Francisco
SF Brit on Flickr : http://www.flickr.com/photos/cnbattson/192162591/
City and County of San Francisco
WOEIDs (redux)
stevefaeembra on Flickr : http://www.flickr.com/photos/stevefaeembra/3567750853/
1258934244418
51° 30' 50.0868", 0° 7' 42.8514"
Unique
Permanent
Global
Language Neutral
• London = Londra = Londres = ロンドン
• United States = États-Unis = Stati Uniti = 미국
Ensures that geography can be employed consistently and globally
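One way to picture language neutrality is a lookup in which every spelling of a place resolves to the same identifier. The sketch below is hypothetical (the table and function are invented for illustration); the names and the two WOEIDs, 44418 for London and 23424975 for the United Kingdom, are the ones that appear in the slides.

```php
<?php
// Many names, one identifier per place (illustrative table only).
const NAME_TO_WOEID = [
    'London'                 => 44418,
    'Londra'                 => 44418,
    'Londres'                => 44418,
    'ロンドン'               => 44418,
    'United Kingdom'         => 23424975,
    'Vereinigtes Königreich' => 23424975,
    'Royaume Uni'            => 23424975,
    'イギリス'               => 23424975,
];

// Resolve a name in any language to its WOEID, or null if unknown.
function woeid_for(string $name): ?int
{
    return NAME_TO_WOEID[$name] ?? null;
}

// Two different spellings compare equal once reduced to a WOEID.
var_dump(woeid_for('Londres') === woeid_for('ロンドン'));
```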
straup on Flickr : http://www.flickr.com/photos/straup/3504862388/
GeoPlanet
A Global Location Repository
Names + Geometry + Topology
WOEIDs for
• cities and towns
• postal codes, airports
• admin regions, time zones
• telephone code areas
• marketing areas
• points of interest
• colloquial areas
• neighbourhoods
woodleywonderworks on Flickr : http://www.flickr.com/photos/wwworks/2222523978/
Continents
Countries
Counties
Regions
Colloquials
Targeting Zones
Postal Codes
Area Codes
Boroughs
Neighbourhoods
POIs
Stratford-upon-Avon (Town): 36424
CV37 (ZIP): 26787646
Stratford-on-Avon (District): 12696101
Warwickshire (County): 12602190
England (Country): 24554868
United Kingdom = Vereinigtes Königreich = Royaume Uni = イギリス (Country): 23424975
Earth (Supername): 1
Europe (Continent): 24865675
Great Britain (Colloquial): 28298150
Worcestershire (County): 12602192
Warwick (Town): 39228
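The hierarchy on this slide can be walked as a chain of parent links. In the sketch below the names and WOEIDs come from the slide, but the table structure, the function and the exact parent links are assumptions read off the slide's layout, not output from GeoPlanet.

```php
<?php
// WOEID => [name, parent WOEID]. Parent links are assumed for illustration.
const PLACE_TREE = [
    36424    => ['Stratford-upon-Avon', 12696101],
    12696101 => ['Stratford-on-Avon', 12602190],
    12602190 => ['Warwickshire', 24554868],
    24554868 => ['England', 23424975],
    23424975 => ['United Kingdom', 1],
    1        => ['Earth', null],
];

// Return the names of all ancestors of a place, nearest first.
function ancestors(int $woeid, array $places): array
{
    $chain = [];
    $seen = [];
    $parent = $places[$woeid][1] ?? null;
    // Stop at the root, at an unknown parent, or if a cycle is detected.
    while ($parent !== null && isset($places[$parent]) && !isset($seen[$parent])) {
        $seen[$parent] = true;
        $chain[] = $places[$parent][0];
        $parent = $places[$parent][1];
    }
    return $chain;
}

print_r(ancestors(36424, PLACE_TREE));
```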
http://engineering.twitter.com/2010/02/woeids-in-twitters-trends.html
http://isithackday.com/hacks/placemaker/tweet-locations.php
http://wherein.yahooapis.com/v1/document
unlock your api
sam.d on Flickr : http://www.flickr.com/photos/samd/65693717/
https://developer.apps.yahoo.com/wsregapp/
Placemaker Parameters
• appid: 100% mandatory
• inputLanguage: en-US, fr-CA, …
• outputType: XML or RSS
• documentContent: text to geoparse
• documentTitle: optional title
• documentURL: URL to geoparse
• documentType: MIME type of doc
• autoDisambiguate: remove duplicates
• focusWoeid: filter around a WOEID
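As a small sketch of how these parameters might be assembled into a POST body, the helper below uses PHP's `http_build_query` instead of concatenating strings by hand. The function name and its defaults are hypothetical; only the parameter names come from the slide.

```php
<?php
// Build an application/x-www-form-urlencoded body for a Placemaker request.
// $options may carry any of the optional parameters, e.g. inputLanguage.
function placemaker_post_fields(string $appid, string $content, array $options = []): string
{
    $params = array_merge(
        // Defaults assumed for this sketch.
        ['documentType' => 'text/plain', 'outputType' => 'xml'],
        $options,
        // Mandatory values always win over anything passed in $options.
        ['appid' => $appid, 'documentContent' => $content]
    );
    return http_build_query($params);
}

echo placemaker_post_fields(
    'your-app-id',
    'cheap flights from london to paris in october',
    ['inputLanguage' => 'en-US']
), "\n";
```

The resulting string can be passed to `CURLOPT_POSTFIELDS`; letting `http_build_query` do the encoding avoids forgetting to escape a value.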
// POST to Placemaker
define('POSTURL', 'http://wherein.yahooapis.com/v1/document');
define('POSTVARS', 'appid='.$key.
       '&documentContent='.urlencode($content).
       '&documentType=text/plain&outputType=xml'.$lang);
$ch = curl_init(POSTURL);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, POSTVARS);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$placemaker = curl_exec($ch);
curl_close($ch);
places
that_james on Flickr : http://www.flickr.com/photos/that_james/496797309/
<placeDetails>
  <place>
    <woeId>44418</woeId>
    <type>Town</type>
    <name><![CDATA[London, England, GB]]></name>
    <centroid>
      <latitude>51.5063</latitude>
      <longitude>-0.12714</longitude>
    </centroid>
  </place>
  <matchType>0</matchType>
  <weight>1</weight>
  <confidence>10</confidence>
</placeDetails>
One place for WOEID 44418
references
misterbisson on Flickr : http://www.flickr.com/photos/maisonbisson/117720946/
<reference>
  <woeIds>44418</woeIds>
  <start>1079</start>
  <end>1089</end>
  <isPlaintextMarker>1</isPlaintextMarker>
  <text><![CDATA[London, UK]]></text>
  <type>plaintext</type>
  <xpath><![CDATA[]]></xpath>
</reference>
<reference>
  <woeIds>44418</woeIds>
  <start>1116</start>
  <end>1126</end>
  <isPlaintextMarker>1</isPlaintextMarker>
  <text><![CDATA[London, UK]]></text>
  <type>plaintext</type>
  <xpath><![CDATA[]]></xpath>
</reference>
Two references for WOEID 44418
// turn into a PHP object and loop over the results
$places = simplexml_load_string($placemaker, 'SimpleXMLElement',
                                LIBXML_NOCDATA);
if($places->document->placeDetails){
  $foundplaces = array();
  // create a hashmap of the places found to mix with
  // the references found
  foreach($places->document->placeDetails as $p){
    $wkey = 'woeid'.$p->place->woeId;
    $foundplaces[$wkey]=array(
      'name'=>str_replace(', ZZ','',$p->place->name).'',
      'type'=>$p->place->type.'',
      'woeId'=>$p->place->woeId.'',
      'lat'=>$p->place->centroid->latitude.'',
      'lon'=>$p->place->centroid->longitude.''
    );
  }
}
// loop over references and filter out duplicates
$refs = $places->document->referenceList->reference;
$usedwoeids = array();
foreach($refs as $r){
  foreach($r->woeIds as $wi){
    if(in_array($wi,$usedwoeids)){
      continue;
    } else {
      $usedwoeids[] = $wi.'';
    }
    $currentloc = $foundplaces["woeid".$wi];
    if($r->text!='' && $currentloc['name']!='' &&
       $currentloc['lat']!='' && $currentloc['lon']!=''){
      $text = preg_replace('/\s+/',' ',$r->text);
      $name = addslashes(str_replace(', ZZ','',
                                     $currentloc['name']));
      $desc = addslashes($text);
      $lat = $currentloc['lat'];
      $lon = $currentloc['lon'];
      $class = stripslashes($desc)."|$name|$lat|$lon";
      // the next statement is cut off on the slide
      $placelist.= "<li>".
    }
  }
}
http://www.vicchi.org/speaking
the internet is broken
Nesster on Flickr : http://www.flickr.com/photos/nesster/3168425434/
// load the URL, using YQL to filter the HTML
// and fix UTF-8 nasties
$url = 'http://www.vicchi.org/speaking';
$realurl = 'http://query.yahooapis.com/v1/public/yql'.
           '?q=select%20*%20from%20html%20where%20url%20%3D%20%22'.
           urlencode($url).'%22&format=xml';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $realurl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$c = curl_exec($ch);
curl_close($ch);
if(strstr($c,'<')){
  $c = preg_replace("/.*<results>|<\/results>.*/",'',$c);
  $c = preg_replace("/<\?xml version=\"1\.0\"".
                    " encoding=\"UTF-8\"\?>/",'',$c);
  $c = strip_tags($c);
  // collapse line breaks (CRLF or LF) into single spaces
  $c = preg_replace("/(\r?\n)+/"," ",$c);
}
minor annoyances
swooshthesnail on Flickr : http://www.flickr.com/photos/swooshthesnail/3281681399/
50,000 bytes
ASurroca on Flickr : http://www.flickr.com/photos/asurroca/147049402/
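Assuming the "50,000 bytes" above refers to a cap on the size of the text sent in one request, one workaround is to split longer documents into pieces and geoparse each piece separately. The helper below is a hypothetical sketch of that idea, not part of any Yahoo! API. It breaks only on whitespace, so multi-byte UTF-8 characters are never cut in half.

```php
<?php
// Split $text into chunks of at most $maxBytes bytes, breaking on whitespace.
// A single word longer than $maxBytes is kept whole and will exceed the limit.
function chunk_by_bytes(string $text, int $maxBytes = 50000): array
{
    // preg_split returns false on invalid UTF-8; treat that as "no words".
    $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $chunks = [];
    $current = '';
    foreach ($words as $word) {
        $candidate = $current === '' ? $word : $current . ' ' . $word;
        // strlen() counts bytes, not characters, which is what matters here.
        if (strlen($candidate) <= $maxBytes) {
            $current = $candidate;
            continue;
        }
        if ($current !== '') {
            $chunks[] = $current;
        }
        $current = $word;
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}
```

A place name that straddles a chunk boundary (for example "New" at the end of one chunk and "York" at the start of the next) would be missed, so overlapping the chunks slightly may be worth considering.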
no json
post not get
sludgegulper on Flickr : http://www.flickr.com/photos/sludgeulper/2645478209/
http://where.yahooapis.com/v1/
collections
bradman334 on Flickr : http://www.flickr.com/photos/bradman334/3402569690/
collections
• lists of related resources, such as places
• e.g. find all places called “london”
http://where.yahooapis.com/v1/places.q('london');count=0?appid=[your id]
• e.g. find the most likely place called “london”
http://where.yahooapis.com/v1/places.q('london')?appid=[your id]
<places xmlns="http://where.yahooapis.com/v1/schema.rng" xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:start="0" yahoo:count="1" yahoo:total="22">
  <place yahoo:uri="http://where.yahooapis.com/v1/place/44418" xml:lang="en-us">
    <woeid>44418</woeid>
    <placeTypeName code="7">Town</placeTypeName>
    <name>London</name>
    <country type="Country" code="GB">United Kingdom</country>
    <admin1 type="Country" code="GB-ENG">England</admin1>
    <admin2 type="County" code="">Greater London</admin2>
    <admin3></admin3>
    <locality1 type="Town">London</locality1>
    <locality2></locality2>
    <postal></postal>
    <centroid>
      <latitude>51.506321</latitude>
      <longitude>-0.127140</longitude>
    </centroid>
    <boundingBox>
      <southWest>
        <latitude>51.261318</latitude>
        <longitude>-0.563000</longitude>
      </southWest>
      <northEast>
        <latitude>51.686031</latitude>
        <longitude>0.280360</longitude>
      </northEast>
    </boundingBox>
  </place>
</places>
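A response like this one can be reduced to a few fields with SimpleXML. The sketch below embeds a trimmed copy of the response so that it runs without a network call; the function name and the choice of fields are assumptions for illustration.

```php
<?php
// Trimmed copy of the response shown above, used as test input.
$sample = <<<XML
<places xmlns="http://where.yahooapis.com/v1/schema.rng" xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:start="0" yahoo:count="1" yahoo:total="22">
  <place yahoo:uri="http://where.yahooapis.com/v1/place/44418" xml:lang="en-us">
    <woeid>44418</woeid>
    <placeTypeName code="7">Town</placeTypeName>
    <name>London</name>
    <centroid><latitude>51.506321</latitude><longitude>-0.127140</longitude></centroid>
  </place>
</places>
XML;

// Extract WOEID, name, type and centroid for every <place> in a response.
function summarise_places(string $xml): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return [];
    }
    $out = [];
    // Property access reaches elements in the document's default namespace.
    foreach ($doc->place as $place) {
        $out[] = [
            'woeid' => (int) $place->woeid,
            'name'  => (string) $place->name,
            'type'  => (string) $place->placeTypeName,
            'lat'   => (float) $place->centroid->latitude,
            'lon'   => (float) $place->centroid->longitude,
        ];
    }
    return $out;
}

print_r(summarise_places($sample));
```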
resources
joshuarichards on Flickr : http://www.flickr.com/photos/joshywoshywoo/124671979/
resources
• unique objects that contain multiple attributes, such as a place
• e.g. get attributes for WOEID 44418
http://where.yahooapis.com/v1/place/44418?appid=[your id]
• e.g. find the most likely place called “london”
http://where.yahooapis.com/v1/places.q('london')?appid=[your id]
resources
• unique objects that contain multiple attributes, such as a place
• e.g. get places related to WOEID 44418
http://where.yahooapis.com/v1/place/44418/relation?appid=[your id]
• parent, ancestors, belongsto, neighbours, siblings, children
<?xml version="1.0" encoding="UTF-8"?>
<places xmlns="http://where.yahooapis.com/v1/schema.rng" xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:start="0" yahoo:count="10" yahoo:total="34">
  <place yahoo:uri="http://where.yahooapis.com/v1/place/12695806" xml:lang="en-us">
    <woeid>12695806</woeid>
    <placeTypeName code="10">Local Administrative Area</placeTypeName>
    <name>City of London</name>
  </place>
  <place yahoo:uri="http://where.yahooapis.com/v1/place/12695807" xml:lang="en-us">
    <woeid>12695807</woeid>
    <placeTypeName code="10">Local Administrative Area</placeTypeName>
    <name>London Borough of Camden</name>
  </place>
  <place yahoo:uri="http://where.yahooapis.com/v1/place/12695808" xml:lang="en-us">
    <woeid>12695808</woeid>
    <placeTypeName code="10">Local Administrative Area</placeTypeName>
    <name>London Borough of Hackney</name>
  </place>
  …
</places>
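The URL patterns on the collections and resources slides can be generated by two small helpers. This is a sketch under assumptions: the function names are hypothetical, and the last path segment of the relation URL is taken to be a placeholder for one of the relation names listed on the slide (parent, ancestors, children and so on).

```php
<?php
const GEOPLANET_BASE = 'http://where.yahooapis.com/v1';

// Collection URL: places matching a name. Passing a $count adds the
// ";count=N" matrix parameter, as in the "find all places" example.
function places_url(string $query, string $appid, ?int $count = null): string
{
    $url = GEOPLANET_BASE . "/places.q('" . rawurlencode($query) . "')";
    if ($count !== null) {
        $url .= ';count=' . $count;
    }
    return $url . '?appid=' . rawurlencode($appid);
}

// Resource URL: one place, or, with $relation, the places related to it.
function place_url(int $woeid, string $appid, ?string $relation = null): string
{
    $url = GEOPLANET_BASE . '/place/' . $woeid;
    if ($relation !== null) {
        $url .= '/' . rawurlencode($relation);
    }
    return $url . '?appid=' . rawurlencode($appid);
}

echo places_url('london', 'APPID', 0), "\n";
echo place_url(44418, 'APPID', 'children'), "\n";
```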
Far more than you could ever want
http://delicious.com/codepo8/geotoys
never work with children, animals or live demos
elephipelephi on Flickr : http://www.flickr.com/photos/elephipelephi/1493013250/
not taking notes?
selva on Flickr : http://www.flickr.com/photos/selva/24604141/
London Twitter #devnest 7, March 2010
(Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)…
Gary Gale, Yahoo! Geo Technologies
http://slideshare.net/vicchi
thanks for listening
Paul Keleher on Flickr : http://www.flickr.com/photos/pkeleher/1658311814/
www.ygeoblog.com
twitter.com/vicchi
twitter.com/yahoogeo