(almost) everything you ever wanted to know about geo (with woeids)

London Twitter #devnest 7, March 2010 (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs) Gary Gale, Yahoo! Geo Technologies

Post on 11-Sep-2014


DESCRIPTION

"(Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)", presented on March 10th, 2010 at the London Twitter DevNest 7, at the Sun Customer Briefing Centre in London.

TRANSCRIPT

Page 1: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

London Twitter #devnest 7, March 2010

(Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)…

Gary Gale, Yahoo! Geo Technologies

Page 2: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

the agenda

louisvolant on Flickr : http://www.flickr.com/photos/27048731@N03/4003756731/

Page 3: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

the agenda

• the hello

• the WOEIDs

• the WTF?

• the background

• the geocoding and the geoparsing

• the frustration

• the WOEIDs redux

• the APIs

• the demo

• the goodbye

Page 4: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

KELLYLEEBARRETT on Flickr : http://www.flickr.com/photos/kellylee/4177529745/

Page 5: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

Gary Gale on Flickr : http://www.flickr.com/photos/vicchi/4414198544/

Page 6: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

WOEIDs

stevefaeembra on Flickr : http://www.flickr.com/photos/stevefaeembra/3567750853/

Page 7: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

1258934244418

Page 8: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

David Armano on Flickr : http://www.flickr.com/photos/7855449@N02/3158864420/

Page 9: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

some background

blakophoto on Flickr : http://www.flickr.com/photos/cleveralias/3158810304/

Page 10: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

let’s talk about geocoding

inF! on Flickr : http://www.flickr.com/photos/nathanbarrow/3339245753/

Page 11: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

geocoding is the process of finding associated geographic coordinates (often expressed as latitude and longitude) from other geographic data, such as street addresses, or zip codes (postal codes).

Page 12: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

reverse geocoding is the process of back (reverse) coding of a point location (latitude, longitude) to a readable address or place name.
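
The two definitions can be sketched against a toy gazetteer. This is only an illustration: the two-entry `GAZETTEER`, the function names and the nearest-centroid rule are assumptions, not how any production geocoder works. London's centroid is the one quoted later in this deck; the Paris coordinates are approximate.

```python
import math

# Tiny illustrative gazetteer: lower-cased name -> (latitude, longitude).
GAZETTEER = {
    "london": (51.506321, -0.127140),
    "paris": (48.8566, 2.3522),
}

def geocode(name):
    """Forward geocoding: place name -> coordinates, or None if unknown."""
    return GAZETTEER.get(name.strip().lower())

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def reverse_geocode(point):
    """Reverse geocoding: coordinates -> name of the nearest known place."""
    return min(GAZETTEER, key=lambda name: haversine_km(GAZETTEER[name], point))

print(geocode("London"))
print(reverse_geocode((51.5139, -0.1286)))
```

A real service resolves to addresses and named places rather than the nearest centroid, but the direction of each lookup is the same.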

Page 13: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

noway on Flickr : http://www.flickr.com/photos/noway/78606643/

Page 14: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

what? where?

Page 15: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)
Page 16: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

what? (maybe) where? (maybe)

Page 17: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

this is not geocoding, this is geoparsing

szim90 on Flickr : http://www.flickr.com/photos/szim90/272670479/

Page 18: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

geoparsing is the process of assigning geographic identifiers (e.g., codes or geographic coordinates expressed as latitude-longitude) to textual words and phrases that occur in unstructured content.

Page 19: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

cheap flights from london to paris in october

Page 20: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)


“I’m sorry dave; I can’t find that place”

Page 21: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

Jamison Judd on Flickr : http://www.flickr.com/photos/jamisonjudd/2433102356/

web servers

Page 22: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

51° 30' 50.0868", 0° 7' 42.8514" (125 Shaftesbury Avenue, London, UK)

163.1.117.210 (Oxford, UK)

20442/6015 (Brest, France)

#C5243B212 (Wilmington, Delaware, USA)
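
Coordinates written in degrees, minutes and seconds usually need converting to decimal degrees before an API will accept them. A minimal sketch, assuming the pair on this slide is north latitude and west longitude (the slide shows no hemisphere letters, so that is inferred from the Shaftesbury Avenue address); `dms_to_decimal` is a hypothetical helper name.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value

lat = dms_to_decimal(51, 30, 50.0868, "N")
lon = dms_to_decimal(0, 7, 42.8514, "W")
print(round(lat, 6), round(lon, 6))
```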

Page 23: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

National Library NZ on The Commons on Flickr : http://www.flickr.com/photos/nationallibrarynz_commons/3326203787/

web surfers

Page 24: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)


The West End

Downtown

The Shops

The High Street

Page 25: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

The Online World

Formal, normalised, structured, regular

The Offline World

Informal, eccentric, bizarre, irregular

The Real World

“We Are Here”

Page 26: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

cheap flights from london to paris in october

London

Paris

1) Tokenize

2) Remove common words

3) Remove words not in gazetteer
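
The three steps can be sketched as a deliberately naive geoparser. The word lists are illustrative assumptions: `COMMON_WORDS` is a hand-picked stop list, and `GAZETTEER` holds just enough entries to show why step 2 matters once short codes such as IN and TO are in the gazetteer.

```python
import re

COMMON_WORDS = {"from", "to", "in", "the", "a", "of"}

# Lower-cased token -> place it could refer to (toy data).
GAZETTEER = {
    "london": "London",
    "paris": "Paris",
    "in": "India (ISO code IN)",
    "to": "Tonga (ISO code TO)",
}

def naive_geoparse(text, drop_common=True):
    # 1) Tokenize
    tokens = re.findall(r"\w+", text.lower())
    # 2) Remove common words
    if drop_common:
        tokens = [t for t in tokens if t not in COMMON_WORDS]
    # 3) Remove words not in the gazetteer
    return [GAZETTEER[t] for t in tokens if t in GAZETTEER]

QUERY = "cheap flights from london to paris in october"
print(naive_geoparse(QUERY))
print(naive_geoparse(QUERY, drop_common=False))
```

With the stop list the query yields London and Paris; without it, "to" and "in" are reported as places too. A stop list is language-specific, so it only moves the problem.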

Page 27: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

“in”… India?

bodhitjal on Flickr : http://www.flickr.com/photos/bodhithaj/361857780/

Page 28: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

“in”… Indiana?

OZinOH on Flickr : http://www.flickr.com/photos/75905404@N00/505688957/

Page 29: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

“to”… Tonga?

j_buswell on Flickr : http://www.flickr.com/photos/j_buswell/3683814556/

Page 30: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

language

Jovike on Flickr : http://www.flickr.com/photos/jvk/19894053/

Page 31: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

Thé? a town in Burgundy, France

To? a town in Ibaraki prefecture, Japan

AND? ISO 3166-1 Alpha-3 for Andorra

Å? a town in Nordland Fylke, Norway

IN? ISO 3166-1 Alpha-2 for India

Is? another town in Burgundy, France

IT? ISO 3166-1 Alpha-2 for Italy

You? a town in Yatenga, Burkina Faso

That? a town in Rajasthan, India

Page 32: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

may cause frustration

paloaltosoftware on Flickr : http://www.flickr.com/photos/paloalto/3038701605/

Page 33: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

disambiguation

Koen Vereeken on Flickr : http://www.flickr.com/photos/koenvereeken/2088902012/

Page 34: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

this is peru …

Page 35: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

and so is this (in argentina)

Page 36: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

and so is this (in bolivia)

Page 37: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

semantics required

dullhunk on Flickr : http://www.flickr.com/photos/dullhunk/3525013547/

Page 38: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

Hilton, Paris

Paris Hilton

Page 39: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

London

Jack London

Page 40: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

Panama

Panama Hats

Page 41: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

who uses official names anyway?

takomabibelot on Flickr : http://www.flickr.com/photos/takomabibelot/234301712/

Page 42: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

MOMA NYC

paula moya on Flickr : http://www.flickr.com/photos/40351463@N00/745012335/

Museum of Modern Art, New York

Page 43: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

Millennium Wheel

hismith83 on Flickr : http://www.flickr.com/photos/hismith83/200701961/

London Eye

Page 44: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

San Francisco

SF Brit on Flickr : http://www.flickr.com/photos/cnbattson/192162591/

City and County of San Francisco

Page 45: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

WOEIDs (redux)

stevefaeembra on Flickr : http://www.flickr.com/photos/stevefaeembra/3567750853/

Page 46: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

1258934244418

Page 47: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

51° 30' 50.0868", 0° 7' 42.8514"

Page 48: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

Unique

Permanent

Global

Language Neutral

• London = Londra = Londres = ロンドン

• United States = États-Unis = Stati Uniti = 미국

Ensures that geography can be employed consistently and globally
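
One way to picture language neutrality: many names, one identifier. The sketch below uses only WOEIDs and names that appear in this deck (44418 for London, 23424975 for the United Kingdom); the lookup table and `same_place` helper are hypothetical.

```python
# Names in several languages, all resolving to a single WOEID.
NAME_TO_WOEID = {
    "London": 44418,
    "Londra": 44418,
    "Londres": 44418,
    "ロンドン": 44418,
    "United Kingdom": 23424975,
    "Vereinigtes Königreich": 23424975,
    "Royaume Uni": 23424975,
    "イギリス": 23424975,
}

def same_place(name_a, name_b):
    """True when both names are known and resolve to the same WOEID."""
    a, b = NAME_TO_WOEID.get(name_a), NAME_TO_WOEID.get(name_b)
    return a is not None and a == b

print(same_place("Londres", "ロンドン"))
print(same_place("London", "Royaume Uni"))
```

Storing the WOEID instead of the name means two records can be compared without knowing which language either was written in.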

straup on Flickr : http://www.flickr.com/photos/straup/3504862388/

Page 49: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

GeoPlanet

A Global Location Repository

Names + Geometry + Topology

WOEIDs for

• cities and towns

• postal codes, airports

• admin regions, time zones

• telephone code areas

• marketing areas

• points of interest

• colloquial areas

• neighbourhoods

woodleywonderworks on Flickr : http://www.flickr.com/photos/wwworks/2222523978/

Page 50: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

Continents

Countries

Counties

Regions

Colloquials

Targeting Zones

Postal Codes

Area Codes

Boroughs

Neighbourhoods

POIs

Page 51: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

Stratford-upon-Avon

Earth, WOEID 1 (Supername)

United Kingdom, WOEID 23424975 (Country)

• Vereinigtes Königreich

• Royaume Uni

• イギリス

England, WOEID 24554868 (Country)

Warwickshire, WOEID 12602190 (County)

Stratford-on-Avon, WOEID 12696101 (District)

Stratford-upon-Avon, WOEID 36424 (Town)

CV37, WOEID 26787646 (ZIP)

Europe, WOEID 24865675 (Continent)

Great Britain, WOEID 28298150 (Colloquial)

Worcestershire, WOEID 12602192 (County)

Warwick, WOEID 39228 (Town)
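
The slide's places form a hierarchy that can be walked from the town upwards. In this sketch the WOEIDs, names and place types are the ones on the slide, but the parent links are an assumption inferred from the order of the place types (Town, District, County, Country, Country, Supername); they are not taken from GeoPlanet output.

```python
# woeid -> (name, place type, parent woeid). Parent links are assumed.
PLACES = {
    1:        ("Earth", "Supername", None),
    23424975: ("United Kingdom", "Country", 1),
    24554868: ("England", "Country", 23424975),
    12602190: ("Warwickshire", "County", 24554868),
    12696101: ("Stratford-on-Avon", "District", 12602190),
    36424:    ("Stratford-upon-Avon", "Town", 12696101),
}

def ancestors(woeid):
    """Follow parent links upwards, returning (name, type) pairs."""
    chain = []
    parent = PLACES[woeid][2]
    while parent is not None:
        name, place_type, next_parent = PLACES[parent]
        chain.append((name, place_type))
        parent = next_parent
    return chain

for name, place_type in ancestors(36424):
    print(f"{place_type}: {name}")
```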

Page 52: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

http://engineering.twitter.com/2010/02/woeids-in-twitters-trends.html

Page 53: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

http://isithackday.com/hacks/placemaker/tweet-locations.php

Page 54: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

http://wherein.yahooapis.com/v1/document

Page 55: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

unlock your api

sam.d on Flickr : http://www.flickr.com/photos/samd/65693717/

https://developer.apps.yahoo.com/wsregapp/

Page 56: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

Placemaker Parameters

• appid: 100% mandatory

• inputLanguage: en-US, fr-CA, …

• outputType: XML or RSS

• documentContent: text to geoparse

• documentTitle: optional title

• documentURL: URL to geoparse

• documentType: MIME type of doc

• autoDisambiguate: remove duplicates

• focusWoeid: filter around a WOEID
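
As a rough illustration of how these parameters combine, the sketch below assembles a form-encoded POST body. The parameter names come from the list above; the `placemaker_body` helper, its defaults and the `YOUR_APP_ID` placeholder are assumptions, and nothing is sent over the network.

```python
from urllib.parse import urlencode, parse_qs

PLACEMAKER_URL = "http://wherein.yahooapis.com/v1/document"

def placemaker_body(appid, text, language="en-US", output="xml"):
    """Build the application/x-www-form-urlencoded body for a POST."""
    params = {
        "appid": appid,                # mandatory
        "inputLanguage": language,
        "outputType": output,
        "documentType": "text/plain",  # MIME type of the content below
        "documentContent": text,       # the text to geoparse
    }
    return urlencode(params)

body = placemaker_body("YOUR_APP_ID",
                       "cheap flights from london to paris in october")
print(PLACEMAKER_URL)
print(body)
```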

Page 57: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

// POST to Placemaker
// $key, $content and $lang are set earlier (not shown)
define('POSTURL', 'http://wherein.yahooapis.com/v1/document');
define('POSTVARS', 'appid=' . $key .
    '&documentContent=' . urlencode($content) .
    '&documentType=text/plain&outputType=xml' . $lang);

$ch = curl_init(POSTURL);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, POSTVARS);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$placemaker = curl_exec($ch);
curl_close($ch);

Page 58: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

places

that_james on Flickr : http://www.flickr.com/photos/that_james/496797309/

Page 59: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

<placeDetails>
  <place>
    <woeId>44418</woeId>
    <type>Town</type>
    <name><![CDATA[London, England, GB]]></name>
    <centroid>
      <latitude>51.5063</latitude>
      <longitude>-0.12714</longitude>
    </centroid>
  </place>
  <matchType>0</matchType>
  <weight>1</weight>
  <confidence>10</confidence>
</placeDetails>

One place for WOEID 44418
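
A minimal sketch of reading that fragment with Python's standard library. It assumes the fragment exactly as shown on the slide, without the namespaces and wrapper elements a full response would carry; `parse_place_details` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

PLACE_DETAILS = """<placeDetails><place><woeId>44418</woeId><type>Town</type>
<name><![CDATA[London, England, GB]]></name>
<centroid><latitude>51.5063</latitude><longitude>-0.12714</longitude></centroid>
</place><matchType>0</matchType><weight>1</weight>
<confidence>10</confidence></placeDetails>"""

def parse_place_details(xml_text):
    """Flatten one <placeDetails> element into a plain dict."""
    root = ET.fromstring(xml_text)
    place = root.find("place")
    return {
        "woeid": int(place.findtext("woeId")),
        "type": place.findtext("type"),
        "name": place.findtext("name"),  # CDATA is returned as ordinary text
        "lat": float(place.findtext("centroid/latitude")),
        "lon": float(place.findtext("centroid/longitude")),
        "confidence": int(root.findtext("confidence")),
    }

place = parse_place_details(PLACE_DETAILS)
print(place)
```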

Page 60: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

references

misterbisson on Flickr : http://www.flickr.com/photos/maisonbisson/117720946/

Page 61: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

<reference>
  <woeIds>44418</woeIds>
  <start>1079</start>
  <end>1089</end>
  <isPlaintextMarker>1</isPlaintextMarker>
  <text><![CDATA[London, UK]]></text>
  <type>plaintext</type>
  <xpath><![CDATA[]]></xpath>
</reference>
<reference>
  <woeIds>44418</woeIds>
  <start>1116</start>
  <end>1126</end>
  <isPlaintextMarker>1</isPlaintextMarker>
  <text><![CDATA[London, UK]]></text>
  <type>plaintext</type>
  <xpath><![CDATA[]]></xpath>
</reference>

Two references for WOEID 44418
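
References say where in the text a place was mentioned, so they are naturally grouped by WOEID. A sketch under two assumptions: the references are wrapped in a `referenceList` element (the name used in the PHP later in this deck), and a `woeIds` element may hold several space-separated IDs. The XML is a trimmed copy of the slide's fragment.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

REFERENCES = """<referenceList>
<reference><woeIds>44418</woeIds><start>1079</start><end>1089</end>
<text><![CDATA[London, UK]]></text></reference>
<reference><woeIds>44418</woeIds><start>1116</start><end>1126</end>
<text><![CDATA[London, UK]]></text></reference>
</referenceList>"""

def spans_by_woeid(xml_text):
    """Map each WOEID to the (start, end) offsets where it was found."""
    spans = defaultdict(list)
    for ref in ET.fromstring(xml_text).iter("reference"):
        span = (int(ref.findtext("start")), int(ref.findtext("end")))
        for woeid in ref.findtext("woeIds").split():
            spans[int(woeid)].append(span)
    return dict(spans)

print(spans_by_woeid(REFERENCES))
```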


Page 62: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

// turn into a PHP object and loop over the results
$places = simplexml_load_string($placemaker, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($places->document->placeDetails) {
    $foundplaces = array();
    // create a hashmap of the places found to mix with
    // the references found
    foreach ($places->document->placeDetails as $p) {
        $wkey = 'woeid' . $p->place->woeId;
        // appending '' casts each SimpleXML node to a plain string
        $foundplaces[$wkey] = array(
            'name'  => str_replace(', ZZ', '', $p->place->name) . '',
            'type'  => $p->place->type . '',
            'woeId' => $p->place->woeId . '',
            'lat'   => $p->place->centroid->latitude . '',
            'lon'   => $p->place->centroid->longitude . ''
        );
    }
}

Page 63: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

// loop over references and filter out duplicates
$refs = $places->document->referenceList->reference;
$usedwoeids = array();
foreach ($refs as $r) {
    foreach ($r->woeIds as $wi) {
        // skip any WOEID that has already been listed
        if (in_array($wi . '', $usedwoeids)) {
            continue;
        } else {
            $usedwoeids[] = $wi . '';
        }
        $currentloc = $foundplaces["woeid" . $wi];
        if ($r->text != '' && $currentloc['name'] != '' &&
            $currentloc['lat'] != '' && $currentloc['lon'] != '') {
            $text = preg_replace('/\s+/', ' ', $r->text);
            $name = addslashes(str_replace(', ZZ', '', $currentloc['name']));
            $desc = addslashes($text);
            $lat = $currentloc['lat'];
            $lon = $currentloc['lon'];
            $class = stripslashes($desc) . "|$name|$lat|$lon";
            $placelist .= "<li>" . …
        }
    }
}

Page 64: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

http://www.vicchi.org/speaking

Page 65: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)
Page 66: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

the internet is broken

Nesster on Flickr : http://www.flickr.com/photos/nesster/3168425434/

Page 67: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

// load the URL, using YQL to filter the HTML
// and fix UTF-8 nasties
$url = 'http://www.vicchi.org/speaking';
$realurl = 'http://query.yahooapis.com/v1/public/yql' .
    '?q=select%20*%20from%20html%20where%20url%20%3D%20%22' .
    urlencode($url) . '%22&format=xml';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $realurl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$c = curl_exec($ch);
curl_close($ch);
if (strstr($c, '<')) {
    // keep only what sits inside <results>…</results>
    $c = preg_replace("/.*<results>|<\/results>.*/", '', $c);
    $c = preg_replace("/<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>/", '', $c);
    $c = strip_tags($c);
    // collapse line breaks into single spaces
    $c = preg_replace("/[\r\n]+/", " ", $c);
}

Page 68: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

minor annoyances

swooshthesnail on Flickr : http://www.flickr.com/photos/swooshthesnail/3281681399/

Page 69: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

50,000 bytes

ASurroca on Flickr : http://www.flickr.com/photos/asurroca/147049402/
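
If the 50,000 bytes is read as a cap on the size of a single document submitted for geoparsing, longer text has to be split first. A sketch under that assumption; `chunk_by_bytes` is a hypothetical helper that splits on whitespace and measures UTF-8 bytes rather than characters. A single word longer than the limit is passed through unsplit.

```python
def chunk_by_bytes(text, limit=50_000):
    """Split text on whitespace into chunks of at most `limit` UTF-8 bytes."""
    chunks, current, size = [], [], 0
    for word in text.split():
        word_size = len(word.encode("utf-8")) + 1  # +1 for the joining space
        if current and size + word_size > limit:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(word)
        size += word_size
    if current:
        chunks.append(" ".join(current))
    return chunks

sample = "ロンドン london " * 5000
parts = chunk_by_bytes(sample)
print(len(parts), [len(p.encode("utf-8")) for p in parts])
```

Each chunk is geoparsed separately, so reference offsets come back relative to the chunk and need shifting to map onto the original text.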

Page 70: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

no json

X

Page 71: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

post not get

sludgegulper on Flickr : http://www.flickr.com/photos/sludgeulper/2645478209/

Page 72: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

http://where.yahooapis.com/v1/

Page 73: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

collections

bradman334 on Flickr : http://www.flickr.com/photos/bradman334/3402569690/

Page 74: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)


collections

• lists of related resources, such as places

• e.g. find all places called “london”

http://where.yahooapis.com/v1/places.q('london');count=0?appid=[your id]

• e.g. find the most likely place called “london”

http://where.yahooapis.com/v1/places.q('london')?appid=[your id]
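
Both collection URLs follow one pattern, which a small helper can capture. This sketch only builds the URL string in the shape shown on the slide; `places_query_url` and the `APPID` placeholder are hypothetical.

```python
from urllib.parse import quote

BASE = "http://where.yahooapis.com/v1"

def places_query_url(name, appid, count=None):
    """Build a places.q(...) collection URL in the shape used on the slide."""
    url = f"{BASE}/places.q('{quote(name)}')"
    if count is not None:
        url += f";count={count}"  # the slide uses count=0 for "all places"
    return f"{url}?appid={quote(appid)}"

print(places_query_url("london", "APPID", count=0))
print(places_query_url("london", "APPID"))
```

Percent-encoding the name keeps spaces and apostrophes from breaking the quoted `q('…')` segment.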

Page 75: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

<places xmlns="http://where.yahooapis.com/v1/schema.rng"
        xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
        yahoo:start="0" yahoo:count="1" yahoo:total="22">
  <place yahoo:uri="http://where.yahooapis.com/v1/place/44418" xml:lang="en-us">
    <woeid>44418</woeid>
    <placeTypeName code="7">Town</placeTypeName>
    <name>London</name>
    <country type="Country" code="GB">United Kingdom</country>
    <admin1 type="Country" code="GB-ENG">England</admin1>
    <admin2 type="County" code="">Greater London</admin2>
    <admin3></admin3>
    <locality1 type="Town">London</locality1>
    <locality2></locality2>
    <postal></postal>
    <centroid>
      <latitude>51.506321</latitude>
      <longitude>-0.127140</longitude>
    </centroid>
    <boundingBox>
      <southWest>
        <latitude>51.261318</latitude>
        <longitude>-0.563000</longitude>
      </southWest>
      <northEast>
        <latitude>51.686031</latitude>
        <longitude>0.280360</longitude>
      </northEast>
    </boundingBox>
  </place>
</places>
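
Unlike the Placemaker fragments, this response declares a default namespace, so every element lookup has to be namespace-qualified. A sketch using a trimmed copy of the response above; the `gp` prefix and helper names are arbitrary choices.

```python
import xml.etree.ElementTree as ET

NS = {"gp": "http://where.yahooapis.com/v1/schema.rng"}
YAHOO = "{http://www.yahooapis.com/v1/base.rng}"

RESPONSE = """<places xmlns="http://where.yahooapis.com/v1/schema.rng"
 xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
 yahoo:start="0" yahoo:count="1" yahoo:total="22">
<place yahoo:uri="http://where.yahooapis.com/v1/place/44418">
<woeid>44418</woeid><name>London</name>
<centroid><latitude>51.506321</latitude><longitude>-0.127140</longitude></centroid>
<boundingBox>
<southWest><latitude>51.261318</latitude><longitude>-0.563000</longitude></southWest>
<northEast><latitude>51.686031</latitude><longitude>0.280360</longitude></northEast>
</boundingBox></place></places>"""

def lat_lon(element, path):
    """Read a (latitude, longitude) pair found under `path`."""
    return (float(element.findtext(f"{path}/gp:latitude", namespaces=NS)),
            float(element.findtext(f"{path}/gp:longitude", namespaces=NS)))

def parse_places(xml_text):
    """Return (total matches, list of place dicts)."""
    root = ET.fromstring(xml_text)
    places = [{
        "woeid": int(p.findtext("gp:woeid", namespaces=NS)),
        "name": p.findtext("gp:name", namespaces=NS),
        "centroid": lat_lon(p, "gp:centroid"),
        "south_west": lat_lon(p, "gp:boundingBox/gp:southWest"),
        "north_east": lat_lon(p, "gp:boundingBox/gp:northEast"),
    } for p in root.findall("gp:place", NS)]
    return int(root.get(YAHOO + "total")), places

total, places = parse_places(RESPONSE)
print(total, places[0]["name"], places[0]["centroid"])
```

`yahoo:total` is 22 while only one place is returned, which matches the "most likely place" query: the other candidates exist but are not listed.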

Page 76: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

resources

joshuarichards on Flickr : http://www.flickr.com/photos/joshywoshywoo/124671979/

Page 77: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)


resources

• unique objects that contain multiple attributes, such as a place

• e.g. get attributes for WOEID 44418

http://where.yahooapis.com/v1/place/44418?appid=[your id]

• e.g. find the most likely place called “london”

http://where.yahooapis.com/v1/places.q('london')?appid=[your id]

Page 78: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)


resources

• unique objects that contain multiple attributes, such as a place

• e.g. get places related to WOEID 44418

http://where.yahooapis.com/v1/place/44418/relation?appid=[your id]

• parent, ancestors, belongsto, neighbours, siblings, children
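
Reading `relation` in the URL above as a placeholder for one of the listed names (an assumption; the slide does not say so explicitly), the related-place URLs can be generated as below. The relation names are spelled as on the slide, which may not match the exact path segments the service accepted.

```python
RELATIONS = ("parent", "ancestors", "belongsto",
             "neighbours", "siblings", "children")

def relation_url(woeid, relation, appid):
    """Build a related-places URL for one of the slide's relation names."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    return f"http://where.yahooapis.com/v1/place/{woeid}/{relation}?appid={appid}"

for relation in RELATIONS:
    print(relation_url(44418, relation, "APPID"))
```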

Page 79: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

<?xml version="1.0" encoding="UTF-8"?>
<places xmlns="http://where.yahooapis.com/v1/schema.rng"
        xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
        yahoo:start="0" yahoo:count="10" yahoo:total="34">
  <place yahoo:uri="http://where.yahooapis.com/v1/place/12695806" xml:lang="en-us">
    <woeid>12695806</woeid>
    <placeTypeName code="10">Local Administrative Area</placeTypeName>
    <name>City of London</name>
  </place>
  <place yahoo:uri="http://where.yahooapis.com/v1/place/12695807" xml:lang="en-us">
    <woeid>12695807</woeid>
    <placeTypeName code="10">Local Administrative Area</placeTypeName>
    <name>London Borough of Camden</name>
  </place>
  <place yahoo:uri="http://where.yahooapis.com/v1/place/12695808" xml:lang="en-us">
    <woeid>12695808</woeid>
    <placeTypeName code="10">Local Administrative Area</placeTypeName>
    <name>London Borough of Hackney</name>
  </place>
  …
</places>

Page 80: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

Far more than you could ever want

http://delicious.com/codepo8/geotoys

Page 81: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

never work with children, animals or live demos

elephipelephi on Flickr : http://www.flickr.com/photos/elephipelephi/1493013250/

Page 82: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

not taking notes?

selva on Flickr : http://www.flickr.com/photos/selva/24604141/

Page 83: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

London Twitter #devnest 7, March 2010

(Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)…

Gary Gale, Yahoo! Geo Technologies

http://slideshare.net/vicchi

Page 84: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

thanks for listening

Paul Keleher on Flickr : http://www.flickr.com/photos/pkeleher/1658311814/

Page 85: (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

www.ygeoblog.com

twitter.com/vicchi

twitter.com/yahoogeo