why geolocating written routes is harder than it looks
TRANSCRIPT
Why Geolocating Written Routes is Harder than it Looks
Ian Turton
Department of GeographyPenn State University, University Park,
Acknowledgements
Research for this paper was funded by the National Geospatial-Intelligence Agency/NGA through the NGA University Research Initiative Program/NURI program. The views, opinions, and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the National Geospatial-Intelligence Agency or the U.S. Government.
Summary
Route mapping Why? How? Problems Fixes?
Aims
GeoCAM project aims to take written route descriptions and convert them to maps.
Extracting and Mapping Geographic Entities
A simple natural language processing task
All you need to do: Is find the named entities Determine if they are geographic
features Join them up to form a route Then draw a map with them on
BUT
Multiple Places
Berlin – any reasonable system chooses Germany (40 worldwide, 28 in US)
Springfield occurs in many states and many times per state
Maitri (flickr) CC-NC-SA
Strange Place Names
Look out for North East (both MD and PA).
Or North (VA, SC)
Or South (AL, KY) Not to be
confused with NE Irving St or Nebraska.
But…
If you thought towns were hard wait until you try streets.
213,142 Towns, 13,543,533 Streets in the USA
Place name Count Street name Count Township of Union 248 6210946 Township of Washington 246 Main St 12849 Midway 214 2nd St 8977 Fairview 207 1st St 8093 Township of Jackson 204 3rd St 8027 Oak Grove 164 4th St 6945 Township of Liberty 160 Oak St 6612 Five Points 149 Elm St 6104 Township of Jefferson 147 Pine St 6069 Township of Lincoln 147 5th St 5677
Springfield – 66, Berlin - 28
Common Nouns as Street Names
These are hard to distinguish from nouns at the start of the sentence.
Ambiguous Road Names
Turn from Independence into Washington.
Washington fought for independence.
Plus there are 246 Washington Twp
Interesting Road Names
Colorado has many “interesting” road names.
Consider also Street Rd which occurs in 9 different states.
Really Interesting Names!
Slworking2 Flickr – CC-NC-SA
Roads with Multiple Names
Many highways and interstates have two or more names.
I-99/US 220/Bud Shuster Hwy
Streets are no better
Photograph by Joe Mabel, licensed under GFDL
The Main Street issue
Awesome Joolie Flickr CC-SA -NC
Numbered Streets Is this 39th St? Or Thirty Ninth St? Or 39 St? All appear to be
equally acceptable when writing directions.
Bitchcakesny (Flickr) CC-NC-SA
Directions as seen in the wild
Directions to the State College Farmers’ Market
FROM EAST or WEST : Exit Rt 322 at College Ave.(also named RT 26; Benner Pike). Turn west onto College Ave and proceed 2 mi. to Locust Lane (2nd street to left after campus intersection of College Ave. & Shortlidge Rd (Garner St).
FROM NORTH or SOUTH (within town): Follow Atherton Street (business Rt. 322) to Beaver Ave. light. Turn east on Beaver Ave. and go thru 4 lights (6 blocks) to Locust lane.
Downtown State College, PA N Atherton
St S Atherton
St E College Av W College
Av PA26
Issues
College Av (Rt. 26 or Benner Pike) In the database as West (or East)
College Av Alternate name is PA 26 Benner Pike is actually PA150.
Atherton Street (business Rt. 322) In the database as North (or South)
Atherton St Mostly referred to as Atherton by locals
Directions to The Callan Theatre
From west (Adams Morgan, Georgetown)At the intersection of 16th Street and Irving Street, two blocks east of the heart of Adams Morgan, go east on Irving (right if you are coming from downtown). Cross Georgia Avenue, pass the Washington Hospital Center, go under North Capitol Street. Irving dead-ends on Michigan Avenue; turn left onto Michigan. The next light, which comes up quickly, is Harewood Road; turn left onto Harewood. The theater is half a mile up Harewood on the right.
Catholic University Neighbourhood
Irving St NW Irving St NE Michigan Av
NW Michigan Av NE
Issues
All references are missing the vital NE/NW
Need to distinguish between Michigan (Avenue) and the State of Michigan.
How to handle Street names?
We need to be able to look up shortened (and possibly partly wrong) street names in our database.
Could use SQL LIKE query Slow, difficult Programmatic adjustment of names
encountered (add N/S/E/W etc)
Stemming
In linguistic morphology, stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form – generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.
Stemming Examples
So Dog and dogs (and doggedly) stem to
dog. Cat, cats and cattily stem to cat. fishing, fished, fish, and fisher stem to
fish. This allows a search engine to return
pages about dogs when you search for dog.
So Stemming for Street Names…
Washington Pl, Washington Ave and Washington St all become Washington.
North Atherton St, South Atherton St and Atherton Av become Atherton.
1St St, First Ave and 1 St N become 1. Interstate 80, I-80 and I 80w become
80. State Rd 26, PA 26 and county road
26 become 26.
Does this help?
This allows us to take (possibly wrong) and often shortened street names and look them up in the database.
(after we have worked through the 7.3 million named street segments in the USA)
(Caching is obviously our friend here)
6210946 2 31097 Main St 12849 main 30003 2nd St 8977 3 28614 1st St 8093 1 27938 3rd St 8027 4 25726 4th St 6945 5 23359 Oak St 6612 6 20369 Elm St 6104 park 19280 Pine St 6069 oak 17706 5th St 5677 7 17666 Church St 5662 8 16039 Maple St 5527 maple 15877 Walnut St 5041 pine 14703 6th St 4698 10 14260 Washington St 4104 9 14193 7th St 3927 elm 13163 N Main St 3535 washington 12290 Center St 3502 11 12219 River Rd 3493 cedar 12176 High St 3452 walnut 12107
Route Detection Algorithm
Find numeric strings in text Lookup zip codes and telephone
numbers Select Noun Phrases (NP)
JTextPro For each NP:
Is it a state? Is it a town (populated place)? Is it a point of interest (POI)? Is it a street name (after stemming)?
Algorithm (continued)
Select any NP that is unambiguous (Zzyzx Rd, Autumn Crocus Ct, Abell City)
Define a minimum bounding box (or polygon) based on the unambiguous points.
Sort the ambiguous NP based on number of matches (so prefer Gibsonia (3) over Midway (214)) then see if only one falls in the polygon if select that one.
Route Formation
Once you have a beginning and an end for the route attempt to determine the road segments that form the route.
Note: not all road segments are named nor do they all join correctly
Where possible determine turns from one road to another and use this to truncate highlighted sections of road
Pass the details to the mapping server and display
Further Work
Improve detection of streets, POI and places (linguistics). Take the X exit (X is probably a place) Turn left at X (X is probably a POI) Turn left on X (X is probably a street)
Improve route determination1. Fix up streets database – naming and
joins2. Imprecise routing
Given a set of POI which roads pass nearby?
Probably a graph problem?
Foreign Languages
Secret Pilgrim (Flickr) CC-NC-SA
Gullevek (flickr) cc-nc-sa