why geolocating written routes is harder than it looks

Post on 02-Jun-2015


Why Geolocating Written Routes is Harder than it Looks

Ian Turton

Department of Geography, Penn State University, University Park, PA

ijt1@psu.edu

Acknowledgements

Research for this paper was funded by the National Geospatial-Intelligence Agency/NGA through the NGA University Research Initiative Program/NURI program. The views, opinions, and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the National Geospatial-Intelligence Agency or the U.S. Government.

Summary

Route mapping

Why?

How?

Problems

Fixes?

Aims

GeoCAM project aims to take written route descriptions and convert them to maps.

Extracting and Mapping Geographic Entities

A simple natural language processing task

All you need to do is:
  Find the named entities
  Determine if they are geographic features
  Join them up to form a route
  Then draw a map with them on

BUT

Multiple Places

Berlin – any reasonable system chooses Germany (40 worldwide, 28 in US)

Springfield occurs in many states and many times per state

Maitri (flickr) CC-NC-SA

Strange Place Names

Look out for North East (both MD and PA).

Or North (VA, SC)

Or South (AL, KY)

Not to be confused with NE Irving St or Nebraska.

But…

If you thought towns were hard, wait until you try streets.

213,142 Towns, 13,543,533 Streets in the USA

Place name                Count    Street name      Count
Township of Union           248    (unnamed)      6210946
Township of Washington      246    Main St          12849
Midway                      214    2nd St            8977
Fairview                    207    1st St            8093
Township of Jackson         204    3rd St            8027
Oak Grove                   164    4th St            6945
Township of Liberty         160    Oak St            6612
Five Points                 149    Elm St            6104
Township of Jefferson       147    Pine St           6069
Township of Lincoln         147    5th St            5677

Springfield – 66, Berlin - 28

Common Nouns as Street Names

These are hard to distinguish from nouns at the start of the sentence.

Ambiguous Road Names

Turn from Independence into Washington.

Washington fought for independence.

Plus there are 246 Washington Twp

Interesting Road Names

Colorado has many “interesting” road names.

Consider also Street Rd which occurs in 9 different states.

Really Interesting Names!

Slworking2 Flickr – CC-NC-SA

Roads with Multiple Names

Many highways and interstates have two or more names.

I-99/US 220/Bud Shuster Hwy

Streets are no better

Photograph by Joe Mabel, licensed under GFDL

Numbered Streets

Is this 39th St?

Or Thirty Ninth St?

Or 39 St?

All appear to be equally acceptable when writing directions.

Bitchcakesny (Flickr) CC-NC-SA

Directions as seen in the wild

Directions to the State College Farmers’ Market

FROM EAST or WEST: Exit Rt 322 at College Ave. (also named RT 26; Benner Pike). Turn west onto College Ave and proceed 2 mi. to Locust Lane (2nd street to left after campus intersection of College Ave. & Shortlidge Rd (Garner St).

FROM  NORTH or SOUTH (within town): Follow Atherton Street (business Rt. 322) to Beaver Ave. light. Turn east on Beaver Ave. and go thru 4 lights (6 blocks) to Locust lane.

Downtown State College, PA

[Map labels: N Atherton St, S Atherton St, E College Av, W College Av, PA 26]

Issues

College Av (Rt. 26 or Benner Pike)
  In the database as West (or East) College Av
  Alternate name is PA 26
  Benner Pike is actually PA 150.

Atherton Street (business Rt. 322)
  In the database as North (or South) Atherton St
  Mostly referred to as Atherton by locals

Directions to The Callan Theatre

From west (Adams Morgan, Georgetown)

At the intersection of 16th Street and Irving Street, two blocks east of the heart of Adams Morgan, go east on Irving (right if you are coming from downtown). Cross Georgia Avenue, pass the Washington Hospital Center, go under North Capitol Street. Irving dead-ends on Michigan Avenue; turn left onto Michigan. The next light, which comes up quickly, is Harewood Road; turn left onto Harewood. The theater is half a mile up Harewood on the right.

Catholic University Neighbourhood

[Map labels: Irving St NW, Irving St NE, Michigan Av NW, Michigan Av NE]

Issues

All references are missing the vital NE/NW

Need to distinguish between Michigan (Avenue) and the State of Michigan.

How to handle Street names?

We need to be able to look up shortened (and possibly partly wrong) street names in our database.

Could use SQL LIKE query
  Slow, difficult

Programmatic adjustment of names encountered (add N/S/E/W etc)
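To make the LIKE option concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the rows are invented for illustration and are not the project's actual database. It shows both drawbacks: the leading wildcard prevents an ordinary index on the name column from being used, and a perfectly reasonable spelling such as "Atherton Street" still finds nothing.

```python
import sqlite3

# Toy street table; the schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE streets (name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO streets VALUES (?, ?)",
    [
        ("N Atherton St", "PA"),
        ("S Atherton St", "PA"),
        ("W College Ave", "PA"),
        ("Atherton Av", "CA"),
    ],
)


def like_lookup(fragment, state):
    """Return street names in `state` that contain `fragment`."""
    # The leading % wildcard means an ordinary index on `name` cannot
    # be used, so every row has to be scanned.
    rows = conn.execute(
        "SELECT name FROM streets WHERE name LIKE ? AND state = ? ORDER BY name",
        (f"%{fragment}%", state),
    )
    return [name for (name,) in rows]


print(like_lookup("Atherton", "PA"))         # matches both N and S Atherton St
print(like_lookup("Atherton Street", "PA"))  # no match: the table says "St"
```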

Stemming

In linguistic morphology, stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form – generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.

Stemming Examples

So Dog and dogs (and doggedly) stem to dog.

Cat, cats and cattily stem to cat.

Fishing, fished, fish, and fisher stem to fish.

This allows a search engine to return pages about dogs when you search for dog.

So Stemming for Street Names…

Washington Pl, Washington Ave and Washington St all become Washington.

North Atherton St, South Atherton St and Atherton Av become Atherton.

1St St, First Ave and 1 St N become 1.

Interstate 80, I-80 and I 80w become 80.

State Rd 26, PA 26 and county road 26 become 26.
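One way such a street-name stemmer might look is sketched below. This is not the GeoCAM implementation: the word lists, the rule "if the name contains a number, the number is the stem", and the function name `stem_street` are all assumptions chosen so that the examples above come out as stated.

```python
import re

# Assumed word lists; a real system would need far longer ones.
DIRECTIONS = {"n", "s", "e", "w", "ne", "nw", "se", "sw",
              "north", "south", "east", "west"}
STREET_TYPES = {"st", "street", "ave", "av", "avenue", "pl", "place",
                "rd", "road", "dr", "drive", "ln", "lane", "ct", "court",
                "blvd", "hwy", "highway", "pike"}
UNITS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
         "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
# Digits with an optional ordinal suffix or trailing compass letter.
NUMBER = re.compile(r"^(\d+)(st|nd|rd|th|[nsew])?$")


def stem_street(name):
    """Reduce a street name to a crude lookup key (hypothetical rules)."""
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    words = []
    pending = None  # a tens word ("thirty") waiting for a unit ("ninth")
    for tok in tokens:
        match = NUMBER.match(tok)
        if match:
            return match.group(1)  # numbered street or route number
        if tok in UNITS:
            return str(TENS.get(pending, 0) + UNITS[tok])
        if pending:
            words.append(pending)  # the tens word was an ordinary word
            pending = None
        if tok in TENS:
            pending = tok
        elif tok not in DIRECTIONS and tok not in STREET_TYPES:
            words.append(tok)
    if pending:
        words.append(pending)
    # If everything was stripped (e.g. "North St"), keep the whole name.
    return " ".join(words) if words else " ".join(tokens)


for name in ["North Atherton St", "1St St", "I 80w", "Thirty Ninth St"]:
    print(name, "->", stem_street(name))
```

Taking the first number as the stem is what lets Interstate 80, PA 26 and county road 26 collapse without a list of route prefixes or state codes, at the cost of merging every road that shares a number.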

Does this help?

This allows us to take (possibly wrong) and often shortened street names and look them up in the database.

(after we have worked through the 7.3 million named street segments in the USA)

(Caching is obviously our friend here)

Street name        Count    Stem          Count
(unnamed)        6210946    2             31097
Main St            12849    main          30003
2nd St              8977    3             28614
1st St              8093    1             27938
3rd St              8027    4             25726
4th St              6945    5             23359
Oak St              6612    6             20369
Elm St              6104    park          19280
Pine St             6069    oak           17706
5th St              5677    7             17666
Church St           5662    8             16039
Maple St            5527    maple         15877
Walnut St           5041    pine          14703
6th St              4698    10            14260
Washington St       4104    9             14193
7th St              3927    elm           13163
N Main St           3535    washington    12290
Center St           3502    11            12219
River Rd            3493    cedar         12176
High St             3452    walnut        12107

Route Detection Algorithm

Find numeric strings in text
  Lookup zip codes and telephone numbers

Select Noun Phrases (NP)
  JTextPro

For each NP:
  Is it a state?
  Is it a town (populated place)?
  Is it a point of interest (POI)?
  Is it a street name (after stemming)?
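The first step, finding numeric strings and looking them up, could be sketched with regular expressions as follows. The patterns cover only common US layouts, and the two lookup tables are tiny hypothetical stand-ins for real zip code and area code gazetteers.

```python
import re

# US ZIP or ZIP+4, and a 10-digit phone number in common US layouts.
ZIP = re.compile(r"\b\d{5}(?:-\d{4})?\b")
PHONE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")

# Hypothetical lookup tables; a real system would use full gazetteers.
ZIP_TO_PLACE = {"16801": "State College, PA"}
AREA_CODE_TO_STATE = {"814": "PA"}


def numeric_clues(text):
    """Return location hints derived from zip codes and phone numbers."""
    phones = PHONE.findall(text)
    # Blank out phone numbers so their digits are not re-read as zips.
    remainder = PHONE.sub(" ", text)
    hints = []
    for zip_code in ZIP.findall(remainder):
        place = ZIP_TO_PLACE.get(zip_code[:5])
        if place:
            hints.append(("zip", zip_code, place))
    for phone in phones:
        area = re.sub(r"\D", "", phone)[:3]  # first three digits
        state = AREA_CODE_TO_STATE.get(area)
        if state:
            hints.append(("phone", phone, state))
    return hints


print(numeric_clues("Locust Lane, State College, PA 16801. Call (814) 555-0123."))
```

A five-digit house number would also match the zip pattern, so any hit is only a hint to be checked against the other entities in the text.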

Algorithm (continued)

Select any NP that is unambiguous (Zzyzx Rd, Autumn Crocus Ct, Abell City)

Define a minimum bounding box (or polygon) based on the unambiguous points.

Sort the ambiguous NPs by number of matches (so prefer Gibsonia (3) over Midway (214)), then check whether exactly one match falls inside the polygon; if so, select that one.
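A minimal sketch of this disambiguation step, using a bounding box rather than a general polygon. The coordinates are invented, the `margin` padding is an added assumption (with a single unambiguous point the minimum box would have zero area), and the function names are hypothetical.

```python
def bounding_box(points, margin=0.0):
    """Smallest lat/lon box around `points`, padded by `margin` degrees."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (min(lats) - margin, min(lons) - margin,
            max(lats) + margin, max(lons) + margin)


def inside(box, point):
    min_lat, min_lon, max_lat, max_lon = box
    lat, lon = point
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def disambiguate(candidates, margin=0.1):
    """candidates maps each name to the list of (lat, lon) gazetteer hits."""
    resolved = {n: pts[0] for n, pts in candidates.items() if len(pts) == 1}
    if not resolved:
        return {}  # nothing unambiguous to anchor the box on
    box = bounding_box(list(resolved.values()), margin)
    # Fewest matches first, as on the slide. The order only matters if
    # the box is later refined as names are resolved, which is not done here.
    ambiguous = sorted(
        (n for n, pts in candidates.items() if len(pts) > 1),
        key=lambda n: len(candidates[n]),
    )
    for name in ambiguous:
        hits = [p for p in candidates[name] if inside(box, p)]
        if len(hits) == 1:  # exactly one candidate falls in the box
            resolved[name] = hits[0]
    return resolved


# Invented coordinates, for illustration only.
candidates = {
    "Abell City": [(40.0, -80.0)],
    "Autumn Crocus Ct": [(40.4, -79.6)],
    "Gibsonia": [(40.2, -79.9), (28.1, -82.0), (35.0, -90.0)],
    "Midway": [(40.1, -79.8), (40.3, -79.7), (33.0, -84.0)],
}
print(disambiguate(candidates))
```

In this toy data Gibsonia resolves because only one of its three candidates lies in the box, while Midway stays unresolved because two of its candidates do.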

Route Formation

Once you have a beginning and an end for the route, attempt to determine the road segments that form the route.

Note: not all road segments are named, nor do they all join correctly.

Where possible, determine turns from one road to another and use these to truncate the highlighted sections of road.

Pass the details to the mapping server and display

Further Work

Improve detection of streets, POI and places (linguistics).
  Take the X exit (X is probably a place)
  Turn left at X (X is probably a POI)
  Turn left on X (X is probably a street)

Improve route determination
  1. Fix up streets database – naming and joins
  2. Imprecise routing
     Given a set of POI which roads pass nearby?
     Probably a graph problem?
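The linguistic cues listed above ("take the X exit", "turn left at X", "turn left on X") could be prototyped with a few regular expressions before reaching for a full parser. The patterns and the function name `guess_types` are illustrative assumptions, not part of the project, and they yield guesses rather than decisions.

```python
import re

# Each cue pattern proposes a likely type for whatever fills the X slot.
CUES = [
    (re.compile(r"\btake the (?P<x>.+?) exit\b", re.I), "place"),
    (re.compile(r"\bturn (?:left|right) at (?P<x>[^.,;]+)", re.I), "poi"),
    (re.compile(r"\bturn (?:left|right) (?:on|onto) (?P<x>[^.,;]+)", re.I), "street"),
]


def guess_types(text):
    """Return (phrase, guessed type) pairs, grouped by cue pattern."""
    guesses = []
    for pattern, kind in CUES:
        for match in pattern.finditer(text):
            guesses.append((match.group("x").strip(), kind))
    return guesses


print(guess_types("Irving dead-ends on Michigan Avenue; turn left onto Michigan."))
```

Applied to the Callan Theatre directions, the "onto" cue marks the bare word Michigan as a probable street, which is the kind of evidence needed to tell Michigan Avenue from the State of Michigan.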
