1 the geoparser. 2 overview what is a geoparser? –software for the automated extraction of place...

Post on 19-Dec-2015

Page 1: 1 The GeoParser. 2 Overview What is a geoparser? –Software for the automated extraction of place names from text Why would you want one? –Document characterisation

•1

The GeoParser

•2

Overview

• What is a geoparser?
  – Software for the automated extraction of place names from text

• Why would you want one?
  – Document characterisation
  – Explicit geocoding of metadata, making documents inherently geographically searchable

• How?
  – 'brute force'
  – rule based
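
The two approaches named under "How?" can be contrasted with a minimal sketch. It is not the GeoParser's own code: the mini-gazetteer, the cue phrases and the function names `brute_force` and `rule_based` are all assumptions made for illustration. Brute force looks every word up in a gazetteer; the rule-based version uses textual cues, so it can also pick up a name the gazetteer lacks.

```python
import re

# Hypothetical mini-gazetteer; a real one would hold many thousands of names.
GAZETTEER = {"cullen", "banff", "aberdeen", "fordyce"}

# Illustrative cue phrases (assumed, not taken from the slides). The cue is
# matched case-insensitively; the captured name must start with a capital.
CUE = re.compile(
    r"\b(?i:(?:parish|county|synod|presbytery|burn) of|called) ([A-Z][A-Za-z]+)"
)

def brute_force(text):
    """Look every word up in the gazetteer, ignoring context."""
    words = re.findall(r"[A-Za-z]+", text)
    return sorted({w.title() for w in words if w.lower() in GAZETTEER})

def rule_based(text):
    """Treat a capitalised word that follows a cue phrase as a place name."""
    return sorted({name.title() for name in CUE.findall(text)})

SAMPLE = (
    "PARISH OF CULLEN. (COUNTY OF BANFF, SYNOD OF ABERDEEN, PRESBYTERY OF "
    "FORDYCE.) CULLEN was originally called Inverculan, because it stands "
    "upon the bank of the Burn of Cullen."
)

print(brute_force(SAMPLE))  # names found by gazetteer lookup
print(rule_based(SAMPLE))   # names found by cue phrases
```

On this sample the rule-based pass also returns "Inverculan", which the gazetteer lookup misses because the name is not in the hypothetical gazetteer.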

•3

Geo-spatial data

“data that have some form of spatial or geographic reference that enables them to be located in two- or three-dimensional space”

Statistical Account of Scotland

NUMBER XIII.

PARISH OF CULLEN.

(COUNTY OF BANFF, SYNOD OF ABERDEEN, PRESBYTERY OF FORDYCE.)

By the Rev. Mr. ROBERT GRANT.

Royalty, Extent, Climate, etc.

CULLEN, as appears from old charters, was originally called Inverculan, because it stands upon the bank of the Burn of Cullen, which, at the N. end of the town, falls into the sea: but now it is known by the name of Cullen only. Cullen is a royal burgh, formerly a constabulary, of which the Earl of Findlater was hereditary constable. The set, as it is called, of the council, consists of 19, in which number are included the Earl of Findlater, hereditary preses, 3 bailies, a treasurer, a dean-of-guild, and 13 counsellors. The parish extends from the sea southward, about 2 English miles in length.

•4

Geoparsing Flowline

Input document → Geoparse → Review → Output document

•5

Geoparser architecture

Stages: 1. Inputs → 2. Geoparse → 3. Review → 4. Outputs

Components:
• Web Interface
• geoXwalk Database
• Text docs / web pages
• Parser: rule-based place name identification
• Results table / map preview
• Downloadable metadata record: xml, (gml?)
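
The four stages can be sketched as a small pipeline. This is an illustration under stated assumptions, not the actual architecture: the `GAZETTEER` dictionary merely stands in for the geoXwalk database, its coordinates are approximate and illustrative only, and the `Candidate` class, the "word after 'of'" rule and the review callback are all hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple

# Stand-in for the geoXwalk database: name -> approximate (lat, lon).
GAZETTEER = {"Cullen": (57.69, -2.82), "Banff": (57.66, -2.52)}

@dataclass
class Candidate:
    name: str
    occurrences: int
    coords: Optional[Tuple[float, float]]  # None when the gazetteer has no match
    accepted: bool = False

def geoparse(text):
    """Stage 2: propose capitalised words that follow 'of' as place names."""
    counts = Counter(re.findall(r"\bof ([A-Z][a-z]+)", text))
    return [Candidate(n, c, GAZETTEER.get(n)) for n, c in counts.items()]

def review(candidates, decide):
    """Stage 3: a user (here a callback) accepts or rejects each candidate."""
    for cand in candidates:
        cand.accepted = decide(cand)
    return [c for c in candidates if c.accepted]

def output(accepted):
    """Stage 4: plain records, ready for a results table or a download."""
    return [
        {"name": c.name, "occurrences": c.occurrences, "coords": c.coords}
        for c in accepted
    ]

# Stage 1: the input document.
text = (
    "The parish of Cullen lies in the county of Banff. "
    "The Earl of Findlater was constable of Cullen."
)
candidates = geoparse(text)
# The "reviewer" here simply accepts candidates that the gazetteer can locate.
records = output(review(candidates, lambda c: c.coords is not None))
print(records)
```

The naive rule also proposes "Findlater", a title rather than a place in this sentence, which is the kind of candidate the review stage exists to reject.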

•6

Demonstration

•7

Broad Issues

• What’s a geoparser for?
  – Geo-referencing tool for enhancing metadata?
  – Text analysis tool?

• Areas for improvement
  – Need for more reliable geoparsing algorithms
    • to disambiguate multiple occurrences of the same place name in the same text
    • to develop automated feature typing

• Degree of user intervention: how ‘semi’ should semi-automatic be?
  – Interface design depends largely on the ‘accuracy’ of the parser and the user’s motivations?
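
One reading of the disambiguation issue is that a single name can match several gazetteer entries. A common heuristic, shown here as a sketch and not as the method the slides propose, is to prefer the candidate closest to the places already resolved in the same text. The function names are hypothetical and the coordinates are approximate, for illustration only.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def disambiguate(candidates, anchors):
    """Pick the (label, coords) candidate with the smallest total distance
    to the already-resolved anchor locations."""
    return min(
        candidates,
        key=lambda c: sum(haversine_km(c[1], a) for a in anchors),
    )

# Two gazetteer entries sharing the name "Banff" (approximate coordinates).
banff_candidates = [
    ("Banff, Alberta", (51.18, -115.57)),
    ("Banff, Scotland", (57.66, -2.52)),
]
# Places already resolved in the same text: Cullen and Aberdeen (approximate).
anchors = [(57.69, -2.82), (57.15, -2.09)]

best = disambiguate(banff_candidates, anchors)
print(best[0])
```

The heuristic assumes that a document mostly mentions places near one another, which will not hold for every text, so a user review step remains useful.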

•8

(An aside - Possible Solutions)

• Implement a variety of parsing methods
  – user selects depending on use, e.g.
    • context-based approach
    • definitive place name matching against gazetteer

• Tools made available to the user depend on type and number of documents and intended use.
  – Need to find a balance between text analysis and user interaction
  – e.g. batch facility limited to certain document types and a user-selected parsing method: minimal user intervention.

•9

Specific Issues

• The distinction between parser-selected locations and gazetteer locations needs to be more explicit
  – no. of occurrences in text following geo-referencing?

• Users will be able to search the gazetteer and add records to output

• Addition of ‘rogue’ place names to the gazetteer
  – (Quality assurance issues)

•10

Continued...

• Implementation of sorting functions for the results table

• Output options
  – currently preview results table
  – map view for geo-referenced place names
  – file download
    • required formats (xml, gml?)
    • original document marked up in html(?)
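
Sorting the results table and offering an XML download could look roughly like this. It is a sketch only: the element and attribute names (`places`, `place`, `lat`, `lon`) are hypothetical and do not represent a defined geoXwalk or GML schema, and the occurrence counts and coordinates are illustrative values.

```python
import xml.etree.ElementTree as ET

# Illustrative results table rows (counts and coordinates are made up/approximate).
results = [
    {"name": "Banff", "occurrences": 1, "lat": 57.66, "lon": -2.52},
    {"name": "Cullen", "occurrences": 4, "lat": 57.69, "lon": -2.82},
    {"name": "Aberdeen", "occurrences": 1, "lat": 57.15, "lon": -2.09},
]

def sort_results(rows, key="occurrences", descending=True):
    """Return the rows sorted on one column of the results table."""
    return sorted(rows, key=lambda r: r[key], reverse=descending)

def to_xml(rows):
    """Serialise the rows to a simple XML string (hypothetical element names)."""
    root = ET.Element("places")
    for r in rows:
        place = ET.SubElement(
            root, "place", name=r["name"], occurrences=str(r["occurrences"])
        )
        ET.SubElement(place, "lat").text = str(r["lat"])
        ET.SubElement(place, "lon").text = str(r["lon"])
    return ET.tostring(root, encoding="unicode")

by_count = sort_results(results)
by_name = sort_results(results, key="name", descending=False)
xml_text = to_xml(by_count)
print(xml_text)
```

A GML download would need the coordinates wrapped in the geometry elements that the GML specification defines, which this sketch does not attempt.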
