NaturalGeo: Final Presentation Dr Kristin Stock and Mr Javid Yousaf University of Nottingham


Post on 01-Jan-2016


NaturalGeo: Final Presentation

Dr Kristin Stock and Mr Javid Yousaf

University of Nottingham

Project Goal

To develop methods for natural language spatial querying

How to map natural language expressions to queries.

car parks beside the river

What kinds of expressions?

• the car park beside the river
• a field on the corner of the lane
• the route follows the lane
• the hall is on this quadrangle
• the tramline at town x
• the park contains trails

These can be used as queries generically (e.g. all car parks that are beside a river), or with a place name (e.g. the car park beside the River Trent).

Scope

Alignment

Object parthood

Sidedness

Containment

Collocation (same place)

Adjacency

Why?

• Easier access to OS data products, vs.
– Limited place name/postcode search;
– Advanced and complex tools.

• Extraction of location from text documents.

• Potentially, generation of language descriptions.

• Easier access = increased potential for data use.

What has already been done?

• Mainly mathematical models for specific natural language terms, e.g.:
– Topology, fixed formal model
– Some models that include context, like near (e.g. model density etc).

What does NaturalGeo add?

• Takes a ‘whole of language’ approach.
• Considers context.

How?

• Memory/instance based learning.
• Use a store of expressions whose interpretation is known.
• For new expressions, find the most semantically similar known expression, and use that interpretation.

How do we represent interpretations?

• Geometric Configuration Ontology (GCO).
• 50 types of geometric configurations between pairs of objects.
• Each defined with a query.

GCO profiles

We can represent the meaning of a geospatial expression using a GCO profile.

GCO profiles and queries

• Then we can create a query based on the GCO profile.

• Query composition required.
• Decision between:

– conjunctive inclusion (multiple concepts to represent the relation)

– eliminating some relations due to weakness in selection

GCOConcept_x ⋀ GCOConcept_y ⋀ GCOConcept_z
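The two composition options above can be illustrated with a small sketch. It assumes a GCO profile is a mapping from concept name to a strength of selection, and that each concept has a query fragment; the concept names, the placeholder fragments, the `AND` syntax and the `threshold` parameter are all hypothetical, since the slides do not specify a query language.

```python
def compose_query(profile, concept_queries, threshold=0.0):
    """Join the query fragments of the selected GCO concepts with AND.

    profile: dict mapping concept name -> strength of selection (assumed form).
    concept_queries: dict mapping concept name -> query fragment (placeholders).
    threshold: concepts weaker than this are eliminated (hypothetical knob).
    """
    # Strongest concepts first, dropping weak or undefined ones.
    kept = [
        concept
        for concept, weight in sorted(profile.items(), key=lambda cw: -cw[1])
        if weight >= threshold and concept in concept_queries
    ]
    return " AND ".join(f"({concept_queries[c]})" for c in kept)


profile = {"GCOConceptX": 0.9, "GCOConceptY": 0.6, "GCOConceptZ": 0.1}
queries = {"GCOConceptX": "query_x", "GCOConceptY": "query_y", "GCOConceptZ": "query_z"}

conjunctive = compose_query(profile, queries)             # keep every concept
pruned = compose_query(profile, queries, threshold=0.5)   # drop weak selections
```

With `threshold=0.0` every concept in the profile is included conjunctively; raising the threshold corresponds to eliminating relations that were only weakly selected.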

How do we know what the GCO profiles are for an expression?

• Questionnaire of 2000 expressions.
• Users selected best diagrams; diagrams depict GCO concepts.
• So we have GCO profiles for 2000 expressions.
• Use some as ‘known’ expressions, the rest for evaluation.

Interpreting a new expression

• In simple terms:
– For expression x, we find the most similar known expression y.
– We know the GCO profile for y.
– GCO profile for x = GCO profile for y.

• But we may look at the most similar group of expressions (and their GCO profiles) to try to get the best results.
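A minimal sketch of this lookup, under stated assumptions: a GCO profile is represented as a mapping from concept name to weight, the similarity function is passed in, and a group of matches is combined by simple averaging. The token-overlap similarity and the concept names `adjacent` and `contains` are stand-ins invented for the example, not the project's methods or GCO concepts.

```python
def token_overlap(a, b):
    """Stand-in similarity: Jaccard overlap of lower-cased tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def interpret(new_expr, known, similarity, k=1):
    """Return a GCO profile for new_expr from its k most similar known expressions.

    known: list of (expression, profile) pairs with known interpretations.
    k=1 copies the profile of the single best match; k>1 averages the group.
    """
    ranked = sorted(known, key=lambda item: similarity(new_expr, item[0]), reverse=True)
    top = ranked[:k]
    if not top:
        return {}
    concepts = set().union(*(profile.keys() for _, profile in top))
    return {c: sum(p.get(c, 0.0) for _, p in top) / len(top) for c in concepts}


known = [
    ("the car park beside the river", {"adjacent": 1.0}),
    ("the park contains trails", {"contains": 1.0}),
]
best = interpret("the station beside the river", known, token_overlap, k=1)
group = interpret("the station beside the river", known, token_overlap, k=2)
```

Averaging is only one way to combine a group; the slides leave open whether a single best match or combined multiple matches works better.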

The big question…

• How do we find the ‘most similar’ expression?

First, we parse the expression…

• Identify:
– Locatum (object being located)
– Relatum (object used as a reference)
– Verb
– Preposition
– Spatial adverb
– Division nouns for relatum and locatum (e.g. part of)
– Division noun adjective

the station is right by the side of the river
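One way to hold the parsed components is a simple record with one optional field per element. The class and field names below are hypothetical, and the values for the example sentence were assigned by hand for illustration; they are not the output of the project's parser.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ParsedExpression:
    """Hypothetical container for the components listed on the slide."""
    locatum: Optional[str] = None
    relatum: Optional[str] = None
    verb: Optional[str] = None
    preposition: Optional[str] = None
    spatial_adverb: Optional[str] = None
    locatum_division_noun: Optional[str] = None
    relatum_division_noun: Optional[str] = None
    division_noun_adjective: Optional[str] = None

    def populated(self):
        """Return only the components that were found in the expression."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


# "the station is right by the side of the river", parsed by hand (assumed).
example = ParsedExpression(
    locatum="station",
    relatum="river",
    verb="is",
    preposition="by",
    spatial_adverb="right",
    relatum_division_noun="side",
)
```

Under this hand parse, six components are populated, which is consistent with the denominator of 6 used in the method examples.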

Then, we compare like with like…

the station is right by the side of the river

the station is located in the city centre

using 4 comparison methods

Method 0: Baseline

• similarity score = count of matching components/max number of components

the station is right by the side of the river

the station is located in the city centre

Component matches (from the slide figure): 1, 0, 0, 0

similarity score = 1/6 = 0.16667
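The baseline can be sketched as an exact-match count over parsed components. The dictionary keys and the hand-assigned parses below are assumptions made for the example.

```python
def baseline_similarity(a, b):
    """Count of exactly matching components / max number of populated components."""
    matches = sum(1 for key, value in a.items() if value is not None and b.get(key) == value)
    populated_a = sum(value is not None for value in a.values())
    populated_b = sum(value is not None for value in b.values())
    denominator = max(populated_a, populated_b)
    return matches / denominator if denominator else 0.0


# Hand-assigned parses of the two example expressions (assumed, not parser output).
expr_x = {
    "locatum": "station",
    "verb": "is",
    "spatial_adverb": "right",
    "preposition": "by",
    "relatum_division_noun": "side",
    "relatum": "river",
}
expr_y = {
    "locatum": "station",
    "verb": "is located",
    "preposition": "in",
    "relatum": "city centre",
}

score = baseline_similarity(expr_x, expr_y)
```

Only the locatum matches exactly, and the longer expression has six populated components, so the sketch reproduces the 1/6 on the slide.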

Method 1: Word Distribution Similarity

• similarity score = ∑ word distribution similarity of element pairs/max number of populated elements

• cosine method

the station is right by the side of the river

the station is located in the city centre

Element-pair similarities (from the slide figure): 1, 0.5, 0.6, 0.3

similarity score = 2.4/6 = 0.4
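As an illustration of the cosine method, the sketch below compares two words through sparse context vectors and then aggregates the element-pair scores as in the formula above. The vectors are toy values invented for the example, not output of any distributional tool; the element-pair scores are the ones shown on the slide.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as dicts (context -> weight)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def aggregate(pair_scores, max_populated):
    """Sum of element-pair similarities / max number of populated elements."""
    return sum(pair_scores) / max_populated


# Toy context vectors, invented for illustration only.
river = {"water": 3.0, "bank": 2.0, "flow": 1.0}
stream = {"water": 2.0, "bank": 1.0, "road": 1.0}
word_similarity = cosine(river, stream)

# Element-pair scores from the slide, with six populated elements.
overall = aggregate([1, 0.5, 0.6, 0.3], 6)
```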

Method 2: Ontology-based Similarity

• similarity score = ∑ (1-normalised semantic distance) of element pairs/max number of populated elements

• dependent on ontology structure.

the station is right by the side of the river

the station is located in the city centre

Element-pair similarities (from the slide figure): 1, 0.5, 0.6, 0.3

similarity score = 2.4/6 = 0.4
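A sketch of one possible ontology-based measure: semantic distance is taken as the shortest path length between two terms in an is-a hierarchy, normalised by a chosen maximum distance. The tiny hierarchy, the path-length measure and the `max_distance` normalisation are assumptions made for illustration; they are not taken from WordNet, the OS ontologies or the project's implementation.

```python
from collections import deque

# Hypothetical child -> parent links, invented for the example.
PARENT = {
    "river": "watercourse",
    "stream": "watercourse",
    "watercourse": "feature",
    "street": "way",
    "way": "feature",
}


def build_adjacency(parent_links):
    """Turn child -> parent links into an undirected adjacency map."""
    adjacency = {}
    for child, parent in parent_links.items():
        adjacency.setdefault(child, set()).add(parent)
        adjacency.setdefault(parent, set()).add(child)
    return adjacency


def path_distance(a, b, adjacency):
    """Shortest number of edges between a and b, or None if unconnected."""
    if a not in adjacency or b not in adjacency:
        return None
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in adjacency[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


def ontology_similarity(a, b, adjacency, max_distance):
    """1 - normalised semantic distance; unknown or unconnected terms score 0."""
    dist = path_distance(a, b, adjacency)
    if dist is None:
        return 0.0
    return 1.0 - min(dist, max_distance) / max_distance


ADJ = build_adjacency(PARENT)
```

As the slide notes, the result depends on the ontology structure: adding or removing an intermediate class changes the path length and therefore the score.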

Method 3: Geolinguistic Factor Similarity

• Same as method 2 for all elements except relatum and locatum.

• For relatum and locatum, we determine similarity of geolinguistic factors, not of the feature types themselves.

• Geolinguistic factors: factors thought to be significant in the use of language
– image-schemata
– geometry type
– liquid/solid
– scale
– axial structure…

the station is right by the side of the river

the station is located in the city centre

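Element-pair similarities (from the slide figure): 1, 0.5, 0.6, 0.2

Geolinguistic factor breakdown for the 0.2 value:
– image schemata: 1 shared, 3 max = 1/3
– geometry: 0 shared, 1 max
– axial structure: 0 shared, 1 max
– scale: 2 shared, 3 max = 2/3
– liquid/solid: 0 shared, 1 max
– Total: (1/3 + 0 + 0 + 2/3 + 0)/5 = 1/5 = 0.2

c.f. street/river, which could be 0.7 or 0.8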
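The factor breakdown above can be reproduced with a short sketch: for each geolinguistic factor, divide the number of shared values by the larger of the two value counts, then average over the factors. The factor values assigned to the two features below are placeholders chosen only so that the shared and max counts match the slide; they are not LAGO content.

```python
from fractions import Fraction


def factor_similarity(a, b):
    """Mean over factors of (shared values / max number of values)."""
    per_factor = []
    for factor in a.keys() | b.keys():
        values_a, values_b = a.get(factor, set()), b.get(factor, set())
        largest = max(len(values_a), len(values_b))
        if largest == 0:
            continue  # factor not populated for either feature
        per_factor.append(Fraction(len(values_a & values_b), largest))
    if not per_factor:
        return Fraction(0)
    return sum(per_factor, Fraction(0)) / len(per_factor)


# Placeholder factor values (hypothetical), matching the slide's counts only.
feature_a = {
    "image_schemata": {"schema1", "schema2", "schema3"},
    "geometry": {"line"},
    "axial_structure": {"axial"},
    "scale": {"scale1", "scale2", "scale3"},
    "liquid_solid": {"liquid"},
}
feature_b = {
    "image_schemata": {"schema1", "schema4"},
    "geometry": {"polygon"},
    "axial_structure": {"non-axial"},
    "scale": {"scale2", "scale3"},
    "liquid_solid": {"solid"},
}

similarity = factor_similarity(feature_a, feature_b)
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared directly with the 1/5 on the slide.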

LAGO

• Geolinguistic factors contained in the Linguistically Augmented Geospatial Ontology (LAGO).

• Extends OS ontologies with geolinguistic factors.

Analysis (1)

• Broad measures of success:
– Similarity of GCO profiles for most highly matched expressions.
  • Most similar expressions should have most similar GCO profiles, if similarity is being measured correctly.
– Using simple measures:
  • correlation (Pearson) between our score and GCO similarity (Spearman) – should be maximised (≤ 1)
  • average difference between our score and GCO similarity – should be minimised.
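The two simple measures can be sketched in plain Python. The sketch assumes paired lists of values (our similarity score and the GCO profile similarity for each matched pair) and takes "average difference" to mean the mean absolute difference, which is an interpretation rather than something the slides state. The sample numbers are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either list is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def mean_abs_difference(xs, ys):
    """Mean absolute difference between paired values (assumed interpretation)."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Invented sample values for four matched expression pairs.
our_scores = [0.9, 0.4, 0.7, 0.1]
gco_similarity = [0.8, 0.5, 0.6, 0.2]
rho = spearman(our_scores, gco_similarity)
gap = mean_abs_difference(our_scores, gco_similarity)
```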

Analysis (2)

1. Which method is best?

2. How does the size of the kb affect results?

3. Which elements (relatum, locatum, verb) have the greatest impact on the results?

4. Which geolinguistic factors have the greatest impact on the results?

5. How does the success of the method vary with different spatial relations?

6. How transferable is the method to different spatial relations and what is required of the knowledgebase?

To do (June)

• General refinements/improvements:
– Parsing of expressions.
– Method 1: cosine method (DISCO) returns very low numbers.
– Method 2: WordNet network distance matching; WS4J methods not very good, implement our own method.
– Matching of terms to LAGO: currently uses hyponyms, hypernyms and synonyms, with inconsistent ordering.
• Improve speed.
• Improve/extend overall measures of success, currently:
– Spearman correlation coefficient of similarity score and GCO profile correlation (Pearson) (trying to maximise)
– average of difference between similarity score and GCO profile correlation (trying to minimise)

And then…

• Analysis.
• Query composition methods.
• Methods for selecting the best GCO profile (single best match, or combined multiple matches?).
• Comparison with mathematical models.
• Refine methods: weightings? different measures of similarity?
• More geolinguistic factors.
• Richer geolinguistic factor model.

Conclusions

• Proof of concept so far.
• Framework set up.
• Now we have the opportunity to test, refine and further develop the method.
• Then, the next goal:

– Can we use the data we have in the kb (2000 expressions) to discover patterns and infer GCO profiles for expressions for which there are no close matches in the kb (e.g. new spatial relations etc)?