NaturalGeo: Final Presentation
Dr Kristin Stock and Mr Javid Yousaf, University of Nottingham
Project Goal
To develop methods for natural language spatial querying
How to map natural language expressions to queries.
car parks beside the river
What kinds of expressions?
• the car park beside the river
• a field on the corner of the lane
• the route follows the lane
• the hall is on this quadrangle
• the tramline at town x
• the park contains trails
These can be used as queries generically (e.g. all car parks that are beside a river) or with a place name (e.g. the car park beside the River Trent).
Why?
• Easier access to OS data products, compared with:
  – limited place name/postcode search;
  – advanced and complex tools.
• Extraction of location from text documents.
• Potentially, generation of language descriptions.
• Easier access = increased potential for data use.
What has already been done?
• Mainly mathematical models for specific natural language terms, e.g.:
  – topology: a fixed formal model;
  – some models that include context, e.g. for near (modelling density etc.).
How?
• Memory/instance-based learning.
• Use a store of expressions whose interpretation is known.
• For each new expression, find the most semantically similar known expression and use its interpretation.
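The instance-based idea can be sketched as a nearest-neighbour lookup over the store. This is a minimal sketch, not the project's code: `KnownExpression`, `interpret`, the concept names and the toy word-overlap similarity are all hypothetical stand-ins for the real store and for the comparison methods described later.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class KnownExpression:
    text: str
    # Hypothetical interpretation format: GCO concept name -> weight.
    interpretation: Dict[str, float]


def word_overlap(a: str, b: str) -> float:
    """Toy stand-in similarity: Jaccard overlap of the two word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def interpret(expression: str, store: List[KnownExpression],
              similarity: Callable[[str, str], float] = word_overlap) -> KnownExpression:
    """Return the known expression most similar to `expression`;
    its interpretation is then reused for the new expression."""
    if not store:
        raise ValueError("the store of known expressions is empty")
    return max(store, key=lambda known: similarity(expression, known.text))


store = [
    KnownExpression("the car park beside the river", {"adjacent": 1.0}),
    KnownExpression("the park contains trails", {"contains": 1.0}),
]
best = interpret("the station beside the river", store)
print(best.text)  # the car park beside the river
```

Passing the similarity function as a parameter lets any of the comparison methods be swapped in without changing the lookup.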
How do we represent interpretations?
• Geometric Configuration Ontology (GCO).
• 50 types of geometric configurations between pairs of objects.
• Each defined with a query.
GCO profiles and queries
• Then we can create a query based on the GCO profile.
• Query composition required.
• Decision between:
  – conjunctive inclusion (multiple concepts to represent the relation)
  – eliminating some relations due to weakness in selection
$\mathrm{GCOConcept}_x \land \mathrm{GCOConcept}_y \land \mathrm{GCOConcept}_z$
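A rough sketch of the two composition options: keep the concepts that pass a threshold and join their queries conjunctively. The concept names, the concept-to-predicate mapping (written as PostGIS-style SQL strings) and the `min_weight` threshold used to model "weakness in selection" are all assumptions for illustration, not the GCO's actual queries.

```python
from typing import Dict

# Hypothetical mapping from GCO concept names to query predicates; the real
# GCO defines about 50 configurations, each with its own query.
GCO_QUERIES: Dict[str, str] = {
    "touches": "ST_Touches(l.geom, r.geom)",
    "near": "ST_DWithin(l.geom, r.geom, 50)",
    "contains": "ST_Contains(r.geom, l.geom)",
}


def compose_query(profile: Dict[str, float], min_weight: float = 0.2) -> str:
    """Conjunctive inclusion of the profile's concepts, after eliminating
    weakly selected ones (weight below the hypothetical min_weight)."""
    kept = [concept for concept, weight in sorted(profile.items(), key=lambda kv: -kv[1])
            if weight >= min_weight and concept in GCO_QUERIES]
    if not kept:
        raise ValueError("no GCO concept survived elimination")
    return " AND ".join(GCO_QUERIES[concept] for concept in kept)


profile = {"near": 0.6, "touches": 0.3, "contains": 0.1}
print(compose_query(profile))
# ST_DWithin(l.geom, r.geom, 50) AND ST_Touches(l.geom, r.geom)
```

Raising `min_weight` trades recall for precision: fewer concepts survive, so the conjunction becomes less restrictive.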
How do we know what the GCO profiles are for an expression?
• Questionnaire of 2000 expressions.
• Users selected the diagrams that best matched each expression; each diagram depicts a GCO concept.
• So we have GCO profiles for 2000 expressions.
• Some are used as 'known' expressions, the rest for evaluation.
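One way to turn questionnaire answers into profiles and split them is sketched below. It assumes, without confirmation from the slides, that a GCO profile is the proportion of respondents choosing each concept's diagram; `gco_profile`, `split_known_eval` and the 80/20 split are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def gco_profile(selections: List[str]) -> Dict[str, float]:
    """Proportion of respondents who chose the diagram for each GCO concept
    (an assumed definition of a GCO profile)."""
    counts = Counter(selections)
    total = sum(counts.values())
    return {concept: n / total for concept, n in counts.items()}


def split_known_eval(expressions: List[str], known_fraction: float = 0.8,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split expressions into 'known' (knowledge base) and evaluation sets."""
    shuffled = expressions[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * known_fraction)
    return shuffled[:cut], shuffled[cut:]


profile = gco_profile(["near", "near", "touches", "near"])
print(profile)  # {'near': 0.75, 'touches': 0.25}
known, evaluation = split_known_eval([f"expr{i}" for i in range(10)])
print(len(known), len(evaluation))  # 8 2
```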
Interpreting a new expression
• In simple terms:
  – for expression x, we find the most similar known expression y;
  – we know the GCO profile for y;
  – GCO profile for x = GCO profile for y.
• But we may instead look at the group of most similar expressions (and their GCO profiles) to try to get the best results.
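Looking at a group of matches could be done by averaging the profiles of the k best matches, weighted by similarity. This is one plausible combination rule, not the project's chosen method; `combine_profiles` and `k` are hypothetical.

```python
from typing import Dict, List, Tuple


def combine_profiles(matches: List[Tuple[float, Dict[str, float]]],
                     k: int = 3) -> Dict[str, float]:
    """Similarity-weighted average of the GCO profiles of the k most
    similar known expressions; each match is (similarity, profile)."""
    top = sorted(matches, key=lambda m: m[0], reverse=True)[:k]
    total = sum(score for score, _ in top)
    if total == 0:
        raise ValueError("no positive similarity among the top matches")
    combined: Dict[str, float] = {}
    for score, profile in top:
        for concept, weight in profile.items():
            combined[concept] = combined.get(concept, 0.0) + score * weight / total
    return combined


matches = [
    (0.8, {"near": 1.0}),
    (0.2, {"near": 0.5, "touches": 0.5}),
    (0.1, {"contains": 1.0}),
]
print(combine_profiles(matches, k=2))  # roughly {'near': 0.9, 'touches': 0.1}
```

With `k=1` this reduces to the simple case above: the profile of the single most similar expression is reused unchanged.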
First, we parse the expression…
• Identify:
  – Locatum (object being located)
  – Relatum (object used as a reference)
  – Verb
  – Preposition
  – Spatial adverb
  – Division nouns for relatum and locatum (e.g. part of)
  – Division noun adjective
the station is right by the side of the river
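The parser's output can be held in a simple record with one optional slot per component. The field names are illustrative, and the parse shown for the example sentence is a hand-made assumption, not actual parser output.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class ParsedExpression:
    """Components identified by the parser (field names are illustrative)."""
    locatum: Optional[str] = None
    relatum: Optional[str] = None
    verb: Optional[str] = None
    preposition: Optional[str] = None
    spatial_adverb: Optional[str] = None
    locatum_division_noun: Optional[str] = None
    relatum_division_noun: Optional[str] = None
    division_noun_adjective: Optional[str] = None

    def populated(self) -> Dict[str, str]:
        """Only the components the parser actually found."""
        return {f.name: getattr(self, f.name) for f in fields(self)
                if getattr(self, f.name) is not None}


# One plausible hand-made parse of "the station is right by the side of the river".
station = ParsedExpression(locatum="station", relatum="river", verb="is",
                           preposition="by", spatial_adverb="right",
                           relatum_division_noun="side")
print(len(station.populated()))  # 6
```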
Then, we compare like with like…
the station is right by the side of the river
the station is located in the city centre
using 4 comparison methods
Method 0: Baseline
• similarity score = count of matching components/max number of components
the station is right by the side of the river
the station is located in the city centre
element scores: 1, 0, 0, 0
similarity score = 1/6 = 0.16667
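The baseline can be sketched as exact matching of like components. The component names and both parses are assumptions chosen to be consistent with the slide's 1/6: six components in the first expression, with only the locatum matching.

```python
from typing import Dict


def baseline_similarity(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Method 0: matching components / max number of populated components."""
    denominator = max(len(a), len(b))
    if denominator == 0:
        return 0.0
    matches = sum(1 for key, word in a.items() if b.get(key) == word)
    return matches / denominator


# Hand-made parses (assumed, for illustration only).
s1 = {"locatum": "station", "verb": "is", "spatial_adverb": "right",
      "preposition": "by", "relatum_division_noun": "side", "relatum": "river"}
s2 = {"locatum": "station", "verb": "is located", "preposition": "in",
      "relatum_division_noun": "centre", "relatum": "city"}
print(round(baseline_similarity(s1, s2), 5))  # 0.16667
```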
Method 1: Word Distribution Similarity
• similarity score = ∑ word distribution similarity of element pairs/max number of populated elements
• cosine method
the station is right by the side of the river
the station is located in the city centre
element scores: 1, 0.5, 0.6, 0.3
similarity score = 2.4/6 = 0.4
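The cosine step and the aggregation can be sketched as below. The project obtained word distribution similarity from DISCO; this code does not reproduce DISCO's API, and the co-occurrence vectors are hypothetical. The second number in `aggregate` (five populated elements for the second expression) is also an assumption; only the maximum, six, affects the result.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two word-distribution (co-occurrence) vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def aggregate(pair_scores: List[float], populated_a: int, populated_b: int) -> float:
    """Sum of element-pair similarities / max number of populated elements."""
    return sum(pair_scores) / max(populated_a, populated_b)


# Hypothetical co-occurrence counts for two words over three context words.
river = [4.0, 1.0, 0.0]
centre = [1.0, 3.0, 2.0]
print(cosine(river, centre))  # about 0.45

# The slide's element scores (1, 0.5, 0.6, 0.3) with at most six populated elements.
print(round(aggregate([1.0, 0.5, 0.6, 0.3], 6, 5), 3))  # 0.4
```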
Method 2: Ontology-based Similarity
• similarity score = ∑ (1-normalised semantic distance) of element pairs/max number of populated elements
• dependent on ontology structure.
the station is right by the side of the river
the station is located in the city centre
element scores: 1, 0.5, 0.6, 0.3
similarity score = 2.4/6 = 0.4
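To show what "1 - normalised semantic distance" can mean, here is a sketch using path length in a tiny taxonomy, normalised by the longest possible path. The project used WordNet network distance; the mini-ontology in `PARENT` and this normalisation choice are hypothetical, and they illustrate why the score depends on ontology structure.

```python
from typing import Dict, List, Optional

# Hypothetical mini-ontology (child -> parent); the project used WordNet.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "water feature": "entity",
    "river": "water feature",
    "stream": "water feature",
    "place": "entity",
    "city": "place",
    "centre": "place",
}


def ancestors(term: str) -> List[str]:
    """Path from term up to the root, inclusive."""
    path = [term]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def path_distance(a: str, b: str) -> int:
    """Number of edges between a and b via their lowest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = next(t for t in up_a if t in up_b)
    return up_a.index(common) + up_b.index(common)


def ontology_similarity(a: str, b: str) -> float:
    """1 - distance normalised by the longest possible path (2 * max depth)."""
    max_depth = max(len(ancestors(t)) - 1 for t in PARENT)
    return 1 - path_distance(a, b) / (2 * max_depth)


print(ontology_similarity("river", "river"))   # 1.0
print(ontology_similarity("river", "stream"))  # 0.5
print(ontology_similarity("river", "centre"))  # 0.0
```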
Method 3: Geolinguistic Factor Similarity
• Same as method 2 for all elements except relatum and locatum.
• For relatum and locatum, we determine similarity of geolinguistic factors, not of the feature types themselves.
• Geolinguistic factors: factors thought to be significant in the use of language, e.g.:
  – image schemata
  – geometry type
  – liquid/solid
  – scale
  – axial structure
  – …
the station is right by the side of the river
the station is located in the city centre
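element scores: 1, 0.5, 0.6, and 0.2 for the relatum (river vs. city centre), computed from geolinguistic factors:
  – image schemata: 1 shared, 3 max = 1/3
  – geometry: 0 shared, 1 max = 0
  – axial structure: 0 shared, 1 max = 0
  – scale: 2 shared, 3 max = 2/3
  – liquid/solid: 0 shared, 1 max = 0
Total: (1/3 + 0 + 0 + 2/3 + 0)/5 = 1/5 = 0.2
(cf. street/river, which could score 0.7 or 0.8)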
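The factor arithmetic can be checked with exact fractions. The set-based functions are a hypothetical generalisation (shared values / max number of values per factor, averaged over factors), and the factor values in the last example are invented purely to exercise them; only the (shared, max) counts come from the slide.

```python
from fractions import Fraction
from typing import Dict, Set


def factor_similarity(a: Set[str], b: Set[str]) -> Fraction:
    """One factor: number of shared values / max number of values."""
    largest = max(len(a), len(b))
    return Fraction(len(a & b), largest) if largest else Fraction(0)


def geolinguistic_similarity(a: Dict[str, Set[str]], b: Dict[str, Set[str]]) -> Fraction:
    """Average of the per-factor similarities over every factor either term has."""
    factors = set(a) | set(b)
    if not factors:
        return Fraction(0)
    total = sum((factor_similarity(a.get(f, set()), b.get(f, set())) for f in factors),
                Fraction(0))
    return total / len(factors)


# The slide's relatum example, from its (shared, max) counts per factor:
# image schemata, geometry, axial structure, scale, liquid/solid.
counts = [(1, 3), (0, 1), (0, 1), (2, 3), (0, 1)]
slide_total = sum((Fraction(s, m) for s, m in counts), Fraction(0)) / len(counts)
print(slide_total, float(slide_total))  # 1/5 0.2

# Hypothetical factor values, purely to exercise the set-based version.
a = {"geometry": {"line"}, "liquid_solid": {"liquid"}}
b = {"geometry": {"line"}, "liquid_solid": {"solid"}}
print(geolinguistic_similarity(a, b))  # 1/2
```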
LAGO
• Geolinguistic factors contained in the Linguistically Augmented Geospatial Ontology (LAGO).
• Extends OS ontologies with geolinguistic factors.
Analysis (1)
• Broad measures of success:
  – Similarity of GCO profiles for the most highly matched expressions.
    • The most similar expressions should have the most similar GCO profiles, if similarity is being measured correctly.
  – Using simple measures:
    • Spearman correlation between our score and GCO profile similarity (Pearson correlation of the profiles): should be maximised (≤ 1);
    • average difference between our score and GCO profile similarity: should be minimised.
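The two success measures can be sketched with the standard library. The sketch treats GCO profile similarity as the Pearson correlation of two profile vectors and "average difference" as mean absolute difference, both assumptions. All data below are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(sxx * syy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mean_abs_difference(x: Sequence[float], y: Sequence[float]) -> float:
    """Average absolute difference between paired values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# GCO profile similarity of two expressions: Pearson correlation of their
# profile vectors (hypothetical 4-concept profiles in a fixed concept order).
gco_sim = pearson([0.75, 0.25, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0])

# Hypothetical evaluation data: one entry per (test expression, best match).
our_scores = [0.9, 0.4, 0.2]
gco_sims = [0.8, 0.5, 0.1]
print(spearman(our_scores, gco_sims))             # 1.0 (same ordering)
print(mean_abs_difference(our_scores, gco_sims))  # approximately 0.1
```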
Analysis (2)
1. Which method is best?
2. How does the size of the knowledge base (KB) affect results?
3. Which elements (relatum, locatum, verb) have the greatest impact on the results?
4. Which geolinguistic factors have the greatest impact on the results?
5. How does the success of the method vary with different spatial relations?
6. How transferable is the method to different spatial relations, and what is required of the knowledge base?
To do (June)
• General refinements/improvements:
  – Parsing of expressions.
  – Method 1: the cosine method (DISCO) returns very low numbers.
  – Method 2: WordNet network distance matching; the WS4J methods are not very good, so implement our own method.
  – Matching of terms to LAGO: currently uses hyponyms, hypernyms and synonyms, with inconsistent ordering.
• Improve speed.
• Improve/extend overall measures of success, currently:
  – Spearman correlation coefficient of similarity score and GCO profile correlation (Pearson) (trying to maximise);
  – average difference between similarity score and GCO profile correlation (trying to minimise).
And then…
• Analysis.
• Query composition methods.
• Methods for selecting the best GCO profile (single best match, or combined multiple matches?).
• Comparison with mathematical models.
• Refine methods: weightings? Different measures of similarity?
• More geolinguistic factors.
• Richer geolinguistic factor model.
Conclusions
• Proof of concept so far.
• Framework set up.
• Now we have the opportunity to test, refine and further develop the method.
• Then, the next goal:
  – Can we use the data we have in the knowledge base (2000 expressions) to discover patterns and infer GCO profiles for expressions that have no close matches in the knowledge base (e.g. new spatial relations)?