Page 1: 25th April 2006 Semantics & Ontologies in GI Services Semantic Similarity Measurement Martin Raubal raubal@uni-muenster.de

25th April 2006 Semantics & Ontologies in GI Services

Semantic Similarity Measurement

Martin Raubal

raubal@uni-muenster.de


Outline

• Motivation

• Semantic interoperability, concepts

• Semantic similarity measurement

• Geometric model

• Feature-based model

• Alignment-based model

• Transformational model

• Conclusions


Motivating example (1)

• A customer of Ordnance Survey (OS) wants to set up a flood warning system.

• Needs existing flooding areas to analyze the current flood defense situation in the U.K.

• OS MasterMap: geographic & topographic information; contains areas used for flooding that are not designated as such.

• ‘Watermeadow’, ‘carse’ and ‘haugh’ can be identified as flooding areas only by their semantic description (properties in the ontology).


Motivating example (2)

[Figure: user conceptualization of roads & residential areas vs. system model of roads & residential areas. Labels: "Roads overlap residential areas?"; "Intersect to find roads going through residential areas".]


Semantic interoperability

• “Capacity of (geographic) information systems and services to work together without the need for human intervention” (Harvey, Kuhn et al. 1999)

• Achieving sufficient degree of semantic interoperability => necessary to determine semantic similarity between concepts.


Similarity (psychology)

“Similarity is fundamental for learning, knowledge and thought, for only our sense of similarity allows us to order things into kinds so that these can function as stimulus meanings. Reasonable expectation depends on the similarity of circumstances and on our tendency to expect that similar causes will have similar effects” [Quine 1969, p. 114].


Computer science

• Similarity plays a major role in enabling machine-based solutions: decision support systems, data mining, pattern recognition.

• Semantic information retrieval: the similarity of a result to the query indicates its relevance.


Concept

A concept is "a mental representation of a class or individual and deals with what is being represented and how that information is typically used during the categorization" [Smith 1989, p. 502].

Concept vs. Category?


Concepts in knowledge representation

• Conceptual knowledge can be represented in ontologies that consist of specifications of concepts, relations and axioms.

• Relations link concepts together and enable reasoning and measurement within an ontology.

• Taxonomical (hierarchical) relations are the most important for reasoning and structuring knowledge.


dist (Bus, Ferry) < dist (Bus, Bike)
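The inequality above belongs to a taxonomy figure that did not survive extraction. As a minimal sketch of how such a distance could be computed, the following assumes a hypothetical hierarchy (the `PARENT` table is invented for illustration and is not the one from the slide) and counts the edges on the path through the closest common superconcept.

```python
# Hypothetical taxonomy (child -> parent); NOT the hierarchy from the slide.
PARENT = {
    "Bus": "PublicTransport",
    "Ferry": "PublicTransport",
    "Bike": "PrivateTransport",
    "PublicTransport": "Vehicle",
    "PrivateTransport": "Vehicle",
    "Vehicle": None,
}


def ancestors(concept):
    """Return the chain from a concept up to the root, concept included."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def dist(a, b):
    """Edge count of the path from a to b via their closest common superconcept."""
    up_a = ancestors(a)
    up_b = ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    raise ValueError("concepts share no superconcept")


print(dist("Bus", "Ferry"), dist("Bus", "Bike"))
```

With this assumed hierarchy, Bus and Ferry meet at their direct parent, while Bus and Bike only meet at the root, so the first distance is the smaller one.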


Similarity measurements

• Approaches from different research areas (psychology, computer science, artificial intelligence) => apply to ontology-based semantic similarity measurement.

• Application areas:
  • Information retrieval & integration
  • Data mining & maintenance
  • Categorization
  • Natural-language processing
  • Pattern recognition


Measure and representation

Representational model used to describe concepts determines semantic similarity measure (based on one notion of similarity).

Representation => similarity measure


Semantic similarity measurement

• How close are two entities to each other conceptually?

• Value between 0 and 1:

• ‘0’ => no similarity

• ‘1’ => both entities are equal

• Different measurement theories.


[Schwering forthcoming]


Approaches

• Geometric Model / MDS
  • Gärdenfors: Conceptual Spaces

• Feature-based Model
  • Tversky: Contrast Model
  • Rodriguez: MDSM

• Alignment-based Model
  • Goldstone: SIAM

• Transformational Model
  • Hahn (example: ABBA → AABB)


Geometric models and MDS

• Multidimensional scaling (MDS) => similarity between entities is represented in a geometric model: entities are points in a multidimensional metric space.

• Similarity inversely related to distance (dissimilarity) between two entities => linearly decaying function of the semantic distance d.


Geometric models and MDS cont.

$$d_{ij} = \left[ \sum_{k=1}^{n} \left| x_{ik} - x_{jk} \right|^{r} \right]^{1/r}$$

• $n$ … number of dimensions

• $x_{ik}$ and $x_{jk}$ … values for dimension k of the entities i and j

• Minkowski metric: r = 1 => city-block metric, r = 2 => Euclidean metric, etc.
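The Minkowski metric above translates almost directly into code. The following is a minimal sketch, assuming entities are given as equal-length sequences of numeric dimension values; the function name and the sample points are illustrative only.

```python
def minkowski_distance(x_i, x_j, r):
    """Minkowski distance between two entities given as equal-length sequences."""
    if len(x_i) != len(x_j):
        raise ValueError("entities must have the same number of dimensions")
    if r < 1:
        raise ValueError("r must be >= 1, otherwise the triangle inequality fails")
    return sum(abs(a - b) ** r for a, b in zip(x_i, x_j)) ** (1.0 / r)


# Illustrative two-dimensional entities.
entity_i = (0.0, 0.0)
entity_j = (3.0, 4.0)

print(minkowski_distance(entity_i, entity_j, r=1))  # city-block metric
print(minkowski_distance(entity_i, entity_j, r=2))  # Euclidean metric
```

For the same pair of points the city-block distance is never smaller than the Euclidean distance, which is one reason the choice of r matters when fitting human similarity judgments.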


MDS in cognitive science

• Applied to discover mental representations of stimuli and explanations of similarity judgments.

• MDS as mathematical model of categorization, identification, recognition, memory, generalization (Nosofsky 1992, Shepard 1987).

• Degree of relation between stimuli ~ spatial distance


Representational model


Geometric models and MDS cont.

• Choice of metric to best fit human similarity assessments => depends on entities (stimuli) and subjects’ strategies.

• Euclidean metric provides better fit to empirical data when stimuli are composed of integral, perceptually fused dimensions (e.g., brightness and saturation of color).

• City-block metric appropriate for psychologically separable dimensions (e.g., color and shape).


[Figure: Euclidean metric vs. city-block metric]


[Figure: dimensions shape and color]


MDS vs. Geometric models

• MDS determines number of dimensions from subjects’ pairwise judgments.

• Goal: maximum correlation between judgments and distances in n-dim. space with minimum number of dimensions.

• Geometric models start with defining dimensions.


Axioms of geometric model

• Minimality: $D(A,B) \geq D(A,A) = 0$

• Symmetry: $D(A,B) = D(B,A)$

• Triangle Inequality: $D(A,B) + D(B,C) \geq D(A,C)$

These axioms may not hold for human similarity assessments!
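Whether a set of judged dissimilarities satisfies the three axioms can be checked mechanically. The sketch below assumes the judgments are stored in a square matrix `D`, where `D[a][b]` is the dissimilarity judged from entity a to entity b; the matrix values are invented for illustration and are not empirical data.

```python
from itertools import product


def check_metric_axioms(D, tol=1e-9):
    """Report which metric axioms hold for a square dissimilarity matrix D."""
    idx = range(len(D))
    minimality = all(abs(D[a][a]) <= tol for a in idx) and all(
        D[a][b] >= -tol for a, b in product(idx, idx)
    )
    symmetry = all(abs(D[a][b] - D[b][a]) <= tol for a, b in product(idx, idx))
    triangle = all(
        D[a][b] + D[b][c] >= D[a][c] - tol for a, b, c in product(idx, idx, idx)
    )
    return {
        "minimality": minimality,
        "symmetry": symmetry,
        "triangle_inequality": triangle,
    }


# Invented judgments for three entities: D[0][1] != D[1][0], and the
# direct dissimilarity D[0][2] exceeds the detour D[0][1] + D[1][2].
judged = [
    [0.0, 2.0, 9.0],
    [3.0, 0.0, 2.0],
    [9.0, 2.0, 0.0],
]

print(check_metric_axioms(judged))
```

A matrix like `judged` cannot be embedded in any metric space without distortion, which is the situation the following slides describe for human similarity assessments.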


Problems with geometrical model

• Human similarity judgments are often asymmetric, not symmetric (Tversky 1977). Example: North Korea is judged to be more similar to Red China than vice versa.

• Category members are judged more similar to category prototypes than prototypes are to category members.


Problems with geometrical model

• A lamp is similar to the moon (light); the moon is similar to a soccer ball (shape); but a lamp is NOT similar to a soccer ball => triangle inequality is violated (James 1892).

• Adding common features to entities does not increase their similarity (distance grows).


Requirements and assumptions

• Independence of properties.

• Property set must reflect human conceptualization to provide good similarity results – how to achieve this?

• Comparability of different dimensions – same relative unit.


Feature-based models

Common elements approach

• Two entities (stimuli) are similar if they have common features (elements).

• The more elements they share, the more similar the stimuli are.

• Problem: it is always possible to find an endless number of common elements, depending on the view.


Representational model

• Set-theoretic: concepts represented as unstructured sets of features.

• Characterization through properties common in analysis of cognitive processes.

• Application areas: speech perception, pattern recognition, perceptual learning.


[Figure: feature sets of the concepts floodplain, wetland and lowland; features shown include flat, area, low-lying, periodically waterlogged, often waterlogged and low vegetation] [Schwering forthcoming]


Feature-matching model

• Proposed by Amos Tversky: A. Tversky (1977). Features of Similarity. Psychological Review 84(4): 327-352.

• Supports asymmetric similarity measurement.

• Elementary set operations can be applied to estimate similarities and differences.


Requirements and assumptions

• Independence of features.

• Feature set must be sufficiently rich to account for human categorization.

• Invariance of representational elements (no transformations as in geometric models).


Feature-based models cont.

Contrast model

• Similarity is defined not only by the entities’ common features, but also by their distinctive features (Tversky 1977).

• In contrast to the common elements approach a flexible weighting is used.


Contrast model

$$S(A,B) = \theta \, f(A \cap B) - \alpha \, f(A - B) - \beta \, f(B - A)$$

• θ, α, β … weights for common / distinctive features

• (A ∩ B) … features that A and B have in common

• (A − B) … features possessed by A but not B

• (B − A) … features possessed by B but not A

Asymmetric because α is not constrained to be equal to β, nor f(A − B) to f(B − A).
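The contrast model above can be tried out with ordinary set operations. This is a minimal sketch, assuming concepts are Python sets of feature labels and that the salience function f is simply the number of features (one possible choice; the model leaves f open). The three weights are named `theta`, `alpha` and `beta` here, and the feature labels are placeholders.

```python
def contrast_similarity(A, B, theta=1.0, alpha=0.5, beta=0.5, f=len):
    """Contrast model: weighted common features minus weighted distinctive ones."""
    common = f(A & B)   # features shared by A and B
    only_a = f(A - B)   # features of A that B lacks
    only_b = f(B - A)   # features of B that A lacks
    return theta * common - alpha * only_a - beta * only_b


# Placeholder feature sets: A is the richer description.
A = {"f1", "f2", "f3", "f4"}
B = {"f1", "f2"}

# Unequal weights for the two difference sets make the measure directional.
print(contrast_similarity(A, B, alpha=0.75, beta=0.25))
print(contrast_similarity(B, A, alpha=0.75, beta=0.25))
```

With `alpha` larger than `beta`, the distinctive features of the first argument count more, so comparing the richer concept to the sparser one yields the lower value.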


Ratio model

• Similarity is normalized => S between 0 and 1.
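The formula on this slide did not survive extraction. Tversky's (1977) ratio model is commonly written as

$$S(A,B) = \frac{f(A \cap B)}{f(A \cap B) + \alpha \, f(A - B) + \beta \, f(B - A)}, \quad \alpha, \beta \geq 0$$

The sketch below implements this form, again assuming feature sets and f as the number of features; the handling of two empty sets is a choice made here, not something the slide specifies.

```python
def ratio_similarity(A, B, alpha=0.5, beta=0.5, f=len):
    """Ratio model: common features relative to common plus weighted distinctive ones."""
    common = f(A & B)
    denominator = common + alpha * f(A - B) + beta * f(B - A)
    if denominator == 0:
        # Assumption of this sketch: the value is undefined without any features.
        raise ValueError("similarity is undefined for two empty feature sets")
    return common / denominator


# Placeholder feature sets.
A = {"f1", "f2", "f3", "f4"}
B = {"f1", "f2"}

print(ratio_similarity(A, B, alpha=1.0, beta=0.0))
print(ratio_similarity(B, A, alpha=1.0, beta=0.0))
```

Identical sets give 1 and disjoint sets give 0, which is the normalization the slide refers to.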


Assertions

• Similarity measurement is directional and asymmetric.

• Model used to test Rosch’s (1978) hypothesis that perceived distance from prototype to variant is larger than perceived distance from variant to prototype.


Matching-Distance Similarity Measure

• Matching-Distance Similarity Measure (MDSM): context sensitive, asymmetric semantic similarity measurement approach for geographic entity classes (Rodríguez and Egenhofer 2004).

• Based on Tversky’s contrast model.

• Different kinds of features: Features are classified by types (parts, functions, attributes).


MDSM cont.

Different feature classes in analogy to WordNet’s description of nouns.

• Parts: structural elements of a class.

• Functions: what is done to or with instances of concept.

• Attributes: additional characteristics not considered by former two.


MDSM

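• t … type of feature (part, attribute, function)

• c1, c2 … compared entity classes

• C1, C2 … respective sets of features of type t for c1, c2

$$S(c_1,c_2) = \omega_p \cdot S_p(c_1,c_2) + \omega_f \cdot S_f(c_1,c_2) + \omega_a \cdot S_a(c_1,c_2)$$

$$S_t(c_1,c_2) = \frac{|C_1 \cap C_2|}{|C_1 \cap C_2| + \alpha(c_1,c_2) \cdot |C_1 \setminus C_2| + (1 - \alpha(c_1,c_2)) \cdot |C_2 \setminus C_1|}$$

Measure applied to each feature type.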


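$$\alpha(U,O) = \begin{cases} \dfrac{\mathrm{depth}(U)}{\mathrm{depth}(U) + \mathrm{depth}(O)}, & \mathrm{depth}(U) \leq \mathrm{depth}(O) \\[2ex] 1 - \dfrac{\mathrm{depth}(U)}{\mathrm{depth}(U) + \mathrm{depth}(O)}, & \mathrm{depth}(U) > \mathrm{depth}(O) \end{cases}$$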


Degree of asymmetry

• Calculate degree of asymmetry depending on degree of generalization of concepts.

• Based on following idea: people perceive similarity from subconcept to superconcept greater than vice versa.

• Depth = shortest path of each concept to immediate common superconcept that subsumes both concepts.
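The asymmetry factor α depends only on the two depths, so it can be sketched as a small function. The names `depth_u` and `depth_o` stand for the depths of the first and second compared concept; raising an error when both depths are 0 is a choice made for this sketch, since the quotient is undefined in that case.

```python
def asymmetry_factor(depth_u, depth_o):
    """Asymmetry factor alpha from the depths of the first and second concept."""
    total = depth_u + depth_o
    if total == 0:
        # Assumption of this sketch: no value is defined when both depths are 0.
        raise ValueError("alpha is undefined when both depths are 0")
    if depth_u <= depth_o:
        return depth_u / total
    return 1.0 - depth_u / total


print(asymmetry_factor(1, 0))  # e.g. theatre (depth 1) compared to building (depth 0)
print(asymmetry_factor(0, 1))  # building compared to theatre
print(asymmetry_factor(1, 1))  # two concepts at the same depth
```

Note that α itself never exceeds 0.5; the directional effect comes from where it is used, namely as the weight of the first concept's distinctive features, with 1 − α as the weight of the second concept's.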


Example calculation


Calculation: theatre - building

• depth (theatre) [1] > depth (building) [0] => α = 1 – 1 / (1+0) = 0

• Sp = 3 / (3 + 0 + 0) = 1

• Sf = 0 (no functions for building)

• Sa = 1 (same attributes)


Calculation: building - theatre

• depth (building) [0] < depth (theatre) [1] => α = 0 / (1+0) = 0

• Sp = 3 / (3 + 0 + 6) = 1/3

• Sf = 0 (no functions for building)

• Sa = 1 (same attributes)
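The two part-similarity values can be reproduced with a few lines of code. In this sketch the part names are placeholders (the slide's figure with the real parts is not available); only the counts, 3 shared parts and 6 additional theatre parts, are taken from the calculation. Returning 0 for an empty denominator mirrors the slide's convention for the missing functions of building and is otherwise an assumption.

```python
def feature_type_similarity(C1, C2, alpha):
    """MDSM similarity for one feature type, from feature sets C1 and C2."""
    common = len(C1 & C2)
    denominator = common + alpha * len(C1 - C2) + (1.0 - alpha) * len(C2 - C1)
    if denominator == 0:
        return 0.0  # convention assumed here, matching "Sf = 0" on the slide
    return common / denominator


# Placeholder part names; only the set sizes follow the slide.
building_parts = {"part1", "part2", "part3"}
theatre_parts = building_parts | {"part4", "part5", "part6", "part7", "part8", "part9"}

# alpha = 0 in both directions, as computed on the slides.
print(feature_type_similarity(theatre_parts, building_parts, alpha=0.0))
print(feature_type_similarity(building_parts, theatre_parts, alpha=0.0))
```

The first call corresponds to theatre compared to building, the second to building compared to theatre, where the six additional theatre parts enter the denominator with weight 1 − α = 1.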


Similarity values

Entity classes          α      Sp     Sf     Sa     S(a,b)

theatre, building       0.0    1.0    0.0    1.0    0.67

building, theatre       0.0    0.33   0.0    1.0    0.44

theatre, sport arena    0.5    0.53   0.33   1.0    0.62
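The overall values in the last column are consistent with a weighted sum of the three feature-type similarities using equal weights of 1/3. The equal weighting is inferred from the numbers, not stated on the slide, so the sketch below treats it as an assumption.

```python
def overall_similarity(s_p, s_f, s_a, w_p=1 / 3, w_f=1 / 3, w_a=1 / 3):
    """Weighted sum of part, function and attribute similarity."""
    return w_p * s_p + w_f * s_f + w_a * s_a


# (Sp, Sf, Sa) per compared pair, copied from the similarity values above.
rows = {
    ("theatre", "building"): (1.0, 0.0, 1.0),
    ("building", "theatre"): (0.33, 0.0, 1.0),
    ("theatre", "sport arena"): (0.53, 0.33, 1.0),
}

for pair, values in rows.items():
    print(pair, round(overall_similarity(*values), 2))
```

In a context-sensitive setting the weights would differ per feature type; equal weights are just the simplest case that matches these three rows.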


Discussion

• Information retrieval: Descriptions of query and data source concepts may differ greatly in their granularity - query concepts often focus on the very characteristic properties, data source concepts are described broadly to be context-independent.

• Query ‘flooding area’ (shape, relation to waterbodies) vs. data source ‘floodplain’ (additional hydrologic & ecologic properties) => distinct properties reduce similarity!


Problems with feature-based models

• Features and dimensions are treated as unrelated, but in reality entities are not simply unstructured bags of features.

• Also true for relations between entities!


Alignment-based models

• Use commonalities and differences as notion of similarity, but also include the relational structure of properties.

• Motivation: Similarity is like Analogy.

• Similarity involves structural alignment and mapping.


Two spatial scenes are described by a set of features. The similarity between these scenes depends on the correct alignment of these features [Gentner et al. 1995, p. 114]


Transformational model

• Transformations required to make one concept equal to another are defined.

• Similarity depends on number of transformations needed to make concepts transformationally equal.

• Example: Operations modifying the geometric arrangement are rotation, reflection, translation and dilation.


Transformational model

• Similarity assumed to decrease monotonically when number of transformations increases.

• Transformational model is asymmetric, but the metric axioms minimality and triangle inequality hold.
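The idea of counting transformations can be illustrated on short symbol strings. The sketch below is not the transformation set used in the literature: it assumes a single hypothetical operation (swapping two adjacent symbols), finds the smallest number of such swaps by breadth-first search, and maps the count to a similarity with 1 / (1 + n), one simple monotonically decreasing choice.

```python
from collections import deque


def transformation_count(source, target):
    """Fewest adjacent swaps that turn source into target (breadth-first search)."""
    if sorted(source) != sorted(target):
        raise ValueError("target cannot be reached by swaps alone")
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        current, steps = queue.popleft()
        if current == target:
            return steps
        for i in range(len(current) - 1):
            # Hypothetical transformation: swap the symbols at positions i and i + 1.
            swapped = current[:i] + current[i + 1] + current[i] + current[i + 2:]
            if swapped not in seen:
                seen.add(swapped)
                queue.append((swapped, steps + 1))


def transformational_similarity(source, target):
    """Similarity that decreases monotonically with the number of transformations."""
    return 1.0 / (1.0 + transformation_count(source, target))


print(transformation_count("ABBA", "AABB"))
print(transformational_similarity("ABBA", "AABB"))
```

Because an adjacent swap can always be undone by the same swap, this particular operation set yields a symmetric measure; asymmetry only arises when some transformations have no equally cheap inverse.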


Comparison of models (Schwering)


Conclusions

• Semantic similarity measurement is basis for semantic interoperability.

• Different measurement theories => advantages & disadvantages

• Most common: geometric & feature-based approaches.


References

• Gärdenfors, P. (2000). Conceptual Spaces - The Geometry of Thought. Cambridge, MA, Bradford Books, MIT Press.

• Goldstone, R. L. and A. Kersten (2003). Concepts and Categorization. Comprehensive handbook of psychology. A. F. Healy and R. W. Proctor. 4: 599-621.

• Rodríguez, A. and M. J. Egenhofer (2004). "Comparing Geospatial Entity Classes: An Asymmetric and Context-Dependent Similarity Measure." International Journal of Geographical Information Science 18(3): 229-256.