Logics for Data and Knowledge Representation Applications of ClassL: Lightweight Ontologies

Upload: nathaniel-hines

Post on 04-Jan-2016


Page 1: Logics for Data and Knowledge Representation Applications of ClassL: Lightweight Ontologies

Logics for Data and Knowledge Representation

Applications of ClassL: Lightweight Ontologies

Outline

o Ontologies
    o Descriptive and classification ontologies
    o Real world and classification semantics
o Lightweight Ontologies
    o Converting classifications into Lightweight Ontologies
o Applications on Lightweight Ontologies
    o Document Classification
    o Query-answering
    o Semantic Matching

Ontologies

Ontologies are explicit specifications of conceptualizations [Gruber, 1993].

They are often thought of as directed graphs whose nodes represent concepts and whose edges represent relations between concepts.

[Figure: example ontology graph with nodes Animal, Bird, Mammal, Predator, Herbivore, Tiger, Goat, Chicken, Cat, Head and Body, and edges labeled Is-a, Part-of and Eats.]

ONTOLOGIES :: LIGHTWEIGHT ONTOLOGIES :: CLASSIFICATIONS :: APPLICATIONS ON LIGHTWEIGHT ONTOLOGIES

Concepts and Relations between them

o CONCEPT: it represents a set of objects or individuals.
o EXTENSION: this set is called the concept extension or the concept interpretation.
o Concepts are often lexically defined, i.e., they have natural language names which are used to describe the concept extensions (e.g., Animal, Lion, Rome), often with an additional description (gloss).
o RELATION: a link from the source concept to the target concept.
o The backbone structure of an ontology graph is a taxonomy in which the relations are ‘is-a’, ‘part-of’ and ‘instance-of’, whereas the remaining structure of the graph supplies auxiliary information about the modeled domain and may include relations like ‘located-in’, ‘eats’, ‘ant’, etc. In Library Science these are called hierarchical (BT/NT) and associative (RT) relations, respectively.

Ontology as a graph: a mathematical definition

An ontology is an ordered pair

O = <V, E>

V is the set of vertices describing the concepts

E is the set of edges describing relations
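
The ordered pair above can be written down directly as two sets. The following is a minimal sketch, not part of the original slides: the `Edge` type and the `is_well_formed` helper are hypothetical names, the four is-a edges mirror the subsumptions listed later in this deck, and the single `eats` edge is an illustrative assumption.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """A labeled, directed edge: source --relation--> target."""
    source: str
    relation: str
    target: str


# V: the set of vertices, one per concept.
V = {"Animal", "Bird", "Mammal", "Predator", "Chicken", "Cat", "Tiger", "Goat"}

# E: the set of edges, one per relation between two concepts.
E = {
    Edge("Bird", "is-a", "Animal"),
    Edge("Mammal", "is-a", "Animal"),
    Edge("Chicken", "is-a", "Bird"),
    Edge("Cat", "is-a", "Predator"),
    Edge("Tiger", "eats", "Goat"),  # assumed edge, for illustration only
}

# The ontology is the ordered pair O = <V, E>.
O = (V, E)


def is_well_formed(ontology) -> bool:
    """Check that every edge connects two vertices that belong to V."""
    vertices, edges = ontology
    return all(e.source in vertices and e.target in vertices for e in edges)
```

Keeping the relation name on each edge is a design choice that makes it easy to separate backbone relations from auxiliary ones.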

[Figure: the example ontology graph, with nodes Animal, Bird, Mammal, Predator, Herbivore, Tiger, Goat, Chicken, Cat, Head and Body, and edges labeled Is-a, Part-of and Eats.]

Tree-like Ontologies

Take the ontology in the previous slide and remove the auxiliary relations…

… we get a tree-like ontology consisting of its backbone structure with ‘is-a’ and ‘part-of’ relations (*), that is, an informal lightweight ontology.

(*) Notice that in some cases we can obtain more complex structures, such as DAGs or even graphs with cycles.
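
Removing the auxiliary relations amounts to filtering the edge set. This is a small sketch under stated assumptions: edges are `(source, relation, target)` triples, the function names are hypothetical, and the sample edges (in particular the `eats` and `part-of` ones) are illustrative rather than read from the figure.

```python
BACKBONE_RELATIONS = {"is-a", "part-of"}


def backbone(edges):
    """Keep only the edges whose relation belongs to the backbone taxonomy."""
    return {(s, r, t) for (s, r, t) in edges if r in BACKBONE_RELATIONS}


def has_single_parents(edges):
    """True if no node is the source of more than one backbone edge.

    This is necessary for a tree, but it does not rule out cycles,
    so it is only a partial check.
    """
    sources = [s for (s, _, _) in edges]
    return len(sources) == len(set(sources))


edges = {
    ("Chicken", "is-a", "Bird"),
    ("Bird", "is-a", "Animal"),
    ("Head", "part-of", "Animal"),  # assumed target, for illustration
    ("Tiger", "eats", "Goat"),      # assumed auxiliary edge
}

tree_edges = backbone(edges)
```

A node with two parents, such as a Cat that is both a Predator and a Mammal, is exactly the case in which the result is a DAG rather than a tree.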

[Figure: the same ontology reduced to its backbone, with nodes Animal, Bird, Mammal, Predator, Herbivore, Tiger, Goat, Chicken, Cat, Head and Body, and only Is-a and Part-of edges.]

Classification vs. Descriptive Ontologies

o Classification ontologies

They are used to classify things, such as books, documents, web pages, etc.; the purpose is to provide domain-specific terminology and organize individuals accordingly. Such ontologies usually take the form of classifications with (BT/NT/RT) or without explicit relations.

o Descriptive ontologies

They are used to describe a part of the world, such as the Gene ontology, Industry ontology, etc.; the purpose is to offer an unambiguous description of the world. Relations are typically explicit (e.g., is-a) and can be of any kind.

Classification vs. Real World semantics

o Classification ontologies are in classification semantics

In classification ontologies, the extension of each concept (label of a node) is the set of documents about the entities or individual objects described by the label of the concept. For example, the extension of the concept animal is “the set of documents about animals” of any kind.

o Descriptive ontologies are in real world semantics

In descriptive ontologies, concepts represent real world entities. For example, the extension of the concept animal is the set of real world animals, which can be connected via relations of the proper kind.

Classification ontologies

Descriptive ontologies

Why ‘Lightweight’ Ontologies?

o The majority of existing ontologies are ‘simple’ taxonomies or classifications, i.e., hierarchically organized categories used to classify resources.
o Ontologies with arbitrary relations do exist, but no intuitive and efficient reasoning techniques support such ontologies in general.

… so we need ‘lightweight’ ontologies.

Lightweight Ontologies

A (formal) lightweight ontology is a triple

O = <N, E, C>

where:

o N is a finite set of nodes,
o E is a set of edges on N, such that <N, E> is a rooted tree,
o C is a finite set of concepts expressed in a formal language F, such that for any node n_i ∈ N there is one and only one concept c_i ∈ C, and, if n_i is the parent node for n_j, then c_j ⊑ c_i.

NOTE: lightweight ontologies are in classification semantics
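
The three conditions of the definition can be checked mechanically. The sketch below rests on simplifying assumptions that are not in the slides: E is given as a child-to-parent map, and each concept is a conjunction of atomic concepts stored as a `frozenset`, so that c_j ⊑ c_i is approximated by "c_j contains every conjunct of c_i". All names are hypothetical.

```python
def subsumed_by(c_child, c_parent):
    """c_child ⊑ c_parent, for concepts given as sets of conjuncts."""
    return c_parent <= c_child


def is_rooted_tree(nodes, parent):
    """<N, E> is a rooted tree: one root, and every node reaches it."""
    roots = [n for n in nodes if parent.get(n) is None]
    if len(roots) != 1:
        return False
    for n in nodes:
        seen = set()
        while n is not None:
            if n in seen or n not in nodes:
                return False  # cycle, or an edge leaving N
            seen.add(n)
            n = parent.get(n)
    return True


def is_lightweight_ontology(nodes, parent, concept):
    """Check the three conditions on O = <N, E, C>."""
    if not is_rooted_tree(nodes, parent):
        return False
    if set(concept) != set(nodes):  # exactly one concept per node
        return False
    return all(
        subsumed_by(concept[n], concept[parent[n]])
        for n in nodes
        if parent.get(n) is not None
    )


nodes = {1, 2, 3}
parent = {1: None, 2: 1, 3: 1}
concept = {
    1: frozenset({"animal"}),
    2: frozenset({"animal", "bird"}),
    3: frozenset({"animal", "mammal"}),
}
```

With this representation a child that drops a conjunct of its parent is rejected, which is the structural counterpart of violating c_j ⊑ c_i.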


Converting tree-like structures into LOs

For a descriptive ontology, the backbone taxonomy of ‘is-a’ and ‘instance-of’ is intuitively coincident with the subsumption (‘⊑’) relation in LOs.

NOTE: ‘part-of’ relations correspond to subsumption only if transitive. For instance, the following chain cannot be translated, since a handle is part of a door but is not naturally regarded as part of a school system:

handle part-of door part-of school part-of school system

For a classification ontology, the extension of each node is the set of documents (books, websites, etc.) that should be classified under the node. Therefore, the links have to be interpreted as ‘subset’ relations and can be transformed directly into subsumption in the target LOs.


Descriptive and classification ontologies

[Figure: two descriptive ontologies, each drawn as a tree with lettered nodes]

(a) edges labeled is-a
A Animal
    B Vertebrate
        D Mammal
        E Bird
    C Invertebrate

(b) edges labeled part-of
A World
    B Europe
        D France
        E Italy
            F Rome
    C Asia

(a) and (b) are two descriptive ontologies. The corresponding classification ontologies are obtained by replacing all the relations with ‘subset’.

(a) and (b) can be converted into lightweight ontologies by replacing the relations with subsumptions. However, the semantics changes from real world to classification semantics.
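
The substitution of relations with subsumptions can be sketched as a rewrite over edges. This is an illustration only: the function name and the set of translatable relations are assumptions, the edges of ontology (b) are taken from the figure labels, and the caller is assumed to have already checked that any ‘part-of’ chain is transitive.

```python
# Relations that this sketch is willing to rewrite into subsumption.
TRANSLATABLE = {"is-a", "instance-of", "part-of", "subset"}


def to_subsumptions(edges):
    """Rewrite (child, relation, parent) edges into (child, parent) axioms,
    each read as 'child ⊑ parent'."""
    axioms = []
    for child, relation, parent in edges:
        if relation not in TRANSLATABLE:
            raise ValueError(f"cannot translate relation {relation!r}")
        axioms.append((child, parent))
    return axioms


def show(axioms):
    """Render the axioms in the notation used on the slides."""
    return [f"{child} ⊑ {parent}" for child, parent in axioms]


# Ontology (b); edge list assumed from the figure labels.
geo_edges = [
    ("Europe", "part-of", "World"),
    ("Asia", "part-of", "World"),
    ("France", "part-of", "Europe"),
    ("Italy", "part-of", "Europe"),
    ("Rome", "part-of", "Italy"),
]

geo_axioms = to_subsumptions(geo_edges)
```

After the rewrite, an axiom such as Rome ⊑ Italy no longer talks about the city itself but about documents: everything classified under Rome is also about Italy.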


Populated (Lightweight) Ontologies

o In Information Retrieval, the term classification is seen as the process of arranging a set of objects (e.g., documents) into a set of categories or classes.
o A classification ontology is said to be populated if a set of objects has been classified under ‘proper’ nodes.
o Thus a populated (lightweight) ontology includes (explicit or implicit) ‘instance-of’ relations.


Example of a Populated Ontology

[Figure: the tree-like ontology over Animal, Bird, Mammal, Predator, Herbivore, Tiger, Goat, Chicken, Cat, Head and Body, with edges labeled ⊑, populated with the documents ‘Chicken Soup’, ‘How to Raise Chicken’, ‘Tom and Jerry’, ‘www.protectTiger.org’, …, each attached to a node by an Instance-of link.]

Lightweight Ontologies in ClassL: TBox

Subsumption terminologies. Recall:

‘… C is a finite set of concepts expressed in a formal language F, such that for any node n_i ∈ N, there is one and only one concept c_i ∈ C, and, if n_i is the parent node for n_j, then c_j ⊑ c_i.’

1. Bird ⊑ Animal

2. Mammal ⊑ Animal

3. Chicken ⊑ Bird

4. Cat ⊑ Predator

5. …

NOTE: a tree-like ontology can be transformed into a lightweight ontology, but not vice versa. This is because we lose information during the translation.

Populated LOs in ClassL: TBox + ABox

‘instance-of’ links are encoded into ‘concept assertions’:

1. Chicken(ChickenSoup)

2. Cat(TomAndJerry)

3. …

Instances are the elements of the domain, namely the documents classified in the categories.
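
A populated lightweight ontology can be held as two small sets, one for the TBox and one for the ABox. The sketch below uses only the axioms and assertions listed on the slides (the elided items are left out); the function names and the pair-based encoding are assumptions made for illustration.

```python
# TBox: (sub, sup) pairs, each read as 'sub ⊑ sup'.
tbox = {
    ("Bird", "Animal"),
    ("Mammal", "Animal"),
    ("Chicken", "Bird"),
    ("Cat", "Predator"),
}

# ABox: (concept, individual) pairs, each read as 'concept(individual)'.
abox = {
    ("Chicken", "ChickenSoup"),
    ("Cat", "TomAndJerry"),
}


def superconcepts(concept, tbox):
    """All D such that concept ⊑ D follows from the TBox, including concept."""
    result, frontier = {concept}, [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in tbox:
            if sub == current and sup not in result:
                result.add(sup)
                frontier.append(sup)
    return result


def instances_of(concept, tbox, abox):
    """Documents asserted under the concept or under any of its subconcepts."""
    return {ind for (c, ind) in abox if concept in superconcepts(c, tbox)}
```

With only these four axioms, `instances_of("Animal", tbox, abox)` retrieves ChickenSoup but not TomAndJerry, because no listed axiom relates Predator to Animal.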


Classifications are:

o Easy to use for humans.
o Pervasive (Google, Yahoo, Amazon, our PC directories, email folders, address book, etc.).
o Widely used in commercial applications (Google, Yahoo, eBay, Amazon, BBC, CNN, libraries, etc.).
o Studied for a very long time (e.g., Dewey Decimal Classification system - DDC, Library of Congress Classification system - LCC, etc.).


Classification Example: Yahoo! Directory


Classification Example: Email Folders


Classification Example: E-Commerce Category


Label Semantics

o Natural language words are often ambiguous.
  E.g., Java (an island, a beverage, a programming language).
o When used with other words in a label, improper senses can be pruned.
  E.g., “Java Language”: only the 3rd sense of Java is preserved.
o We translate node labels into unambiguous propositions in ClassL in classification semantics.
o This can be done by using NLP (Natural Language Processing) techniques.
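
Sense pruning can be illustrated with a toy lexicon. Everything in this sketch is hypothetical: the `SENSES` table and its cue words are invented for the example, and a real system would rely on a proper lexical resource and NLP pipeline rather than on word overlap.

```python
# Toy sense inventory: word -> {sense name: cue words that support it}.
SENSES = {
    "java": {
        "island": {"indonesia", "island", "travel"},
        "beverage": {"coffee", "drink", "cup"},
        "programming language": {"language", "programming", "beans", "code"},
    },
}


def prune_senses(label):
    """For each ambiguous word in the label, keep the senses supported by
    the other words of the label; keep all senses if none is supported."""
    words = [w.lower() for w in label.split()]
    result = {}
    for word in words:
        if word not in SENSES:
            continue
        context = set(words) - {word}
        candidates = SENSES[word]
        supported = [s for s, cues in candidates.items() if cues & context]
        result[word] = supported or list(candidates)
    return result
```

For the label "Java Language" only the programming-language sense survives, while the bare label "Java" keeps all three senses because there is no context to prune with.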

[Figure: a classification branch with levels 0 to 4: Subjects, Computers and Internet, Programming, Java Language, Java Beans.]

Link semantics

o Get-specific principle: child nodes in a classification are always considered in the context of their parent nodes. As a consequence, they specialize the meaning of the parent nodes.
o Subsumption relation (a): the extension of the child node is a proper subset of the extension of the parent node. The meaning of node 2 is B.
o General intersection relation (b): the extension of the child node is a subset of the extension of the parent node. The meaning of node 2 is C = A ⊓ B.
o We generalize to (b). The meaning of the node is what we call the concept at node.

[Figure: node 1 (label A) with child node 2 (label B), and two diagrams of the extensions of A and B: (a) subsumption, (b) general intersection with C = A ⊓ B.]

Concept at node

[Figure: classification tree with numbered nodes]

1 Europe
    2 Pictures
        4 Italy
        5 Austria
    3 Wine and Cheese

In ClassL: C_4 = C_europe ⊓ C_pictures ⊓ C_italy
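
The concept at a node is the conjunction of the label concepts on the path from the root to that node. The sketch below assumes a tree shape consistent with the formula for node 4 (Italy under Pictures under Europe); the placement of nodes 3 and 5, the `parent` and `label` maps, and the function names are assumptions, and each label is treated as a single atomic concept.

```python
# Child -> parent map and node labels for the example classification.
parent = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}
label = {1: "europe", 2: "pictures", 3: "wine and cheese", 4: "italy", 5: "austria"}


def concept_at_node(node):
    """Collect the label concepts from the root down to the given node."""
    conjuncts = []
    while node is not None:
        conjuncts.append(f"C_{label[node]}")
        node = parent[node]
    return list(reversed(conjuncts))


def show(node):
    """Render the concept at node as a ClassL conjunction."""
    return " ⊓ ".join(concept_at_node(node))
```

Calling `show(4)` yields `C_europe ⊓ C_pictures ⊓ C_italy`, matching the formula on the slide.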


Document Classification

Document concept: each document d in a classification is assigned a proposition C_d in ClassL, built from d in two steps:

1. keywords are retrieved from d by using standard text mining techniques.

2. keywords are converted into propositions by using the methodology discussed above to translate node labels.

Automatic classification: for any given document d and its concept C_d, we classify d in each node n_i such that:

1. ⊨ C_d ⊑ C_i,

2. and there is no node n_j (j ≠ i) for which ⊨ C_j ⊑ C_i and ⊨ C_d ⊑ C_j.

In other words we always classify in the node with the most specific concept.
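
The two conditions can be sketched as a filter over the nodes. This is an illustration under simplifying assumptions, not the deck's reasoner: every concept is a conjunction of atoms stored as a `frozenset`, so ⊨ C ⊑ D is approximated by "C contains every conjunct of D". The node concepts follow the Europe/Pictures example, and the document concepts are hypothetical.

```python
def subsumed_by(c, d):
    """⊨ c ⊑ d, for concepts given as sets of conjuncts."""
    return d <= c


def classify(doc_concept, node_concepts):
    """Return the nodes n_i with C_d ⊑ C_i such that no other node n_j
    satisfies both C_j ⊑ C_i and C_d ⊑ C_j (the most specific nodes)."""
    candidates = [
        i for i, ci in node_concepts.items() if subsumed_by(doc_concept, ci)
    ]
    return [
        i
        for i in candidates
        if not any(
            j != i
            and subsumed_by(node_concepts[j], node_concepts[i])
            and subsumed_by(doc_concept, node_concepts[j])
            for j in node_concepts
        )
    ]


node_concepts = {
    1: frozenset({"europe"}),
    2: frozenset({"europe", "pictures"}),
    3: frozenset({"europe", "wine_and_cheese"}),
    4: frozenset({"europe", "pictures", "italy"}),
    5: frozenset({"europe", "pictures", "austria"}),
}

# Hypothetical document concept: pictures of Rome, Italy, Europe.
rome_photos = frozenset({"europe", "pictures", "italy", "rome"})
```

A document can land in more than one node when several most-specific nodes apply, which is why the function returns a list.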


Query-answering

Query-answering on a hierarchy of documents, based on a query q given as a set of keywords, is defined in two steps:

1. The ClassL proposition C_q is built from q by converting q’s keywords as described above.

2. The set of answers (retrieval set) to q is defined as a set of subsumption checking problems in ClassL:

A_q = {d ∈ document | T ⊨ C_d ⊑ C_q}
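
The retrieval set can be computed by running one subsumption check per document. The sketch below covers only a restricted fragment, stated here as an assumption: C_d and C_q are conjunctions of atoms, and T contains only atomic subsumptions, so T ⊨ C_d ⊑ C_q holds when every atom of C_q is reachable from some atom of C_d. The TBox, the document concepts and the function names are hypothetical.

```python
def saturate(atoms, tbox):
    """Close a set of atoms under the atomic subsumptions (sub, sup) of T."""
    closed = set(atoms)
    changed = True
    while changed:
        changed = False
        for sub, sup in tbox:
            if sub in closed and sup not in closed:
                closed.add(sup)
                changed = True
    return closed


def answers(query, docs, tbox):
    """A_q = {d | T ⊨ C_d ⊑ C_q}, in the restricted fragment above."""
    return {d for d, cd in docs.items() if set(query) <= saturate(cd, tbox)}


tbox = {("chicken", "bird"), ("bird", "animal")}

# Hypothetical document concepts, as sets of atomic conjuncts.
docs = {
    "ChickenSoup": {"chicken", "soup"},
    "TomAndJerry": {"cat", "mouse"},
}
```

Saturating the document side once per document keeps each check a plain subset test, at the cost of recomputing the closure for every query.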


Semantic Matching: Why?

o Most popular forms of knowledge can be represented as graphs.
o The heterogeneity between knowledge graphs requires relations between their nodes, such as semantic equivalence, to be made explicit.
o Some common situations that can be modeled as a matching problem are:
    o Concept matching in semantic networks.
    o Schema matching in distributed databases.
    o Ontology matching (ontology “alignment”) in the Semantic Web.

The Matching Problem

o Matching Problem: given two finite graphs, find all nodes in the two graphs that syntactically or semantically correspond to each other.
o Given two graph-like structures (e.g., classifications, XML and database schemas, ontologies), a matching operator produces a mapping between the nodes of the graphs.
o Solution: a possible solution [Giunchiglia & Shvaiko, 2003] consists in converting the two input graphs into lightweight ontologies and then matching them semantically.
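
Once both graphs are lightweight ontologies, matching reduces to comparing concepts at nodes pairwise. The following is a toy sketch and not the algorithm of [Giunchiglia & Shvaiko, 2003]: concepts are again conjunctions of atoms stored as `frozenset`s, the relation symbols and the "idk" fallback are assumptions, and the two small ontologies are hypothetical.

```python
def relation(c1, c2):
    """Semantic relation between two concepts given as sets of conjuncts."""
    if c1 == c2:
        return "≡"    # same conjuncts: equivalent
    if c2 < c1:
        return "⊑"    # c1 has extra conjuncts: more specific
    if c1 < c2:
        return "⊒"    # c1 has fewer conjuncts: more general
    return "idk"      # nothing can be concluded in this sketch


def match(lo1, lo2):
    """Produce a mapping: one relation per pair of nodes."""
    return {
        (n1, n2): relation(c1, c2)
        for n1, c1 in lo1.items()
        for n2, c2 in lo2.items()
    }


# Hypothetical concepts at node for two classifications.
lo1 = {
    "Europe/Pictures/Italy": frozenset({"europe", "pictures", "italy"}),
}
lo2 = {
    "Pictures/Europe": frozenset({"pictures", "europe"}),
    "Pictures/Europe/Italy": frozenset({"pictures", "europe", "italy"}),
    "Pictures/Asia": frozenset({"pictures", "asia"}),
}

mapping = match(lo1, lo2)
```

Because the comparison works on concepts at node rather than on labels or tree positions, the two classifications can order their levels differently and still be found equivalent.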


A Matching Problem

[Figure: an example matching problem; the candidate correspondences between nodes are marked ‘?’.]