ldk r logics for data and knowledge representation lightweight ontologies

50
L Logics for D Data and K Knowledge R Representation Lightweight Ontologies

Upload: lee-perry

Post on 12-Jan-2016

227 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

LLogics for DData and KKnowledgeRRepresentation

Lightweight Ontologies

Page 2: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Outline Classifications

Lightweight Ontologies (LO):

Labels

Links Document Classification Query-answering on LOs.

2

Page 3: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

3

Page 4: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Classifications… Classifications hierarchies are easy to use...

... for humans. Classifications hierarchies are pervasive

(Google, Yahoo, Amazon, our PC directories, email folders, address book, etc.).

Classifications hierarchies are largely used in industry (Google, Yahoo, eBay, Amazon, BBC, CNN, libraries, etc.).

4

Page 5: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Classifications .. more Classification hierarchies have been studied for

very long (e.g., Dewey Decimal Classification system -- DCC, Library of Congress Classification system –LCC, etc.).

Classifications hierarchies are lightweight (no roles, trees or simple DAGs, …).

Classification hierarchies are a kind of concept hierarchies.

5

Page 6: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Example Data instances of the

concept (class) “Developers” are: John, Fred.

A data instance of “Associations” is: BeSafe Inc.

We define a class instances as the data instance of a class.

6

John Fred BeSafe Inc.

Instance of

Instance of

Instance of

Page 7: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

In Classification Hierarchies Labels are natural language sentences; useful but

hard to deal with in an automated way. Links are of the kind “child-of” (e.g. “economy

child-of Europe”), where in an ontology you would have, (instance-of}, or roles, or {is-a} links.

Main Problem: No clear semantics for both labels at nodes and links.

… we need ontologies for reasoning

7

Page 8: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Outlines Classification

Lightweight Ontology (LO):

Labels

Links Document Classification Query-answering on LOs.

8

Page 9: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Ontology Ontologies are explicit specifications of

conceptualizations.

The notion of concept is understood as defined in Knowledge Representation, i.e., as a set of objects or individuals.

There are numerous provers offering services for reasoning about ontologies (FaCT++, Pellet, KOAN…).

9

Page 10: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Example of an ontology

10

Page 11: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Ontology as a graph Ontology: a description of the concepts and

relationships in the domain of interest. They are often thought of as directed graphs

whose nodes represent concepts and whose arcs represent relations between concepts.

A mathematical definition comes from ‘graph’, an ontology is an ordered pair

O=<V, E>

in which V is the set of vertices describing the concepts and E is the set of edges describing relations.

11

Page 12: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

L.O, Refinement of Ontology But there are two observations:

1. Majority of existing ontologies are ‘simple’ taxonomies or classifications, i.e., categories to classify resources.

2. Ontologies with ‘complex’ relations do exists, but no intuitively reasoning techniques supports such ontologies in general.

… so we need ‘lightweight’ ontologies.12

Page 13: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Example of an Lightweight Ontology

13

Wine and Cheese

Italy

Europe

Austria

Pictures

1

2 3

4 5

Page 14: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Lightweight Ontologies Ontologies are explicit specifications of

conceptualizations. A (formal) lightweight ontology is a triple

O = <N,E,C>, where

N is a finite set of nodes, E is a set of edges on N, such that <N,E> is a rooted tree, and C is a finite set of concepts expressed in a formal

language F, such that for any node ni N, there is one and ∈only one concept ci C, and, if n∈ i is the parent node for nj ,then cj c⊑ i.

14

Page 15: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

From Classifications to Lightweight Ontologies We know that a lightweight ontology is a formal

conceptualization of a domain in terms of concepts and {is-a, instance-of} relationships.

Lightweight ontologies (LOs) add a formal semantics and {instance-of} relationships to classification hierarchies.

In short: LOs make classifications formal!

15

Page 16: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Lightweight Ontologies and Classification Classification hierarchies are semi-formal, so we

need (lightweight) ontologies to automatically reason about classification.

A theory of Lightweight Ontology is needed for building the bridge from classifications to simple, i.e., “lightweight” ontologies.

One result: Onto2Class, Class2Onto operators.

16

Page 17: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Lightweight Ontologies and Class Logic The logic of classes (ClassL) provides a formal

language (syntax + semantics) to model lightweight ontologies, where: concepts are modeled by propositions; {is-a, instance-of} relationships are modeled,

respectively, by subsumption ( ) and ⊑ class-propositions (i.e., wffs like P(a)).

An ontology represented by ClassL is a lightweight ontology.

17

Page 18: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Lightweight Ontologies A Lightweight Ontology (LO) is a rooted tree, where

each node is assigned a label represented by a proposition in ClassL.

A Lightweight Ontology provides for formalizing the meaning of labels, and formalizing the meaning of links

Both formalizations come from ClassL!

18

Page 19: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Outline Classification

Lightweight Ontology (LO):

Labels

Links Document Classification Query-answering on LOs.

19

Page 20: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Label Semantics Natural language words

are often ambiguous. E.g. Java (an island, a beverage, an OO programming language)

When used with other words in a label, wrong senses can be pruned. E.g., “Java Language” – only the 3rd sense of Java is preserved.

20

Level

4

Subjects

Computers andInternet

0

1

2

3

(1)

(3)

(5)

(7)

(8)

Programming

Java Language

Java Beans

Page 21: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

From NL Labels to Labels in Class Logic Several approaches to rewrite a natural language label

into a ClassL proposition. Following (Giunchiglia et al., 2007), we may

distinguish four steps:

1. Tokenization (get distinct words);Italian Pictures ‘Italian’, ‘Pictures’

2. Words stemming (get to a basic form);Pictures picture

3. Rewrite each word into its proposition;picture picture-noun-1⊓picture-noun-2⊓…⊓picture-verb-2

4. Prune inconsistent senses.picture-noun-1⊓picture-noun-2⊓…⊓picture-verb-2pictureN1

21

Page 22: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Class Logic Label Eamples E.g.1: “Java” becomes the proposition

Java#1 ⊔ Java#2 ⊔ Java#3

where Java#i is a propositional variable representing the ith-sense of the word “Java” according to a dictionary (e.g., WordNet).

E.g.2: “Java Beans” becomes:

(Java#1 ⊔ Java#2 ⊔ Java#3)⊓(Bean#1 ⊔ Bean#2)

22

Page 23: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Advantages of Propositions NL labels are ambiguous, propositions are NOT!

Extensional semantics of propositions naturally maps nodes to real world objects.

Labels as propositions allow us to deal with the standard problems in classification (e.g., document classification, query-answering, and matching) by means of ClassL’s reasoning, mainly the SAT problem.

23

Page 24: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Outline Classification

Lightweight Ontology (LO):

Labels

Links Document Classification Query-answering on LOs.

24

Page 25: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Formalizing the Meaning of Links (1) Child nodes in a classification are always

considered in the context of their parent nodes.

Child nodes therefore specialize the meaning of the parent nodes.

Contextuality property of classifications.

25

Page 26: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Formalizing the Meaning of Links (2)General intersection relationship(a): can be used to represent facets. The meaning of node 2 is C = A ⊓ B.

Subsumption relationship (b): child nodes are specific case of the parent nodes. The meaning of node 2 is B.

26

1

2

A

B

A

B

A

B? C

(a)

(b)

Page 27: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

General Intersection Example

hardwaresoftwarenetworking …

l3 = “Computers

and Internet” l5 = “Programming”

scheduling, planning

computerprogrammin

g

l1 = “Subjects”

27

Page 28: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Concept at a Node Parental contextuality is formalized in ClassL by the

notion of “concept at a node.”

A concept Cr at the root node r is the class proposition (label) used to denote the node.

A concept Ci at a node ni is the conjunction of a proposition Pi (label of ni) and the concept Cj at node nj parent to ni (if it has any parents).

In ClassL: Pi ⊓ Cj.

28

Page 29: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Concept at a Node A concept at a node ni can be computed as the

conjunction of all the labels from the root of the classification hierarchy to ni.

Concepts at nodes capture the classification semantics by using the meaning of labels (propositions defined by using WordNet and a linguistic analysis) and the nodes' position.

29

Page 30: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Concept at a Node: Example

Wine and Cheese

Italy

Europe

Austria

Pictures

1

2 3

4 5

In ClassL: C4 = Ceurope ⊓ Cpictures ⊓ Citaly

30

Page 31: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

What have we done? Calculate the concepts and label and concept at

nodes.

In which format?

ClassL

Java#1 ⊔ Java#2 ⊔ Java#3

Ceurope ⊓ Cpictures ⊓ Citaly

We have build the ClassL formulas for each node!

31

Page 32: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Outline Classification

Lightweight Ontology (LO):

Labels

Links Document Classification Query-answering on LOs.

32

Page 33: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Document Classification Each document d in a classification is assigned a

proposition Cd in ClassL.Cd is called document concept.

Cd is build from d in two steps: keywords are retrieved from d by using

standard text mining techniques. keywords are converted into propositions by

using methodology discussed above.

33

Page 34: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Document Classification“Get specific” Rule

For any given document d and its concept Cd we classify d in each node ni such that:

1. |= Ci C⊒ d (i.e. the concept at node ni is more general than Cd);

2. and there is no node nj (j ≠ i), whose concept at node Cj is more specific than Ci and more general than Cd: |= Cj C⊑ i and |= Cj ⊒ Cd.

34

Page 35: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Example Suppose we need to

classify “Professional Java, JDK-5th Edition” by W. Clay Richardson et al.

The document concept of such document d is:Cd = Java#3⊓Programming#2.

The node 7 is the only node which conforms to the “get specific” rule.

Level

0

1

2

3

4

(1)

(2) (3)

(4) (5)

(7)(6)

(8)

Subjects

Computers andInternet

Business andInvesting

Small Business and

Entrepreneurship

Programming

New BusinessEnterprises

Java Language

Java Beans

35

Page 36: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Example (cont’) Suppose we need to

classify “Visual Basic.Net Programming for Business” by Philip A. Koneman.

The document concept of such document d is:Cd = VisualBasicNet#1⊓Programming#2⊓Business#1

The nodes 2,5 conform to the “get specific” rule.

Level

0

1

2

3

4

(1)

(2) (3)

(4) (5)

(7)(6)

(8)

Subjects

Computers andInternet

Business andInvesting

Small Business and

Entrepreneurship

Programming

New BusinessEnterprises

Java Language

Java Beans

36

Page 37: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

What have we done by far? Classify documents.

How? Get specific algorithm! But how to implement the algorithm?

ClassL!

We are reasoning with the ‘Concept Realization’ service of ClassL!

(With an empty ABox.)

37

Page 38: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Outline Classification

Lightweight Ontology (LO):

Labels

Links Document Classification Query-answering on LOs.

38

Page 39: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Classifying Query-answering on a classification hierarchy of

documents based on a query q as a set of keywords is defined in two steps:

1. The proposition Cq is build from q by converting q’s keywords as said above.

2. The set of answers (retrieval set) to q is defined as a SAT problem in ClassL:

Aq =df {d∈ document | |= Cd C⊑ q}.

39

Page 40: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Query-Answering: A Problem Searching on all the documents may be

expensive (millions of documents classified). We define a set of nodes which contain only

answers to a query q as follows: Ns

q =df {ni node| |= Ci ⊑ Cq}

Remark: Each document d in ni in Nsq is an

answer to the query q, since |= Cd C⊑ i by definition of classification.

Thus all the documents d in Nsq A⊆ q.

40

Page 41: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Query-Answering: Classification Set We extend Ns

q (called sound classification answer) by adding a set of nodes (called query classification set) defined as:

Clq =df {ni node | d n∈ i and |= Cd ≡ Cq}

i.e., the nodes which constitute the classification set of a document d, whose concept Cd is equivalent to Cq.

41

Page 42: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Query-Answering: Sound Answer Set The set of answers (retrieval set) to q is finally

defined as the following set:

Asq =df {d n∈ i | ni ∈ Ns

q} ∪ {d n∈ i | ni C∈ lq and |= Cd C⊑ q}.

Under this definition, an answer to a query are documents from nodes whose concepts are more specific than the query concept.

42

Page 43: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Example Suppose that a user makes

a query q, which is converted into Cq = Java#3⊓Cobol#1, where Cobol#1 is “common business-oriented language.”

It can be shown that Ns

q = {7,8}.

Exercise: show it.

Level

0

1

2

3

4

(1)

(3)

(5)

(7)

(8)

Subjects

Computers andInternet

Programming

Java Language

Java Beans

Nsq

Nsq

43

Page 44: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Example (cont’) It can be shown that

Nsq = {7,8}.

“Java for COBOL programmers, 2nd ed.” is classified in node 2, so it is not an aswer by using only Ns

q = {7,8}.

We then consider Clq to compute more answers, among others are the documents in node 5.

Level

0

1

2

3

4

(1)

(3)

(5)

(7)

(8)

Subjects

Computers andInternet

Programming

Java Language

Java Beans

Clq

44

Page 45: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Sound Answer Set: Remark The set As

q is sound (i.e., contains answers to q), but not complete (i.e., does not contain all the answers to q).

See the next example.

d

EuropeanUnion

Pictures

(1)

(2)

45

Page 46: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Sound Answer Set Example Suppose that a user makes

a query q like “video or pictures of Italy,” which is converted into Cq = Italy#1 ⊓(Video#2 Pictures#1).⊔

Cq is equivalent to:

Cq1 = Video#2 Italy#1,⊓

Cq2 = Pictures#1 Italy#1.⊓

d

EuropeanUnion

Pictures

(1)

(2)

46

Page 47: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Sound Answer Set Example (cont’) But not |= C2 ⊑ C1

q,

hence a document d in 2 about Rome, with Cd = Pictures#1 Rome#1 ⊓

is not retrieved, since:

Nsq = {ni |= Ci ⊑ Cq} = ∅ and

Clq ={1}, so d ∉ Asq.

(Asq is not complete)

d

EuropeanUnion

Pictures

(1)

(2)

47

Page 48: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

Final Remarks The edge structure of a LO is not considered

neither for document classification nor for query answering.

The edges information becomes redundant, as it is implicitly encoded in the “concept at a node” notion.

There are more than one way to build a LO from a set of concepts at nodes.

48

Page 49: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

What have we done in Query Answering? Find the set of documents.

How? Find the concept that is subsumed by the query.

But how to implement it?

ClassL!

We are reasoning with the ‘Concept subsumption’ service of ClassL!

49

Page 50: LDK R Logics for Data and Knowledge Representation Lightweight Ontologies

References & Credits References:

F. Giunchiglia, M. Marchese, I. Zaihrayeu. “Encoding Classifications into Lightweight Ontologies.” J. of Data Semantics VIII, Springer-Verlag LNCS 4380, pp 57-81, 2007.

50