
An Overview of Ontologies and their Practical Applications

Gianluca Correndo [email protected]

http://www.di.unito.it/~correndo

What is an Ontology?

Ontology

• Semantics – the meaning of meaning.
• Philosophical discipline, branch of philosophy that deals with the nature and the organisation of reality.

In Computer Science …

• An ontology is an explicit specification of a conceptualization [Gruber]
• Defines
– A common vocabulary of terms
– Some specification of the meaning of the terms
– A shared understanding for people and machines

Why develop an ontology?

• To make domain assumptions explicit
– Easier to change domain assumptions
– Easier to understand and update legacy data
• To separate domain knowledge from operational knowledge
– Re-use domain and operational knowledge separately
• A community reference for applications (standards)
• To share a consistent understanding of what information means

Communication

• Syntax is not enough for machine communication, e.g. B2B

Bestellinformation: <Auto><Name>Daimler 500 SLK </Name><Preis>27.000 </Preis></Auto>

Order information: <Product><type>Car</type><Name>Daimler 500 SLK </Name><Price>23.000 $</Price></Product>
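The two messages above carry the same kind of information under different tag names, which is why syntax alone is not enough. A minimal sketch of how a shared vocabulary could reconcile them follows; the mapping table `SHARED_TERMS`, the function `normalize`, and the shared terms themselves are assumptions made for this illustration, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from each partner's tag names to shared terms.
# The shared terms ("Car", "name", "price") are assumptions for this sketch.
SHARED_TERMS = {
    "Auto": "Car",
    "Name": "name",
    "Preis": "price",
    "Price": "price",
}

def normalize(xml_text):
    """Translate a flat XML message into a dict keyed by shared terms."""
    root = ET.fromstring(xml_text)
    record = {}
    # The root tag itself may name the concept (German message) ...
    if root.tag in SHARED_TERMS:
        record["type"] = SHARED_TERMS[root.tag]
    for child in root:
        text = (child.text or "").strip()
        if child.tag == "type":
            # ... or an explicit <type> element may name it (English message).
            record["type"] = text
        elif child.tag in SHARED_TERMS:
            record[SHARED_TERMS[child.tag]] = text
    return record

german = "<Auto><Name>Daimler 500 SLK </Name><Preis>27.000 </Preis></Auto>"
english = ("<Product><type>Car</type><Name>Daimler 500 SLK </Name>"
           "<Price>23.000 $</Price></Product>")

print(normalize(german))
print(normalize(english))
```

After normalisation both records agree on `type` and `name`. The prices still differ in form (one carries a currency sign, the other does not), which is exactly the kind of meaning a tag mapping alone cannot settle.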

A Specification of a Conceptualization

• Concepts (class, set, type, predicate)
– Event, gene, molecule, cat
• Properties of concepts and relationships between them (slot)
– Taxonomy: generalisation ordering among concepts: isA, partOf, subProcess
– Relationship, role or attribute: functionOf, hasActivity, location, eats, size

[Figure: taxonomy rooted at animal, with the concepts rodent, cat, cow, dog and mouse, an "eats" relation, and the labels domestic and vermin]
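A taxonomy such as the animal example can be stored as a table of direct isA links and queried by following the links upward. The sketch below assumes one plausible reading of the figure (mouse under rodent, the other concepts directly under animal); the names `IS_A`, `ancestors` and `is_a` are illustrative.

```python
# Assumed reading of the figure: each concept maps to its direct parent.
IS_A = {
    "rodent": "animal",
    "cat": "animal",
    "cow": "animal",
    "dog": "animal",
    "mouse": "rodent",
}

def ancestors(concept):
    """Return every concept reachable by following isA links upward."""
    found = []
    while concept in IS_A:
        concept = IS_A[concept]
        found.append(concept)
    return found

def is_a(concept, other):
    """True if `concept` is `other` or one of its descendants."""
    return concept == other or other in ancestors(concept)

print(ancestors("mouse"))       # ['rodent', 'animal']
print(is_a("mouse", "animal"))  # True
print(is_a("cow", "rodent"))    # False
```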

What is a concept?

Different communities have different notions of what a concept means:
– Formal concept analysis talks about formal concepts
– Description Logics talk about concept labels
– ISO-704:2000 – Terminology Work
– Often the classical notion of a frame in AI or a class in OO modeling is seen as equivalent to a concept.

An explicit description of a domain

• Constraints or axioms on properties and concepts:
– value: integer
– domain: cat
– cardinality: at most 1
– range: 0 <= X <= 100
– oligonucleotides < 20 base pairs
– cows are larger than dogs
– cats cannot eat only vegetation
– cats and dogs are disjoint
• Values or concrete domains
– integer, strings
– 20, tryptophan

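Constraints of this kind can be checked mechanically against individual data. The following is a minimal sketch covering four of the listed constraints (integer value, cardinality at most 1, range 0 <= X <= 100, cats and dogs disjoint); the property name `size`, the function `check_individual` and the sample individuals are assumptions for illustration.

```python
def check_individual(name, classes, size_values):
    """Return a list of violated constraints for one individual.

    The property name `size` and the sample individuals are assumptions;
    the constraints mirror the slide: value is an integer, cardinality is
    at most 1, range is 0 <= X <= 100, and cat and dog are disjoint.
    """
    problems = []
    if {"cat", "dog"} <= set(classes):
        problems.append(f"{name}: cat and dog are disjoint")
    if len(size_values) > 1:
        problems.append(f"{name}: size has cardinality at most 1")
    for value in size_values:
        if not isinstance(value, int):
            problems.append(f"{name}: size must be an integer")
        elif not 0 <= value <= 100:
            problems.append(f"{name}: size must satisfy 0 <= X <= 100")
    return problems

print(check_individual("felix", ["cat"], [30]))
print(check_individual("odd", ["cat", "dog"], [30, 140]))
```

The first call reports no problems; the second reports a disjointness, a cardinality and a range violation.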

An explicit description of a domain

• Individuals or Instances
– sulphur, trpA Gene, felix
• Nominals
– Concepts that cannot have instances
– Instances that are used in conceptual definitions
– ItalianDog = Dog bornIn Italy
• Instances
– An ontology = concepts + properties + axioms + values + nominals
– A knowledge base = ontology + instances

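[Figure: the animal taxonomy (animal, rodent, cat, cow, dog, mouse, with the "eats" relation and the labels domestic and vermin) extended with the instances mickey, felix, jerry and tom]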
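The distinction between an ontology and a knowledge base can be made concrete by keeping concepts and instances in separate tables. In this sketch the assignment of mickey and jerry to mouse and of felix and tom to cat is an assumption (the figure's layout is not recoverable), as are the names `INSTANCE_OF`, `concepts_of` and `instances_of`.

```python
# Ontology part: concepts and their direct parents (assumed from the figure).
IS_A = {"rodent": "animal", "cat": "animal", "mouse": "rodent"}

# Instance part: which concept each individual belongs to (assumed).
INSTANCE_OF = {"mickey": "mouse", "jerry": "mouse", "felix": "cat", "tom": "cat"}

def concepts_of(individual):
    """All concepts an individual belongs to, most specific first."""
    concept = INSTANCE_OF[individual]
    result = [concept]
    while concept in IS_A:
        concept = IS_A[concept]
        result.append(concept)
    return result

def instances_of(concept):
    """Retrieve every individual that is an instance of `concept`."""
    return sorted(i for i in INSTANCE_OF if concept in concepts_of(i))

print(concepts_of("jerry"))    # ['mouse', 'rodent', 'animal']
print(instances_of("rodent"))  # ['jerry', 'mickey']
print(instances_of("animal"))  # ['felix', 'jerry', 'mickey', 'tom']
```

Removing `INSTANCE_OF` leaves the ontology intact, which is the sense in which a knowledge base is the ontology plus its instances.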

Light and Heavy expressivity

• Lightweight

– Concepts, atomic types

– Is-a hierarchy

– Relationships between concepts

• Heavyweight

– Metaclasses

– Type constraints on relations

– Cardinality constraints

– Taxonomy of relations

– Reified statements

– Axioms

– Semantic entailments

– Expressiveness

– Inference systems

A matter of rigour and representational expressivity

Kingdom Animalia
Phylum Chordata
Class Mammalia
Order Primates
Family Hominidae
Genus Homo
Species sapiens

Carl von Linné (1707-1778)

Aristotle (384-322 BC)

• Science of Being (Metaphysics, IV,1)

• What is being?

• What are the features common to all beings?

So what is an ontology?

Catalog/ID

Thesauri

Terms/glossary

Informal Is-a

Formal Is-a

Formal instance

Frames (properties)

General logical constraints

Value restrictions

Disjointness, inverse, part-of

…Things in Common

• They are approaches to help structure, classify, model, and/or represent the concepts and relationships pertaining to some subject matter of interest to some community.

• They are intended to enable a community to come to agreement and to commit to use the same terms in the same way.

• The meaning of the terms is specified in some way and to some degree.

Catalog

Glossary

Thesauri

[Figure: example thesaurus graph with the nodes Fruit, Orange, Apfelsine (German) and Vegetable, and the edge labels similarTo, synonymWith and NarrowerTerm]

- Graph with labelled edges (similar, nt, bt, synonym)
- Fixed set of edge labels (aka relations)
- Use of lexical stem
- No instances
- Well known in library science
- cf. terminologies / classifications (Dewey)
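A thesaurus of this kind is a graph whose edges carry one of a fixed set of labels. The sketch below stores it as a list of triples; the direction of each edge is an assumed reading of the fruit example, and the names `EDGES`, `INVERSE` and `related` are illustrative.

```python
# Fixed set of edge labels: similar, nt (narrower term), bt (broader term),
# synonym. Each label is paired with the label of the reverse direction.
INVERSE = {"nt": "bt", "bt": "nt", "similar": "similar", "synonym": "synonym"}

# (source, label, target) triples; the direction of each edge is assumed.
EDGES = [
    ("Fruit", "nt", "Orange"),
    ("Orange", "synonym", "Apfelsine"),
    ("Fruit", "similar", "Vegetable"),
]

def related(term, label):
    """Terms linked to `term` by `label`, reading edges in both directions."""
    if label not in INVERSE:
        raise ValueError(f"unknown edge label: {label}")
    forward = {t for s, lab, t in EDGES if s == term and lab == label}
    backward = {s for s, lab, t in EDGES if t == term and lab == INVERSE[label]}
    return sorted(forward | backward)

print(related("Fruit", "nt"))           # ['Orange']
print(related("Orange", "bt"))          # ['Fruit']
print(related("Apfelsine", "synonym"))  # ['Orange']
```

Because the label set is fixed, an unknown label is rejected rather than silently returning nothing.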

WordNet

news item IS A KIND OF ...
1 sense of news item

Sense 1
news item -- (an item in a newspaper)
  => item, point -- (a distinct part that can be specified separately in a group of things that could be enumerated on a list; "he noticed an item in the New York Times"; "she had several items on her shopping list"; "the main point on the agenda was taken up first")
    => part, portion, component part, component -- (something determined in relation to something that includes it; "he wanted to feel a part of something bigger than himself"; "I read a portion of the manuscript"; "the smaller component is hard to reach")
      => relation -- (an abstraction belonging to or characteristic of two entities or parts together)
        => abstraction -- (a general concept formed by extracting common features from specific examples)

UMLS (Unified Medical Language System) http://umlsks.nlm.nih.gov/

• National Library of Medicine (NLM) database of medical terminology. Terms from several medical databases (MEDLINE, SNOMED International, MeSH, etc.) are unified so that different terms are identified as the same medical concept.

• Metathesaurus provides the concordance of medical concepts: 730,000 concepts, 1.5 million concept names in different source vocabularies

• Specialist Lexicon provides word synonyms, derivations, lexical variants, and grammatical forms of words used in Metathesaurus terms: 130,000 entries.

• Semantic Network codifies the relationships (e.g. causality, "is a", etc.) among medical terms: 134 semantic types, 54 relationships.

• Used for: patient data creation, curriculum analysis, natural language processing, and information retrieval

[Figure: the UMLS Metathesaurus placed between a DB and an Information System, and between Information System 1 and Information System 2]
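The unifying role of a metathesaurus, where different terms are identified as the same concept, can be sketched as a lookup table from names to concept identifiers. The identifiers and term lists below are hypothetical and are not taken from UMLS; `CONCEPT_NAMES`, `TERM_TO_CONCEPT` and `same_concept` are names invented for this illustration.

```python
# Hypothetical concept identifiers and term lists, for illustration only;
# they are not taken from the UMLS Metathesaurus.
CONCEPT_NAMES = {
    "CONCEPT-1": ["myocardial infarction", "heart attack"],
    "CONCEPT-2": ["hypertension", "high blood pressure"],
}

# Invert the table so that any source term leads to its concept.
TERM_TO_CONCEPT = {
    name: concept for concept, names in CONCEPT_NAMES.items() for name in names
}

def same_concept(term_a, term_b):
    """True if two terms, possibly from different vocabularies, share a concept."""
    a = TERM_TO_CONCEPT.get(term_a.lower())
    b = TERM_TO_CONCEPT.get(term_b.lower())
    return a is not None and a == b

print(same_concept("Heart attack", "myocardial infarction"))  # True
print(same_concept("heart attack", "hypertension"))           # False
```

Two information systems that each use their own term can then exchange data by going through the shared concept identifier.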

Formal Ontologies

Frames, SDM, OO models

• Frames
– Rich set of language constructs: frames, slots, facets, defaults
– Impose restrictive constraints on how they are combined or used to define a class
– All frames asserted into taxonomy by hand
– All concepts are primitive
– Octet/GKB, Protégé, OCML, Ontolingua
– OKBC – Open Knowledge Base Connectivity
– OKBC-Lite
• OO / Semantic Data Models (EER, UML)
– Taxonomy/inheritance – semantics
• Intuitive, lots of tools, widely used

Frame Data Model

• Frames
– Classes: genes, reactions
– Instances: lr10
• Relationships
– Slots: chromosome, map-position, citations, reactants, products, Keq
– Facets: chromosome is single-valued, instance of class chromosomes; citations is multiple-valued, set of strings
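The frame data model can be sketched with two small record types: a facet constrains one slot, and a frame holds either facets (a class) or slot values (an instance). This is an illustrative simplification: the chromosome slot is typed as a string here rather than as an instance of a chromosomes class, and the slot values are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Facet:
    """Constraints on one slot: the type of its values and its cardinality."""
    value_type: type
    single_valued: bool

@dataclass
class Frame:
    """A class frame (with slot facets) or an instance frame (with slot values)."""
    name: str
    facets: dict = field(default_factory=dict)
    slots: dict = field(default_factory=dict)

def violations(instance, cls):
    """List the ways an instance frame breaks the facets of its class frame."""
    found = []
    for slot, values in instance.slots.items():
        facet = cls.facets.get(slot)
        if facet is None:
            found.append(f"{slot}: not a slot of {cls.name}")
            continue
        if facet.single_valued and len(values) != 1:
            found.append(f"{slot}: must be single-valued")
        if not all(isinstance(v, facet.value_type) for v in values):
            found.append(f"{slot}: values must be {facet.value_type.__name__}")
    return found

genes = Frame("genes", facets={
    "chromosome": Facet(value_type=str, single_valued=True),
    "citations": Facet(value_type=str, single_valued=False),
})
# Slot values below are made up for illustration.
lr10 = Frame("lr10", slots={"chromosome": ["chr-1"], "citations": ["ref-a", "ref-b"]})
bad = Frame("bad", slots={"chromosome": ["chr-1", "chr-2"]})

print(violations(lr10, genes))
print(violations(bad, genes))
```

The instance `lr10` satisfies both facets, while `bad` is flagged because chromosome is single-valued.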

Description Logics

• A family of logic based knowledge representation formalisms
– Descendants of semantic networks and KL-ONE
– Describe domain in terms of concepts (set of individuals), roles (relationships) and individuals
• Distinguished by:
– Formal semantics (typically model theoretic)
  • Decidable fragments of FOL
  • Closely related to propositional modal & dynamic logics
– Provision of inference services
  • Sound and complete decision procedures for key problems
  • Implemented systems (highly optimised)

Description Logic Family

• DLs are a family of logic based KR formalisms
• Particular languages mainly characterised by:
– Set of constructors for building complex concepts and roles from simpler ones
– Set of axioms for asserting facts about concepts, roles and individuals
• ALC is the smallest DL that is propositionally closed
– Constructors include booleans (and, or, not), and
– Restrictions on role successors
– E.g., the concept describing “happy fathers” could be written:

$\mathit{Man} \sqcap \exists \mathit{hasChild}.\mathit{Female} \sqcap \exists \mathit{hasChild}.\mathit{Male} \sqcap \forall \mathit{hasChild}.(\mathit{Rich} \sqcup \mathit{Happy})$
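Under model-theoretic semantics a concept denotes a set of individuals, so the “happy fathers” concept can be evaluated with ordinary set operations. The sketch assumes the concept reads as: a Man with some Female child and some Male child, all of whose children are Rich or Happy. The individuals and facts are made up, and the helper names `exists` and `forall` are illustrative.

```python
# A tiny interpretation: concept extensions and the hasChild role.
# All individuals and facts are made up for illustration.
CONCEPTS = {
    "Man": {"adam", "bob"},
    "Female": {"eve", "zoe"},
    "Male": {"max", "tim"},
    "Rich": {"eve"},
    "Happy": {"max", "zoe"},
}
HAS_CHILD = {("adam", "eve"), ("adam", "max"), ("bob", "zoe"), ("bob", "tim")}

def successors(x):
    """The hasChild successors (children) of individual x."""
    return {y for (a, y) in HAS_CHILD if a == x}

def exists(concept_ext):
    """Individuals with at least one hasChild successor in concept_ext."""
    return {x for (x, y) in HAS_CHILD if y in concept_ext}

def forall(concept_ext, domain):
    """Individuals all of whose hasChild successors are in concept_ext."""
    return {x for x in domain if successors(x) <= concept_ext}

domain = set().union(*CONCEPTS.values())
happy_fathers = (
    CONCEPTS["Man"]
    & exists(CONCEPTS["Female"])
    & exists(CONCEPTS["Male"])
    & forall(CONCEPTS["Rich"] | CONCEPTS["Happy"], domain)
)
print(happy_fathers)  # {'adam'}
```

Here bob is excluded because his child tim is neither Rich nor Happy. Note that the universal restriction is trivially true for individuals without children, which is why it is intersected with the two existential restrictions.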

DL Concept and Role Constructors

• Range of other constructors found in DLs, including:
– Number restrictions (cardinality constraints) on roles, e.g., $\geq 3\,\mathit{hasChild}$, $\leq 1\,\mathit{hasMother}$
– Qualified number restrictions, e.g., $\geq 2\,\mathit{hasChild}.\mathit{Female}$, $\leq 1\,\mathit{hasParent}.\mathit{Male}$
– Nominals (singleton concepts), e.g., $\{\mathit{Italy}\}$
– Concrete domains (datatypes), e.g., $\mathit{hasAge}.({\geq}21)$, $\mathit{earns}\ \mathit{spends}.{<}$
– Inverse roles, e.g., $\mathit{hasChild}^{-}$ (hasParent)
– Transitive roles, e.g., $\mathit{hasChild}^{*}$ (descendant)
– Role composition, e.g., $\mathit{hasParent} \circ \mathit{hasBrother}$ (uncle)

What’s in a “Logic based ontology”?

• Primitive concepts - in a hierarchy
– Described but not defined
• Properties - relations between concepts, also in a hierarchy
• Constructors – on concepts and properties
– “Some”, “only”, “at least”, “at most”, and, or, not
• Defined concepts
– Made from primitive concepts, constructors and descriptors
– Enzyme = protein and catalyses reaction
– Reason that enzyme is a kind of protein
• “Is-kind-of” = “implies”
– “Dog is a kind of wolf” means “all dogs are wolves”
• Axioms
– Disjointness, further description of defined concepts
• A reasoner
– To organise it for you. Consistency and taxonomy for defined concepts are established through logical reasoning
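The step from a definition to a computed taxonomy can be illustrated with a toy structural subsumption check for conjunction-only definitions. This is far weaker than a real DL reasoner and is only a sketch under stated assumptions: Enzyme is read as "Protein and catalyses some Reaction", the primitive parent Molecule is invented to show transitivity, and restrictions are compared as opaque conjuncts.

```python
# Each defined concept is a conjunction of conjuncts. A conjunct is either a
# primitive concept name or a ("some", role, filler) restriction. The names
# below are assumptions based on the enzyme example.
DEFINITIONS = {
    "Enzyme": {"Protein", ("some", "catalyses", "Reaction")},
}
# Primitive concepts: described by told parents, but not defined.
# "Molecule" is an invented parent, added only to show transitivity.
PRIMITIVE_PARENTS = {"Protein": {"Molecule"}, "Molecule": set(), "Reaction": set()}

def expand(name):
    """Unfold a concept name into the full set of conjuncts it implies."""
    conjuncts = {name}
    for part in DEFINITIONS.get(name, ()):
        conjuncts |= expand(part) if isinstance(part, str) else {part}
    for parent in PRIMITIVE_PARENTS.get(name, ()):
        conjuncts |= expand(parent)
    return conjuncts

def required(name):
    """Conjuncts something must satisfy to count as an instance of `name`."""
    if name not in DEFINITIONS:
        return {name}
    out = set()
    for part in DEFINITIONS[name]:
        out |= required(part) if isinstance(part, str) else {part}
    return out

def subsumes(general, specific):
    """True if every conjunct required by `general` is implied by `specific`."""
    return required(general) <= expand(specific)

print(subsumes("Protein", "Enzyme"))   # True: an enzyme is a kind of protein
print(subsumes("Molecule", "Enzyme"))  # True, via Protein
print(subsumes("Enzyme", "Protein"))   # False: not every protein catalyses
```

Nobody asserted that Enzyme sits under Protein in the hierarchy; the placement follows from the definition, which is the job the slide assigns to the reasoner.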

Reasoning support in DL

• Consistency: check if knowledge is meaningful
• Subsumption: structure knowledge, compute taxonomy
• Equivalence: check if two classes denote same set of instances
• Instantiation: check if individual i is an instance of class C
• Retrieval: retrieve set of individuals that instantiate C

Problems all reducible to consistency (satisfiability). Reasoners: FaCT, RACER, Cerebra
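The reductions can be stated compactly. The following holds for DLs that include concept negation, such as ALC; the symbols $C$, $D$ and $i$ denote arbitrary concepts and an individual and are notation chosen for this note.

```latex
\begin{align*}
C \sqsubseteq D &\iff C \sqcap \neg D \text{ is unsatisfiable} \\
C \equiv D &\iff C \sqsubseteq D \text{ and } D \sqsubseteq C \\
i \text{ is an instance of } C &\iff \text{the knowledge base extended with } \neg C(i) \text{ is inconsistent}
\end{align*}
```

Retrieval then amounts to running the instantiation check for each known individual, so a single satisfiability procedure is enough to provide all five services.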

Practical Session

Formal Ontology Applications

• Ontology engineering support
• Semantic web
– Intelligent information retrieval
– E-Commerce
– Intelligent web-services
• Agent technologies

Problems with Information Retrieval

• Working with the Web is currently done at a very low level:
– Clicking on links and using keyword search for links is the main (if not only) navigation technique
• Keyword-based search engines
– (Alta Vista, Infoseek, Yahoo, MetaCrawler, Google)

Problems with Information Retrieval

• Main burden of information retrieval is that it is only information retrieval.
– It helps to retrieve information sources but the human user has to manually extract and interpret the information.
– Information presentation and maintenance is not supported.

Semantic Web Vision

• Express explicitly a high level description of resources accessible via the Web
• More processable data available
• Information more directly available
• Enabling intelligent Web features

DAML-S: Ontology language

• Builds upon the well-defined semantics of DAML+OIL

• Is expected to provide a common understanding of the semantics of a web service

• By specifying an “Upper Ontology for Services”

An Upper Ontology for Services

• Three essential types of knowledge about a service, each characterized by the question it answers:

– What does the service require of the user(s), and provide for them?
– How does it work?
– How is it used?

Backup Slides

Ontology for data interoperability

• Ontology-based Information Integration (TAMBIS)
• Spread a query over different and heterogeneous data sources
• Widely used in gene ontology applications, but not only there…

[Figure: a global ontology placed over three DBs]
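Spreading one query over heterogeneous sources can be sketched as a mediator that rewrites global-ontology terms into each source's local field names. This is an illustration of the general idea only and not a description of how TAMBIS works; the sources, field names, data and the function `query` are all made up.

```python
# Two heterogeneous sources with their own field names (all data made up).
SOURCE_A = [{"gene_symbol": "g1", "organism": "yeast"}]
SOURCE_B = [{"name": "g2", "species": "mouse"}, {"name": "g3", "species": "yeast"}]

# Mapping from global-ontology terms to each source's local field names.
MAPPINGS = [
    (SOURCE_A, {"gene": "gene_symbol", "species": "organism"}),
    (SOURCE_B, {"gene": "name", "species": "species"}),
]

def query(term, where_term, where_value):
    """Answer one global query by rewriting it for every source."""
    answers = []
    for rows, mapping in MAPPINGS:
        select_field = mapping[term]
        filter_field = mapping[where_term]
        answers += [r[select_field] for r in rows if r[filter_field] == where_value]
    return answers

print(query("gene", "species", "yeast"))  # ['g1', 'g3']
```

The caller only ever uses the global terms `gene` and `species`; adding a third source means adding one more mapping entry rather than changing the query.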

Thesauri & Classification

• UNSPSC: United Nations Standard Products and Services Code
• Provides structure and a unique identification of terms
• Thesauri act as a good starting point for developing an ontology