An Overview of Ontologies and their Practical Applications
Gianluca Correndo
http://www.di.unito.it/~correndo
Ontology
• Semantics – the meaning of meaning.
• Philosophical discipline, the branch of philosophy that deals with the nature and the organisation of reality.
In Computer Science …
• An ontology is an explicit specification of a conceptualization [Gruber]
• Defines
– A common vocabulary of terms
– Some specification of the meaning of the terms
– A shared understanding for people and machines
Why develop an ontology?
• To make domain assumptions explicit
– Easier to change domain assumptions
– Easier to understand and update legacy data
• To separate domain knowledge from operational knowledge
– Re-use domain and operational knowledge separately
• A community reference for applications (standards)
• To share a consistent understanding of what information means
Communication
• Syntax is not enough for machine communication, e.g. B2B
Bestellinformation: <Auto><Name>Daimler 500 SLK </Name><Preis>27.000 </Preis></Auto>
Order information: <Product><type>Car</type><Name>Daimler 500 SLK </Name><Price>23.000 $</Price></Product>
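A minimal sketch of why well-formed syntax alone is not enough, assuming a hypothetical hand-written mapping (`SHARED_VOCABULARY`, not part of the original slides) from each message's tag names to common terms. Both messages parse without error, but only the mapping tells a program that `Preis` and `Price` mean the same thing.

```python
import xml.etree.ElementTree as ET

GERMAN = "<Auto><Name>Daimler 500 SLK</Name><Preis>27.000</Preis></Auto>"
ENGLISH = ("<Product><type>Car</type><Name>Daimler 500 SLK</Name>"
           "<Price>23.000 $</Price></Product>")

# Hypothetical shared vocabulary: local tag name -> common term.
SHARED_VOCABULARY = {
    "Name": "name", "Preis": "price", "Price": "price", "type": "type",
}

def child_tags(xml_text):
    """Tag names of the direct children of the root element."""
    return {child.tag for child in ET.fromstring(xml_text)}

def to_shared_terms(xml_text):
    """Parse a message and re-key its child elements by the shared terms."""
    root = ET.fromstring(xml_text)
    return {SHARED_VOCABULARY[child.tag]: child.text.strip() for child in root}

german = to_shared_terms(GERMAN)
english = to_shared_terms(ENGLISH)

# Comparing raw tags finds only "Name"; comparing shared terms also finds the price.
raw_overlap = child_tags(GERMAN) & child_tags(ENGLISH)
shared_overlap = german.keys() & english.keys()
```

Even with the mapping, the two price strings still differ in currency and number format, so agreeing on terms is only the first step towards agreeing on meaning.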
A Specification of a Conceptualization
• Concepts (class, set, type, predicate)
– Event, gene, molecule, cat
• Properties of concepts and relationships between them (slot)
– Taxonomy: generalisation ordering among concepts: isA, partOf, subProcess
– Relationship, role or attribute: functionOf, hasActivity, location, eats, size
[Figure: example taxonomy with the concepts animal, rodent, cat, cow, dog, and mouse, the groupings domestic and vermin, and an "eats" relationship]
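The generalisation ordering can be sketched as a small graph walk. The isA edges below are an assumed reading of the example taxonomy (the exact edges are not spelled out in the text), and the function names are illustrative.

```python
# Assumed isA edges (child -> direct parents) for the example concepts.
IS_A = {
    "rodent": ["animal"],
    "cat": ["animal"],
    "cow": ["animal"],
    "dog": ["animal"],
    "mouse": ["rodent"],
}

def ancestors(concept, edges=IS_A):
    """Return every concept reachable by following isA edges upwards."""
    found = set()
    stack = list(edges.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(edges.get(parent, []))
    return found

def is_kind_of(sub, sup, edges=IS_A):
    """isA is reflexive and transitive: mouse isA rodent isA animal."""
    return sub == sup or sup in ancestors(sub, edges)
```

For example, `is_kind_of("mouse", "animal")` holds through the intermediate concept `rodent`, while `is_kind_of("cat", "rodent")` does not.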
What is a concept?
Different communities have different notions of what a concept means:
– Formal concept analysis talks about formal concepts
– Description Logics talk about concept labels
– ISO-704:2000 – Terminology Work
– Often the classical notion of a frame in AI or a class in OO modeling is seen as equivalent to a concept.
An explicit description of a domain
• Constraints or axioms on properties and concepts:
– value: integer
– domain: cat
– cardinality: at most 1
– range: 0 <= X <= 100
– oligonucleotides < 20 base pairs
– cows are larger than dogs
– cats cannot eat only vegetation
– cats and dogs are disjoint
• Values or concrete domains
– integer, strings
– 20, tryptophan
[Figure: example taxonomy with the concepts animal, rodent, cat, cow, dog, and mouse, the groupings domestic and vermin, and an "eats" relationship]
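Constraints of this kind can be checked mechanically. A minimal sketch, assuming a hypothetical `PropertyConstraint` record and a `size` property on cats; the class, function, and property names are illustrative and not taken from any particular ontology tool.

```python
from dataclasses import dataclass

@dataclass
class PropertyConstraint:
    """Axioms attached to one property (all names here are illustrative)."""
    domain: str           # concept the property applies to
    value_type: type      # e.g. int for "value: integer"
    max_cardinality: int  # "cardinality: at most N"
    min_value: int        # lower bound of the range
    max_value: int        # upper bound of the range

# Disjointness axioms: no individual may belong to both concepts of a pair.
DISJOINT = [{"cat", "dog"}]

def check_values(constraint, values):
    """Return the kinds of violation found in one individual's property values."""
    problems = []
    if len(values) > constraint.max_cardinality:
        problems.append("cardinality")
    for v in values:
        if not isinstance(v, constraint.value_type):
            problems.append("type")
        elif not constraint.min_value <= v <= constraint.max_value:
            problems.append("range")
    return problems

def check_disjointness(concepts_of_individual):
    """True if the individual's concepts respect every disjointness axiom."""
    concepts = set(concepts_of_individual)
    return not any(pair <= concepts for pair in DISJOINT)

size = PropertyConstraint(domain="cat", value_type=int,
                          max_cardinality=1, min_value=0, max_value=100)
```

Calling `check_values(size, [150])` reports a range violation, and `check_disjointness(["cat", "dog"])` fails because the two concepts are declared disjoint.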
An explicit description of a domain
• Individuals or Instances
– sulphur, trpA Gene, felix
• Nominals
– Concepts that cannot have instances
– Instances that are used in conceptual definitions
– ItalianDog = Dog bornIn Italy
• Instances
– An ontology = concepts + properties + axioms + values + nominals
– A knowledge base = ontology + instances
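[Figure: the example taxonomy (animal, rodent, cat, cow, dog, mouse, domestic, vermin, with an "eats" relationship) extended with the individuals mickey, felix, jerry, and tom]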
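The split between an ontology and a knowledge base can be sketched with two small classes. The typing of the four individuals (which ones are mice, which are cats) is an assumption made for illustration, and the class names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    """Concepts and their isA parents (properties and axioms omitted for brevity)."""
    parents: dict  # concept -> set of direct super-concepts

    def subsumed_by(self, concept, ancestor):
        """True if `concept` is `ancestor` or lies below it in the taxonomy."""
        if concept == ancestor:
            return True
        return any(self.subsumed_by(p, ancestor)
                   for p in self.parents.get(concept, ()))

@dataclass
class KnowledgeBase:
    """A knowledge base = ontology + instances."""
    ontology: Ontology
    instance_of: dict = field(default_factory=dict)  # individual -> asserted concept

    def retrieve(self, concept):
        """All individuals whose asserted concept is a kind of `concept`."""
        return {ind for ind, c in self.instance_of.items()
                if self.ontology.subsumed_by(c, concept)}

# Assumed typing of the individuals; the concepts come from the example taxonomy.
kb = KnowledgeBase(
    Ontology({"rodent": {"animal"}, "mouse": {"rodent"}, "cat": {"animal"}}),
    {"mickey": "mouse", "jerry": "mouse", "felix": "cat", "tom": "cat"},
)
```

Asking `kb.retrieve("rodent")` returns the mice even though they were only asserted to be instances of `mouse`, because the answer is computed through the ontology.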
Light and Heavy expressivity
• Lightweight
– Concepts, atomic types
– Is-a hierarchy
– Relationships between concepts
• Heavyweight
– Metaclasses
– Type constraints on relations
– Cardinality constraints
– Taxonomy of relations
– Reified statements
– Axioms
– Semantic entailments
– Expressiveness
– Inference systems
A matter of rigour and representational expressivity
Kingdom Animalia
Phylum Chordata
Class Mammalia
Order Primates
Family Hominidae
Genus Homo
Species sapiens
Carl von Linné (1707-1778)
Aristotle (384 BC – 322 BC)
• Science of Being (Metaphysics, IV,1)
• What is being?
• What are the features common to all beings?
So what is an ontology?
Catalog/ID
Thesauri
Terms/glossary
Informal is-a
Formal is-a
Formal instance
Frames (properties)
General logical constraints
Value restrictions
Disjointness, inverse, part-of
…
Things in Common
• They are approaches to help structure, classify, model, and/or represent the concepts and relationships pertaining to some subject matter of interest to some community.
• They are intended to enable a community to come to agreement and to commit to use the same terms in the same way.
• The meaning of the terms is specified in some way and to some degree.
Thesauri
Example: [Figure: thesaurus graph over the terms Fruit, Vegetable, Orange, and Apfelsine (German), with edges labelled similarTo, synonymWith, and NarrowerTerm]
- Graph with labelled edges (similar, nt, bt, synonym)
- Fixed set of edge labels (aka relations)
- Use of lexical stem
- No instances
- Well known in library science
- cf. terminologies / classifications (Dewey)
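A thesaurus of this kind can be sketched as a graph whose edges carry one of a fixed set of labels. The `Thesaurus` class is hypothetical, and the four edges are an assumed reading of the fruit example.

```python
from collections import defaultdict

# Fixed set of edge labels (nt = narrower term, bt = broader term).
EDGE_LABELS = {"similar", "nt", "bt", "synonym"}

class Thesaurus:
    """Terms connected by labelled edges; there are no instances."""

    def __init__(self):
        self.edges = defaultdict(set)  # (term, label) -> set of related terms

    def add(self, source, label, target):
        """Add one labelled edge, rejecting labels outside the fixed set."""
        if label not in EDGE_LABELS:
            raise ValueError(f"unknown edge label: {label}")
        self.edges[(source, label)].add(target)

    def related(self, term, label):
        """Terms reachable from `term` over one edge with the given label."""
        return set(self.edges.get((term, label), set()))

# Assumed edges for the fruit example.
thesaurus = Thesaurus()
thesaurus.add("Fruit", "nt", "Orange")
thesaurus.add("Orange", "bt", "Fruit")
thesaurus.add("Orange", "synonym", "Apfelsine")
thesaurus.add("Fruit", "similar", "Vegetable")
```

Rejecting unknown labels in `add` is what separates a thesaurus from a free-form graph: the relations are fixed in advance, and only the terms vary.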
WordNet
news item IS A KIND OF ...
1 sense of news item

Sense 1
news item -- (an item in a newspaper)
  => item, point -- (a distinct part that can be specified separately in a group of things that could be enumerated on a list; "he noticed an item in the New York Times"; "she had several items on her shopping list"; "the main point on the agenda was taken up first")
    => part, portion, component part, component -- (something determined in relation to something that includes it; "he wanted to feel a part of something bigger than himself"; "I read a portion of the manuscript"; "the smaller component is hard to reach")
      => relation -- (an abstraction belonging to or characteristic of two entities or parts together)
        => abstraction -- (a general concept formed by extracting common features from specific examples)
UMLS (Unified Medical Language System) http://umlsks.nlm.nih.gov/
• National Library of Medicine (NLM) database of medical terminology. Terms from several medical databases (MEDLINE, SNOMED International, MeSH, etc.) are unified so that different terms are identified as the same medical concept.
• Metathesaurus provides the concordance of medical concepts: 730,000 concepts, 1.5 million concept names in different source vocabularies
• Specialist Lexicon provides word synonyms, derivations, lexical variants, and grammatical forms of words used in Metathesaurus terms: 130,000 entries.
• Semantic Network codifies the relationships (e.g. causality, "is a", etc.) among medical terms: 134 semantic types, 54 relationships.
• Used for: patient data creation, curriculum analysis, natural language processing, and information retrieval
Frames, SDM, OO models
• Frames
– Rich set of language constructs: frames, slots, facets, defaults
– Impose restrictive constraints on how they are combined or used to define a class
– All frames asserted into taxonomy by hand
– All concepts are primitive
– Octet/GKB, Protégé, OCML, Ontolingua
– OKBC – Open Knowledge Base Connectivity
– OKBC – Lite
• OO / Semantic Data Models (EER, UML)
– Taxonomy/inheritance – semantics
• Intuitive, lots of tools, widely used
Frame Data Model
• Frames
– Classes: genes, reactions
– Instances: lr10
• Relationships
– Slots: chromosome, map-position, citations, reactants, products, Keq
– Facets: chromosome is single-valued, instance of class chromosomes; citations is multiple-valued, set of strings
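A minimal sketch of slots and facets, assuming hypothetical `Facet` and `Frame` classes. The slot names come from the list above; the slot values are made up, and the chromosome slot is simplified to a string rather than an instance of a chromosomes class.

```python
from dataclasses import dataclass, field

@dataclass
class Facet:
    """Constraints on one slot: its value type and whether it is multi-valued."""
    value_type: type
    multi_valued: bool = False

@dataclass
class Frame:
    name: str
    facets: dict                               # slot name -> Facet
    slots: dict = field(default_factory=dict)  # slot name -> value(s)

    def put(self, slot, value):
        """Store a value, enforcing the facets declared for the slot."""
        facet = self.facets[slot]  # KeyError for an undeclared slot
        if not isinstance(value, facet.value_type):
            raise TypeError(f"{slot} expects {facet.value_type.__name__}")
        if facet.multi_valued:
            self.slots.setdefault(slot, set()).add(value)
        else:
            self.slots[slot] = value  # single-valued: overwrite

# Slot names from the slide; values are placeholders for illustration only.
GENE_FACETS = {
    "chromosome": Facet(str),
    "citations": Facet(str, multi_valued=True),
}
gene = Frame("lr10", GENE_FACETS)
gene.put("chromosome", "chromosome-1")
gene.put("citations", "ref-a")
gene.put("citations", "ref-b")
```

A second `put` on `chromosome` replaces the earlier value, while `citations` accumulates a set of strings, mirroring the single-valued and multiple-valued facets.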
Description Logics
• A family of logic based knowledge representation formalisms
– Descendants of semantic networks and KL-ONE
– Describe domain in terms of concepts (sets of individuals), roles (relationships) and individuals
• Distinguished by:
– Formal semantics (typically model theoretic)
  • Decidable fragments of FOL
  • Closely related to propositional modal & dynamic logics
– Provision of inference services
  • Sound and complete decision procedures for key problems
  • Implemented systems (highly optimised)
Description Logic Family
• DLs are a family of logic based KR formalisms
• Particular languages mainly characterised by:
– Set of constructors for building complex concepts and roles from simpler ones
– Set of axioms for asserting facts about concepts, roles and individuals
• ALC is the smallest DL that is propositionally closed
– Constructors include booleans (and, or, not), and
– Restrictions on role successors
– E.g., the concept describing “happy fathers” could be written:
Man ⊓ ∃hasChild.Female ⊓ ∃hasChild.Male ⊓ ∀hasChild.(Rich ⊔ Happy)
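The model-theoretic reading of such a concept can be sketched by evaluating it over a small finite interpretation. This reads "happy fathers" as: a man with at least one female child, at least one male child, and only children who are rich or happy. The individuals and their properties are invented for illustration.

```python
# A finite interpretation: a domain, concept extensions, and one role.
DOMAIN = {"bob", "ann", "tim", "joe", "sue"}
CONCEPTS = {
    "Man": {"bob", "tim", "joe"},
    "Female": {"ann", "sue"},
    "Male": {"bob", "tim", "joe"},
    "Rich": {"ann"},
    "Happy": {"tim"},
}
HAS_CHILD = {("bob", "ann"), ("bob", "tim"), ("joe", "sue"), ("joe", "tim")}

def successors(x, role):
    """Individuals reachable from x over the role."""
    return {b for (a, b) in role if a == x}

def some(role, extension):
    """Existential restriction: at least one role successor is in the extension."""
    return {x for x in DOMAIN if successors(x, role) & extension}

def only(role, extension):
    """Universal restriction: every role successor is in the extension."""
    return {x for x in DOMAIN if successors(x, role) <= extension}

# Conjunction is set intersection, disjunction is set union.
happy_father = (
    CONCEPTS["Man"]
    & some(HAS_CHILD, CONCEPTS["Female"])
    & some(HAS_CHILD, CONCEPTS["Male"])
    & only(HAS_CHILD, CONCEPTS["Rich"] | CONCEPTS["Happy"])
)
```

Here `joe` is excluded because his child `sue` is neither rich nor happy, and `tim` is excluded because he has no children at all, even though the universal restriction holds vacuously for him.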
DL Concept and Role Constructors
• Range of other constructors found in DLs, including:
– Number restrictions (cardinality constraints) on roles, e.g., ≥3 hasChild, ≤1 hasMother
– Qualified number restrictions, e.g., ≥2 hasChild.Female, ≤1 hasParent.Male
– Nominals (singleton concepts), e.g., {Italy}
– Concrete domains (datatypes), e.g., hasAge.(≥21), earns spends.<
– Inverse roles, e.g., hasChild⁻ (hasParent)
– Transitive roles, e.g., hasChild* (descendant)
– Role composition, e.g., hasParent ∘ hasBrother (uncle)
What’s in a “Logic based ontology”?
• Primitive concepts - in a hierarchy
– Described but not defined
• Properties - relations between concepts, also in a hierarchy
• Constructors – on concepts and properties
– “Some”, “only”, “at least”, “at most”, and, or, not
• Defined concepts
– Made from primitive concepts, constructors and descriptors
– Enzyme = protein and catalyses reaction
– Reason that enzyme is a kind of protein
• “Is-kind-of” = “implies”
– “Dog is a kind of wolf” means “all dogs are wolves”
• Axioms
– Disjointness, further description of defined concepts
• A Reasoner
– To organise it for you. Consistency & taxonomy for defined concepts established through logical reasoning
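How a reasoner can conclude that an enzyme is a kind of protein can be sketched with a heavily simplified structural check. It assumes definitions are plain conjunctions without cycles and treats the restriction "catalyses some Reaction" as an opaque label. Real DL reasoners handle far richer constructors; all names here are illustrative.

```python
# Defined concepts unfold into a conjunction of parts; anything not listed
# here is primitive (described but not defined).
DEFINITIONS = {
    "Enzyme": {"Protein", "catalyses some Reaction"},
}

def expand(concept):
    """Unfold a concept into the set of primitive conjuncts it implies."""
    if concept not in DEFINITIONS:
        return {concept}
    result = set()
    for part in DEFINITIONS[concept]:
        result |= expand(part)
    return result

def is_kind_of(sub, sup):
    """sub implies sup if every conjunct of sup is also a conjunct of sub."""
    return expand(sup) <= expand(sub)
```

With this definition, `is_kind_of("Enzyme", "Protein")` holds without anyone having asserted it, whereas the converse does not, since a protein need not catalyse a reaction.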
Reasoning support in DL
• Consistency: check if knowledge is meaningful
• Subsumption: structure knowledge, compute taxonomy
• Equivalence: check if two classes denote the same set of instances
• Instantiation: check if individual i is an instance of class C
• Retrieval: retrieve the set of individuals that instantiate C
All of these problems are reducible to consistency (satisfiability). Reasoners: FaCT, RACER, Cerebra
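The reductions can be written out as follows. This is the standard formulation for a logic with full negation such as ALC; the symbols (a knowledge base K, concepts C and D, an individual a) are notation introduced here, not taken from the slides.

```latex
\begin{align*}
C \sqsubseteq D &\iff C \sqcap \lnot D \text{ is unsatisfiable} \\
C \equiv D &\iff C \sqsubseteq D \text{ and } D \sqsubseteq C \\
\mathcal{K} \models a : C &\iff \mathcal{K} \cup \{\, a : \lnot C \,\} \text{ is inconsistent} \\
\mathrm{retrieve}(C) &= \{\, a \mid \mathcal{K} \models a : C \,\}
\end{align*}
```

Subsumption and instantiation each become one satisfiability test, equivalence is two subsumption tests, and retrieval repeats the instantiation test for every named individual.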
Formal Ontology Applications
• Ontology engineering support
• Semantic web
– Intelligent information retrieval
– E-Commerce
– Intelligent web-services
• Agent technologies
Problems with Information Retrieval
• Working with the Web is currently done at a very low level:
– Clicking on links and using keyword search for links is the main (if not only) navigation technique
• Keyword-based search engines
– (AltaVista, Infoseek, Yahoo, MetaCrawler, Google)
Problems with Information Retrieval
• Main burden of information retrieval is that it is only information retrieval.
– It helps to retrieve information sources but the human user has to manually extract and interpret the information.
– Information presentation and maintenance is not supported.
Semantic Web Vision
• Express explicitly a high level description of resources accessible via Web
• More processable data available
• Information more directly available
• Enabling intelligent Web features
DAML-S: Ontology language
• Builds upon the well-defined semantics of DAML+OIL
• Is expected to provide a common understanding of the semantics of a web service
• By specifying an "Upper Ontology for Services"
An Upper Ontology for Services
• Three essential types of knowledge about a service, each characterized by the question it answers:
– What does the service require of the user(s), and provide for them?
– How does it work?
– How is it used?
Ontology for data interoperability
• Ontology-based Information Integration (TAMBIS)
• Spread a query over different and heterogeneous data sources
• Widely used in gene ontology applications, among others
[Figure: a global ontology placed over three databases (DB, DB, DB)]
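Spreading one query over heterogeneous sources can be sketched with per-source mappings from global ontology terms to local field names. This is not how TAMBIS itself is implemented. The sources, field names, terms, and records are all hypothetical placeholders.

```python
# Two hypothetical sources with different schemas for the same kind of record.
SOURCE_A = [{"gene_name": "gene-1", "org": "organism-x"}]
SOURCE_B = [{"symbol": "gene-2", "species": "organism-y"}]
SOURCES = {"A": SOURCE_A, "B": SOURCE_B}

# Global ontology term -> field name used by each source (illustrative mapping).
MAPPINGS = {
    "A": {"Gene.name": "gene_name", "Gene.organism": "org"},
    "B": {"Gene.name": "symbol", "Gene.organism": "species"},
}

def query(global_terms):
    """Spread one query in global terms over every source and merge the rows."""
    rows = []
    for source_id, records in SOURCES.items():
        mapping = MAPPINGS[source_id]
        for record in records:
            # Translate each global term into the source's own field name.
            rows.append({term: record[mapping[term]] for term in global_terms})
    return rows

result = query(["Gene.name", "Gene.organism"])
```

The caller only ever sees the global terms, so a new database can be added by supplying one more mapping rather than by changing the queries.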