InfoSys 2001, Part III: Ontologies
VUB 2sem2001
New tools for IS semantics
Robert Meersman
VUB STARLab, Vrije Universiteit Brussel
Brussels, Belgium
2001 ©RobertMeersman
Overview
• The semantics of an Information System
• Using ontology as formal semantics
• Using and building ontologies: examples
• Ontology models and formalisms
• Tools, methods, and the DOGMA Project
Semantics in Classical Information Systems
[Figure: diagram with the labels “World” (Conceptualization); agreement among designer, domain expert, and user; Upper CASE tool; Conceptual Schema; Lower CASE tool (generator); Database Schema; Information System comprising DB, DBMS, and Apps; interpretation.]
Declarative (Tarski) Semantics
• Meaning = (mathematical) mapping of a representation (e.g. a description in a first-order language) to an agreed conceptualization of the “real world”
• Meaning, in practice, cannot be absolute:
  – requires agreement among all involved cognitive agents
  – about everything, in past, present and future...
    • on all observations, facts, events, ...
    • on all rules in force for a particular application...
    • believed/enforced by large communities...
• May imply levels of trust, authority
  – how to reach agreement among agents
  – “group logic” and method may be necessary
Ontology = (Specification of a) Conceptualization
Idea: what if we replace the range of the semantics interpretation mapping, by an ontology base: a well-organized “database” of (nothing but!) simple concept-to-concept relationships...
• An ontology in this way is a purely mathematical (“syntax-less”) object…
T. Gruber [1993]
Using Ontology for Semantics
[Figure: diagram with the labels “World”; ONTOLOGY; interpretation; agreement among designer, domain expert, and user; Conceptual Schema; Any Design Tool; Implementation; Data; Information System (including the WWW).]
Ontobase Conceptualization?
• Extensional, huge, elementary, no rules
• Supply of possible, plausible ground facts
• Organized by domain, context, application
• Where to find such databases of terms?
• Authoritative source describing, e.g., a chair?
Example A: WordNet (Miller et al.)
sport, athletics -- (an active diversion requiring physical exertion and competition)
  => contact sport -- (a sport that necessarily involves body contact between opposing players)
  => outdoor sport, field sport -- (a sport that is played outdoors)
  => gymnastics -- (a sport that involves exercises intended to display strength and balance and agility)
  => track and field -- (participating in athletic sports performed on a running track or on the field associated with it)
  => skiing -- (a sport in which participants must travel on skis)
  => water sport, aquatics -- (sports that involve bodies of water)
  => rowing, row -- (the act of rowing as a sport)
  => boxing, pugilism, fisticuffs -- (fighting with the fists)
  => archery -- (the sport of shooting arrows with a bow)
  => sledding -- (the sport of riding on a sled or sleigh)
  => wrestling, rassling, grappling -- (the sport of hand-to-hand struggle between unarmed contestants who try to throw…)
  => skating -- (the sport of gliding on skates)
  => racing -- (the sport of engaging in contests of speed)
  => riding, horseback riding, equitation -- (riding a horse as a sport)
  => cycling -- (the sport of traveling on a bicycle or motorcycle)
  => bloodsport -- (sport that involves killing animals (especially hunting))
  => athletic game -- (a game involving athletic activity)
      => ice hockey, hockey, hockey game -- (a game played on an ice rink by two opposing …)
      => tetherball -- (a game with two players who use rackets to strike a ball that is tethered …)
      => water polo -- (a game played in a swimming pool by two teams of swimmers …)
      => outdoor game -- (an athletic game that is played outdoors)
      => court game -- (an athletic game played on a court)
          => handball -- (a game played in a walled court or against a single wall by two …)
          => racquetball -- (a game played on a handball court with short-handled rackets)
          => fives -- ((British) a game resembling handball; played on a court with a front wall …)
          => squash, squash racquets, squash rackets -- (a game played in an enclosed court by two …)
          => volleyball, volleyball game -- (a game in which two teams hit an inflated ball over …)
          => jai alai, pelota -- (a Basque or Spanish game played in a court with a ball …)
          => badminton -- (a game played on a court with light long-handled rackets used to volley a shuttlecock over a net)
          => basketball, basketball game -- (a game played on a court by two opposing teams of 5 players; points are scored by throwing the basketball through an elevated horizontal hoop)
              => professional basketball -- (playing basketball for money)
Example B: WordNet
news item IS A KIND OF ...
1 sense of news item
Sense 1
news item -- (an item in a newspaper)
  => item, point -- (a distinct part that can be specified separately in a group of things that could be enumerated on a list; "he noticed an item in the New York Times"; "she had several items on her shopping list"; "the main point on the agenda was taken up first")
    => part, portion, component part, component -- (something determined in relation to something that includes it; "he wanted to feel a part of something bigger than himself"; "I read a portion of the manuscript"; "the smaller component is hard to reach")
      => relation -- (an abstraction belonging to or characteristic of two entities or parts together)
        => abstraction -- (a general concept formed by extracting common features from specific examples)
Problems with current lexicons
• In WordNet: clear that news_item is-a item
• Maybe acceptable that news_item is-a part
• But what of news_item is-a relation !?
  – depends on context, role played…
• But: “role” and “context” knowledge is missing
• Also: some lexicographer’s bias is present
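The chain of inferences questioned above can be reproduced mechanically. The following is a minimal sketch, assuming the is-a links are stored as a plain Python dictionary copied from the WordNet excerpt for "news item"; the names `HYPERNYM`, `ancestors`, and `is_a` are illustrative and not part of WordNet or DOGMA. Following the links transitively, without any role or context information, yields "news_item is-a relation".

```python
# Hypernym ("is-a") links copied from the WordNet excerpt for "news item".
# The dictionary layout and function names are assumptions for illustration.
HYPERNYM = {
    "news_item": "item",
    "item": "part",
    "part": "relation",
    "relation": "abstraction",
}


def ancestors(term, hypernym=HYPERNYM):
    """Return every term reachable from `term` by following is-a links."""
    chain = []
    while term in hypernym:
        term = hypernym[term]
        chain.append(term)
    return chain


def is_a(term, candidate, hypernym=HYPERNYM):
    """True if `candidate` is a direct or transitive hypernym of `term`."""
    return candidate in ancestors(term, hypernym)


if __name__ == "__main__":
    print(ancestors("news_item"))
    print(is_a("news_item", "relation"))
```

Nothing in this structure records in which context, or in which role, a news item could sensibly be called a relation, which is the gap the slide points at.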
Domain/Application Ontology
• Constructing an ontology is quite similar to data modeling, both conceptually and (in a sense) methodologically
• A mediating “layer” is required between the domain/application and the ontology; it includes constraints, business rules, derivation rules (“theories”)
• Note: no population (“model”) is being described, so no storage considerations enter into the paradigm (…but!)
Ontology in the Corporation
• Ontologies, while still largely non-existent (!) have strategic importance for organizations
• Basis for any IT for corporate knowledge management – “corporate memory” (Kühn et al ‘94)
• Ontologies must be “mined” from corporate data, hence the need for ontology mining (OM) tools!
Ontology Mining
• Web!
  – Huge but unstructured (at the moment)
  – XML, RDF, …
  – Parsing technology
• Document corpora, digital libraries, existing domain thesauri, …
  – Alignment and merging
• But: consider database schemas!
  – Controlled and formal
  – Mostly carefully designed
Example: “ontologize” databases (by mediation; part of DOGMA Project)
Empl-Contract      employment_contract
  – Empl#          employee_number of employee employed_under
  – Empl-date      date_code of date of_start_of
  – Position       position_code of position assigned_by
  – Dept-id        department_id of department assigned_by
  – Init-Salary    amount of salary at_start_of
  – Supervisor     name of supervisor assigned_by
  – Term           number_of_months of term of
• expressed in RIDL language (RM 79)
• NIAM (ORM) lexical/non-lexical distinction
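One possible reading of the column annotations above is that each is a path of alternating terms and roles leading from a lexical term (e.g. employee_number) through a non-lexical one (employee) to the table's own term (employment_contract). The sketch below makes that reading explicit. It is an assumption about how to interpret the slide, not the actual RIDL semantics or the DOGMA mediation code, and the names `ANNOTATIONS` and `path_to_lexons` are hypothetical.

```python
# Term the slide gives for the table Empl-Contract.
TABLE_TERM = "employment_contract"

# Column annotations copied from the slide.
ANNOTATIONS = {
    "Empl#": "employee_number of employee employed_under",
    "Empl-date": "date_code of date of_start_of",
    "Position": "position_code of position assigned_by",
    "Dept-id": "department_id of department assigned_by",
    "Init-Salary": "amount of salary at_start_of",
    "Supervisor": "name of supervisor assigned_by",
    "Term": "number_of_months of term of",
}


def path_to_lexons(path, target):
    """Split 'term role term role' into (head, role, tail) triples.

    Assumption: tokens alternate term/role and the last role points at
    `target`, the term that stands for the table itself.
    """
    tokens = path.split()
    if not tokens or len(tokens) % 2 != 0:
        raise ValueError(f"expected alternating term/role tokens: {path!r}")
    terms = tokens[0::2] + [target]
    roles = tokens[1::2]
    return [(terms[i], roles[i], terms[i + 1]) for i in range(len(roles))]


if __name__ == "__main__":
    for column, path in ANNOTATIONS.items():
        print(column, path_to_lexons(path, TABLE_TERM))
```

Under this reading the first term of each path is the lexical object (what is stored in the column) and the second is the non-lexical object it identifies, which matches the NIAM/ORM distinction mentioned on the slide.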
CREATE TABLE project_begroting
( project_nr char(10) NOT NULL,
  versie int NOT NULL,
  jaar int NOT NULL,
  kostensoort int NOT NULL,
  volgnr int NOT NULL,
  specificatie char(50) NULL,
  manmaanden decimal(6,2) NULL,
  bedrag int NULL,
  IWETO_code char(4) NULL,
  kostenplaats char(10) NULL,
  geviseerd_door_FA char(1) NULL,
  geviseerd_door_FA_datum datetime NULL,
  geviseerd_door_RD char(1) NULL,
  geviseerd_door_RD_datum datetime NULL,
  insert_datum datetime NULL,
  insert_gebruiker char(30) NULL,
  update_datum datetime NULL,
  update_gebruiker char(30) NULL,
  CONSTRAINT project_begroting_x PRIMARY KEY CLUSTERED
    (project_nr, versie, jaar, kostensoort, volgnr) )
Methodology issues in ontology design
• More than annotation! Rather, reverse engineering…
• Helps to separate application-, domain-, upper ontology
  – ex. ISO TR 9007 “Onion Model” (1982 & 1990)
• Should be relatively simple and teachable to design professionals
  – DOGMA: uses a variant of ORM (a.k.a. NIAM)
Ontology Languages
• Used to specify an ontology, i.e. its interpreter compiles a specification into a “physical” ontology
• Textual; so far all based on description logics
  – KIF (Ontolingua): Stanford AI lab
  – CYCL: CYC language
  – DAML: DARPA, derives from UMd SHOE
  – OIL: EC 5th Framework: OntoKnowledge
• Graphical
  – ORM? or, ORM++…?
  – Conceptual Graphs (Sowa)
Example: KIF (Ontolingua)
“Knowledge Interchange Format” (Genesereth & Fikes)
Class BINARY-RELATION
Defined in theory: Kif-relations
Source code: frame-ontology.lisp
Slots on this class:
  Documentation: A binary relation maps instances of a class to instances of another class. Its arity is 2. Binary relations are often shown as slots in frame systems.
  Subclass-Of: Relation
Slots on instances of this class:
  Arity: 2
Axioms:
(<=> (Binary-Relation ?Relation)
     (And (Relation ?Relation)
          (Not (Empty ?Relation))
          (Forall (?Tuple)
                  (=> (Member ?Tuple ?Relation) (Double ?Tuple)))))
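The axiom above can be read as a three-part membership test: the object is a relation, it is not empty, and every tuple in it is a double. A minimal sketch of that reading follows, assuming (this is not stated in the source) that a relation is modelled as a Python set of tuples; the function names are illustrative only.

```python
def is_relation(obj):
    """Assumed model: a relation is a set (or frozenset) of tuples."""
    return isinstance(obj, (set, frozenset)) and all(
        isinstance(member, tuple) for member in obj
    )


def is_double(member):
    """A double is a tuple with exactly two items."""
    return isinstance(member, tuple) and len(member) == 2


def is_binary_relation(obj):
    """Mirror of the KIF axiom: relation, non-empty, all members doubles."""
    return (
        is_relation(obj)
        and len(obj) > 0
        and all(is_double(member) for member in obj)
    )


if __name__ == "__main__":
    print(is_binary_relation({("rack_meter", "meter")}))
    print(is_binary_relation(set()))
    print(is_binary_relation({("a", "b", "c")}))
```

Note that the axiom excludes the empty relation, so an empty set is rejected even though it trivially contains only doubles.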
Example: CYC (Lenat & Guha)
#$Skin
A (piece of) skin serves as outer protective and tactile sensory covering for (part of) an animal's body. This is the collection of all pieces of skin. Some examples include TheGoldenFleece (an entire skin) and YulBrynnersScalp (a small portion of his skin).
isa: #$AnimalBodyPartType
genls: #$AnimalBodyPart #$SheetOfSomeStuff #$VibrationThroughAMediumSensor #$TactileSensor #$BiologicalLivingObject #$SolidTangibleThing
some subsets: (4 unpublished subsets)
© CYCORP, Inc.
Example: CYC (Lenat & Guha)
#$AnimalBodyPart
The collection of all the anatomical parts and physical regions of all living animals; a subset of #$OrganismPart. Each element of #$AnimalBodyPart is a piece of some live animal and thus is itself an instance of #$BiologicalLivingObject. #$AnimalBodyPart includes both highly localized organs (e.g., hearts) and physical systems composed of parts distributed throughout an animal's body (such as its circulatory system and nervous system).
Note: Severed limbs and other parts of dead animals are NOT included in this collection; see #$DeadFn.
isa: #$ExistingObjectType
genls: #$OrganismPart #$OrganicStuff #$AnimalBLO #$AnimalBodyRegion
some subsets: #$Ear #$ReproductiveSystem #$Joint-AnimalBodyPart #$Organ #$MuscularSystem #$Nose #$SkeletalSystem #$Eye #$RespiratorySystem #$Appendage-AnimalBodyPart #$Torso #$Mouth #$Skin #$DigestiveSystem #$Head-AnimalBodyPart (plus 16 more public subsets, 1533 unpublished subsets)
© CYCORP, Inc.
Object-Role Modeling: Example (ORM: was “NIAM”)
SAP glossary, parsed into lexons: version A
(SAP_Oil) rack_meter
    is-a meter
    is_attached_to rack
    is_measuring amount_of_product
(SAP_Oil>rack_meter) rack
    is_in plant < manufacturing_plant
(SAP_Oil>rack_meter) amount_of_product
    is_pumped_from plant

SAP glossary, parsed into lexons: version B
(SAP_Oil) rack-attachment
    is-a attachment
    involves rack
    involves rack_meter
    occurs_in plant < manufacturing_plant
(SAP_Oil) rack_meter
    is-a meter
    is_measuring amount_of_product
(SAP_Oil>rack_meter) amount_of_product
    is_pumped_from plant
Example of Ontology creation and use
• (Embley et al. ‘98) generate database wrappers from semi-structured web pages
[Figure: pipeline diagram with the labels Internet car ads, Parser, Structured text, and SQL Database; the Conceptual Schema is annotated as = “Ontology” (in OSM syntax + regular expressions).]
Example from commercial practice: Medical Ontology
• Belgian startup company L&C “Language & Computing NV”
• Entering several medical thesauri into an ontology
  – Mostly manual process by experts
  – Currently >5M entries
  – Search for automation, control, maintenance: …Tools? Methodology? Principles?
• Business model: ontology service; develop NL apps; license resulting databases, e.g. to pharmaceutical companies
2001 ©L&C NV
L&C: multilingual medical ontology
[Figure: diagram with the labels Formal Domain Ontology; source terminologies ICD, SNOMED, MEDDRA, ICPC, Proprietary Terminologies, Others ...; and, for each of Language A and Language B, a Lexicon and a Grammar.]
L&C: Example of formal definition
[Figure: concept graph with numbered links (1 to 4). Concepts: Having a healthcare phenomenon, Generalised Possession, Healthcare phenomenon, Human, Patient, Patient at risk, Risk Factor, Patient at risk for osteoporosis, Risk factor for osteoporosis, Osteoporosis. Link types: IS-A, Has-possessor, Has-possessed, Is-possessor-of, Has-Healthcare-phenomenon, Is-Risk-Factor-Of.]
L&C: Resolving conflicting views
[Figure: diagram linking the terms MESH-2001 : “Seizures”, MESH-2001 : “Convulsions”, Snomed-RT : “Convulsion”, and Snomed-RT : “Seizure” to the L&C concepts Convulsion, Seizure, Health crisis, and Epileptic convulsion. Link types: IS-A, Is-narrower-than, Has-CCC.]
Ontology Models and Formalisms
• Extensional (Gruber ’93, G&N book, etc.)
  – formally defined using declarative semantics
  – Lexicons, thesauri
• Intensional
  – formally defined through “possible worlds” (e.g. Guarino ’98)
  – thesaurus/ontology for a domain seen intuitively as “organized union” of all “linguistically plausible” term arrangements
Extensional Ontology(-base)
• Set of lexons: elementary entries of form
<γ t0 r t>
where γ is a context; t0, t are terms (t0 is called the headword); and r is a role
  – in practice, grouped into sets <γ t0 r T> [place-near logic]
  – how to define/specify contexts?
    • e.g. [Lenat ’98] “12-dimensional context space”, [McCarthy ’96] “replace modality”, [Guha ’95], [Sowa ’99]
    • context definition by labels: “markerese” [Lewis ’72]; how to identify and refer to contexts: see later
    • compare with situation calculus
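As an illustration of a lexon as a (context, headword, role, term) entry, and of grouping entries that share context, headword, and role into one set of terms, here is a minimal sketch. The data are taken from the SAP glossary slides (version B); the `Lexon` type and the `group` function are hypothetical names, not the DOGMA data model.

```python
from collections import defaultdict, namedtuple

# One elementary ontobase entry: context, headword, role, term.
Lexon = namedtuple("Lexon", ["context", "head", "role", "tail"])

# Lexons copied from the SAP glossary example (version B); the
# "< manufacturing_plant" qualifier on the slide is left out here.
LEXONS = [
    Lexon("SAP_Oil", "rack-attachment", "is-a", "attachment"),
    Lexon("SAP_Oil", "rack-attachment", "involves", "rack"),
    Lexon("SAP_Oil", "rack-attachment", "involves", "rack_meter"),
    Lexon("SAP_Oil", "rack-attachment", "occurs_in", "plant"),
    Lexon("SAP_Oil", "rack_meter", "is-a", "meter"),
    Lexon("SAP_Oil", "rack_meter", "is_measuring", "amount_of_product"),
    Lexon("SAP_Oil>rack_meter", "amount_of_product", "is_pumped_from", "plant"),
]


def group(lexons):
    """Group lexons sharing (context, head, role) into one set of terms."""
    grouped = defaultdict(set)
    for lexon in lexons:
        grouped[(lexon.context, lexon.head, lexon.role)].add(lexon.tail)
    return dict(grouped)


if __name__ == "__main__":
    for key, terms in group(LEXONS).items():
        print(key, sorted(terms))
```

Contexts are plain string labels in this sketch; the slide leaves open how contexts should actually be defined and identified.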
DOGMA’s Baseline
• Scalability rather than generality
• ontology as a union of possible worlds; application (domain) instance selects world
• leads to proofs of (partial) “non-semantic-conflict” of application (e.g., a set of XML database transactions) with a given ontology
• strict separation of base ontology from “rules”
  – stores semantic elements outside of application programs
  – achieves a form of “semantic independence” analogous to data independence in databases
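A very naive form of the conflict check mentioned above can be sketched as a membership test: every (headword, role, term) fact an application relies on must appear as a lexon in the chosen context. This is only an illustration under that assumption; the slides do not describe how DOGMA actually establishes non-conflict, and the names `ONTOBASE` and `unsupported_facts` are hypothetical.

```python
# Lexons as (context, head, role, tail) tuples, copied from the SAP
# glossary example (version A) on an earlier slide.
ONTOBASE = {
    ("SAP_Oil", "rack_meter", "is-a", "meter"),
    ("SAP_Oil", "rack_meter", "is_attached_to", "rack"),
    ("SAP_Oil", "rack_meter", "is_measuring", "amount_of_product"),
}


def unsupported_facts(app_facts, ontobase, context):
    """Return the application facts with no matching lexon in `context`.

    An empty result means this naive check found no conflict; it says
    nothing about constraints or rules, which live outside the ontobase.
    """
    known = {(head, role, tail) for (ctx, head, role, tail) in ontobase
             if ctx == context}
    return [fact for fact in app_facts if fact not in known]


if __name__ == "__main__":
    app = [
        ("rack_meter", "is-a", "meter"),
        ("rack_meter", "is_attached_to", "pipeline"),  # not in the ontobase
    ]
    print(unsupported_facts(app, ONTOBASE, "SAP_Oil"))
```

Because the check reads only the lexon set, the application itself carries no semantic definitions, which is one way to picture the “semantic independence” the slide describes.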
Ontology design & tools
• Idea: evolve database design (CASE) tools that capture data models, with constraints
  – constraints → articulation axioms
• Computer-aided (CA) generation of an ontology instead of a database design: kernel for methodologies?
• Example: ORM: InfoModeler (VISIO Corp)
• DOGMA project: combine into ontology server with blackboard/agent architecture
A Word on the Roles of Roles
• Poorly accommodated in most lexicons and thesauri: only a limited “hardwired” set, if not absent altogether...
  – E.g. meronymy, taxonomy, synonymy, …
• First class citizens of an ontology! as co-carriers of “semantics”
• Essential end-user factor in ontology methodologies (NIAM, ORM, …): “describe world by communicating a set of instances of (semi-)formal sentence types”
Roles in ORM: native citizens
DOGMA: Ontologies + Agents
[Figure: diagram with the labels Agent, Blackboard, ontology server, and ontology, annotated with five steps:]
1. negotiate
2. agree
3. register
4. option: replace lexons by sys id
5. do business
DOGMA Ontology Server--Functionality
[Figure: diagram with the labels Ontology Server, Ontology DB, Ontology editor, Ontology miner, application agent, Negotiate ontology, and (re)organize.]
DOGMA Ontology Server: Architecture
[Figure: layered diagram with the labels:]
• Application agents (AppAgt), requirements described by own app.-level ontologies
• match/revision/extension
• Semantic mediators
• Application-specific ontologies (AppO)
• Domain-specific ontologies (DO)
• Lexon Base (“ontobase”) = Colossal Light-Weight Ontology (CLWO): “trivial but huge”
1 key problem: SCALE
...3 manifestations:
• Identification of individual problem-relevant contexts/namespaces
• intensional description of app.-level ontologies
• versions, alternatives, extensions of evolving ontologies
Dynamic self-organization in DOGMA
• HOW? application- and domain-specific lexon sets (“mini”-ontobases®.com) are added qualified by context
• HOW? use/disuse of lexons (by application and/or cognitive agents) raises/lowers “ontology trust level” by modifying context
Loading DOGMA’s lexon base
• Partial bootstrap by parsing existing NL descriptions of terms (e.g. from WordNet™, encyclopedias, thesauri, CYC™ top-level, …)
  – wanted: parser & ontology-checking tools
• Need a bill-of-material of the planet
  – (actually, every plausible alternative of it, too)
• Started with parts of SAP Glossary, the IPTC thesaurus, data models (patterns) and existing database schemas (& some populations!), …
A note on meta modeling
• Ontology instance: example of metadata
• Requires meta model
  – all work on this in database and information systems is being happily reinvented, so far
  – XOL: meta model expressed in XML
  – OIM: meta model expressed in UML
• Some differences though: distinction of instance- vs. type-knowledge is more subtle
Person has birthday (month, day) (month_name, day_nr)
<<Person Robert has birthday November 18>>
“November” instance likely in ontology
“18” instance most likely not
“Robert” instance maybe not (?)
Q: what goes into an ontology?
Q: but consider, e.g.
Robert
    instance-of first_name
    instance-of male_name
    abbreviated_to {Bob, Bobby}
Of course, “different” sub-ontology (context of names), but some applications may wish to access both together
so, why not an entry for the number “18”?
“Upper Ontology” Research
• General, domain-independent model for (all?) knowledge
• Combines philosophy/logic with computing…!
• Several models available already (Peirce, Whitehead, Aristotle, …): “top-down”
• Standardization across domains? (ANSI, IEEE, …)
• In practice, must allow bottom-up “alignment” of domain-specific upper ontologies; some in development (MDC†, …)
IEEE Merged Ontology in “KIF++”
...
(subclass-of BinaryRelation Predicate)
(documentation BinaryRelation “Primitive. A relation is binary if it has exactly two arguments, i.e. its population lists have exactly two items.[…].”)
(=>())
(subclass-of ReflexiveRelation BinaryRelation)
(documentation ReflexiveRelation “A relation R is reflexive if R(x, x) holds for all x in the domain of R.”)
(=> (instance-of ?R ReflexiveRelation)
    (forall (?X) (holds ?R ?X ?X)))
...
DRAFT
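The documentation string for ReflexiveRelation can be turned into a direct check. The sketch below assumes, for illustration only, that a binary relation is a Python set of pairs and that, unless a domain is passed explicitly, the domain is every element occurring in some pair; neither choice comes from the IEEE draft, and `is_reflexive` is a hypothetical name.

```python
def is_reflexive(relation, domain=None):
    """True if (x, x) is in `relation` for every x in `domain`.

    Assumption: when no domain is given, it defaults to all elements
    that occur in some pair of the relation.
    """
    if domain is None:
        domain = {x for pair in relation for x in pair}
    return all((x, x) in relation for x in domain)


if __name__ == "__main__":
    same_plant = {("p1", "p1"), ("p2", "p2"), ("p1", "p2")}
    print(is_reflexive(same_plant))
    print(is_reflexive({("p1", "p2")}))
```

How the domain is fixed matters: with an explicitly smaller domain, a relation can pass the check even if some element that occurs in it lacks its (x, x) pair.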
Open Information Model (OIM)
• Who— The Meta-Data Coalition (MDC†) OIM, including Knowledge Description Model KDM, kindly donated by Microsoft
• Purpose— “A set of formal meta-data specifications to support tool interoperability across technologies and companies via shared information models”
• Ontology model— KDM provides meta-data types to “describe and categorize information managed by computer systems”. Note its meta model allows an ontology to distinguish concepts from terms describing them (cfr. also e.g. NIAM)
OIM Knowledge Description Model (in ORM)
Knowledge Description Model
(Terms)
Adding context to the OIM model
• Introduce context as a key organizational element into the OIM. Context is situated relative to concepts of the lexicon
• Extensibility: provide a mechanism which allows introducing and naming any (new) kind of relationship between concepts
• Miscellaneous minor adaptations & improvements
Adapted meta-model: situating “context”
Context tree: representation (S. Casteleyn ’01)
Context example: tree levels
Context example: tree node
drawn_by
lexon
WordNet descriptor
SUMMARY: Key Research Issues
• Good semantic paradigm and formalization
• Good representations for ontologies
  – Ontology bases
  – Upper ontology(-ies)
  – Organization (agents, contexts)
• Standard ontology languages and interpreters
  – Separation of ontobase from domain rules
  – Scalable to large complex domains
• Ontology building (population)
  – Methodologies
  – Alignment algorithms
  – Content mining
• Efficient ontology servers
The Ontology Limit
“all interface specifications, all communication and documentation for any module in any software system are valid if and only if they are mappable to an agreed common ontology”