ontology mapping - out of the babel tower
DESCRIPTION
Keynote at the AI in Medicine Conference (AIME 2005), giving an overview of the work in Ontology Mapping to people in Medical Informatics (which includes explaining the what and why of ontologies in general).
TRANSCRIPT
Ontology mapping: a way out of
the medical tower of Babel?
Frank van Harmelen
Vrije Universiteit Amsterdam
The Netherlands
Before we start…
a talk on ontology mappings is a difficult talk to give:
no consensus in the field
• on merits of the different approaches
• on classifying the different approaches
no one can speak with authority on the solution
this is a personal view, with a sell-by date
other speakers will entirely disagree (or disapprove)
Good overviews of the topic
Knowledge Web D2.2.3: “State of the art on ontology alignment”
Ontology Mapping Survey talk by Siyamed Seyhmus SINIR
ESWC'05 Tutorial on Schema and Ontology Matching by Pavel Shvaiko & Jerome Euzenat
KER 2003 paper Kalfoglou & Schorlemmer
These are all different & incompatible…
The Medical tower of Babel
MeSH
• Medical Subject Headings, National Library of Medicine
• 22.000 descriptions
EMTREE
• Commercial Elsevier, Drugs and diseases
• 45.000 terms, 190.000 synonyms
UMLS
• Integrates 100 different vocabularies
SNOMED
• 200.000 concepts, College of American Pathologists
Gene Ontology
• 15.000 terms in molecular biology
NCI Cancer Ontology
• 17,000 classes (about 1M definitions)
no shared understanding
Conceptual and terminological confusion
Actors: both humans and machines
Agree on a conceptualization
Make it explicit in some language.
[Figure: diagram relating world, concept and language]
What are ontologies & what are they used for
Ontologies come in very different kinds
From lightweight to heavyweight:
• Yahoo topic hierarchy
• Open directory (400.000 general categories)
• Cyc, 300.000 axioms
From very specific to very general:
• METAR code (weather conditions at air terminals)
• SNOMED (medical concepts)
• Cyc (common sense knowledge)
What’s inside an ontology?
terms + specialisation hierarchy
classes + class-hierarchy
instances
slots/values
inheritance (multiple? defaults?)
restrictions on slots (type, cardinality)
properties of slots (symm., trans., …)
relations between classes (disjoint, covers)
reasoning tasks: classification, subsumption
(from top to bottom: increasing semantic “weight”)
In short (for the duration of this talk)
Ontologies are not
definitive descriptions of what exists in the world (= philosophy)
Ontologies are
models of the world
constructed to facilitate communication
Yes, ontologies exist (because we build them)
Ontology mapping is old & inevitable
Ontology mapping is old
• db schema integration
• federated databases
Ontology mapping is inevitable
• ontology language is standardised,
• don't even try to standardise contents
Ontology mapping is important
database integration, heterogeneous database retrieval (traditional)
catalog matching (e-commerce)
agent communication (theory only)
web service integration (urgent)
P2P information sharing (emerging)
personalisation (emerging)
Ontology mapping is now urgent
Ontology mapping has acquired new urgency
• physical and syntactic integration is ± solved (open world, web)
• automated mappings are now required (P2P)
• shift from off-line to run-time matching
Ontology mapping has new opportunities
• larger volumes of data
• richer schemas (relational vs. ontology)
• applications where partial mappings work
Different aspects of ontology mapping
how to discover a mapping
how to represent a mapping
• subset/equal/disjoint/overlap/is-somehow-related-to
• logical/equational/category-theoretical
• atomic/complex arguments, confidence measure
how to use it
We only talk about “how to discover”
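The representation options listed above (a relation type between two concepts, plus a confidence measure) could be captured in a small data structure. The following is only a sketch: the class names, the `OLVG:`/`DICE:` prefixes, the 0-to-1 confidence scale and the value 0.9 are illustrative assumptions, not taken from any of the systems in this talk.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # The relation types named on the slide.
    SUBSET = "subset"
    EQUAL = "equal"
    DISJOINT = "disjoint"
    OVERLAP = "overlap"
    RELATED = "is-somehow-related-to"


@dataclass(frozen=True)
class Mapping:
    source: str               # concept in the first ontology
    target: str               # concept in the second ontology
    relation: Relation
    confidence: float = 1.0   # assumed scale: 0.0 (no trust) to 1.0 (certain)

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


# Illustrative mapping with a made-up confidence value.
m = Mapping("OLVG:Aorta thoracalis dissection",
            "DICE:Dissection of artery",
            Relation.SUBSET,
            confidence=0.9)
```

This covers only atomic arguments (one concept on each side); complex arguments would need expressions instead of plain concept names.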
Many experimental systems (non-exhaustive!):
Prompt (Stanford SMI)
Anchor-Prompt (Stanford SMI)
Chimaera (Stanford KSL)
Rondo (Stanford U./U Leipzig)
MoA (ETRI)
Cupid (Microsoft Research)
Glue (U of Washington)
FCA-merge (U Karlsruhe)
IF-Map
Artemis (U Milano)
T-tree (INRIA Rhone-Alpes)
S-MATCH (U Trento)
Coma (U Leipzig)
Buster (U Bremen)
MULTIKAT (INRIA S.A.)
ASCO (INRIA S.A.)
OLA (INRIA R.A.)
Dogma's Methodology
ArtGen (Stanford U.)
Alimo (ITI-CERTH)
Bibster (U Karlsruhe)
QOM (U Karlsruhe)
KILT (INRIA Lorraine)
Different approaches to ontology matching
Linguistics & structure
Shared vocabulary
Instance-based matching
Shared background knowledge
Linguistic & structural mappings
normalisation (case, blanks, digits, diacritics)
lemmatization, N-grams, edit-distance, Hamming distance
distance = fraction of common parents
elements are similar if their parents/children/siblings are similar
(listed in decreasing order of boredom)
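A minimal Python sketch of some of the techniques just listed: label normalisation, edit distance, and a parent-based structural score. The function names are hypothetical. Scaling the edit distance by the longer label and using the Jaccard ratio for "fraction of common parents" are assumptions made for this sketch, since the slide does not fix either formula.

```python
import unicodedata


def normalise(label: str) -> str:
    """Lower-case, strip diacritics, drop digits, collapse blanks."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_digits = "".join(c for c in no_marks if not c.isdigit())
    return " ".join(no_digits.lower().split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def label_similarity(a: str, b: str) -> float:
    """1.0 for identical normalised labels, lower as more edits are needed."""
    na, nb = normalise(a), normalise(b)
    longest = max(len(na), len(nb))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(na, nb) / longest


def parent_overlap(parents_a: set, parents_b: set) -> float:
    """Fraction of common parents (assumed here: shared / all parents)."""
    union = parents_a | parents_b
    return len(parents_a & parents_b) / len(union) if union else 0.0
```

For instance, `edit_distance("overdosis", "overdose")` is 2 (one substitution, one deletion), so spelling variants of the same term score as close neighbours.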
Matching through shared vocabulary
[Figure: a concept Q bracketed by its lower and upper approximations, Low(Q) ⊆ Q ⊆ Up(Q)]
Used in mapping geospatial databases from German land-registration authorities (small)
Used in mapping bio-medical and genetic thesauri (large)
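The relation Low(Q) ⊆ Q ⊆ Up(Q) can be illustrated with a toy sketch in which every concept is modelled as a set of items. This set-based reading, the function names, the term names `T1`..`T4` and the data are all assumptions made for illustration; the systems mentioned above reason over concept definitions rather than over explicit sets.

```python
def upper_bound(q, shared):
    """Intersection of all shared-vocabulary terms that contain Q."""
    supersets = [set(ext) for ext in shared.values() if q <= ext]
    return set.intersection(*supersets) if supersets else None


def lower_bound(q, shared):
    """Union of all shared-vocabulary terms that lie inside Q."""
    subsets = [ext for ext in shared.values() if ext <= q]
    return set().union(*subsets)


def classify(candidate, low, up):
    """Decide what the two bounds tell us about a concept from the other side."""
    if candidate <= low:
        return "inside Q"
    if up is not None and not (candidate & up):
        return "outside Q"
    return "undecided"


# Toy shared vocabulary: term name -> set of item ids (made up).
shared = {"T1": {1, 2, 3, 4, 5}, "T2": {1, 2, 3}, "T3": {1}, "T4": {6, 7}}
q = {1, 2}

up = upper_bound(q, shared)    # bounded by T1 and T2
low = lower_bound(q, shared)   # only T3 fits inside Q
```

Anything that falls between the two bounds stays undecided, which is why the quality of the match depends on how fine-grained the shared vocabulary is.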
Matching through shared instances
Used by Ichise et al (IJCAI’03) to successfully map parts of Yahoo to parts of Google
Yahoo = 8402 classes, 45.000 instances
Google = 8343 classes, 82.000 instances
Only 6000 shared instances
70% - 80% accuracy obtained (!)
Conclusions from authors:
• semantics is needed to improve on this ceiling
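The idea behind instance-based matching can be sketched as follows: two classes are candidates for a match when many of the same instances are filed under both. The sketch uses the Jaccard overlap as a deliberately simple stand-in; it is not the statistic used by Ichise et al. The directory names, URLs-as-ids, and the 0.5 threshold are made up.

```python
def jaccard(a: set, b: set) -> float:
    """Shared instances divided by all instances of either class."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_by_instances(onto1: dict, onto2: dict, threshold: float = 0.5):
    """Return (class1, class2, score) for pairs whose overlap reaches the threshold."""
    pairs = []
    for c1, inst1 in onto1.items():
        for c2, inst2 in onto2.items():
            score = jaccard(inst1, inst2)
            if score >= threshold:
                pairs.append((c1, c2, score))
    return sorted(pairs, key=lambda p: -p[2])


# Toy directories: class name -> set of instance ids (all hypothetical).
directory_a = {"A/Autos": {"u1", "u2", "u3"}, "A/Biology": {"u7", "u8"}}
directory_b = {"B/Cars": {"u2", "u3", "u4"}, "B/Life Sciences": {"u7", "u8", "u9"}}

matches = match_by_instances(directory_a, directory_b)
```

Only instances present in both directories contribute to the score, so a small shared set (6000 in the case above) limits what this kind of matcher can find.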
Matching using shared background knowledge
[Figure: ontology 1 and ontology 2, with shared background knowledge]
Ontology mapping using background knowledge
Case study 1
Work with
Zharko Aleksovski @ Philips
Michel Klein @ VU
KIK @ AMC
Overview of test data
Two terminologies from intensive care domain
OLVG list
• List of reasons for ICU admission
AMC list
• List of reasons for ICU admission
DICE hierarchy
• Additional hierarchical knowledge describing the reasons for ICU admission
OLVG list
developed by clinician
3000 reasons for ICU admission
1390 used in first 24 hours of stay
• 3600 patients since 2000
based on ICD9 + additional material
List of problems for patient admission
Each reason for admission is described with one label
• Labels consist of 1.8 words on average
• redundancy because of spelling mistakes
• implicit hierarchy (e.g. many fractures)
AMC list
List of 1460 problems for ICU admission
Each problem is described using 5 aspects from the DICE terminology:
2500 concepts (5000 terms), 4500 links
• Abnormality (size: 85)
• Action taken (size: 55)
• Body system (size: 13)
• Location (size: 1512)
• Cause (size: 255)
expressed in OWL
allows for subsumption & part-of reasoning
Why mapping AMC list ↔ OLVG list?
allow easy entering of OLVG data
re-use of data in
• epidemiology
• quality of care assessment
• data-mining (patient prognosis)
Linguistic mapping:
Compare each pair of concepts
Use labels and synonyms of concepts
Heuristic method to discover equivalence and subclass relations
[Figure: example with the labels "tumor", "brain", "Long tumor" and "Long", linked by "More specific than"]
First round:
• compare with complete DICE
• 313 suggested matches, around 70 % correct
Second round:
• only compare with “reasons for admission” subtree
• 209 suggested matches, around 90 % correct
High precision, low recall (“the easy cases”)
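The slides do not spell out the heuristic, so the following is only one plausible reading: a label whose words include all the words of another label, plus extra ones, is treated as more specific (as with "Long tumor" versus "tumor"). The function names and the synonym "neoplasm" are assumptions for illustration.

```python
def tokens(label: str) -> frozenset:
    """Word set of a label, ignoring case and word order."""
    return frozenset(label.lower().split())


def lexical_relation(label_a: str, label_b: str):
    """Relation of label_a to label_b, or None if the word sets are incomparable."""
    ta, tb = tokens(label_a), tokens(label_b)
    if ta == tb:
        return "equivalent"
    if ta > tb:   # a has every word of b plus at least one more
        return "more specific"
    if ta < tb:
        return "more general"
    return None


def concept_relation(labels_a, labels_b):
    """Try every label/synonym pair and return the first relation found."""
    for a in labels_a:
        for b in labels_b:
            relation = lexical_relation(a, b)
            if relation is not None:
                return relation
    return None
```

A call such as `concept_relation(["neoplasm", "tumor"], ["Long tumor"])` returns `"more general"` because the synonym "tumor" is contained in the other label. A rule of this kind finds only pairs that share words, which fits the "easy cases" remark above.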
Using background knowledge
Use properties of concepts
Use other ontologies to discover relation between properties
[Figure: the OLVG problem list and the DICE problem list, connected through the DICE aspect taxonomies (Action, Abnormality, Body system, Location, Cause). Link labels: Given; Lexical match; Implicit matching: property match; Semantic match]
[Figure: taxonomy of body parts. Blood vessel is more general than Artery and Vein; Artery is more general than Aorta. Lexical match: "Aorta thoracalis dissection" has location Aorta, "Dissection of artery" has location Artery. Location match: has more general location. Reasoning: implies]
Example: “Heroin intoxication” – “drugs overdose”
[Figure: Cause taxonomy: Drugs is more general than Heroine. Abnormality taxonomy: Intoxicatie is more general than Overdosis. Lexical matches link "Heroin intoxication" and "Drugs overdosis" to their cause and abnormality. Cause match: has more specific cause. Abnormality match: has more general abnormality]
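The two examples above can be replayed with a small sketch: each problem is described by its aspects, and the aspect taxonomies decide whether one value is more general than the other. This is an illustration under assumptions, not the method used in the case study: the function names are hypothetical, the taxonomy fragments contain only the terms visible on the slides, the direction "Intoxicatie is more general than Overdosis" is inferred from the match labels, and the abnormality value "Dissection" is assumed.

```python
def ancestors(term, parent):
    """All terms more general than `term`, following child -> parent links."""
    found = []
    while term in parent:
        term = parent[term]
        found.append(term)
    return found


def compare(a, b, parent):
    """How value a relates to value b within one aspect taxonomy."""
    if a == b:
        return "equal"
    if b in ancestors(a, parent):
        return "more specific"
    if a in ancestors(b, parent):
        return "more general"
    return "unrelated"


def compare_problems(p, q, taxonomies):
    """Compare two problems aspect by aspect and combine the verdicts."""
    per_aspect = {aspect: compare(p[aspect], q[aspect], taxonomies[aspect])
                  for aspect in p.keys() & q.keys()}
    verdicts = set(per_aspect.values())
    if not verdicts:
        overall = "unknown"
    elif verdicts <= {"equal"}:
        overall = "equivalent"
    elif verdicts <= {"equal", "more specific"}:
        overall = "more specific"
    elif verdicts <= {"equal", "more general"}:
        overall = "more general"
    else:
        overall = "related"
    return overall, per_aspect


# Child -> parent fragments, restricted to terms shown on the slides.
taxonomies = {
    "location": {"Aorta": "Artery", "Artery": "Blood vessel", "Vein": "Blood vessel"},
    "cause": {"Heroine": "Drugs"},
    "abnormality": {"Overdosis": "Intoxicatie"},
}

aorta_dissection = {"abnormality": "Dissection", "location": "Aorta"}
artery_dissection = {"abnormality": "Dissection", "location": "Artery"}
heroin_intoxication = {"cause": "Heroine", "abnormality": "Intoxicatie"}
drugs_overdosis = {"cause": "Drugs", "abnormality": "Overdosis"}

first = compare_problems(aorta_dissection, artery_dissection, taxonomies)
second = compare_problems(heroin_intoxication, drugs_overdosis, taxonomies)
```

In the first pair every aspect is equal or more specific, so the whole problem is more specific. In the second pair the aspects point in opposite directions, so the sketch reports only that the problems are related.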
Example results
• OLVG: Acute respiratory failure
  DICE: Asthma cardiale
• OLVG: Aspergillus fumigatus
  DICE: Aspergilloom
• OLVG: duodenum perforation
  DICE: Gut perforation
• OLVG: HIV
  DICE: AIDS
• OLVG: Aorta thoracalis dissectie type B
  DICE: Dissection of artery
[Slide annotations naming the matched aspects: cause; abnormality, cause; cause; location, abnormality; abnormality]
Ontology mapping using background knowledge
Case study 2
Work with Heiner Stuckenschmidt @ VU
Case Study:
1. Map GALEN & Tambis, using UMLS as background knowledge
2. Select three topics with sufficient overlap
• Substances
• Structures
• Processes
3. Define some partial & ad-hoc manual mappings between individual concepts
4. Represent mappings in C-OWL
5. Use semantics of C-OWL to verify and complete mappings
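The "complete mappings" step can be illustrated with a much simpler stand-in than C-OWL reasoning: if concepts from both ontologies have been mapped onto a shared background hierarchy, a relation between them can be read off that hierarchy. The sketch below is plain hierarchy lookup, not C-OWL semantics, and all concept names (`o1:`, `o2:`, `bg:`) and the hierarchy itself are toy assumptions, not the actual GALEN, Tambis or UMLS content.

```python
def is_subclass(term, ancestor, parent):
    """True if `ancestor` is reached from `term` via child -> parent links."""
    while term != ancestor and term in parent:
        term = parent[term]
    return term == ancestor


def derive_mappings(bridges1, bridges2, background_parent):
    """Derive relations between two ontologies via their background bridges.

    bridges1 / bridges2 map a concept to the background concept it was
    manually declared equal to.
    """
    derived = []
    for c1, b1 in bridges1.items():
        for c2, b2 in bridges2.items():
            if b1 == b2:
                relation = "equal"
            elif is_subclass(b2, b1, background_parent):
                relation = "more general"    # c1 is more general than c2
            elif is_subclass(b1, b2, background_parent):
                relation = "more specific"   # c1 is more specific than c2
            else:
                continue
            derived.append((c1, relation, c2))
    return derived


# Toy background hierarchy and manual bridges (all hypothetical).
background_parent = {"bg:Enzyme": "bg:Chemical"}
bridges1 = {"o1:Substance": "bg:Chemical"}
bridges2 = {"o2:Chemical": "bg:Chemical", "o2:Enzyme": "bg:Enzyme"}

derived = derive_mappings(bridges1, bridges2, background_parent)
```

Verification could reuse the same lookup: a manually asserted mapping that contradicts a derived one would be flagged for review.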
Case Study:
[Figure: GALEN (medical ontology), Tambis (genetic ontology) and UMLS (medical terminology). Link labels: lexical mapping (twice), verification & derivation (twice), derived mapping]
Ad hoc mappings: Substances (UMLS, GALEN)
Notice: mappings high and low in the hierarchy, few in the middle
Ad hoc mappings: Substances (UMLS, Tambis)
Notice different grain size: UMLS coarse, Tambis fine
Verification of mappings
[Figure: concepts UMLS:Chemicals, UMLS:Chemicals_viewed_structurally, UMLS:Chemicals_viewed_functionally, UMLS:enzyme, Tambis:Chemical, Tambis:enzyme. Two mappings are marked "=", one is marked "?"]
Deriving new mappings
[Figure: concepts UMLS:substance, UMLS:Phenomenon_or_process, UMLS:Chemicals, UMLS:OrganicChemical, Galen:ChemicalSubstance. One mapping is marked "="]
“Conclusions”
Ontology mapping is (still) hard & open
Many different approaches will be required:
• linguistic
• structural
• statistical
• semantic
• …
Currently no roadmap theory on what's good for which problems
Challenges
roadmap theory
run-time matching
“good-enough” matches
large scale evaluation methodology
hybrid matchers (needs roadmap theory)
Ontology mapping: a way out of
the medical tower of Babel?
?