ontology mapping needs context
TRANSCRIPT
Ontology mapping needs context
Frank van HarmelenVrije Universiteit Amsterdam
introductory comments onontologies & contexts
3
Ontologieswe know what they are“consensual, formalised models of a domain”we know how to make and maintain them (methods, tools, experience)we know how to deploy them(search, personalisation, data-integration, …)
Main remaining open questionsAutomatic construction (learning)Automatic mapping (integration)
4
ContextsI don’t really know what they are…Quote from CfP: “Earlier workshops were mostly focused on what contexts and ontologies are”. At least (?) two views:
context as “module”, ist(p,c)context as “relevant knowledge”, “contextual meaning”
I will use 2nd meaning
context-specific nature of knowledge
6
Opinion pollleft
meaning of a sentence is only determinedby the sentence itself,and not influenced bythe surroundingsentences, and not by the situationin which the sentence is used
meaning of a sentence is only determinedby the sentence itself,and not influenced bythe surroundingsentences, and not by the situationin which the sentence is used
meaning of sentence is not only determinedby the sentence itself,but is also influenced byby the surroundingsentences,and also by the situationin which the sentenceis used
meaning of sentence is not only determinedby the sentence itself,but is also influenced byby the surroundingsentences,and also by the situationin which the sentenceis used
right
7
Opinion pollrightleft
don’t you see what I mean?
8
Agenda for talkDoes this “context dependency” also hold for ontology mapping?
Intuitively: yes, obviouslyMore precisely:
can context compensate for lack of structure in source and target?is more context knowledge better?is richer context knowledge better?
This work withZharko Aleksovski &
Michel Klein
Does context knowledge help mapping?
11
The general idea
source target
background knowledge
anchoring anchoring
mapping
inference
12
source and target vocab’sOLVG “problem-list”:
around 3000 problems in a flat listbased on ICD9 + “classificatie van verrichtingen”contains general and specific categories
• implicit hierarchy• e.g.6 types of Diabetes Mellitus, many fractures
some redundancy because of spelling mistakesused to keep track of the problems of patients during the whole stay at the ICU
OLVG-1400:the subset used in the first 24 hour of stay since 2000 (contains data about 3602 patients)
AMC: similar list, but from different hospital
13
Context ontology used
DICE:2500 concepts (5000 terms), 4500 linksFormalised in DLfive main categories:• tractus (e.g. nervous_system,
respiratory_system)• aetiology (e.g. virus, poising)• abnormality (e.g. fracture, tumor)• action (e.g. biopsy, observation, removal)• anatomic_location (e.g. lungs, skin)
14
Baseline: Linguistic methods
Combine lexical analysis with hierarchical structureFirst round
compare with complete DICE313 suggested matches, around 70 % correct
Second round:only compare with “reasons for admission” subtree209 suggested matches, around 90 % correct
High precision, low recall (“the easy cases”)
15
The general idea
source target
background knowledge
anchoring anchoring
mapping
inference
16
Example found with context knowledge (beyond lexical)
17
Example 2
18
Anchoring strengthAnchoring = substring + trivial morphology
anchored on N aspects OLVG AMC
N=5N=4N=3N=2N=1
004
144401
2198711285208
total nr. of anchored termstotal nr. of anchorings
5491298
39% 14045816
96%
19
Experimental resultsSource & target = flat lists of ±1400 ICU terms eachBackground = DICE (2300 concepts in DL)Manual Gold Standard (n=200)
20
So…The OLVG & AMC terms get their meaning from the context in which they are being used. Different background knowledge would have resulted in different mappingsTheir semantics is not context-freeSee also: S-MATCH by Trento
Does more context knowledge help?
22
Adding more context Only lexicalDICE (2300 concepts)MeSH (22000 concepts) ICD-10 (11000 concepts)
Anchoring strength:DICE MeSH ICD10
4 aspects 0 8 03 aspects 0 89 02 aspects 135 201 01 aspect 413 694 80total 548 992 80
23
Results with multiple ontologies
Monotonic improvement Independent of orderLinear increase of cost
0
10
20
30
40
50
60
70
80
90
100
1 2 3 4
Separate Lexical ICD-10 DICE MeSHRecallPrecision
64%95%
64%95%
76%94%
88%89%
Joint
does structured context knowledge help?
25
Exploiting structureCRISP: 700 concepts, broader-thanMeSH: 1475 concepts, broader-thanFMA: 75.000 concepts, 160 relation-types(we used: is-a & part-of)
26
Direct vs. inferred matchesusing only:source-target lexical matchesrelations inside source or target:
e.g: (S <d T) & (T < T’) → (S <d T’)e.g: CRISP:brain =d MESH:brain
MESH:brain > MESH:temp_lobe→ CRISP:brain >d MESH:temp_lobe
27
Direct vs. inferred matches
Using:Lexical anchorings with backgroundRelations inside background knowledge
Matches inferred via anchorings:(S <a B) & (B < B’) & (B’ <a T) → (S <i T)
28
Using the structure or not(S <a B) & (B < B’) & (B’ <a T) → (S <i T)
Only stated is-a & part-ofTransitive chains of is-a, andtransitive chains of part-ofTransitive chains of is-a and part-ofOne chain of part-of before one chain of is-a
29
Examples
30
Examples
31
Anchoring strength
Anchoring concepts
= · ≥ Anchored concepts
CRISP to FMAMeSH to FMA
7381475
483 1042
6071545
14742227
7301462
32
Matching results (CRISP to MeSH)
Recall = · ≥ total incr.Exp.1:DirectExp.2:Indir. is-a + part-ofExp.3:Indir. separate closuresExp.4:Indir. mixed closuresExp.5:Indir. part-of before is-a
448395395395395
417516933
1511972
156405
140222281800
10211316273041433167
-29%
167%306%210%
Precision = · ≥ total correctExp.1:DirectExp.4:Indir. mixed closuresExp.5:Indir. part-of before is-a
171414
183937
35950
38112101
100%94%
100%
(Golden Standard n=30)
wrapping up
34
Related workContext knowledge for mapping is mostly linguistic (WordNet)Notable exception is S-Match using UMLS,but: we have shown source/target structure is not needed
35
ConclusionsStructured investigation on:
The role of source/target structure:we can even do without, given good contextThe role of context structure(it helps, but be careful with its semantics)The amound of context knowledge(surprisingly robust monotonic improvements)