Consistency between Metathesaurus and Semantic Network
Workshop on The Future of the UMLS Semantic Network
NLM, April 8, 2005
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA
Overview
Defining consistency
  What does inconsistency mean?
Testing consistency
  Comparing Metathesaurus relations to SN relations
  Aligning Metathesaurus concepts and semantic types
  Semantic type distribution of sets of descendants of Metathesaurus concepts
Suggestions
  Enforcement mechanism
  Ontology of relationships
  CVF
Two levels in the UMLS
The UMLS: a two-level structure
[Figure: Metathesaurus concepts (e.g., Heart, Esophagus, Left Phrenic Nerve, Heart Valves, Fetal Heart, Mediastinum, Saccular Viscus, Angina Pectoris, Cardiotonic Agents, Tissue Donors) are related to one another and categorized by semantic types of the Semantic Network (e.g., Anatomical Structure, Fully Formed Anatomical Structure, Embryonic Structure, Body Part, Organ, or Organ Component, Pharmacologic Substance, Disease or Syndrome, Population Group).]
Relationships can inherit semantics
[Figure: in the Semantic Network, Disease or Syndrome isa Pathologic Function isa Biologic Function, and Body Part, Organ, or Organ Component isa Fully Formed Anatomical Structure; a location of relation links these hierarchies. In the Metathesaurus, Adrenal Cortex is the location of Adrenal cortical hypofunction.]
Defining consistency
The consistency “square”
[Figure: Concept 1 and Concept 2 are related in the Metathesaurus; Concept 1 is categorized by Semantic Type a and Concept 2 by Semantic Type b, which are related in the Semantic Network.]
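The categorization link
[Figure: in the Metathesaurus, Salmonella is categorized by the semantic type Bacterium (Bacterium isa Organism in the Semantic Network), a link read as isa; American Medical Association is categorized by Professional Society, a link read as “is an instance of”.]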
Semantic Network relations
54 types of relationships
558 asserted relations (SRSTR)
6703 fully expanded relations (SRSTRE*)
[Figure: the Semantic Network hierarchies Disease or Syndrome isa Pathologic Function isa Biologic Function and Body Part, Organ, or Organ Component isa Fully Formed Anatomical Structure, linked by a location of relation.]
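The jump from 558 asserted to 6703 fully expanded relations comes from relations being inherited down the isa hierarchy of semantic types. A minimal sketch of that expansion, assuming every asserted relation is inherited by all descendants of both of its arguments (it ignores any exceptions to inheritance the Semantic Network may record). The hierarchy is the fragment shown on this slide; which pair of types the location_of relation actually links is a hypothetical choice.

```python
from itertools import product

# Toy isa hierarchy (child type -> parent type), taken from the slide.
ISA = {
    "Disease or Syndrome": "Pathologic Function",
    "Pathologic Function": "Biologic Function",
    "Body Part, Organ, or Organ Component": "Fully Formed Anatomical Structure",
}

# Hypothetical asserted relation (subject type, relation, object type).
ASSERTED = [("Fully Formed Anatomical Structure", "location_of", "Biologic Function")]


def descendants_or_self(st):
    """Return st plus every type that reaches st through isa links."""
    result = {st}
    changed = True
    while changed:
        changed = False
        for child, parent in ISA.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def expand(asserted):
    """Inherit each asserted relation to every pair of descendant types."""
    expanded = set()
    for subj, rel, obj in asserted:
        for s, o in product(descendants_or_self(subj), descendants_or_self(obj)):
            expanded.add((s, rel, o))
    return expanded


for triple in sorted(expand(ASSERTED)):
    print(triple)
```

On this toy input the single asserted triple expands to 6 triples (2 subject types times 3 object types), including Body Part, Organ, or Organ Component location_of Disease or Syndrome.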
Metathesaurus relations
REL vs. RELA: relations are not always labeled with a RELA
106 additional types of relationships
~7 M symbolic relations
[Figure: Metathesaurus concepts related to Heart, e.g., Esophagus, Left Phrenic Nerve, Heart Valves, Fetal Heart, Mediastinum, Saccular Viscus, Angina Pectoris, Cardiotonic Agents, Tissue Donors.]
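To make the REL vs. RELA distinction concrete, here is a sketch that reads rows in the pipe-delimited layout of the MRREL.RRF file (CUI1|AUI1|STYPE1|REL|CUI2|AUI2|STYPE2|RELA|...) and counts how many carry a RELA. The rows and identifiers are made up for illustration; only the column order is taken from the distributed file format.

```python
from collections import Counter

# Made-up rows in the MRREL.RRF column layout; all identifiers are placeholders.
ROWS = [
    "C0000001|A0000001|AUI|RO|C0000002|A0000002|AUI|has_finding_site|R1||SRC1|SRC1|||N||",
    "C0000003|A0000003|AUI|RO|C0000004|A0000004|AUI||R2||SRC2|SRC2|||N||",
    "C0000005|A0000005|AUI|PAR|C0000006|A0000006|AUI|isa|R3||SRC1|SRC1|||N||",
]


def parse(row):
    """Return (CUI1, REL, CUI2, RELA); RELA is None when the field is empty."""
    fields = row.split("|")
    return fields[0], fields[3], fields[4], fields[7] or None


# Every row has a REL; only some have the more specific RELA label.
labels = Counter("labeled" if parse(r)[3] else "unlabeled" for r in ROWS)
print(labels)
```

Here two of the three rows carry a RELA, while the second has only the generic REL value RO.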
Metathesaurus relations
Recorded
  at the term level: from source vocabularies
  at the concept level: from Metathesaurus editors
Aggregated at the concept level
[Figure: a has_finding_site relation recorded between terms (Oat cell carcinoma of lung, Carcinoma, Small Cell, SCLC on one side; Lung structure, Lung, Pulmonary on the other) is aggregated into a single has_finding_site relation between the two corresponding concepts.]
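The aggregation step can be sketched as lifting each term-level relation to the concepts its terms belong to and letting duplicates collapse. The term-to-concept assignment and the two term-level assertions below are hypothetical; the concept identifiers C_SCLC and C_LUNG are placeholders.

```python
# Hypothetical term-to-concept assignment (synonymous terms share a concept).
TERM_TO_CONCEPT = {
    "Oat cell carcinoma of lung": "C_SCLC",
    "Carcinoma, Small Cell": "C_SCLC",
    "SCLC": "C_SCLC",
    "Lung structure": "C_LUNG",
    "Lung": "C_LUNG",
    "Pulmonary": "C_LUNG",
}

# Hypothetical term-level assertions, e.g. from two different source vocabularies.
TERM_RELATIONS = [
    ("Oat cell carcinoma of lung", "has_finding_site", "Lung structure"),
    ("Carcinoma, Small Cell", "has_finding_site", "Lung"),
]


def aggregate(term_relations):
    """Lift term-level relations to the concept level; the set merges duplicates."""
    return {(TERM_TO_CONCEPT[a], rel, TERM_TO_CONCEPT[b]) for a, rel, b in term_relations}


print(aggregate(TERM_RELATIONS))  # {('C_SCLC', 'has_finding_site', 'C_LUNG')}
```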
Not all relationships in hierarchies are isa (1)
[Figure: Addison’s disease appears below Autoimmune Diseases, but it is only generally an autoimmune disease: Addison’s disease due to autoimmunity is one, whereas Tuberculous Addison’s disease is not.]
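Why this matters can be sketched in a few lines: if every parent link is treated as isa and closed transitively, Tuberculous Addison's disease ends up as an autoimmune disease. The link table is hypothetical and only mirrors the slide.

```python
# Hypothetical parent links mirroring the slide; the last one is not a true isa.
PARENTS = {
    "Addison's disease due to autoimmunity": ["Addison's disease"],
    "Tuberculous Addison's disease": ["Addison's disease"],
    "Addison's disease": ["Autoimmune Diseases"],  # really "is generally a"
}


def ancestors(concept):
    """All concepts reachable upward, treating every parent link as isa."""
    seen, stack = set(), list(PARENTS.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


# Transitivity through the non-isa link yields a false inference.
print("Autoimmune Diseases" in ancestors("Tuberculous Addison's disease"))  # True
```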
Not all relationships in hierarchies are isa (2)
Environment and Public Health [G03]
  Public Health [G03.850]
    Accidents [G03.850.110]
      Accident Prevention [G03.850.110.060] +
      Accidental Falls [G03.850.110.085]
      Accidents, Aviation [G03.850.110.185]
      […]
      Drowning [G03.850.110.500] +
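In MeSH the hierarchy is carried by the tree numbers: dropping the last dot-separated segment gives the parent position. A small sketch over the excerpt above shows that Accident Prevention sits under Accidents purely by position, even though it is not a kind of accident.

```python
# Tree numbers copied from the MeSH excerpt on this slide.
TREE = {
    "G03": "Environment and Public Health",
    "G03.850": "Public Health",
    "G03.850.110": "Accidents",
    "G03.850.110.060": "Accident Prevention",
    "G03.850.110.085": "Accidental Falls",
    "G03.850.110.185": "Accidents, Aviation",
    "G03.850.110.500": "Drowning",
}


def parent(tree_number):
    """The parent tree number drops the last dot-separated segment."""
    return tree_number.rsplit(".", 1)[0] if "." in tree_number else None


def children(tree_number):
    """Names of all headings positioned directly under tree_number."""
    return sorted(TREE[t] for t in TREE if parent(t) == tree_number)


print(children("G03.850.110"))
# ['Accident Prevention', 'Accidental Falls', 'Accidents, Aviation', 'Drowning']
```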
Defining consistency
The SN relation and the Meta relation must have the same direction
The SN relation and the Meta relation must be of the same type (both hierarchical or both associative)
The Meta relation must be the same as the SN relation or one of its descendants
[Figure: the consistency square, with Concept 1 and Concept 2 related in the Metathesaurus and categorized by Semantic Type a and Semantic Type b, which are related in the Semantic Network.]
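The three criteria can be sketched as a single check. This assumes one semantic type per concept, a hypothetical fragment of the SN relation hierarchy (treats below affects), and that only isa is hierarchical; comparing the concepts' types with the SN type pair in order covers the direction criterion. Drug X and Disease Y are hypothetical concepts.

```python
# Hypothetical fragment of the SN relation hierarchy (child relation -> parent).
REL_PARENT = {"treats": "affects"}
# Relations treated as hierarchical; everything else counts as associative here.
HIERARCHICAL = {"isa"}

# Hypothetical categorization: one semantic type per concept, for simplicity.
ST_OF = {
    "Pneumonia": "Disease or Syndrome",
    "Lung": "Body Part, Organ, or Organ Component",
    "Drug X": "Pharmacologic Substance",
    "Disease Y": "Disease or Syndrome",
}


def is_same_or_descendant(meta_rel, sn_rel):
    """True if meta_rel equals sn_rel or lies below it in REL_PARENT."""
    rel = meta_rel
    while rel is not None:
        if rel == sn_rel:
            return True
        rel = REL_PARENT.get(rel)
    return False


def consistent(meta, sn):
    """meta = (concept1, rel, concept2); sn = (type_a, rel, type_b)."""
    c1, meta_rel, c2 = meta
    st_a, sn_rel, st_b = sn
    same_direction = ST_OF[c1] == st_a and ST_OF[c2] == st_b
    same_kind = (meta_rel in HIERARCHICAL) == (sn_rel in HIERARCHICAL)
    return same_direction and same_kind and is_same_or_descendant(meta_rel, sn_rel)


SN_LOC = ("Disease or Syndrome", "has_location", "Body Part, Organ, or Organ Component")
print(consistent(("Pneumonia", "has_location", "Lung"), SN_LOC))  # True
print(consistent(("Lung", "has_location", "Pneumonia"), SN_LOC))  # False: reversed
print(consistent(("Drug X", "treats", "Disease Y"),
                 ("Pharmacologic Substance", "affects", "Disease or Syndrome")))  # True
```

A fuller version would also accept concepts whose type is a descendant of the SN type, since SN relations are inherited.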
Examples of consistent relations
[Figure: in the Metathesaurus, Pneumonia has_location Lung; in the Semantic Network, Disease or Syndrome has_location Body Part, Organ, or Organ Component.]
Examples of consistent relations
[Figure: the consistency square with treats as the Metathesaurus relation and affects as the Semantic Network relation; the pair is consistent because treats is a descendant of affects.]
What does inconsistency mean?
The consistency “square” revisited
[Figure: two consistency squares with question marks on the Semantic Network relation, the categorization links and the Metathesaurus relation.]
What does inconsistency mean?
Inaccurate/missing Semantic Network relation
Inaccurate (/missing?) categorization
Inaccurate Metathesaurus relation
Testing consistency
(A) Consistency of associative relations
[McCray & Bodenreider, 2002]
Results
6894 pairs of related concepts
  4496 (65%): a SN relation can be inferred unambiguously
    Validity confirmed in 1981 cases
    2515 not labeled in the Metathesaurus
  1491 (22%): multiple possible SN relationships; multiple possible Metathesaurus relationships
  907 (13%): inconsistent SN/Meta relationships
    372: no SN relationship between the STs
    415: inconsistent SN/Meta relationship type (REL)
    120: inconsistent SN/Meta relationship attribute (RELA)
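A quick arithmetic check that the breakdown is internally consistent: each level adds up to the level above it, and the rounded percentages match the ones reported. All numbers are copied from the results.

```python
TOTAL = 6894
GROUPS = {
    "SN relation inferred unambiguously": 4496,
    "multiple possible relations": 1491,
    "inconsistent SN/Meta relations": 907,
}
SUBGROUPS = {
    "SN relation inferred unambiguously": [1981, 2515],  # validated, unlabeled
    "inconsistent SN/Meta relations": [372, 415, 120],  # no SN rel., REL, RELA
}

# Each level of the breakdown should add up to the level above it.
assert sum(GROUPS.values()) == TOTAL
for name, parts in SUBGROUPS.items():
    assert sum(parts) == GROUPS[name], name

percent = {name: round(100 * n / TOTAL) for name, n in GROUPS.items()}
print(percent)  # 65, 22 and 13 percent respectively
```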
(B) Consistency of hierarchical relations
Relations used
  SN: isa
  Categorization: isa
  Metathesaurus: PAR/CHD + RB/RN
Hypothesis
  ➊ For a pair (ST, C), the concepts categorized by ST (and its descendants) correspond to the descendants of the concept C
  ➋ In the set of descendants of C, the expected STs are the ST of C (and its descendants)
ST-based classes vs. descendants ➊
Semantic type: list of all concepts having this semantic type
Concept: list of all descendants
Comparing the 2 sets: intersection of the 2 sets
[Bodenreider & Burgun, 2004]
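Hypothesis ➊ amounts to comparing two sets. A sketch on hypothetical toy data (X1, X2, X3 and Y1 are placeholder concepts, not the slide's examples): the ST-based class collects concepts categorized by the type or its subtypes, the descendant set follows PAR/CHD and RB/RN links, and the two differences point at candidate errors. Counting the concept itself among its descendants is an assumption.

```python
# Hypothetical toy data: concept -> semantic types, and parent -> children links
# standing in for PAR/CHD and RB/RN relations in the Metathesaurus.
CONCEPT_STS = {
    "Amphibia": {"Amphibian"},
    "X1": {"Amphibian"},
    "X2": {"Amphibian"},
    "X3": {"Pharmacologic Substance"},
    "Y1": {"Amphibian"},
}
CHILDREN = {"Amphibia": ["X1", "X3"], "X1": ["X2"]}
ST_CHILDREN = {}  # SN isa links between semantic types (none needed here)


def closure(start, links):
    """start plus everything reachable from it through the child lists in links."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in links.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def compare(st, concept):
    """Return (in both, descendants only, ST class only)."""
    sts = closure(st, ST_CHILDREN)
    st_class = {c for c, types in CONCEPT_STS.items() if types & sts}
    desc = closure(concept, CHILDREN)  # assumption: concept counted with its descendants
    return st_class & desc, desc - st_class, st_class - desc


common, only_desc, only_st = compare("Amphibian", "Amphibia")
print(sorted(common), sorted(only_desc), sorted(only_st))
# ['Amphibia', 'X1', 'X2'] ['X3'] ['Y1']
```

X3 (a descendant with an unexpected type) suggests a miscategorization or a wrong hierarchical relation; Y1 (in the class but not in the hierarchy) suggests a missing hierarchical relation or a miscategorization.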
Analyzing inconsistencies
[Figure: the semantic type Amphibian categorizes 1135 concepts and the concept Amphibia has 1126 descendants; 1124 concepts are in common. The discrepancies involve concepts such as Tadpole, Invertebrate, Toad licking (Pharmacologic Substance), Rana unclassified, Class Reptilia and Amphibians and Reptiles, and are explained as miscategorization (in one case tentatively), wrong hierarchical relations or missing hierarchical relations.]
Semantic types of descendants ➋
Concept: set of all descendants
Distribution of semantic types in the set
Allowable STs: ST of C and its descendants (strict), or ST from the same semantic group (loose)
[Mougin & Bodenreider, 2005]
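A sketch of hypothesis ➋: build a weighted distribution of semantic types over a concept's descendants and measure the share that falls outside the allowable types. The 1/k weighting for a descendant with k semantic types is an assumption (it is consistent with the fractional counts in the record that follows, but not stated on the slide), and the descendants D1 to D3 are hypothetical.

```python
from collections import defaultdict


def st_distribution(descendant_sts):
    """Weighted count of semantic types over a set of descendants.

    Assumption: a descendant with k semantic types adds 1/k to each of them.
    """
    dist = defaultdict(float)
    for sts in descendant_sts.values():
        for st in sts:
            dist[st] += 1 / len(sts)
    return dict(dist)


def incompatible_share(dist, allowed):
    """Fraction of the weighted descendants whose type is not allowed."""
    total = sum(dist.values())
    return sum(w for st, w in dist.items() if st not in allowed) / total


# Hypothetical descendants of a concept whose own type is neop.
DESCENDANTS = {"D1": {"neop"}, "D2": {"neop", "dsyn"}, "D3": {"dsyn"}}
dist = st_distribution(DESCENDANTS)
print(dist)
print(incompatible_share(dist, {"neop"}))          # strict: 0.5
print(incompatible_share(dist, {"neop", "dsyn"}))  # loose (both in DISO): 0.0
```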
Analyzing inconsistencies
26,584 concepts studied
59% of their descendants have a semantic type incompatible with that of the original concept
[Figure: example involving the concepts Hostility and Reaction belligerent and the semantic types Mental Process and Finding.]
Analyzing inconsistencies
# ------------------------------------------------------------
# C0597249 Neoplasm of placenta (disorder) (neop)
# * B: 190
C0597249|ST|acab| 5.50|incp
C0597249|ST|anab| 1.50|incp
C0597249|ST|cgab| 76.50|incp
C0597249|ST|dsyn| 27.50|incp
C0597249|ST|inpo| 1.00|incp
C0597249|ST|neop| 76.50|comp
C0597249|ST|patf| 1.50|incp
C0597249|SG|DISO| 190.00|comp
# ------------------------------------------------------------
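The record for C0597249 can be reproduced from its weights. Under the strict criterion only neop (the concept's own type, assumed here to have no SN subtypes) is compatible; under the loose criterion every type listed belongs to the Disorders (DISO) semantic group, so the group line is compatible. Only the DISO members needed for this record are listed.

```python
# Weights copied from the record for C0597249 (B: 190 descendants).
WEIGHTS = {"acab": 5.5, "anab": 1.5, "cgab": 76.5, "dsyn": 27.5,
           "inpo": 1.0, "neop": 76.5, "patf": 1.5}

# Strict: the concept's own type (neop) and its SN descendants, assumed to be neop alone.
STRICT = {"neop"}
# Loose: types in the Disorders (DISO) semantic group; only the ones needed here.
DISO = {"acab", "anab", "cgab", "dsyn", "inpo", "neop", "patf"}


def label(st, allowed):
    """comp(atible) if st is allowed, otherwise incp (incompatible)."""
    return "comp" if st in allowed else "incp"


rows = [f"C0597249|ST|{st}| {w:.2f}|{label(st, STRICT)}" for st, w in WEIGHTS.items()]
total = sum(WEIGHTS.values())
group_ok = all(st in DISO for st in WEIGHTS)
rows.append(f"C0597249|SG|DISO| {total:.2f}|{'comp' if group_ok else 'incp'}")
print("\n".join(rows))

strict_share = sum(w for st, w in WEIGHTS.items() if st in STRICT) / total
print(f"strict compatible share: {strict_share:.0%}")  # 40%
```

The printed rows match the data lines of the record, and they make the gap explicit: only about 40% of the weighted descendants are compatible under the strict criterion, against 100% under the loose one.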
Suggestions
Aligning SN and Meta relationships
54 types of SN relationships
106 additional types of Metathesaurus relationships
  Some are simply synonymous (caused_by / due_to; follows / temporally_follows)
  Some are specialized relationships (manifestation_of / definitional_manifestation_of)
  Many types of mapping relationships, not in SN
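An alignment table of this kind could be as simple as a lookup from Metathesaurus RELA to SN relation, tagged with the kind of correspondence. This sketch reads each pair on the slide as SN relation / Metathesaurus relation; mapped_to is used only as an example of a mapping relation with no SN counterpart.

```python
# Metathesaurus RELA -> (SN relation, kind of correspondence), from the slide's pairs.
META_TO_SN = {
    "due_to": ("caused_by", "synonym"),
    "temporally_follows": ("follows", "synonym"),
    "definitional_manifestation_of": ("manifestation_of", "specialization"),
}


def align(rela):
    """Return (SN relation, kind) for a RELA, or None when the SN has no counterpart."""
    return META_TO_SN.get(rela)


print(align("due_to"))     # ('caused_by', 'synonym')
print(align("mapped_to"))  # None: a mapping relation not in this table
```

A specialization can be treated as a descendant of its SN relation, which is what the third consistency criterion requires.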
Add classification information to SN
Explicit classificatory principles (in addition to textual definitions and examples)
Abandon the economy principle and return to a JEPD (jointly exhaustive, pairwise disjoint) approach
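What JEPD requires of a set of sibling types can be stated as a check on their extensions: no instance in two siblings, and every instance of the parent in some sibling. A sketch with hypothetical classes and instances:

```python
def is_jepd(parent_members, children):
    """True if the child classes are pairwise disjoint and jointly exhaustive.

    parent_members: set of instances of the parent class.
    children: mapping from child class name to its set of instances.
    """
    names = list(children)
    pairwise_disjoint = all(
        not (children[a] & children[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )
    jointly_exhaustive = set().union(*children.values()) == parent_members
    return pairwise_disjoint and jointly_exhaustive


# Hypothetical classes and instances.
print(is_jepd({"a", "b", "c"}, {"T1": {"a"}, "T2": {"b", "c"}}))  # True
print(is_jepd({"a", "b", "c"}, {"T1": {"a", "b"}, "T2": {"b"}}))  # False
```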
Metathesaurus editing environment
Use SN/Meta relation consistency as a constraint for assigning semantic types
Use SN relations to suggest labels for unspecified Meta relations
Use SN/Meta relation consistency to guide the review by the Metathesaurus editors
  Inaccurate categorization?
  Inaccurate Metathesaurus relation?
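Suggesting labels for unspecified Meta relations could work as in the results above: propose a label only when the SN allows exactly one relation between the two semantic types, and otherwise leave the pair to the editors. The SN_RELATIONS table below is a hypothetical stand-in for the fully expanded Semantic Network relations.

```python
# Hypothetical SN relations between semantic type pairs (fully expanded form).
SN_RELATIONS = {
    ("Disease or Syndrome", "Body Part, Organ, or Organ Component"): {"has_location"},
    ("Pharmacologic Substance", "Disease or Syndrome"): {"treats", "prevents"},
}


def suggest_label(st1, st2):
    """Suggest a label for an unlabeled Meta relation when the SN allows exactly one."""
    candidates = SN_RELATIONS.get((st1, st2), set())
    if len(candidates) == 1:
        return next(iter(candidates))
    return None  # none or several: leave to the Metathesaurus editors


print(suggest_label("Disease or Syndrome", "Body Part, Organ, or Organ Component"))
print(suggest_label("Pharmacologic Substance", "Disease or Syndrome"))  # None
```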
Conclusions
Conclusions
Simultaneously
  Improve SN
  Improve categorization
ST assignment can be automated in part
Some references
McCray AT, Bodenreider O. A conceptual framework for the biomedical domain. In: Green R, Bean CA, Myaeng SH, editors. The semantics of relationships: an interdisciplinary perspective. Boston: Kluwer Academic Publishers; 2002. p. 181-198.
Bodenreider O, Burgun A. Aligning knowledge sources in the UMLS: methods, quantitative results, and applications. Medinfo 2004:327-331.
Burgun A, Bodenreider O. Aspects of the taxonomic relation in the biomedical domain. In: Welty C, Smith B, editors. Collected papers from the Second International Conference "Formal Ontology in Information Systems". ACM Press; 2001. p. 222-233.
Mougin F, Bodenreider O. Approaches to eliminating cycles in the UMLS Metathesaurus: naive vs. formal. Proceedings of AMIA Annual Symposium 2005 (submitted).
Medical Ontology Research
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA
Contact: [email protected]
Web: mor.nlm.nih.gov