Consistency between Metathesaurus and Semantic Network

Consistency between Metathesaurus and Semantic Network. Workshop on The Future of the UMLS Semantic Network, NLM, April 8, 2005. Olivier Bodenreider, Lister Hill National Center for Biomedical Communications, Bethesda, Maryland - USA.


Page 1: Consistency between Metathesaurus and Semantic Network

Consistency between Metathesaurus and Semantic Network

Workshop on The Future of the UMLS Semantic Network

NLM, April 8, 2005

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Page 2: Consistency between Metathesaurus and Semantic Network

Overview

Defining consistency
What does inconsistency mean?
Testing consistency
  Comparing Metathesaurus relations to SN relations
  Aligning Metathesaurus concepts and semantic types
  Semantic type distribution of sets of descendants of Metathesaurus concepts
Suggestions
  Enforcement mechanism
  Ontology of relationships
  CVF

Page 3: Consistency between Metathesaurus and Semantic Network

Two levels in the UMLS

Page 4: Consistency between Metathesaurus and Semantic Network

The UMLS: a two-level structure

[Figure: two-level diagram. Metathesaurus level: Concept 1, Concept 2. Semantic Network level: Semantic Type a, Semantic Type b, Semantic Type c.]

Page 5: Consistency between Metathesaurus and Semantic Network

[Figure: two-level diagram. Metathesaurus concepts: Heart, Esophagus, Left Phrenic Nerve, Heart Valves, Fetal Heart, Mediastinum, Saccular Viscus, Angina Pectoris, Cardiotonic Agents, Tissue Donors. Semantic Network semantic types: Anatomical Structure; Fully Formed Anatomical Structure; Embryonic Structure; Body Part, Organ or Organ Component; Pharmacologic Substance; Disease or Syndrome; Population Group. Numeric labels in the figure: 22, 225, 97, 4, 12, 9, 31.]

Page 6: Consistency between Metathesaurus and Semantic Network

Relationships can inherit semantics

[Figure: two-level diagram. Semantic Network: Disease or Syndrome; Pathologic Function; Biologic Function; Body Part, Organ, or Organ Component; Fully Formed Anatomical Structure; links labeled isa and location of. Metathesaurus: Adrenal Cortex and Adrenal Cortical hypofunction, joined by a link labeled location of.]

Page 7: Consistency between Metathesaurus and Semantic Network

Defining consistency

Page 8: Consistency between Metathesaurus and Semantic Network

The consistency “square”

[Figure: Concept 1 and Concept 2 in the Metathesaurus; Semantic Type a and Semantic Type b in the Semantic Network.]

Page 9: Consistency between Metathesaurus and Semantic Network

The categorization link

[Figure: Semantic Network: Professional Society, Organism, Bacterium. Metathesaurus: Salmonella, American Medical Association. Link labels: isa, "is an instance of".]

Page 10: Consistency between Metathesaurus and Semantic Network

Semantic Network relations

54 types of relationships
558 asserted relations (SRSTR)
6703 fully expanded relations (SRSTRE*)

[Figure: Semantic Network: Disease or Syndrome; Pathologic Function; Biologic Function; Body Part, Organ, or Organ Component; Fully Formed Anatomical Structure; links labeled isa and location of.]
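The step from asserted to fully expanded relations can be illustrated as inheritance along isa links. The sketch below is a minimal illustration on a toy hierarchy built from the semantic types named on this slide; where the asserted `location_of` link sits is an assumption made for the example, not something read from SRSTR, and the function names are hypothetical.

```python
# Toy isa hierarchy (child -> parent), using semantic types named on the slide.
ISA = {
    "Disease or Syndrome": "Pathologic Function",
    "Pathologic Function": "Biologic Function",
    "Body Part, Organ, or Organ Component": "Fully Formed Anatomical Structure",
}

# Assumed asserted relation (illustrative placement, not read from SRSTR).
ASSERTED = [
    ("Fully Formed Anatomical Structure", "location_of", "Biologic Function"),
]


def descendants_or_self(st):
    """Return st plus every semantic type below it in the isa hierarchy."""
    found = {st}
    changed = True
    while changed:
        changed = False
        for child, parent in ISA.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def expand(asserted):
    """Inherit each asserted relation to all descendants of both arguments."""
    return {
        (s, rel, o)
        for subj, rel, obj in asserted
        for s in descendants_or_self(subj)
        for o in descendants_or_self(obj)
    }


expanded = expand(ASSERTED)
```

In this toy case one asserted relation yields six expanded ones (two possible subjects times three possible objects), which gives a feel for how a few hundred asserted relations can grow into several thousand.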

Page 11: Consistency between Metathesaurus and Semantic Network

Metathesaurus relations

REL vs. RELA
Not always labeled
106 additional types of relationships
~7 M symbolic relations

[Figure: Metathesaurus concepts Heart, Esophagus, Left Phrenic Nerve, Heart Valves, Fetal Heart, Mediastinum, Saccular Viscus, Angina Pectoris, Cardiotonic Agents, Tissue Donors. Numeric labels in the figure: 22, 225, 97, 4, 12, 9, 31.]

Page 12: Consistency between Metathesaurus and Semantic Network

Metathesaurus relations

Recorded
  at the term level: from source vocabularies
  at the concept level: from Metathesaurus editors
Aggregated at the concept level

[Figure, shown twice: the terms Oat cell carcinoma of lung / Carcinoma, Small Cell / SCLC linked by has_finding_site to the terms Lung structure / Lung / Pulmonary.]

Page 13: Consistency between Metathesaurus and Semantic Network

Not all relationships in hierarchies are isa (1)

[Figure: hierarchy with Autoimmune Diseases, Addison’s disease, Addison’s disease due to autoimmunity, Tuberculous Addison’s disease; one link labeled "is generally a".]

Page 14: Consistency between Metathesaurus and Semantic Network

Not all relationships in hierarchies are isa (2)

Environment and Public Health [G03]
  Public Health [G03.850]
    Accidents [G03.850.110]
      Accident Prevention [G03.850.110.060] +
      Accidental Falls [G03.850.110.085]
      Accidents, Aviation [G03.850.110.185]
      […]
      Drowning [G03.850.110.500] +

Page 15: Consistency between Metathesaurus and Semantic Network

Defining consistency

SN rel. and Meta rel. must have the same direction
SN rel. and Meta rel. must be of the same type (both hierarchical or associative)
Meta rel. must be the same as SN rel. or one of its descendants

[Figure: Concept 1 and Concept 2 in the Metathesaurus; Semantic Type a and Semantic Type b in the Semantic Network.]
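The three conditions can be read as a single predicate over one Metathesaurus relation and one Semantic Network relation. The following is a minimal sketch under stated assumptions: the relationship hierarchy `REL_PARENT`, the `Relation` class and every function name are hypothetical, and "same direction" is interpreted here as the SN relation running from a semantic type of the source concept to a semantic type of the target concept.

```python
from dataclasses import dataclass

# Hypothetical relationship hierarchy (child -> parent), for illustration only.
REL_PARENT = {"treats": "affects", "affects": "functionally_related_to"}
HIERARCHICAL = {"isa"}


@dataclass(frozen=True)
class Relation:
    source: str
    label: str
    target: str


def is_same_or_descendant(label, ancestor):
    """True if label equals ancestor or lies below it in REL_PARENT."""
    while label is not None:
        if label == ancestor:
            return True
        label = REL_PARENT.get(label)
    return False


def kind(label):
    """Classify a relationship label as hierarchical or associative."""
    return "hierarchical" if label in HIERARCHICAL else "associative"


def is_consistent(meta, sn, st_of):
    """Check the three conditions; st_of maps a concept to its semantic types."""
    same_direction = sn.source in st_of[meta.source] and sn.target in st_of[meta.target]
    same_kind = kind(meta.label) == kind(sn.label)
    compatible_label = is_same_or_descendant(meta.label, sn.label)
    return same_direction and same_kind and compatible_label


st_of = {"Concept 1": {"Semantic Type a"}, "Concept 2": {"Semantic Type b"}}
sn = Relation("Semantic Type a", "affects", "Semantic Type b")
ok = is_consistent(Relation("Concept 1", "treats", "Concept 2"), sn, st_of)
reversed_ok = is_consistent(Relation("Concept 2", "treats", "Concept 1"), sn, st_of)
```

Here `ok` holds because `treats` sits below `affects` in the toy hierarchy, while the reversed relation fails the direction condition.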

Page 16: Consistency between Metathesaurus and Semantic Network

Examples of consistent relations

[Figure: Metathesaurus concepts Lung and Pneumonia; semantic types Body Part, Organ, or Organ Component and Disease or Syndrome; both levels carry a link labeled has_location.]

Page 17: Consistency between Metathesaurus and Semantic Network

Examples of consistent relations

[Figure: Concept 1 and Concept 2 in the Metathesaurus; Semantic Type a and Semantic Type b in the Semantic Network; link labels: affects, treats.]

Page 18: Consistency between Metathesaurus and Semantic Network

What does inconsistency mean?

Page 19: Consistency between Metathesaurus and Semantic Network

The consistency “square” revisited

[Figure: the consistency square shown twice (Concept 1 and Concept 2 in the Metathesaurus; Semantic Type a and Semantic Type b in the Semantic Network), with question marks on some links.]

Page 20: Consistency between Metathesaurus and Semantic Network

What does inconsistency mean?

Inaccurate/missing Semantic Network relation
Inaccurate (/missing?) categorization
Inaccurate Metathesaurus relation

Page 21: Consistency between Metathesaurus and Semantic Network

Testing consistency

Page 22: Consistency between Metathesaurus and Semantic Network

(A) Consistency of associative relations

[McCray & Bodenreider, 2002]

Page 23: Consistency between Metathesaurus and Semantic Network

Results

6894 pairs of related concepts
  4496 (65%): a SN relation can be inferred unambiguously
    Validity confirmed in 1981 cases
    2515 not labeled in the Metathesaurus
  1491 (22%): multiple possible SN relationships
    multiple possible Metathesaurus relationships
  907 (13%): inconsistency SN/Meta relationships
    372: no SN relationship between the STs
    415: inconsistent SN/Meta relationship type (REL)
    120: inconsistent SN/Meta relationship attribute (RELA)
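The counts on this slide can be checked against one another. The short sketch below only re-derives the percentages and subtotals from the numbers given above; the dictionary keys are shorthand chosen for the example.

```python
total_pairs = 6894

# The three top-level outcomes reported on the slide.
groups = {"unambiguous": 4496, "multiple": 1491, "inconsistent": 907}

# Breakdown of the unambiguous and inconsistent groups.
unambiguous_parts = {"validity_confirmed": 1981, "not_labeled": 2515}
inconsistent_parts = {"no_sn_relation": 372, "rel_mismatch": 415, "rela_mismatch": 120}

# Rounded percentage of each group relative to all pairs.
shares = {name: round(100 * n / total_pairs) for name, n in groups.items()}

groups_add_up = sum(groups.values()) == total_pairs
unambiguous_adds_up = sum(unambiguous_parts.values()) == groups["unambiguous"]
inconsistent_adds_up = sum(inconsistent_parts.values()) == groups["inconsistent"]
```

All three sums match, and the rounded shares come out as 65, 22 and 13 percent.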

Page 24: Consistency between Metathesaurus and Semantic Network

(B) Consistency of hierarchical relations

Relations used
  SN: isa
  Categorization: isa
  Metathesaurus: PAR/CHD + RB/RN

Hypothesis
  For a pair of (ST, C), the concepts categorized by ST (and its descendants) correspond to the descendants of the concept C
  In the set of descendants of C, expected STs are the ST of C (and its descendants)

Page 25: Consistency between Metathesaurus and Semantic Network

ST-based classes vs. descendants

Semantic type
  List of all concepts having this semantic type
Concept
  List of all descendants
Comparing the 2 sets
  Intersection of the 2 sets

[Bodenreider & Burgun, 2004]
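The comparison described here reduces to plain set operations. A minimal sketch, assuming both lists have already been extracted as sets of concept identifiers; the identifiers and the function name are hypothetical placeholders.

```python
def compare(st_concepts, descendants):
    """Compare the concepts of a semantic type with the descendants of a concept."""
    return {
        "in_common": st_concepts & descendants,
        "only_in_st_class": st_concepts - descendants,
        "only_in_descendants": descendants - st_concepts,
    }


# Hypothetical identifiers, for illustration only.
st_concepts = {"c1", "c2", "c3", "c4"}
descendants = {"c2", "c3", "c4", "c5"}

result = compare(st_concepts, descendants)
```

The two difference sets are the candidates for review: a concept only in the semantic-type class may lack a hierarchical relation, and a descendant outside the class may be miscategorized or wrongly placed in the hierarchy.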

Page 26: Consistency between Metathesaurus and Semantic Network

Analyzing inconsistencies

[Figure: semantic type Amphibian (1135 concepts) compared with the concept Amphibia (1126 descendants); 1124 in common. Other labels in the figure: Tadpole, Invertebrate, Toad licking, Pharmacologic Substance, Rana unclassified, Class Reptilia, Amphibians and Reptiles. Annotations: Miscategorization (?), Wrong hierarchical relation, Missing hierarchical relation, Miscategorization.]

Page 27: Consistency between Metathesaurus and Semantic Network

Semantic types of descendants

Concept
  Set of all descendants
Distribution of semantic types in the set
  Allowable STs: ST of C and its descendants (strict) or ST from the same semantic group (loose)

[Mougin & Bodenreider, 2005]
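The strict and loose criteria can be sketched as a three-way classification of each descendant's semantic type. Everything in the toy data below is hypothetical: the type names, the isa links between them and the group assignments are placeholders, not UMLS content.

```python
from collections import Counter

# Hypothetical toy data: isa links between semantic types (child -> parent)
# and the semantic group of each type. None of these names come from the UMLS.
ST_PARENT = {"st_child": "st_root", "st_root": "st_top", "st_sibling": "st_top"}
ST_GROUP = {
    "st_top": "G1",
    "st_root": "G1",
    "st_child": "G1",
    "st_sibling": "G1",
    "st_other": "G2",
}


def is_same_or_descendant(st, ancestor):
    """True if st equals ancestor or lies below it in ST_PARENT."""
    while st is not None:
        if st == ancestor:
            return True
        st = ST_PARENT.get(st)
    return False


def classify(st, concept_st):
    """Label a descendant's semantic type relative to the concept's own type."""
    if is_same_or_descendant(st, concept_st):
        return "strict"
    if ST_GROUP[st] == ST_GROUP[concept_st]:
        return "loose"
    return "incompatible"


def distribution(concept_st, descendant_sts):
    """Count how many descendant semantic types fall in each category."""
    return Counter(classify(st, concept_st) for st in descendant_sts)


counts = distribution("st_root", ["st_root", "st_child", "st_sibling", "st_other", "st_child"])
```

A type that passes the strict test is allowable under both criteria; "loose" here marks types that are allowable only through the shared semantic group.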

Page 28: Consistency between Metathesaurus and Semantic Network

Analyzing inconsistencies

26,584 concepts studied
59% of their descendants have a semantic type incompatible with that of the original concept

[Figure: example with the labels Reaction belligerent, Finding, Hostility, Mental Process.]

Page 29: Consistency between Metathesaurus and Semantic Network

Analyzing inconsistencies

# ------------------------------------------------------------
# C0597249 Neoplasm of placenta (disorder) (neop)
# * B: 190

C0597249|ST|acab| 5.50|incp
C0597249|ST|anab| 1.50|incp
C0597249|ST|cgab| 76.50|incp
C0597249|ST|dsyn| 27.50|incp
C0597249|ST|inpo| 1.00|incp
C0597249|ST|neop| 76.50|comp
C0597249|ST|patf| 1.50|incp

C0597249|SG|DISO| 190.00|comp
# ------------------------------------------------------------
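The listing on this slide is pipe-delimited and can be summarized with a few lines of code. The field meanings assumed below (concept identifier, level ST or SG, type or group abbreviation, weighted count, comp/incp flag) are a reading of the listing, since the slide does not document them; the function and variable names are hypothetical.

```python
LISTING = """\
# C0597249 Neoplasm of placenta (disorder) (neop)
C0597249|ST|acab| 5.50|incp
C0597249|ST|anab| 1.50|incp
C0597249|ST|cgab| 76.50|incp
C0597249|ST|dsyn| 27.50|incp
C0597249|ST|inpo| 1.00|incp
C0597249|ST|neop| 76.50|comp
C0597249|ST|patf| 1.50|incp
C0597249|SG|DISO| 190.00|comp
"""


def parse(listing):
    """Split each data line into (cui, level, code, count, status)."""
    rows = []
    for line in listing.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cui, level, code, count, status = line.split("|")
        rows.append((cui, level, code, float(count), status))
    return rows


rows = parse(LISTING)
st_rows = [r for r in rows if r[1] == "ST"]

# Assumed reading: counts are weighted, so ST rows sum to the group total.
st_total = sum(r[3] for r in st_rows)
incompatible = sum(r[3] for r in st_rows if r[4] == "incp")
incompatible_share = incompatible / st_total
```

Under this reading the semantic-type rows sum to 190, matching both the "B: 190" line and the DISO row, and 113.5 of the 190 (about 60%) are flagged incompatible at the semantic-type level while the whole set is compatible at the semantic-group level.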

Page 30: Consistency between Metathesaurus and Semantic Network

Suggestions

Page 31: Consistency between Metathesaurus and Semantic Network

Aligning SN and Meta relationships

54 types of SN relationships
106 additional types of Metathesaurus relationships
  Some are simply synonymous (caused_by / due_to; follows / temporally_follows)
  Some are specialized relationships (manifestation_of / definitional_manifestation_of)
  Many types of mapping relationships, not in SN

Page 32: Consistency between Metathesaurus and Semantic Network

Add classification information to SN

Explicit classificatory principles (in addition to textual definition and examples)
Abandon economy principle and return to JEPD (jointly exhaustive/pairwise disjoint) approach

Page 33: Consistency between Metathesaurus and Semantic Network

Metathesaurus editing environment

Use SN/Meta relation consistency as a constraint for assigning semantic types
Use SN relations to suggest labels for unspecified Meta relations
Use SN/Meta relation consistency to guide the review by the Metathesaurus editors
  Inaccurate categorization?
  Inaccurate Metathesaurus relation?
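The second suggestion, using SN relations to propose labels for unlabeled Metathesaurus relations, could work along the following lines. This is a sketch under stated assumptions: `SN_RELATIONS` is a hypothetical stand-in for the fully expanded SN relations, and a label is proposed only when exactly one candidate exists.

```python
# Hypothetical stand-in for fully expanded SN relations:
# (source semantic type, relation, target semantic type).
SN_RELATIONS = {
    ("Semantic Type a", "treats", "Semantic Type b"),
    ("Semantic Type a", "causes", "Semantic Type c"),
    ("Semantic Type a", "affects", "Semantic Type c"),
}


def candidate_labels(source_sts, target_sts):
    """List every SN relation that links a source type to a target type."""
    return sorted(
        {rel for s, rel, t in SN_RELATIONS if s in source_sts and t in target_sts}
    )


def suggest_label(source_sts, target_sts):
    """Return a label only when the SN leaves exactly one possibility."""
    candidates = candidate_labels(source_sts, target_sts)
    return candidates[0] if len(candidates) == 1 else None


unique = suggest_label({"Semantic Type a"}, {"Semantic Type b"})
ambiguous = suggest_label({"Semantic Type a"}, {"Semantic Type c"})
```

Ambiguous or empty candidate lists are left for the editors rather than resolved automatically, in line with the review step suggested on the slide.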

Page 34: Consistency between Metathesaurus and Semantic Network

Conclusions

Page 35: Consistency between Metathesaurus and Semantic Network

Conclusions

Simultaneously
  Improve SN
  Improve categorization
ST assignment can be automated in part

Page 36: Consistency between Metathesaurus and Semantic Network

Some references

McCray AT, Bodenreider O. A conceptual framework for the biomedical domain. In: Green R, Bean CA, Myaeng SH, editors. The semantics of relationships: an interdisciplinary perspective. Boston: Kluwer Academic Publishers; 2002. p. 181-198.

Bodenreider O, Burgun A. Aligning knowledge sources in the UMLS: Methods, quantitative results, and applications. Medinfo 2004:327-331.

Page 37: Consistency between Metathesaurus and Semantic Network

Some references

Burgun A, Bodenreider O. Aspects of the taxonomic relation in the biomedical domain. In: Welty C, Smith B, editors. Collected papers from the Second International Conference "Formal Ontology in Information Systems": ACM Press; 2001. p. 222-233.

Mougin F, Bodenreider O. Approaches to eliminating cycles in the UMLS Metathesaurus: Naive vs. formal. Proceedings of AMIA Annual Symposium 2005:(submitted).

Page 38: Consistency between Metathesaurus and Semantic Network

Medical Ontology Research

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Contact: [email protected]
Web: mor.nlm.nih.gov