Consistency between Metathesaurus and Semantic Network

Workshop on The Future of the UMLS Semantic Network

NLM, April 8, 2005

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Overview

Defining consistency
What does inconsistency mean?
Testing consistency
  Comparing Metathesaurus relations to SN relations
  Aligning Metathesaurus concepts and semantic types
  Semantic type distribution of sets of descendants of Metathesaurus concepts
Suggestions
  Enforcement mechanism
  Ontology of relationships
  CVF

Two levels in the UMLS

The UMLS: a two-level structure

[Figure: the two levels of the UMLS. Concept 1 and Concept 2 sit in the Metathesaurus; Semantic Type a, Semantic Type b and Semantic Type c sit in the Semantic Network.]

[Figure: Metathesaurus concepts (Heart, Esophagus, Left Phrenic Nerve, Heart Valves, Fetal Heart, Mediastinum, Saccular Viscus, Angina Pectoris, Cardiotonic Agents, Tissue Donors) shown with the Semantic Network types Anatomical Structure, Fully Formed Anatomical Structure, Embryonic Structure, Body Part, Organ or Organ Component, Pharmacologic Substance, Disease or Syndrome, and Population Group. Numeric labels in the figure: 22, 225, 97, 4, 12, 9, 31.]

Relationships can inherit semantics

[Figure: in the Metathesaurus, Adrenal Cortex and Adrenal cortical hypofunction. In the Semantic Network, Body Part, Organ, or Organ Component isa Fully Formed Anatomical Structure, and Disease or Syndrome isa Pathologic Function isa Biologic Function. The relation location of is drawn at both levels.]

Defining consistency

The consistency “square”

[Figure: Concept 1 and Concept 2 in the Metathesaurus, Semantic Type a and Semantic Type b in the Semantic Network.]

The categorization link

[Figure: in the Semantic Network, Bacterium isa Organism, alongside Professional Society. In the Metathesaurus, Salmonella and American Medical Association. The categorization links are labeled isa and is an instance of.]

Semantic Network relations

54 types of relationships
558 asserted relations (SRSTR)
6703 fully expanded relations (SRSTRE*)

[Figure: Semantic Network fragment. Body Part, Organ, or Organ Component isa Fully Formed Anatomical Structure; Disease or Syndrome isa Pathologic Function isa Biologic Function; relation label location of.]
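The jump from 558 asserted to 6703 fully expanded relations reflects inheritance along isa links. Below is a minimal sketch of that expansion, assuming a toy hierarchy built from the type names in the figure. The asserted relation, the names `ISA`, `ASSERTED` and `expand`, and the rule that both arguments inherit downward are illustrative assumptions, not the procedure used to produce the SRSTRE files.

```python
from itertools import product

# Toy isa hierarchy (child -> parent), using type names from the slide.
ISA = {
    "Body Part, Organ, or Organ Component": "Fully Formed Anatomical Structure",
    "Pathologic Function": "Biologic Function",
    "Disease or Syndrome": "Pathologic Function",
}

# One asserted relation, assumed here for illustration only.
ASSERTED = {
    ("Fully Formed Anatomical Structure", "location_of", "Biologic Function"),
}


def descendants_or_self(st):
    """Return st plus every type that reaches st through isa links."""
    found = {st}
    changed = True
    while changed:
        changed = False
        for child, parent in ISA.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def expand(asserted):
    """Propagate each asserted relation to the descendants of both arguments."""
    expanded = set()
    for source, rel, target in asserted:
        sources = descendants_or_self(source)
        targets = descendants_or_self(target)
        for s, t in product(sources, targets):
            expanded.add((s, rel, t))
    return expanded


expanded = expand(ASSERTED)
print(len(expanded))  # 2 source types x 3 target types = 6
```

In this toy case a single asserted relation yields six expanded ones, which is the same kind of growth the slide reports at full scale.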

Metathesaurus relations

REL vs. RELA
Not always labeled
106 additional types of relationships
~7 M symbolic relations

[Figure: the Metathesaurus concept Heart with the concepts Esophagus, Left Phrenic Nerve, Heart Valves, Fetal Heart, Mediastinum, Saccular Viscus, Angina Pectoris, Cardiotonic Agents, and Tissue Donors. Numeric labels in the figure: 22, 225, 97, 4, 12, 9, 31.]

Metathesaurus relations

Recorded
  at the term level: from source vocabularies
  at the concept level: from Metathesaurus editors
Aggregated at the concept level

[Figure: the terms Oat cell carcinoma of lung, Carcinoma, Small Cell and SCLC are linked by has_finding_site to the terms Lung structure, Lung and Pulmonary.]
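Aggregation at the concept level can be sketched as follows, assuming each group of three terms in the figure names one concept. The concept identifiers (`C_SCLC`, `C_LUNG`), the choice of which term pairs carry the relation, and the function name `aggregate` are hypothetical and serve only as illustration.

```python
# Hypothetical term -> concept mapping; the identifiers are placeholders, not real CUIs.
TERM_TO_CONCEPT = {
    "Oat cell carcinoma of lung": "C_SCLC",
    "Carcinoma, Small Cell": "C_SCLC",
    "SCLC": "C_SCLC",
    "Lung structure": "C_LUNG",
    "Lung": "C_LUNG",
    "Pulmonary": "C_LUNG",
}

# Term-level relations as a source vocabulary might record them (illustrative).
TERM_RELATIONS = [
    ("Oat cell carcinoma of lung", "has_finding_site", "Lung structure"),
    ("Carcinoma, Small Cell", "has_finding_site", "Lung"),
]


def aggregate(term_relations, term_to_concept):
    """Lift term-level relations to the concept level, dropping duplicates."""
    concept_relations = set()
    for term1, rel, term2 in term_relations:
        concept_relations.add((term_to_concept[term1], rel, term_to_concept[term2]))
    return concept_relations


concept_relations = aggregate(TERM_RELATIONS, TERM_TO_CONCEPT)
print(concept_relations)
```

Two term-level assertions collapse into a single concept-level relation because both pairs of terms map to the same pair of concepts.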

Not all relationships in hierarchies are isa (1)

[Figure: Autoimmune Diseases, Addison’s disease, Addison’s disease due to autoimmunity, and Tuberculous Addison’s disease; one link is labeled "is generally a".]

Not all relationships in hierarchies are isa (2)

Environment and Public Health [G03]
  Public Health [G03.850]
    Accidents [G03.850.110]
      Accident Prevention [G03.850.110.060] +
      Accidental Falls [G03.850.110.085]
      Accidents, Aviation [G03.850.110.185]
      […]
      Drowning [G03.850.110.500] +

Defining consistency

SN rel. and Meta rel. must have the same direction
SN rel. and Meta rel. must be of the same type (both hierarchical or associative)
Meta rel. must be the same as SN rel. or one of its descendants

[Figure: the consistency square, with Concept 1 and Concept 2 in the Metathesaurus and Semantic Type a and Semantic Type b in the Semantic Network.]
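The three rules above can be written as a small predicate. This is a sketch under stated assumptions: the Metathesaurus relation has already been lifted to the semantic types of its two concepts, the relationship hierarchy is reduced to a single link (`treats` under `affects`), and only `isa` counts as hierarchical. The names `REL_PARENT`, `rel_kind` and `consistent` are hypothetical.

```python
# Toy relationship hierarchy (child -> parent), assumed for illustration.
REL_PARENT = {"treats": "affects"}

# Relations treated as hierarchical in this sketch; everything else is associative.
HIERARCHICAL = {"isa"}


def rel_kind(rel):
    """Classify a relation as hierarchical or associative."""
    return "hierarchical" if rel in HIERARCHICAL else "associative"


def is_same_or_descendant(rel, ancestor):
    """True if rel equals ancestor or reaches it through REL_PARENT links."""
    while rel is not None:
        if rel == ancestor:
            return True
        rel = REL_PARENT.get(rel)
    return False


def consistent(meta_rel, sn_rel):
    """Apply the three rules to a Meta relation and an SN relation.

    Both arguments are (source type, relation, target type) triples; the Meta
    triple carries the semantic types of its two concepts.
    """
    m_src, m_rel, m_dst = meta_rel
    s_src, s_rel, s_dst = sn_rel
    same_direction = (m_src, m_dst) == (s_src, s_dst)
    same_kind = rel_kind(m_rel) == rel_kind(s_rel)
    compatible = is_same_or_descendant(m_rel, s_rel)
    return same_direction and same_kind and compatible


sn = ("Semantic Type a", "affects", "Semantic Type b")
print(consistent(("Semantic Type a", "treats", "Semantic Type b"), sn))
print(consistent(("Semantic Type b", "treats", "Semantic Type a"), sn))
```

The direction test here compares types by exact equality; a fuller version would also accept types that inherit the SN relation from an ancestor.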

Examples of consistent relations

[Figure: Lung (Body Part, Organ, or Organ Component) and Pneumonia (Disease or Syndrome); has_location appears at both the Metathesaurus and the Semantic Network level.]

Examples of consistent relations

[Figure: the consistency square (Concept 1, Concept 2, Semantic Type a, Semantic Type b) with the relation labels affects and treats.]

What does inconsistency mean?

The consistency “square” revisited

[Figure: two copies of the consistency square (Concept 1, Concept 2, Semantic Type a, Semantic Type b), with question marks on the links.]

What does inconsistency mean?

Inaccurate/missing Semantic Network relation
Inaccurate (/missing?) categorization
Inaccurate Metathesaurus relation

Testing consistency

(A) Consistency of associative relations

[McCray & Bodenreider, 2002]

Results

6894 pairs of related concepts
  4496 (65%): a SN relation can be inferred unambiguously
    Validity confirmed in 1981 cases
    2515 not labeled in the Metathesaurus
  1491 (22%): multiple possible SN relationships
    multiple possible Metathesaurus relationships
  907 (13%): inconsistency SN/Meta relationships
    372: no SN relationship between the STs
    415: inconsistent SN/Meta relationship type (REL)
    120: inconsistent SN/Meta relationship attribute (RELA)
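As a quick arithmetic check, the counts on this slide can be recomputed. The sketch below only re-adds the figures reported above; the dictionary keys and function names are labels chosen here for illustration.

```python
# Counts copied from the Results slide.
TOTAL_PAIRS = 6894
GROUPS = {
    "unambiguous SN relation": 4496,
    "multiple possible relationships": 1491,
    "inconsistent SN/Meta relationships": 907,
}
BREAKDOWN = {
    "unambiguous SN relation": [1981, 2515],
    "inconsistent SN/Meta relationships": [372, 415, 120],
}


def percentages(groups, total):
    """Share of each group, rounded to the nearest whole percent."""
    return {name: round(100 * count / total) for name, count in groups.items()}


def breakdowns_add_up(groups, breakdown):
    """True if every listed sub-count sums to its group total."""
    return all(sum(parts) == groups[name] for name, parts in breakdown.items())


shares = percentages(GROUPS, TOTAL_PAIRS)
groups_add_up = sum(GROUPS.values()) == TOTAL_PAIRS
print(shares)
print(groups_add_up, breakdowns_add_up(GROUPS, BREAKDOWN))
```

The three groups sum to 6894, the sub-counts sum to their group totals, and the rounded shares come out as 65, 22 and 13 percent.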

(B) Consistency of hierarchical relations

Relations used
  SN: isa
  Categorization: isa
  Metathesaurus: PAR/CHD + RB/RN
Hypothesis
  For a pair of (ST, C), the concepts categorized by ST (and its descendants) correspond to the descendants of the concept C
  In the set of descendants of C, expected STs are the ST of C (and its descendants)

ST-based classes vs. descendants

Semantic type
  List of all concepts having this semantic type
Concept
  List of all descendants
Comparing the 2 sets
  Intersection of the 2 sets

[Bodenreider & Burgun, 2004]
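The comparison of the two sets can be sketched with plain set operations. This is an illustration only: the identifiers (`C`, `d1`, `x`, ...) are placeholders, the child links stand in for the PAR/CHD and RB/RN relations, and the function names are hypothetical rather than taken from the cited work.

```python
def descendants(concept, children):
    """Collect every concept reachable from `concept` through child links.

    The `seen` set keeps the traversal finite even if the hierarchy has cycles.
    """
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def compare(st_class, desc):
    """Split the two sets into their intersection and the two differences."""
    return {
        "in_common": st_class & desc,
        "only_in_st_class": st_class - desc,
        "only_in_descendants": desc - st_class,
    }


# Placeholder hierarchy and ST-based class.
children = {"C": ["d1", "d2"], "d1": ["d3"]}
st_class = {"d1", "d3", "x"}

result = compare(st_class, descendants("C", children))
print(result)
```

Concepts that fall in only one of the two sets are the candidates for review, either as a possible miscategorization or as a wrong or missing hierarchical relation.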

Analyzing inconsistencies

[Figure: the semantic type Amphibian (1135 concepts) compared with the concept Amphibia (1126 descendants); 1124 in common. Callouts: Tadpole, Invertebrate, Toad licking, Pharmacologic Substance, Rana unclassified, Class Reptilia, Amphibians and Reptiles. Diagnoses shown: Miscategorization (?), Wrong hierarchical relation, Missing hierarchical relation, Miscategorization.]

Semantic types of descendants

Concept
  Set of all descendants
Distribution of semantic types in the set
  Allowable STs: ST of C and its descendants (strict) or ST from the same semantic group (loose)

[Mougin & Bodenreider, 2005]
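The strict and loose criteria can be sketched as a three-way classification of each descendant's semantic type. The isa links, group assignments, abbreviations and function names below are toy assumptions made for illustration, not the data or code of the cited study.

```python
from collections import Counter

# Assumed toy inputs: child -> parent isa links between semantic types,
# and a semantic group for each type (abbreviations are illustrative).
ST_PARENT = {"neop": "dsyn", "dsyn": "patf"}
ST_GROUP = {"neop": "DISO", "dsyn": "DISO", "patf": "DISO", "phsu": "CHEM"}


def is_same_or_descendant(st, ancestor):
    """True if st equals ancestor or reaches it through isa links."""
    while st is not None:
        if st == ancestor:
            return True
        st = ST_PARENT.get(st)
    return False


def compatibility(st, st_of_c):
    """Classify a descendant's semantic type against the type of concept C."""
    if is_same_or_descendant(st, st_of_c):
        return "strict"
    group = ST_GROUP.get(st)
    if group is not None and group == ST_GROUP.get(st_of_c):
        return "loose"
    return "incompatible"


def distribution(descendant_types, st_of_c):
    """Count the descendants of C by compatibility level."""
    return Counter(compatibility(st, st_of_c) for st in descendant_types)


# Semantic types of five hypothetical descendants of a concept typed dsyn.
counts = distribution(["neop", "dsyn", "patf", "phsu", "neop"], "dsyn")
print(counts)
```

A type that passes the strict test also passes the loose one, so the sketch reports the tightest level each descendant satisfies.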

Analyzing inconsistencies

26,584 concepts studied
59% of their descendants have a semantic type incompatible with that of the original concept

[Figure: Reaction belligerent, Finding, Hostility, Mental Process.]

Analyzing inconsistencies

# ------------------------------------------------------------
# C0597249 Neoplasm of placenta (disorder) (neop)
# * B: 190

C0597249|ST|acab| 5.50|incp
C0597249|ST|anab| 1.50|incp
C0597249|ST|cgab| 76.50|incp
C0597249|ST|dsyn| 27.50|incp
C0597249|ST|inpo| 1.00|incp
C0597249|ST|neop| 76.50|comp
C0597249|ST|patf| 1.50|incp

C0597249|SG|DISO| 190.00|comp
# ------------------------------------------------------------
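A listing in this pipe-delimited form is straightforward to summarize. The sketch below assumes the fields are concept identifier, level (ST or SG), code, a possibly fractional descendant count, and a status in which `comp` means compatible and `incp` incompatible. Those readings are inferred from the slide and are not documented there; the function names are hypothetical.

```python
from collections import defaultdict

# Data lines copied from the slide; comment lines are omitted.
REPORT = """\
C0597249|ST|acab| 5.50|incp
C0597249|ST|anab| 1.50|incp
C0597249|ST|cgab| 76.50|incp
C0597249|ST|dsyn| 27.50|incp
C0597249|ST|inpo| 1.00|incp
C0597249|ST|neop| 76.50|comp
C0597249|ST|patf| 1.50|incp
C0597249|SG|DISO| 190.00|comp
"""


def parse(report):
    """Turn each data line into (cui, level, code, weight, status)."""
    rows = []
    for line in report.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cui, level, code, weight, status = (field.strip() for field in line.split("|"))
        rows.append((cui, level, code, float(weight), status))
    return rows


def totals_by_status(rows, level):
    """Sum the weights of one level (ST or SG) per status."""
    totals = defaultdict(float)
    for _cui, row_level, _code, weight, status in rows:
        if row_level == level:
            totals[status] += weight
    return dict(totals)


rows = parse(REPORT)
st_totals = totals_by_status(rows, "ST")
sg_totals = totals_by_status(rows, "SG")
print(st_totals, sg_totals)
```

Under these assumptions the semantic type rows sum to 190, matching the semantic group row: 76.5 counted as compatible and 113.5 as incompatible at the type level, while all 190 are compatible at the group level.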

Suggestions

Aligning SN and Meta relationships

54 types of SN relationships
106 additional types of Metathesaurus relationships
  Some are simply synonymous (caused_by / due_to; follows / temporally_follows)
  Some are specialized relationships (manifestation_of / definitional_manifestation_of)
  Many types of mapping relationships, not in SN

Add classification information to SN

Explicit classificatory principles (in addition to textual definition and examples)
Abandon economy principle and return to JEPD (jointly exhaustive/pairwise disjoint) approach

Metathesaurus editing environment

Use SN/Meta relation consistency as a constraint for assigning semantic types
Use SN relations to suggest labels for unspecified Meta relations
Use SN/Meta relation consistency to guide the review by the Metathesaurus editors
  Inaccurate categorization?
  Inaccurate Metathesaurus relation?
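The second and third suggestions above could take the shape of a small triage step in an editing tool. This is a hypothetical sketch: the table `SN_RELATIONS`, its contents, and the functions `suggest_labels` and `triage` are invented for illustration and do not describe an existing NLM editing environment.

```python
# Assumed toy SN relations between pairs of semantic types (illustrative only).
SN_RELATIONS = {
    ("Semantic Type a", "Semantic Type b"): {"affects"},
    ("Semantic Type a", "Semantic Type c"): {"affects", "causes"},
}


def suggest_labels(types_of_c1, types_of_c2):
    """Collect every SN relation holding between the types of two concepts."""
    suggestions = set()
    for st1 in types_of_c1:
        for st2 in types_of_c2:
            suggestions |= SN_RELATIONS.get((st1, st2), set())
    return suggestions


def triage(types_of_c1, types_of_c2):
    """Decide what to show the editor for an unlabeled Meta relation."""
    labels = suggest_labels(types_of_c1, types_of_c2)
    if not labels:
        # No SN relation between the types: flag the pair for review.
        return "review", labels
    if len(labels) == 1:
        # One candidate: propose it as the label.
        return "suggest", labels
    # Several candidates: leave the choice to the editor.
    return "ambiguous", labels


print(triage(["Semantic Type a"], ["Semantic Type b"]))
print(triage(["Semantic Type a"], ["Semantic Type c"]))
print(triage(["Semantic Type b"], ["Semantic Type a"]))
```

The three outcomes mirror the three groups reported in the results for associative relations: an unambiguous SN relation, multiple possible relationships, and no SN relationship between the semantic types.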

Conclusions

Simultaneously
  Improve SN
  Improve categorization
ST assignment can be automated in part

Some references

McCray AT, Bodenreider O. A conceptual framework for the biomedical domain. In: Green R, Bean CA, Myaeng SH, editors. The semantics of relationships: an interdisciplinary perspective. Boston: Kluwer Academic Publishers; 2002. p. 181-198.

Bodenreider O, Burgun A. Aligning knowledge sources in the UMLS: Methods, quantitative results, and applications. Medinfo 2004:327-331.

Burgun A, Bodenreider O. Aspects of the taxonomic relation in the biomedical domain. In: Welty C, Smith B, editors. Collected papers from the Second International Conference "Formal Ontology in Information Systems": ACM Press; 2001. p. 222-233.

Mougin F, Bodenreider O. Approaches to eliminating cycles in the UMLS Metathesaurus: Naive vs. formal. Proceedings of AMIA Annual Symposium 2005:(submitted).

Medical Ontology Research

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Contact: olivier@nlm.nih.gov
Web: mor.nlm.nih.gov
