The UMLS and the Semantic Web W3C Semantic Web Health Care and Life Sciences Interest Group BioRDF Teleconference September 22, 2008 Olivier Bodenreider Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA


Page 1: The UMLS and the Semantic Web · 9/22/2008  · The UMLS and the Semantic Web. W3C Semantic Web Health Care and Life Sciences Interest Group. BioRDF Teleconference. September 22,

The UMLS and the Semantic Web

W3C Semantic Web Health Care and Life Sciences Interest Group

BioRDF Teleconference, September 22, 2008

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications, Bethesda, Maryland - USA


Outline

The UMLS (in a nutshell)
  Lexical resources
  Metathesaurus
  Semantic Network

Why is the UMLS relevant to the Semantic Web?

Issues and challenges


Unified Medical Language System (UMLS)


UMLS: 3 components

SPECIALIST Lexicon (lexical resources)
  200,000 lexical items
  Part of speech and variant information

Metathesaurus (terminological resources)
  5M names from over 100 terminologies
  1M concepts
  16M relations

Semantic Network (ontological resources)
  135 high-level categories
  7000 relations among them


UMLS Characteristics (1)

Current version: 2008AA (2-3 annual releases)
Type: Terminology integration system
Domain: Biomedicine
Developer: NLM
Funding: NLM (intramural)
Availability
  Publicly available: Yes* (cost-free license required)
  Repositories: UMLS

URL: http://umlsks.nlm.nih.gov/


UMLS Characteristics (2)

Number of
  Concepts: 1.5M (2008AA)
  Terms: ~6M

Major organizing principles (Metathesaurus):
  Concept orientation
  Source transparency
  Multi-lingual through translation

Formalism: Proprietary format (RRF)


UMLS Integrating subdomains

[Figure: the UMLS at the center, integrating subdomains through their vocabularies]
  Biomedical literature: MeSH
  Genome annotations: GO
  Model organisms: NCBI Taxonomy
  Genetic knowledge bases: OMIM
  Clinical repositories: SNOMED CT
  Anatomy: FMA
  Other subdomains


Trans-namespace integration

[Figure: the same subdomain diagram (GO, NCBI Taxonomy, OMIM, FMA, other subdomains), highlighting one concept shared across namespaces]
  Biomedical literature, MeSH: Addison Disease (D000224)
  Clinical repositories, SNOMED CT: Addison's disease (363732003)
  UMLS: C0001403
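Trans-namespace integration can be sketched in a few lines of Python. This is a minimal illustration, not the UMLS data model: the `ROWS` table is a hand-made stand-in for the Metathesaurus that holds only the three identifiers shown on the slide, the source labels `MSH` and `SNOMEDCT` are assumed abbreviations, and `cui_for` and `translate` are hypothetical helper names.

```python
from typing import List, Optional

# Hand-made stand-in for the Metathesaurus: (CUI, source, code, name).
# Only the identifiers shown on the slide are used; the layout is an assumption.
ROWS = [
    ("C0001403", "MSH", "D000224", "Addison Disease"),
    ("C0001403", "SNOMEDCT", "363732003", "Addison's disease"),
]


def cui_for(source: str, code: str) -> Optional[str]:
    """Return the UMLS concept identifier for a source-specific code."""
    for cui, sab, src_code, _name in ROWS:
        if sab == source and src_code == code:
            return cui
    return None


def translate(source: str, code: str, target: str) -> List[str]:
    """Map a code from one namespace to another via the shared concept."""
    cui = cui_for(source, code)
    if cui is None:
        return []
    return [c for row_cui, sab, c, _name in ROWS if row_cui == cui and sab == target]


if __name__ == "__main__":
    print(cui_for("MSH", "D000224"))
    print(translate("MSH", "D000224", "SNOMEDCT"))
```

The two source vocabularies never reference each other directly; the link exists only because both codes are attached to the same concept identifier.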


[Figure: Metathesaurus concepts linked to semantic types in the Semantic Network]
  Metathesaurus concepts: Heart, Esophagus, Left Phrenic Nerve, Heart Valves, Fetal Heart, Mediastinum, Saccular Viscus, Angina Pectoris, Cardiotonic Agents, Tissue Donors
  Semantic types (Semantic Network): Anatomical Structure; Fully Formed Anatomical Structure; Embryonic Structure; Body Part, Organ or Organ Component; Pharmacologic Substance; Disease or Syndrome; Population Group


Why is the UMLS relevant to the Semantic Web?


Relevance to the SW

Metathesaurus
  Terminology integration system
    Trans-namespace integration
    Integration beyond shared identifiers
  Repository of biomedical terminologies/ontologies
    Many UMLS vocabularies used for the annotation of datasets (including clinical records)


Relevance to the SW

Metathesaurus
  Broad coverage of biomedicine
  Large user base
  Tooling available
    E.g., visualization, named entity recognition, etc.


Relevance to the SW

Semantic Network
  Top-level ontology of the biomedical domain
    Broad biomedical categories
    Helps partition biomedical concepts
  Semantic relations
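How broad categories help partition concepts can be sketched as follows. This is an illustration under stated assumptions: the type names are taken from the slides, but the `ISA` links and the concept-to-type assignments in `SEMANTIC_TYPE` are written by hand for this example rather than read from the UMLS, and `ancestors` and `concepts_under` are hypothetical helper names.

```python
from typing import Dict, List, Optional

# Fragment of a semantic-type "isa" hierarchy (child -> parent).
# None means the parent is simply left out of this fragment.
ISA: Dict[str, Optional[str]] = {
    "Anatomical Structure": None,
    "Embryonic Structure": "Anatomical Structure",
    "Fully Formed Anatomical Structure": "Anatomical Structure",
    "Body Part, Organ or Organ Component": "Fully Formed Anatomical Structure",
    "Disease or Syndrome": None,
    "Pharmacologic Substance": None,
    "Population Group": None,
}

# Illustrative concept -> semantic type assignments (assumed for this sketch).
SEMANTIC_TYPE: Dict[str, str] = {
    "Heart": "Body Part, Organ or Organ Component",
    "Fetal Heart": "Embryonic Structure",
    "Angina Pectoris": "Disease or Syndrome",
    "Cardiotonic Agents": "Pharmacologic Substance",
    "Tissue Donors": "Population Group",
}


def ancestors(sem_type: str) -> List[str]:
    """Return the type itself followed by its ancestors, nearest first."""
    chain: List[str] = []
    current: Optional[str] = sem_type
    while current is not None:
        chain.append(current)
        current = ISA[current]
    return chain


def concepts_under(category: str) -> List[str]:
    """List concepts whose semantic type is the category or one of its descendants."""
    return sorted(c for c, t in SEMANTIC_TYPE.items() if category in ancestors(t))


if __name__ == "__main__":
    print(concepts_under("Anatomical Structure"))
    print(concepts_under("Disease or Syndrome"))
```

Querying by a high-level category selects a slice of the concept space without enumerating individual concepts, which is the partitioning role the slide describes.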


Issues and Challenges


Issues and challenges

Availability
  Mandatory license agreement

Discoverability
  No metadata

Formalism
  No easy conversion to SKOS/RDF(S)/OWL

Identifiers

Steep learning curve


Availability

Some source vocabularies have intellectual property restrictions

  E.g., most drug vocabularies
  Complex agreement for SNOMED CT: available at no cost for member countries of the IHTSDO

Mandatory license agreement
  No cost for research
  May require negotiation with the vocabulary developer for production applications

MetamorphoSys helps extract selected sources from the UMLS
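Selecting sources amounts to keeping only the rows that come from vocabularies one is entitled to use. The sketch below is a conceptual analogue only and is not how MetamorphoSys works internally: the four-column row layout (`CUI|SAB|CODE|STR|`) is a simplified assumption, the source labels are assumed abbreviations, and `keep_sources` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, Set

# Simplified pipe-delimited rows: CUI|SAB|CODE|STR| (a reduced, assumed layout,
# not the full set of Metathesaurus columns).
SAMPLE = [
    "C0001403|MSH|D000224|Addison Disease|",
    "C0001403|SNOMEDCT|363732003|Addison's disease|",
]


def keep_sources(lines: Iterable[str], allowed: Set[str]) -> Iterator[str]:
    """Yield only the rows whose source abbreviation is in the allowed set."""
    for line in lines:
        fields = line.split("|")
        if fields[1] in allowed:
            yield line


if __name__ == "__main__":
    for row in keep_sources(SAMPLE, {"MSH"}):
        print(row)
```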


Discoverability

Discoverability of individual concepts
  UMLSKS web services
  Search all UMLS source vocabularies at the same time
  Named entity recognition/normalization (e.g., MetaMap)

Discoverability of terminologies/ontologies
  No comprehensive registries
  No rich registries
    With rich metadata supporting the discoverability of terminologies/ontologies


Formalism

UMLS: Proprietary format
  Rich Release Format (RRF)
  All terminologies/ontologies represented in the same format

No easy conversion to SKOS/RDF(S)/OWL
  Underspecified semantics
    Child/parent ≠ subClassOf
  Complex semantics
    Descriptors / concepts / terms
    Rich attribute set
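To make the conversion problem concrete, here is a minimal sketch of reading one RRF row and emitting SKOS-style triples. Several things are assumptions: the column order is taken to follow the documented layout of the concept-names file (MRCONSO.RRF), the sample row reuses the CUI, code and name from the slides but fills the remaining identifiers with made-up placeholders, `http://example.org/umls/` is a hypothetical namespace, and the helper names are invented. Using `skos:broader` for child/parent links is one possible reading of the caution above, not a mapping the talk prescribes.

```python
from typing import Dict, Tuple

# Column names of the Metathesaurus concept-names file (MRCONSO.RRF).
MRCONSO_FIELDS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]

# Illustrative row: CUI, code and name come from the slides; the L/S/A/M
# identifiers are made-up placeholders, not real UMLS values.
SAMPLE_ROW = (
    "C0001403|ENG|P|L0000000|PF|S0000000|Y|A0000000||M0000000|D000224|"
    "MSH|MH|D000224|Addison Disease|0|N||"
)

BASE = "http://example.org/umls/"  # hypothetical namespace
SKOS = "http://www.w3.org/2004/02/skos/core#"

Triple = Tuple[str, str, str]


def parse_rrf(line: str) -> Dict[str, str]:
    """Split one pipe-delimited RRF row; every field ends with '|'."""
    values = line.rstrip("\n").split("|")[:-1]
    if len(values) != len(MRCONSO_FIELDS):
        raise ValueError(f"expected {len(MRCONSO_FIELDS)} fields, got {len(values)}")
    return dict(zip(MRCONSO_FIELDS, values))


def label_triple(row: Dict[str, str]) -> Triple:
    """Describe the concept with a SKOS label rather than an OWL class axiom."""
    return (BASE + row["CUI"], SKOS + "prefLabel", row["STR"])


def hierarchy_triple(child_cui: str, parent_cui: str) -> Triple:
    """Child/parent links carry no subclass guarantee, so use skos:broader."""
    return (BASE + child_cui, SKOS + "broader", BASE + parent_cui)


if __name__ == "__main__":
    row = parse_rrf(SAMPLE_ROW)
    print(label_triple(row))
```

The parsing step is mechanical; the hard part named on the slide is deciding which target property each source relation and attribute can safely be mapped to.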


Identifiers for biomedical entities

What is identified?
  Entity vs. resource about the entity

Which identifier to pick?
  E.g., Addison’s disease
    363732003 (SNOMED CT)
    D000224 (MeSH)
    C0001403 (UMLS Metathesaurus)

Which format?
  URI vs. LSID

Which authoritative source for minting URIs?
  Ontology developers vs. (e.g.) Bio2RDF
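The format question can be made concrete with a small sketch that renders the same three identifiers as HTTP URIs and as LSIDs. The authority `example.org`, the base `http://example.org/id` and the namespace labels are placeholders chosen for illustration, not official namespaces of any of these vocabularies, and `http_uri` and `lsid` are hypothetical helper names. The LSID shape used is `urn:lsid:<authority>:<namespace>:<object>`.

```python
from typing import Dict

# Identifiers for Addison's disease from the slide, keyed by an assumed
# namespace label.
IDENTIFIERS: Dict[str, str] = {
    "snomedct": "363732003",
    "mesh": "D000224",
    "umls": "C0001403",
}


def http_uri(base: str, namespace: str, local_id: str) -> str:
    """Build an HTTP URI of the form <base>/<namespace>/<local_id>."""
    return f"{base.rstrip('/')}/{namespace}/{local_id}"


def lsid(authority: str, namespace: str, local_id: str) -> str:
    """Build an LSID of the form urn:lsid:<authority>:<namespace>:<object>."""
    return f"urn:lsid:{authority}:{namespace}:{local_id}"


if __name__ == "__main__":
    for ns, local_id in IDENTIFIERS.items():
        print(http_uri("http://example.org/id", ns, local_id))
        print(lsid("example.org", ns, local_id))
```

Whichever syntax is chosen, the output is still several distinct strings for one disease, so the choice of format does not by itself settle which identifier or which minting authority to rely on.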


Steep learning curve

Large resource
  1.5M concepts
  6M terms
  Over 20M relations

Complex structure
  Metathesaurus
  Semantic Network

Rich set of attributes

Rich set of relations
  Terminological
  Semantic
  Statistical
  Mapping

Multiple languages

Complex domain


Conclusions


Conclusions

UMLS as a terminology integration system
  Helps bridge across namespaces
  Helps integrate information sources
    Beyond shared identifiers

UMLS as a repository of terminologies/ontologies
  Single source, single format for 143 vocabularies

Issues with availability, discoverability and formalism

Identifiers for biomedical entities


References

UMLS
  umlsinfo.nlm.nih.gov

UMLS browsers (free, but UMLS license required)
  Knowledge Source Server: umlsks.nlm.nih.gov
  Semantic Navigator: http://mor.nlm.nih.gov/perl/semnav.pl
  RRF browser (standalone application distributed with the UMLS)


References

Recent overviews
  Bodenreider O. (2004). The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Research; D267-D270.
  Bodenreider O. From terminology integration to information integration: Unified Medical Language System (UMLS). BioRDF Teleconference, W3C Semantic Web Health Care and Life Sciences Interest Group, June 5, 2006. http://mor.nlm.nih.gov/pubs/pres/060605-BioRDF.pdf


Medical Ontology Research

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications, Bethesda, Maryland - USA

Contact: [email protected]
Web: