The UMLS* Metathesaurus*: Lessons for Metadata Registries

Betsy L. Humphreys

[email protected]

http://www.nlm.nih.gov

* UMLS and Metathesaurus are registered trademarks of the National Library of Medicine


Outline of Presentation

• Brief overview -- NLM’s Unified Medical Language System (UMLS) Project and its products

• Description of the UMLS Metathesaurus -- content, construction methods, characteristics

• Interspersed Metadata Questions/Issues


UMLS Purpose

• Make it easy for health professionals and researchers to retrieve and integrate relevant information from disparate automated sources, e.g.
– computer-based patient records
– factual databanks
– bibliographic databases and full-text
– expert systems


UMLS Focus -- Conceptual Connections

• Build knowledge sources that can be used by intelligent programs to overcome:
– disparities in language used by different users and in different information sources;
– difficulties in identifying which of many information sources is relevant


UMLS Knowledge Sources

Multi-purpose tools or “intellectual middleware” for System Developers

• Metathesaurus

• SPECIALIST lexicon and lexical programs

• Semantic Network


UMLS Knowledge Sources Distribution

• Annual updates since 1990

• Free under license agreement with NLM
– Need separate license agreements with vocabulary producers for some uses of some vocabularies in the Metathesaurus

• Available to licensed users (~900) via Internet server and on CDs

• Relational format (ASN.1 retired due to lack of use, XML being developed)


1999 UMLS Metathesaurus

• 626,313 concepts

• 1,134,413 “terms” (Eye, Eyes, eye = 1)

• 1,358,891 “strings”/concept names (Eye, Eyes, eye = 3)

• ~50 source vocabularies


UMLS Metathesaurus

• Concepts, terms, and attributes from many controlled “vocabularies”

• New inter-source relationships, definitional information, use information

• Scope determined by combined scope of source vocabularies


UMLS Source “Vocabularies”

• Widely varying purposes, structures, properties, but all are in essence “sets of valid values” for data elements:
– Thesauri, e.g., MeSH
– Statistical Classifications, e.g., ICD
– Billing Codes, e.g., CPT
– Clinical coding systems, e.g., SNOMED
– Lists of controlled terms, e.g., COSTAR, HL7 value sets


Metathesaurus Construction

• Convert machine-readable vocabulary sources to UMLS “normal” form, making source semantics explicit

• Merge, using source semantics and lexical processing techniques

• Edit results, adding additional relationships and semantic information


$100,000 Metadata Questions

• What constitutes “explicit semantics” for Metadata?
– At a minimum interpretable by humans
– Preferably interpretable by machines

• How will the significant human effort required to create useful Metadata registries be organized and funded?


Metathesaurus Characteristics (1)

• Concept organization

• Many sources in a common database format

• Representation of the meaning in each source vocabulary

• Explicit tagging of each source vocabulary’s information


Current MeSH -- Organized by Preferred Term

D015154
Esophageal Motility Disorders (MH)
Esophageal Dysmotility (ET -- syn)
Nutcracker Esophagus (ET -- nar)


UMLS Metathesaurus -- Organized by Concept

C0014858
Esophageal Motility Disorders (MeSH, Read)
Esophageal Dysmotility (MeSH, Read)
Oesophageal Dysmotility (Read)

C0028705
Nutcracker Esophagus (MeSH, Read, ....)
Symptomatic esophageal peristalsis (Read)
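
The contrast between the two slides (one MeSH record keyed by preferred term versus two Metathesaurus concepts, each tagged by source) can be sketched as a small grouping step. This is only an illustration: the tuple layout and the function name `by_concept` are assumptions made here, not a UMLS format or tool, and the sources elided on the slide ("....") are left out rather than guessed.

```python
from collections import defaultdict

# Names copied from the slide; the (concept id, name, source) tuple layout is
# an assumption for this sketch. Sources elided on the slide are omitted.
NAMES = [
    ("C0014858", "Esophageal Motility Disorders", "MeSH"),
    ("C0014858", "Esophageal Motility Disorders", "Read"),
    ("C0014858", "Esophageal Dysmotility", "MeSH"),
    ("C0014858", "Esophageal Dysmotility", "Read"),
    ("C0014858", "Oesophageal Dysmotility", "Read"),
    ("C0028705", "Nutcracker Esophagus", "MeSH"),
    ("C0028705", "Nutcracker Esophagus", "Read"),
    ("C0028705", "Symptomatic esophageal peristalsis", "Read"),
]


def by_concept(rows):
    """Group names under their concept id, remembering each name's sources."""
    concepts = defaultdict(lambda: defaultdict(set))
    for cui, name, source in rows:
        concepts[cui][name].add(source)
    return concepts


concepts = by_concept(NAMES)
for cui, names in concepts.items():
    print(cui)
    for name, sources in names.items():
        print(f"  {name} ({', '.join(sorted(sources))})")
```

Keying on the concept rather than on one vocabulary's preferred term is what lets "Nutcracker Esophagus" sit in its own concept here, even though MeSH lists it as a narrower entry term under "Esophageal Motility Disorders".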


Metadata Question

• What is the operational definition of synonymy in the realm of Metadata element names?
– OR, When does a distinction make a difference in Metadata?


Metadata Question

• Will the Metathesaurus approach to “multiple meanings” work for data element names?
– E.g., Country
  • Country of Birth
  • Country of Residence
  • Country of Publication
– REMINDER: different data elements can have the SAME set of valid values
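
One way to picture the reminder above is to model the set of valid values separately from the data elements that use it. The sketch below is a hypothetical model, not a registry standard: the class names `ValueDomain` and `DataElement` and the three placeholder country values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueDomain:
    """A named set of valid values (hypothetical model for this sketch)."""
    name: str
    values: frozenset


@dataclass(frozen=True)
class DataElement:
    """A data element whose meaning is distinct from its value domain."""
    name: str
    domain: ValueDomain

    def accepts(self, value):
        return value in self.domain.values


# Placeholder values only; a real domain would come from a controlled list.
COUNTRY = ValueDomain("Country", frozenset({"Canada", "France", "Japan"}))

ELEMENTS = [
    DataElement(name, COUNTRY)
    for name in ("Country of Birth", "Country of Residence", "Country of Publication")
]

for element in ELEMENTS:
    print(element.name, "->", element.domain.name, element.accepts("France"))
```

The three elements share one domain object yet remain three different meanings, which is why merging them on their valid values alone would lose information.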


SO|C0007452|L0007452|S0023004|LCH90|PT|U000852|0|
SO|C0007452|L0007452|S0023004|MSH99|MH|D002417|0|
SO|C0007452|L0007452|S0023004|PSY94|PT|08010|3|
SO|C0007452|L0007452|S0023004|SNM2|RT|E-4994|3|
SO|C0007452|L0007452|S0023004|SNMI98|SY|L-80100|3|
SO|C0007452|L0010229|S0002635|RCD98|PT|X79op|3|
SO|C0007452|L0010229|S0002635|SNM2|RT|E-4994|3|
SO|C0007452|L0010229|S0002635|SNMI98|SY|L-80100|3|
SO|C0007452|L0010229|S0364778|PSY94|ET|12270|3|
SO|C0007452|L0010229|S0417039|AOD95|DE|0000014422|0||
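
As a rough illustration of what this explicit tagging makes possible, the sketch below splits a few of the rows into fields and asks which source vocabularies contribute a given string. The field names are an assumed reading of the columns (concept, term, and string identifiers, then source abbreviation, term type, source code, and a final level field) and should be checked against the UMLS documentation; `parse_so` and `sources_for` are hypothetical helpers.

```python
from collections import defaultdict, namedtuple

# Field names are an assumed reading of the columns, not an official schema.
SourceRow = namedtuple("SourceRow", "cui lui sui source term_type code level")

# Four rows copied from the slide, one per line.
SAMPLE = """\
SO|C0007452|L0007452|S0023004|LCH90|PT|U000852|0|
SO|C0007452|L0007452|S0023004|MSH99|MH|D002417|0|
SO|C0007452|L0010229|S0002635|RCD98|PT|X79op|3|
SO|C0007452|L0010229|S0364778|PSY94|ET|12270|3|
"""


def parse_so(text):
    """Parse pipe-delimited rows that start with the literal label 'SO'."""
    rows = []
    for line in text.splitlines():
        fields = line.strip().split("|")
        if len(fields) < 8 or fields[0] != "SO":
            continue  # skip blank or malformed lines
        rows.append(SourceRow(*fields[1:8]))
    return rows


def sources_for(rows):
    """Map each string identifier to the set of sources that carry it."""
    index = defaultdict(set)
    for row in rows:
        index[row.sui].add(row.source)
    return index


rows = parse_so(SAMPLE)
index = sources_for(rows)
print(sorted(index["S0023004"]))
```

Because every row names its source, a user can keep or drop one vocabulary's contribution without touching the rest of the concept.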


Metadata Question

• What level of explicit tagging is needed in Metadata Registries?


Metathesaurus Characteristics (2)

• Added relationships between concepts and terms from different vocabularies

• Added definitional and use information

• “Context-free” unique identifiers – the concept “names” that never change

• Normalized word and string indexes produced using UMLS lexical tools


CON|C0007452|ENG|P|L0007452|PF|S0023004|Cattle|
CON|C0007452|ENG|S|L0010229|PF|S0002635|Cow|
CON|C0007452|ENG|S|L0010229|VC|S0417039|cow|
CON|C0007452|ENG|S|L0010229|VP|S0364778|Cows|
CON|C0007452|ENG|S|L0530279|PF|S0604672|Bovine,NOS|
CON|C0007452|ENG|S|L0530279|VO|S1428975|bovines|
CON|C0007452|ENG|S|L0530314|PF|S0604663|Bovinespecies|
CON|C0007452|ENG|S|L0530314|VC|S0596242|BOVINESPECIES|
CON|C0007452|ENG|S|L1120284|PF|S1344523|bovid|
CON|C0007452|ENG|S|L1193708|PF|S1428974|Bovidae
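
The three identifier levels in these rows (concept, term, string) can be made visible with a short grouping sketch. The field names below are an assumed reading of the columns (concept id, language, term status, term id, string type, string id, string), and `parse_con` is a hypothetical helper, not a UMLS tool.

```python
from collections import defaultdict, namedtuple

# Assumed column meanings; verify against the UMLS documentation.
ConRow = namedtuple("ConRow", "cui language term_status lui string_type sui string")

# Four rows copied from the slide, one per line.
SAMPLE = """\
CON|C0007452|ENG|P|L0007452|PF|S0023004|Cattle|
CON|C0007452|ENG|S|L0010229|PF|S0002635|Cow|
CON|C0007452|ENG|S|L0010229|VC|S0417039|cow|
CON|C0007452|ENG|S|L0010229|VP|S0364778|Cows|
"""


def parse_con(text):
    """Parse pipe-delimited rows that start with the literal label 'CON'."""
    rows = []
    for line in text.splitlines():
        fields = line.strip().split("|")
        if len(fields) < 8 or fields[0] != "CON":
            continue  # skip blank or malformed lines
        rows.append(ConRow(*fields[1:8]))
    return rows


rows = parse_con(SAMPLE)

# One concept id, several term ids, and one string id per distinct spelling.
strings_by_term = defaultdict(list)
for row in rows:
    strings_by_term[row.lui].append(row.string)

print(len({r.cui for r in rows}), "concept,",
      len(strings_by_term), "terms,",
      len({r.sui for r in rows}), "strings")
for lui, strings in strings_by_term.items():
    print(lui, strings)
```

In this sample "Cow", "cow", and "Cows" share one term identifier while keeping separate string identifiers, the same distinction the "Eye, Eyes, eye" counts make earlier in the deck.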


CXT|C0007452|S0002635|RCD98|X79op|1|ANC|1|Readthesaurus|C0338370|.....|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|2|Organisms|C0029235|XM0Nm|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|3|Animal|C0003062|X79ol|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|4|Vertebrate|C0042567|XM0OI|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|5|Mammal|C0024660|X79pW|||
CXT|C0007452|S0002635|RCD98|X79op|1|CCP||Cow|C0007452|X79op|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Bat -animal|C0008139|X79om|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Cat|C0677516|X79oo|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Deer|C0011133|X79oq|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Dog|C0012984|X79or|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Horse|C0019944|X79ou|||
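
These context rows appear to record where the string sits in one source's hierarchy: ranked ancestors (ANC), the entry itself (CCP), and siblings (SIB). A minimal sketch of rebuilding that view follows. The column positions used (relation type, rank, name) are an assumed reading of the rows, `read_context` is a hypothetical helper, and names are kept exactly as they appear in the transcript.

```python
# Rows copied from the slide, one per line (siblings truncated to two).
SAMPLE = """\
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|1|Readthesaurus|C0338370|.....|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|2|Organisms|C0029235|XM0Nm|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|3|Animal|C0003062|X79ol|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|4|Vertebrate|C0042567|XM0OI|||
CXT|C0007452|S0002635|RCD98|X79op|1|ANC|5|Mammal|C0024660|X79pW|||
CXT|C0007452|S0002635|RCD98|X79op|1|CCP||Cow|C0007452|X79op|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Bat -animal|C0008139|X79om|||
CXT|C0007452|S0002635|RCD98|X79op|1|SIB||Cat|C0677516|X79oo|||
"""


def read_context(text):
    """Return (path from root to the entry, list of sibling names)."""
    ancestors, siblings, current = [], [], None
    for line in text.splitlines():
        fields = line.strip().split("|")
        if len(fields) < 10 or fields[0] != "CXT":
            continue  # skip blank or malformed lines
        relation, rank, name = fields[6], fields[7], fields[8]
        if relation == "ANC":
            ancestors.append((int(rank), name))  # rank orders the ancestors
        elif relation == "CCP":
            current = name
        elif relation == "SIB":
            siblings.append(name)
    path = [name for _, name in sorted(ancestors)]
    if current is not None:
        path.append(current)
    return path, siblings


path, siblings = read_context(SAMPLE)
print(" > ".join(path))
print("siblings:", ", ".join(siblings))
```

Each row repeats the source abbreviation and source code, so the hierarchy shown belongs to that one vocabulary and is not asserted for the concept as a whole.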


Metadata Question

• In the realm of Metadata, what requires unique, permanent, context-free identifiers?


Normalization -- example

• disorder esophageal motility = normalized form of:
– Esophageal Motility Disorders
– Esophageal Motility Disorder
– Motility Disorder, Esophageal
– Disorder, Esophageal Motility
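
A crude stand-in for this kind of normalization is sketched below: lowercase the string, drop punctuation, strip a trailing plural "s", and sort the words. This is not the UMLS lexical tools' algorithm, which relies on the SPECIALIST lexicon for inflection; the function name `crude_norm` and the plural rule are assumptions that happen to work on the slide's examples.

```python
import re


def crude_norm(text):
    """Lowercase, drop punctuation, strip a simple plural 's', sort the words.

    A deliberately naive approximation for illustration only.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    stems = [
        w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
        for w in words
    ]
    return " ".join(sorted(stems))


VARIANTS = [
    "Esophageal Motility Disorders",
    "Esophageal Motility Disorder",
    "Motility Disorder, Esophageal",
    "Disorder, Esophageal Motility",
]

for variant in VARIANTS:
    print(f"{variant!r} -> {crude_norm(variant)!r}")
```

All four variants collapse to one key, so an index built on the normalized form finds the concept regardless of word order, case, or number. The same function maps "Eye", "Eyes", and "eye" to a single key.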


Metadata Questions

• Are similar lexical resources needed as adjuncts to Metadata Registries?

• Are the UMLS lexical tools directly useful for Metadata efforts?