TRANSCRIPT
Betsy L. Humphreys
Associate Director for Library Operations
NLM, NIH, HHS
[email protected]@nlm.nih.gov
National Library of Medicine
CENDI Staff Workshop
Knowledge Organization Systems: Current and Future Uses
September 16, 2004
2
NLM “Knowledge Organization Systems”
Name and Series/Journal Authority Files
Library Materials Classification
Individual Controlled Vocabularies
  MeSH, MedlinePlus Health Topics, NCBI Taxonomy, RxNorm clinical drug vocabulary
Unified Medical Language System (UMLS) Knowledge Sources
  Metathesaurus – many vocabularies in a common, integrated format
  Semantic Network
  Lexicon
  Associated tools
3
NLM “Knowledge Organization Systems”
Common Characteristics
  Searchable on the Web, often interlinked with other NLM resources
  Distributed in one or more electronic formats
  Used within NLM for:
    Information retrieval and display
    Data creation
    Natural language interpretation
  Heavily used outside NLM for a wide range of applications
  Most built and maintained with custom systems
6
Medical Subject Headings (MeSH)
Structure of MeSH upgraded in 2000
  Descriptor Class – closely related concepts grouped to enhance retrieval
  Concept – distinct meaning
  Term – concept name
http://www.nlm.nih.gov/mesh/meshrels.html
7
Known Translations of MeSH
In UMLS
  Dutch, Finnish, French, German, Italian, Japanese, Portuguese, Russian, Spanish, Swedish
Other Complete Translations
  Arabic, Chinese, Czech, Greek, Thai, Turkish
In Progress or Planned or Hoped For
  Korean, Slovenian, Vietnamese, Lithuanian, Polish, Slovakian, Norwegian, Kiswahili
8
Coordinating Translations: How?
Single Database - Web Interface
Add Language as a Term Property
Translated Terms added to Concept
Non-English Concepts added to Descriptor
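The approach on this slide can be pictured with a small data model. This is only a sketch under stated assumptions, not the actual schema of NLM's translation system: the class names `Term`, `Concept`, and `Descriptor`, the `names_in` helper, the language codes, and the German example term are all illustrative and are not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Term:
    """A concept name; language is stored as a property of the term."""
    name: str
    language: str = "eng"  # hypothetical language code


@dataclass
class Concept:
    """A distinct meaning, named by one or more terms in any language."""
    terms: List[Term] = field(default_factory=list)

    def names_in(self, language: str) -> List[str]:
        return [t.name for t in self.terms if t.language == language]


@dataclass
class Descriptor:
    """A class of closely related concepts grouped for retrieval."""
    concepts: List[Concept] = field(default_factory=list)

    def names_in(self, language: str) -> List[str]:
        return [n for c in self.concepts for n in c.names_in(language)]


# One shared record: the translated term is added to the existing concept.
headache = Concept(terms=[Term("Headache", "eng")])
headache.terms.append(Term("Kopfschmerz", "ger"))  # illustrative translation
descriptor = Descriptor(concepts=[headache])

print(descriptor.names_in("eng"))
print(descriptor.names_in("ger"))
```

Because every language lives in the same descriptor record, a translation group only adds terms (or, where needed, non-English concepts) and never maintains a separate copy of the hierarchy.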
11
Status of Use
Current Active Groups
  German, French, Italian, Vietnamese
Groups Beginning Work with MTMS
  Dutch, Finnish, Japanese, Polish, Slovakian
Groups Starting Soon
  Czech, Portuguese, Korean, Norwegian, Russian, Spanish
19
The UMLS in practice
Database
  Series of relational files
Interfaces
  Web interface: Knowledge Source Server (UMLSKS)
  Application programming interfaces (Java and XML-based)
Applications
  lvg (lexical programs)
  MetamorphoSys (installation and customization)
  SOON: Metathesaurus browser
The UMLS is not an end-user application
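Since the database is distributed as a series of relational files, a consuming program typically starts by turning each row into a record. The sketch below assumes pipe-delimited rows that end with a trailing pipe; the column names in `COLUMNS` and the sample row are hypothetical and do not reproduce the layout of any actual UMLS file.

```python
from typing import Dict, Sequence

# Hypothetical column subset, chosen only for illustration.
COLUMNS = ("CUI", "LUI", "SUI", "AUI", "SOURCE", "NAME")


def parse_row(line: str, columns: Sequence[str]) -> Dict[str, str]:
    """Split one pipe-delimited row into a dict keyed by column name."""
    text = line.rstrip("\n")
    if text.endswith("|"):
        text = text[:-1]  # drop the trailing field terminator
    fields = text.split("|")
    if len(fields) != len(columns):
        raise ValueError(
            f"expected {len(columns)} fields, got {len(fields)}"
        )
    return dict(zip(columns, fields))


# Hypothetical sample row, not copied from a release file.
sample = "C0000001|L0000001|S0000001|A0000001|source 1|headache|\n"
row = parse_row(sample, COLUMNS)
print(row["AUI"], row["NAME"])
```

Passing the column list in as a parameter keeps the parser independent of any one file, so the same function can serve each file in the distribution once its real column order is looked up in the release documentation.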
20
UMLS: 3 components
Metathesaurus
  Concepts
  Inter-concept relationships
Semantic Network
  Semantic types
  Semantic network relationships
Lexical resources
  SPECIALIST Lexicon
  Lexical tools
21
Metathesaurus Source Vocabularies
134 source vocabularies
  126 contributing concept names
73 families of vocabularies
  multiple translations (e.g., MeSH, ICPC, ICD-10)
  variants (American-English equivalents, Australian extension/adaptation)
  subsequent editions usually considered distinct families (ICD: 9-10; DSM: IIIR-IV)
Broad coverage of biomedicine
Common presentation
(2004AB)
22
Metathesaurus Concepts
Concept (> 1M), CUI: set of synonymous concept names
Term (> 3.8M), LUI: set of normalized names
String (> 4.3M), SUI: distinct concept name
Atom (> 5.1M), AUI: concept name in a given source
(2004AB)
Example:
C0000001
  L0000001
    S0000001
      A0000001 headache (source 1)
      A0000002 headache (source 2)
    S0000002
      A0000003 Headache (source 1)
      A0000004 Headache (source 2)
  L0000002
    S0000003
      A0000005 Cephalgia (source 1)
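The headache example can be reproduced in a few lines to show how atoms roll up into strings and terms. This is a minimal sketch, assuming that lowercasing is an adequate stand-in for the lexical normalization used to form terms; the real normalization is more involved. The final step, placing both terms under one concept, reflects a synonymy judgment and is not computed here. The function names are hypothetical.

```python
from collections import defaultdict

# (AUI, concept name, source) triples copied from the slide's example.
ATOMS = [
    ("A0000001", "headache", "source 1"),
    ("A0000002", "headache", "source 2"),
    ("A0000003", "Headache", "source 1"),
    ("A0000004", "Headache", "source 2"),
    ("A0000005", "Cephalgia", "source 1"),
]


def normalize(name):
    # Assumption: lowercasing stands in for full lexical normalization.
    return name.lower()


def group_atoms(atoms):
    """Return {normalized name: {exact string: [AUI, ...]}}."""
    terms = defaultdict(lambda: defaultdict(list))
    for aui, string, _source in atoms:
        # Atoms with an identical string share a string identifier;
        # strings with an identical normalized form share a term.
        terms[normalize(string)][string].append(aui)
    return {norm: dict(strings) for norm, strings in terms.items()}


grouped = group_atoms(ATOMS)
for norm, strings in grouped.items():
    print(norm, strings)
```

The five atoms collapse into three distinct strings and two terms, matching the S and L identifiers in the example; "headache" and "Cephalgia" share a concept only because they are treated as synonyms.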
23
Metathesaurus Relationships
Symbolic relations: ~9 M pairs of concepts
Statistical relations: ~7 M pairs of concepts (co-occurring concepts)
Mapping relations: 100,000 pairs of concepts
Categorization: relationships between concepts and semantic types from the Semantic Network
24
Why you might care about the UMLS
Content with applicability outside of biomedicine
Tools generally useful in NLP, data mining
New Metathesaurus Rich Release Format
  Potentially useful as a format for distribution of any set of vocabularies/ontologies and for robust purpose-specific mappings between such systems
  May well lead to development of a variety of tools that can output or ingest the format