Margaret Haber, RN, OCN
Co-Director, Enterprise Vocabulary Services
National Cancer Institute
RCRIM Vocabulary and Controlled Terminology for Clinical Research
Clinical Research Challenges
• Fundamental new capacities to characterize and intervene in biological systems and the disease process
• Hampered by our inability to integrate huge volumes of data due to information fragmentation
• Many diverse research and delivery platforms that are disconnected due to a lack of common, interoperable systems and semantics
• The problem is international in scope, with enormous implications for our ability to translate information into knowledge
No Controlled Terminology? No Interoperability
• Systems cannot exchange or use information if they use incompatible codes or tokens to signify meaning
• Terminology services provide tokens and codes
• Proper use of them assures consistent meaning across and among enterprises
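A minimal sketch of the point above, assuming two systems that each use their own local codes. The local codes "DF-07" and "aerosol" and both lookup tables are hypothetical; only the concept code C42887 (Aerosol Dosage Form) appears later in this deck. Once both local codes resolve to the same shared concept code, the two systems can agree on meaning.

```python
# Hypothetical local code tables for two systems. Only the shared
# concept code C42887 (Aerosol Dosage Form) comes from the slides.
SITE_A_TO_CONCEPT = {"DF-07": "C42887"}
SITE_B_TO_CONCEPT = {"aerosol": "C42887"}


def same_meaning(code_a: str, code_b: str) -> bool:
    """Return True when both local codes resolve to the same shared concept."""
    concept_a = SITE_A_TO_CONCEPT.get(code_a)
    concept_b = SITE_B_TO_CONCEPT.get(code_b)
    # An unmapped code has no agreed meaning, so it never matches.
    return concept_a is not None and concept_a == concept_b


if __name__ == "__main__":
    # The raw tokens differ, yet they signify the same concept.
    print("DF-07" == "aerosol")
    print(same_meaning("DF-07", "aerosol"))
```

Comparing the raw tokens directly fails; comparing through the shared terminology succeeds, which is the role a terminology service plays here.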
The Pillars of Interoperability: Necessary but not sufficient
• Common information models across all domains of interest
• A foundation of rigorously defined data types (metadata)
• A methodology for interfacing with controlled vocabularies
Interoperability Keys for Terminology
• Use of Industry Standards, where feasible
  • Must allow for extensions to core standards
  • Specialty terminology remains common
  • Mapping is therefore essential
• Conformance with Data Models
  • For process (logical models)
  • For data flow (messages)
  • For data at rest (database design)
Clinical Data Interchange Standards Consortium (CDISC)
CDISC is an open, multidisciplinary, non-profit organization committed to the development of worldwide industry standards to support the electronic acquisition, exchange, submission and archiving of clinical trials data and metadata for medical and biopharmaceutical product development.
HL7 (Health Level Seven)
HL7 is a volunteer, ANSI-accredited Standards Developing Organization (SDO) that focuses on clinical and administrative healthcare data.
Mission: "To provide standards for the exchange, management and integration of data that support clinical patient care and the management, delivery and evaluation of healthcare services. Specifically, to create flexible, cost effective approaches, standards, guidelines, methodologies, and related services for interoperability between healthcare information systems."
Bringing It All Together: RCRIM
• The HL7 “Regulated Clinical Research Information Management” Technical Committee formed as a collaboration of CDISC, FDA, and HL7
• To facilitate the development of common standards for clinical research information management across a variety of organizations, including government agencies, private research efforts, and sponsored research
• To develop standards for interchange of regulated data that are interoperable with general healthcare standards.
HL7 Vocabulary (including RCRIM)
• Value sets associated with certain domain portions of HL7 models
• Most vocabulary domains are published as informative references only
• Those domains that have a formal ballot status are shown in bold in the HL7 vocabulary tables on their web site
• There are current initiatives to map these values to standard controlled terminologies
HL7 Vocabulary - Access
• HL7 publishes at http://www.hl7.org/library/data-model/RIM/C30204/vocabulary.htm
• There are approximately 8,000 terms or “concepts” in the current HL7 vocabulary
• Scroll down to select a specific “table” or set of terms
• Also available through an NCI developed “HL7 SDK” (software development kit) application tool
• Conversion notes are included, see “HL7_Design.pdf” on NCI’s website
What’s Happening Now? CDISC, RCRIM and NCI
• CDISC terminology group has established an independent working environment at NCI for the specification and development of broad based clinical trials standard terminology, based on CDISC models (SDTM)
• Using the NCI Data Standards Repository (caDSR), which draws controlled terminology from NCI EVS systems, including but not limited to leveraging NCI Thesaurus resources for novel terminology development
Collaboration
These open standards, developed in collaboration with FDA, NIH, HL7 and industry experts, can provide the basis for a controlled terminology set submitted to HL7 RCRIM as proposed standards for adoption by the clinical trials community
Where? NCI Enterprise Vocabulary Services (EVS)
• Services and resources that address NCI and partners’ needs for controlled vocabulary: http://evs.nci.nih.gov/
• A collaboration:
  • NCI Office of Communications: Physician Data Query (PDQ), Clinical Trials Portal, Cancer Information Service and the NCI web portal www.cancer.gov
  • NCI Center for Bioinformatics: Bioinformatics Core Infrastructure (caCORE), including a metadata repository (caDSR) and object models built using EVS terminology for their core semantics
NCI EVS Goal – Integration by Meaning
Clinical, translational, and basic research terminologies have overlapping but specialized needs; EVS therefore helps to:
• Integrate different conceptual frameworks
• Create terminological and taxonomic conventions across systems
Vocabulary Products
• NCI Thesaurus – an ontology-like terminology
• NCI Metathesaurus – maps vocabularies
• External vocabularies maintained and served: MedDRA, HL7, NDF-RT, LOINC, etc.
NCI Thesaurus (NCIt)
• Reference Terminology for NCI, Partners
• A Federal Standard Terminology
• Broad coverage of the cancer, other research, and clinical domain including prevention and treatment trials
  • Neoplastic and other Diseases
  • Findings and Abnormalities
  • Anatomy, Tissues, Subcellular Structures
  • Agents, Drugs, Chemicals
  • Genes, Gene Products, Biological Processes
  • Animal Models – Mouse, other
  • Research techniques and management, apparatus, clinical trials, lab, radiology, imagery
NCI Thesaurus (2)
• Published Monthly
• Public domain, open content license
• Available on-line and by download (OWL, Ontylog XML, flat files)
• 55,000+ “Concepts” hierarchically organized
• Description-logic based “Roles” establish machine readable semantic relationships between Concepts
NCI Thesaurus is Deployed:
• http://nciterms.nci.nih.gov
• http://www.nci.nih.gov/EVS (full documentation)
• API: caCORE public access
• Fulfills NCI and collaborators’ needs for controlled vocabulary
• Public domain, open content license
Example Disease Concept
Gastric Mucosa-Associated Lymphoid Tissue Lymphoma
A low grade, indolent B-cell lymphoma, usually associated with Helicobacter pylori infection. Morphologically it is characterized by a dense mucosal atypical lymphocytic (centrocyte-like cell) infiltrate with often prominent lymphoepithelial lesions and plasmacytic differentiation. Approximately 40% of gastric MALT lymphomas carry the t(11;18)(q21;q21). Such cases are resistant to Helicobacter pylori therapy. -- 2003
Molecular abnormalities:
  Disease_May_Have_Cytogenetic_Abnormality: Trisomy 3
  Disease_May_Have_Cytogenetic_Abnormality: Trisomy 18
  Role group 1:
    Disease_May_Have_Cytogenetic_Abnormality: t(11;18)(q21;q21)
    Disease_May_Have_Molecular_Abnormality: API2-MLT fusion protein expression
Histogenesis:
  Disease_Has_Normal_Cell_Origin: Post-germinal center marginal zone B-lymphocyte
Pathology:
  Disease_Has_Abnormal_Cell: Centrocyte-like cell
  Disease_May_Have_Abnormal_Cell: Neoplastic monocytoid B-lymphocyte
  Disease_May_Have_Abnormal_Cell: Neoplastic plasma cell
  Disease_May_Have_Finding: Lymphoepithelial lesion
Anatomy:
  Disease_Has_Primary_Anatomic_Site: Stomach
  Disease_Has_Normal_Tissue_Origin: Gut associated lymphoid tissue
Clinical information:
  Disease_May_Have_Finding: Indolent clinical course
  Disease_May_Have_Associated_Disease: Hepatitis C
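To illustrate what "machine readable" roles make possible, here is a small sketch that stores a subset of the roles from the example concept above as (role, target) pairs and queries them. The data structure and function names are illustrative assumptions, not the NCI Thesaurus data model or API. Splitting "Has" from "May_Have" roles simply reads the role names literally.

```python
from collections import defaultdict

# A subset of the roles shown for Gastric MALT Lymphoma, copied from the slide.
ROLES = [
    ("Disease_May_Have_Cytogenetic_Abnormality", "Trisomy 3"),
    ("Disease_May_Have_Cytogenetic_Abnormality", "Trisomy 18"),
    ("Disease_May_Have_Cytogenetic_Abnormality", "t(11;18)(q21;q21)"),
    ("Disease_Has_Normal_Cell_Origin", "Post-germinal center marginal zone B-lymphocyte"),
    ("Disease_Has_Abnormal_Cell", "Centrocyte-like cell"),
    ("Disease_Has_Primary_Anatomic_Site", "Stomach"),
    ("Disease_May_Have_Associated_Disease", "Hepatitis C"),
]


def index_roles(roles):
    """Group role targets by role name so each role can be looked up directly."""
    by_role = defaultdict(list)
    for role, target in roles:
        by_role[role].append(target)
    return dict(by_role)


def has_roles(roles):
    """Keep only the "Has" roles, dropping the "May_Have" ones (by name only)."""
    return [(role, target) for role, target in roles if "_May_Have_" not in role]


if __name__ == "__main__":
    indexed = index_roles(ROLES)
    print(indexed["Disease_Has_Primary_Anatomic_Site"])
    for role, target in has_roles(ROLES):
        print(role, "->", target)
```

A program can then answer questions such as "which anatomic site is primary for this disease" without parsing free-text definitions.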
A holistic view of information exchange also requires broader interoperability, but where do we place the fences?
• Clinical data, regulatory submissions, discovery research?
• Industry agreements, nationally accredited, global standardization?
One answer is mapping: Relating Terminologies for Effective Data Exchange
Mapping: NCI Metathesaurus
• A filtered version of the NLM UMLS Metathesaurus, extended with additional required vocabularies
• 1,100,000 concepts, 2,200,000+ terms and phrases with definitions
• Mappings among over 55 vocabularies
• Extensive synonymy: Over 40,000 terms for neoplasms mapped to 7,000 concepts
• Used as online dictionary and thesaurus, for mapping and document indexing
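The many-terms-to-one-concept idea behind synonymy and document indexing can be sketched as follows. The concept identifier "HYPOTHETICAL-0001" is a placeholder, not a real Metathesaurus identifier, and the normalization rule is an assumption chosen for brevity. The two terms come from the disease example earlier in this deck.

```python
import re

# Placeholder concept identifier; the two terms appear in the earlier slide.
SYNONYMS = {
    "HYPOTHETICAL-0001": [
        "Gastric Mucosa-Associated Lymphoid Tissue Lymphoma",
        "Gastric MALT Lymphoma",
    ],
}


def normalize(term: str) -> str:
    """Lowercase and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", term.lower()).strip()


def build_lookup(synonyms: dict) -> dict:
    """Map every normalized term to the concept it names."""
    lookup = {}
    for concept_id, terms in synonyms.items():
        for term in terms:
            lookup[normalize(term)] = concept_id
    return lookup


def index_document(text: str, lookup: dict) -> list:
    """Return the sorted concept identifiers whose terms occur in the text."""
    padded = f" {normalize(text)} "
    return sorted({cid for term, cid in lookup.items() if f" {term} " in padded})


if __name__ == "__main__":
    lookup = build_lookup(SYNONYMS)
    print(index_document("Patient with gastric MALT lymphoma, stage I.", lookup))
```

Whichever synonym an author uses, the document is indexed under the same concept, which is what makes retrieval across vocabularies predictable.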
NCI Metathesaurus (2)
• Minor releases monthly, major releases two to three times a year
• Provides a mapped overlap and partial inter-relation of current versions of NCI and partner required vocabularies, e.g. the ICDs, MedDRA, SNOMED, MeSH (NLM Medical Subject Headings), HCPCS (procedures), LOINC (lab values), drug terminologies (VA NDF-RT, AOD, RxNORM, Multum, NCI Thesaurus drugs, etc.)
1/12/2006 #22
NCI Metathesaurus: Browser Example
EVS Products & Services Are Open
• NCI Thesaurus is Open Content: ftp://ftp1.nci.nih.gov/pub/cacore/EVS/ThesaurusTermsofUse.htm
• NCI Metathesaurus is Mostly Open Source
  • See Each Source’s License: http://ncimeta.nci.nih.gov/MetaServlet/GenerateSourcesServlet
• NCI EVS Servers Are Freely Accessible
  • On the Web: http://nciterms.nci.nih.gov and http://ncimeta.nci.nih.gov
  • Via API: http://ncicb.nci.nih.gov/core/caBIO
• All Software Developed by NCI EVS is Public Open Source and Free for the Asking: http://ncicb.nci.nih.gov/core
NCI builds on EVS via caCORE Infrastructure
• Enhanced information integration
• Cross-discipline reasoning capabilities
[Diagram layers: biomedical objects, common data elements, controlled vocabulary]
Enterprise Vocabulary
• NCI Metathesaurus (cross-mapped standard vocabularies, e.g. ICDs, MedDRA, SNOMED)
  • Semantic integration, inter-vocabulary mapping among 55+ vocabularies
  • UMLS Metathesaurus extended with numerous additional vocabularies
  • 1,100,000+ Concepts, 2,200,000 terms and phrases
• NCI Thesaurus
  • Description logic-based
  • 55,000+ “Concepts”
  • Concept is the semantic unit
  • One or more terms describe a Concept – synonymy
  • Semantic relationships between Concepts
• Freestanding terminologies
  • MedDRA, MGED, NDF-RT, GO, SNOMED, etc.
Common Data Elements (caDSR)
• Structured data reporting elements
• Precisely defined, harmonized questions and answers
  • Standardized questions for forms
  • Standard lists of coded valid values for answers
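A common data element pairs a standardized question with a list of coded valid values. The sketch below is a hypothetical illustration, not the caDSR schema or API: the class, the question text, and the placeholder code "PLACEHOLDER-1" are assumptions. Only the label "Aerosol Dosage Form" and its code C42887 come from the slides.

```python
from dataclasses import dataclass


@dataclass
class DataElement:
    """A standardized question plus its coded valid values (label -> code)."""
    question: str
    valid_values: dict

    def encode(self, answer: str) -> str:
        """Return the concept code for an answer, rejecting anything unlisted."""
        if answer not in self.valid_values:
            raise ValueError(f"{answer!r} is not a valid value for {self.question!r}")
        return self.valid_values[answer]


# Hypothetical element; "PLACEHOLDER-1" is not a real concept code.
DOSAGE_FORM = DataElement(
    question="What is the dosage form?",
    valid_values={
        "Aerosol Dosage Form": "C42887",
        "Some Other Dosage Form": "PLACEHOLDER-1",
    },
)

if __name__ == "__main__":
    print(DOSAGE_FORM.encode("Aerosol Dosage Form"))
    try:
        DOSAGE_FORM.encode("spray-ish thing")
    except ValueError as err:
        print("rejected:", err)
```

Rejecting free-text answers at entry time is what keeps data collected on different forms comparable later.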
Biomedical Information Objects (caBIO)
• UML object models representing clinical and research entities such as genes, sequences, chromosomes, pathways, etc.
• Public access APIs provide an information interface independent of back-end data platforms
Controlled Terminology is integrated into NCI’s standards supporting infrastructure
• Enterprise Vocabulary Services (EVS)
  • Core Semantics for caCORE and many other applications
  • Public access browsers
  • APIs
• cancer Data Standards Repository (caDSR)
  • ISO 11179 metadata repository
  • Common Data Elements (CDEs) for multiple templates, such as Case Report Forms, drawn from EVS terminology
• cancer Bioinformatics Infrastructure Objects (caBIO)
  • UML Models annotated with EVS concepts/terms, loadable into caDSR
  • Public access APIs
EVS: Extending Interoperability Beyond the Enterprise
• Leverage Collaborations
  • Federal: FDA, VA, CDC, other NIH Institutes
  • Major Standards Organizations: HL7, CDISC, W3C
  • Cancer Centers and Cooperative Groups (caBIG, caGRID)
  • Many research collaborators such as the Microarray Gene Expression Data Society (MGED)
FDA-NCI MOU
Significance of MOU
• NCI is leveraging its terminology-related resources to address FDA needs
  • Avoids expenditure at FDA to replicate existing, available resources at NCI, increases return on investment for NIH/NCI
• Leverages multiple efforts
  • FDA collaboration with NIH/NCI will result in improved trial drug and related regulatory terminology for the broader clinical trials community
  • FDA and NCI are to coordinate regarding terminology standards efforts such as HL7 RCRIM (including CDISC)
Example: NCI EVS and FDA SPL
• NCI EVS maintains and provides access to FDA SPL Terminology
• NCI Thesaurus will be a primary namespace used
• Also FDA standard terminology for the ICSR, IND/NDA, device nomenclature, others
• Access via:
  • Download at ftp://ftp1.nci.nih.gov/pub/cacore/EVS/
  • Public, open API http://ncicb.nci.nih.gov/core/caBIO
  • Web Servlet at http://nciterms.nci.nih.gov
Concept Details
URI: http://nciterms.nci.nih.gov:80/NCIBrowser/ConceptReport.jsp?dictionary=NCI_Thesaurus&code=C42887
Version: December 30, 2004 (04.12g)
Aerosol Dosage Form
Identifiers:
  name: Aerosol_Dosage_Form
  code: C42887
Information about this concept:
  Preferred_Name: Aerosol Dosage Form
  Semantic_Type: Manufactured Object
  DEFINITION: FDA|A product that is packaged under pressure and contains therapeutically active ingredients that are released upon activation of an appropriate valve system; it is intended for topical application to the skin as well as local application into the nose (nasal aerosols), mouth (lingual aerosols), or lungs (inhalation aerosols).
  Synonym with source data: AER|AB|FDA_CDER|246
  Synonym with source data: Aerosol Dosage Form|PT|NCI
  Synonym with source data: Aerosol|PT|FDA|246
  Synonym: AER
  Synonym: Aerosol
  Synonym: Aerosol Dosage Form
  Synonym: Aerosol Dose Form
Superconcepts:
  Pharmaceutical Dosage Form
Subconcepts:
  Aerosol Foam Dosage Form
  Aerosol Spray Dosage Form
  Metered Aerosol Dosage Form
  Powder Aerosol Dosage Form
This indicates the concept is used in the FDA Structured Product Label (SPL)
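The "Synonym with source data" entries in the concept report are pipe-delimited, so they are easy to split programmatically. The sketch below assumes the fields are term, term type, source, and an optional source code; that reading is inferred from the three entries shown and is not taken from NCI documentation. The URL helper only rebuilds the address string shown on the slide and makes no claim that the historical server still responds.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlencode


class SourcedSynonym(NamedTuple):
    term: str
    term_type: str               # e.g. "PT" or "AB" as shown on the slide
    source: str                  # e.g. "NCI", "FDA", "FDA_CDER"
    source_code: Optional[str]   # present on some entries only


def parse_synonym(raw: str) -> SourcedSynonym:
    """Split one pipe-delimited entry (assumed layout: term|type|source[|code])."""
    parts = raw.split("|")
    if len(parts) not in (3, 4):
        raise ValueError(f"unexpected field count in {raw!r}")
    source_code = parts[3] if len(parts) == 4 else None
    return SourcedSynonym(parts[0], parts[1], parts[2], source_code)


def concept_report_url(code: str, dictionary: str = "NCI_Thesaurus") -> str:
    """Rebuild the concept report address in the form shown on the slide."""
    base = "http://nciterms.nci.nih.gov:80/NCIBrowser/ConceptReport.jsp"
    return f"{base}?{urlencode({'dictionary': dictionary, 'code': code})}"


if __name__ == "__main__":
    for raw in ("AER|AB|FDA_CDER|246", "Aerosol Dosage Form|PT|NCI", "Aerosol|PT|FDA|246"):
        print(parse_synonym(raw))
    print(concept_report_url("C42887"))
```

Filtering the parsed entries by source is one way a consumer could pick out the FDA-specific labels for a concept.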
A Vital Collaboration: CDISC and NCI – Shared models, metadata standards, and core semantics drawn from standard terminology
• CDISC terminology group is working with NCI tools through EVS for the specification and development of broad based clinical trials standard terminology, based on CDISC models
• CDISC is using the NCI Data Standards Repository and controlled terminology from NCI EVS, including but not limited to NCI Thesaurus, for novel terminology development
• These open CDISC standards, developed in collaboration with FDA, NIH, HL7 and others, can provide the basis for a controlled terminology set able to be adopted across the clinical trials community
NCIt Browser: CDISC Tagged Concept
Terminology concept for Race showing harmonization of different users, including CDISC, NCI, CDC, etc.
NCI Thesaurus Concept: Race
Benefits of Terminology Development in a Common Environment
A Step Towards Semantic Interoperability
• Support and maintenance of terminologies in NCI EVS provides access to and common usage of standard terminologies
• Enables use of controlled terminology by clinicians and researchers for data encoding, retrieval, reporting, and aggregation
• Facilitates collaboration and information exchange by increasing the ability to predictably use information that is gathered
• Leverages the power of shared knowledge
You can collaborate
• Joint Participation: In standards groups such as HL7 RCRIM in order to inform relevant standards decisions
• Joint Development: Contributing to clinical trials standard terminology development efforts, e.g. through the CDISC terminology group
• Providing validation and testing: Content and modeling developed with industry input is more robust, better able to meet your needs, and you can better plan/anticipate implementation/impacts on your organization
Participate in HL7 RCRIM
• The HL7 “Regulated Clinical Research Information Management” Technical Committee, formed as a collaboration of CDISC, FDA, and HL7
• To facilitate the development of common standards for clinical research information management across a variety of organizations, including government agencies, private research efforts, and sponsored research
• To develop standards for interchange of regulated data that are interoperable with general healthcare standards.
Contact:
Margaret W. Haber, RN, OCN
Co-Director
NCI Enterprise Vocabulary Services
NCI Office of the [email protected]
http://evs.nci.nih.gov/