Terminology Metadata Extension of the Service Meta Model: SWG Proposal, January 2008
Agenda
• Background (5 min)
• Review Proposed Model (15 min)
• Discussion (5 min)
• Vote
• Next Steps (5 min)
Team Members
• Tom Johnson (Mayo)
• Frank Hartel (NCI)
• George Komatsoulis (NCI)
• Sal Mungal (Duke)
• Hua Min (Fox Chase)
• Scott Oster (OSU)
• Mike Riben (MD Anderson)
• Brian Davis (3rd Millennium)
Background - Goals
• Goals:
  • Identify metadata queryable at the index service level
  • Narrow focus for first revision …
• Initial model defined to satisfy discovery use cases
  • Support development of an enhanced grid discovery client
  • Resolve runtime services for terminologies of interest
  • Additional metadata available through runtime services
• Allow/anticipate future expansion
Background – Use Cases
• Use Case Collection & Classification of Attributes
  • Identification
  • Internationalization
  • Intended/Allowed Usage
  • Provenance
  • Administration
Background - Use Cases
• Samples
  • Browse Existing Ontologies
  • Viewing Differences
  • Detecting Recently Added Ontologies
  • Web of Trust
(1) Browse Existing Ontologies
• An ontology developer is interested in creating an ontology for a domain (e.g., radiographic anatomy).
• Determines whether similar ontologies already exist in that domain.
  • Evaluates assigned categories for registered ontologies.
  • Discovers a match for “anatomy”.
  • Views available titles and descriptions.
  • Finds listings for “human” and “mouse” anatomy, but not “radiology”.
  • Looks at the human anatomy ontology to see if it fits the need.
Attributes: category, title, description
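As a rough illustration of use case (1), the Java sketch below filters an in-memory list of registered entries by category and returns each match's title and description. The `CategoryBrowse` class, its `Entry` record, and the case-insensitive substring match are hypothetical; they stand in for whatever query the index service actually exposes.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of use case (1): browse registered ontologies by category.
class CategoryBrowse {
    // Hypothetical projection of the attributes named in the use case.
    record Entry(String title, String description, List<String> categories) {}

    // Returns "title: description" for every entry with a category containing the term.
    // Case-insensitive substring matching is an illustrative assumption.
    static List<String> browse(List<Entry> registry, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return registry.stream()
                .filter(e -> e.categories().stream()
                        .anyMatch(c -> c.toLowerCase(Locale.ROOT).contains(needle)))
                .map(e -> e.title() + ": " + e.description())
                .toList();
    }
}
```

With human and mouse anatomy entries both categorized under “anatomy”, `browse(registry, "anatomy")` would return both listings, mirroring the scenario above.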
Background - Use Cases
(2) Viewing Differences
• An ontology developer wants to view what has changed between two versions of an ontology.
  • Retrieves a listing of registered terminology services.
  • Sorts by URI, then version.
  • Selects and resolves grid services for the differing versions.
  • Invokes runtime services to resolve and compare content.
Attributes: uri, version
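Use case (2) sorts registered services by uri and then version before resolving two versions. A minimal Java sketch of that ordering follows; the `VersionListing` class, the `Service` record, and its `endpoint` field are hypothetical, and comparing versions as plain strings is an assumption that may not fit every versioning scheme.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of use case (2): list services sorted by uri, then version.
class VersionListing {
    // Hypothetical projection; version is a String, as releaseVersion is in the model.
    record Service(String uri, String version, String endpoint) {}

    // Plain string ordering on version is an assumption; schemes such as
    // "07.12a" vs "2007" may need a domain-specific comparator.
    static final Comparator<Service> BY_URI_THEN_VERSION =
            Comparator.comparing(Service::uri).thenComparing(Service::version);

    // Returns a sorted copy, leaving the input list untouched.
    static List<Service> sorted(List<Service> services) {
        List<Service> copy = new ArrayList<>(services);
        copy.sort(BY_URI_THEN_VERSION);
        return copy;
    }

    // All registered versions of one terminology, in version order,
    // from which two can be picked for resolution and comparison.
    static List<Service> versionsOf(List<Service> services, String uri) {
        return sorted(services).stream().filter(s -> s.uri().equals(uri)).toList();
    }
}
```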
Background - Use Cases
(3) Detecting Recently Added Ontologies
• A user wants to contact the providers of new ontologies registered within the last quarter.
  • Queries registered ontologies by registration date.
  • Pulls point-of-contact information (source, curator, registration authority) from the listed items.
Attributes: registration date, registration authority, source, curator
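A sketch of use case (3) in Java, assuming registrations are available as in-memory records and reading “within the last quarter” as the three months up to a reference date. The `RecentRegistrations` class and its method names are hypothetical; the record fields mirror the attributes listed for this use case.

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical sketch of use case (3): contacts for recently registered ontologies.
class RecentRegistrations {
    record Registration(String title, LocalDate registrationDate,
                        String registrationAuthority, String source, String curator) {}

    // source and curator are optional (0..1) in the model, so show a placeholder when absent.
    private static String orNone(String value) {
        return value == null ? "(none)" : value;
    }

    static List<String> contactsForLastQuarter(List<Registration> all, LocalDate asOf) {
        LocalDate cutoff = asOf.minusMonths(3); // assumption: "last quarter" = 3 months up to asOf
        return all.stream()
                .filter(r -> !r.registrationDate().isBefore(cutoff)
                        && !r.registrationDate().isAfter(asOf))
                .map(r -> r.title()
                        + " | source=" + orNone(r.source())
                        + " | curator=" + orNone(r.curator())
                        + " | registrationAuthority=" + r.registrationAuthority())
                .toList();
    }
}
```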
Background - Use Cases
(4) Web of Trust
• Quality of ontologies:
  • User is aware that there are several anatomy ontologies and is unsure which to use.
  • Trusts certain ontology sources (anatomists) more than others.
  • Views ontology source to determine content origin.
  • Views intended and example use to consider alignment with the application.
  • Considers caBIG certification level.
Attributes: source, intended use, example use, certification level
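Use case (4) combines trusted sources with a minimum certification level. The Java sketch below shows one way a discovery client might shortlist candidates; the `TrustFilter` class is hypothetical, and it assumes the bronze, silver and gold levels listed under Administration are ordered from lowest to highest.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of use case (4): shortlist ontologies by source and certification.
class TrustFilter {
    // caBIG certification levels, declared lowest first so compareTo reflects rank.
    enum Certification { BRONZE, SILVER, GOLD }

    record Candidate(String title, String source, String intendedUse, Certification certification) {}

    // Keeps candidates from trusted sources at or above the minimum level,
    // highest level first. certification is optional (0..1), so nulls are excluded.
    static List<Candidate> shortlist(List<Candidate> all, Set<String> trustedSources,
                                     Certification minimum) {
        return all.stream()
                .filter(c -> trustedSources.contains(c.source()))
                .filter(c -> c.certification() != null
                        && c.certification().compareTo(minimum) >= 0)
                .sorted(Comparator.comparing(Candidate::certification).reversed())
                .toList();
    }
}
```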
Background – Model
• Focus of work on …
• Model alignment
• External … Incorporate feedback from review and alignment with relevant specifications and standards.
• Internal … Take better advantage of previously registered models and classes.
• Incorporating specific feedback on model classes and attributes.
Background - Alignment
• Specifications/standards considered …
  • Dublin Core
  • ISO 11179-2/3/6: classification, registries, admin
  • LexGrid/LexBIG model
  • National Center for Biomedical Ontology (NCBO) BioPortal
  • Public Health Information Network (CDC/PHIN)
  • Simple Knowledge Organization System (SKOS core)
  • UMLS Rich Release Format (RRF)
  • CTS/CTS2
Background – Model Alignment
• Findings …
  • No silver bullet
  • General alignment for defined items
    • All SWG items and definitions represented conceptually in one or more specifications
    • Adequate, but not perfect, alignment of semantics
    • Some name changes
  • Some new attributes identified
    • Supplement existing use cases
    • Generally not found to be required unless we add use cases
Model - Overview
class Domain Objects
The Domain class model captures essential information about objects in the domain.
TerminologyMetaData
+ category: String [0..n]
+ defaultLanguage: String = eng
+ description: String [0..1]
+ keyword: String [0..n]
+ localName: String [1..n]
- structure: StructureType
+ supportedContentType: String [1..n]
+ supportedLanguage: String [1..n]
+ title: String
+ type: typeEnum [0..1]
+ uri: String
TerminologyUsage
+ exampleUse: String [0..n]
+ intendedUse: String [0..n]
+ isRestricted: isRestrictedType
+ rights: String [0..n]
+ rightsHolder
TerminologyProvenance
+ curator [0..1]
+ releaseDate: Date
+ releaseFormat: String
+ releaseLocation: String
+ releasePackage: String
+ releaseVersion: String
+ source [0..1]
TerminologyAdmin
+ certification: certificationType [0..1]
+ registrationAuthority
+ registrationDate: Date
+ registrationStatus: registrationStatusType
+ registrationTag: String [0..n]
Associations (each 1 to 1):
• TerminologyMetaData hasStatus TerminologyAdmin
• TerminologyMetaData hasProvenance TerminologyProvenance
• TerminologyMetaData hasUsage TerminologyUsage
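To make the class diagram concrete, here is a hypothetical Java rendering with records. Only a few attributes per class are included, `isRestricted` is simplified to a boolean and `registrationStatus` to a String (both assumptions), and each 1-to-1 association becomes a required reference checked in the constructor. The `TerminologyModelSketch` wrapper class is likewise hypothetical.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Objects;

// Hypothetical Java rendering of the proposed model; only some attributes are shown.
class TerminologyModelSketch {
    record TerminologyUsage(List<String> intendedUse, boolean isRestricted) {}

    // source and curator are optional (0..1) and may be null.
    record TerminologyProvenance(String source, String curator, LocalDate releaseDate) {}

    record TerminologyAdmin(String registrationAuthority, LocalDate registrationDate,
                            String registrationStatus) {}

    // The 1..1 associations (hasUsage, hasProvenance, hasStatus) become required fields.
    record TerminologyMetaData(String uri, String title,
                               TerminologyUsage terminologyUsage,
                               TerminologyProvenance terminologyProvenance,
                               TerminologyAdmin terminologyAdmin) {
        TerminologyMetaData {
            Objects.requireNonNull(uri, "uri");
            Objects.requireNonNull(title, "title");
            Objects.requireNonNull(terminologyUsage, "hasUsage");
            Objects.requireNonNull(terminologyProvenance, "hasProvenance");
            Objects.requireNonNull(terminologyAdmin, "hasStatus");
        }
    }
}
```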
Model – Core Identification & Description
• uri (1)
  • Unique persistent identifier.
  • urn:oid:2.16.840.1.113883.6.2
• title (1)
  • Formal or published name for display.
  • International Classification of Diseases, 9th…
• localName (1..n)
  • Name used to refer to the terminology within a localized context; often a mnemonic.
  • ICD-9-CM, ICD-9
• description (0..1)
  • Human-readable explanation or narrative.
  • The International Classification of …
• category (0..n)
  • Applicable domains or scientific fields.
  • e.g. anatomy, genomic, proteomic, phenotype…
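The multiplicities above translate directly into validation rules. A minimal Java sketch, assuming a registration client checks only the required core identification attributes (uri, title, localName); `CoreIdentificationCheck` and `violations` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check of the required core identification attributes.
class CoreIdentificationCheck {
    // Returns one message per violated multiplicity; an empty list means valid.
    // description (0..1) and category (0..n) are optional, so they are not checked.
    static List<String> violations(String uri, String title, List<String> localName) {
        List<String> problems = new ArrayList<>();
        if (uri == null || uri.isBlank()) problems.add("uri is required (1)");
        if (title == null || title.isBlank()) problems.add("title is required (1)");
        if (localName == null || localName.isEmpty()) {
            problems.add("localName needs at least one value (1..n)");
        }
        return problems;
    }
}
```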
class Logical Model
TerminologyMetaData
+ category: String [0..n]
+ defaultLanguage: String = eng
+ description: String [0..1]
+ keyword: String [0..n]
+ localName: String [1..n]
- structure: StructureType
+ supportedContentType: String [1..n]
+ supportedLanguage: String [1..n]
+ title: String
+ type: typeEnum [0..1]
+ uri: String
• type (0..1)
  • Nature of content relative to the category.
  • application – describes the domain in an application-dependent manner
  • core – describes the domain in an application-independent manner
  • domain – describes the most important concepts in a domain
  • task – describes generic types of tasks or activities (e.g. selling, selecting)
  • upperLevel – describes general, domain-independent concepts (e.g. space, time)
• structure (1)
  • Indicates the complexity of maintained relationships.
  • flat – no hierarchy
  • simple – supports a single-inheritance, mono-hierarchical structure
  • complex – supports multiple relationships and/or relationship types
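The type and structure values map naturally onto Java enums. In this sketch the enum names `TypeEnum` and `StructureType` follow the diagram, while the `ContentClassification` class and its parsing helpers are hypothetical; unknown labels raise `IllegalArgumentException` from `Enum.valueOf`.

```java
import java.util.Locale;

// Hypothetical Java enums for the type and structure attributes.
class ContentClassification {
    // Values of `type` (0..1) as listed in the model.
    enum TypeEnum { APPLICATION, CORE, DOMAIN, TASK, UPPER_LEVEL }

    // Values of `structure` (1), from least to most complex relationships.
    enum StructureType { FLAT, SIMPLE, COMPLEX }

    // Maps lower camel case labels such as "upperLevel" to constants like UPPER_LEVEL.
    static TypeEnum parseType(String label) {
        String constant = label.replaceAll("([a-z])([A-Z])", "$1_$2").toUpperCase(Locale.ROOT);
        return TypeEnum.valueOf(constant);
    }

    static StructureType parseStructure(String label) {
        return StructureType.valueOf(label.toUpperCase(Locale.ROOT));
    }
}
```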
Model – Core Identification & Description
• defaultLanguage (1)
  • Language for text unless otherwise specified.
  • eng
• supportedLanguage (1..n)
  • Languages supported for text-based content.
  • eng, spa, …
• supportedContentType (1..n)
  • Supported types of text or embedded multimedia.
  • e.g. MIME type (text/plain, image)
• keyword (0..n)
  • Words or phrases of special significance.
  • patient record, nursing protocol, …
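One plausible reading of defaultLanguage and supportedLanguage is a fallback rule for text content: serve the requested language when it is supported, otherwise the default. The sketch below assumes three-letter language codes such as eng and spa; `LanguageSelection` and `resolve` are hypothetical names, and the fallback policy itself is an assumption.

```java
import java.util.List;

// Hypothetical fallback rule combining defaultLanguage and supportedLanguage.
class LanguageSelection {
    static final String DEFAULT_LANGUAGE = "eng"; // model default for defaultLanguage

    // Returns the requested code if the terminology supports it; otherwise the
    // terminology's defaultLanguage, or "eng" when that is unset.
    static String resolve(String requested, String defaultLanguage, List<String> supportedLanguage) {
        String fallback = (defaultLanguage == null) ? DEFAULT_LANGUAGE : defaultLanguage;
        if (requested != null && supportedLanguage.contains(requested)) {
            return requested;
        }
        return fallback;
    }
}
```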
Model - Usage
• intendedUse (0..n)
  • Human-readable description of intended use.
  • data integration
• exampleUse (0..n)
  • Human-readable example of use.
  • Integration of protein data.
• isRestricted (1)
  • Indicates whether use is subject to intellectual property restrictions.
  • true
• rights (0..n)
  • Human-readable description of IP rights.
  • NCI Thesaurus terms of use …
• rightsHolder (point of contact) (0..1)
  • Contact point for intellectual property rights.
  • National Cancer Institute
class Logical Model
TerminologyUsage
+ exampleUse: String [0..n]
+ intendedUse: String [0..n]
+ isRestricted: isRestrictedType
+ rights: String [0..n]
+ rightsHolder
Model - Provenance
• source (0..1)
  • Origin or provider of content.
  • National Center for Health Statistics (NCHS)
• curator (0..1)
  • Maintains the content in the release format (e.g. OWL, OBO, RRF).
  • National Library of Medicine
• releaseDate (0..1)
  • Date of availability in released format.
  • 2007-08-30
• releaseFormat (0..1)
  • Format as released by the curator.
  • e.g. OWL, OBO, RRF
• releaseLocation (0..1)
  • Location of the resource in the releaseFormat.
  • ftp://ftp1.nci.nih.gov/pub/cacore/EVS/NCI_Thesaurus/Thesaurus_07.12a.OWL.zip
class Logical Model
TerminologyProvenance
+ curator [0..1]
+ releaseDate: Date
+ releaseFormat: String
+ releaseLocation: String
+ releasePackage: String
+ releaseVersion: String
+ source [0..1]
Model - Provenance
• releasePackage (0..1)
  • Name of the composite ontology or meta-distribution containing the terminology as released.
  • e.g. UMLS, NCI_MetaThesaurus, BiomedGT
• releaseVersion (0..1)
  • Version identifier of the represented release.
  • 2007
Model - Administration
• registrationAuthority (1)
  • Party responsible for maintaining content on the grid.
  • National Cancer Institute
• registrationDate (1)
  • Date of grid availability or last change of registration status.
  • 2007-09-30
• registrationStatus (1)
  • Designation of terminology status in the life cycle.
  • Possible values from the 11179-3 registration life cycle status category.
• registrationTag (0..1)
  • Supports lookup by a version-agnostic designation.
  • development, test, production
• certification (0..1)
  • caBIG level of compliance.
  • bronze, silver, gold
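registrationTag lets a client ask for, say, the production version of a terminology without knowing its version number. A minimal Java sketch of such a lookup, assuming each registration carries a uri, a releaseVersion, and its tags; the `TagLookup` class, `Registered` record, and method names are hypothetical, and the rule that later registrations win for the same uri and tag is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical version-agnostic lookup by registrationTag.
class TagLookup {
    record Registered(String uri, String releaseVersion, List<String> registrationTag) {}

    // Builds uri -> (tag -> releaseVersion). A later registration with the same
    // uri and tag overwrites an earlier one (assumed conflict rule).
    static Map<String, Map<String, String>> index(List<Registered> all) {
        Map<String, Map<String, String>> idx = new HashMap<>();
        for (Registered r : all) {
            for (String tag : r.registrationTag()) {
                idx.computeIfAbsent(r.uri(), k -> new HashMap<>()).put(tag, r.releaseVersion());
            }
        }
        return idx;
    }

    // Resolves e.g. ("some-uri", "production") to a concrete releaseVersion, if registered.
    static Optional<String> versionFor(Map<String, Map<String, String>> idx, String uri, String tag) {
        return Optional.ofNullable(idx.getOrDefault(uri, Map.of()).get(tag));
    }
}
```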
class Logical Model
TerminologyAdmin
+ certification: certificationType [0..1]
+ registrationAuthority
+ registrationDate: Date
+ registrationStatus: registrationStatusType
+ registrationTag: String [0..n]
«enumeration» registrationStatusType
candidate, incomplete, preferredStandard, qualified, recorded, retired, standard, superseded
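The registrationStatusType enumeration can be carried into Java directly. This sketch converts between the model's lower camel case labels and enum constants; the `RegistrationStatus` class and its helpers are hypothetical, and accepting the variant spelling “superceded” as an alias is an assumption about existing data.

```java
import java.util.Locale;

// Hypothetical Java form of the registrationStatusType enumeration.
class RegistrationStatus {
    enum RegistrationStatusType {
        CANDIDATE, INCOMPLETE, PREFERRED_STANDARD, QUALIFIED,
        RECORDED, RETIRED, STANDARD, SUPERSEDED
    }

    // Converts a label such as "preferredStandard" to PREFERRED_STANDARD.
    // "superceded" is accepted as an alias for "superseded" (assumption).
    static RegistrationStatusType parse(String label) {
        String normalized = label.equals("superceded") ? "superseded" : label;
        String constant = normalized.replaceAll("([a-z])([A-Z])", "$1_$2").toUpperCase(Locale.ROOT);
        return RegistrationStatusType.valueOf(constant);
    }

    // Converts back to the lower camel case label, e.g. PREFERRED_STANDARD -> "preferredStandard".
    static String label(RegistrationStatusType status) {
        String[] parts = status.name().toLowerCase(Locale.ROOT).split("_");
        StringBuilder sb = new StringBuilder(parts[0]);
        for (int i = 1; i < parts.length; i++) {
            sb.append(Character.toUpperCase(parts[i].charAt(0))).append(parts[i].substring(1));
        }
        return sb.toString();
    }
}
```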
Model – Anticipated Alignment against Available Classes
class Domain Objects
TerminologyMetaData
+ abbreviation: java.lang.String [1..n]
+ category: java.lang.String [0..n]
+ defaultLanguage: java.lang.String = eng
+ description: java.lang.String [0..1]
+ keyword: java.lang.String [0..n]
- structure: StructureType
+ supportedContentType: java.lang.String [1..n]
+ supportedLanguage: java.lang.String [1..n]
+ title: java.lang.String
+ type: typeEnum [0..1]
+ uri: java.lang.String
TerminologyUsage
+ exampleUse: java.lang.String [0..n]
+ intendedUse: intendedUseType
+ isRestricted: isRestrictedType
+ rights: java.lang.String [0..n]
+ rightsHolder: java.lang.String
TerminologyProvenance
+ curator: java.lang.String [0..1]
+ releaseDate: Date
+ releaseFormat: java.lang.String
+ releaseLocation: java.lang.String
+ releaseVersion: java.lang.String
+ source: java.lang.String [0..1]
TerminologyAdmin
+ certification: java.lang.String [0..1]
+ registrationAuthority
+ registrationDate: Date
+ registrationStatus: registrationStatusType
+ registrationTag: java.lang.String [0..n]
Associations (each 1 to 1):
• TerminologyMetaData hasStatus TerminologyAdmin
• TerminologyMetaData hasProvenance TerminologyProvenance
• TerminologyMetaData hasUsage TerminologyUsage
Superclasses based on 11179
Vote
• Vote will be for …
  • Approval of the identified criteria
  • Acknowledgement that the model will be aligned with existing (e.g. 11179-based) superclasses, with model and attribute details to be addressed as required.