2006-03-21 9th Open Forum on Metadata Registries, Kobe, Japan
XMDR Project Overview
Frank Olken & Kevin D. Keck
{olken,kdkeck}@lbl.gov
Lawrence Berkeley National Laboratory
Presentation to Open Metadata Forum
Kobe, Japan
March 21, 2006
XMDR means:
Extended Metadata Registry
The Cast
● Bruce Bargmeyer (LBNL) = Principal Investigator
● Kevin Keck (LBNL) = architect & stds. (design)
● Frank Olken (LBNL) = content characterization & stds. (design)
● John McCarthy (LBNL) = prototype development (management)
● Karlo Berket (LBNL) = prototype development
● Harold Solbrig (Mayo) = content preprocessing via LexGrid, stds
● Gayle Hodge (USGS) = content characterization, acquisition
● Denise Warzel (NCI) = content acquisition, standards, design
● Larry Fitzwater (EPA) = program mgt. (vision, direction)
● Nancy Lawler (DOD) = program mgt. (vision, direction)
● Sam Chance (DOD) = program mgt. (vision, direction)
Organizational Cast
● Lawrence Berkeley National Laboratory
● Environmental Protection Agency
● National Cancer Institute
● Mayo Clinic
● United States Geological Survey
● Department of Defense
Goals
● Assist revisions of ISO/IEC 11179 Metadata Registry Standard to encompass additional semantic descriptions and resources
    Vocabularies, thesauri, etc.
    Ontologies
    Relationships
    Semantic types
● Design and implement prototype Extended Metadata Registry
● Load metadata content into prototype
● Demonstrate prototype
Why Metadata Registries?
● Facilitate reuse/standardization/integration/exchange of data
● Design time:
    Database / messaging / application / forms designers
    Data warehouse design
● Run-time:
    Query formulation / optimization
    Federated data query optimization / processing
    Extract, Transform, Load (ETL) for data warehouses
    Semantic services, composition, workflows, ...
● Users
    Finding, understanding data
    Understanding data entry forms
Why Standards?
● Developing metamodel to serve as design for next generation metadata registries
● Evolve ISO/IEC 11179 Metadata Registry Standard
    Edition 2 (current)
        ● UML modeling, relational DB technology implementation
    Edition 3 (new)
        ● UML + OWL (Web Ontology Language) / MOF (Meta Object Facility) / CL (Common Logic) modeling
        ● Add support for ontologies
More on Why MDR Standards?
● MDR Standards
    Can improve metadata creation practice
    Can improve metadata and data reuse
    Facilitate MDR adoption by organizations
    Facilitate MDR interoperability
    Facilitate MDR software marketing
    Facilitate MDR procurement
    Facilitate alignment / mapping among metadata schemas, ...
Proposed Changes to ISO/IEC 11179
● Support for ontologies, etc.
● More formal modeling of relationships
● Semantic types (?)
Changes to ISO/IEC 11179 Std.
● Add support for ontologies, vocabularies
    Add ontologies
    Add predicates (logical formulae)
    Add axioms (asserted to be true)
    Add support for modularization of ontologies
        ● Add inclusion mechanisms for concept systems and ontologies
        ● Assert axioms in context of containing ontology
Why add support for ontologies?
● More precise specification of data semantics (than natural language definitions)
● Machine processing of semantic specifications of data
    Classification, subsumption testing, alignment, spatial and temporal reasoning
● Reusable semantic specifications for subject domains
● Conceptual data models to facilitate data integration
● Encoding of much current work on data semantics and terminologies as ontologies
● Useful for machine learning.
Issues in Including Ontologies in ISO/IEC 11179
● Lack of agreement on logical formalisms
    FOL, description logic (which?), ...
● Hence, MDR std must be agnostic among logic formalisms
● Poses difficulties for:
    Standards specification
    MDR implementation
    MDR interoperability
● See work of OMG Ontology Definition Metamodel (ODM) standard
Changes to ISO/IEC 11179 Std.
● Formalize specification of semantic relationships
    Refinement of Edition 2 Classification Schemes
    Add relationships (types), roles, links (instances) among concepts
    Specify attributes of relationships
        ● Reflexivity, irreflexivity, symmetry, anti-symmetry, transitivity
    To support inference across semantic relationships
        ● e.g., transitive closure over is-a, part-of, ...
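The relationship attributes named on this slide can be checked mechanically over a set of links. What follows is a minimal Python sketch, assuming links of a binary relationship are stored as (source, target) pairs. The function name `relation_properties` and the sample part-of links are illustrative assumptions, not part of ISO/IEC 11179 or the XMDR prototype.

```python
from itertools import product

def relation_properties(links, domain=None):
    """Report which algebraic properties hold for a set of (source, target) links.

    `domain` is the set of concepts the relation ranges over; when omitted,
    it defaults to the concepts that appear in at least one link.
    """
    links = set(links)
    nodes = set(domain) if domain is not None else {n for pair in links for n in pair}
    return {
        "reflexive": all((n, n) in links for n in nodes),
        "irreflexive": all((n, n) not in links for n in nodes),
        "symmetric": all((b, a) in links for a, b in links),
        "antisymmetric": all(a == b or (b, a) not in links for a, b in links),
        # Every chain a -> b -> d must be matched by a direct link a -> d.
        "transitive": all((a, d) in links
                          for (a, b), (c, d) in product(links, repeat=2)
                          if b == c),
    }

# Toy part-of links (hypothetical content, for illustration only).
part_of = {("finger", "hand"), ("hand", "arm"), ("finger", "arm")}
props = relation_properties(part_of)
print(props)
```

Such a check only describes the links that are currently loaded. Whether a relationship type is declared transitive or symmetric in a registry remains a modeling decision.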
Relationship Modeling in ISO/IEC 11179 Edition 3
● Edition 2 has classification schemes and specialized relationships among various metamodel entities
● Proposed for Edition 3
    ● Binary and N-ary semantic relationships among concepts (a.k.a. relations)
    ● Treat data element concept, conceptual value domain, value meaning, etc. as subtypes of concept
    ● More detailed characterization of relationships:
        Roles / links
        Reflexivity, symmetry, anti-symmetry, transitivity, ...
Why care about relationship characterization?
● Who cares about reflexivity, irreflexivity, symmetry, transitivity?
● Answer: need this information for inference on semantic relationships (usually binary)
    Example: Does it make sense to compute transitive closure?
        ● Is-a: transitive
        ● Part-of: sometimes transitive
        ● Equals: transitive, symmetric
        ● Similar: usually symmetric, typically not transitive
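The question on this slide (when does it make sense to compute a transitive closure?) can be illustrated with a small Python sketch. It assumes each relationship type carries declared `transitive` and `symmetric` flags. The `RELATION_TYPES` table, the function names, and the sample links are hypothetical and are not taken from the XMDR prototype.

```python
from collections import defaultdict

# Hypothetical declarations, following the slide's informal characterization.
RELATION_TYPES = {
    "is-a": {"transitive": True, "symmetric": False},
    "similar": {"transitive": False, "symmetric": True},
}

def transitive_closure(links):
    """Return all (a, b) pairs such that b is reachable from a via one or more links."""
    successors = defaultdict(set)
    for a, b in links:
        successors[a].add(b)
    closure = set()
    for start in list(successors):
        seen = set()
        stack = list(successors[start])
        while stack:  # depth-first walk from `start`
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(successors.get(node, ()))
        closure.update((start, node) for node in seen)
    return closure

def infer(relation, links):
    """Expand asserted links using only the properties declared for the relation type."""
    declared = RELATION_TYPES[relation]
    result = set(links)
    if declared["symmetric"]:
        result |= {(b, a) for a, b in result}
    if declared["transitive"]:
        result = transitive_closure(result)
    return result

is_a = infer("is-a", {("poodle", "dog"), ("dog", "mammal")})
similar = infer("similar", {("a", "b"), ("b", "c")})
print(sorted(is_a))
print(sorted(similar))
```

Because "similar" is not declared transitive, no link between "a" and "c" is inferred. Without the declared flags, an inference engine would have no basis for making that distinction.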
Semantic Types for ISO/IEC 11179
● ISO/IEC 11179 Edition 2 has “datatypes”
    Associated with “value domain”
    i.e., datatypes are an aspect of representation, NOT semantics
● Semantic Types
    Concern meaning rather than representation
    Uses:
        ● Constraints over relationship roles
        ● Attribute of concepts, conceptual value domains, ...
        ● Ubiquitous in ontologies, schemas, ...
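One of the uses listed on this slide, constraints over relationship roles, can be sketched in Python as follows. The type hierarchy, the concept-to-type assignments, the relationship "located-in", and its role names are all invented for illustration. ISO/IEC 11179 does not define them, and this is only one of several ways semantic types could be modeled.

```python
# Hypothetical semantic type hierarchy: type -> supertype (None marks a top-level type).
SEMANTIC_SUPERTYPE = {"City": "Place", "Country": "Place", "Place": None, "Person": None}

# Hypothetical assignment of a semantic type to each concept.
CONCEPT_TYPE = {"Kobe": "City", "Japan": "Country", "Frank": "Person"}

# Hypothetical role constraints: relationship -> {role name: required semantic type}.
ROLE_CONSTRAINTS = {"located-in": {"located": "Place", "location": "Place"}}

def is_subtype(actual, expected):
    """True if `actual` equals `expected` or has it as an ancestor in the hierarchy."""
    while actual is not None:
        if actual == expected:
            return True
        actual = SEMANTIC_SUPERTYPE.get(actual)
    return False

def check_link(relation, **fillers):
    """Return a list of constraint violations for one link (an empty list means valid)."""
    errors = []
    for role, expected in ROLE_CONSTRAINTS[relation].items():
        concept = fillers.get(role)
        actual = CONCEPT_TYPE.get(concept)
        if not is_subtype(actual, expected):
            errors.append(f"{role}={concept!r} is {actual}, expected {expected}")
    return errors

ok = check_link("located-in", located="Kobe", location="Japan")
bad = check_link("located-in", located="Frank", location="Japan")
print(ok)
print(bad)
```

The check concerns meaning (is the role filler a kind of Place?) and is independent of any datatype used to represent the concept's values.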
Some Issues for Semantic Types
● Alternative approaches:
    Build semantic types into 11179 metamodel
    Reuse relationships for semantic type specifications
    Treat semantic types as unary predicates in ontologies + axioms
● Should we have a standard set of semantic types (at least base types)?
    Yes, for interoperability
    No, for flexibility
● Collection types, type constructors?
Why Construct A Prototype?
● To explore alternative revisions to ISO/IEC 11179
● To demonstrate that proposed revisions to ISO/IEC 11179 Metadata Registry Std. are:
    Feasible
    Useful
● To experiment with alternative architectures / technologies for constructing extended metadata registries
    Text retrieval engines – Lucene
    Inference engines – Jena, Kowari (?), ...
    Service oriented architecture (SOA)
● To facilitate deployment of revised ISO/IEC Metadata Registries
    Example implementation
    Open Source Code!
Why Content?
● Content characterization assists in shaping revisions to ISO/IEC 11179
● Content characterization assists in selection of content to load
● Content ingestion, installation, querying provide a means to exercise the prototype
    Testing
    Demonstration
    Performance evaluation
    Utility evaluation
Metadata Content Activities
● Content Characterization
    e.g., graph theoretic characterization
● Content Acquisition
● Content Preprocessing
    Into standard formats for loading (H. Solbrig)
● Content Loading
● Content Querying
Desiderata for Content Selection
● Accessibility
    Licensing, source cooperation, unclassified
● Documentation, familiarity to XMDR collaborators
● Funder interest
● Diversity of metadata types, subject areas
● Diverse graph structures (of semantic relationships)
● OWL encodings available
● Moderate size
● Opportunities for mappings among metadata sets
● Multi-linguality
Content Characterization
● Provenance: Name, source, contact, ...
● Type of metadata:
    thesauri, ontology, ISO/IEC 11179 metadata registry, ...
● Graph Characterization
    Tree, Faceted Classification, partial order (directed acyclic graph), cyclic graph, ...
● Size: # concepts, # links, # bytes
● Definitions ?
● File Formats
● OWL encoding ?
● Multilingual
● Availability / licensing issues
Why Graph-theoretic Content Characterization?
● Important structural taxonomy
● Impacts:
    Expressivity required of registry
    Content representation, index structures
    Search, matching algorithms
    Computational complexity of search, matching, ...
    Inference algorithms
    Computational complexity of inference
    Design / implementation / performance of metadata registries
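A graph-theoretic characterization of a metadata set can be computed from its links alone. The Python sketch below assumes content has been reduced to (child, parent) pairs, for example broader-term or is-a links. It reports the size measures and distinguishes a tree, a forest, a directed acyclic graph, and a cyclic graph. The function name `characterize` and the sample data are assumptions made for this example, and faceted classifications are not detected.

```python
from collections import defaultdict

def characterize(links):
    """Classify a set of (child, parent) links and report basic size measures."""
    links = set(links)
    nodes = {n for pair in links for n in pair}
    children = defaultdict(set)
    parent_count = {n: 0 for n in nodes}
    for child, parent in links:
        children[parent].add(child)
        parent_count[child] += 1

    # Kahn-style peeling: repeatedly remove concepts whose parents are all removed.
    # Concepts left over at the end lie on, or below, a cycle.
    remaining = dict(parent_count)
    ready = [n for n, k in remaining.items() if k == 0]
    peeled = 0
    while ready:
        node = ready.pop()
        peeled += 1
        for child in children.get(node, ()):
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    roots = sum(1 for k in parent_count.values() if k == 0)
    if peeled < len(nodes):
        shape = "cyclic graph"
    elif all(k <= 1 for k in parent_count.values()):
        shape = "tree" if roots == 1 else "forest"
    else:
        shape = "directed acyclic graph"
    return {"concepts": len(nodes), "links": len(links), "shape": shape}

tree = characterize({("dog", "mammal"), ("cat", "mammal"), ("mammal", "animal")})
dag = characterize({("bat", "mammal"), ("bat", "flyer"),
                    ("mammal", "animal"), ("flyer", "animal")})
cyclic = characterize({("a", "b"), ("b", "a")})
print(tree, dag, cyclic, sep="\n")
```

The shape matters for the impacts listed on this slide. For instance, ancestor lookups in a tree follow a single path, while a directed acyclic graph may require visiting several parents, and a cyclic graph additionally requires visited-set bookkeeping to terminate.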
Loaded content metadatasets
● National Cancer Institute Thesaurus (NCIT)
● Defense Technical Information Center (DTIC) Thesaurus
● General Multilingual Environmental Thesaurus (GEMET)
● Adult Mouse Anatomical Dictionary
● EPA Terms of the Environment
● ISO 3166 Country Codes
● ISO 4217 Currency Codes
Other Metadatasets of Interest
● NCI Cancer Data Standards Repository (caDSR)
● EPA Environmental Data Registry (EDR)
● NLM Unified Medical Language System (UMLS)
● USGS Geographic Names Information System (GNIS)
● Integrated Taxonomic Information System (ITIS)
● NBII Biocomplexity Thesaurus
● ISO 639 Language Identifiers
● Logical Observation Identifiers Names and Codes (LOINC)
● Getty Thesaurus of Geographic Names (TGN)
● NASA Semantic Web for Earth and Environmental Terminology (SWEET)
● Dublin Core Metadata (?)
Conclusions
● XMDR Activities
    ISO/IEC 11179 Revisions
        ● Support for ontologies, etc.
        ● Relationships
        ● Semantic types
    Prototype Development
    Content (characterization, loading, query)
    Prototype testing, performance evaluation, demos
Coming in Second Part of Talk (Kevin Keck):
● Detailed discussion of the architecture and technology of the prototype ...
Acknowledgements
● Financial support from U.S. Dept. of Defense, U.S. Environmental Protection Agency
● In kind contributions from U.S. National Cancer Institute, Mayo Clinic, US Geological Survey
● Support from program managers: Nancy Lawler (DOD) and Sam Chance (DOD)
● Comments on drafts of this talk by John L. McCarthy
Contact Information:
● Project: http://xmdr.org/
● Frank Olken: Lawrence Berkeley National Laboratory
    Email: olken@lbl.gov
    Tel: 510-486-5891
    URL: http://www.lbl.gov/~olken