RDA Metadata Semantics Rich Metadata Semantics needed for human AND computer understanding but Mapping metadata schemas to ontologies can be a complicated procedure.... Metadata and Semantics Research Conference, since 2005 Gary Berg-Cross SOCoP, RDA US Advisory Committee

Upload: archibald-hensley

Post on 16-Jan-2016

TRANSCRIPT

Page 1

RDA Metadata Semantics

Rich Metadata Semantics needed for human AND computer understanding, but mapping metadata schemas to ontologies can be a complicated procedure....

Metadata and Semantics Research Conference, since 2005

Gary Berg-Cross
SOCoP, RDA US Advisory Committee

Page 2

Outline of Topics

1. Metadata: many Standards and some Ambitious MD Requirements
2. RDA Metadata-Semantic Discussions & Background
   1. Rich Metadata Semantics needed for human AND computer understanding
   2. Semantic approaches needed for MD schemas
      1. Adding formal semantics to metadata schemas for discovery, queries, mediation/linking and reasoning use can be a complicated procedure....
3. Illustrating 2 Semantic approaches
   1. Semantic Annotation
   2. Example of an Ontological Schema
4. Are we ready for metadata semantics to be widely used?
   • Where are the opportunities?
   • Can we agree on common or domain principles (like modularity or building blocks) or some formal semantic requirements?

Page 3

Recap on (Richer) Metadata Type Structure (includes Linked Data)

From: The potential of metadata for linked open data and its value for users and publishers by Anneke Zuiderwijk, Keith Jeffery, Marijn Janssen

Different types or degrees of semantics may be appropriate for different tasks.

LOD needs semantics for context...

CERIF provides “much richer metadata than the standards used commonly with LOD and so improves greatly the experience of the end user (or the advantages of providing metadata.)”

Page 4

Metadata & Standards Evolution from file system names/types & Describing DB Fields to MD Schemas for Exchange

Dublin Core: attaching categorical tags and descriptions via an MD schema

Attempt to make data more human-understandable: capture agreed-upon MD that affords understanding

The MD effort now requires many interacting pieces, including Metadata Application Profiles and workflow-like entities

Page 5

Strategy of “Modular” Theory of General and Domain Specific MD (and Ontologies)

Standardized Geo-specific metadata

Standardized BioMed-specific metadata

Standardized EarthScience-specific metadata

Trans-Domain (General Consensus) Metadata: ID, time.... ISO MD_Keywords: Discipline, Place, Stratum, Temporal, Theme?

Independent??

“Harmonized” and packaged together

Modules should be easier to create, validate, understand and maintain. They may be substituted for, used and reused for composition.

Support interoperability

Page 6

There are specific “standards” in domains
• [ISO 19115:2003] Geographic information -- Metadata
• [ISO 19115-2:2009] Geographic information -- Metadata -- Part 2: Extensions for imagery and gridded data
• In OGC’s O&M model Earth Observations generate “products” that have metadata.
• These are organized into a metadata profile organized as a schema

General MD

Other MD

Support bridging heterogeneity to achieve interoperability

Support data integration.

OGC Object Types: axis, axisDirection, datum, dataType, derivedCRSType, documentType, ellipsoid, featureType, groupMeaning....

Page 7

Some Metadata Challenges (Earth Science, from Ilya Zaslavsky, CINERGI* pipeline)

Common deficiencies in existing metadata descriptions:
1. Different metadata models and profiles
   1. Different details of requirements: mandatory and optional fields (Dublin Core vs ISO)
   2. Different meaning of fields and initial purpose/emphasis of data collection
   3. Different local interpretations of how these fields should be filled out (e.g. “authors” and “contacts” are often mixed up)
2. Different classifications of resource types
   1. Common resource types are: Organization, Webpage, Collection, Dataset (EPOS - Users, SW services, computing services)
3. Title may be non-descriptive
   1. insufficiently unique (“Roads”)
   2. meaningful, but opaque naming patterns (e.g. “AXXX34nn1”)
4. Keywords
   1. may be missing or may be too specific to domain
   2. may lack references to a thesaurus/CV or are freeform text
5. Info missing, such as Abstract; Contact saying “call”; location or time without reference; wrong URL
6. Grouping: a range of metadata records from a single source may be very similar (differing in only one parameter, e.g. location); they may be better discovered as a group of records
7. Duplicates
   • Several metadata records from different catalogs may point to the same physical dataset (or have overlapping subsets of distributions). Provenance issue?

* Community Inventory of EarthCube Resources for Geosciences Interoperability (http://workspace.earthcube.org/cinergi)
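Several of the deficiencies listed on this slide (non-descriptive titles, missing keywords or abstracts, near-identical records that differ only by location) can be checked mechanically. The following is a minimal sketch, not the CINERGI pipeline itself: the record layout, the field names and the two-word title threshold are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical catalog records; field names and values are illustrative only.
RECORDS = [
    {"id": "r1", "title": "Roads", "keywords": [], "abstract": "", "location": "A"},
    {"id": "r2", "title": "Stream gauge readings", "keywords": ["hydrology"],
     "abstract": "Daily readings.", "location": "B"},
    {"id": "r3", "title": "Stream gauge readings", "keywords": ["hydrology"],
     "abstract": "Daily readings.", "location": "C"},
]


def deficiencies(record, min_title_words=2):
    """Return a list of simple quality problems found in one record."""
    issues = []
    # Assumed heuristic: a title shorter than `min_title_words` is non-descriptive.
    if len(record.get("title", "").split()) < min_title_words:
        issues.append("non-descriptive title")
    if not record.get("keywords"):
        issues.append("missing keywords")
    if not record.get("abstract", "").strip():
        issues.append("missing abstract")
    return issues


def group_similar(records, varying_field="location"):
    """Group ids of records that are identical except for `id` and `varying_field`."""
    groups = defaultdict(list)
    for rec in records:
        # Build a hashable key from every field except the ones allowed to vary.
        key = tuple(sorted(
            (k, tuple(v) if isinstance(v, list) else v)
            for k, v in rec.items()
            if k not in ("id", varying_field)
        ))
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    for rec in RECORDS:
        print(rec["id"], deficiencies(rec))
    print(group_similar(RECORDS))
```

Checks like these only flag candidates; deciding whether two records really describe the same dataset is the provenance question the slide raises and still needs human or semantic support.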

Page 8

Are we Ready to Break the MD Bottleneck, make up for deficiencies & satisfy Ambitious MD Requirements?

Drawn in large part from RDA MD discussions and also the work of Anneke Zuiderwijk, Keith Jeffery, Marijn Janssen, and: Duval, Erik, et al. "Metadata principles and practicalities." D-lib Magazine 8.4 (2002): 16.

Easy to add, discover, download, access & exchange MD

Suitable representation for search, browsing & query

• Provide the possibility to link metadata.
• Recommend/advise to link with certain other datasets.
• Warn if linking two datasets does not make sense.
• Use a good URI strategy.
• Use identifiers (but which?).
• Use well-accepted vocabularies.
• Use well-accepted thesauri. (Ontologies?)
• Warn about linking when datasets have temporal aspects.
• Provide advice.
• Monitor links between data and make sure that they are still up to date.
• Make sure that linking is not just spatial, link to other domains as well.

Be consistent & support interpretation of data

Support bridging heterogeneity to achieve interoperability

Be sustainable

Researchers do not see value in metadata & its management tools (e.g. relational databases, wikis, etc.). There is a perceived cost of adding and maintaining metadata.

Support linking of data

So how do we satisfy these and create quality MD and/or extend it?

Support data integration.

Bridge different MD models, e.g. ISO vs DC. Fields may have different meanings.
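Bridging two metadata models is often started with a field-level crosswalk. The sketch below is an illustration only: the Dublin Core names on the left are real element names, the right-hand names are simplified ISO 19115 element names, and the mapping itself (and the choice of which fields to flag as lossy) is an assumption, not an official crosswalk.

```python
# Illustrative, incomplete crosswalk: Dublin Core element -> simplified
# ISO 19115 element name. Assumed for this sketch, not an official mapping.
DC_TO_ISO = {
    "title": "title",
    "description": "abstract",
    "subject": "keyword",
    "creator": "citedResponsibleParty",
}

# Fields whose meaning may shift in translation (assumption): ISO attaches a
# role to each responsible party, while DC "creator" carries no role.
LOSSY = {"creator"}


def crosswalk(dc_record):
    """Translate a flat DC record; return (iso_record, warnings)."""
    iso_record, warnings = {}, []
    for field, value in dc_record.items():
        target = DC_TO_ISO.get(field)
        if target is None:
            warnings.append(f"no mapping for '{field}'")
            continue
        iso_record[target] = value
        if field in LOSSY:
            warnings.append(f"'{field}' -> '{target}' may change meaning")
    return iso_record, warnings


if __name__ == "__main__":
    record = {"title": "Roads", "creator": "A. Researcher", "rights": "open"}
    iso, notes = crosswalk(record)
    print(iso)
    for note in notes:
        print("warning:", note)
```

A table like this captures only name correspondences. The warnings stand in for the point made on this slide: fields that line up by name may still differ in meaning, which is where richer semantics would be needed.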

Page 9

Broad View of Metadata (Schema) Status & Argument for More Semantics

Richness issue
• Even when done well, simple annotations and structured metadata are not rich enough to support ad hoc use & certainly not reasoning based on meaning.
• There are many MD schemas and a broad challenge is to link/integrate them.
• “Metadata schemas are created for resources’ identification and description and - most of the times - they do not express rich semantics. Even though the meaning of the metadata information can be processed by humans and its relationship to the described resource can be understood, for machine processing the actual relationships are frequently not obvious. In contrast to metadata schemas, ontologies provide rich constructs to express the meaning of data”

• Stasinopoulou, Thomais, et al. "Ontology-based metadata integration in the cultural heritage domain." Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers. Springer Berlin Heidelberg, 2007. 165-175.

Page 10

RDA Background & Outreach on Semantics

• A growing interest in the topic of semantic interoperability.
• The centrality of semantic issues was, for example, noted following the 1st Plenary.
• Semantic issues and technologies are already part of the discussion on the RDA Forum.

Research communities need to adopt and deploy technologies that help them get the most from their data, understand context, and infer meaning. The semantic web community has much to contribute to an enabling global infrastructure and it would be great to see greater involvement in the RDA.
• Fran Berman (Professor of Computer Science, RPI, Chair of the Research Data Alliance/U.S.)

• RDA should take on this issue but how? And who will participate?

Page 11

RDA Metadata and Semantics Intersect

• Data Foundations and Terminology (WG & IG)
• Data in Context IG
• Data Fabric IG
• Geospatial IG
• Marine Data Harmonization IG (ISO 19115 etc.)
• Broker IG
• Research Data Provenance.......
• Semantic Interoperability BoF at RDA P3
  • 3 Presentations to illustrate key concepts of SI & use of ontologies - Gary Berg-Cross & Yann Le Franc
  • Discussed Ontology Design Patterns and Lightweight methods
  • EUON effort
  • What is a quality ontology?
• 1st European Ontology Network (EUON) Workshop co-located at P4
  • http://www.eudat.eu/euon/euon-2014-workshop

Page 12

The Need for Some Semantics is (somewhat) Understood

1. MD needs to be a first-class, processable system, like a conceptual model, that is easier to use and manage and follows efforts to make data more understandable by computers.
2. Semantics helps address what MD annotations mean
   1. What the shared meanings are
   2. What the assumptions, such as relations between MD items, are
   3. How links to other data can be included

http://www.slideshare.net/ISSGC/session-48-principles-of-semantic-metadata-management

Principles and Foundations of Ontologies and Semantic Grids - Session 48. July 15th, 2009 Oscar Corcho (Universidad Politécnica de Madrid)

Restrictive

Page 13

How do we add Semantics to MD? Depends on Intended Use : Example of Semantic Annotations (HTML -> RDFa)

• Start with a collection of XHTML attributes in a web page
• Embed RDF annotations in the web pages using things like
  • DC and FOAF vocabularies, easily used for most simple annotations, e.g. Creator, title, contact info

Becomes

From Introduction to Semantic Technologies, Ontologies and the Semantic Web Aug 2010 #39

For data description and context the semantics added can be like a formal, conceptual model.
For search it can be like a better annotation of keywords using RDF.
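To make the HTML-to-RDFa idea on this slide concrete, here is a minimal sketch of a page fragment carrying DC and FOAF annotations, plus a small reader that turns them into subject/property/value triples. The fragment, the example.org URLs and the names are hypothetical, and the reader handles only the `about`, `property` and `content` attributes; it is not a conforming RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical XHTML fragment annotated with RDFa attributes (illustrative).
PAGE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     about="http://example.org/dataset/1">
  <h1 property="dc:title">Stream flow observations</h1>
  <span property="dc:creator">A. Researcher</span>
  <span property="foaf:mbox" content="mailto:a.researcher@example.org"></span>
</div>
"""


class SimpleRDFaExtractor(HTMLParser):
    """Collect (subject, property, literal) triples from a small RDFa subset."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.subject = None   # most recent `about` value seen
        self.pending = None   # property still waiting for its element text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # The literal is given directly in the `content` attribute.
            self.triples.append((self.subject, prop, attrs["content"]))
        else:
            # The literal is the element text, picked up in handle_data.
            self.pending = prop

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None

    def handle_endtag(self, tag):
        self.pending = None


def extract_triples(xhtml):
    parser = SimpleRDFaExtractor()
    parser.feed(xhtml)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    for triple in extract_triples(PAGE):
        print(triple)
```

The page still renders as ordinary HTML for a human reader, while the same markup yields machine-readable statements such as creator, title and contact info.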

Page 14

Beyond Vocabularies: Good Semantics Needs Appropriate Conceptualization of Properties

Connect properties like stream flow, level, pollutants, evapotranspiration etc. in a schema

[Diagram nodes: Water Body, Water Density, DensityUnit, Grams/cm3]

For connecting to Chem/BioChem ontologies there might be sub-categories of Physical Features for elements: optical, hardness, color

See Dumontier Lab ontologies to represent bio-scientific concepts and relations. http://dumontierlab.com/?page=ontologies

[Diagram: ChesapeakeBay IsA Water Body. Property labels shown: hasConstituent, hasFeature, hasDensityUnit, hasUnit, hasQuantity, hasValue, hasLayer ….. Node labels shown: Area, AreaQuantity, Real Number, Sq Miles.]
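The diagram labels on this slide can be read as a small graph of subject/property/object statements. The sketch below writes out one plausible reading as plain triples and walks them. The exact edges are an assumption reconstructed from the labels (the original drawing is not available in this transcript), and identifier spellings such as `WaterBody` and `GramsPerCm3` are normalised here for illustration.

```python
# One plausible reading of the slide's diagram as schema-level triples.
# Edge choices and identifier spellings are assumptions for illustration.
TRIPLES = [
    ("ChesapeakeBay", "isA", "WaterBody"),
    ("ChesapeakeBay", "hasFeature", "Area"),
    ("Area", "hasQuantity", "AreaQuantity"),
    ("AreaQuantity", "hasValue", "RealNumber"),
    ("AreaQuantity", "hasUnit", "SqMiles"),
    ("WaterBody", "hasFeature", "WaterDensity"),
    ("WaterDensity", "hasDensityUnit", "GramsPerCm3"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return all objects linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def reachable(start, triples=TRIPLES):
    """Return every node reachable from `start` by following any property."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for s, _, o in triples:
            if s == node and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


if __name__ == "__main__":
    print(objects("AreaQuantity", "hasUnit"))
    print(sorted(reachable("ChesapeakeBay")))
```

Keeping the unit and the value as separate, named properties of a quantity is what lets a program, and not just a reader, tell that an area is reported in square miles.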

Page 15

Ontology Design Patterns (ODPs) of Semantic Trajectory – Hydro/Ocean Observations as Annotations

• ODPs (aka microtheories): small, modular, & coherent schemas.
• Relatively autonomous but conceivably composable with other schemas.
• Environmental Observations fit into this schema.
• Fixes may be hydrometric feature observations & at some PoI (and offset Fix) for some point or period of time denoting important activities
• Observations including time series sets might be applied to something like streamflow or temperature plots or a pollution plume or data from an ocean glider
• You may query Schema:
  • “Show locations within Gulf of Mexico fishing area with colored dissolved organic matter”

[Diagram labels: Hydro Var & attr/data or value type of interest; HydroObject or moving device; HydroObs/Device; Paths & POIs have geometries, including Polygon Areas]

A Geo-Ontology Design Pattern for Semantic Trajectories, COSIT 2013: Yingjie Hu et al.
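The query quoted on this slide can be sketched against a very reduced trajectory structure: a list of fixes, each with a position, a time and attached observations. This is a simplification for illustration, not the pattern from the cited paper. The class names, the bounding box standing in for a polygon fishing area, the coordinates and the observation values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Fix:
    """A time-stamped position with observations attached to it."""
    lon: float
    lat: float
    time: str                                   # ISO 8601 timestamp
    observations: dict = field(default_factory=dict)


@dataclass
class Region:
    """Axis-aligned bounding box used here as a stand-in for a polygon area."""
    name: str
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, fix):
        return (self.min_lon <= fix.lon <= self.max_lon
                and self.min_lat <= fix.lat <= self.max_lat)


def fixes_with(trajectory, region, variable):
    """Return the fixes inside `region` that carry an observation of `variable`."""
    return [f for f in trajectory
            if region.contains(f) and variable in f.observations]


# Hypothetical glider track and a rough, unofficial box; values are placeholders.
TRACK = [
    Fix(-90.0, 27.5, "2014-01-01T00:00:00Z", {"cdom": 1.0}),
    Fix(-90.1, 27.6, "2014-01-01T01:00:00Z", {"temperature": 20.0}),
    Fix(-60.0, 40.0, "2014-01-02T00:00:00Z", {"cdom": 2.0}),
]
GULF = Region("hypothetical Gulf of Mexico fishing area", -98.0, 18.0, -80.0, 31.0)

if __name__ == "__main__":
    for f in fixes_with(TRACK, GULF, "cdom"):
        print(f.lon, f.lat, f.time)
```

Here `cdom` abbreviates colored dissolved organic matter. In an ontology-backed store the same question would be asked of the schema rather than of hand-written classes, but the shape of the query (a spatial filter plus an observed property) is the same.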

Page 16

Are we ready for metadata semantics to be widely used?

• How do we bring current MD practice and semantic practice together?
• What is a practical MD vision of this enhanced MD?
• Where are the opportunities?
• E.g. Is semantic annotation the sweet spot?
• Do we just expand MD tags to semantic annotations and if so how?
• What about ontology design patterns (ODPs)? Where are they useful?
• Thoughts on where to add semantics and its technology to MD in the data/MD cycle?
• How does it affect how data/MD repositories function?
• Some/considerable confusion about how MD should be integrated into information systems.
• Can we agree on common or domain principles (like modularity or building blocks), practices and tools to employ?