

Observations and Ontologies: Achieving semantic interoperability of environmental and ecological data

Mark Schildhauer (1), Shawn Bowers (2), Josh Madin (3), Matt Jones (1)

(1) NCEAS, UC Santa Barbara; (2) Gonzaga University; (3) Macquarie University

http://sonet.ecoinformatics.org

NCEAS-ACEAS Workshop, Brisbane May 2010

Motivation-- Critical questions

Need to answer increasingly complex and critical questions about the environment:

are the world’s fisheries sustainable?

how will climate change impact food production?

are GMO crops safe to introduce to the environment?

is deforestation accelerating climate change?

why are pollinators declining around the world?

will nanotech wastes alter ecosystems?

what are the effects of ocean acidification on reef corals?

can we predict the spread of an invasive species?

are there tipping points in environmental change?

Motivation-- Environmental Synthesis

Answering complex, critical environmental questions requires integrating and analyzing many types of data:

Local to large scale, global coverages

Fine-grain, high-resolution

Physical context: land-use/land-cover, geology, soils, atmosphere, hydrology, oceanography

Biotic context: from genes to ecosystems

Socioecology: traditions & customs, economics, governance

Good news-- more and more data

There is a growing deluge of environmental data to assist in these investigations …

Need for ecoinformatics

But…

locating desired information is already quite difficult: culling through irrelevant information (precision); failing to find all useful information (recall)

using the data you find is problematic: interpretation (units, context, methods); merging, transforming for re-use

Manual, ad-hoc, arduous

… Why?

Environmental Data-- State of Affairs

Environmental data are:

Stewarded/owned by many groups, individuals

Sparsely documented (metadata, data catalog)

Variably accessible via the Internet

Heterogeneous: broad range of relevant topics

The informatics challenge…

Environmental data are highly heterogeneous…

geospatial data-- point, line, polygon, raster

time series/monitoring data

tables, spreadsheets/csv

grids, matrices

normalized DBMS

• Variable structure

• Variable syntax (R, MATLAB, MySQL, .xls)

• Variable semantics (what is “temp”?)

Data Integration

Combining heterogeneous data is necessary for synthesis

Approaches:

Develop consistent data models within and across entire domains: “standardized schema”

“Describe” your data and its contents so that machines can process and integrate: “semantic mediation”


Impractical if not impossible to standardize schemas for all data sets being collected

Use emerging approaches of the Semantic Web [1]

[1] Berners-Lee, Hendler & Lassila 2001. The Semantic Web. http://www.scientificamerican.com/2001/0501issue/0501berners-lee.html

Semantic Data Integration

Metadata standards are a step in the right direction…

Expose data in standard schema for transfer

Dublin Core; ISO 19115 (geospatial metadata); Darwin Core (biodiversity specimen metadata); EML (Ecological Metadata Language); GeoSciML

All have XML implementations for document exchange

Can map one format to another to resolve minor differences

Importance of semantics

Descriptive metadata is insufficient: “semantics” are expressed in natural language

Inconsistent, imprecise, not standardized

The computer can’t “understand”: what is being measured; how measurements relate to one another; how semantics map to logical structure

Importance of semantics

Efficient, effective integration and subsequent analysis depends on understanding the semantic contextual relationships of each data measurement, as well as the relationships among measurements in a table structure or other data format.

Usually an expert provides this, or a data catalog

How to capture and expose for machine processing? Semantic Mediation!

Semantic Data Integration

Metadata cannot formally express complex constructs:

Define Specific Leaf Area

What type of weight measurement is involved in its calculation?

How is the SLA measurement in column 1 related to the plot ID measurement in column 2?

Metadata cannot provide native reasoning:

I measured a specimen that has a prehensile tail, an extrusible tongue and fused toes, and that eats insects

What is it? Can I know anything more about it?

Semantic Data Integration

Ontologies do not have these limitations…

Can express complex constructs:

SLA is an abbreviation that is a synonym for the functional trait called Specific Leaf Area, a measurement taken from a leaf, which is a part of a plant

SLA consists of an areal measurement divided by a dry weight measurement

Can natively reason:

The specimen has a prehensile tail, an extrusible tongue and fused toes, and eats insects

infer: specimen is a chameleon

infer: chameleon is a reptile

infer: specimen has stereoscopic eyes

infer: specimen may be able to change color
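The chain of inferences in the chameleon example can be imitated with a few lines of plain Python. This is only a toy sketch of what an OWL reasoner does far more generally: the dictionaries `DEFINED_CLASSES`, `SUBCLASS_OF` and `CLASS_FACTS` are hypothetical stand-ins for an ontology, and the trait strings are taken from the slide.

```python
# Toy stand-in for an ontology (hypothetical structure, not a real OWL file).
DEFINED_CLASSES = {
    # class -> traits that together are necessary and sufficient for membership
    "Chameleon": {"prehensile tail", "extrusible tongue", "eats insects", "fused toes"},
}
SUBCLASS_OF = {"Chameleon": "Reptile"}
CLASS_FACTS = {
    "Chameleon": ["has stereoscopic eyes", "may be able to change color"],
}

def classify(traits):
    """Return every defined class whose conditions are all met by the traits."""
    observed = set(traits)
    return [cls for cls, required in DEFINED_CLASSES.items() if required <= observed]

def superclasses(cls):
    """Follow subclass links upward and return the ancestors in order."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain

def infer(traits):
    """List the new facts that follow from the observed traits."""
    inferences = []
    for cls in classify(traits):
        lineage = [cls] + superclasses(cls)
        inferences += [f"specimen is a {c.lower()}" for c in lineage]
        for c in lineage:
            inferences += [f"specimen {fact}" for fact in CLASS_FACTS.get(c, [])]
    return inferences

specimen = ["prehensile tail", "extrusible tongue", "eats insects", "fused toes"]
for line in infer(specimen):
    print("infer:", line)
```

A specimen described by only some of the traits is not classified at all, which is the expected behaviour when the conditions are jointly sufficient.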

Formal Ontologies and Reasoners

Use W3C standard: Semantic Web http://www.w3.org/standards/semanticweb/

Expose data syntax, schema and semantics through a standardized language that computers can parse and interpret: OWL, the Web Ontology Language

OWL, RDF, XML Reasoners

What is an ontology?

A formal specification of concepts, and the relationships that may exist between those concepts.

How can ontologies help?

Classification and “reasoning”

Data discovery

Integration/merge: concept mapping, units conversion, spatial & temporal scaling

How can ontologies help?

Classification and “reasoning”: new “facts” derived from ontology; potential emergence

ArealDensity requires knowledge of Area and Abundance

If you have Area and Abundance, you might have ArealDensity
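The ArealDensity point can be sketched as a lookup of which derived characteristics become available once their inputs are present. The rule table `DERIVATION_RULES` and the function names are hypothetical, and the formula (abundance divided by area) is the usual meaning of an areal density rather than something stated on the slide.

```python
# Hypothetical rule table: derived characteristic -> characteristics it requires.
DERIVATION_RULES = {"ArealDensity": {"Area", "Abundance"}}

def derivable(available):
    """Return derived characteristics whose inputs are all available."""
    have = set(available)
    return sorted(
        derived
        for derived, needs in DERIVATION_RULES.items()
        if needs <= have and derived not in have
    )

def areal_density(abundance, area):
    """Assumed formula: individuals per unit area."""
    return abundance / area

print(derivable({"Area", "Abundance", "Height"}))
print(areal_density(abundance=50, area=25.0))
```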

How can ontologies help?

Classification and “reasoning”

Data discovery

Integration/merge

Analytical assistance: statistical inference, data types, data transformations

How can ontologies help?

Use OWL-DL (OWL2 RL) W3C Recommendation

Provides complete and consistent reasoning

Standard, free reasoners available: Pellet, FaCT++

Construct and visualize ontologies using free tools: Protégé, SWOOP, OWLIFIER tool (Josh)

How can ontologies help?

Can “define” Objects with equivalence classes

Specifies necessary and sufficient conditions

Reasoner will classify described Object

has Fur

locomotes Bipedal

native_to Australia

births UndevelopedYoung

has GoodJumpingAbility
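The difference between a class with only necessary conditions and a defined (equivalence) class can be sketched as follows. The property/value pairs come from the slide; the class names `Kangaroo` and `Mammal` are guesses made for illustration, since the transcript does not say which object the figure described, and `OntClass` is a hypothetical helper type.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OntClass:
    name: str
    conditions: frozenset   # (property, value) pairs
    defined: bool           # True: conditions are necessary AND sufficient

# Hypothetical classes built from the property/value pairs on the slide.
KANGAROO = OntClass(
    "Kangaroo",
    frozenset({
        ("has", "Fur"),
        ("locomotes", "Bipedal"),
        ("native_to", "Australia"),
        ("births", "UndevelopedYoung"),
        ("has", "GoodJumpingAbility"),
    }),
    defined=True,
)
# Only a necessary condition: having fur does not by itself make something a mammal here.
MAMMAL = OntClass("Mammal", frozenset({("has", "Fur")}), defined=False)

def classify(description, classes):
    """Assign only defined classes whose conditions are all satisfied."""
    return [c.name for c in classes if c.defined and c.conditions <= description]

described_object = set(KANGAROO.conditions)
print(classify(described_object, [KANGAROO, MAMMAL]))
print(classify({("has", "Fur")}, [KANGAROO, MAMMAL]))
```

In this sketch the object with fur alone receives no class, because the reasoner may only conclude membership from conditions declared sufficient.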

What do ontologies consist of?

Objects (terms): arranged in class (subsumption) hierarchies; can be described in terms of properties and relationships to other objects

Relationships: specify relationships between Objects; can be reflexive, symmetric, transitive (or not)

View of SBC-OBOE ontology in Protégé

Beyond SQL…

OWL DL            Symbol   Example

Restrictions:
someValuesFrom    ∃        hasPart some Leaf
allValuesFrom     ∀        isPartof only Plant
hasValue          ∋        hasCountryOfOrigin value Australia
minCardinality    ≥        hasStoma min 1
cardinality       =        hasStem exactly 1
maxCardinality    ≤        hasPetals max 100

Class constructors:
intersectionOf    ⊓        WoodyBark and RiparianHabitat
unionOf           ⊔        Tree or Bush
complementOf      ¬        not Grass
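To make the restriction types concrete, the sketch below checks a few of them against a tiny hand-built set of facts. The individuals (`plant1`, `leaf1`, `stem1`) and the helper functions are hypothetical. One important caveat: this code treats the facts as complete (closed world), whereas an OWL reasoner assumes the world is open and would not conclude allValuesFrom or a maximum cardinality merely from missing data.

```python
# Toy fact base (hypothetical individuals; property names follow the examples above).
TYPES = {"leaf1": {"Leaf"}, "stem1": {"Stem"}, "plant1": {"Plant"}}
PROPS = {
    ("plant1", "hasPart"): ["leaf1", "stem1"],
    ("plant1", "hasStem"): ["stem1"],
    ("plant1", "hasCountryOfOrigin"): ["Australia"],
    ("leaf1", "isPartof"): ["plant1"],
}

def fillers(individual, prop):
    """Values recorded for a property of an individual."""
    return PROPS.get((individual, prop), [])

def some_values_from(individual, prop, cls):
    """someValuesFrom: at least one filler belongs to cls."""
    return any(cls in TYPES.get(v, set()) for v in fillers(individual, prop))

def all_values_from(individual, prop, cls):
    """allValuesFrom: every recorded filler belongs to cls (closed-world check)."""
    return all(cls in TYPES.get(v, set()) for v in fillers(individual, prop))

def has_value(individual, prop, value):
    """hasValue: the property has this specific value."""
    return value in fillers(individual, prop)

def cardinality(individual, prop, at_least=0, at_most=None):
    """min/max/exact cardinality over the recorded fillers."""
    n = len(fillers(individual, prop))
    return n >= at_least and (at_most is None or n <= at_most)

print(some_values_from("plant1", "hasPart", "Leaf"))             # hasPart some Leaf
print(all_values_from("leaf1", "isPartof", "Plant"))             # isPartof only Plant
print(has_value("plant1", "hasCountryOfOrigin", "Australia"))    # value Australia
print(cardinality("plant1", "hasStem", at_least=1, at_most=1))   # hasStem exactly 1
```

The class constructors map onto ordinary boolean logic in the same style: intersectionOf is `and`, unionOf is `or`, and complementOf is `not` applied to a membership test.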

Model and define domain science concepts

Lots of domain ontologies emerging http://www.biofoundry.org

How to use these to advance data integration?

Model and define domain science concepts

http://www.biofoundry.org Mainly biomedical, genomics

Use of Ontologies

Genomics has largely homogeneous data

Ontologies “unify” vocabularies in model organisms (fruit fly, yeast, mouse, Arabidopsis etc.)

Many ontologies emerging

Are these useful for semantic mediation and data integration?

Nature of scientific data sets

Scientific data often in tables

Tables consist of rows (records) and columns (attributes)

The association of specific columns together (tuple) in a scientific data set is often a non-normalized (materialized) view, with special meaning/use for researcher

Individual cells contain values that are measurements of a characteristic of some thing

Semantic annotation

Data set (slide from J. Madin)

• computer doesn’t know that “Ht.” represents a “height” measurement

• computer doesn’t know whether Plot is nested within Site or vice-versa

• computer can’t determine whether the Temp applies to Site or Plot or Species

Observation defined

Observations in scientific data sets typically co-occur with other observations

Ontologies must assist with describing the inter-relationships among observations within and across datasets

Observational Data Model

Observation defined

An observation represents any measurement of some characteristic (attribute) of some real-world entity or phenomenon.

A measurement consists of a realized value of some characteristic of an entity, expressed in some well-specified units (drawn from a measurement standard)

Observations can provide context for other observations (e.g. observations of spatial or temporal information would often provide context for some other observation)

Measurements are taken using some protocol

Another definition for observation

An observation is an act that results in the estimation of the value of a feature property, and involves application of a specified procedure, such as a sensor, instrument, algorithm or process chain. The procedure may be applied in-situ, remotely, or ex-situ with respect to the sampling location… The key idea is that the observation result is an estimate of the value of some property of the feature of interest, and the other observation properties provide context or metadata to support evaluation, interpretation and use of the result.

(OGC Observations and Measurements, 2010-01-05)

Extensible Observation Ontology (OBOE)

A scientific Observation is

Measurement of the Value

of a Characteristic

of some Entity

in a particular Context

using some Protocol

Provides extension points for loading specialized domain ontologies

To generically describe the structure of scientific observation and measurement as would be found in a scientific data set

OBOE - Extensible Observation Ontology

Entities represent real-world objects or concepts that can be measured.

Measurements assign values and units to characteristics of observed entities.

Observations are made about particular entities.

Every measurement has a characteristic, which defines the property of the entity being measured.

Every measurement has a unit.

Observations can provide context for other observations.

Entities, through observations, can be associated with one or more measured characteristics.

A value is typically a cell in a data set.
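The statements above can be read as a small data model. The sketch below is one possible rendering with Python dataclasses; it is not the OBOE ontology itself (which is expressed in OWL), the class and field names are chosen here for illustration, and the numeric values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Measurement:
    characteristic: str             # property of the entity being measured
    value: float                    # typically one cell in a data set
    unit: str                       # every measurement has a unit
    protocol: Optional[str] = None  # how the measurement was taken, if known

@dataclass
class Observation:
    entity: str                                                   # what was observed
    measurements: List[Measurement] = field(default_factory=list)
    context: List["Observation"] = field(default_factory=list)    # observations that give context

def describe(obs):
    """Render an observation and its context as readable lines."""
    lines = [f"{m.characteristic} of {obs.entity} = {m.value} {m.unit}" for m in obs.measurements]
    for ctx in obs.context:
        lines += [f"  in context: {line}" for line in describe(ctx)]
    return lines

# Illustrative values only.
site = Observation("StudyArea", [Measurement("Temperature", 21.5, "Celsius")])
plant = Observation("Plant", [Measurement("Height", 1.2, "Meter")], context=[site])

for line in describe(plant):
    print(line)
```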

Extension points

Linking data values to concepts

Extensible Observation Ontology (OBOE)

OBOE provides a high-level abstraction of scientific observations and measurements

Enables data (or metadata) structures to be linked to domain-specific ontology concepts

Can inter-relate values in a tuple

Provides clarification of semantics of data set as a whole, not just “independent” values
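A minimal sketch of what such a link might look like for a table with columns Site, Plot, Species, Ht. and Temp. The annotation format, the choice that Temp applies to Site, the nesting of Plot within Site, and the sample row are all assumptions made for illustration; the actual annotation language used with OBOE is not shown in this transcript.

```python
# Hypothetical annotation: column -> (entity, characteristic, unit).
# unit is None for names and labels.
COLUMN_ANNOTATIONS = {
    "Site": ("Site", "Name", None),
    "Plot": ("Plot", "Name", None),
    "Species": ("Plant", "TaxonomicName", None),
    "Ht.": ("Plant", "Height", "Meter"),
    "Temp": ("Site", "Temperature", "Celsius"),
}
# Assumed nesting: the Plant observation has Plot as context, Plot has Site.
CONTEXT = {"Plant": "Plot", "Plot": "Site"}

def context_chain(entity):
    """Entities that provide context for the given entity, nearest first."""
    chain = []
    while entity in CONTEXT:
        entity = CONTEXT[entity]
        chain.append(entity)
    return chain

def interpret(row):
    """Turn one table row into (entity, characteristic, value, unit) statements."""
    statements = []
    for column, value in row.items():
        entity, characteristic, unit = COLUMN_ANNOTATIONS[column]
        statements.append((entity, characteristic, value, unit))
    return statements

# Made-up row for illustration.
row = {"Site": "A", "Plot": 3, "Species": "Picea rubens", "Ht.": 1.2, "Temp": 14.0}
for statement in interpret(row):
    print(statement)
print(context_chain("Plant"))
```

With the annotation in place, the values in one tuple are no longer independent: the height belongs to a plant, which sits in a plot, which sits in a site that has a temperature.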

OBOE - Domain concepts

Ecological

Paleontological

OBOE - Units

Standard and customized units and their relationships to one another can easily be loaded into OBOE

OBOE - Semantic units

Measurements can be of one or more characteristics of one or more entities (unit components)

Plant measured in StudyArea

StudyArea is on the Plant

OBOE - Context

Context provides essential semantic detail by linking Observations

OBOE - Context

Experimental design

Spatial & temporal scaling

“Smart” data merge

“Sensible” analysis


Data Integration with OBOE

Observations can be aligned for data integration ...

Figure (aligned result): Observation of a Tree with Measurements of Diameter in Meters, has-precision 0.1, has-value 1.3 and 3.2

Apply conversions based on alignments, e.g.:

• use common Entity and Characteristic concepts

• apply Unit conversions to values

• select lowest precision and apply

OBOE: Aligning Observations

Observations can be aligned for data integration ...

Picea rubens: Observation with a Measurement of Diameter in Meters (has-precision 0.01, has-value 1.25)

Abies balsa.: Observation with a Measurement of DBH in Centimeters (has-precision 10, has-value 320)

Two similar observations of trees

OBOE: Aligning Observations

Observations can be aligned for data integration ...

Figure: the same two observations (Picea rubens: Diameter, Meters, has-precision 0.01, has-value 1.25; Abies balsa.: DBH, Centimeters, has-precision 10, has-value 320) with alignment links added. Both entities are linked by isa to Tree, both units are linked by has-dimension to Length, and a further isa link relates the characteristics.

Align entities, characteristics, and standards
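The alignment of the two tree observations can be worked through numerically. In this sketch the mapping tables are hypothetical (in particular, treating DBH as a kind of Diameter is an assumption about what the figure showed), the species label is kept as it appears on the slide, and precision is assumed to be a power of ten so that `Decimal.quantize` can apply it. `Decimal` with half-up rounding is used because binary floats and Python's default rounding would turn 1.25 into 1.2 rather than 1.3.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical alignment tables.
ENTITY_ISA = {"Picea rubens": "Tree", "Abies balsa.": "Tree"}
CHARACTERISTIC_ISA = {"DBH": "Diameter"}  # assumed alignment
TO_METERS = {"Meters": Decimal("1"), "Centimeters": Decimal("0.01")}

observations = [
    {"entity": "Picea rubens", "characteristic": "Diameter", "unit": "Meters",
     "precision": Decimal("0.01"), "value": Decimal("1.25")},
    {"entity": "Abies balsa.", "characteristic": "DBH", "unit": "Centimeters",
     "precision": Decimal("10"), "value": Decimal("320")},
]

def align(obs_list):
    """Map to common concepts, convert to meters, and apply the coarsest precision."""
    # The lowest precision is the largest step size once expressed in meters.
    coarsest = max(o["precision"] * TO_METERS[o["unit"]] for o in obs_list).normalize()
    aligned = []
    for o in obs_list:
        meters = o["value"] * TO_METERS[o["unit"]]
        aligned.append({
            "entity": ENTITY_ISA.get(o["entity"], o["entity"]),
            "characteristic": CHARACTERISTIC_ISA.get(o["characteristic"], o["characteristic"]),
            "unit": "Meters",
            "precision": coarsest,
            "value": meters.quantize(coarsest, rounding=ROUND_HALF_UP),
        })
    return aligned

aligned = align(observations)
for a in aligned:
    print(a["entity"], a["characteristic"], a["value"], a["unit"], "precision", a["precision"])
```

Under these assumptions both records come out as Diameter of a Tree in Meters at precision 0.1, with values 1.3 and 3.2, which matches the numbers shown on the aligned slide.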

Observation Based Structured Query

• Both datasets contain “tree lengths”

• Annotation search for “tree length” would return both datasets

• Structured search allows the search to be limited by the observed entity (e.g. a tree or a tree branch)

• Increase precision and recall
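The contrast between the two kinds of search can be sketched with a small in-memory index. The dataset names and their annotations are invented for illustration: one dataset measures the length of whole trees, the other the length of tree branches while also naming the tree.

```python
# Hypothetical index: dataset -> list of (entity, characteristic) measurements.
ANNOTATIONS = {
    "dataset_A": [("Tree", "Length")],
    "dataset_B": [("Tree", "Name"), ("TreeBranch", "Length")],
}

def keyword_search(terms):
    """Match datasets that mention every term anywhere in their annotations."""
    wanted = set(terms)
    hits = []
    for name, pairs in ANNOTATIONS.items():
        concepts = {concept for pair in pairs for concept in pair}
        if wanted <= concepts:
            hits.append(name)
    return sorted(hits)

def structured_search(entity, characteristic):
    """Match only datasets where the characteristic is measured on that entity."""
    return sorted(
        name for name, pairs in ANNOTATIONS.items() if (entity, characteristic) in pairs
    )

print(keyword_search({"Tree", "Length"}))
print(structured_search("Tree", "Length"))
print(structured_search("TreeBranch", "Length"))
```

The keyword search returns both datasets because each mentions a tree and a length somewhere; the structured search separates them, which is where the gain in precision comes from.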

Example: “Sensible” data summarization

Leveraging annotations

Consistency checking

NOT sensible to summarize variables by “downstream” factors; e.g., Precipitation in the StudySite by TaxonomicName

IS sensible to summarize variables by “upstream” factors; e.g., Plant Height by StudySite or by Precipitation

IS sensible to summarize variables by factors in the same Observation; e.g., Plant Height by TaxonomicName or Precipitation by StudySite
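These three rules can be expressed as a check over the context links between observations. In this sketch the observation names `PlantObs` and `SiteObs` and the two lookup tables are hypothetical; "upstream" is taken to mean an observation that provides context for the one being summarized.

```python
# Hypothetical annotation: which observation each variable is measured in.
MEASURED_IN = {
    "PlantHeight": "PlantObs",
    "TaxonomicName": "PlantObs",
    "Precipitation": "SiteObs",
    "StudySite": "SiteObs",
}
# PlantObs has SiteObs as its context, so SiteObs is "upstream" of PlantObs.
CONTEXT = {"PlantObs": "SiteObs"}

def upstream(observation):
    """Observations that provide context for the given one, nearest first."""
    chain = []
    while observation in CONTEXT:
        observation = CONTEXT[observation]
        chain.append(observation)
    return chain

def sensible(variable, factor):
    """True if the factor is in the same observation as the variable or upstream of it."""
    variable_obs = MEASURED_IN[variable]
    factor_obs = MEASURED_IN[factor]
    return factor_obs == variable_obs or factor_obs in upstream(variable_obs)

for variable, factor in [
    ("Precipitation", "TaxonomicName"),
    ("PlantHeight", "StudySite"),
    ("PlantHeight", "Precipitation"),
    ("PlantHeight", "TaxonomicName"),
    ("Precipitation", "StudySite"),
]:
    print(f"{variable} by {factor}:", "sensible" if sensible(variable, factor) else "NOT sensible")
```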

Our Semantic Approach

Climbing the semantic ladder:

Ontologies

Semantic Annotations

Metadata

Data

Our Semantic Approach

Method for linking elements of data objects (e.g., columns in a table) to consistent and potentially rich sets of concepts

Semantic Annotations link EML attributes to concepts defined in a Formal Ontology

Store and retrieve annotations and ontologies in Metacat

Our “Semantic stack”

Semantic Annotation

Links data structures, via metadata, to ontology terms via OBOE

Actively working on materializing data result sets from these ontology-based queries

Investigating expressiveness of annotation language

Annotating to other data stores

Metacat Implementation

KNB metadata catalog

Stores EML (XML) and raw data objects

Extend to store Ontologies, domain and OBOE (OWL-DLs serialized in XML)

Extend to store Annotations (XML)

Jena to facilitate querying ontologies

Pellet to reason (consistency of ontologies; class subsumption)

Need for data interoperability

MANY different “semantic” efforts underway to unify data within earth/biodiversity/environmental disciplines, converging on use of OBSERVATIONAL data construct

SPECIALIZED needs and concerns of different domains may drive semantic technology solutions to be diverse and incompatible

OPPORTUNITY exists for communicating and coordinating among different domains to achieve greater interoperability of emerging semantic technology solutions

BENEFIT is providing cross-disciplinary scientists with more seamless and powerful access to a broad range of relevant data and information

USA NSF’s OCI INTEROP

This NSF crosscutting program supports community efforts to provide for broad interoperability through the development of mechanisms such as robust data and metadata conventions, ontologies, and taxonomies.

Support is provided… for consensus-building activities:

community workshops, web resources such as community interaction sites, and task groups. 

… and for providing the expertise necessary to turn the consensus into technical standards with associated implementation tools and resources: 

information sciences, software development, and ontology and taxonomy design and implementation.

Objectives of SONet

Broad Objectives

Address semantic interoperability issues in environmental (earth sciences) data [sharing, discovery, integration]

Build a network of practitioners (SONet), including domain scientists, computer scientists, and information managers

Build generic, cross-disciplinary data interoperability solutions

Immediate Goals to Develop

An extensible and open observations data model (“core model”) to unify existing domain-specific approaches

A semantic (ontology) framework for scientific terminology and corresponding domain extensions

Demonstration prototypes using these to address critical data interoperability issues

Prospective observation models…

Project        Domain                     Observational data model
TDWG/OSR       Biodiversity               Meta-model to integrate field observational data with specimen data
VSTO           Atmospheric sciences       Ontologies for interoperations among different meteorological metadata standards
ODM            Hydrology                  CUAHSI’s Observational Data Model for storing diverse hydrological data
SERONTO        Socioecological research   Ontology for integrating socio-ecological data
OGC’s O&M      Geospatial                 Observations and Measurements standard for enhancing sensor data interoperability
SEEK’s OBOE    Ecology                    Extensible Observation Ontology for describing data as observations and measurements

Variations of Observational Data Models

Developing a core model

Identify the key observational models in the earth and environmental sciences

Are these various observational models easily reconciled and/or harmonized?

Are there special capabilities and features enabled by some observational approaches?

What services should be developed around these observational models?

Working Groups

Subgroup 1: Core Data Model for Observations

Subgroup 2: Catalog of Common Field Observations

Subgroup 3: Scientist-Oriented Term Organization

Subgroup 4: Demonstration Projects

Subgroup 1

Collect interoperability requirements

Define common, unified data model

Engage tool & data providers, data consumers

Subgroup 2

Identify and catalog common observation types (semantics)

Engage data providers and information managers

Subgroup 3

Define general extension ontologies of scientific terms

Focus work on outputs of group 2

Engage range of domain scientists

Subgroup 4

Define and prototype demonstration projects

Ensure compatibility of subgroups

• Each group consists of two team leads

• Postdoc funded to work on demonstration projects & help ensure compatibility across subgroups

Core SONet Team

Goals

Identify and resolve commonalities and discrepancies among observational efforts

Define a common core observational model for data

Test with use cases (cross-disciplinary data integration tasks)

Where we are at…

Identifying and resolving commonalities and discrepancies among observational models—O&M (ISO track) and OBOE

Developing best-practices and design patterns for constructing observation-model compliant earth science ontologies, e.g. “measurement type”

Developing cross-disciplinary use cases that exercise data integration capabilities of semantic approach

Where we are at…

SEMTOOLS project

Testing and enhancing semantic mediation

Leveraging SONet observation data model

Building semantic querying and annotation capabilities into Morpho

Use cases include using ontologies for data integration involving ecology at an LTER site, salmon monitoring, and vegetation traits

Morpho semantic annotation interface…

Future directions…

Enabling semantic annotation onto disparate data resources

Ontologies for analysis

Ontologies for experimental design

Acknowledgments

Thanks to Chad Berkley, Ben Leinfelder, and Huiping Cao for ideas, implementation and slides.

This material is based upon work supported by:

The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, 0225676, 0619060, 0722079, 0743429.

The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus.

The Andrew W. Mellon Foundation.