data facilities workshop - panel on current concepts in data sharing & interoperability



DESCRIPTION

This series of presentations was given at the EarthCube Data Facilities End-User Workshop, held January 15-17, 2014 in Washington, DC. The workshop provided a forum to discuss the unique requirements and challenges of developing the communication, collaboration, interoperability, and governance structures that will be required to build EarthCube in conjunction with existing and emerging NSF/GEO facilities. This panel and discussion outlined and explained several current concepts in data sharing and interoperability, featuring presentations by:

• Paul Morin (UMN): Polar Cyberinfrastructure
• Don Middleton (UCAR): Atmospheric/Climate
• Kerstin Lehnert (LDEO): Domain Repositories & Physical Samples
• David Schindel (CBOL, GRBio): Biological Perspective & Collections
• Hank Loescher (NEON): Observation Networks
• Daniel Fuka (Virginia Tech) and Ruth Duerr (NSIDC): Brokering
• Ilya Zaslavsky (UCSD): Cross-Domain Interoperability

TRANSCRIPT

Page 1: Data Facilities Workshop - Panel on Current Concepts in Data Sharing & Interoperability

PANEL DISCUSSION – CURRENT CONCEPTS IN DATA SHARING & INTEROPERABILITY

EarthCube Data Facilities Workshop

Wednesday, January 15th 2014

Page 2
Page 3
Page 4
Page 5

D-10 White Island AWS

Page 6

Archive includes 5 satellites

New tasking is WV-1 and 2

• GeoEye
• QuickBird
• IKONOS
• WorldView-1
• WorldView-2

Page 7

PGC Imagery Viewers · June 24, 2013

Page 8


Page 9


Page 10


Page 11
Page 12

August 13, 2013 · Data Facility Workshop; Arlington, VA.

Data System Interoperability and Standards for UCAR/NCAR and Collaborative Activities

Don Middleton (on behalf of many others)
University Corporation for Atmospheric Research
U.S. National Center for Atmospheric Research
Computational and Information Systems Laboratory
Boulder, Colorado, USA

Page 13

Data Cyberinfrastructure for “Big Head” and “Long Tail” Scientific Research

Computational and Information Systems Laboratory (CISL) and Earth System Laboratory (NESL)

High Altitude Observatory (HAO)

Mauna Loa Solar Observatory

Earth Observing Lab (EOL)

Field Project Archive

Research Data Archive

Community Data Portal

Earth System Grid

ACADIS Arctic Gateway

UCAR Unidata: netCDF, THREDDS, TDS, LDM, IDV, Rosetta

These systems federate in various ways among themselves, across organizations such as ACADIS, and with external programs such as GCMD, the UN/WMO WIS, ESGF, TIGGE, and others.

ACADIS is a joint venture of NCAR EOL & CISL, the National Snow and Ice Data Center, and UCAR Unidata.

NCAR Wyoming Supercomputing Center, Cheyenne. Disk, archive, and computational resources.
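The federation described above typically starts with harvesting catalogs. As a minimal sketch, assuming the THREDDS InvCatalog 1.0 XML namespace, the Python below walks one catalog document and collects dataset access paths (`urlPath`) plus links to child catalogs (`catalogRef`). The inline catalog and the `walk_catalog` helper are hypothetical; a real harvester would fetch catalogs over HTTP and follow the child links recursively.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by THREDDS client catalogs (InvCatalog 1.0) and by XLink.
NS = {
    "thredds": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical catalog, inlined so the sketch runs without network access.
CATALOG_XML = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink" name="Example Arctic collection">
  <dataset name="Field campaign">
    <dataset name="Surface met, day 1" urlPath="campaign/met_day1.nc"/>
    <dataset name="Surface met, day 2" urlPath="campaign/met_day2.nc"/>
    <catalogRef xlink:href="soundings/catalog.xml" xlink:title="Soundings"/>
  </dataset>
</catalog>"""


def walk_catalog(xml_text):
    """Return (datasets, child_catalogs) found anywhere in one catalog document."""
    root = ET.fromstring(xml_text)
    datasets = [
        (ds.get("name"), ds.get("urlPath"))
        for ds in root.iter(f"{{{NS['thredds']}}}dataset")
        if ds.get("urlPath")  # container datasets have no access path of their own
    ]
    children = [
        ref.get(f"{{{NS['xlink']}}}href")
        for ref in root.iter(f"{{{NS['thredds']}}}catalogRef")
    ]
    return datasets, children


datasets, children = walk_catalog(CATALOG_XML)
print(datasets)
print(children)
```

The child catalog links are what a crawler would queue next; the `urlPath` values would be combined with a service base URL from the catalog's `service` elements, which this sketch omits.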

Page 14

Data Users and Publishers (SelfPub)

ACADIS Gateway

Federation with Other Systems (GCMD, WMO, ADE)

EOL ACADIS Collections (via THREDDS)

NSIDC Arctic Collections (via Brokering)

Future Federated Collections

Core Technology
• Spring Framework
• Hibernate
• Liquibase
• Apache SOLR
• OpenID4Java/OpenSAML
• OAI-PMH, OpenSearch
• ActiveMQ
• FreeMarker
• Java NetCDF Library
• DOIs via EZID/DataCite

Catalog Harvester (OpenSearch, DIF, THREDDS)

OAI-PMH Repository (DC, DIF, ISO)

Discovery Services (Apache SOLR)

Identity Management (OpenID, SAML)

Data Services, Access Control

Publishing Services

Metadata and Database Services

Metrics

HPSS

RDB, NWSC GLADE, ACADIS Arctic Collections

Automated Modeling and Observation Systems

RESTful Pub Services

BagIt (from the LoC)

ACADIS is sponsored by NSF/GEO/PLR
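The ACADIS stack lists both an OAI-PMH repository and a catalog harvester. A minimal client-side sketch of OAI-PMH paging follows: a `ListRecords` request with `metadataPrefix=oai_dc`, then follow-up requests that carry only the `resumptionToken`, as the OAI-PMH 2.0 protocol requires. The base URL, the record content, and the helper names are hypothetical, and the response is inlined rather than fetched.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption request carries only verb + token."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return ([(identifier, dc_title), ...], resumption_token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{OAI}record"):
        identifier = rec.findtext(f"{OAI}header/{OAI}identifier")
        title = rec.findtext(f".//{DC}title")
        records.append((identifier, title))
    token = root.findtext(f".//{OAI}resumptionToken")
    return records, (token or None)  # an empty token marks the last page


# Hypothetical one-record response page.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2014-01-15T12:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">https://repo.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo.example.org:dataset-1</identifier>
        <datestamp>2013-12-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sea ice thickness transects</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://repo.example.org/oai")
records, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("https://repo.example.org/oai", resumption_token=token)
print(first_url)
print(records)
print(next_url)
```

A harvester loops until the token comes back empty; other metadata formats (DIF, ISO) would use a different `metadataPrefix` and a different parser for the `metadata` element.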

Page 15

The Chronopolis Data Preservation Network
• A consortium of UCSD Libraries, SDSC, Univ. of Maryland, and NCAR
• Using LoC BagIt for deposits
• Based on iRODS and ACE (Audit Control Environment)
• TRAC-certified (i.e., ISO 16363)
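To illustrate what a BagIt deposit and a fixity audit involve, here is a minimal sketch following the BagIt layout (RFC 8493): a `bagit.txt` declaration, payload files under `data/`, and a `manifest-sha256.txt` listing checksums. The `audit_bag` function is a simplified stand-in for the periodic integrity checking that ACE performs; it is not ACE's algorithm. File names and contents are hypothetical, and only a flat `data/` directory is handled.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def make_bag(bag_dir, files):
    """Write a minimal bag: bagit.txt, payload under data/, manifest-sha256.txt."""
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True)
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    lines = []
    for name, content in sorted(files.items()):
        target = data / name
        target.write_bytes(content)
        lines.append(f"{sha256_of(target)}  data/{name}")  # checksum, whitespace, path
    (bag / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")


def audit_bag(bag_dir):
    """Recompute payload checksums and return the paths that fail the fixity check."""
    bag = Path(bag_dir)
    failures = []
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        expected, rel_path = line.split(maxsplit=1)
        if sha256_of(bag / rel_path) != expected:
            failures.append(rel_path)
    return failures


with tempfile.TemporaryDirectory() as tmp:
    bag_path = Path(tmp) / "deposit-bag"
    make_bag(bag_path, {"obs.csv": b"t,temp\n0,-12.5\n"})
    clean_result = audit_bag(bag_path)  # freshly written bag should pass
    (bag_path / "data" / "obs.csv").write_bytes(b"corrupted")
    damaged_result = audit_bag(bag_path)  # simulated bit rot is detected
print(clean_result, damaged_result)
```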

Page 17
Page 18

Physical Samples in

courtesy of: Lesley Wyborn, Geoscience Australia (talk at the IGSN workshop at IGC 2012)

EarthCube Data Facilities Workshop

Page 19

Connection to Digital Data

Access to the physical samples is needed to verify & reproduce published observations.

Access to sample metadata is needed for proper interpretation and re-use of sample-based data.

Access to both is needed to facilitate sharing of samples for use & re-use.

▪ Samples are often expensive to collect (drilling, remote locations).

▪ Many samples are unique and irreplaceable.

▪ Re-analysis augments utility of existing data.


Page 20

Samples in EarthCube End-User Domains

• Geochemistry
• Structural Geology and Tectonics
• Experimental Stratigraphy
• Critical Zone Community
• Envisioning a Digital Crust
• Cyberinfrastructure for Paleogeoscience
• Petrology and Geochemistry
• Inland Waters
• Deep Seafloor Processes and Dynamics
• Coral Reef Systems Science
• Geochronology
• Rock Deformation and Mineral Physics Research


Page 21

Key Challenges/Needs

“Global Access to Global Collections: establish repositories for all physical samples and the biological, geochemical and physical measurements made from those samples.” (Paleogeoscience)

“Poor and uneven access and management of sample collections, incomplete sample tracking and linking of samples to analyses in the literature and databases, discoverability of existing samples” (Petrology & Geochem)

“Most geological terrains of interest do not have sufficient or even sample density through space and time.” (Petrology & Geochem)

“Central archive of experimental samples with integrated workflows, database templates, and community-wide DOI system for samples” (Mineral Physics & Rock Deformation)


Page 22

EarthCube SIG


Page 23

Needs

Infrastructure and resources for preservation and access of physical samples

Tools for repositories to efficiently manage and improve online access to their collections.

Online registry for discovery, access, and preservation of sample data & metadata

Best practices & standards for sample curation and sample sharing, and for sample data & data exchange

Funding strategies, business models

Page 24

geosamples.org

A multi-institutional initiative to build a “Digital Environment for Sample Curation” (DESC) to:
• advance access and re-use of physical samples
• support and simplify the work of curators
• advance best practices, standards, & policies for sample curation, distribution, attribution, and citation


Page 25

geosamples.org collaboration

Physical collection facilities:
• NSF-funded repositories: LDEO, OSU, SIO, LacCore, WHOI, USPRR, UT Austin, ARF, and growing
• State Surveys (AASG), USGS, Industry

Data facilities & systems: IGSN/SESAR, IMLGS, USGIN

Computer & Information Science: RENCI, UT Austin

Biocollection informatics: iPlant, iDigBio

Page 26

DESC Design

Curators (Admin GUI)

Public (Admin GUI)

Samplers (User GUI)

DESC (data, tools, services)

IGSN Registry, Publications, Data Systems
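As a rough illustration of the hub role DESC plays in this design, the sketch below keeps one record per IGSN and cross-links each sample to publications and datasets. Everything here is hypothetical: the class names, the placeholder IGSN values, the DOI, and the repository name are illustrative only and do not reflect the IGSN registry's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SampleRecord:
    igsn: str        # persistent sample identifier (placeholder values below)
    name: str
    repository: str  # physical collection that curates the sample
    publications: List[str] = field(default_factory=list)  # DOIs of papers using it
    datasets: List[str] = field(default_factory=list)      # links into data systems


class SampleRegistry:
    """Hypothetical DESC-style hub: one record per IGSN, cross-linked to outputs."""

    def __init__(self) -> None:
        self._records: Dict[str, SampleRecord] = {}

    def register(self, record: SampleRecord) -> None:
        if record.igsn in self._records:
            raise ValueError(f"duplicate IGSN: {record.igsn}")
        self._records[record.igsn] = record

    def get(self, igsn: str) -> SampleRecord:
        return self._records[igsn]

    def link_publication(self, igsn: str, doi: str) -> None:
        self._records[igsn].publications.append(doi)

    def link_dataset(self, igsn: str, url: str) -> None:
        self._records[igsn].datasets.append(url)

    def find_by_repository(self, repository: str) -> List[str]:
        return sorted(r.igsn for r in self._records.values() if r.repository == repository)


registry = SampleRegistry()
registry.register(SampleRecord("XXXX00001", "Basalt core, section 3", "Example Core Repository"))
registry.register(SampleRecord("XXXX00002", "Dredge sample 7", "Example Core Repository"))
registry.link_publication("XXXX00001", "10.9999/example-doi")
registry.link_dataset("XXXX00001", "https://data.example.org/analyses/123")
print(registry.find_by_repository("Example Core Repository"))
print(registry.get("XXXX00001").publications)
```

Rejecting duplicate identifiers at registration is the design choice that makes the IGSN usable as the join key between curators' collections, publications, and data systems.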

Page 27
Page 28

US Interagency Working Group on Scientific Collections (IWGSC)
• Covers all scientific disciplines
• Created under White House S&T Council, reports to Life Sciences Subcommittee
• ~10 participating Departments/Agencies
• USDA and Smithsonian Co-chairs

2009 recommendations included:
• Increase impact and improve management of collections
• Clarify and standardize management and budgeting for collections
• Create an online clearinghouse of information on Federal scientific collections

SciColl
• Covers all scientific disciplines
• Created under OECD Global Science Forum
• Independent project, no legal status
• National and Institutional memberships
• Governance by Executive Board
• Secretariat Office at Smithsonian

SciColl Priorities:
• Develop first cross-disciplinary registry of object-based scientific collections (GRSciColl)
• Promote interdisciplinary research utilizing scientific collections

Page 29

Global Registry of Scientific Collections (GRSciColl)
• Plants and animals in zoos, botanical gardens, aquariums
• Plants and animals in museums, herbaria
• Extraterrestrial samples
• Microbes in BRCs
• Human medical samples
• Disease banks
• Veterinary samples
• Standards repositories
• Fossils and microfossils
• Rocks, sediment and ice cores
• Air, water, soil samples
• Human artefacts
• Living material in genebanks, culture collections
• And more, what else?

SciColl and IWGSC ask: How can we connect collections across disciplines?

Page 30

Structure of GRSciColl

Institution Table
• Institution ID
• Institution Name
• Institution Discipline(s)
• Primary Contact

Institutional Collection Table
• Institution ID
• Collection ID
• Collection Name
• Collection Discipline
• Content Type(s)
• Primary Contact

Personal Collection Table
• Institution ID = “Personal”
• Collection ID
• Collection Name
• Collection Discipline
• Content Type(s)
• Primary Contact

Contacts Table
• Contact Name
• Primary Institution
• Primary Collection
• Additional Inst/Coll

SciColl and IWGSC ask: What terms constitute the common vocabularies of discipline and content type?
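The GRSciColl table structure maps directly onto simple record types. A minimal sketch, assuming Python dataclasses stand in for the database tables; the controlled vocabularies are hypothetical placeholders, since which terms belong in them is exactly the open question SciColl and IWGSC pose. The `vocabulary_problems` helper shows how records could be checked against such shared lists.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical shared vocabularies (the slide asks which terms belong here).
DISCIPLINES = {"Botany", "Zoology", "Geology", "Paleontology", "Microbiology"}
CONTENT_TYPES = {"Herbarium specimens", "Rock cores", "Fossils", "Living cultures"}

PERSONAL = "Personal"  # Institution ID used for the Personal Collection Table


@dataclass
class Institution:
    institution_id: str
    name: str
    disciplines: List[str]
    primary_contact: str


@dataclass
class Collection:
    institution_id: str  # an Institution ID, or PERSONAL for personal collections
    collection_id: str
    name: str
    discipline: str
    content_types: List[str]
    primary_contact: str

    @property
    def is_personal(self) -> bool:
        return self.institution_id == PERSONAL


@dataclass
class Contact:
    name: str
    primary_institution: str
    primary_collection: str
    additional: List[str] = field(default_factory=list)  # additional inst/coll IDs


def vocabulary_problems(collection: Collection) -> List[tuple]:
    """Return (field, term) pairs that fall outside the shared vocabularies."""
    problems = []
    if collection.discipline not in DISCIPLINES:
        problems.append(("discipline", collection.discipline))
    problems.extend(
        ("content type", term) for term in collection.content_types if term not in CONTENT_TYPES
    )
    return problems


museum = Institution("INST-001", "Example Natural History Museum", ["Botany", "Geology"], "J. Curator")
herbarium = Collection("INST-001", "COLL-01", "Example Herbarium", "Botany",
                       ["Herbarium specimens"], "J. Curator")
rocks = Collection(PERSONAL, "COLL-02", "Field rock samples", "Geology",
                   ["Rock cores", "Hand samples"], "A. Researcher")
curator = Contact("J. Curator", "INST-001", "COLL-01")
print(vocabulary_problems(herbarium))
print(rocks.is_personal, vocabulary_problems(rocks))
```

Modeling personal collections as ordinary collections with a sentinel Institution ID, as the slide does, keeps one schema for both tables.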

Page 31
Page 32

INTEROPERABILITY PHILOSOPHY (OBSERVATIONAL INFRASTRUCTURE)

Hank Loescher | National Ecological Observatory Network (NEON)

Director Strategic Development | CEO Office

Page 33

Get Specific Data

Many respondents appeared to want more specific details and expressed interest in data communicated in a form they can readily use in their work.

Page 34

Lots and lots of data…

[Figure panels labeled 9/2008, 10/2009, 3/2010, and 2/2011]

Page 35

Data as a National Resource

NSF Director Suresh’s emphasis on:

• “Era of Observations”

• “Era of Data and Information”

March 2012: White House $200M “Big Data” initiative:

• NSF

• NIH

• DOE

• DOD

• DARPA

• USGS

Page 36

The President’s Council of Advisors on Science and Technology (PCAST)

The PCAST report (2011) urges that even as the government deals with our nation’s economic challenges, it must:

“…address the threats to both the environmental and the economic aspects of well-being that derive from the accelerating degradation of the environmental capital – the Nation’s ecosystems and the biodiversity they contain”.

PCAST New Directions…

Page 37

Global Themes – Global Observations

Societal Benefit Areas (SBAs): Agriculture, Biodiversity, Climate, Disasters, Ecosystems, Energy, Health, Water, Weather

Increasing importance of designing new cross-discipline data structures to support policy/decision-making

• Essential Climate Variables (ECVs)
• Essential Biodiversity Variables (EBVs)
• Essential Carbon Variables (ECVs)

Aligned with OSTP (NEO, US-GEO) and NSF/EU Strategic Planning
Aligned with GEO, GEO-BON, GCOS, Diversitas, WMO, WCRP, etc.
Aligned with Suresh, S., 2012. Research funding: Global challenges need global solutions. Nature, 490, 337-338, doi:10.1038/490337a

Page 38

Why Interoperability?

• The rapid pace of large-scale environmental global changes underscores the value of accessible long-term data sets.

• Natural, managed, and socioeconomic systems are subject to complex interacting stresses that play out over extended periods of time and space.

• An era of large-scale, interdisciplinary science fueled by large data sets.

• Data Interoperability enhances the value of current scientific efforts and investment.

• Interoperability is needed to forecast future conditions for basic understanding, and for future planning, policy, and societal benefit.

• Currently, there is no accepted approach to make large datasets interoperable

• Provides new leadership opportunities for Scientists globally

Page 39

Interoperability Philosophy - scientific utility

1. Linking Science Questions and Hypotheses and Requirements
• Mapping questions to ‘what must be done’
• ‘How’ data can/will be used jointly
• Defining joint science scope
• Defines interfaces and functionality

2. Traceability of Measurements
• Use of recognized standards
• Traceability to recognized standards, or first principles
• Known and managed signal:noise
• Managing QA/QC
• Uncertainty budgets (ISO traceable)

3. Algorithms/Procedures
• What is the algorithm or procedural process to create a data product?
• Provides “consistent and compatible” data
• Managed through intercomparisons
• What are their relative uncertainties?

4. Informatics
• Standards - data formats
• Standards - metadata formats
• Persistent identifiers / open source / policies
• Discovery tools / dissemination / discovery
• Ontologies, semantics and controlled vocabularies
• Archival and curation activities
• Provenance
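For the “Uncertainty budgets (ISO traceable)” item, one common convention is the GUM (ISO/IEC Guide 98-3) law of propagation for uncorrelated inputs: the combined standard uncertainty is u_c(y) = √(Σ_i (c_i·u(x_i))²), where c_i is the sensitivity coefficient of input x_i, and the expanded uncertainty is U = k·u_c. The sketch below applies that rule; the budget components and numbers are hypothetical, and correlated inputs would need covariance terms that are deliberately omitted here.

```python
import math


def combined_standard_uncertainty(budget):
    """Root-sum-square of sensitivity-weighted standard uncertainties.

    `budget` maps a component name to (sensitivity_coefficient, standard_uncertainty).
    Inputs are assumed uncorrelated, so no covariance terms are included.
    """
    return math.sqrt(sum((c * u) ** 2 for c, u in budget.values()))


def expanded_uncertainty(u_c, k=2.0):
    """U = k * u_c; k = 2 gives roughly 95 % coverage for normally distributed errors."""
    return k * u_c


# Hypothetical budget for one measurement, in the measurand's units.
budget = {
    "sensor calibration": (1.0, 0.03),
    "reference offset": (2.0, 0.02),
}
u_c = combined_standard_uncertainty(budget)
U = expanded_uncertainty(u_c)
print(f"u_c = {u_c:.3f}, U (k=2) = {U:.3f}")
```

Keeping the budget as named components, rather than a single number, is what makes the uncertainty traceable: each entry can point back to the calibration or procedure it came from.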

Page 40

Interoperability Philosophy - scientific utility

The degree to which Observatories are truly interoperable is the degree to which these four elements are adopted by collaborative facilities

Signal:noise and uncertainty estimates must also be known in order for data to have broader, global utility and prognostic capability (ecological forecasting)

Provides the frame for individual approaches and creativity, spans organizational and programmatic maturity

This Interoperability Framework is currently being implemented as part of a joint EU FP7 and US NSF Project called CoopEUS (www.coopeus.eu)

Facilitates establishing a Baseline/infrastructure with scientific creativity

Is a framework by which all parties can engage (including the policy and social dimensions)

Real work, real tasks can be defined

Page 41

Frontiers - Interoperability

European Union - ICOS

European Union - Lifewatch

Australia – TERN

(EU) France – ANAEE

Mexico and Canada – CarboNA / MexFlux

Korea – KEON/KoFlux/AsiaFlux

China – CERN

iLTER - global

Bottom-up Organizations

Top-down Organizations

Page 42

The National Ecological Observatory Network is a project sponsored by the National Science Foundation and managed under cooperative agreement by NEON Inc.

Page 43

Stacking Environmental Observatories - SoS

NEON

Biodiversity Observatories

Other Terrestrial Datasets

Page 44

Stacking Environmental Observatories - SoS

NEON

Others

Biodiversity datasets

Collapse the layers

Page 45

“Stuff” in the middle

Page 46

The Type of Interaction and Efficacy Is Dependent on the Organizational Development of the Other Institution

NEON Interactions – Other Organizations

• Balancing Scientific Creativity vs. Baseline Infrastructure

• Level of System Engineering Maturity

• Base Capacity - Critical Mass

• Cultural Sensitivity

Page 47
Page 48

BCube: A Broker Framework for Next Generation Geoscience

Siri Jodha - PI

Page 49

Brokering Framework Principles

• A broker connects information resources by mediating interactions between those resources without requiring the maintainers of those resources to adapt their existing systems

EPOS Workshop, Erice 2013
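The broker definition above amounts to an adapter layer that lives with the broker rather than with each source. A minimal sketch of that idea follows; the two source formats, the adapter functions, and the common record shape are hypothetical and are not BCube's actual design.

```python
from typing import Callable, Dict, List, Tuple

# Common record shape the broker exposes to clients (hypothetical).
Record = Dict[str, str]


def from_catalog_a(raw: dict) -> Record:
    # Hypothetical source A reports {"Title": ..., "Link": ...}.
    return {"title": raw["Title"], "url": raw["Link"], "source": "A"}


def from_catalog_b(raw: dict) -> Record:
    # Hypothetical source B nests the name under "meta" and calls the link "href".
    return {"title": raw["meta"]["name"], "url": raw["href"], "source": "B"}


class Broker:
    """Mediates between clients and heterogeneous sources.

    Each source keeps its native format; the broker owns the adapters,
    so source maintainers do not have to change their systems.
    """

    def __init__(self) -> None:
        self._sources: List[Tuple[Callable[[], list], Callable[[dict], Record]]] = []

    def register(self, fetch: Callable[[], list], adapter: Callable[[dict], Record]) -> None:
        self._sources.append((fetch, adapter))

    def search(self, term: str) -> List[Record]:
        results = []
        for fetch, adapter in self._sources:
            for raw in fetch():
                record = adapter(raw)  # translate into the common shape
                if term.lower() in record["title"].lower():
                    results.append(record)
        return results


catalog_a = [{"Title": "Greenland DEM", "Link": "https://a.example/dem"}]
catalog_b = [
    {"meta": {"name": "Greenland melt extent"}, "href": "https://b.example/melt"},
    {"meta": {"name": "Antarctic sea ice"}, "href": "https://b.example/ice"},
]
broker = Broker()
broker.register(lambda: catalog_a, from_catalog_a)
broker.register(lambda: catalog_b, from_catalog_b)
hits = broker.search("greenland")
print([h["url"] for h in hits])
```

Adding a third source means writing one more adapter on the broker side; neither existing source nor any client changes.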

Page 50

Discover / Evaluate / Access / Use … a new technological revolution every year …

Brokers mediate between Service Buses

Page 51

Preparing Data for Ingest, presented 10/27/09 by R. Duerr LID590DCL Foundations of Data Curation

What if... a scientist could find data and services that matched their interests as easily as subscribing to the news?

myData News.org

Greenland 1 km DEM has been published

A Digital Elevation Model (DEM) of Greenland acquired by A. Researcher is available in binary format at a 1 km grid spacing in a polar stereographic projection ... more

Greenland Ice Sheet Melt Characteristics Data updated

Greenland Ice Sheet Melt Characteristics now available via OpenSearch API
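A news-style subscription like this could be built on a saved OpenSearch query that a client re-runs periodically. The sketch below expands an OpenSearch 1.1 URL template: `{name}` parameters are required, `{name?}` parameters are optional and become empty when unset, and prefixed names such as `geo:box` come from extensions. The endpoint URL is hypothetical, and a real client would read the template from the service's OpenSearch description document.

```python
import re
from urllib.parse import quote

# {name} or {prefix:name}, optionally followed by "?" marking an optional parameter.
PARAM = re.compile(r"\{([A-Za-z0-9_.:-]+)(\?)?\}")


def fill_template(template, values):
    """Expand an OpenSearch 1.1 URL template with percent-encoded values."""

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # unset optional parameters are replaced by an empty string
        raise KeyError(f"missing required OpenSearch parameter: {name}")

    return PARAM.sub(substitute, template)


# Hypothetical template, as it might appear in a service's description document.
TEMPLATE = (
    "https://data.example.org/search?q={searchTerms}"
    "&start={startIndex?}&bbox={geo:box?}&format=atom"
)

saved_query = {"searchTerms": "Greenland melt", "geo:box": "-75,58,-10,84"}
url = fill_template(TEMPLATE, saved_query)
print(url)
```

Requesting an Atom response, as here, is what would let an ordinary feed reader act as the "news" subscription client.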

Page 52

What if...

Scientists could advertise AND INDEX their data so other scientists could find it AND REFERENCE IT, as simply as:
1. Filling out a web form
2. Saving it to your website
3. Adding its link to your site
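One way to make these three steps machine-readable is to turn the web-form fields into a structured description embedded in the dataset's landing page. The sketch below uses schema.org `Dataset` markup serialized as JSON-LD; that encoding is a swapped-in illustration, not necessarily what BCube used, and all field values and URLs are hypothetical.

```python
import json


def dataset_jsonld(name, description, url, keywords, download_url, encoding_format):
    """Step 1: turn web-form fields into a schema.org Dataset description (JSON-LD)."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
        "distribution": [
            {
                "@type": "DataDownload",
                "contentUrl": download_url,
                "encodingFormat": encoding_format,
            }
        ],
    }
    return json.dumps(doc, indent=2)


def landing_page_snippet(jsonld):
    """Step 2: wrap the description so it can be saved in the dataset's web page."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


jsonld = dataset_jsonld(
    name="Greenland 1 km DEM",
    description="Digital elevation model of Greenland, 1 km grid, polar stereographic.",
    url="https://example.org/data/greenland-dem",
    keywords=["Greenland", "DEM", "elevation"],
    download_url="https://example.org/data/greenland-dem.bin",
    encoding_format="application/octet-stream",
)
snippet = landing_page_snippet(jsonld)
print(snippet)
```

Step 3 is then ordinary web publishing: linking the landing page from the scientist's site so that crawlers can reach it.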

Page 53

• Service Bus Mediator
• Scientific Field to Field Translator
• Crawling, Advertising, (and Indexing)
• http://nsidc.org/bcube
• http://rd-alliance.org

BCube Broker
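Two of the BCube broker functions listed above, crawling/indexing and field-to-field translation, can be sketched together. Harvested metadata text goes into an inverted index after domain abbreviations are expanded into shared terms, so a query phrased in one field's vocabulary finds records phrased in another's. The translation table and records are hypothetical; a production broker would rely on curated vocabularies or ontologies rather than a hand-written dictionary.

```python
import re
from collections import defaultdict

# Hypothetical field-to-field translation table: domain abbreviation -> shared phrase.
TRANSLATIONS = {
    "sst": "sea surface temperature",
    "swe": "snow water equivalent",
}


def normalize(text):
    """Lowercase, tokenize, and expand domain abbreviations into shared terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    expanded = []
    for tok in tokens:
        expanded.extend(TRANSLATIONS.get(tok, tok).split())
    return expanded


def build_index(records):
    """Inverted index: term -> set of record ids (records as returned by a crawler)."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for term in normalize(text):
            index[term].add(rec_id)
    return index


def search(index, query):
    """Return ids of records containing every query term (AND semantics)."""
    terms = normalize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


crawled = {
    "ocean-1": "Daily SST from buoys",
    "ocean-2": "Sea surface temperature climatology",
    "snow-7": "SWE grids for the Sierra",
}
index = build_index(crawled)
print(sorted(search(index, "SST")))
print(sorted(search(index, "snow water equivalent")))
```

Because queries pass through the same `normalize` step as the indexed text, the translation applies symmetrically: an oceanographer's "SST" and a plain-language query both reach the same records.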

Page 54
Page 55

Domain data repositories and cross-disciplinary data integration

governance issues, technical issues

ILYA ZASLAVSKY AND THE EARTHCUBE CINERGI PROJECT (NSF ICER-1343816)

Page 56
Page 57

High-level inventory and readiness assessment: viewer

http://connections.earthcube.org

Page 58

Community Inventory of EarthCube Resources for Geoscience Interoperability

Data discovery is the most often cited issue in executive summaries on the EarthCube web site.

CINERGI

Page 59

Short questionnaire

Each function below is rated for importance on a scale from 1 (Unimportant) to 7 (Essential), or marked NA or DK, with space for comments:

• Making metadata from your facility available for search using standard metadata, via standard APIs
• Tracking demand for and cross-domain usage of your resources
• Identifying issues related to data and metadata quality and completeness
• Tracking search hits that become searches for resources managed by your data facility
• Connecting owners of relevant datasets to your facility for potential longer-term data management
• Connecting data from your facility with people, publications, models, and projects
• Identifying communities using data, tools, and models from your facility
• Validating published metadata and service signatures from your facility
• Finding and reporting to you resources that appear as duplicates across multiple registries

Potential added value by a cross-domain system

Integration with cross-domain search

Key characteristics for CINERGI
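Responses to a questionnaire like this could be summarized per function as in the sketch below. It assumes ratings are integers 1 to 7 and that NA and DK answers are counted but excluded from the statistics; the median is reported alongside the mean because the scale is ordinal. The sample responses are hypothetical.

```python
from statistics import mean, median

SCALE = range(1, 8)         # 1 = Unimportant ... 7 = Essential
NON_RATINGS = {"NA", "DK"}  # counted, but excluded from the statistics


def summarize(responses):
    """Summarize one questionnaire item given ints 1-7 and/or "NA"/"DK" strings."""
    ratings, skipped = [], 0
    for r in responses:
        if r in NON_RATINGS:
            skipped += 1
        elif isinstance(r, int) and r in SCALE:
            ratings.append(r)
        else:
            raise ValueError(f"invalid response: {r!r}")
    return {
        "n": len(ratings),
        "skipped": skipped,
        "mean": round(mean(ratings), 2) if ratings else None,
        "median": median(ratings) if ratings else None,
    }


# Hypothetical answers for one item, e.g. "Making metadata ... available for search".
item_responses = [7, 6, 6, "NA", 5, "DK", 7]
summary = summarize(item_responses)
print(summary)
```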