FOCUS MEETING ON FAIR DATA DEVELOPMENTS Luiz Olavo Bonino - [email protected]

Post on 02-Jun-2020

Page 1: Focus meeting FAIR Developments...A particular class of FAIR Data System that provides access to published datasets. The datasets can be external or internal to the FAIR Data Point

FOCUS MEETING ON FAIR DATA DEVELOPMENTS

Luiz Olavo Bonino - [email protected]

SUMMARY

■ What is FAIR data?

■ The FAIR ecosystem

■ Plans and how to realise

[Figure with labels: Produces, Consumes]

[Figure with labels: Produces, Consumes. Concerns shown: storage, sustainability, maintenance, license, privacy, security, stewardship, access, ?]

[Figure with labels: Produces, Consumes. Formats and technologies shown: RDF, MIAPE, DBMS, Excel, API, SQL, SPARQL, Metadata, DICOM, MIRIAM, Semantics]

[Figure with labels: Produces, Consumes. Tasks shown: access, find, query, format, license, integrate]

WHAT IS FAIR DATA?

FAIR Data aims to support existing communities in their attempts to enable valuable scientific data and knowledge to be published and utilised in a ‘FAIR’ manner.

Findable - (meta)data is uniquely and persistently identifiable. It should have basic machine-readable descriptive metadata.

Accessible - data is reachable and accessible by humans and machines using standard formats and protocols.

Interoperable - (meta)data is machine-readable and annotated with resolvable vocabularies/ontologies.

Reusable - (meta)data is sufficiently well-described to allow (semi)automated integration with other compatible data sources.

THE FAIR ECOSYSTEM

FAIR Data Principles

FAIR Data Protocol

FAIR Data Resources

FAIR Data Core Technologies

FAIR Data Systems/Tools

Normative

Artefact

Software

www.nature.com/articles/sdata201618

WWW.NATURE.COM/ARTICLES/SDATA201618

FAIR DATA RESOURCE

Datasets expressed using one of the prescribed standards of the FAIR Data Protocol, with metadata complying with the protocol and license. The original dataset is transformed into a FAIR format and proper metadata and license are added to produce a FAIR Data Resource. The original and the FAIR version can co-exist, each one fulfilling its own purpose.

[Figure: the Original dataset goes through FAIR Conversion to become a FAIR Data Resource, which consists of FAIR Format, Metadata and License]
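The conversion described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `FairDataResource` and `fair_conversion`, the `example.org` URIs and the choice of an N-Triples-like output are all hypothetical, since the slides do not say which formats the FAIR Data Protocol prescribes. Literal escaping is deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FairDataResource:
    """Hypothetical container: data in a FAIR format plus metadata and license."""
    data: str          # payload in the chosen FAIR format
    fair_format: str   # name of that format (an assumption, not from the slides)
    metadata: dict     # descriptive metadata about the dataset
    license: str       # license identifier or URL


def fair_conversion(rows, base_uri, metadata, license_uri):
    """Build a FairDataResource from a list of dict rows without modifying them."""
    lines = []
    for index, row in enumerate(rows):
        subject = f"<{base_uri}/record/{index}>"
        for key, value in row.items():
            # One triple-like line per field; no escaping of special characters.
            lines.append(f'{subject} <{base_uri}/field/{key}> "{value}" .')
    return FairDataResource(
        data="\n".join(lines),
        fair_format="N-Triples",
        metadata=dict(metadata),
        license=license_uri,
    )


# The original dataset stays as it is; the FAIR version is a separate object.
original = [{"name": "tomato-001", "origin": "Peru"}]
resource = fair_conversion(
    original,
    base_uri="http://example.org/demo",
    metadata={"title": "Demo passport data"},
    license_uri="http://example.org/licenses/demo",
)
print(resource.data)
```

Returning a new object rather than changing `original` mirrors the point that the original and the FAIR version can co-exist.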

[Figure with labels: FAIR transformation, Analysis transformation]

FAIR DATA APPLICATION ECOSYSTEM (NL APPROACH)

FAIR DATA RESOURCE

FAIR transformation

FAIR Data Resource

BRING YOUR OWN DATA - BYOD

■ Goals:
    ■ Learn how to make data linkable “hands-on” with experts
    ■ Create a “telling story” to demonstrate its use

■ Composition:
    ■ Data owners – specialists on given datasets
    ■ Data interoperability experts
    ■ Domain experts

Source: Marcos Roos

BYOD

FAIRIFIER

FAIRIFIER

[Figure: a Non-FAIR Dataset is the input to the FAIRifier. Its output is a FAIR Data Resource (FAIR Format, Metadata, License), which is published to the DataFAIRport (Find, Access, Interoperate & Re-use Data)]

FAIR DATA MODEL REGISTRY

[Figure with labels: FAIR Data Model Registry; three Dataset / Data Model pairs]

FAIRIFIER AND FAIR DATA MODEL REGISTRY

[Figure: the Data Owner submits a Non-FAIR Dataset to the FAIRifier. The FAIRifier searches the FAIR Data Model Registry for a reference data model, and the registry returns a reference FAIR Profile. The FAIRifier outputs a FAIR Data Resource (FAIR Format, Metadata, License)]

A FAIR Data Point is a particular class of FAIR Data System that provides access to published datasets. The datasets can be external or internal to the FAIR Data Point. The source data can be a regular (non-FAIR) dataset or a FAIR Data Resource. If the source data is non-FAIR, the FAIR Data Point needs to make the necessary FAIR transformations on the fly.

FAIR Data Resource

non-FAIR Data Resource

Sensor
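The on-the-fly behaviour described on this slide can be illustrated with a toy Python class. Everything here is an assumption made for the sketch: the class `FairDataPoint`, the helper `to_fair`, the `@id` key and the `example.org` identifiers are hypothetical and are not taken from any FAIR Data Point implementation.

```python
def to_fair(record):
    """Stand-in FAIR transformation: add a hypothetical resolvable identifier."""
    return {"@id": f"http://example.org/record/{record['id']}", **record}


class FairDataPoint:
    """Toy access point that serves datasets, transforming non-FAIR ones on request."""

    def __init__(self):
        # dataset name -> (list of records, whether they are already FAIR)
        self._sources = {}

    def register(self, name, records, is_fair):
        """Register a source dataset; it may be FAIR or non-FAIR."""
        self._sources[name] = (records, is_fair)

    def get(self, name):
        """Return FAIR records, converting on the fly when the source is non-FAIR."""
        records, is_fair = self._sources[name]
        if is_fair:
            return list(records)
        return [to_fair(record) for record in records]


fdp = FairDataPoint()
fdp.register("already-fair", [{"@id": "http://example.org/record/1", "id": 1}], is_fair=True)
fdp.register("raw", [{"id": 2, "value": "x"}], is_fair=False)
print(fdp.get("raw"))
```

The stored non-FAIR records are never rewritten; the transformation is applied each time `get` is called, which is what "on the fly" means in this sketch.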

FAIR DATA POINT

[Figure with labels: four Data Producers, each with a Dataset, and two Data Consumers]

FAIR DATA POINT

FAIR Data Point

Who are you? Can I trust you?

FAIR DATA POINT

FAIR Data Point

Here is information about myself

FDP Metadata

FAIR DATA POINT

FAIR Data Point

Ok, now that I know you, tell me what you have to offer

reads FDP Metadata

FAIR DATA POINT

FAIR Data Point

Here is information about my catalog of datasets

Catalog Metadata

FAIR DATA POINT

FAIR Data Point

Tell me more about your genomic dataset

reads Catalog Metadata

FAIR DATA POINT

FAIR Data Point

This is the detailed information about the genomic dataset

Dataset & Data Record Metadata

FAIR DATA POINT

FAIR Data Point

Ok, now that I know what you have, give me the data.

reads Dataset & Data Record Metadata

FAIR DATA POINT

FAIR Data Point

Here is my data.

FAIR Data
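The conversation on the preceding slides (FDP metadata, then catalog metadata, then dataset metadata, then the data itself) can be sketched as a client that follows links from one layer to the next. This is a minimal sketch under assumptions: `FAKE_FDP` is an in-memory stand-in for HTTP requests, and the keys `catalogs`, `datasets` and the URLs are hypothetical. Only the order of the steps comes from the slides.

```python
# Hypothetical in-memory stand-in for an FDP; a real client would issue HTTP GETs.
FAKE_FDP = {
    "http://myfdp/": {
        "title": "Demo FDP",
        "catalogs": ["http://myfdp/catalog"],
    },
    "http://myfdp/catalog": {
        "title": "Demo catalog",
        "datasets": ["http://myfdp/genomic"],
    },
    "http://myfdp/genomic": {
        "title": "Genomic dataset",
        "downloadURL": "http://myfdp/genomic/data",
    },
    "http://myfdp/genomic/data": "ACGT",
}


def fetch(url):
    """Return whatever the stand-in FDP holds at this URL."""
    return FAKE_FDP[url]


def walk(fdp_url, wanted_title):
    """Follow FDP -> catalog -> dataset -> data and return the data, or None."""
    fdp = fetch(fdp_url)                        # "Here is information about myself"
    for catalog_url in fdp["catalogs"]:
        catalog = fetch(catalog_url)            # "...my catalog of datasets"
        for dataset_url in catalog["datasets"]:
            dataset = fetch(dataset_url)        # "...detailed information about the dataset"
            if dataset["title"] == wanted_title:
                return fetch(dataset["downloadURL"])   # "Here is my data."
    return None


print(walk("http://myfdp/", "Genomic dataset"))
```

The client never needs to know the dataset URL in advance; it only needs the FDP's root URL and reads each metadata layer to discover the next one.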

FAIR DATA POINT - GENERAL ARCHITECTURE

[Figure with labels: FAIR API / GUI, Metadata Provider, FAIR Data Accessor, Metrics Gatherer, Access Controller, FAIR Metadata, FAIR Data]

EMBEDDED FAIR DATA POINT

[Figure with labels: the FAIR Data Point components (FAIR API / GUI, Metadata Provider, FAIR Data Accessor, Metrics Gatherer, Access Controller, FAIR Metadata, FAIR Data), labelled B2FAIR, shown next to the EUDAT API / GUI and four EUDAT Current Components]

https://www.eudat.eu

DISTRIBUTED FAIR DATA POINTS

[Figure with labels: Biobank FAIR Data Point (with Biobank Database), Patient Registry FAIR Data Point, UNIPROT FAIR Data Point, HPA FAIR Data Point]

FAIR DATA POINT METADATA PROVIDER API

METADATA LAYERS

Layer | Description | URL | Example | Standard
FDP (Data repository) | Information about the FDP as a data repository | http://myfdp/ | PID, title, description, license, owner, API version, etc. | OAI-PMH (extended)
Catalog | Information about the catalog of datasets offered | http://myfdp/catalog | PID, title, description, publisher, etc. | W3C DCAT #Catalog
Dataset | Information about each of the offered datasets | http://myfdp/[datasetID]/ | AccessURL, downloadURL, format, mediaType, etc. | W3C DCAT #Dataset, #Distribution
Data record | Information about the actual data, types, identifiers, etc. | http://myfdp/[datarecordID] | | Community/domain, ex.: DICOM, VCF,
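The URL column of the metadata layers above can be read as a set of URL templates. The following Python sketch encodes those four patterns as written on this slide. The function name `layer_url` and the layer keys are hypothetical, and the example metadata elsewhere in this deck uses slightly different paths, so this is an illustration of the table rather than a description of a deployed API.

```python
# URL patterns copied from the table; the dictionary keys are hypothetical names.
LAYER_TEMPLATES = {
    "fdp": "{base}/",
    "catalog": "{base}/catalog",
    "dataset": "{base}/{id}/",
    "datarecord": "{base}/{id}",
}


def layer_url(layer, base="http://myfdp", identifier=None):
    """Build the metadata URL for one layer, following the table's patterns."""
    template = LAYER_TEMPLATES[layer]
    if "{id}" in template and identifier is None:
        raise ValueError(f"layer '{layer}' needs an identifier")
    # Strip a trailing slash so the base can be given with or without one.
    return template.format(base=base.rstrip("/"), id=identifier)


for name, ident in [("fdp", None), ("catalog", None), ("dataset", "breedb")]:
    print(name, layer_url(name, identifier=ident))
```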

FDP METADATA

@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix lang: <http://id.loc.gov/vocabulary/iso639-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://fdp.biotools.nl:8080/fdp> a dct:Agent ;
    rdfs:label "FAIR Data Point of the Plant Breeding Group, Wageningen UR"^^xsd:string ;
    dct:description "This FDP provides metadata on plant-specific genotype/phenotype data sets"^^xsd:string ;
    dct:hasPart "catalog-01"^^xsd:string ;
    dct:identifier "FDP-WUR-PB"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "FAIR Data Point of the Plant Breeding Group, Wageningen UR"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;

CATALOG METADATA

@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix lang: <http://id.loc.gov/vocabulary/iso639-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://fdp.biotools.nl:8080/catalog/catalog-01> a dcat:Catalog ;
    rdfs:label "Plant Breeding Data Catalog"^^xsd:string ;
    dct:description "Plant Breeding Data Catalog"^^xsd:string ;
    dct:hasPart <breedb> ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "Plant Breeding Data Catalog"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:dataset <breedb> ;

DATASET METADATA

@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix lang: <http://id.loc.gov/vocabulary/iso639-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://fdp.biotools.nl:8080/dataset/breedb> a dcat:Dataset ;
    rdfs:label "BreeDB tomato passport data"^^xsd:string ;
    dct:description "BreeDB tomato passport data"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "BreeDB tomato passport data"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:distribution <breedb-sparql>, <breedb-sqldump> ;

METADATA DISTRIBUTION

<http://fdp.biotools.nl:8080/distribution/breedb-sparql> a dcat:Distribution ;
    rdfs:label "SPARQL endpoint for BreeDB tomato passport data"^^xsd:string ;
    dct:description "SPARQL endpoint for BreeDB tomato passport data"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:license <http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0> ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "SPARQL endpoint for BreeDB tomato passport data"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:accessURL <http://virtuoso.biotools.nl:8888/sparql> .

<http://fdp.biotools.nl:8080/distribution/breedb-sqldump> a dcat:Distribution ;
    rdfs:label "SQL dump of the BreeDB tomato passport data"^^xsd:string ;
    dct:description "SQL dump of the BreeDB tomato passport data"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:license <http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0> ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "SQL dump of the BreeDB tomato passport data"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:downloadURL <http://virtuoso.biotools.nl:8888/DAV/home/breedb/breedb.sql> .
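The two distributions above differ in how the data is reached: one offers a `dcat:accessURL` (a SPARQL endpoint to query) and the other a `dcat:downloadURL` (a file to fetch). A client could choose between them as in the Python sketch below. The dictionaries are a hand-written simplification of the Turtle above, and the function `pick_distribution` and its `prefer_download` flag are hypothetical; a real client would parse the RDF with a proper library.

```python
# Simplified, hand-copied view of the two distributions shown above.
DISTRIBUTIONS = [
    {
        "id": "breedb-sparql",
        "accessURL": "http://virtuoso.biotools.nl:8888/sparql",
    },
    {
        "id": "breedb-sqldump",
        "downloadURL": "http://virtuoso.biotools.nl:8888/DAV/home/breedb/breedb.sql",
    },
]


def pick_distribution(distributions, prefer_download=True):
    """Return (id, url) of the first distribution offering the preferred URL kind."""
    if prefer_download:
        key_order = ("downloadURL", "accessURL")
    else:
        key_order = ("accessURL", "downloadURL")
    for key in key_order:
        for distribution in distributions:
            if key in distribution:
                return distribution["id"], distribution[key]
    return None  # no distribution carries either kind of URL


print(pick_distribution(DISTRIBUTIONS))
print(pick_distribution(DISTRIBUTIONS, prefer_download=False))
```

A batch job that wants the whole dataset would prefer the download, while an application that only needs a few records would prefer the queryable endpoint.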

CURRENT STATUS

■ FDP, Catalog and Dataset metadata tested.

■ FAIR Accessor tested.

■ Demonstration application on rare diseases with FDPs exposing patient registry and biobank datasets.

■ Working on FDP for BreeDB (WUR).

FDP DEMONSTRATION

[Figure with labels: Biobank FAIR Data Point, Patient Registry FAIR Data Point]

60 dataset metadata
3 biobanks data
3 patient registries data

NEXT STEPS

■ Extend the demonstration application with more types of datasets.

■ Specify a metadata description format for the data record metadata.

■ Implement the Security Enforcer and Metrics Gatherer components.

■ Release version 1.0

■ Implement subscription/notification mechanism.

DataFAIRport: Find, Access, Interoperate & Re-use Data

■ A particular class of FAIR Data System to provide support for data interoperability;

■ Supports publication and access to FAIR data;

■ Fosters an ecosystem of applications and services;

■ Federated architecture: different FAIRports (and other FAIR Data Systems) are interconnectable;

■ Supports citations of datasets and data items;

■ Provides metrics for data usage and citation.

DataFAIRport

FAIR DATA PUBLICATION

[Figure with labels: DataFAIRport (Find, Access, Interoperate & Re-use Data), Data Owners/Creators, Dataset, Metadata, and Concept 1 to Concept 4 shown twice]

FAIR DATA ACCESS

[Figure with labels: DataFAIRport (Find, Access, Interoperate & Re-use Data), Data User, six Datasets]

DISTRIBUTED ARCHITECTURE

[Figure with labels: several DataFAIRports, including DataFAIRport DTL, DataFAIRport VLPB/WUR, DataFAIRport Organization X and DataFAIRport Organization Y; domains shown: Rare Diseases, Plant]

NETHERLANDS

[Figure with labels: DataFAIRport DTL, FAIR Data Search Engine, FAIRifier + (Meta)Data Publication, Metadata storage, Data storage (optional), Transformation Services Registry (optional), three FAIR Data Points, F A I R]

FAIRPORT ECOSYSTEM

DataFAIRport: Find, Access, Interoperate & Re-use Data

Application & Services [BRAIN]

Infrastructural Services

Data Consumer

Data Producer

FAIRPORT

[Figure with labels: DataFAIRport (Find, Access, Interoperate & Re-use Data); Stewardship API; FAIR Data API; (Meta)Data Storage component (Metadata storage, Data storage, DataVerse, EUDAT, Data Repository); Semantic resolver; Ontology storage; Data storage API / FAIR Data API; Data usage policy; Management component; GUI (Data publishing, search, mgmt); FAIR Data System; Metrics storage; Data Consumer; Data Producer; Data Consumer Apps (Ex. APInatomy, BRAIN, etc); Data Mgmt Apps; Data Stewardship Apps]

ROADMAP 2016

■ Implement the FAIR Data search engine

■ Implement the FAIR Data publication mechanism

■ Extend the demonstration application with more types of datasets.

■ Specify a metadata description format for the data record metadata.

■ Implement the Security Enforcer and Metrics Gatherer components on FAIR Data Points.

■ Start work on the encryption and pseudonymisation of Personal Health Train

QUESTIONS?

Luiz Olavo Bonino

[email protected]