metadata & brokering - a modern approach #2

Daniele Bailo METADATA & BROKERING a modern approach EPISODE#2

Upload: daniele-bailo

Post on 16-Aug-2015


TRANSCRIPT

Page 1: Metadata & brokering - a modern approach #2

Daniele Bailo

METADATA & BROKERING: a modern approach, EPISODE #2

Page 2: Metadata & brokering - a modern approach #2

Previously on… Metadata & Brokering #1

Main concepts
- Digital Data
- Metadata
- Brokering system
- The triad <PID, MD, DO>
- Database
- APIs (web services)

Side concepts
- Ontologies / Semantics
- PID
- Digital Object
- Standard
- Interoperability
- Open Access

Page 3: Metadata & brokering - a modern approach #2

[Diagram: several datasets, each exposed through an API, connected to the brokering system]

Discovery (DC) and (CKAN, eGMS)

Contextual (CERIF metadata model)

Detailed (community specific)

Features
1. APIs
2. <PID, metadata, DO>
3. Contextualization metadata
4. Support ontologies

Data from Irpinia: <PID, metadata, DO> (request / response)

THE PERFECT SYSTEM #6: Metadata driven canonical Brokering with contextualization & PID

BROKERING SYSTEM
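The <PID, metadata, DO> triad can be pictured as a small data structure. This is a minimal Python sketch only: the class name `Triad`, the field names and the sample values are assumptions made for illustration and do not come from the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triad:
    """One brokered item: persistent identifier, metadata, digital object."""
    pid: str                                      # persistent identifier
    metadata: dict = field(default_factory=dict)  # discovery / contextual / detailed metadata
    digital_object: bytes = b""                   # the data itself (or a pointer to it)


# Hypothetical record for the "Data from Irpinia" example; all values are made up.
record = Triad(
    pid="hypothetical:irpinia-0001",
    metadata={"title": "Data from Irpinia", "level": "discovery"},
    digital_object=b"...raw bytes...",
)

print(record.pid, record.metadata["title"])
```

Keeping the three parts in one immutable record mirrors the idea that a response from the broker always carries the identifier and the metadata together with the data.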

Page 4: Metadata & brokering - a modern approach #2

NEW & OLD CHARACTERS

Page 5: Metadata & brokering - a modern approach #2

Metadata

Purposes
1. Discovery (humans & machines)
2. Contextualization: what is the context of the data
3. Use it for processing or other advanced tasks

Usually attached to D.O.

Page 6: Metadata & brokering - a modern approach #2

Interoperability

What & Why
Enables two systems to
1. Exchange information
2. Understand information

Usually achieved through:
- Agreed language
- Software “translators” (interfaces, thin layers)

"...are you speaking Arabic???"
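A software "translator" can be as small as a field-name mapping between two vocabularies. The sketch below, in Python, renames the fields of a source record to Dublin Core element names (`creator`, `date`, `format`). The source field names and the sample record are assumptions invented for the example.

```python
# Hypothetical field names used by a source system, mapped to Dublin Core element names.
FIELD_MAP = {
    "author": "creator",
    "created_on": "date",
    "file_type": "format",
}


def translate(record: dict, field_map: dict = FIELD_MAP) -> dict:
    """Rename known fields; keep unknown fields untouched so nothing is lost."""
    return {field_map.get(key, key): value for key, value in record.items()}


# Made-up source record.
source_record = {"author": "INGV", "created_on": "2015-08-16", "file_type": "text/xml"}
print(translate(source_record))
```

This covers only the "exchange" half of interoperability. Agreeing on what `creator` actually means is the "understand" half, which is where ontologies come in.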

Page 7: Metadata & brokering - a modern approach #2

Ontologies

Why an ontology?
It is the way machines manage “meaning”

How does it work?
1. Connects concepts
2. Needs vocabulary

Issues
• Many ontologies exist
• Vocabulary Mapping

[Diagram: example concept graph. Concepts: Michelini, CNT, INGV, Gresta, Sailing, Trieste, Italy, Boat, sea. Relations: Is Director of, Is section of, Is president of, Has hobby, Is Born, Located in, use]
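"Connects concepts" can be sketched as a list of (subject, relation, object) triples. The Python example below uses the concepts and relations named on the slide. Which concept each relation links is an assumed reading of the diagram, not something the transcript states.

```python
# Assumed reading of the slide's graph as (subject, relation, object) triples.
TRIPLES = [
    ("Michelini", "is director of", "CNT"),
    ("CNT", "is section of", "INGV"),
    ("Gresta", "is president of", "INGV"),
    ("Michelini", "has hobby", "Sailing"),
    ("Michelini", "is born in", "Trieste"),
    ("Trieste", "located in", "Italy"),
    ("Sailing", "uses", "Boat"),
    ("Sailing", "uses", "sea"),
]


def objects(subject: str, relation: str) -> list:
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def reachable(start: str) -> set:
    """Follow relations transitively to collect every concept connected to `start`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for s, _, o in TRIPLES:
            if s == node and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


print(objects("Sailing", "uses"))
print(sorted(reachable("Michelini")))
```

The relation names act as the vocabulary. Two systems that spell them differently need a vocabulary mapping before their graphs can be merged.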

Page 8: Metadata & brokering - a modern approach #2

Metadata Catalogue #1

Purposes
Store metadata, e.g.
1. producer
2. date of creation
3. data format

Misleading Example (why?)

Page 9: Metadata & brokering - a modern approach #2

Metadata Catalogue #2: How to implement it?

Single table (bad habit)
One table with all data

Multi table (good habit)
- Data is stored in multiple tables (one per concept)
- Tables are linked
- Can contextualize data

Metadata catalogue = relational database *

(*) = also NoSQL... We’ll see it later.

Single table

Multi table
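The multi-table layout can be tried with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the table names, column names and sample rows are invented for illustration, with one table per concept (producer, dataset) linked by a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per concept, linked by a key (hypothetical schema).
    CREATE TABLE producer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE dataset (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        created TEXT,
        format TEXT,
        producer_id INTEGER REFERENCES producer(id)
    );
    INSERT INTO producer (id, name) VALUES (1, 'INGV');
    INSERT INTO dataset (title, created, format, producer_id) VALUES
        ('Station list', '2015-08-16', 'text', 1),
        ('Waveforms', '2015-08-16', 'binary', 1);
""")

# The join is what "contextualizes" a dataset: it pulls in who produced it.
rows = conn.execute("""
    SELECT dataset.title, producer.name
    FROM dataset JOIN producer ON dataset.producer_id = producer.id
    ORDER BY dataset.title
""").fetchall()
print(rows)
```

In a single-table layout the producer name would be repeated on every dataset row. Here it is stored once and linked, so correcting it in one place updates the context of every dataset.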

Page 10: Metadata & brokering - a modern approach #2

Metadata Catalogue #2: How to implement it? (continued)


Single table

Multi table and contextualization

Page 11: Metadata & brokering - a modern approach #2

Catalogue Interface

Human interface (GUI)
Website or portal

Machine interface
- API or Web service, which executes scripts or queries
- Returns metadata in a given standard

Page 12: Metadata & brokering - a modern approach #2

(web) service

What is it?
It does something for the user (delivers value to the customer) *

A “thin layer”
We usually don’t know what’s under the hood

Examples
- FDSN stations
- FDSN dataselect

[Diagram: FDSN stations, FDSN Dataselect, Database (MD catalogue), Waveform repository]
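From the client side, the "thin layer" is just a URL with parameters. The Python sketch below builds a request for the FDSN station service without sending it. The host `example.org` is a placeholder assumption, while the path `/fdsnws/station/1/query` and the parameters `network`, `level` and `format` follow the FDSN web service convention.

```python
from urllib.parse import urlencode

# Hypothetical host; the path and parameter names follow the FDSN station web service.
BASE_URL = "https://example.org/fdsnws/station/1/query"


def station_query_url(network: str, level: str = "station", fmt: str = "text") -> str:
    """Build a station-metadata request; nothing is sent over the network here."""
    params = {"network": network, "level": level, "format": fmt}
    return f"{BASE_URL}?{urlencode(params)}"


url = station_query_url("IV")
print(url)
```

The caller never learns whether the answer comes from a relational database, a file tree or another service, which is the point of the thin layer.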

Page 13: Metadata & brokering - a modern approach #2

CKAN

What is it?
- Metadata Catalogue
- With interfaces (GUI+API)
- No direct CKAN <-> sources connection

Examples
- Works with FDSN stations
- Doesn’t work with FDSN dataselect

[Diagram: CKAN GUI, CKAN APIs, METADATA catalogue, plugins; metadata replication from EIDA stations and ISIDE stations]
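A machine client would talk to the CKAN API roughly as sketched below in Python. The site address is a placeholder assumption and the sample response is made up so the example runs offline. The `/api/3/action/package_search` path and the `success` / `result` / `results` response layout follow CKAN's Action API.

```python
import json
from urllib.parse import urlencode

# Hypothetical CKAN site; package_search is CKAN's dataset search action.
CKAN_SITE = "https://ckan.example.org"


def search_url(query: str, rows: int = 10) -> str:
    """Build a dataset search request against the CKAN Action API."""
    return f"{CKAN_SITE}/api/3/action/package_search?{urlencode({'q': query, 'rows': rows})}"


def dataset_titles(response_text: str) -> list:
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("CKAN reported a failed request")
    return [item["title"] for item in payload["result"]["results"]]


# Made-up response in the shape CKAN returns.
sample = json.dumps({
    "success": True,
    "result": {"count": 2, "results": [{"title": "EIDA stations"}, {"title": "ISIDE stations"}]},
})
print(search_url("stations"))
print(dataset_titles(sample))
```

What comes back is replicated metadata only. Fetching the data itself (the dataselect case) is outside what the catalogue does.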

Page 14: Metadata & brokering - a modern approach #2

Brokering System (e.g. VERCE framework)

What is it?
- Metadata Catalogue
- With interfaces (GUI+API)
- System manager
- Other modules
- BROKER <-> sources interactive connection

Examples
- EIDA stations
- EIDA dataselect
- Processing Job at CINECA

[Diagram: BROKER GUI, BROKER APIs, METADATA catalogue, System manager; metadata replication from EIDA stations and ISIDE stations; interactive access to service: EIDA dataselect, processing facility, ? ? ?]
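The difference from a plain catalogue can be sketched in a few lines of Python: discovery runs on the replicated metadata, while access calls the source interactively. The class name, the method names and the stand-in source below are assumptions for illustration, not the VERCE design.

```python
from typing import Callable, Dict, List


class Broker:
    """Keeps a replicated metadata catalogue and forwards data requests to sources."""

    def __init__(self) -> None:
        self.catalogue: List[dict] = []                       # replicated metadata
        self.sources: Dict[str, Callable[[str], bytes]] = {}  # interactive connections

    def register_source(self, name: str, fetch: Callable[[str], bytes], metadata: List[dict]) -> None:
        """Attach a source and copy its metadata into the local catalogue."""
        self.sources[name] = fetch
        self.catalogue.extend({**entry, "source": name} for entry in metadata)

    def discover(self, keyword: str) -> List[dict]:
        """Search the local catalogue only; no source is contacted."""
        return [e for e in self.catalogue if keyword.lower() in e["title"].lower()]

    def access(self, entry: dict) -> bytes:
        """Contact the owning source on demand (the step a plain catalogue lacks)."""
        return self.sources[entry["source"]](entry["id"])


broker = Broker()
broker.register_source(
    "eida",  # stand-in for a real service such as EIDA dataselect
    fetch=lambda item_id: f"waveform for {item_id}".encode(),
    metadata=[{"id": "sta-1", "title": "Station one waveforms"}],
)
hits = broker.discover("waveforms")
print(broker.access(hits[0]))
```

A processing facility could be registered in the same way, with a `fetch` function that submits a job instead of returning stored data.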

Page 15: Metadata & brokering - a modern approach #2

Comments & Questions

Why was the example misleading?

Page 16: Metadata & brokering - a modern approach #2

A global view: Data initiatives

RDA
- “regulate” data sharing/use

EUDAT
- Common data infrastructure

EGI
- Organize National Grid Infrastructures (CINECA)

EPOS
- ESFRI integrating Solid Earth data

Page 17: Metadata & brokering - a modern approach #2

RDA

Do for data what has been done for the internet (TCP/IP)

Page 18: Metadata & brokering - a modern approach #2

RDA concepts

Data Fabric
What?
Identifies mechanisms, standards, components and interfaces making data science efficient and cost effective

Data Management Plan
• Data management
• Data analysis
• Data preservation
• Data publication
• Data sharing

[UK data Archive http://www.data-archive.ac.uk/]

Page 19: Metadata & brokering - a modern approach #2

RDA concepts

Data Fabric

[RDA WG outputs https://indico.cern.ch/event/370271/session/2/contribution/6/material/0/0.pdf]

- How to store?
- How to register?
- How to discover?
- How to cite?
- How to document processing?
- How to integrate?
- How to collect new DP?
- How to access?

Page 20: Metadata & brokering - a modern approach #2

How to describe data?
How to discover data?

Metadata system

WE ALREADY KNOW EVERYTHING ABOUT IT

METADATA catalogue

Page 21: Metadata & brokering - a modern approach #2

How to have standards?
How to preserve data?

Registry system

What?
An agreed/legacy catalog of:
- data formats (schemas)
- metadata formats
- Vocabularies & semantic categories
- Data types
- Trusted repositories
- ….

Registry

"Aha... so in practice it's a database..."

"...indeed..."

Page 22: Metadata & brokering - a modern approach #2

How to register/cite data or publications?

PID system

Purpose
- DO / publication can be uniquely referenced
- Assign a PID at data creation time

Issues
- Need for a simple mechanism to implement it
- Now EUDAT can help
- Peter & Massimo comments…
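Assigning a PID at creation time and resolving it later can be sketched as follows in Python. This is an in-memory stand-in under stated assumptions: the prefix, the locations and the `REGISTRY` dictionary are hypothetical, and a real deployment would call an external PID service instead.

```python
import uuid
from datetime import datetime, timezone

REGISTRY = {}  # in-memory stand-in for a PID service


def mint_pid(location: str, prefix: str = "hypothetical-prefix") -> str:
    """Create a unique identifier at data creation time and record where the object lives."""
    pid = f"{prefix}/{uuid.uuid4()}"
    REGISTRY[pid] = {"location": location, "minted": datetime.now(timezone.utc).isoformat()}
    return pid


def resolve(pid: str) -> str:
    """Return the current location of the object; raises KeyError for unknown PIDs."""
    return REGISTRY[pid]["location"]


def relocate(pid: str, new_location: str) -> None:
    """The object can move; the PID stays the same."""
    REGISTRY[pid]["location"] = new_location


pid = mint_pid("https://repo.example.org/data/irpinia.dat")
relocate(pid, "https://archive.example.org/irpinia.dat")
print(pid, "->", resolve(pid))
```

Citations should carry the PID rather than the location, so they survive a move of the underlying file.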

Page 23: Metadata & brokering - a modern approach #2

How to access data?

AAI system (federated & distributed)

Purpose
- Authenticate users
- Authorize users

Issues
- Delegation
- Many systems, sometimes non-interoperable
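The two purposes answer different questions: authentication asks "who is this?", authorization asks "may they do that?". The Python sketch below keeps them as separate steps. The tokens, user names and permissions are made up, and a federated AAI would delegate both lookups to external identity and attribute providers.

```python
# Made-up users, tokens and permissions, for illustration only.
TOKENS = {"token-abc": "alice", "token-xyz": "bob"}
PERMISSIONS = {"alice": {"read", "submit_job"}, "bob": {"read"}}


def authenticate(token: str) -> str:
    """Authentication: who is this? Raises PermissionError for unknown tokens."""
    try:
        return TOKENS[token]
    except KeyError:
        raise PermissionError("unknown token") from None


def authorize(user: str, action: str) -> bool:
    """Authorization: is this user allowed to perform the action?"""
    return action in PERMISSIONS.get(user, set())


user = authenticate("token-xyz")
print(user, authorize(user, "read"), authorize(user, "submit_job"))
```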

Page 24: Metadata & brokering - a modern approach #2

How to store data?

Data repository (trusted)

What?
- Store data
- Couple with PIDs
- Ensure preservation (not curation)
- Can be trusted (DSA)

Opportunity
- INGV DSA repository…

Page 25: Metadata & brokering - a modern approach #2

How to document data processing?

Workflow engines

Purpose
- Tracks data transformation
- Allows versioning
- Allows reproducibility

Comments
- Interoperability among various workflow engines
- VERCE did it
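Tracking data transformations can be reduced to one idea: every step records what went in and what came out. The Python sketch below logs a SHA-256 digest of the input and output of each step. The step names and the toy data are assumptions, and a real workflow engine records far more (software versions, parameters, timing).

```python
import hashlib
from typing import Callable, List


def digest(data: bytes) -> str:
    """Fingerprint a piece of data so two versions can be told apart."""
    return hashlib.sha256(data).hexdigest()


def run_step(name: str, func: Callable[[bytes], bytes], data: bytes, log: List[dict]) -> bytes:
    """Apply one transformation and append a provenance record for it."""
    result = func(data)
    log.append({"step": name, "input": digest(data), "output": digest(result)})
    return result


provenance: List[dict] = []
raw = b"3 1 2"  # toy input
sorted_data = run_step("sort", lambda d: b" ".join(sorted(d.split())), raw, provenance)
annotated = run_step("annotate", lambda d: b"values: " + d, sorted_data, provenance)
print(annotated, [p["step"] for p in provenance])
```

Because the output digest of one step equals the input digest of the next, the log forms a chain that can be replayed to check reproducibility.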

Page 26: Metadata & brokering - a modern approach #2

Brokering System (e.g. VERCE framework)

Full version includes
- Metadata Catalogue
- interfaces (GUI+API)
- System manager
- AAI system
- Workflow engine

External actors
- PID System
- Trusted repositories
- Registries
- Processing facilities

[Diagram: BROKER GUI, BROKER APIs, METADATA catalogue, System manager, AAI system, Workflow Engine; datasets behind APIs, trusted repositories, Registry, PID system, HPC center]

Page 27: Metadata & brokering - a modern approach #2

Q&A