The EUDAT CDI and BSC activities as first level data service provider. Nadia Tonello, Head of Data Management. Open Data Workshop CERCA, Barcelona, 06/06/2019

Page 1: The EUDAT CDI - cerca.cat/wp-content/uploads/2019/06/20190606_OpenData_EUDAT_NTonello.pdf

The EUDAT CDI and BSC activities as first level data service provider

Nadia Tonello, Head of Data Management

Open Data Workshop CERCA, Barcelona, 06/06/2019

Page 2

Outline

The beginnings: common issues, common services

EUDAT services suite

BSC activities on data management

Connection with other EOSC projects

N. Tonello, Open Science Workshop CERCA 2019, Barcelona

Moderator
Presentation notes
Not just ‘big’ experiments such as LHC and SKA, but the number of communities

Page 3

The beginning of EUDAT

High Level Expert Group on Scientific Data, submission to the European Commission:

“The rising tide of data needs a novel approach to data management.”

Users

Trust

Data Curation

Common Data Services

User functionalities, data capture & transfer, virtual research environments

Persistent storage, identification, authenticity, workflow execution, mining

Data Generators

Community Support Services

Data discovery & navigation, workflow generation, annotation, interpretability

The emerging infrastructure for scientific data must be:

• flexible but reliable,

• secure yet open,

• local and global,

• affordable yet high-performance.

Page 4

Usage scenarios

Data Generators / Users

Big communities

Upload and download

Periodic transfers, quality checks …

Upload, add metadata, share

Scientist teams

Isolated researchers

High energy physics, Astronomy

Earth Sciences

Life sciences, Genomics

Economics, Social sciences

Large datasets, few large communities

Medium size datasets, intermediate size teams

Small datasets, large number of users

Size of datasets

Users

Page 5

Objective

REGISTERED - SHARED

PUBLISHED DATA

PRIVATE WORKSPACE

Link DOs with publications

Discover DOs

Register DOs

Stage DOs

Status:

• Data deluge

• Increasing complexity

• Cost of isolated solutions

EUDAT solution objective:

• Provide a common shared framework.

• Connect private workspaces and public archives.

• Offer federated services.

Page 6

Generic services

free at the point of use

B2ACCESS, B2SHARE, B2DROP, ...

Interaction with providers/nodes for customized services or offers

Users benefit from the common service management approach

Enhance the value and quality of research

Page 7

EUDAT CDI services suite

https://www.eudat.eu/services/

Page 8

Who: Anyone

What: Access the federated infrastructure and services

• Organisation Identity Provider

• Social account (e.g. Google, Microsoft Live and Facebook)

• B2ACCESS ID

Why:

• Community-defined access control

• Secure

• Easy

Page 9

easy.DMP

Who: Data Managers

What:

• Create DMPs

• B2Access

• Edit, manage and share them online

Why: Available templates: Science Europe, H2020

Page 10

Who: Small to Medium Teams

What:

• Store data (incl. software) and add domain metadata

• Share registered research data worldwide

• Preserve (small-scale) research data for the long term

Why:

• Register data for publications

• Make data known to a wider community

Page 12

Who:

• Community Data Managers

• Complex Organizations

What:

• Provide an abstraction layer which virtualizes large-scale data resources

• Guard against data loss in long-term archiving and preservation

• Optimize access for users from different regions

• Bring data closer to powerful computers

Page 13

Who:

• Community Data Managers

• ‘Sophisticated’ Organizations

What:

• Provide an abstraction layer which virtualizes large-scale data resources

• Guard against data loss in long-term archiving and preservation

• Optimize access for users from different regions

• Bring data closer to powerful computers

Why:

• Performance

• Replication between trusted sites

• Data Preservation

Page 14

EUDAT CDI services suite

https://www.eudat.eu/services/

• Small research data: FREE, 20 GB per user

• Medium scientific data, metadata, PIDs: FREE, 2 GB per file, unlimited number of files

• Large scientific datasets, replica, metadata, PIDs: contact EUDAT center

Page 15

Who: Anyone

What:

• Find collections of scientific data quickly and easily, irrespective of their origin, discipline or community

• Get quick overviews of available data

• Browse through collections using standardized facets

Why:

• Unique collection

• Ease of searching

http://b2find.eudat.eu/group

Page 16

• Harvests information from ontology repositories

• Supports semi-automatic annotation using text mining

• Supports manual data annotation

• Easy-to-use user interface

• Integrates with the different B2 services

Page 18

BSC DM - activities

Part of the EUDAT initiative since 2011

Generic service provider (level 1)

Executive Board member

Deployment

Climate models

Bio-medicine

(forthcoming)

Page 19

BSC DM - activities

Coordination of the RES

Service: Supercomputing

Plan to expand to data services

Data management activities

Support users with challenging needs

Page 20

BSC DM - motivation

• Promote the efficient usage of the infrastructure.

• Offer data services (storage, exploration, analysis) and computing capacity to projects with high needs.

• Ease the access to public research data.

• Promote the re-use and exploitation of publicly funded research data.

• Collaborate with institutions that have discipline-specific experience in DM and publication services.

Page 21

BSC DM – services offer and access

As a provider:

Data management policy

Security, privacy, ethics, liability

Services and tools

DM team - support

As a user:

Preparation of DMPs

Training on data management

Guidelines and good practices adoption

https://twitter.com/RES_HPC

https://www.res.es/en/news

Page 22

Connection with other EOSC projects

EOSCpilot – the “Design Study” of EOSC

EOSC vision: EOSC Governance, Science Demonstrators, Rules of Engagement & Service Management.

EOSC-Hub – the “engine” of EOSC

Project Direction, Governance & Strategy, Service Integration, Communications, etc.

Key service areas: data management, metadata, sensitive data, long-term preservation

Joint integration activities with several communities: LOFAR, EISCAT, ECRIN, CLARIN, ENES, etc.

Page 23

Connection with other EOSC projects

EOSC-Hub collaboration with OpenAIRE

Data Management Plans - Development of a joint DMP

Work on standards for measuring and exchanging usage statistics

AAI activities - to bridge the AAI domain between EOSC-hub and OpenAIRE

Semantic Annotation – assessment of B2NOTE service for OpenAIRE Research Community Dashboard and Zenodo services.

RDA synergy, to foster interoperability at a global level and remove barriers for sharing and (re)using data.

Page 24

Connection with other EOSC projects

European Data Infrastructure (with PRACE and GEANT)

EUDAT is uniquely positioned at the intersection of data & HPC, bringing together many research communities.

EUDAT has a natural interest in bridging the EDI & EOSC for the benefit of its stakeholders, which are also heavy users of HPC.

Page 26

Summary

EUDAT offers common data services to address generic user needs:

• B2Access

• B2Find, B2Handle, B2Note

• B2Share, B2Safe, B2Stage, B2Drop

BSC activities inside the CDI:

• DM services and support

• Coherent with internal and European activities

Connection with EOSC projects:

• EOSCPilot, EOSC-hub, RDA and PRACE

Page 27

Thank you!

Nadia Tonello, Head of Data Management

06/06/2019 Open Data Workshop CERCA, Barcelona

Moderator
Presentation notes
Presentation Background Outline: BSC Contribution to Gaia Past, Present and Future