

Page 1

Tobias Weigel (DKRZ)

Tobias Weigel, Deutsches Klimarechenzentrum (DKRZ), World Data Center for Climate (WDCC)

Identifier Infrastructure Usage for Global Climate Reporting

IoT Week 2017, Geneva

Page 2


Scientific driver: Global climate modelling

2 09.06.2017 Identifier Infrastructure Usage for Global Climate Reporting

https://www.wcrp-climate.org/wgcm-cmip/wgcm-cmip6

Page 3


Scientific driver: Global climate modelling


Eyring, Bony, Meehl, Senior, Stevens, Stouffer, Taylor: Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization. Geosci. Model Dev., 9, 1937-1958, 2016. doi:10.5194/gmd-9-1937-2016

Operational phase ca. 2017-2021+

Community-driven, aligned with IPCC AR6

Global data volume on the order of 100-250 PB: full replication impossible!
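Why "full replication impossible" can be made concrete with back-of-envelope arithmetic. The 100-250 PB range is taken from the slide; the single link with a sustained 10 Gbit/s, decimal units and zero protocol overhead are hypothetical assumptions chosen only for illustration.

```python
# Back-of-envelope estimate: time to copy the whole archive once over one link.
# Assumptions (hypothetical): decimal petabytes, 10 Gbit/s sustained, no overhead.

PB = 10**15  # bytes in a decimal petabyte


def transfer_days(volume_pb: float, link_gbit_s: float) -> float:
    """Days needed to move volume_pb petabytes at link_gbit_s Gbit/s."""
    bits = volume_pb * PB * 8
    seconds = bits / (link_gbit_s * 10**9)
    return seconds / 86_400


for volume in (100, 250):
    print(f"{volume} PB at 10 Gbit/s: {transfer_days(volume, 10):.0f} days")
```

Under these assumptions a single full copy would take roughly 2.5 to 6.3 years, which is why replication has to be selective rather than complete.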

Page 4


The climate data life-cycle


M. Lautenschlager

Page 5


The Earth System Grid Federation


D. Williams (LLNL); U.S. DOE 2017. 6th Annual Earth System Grid Federation Face-to-Face Conference Report. DOE/SC-0188. U.S. Department of Energy Office of Science

http://esgf.llnl.gov http://esgf-data.dkrz.de

Page 6


DKRZ technical infrastructure and ESGF


S. Kindermann

Page 7


Making it scalable requires additional effort


Buurman, Weigel, Juckes, Lautenschlager, Kindermann: Persistent Identifiers for CMIP6 in the Earth System Grid Federation, EGU 2016


Page 8


Properties stored in Handle records for ESGF


Files: URL, aggregation_level, url_replica, tracking_ID, checksum, is_part_of, DRS_ID, file_size, file_name

Datasets: URL, aggregation_level, replaced_by, replaces, errata_IDs, has_parts, DRS_ID
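The is_part_of / has_parts and replaces / replaced_by properties let a client navigate from a file to its dataset and on to the newest dataset version. The sketch below models this with plain dictionaries. The property names come from the slide; the handle strings, the values, treating is_part_of as a list, and the helper functions are hypothetical illustrations, not the ESGF API or the real Handle record layout.

```python
from typing import List

# Simplified, in-memory stand-in for Handle records (hypothetical handles and values).
records = {
    "example/ds-v1": {
        "aggregation_level": "DATASET",
        "DRS_ID": "example.project.model.experiment",
        "has_parts": ["example/file-a"],
        "replaced_by": "example/ds-v2",
    },
    "example/ds-v2": {
        "aggregation_level": "DATASET",
        "DRS_ID": "example.project.model.experiment",
        "has_parts": ["example/file-b"],
        "replaces": "example/ds-v1",
        "errata_IDs": [],
    },
    "example/file-a": {
        "aggregation_level": "FILE",
        "file_name": "a.nc",
        "checksum": "hypothetical-checksum",
        "is_part_of": ["example/ds-v1"],  # assumed list-valued
    },
}


def latest_version(handle: str) -> str:
    """Follow replaced_by links until reaching a dataset without a successor."""
    seen = set()
    while "replaced_by" in records[handle]:
        if handle in seen:
            raise ValueError(f"replaced_by cycle at {handle}")
        seen.add(handle)
        handle = records[handle]["replaced_by"]
    return handle


def newest_datasets_for_file(file_handle: str) -> List[str]:
    """For a file PID, return the newest version of every dataset it belongs to."""
    return sorted({latest_version(ds) for ds in records[file_handle]["is_part_of"]})


print(newest_datasets_for_file("example/file-a"))  # ['example/ds-v2']
```

A user holding a file from a superseded dataset version can thus discover that a newer version exists without consulting a separate catalogue.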


Page 9


Automation

Why do we care? What is the long-term strategy?


Drivers: Compute : I/O; data volume, complexity, audience

Induced change: data life-cycle model; file/object management practice

Solution space: architectural layering; processing to the data (new services, cultural change); insight and integrity (provenance, QC)

Page 10


The users' reality...


Page 11


Type-Triggered Automated Processing (T-TAP)


Collection: netCDF files plus <Metadata> (xml), plus third-party input (?)

Processing: agent (script), processing service (WPS)

Output: multiple types, e.g. netCDF, xml, linked data, text reports, PROV record; possible repacking into a new collection; well-defined ways to publish it (automatically)

Types: described in DTR
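The core idea of type-triggered processing is that the registered type of a collection, not a human, decides which processing step runs. The sketch below shows that dispatch pattern under loud assumptions: an in-memory trigger table stands in for type descriptions in a DTR, a stub function stands in for a WPS call, and all type identifiers, PIDs and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Collection:
    pid: str        # hypothetical PID of the collection
    type_id: str    # type identifier, assumed to be registered in a DTR
    members: List[str] = field(default_factory=list)


# Trigger table: type identifier -> processing step (stand-in for DTR bindings).
TRIGGERS: Dict[str, Callable[[Collection], dict]] = {}


def on_type(type_id: str):
    """Decorator that binds a processing step to one collection type."""
    def register(fn):
        TRIGGERS[type_id] = fn
        return fn
    return register


@on_type("example-type/netcdf-collection")
def produce_report(coll: Collection) -> dict:
    # Stub for a WPS process: only describes the outputs a real process might produce.
    return {
        "input": coll.pid,
        "n_files": len(coll.members),
        "outputs": ["text report", "PROV record"],
    }


def agent_step(coll: Collection) -> Optional[dict]:
    """Run the step bound to the collection's type; do nothing if none is bound."""
    handler = TRIGGERS.get(coll.type_id)
    return handler(coll) if handler else None


new_coll = Collection("example/coll-1", "example-type/netcdf-collection", ["f1.nc", "f2.nc"])
print(agent_step(new_coll))
print(agent_step(Collection("example/coll-2", "example-type/unknown")))  # None
```

Keeping the binding in a registry rather than in the agent means new processing can be attached to a type without changing the agent itself.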

Page 12


Data processing perspectives

Climate data analytics service (for EOSC)

Cluster-based, 2 pilot implementations, 2018+

Copernicus Climate Change Service (C3S)

coordinated by ECMWF, operational 2018+

WPS-based service ecosystem with multiple deployments
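Both the T-TAP processing step and the WPS-based service ecosystem rely on OGC Web Processing Service endpoints. A WPS 1.0.0 server accepts key-value-pair GET requests, with process inputs passed as `name=value` pairs separated by `;` in the `DataInputs` parameter. The sketch assembles such an Execute URL; the endpoint, the process identifier `subset` and its inputs are hypothetical, and real services advertise their processes via GetCapabilities and DescribeProcess.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlsplit


def wps_execute_url(endpoint: str, identifier: str, inputs: Dict[str, str]) -> str:
    """Build a WPS 1.0.0 key-value-pair Execute request URL.

    Input values are assumed not to contain ';' or '=' themselves;
    no per-value escaping is done here.
    """
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "DataInputs": data_inputs,  # percent-encoded as a whole by urlencode
    })
    return f"{endpoint}?{query}"


# Hypothetical endpoint, process and inputs, for illustration only.
url = wps_execute_url("https://wps.example.org/wps", "subset",
                      {"variable": "tas", "region": "europe"})
print(url)
# Round trip: what a server sees after decoding the query string.
print(parse_qs(urlsplit(url).query)["DataInputs"])  # ['variable=tas;region=europe']
```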


Page 13


End users, developers, and automated processes deal with persistently identified, virtually aggregated digital objects, including collections, which are overlays on multiple network services (identifier services, repositories/registries), which in turn are overlays on existing or future information storage systems.

[Figure: Global Digital Object Cloud (GDOC): identified digital objects (object:dataset, object:collection, object:publication) held in several Repo/Registry services and resolved through Identifier Services]

L. Lannom / DFIG
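The layering on this slide can be illustrated with a toy model: objects live in independent repositories, an identifier service maps each PID to a location, and a collection is itself an identified object whose content is a list of member PIDs. Everything here (PIDs, repository names, the in-memory dictionaries, the `expand` helper) is hypothetical; real identifier services such as the Handle System resolve PIDs over the network.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class DigitalObject:
    obj_type: str    # e.g. "object:dataset", "object:collection"
    content: object  # payload bytes, or a list of member PIDs for a collection


# Two independent repositories, each using its own local keys.
repositories = {
    "repo-a": {"k1": DigitalObject("object:dataset", b"<dataset bytes>")},
    "repo-b": {
        "k7": DigitalObject("object:publication", b"<publication bytes>"),
        "k9": DigitalObject("object:collection", ["example/111", "example/222"]),
    },
}

# Identifier service: PID -> (repository, local key). Relocating an object
# only changes this mapping; the PID that users cite stays the same.
identifier_service = {
    "example/111": ("repo-a", "k1"),
    "example/222": ("repo-b", "k7"),
    "example/333": ("repo-b", "k9"),
}


def resolve(pid: str) -> DigitalObject:
    repo, key = identifier_service[pid]
    return repositories[repo][key]


def expand(pid: str, seen: Optional[Set[str]] = None) -> List[str]:
    """List the types of all non-collection objects reachable from pid."""
    seen = set() if seen is None else seen
    if pid in seen:  # skip already visited PIDs (also guards against cycles)
        return []
    seen.add(pid)
    obj = resolve(pid)
    if obj.obj_type != "object:collection":
        return [obj.obj_type]
    types: List[str] = []
    for member in obj.content:
        types.extend(expand(member, seen))
    return types


print(expand("example/333"))  # ['object:dataset', 'object:publication']
```

The collection spans two repositories, yet a client only ever handles PIDs; where the bytes actually live is the identifier service's concern.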


Page 14


GDOC and reusable data service components


PID registry

Type registry

Collection builder

Processing executor

Search component

Schema registry

Broker

Page 15


Thank you for your attention.

