status of cdi service (connections, netcdf introduction, cdi ingestion service and set-up of cdi...

21
Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator Trieste – Italy, 3 – 4 March 2015, Combined EMODNet Chemistry 2 TWG and SeaDataNet II TTG

Upload: alvin-webb

Post on 19-Dec-2015

224 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI

ingestion pilot)

ByDick M.A. Schaap – Technical Coordinator

Trieste – Italy, 3 – 4 March 2015, Combined EMODNet Chemistry 2 TWG and SeaDataNet II

TTG

Page 2: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

CDI service for discovery and unified access of data

Already 106 data centres connected and more underway

Page 3: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

Sea

> 500 European data originators

NODCs; HOs; GEOs; BIOs; ICES; PANGAEA

> 100 data centres

Data discovery And access

GEOSS portal IODE ODP portal

CDI Data Discovery and Access service

Total collection

Black Sea portal Caspian portal Geo-Seas portal

Bathymetry

Physics

Chemistry

Geology

Biology

Aggregated collection

Regional subsets

Thematic subsets

Thematic subsets

Thematic subsets

Thematic subsets

Thematic subsets

Thematic subsets

SeaDataNet as driver behind many portals

Page 4: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

CDI Data Discovery & Access service

Coverage March 2015: > 1,71 million CDI entries from 106 data centres in 34 countries and 546 originators for physics, chemistry, geology, geophysics, bathymetry and biology; years 1800 – 2015; 85% unrestricted or under SeaDataNet licence

Page 5: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

CDI Data Discovery & Access service

1 March 2015: 106 data centres connected

Page 6: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

Situation (1st March 2015) with DM and MARINE_ID

In total 94 data centres connected by DM and 12 by interim solution. New DM V1.4.5 available for NetCDF support. 91 monitored by NAGIOS and 93 by Robot Shopper. Marine-ID now used by 80 data centres.

Page 7: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

NetCDF introduction into CDI service

SeaDataNet NetCDF (CF) format defined for profiles, trajectories and timeseries; indicated in L24 by CFPOINTDelivery to users of CDI service Started with DM V1.4.4.Principles:

Partners with pre-processed files - MEDATLAS and ODV – (modus 3 or 1) and with database generated ODV files (modus 2) can rely on DM software to generate NetCDF (CF) files on the fly;MARIS has to add CFPOINT as extra filetype to CDI metadata to have it working.MARIS has upgraded the CDI shopping mechanism to include CFPOINT

Initial test with IFREMER => went ok. Additional test with IEO => issue.Repair by installing DM V1.4.5Data centres should indicate to MARIS a one-time CFPOINT addition to CDI range. Thereafter data centres themselves have to deliver new / updated CDIs including CFPOINT where appropriate.

Page 8: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

NetCDF introduction into CDI service

IFREMER Argo floats

Requesting a sample set

Page 9: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

NetCDF introduction into CDI service

Chosing NetCDF (CF) format

Page 10: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

NetCDF introduction into CDI service

RSM – CFPOINT OK

Downloaded in NetCDF (CF) format

Page 11: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

D9.2 incl updating D4.5 and D5.3: Harvesting of CDI XMLs – planned implementationObjective is to upgrade the submission and processing of CDI entries from data centres to the CDI portal service by means of harvesting and ingestion

IFREMER has adopted GeoNetwork for supporting CDI XML output of MIKADO and making it available by means of local CS-W service

Testbed has been set-up with IFREMER, BSH and IEO that have installed and configured the GeoNetwork software as provided by IFREMER.

MARIS has tested central harvesting of CDI XML from the GeoNetwork CS-W service (could also be provided by other local software than GeoNetwork)

Final check needed on XML consistency and criterium for harvesting only entries since specific date

Page 12: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

Harvesting of CDI XMLs – check on subset

MARIS has implemented a simple HTML page at:

http://www.gasandoil.nl/csw_hits/check_hits.php

This allows pilot data centres to check whether the harvest selection goes well as it is supposed to be. MARIS does a selection on the CDI tag var06 = REVISION-DATE OF DATASET; this should be stable and represent the date that the CDI XML was generated / updated.

Query example for IEO for updates with revision date > 2014-12-31:http://www.seadatanet.ieo.es/geonetwork-sdn/srv/eng/csw-cdi?

SERVICE=CSW&VERSION=2.0.2&REQUEST=GetRecords&constraintLanguage=CQL_TEXT&CONSTRAINT_LANGUAGE_VERSION=1.1.0&CONSTRAINT=csw%3ArevisionDate%3E%272014-12-31%27&outputFormat=application%2Fxml&resultType=hits&maxRecords=10&ElementSetName=brief

Page 13: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

Harvesting of CDI XMLs – check on subset

The test page performs a series of date tests and gives results. See below for IFREMER:

Clicking on Details gives corresponding CDI XML records with their Local_CDI_ID’s

Action: IFREMER, IEO and BSH to check whether the revision date criterion performs ok.

Page 14: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

D4.5 and D5.3: Ingestion of harvested CDI XMLs – status of online CMS for data centresChallenge: central CDI ingestion taking into account the staging process and relational model CDI – coupling table – local data MARIS applies a staging process for populating new and updated CDI entries, received from data centres:

Validation of syntax and semanticsif okDuplicates check => report to data centre for check if okImport of CDIs incl GML validationif ok CDIs in Import CDI service and user interface for visual check by data centresif okData centres must update Coupling Table and arrange Local Data setsif okCDIs moved to production CDI service for public use

Agreed to upgrade this into an online system whereby data centres can manage themselves => establishing data centre self responsibility + 24 / 7

Page 15: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

Ingestion of harvested CDI XMLs – status of online CMS for data centres

CMS under development

Page 16: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

Ingestion of harvested CDI XMLs – status of online CMS for data centres

Principle is that per data provider each time only ONE harvested batch will be processed; when ready, then next harvest will take place at regular intervals (e.g. each week)

System runs in steps through a batch process in which data provider is asked to interact only a few times:

Check identified XML errors (syntax – semantics) for possible repair in next batchCheck identified potential duplicates (against import and production) and undertake action to delete real duplicates from the import databaseCheck overall remaining CDI import and submit action for moving to production (after confirming up-to-date coupling table and new data availability)

The overall process works in batch mode: for example moving from import to production takes place each night because also requires a lot of indexing.

Page 17: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

Ingestion of harvested CDI XMLs – status of online CMS for data centres

CMS with all possible ‘status’ examples

Page 18: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

Ingestion of harvested CDI XMLs – status of online CMS for data centres

Possible interactions for data provider

Batch no

Prevailing status

Option to retrieve logof XML errors

Options to retrieve logs, to inspect CDIs and to mark CDIs in import for deletion

Click to mark for removal or moving to production(see next page)

Option to inspect CDIs in import

Page 19: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

Ingestion of harvested CDI XMLs – status of online CMS for data centres

Final action: Removing remaining Batch or moving to Production

Page 20: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

CDI ingestion pilot

Pilot in SeaDataNet II with a number of data centres: IFREMER, IEO, BSH

Will get access to the new online CMS for trying out.

Test takes place in operational environment, but MARIS will monitor steps with roll-back options.

Data providers must still also provide CDI XML by mail for comparison

Note: CDI deletions continue as specials by email from data centres to MARIS

Page 21: Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI ingestion pilot) By Dick M.A. Schaap – Technical Coordinator

END