status of cdi service (connections, netcdf introduction, cdi ingestion service and set-up of cdi...
TRANSCRIPT
Status of CDI service (connections, NetCDF introduction, CDI ingestion service and set-up of CDI
ingestion pilot)
ByDick M.A. Schaap – Technical Coordinator
Trieste – Italy, 3 – 4 March 2015, Combined EMODNet Chemistry 2 TWG and SeaDataNet II
TTG
CDI service for discovery and unified access of data
Already 106 data centres connected and more underway
Sea
> 500 European data originators
NODCs; HOs; GEOs; BIOs; ICES; PANGAEA
> 100 data centres
Data discovery And access
GEOSS portal IODE ODP portal
CDI Data Discovery and Access service
Total collection
Black Sea portal Caspian portal Geo-Seas portal
Bathymetry
Physics
Chemistry
Geology
Biology
Aggregated collection
Regional subsets
Thematic subsets
Thematic subsets
Thematic subsets
Thematic subsets
Thematic subsets
Thematic subsets
SeaDataNet as driver behind many portals
CDI Data Discovery & Access service
Coverage March 2015: > 1,71 million CDI entries from 106 data centres in 34 countries and 546 originators for physics, chemistry, geology, geophysics, bathymetry and biology; years 1800 – 2015; 85% unrestricted or under SeaDataNet licence
CDI Data Discovery & Access service
1 March 2015: 106 data centres connected
Situation (1st March 2015) with DM and MARINE_ID
In total 94 data centres connected by DM and 12 by interim solution. New DM V1.4.5 available for NetCDF support. 91 monitored by NAGIOS and 93 by Robot Shopper. Marine-ID now used by 80 data centres.
NetCDF introduction into CDI service
SeaDataNet NetCDF (CF) format defined for profiles, trajectories and timeseries; indicated in L24 by CFPOINTDelivery to users of CDI service Started with DM V1.4.4.Principles:
Partners with pre-processed files - MEDATLAS and ODV – (modus 3 or 1) and with database generated ODV files (modus 2) can rely on DM software to generate NetCDF (CF) files on the fly;MARIS has to add CFPOINT as extra filetype to CDI metadata to have it working.MARIS has upgraded the CDI shopping mechanism to include CFPOINT
Initial test with IFREMER => went ok. Additional test with IEO => issue.Repair by installing DM V1.4.5Data centres should indicate to MARIS a one-time CFPOINT addition to CDI range. Thereafter data centres themselves have to deliver new / updated CDIs including CFPOINT where appropriate.
NetCDF introduction into CDI service
IFREMER Argo floats
Requesting a sample set
NetCDF introduction into CDI service
Chosing NetCDF (CF) format
NetCDF introduction into CDI service
RSM – CFPOINT OK
Downloaded in NetCDF (CF) format
D9.2 incl updating D4.5 and D5.3: Harvesting of CDI XMLs – planned implementationObjective is to upgrade the submission and processing of CDI entries from data centres to the CDI portal service by means of harvesting and ingestion
IFREMER has adopted GeoNetwork for supporting CDI XML output of MIKADO and making it available by means of local CS-W service
Testbed has been set-up with IFREMER, BSH and IEO that have installed and configured the GeoNetwork software as provided by IFREMER.
MARIS has tested central harvesting of CDI XML from the GeoNetwork CS-W service (could also be provided by other local software than GeoNetwork)
Final check needed on XML consistency and criterium for harvesting only entries since specific date
Harvesting of CDI XMLs – check on subset
MARIS has implemented a simple HTML page at:
http://www.gasandoil.nl/csw_hits/check_hits.php
This allows pilot data centres to check whether the harvest selection goes well as it is supposed to be. MARIS does a selection on the CDI tag var06 = REVISION-DATE OF DATASET; this should be stable and represent the date that the CDI XML was generated / updated.
Query example for IEO for updates with revision date > 2014-12-31:http://www.seadatanet.ieo.es/geonetwork-sdn/srv/eng/csw-cdi?
SERVICE=CSW&VERSION=2.0.2&REQUEST=GetRecords&constraintLanguage=CQL_TEXT&CONSTRAINT_LANGUAGE_VERSION=1.1.0&CONSTRAINT=csw%3ArevisionDate%3E%272014-12-31%27&outputFormat=application%2Fxml&resultType=hits&maxRecords=10&ElementSetName=brief
Harvesting of CDI XMLs – check on subset
The test page performs a series of date tests and gives results. See below for IFREMER:
Clicking on Details gives corresponding CDI XML records with their Local_CDI_ID’s
Action: IFREMER, IEO and BSH to check whether the revision date criterion performs ok.
D4.5 and D5.3: Ingestion of harvested CDI XMLs – status of online CMS for data centresChallenge: central CDI ingestion taking into account the staging process and relational model CDI – coupling table – local data MARIS applies a staging process for populating new and updated CDI entries, received from data centres:
Validation of syntax and semanticsif okDuplicates check => report to data centre for check if okImport of CDIs incl GML validationif ok CDIs in Import CDI service and user interface for visual check by data centresif okData centres must update Coupling Table and arrange Local Data setsif okCDIs moved to production CDI service for public use
Agreed to upgrade this into an online system whereby data centres can manage themselves => establishing data centre self responsibility + 24 / 7
Ingestion of harvested CDI XMLs – status of online CMS for data centres
CMS under development
Ingestion of harvested CDI XMLs – status of online CMS for data centres
Principle is that per data provider each time only ONE harvested batch will be processed; when ready, then next harvest will take place at regular intervals (e.g. each week)
System runs in steps through a batch process in which data provider is asked to interact only a few times:
Check identified XML errors (syntax – semantics) for possible repair in next batchCheck identified potential duplicates (against import and production) and undertake action to delete real duplicates from the import databaseCheck overall remaining CDI import and submit action for moving to production (after confirming up-to-date coupling table and new data availability)
The overall process works in batch mode: for example moving from import to production takes place each night because also requires a lot of indexing.
Ingestion of harvested CDI XMLs – status of online CMS for data centres
CMS with all possible ‘status’ examples
Ingestion of harvested CDI XMLs – status of online CMS for data centres
Possible interactions for data provider
Batch no
Prevailing status
Option to retrieve logof XML errors
Options to retrieve logs, to inspect CDIs and to mark CDIs in import for deletion
Click to mark for removal or moving to production(see next page)
Option to inspect CDIs in import
Ingestion of harvested CDI XMLs – status of online CMS for data centres
Final action: Removing remaining Batch or moving to Production
CDI ingestion pilot
Pilot in SeaDataNet II with a number of data centres: IFREMER, IEO, BSH
Will get access to the new online CMS for trying out.
Test takes place in operational environment, but MARIS will monitor steps with roll-back options.
Data providers must still also provide CDI XML by mail for comparison
Note: CDI deletions continue as specials by email from data centres to MARIS
END