European long-term ecosystem, critical zone and socio-ecological systems research
infrastructure PLUS
Guidelines for data collection
Technical Document
06. October 2020
Author(s) and affiliations: Johannes Peterseil (1), Sarah Geiger (1), Melanie Tista (1)
(1) Umweltbundesamt GmbH (EAA), Spittelauer Lände 5, 1090 Vienna, Austria
2 | Page TechDoc Guideline data collection
Prepared under contract from the European Commission
Grant agreement No. 871128
EU Horizon 2020 Research and Innovation action
Project acronym: eLTER PLUS
Project full title: European long-term ecosystem, critical zone and socio-ecological systems research infrastructure PLUS
Start of the project: Feb 2020
Duration: 60 months
Website: https://www.lter-europe.net/projects/elter-plus
Deliverable title: Guidelines for data collection
Deliverable n°: Technical document
Nature of the deliverable: Report
Dissemination level: Public
Citation: Peterseil, J. (2020) Guidelines for data collection. Technical Document, EU Horizon 2020 eLTER PLUS Project, Grant agreement No. 871128.
Deliverable status:
Version | Status | Date | Author(s)
0.1 | Draft | 19. August 2020 | Johannes Peterseil (EAA)
1.0 | Final | 06.10.2020 | Johannes Peterseil & Sarah Geiger
The content of this deliverable does not necessarily reflect the official opinions of the European Commission or other institutions of the European Union.
Table of Contents
1 Introduction
1.1 Purpose of this document
2 Data provision workflow
2.1 Data Submission Report
3 Supported Data formats
4 Providing metadata
4.1 Keywords
4.2 Online distribution link
4.3 DEIMS-SDR
5 Providing data via Cloud store
5.1 Naming convention
5.2 EUDAT B2DROP
5.3 EUDAT B2SHARE
6 Annex – Data requirements
1 Introduction
The eLTER PLUS project aims to improve the key infrastructure services of the LTER networks over the coming years. A key objective of the project is to use existing LTER data to address topical scientific research challenges, for example the impact of drought on ecosystem resilience. We aim for a unique, holistic approach that addresses the research questions in eight case studies focusing on biodiversity, biogeochemical, hydrological and socio-ecological stressors, using the diverse resources of the evolving eLTER RI. Data from LTER sites are used to address the research challenges defined by eLTER:
● Biodiversity loss
● Bio-geochemical processes
● Hydrological processes
● Socio-ecological systems
Data from a range of sites are required for cross-site and cross-scale analysis. The outcome will be scientific and data publications enabling co-authorship, thereby increasing the visibility of eLTER sites both individually and as part of the network. In addition, eLTER PLUS aims to develop, test and provide central workflows for data validation and harmonisation, which are intended to support the data streams of the upcoming eLTER RI and to serve local data management and publication. The data provided are collected in a secure repository that is accessible only to the project team and that respects the property rights and licenses of the data providers.
1.1 Purpose of this document
The aim of this document is to provide the necessary information on the technical workflows for data provision based on the Virtual Access (VA) and on the contribution to the project aims. The specification chapters are structured as follows:
● Data provision workflow (chapter 2)
● Data formats (chapter 3)
● Metadata documentation (chapter 4)
● Data provision (chapter 5)
Additional supporting documents describe the technical details of data provision (eLTER_T7.3_Data_Provision_B2SHARE, eLTER_T7.3_Data_Provision_B2DROP) and the authoring of metadata (eLTER_T7.3_Metadata_Provision_DEIMSSDR). For general questions, please contact
● Johannes Peterseil ([email protected])
● Herbert Haubold ([email protected])
Technical information can be provided by
● Sarah Geiger ([email protected])
● Melanie Tista ([email protected])
● Christoph Wohner ([email protected])
● Johannes Peterseil ([email protected])
2 Data provision workflow
The entity for data provision is a data package, which contains one or more data files (each holding a defined portion of the data, e.g. a time slice) and, if required, the descriptive meta-information (e.g. station, method). The package is provided either as a single zip file or as a specific directory.
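As an illustration of the single zip-file option, the following minimal sketch bundles a package directory into one archive using only the Python standard library. The function name `build_data_package` and the directory layout in the usage note are assumptions made for this example; the guideline itself does not prescribe any tooling.

```python
import zipfile
from pathlib import Path


def build_data_package(package_dir: Path, zip_path: Path) -> list[str]:
    """Zip every file below package_dir and return the archived names.

    Hypothetical helper: the guideline only asks for "a single zip file",
    it does not define how the archive is created.
    """
    archived = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Sort for a reproducible member order inside the archive.
        for path in sorted(package_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the package root, with "/" separators.
                arcname = path.relative_to(package_dir).as_posix()
                zf.write(path, arcname)
                archived.append(arcname)
    return archived
```

A call such as `build_data_package(Path("my_package"), Path("my_package.zip"))` would archive the data files together with any descriptive files stored next to them. The archive should be written outside the package directory so that it does not try to include itself.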
Figure 1 provides a schematic overview of the data provision workflow.
Figure 1 Structure of the data provision
The data provision workflow consists of the following steps:
1) Data need to be provided in pre-defined data formats.
● If data are already structured according to a well-established data format (e.g. ICP Integrated Monitoring, ICP Forests, ICP Waters, ICOS Ecosystem, Darwin Core), that format is used. The selected data format needs to be named in the Data Submission Report (INST_DataSubmissionReport_DATE).
● If data are not structured according to a well-established data format, the eLTER Data Specification needs to be applied (see Table 2).
2) Data need to be provided via a shared web link for download. The data can either be open access or restricted to internal use. The permitted data usage needs to be described in the metadata (e.g. license).
● Data can be provided via a publicly or privately accessible cloud repository operated by the data provider. The download link needs to be recorded in the Data Submission Report.
● If the data provider cannot offer a cloud repository, the use of B2DROP (https://b2drop.eudat.eu/, based on NextCloud) or B2SHARE (https://b2share.eudat.eu, public archive) is recommended. The download link needs to be recorded in the Data Submission Report.
3) A data package needs to be described by openly accessible metadata. The metadata need to be provided in English to ensure usability.
● If descriptive metadata are already provided in a local metadata catalogue, the link to the publicly available metadata URL is given in the Data Submission Report.
● If no metadata are provided, or metadata are only available in local languages, the data (package) needs to be documented on DEIMS-SDR (https://deims.org).
4) For each data submission, a Data Submission Report file describing the data provided needs to be sent.
2.1 Data Submission Report
For each data submission, a Data Submission Report (as Excel file) needs to be generated and sent via e-mail. The report contains information on data packages provided including contact information. This information is only used for internal reporting purposes on the data collection for the project.
Table 1 provides an overview on the fields to be provided for the Data Submission Report.
Table 1 Data provision protocol file – core fields
Field name | Field description | Field type | Example
ORG_NAME | Name of institution | Text UTF-8 | Umweltbundesamt
CONTACT_NAME | Name of data provider | Text UTF-8 | Johannes Peterseil
CONTACT_EMAIL | Email of data provider | email address |
DATE_PUBLICATION | Date of publication/provision | YYYY-MM-DD | 2020-09-30
DATA_VERSION | Individual code assigned to the data set by the data provider | number | 1.0
GENERAL_COMMENTS | Any comments related to the entire data set | Text UTF-8 | data for plots uploaded
SITE_CODE | Reference to the LTER site providing the deims.id accessible through https://deims.org | URL | https://deims.org/8eda49e9-1f4e-4f3e-b58e-e0bb25dc32a6
METADATA_URL | Reference to the metadata record describing the data files or data package, e.g. link to deims.org MD record (URL) or accessible metadata catalogue | URL | https://deims.org/dataset/cfc1a860-840a-41f2-a208-cf66ab25dfa4
DATA_URL | Link (URL) to the download location for the data files or data package, e.g. B2DROP or private Cloud store | URL | http://hdl.handle.net/11097/71e09618-f769-4f99-a96f-fd6f90135011
DATA_FORMAT | Comments on the data reporting format used, e.g. ICP Forest | Text UTF-8 | eLTER Data Reporting V1.3
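Before sending a report, the core fields of Table 1 can be checked mechanically. The sketch below is one possible check, not an official validator: the function name `check_report_row`, the decision to treat GENERAL_COMMENTS as optional, and the very loose e-mail test are all assumptions made for this example.

```python
import re
from datetime import date
from urllib.parse import urlparse

# Field names as listed in Table 1; GENERAL_COMMENTS is treated as
# optional here, which is an assumption of this sketch.
REQUIRED_FIELDS = [
    "ORG_NAME", "CONTACT_NAME", "CONTACT_EMAIL", "DATE_PUBLICATION",
    "DATA_VERSION", "SITE_CODE", "METADATA_URL", "DATA_URL", "DATA_FORMAT",
]
URL_FIELDS = ["SITE_CODE", "METADATA_URL", "DATA_URL"]


def check_report_row(row: dict) -> list[str]:
    """Return a list of problems found in one Data Submission Report row."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(row.get(field, "")).strip():
            problems.append(f"{field}: missing")

    published = str(row.get("DATE_PUBLICATION", "")).strip()
    if published:
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", published):
            problems.append("DATE_PUBLICATION: expected YYYY-MM-DD")
        else:
            try:
                date.fromisoformat(published)
            except ValueError:
                problems.append("DATE_PUBLICATION: not a valid calendar date")

    for field in URL_FIELDS:
        value = str(row.get(field, "")).strip()
        if value:
            parsed = urlparse(value)
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems.append(f"{field}: not an http(s) URL")

    email = str(row.get("CONTACT_EMAIL", "")).strip()
    if email and "@" not in email:
        problems.append("CONTACT_EMAIL: no '@' found")
    return problems
```

An empty list means that no problem was detected by these checks. It does not confirm that the links resolve or that the site reference exists on DEIMS-SDR.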
3 Supported Data formats
Data providers should not have to transform their data formats if this can be avoided. Table 2 therefore lists a number of recommended data formats. If data cannot be provided in one of these formats, we ask you to use the eLTER Data Specification for observation data.
Table 2 Recommended data formats for data provision
Name | URL | Format
ICP Integrated Monitoring | https://www.syke.fi/en-US/Research__Development/Nature/Monitoring/Integrated_Monitoring/Manual_for_Integrated_Monitoring/7_Methodology_and_Reporting_of_Subprogrammes | csv, xls
ICP Forests | http://icp-forests.net/page/data-submission ; https://icp-forests.org/documentation/Surveys/index.html ; https://icp-forests.org/documentation/Introduction/General_remarks.html ; https://icp-forests.org/documentation-adds/csv-templates/csv_templates.zip | csv
ICP Waters | http://www.icp-waters.no/data/submit-data/ |
eLTER Data Specification | (Data specification) https://drive.google.com/file/d/1ud7ZKScn3k5PUW0_QvA1JzBpg9iX1UC7/view?usp=sharing ; (Template) https://drive.google.com/file/d/15F0PeE4VVWpTFnSeQ17ahGqNSssi0CAf/view?usp=sharing | csv, xls
Biodiversity (DwC Event) | | csv
OGC WFS/WMS | | gml, xml
OGC SOS | | xml
Spatial data vector | | gdb, shp, OGC WFS
Spatial data raster | | geoTiff, raster, OGC WMS
4 Providing metadata
A minimum set of metadata needs to be provided for the data package:
Data set title
Reference to the LTER site (deims.id see https://deims.org)
Abstract
Data range
Data set owner and contact
Keywords (see also below)
Parameters observed
Geographic context of the dataset (bounding box)
Online distribution link (see also below)
Method description
Data policy
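The minimum set listed above lends itself to a simple completeness check before a record is published. The sketch below assumes that a metadata record is available as a Python dictionary. The dictionary keys and the function name `missing_metadata` are hypothetical and do not correspond to DEIMS-SDR field names.

```python
# Hypothetical record keys mapped to the minimum elements named in this chapter.
MINIMUM_METADATA = {
    "title": "Data set title",
    "site": "Reference to the LTER site (deims.id)",
    "abstract": "Abstract",
    "data_range": "Data range",
    "owner_contact": "Data set owner and contact",
    "keywords": "Keywords",
    "parameters": "Parameters observed",
    "bounding_box": "Geographic context of the dataset (bounding box)",
    "distribution_link": "Online distribution link",
    "method": "Method description",
    "data_policy": "Data policy",
}


def missing_metadata(record: dict) -> list[str]:
    """Return the labels of minimum elements that are absent or empty."""
    return [
        label
        for key, label in MINIMUM_METADATA.items()
        if not record.get(key)
    ]
```

An empty result means that every minimum element has some content. Whether that content is meaningful still needs to be judged by the data provider.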
4.1 Keywords
It is important that you add the keyword “eLTER PLUS VA” to your dataset as this is needed to easily identify datasets that have been shared within the eLTER PLUS project.
Additionally, a keyword describing the data topic is needed, chosen from the following list:
● Biodiversity
● Biomass
● Climate
● Atmospheric deposition
● Soil-atmosphere gas exchange
● Air quality
● Soil climate
● Solid soil chemistry
● Soil water
● Runoff, streams and standing water
● Groundwater
● Land use
● Socio-economy
● Topography
● Vegetation
● Soil physics
● Land cover
● Habitat diversity
● Remote sensing
An example of proper keyword usage for a dataset would be: “eLTER PLUS VA”, “Vegetation”, “ground vegetation”, “understory vegetation”, “forest vegetation”
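Because datasets are identified by the project keyword, a small wording difference can make a dataset hard to find. The following sketch checks a keyword list for the project keyword “eLTER PLUS VA” and for at least one of the topic keywords listed above. Exact, case-sensitive matching and the function name `check_keywords` are assumptions of this example.

```python
PROJECT_KEYWORD = "eLTER PLUS VA"

# Topic keywords as listed in this chapter.
TOPIC_KEYWORDS = {
    "Biodiversity", "Biomass", "Climate", "Atmospheric deposition",
    "Soil-atmosphere gas exchange", "Air quality", "Soil climate",
    "Solid soil chemistry", "Soil water",
    "Runoff, streams and standing water", "Groundwater", "Land use",
    "Socio-economy", "Topography", "Vegetation", "Soil physics",
    "Land cover", "Habitat diversity", "Remote sensing",
}


def check_keywords(keywords: list[str]) -> tuple[bool, list[str]]:
    """Return (project keyword present?, topic keywords found)."""
    cleaned = [keyword.strip() for keyword in keywords]
    has_project_keyword = PROJECT_KEYWORD in cleaned
    topics = [keyword for keyword in cleaned if keyword in TOPIC_KEYWORDS]
    return has_project_keyword, topics
```

With this check, a list containing “eLTER PLUS VA” and “Vegetation” passes, while a list that spells the project keyword with the words in a different order is reported as lacking it.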
4.2 Online distribution link
If data are not shared via a public repository and their use is restricted to the eLTER PLUS project, the e-mail address of the contact person needs to be provided as the Online distribution link.
Nevertheless, the download link to the data files needs to be provided in the Data Submission Report. These links are not shared and are used only for project purposes.
4.3 DEIMS-SDR
DEIMS-SDR (https://deims.org) is the central metadata catalogue for eLTER. It allows research sites and related datasets to be documented. If metadata are not provided via an alternative catalogue, the data packages need to be documented in DEIMS-SDR.
A short description on how to use DEIMS-SDR and how to obtain a user account is provided in the document eLTER_T7.3_Metadata_Provision_DEIMSSDR.
5 Providing data via Cloud store
5.1 Naming convention
The naming convention for files is:
Template: [2-digit Country code]_[LTER Site Name]_[Data group]_[Variable group]_[Time span]_[Version]
Example: AT_ZOEBELBODEN_VEG_SPECCOVER_2015_V20170315
Element | Description | Example
2-digit Country code | Reference to the country of the site as two-letter country code according to ISO 3166-1 alpha-2 | AT
LTER Site Name | Name of the site according to DEIMS-SDR; if the name is too long, the site name can be shortened | ZOEBELBODEN
Data group | Max 5-digit code for data topic or observation programme, e.g. METEO (Meteorology), BIODIV (biodiversity), DEPO (deposition), GHG (Green House gas), SW (Soil water). The abbreviation is defined by the data provider depending on the data. | VEG
Variable group | Optional; list of variables or variable groups contained in the data. The abbreviation is defined by the data provider depending on the data. | SPECCOVER
Time span | Time span covered in the data | 2015
Version | Data version in the format “V”YYYYMMDD | V20170315
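The naming convention can be applied programmatically. The sketch below builds a file name from its elements and splits an existing name back into them. It rests on several assumptions that go beyond the text: elements are upper-cased as in the example, individual elements contain neither underscores nor spaces, the time span is treated as free text, and the name is handled without a file extension. The function names are hypothetical.

```python
import re
from datetime import date, datetime


def build_file_name(country, site, data_group, time_span, version,
                    variable_group=None):
    """Join the naming elements with underscores (upper-cased by assumption)."""
    if not re.fullmatch(r"[A-Za-z]{2}", country):
        raise ValueError("country must be a two-letter ISO 3166-1 alpha-2 code")
    elements = [country, site, data_group]
    if variable_group:  # the variable group is optional
        elements.append(variable_group)
    elements.append(str(time_span))
    for element in elements:
        # An underscore inside an element would make the name ambiguous.
        if "_" in element or " " in element:
            raise ValueError(f"element {element!r} must not contain '_' or spaces")
    elements = [element.upper() for element in elements]
    elements.append("V" + version.strftime("%Y%m%d"))
    return "_".join(elements)


def parse_file_name(name):
    """Split a file name (without extension) into its naming elements."""
    parts = name.split("_")
    if len(parts) == 6:
        country, site, data_group, variable_group, time_span, version = parts
    elif len(parts) == 5:  # name without the optional variable group
        country, site, data_group, time_span, version = parts
        variable_group = None
    else:
        raise ValueError("expected 5 or 6 underscore-separated elements")
    if not re.fullmatch(r"V\d{8}", version):
        raise ValueError("version must have the form VYYYYMMDD")
    return {
        "country": country,
        "site": site,
        "data_group": data_group,
        "variable_group": variable_group,
        "time_span": time_span,
        "version": datetime.strptime(version[1:], "%Y%m%d").date(),
    }
```

For instance, `build_file_name("AT", "Zoebelboden", "VEG", 2015, date(2017, 3, 15), "SPECCOVER")` reproduces the example name given above.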
5.2 EUDAT B2DROP
B2DROP (https://b2drop.eudat.eu/) is a fully secured data exchange service on the basis of a NextCloud server. During transmission data are encrypted through the exclusive use of the https protocol for data transfer. Access to the data is only given to project members of the eLTER project. Additional documentation can be found at https://eudat.eu/services/userdoc/b2drop.
A short description on how to use B2DROP is provided in document eLTER_T7.3_Data_Provision_B2DROP.
5.3 EUDAT B2SHARE
B2SHARE (https://b2share.eudat.eu) allows data files to be uploaded to a web repository. B2SHARE is developed by the EUDAT2020 project in order to provide a common data infrastructure for European research data. eLTER PLUS uses the services provided by the EUDAT2020 project to share and publish data files. When data files are uploaded to B2SHARE, a PID (Persistent Identifier) and a DOI (Digital Object Identifier) are created to allow unambiguous identification of the file, including a resolution service for the PID. B2SHARE supports an open data policy and allows a retention period to be implemented. A data file uploaded to the B2SHARE repository needs to be accompanied by a metadata record in DEIMS-SDR describing the observation context. A short description on how to use B2SHARE is provided in the document eLTER_T7.3_Data_Provision_B2SHARE.
6 Annex – Data requirements
The following variables were indicated in the data requirements. Please check them against your answers in the data availability survey.
Case-study columns: BIO_T8_1, BGC_T8_2, CWF_T8_3, SES_T8_4, BIO_T9_1, BGC_T9_2, CWF_T9_3, SES_T9_4
Type | Group | Variable | Unit | Temporal resolution | Case studies marked (x)
TimeSeries | Biodiversity | all taxonomic groups | occurrence/abundance/composition/biomass | annual | x x
TimeSeries | Biomass | Terrestrial aboveground biomass | kg*m-2 | annual | x x
TimeSeries | Biomass | Terrestrial aboveground vegetation growth | kg*m-2 | monthly/annual | x x
TimeSeries | Biomass | Terrestrial net primary production | kgC*m-2 | daily/monthly/annual | x x x x x x
TimeSeries | Biomass | Terrestrial gross primary production | kgC*m-2 | daily/monthly/annual | x x x x x x
TimeSeries | Biomass | Leaf area Index (LAI) | unitless | daily/monthly/annual | x x x x x
TimeSeries | Biomass | Tissue (Leaf) C and N content | % | annual | x x
TimeSeries | Biomass | Aboveground litterfall | kg*m-2 | monthly/annual | x x
TimeSeries | Biomass | Belowground (soil) litterfall | kg*m-2 | monthly/annual | x x
TimeSeries | Climate | Air temperature | °C | 30 min | x x x x x x x x
TimeSeries | Climate | Precipitation | mm | 30 min | x x x x x x x x
TimeSeries | Climate | Relative air humidity | % | 30 min | x x x x
TimeSeries | Climate | Wind speed / Wind direction | m*s-1, ° | 30 min | x x
TimeSeries | Climate | Surface atmospheric pressure | mbar | 30 min | x x
TimeSeries | Climate | Photosynthetic Active Radiation | W*m-2 or PPFD | 30 min | x x x
TimeSeries | Climate | Direct incoming short wave radiation | W*m-2 | 30 min | x x x
TimeSeries | Climate | Reflected short wave radiation | W*m-2 | 30 min | x x
TimeSeries | Climate | Diffused long-wave radiation from the sky | W*m-2 | 30 min | x x
TimeSeries | Climate | Diffused long-wave radiation from the surface | W*m-2 | 30 min | x x
TimeSeries | Atmospheric deposition | Bulk NH4-N, NO3-N, Ntot deposition in precipitation | kg*ha-1 | monthly | x x x x
TimeSeries | Atmospheric deposition | Bulk P deposition in precipitation | kg*ha-1 | monthly | x x x x
TimeSeries | Atmospheric deposition | Bulk K deposition in precipitation | kg*ha-1 | monthly | x x x x
TimeSeries | Atmospheric deposition | Bulk NH4-N, NO3-N, Ntot deposition in canopy throughfall (forests) | kg*ha-1 | monthly | x x x x
TimeSeries | Atmospheric deposition | Bulk P deposition in canopy throughfall (forests) | kg*ha-1 | monthly | x x x x
TimeSeries | Atmospheric deposition | Bulk K deposition in canopy throughfall (forests) | kg*ha-1 | monthly | x x x x
TimeSeries | Atmospheric deposition | NH4-N, NO3-N, Ntot deposition in stemflow (forests) | kg*ha-1 | monthly | x x x x
TimeSeries | Atmospheric deposition | P deposition in stemflow (forests) | kg*ha-1 | monthly | x x x x
TimeSeries | Atmospheric deposition | K deposition in stemflow (forests) | kg*ha-1 | monthly | x x x x
TimeSeries | Atmospheric deposition | Dry deposition of Nitrogen | kg*ha-1 | monthly | x x x x
TimeSeries | Soil-atmosphere gas exchange | Eddy covariance flux data product (compatible with ICOS L2, Ameriflux, Fluxnet) | various | various | x x
TimeSeries | Soil-atmosphere gas exchange | Soil CO2 flux | gC*m-2 | daily/monthly/annual | x x
TimeSeries | Soil-atmosphere gas exchange | Soil CH4 flux | gC*m-2 | daily/monthly/annual | x x
TimeSeries | Soil-atmosphere gas exchange | Soil N2O, NO, NOx flux | gC*m-2 | daily/monthly/annual | x x
TimeSeries | Air quality | Ozone concentration | ppm | 30 min | x x
TimeSeries | Air quality | NOx concentration | mg*m-3 | 30 min | x x
TimeSeries | Air quality | NH3 concentration | mg*m-3 | 30 min | x x
TimeSeries | Soil climate | Soil temperature | °C | 30 min | x x x x x
TimeSeries | Soil climate | Soil water content | Vol% | 30 min | x x x x x x
TimeSeries | Solid soil chemistry | Soil organic C content (per horizon) | mg*kg-1 | annual | x x x x x x
TimeSeries | Solid soil chemistry | Soil total N content (per horizon) | mg*kg-1 | annual | x x x x x x
TimeSeries | Solid soil chemistry | Soil total P content (per horizon) | mg*kg-1 | annual | x x x x x x
TimeSeries | Solid soil chemistry | Soil total K content (per horizon) | mg*kg-1 | annual | x
TimeSeries | Solid soil chemistry | Soil pH (in H2O/KCl/CaCl2) (per horizon) | unitless | annual | x x x x x x
TimeSeries | Solid soil chemistry | Soil cation exchange capacity (per horizon) | cmolc*kg-1 | annual | x x x x x x
TimeSeries | Solid soil chemistry | Soil base saturation (per horizon) | % | annual | x x x x x
TimeSeries | Soil water | NH4-N, NO3-N, DON concentration | mg*l-1 | monthly | x x x x
TimeSeries | Soil water | NH4-N, NO3-N, DON leaching | kg*ha-1 | monthly | x x x x
TimeSeries | Soil water | DOC concentration | mg*l-1 | monthly | x x x x
TimeSeries | Soil water | DOC leaching | kg*ha-1 | monthly | x x x x
TimeSeries | Soil water | Percolation | mm | monthly | x x x x
TimeSeries | Runoff, streams and standing water | TOC, DOC concentration | mg*l-1 | monthly or higher frequency | x x x
TimeSeries | Runoff, streams and standing water | Ntot, NH4-N, NO3-N concentration | mg*l-1 | monthly (or finer) | x x x x x
TimeSeries | Runoff, streams and standing water | P concentration | mg*l-1 | monthly (or finer) | x x x x x
TimeSeries | Runoff, streams and standing water | Cation concentrations | mg*l-1 | monthly (or finer) | x x x
TimeSeries | Runoff, streams and standing water | Anion concentrations | mg*l-1 | monthly (or finer) | x x
TimeSeries | Runoff, streams and standing water | pH value | unitless | monthly (or finer) | x x x
TimeSeries | Runoff, streams and standing water | Conductivity | mS*m-1 | monthly (or finer) | x x x x
TimeSeries | Runoff, streams and standing water | Discharge | l*s-1 | monthly (or finer) | x x x x
TimeSeries | Runoff, streams and standing water | Water temperature | °C | monthly (or finer) | x x
TimeSeries | Runoff, streams and standing water | Pesticides | mg*l-1 | monthly (or finer) | x x
TimeSeries | Runoff, streams and standing water | Transparency | m | monthly (or finer) | x x
TimeSeries | Groundwater | Groundwater table height | cm | monthly (or finer) | x x x x
TimeSeries | Groundwater | Cation concentrations | mg*l-1 | monthly (or finer) | x x x
TimeSeries | Groundwater | Anion concentrations | mg*l-1 | monthly (or finer) | x x
TimeSeries | Land use | N, P, K fertilisation | kg*m-2 | annual | x x x x x
TimeSeries | Land use | Liming | kg*m-2 | annual | x x x x x
TimeSeries | Land use | Pesticides | kg*m-2 | annual | x x x x x
TimeSeries | Land use | Grazing timing, stocking intensity, animal | time*yr-1, LU*ha-1 | annual | x x x x x x
TimeSeries | Land use | Crop, grassland harvesting | kg*m-2 | annual | x x x x x x
TimeSeries | Land use | Forest planting, thinning, clearcut | number/m³*ha-1 | annual | x x x x x x
TimeSeries | Land use | Irrigation timing, amount | m³*event | annual | x x x x
TimeSeries | Land use | Visitors | number | annual | x x
TimeSeries | Socio-economy | Gross domestic product | EUR | decadal | x x x
TimeSeries | Socio-economy | Governance structure and character | descriptive | decadal | x x
TimeSeries | Socio-economy | Demography | tbd | decadal | x x x
TimeSeries | Socio-economy | Property ownership/laws/institutions | tbd | decadal | x x
Descriptive | Topography | elevation, slope, aspect | m, °, ° | | x x
Descriptive | Vegetation | Forest stand age | yr | | x x x x
Descriptive | Soil physics | Soil type | categories | | x x x x x
Descriptive | Soil physics | Parent material (geology) | categories | | x x x x x
Descriptive | Soil physics | Soil bulk density | g*cm-3 | | x x x x
Descriptive | Soil physics | Soil depth (per horizon) | cm | | x x x x
Descriptive | Soil physics | Soil fractions (Water stable aggregate fractionation) | % per fraction | | x x
Descriptive | Soil physics | Soil texture | categories | | x x x x
Descriptive | Soil physics | Soil stone content (>2mm) | % | | x x x x
Descriptive | Soil physics | Soil water retention capacity (water retention curves; field capacity, water saturation, wilting point) | mm*cm-1 | | x x x x
Descriptive | Soil physics | Saturated hydraulic conductivity | m*s-1 | | x x x x
Descriptive | Land cover | Land cover | categories | | x x x x x x x x
Descriptive | Land use | Crop type | categories | | x x
Descriptive | Land use | Fire regime | categories | | x
Descriptive | Land use | Forest management | categories | | x
Descriptive | Land use | Land use (historic) | descriptive | | x x x
Descriptive | Land use | Land use change | descriptive | | x x x
Descriptive | Habitat diversity | River Habitat Survey (RHS) information | categories | | x
Descriptive | Habitat diversity | Habitat map/list | categories | | x
Remote Sensing | Remote sensing | Productivity (NDVI) | | | x
Remote Sensing | Remote sensing | Phenology/Seasonality | | | x
Remote Sensing | Land cover | CORINE Land cover (or similar) | categories | | x x x x x x x x