Ecological Information Management (EIM) 2008
LTER Information Management Committee Meeting, July 23-25, 2013
Don Henshaw
H.J. Andrews Experimental Forest
Considering best practices in managing sensor data
COMMON THEMES FROM PARTICIPATING SITES
JOINT NERC ENVIRONMENTAL SENSOR NETWORK/SENSOR NIS WORKSHOP, HUBBARD BROOK EXPERIMENTAL FOREST, NH, OCTOBER 25-27TH, 2011
Greatest Needs
• Middleware between sensor/data logger and database/applications
• Programming support
• Training workshops to disseminate knowledge & solutions
• Ways to share experiences with software and tools that are useful
• Clearinghouse for sharing code and solutions
• Knowledge Base (web page) organized by topics (http://wiki.esipfed.org/index.php/EnviroSensing_Cluster)
Joint NERC Environmental Sensor Network/LTER SensorNIS Workshop, October 25-27th, 2011
ESIP EnviroSensing Cluster: Building a sensor network resource guide through community participation
o Online resource guide outline
  • Sensor, site, and platform selection
  • Data acquisition and transmission
  • Sensor management, tracking, documentation
  • Streaming data management middleware
  • Sensor data quality assurance/quality control (QA/QC)
  • Sensor data archiving
Software Tools for Sensor Networks, April 23-26, 2013
Sensor, site, and platform selection
o Problem statement
  • Vast array of possible sensor/hardware packages for multiple science applications
  • Communication among PIs, techs, and specialists
    o work together in considering options and planning
  • Deployment may be based on interacting factors
    o e.g., permitting, geography, access
  • Considerations:
    o seasonal weather patterns, power sources, communications options, land ownership, distance from managing institution, available personnel/expertise, and potential expansion/future-proofing
Data acquisition and transmission
Problem statement
• Manual downloads of environmental sensor data may not be sufficient to assure data security or data integrity, or allow direct control of devices
• Considerations:
  o need for immediate access
  o need for one- or two-way transmission methods
  o bandwidth requirements to transfer the data
  o need for line-of-sight communication or repeaters
  o hardware and network protocols
  o power consumption of the system components
  o physical and network security requirements
Sensor management, tracking, and documentation
Problem statement
• Documentation of field procedures needs to be sufficient to withstand personnel changes over time
• Noted sensor issues and problems need to be quickly communicated among field technicians, lead investigators, and data managers
• Sensor histories are typically tracked in field notebooks or field check sheets and are essential for internal review of data streams, but are often inaccessible to data handlers
• Noted field problems may provide insight into quality control issues and data behavior and should be captured in data qualifier flags
SENSOR MANAGEMENT, TRACKING, DOCUMENTATION
Software Tools for Sensor Networks Training, 1 May 2012
• Develop protocols for installation, calibration, maintenance, and removal of sensors
• Track sensor events and history
  o Record sensor events and failures, deployment information, calibration events, maintenance history, operational dates, etc.
  o Record sensor descriptions, methodology changes, sampling frequency, geo-location, photo points, etc.
• Documentation
  o Standardize field notebooks or field checklists
  o Build log files or databases for annotation of sensor events, e.g.,
    • Timestamp (or range), DataloggerID, SensorID, event category, description, and note taker of event
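One way such an event log could be structured is sketched below. This is a minimal illustration, not a prescribed schema: the `SensorEvent` class, its field names, and the example IDs are assumptions chosen to mirror the fields listed on the slide.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import datetime
from typing import Optional


@dataclass
class SensorEvent:
    """One annotated sensor event (hypothetical schema mirroring the slide)."""
    start: datetime                # timestamp, or start of a time range
    end: Optional[datetime]        # end of the range, None for a single instant
    datalogger_id: str
    sensor_id: str
    category: str                  # e.g. "calibration", "maintenance", "failure"
    description: str
    note_taker: str

    def covers(self, t: datetime) -> bool:
        """True if the event applies to a data value recorded at time t."""
        if self.end is None:
            return t == self.start
        return self.start <= t <= self.end


def write_events(events, handle):
    """Write events as CSV text so the log stays readable without special tools."""
    names = [f.name for f in fields(SensorEvent)]
    writer = csv.DictWriter(handle, fieldnames=names)
    writer.writeheader()
    for event in events:
        row = asdict(event)
        row["start"] = event.start.isoformat()
        row["end"] = event.end.isoformat() if event.end else ""
        writer.writerow(row)


# Example IDs below are made up for illustration.
events = [
    SensorEvent(
        start=datetime(2013, 7, 1, 10, 0),
        end=datetime(2013, 7, 1, 11, 30),
        datalogger_id="LOGGER-01",
        sensor_id="AIRTEMP-01",
        category="calibration",
        description="Probe removed for calibration; readings invalid",
        note_taker="field technician",
    )
]
buffer = io.StringIO()
write_events(events, buffer)
print(buffer.getvalue())
```

A `covers` lookup of this kind is what lets a data handler turn a field note into a qualifier flag on every value recorded during the event.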
Sensor data quality assurance and quality control (QA/QC)
• Preventative QA measures in the field are desirable
• Automated QC is necessary for
  o near real-time use of data
  o efficient processing of high volume data streams
• Manual methods are unavoidable
  o a hybrid QC system will include subsequent manual inspection and additional QC checking
• QC system must
  o provide qualifier flags to sensor data
  o accommodate feedback to policies and procedures
  o assure that all QC workflows are documented
QUALITY ASSURANCE – PREVENTATIVE MEASURES
• Routine calibration and maintenance
  o Anticipate common repairs and replacement parts
  o Record known events that may impact measurements
• Continuous monitoring and evaluating of sensor network
  o Early detection of problems
  o Automated alerts; in situ web cams
• Sensor redundancy
  o Ideal: Triple the sensor, triple the logger!
  o Practical: Lower-cost, lower-resolution sensors, or correlated (proxy) sensors
  o Alternative: Datalogger-independent sensor spot checks; portable instrument package
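To illustrate why redundant sensors help with early detection, the sketch below compares simultaneous readings from three sensors against their median and reports any that disagree. The function name, sensor IDs, and tolerance are hypothetical; a real tolerance would depend on the site and sensor type.

```python
from statistics import median


def flag_disagreeing_sensors(readings, tolerance):
    """Return IDs of sensors whose reading differs from the median by more than tolerance.

    readings: dict mapping sensor ID to its value at a single timestamp.
    """
    center = median(readings.values())
    return sorted(
        sensor_id
        for sensor_id, value in readings.items()
        if abs(value - center) > tolerance
    )


# Three redundant air temperature sensors (illustrative values, degrees C).
readings = {"temp_a": 12.1, "temp_b": 12.3, "temp_c": 15.9}
suspects = flag_disagreeing_sensors(readings, tolerance=1.0)
print(suspects)
```

With three sensors the median is robust to a single failure, which is one reading of the "triple the sensor" advice; with only two sensors a disagreement can be detected but not attributed.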
QUALITY CONTROL ON STREAMING DATA: POSSIBLE QUALITY CONTROL CHECKS IN NEAR REAL-TIME
• Timestamp integrity (Date/time)
  o Sequential, fixed intervals, i.e., checks for time step or frequency variation
• Range checks
  o Sensor specifications - identify impossible values; not unlikely ones
  o Seasonal/reasonable historic values
• Internal (plausibility) checks
  o E.g., TMAX-TMIN>0, snow depth>snow water equivalent
  o Consistency of derived values
• Variance checks
  o Sigma (standard deviation), Delta/step (difference of subsequent pairs), change in slope checks
  o e.g., outlier detections, indicator of sensor degradation
  o Sensitivity is specific to site and sensor type
• Persistence checks
  o Check for repeating values that may indicate sensor failure
  o E.g., freezing, sensor capacity issues
• Spatial checks
  o Use correlations with redundant or nearby sensors, e.g., check for sensor drift
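Several of the checks above can be sketched in a single pass over a data stream. The example below is a simplified illustration covering timestamp integrity, range, step (delta), and persistence only; the flag names and all thresholds are assumptions, and operational limits would be set per site and sensor type.

```python
from datetime import datetime, timedelta


def qc_checks(records, interval, valid_min, valid_max, max_step, max_repeats):
    """Return one set of flag names per record.

    records: list of (timestamp, value) tuples in arrival order.
    Flag names (hypothetical): RANGE, TIMESTEP, STEP, PERSIST.
    """
    flags = [set() for _ in records]
    run_length = 1  # number of consecutive identical values seen so far
    for i, (t, value) in enumerate(records):
        # Range check: value outside the sensor's possible limits.
        if not (valid_min <= value <= valid_max):
            flags[i].add("RANGE")
        if i == 0:
            continue
        prev_t, prev_value = records[i - 1]
        # Timestamp integrity: records must arrive at a fixed interval.
        if t - prev_t != interval:
            flags[i].add("TIMESTEP")
        # Step check: change from the previous value is implausibly large.
        if abs(value - prev_value) > max_step:
            flags[i].add("STEP")
        # Persistence check: the same value repeated too many times.
        run_length = run_length + 1 if value == prev_value else 1
        if run_length > max_repeats:
            flags[i].add("PERSIST")
    return flags


start = datetime(2013, 7, 1, 0, 0)
step = timedelta(minutes=15)
records = [
    (start, 10.0),
    (start + 1 * step, 10.2),
    (start + 2 * step, 55.0),   # out of range and a large jump
    (start + 3 * step, 10.4),   # large jump back down
    (start + 5 * step, 10.4),   # one record missing before this one
    (start + 6 * step, 10.4),
    (start + 7 * step, 10.4),   # fourth identical value in a row
]
flags = qc_checks(records, interval=step, valid_min=-40.0, valid_max=50.0,
                  max_step=5.0, max_repeats=3)
for (t, value), f in zip(records, flags):
    print(t.isoformat(), value, sorted(f))
```

Note that a single spike trips the step check twice (on the way up and on the way down), which is one reason fine-grained flags are meant to guide review rather than replace it.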
QUALITY CONTROL ON STREAMING DATA: DATA QUALIFIERS (DATA FLAGS)
• Many vocabularies of data flags
• Good approach
  o Rich vocabulary of fine-grained flags for streaming data – intended to guide local review; site-specific flags
  o Simpler vocabulary of flags for “final” data for public consumption, e.g., ‘Accepted’, ‘Missing’, ‘Estimated’, ‘Suspicious’; estimate uncertainty
• Certain types of qualifiers may be better as data columns
  o Method shifts, sensor shifts
  o Place key documentation as close to data value as possible
Image from Campbell et al., Bioscience, In Press.
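The two-vocabulary approach could be implemented as a mapping from fine-grained streaming flags to the simpler public set. In the sketch below, the public labels come from the slide, while the fine-grained flag names, the mapping, and the priority order are illustrative assumptions.

```python
# Public vocabulary, most severe first (the order is an assumption).
FINAL_FLAG_PRIORITY = ["Missing", "Estimated", "Suspicious"]

# Hypothetical fine-grained streaming flags mapped to public labels.
FINE_TO_FINAL = {
    "NODATA": "Missing",
    "GAPFILL": "Estimated",
    "RANGE": "Suspicious",
    "STEP": "Suspicious",
    "PERSIST": "Suspicious",
}


def final_flag(fine_flags):
    """Collapse a set of fine-grained flags into one public label.

    Unknown flags are treated conservatively as 'Suspicious';
    a value with no flags at all is 'Accepted'.
    """
    mapped = {FINE_TO_FINAL.get(flag, "Suspicious") for flag in fine_flags}
    for label in FINAL_FLAG_PRIORITY:
        if label in mapped:
            return label
    return "Accepted"


print(final_flag(set()))
print(final_flag({"RANGE", "STEP"}))
print(final_flag({"GAPFILL", "RANGE"}))
```

Keeping the fine-grained flags alongside the public label preserves the detail needed for local review while giving data users a short, stable vocabulary.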
SENSOR DATA ARCHIVING
• Archiving strategies
  o create well documented data snapshots
  o assign unique, persistent identifiers
  o maintain data and metadata versioning
  o store data in text-based formats
• Partner with community supported archives
  o E.g., the LTER NIS, or federated archive initiatives such as DataONE
• Best practices
  o develop an archival data management plan
  o implement a sound data backup plan
  o archive raw data (but they do not need to be online)
  o make data publicly available that have appropriate QA/QC procedures applied
  o assign QC level to published data sets
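As a rough illustration of a documented, versioned snapshot, the sketch below builds a small manifest for a text-based data file. The manifest fields and the identifier scheme are assumptions; a locally generated identifier like this one is not a persistent identifier in the archival sense, which would normally be assigned by the archive.

```python
import hashlib
import json


def snapshot_manifest(dataset_id, version, csv_text, qc_level):
    """Describe one immutable snapshot of a text-based data file.

    The SHA-256 checksum lets anyone verify that the archived bytes are
    unchanged; the version number distinguishes successive snapshots.
    """
    digest = hashlib.sha256(csv_text.encode("utf-8")).hexdigest()
    return {
        "identifier": f"{dataset_id}.v{version}",  # local scheme, hypothetical
        "version": version,
        "qc_level": qc_level,
        "format": "text/csv",
        "sha256": digest,
    }


csv_text = "timestamp,airtemp_c,flag\n2013-07-01T00:00:00,10.0,Accepted\n"
manifest = snapshot_manifest("example-airtemp", 1, csv_text, qc_level=1)
print(json.dumps(manifest, indent=2))
```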
QUALITY CONTROL ON STREAMING DATA: QUALITY LEVELS
• Quality control is performed at multiple levels
• Level 0 (Raw streaming data)
  o Raw data, no QC, no data qualifiers applied (data flags)
  o Preservation of original data streams is essential
• Level 1 (QC applied, qualifiers added)
  o Provisional level (near real-time preparation)
    • if released, provisional data must be labeled clearly
  o Published level (delayed release)
    • QC process is complete; data is unlikely to change
• Level 2 (Gap-filled or estimated data)
  o Involves interpretation – may be controversial
  o Desirable when generating summarized data, but transparency critical – flag estimated values
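The step from Level 1 to Level 2 can be sketched as gap filling that leaves the input untouched and flags every estimated value. Linear interpolation is used here purely as an example method, not as the approach recommended by the workshop; the function name and flag labels are assumptions.

```python
def gap_fill_linear(values):
    """Fill interior gaps by linear interpolation between known neighbours.

    values: list of floats, with None marking missing values at fixed intervals.
    Returns (filled, flags). The input list is not modified, so the
    lower-level data stream is preserved.
    """
    filled = list(values)
    flags = ["Accepted" if v is not None else "Missing" for v in values]
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            fraction = (i - left) / gap
            filled[i] = values[left] + fraction * (values[right] - values[left])
            flags[i] = "Estimated"  # transparency: every filled value is flagged
    return filled, flags


level1 = [10.0, None, None, 13.0, None]
level2, flags = gap_fill_linear(level1)
print(level2)
print(flags)
```

Gaps at the start or end of the series have no neighbour on one side, so this sketch leaves them missing rather than extrapolating.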
Streaming data management middleware
o Definition/Purpose
  • “Middleware” in conjunction with sensor networks is computer software that enables communication and management of data from field sensors to a client such as a database or a website
  • Purpose of middleware includes the collection, analysis, and visualization of data
  • Middleware is chained together into a scientific workflow
  • Examples:
    o Read, reformat, export of different data types or structures (input/output)
    o Automated QA/QC on data streams
    o Integration of field notes and documentation with the data
    o Archiving
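The idea of chaining middleware into a workflow can be illustrated with a toy pipeline in which each step takes the previous step's output. The sketch below is not how any particular middleware product works; the step functions, column names, and limits are all hypothetical.

```python
import csv
import io


def read_logger_text(raw_text):
    """Input step: parse comma-separated logger output into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def apply_range_qc(rows, field, valid_min, valid_max):
    """QC step: convert the field to a number and add a qualifier flag."""
    checked = []
    for row in rows:
        value = float(row[field])
        flag = "Accepted" if valid_min <= value <= valid_max else "Suspicious"
        checked.append({**row, field: value, "flag": flag})
    return checked


def export_csv(rows):
    """Output step: write rows back out as text for a database load or archive."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def run_workflow(data, steps):
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data


raw = "timestamp,airtemp_c\n2013-07-01T00:00,10.2\n2013-07-01T00:15,99.9\n"
steps = [
    read_logger_text,
    lambda rows: apply_range_qc(rows, "airtemp_c", -40.0, 50.0),
    export_csv,
]
result = run_workflow(raw, steps)
print(result)
```

Because each step has a single input and output, steps can be swapped or reordered without touching the others, which is the property that workflow tools build on.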
Streaming data management middleware
o Middleware/software – Proprietary
  • Campbell Scientific LoggerNet
    o functionality to set up and configure a network of loggers
    o tools to program, visualize, monitor, and publish data
  • Vista Engineering: Vista Data Vision (VDV)
    o tools to store and organize data from various loggers
    o visualization, alarming, reporting, and web publishing features
  • YSI EcoNet (for YSI monitoring instrumentation)
    o delivery of data from the field to the YSI web server
    o visualization, reports, alarms, and email notification tools
  • NexSens: iChart
    o Windows-based data acquisition software package
    o interfaces with popular products such as YSI, OTT, ISCO sensors
SENSOR DATA MANAGEMENT MIDDLEWARE: OPEN SOURCE ENVIRONMENTS FOR STREAMING DATA
• Matlab GCE toolbox (Proprietary/limited open source)
  o GUI, visualization, metadata-based analysis, manages QA/QC rules and qualifiers, tracks provenance
• Open Source DataTurbine Initiative
  o Streaming data engine, receives data from various sources and sends to analysis and visualization tools, databases, etc.
• Kepler Project (open source)
  o GUI, reuse and share analytical components/workflows with other users, tracks provenance, integrates software components and data sources
SENSOR MANAGEMENT BEST PRACTICES WORKSHOP PARTICIPANTS
Don Henshaw (AND) - organizer
Corinna Gries (NTL) - organizer
Renee Brown (SEV)
Adam Kennedy (AND)
Richard Cary (CWT)
Mary Martin (HBR)
Christine Laney (UTEP, JRN)
Jennifer Morse (NWT)
Chris Jones (DataONE)
Branko Zdravkovic (Univ of Saskatchewan)
Scotty Strachan (Univ of Nevada-Reno)
Jordan Read (USGS) - vtc
Wade Sheldon (GCE) - vtc