Christine Laney, Ken Ramsey, Mark Servilla
Information management issues and the Trends project: A drawing board for making cross-site comparisons feasible
THANK YOU!!
• LTER Information Managers: you know who you are
• LTER Network Office: Mark, James B., Bob, Duane, Marshall, Inigo, James V.
• NCEAS: Mark, Callie, Will, Jim, Rick
• Jornada staff: Ken, Justin & technicians
Trends in Long-Term Ecological Data: a multi-agency synthesis project
Objectives:
• to create a platform for synthesis by producing a compendium of easily accessible long-term graphs and data from long-term ecological research sites
• to illustrate the utility of this platform in addressing important within-site and network-level scientific questions
Products
• Folio-sized book to be published by Oxford Univ. Press
• Website (data, metadata, graphs) for synthesis and analysis
Book organization
• Introduction: value and importance of long-term research
• Within-site graphs/tables arranged by the four themes in the LTER Planning Process:
  • Climate and variability in the physical environment, including disturbance characteristics
  • Human population and economy
  • Biogeochemistry (e.g., atmospheric deposition, surface water chemistry)
  • Biotic structure (e.g., ANPP, plant biomass, species richness)
• Among-site comparison graphs: e.g., atmospheric chemistry, N fertilization, climate variability, ENSO signal responses
• Site descriptions and photos organized by biome
Website: Trends Data Store, www.ecotrends.info
Initial design:
• Static datasets, metadata & book graphs listed by chapter and figure number; some search capability
• Metadata for static data provide access to the raw data and the script used to generate the derived product
• Prototype near completion
Final design:
• Routinely harvested and derived data from ongoing projects
• Metadata & links back to sources and revisions
• Search, sort, analysis & graphing tools
• Prototype in development
Coming soon: www.ecotrends.info
Participating Sites
Process: selecting variables
• Submitted a broad request for long-term data
• Downloaded data from other online compilations
• Examined submitted data for variables consistent across sites (e.g., precipitation, nitrogen)
• Refined the data request and requested additional data from sites for variables that should exist but may not have been submitted (e.g., ANPP, species richness)
• Generated a “wish list” of variables that may be important for cross-site and network-level questions, but for which long-term data don’t yet exist at many sites (e.g., soil respiration, foliar nutrients). This will be used in planning grant activities.
Selecting variables for the webpage:
• Use variables from the book in static form first
• Update datasets over time and include additional variables
Contributors: 26 LTER sites (84%), 13 FS sites (12%), 6 ARS sites (3%) & Santa Rita Experimental Range (<1%)
Datasets: climate ~300; biogeochemistry ~150; biotic ~100; others ~50. Total: over 600 datasets
Plus 190 illustrative graphs
Human population and economy: collected for all LTER sites from census data (funded by an NSF supplement)
Metadata: most data have at least rudimentary metadata, but few have full EML with attribute-level descriptions of the datasets.
Progress to date
What we’re doing with the data
• Downloading and storing data & documentation
• Writing R or SAS scripts to generate:
  • Datasets containing monthly or annual averages or totals, depending on the variable
  • Strict time plots with simple linear regression
  • Tables that record all derived statistics
  • Plots that show change over time among different sites for each variable
  • Anomaly plots of monthly climate data
• Generating EML metadata for each derived product; the metadata contain links to the original data and associated scripts
• Recording each product (data, metadata, graphs), along with the links between products, in a multipurpose database
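The derived datasets and anomaly plots above boil down to two steps: collapse daily values to one value per month (a total for accumulating variables, a mean for the rest), then subtract each calendar month's long-term mean. The project's scripts were in R or SAS; this is only a minimal Python sketch of the idea, and the rule deciding which variables are summed (`SUM_VARIABLES`) is a hypothetical assumption. Grouping on `day.year` instead of `(day.year, day.month)` gives annual values the same way.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Assumption for illustration: variables that accumulate (e.g. precipitation)
# are totaled; all others (e.g. air temperature) are averaged.
SUM_VARIABLES = {"precipitation"}


def monthly_summary(records, variable):
    """Collapse daily (date, value) records to one value per (year, month)."""
    by_month = defaultdict(list)
    for day, value in records:
        by_month[(day.year, day.month)].append(value)
    summarize = sum if variable in SUM_VARIABLES else mean
    return {key: summarize(vals) for key, vals in sorted(by_month.items())}


def monthly_anomalies(monthly):
    """Subtract each calendar month's long-term mean from that month's values."""
    by_calendar_month = defaultdict(list)
    for (_, month), value in monthly.items():
        by_calendar_month[month].append(value)
    climatology = {m: mean(vals) for m, vals in by_calendar_month.items()}
    return {(year, month): value - climatology[month]
            for (year, month), value in monthly.items()}


# Hypothetical daily precipitation (mm) for two Januaries.
daily = [(date(2000, 1, 1), 2.0), (date(2000, 1, 2), 3.0), (date(2001, 1, 1), 7.0)]
totals = monthly_summary(daily, "precipitation")  # {(2000, 1): 5.0, (2001, 1): 7.0}
print(monthly_anomalies(totals))                   # {(2000, 1): -1.0, (2001, 1): 1.0}
```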
Multi-site analyses: nitrate in precipitation
Step 1. Graph similar data through time for sites with those data.
Step 2. Determine the trend line by site.
Step 3. Compare slopes of trend lines among sites.
Step 4. Compare slopes spatially.
Figure: Mean change in total deposition of N in nitrate form in precipitation
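Steps 2 and 3 can be sketched as an ordinary least-squares slope fitted separately to each site's annual series, then sorted so sites can be compared side by side. This is a hedged Python illustration, not the project's R/SAS code; the site codes and values are hypothetical, and a slope alone says nothing about whether the trend is statistically significant.

```python
from statistics import mean


def ols_slope(years, values):
    """Ordinary least-squares slope of values on years (units per year)."""
    x_bar, y_bar = mean(years), mean(values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    sxx = sum((x - x_bar) ** 2 for x in years)
    return sxy / sxx


def slopes_by_site(series):
    """Step 2: fit one trend line per site.

    series maps a site code to a list of (year, annual value) pairs.
    """
    return {site: ols_slope([yr for yr, _ in pts], [v for _, v in pts])
            for site, pts in series.items()}


# Hypothetical annual nitrate deposition values for two sites.
series = {
    "SITE_A": [(1990, 1.0), (1991, 1.2), (1992, 1.4)],
    "SITE_B": [(1990, 2.0), (1991, 1.9), (1992, 1.8)],
}
# Step 3: rank sites by slope so increases and decreases line up side by side.
for site, slope in sorted(slopes_by_site(series).items(), key=lambda kv: kv[1]):
    print(f"{site}: {slope:+.3f} per year")
```

For Step 4, the same per-site slopes would be joined to site coordinates and mapped.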
Challenges, solutions & opportunities
Obtaining data
Quality and quantity of data and documentation
Utilizing data toward specific goals
Properly documenting received data and products derived from the data
Making final products accessible to editorial committee and available on website
Obtaining data: a time-intensive and inconsistent process on both sides!
• Located data on individual websites
  • Few sites had their long-term data separated out from short-term data
  • Unable to search for long-term data
• Used Metacat via LTER, KNB, Morpho
  • Slow search engine
  • Unable to search for particular record lengths
  • Unable to sort filtered records by time
  • Metadata often available without attached data files
• No prior knowledge of the types of available long-term data beyond the basics (precipitation, temperature, etc.)
Result: a lot of emails and phone calls!
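The missing search features listed above (filter by record length, sort by time, skip metadata with no data attached) amount to a simple query over each dataset's begin and end years. A minimal sketch follows; the `CatalogRecord` fields and the 10-year default are hypothetical assumptions for illustration and do not reflect Metacat's actual query interface.

```python
from dataclasses import dataclass


@dataclass
class CatalogRecord:
    """Hypothetical catalog entry; these fields are illustrative, not Metacat's."""
    site: str
    title: str
    begin_year: int
    end_year: int
    has_data_file: bool

    @property
    def record_length(self) -> int:
        return self.end_year - self.begin_year + 1


def long_term_records(records, min_years=10):
    """Records with attached data spanning >= min_years, longest first."""
    hits = [r for r in records
            if r.has_data_file and r.record_length >= min_years]
    return sorted(hits, key=lambda r: (-r.record_length, r.begin_year))


catalog = [
    CatalogRecord("SITE_A", "Daily precipitation", 1980, 2005, True),
    CatalogRecord("SITE_A", "Soil respiration", 2001, 2005, True),
    CatalogRecord("SITE_B", "Stream chemistry", 1970, 2004, False),  # metadata only
]
for r in long_term_records(catalog):
    print(r.site, r.title, r.record_length)
```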
Quality and quantity of data and documentation
• Lots of great data, with varied levels of metadata detail, in text or EML format
• Small problems with a single dataset become large problems across many datasets
• Online data are sometimes not quality-checked or ready for use, but carry no markers saying so
Examples: “Looks nice… but…”
The nit-picky details: dates as an example
• 2-digit years
• A range of dates in a single cell (e.g., 02/01-03/2006 or 02/01/2006,02/03/2006)
• A date with a letter appended to the end (e.g., 02/01/1999A)
• Single-digit day and month, especially when there are no delimiters between month, day and year
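Each date problem above needs an explicit rule before the data can be used. A minimal parsing sketch, assuming month/day/year order and a hypothetical 2-digit-year pivot of 30 (both assumptions; the right choice depends on each site's metadata). Undelimited short strings are rejected rather than guessed, since they cannot be split unambiguously.

```python
import re
from datetime import date, timedelta


def expand_year(year, pivot=30):
    """Assumed rule: 2-digit years below `pivot` are 20xx, the rest 19xx."""
    if year >= 100:
        return year
    return 2000 + year if year < pivot else 1900 + year


def parse_date_cell(cell):
    """Parse one spreadsheet cell into a list of dates (month/day/year assumed)."""
    cell = cell.strip()
    if "," in cell:  # several dates in one cell: "02/01/2006,02/03/2006"
        return [d for part in cell.split(",") for d in parse_date_cell(part)]
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})-(\d{1,2})/(\d{4}|\d{2})", cell)
    if m:  # a range of days within one month: "02/01-03/2006"
        month, first, last, year = map(int, m.groups())
        start = date(expand_year(year), month, first)
        return [start + timedelta(days=i) for i in range(last - first + 1)]
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})[A-Za-z]*", cell)
    if m:  # a single date, possibly with a letter appended: "02/01/1999A"
        month, day, year = map(int, m.groups())
        return [date(expand_year(year), month, day)]
    if cell.isdigit() and len(cell) < 8:
        # No delimiters and unpadded fields: "11106" could be 1/11/06 or 11/1/06.
        raise ValueError(f"ambiguous undelimited date: {cell!r}")
    raise ValueError(f"unrecognized date: {cell!r}")


print(parse_date_cell("02/01-03/2006"))
print(parse_date_cell("02/01/1999A"))
```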
Preferred data formats for synthesis
• Simple delimited ASCII (commas, spaces, tabs, etc.) with headers, or very simple Excel spreadsheets. If fixed-width, give the column widths and spacing.
• Metadata in a separate file
• All data in a single file, not separated by year. If that is not possible, each file in exactly the same format.
• Complex formatting, such as multiple sheets or several tables in one sheet, makes the data more difficult to interpret and extract.
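Why "each file in exactly the same format" matters: when data arrive split by year, they can only be stacked safely if every file has the same header. A minimal sketch using the standard `csv` module; files are passed in as hypothetical (name, text) pairs to keep the example self-contained (real files would be opened with `open(path, newline="")`).

```python
import csv
import io


def merge_yearly_files(files):
    """Concatenate delimited files that share one header into a single table.

    `files` is an iterable of (name, text) pairs. Raises ValueError if a
    file is empty or its header differs from the first file's header.
    """
    header, rows = None, []
    for name, text in files:
        reader = csv.reader(io.StringIO(text))
        this_header = next(reader, None)
        if this_header is None:
            raise ValueError(f"{name}: empty file")
        if header is None:
            header = this_header
        elif this_header != header:
            raise ValueError(f"{name}: header {this_header} differs from {header}")
        rows.extend(reader)
    return header, rows


header, rows = merge_yearly_files([
    ("precip_2001.csv", "date,precip\n2001-01-01,3\n"),
    ("precip_2002.csv", "date,precip\n2002-01-01,0\n"),
])
print(header, len(rows))
```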
Utilizing data toward specific goals
Selected variables with specified summary time spans (monthly or yearly) and specified units.
• Converting short time scales to longer time scales: OK
• Converting long time scales to shorter ones: impossible
• Unit conversion: often simple
  • °F → °C; W/m² → MJ/m²
• ...but can be really difficult
  • Flow in m from a weir → m³/s, using the weir dimensions
  • Raw shield count data with no calibrations given → % moisture: impossible
• Missing data: leads to bias in particular months/years, especially with totals.
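The "simple" and "difficult" conversions above differ mainly in how much outside information they need. A hedged sketch: °F → °C and daily-mean W/m² → daily MJ/m² need only constants (1 W = 1 J/s, 86,400 s per day), while weir flow needs the weir geometry. The weir function assumes a sharp-crested V-notch weir and a typical discharge coefficient of 0.58; both are illustrative assumptions, and a real conversion needs the site's actual weir dimensions and rating.

```python
import math


def f_to_c(deg_f):
    """Degrees Fahrenheit -> degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def mean_wm2_to_mj_per_day(mean_w_m2):
    """Daily mean irradiance (W/m^2 = J/s/m^2) -> daily total (MJ/m^2)."""
    return mean_w_m2 * 86_400 / 1e6


def v_notch_discharge(head_m, notch_deg=90.0, cd=0.58):
    """Discharge (m^3/s) over a sharp-crested V-notch weir from head (m).

    Q = Cd * 8/15 * sqrt(2g) * tan(theta/2) * h**2.5
    Cd = 0.58 and the 90-degree notch are assumed typical values only.
    """
    g = 9.81  # gravitational acceleration, m/s^2
    half_angle = math.radians(notch_deg) / 2
    return cd * 8 / 15 * math.sqrt(2 * g) * math.tan(half_angle) * head_m ** 2.5


print(f_to_c(212.0), mean_wm2_to_mj_per_day(200.0), v_notch_discharge(0.1))
```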
Lots of consultation with metadata and PIs. What happens when metadata is incomplete & PIs are unavailable?
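On the missing-data bias noted above: a monthly total built from an incomplete month comes out too low. One common safeguard, sketched here, is to report a total only when enough days are present and to flag the month otherwise. The 90% threshold is a hypothetical choice, not a Trends rule; even above the threshold, the remaining gaps still bias the total slightly low.

```python
import calendar
from datetime import date


def monthly_total_if_complete(values_by_day, year, month, min_fraction=0.9):
    """Return the month's total, or None if too many days are missing.

    values_by_day maps datetime.date -> value; absent or None means missing.
    min_fraction (assumed 0.9 here) is the share of days that must be present.
    """
    n_days = calendar.monthrange(year, month)[1]
    values = [values_by_day.get(date(year, month, d)) for d in range(1, n_days + 1)]
    present = [v for v in values if v is not None]
    if len(present) < min_fraction * n_days:
        return None  # flag the month instead of reporting a biased total
    return sum(present)


# Hypothetical February 2006 record with the last 5 days missing.
feb = {date(2006, 2, d): 1.0 for d in range(1, 24)}
print(monthly_total_if_complete(feb, 2006, 2))  # None: 23 of 28 days present
```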
Properly documenting received data and products derived from the data
• A morphing system:
  • Hierarchical folder system with emails
  • Attempted EML documentation, with help from NCEAS
  • Concurrent Versions System (CVS) & multipurpose SQL Server & MySQL database
• Documentation of deriving data and graphs:
  • EML template
  • Scripts
  • Metacat (versioning)
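Recording each product and the links between products, as described above, needs little more than two tables: one for products and one for source → derived links, so that any graph can be traced back to its raw data and script. The project used SQL Server and MySQL; this sketch swaps in Python's built-in `sqlite3` so it runs standalone, and the table layout and file paths are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS product (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('data', 'metadata', 'graph', 'script')),
    path TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS link (
    source_id  INTEGER NOT NULL REFERENCES product(id),
    derived_id INTEGER NOT NULL REFERENCES product(id),
    PRIMARY KEY (source_id, derived_id)
);
"""


def add_product(con, kind, path):
    """Register one product and return its id."""
    cur = con.execute("INSERT INTO product (kind, path) VALUES (?, ?)", (kind, path))
    return cur.lastrowid


def add_link(con, source_id, derived_id):
    """Record that derived_id was produced from source_id."""
    con.execute("INSERT INTO link (source_id, derived_id) VALUES (?, ?)",
                (source_id, derived_id))


def sources_of(con, derived_id):
    """List (kind, path) of everything a product was derived from."""
    rows = con.execute(
        "SELECT p.kind, p.path FROM link l JOIN product p ON p.id = l.source_id "
        "WHERE l.derived_id = ? ORDER BY p.path", (derived_id,))
    return rows.fetchall()


con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript(SCHEMA)
raw = add_product(con, "data", "raw/site_a_precip.csv")
script = add_product(con, "script", "scripts/monthly_precip.R")
graph = add_product(con, "graph", "graphs/site_a_precip_trend.png")
for src in (raw, script):
    add_link(con, src, graph)
print(sources_of(con, graph))
```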
Making final products accessible to editorial committee and available on website
Trends editorial page: jornada-www.nmsu.edu
Voting page
Trends IM meeting, 15 min breakout
Site involvement/commitment to Trends
Within site:
• Percentage of IM time/resources spent compared to PIs
• Percentage of time/resources spent on Trends compared to time spent on site needs
• Too much, enough, too little?
Among sites:
• Has there been communication between sites about Trends data requests?
• Has Trends triggered any new collaborations or strengthened old ones?
Communication:
• Progress reports: frequent and adequate enough?
• Recommendations for further communications
Trends IM meeting, 15 min breakout
Keeping track of data use & proper citation:
• now (by the Trends project itself)
• in the future, via the website
Trends IM meeting, 15 min breakout
International site involvement
• Interest in the Trends project: how can ILTER sites use the current set of data in their own research?
• Reasons pro and con for initiating a similar effort among ILTER sites
• What would it take to do a Trends-like project at the international level?
List of contacts