christine laney ken ramsey mark servilla information management issues and the trends project: a...

Post on 11-Jan-2016

Christine Laney, Ken Ramsey, Mark Servilla

Information management issues and the Trends project: A drawing board for making cross-site comparisons feasible

THANK YOU!!

LTER Information Managers – you know who you are

LTER Network Office – Mark, James B., Bob, Duane, Marshall, Inigo, James V.

NCEAS – Mark, Callie, Will, Jim, Rick

Jornada staff – Ken, Justin & technicians

Trends in Long-Term Ecological Data: a multi-agency synthesis project

Objectives

To create a platform for synthesis by producing a compendium of easily accessible long-term graphs and data from long-term ecological research sites

To illustrate the utility of this platform in addressing important within-site and network-level scientific questions

Products

• Folio-sized book to be published by Oxford Univ. Press

• Website (data, metadata, graphs) for synthesis and analysis

Book organization

Introduction: value and importance of long-term research

Within-site graphs/tables arranged by four themes in the LTER Planning Process:

• Climate and variability in the physical environment, including disturbance characteristics

• Human population and economy

• Biogeochemistry (e.g., atmospheric deposition, surface water chemistry)

• Biotic structure (e.g., ANPP, plant biomass, species richness)

Among-site comparison graphs: e.g., atmospheric chemistry, N fertilization, climate variability, ENSO signal responses

Site descriptions and photos organized by biomes

Website – Trends Data Store (www.ecotrends.info)

Initial design:

• Static datasets, metadata & book graphs listed by chapter and figure number, some search capability
• Metadata for static data provide access to raw data and the script used to generate the derived product
• Prototype near completion

Final design:

• Routinely harvested and derived data from ongoing projects
• Metadata & links back to sources and revisions
• Search, sort, analysis & graphing tools
• Prototype in development

Coming soon: www.ecotrends.info

Participating Sites

Process

Selecting variables

• Submitted broad request for long-term data
• Downloaded data from other online compilations
• Examined submitted data for consistent variables across sites (e.g., precipitation, nitrogen, etc.)
• Refined data request and requested additional data from sites for variables that should exist, but may not have been submitted (e.g., ANPP, species richness, etc.)
• Generated “wish list” of variables that may be important for cross-site and network-level questions, but for which long-term data don’t yet exist at very many sites (e.g., soil respiration, foliar nutrients). This will be used in planning grant activities.

Selecting variables for the webpage

• Use variables from book in static form first
• Update data sets with time and include additional variables

Contributors: 26 LTER (84%), 13 FS (12%), 6 ARS sites (3%) & Santa Rita ER (<1%)

Climate datasets: ~300
Biogeochemistry datasets: ~150
Biotic datasets: ~100
Others: ~50
Total: over 600 datasets

Plus 190 illustrative graphs

Human population and economy: collected for all LTER sites from census data (funded by NSF supplement)

Metadata: Most data have at least rudimentary metadata; few have full EML with attribute-level descriptions of the datasets.

Progress to date

What we’re doing with the data

• Downloading and storing data & documentation

• Writing R or SAS scripts to generate:
  – Datasets containing monthly or annual averages or totals, depending on the variable
  – Strict time plots with simple linear regression
  – Tables that record all derived statistics
  – Plots that show change over time among different sites for each variable
  – Anomaly plots of monthly climate data

• Generating metadata with EML for each derived product. Metadata contains links to original data and associated scripts.

• Recording each product (data, metadata, graphs), along with links between products, in a multi-purpose database.
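As an illustration of the derived products listed above, here is a minimal sketch of rolling daily values up to monthly totals and then computing monthly anomalies. It is written in Python rather than the project's R or SAS, and it assumes an anomaly is the monthly total minus the long-term mean for that calendar month; the function names and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def monthly_totals(daily):
    """Sum (year, month, value) records into {(year, month): total}."""
    totals = defaultdict(float)
    for year, month, value in daily:
        totals[(year, month)] += value
    return dict(totals)


def monthly_anomalies(totals):
    """Subtract each calendar month's long-term mean from its monthly totals.

    Assumption: the baseline is the mean over all years present in `totals`.
    """
    by_month = defaultdict(list)
    for (_, month), total in totals.items():
        by_month[month].append(total)
    baseline = {month: mean(values) for month, values in by_month.items()}
    return {(y, m): total - baseline[m] for (y, m), total in totals.items()}


# Made-up daily precipitation values (mm) for two Januaries.
daily = [(2001, 1, 2.0), (2001, 1, 3.0), (2002, 1, 1.0), (2002, 1, 2.0)]
totals = monthly_totals(daily)
anomalies = monthly_anomalies(totals)
```

The same roll-up works for annual totals by keying on the year alone; averages would replace the sum with a mean.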

MULTI-SITE ANALYSES: Nitrate in precipitation

Step 1. Graph similar data through time for sites with those data.

Step 2. Determine trend line by site.

Step 3. Compare slopes of trend lines among sites.

Step 4. Compare slopes spatially.

Mean change in total deposition of N in nitrate form in precipitation
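Steps 2 and 3 can be sketched as fitting a simple least-squares trend line per site and then ranking sites by slope. This is only an illustration in Python, not the project's actual scripts; the site codes and values below are invented, and the spatial comparison in Step 4 is not covered.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points to fit a trend line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")
    return sxy / sxx


def slopes_by_site(series):
    """Map each site to the slope of its (year, value) series."""
    return {site: ols_slope(*zip(*points)) for site, points in series.items()}


# Hypothetical annual values for two made-up sites.
series = {
    "SITE_A": [(2000, 1.0), (2001, 2.0), (2002, 3.0)],
    "SITE_B": [(2000, 5.0), (2001, 4.5), (2002, 4.0)],
}
slopes = slopes_by_site(series)

# Step 3: order sites from the steepest decline to the steepest increase.
ranked = sorted(slopes, key=slopes.get)
```

Slopes are in units of the variable per year, so they are only comparable among sites once the variable has been converted to common units.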

Challenges, solutions & opportunities

Obtaining data

Quality and quantity of data and documentation

Utilizing data toward specific goals

Properly documenting received data and products derived from the data

Making final products accessible to editorial committee and available on website

Obtaining data: a time-intensive and inconsistent process on both sides!

Located data on individual websites
• Few had their long-term data separated out from short-term data
• Unable to search for long-term data

Utilized Metacat via LTER, KNB, Morpho
• Slow search engine
• Unable to search for particular record lengths
• Unable to sort filtered records by time
• Metadata often available without attached data files

No prior knowledge of types of available long-term data beyond basics (precip, temp, etc.).

Result: a lot of emails and phone calls!

Quality and quantity of data and documentation

Lots of great data, with varying levels of metadata detail in text or EML format

Small problems with single datasets become large problems with many datasets

Online data sometimes not quality-checked or ready for use – but no markers to say so

Examples:

Looks nice…but….

The nit-picky details

Dates as an example:

• 2-digit years
• range of dates in a single cell (e.g., 02/01-03/2006 or 02/01/2006,02/03/2006)
• date with a letter appended to the end (e.g., 02/01/1999A)
• single-digit day and month, especially when there are no delimiters between month, day, and year
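The date problems above can be handled with a small normalizer. The sketch below is a hypothetical Python helper, not something the project describes: it assumes month/day/year order, treats 2-digit years below a pivot of 30 as 20xx and the rest as 19xx, and refuses undelimited strings instead of guessing.

```python
import re
from datetime import date

# Assumed layouts: M/D/Y with an optional trailing letter, and M/D1-D2/Y.
_SINGLE = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})([A-Za-z]?)$")
_DAY_RANGE = re.compile(r"^(\d{1,2})/(\d{1,2})-(\d{1,2})/(\d{2}|\d{4})$")


def _year(text, pivot):
    """Expand a 2-digit year using the pivot; pass 4-digit years through."""
    value = int(text)
    if len(text) == 4:
        return value
    return 2000 + value if value < pivot else 1900 + value


def parse_cell(cell, pivot=30):
    """Return (start_date, end_date, suffix) for one date cell."""
    cell = cell.strip()
    if "," in cell:
        # Two full dates in one cell: treat them as the ends of a range.
        first, last = (part.strip() for part in cell.split(",", 1))
        start, _, _ = parse_cell(first, pivot)
        _, end, _ = parse_cell(last, pivot)
        return start, end, ""
    match = _DAY_RANGE.match(cell)
    if match:
        month, day1, day2, year = match.groups()
        y = _year(year, pivot)
        return date(y, int(month), int(day1)), date(y, int(month), int(day2)), ""
    match = _SINGLE.match(cell)
    if match:
        month, day, year, suffix = match.groups()
        parsed = date(_year(year, pivot), int(month), int(day))
        return parsed, parsed, suffix
    # Undelimited or unknown layouts are ambiguous; flag them for a person.
    raise ValueError(f"ambiguous or unrecognized date: {cell!r}")
```

The trailing letter is returned rather than dropped, since only the site's metadata can say what it means.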

Preferred data formats for synthesis

Simple ASCII files delimited with commas, spaces, tabs, etc., with headers, or very simple Excel spreadsheets. If fixed-width, give widths and spaces.

Metadata in separate file

All data in single file, not separated by year. If not possible, each file in exactly the same format.

Complex formatting systems, like multiple sheets or several tables in one sheet, are more difficult to interpret and to extract information from.

Utilizing data toward specific goals

Selected variables with specified summary time spans (monthly or yearly) and specified units.

• Converting short time scales to longer time scales – OK
• Converting long time scales to shorter – impossible
• Unit conversion – often simple: °F → °C, W/m^2 → MJ/m^2
• Can be really difficult: flow in m from a weir → m^3/s using weir dimensions
• Raw shield count data without calibrations given → % moisture: impossible
• Missing data – leads to bias in particular months/years, especially with totals

Lots of consultation with metadata and PIs. What happens when metadata is incomplete & PIs are unavailable?
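The simple conversions and the missing-data problem can be sketched as follows. This is a Python illustration under stated assumptions: the W/m^2 value is taken to be a daily mean, so it is multiplied by 86,400 seconds to get a daily total, and the 90% coverage threshold for reporting a monthly total is an arbitrary example, not a project rule.

```python
SECONDS_PER_DAY = 86_400


def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def daily_mean_wm2_to_mj(w_per_m2):
    """Convert a daily mean flux (W/m^2) to a daily total (MJ/m^2).

    Assumption: the input is the mean over a full 24-hour day.
    """
    return w_per_m2 * SECONDS_PER_DAY / 1e6


def monthly_total(values, expected_days, min_fraction=0.9):
    """Return (total, coverage); total is None when too many days are missing.

    `values` holds one entry per reported day, with None for a missing value.
    Summing an incomplete month underestimates the total, so months below
    `min_fraction` coverage are withheld instead of being reported as low.
    """
    present = [v for v in values if v is not None]
    coverage = len(present) / expected_days
    if coverage < min_fraction:
        return None, coverage
    return sum(present), coverage
```

Returning the coverage alongside the total lets a derived table record how complete each month was, which is the kind of detail a later user would otherwise have to ask a PI about.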

Properly documenting received data and products derived from the data

Morphing system:

• Hierarchical folder system with emails
• Attempted EML documentation. Help from NCEAS.
• Concurrent Versions System (CVS) & multipurpose SQL Server & MySQL database.

Documentation of deriving data and graphs:

• EML template
• Scripts
• Metacat (versioning)

Trends editorial page: jornada-www.nmsu.edu

Voting page

Trends IM meeting, 15 min breakout

Site involvement/commitment to Trends

Within site:

• Percentage of IM time/resources spent compared to PIs
• Percentage of time/resources spent on Trends compared to time spent on site needs
• Too much, enough, too little?

Among sites:

• Has there been communication between sites about Trends data requests?
• Has Trends triggered any new collaborations or strengthened old ones?

Communication

• Progress reports: frequent and/or adequate enough?
• Recommendations for further communications

Trends IM meeting, 15 min breakout

Keeping track of data use & proper citation

• Now (by the Trends project itself)
• In the future via the website

Trends IM meeting, 15 min breakout

International site involvement

Interest in Trends project – how can ILTER sites use the current set of data in their own research?

Reasons pro and con for initiating a similar effort among ILTER sites

What would it take to do a Trends-like project at the international level?

List of contacts
