Christine Laney Ken Ramsey Mark Servilla Information management issues and the Trends project: A drawing board for making cross-site comparisons feasible

Upload: jerome-taylor

Post on 11-Jan-2016


Page 1

Christine Laney, Ken Ramsey, Mark Servilla

Information management issues and the Trends project: A drawing board for making cross-site comparisons feasible

Page 2

THANK YOU!!

• LTER Information Managers – you know who you are
• LTER Network Office – Mark, James B., Bob, Duane, Marshall, Inigo, James V.
• NCEAS – Mark, Callie, Will, Jim, Rick
• Jornada staff – Ken, Justin & technicians

Page 3

Trends in Long-Term Ecological Data: a multi-agency synthesis project

Objectives
• to create a platform for synthesis by producing a compendium of easily accessible long-term graphs and data from long-term ecological research sites
• to illustrate the utility of this platform in addressing important within-site and network-level scientific questions

Page 4

Products

• Folio-sized book to be published by Oxford Univ. Press
• Website (data, metadata, graphs) for synthesis and analysis

Page 5

Book organization

• Introduction: value and importance of long-term research
• Within-site graphs/tables arranged by four themes in the LTER Planning Process
  • Climate and variability in the physical environment, including disturbance characteristics
  • Human population and economy
  • Biogeochemistry (e.g., atmospheric deposition, surface water chemistry)
  • Biotic structure (e.g., ANPP, plant biomass, species richness)
• Among-site comparison graphs: e.g., atmospheric chemistry, N fertilization, climate variability, ENSO signal responses
• Site descriptions and photos organized by biomes

Page 6

Website – Trends Data Store
www.ecotrends.info

Initial design:
• Static datasets, metadata & book graphs listed by chapter and figure number, some search capability
• Metadata for static data provide access to raw data and the script used to generate the derived product
• Prototype near completion

Final design:
• Routinely harvested and derived data from ongoing projects
• Metadata & links back to sources and revisions
• Search, sort, analysis & graphing tools
• Prototype in development

Page 7

Coming soon: www.ecotrends.info

Page 8

Participating Sites

Page 9

Process

Selecting variables
• Submitted broad request for long-term data
• Downloaded data from other online compilations
• Examined submitted data for consistent variables across sites (e.g., precipitation, nitrogen)
• Refined data request and requested additional data from sites for variables that should exist, but may not have been submitted (e.g., ANPP, species richness)
• Generated “wish list” of variables that may be important for cross-site and network-level questions, but for which long-term data don’t yet exist at many sites (e.g., soil respiration, foliar nutrients). This will be used in planning grant activities.

Selecting variables for the webpage
• Use variables from book in static form first
• Update data sets with time and include additional variables

Page 10

Progress to date

Contributors: 26 LTER (84%), 13 FS (12%), 6 ARS sites (3%) & Santa Rita ER (<1%)

• Climate datasets: ~300
• Biogeochemistry datasets: ~150
• Biotic datasets: ~100
• Others: ~50
• Total: over 600 datasets

Plus 190 illustrative graphs

Human population and economy: collected for all LTER sites from census data (funded by NSF supplement)

Metadata: Most data have at least rudimentary metadata; few have full EML with attribute-level description of the datasets.

Page 11

What we’re doing with the data

• Downloading and storing data & documentation
• Writing R or SAS scripts to generate:
  • Datasets containing monthly or annual averages or totals, depending on the variable
  • Strict time plots with simple linear regression
  • Tables that record all derived statistics
  • Plots that show change over time among different sites for each variable
  • Anomaly plots of monthly climate data
• Generating metadata with EML for each derived product. Metadata contains links to original data and associated scripts.
• Recording each product (data, metadata, graphs), along with links between products, in a multi-purpose database.
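
The project's own R and SAS scripts are not reproduced in these slides. As a rough illustration only, the following Python sketch shows how two of the derived products listed above, annual totals and monthly anomalies, might be computed from monthly values. The function names and the sample precipitation values are hypothetical, and skipping incomplete years is an assumption made here, not a documented project rule.

```python
from collections import defaultdict
from statistics import mean


def annual_totals(monthly):
    """Sum monthly values into annual totals.

    `monthly` maps (year, month) -> value. Years with fewer than
    12 months are skipped so partial years do not bias the totals
    (an assumption of this sketch).
    """
    by_year = defaultdict(list)
    for (year, _month), value in monthly.items():
        by_year[year].append(value)
    return {y: sum(v) for y, v in sorted(by_year.items()) if len(v) == 12}


def monthly_anomalies(monthly):
    """Subtract each calendar month's long-term mean from its values."""
    by_month = defaultdict(list)
    for (_year, month), value in monthly.items():
        by_month[month].append(value)
    climatology = {m: mean(v) for m, v in by_month.items()}
    return {key: value - climatology[key[1]] for key, value in monthly.items()}


# Hypothetical monthly precipitation (mm) for two complete years.
precip = {(2000, m): 10.0 for m in range(1, 13)}
precip.update({(2001, m): 20.0 for m in range(1, 13)})

print(annual_totals(precip))
print(monthly_anomalies(precip)[(2001, 1)])
```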

Page 12

MULTI-SITE ANALYSES
Nitrate in precipitation

Step 1. Graph similar data through time for sites with those data.

Step 2. Determine trend line by site.

Page 13

Step 3. Compare slopes of trend lines among sites.

Page 14

Step 4. Compare slopes spatially.

Mean change in total deposition of N in nitrate form in precipitation
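
Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration using ordinary least squares, in line with the "simple linear regression" mentioned earlier; the site names and deposition values are made up, and the mapping in Step 4 is left out because it needs site coordinates that the slides do not give.

```python
def trend_slope(years, values):
    """Ordinary least-squares slope of values against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


# Hypothetical annual nitrate deposition series for two made-up sites.
series = {
    "SITE_A": ([2000, 2001, 2002, 2003], [4.0, 3.5, 3.0, 2.5]),
    "SITE_B": ([2000, 2001, 2002, 2003], [1.0, 1.2, 1.4, 1.6]),
}

# Step 2: one trend-line slope per site.
slopes = {site: trend_slope(x, y) for site, (x, y) in series.items()}

# Step 3: compare slopes among sites, most negative first.
for site, slope in sorted(slopes.items(), key=lambda item: item[1]):
    print(f"{site}: {slope:+.2f} per year")
```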

Page 15

Challenges, solutions & opportunities

Obtaining data

Quality and quantity of data and documentation

Utilizing data toward specific goals

Properly documenting received data and products derived from the data

Making final products accessible to editorial committee and available on website

Page 16

Obtaining Data: time-intensive and inconsistent process on both sides!

• Located data on individual websites
  • Few had their long-term data separated out from short-term data
  • Unable to search for long-term data
• Utilized Metacat via LTER, KNB, Morpho
  • Slow search engine
  • Unable to search for particular record lengths
  • Unable to sort filtered records by time
  • Metadata often available without attached data files
• No pre-knowledge of types of available long-term data beyond basics (precip, temp, etc.).
• Result: a lot of emails and phone calls!

Page 17

Challenges, opportunities & solutions

Obtaining data

Quality and quantity of data and documentation

Utilizing data toward specific goals

Properly documenting received data and products derived from the data

Making final products accessible to editorial committee and available on website

Page 18

Quality and quantity of data and documentation

Lots of great data, varied level of detailed metadata in text or EML format

Small problems with single datasets become large problems with many datasets

Online data sometimes not quality-checked or ready for use – but no markers to say so

Examples:

Page 19

Looks nice…but….

Page 20

Page 21

The nit-picky details

Dates as an example:
• 2-digit years
• range of dates in single cell (e.g., 02/01-03/2006 or 02/01/2006,02/03/2006)
• date with a letter appended to the end (e.g., 02/01/1999A)
• single-digit day and month, especially when there are no delimiters between month, day, year.
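
One way to catch the date problems listed above before they reach an analysis script is to flag suspicious cells for manual review. The Python sketch below is illustrative only: `date_problems` is a hypothetical helper, and its patterns assume slash-delimited dates like the examples on this slide.

```python
import re


def date_problems(text):
    """Return the problems from the list above found in one date cell."""
    text = text.strip()
    problems = []
    # A comma, or a hyphen inside a slash-delimited date, suggests
    # several dates packed into a single cell.
    if "," in text or ("/" in text and "-" in text):
        problems.append("range or list of dates in one cell")
    if re.search(r"[A-Za-z]$", text):
        problems.append("letter appended")
    if re.fullmatch(r"\d{1,2}/\d{1,2}/\d{2}", text):
        problems.append("2-digit year")
    # Fewer than 8 undelimited digits cannot be split unambiguously.
    if re.fullmatch(r"\d{4,7}", text):
        problems.append("no delimiters, ambiguous day/month")
    return problems


for cell in ["02/01-03/2006", "02/01/1999A", "2/1/99", "212006", "02/01/2006"]:
    print(cell, date_problems(cell))
```

Flagging rather than auto-correcting keeps the decision with someone who can check the metadata or ask the site.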

Page 22

Preferred data formats for synthesis

• Simple ASCII delimited with commas, spaces, tabs, etc., with headers, or very simple Excel spreadsheets. If fixed-width, give widths and spaces.
• Metadata in separate file
• All data in single file, not separated by year. If not possible, each file in exactly the same format.
• Complex formatting systems, like multiple sheets or several tables in one sheet, make the data more difficult to interpret and extract.
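
To show why "each file in exactly the same format" matters, here is a minimal Python sketch that stitches per-year delimited files into one table and refuses to continue if the headers differ. The file names, column names, and values are invented for the example, and in-memory strings stand in for real files.

```python
import csv
import io


def combine_yearly_files(files):
    """Concatenate per-year CSV texts, insisting on identical headers."""
    header, rows = None, []
    for name, text in files.items():
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader)
        if header is None:
            header = file_header
        elif file_header != header:
            raise ValueError(f"{name}: header {file_header} differs from {header}")
        rows.extend(reader)
    return header, rows


# Hypothetical per-year files, given as text to keep the sketch self-contained.
files = {
    "precip_2005.csv": "date,precip_mm\n2005-01-01,0.0\n2005-01-02,3.2\n",
    "precip_2006.csv": "date,precip_mm\n2006-01-01,1.1\n",
}
header, rows = combine_yearly_files(files)
print(header, len(rows))
```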

Page 23

Challenges, opportunities & solutions

Obtaining data

Quality and quantity of data and documentation

Utilizing data toward specific goals

Properly documenting received data and products derived from the data

Making final products accessible to editorial committee and available on website

Page 24

Utilizing data toward specific goals

• Selected variables with specified summary time spans (monthly or yearly) with specified units.
  • Converting short time scales to longer time scales – OK
  • Converting long time scales to shorter – Impossible
• Unit conversion – often simple
  • F → C
  • W/m^2 → MJ/m^2
• Can be really difficult
  • Flow in m from a weir → m^3/s using weir dimensions
  • Raw shield count data without calibrations given → % moist: impossible.
• Missing data – leads to bias in particular months/years, especially with totals.
• Lots of consultation with metadata and PIs. What happens when metadata is incomplete & PIs are unavailable?
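
The "often simple" conversions and the missing-data concern can be made concrete with a short Python sketch. The assumptions are stated in the comments: the W/m^2 value is taken to be a daily mean, and the 90% completeness threshold is an arbitrary illustrative choice, not a rule from the Trends project.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def mean_wm2_to_daily_mjm2(mean_w_per_m2):
    """Convert a daily mean flux in W/m^2 to a daily total in MJ/m^2.

    1 W = 1 J/s and a day has 86,400 s, so multiply by 86400 / 1e6.
    Assumes the input really is a 24-hour mean.
    """
    return mean_w_per_m2 * 86400.0 / 1e6


def monthly_total(daily_values, days_in_month, min_fraction=0.9):
    """Sum daily values, or return None if too many days are missing.

    Missing days are given as None. Summing an incomplete month would
    understate the total, so such months are withheld instead.
    """
    present = [v for v in daily_values if v is not None]
    if len(present) < min_fraction * days_in_month:
        return None
    return sum(present)


print(fahrenheit_to_celsius(212.0))
print(mean_wm2_to_daily_mjm2(100.0))
print(monthly_total([1.0] * 20 + [None] * 10, 30))
```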

Page 25

Challenges, opportunities & solutions

Obtaining data

Quality and quantity of data and documentation

Utilizing data toward specific goals

Properly documenting received data and products derived from the data

Making final products accessible to editorial committee and available on website

Page 26

Properly documenting received data and products derived from the data

• Morphing system
  • Hierarchical folder system with emails
  • Attempted EML documentation. Help from NCEAS.
  • Concurrent Versions System (CVS) & multipurpose SQL Server & MySQL database.
• Documentation of deriving data and graphs
  • EML template
  • Scripts
  • Metacat (versioning)
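
The slides do not show the database design, so the following is only a sketch of how links between source data, scripts, and derived products might be recorded. It uses Python's built-in SQLite module in place of the SQL Server and MySQL databases named above, and the table names, columns, and file paths are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        id   INTEGER PRIMARY KEY,
        kind TEXT NOT NULL,      -- 'source', 'script', 'derived', 'metadata', 'graph'
        path TEXT NOT NULL
    );
    CREATE TABLE link (
        from_id  INTEGER NOT NULL REFERENCES product(id),
        to_id    INTEGER NOT NULL REFERENCES product(id),
        relation TEXT NOT NULL   -- e.g. 'derived_from', 'generated_by'
    );
""")


def add_product(kind, path):
    """Insert one product row and return its id."""
    cur = conn.execute("INSERT INTO product (kind, path) VALUES (?, ?)", (kind, path))
    return cur.lastrowid


def provenance(product_id):
    """List what a product was derived from and generated by."""
    rows = conn.execute(
        "SELECT l.relation, p.path FROM link l JOIN product p ON p.id = l.to_id "
        "WHERE l.from_id = ? ORDER BY l.relation",
        (product_id,),
    )
    return rows.fetchall()


# Hypothetical products: a raw file, the script applied to it, and the result.
source = add_product("source", "site_a/precip_daily.csv")
script = add_product("script", "scripts/precip_annual.R")
derived = add_product("derived", "derived/site_a_precip_annual.csv")
conn.execute("INSERT INTO link VALUES (?, ?, 'derived_from')", (derived, source))
conn.execute("INSERT INTO link VALUES (?, ?, 'generated_by')", (derived, script))

print(provenance(derived))
```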

Page 27

Challenges, opportunities & solutions

Obtaining data

Quality and quantity of data and documentation

Utilizing data toward specific goals

Properly documenting received data and products derived from the data

Making final products accessible to editorial committee and available on website

Page 28

Trends editorial page
jornada-www.nmsu.edu

Page 29

Voting page

Page 30

Trends IM meeting, 15 min breakout

Site involvement/commitment to Trends
• Within site:
  • Percentage of IM time/resources spent compared to PIs
  • Percentage of time/resources spent on Trends compared to time spent on site needs
  • Too much, enough, too little?
• Among sites:
  • Has there been communication between sites about Trends data requests?
  • Has Trends triggered any new collaborations or strengthened old ones?

Communication
• Progress reports: often and/or adequate enough?
• Recommendations for further communications

Page 31

Trends IM meeting, 15 min breakout

Keeping track of data use & proper citation
• Now (by the Trends project itself)
• In the future via the website

Page 32

Trends IM meeting, 15 min breakout

International site involvement
• Interest in Trends project – how can ILTER sites use the current set of data in their own research?
• Reasons pro and con for initiating a similar effort among ILTER sites
• What would it take to do a Trends-like project at the international level?
• List of contacts