Introduction to Ecoinformatics: Past, Present & Future William Michener LTER Network Office, University of New Mexico January 2007


Page 1: Introduction to Ecoinformatics: Past, Present & Future William Michener LTER Network Office, University of New Mexico January 2007

Introduction to Ecoinformatics: Past, Present & Future

William Michener

LTER Network Office, University of New Mexico

January 2007

Page 2:

Outline
• Ecoinformatics: a definition
• A science vision
• Information challenges
• Ecoinformatics “solutions”

Page 3:

Outline
• Ecoinformatics: a definition
• A science vision
• Information challenges
• Ecoinformatics “solutions”

Page 4:

Ecoinformatics

A broad S&T discipline that incorporates both concepts and practical tools for the understanding, generation, processing, and propagation of ecological data, information and knowledge.

Page 5:

Outline
• Ecoinformatics: a definition
• A science vision
• Information challenges
• Ecoinformatics “solutions”

Page 6:

Many studies employ a restricted scale of observation, commonly 1 m².

The literature is biased toward single and small-scale results.

Page 7:

Thinking Outside the “Box”

[Figure: a box with axes Space, Time, and Parameters; labels LTER, Biocomplexity, and NEON, WATERS, OOI, ….]

Increase in breadth and depth of understanding.....

Page 8:

Grand environmental challenges

[Figure: six items dated 2001, 2004, 2004, 1998, 2000, and 2003.]

Page 9:

More and more of the ecological questions that confront society are national, continental and global in scope

Source: CDC

Drought

Source: Drought Monitor

Page 10:

LTER

26 NSF LTER Sites in the U.S. and the Antarctic: > 1,600 Scientists; 6,000+ Data Sets—different themes, methods, units, structure, ….

Page 11:

Page 12:

NEON Climate Domains

[Map: the 20 numbered NEON climate domains.]

1 Northeast
2 Mid Atlantic
3 Southeast
4 Atlantic Neotropical
5 Great Lakes
6 Prairie Peninsula
7 Appalachians / Cumberland Plateau
8 Ozarks Complex
9 Northern Plains
10 Central Plains
11 Southern Plains
12 Northern Rockies
13 Southern Rockies / Colorado Plateau
14 Desert Southwest
15 Great Basin
16 Pacific Northwest
17 Pacific Southwest
18 Tundra
19 Taiga
20 Pacific Tropical

Page 13:

Aquatic Arrays

BioMesonet Tower and Sensor Arrays

Page 14:

Soil Sensor Arrays

Micron-scale nitrate ISE

Page 15:

Small-Organism Tracking: Mobile animals as bio-sentinels for environmental change, forecasting biological invasions, emerging disease spread

Page 16:

Outline
• Ecoinformatics: a definition
• A science vision
• Information challenges
• Ecoinformatics “solutions”

Page 17:

Characteristics of Ecological Data

[Figure: data types plotted by Data Volume (per dataset) against Complexity/Metadata Requirements, each axis running from Low to High. Data types shown: Satellite Images, Weather Stations, Business Data, Gene Sequences, GIS, Soil Cores, Primary Productivity, Population Data, Biodiversity Surveys. Regions labeled “Most Ecological Data” and “Most Software”.]

Page 18:

Data Entropy

[Figure: Information Content plotted against Time. Labels: Time of publication, Specific details, General details, Accident, Retirement or career change, Death.]

(Michener et al. 1997)

Page 19:

Semantics

A
Date       Site  Species  Area  Count
10/1/1993  N654  PIRU     2     26
10/3/1994  N654  PIRU     2     29
10/1/1993  N654  BEPA     1     3

B
Date       Site  picrub  betpap
31Oct1993  1     13.5    1.6
14Nov1994  1     8.4     1.8

C
Date        Site  Species            Density
10/1/1993   N654  Picea rubens       13
10/3/1994   N654  Picea rubens       14.5
10/1/1993   N654  Betula papyrifera  3
10/31/1993  1     Picea rubens       13.5
10/31/1993  1     Betula papyrifera  1.6
11/14/1994  1     Picea rubens       8.4
11/14/1994  1     Betula papyrifera  1.8

From A and B to C:
• Schema transform
• Coding transform
• Taxon Lookup
• Semantic transform

Imagine scaling!!
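The four transforms named on this slide can be sketched in Python. This is a minimal illustration, not the author's implementation: the lookup table, the function names, and the rule Density = Count / Area (inferred from the slide's own numbers, e.g. 26 / 2 = 13) are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical taxon lookup: species codes from A and column names from B.
TAXON_LOOKUP = {
    "PIRU": "Picea rubens",
    "BEPA": "Betula papyrifera",
    "picrub": "Picea rubens",
    "betpap": "Betula papyrifera",
}

# Data set A: (date, site, species code, area, count)
dataset_a = [
    ("10/1/1993", "N654", "PIRU", 2, 26),
    ("10/3/1994", "N654", "PIRU", 2, 29),
    ("10/1/1993", "N654", "BEPA", 1, 3),
]

# Data set B: (date, site, picrub density, betpap density)
dataset_b = [
    ("31Oct1993", "1", 13.5, 1.6),
    ("14Nov1994", "1", 8.4, 1.8),
]


def format_date(d):
    """Coding transform: render every date as M/D/YYYY, as in table C."""
    return f"{d.month}/{d.day}/{d.year}"


def transform_a(rows):
    out = []
    for date, site, code, area, count in rows:
        d = datetime.strptime(date, "%m/%d/%Y")
        # Semantic transform (assumed): a count over an area becomes a density.
        density = count / area
        out.append((format_date(d), site, TAXON_LOOKUP[code], density))
    return out


def transform_b(rows):
    out = []
    for date, site, picrub, betpap in rows:
        d = datetime.strptime(date, "%d%b%Y")
        # Schema transform: one column per species becomes one row per species.
        for column, density in (("picrub", picrub), ("betpap", betpap)):
            out.append((format_date(d), site, TAXON_LOOKUP[column], density))
    return out


dataset_c = transform_a(dataset_a) + transform_b(dataset_b)
for row in dataset_c:
    print(row)
```

Even for two tiny tables, each transform needs knowledge that is not in the data files themselves (what the codes mean, what the units are), which is the point of “Imagine scaling!!”.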

Page 20:

Semantics: Linking Taxonomic Semantics to Ecological Data

[Figure: concepts of Rhynchospora plumosa s.l. across treatments by Elliot 1816, Gray 1834, Chapman 1860, Kral 1998, and Peet 2002?. Names shown: R. plumosa; R. plumosa v. intermedia; R. plumosa v. plumosa; R. plumosa v. interrupta; R. intermedia; R. pineticola; R. plumosa v. pineticola; R. sp. 1.]

• Taxon concepts change over time (and space)
• Multiple competing concepts coexist
• Names are re-used for multiple concepts

from R. Peet

Date  Species    #
1830  R.plumosa  39
1840  R.plumosa  49
1900  R.plumosa  42
1985  R.plumosa  48
1995  R.plumosa  22
2000  R.plumosa  19

[Chart: series A, B, and C; vertical axis 0 to 60; horizontal axis 1/1/00 to 1/6/00.]

Page 21:

What Users Really Want…

Page 22:

Outline
• Ecoinformatics: a definition
• A science vision
• Information challenges
• Ecoinformatics “solutions”

Page 23:

[Flowchart: Research Program, Investigators, Studies; Experimental Design, Methods; Data Design, Data Forms; Data Entry (Field Computer Entry, Electronically Interfaced Field Equipment, Electronically Interfaced Lab Equipment); Raw Data File; Quality Assurance Checks; Data Contamination; Data verified? (yes / no); Data Validated; Archive Data File; Archival Mass Storage (Magnetic Tape / Optical Disk / Printouts); Off-site Storage; Access Interface; Investigators; Secondary Users; Summary Analyses; Quality Control; Metadata; Publication; Synthesis.]

• Standard Operating Procedures
• Policies
  • Data sharing
  • Computer use
  • Archive storage

Page 24:

Ecoinformatics solutions

• Data design
• Data acquisition
• QA/QC
• Data documentation (metadata)
• Data archival

Page 25:

Ecoinformatics solutions

• Data design
• Data acquisition
• QA/QC
• Data documentation (metadata)
• Data archival

Page 26:

Data Design

Conceptualize and implement a logical structure within and among data sets that will facilitate data acquisition, entry, storage, retrieval and manipulation.

Page 27:

Database Types

• File-system based
• Hierarchical
• Relational
• Object-oriented
• Hybrid (e.g., combination of relational and object-oriented schema)

Porter 2000

Page 28:

Data Design: 7 Best Practices

1. Assign descriptive file names
2. Use consistent and stable file formats
3. Define the parameters
4. Use consistent data organization
5. Perform basic quality assurance
6. Assign descriptive data set titles
7. Provide documentation (metadata)

from Cook et al. 2000

Page 29:

1. Assign descriptive file names

File names should be unique and reflect the file contents.

Bad file names:
• Mydata
• 2001_data

A better file name: Sevilleta_LTER_NM_2001_NPP.asc
• Sevilleta_LTER is the project name
• NM is the state abbreviation
• 2001 is the calendar year
• NPP represents Net Primary Productivity data
• asc stands for the file type (ASCII)
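A naming convention like this is easy to apply consistently with a small helper. The sketch below is illustrative only: the function name `build_file_name` and its checks are hypothetical and are not part of Cook et al. 2000.

```python
def build_file_name(project, state, year, parameter, extension):
    """Join descriptive parts into project_state_year_parameter.extension."""
    parts = [project, state, str(year), parameter]
    for part in parts:
        # Reject empty parts and embedded spaces so names stay unambiguous.
        if not part or " " in part:
            raise ValueError(f"empty or space-containing name part: {part!r}")
    return "_".join(parts) + "." + extension


name = build_file_name("Sevilleta_LTER", "NM", 2001, "NPP", "asc")
print(name)  # Sevilleta_LTER_NM_2001_NPP.asc
```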

Page 30:

2. Use consistent and stable file formats

• Use ASCII file formats; avoid proprietary formats
• Be consistent in formatting
  • don’t change or re-arrange columns
  • include header rows (first row should contain file name, data set title, author, date, and companion file names)
  • column headings should describe content of each column, including one row for parameter names and one for parameter units
  • within the ASCII file, delimit fields using commas, pipes (|), tabs, or semicolons (in order of preference)
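One way to follow these formatting rules is to write the file with Python's standard `csv` module. This is a sketch under assumptions: the file name, title, author, and companion file name below are placeholders invented for the example, and the function `write_ascii_table` is hypothetical.

```python
import csv
import io


def write_ascii_table(stream, file_info, names, units, rows):
    """Write a comma-delimited ASCII table with the recommended header rows."""
    writer = csv.writer(stream, delimiter=",", lineterminator="\n")
    writer.writerow(file_info)  # file name, title, author, date, companion files
    writer.writerow(names)      # one row of parameter names
    writer.writerow(units)      # one row of parameter units
    writer.writerows(rows)      # the data, always in the same column order


buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
write_ascii_table(
    buffer,
    # Placeholder values for illustration only.
    ["HOGI_1996_weather.csv", "Example weather data for station HOGI 1996",
     "A. Author", "19961101", "HOGI_1996_weather_metadata.txt"],
    ["Station", "Date", "Temp", "Precip"],
    ["Units", "YYYYMMDD", "C", "mm"],
    [["HOGI", "19961001", 12, 0], ["HOGI", "19961002", 14, 3]],
)
text = buffer.getvalue()
print(text)
```

Using a library writer rather than string concatenation means that any field containing the delimiter is quoted automatically.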

Page 31:

3. Define the parameters

• Use commonly accepted parameter names that describe the contents
  • e.g., precip for precipitation
• Use consistent capitalization
  • e.g., not temp, Temp, and TEMP in same file
• Explicitly state units of reported parameters in the data file and the metadata
  • SI units are recommended
• Choose a format for each parameter, explain the format in the metadata, and use that format throughout the file
  • e.g., use yyyymmdd; January 2, 1999 is 19990102
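Two of these rules, one date format and one capitalization per parameter name, can be applied in code. The following is a minimal sketch; the helper names and the `ACCEPTED` list are hypothetical.

```python
from datetime import date

# Hypothetical list of the accepted spelling for each parameter name.
ACCEPTED = ["Temp", "Precip"]


def to_yyyymmdd(d):
    """Format a date as yyyymmdd, the format recommended on the slide."""
    return d.strftime("%Y%m%d")


def normalize_name(raw, accepted=ACCEPTED):
    """Map any capitalization of a parameter name onto its accepted spelling."""
    by_lower = {name.lower(): name for name in accepted}
    return by_lower[raw.strip().lower()]  # raises KeyError for unknown names


print(to_yyyymmdd(date(1999, 1, 2)))  # 19990102
print(normalize_name("TEMP"))         # Temp
```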

Page 32:

4. Use consistent data organization (one good approach)

Station  Date      Temp  Precip
Units    YYYYMMDD  C     mm
HOGI     19961001  12    0
HOGI     19961002  14    3
HOGI     19961003  19    -9999

Note: -9999 is a missing value code

Page 33:

4. Use consistent data organization (a second good approach)

Station  Date      Parameter  Value  Unit
HOGI     19961001  Temp       12     C
HOGI     19961002  Temp       14     C
HOGI     19961001  Precip     0      mm
HOGI     19961002  Precip     3      mm
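The first layout (one column per parameter) can be converted mechanically into the second (one row per measurement). The sketch below uses the station data from these two slides; the function name `wide_to_long` and the `skip_missing` option are assumptions made for illustration.

```python
MISSING = -9999  # missing value code from the first layout

names = ["Station", "Date", "Temp", "Precip"]
units = ["Units", "YYYYMMDD", "C", "mm"]
wide_rows = [
    ["HOGI", "19961001", 12, 0],
    ["HOGI", "19961002", 14, 3],
    ["HOGI", "19961003", 19, MISSING],
]


def wide_to_long(names, units, rows, skip_missing=False):
    """Turn one-column-per-parameter rows into (station, date, parameter, value, unit)."""
    long_rows = []
    for col in range(2, len(names)):  # columns 0 and 1 are Station and Date
        for row in rows:
            value = row[col]
            if skip_missing and value == MISSING:
                continue
            long_rows.append((row[0], row[1], names[col], value, units[col]))
    return long_rows


long_rows = wide_to_long(names, units, wide_rows)
for row in long_rows:
    print(row)
```

In the second layout the unit travels with every value, and a missing measurement can simply be left out instead of being coded as -9999.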

Page 34:

5. Perform basic quality assurance

• Assure that data are delimited and line up in proper columns
• Check that there are no missing values for key parameters
• Scan for impossible and anomalous values
• Perform and review statistical summaries
• Map location data (lat/long) and assess errors
• Verify automated data transfers
  • e.g. check-sum techniques
• For manual data transfers, consider double keying data and comparing 2 data sets
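Several of these checks are simple to automate. The sketch below is one possible shape for them, not a prescribed procedure: the function names, the temperature range of -60 to 60 C, and the choice of SHA-256 as the checksum are all assumptions made for the example.

```python
import hashlib

MISSING = -9999


def find_missing(rows, key_columns):
    """Return (row index, column) pairs where a key parameter is missing."""
    return [(i, col) for i, row in enumerate(rows)
            for col in key_columns if row.get(col) in (None, "", MISSING)]


def find_out_of_range(rows, column, low, high):
    """Return indexes of rows whose value lies outside [low, high]."""
    return [i for i, row in enumerate(rows)
            if row[column] != MISSING and not (low <= row[column] <= high)]


def checksum(data: bytes) -> str:
    """Digest computed before and after a transfer; the two must match."""
    return hashlib.sha256(data).hexdigest()


def compare_double_keyed(first, second):
    """Return indexes of records that differ between two keyed copies."""
    mismatches = [i for i, (a, b) in enumerate(zip(first, second)) if a != b]
    shorter, longer = sorted((len(first), len(second)))
    mismatches.extend(range(shorter, longer))  # records present in only one copy
    return mismatches


rows = [
    {"Station": "HOGI", "Date": "19961001", "Temp": 12, "Precip": 0},
    {"Station": "HOGI", "Date": "19961002", "Temp": 140, "Precip": 3},  # keying error
    {"Station": "HOGI", "Date": "19961003", "Temp": 19, "Precip": MISSING},
]

print(find_missing(rows, ["Station", "Date"]))   # key columns are complete
print(find_out_of_range(rows, "Temp", -60, 60))  # flags the 140 C record
print(compare_double_keyed(["12", "14"], ["12", "41"]))
```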

Page 35:

6. Assign descriptive data set titles

• Data set titles should ideally describe the type of data, time period, location, and instruments used (e.g., Landsat 7).
• Data set title should be similar to names of data files
  • Good: “Shrub Net Primary Productivity at the Sevilleta LTER, New Mexico, 2000-2001”
  • Bad: “Productivity Data”

Page 36:

7. Provide documentation (metadata)

Page 37:

Ecoinformatics solutions

• Data design
• Data acquisition
• QA/QC
• Data documentation (metadata)
• Data archival

Page 38:

High-quality data depend on:

• Proficiency of the data collector(s)
• Instrument precision and accuracy
• Consistency (e.g., standard methods and approaches)
• Design and ease of data entry
• Sound QA/QC
• Comprehensive metadata (e.g., documentation of anomalies, etc.)

Page 39:

[Data sheet: two column headings, Plant and Life Stage, above rows of blank lines.]

What’s wrong with this data sheet?

Page 40:

Important questions

How well does the data sheet reflect the data set design?

How well does the data entry screen (if available) reflect the data sheet?

Page 41:

PHENOLOGY DATA SHEET
Rio Salado - Transect 1

Collectors: _________________________________
Date: ___________________ Time: _________
Notes: ________________________________________________________________________________________

Plant  Life Stage
ardi   P/G V B FL FR M S D NP
arpu   P/G V B FL FR M S D NP
atca   P/G V B FL FR M S D NP
bamu   P/G V B FL FR M S D NP
zigr   P/G V B FL FR M S D NP
       P/G V B FL FR M S D NP
       P/G V B FL FR M S D NP

P/G = perennating or germinating    M = dispersing
V = vegetating                      S = senescing
B = budding                         D = dead
FL = flowering                      NP = not present
FR = fruiting

Page 42:

PHENOLOGY DATA SHEET
Rio Salado - Transect 1

Collectors: Troy Maddux
Date: 16 May 1991 Time: 13:12
Notes: Cloudy day, 3 gopher burrows on transect

ardi  P/G  V    B    FL   FR   M    S    D    NP
      Y N  Y N  Y N  Y N  Y N  Y N  Y N  Y N  Y N

arpu  P/G  V    B    FL   FR   M    S    D    NP
      Y N  Y N  Y N  Y N  Y N  Y N  Y N  Y N  Y N

asbr  P/G  V    B    FL   FR   M    S    D    NP
      Y N  Y N  Y N  Y N  Y N  Y N  Y N  Y N  Y N

deob  P/G  V    B    FL   FR   M    S    D    NP
      Y N  Y N  Y N  Y N  Y N  Y N  Y N  Y N  Y N
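A data entry screen that mirrors this sheet can validate each record against the same code list. The sketch below assumes one Y or N answer per life stage per species, as the sheet suggests; the function name `validate_record` and the message wording are hypothetical.

```python
# Life stage codes as defined in the legend of the phenology data sheet.
LIFE_STAGES = {
    "P/G": "perennating or germinating",
    "V": "vegetating",
    "B": "budding",
    "FL": "flowering",
    "FR": "fruiting",
    "M": "dispersing",
    "S": "senescing",
    "D": "dead",
    "NP": "not present",
}


def validate_record(species, answers):
    """Return a list of problems; an empty list means the record is complete."""
    problems = []
    for code in LIFE_STAGES:
        if answers.get(code) not in ("Y", "N"):
            problems.append(f"{species}: stage {code} needs Y or N")
    for code in answers:
        if code not in LIFE_STAGES:
            problems.append(f"{species}: unknown stage code {code}")
    return problems


record = {code: "N" for code in LIFE_STAGES}
record["FL"] = "Y"
print(validate_record("ardi", record))  # []
```

Forcing an explicit Y or N for every stage distinguishes “not observed” from “not recorded”, which a free-text field cannot do.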

Page 43:

Ecoinformatics solutions

• Data design
• Data acquisition
• QA/QC
• Data documentation (metadata)
• Data archival

Page 44:

Generic Data Processing

[Flowchart: Research Program, Investigators, Studies; Experimental Design, Methods; Data Design, Data Forms; Data Entry (Field Computer Entry, Electronically Interfaced Field Equipment, Electronically Interfaced Lab Equipment); Raw Data File; Quality Assurance Checks; Data Contamination; Data verified? (yes / no); Data Validated; Archive Data File; Archival Mass Storage (Magnetic Tape / Optical Disk / Printouts); Off-site Storage; Access Interface; Investigators; Secondary Users; Summary Analyses; Quality Control; Metadata; Publication; Synthesis.]

Brunt 2000

Page 45:

Ecoinformatics solutions

• Project / experimental design
• Data design
• Data acquisition
• QA/QC
• Data documentation (metadata) – to be addressed
• Data archival

Page 46:

Ecoinformatics solutions

• Project / experimental design
• Data design
• Data acquisition
• QA/QC
• Data documentation (metadata)
• Data archival

Page 47:

Cycles of Research: “A Conventional View”

[Figure: a cycle with the stages Problem, Planning, Collection, Data, Analysis and modeling, and Publications.]

Page 48:

Cycles of Research: “A New View”

[Figure: a cycle with the stages Problem Definition (Research Objectives), Planning, Collection, Original Observations, Selection and extraction, Secondary Observations, Archive of Data, Analysis and modeling, and Publications.]

Page 49:

Data Archive

A collection of data sets, usually electronic, stored in such a way that a variety of users can locate, acquire, understand and use the data.

Examples:
• ESA’s Ecological Archive
• NASA’s DAACs (Distributed Active Archive Centers)

Page 50:

Page 51:

Page 52:

References

Brunt (2000) Ch. 2 in Michener and Brunt (2000)

Porter (2000) Ch. 3 in Michener and Brunt (2000)

Edwards (2000) Ch. 4 in Michener and Brunt (2000)

Michener (2000) Ch. 7 in Michener and Brunt (2000)

Cook, R.B., R.J. Olson, P. Kanciruk, and L.A. Hook. 2000. Best practices for preparing ecological and ground-based data sets to share and archive. (online at http://www.daac.ornl.gov/cgi-bin/MDE/S2K/bestprac.html)