

Integrated Data Management System for Critical Zone Observatories
CZOData II

Mark Williams, UC-Boulder
Anthony K. Aufdenkampe, SWRC
Kerstin Lehnert, IEDA/Columbia
Ilya Zaslavsky, SDSC
David Tarboton, USU
Jeff Horsburgh, USU
Emilio Mayorga, UW-APL

Goals for CZOData II

• extensive and iterative interaction and feedback from the community of CZO PIs, scientists and data managers
• uniform web portal appearance for the CZO sites and the national CZO program
• development of a consistent metadata strategy for CZO data, supported by a respective collection of data submission forms and tools
• enhancing publication and data discovery workflows for geochemical, hydrologic, spatial and other data
• creating a uniform data discovery portal
• ensuring that the data descriptions follow consistent semantics
• integrating with the EarthChem system
• developing a consistent online data visualization interface for CZO time series data

CZOData II Architecture

[Architecture diagram. Component labels:]
• Local CZOs
• Local CZO Web Site
• CZO Display Files
• Standards-based Web Service Clients
• EarthChem
• CZO Main Web Portal
• CZO Main Web Site
• OpenTopography (LiDAR)
• CUAHSI HIS
• CZO Central Data Management System
• CZO Central Harvester
• CZO Central Coordination Functions
• CZO Central Data Repositories
• Non-CZO Integrated Data & Discovery Sites
• Clients
• CZO-IGSN Registration System
• Shared Vocabulary System
• CZchemDB System (w/ EarthChem Service Interface)
• CZO Central Hydro System (w/ WFS & CUAHSI HIS Service Interface)
• DataONE
• DataONE Interface
• CZO Metadata Catalog (w/ CSW Service Interface)
• SESAR
• Data Management Tools
• CZO Data Discovery Portal
• Time Series Data Display & Access Tool
• CZchemDB Data Access Tool

Community Involvement

• Instigate and support an Information Management Committee (IMC)
  • 1-2 investigators/CZO + site data managers
  • Monthly telecon & annual face-to-face meeting w/ CZOData developers
  • Feedback to CZOData team
    • Data use scenarios
    • Meta-data requirements, shared vocabulary, etc.
• Web-based information events and workshops (>2/year); mailing lists
• Subdiscipline workshops (three workshops)
  • Hydrology (all sensor-based data)
  • Geochemistry (all sample-based data)
  • Geospatial data
• Synthesis working group (two workshops)

Get Started Now

• Form IMC and set first workshop date

• Content for new website

• LiDAR to OpenTopography

• Start registering samples with SESAR

• Start registering datasets with IEDA

Challenges to CZO Data Management

Atmosphere

Biosphere

Hydrosphere

Lithosphere

Many Object & Data Types!

• Diverse media

• Sensor-based

• Stationary

• Mobile

• Spectra/photos

• Sample-based

• Sub-samples

• Preparations/Fractions

• Numeric & Categorical

Spatial scales: Hillslope, Catchment, Watershed

Time scales: Minutes, Decades, Millennia, Eons

Examples from Different Disciplines

• Climate & Hydrology

• Point observation (sensor) time series

• Raster observation (remote sensing) time series

• Vector networks for water routing

• Geochemistry

• Sample-based lab analyses

• Geophysics

• Seismic and other subsurface profiles

• Biology

• Phylogenetic trees

Sensor- vs. Sample-Centric Data

• Sensor-centric Data Models (e.g. ODM)
  • Site → DataValue
• Sample-centric Data Models (e.g. EarthChem)
  • Site → Sample/Subsample → Prep/Batch → DataValue

ODM - Data Structure


GeoChemical Data Model

[Diagram. Labels: observed value, publication data, source, method/DQ, sample feature of interest, collection, geospatial, analysis, material, preparation, obs. point]

CZO Chemistry Database Schema

08_Precision

(PK) precisionID (FK) MediumID (FK) variableCode (FK) methodID detectLimit stDeviation (FK) unitID precNote

31_SampleMedium

(PK) meduimID mediumName mediumNote

CZO_CHEM_DB_SCHEMA V4

LEGEND:
PK – Primary Key; FK – Foreign Key
Table categories: Lookup tables, Main data, Meta data
Relationships: 1 : 1, 1 : n

Note: All contactIDs, authorID, and scientistID are linked to the personID in the table “Person”

91_ReferenceGroup

(FK) sourceID (PK) refGroupID (FK) projectID (FK) referenceID (FK) contactID refGroupNote

93_Project

(PK) projectID projTitleAbbrv projTitleFull citationFull projSponsorID projStartYr projEndYr (FK) contactID projNote

93_ProjScientist

(FK) projectID (FK) scientistID scientistRole

09_Source

(PK) sourceID (FK) contrabutorID sourceNote

92_Reference

(PK) referenceID (FK) corAuthorID yearPub articleTitle journalName bookTitle bookeditor bookPublisher jourVolume jourIssue jourPages citationFull refWebURL refNote

PSU-EESI

Feb. 16, 2010

71_MethodType

(PK) methdTypeID mthdTypeName mthdTypeNote

11_State

(FK) countryCode (PK) stateCode stateAlphaCode stateNumericCode stateName stateCategory

11_Country

(PK) countryCode countryName countryNumericCode countryAlpha2 countryNameFull

02_Site

(FK) locationID (PK) siteID siteName longitudeDeg latitudeDeg elevation_m slopeDeg aspect landscapePosition landUse vegSpecies parentLithology exposureAge erosionRate depthToRock_m soilTaxonomy (FK) SSURGO_ID siteNote

01_Location

(FK) stateCode (PK) locationID locNameFull locNameAbbrv annlPrecip_mm anlMeanTemp_oC (FK) contactID locNote

04_Preparation

(FK) subSampleID (PK) prepID (FK) methodID (FK) contactID prepNote

03_Sample

(FK) siteID (PK) sampleID (FK) smplMediumID depthTop_cm depthBot_cm waterTemp_oC samplingDate smplLocalTime smplUTCTime (FK) methodID (FK) contactID sampleNote

05_Analysis

(FK) prepID (PK) analysisID labName analysisDate (FK) sourceID (FK) methodID (FK) contactID analyNote

03_SubSample

(FK) sampleID (PK) subSampleID splitNumber (FK) methodID (FK) contactID subsmplNote

06_DataValue

(PK) dataID (FK) analysisID (FK) variableCode dataValue (FK) unitID dataNote

61_VariableLookup

(PK) variableCode variableName (FK) varTypeID varNote

62_VariableType

(PK) varTypeID varTypeName varTypeNote

64_Units

(PK) unitID unitCode unitName unitNote

72_Standard

(PK) methdStdID (FK)methodID mthdStdNote ???

10_Person

(PK) personID lastName firstName (FK) instituteID departmentName eMail phoneNumber faxNumber persnAddress persnTitle persnNote

10_Institute

(PK) instituteID instName instNameAbbrv (FK) countryCode (FK) stateCode instCity instZipCode instAddress instPhone instWebURL (FK) contactID instNote

07_Method

(PK) methodID methodName mthdNameAbbrv mthdDescription equipmentName (FK) mthdTypeID (FK) contactID mthdNote
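
The chain of main-data tables in the schema above (Site, Sample, SubSample, Preparation, Analysis, DataValue) can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch, not the PSU-EESI implementation: the column lists are trimmed to the keys needed for the join, the numeric table prefixes are dropped, and the inserted rows are made-up example data.

```python
import sqlite3

# In-memory sketch of the sample-centric chain; columns are a trimmed subset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Site (siteID INTEGER PRIMARY KEY, siteName TEXT);
CREATE TABLE Sample (sampleID INTEGER PRIMARY KEY,
                     siteID INTEGER REFERENCES Site(siteID),
                     depthTop_cm REAL, depthBot_cm REAL);
CREATE TABLE SubSample (subSampleID INTEGER PRIMARY KEY,
                        sampleID INTEGER REFERENCES Sample(sampleID),
                        splitNumber INTEGER);
CREATE TABLE Preparation (prepID INTEGER PRIMARY KEY,
                          subSampleID INTEGER REFERENCES SubSample(subSampleID));
CREATE TABLE Analysis (analysisID INTEGER PRIMARY KEY,
                       prepID INTEGER REFERENCES Preparation(prepID),
                       labName TEXT);
CREATE TABLE DataValue (dataID INTEGER PRIMARY KEY,
                        analysisID INTEGER REFERENCES Analysis(analysisID),
                        variableCode TEXT, dataValue REAL);
""")

# Made-up example rows: one value traced through every level of the chain.
conn.execute("INSERT INTO Site VALUES (1, 'Example soil pit')")
conn.execute("INSERT INTO Sample VALUES (1, 1, 0.0, 10.0)")
conn.execute("INSERT INTO SubSample VALUES (1, 1, 1)")
conn.execute("INSERT INTO Preparation VALUES (1, 1)")
conn.execute("INSERT INTO Analysis VALUES (1, 1, 'Example lab')")
conn.execute("INSERT INTO DataValue VALUES (1, 1, 'Al', 7.5)")

# Walk from each data value back to the site it ultimately came from.
rows = conn.execute("""
SELECT st.siteName, sm.depthTop_cm, sm.depthBot_cm, dv.variableCode, dv.dataValue
FROM DataValue dv
JOIN Analysis a     ON a.analysisID = dv.analysisID
JOIN Preparation p  ON p.prepID = a.prepID
JOIN SubSample ss   ON ss.subSampleID = p.subSampleID
JOIN Sample sm      ON sm.sampleID = ss.sampleID
JOIN Site st        ON st.siteID = sm.siteID
""").fetchall()
print(rows)
```

Every value has to pass through five joins to reach its site, which is the cost of the fixed Sample/SubSample/Preparation/Analysis hierarchy.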

Sample Fractions for Soil Geochemistry

[Flowchart of sample splits. Recoverable steps and labels:]
• Bulk soil: Ziplock (~500 g), horizon or depth increment
• Al can (~70 g) for gamma counting of 137Cs
• DRY SIEVE, 2 mm
  • <2 mm: glass vial, <2 mm fines, dry sieved
  • >2 mm:
    (1) Pick out plant roots & detritus, rinse with DI water, oven dry, mill (SPEX?); glass vial: plant detritus, milled
    (2) Remaining pebbles & rocks, hard grind; glass vial: pebbles, hard ground
• <2 mm: WET SIEVE, or DENSITY, or SETTLING (with or without sonication)
  • glass vial: sand + small detritus
  • glass vial: silt + clay
  • The choice here is important. Do we want aggregates or not?
• Analyses shown for the fractions: EA-IRMS, FTIR, SA, ICP-MS after Li-borate fusion, XRD (marked "XRD?" for one fraction), CEC
• Preparation label: SPEX mill
• Extractions: Dithionite-Citrate extraction, Na pyrophosphate extraction, Ammonium oxalate extraction

Geoinformatics for Geochemistry

[Diagram of a sample hierarchy with IGSNs. Recoverable labels:]
• Core
• Core Section 1, Core Section 2, Core Section 3
• Samples within sections: Sample 1, Sample 2, Sample 3
• Derived materials: Rock powder, Mineral conc., Leachate, Fossil separate, Microprobe mount
• Relationship labels: Parent, Child
• Example identifiers: IGSN:XXX000120, IGSN:XXX0065B3, IGSN:XXX9K23G6, IGSN:XXX07ST4K, IGSN:XYZ0G693M, IGSN:ABC0L98SW, IGSN:ABC0L53NW, IGSN:ABC0L653X, IGSN:ABC078HGB

Needed Capabilities for ODM

Sample table

• Optional direct link between Sample & Site

• Need to assign SampleID before data values exist

• Natural one-to-many hierarchy

• 1 site many samples, 1 sample many values

• Recursive parent-child relationships

• Sample metadata

• Medium, fraction, preservation, container, dilution, etc.


ODM v1.1 Suggested

ODM v1.1: Sample
SampleID (PK), SampleType, LabSampleCode, LabMethodID (FK)

Suggested: Sample (field: table note)
SampleID (PK)
SampleCode: alpha-numeric, ~20 char
SampleNote: ~200 char
IGSN: Intl. Geo-Sample Number
FieldSampleFlag: Y/N
LocalDateTime: creation, when container filled
UTCOffset
MediumTypeID (FK): e.g. surface water, soil gas, soil solid
FractionTypeID (FK): e.g. whole sample, >63 um, acid extract
MethodID (FK): types: collection, fractionation, prep.
SourceID (FK): who performed method above

Suggested: ParentSample
ParentSampleID (FK): FK to SampleID
ChildSampleID (FK): FK to SampleID

Suggested: SiteSample
SiteID (FK), SampleID (FK)

Notes:
• Method Type should not include analysis method, because it's in the Values table.
• LIMS info is recorded in the Values table (e.g. sample amount, budget #, dilution ratio, sample location, container type)
• Analysis “Batch” or “Run” is treated as a sample group
• ParentSample table allows for composite samples
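
A rough sketch of how the suggested Sample, ParentSample and SiteSample tables could support recursive parent-child lookups, including composite samples with more than one parent. It uses Python's sqlite3 module and a recursive common table expression; the column subset, the sample codes and the helper name `field_ancestors` are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sample (SampleID INTEGER PRIMARY KEY,
                     SampleCode TEXT,
                     FieldSampleFlag TEXT);          -- 'Y' or 'N'
CREATE TABLE ParentSample (ParentSampleID INTEGER REFERENCES Sample(SampleID),
                           ChildSampleID  INTEGER REFERENCES Sample(SampleID));
CREATE TABLE SiteSample (SiteID INTEGER,
                         SampleID INTEGER REFERENCES Sample(SampleID));
""")

# Hypothetical data: two field cores are composited, then the composite is sieved.
conn.executemany("INSERT INTO Sample VALUES (?, ?, ?)", [
    (1, "CORE-A", "Y"),
    (2, "CORE-B", "Y"),
    (3, "COMPOSITE-AB", "N"),
    (4, "COMPOSITE-AB-LT2MM", "N"),
])
conn.executemany("INSERT INTO ParentSample VALUES (?, ?)", [(1, 3), (2, 3), (3, 4)])
conn.executemany("INSERT INTO SiteSample VALUES (?, ?)", [(10, 1), (11, 2)])


def field_ancestors(conn, sample_id):
    """Return codes of all field samples that a sample descends from (or is)."""
    rows = conn.execute("""
        WITH RECURSIVE lineage(SampleID) AS (
            SELECT ?
            UNION
            SELECT p.ParentSampleID
            FROM ParentSample p
            JOIN lineage l ON p.ChildSampleID = l.SampleID
        )
        SELECT s.SampleCode
        FROM lineage l
        JOIN Sample s ON s.SampleID = l.SampleID
        WHERE s.FieldSampleFlag = 'Y'
        ORDER BY s.SampleCode
    """, (sample_id,)).fetchall()
    return [r[0] for r in rows]


print(field_ancestors(conn, 4))
```

Because only field samples need a SiteSample row, a sub-sample's sites can be found by first resolving its field ancestors and then joining to SiteSample.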

ODM v1.1 Suggested

ODM v1.1: GroupDescriptions
GroupID (PK), GroupDescription

ODM v1.1: Groups
GroupID (FK), ValueID (FK)

Suggested: Groups (field: table note)
GroupID (PK)
GroupCode: alpha-numeric, ~20 char
GroupNote: ~200 char
GroupTypeID (FK): types: Value, Sample, Site, Person?, etc.

Suggested: SiteGroups
SiteID (FK), GroupID (FK)

Suggested: SampleGroups
SampleID (FK), GroupID (FK)

Suggested: ValueGroups
ValueID (FK), GroupID (FK)

Notes:
• Sample Groups: Analysis Batch, Profile, etc.
• Site Groups: Transect, station, observatory, etc.
• Value Groups: ???
• Person Groups: ?Research Teams, etc.?
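
As a hedged illustration of the grouping idea on this slide, the sketch below builds a Groups table and a SampleGroups cross-reference table with sqlite3 and lists the samples in one analysis batch. The group type is stored as plain text rather than through GroupTypeID (a simplification), and the group codes and sample IDs are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Groups (GroupID INTEGER PRIMARY KEY,
                     GroupCode TEXT,
                     GroupNote TEXT,
                     GroupType TEXT);   -- simplification of GroupTypeID (FK)
CREATE TABLE SampleGroups (SampleID INTEGER,
                           GroupID INTEGER REFERENCES Groups(GroupID));
""")

# Invented groups: one analysis batch and one soil profile, both of type Sample.
conn.executemany("INSERT INTO Groups VALUES (?, ?, ?, ?)", [
    (1, "BATCH-001", "Example analysis batch", "Sample"),
    (2, "PROFILE-P1", "Example soil profile", "Sample"),
])
# A sample may belong to several groups: sample 101 is in both.
conn.executemany("INSERT INTO SampleGroups VALUES (?, ?)", [
    (101, 1), (102, 1), (101, 2), (103, 2),
])


def samples_in_group(conn, group_code):
    """Return the sorted SampleIDs that belong to the group with this code."""
    rows = conn.execute("""
        SELECT sg.SampleID
        FROM SampleGroups sg
        JOIN Groups g ON g.GroupID = sg.GroupID
        WHERE g.GroupCode = ?
        ORDER BY sg.SampleID
    """, (group_code,)).fetchall()
    return [r[0] for r in rows]


print(samples_in_group(conn, "BATCH-001"))
```

The same cross-reference pattern would apply to SiteGroups and ValueGroups, with SiteID or ValueID in place of SampleID.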

ODM v1.1 Suggested

• Sources/Institution → PersonInstitution
• Soil/Sed intervals → OffsetValueMin & OffsetValueMax
• Only one offset value → add horizontal offsets?
• Horizon Descriptions? → Add DataValueNote field to Data Values
• Methods table insufficient → add MethodType, PersonID, etc.
• CensorCode insufficient → Need value field (e.g. Method Detection Limit)

Other outstanding issues:
• Do spatial offsets also belong in samples table? [yes]
• Spectral data, photos?
• Dataset versioning
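
To illustrate why a censor code alone may be insufficient, here is a small Python sketch that pairs the code with a limit value such as a method detection limit. The class name, the field names and the code strings ("nc" for not censored, "lt" for less than, "gt" for greater than) are assumptions for this example, not a definition taken from ODM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CensoredValue:
    data_value: float
    censor_code: str = "nc"                     # assumed codes: "nc", "lt", "gt"
    censor_limit_value: Optional[float] = None  # e.g. method detection limit

    def display(self) -> str:
        """Format the value for reporting, using the limit when censored."""
        if self.censor_code == "nc":
            return f"{self.data_value:g}"
        if self.censor_limit_value is None:
            # Without a stored limit, a censored value cannot be reported.
            raise ValueError("censored value has no limit value")
        symbol = {"lt": "<", "gt": ">"}[self.censor_code]
        return f"{symbol}{self.censor_limit_value:g}"


print(CensoredValue(1.25).display())
print(CensoredValue(0.02, "lt", 0.05).display())
```

With only the code stored, the second value could be flagged as "less than" but not reported as "<0.05".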

Importance of Sample/Site Tracking

• CZO scientists share samples!
• Data often need to be merged at the level of subsamples
• SWRC’s biggest data management headaches always come from merging data from different instruments/labs by sample and by site.
• International Geo-Sample Number (IGSN) is the answer!

Object Types in IGSN/SESAR

Existing
• Core
• Core half round
• Core quarter round
• Core piece
• Core section
• Core section half
• Core sub-piece
• Core whole round
• Cuttings
• Dredge
• Grab
• Hole
• Individual sample
• Oriented core
• Other
• Rock powder
• Terrestrial sample
• Trawl

Considering?
• Sampling events: holes, cores, dredges, stratigraphic sections
• Individual samples: specimens, rocks, minerals, fossils, precipitates, synthetic material, etc.
• Fluid samples: seawater, hydrothermal fluids, groundwater, etc. (to be completed)
• Particulates: aerosols, suspended matter
• Soil pedons and samples thereof
• Sub-samples of any of above: processed samples such as mineral or fossil separates, leachates, thin sections, etc.

http://www.geosamples.org/sampletypes

CZO Geo-Object Types

• Site/Location (x, y; z treated via vertical offset)
  • surface water station, well, lysimeter, piezometer, soil pit, borehole, monument, meteorological station/tower, tree?
• Fluid Sample (Water Sample?)
  • stream/river water, pond/lake water, wetland surface water, groundwater, soil water (unsaturated), sediment porewater, sap?
• Gas Sample (also fluid?)
  • atmospheric gas, dissolved gas, soil gas
• Soil/Lithology/Sediment Sample (need help with names)
  • Surface grab, core, auger interval, pit interval, rock, saprolite?, bedrock?, cuttings?
• Plant Sample
  • Whole plant, tissue, ???

CZO Sample Fraction Types

• Subsample
  • Duplicate or split that does not fractionate whole sample
• Size Fraction
  • e.g. >2 mm, 63-2000 um, <63 um
• Extracted Fraction
  • Acid soluble, total lipid extract, dithionite-citrate-bicarbonate extract
• Extraction residue
• etc.

Generalized, Extensible ODM

Suggested v2

• After discussions and on plane ride home

ODM v1.1 Suggested v2

ODM v1.1: DataValues
ValueID (PK), DataValue, ValueAccuracy, LocalDateTime, UTCOffset, DateTimeUTC, SiteID (FK), VariableID (FK), OffsetValue, OffsetTypeID, CensorCode, QualifierID, MethodID (FK), SourceID (FK), SampleID (FK), DerivedFromID (FK), QualityControlLevelID

Suggested v2: DataValues
ValueID (PK), DataValue, LocalDateTime, UTCOffset, DateTimeUTC, SiteID (FK), VariableID (FK), CensorCode, MethodID (FK), SourceID (FK), QualityControlLevelID, SampleID (FK)

Suggested v2: DataValuesExtension
DataValueExtensionID, DataValueID (FK), AttributeID (FK), DataValueAttributeValue

Example Attributes:
Offset, OffsetMin, OffsetMax, QualifierID, DerivedFromID, ValueAccuracy, InstrumentType, InstrumentID (FK), SensorID (FK), AnalysisNote, DataValueNote, ProjectName, CensorType, CensorLimitValue

Suggested v2: Attributes
AttributeID (PK), AttributeType (CV), AttributeDescription, Units (FK)

AttributeTypes:
Correspond directly to the table that is being extended, e.g. Site, Sample, Value

Bold fields in tables are required
Non-bold fields are optional
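
The extension-table idea on this slide amounts to storing optional fields as attribute/value rows instead of as fixed columns. The following sqlite3 sketch shows one way this might look; the attribute values are stored as text, the column subset is trimmed, and the helper `optional_attributes` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DataValues (ValueID INTEGER PRIMARY KEY,
                         DataValue REAL,
                         SiteID INTEGER,
                         VariableID INTEGER);
CREATE TABLE Attributes (AttributeID INTEGER PRIMARY KEY,
                         AttributeType TEXT,         -- table being extended
                         AttributeDescription TEXT);
CREATE TABLE DataValuesExtension (
    DataValueExtensionID INTEGER PRIMARY KEY,
    DataValueID INTEGER REFERENCES DataValues(ValueID),
    AttributeID INTEGER REFERENCES Attributes(AttributeID),
    DataValueAttributeValue TEXT);
""")

# Invented rows: one soil value with a depth interval and a note attached.
conn.execute("INSERT INTO DataValues VALUES (1, 3.2, 10, 5)")
conn.executemany("INSERT INTO Attributes VALUES (?, ?, ?)", [
    (1, "Value", "OffsetMin"),
    (2, "Value", "OffsetMax"),
    (3, "Value", "DataValueNote"),
])
conn.executemany(
    "INSERT INTO DataValuesExtension (DataValueID, AttributeID, DataValueAttributeValue) "
    "VALUES (?, ?, ?)",
    [(1, 1, "0"), (1, 2, "10"), (1, 3, "A horizon")],
)


def optional_attributes(conn, value_id):
    """Collect the optional attributes of one data value into a dict."""
    rows = conn.execute("""
        SELECT a.AttributeDescription, e.DataValueAttributeValue
        FROM DataValuesExtension e
        JOIN Attributes a ON a.AttributeID = e.AttributeID
        WHERE e.DataValueID = ?
    """, (value_id,)).fetchall()
    return dict(rows)


print(optional_attributes(conn, 1))
```

New optional fields then require only a new Attributes row, not a schema change, at the price of text-typed values and an extra join.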

ODM v1.1 Suggested v2

ODM v1.1: Samples
SampleID (PK), SampleType, LabSampleCode, LabMethodID (FK)

Suggested v2: Samples (field: table note)
SampleID (PK)
SampleCode: alpha-numeric, ~20 char
SampleNote: ~200 char
IGSN: Intl. Geo-Sample Number
IsFieldSample: Y/N, to distinguish ultimate parent
LocalDateTime: creation, when container filled
UTCOffset
ObjectTypeID (FK): corresponding to IGSN Object Types
FractionTypeID (FK): e.g. whole sample, >63 um, acid extract
MethodID (FK): types: collection or prep., not analysis
SourceID (FK): who performed method above

Suggested v2: ParentSampleXRef
ParentSampleID (FK), ChildSampleID (FK)

Suggested v2: SiteSampleXRef
SiteID (FK), SampleID (FK)

Suggested v2: SamplesExtension
SampleExtensionID (PK), SampleID (FK), AttributeID (FK), SampleAttributeValue

Example Attributes:
VerticalOffset, VerticalOffsetMin, VerticalOffsetMax, HorizontalOffset, HorizontalOffsetDirection (deg.), Medium, AlternateSampleCode, FieldCampaignName, Amount, StorageLocation, ContainerType, DilutionRatio, CollectionNote, PreparationNote, FractionNote, IsExperimentalSample, ExperimentID

Notes:
• Method Type should not include analysis method, because it's in the Values table.
• Analysis “Batch” or “Run” is treated as a sample group
• ParentSample table allows for composite samples

ODM v1.1 Suggested v2

ODM v1.1: Sites
SiteID (PK), SiteCode, SiteName, Latitude, Longitude, LatLongDatumID (FK), Elevation_m, VerticalDatum, LocalX, LocalY, LocalProjectionID (FK), PosAccuracy_m, State, Country, Comments

Suggested v2: Sites
SiteID (PK), SiteCode, SiteName, Latitude, Longitude, LatLongDatumID (FK), Elevation_m, VerticalDatum

Suggested v2: SitesExtension
SiteExtensionID, SiteID (FK), AttributeID (FK), SiteAttributeValue

Example Attributes:
• From ODM 1.1: SiteDescription, LocalX, LocalY, LocalProjectionID (FK), PosAccuracy_m (LatLongAccuracy_m?), City/Township, State/Province, Country, Comments
• From IGSN/SESAR: Physiographic feature, Name of physiographic feature, Location description, Locality, Locality description, Field Program/Cruise, Platform type, Platform name
• From Sue Brantley: annlPrecip_mm, anlMeanTemp_oC, slopeDeg, aspect, landscapePosition, landUse, vegSpecies, parentLithology, exposureAge, erosionRate, depthToRock_m, soilTaxonomy, SSURGO_ID (FK), siteNote, ContactName
• From SWRC: AlternateSiteCode, WatershedName, HUC, ElevationAccuracy,

Notes:
• Method Type should not include analysis method, because it's in the Values table.
• LIMS info is recorded in the Values table (e.g. sample amount, budget #, dilution ratio, sample location, container type)
• Analysis “Batch” or “Run” is treated as a sample group
• ParentSample table allows for composite samples

ODM v1.1 Suggested v2

ODM v1.1: GroupDescriptions
GroupID (PK), GroupDescription

ODM v1.1: Groups
GroupID (FK), ValueID (FK)

Suggested v2: Groups (field: table note)
GroupID (PK)
GroupCode: alpha-numeric, ~20 char
GroupDescription: ~200 char
GroupTypeID (FK): types: Value, Sample, Site, Person?, etc.

Suggested v2: SiteGroupsXRef
SiteID (FK), GroupID (FK)

Suggested v2: SampleGroupsXRef
SampleID (FK), GroupID (FK)

Suggested v2: ValueGroupsXRef
ValueID (FK), GroupID (FK)

Notes:
• Sample Groups: Analysis Batch, Profile, Experiment, etc.
• Site Groups: Transect, station, observatory, etc.
• Value Groups: Profile, Analysis, Spectra
• Person Groups: ?Research Teams, etc.?

ODM v1.1 Suggested v2

ODM v1.1: Methods
MethodID (PK), MethodDescription

Suggested v2: Method (field: table note)
MethodID (PK)
MethodCode: alpha-numeric, ~20 char
MethodDescription: ~200 char
MethodTypeID (CV): types: Collection, Preparation, Analysis
MethodLink: URL or DOI
SourceID: could be paper, report, person/lab

ODM v1.1: Sources
SourceID (PK), Organization, SourceDescription, SourceLink, ContactName, Phone, Email, Address, City, State, ZipCode, Citation, MetadataID (FK)

Suggested v2: Sources
SourceID (PK), SourceDescription, SourceLink, Corresponding PersonID (FK)

Suggested v2: Persons
PersonID (PK), LastName, FirstName, Phone, Email, InstitutionID (FK), PersonLink

Suggested v2: Institutions
InstitutionID (PK), InstitutionName, Department, Address, City, State, ZipCode, InstitutionLink

DataSeries: better sample integration?

• Uses DataSeries table modified from HydroDesktop (next page)
• DataSeries table can act as an XRef table, but requires creation of a data series upon registration of the FieldSample
• A DataSeries from/for a single sample can be viewed as equivalent to EarthChem’s Analysis table
• Joins do not require passing through huge DataValues table

[Diagram: Sites and Samples link to DataSeries, which links to DataValues]
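
A minimal sqlite3 sketch of the point about joins: if each DataSeries row carries both SiteID and SampleID, then the samples collected at a site can be listed without reading the DataValues table at all. The table subset, the codes and the helper `samples_at_site` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sites (SiteID INTEGER PRIMARY KEY, SiteCode TEXT);
CREATE TABLE Samples (SampleID INTEGER PRIMARY KEY, SampleCode TEXT);
CREATE TABLE DataSeries (SeriesID INTEGER PRIMARY KEY,
                         SiteID INTEGER REFERENCES Sites(SiteID),
                         SampleID INTEGER REFERENCES Samples(SampleID),
                         MethodID INTEGER);
CREATE TABLE DataValues (ValueID INTEGER PRIMARY KEY,
                         SeriesID INTEGER REFERENCES DataSeries(SeriesID),
                         VariableID INTEGER,
                         DataValue REAL);
""")

# Invented rows: two samples from site 1, one from site 2.
conn.executemany("INSERT INTO Sites VALUES (?, ?)", [(1, "PIT-1"), (2, "WELL-2")])
conn.executemany("INSERT INTO Samples VALUES (?, ?)",
                 [(1, "S-001"), (2, "S-002"), (3, "S-003")])
conn.executemany("INSERT INTO DataSeries VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 7), (2, 1, 2, 7), (3, 2, 3, 7)])
conn.executemany("INSERT INTO DataValues VALUES (?, ?, ?, ?)",
                 [(1, 1, 5, 3.2), (2, 1, 6, 0.8), (3, 2, 5, 2.9), (4, 3, 5, 4.1)])


def samples_at_site(conn, site_code):
    """List sample codes for a site using only Sites, DataSeries and Samples."""
    rows = conn.execute("""
        SELECT DISTINCT sm.SampleCode
        FROM Sites st
        JOIN DataSeries ds ON ds.SiteID = st.SiteID
        JOIN Samples sm    ON sm.SampleID = ds.SampleID
        WHERE st.SiteCode = ?
        ORDER BY sm.SampleCode
    """, (site_code,)).fetchall()
    return [r[0] for r in rows]


print(samples_at_site(conn, "PIT-1"))
```

In this layout DataSeries is far smaller than DataValues, since one series row stands in for every value measured on that sample.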

HydroDesktop data schema

HydroDesktop Suggested ODM v2

HydroDesktop: DataSeries
SeriesID (PK), SiteID (FK), VariableID (FK), IsCategorical, MethodID (FK), SourceID (FK), QualityControlLevelID, BeginDateTime, EndDateTime, BeginDateTimeUTC, EndDateTimeUTC, ValueCount, CreationDateTime, Subscribed, UpdateDateTime, LastCheckedDateTime

HydroDesktop: DataValues
ValueID (PK), SeriesID, DataValue, ValueAccuracy, LocalDateTime, UTCOffset, DateTimeUTC, OffsetValue, OffsetTypeID (FK), CensorCode, QualifierID, SampleID (FK), FileID (FK)

Suggested ODM v2: DataSeries
SeriesID (PK), SiteID (FK), SampleID (FK), IsCategorical, MethodID (FK), SourceID (FK), QualityControlLevelID, CreationDateTime

Suggested ODM v2: DataValues
ValueID (PK), SeriesID (FK), DataValue, VariableID (FK), CensorCode, LocalDateTime, DateTimeUTC

Suggested ODM v2: DataSeriesExtension
DataSeriesExtensionID, DataSeriesID (FK), AttributeID (FK), DataSeriesAttributeValue

Example DataSeriesAttributes:
• BeginDateTime, EndDateTime, BeginDateTimeUTC, EndDateTimeUTC, ValueCount, CreationDateTime, Subscribed, UpdateDateTime, LastCheckedDateTime
• InstrumentType, InstrumentID (FK), SensorID (FK), PlatformID (FK), AnalysisNote, UTCOffset,

Thank You

Funded by: US National Science Foundation


ODM – Linked Table Example