Application of International GeoSample Number (IGSN) to Sample Collections
Sri Vinay, Geoinformatics for Geochemistry (GfG) Program, Lamont Campus of Columbia University, 2007 September 25


Page 1: Application of International GeoSample  Number (IGSN)  to Sample Collections

Application of International GeoSample Number (IGSN) to Sample Collections

Sri Vinay
Geoinformatics for Geochemistry (GfG) Program
Lamont Campus of Columbia University

2007 September 25

Page 2

Presentation Outline

Unique identifiers and their application to sample and data management

System for Earth SAmple Registration (SESAR) and International GeoSample Number (IGSN)

Current Status and Activities of SESAR

IGSN Implementation Strategies

Discussion

Page 3

Unique Identifiers

An identifier is an unambiguous label that specifies an entity.

Unique identifiers are widely used to designate physical objects, assisting in trading (e.g., the Universal Product Code bar code system), and the extension of similar principles to digital and abstract entities is a prerequisite for digital commerce of rights and intellectual content.

Although the design of unique identification schemes is a technical problem, it is also a business issue with implications for what is identified and how identified items are made available.

Page 4

tel:+1-816-555-1212
URN:NBN:fi-fe976238
DOI:10.1000/ISSN1047-935X

“In a dynamic and distributed information environment, the effective management of both metadata records and the resources they describe requires a systematic way of generating and assigning unique identifiers.” (N. Friesen 2002: Recommendations for Globally Unique, Location-Independent, Persistent Identifiers)

Page 5

Life Sciences - Bioinformatics

LSID = Life Science Identifier

“The World-Wide Web provides a globally distributed communication framework that is essential for almost all scientific collaboration, including bioinformatics. However, several limits and inadequacies have become apparent, one of which is the inability to programmatically identify locally named objects that may be widely distributed over the network. This shortcoming limits our ability to integrate multiple knowledgebases, each of which gives partial information of a shared domain, as is commonly seen in Bioinformatics.” (Clark, T., Martin, S., Liefeld, T., 2004: Globally distributed object identification for biological knowledgebases. Briefings in Bioinformatics, Vol. 5 (1), 59-70.)

URN:LSID:ncbi.nlm.nih.gov:GenBank.accession:NT_001063:2
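An LSID packs its authority, namespace, object identifier, and optional revision into one colon-separated string, so software can pull the parts apart without a lookup. A minimal Python sketch, assuming the URN:LSID:authority:namespace:object[:revision] layout, parses the GenBank example above:

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split URN:LSID:<authority>:<namespace>:<object>[:<revision>] into its parts."""
    parts = text.strip().split(":")
    # Expect 5 fields (no revision) or 6 fields (with revision); the prefix is case-insensitive.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if any(not p for p in parts):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


lsid = parse_lsid("URN:LSID:ncbi.nlm.nih.gov:GenBank.accession:NT_001063:2")
print(lsid)
# prints: LSID(authority='ncbi.nlm.nih.gov', namespace='GenBank.accession', object_id='NT_001063', revision='2')
```

The same parser rejects identifiers from other schemes, such as the DOI shown earlier, because they do not start with the URN:LSID prefix.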

Page 6

Geosciences - Geoinformatics

Kai Lin (SDSC): “Ontology Based Resource Registration and Integration in GEON”, lecture, July 2005

Page 7

Examples from the PetDB Database

Name | Location | Publication | Cruise
D3-1 | SEIR | ANDERSON, 1980 | VM3301 (Vema)
D3-1 | North Fiji Basin | EISSEN 1994 | Starmer 1 (Nadir)
D3-1 | Shimada Smt | GRAHAM 1988 | S1-79 (Sea Sounder)
D3-1 | Gorda Ridge | CLAGUE 1984 | KK2-83NP (Kana Keoki)
3-1 | Lamont Smts | BATIZA 1982 | RISE III (New Horizon)

Sample names are duplicated.

Sample names are modified or changed.
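Because names repeat, a sample name alone cannot serve as a database key. A short Python sketch over the rows of the PetDB table above (name, location, publication) groups records by name and reports names shared by samples from different locations:

```python
from collections import defaultdict

# Rows transcribed from the PetDB example table: (name, location, publication).
records = [
    ("D3-1", "SEIR", "ANDERSON, 1980"),
    ("D3-1", "North Fiji Basin", "EISSEN 1994"),
    ("D3-1", "Shimada Smt", "GRAHAM 1988"),
    ("D3-1", "Gorda Ridge", "CLAGUE 1984"),
    ("3-1", "Lamont Smts", "BATIZA 1982"),
]


def find_ambiguous_names(rows):
    """Map each name used at more than one location to the sorted list of those locations."""
    locations = defaultdict(set)
    for name, location, _publication in rows:
        locations[name].add(location)
    return {name: sorted(locs) for name, locs in locations.items() if len(locs) > 1}


print(find_ambiguous_names(records))
# prints: {'D3-1': ['Gorda Ridge', 'North Fiji Basin', 'SEIR', 'Shimada Smt']}
```

A check like this can flag the collision, but it cannot tell which "D3-1" a given measurement belongs to; that requires an identifier that is unique from the start.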

Dredge sample 3, Amphitrite Cruise 1963/4:
D3 (Engel 1964)
D-3 (Scheidegger 1981; Schilling 1971)
PD3 (Tatsumoto 1965, 1966)
PD-3 (Hedge 1970; Muehlenbach 1972)
PV D-3 (Engel 1965)
AMPH3D (Pineau 1976)
AMPH-D3 (MacDougall 1986)
AMPH D-3 (Sun 1980; Schilling 1975)
AMPH 3-PD-3 (Hart 1971)
S-10 (Subbarao 1972)

46396B 22 3,28-38 (Dungan 1978)
396B 22 3,28-38 (Muehlenbach 1979)
249 (Dungan 1978)
DSDP046-0396B-022-003/28-38 (PetDB)

DSDP Leg 46, Hole 396B, Core 22, Section 3, 28-38 cm
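The PetDB form DSDP046-0396B-022-003/28-38 differs from the other names in using fixed-width, zero-padded fields, so it can be parsed mechanically. The Python sketch below assumes a pattern inferred from this single example together with the usual DSDP leg, hole, core, section, and interval (cm) convention; it is illustrative and not PetDB's own specification.

```python
import re
from typing import NamedTuple

# Hypothetical pattern inferred from "DSDP046-0396B-022-003/28-38":
# program, 3-digit leg, 4-digit site + hole letter, core, section, top-bottom interval in cm.
PETDB_CORE_NAME = re.compile(
    r"(?P<program>[A-Z]+)(?P<leg>\d{3})-(?P<site>\d{4})(?P<hole>[A-Z]?)"
    r"-(?P<core>\d{3})-(?P<section>\d{3})/(?P<top>\d+)-(?P<bottom>\d+)"
)


class CoreSample(NamedTuple):
    program: str
    leg: int
    hole: str
    core: int
    section: int
    top_cm: int
    bottom_cm: int


def parse_petdb_name(name: str) -> CoreSample:
    """Parse a fixed-width core-sample name or raise ValueError."""
    m = PETDB_CORE_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized sample name: {name!r}")
    return CoreSample(
        program=m["program"],
        leg=int(m["leg"]),
        hole=str(int(m["site"])) + m["hole"],  # "0396" + "B" -> "396B"
        core=int(m["core"]),
        section=int(m["section"]),
        top_cm=int(m["top"]),
        bottom_cm=int(m["bottom"]),
    )


s = parse_petdb_name("DSDP046-0396B-022-003/28-38")
print(f"{s.program} Leg {s.leg}, Hole {s.hole}, Core {s.core}, Section {s.section}, {s.top_cm}-{s.bottom_cm} cm")
# prints: DSDP Leg 46, Hole 396B, Core 22, Section 3, 28-38 cm
```

Free-form variants such as "249" or "46396B 22 3,28-38" carry no such structure, which is why they cannot be matched reliably by software.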

Sample Naming in the Geosciences

Page 8

Geosciences - Geoinformatics

Integration of data in a distributed system requires unique identification of samples.

Currently, naming of samples is ambiguous:
- Different samples have identical names.
- Samples are renamed.
- Metadata that allow unique identification are often missing for terrestrial samples.
- Institutions have their own naming protocols, with no assurance that names are unique on a global scale.

Access to information about the samples is needed to ensure proper evaluation and facilitate interpretation of sample-based data.

Links to physical specimens are needed:
- to make observations & measurements, and the science derived from them, reproducible;
- to allow discovery & re-use of samples for improved use of existing collections.

Page 9

Urgency to Act

Growing number of data systems with sample-based data

Growing demand for ‘fine-grained’ access to data at the level of individual samples

New technologies for linking and integrating data (interoperability)

Increasing need to share samples

Page 10

Generating Unique IDs: Options

“Registration-based schemes”
- Require a central clearinghouse
- Register personal or institutional names
- Register prefix or namespace (e.g. URN)
- Register metadata that allow the central clearinghouse to generate identifiers

Schemes without registration
- Use a computational process (naming protocol) to produce an ID based on metadata
- No central authority

Page 11

No-Registration Scheme
- Risk of incorrect application of naming protocol
- Risk of name duplication
- Identifier might grow to impracticable length to ensure uniqueness
- Metadata missing for legacy samples
- Easy implementation
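To make these risks concrete, here is a minimal Python sketch of a hypothetical no-registration scheme in which an ID is computed by hashing sample metadata. The protocol, its inputs, and the use of SHA-256 are assumptions for illustration, not a scheme proposed in this presentation; the 3.6 million figure is borrowed from the SESAR status slide only as a realistic scale.

```python
import hashlib
import math


def protocol_id(*metadata: str, hex_digits: int = 8) -> str:
    """Hypothetical naming protocol: hash the joined metadata, keep the first hex_digits characters."""
    canonical = "|".join(metadata)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:hex_digits].upper()


def collision_probability(n_ids: int, hex_digits: int) -> float:
    """Birthday-bound estimate that at least two of n_ids truncated hashes coincide."""
    space = 16 ** hex_digits
    return 1.0 - math.exp(-n_ids * (n_ids - 1) / (2 * space))


# Two people describe dredge sample 3 of the Amphitrite cruise with slightly different metadata.
a = protocol_id("Amphitrite 1963/4", "D3")
b = protocol_id("AMPH", "D-3")
print(a == b)  # False: one physical sample, two "unique" IDs

# Short IDs fit labels but collide; longer IDs are safer but unwieldy.
print(collision_probability(3_600_000, 8))   # effectively 1.0
print(collision_probability(3_600_000, 16))  # roughly 3.5e-07
```

With 8 hex characters a collision among 3.6 million IDs is practically certain, while 16 characters, nearly twice the length of a 9-character IGSN, bring the estimate down to the order of 10^-7.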

Page 12

SESAR - A Centralized Approach
- Response to urgent need for unique ID
- Easier to prevent duplicate registrations
- Easier to ensure links between parent and child samples
- Provide a central access point for Peer2Peer registration
- Facilitate international collaboration
- Build a Global Sample Catalog
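The advantages claimed for a central clearinghouse, preventing duplicate registrations and keeping parent-child links consistent, come from a single authority seeing every registration. A toy in-memory Python sketch of that idea follows; the class, the user code "XYZ", the choice of (user code, local name) as the duplicate criterion, and the sequential six-digit numbering are hypothetical simplifications (the IGSN slide describes the local part as random characters), and a real service would use a persistent database.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Registry:
    """Toy central clearinghouse: one authority issues and records every identifier."""
    records: Dict[str, dict] = field(default_factory=dict)
    # Index from (user code, local sample name) to the identifier already issued for it.
    by_local_name: Dict[Tuple[str, str], str] = field(default_factory=dict)
    counters: Dict[str, int] = field(default_factory=dict)

    def register(self, user_code: str, local_name: str, parent: Optional[str] = None) -> str:
        key = (user_code, local_name)
        if key in self.by_local_name:
            raise ValueError(f"duplicate registration of {local_name!r} by {user_code}")
        if parent is not None and parent not in self.records:
            raise ValueError(f"unknown parent {parent!r}: register the parent first")
        n = self.counters.get(user_code, 0) + 1  # only reached when all checks pass
        self.counters[user_code] = n
        ident = f"{user_code}{n:06d}"
        self.records[ident] = {"local_name": local_name, "parent": parent}
        self.by_local_name[key] = ident
        return ident

    def children(self, ident: str) -> Set[str]:
        return {i for i, r in self.records.items() if r["parent"] == ident}


reg = Registry()
dredge = reg.register("XYZ", "AMPH D3")
rock = reg.register("XYZ", "AMPH D3-1", parent=dredge)
print(dredge, rock)           # XYZ000001 XYZ000002
print(reg.children(dredge))   # {'XYZ000002'}
```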

Page 13

SESAR – A Centralized Approach
- Proposed to NSF in July 2004
- SGER (EAR) award received for September 2004 - August 2005
- First presented to community at Marine Curators’ Meeting at LDEO, September 2004
- Supplement received in Sept 2005 until May 2006
- Workshop at SDSC, January 2005
- Proposal to NSF, August 2005
- Three-year grant awarded in April 2006 (NSF-OCE)

Page 14

IGSN:SIO001324 (“SIO” is the unique user code; “001324” is the string of random characters)

International GeoSample Number: A Global Unique Identifier for Earth Samples
- Managed at a central clearinghouse (SESAR)
- Strict syntax (9 characters: letters [A-Z] & numbers [0-9])
- Fits sample labels
- Fits data tables in publications
- Allows 2,176,782,336 sample identifiers per registrant
- Generated by SESAR or by users
- Does not replace personal or institutional names
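The capacity figure follows from the syntax: 2,176,782,336 = 36^6, i.e. six positions each drawn from the 36 symbols [A-Z0-9], leaving three of the nine characters for the user code, as in IGSN:SIO001324. The Python sketch below assumes exactly that 3 + 6 split and an optional "IGSN:" prefix; it checks the syntax, splits off the user code, and draws a random local part. It illustrates the stated rules and is not SESAR's own validation code.

```python
import re
import secrets
import string
from typing import Tuple

ALPHABET = string.ascii_uppercase + string.digits  # letters [A-Z] and numbers [0-9]: 36 symbols
# Assumption: 3-character user code + 6-character local part, as in IGSN:SIO001324.
IGSN_RE = re.compile(r"(?:IGSN:)?([A-Z0-9]{3})([A-Z0-9]{6})")


def split_igsn(text: str) -> Tuple[str, str]:
    """Return (user_code, local_part) or raise ValueError if the syntax is wrong."""
    match = IGSN_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid 9-character IGSN: {text!r}")
    return match.group(1), match.group(2)


def random_local_part(length: int = 6) -> str:
    """Draw a random local part; uniqueness must still be checked by the clearinghouse."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


capacity = len(ALPHABET) ** 6
print(split_igsn("IGSN:SIO001324"))  # ('SIO', '001324')
print(f"{capacity:,}")               # 2,176,782,336
```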

Page 15

Benefits of the IGSN & SESAR

Ability to unambiguously identify samples:
- allows linking & integrating data for a single sample
- advances interoperability among digital data management systems & the development of Geoinformatics
- helps build more comprehensive data sets for samples
- fosters new cross-disciplinary approaches in science
- aids preservation and curation; orphaned samples can be identified
- ensures proper linking of data from samples and sub-samples
- facilitates sharing of samples
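The first benefit, linking data for a single sample across systems, amounts to a join on the IGSN. A minimal Python sketch, assuming two hypothetical data systems ("system A", "system B"), placeholder observations, a made-up sub-sample IGSN (SIO00A7KQ), and a parent lookup table, groups rows by IGSN and then gathers a sample's records together with those of its sub-samples:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[str, str, str]  # (igsn, system, observation)


def integrate(*sources: Iterable[Row]) -> Dict[str, List[Tuple[str, str]]]:
    """Group rows from several data systems under the IGSN they share."""
    merged: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for rows in sources:
        for igsn, system, observation in rows:
            merged[igsn].append((system, observation))
    return dict(merged)


def sample_profile(igsn: str, merged: Dict[str, List[Tuple[str, str]]],
                   parent_of: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Observations for a sample plus those of all its sub-samples (recursively)."""
    found = list(merged.get(igsn, []))
    for child, parent in parent_of.items():
        if parent == igsn:
            found.extend(sample_profile(child, merged, parent_of))
    return found


# Hypothetical placeholder rows: two systems, one parent sample and one sub-sample.
system_a = [("SIO001324", "system A", "bulk-rock analysis")]
system_b = [("SIO001324", "system B", "sample description"),
            ("SIO00A7KQ", "system B", "glass analysis")]
parent_of = {"SIO001324": None, "SIO00A7KQ": "SIO001324"}

merged = integrate(system_a, system_b)
print(sample_profile("SIO001324", merged, parent_of))
```

Without a shared identifier, the same join would have to rely on sample names, which the PetDB examples show to be ambiguous.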

Page 16

SESAR: Status
- Basic version of system functional since Fall 2004
- Nearly 3.6 million GeoObjects registered:
  - All DSDP/ODP GeoObjects (holes, cores, core sections, core samples)
  - Dredge and core collections from Scripps, WHOI, Lamont, Antarctic Research Facility (ARF)
  - >40,000 mineral specimens from Harvard Museum
  - Rocks & minerals from the US Polar Rock Repository
- IGSN implemented in Geoscience data systems (e.g. EarthChem, MetPetDB, PaleoStrat, CoreWall)
- Revised & extended version to be released in phases by end of 2007

Page 17

SESAR: Sample Registration

Obtain account via website:
- Set up login/password
- Get a unique user code

Submit sample information:
- Via Batch Registration Forms (.xls workbooks)
- Via web site (currently off-line for upgrade)
- Via web services (under development)

Page 18

Registration via Spreadsheet Forms

Available Batch Registration Forms:
1. Coring GeoObjects
2. Dredges/trawl/grabs
3. Individual samples
4. Sections, Suites, & Sequences

Page 19

Registration via Web Site: Currently off-line for upgrades

Page 20

Registration via Web Services: Under Development

Registration of objects via collaborating data systems
- Automatically register samples when sample metadata are entered into collaborating data systems (e.g. IODP, MGDS)
- Eliminates redundant metadata submission

Systems communicate via web services
- Starting with REST-based services; could support SOAP in future

Authentication
- Investigating different technologies, including GEON/GAMA

Metadata exchange and validation
- XML schema
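Metadata exchange over REST with XML could look roughly like the Python sketch below, which uses only the standard library. The element names, the required-field list, the coordinates, and the endpoint URL are hypothetical placeholders (no actual SESAR schema or API is implied); the presence check merely stands in for real XML schema validation, and the request is built but never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical element names; this is not the SESAR XML schema.
REQUIRED_PATHS = ("igsn", "name", "sampleType", "location/latitude", "location/longitude")


def sample_xml(igsn: str, name: str, sample_type: str,
               lat: float, lon: float, parent: Optional[str] = None) -> bytes:
    """Serialize one sample's metadata as a small XML document."""
    root = ET.Element("sample")
    ET.SubElement(root, "igsn").text = igsn
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "sampleType").text = sample_type
    location = ET.SubElement(root, "location")
    ET.SubElement(location, "latitude").text = f"{lat:.5f}"
    ET.SubElement(location, "longitude").text = f"{lon:.5f}"
    if parent is not None:
        ET.SubElement(root, "parentIgsn").text = parent
    return ET.tostring(root, encoding="utf-8")


def missing_fields(xml_bytes: bytes) -> List[str]:
    """Stand-in for schema validation: report required elements that are absent or empty."""
    root = ET.fromstring(xml_bytes)
    missing = []
    for path in REQUIRED_PATHS:
        element = root.find(path)
        if element is None or not (element.text or "").strip():
            missing.append(path)
    return missing


def registration_request(endpoint: str, payload: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a REST-style POST; the endpoint is a placeholder."""
    return urllib.request.Request(endpoint, data=payload, method="POST",
                                  headers={"Content-Type": "application/xml"})


payload = sample_xml("SIO001324", "D3-1", "Rock", lat=0.0, lon=0.0)  # made-up coordinates
print(missing_fields(payload))  # []
request = registration_request("https://example.org/hypothetical/register", payload)
print(request.get_method())     # POST
```

A collaborating data system would call something like this when its users enter sample metadata, which is how redundant manual submission is avoided.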

Page 21

SESAR Service “MyGeoSamples”

Current Services:
- Long-term preservation of information about samples
- Lists of personal sample collections
- Store images, field notes, etc.

Assist investigators in managing their samples.

Page 22

SESAR Service “MyGeoSamples”

Services “Under Construction”:
- Search & sort personal sample collections
- Create maps of sample locations
- Establish links to data (publications, data systems)
- Download tabular sample information to spreadsheets

Antarctic Research Facility, FSU: ca. 7,000 cores
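The "search & sort" and "download to spreadsheets" services can be pictured with a short Python sketch. The collection, its field names, and the user code "XYZ" are hypothetical, and CSV is used here as a spreadsheet-friendly stand-in for whatever export format the service actually provides.

```python
import csv
import io
from typing import Dict, Iterable, List

Sample = Dict[str, str]


def search(samples: Iterable[Sample], **criteria: str) -> List[Sample]:
    """Keep samples whose fields equal every given criterion."""
    return [s for s in samples if all(s.get(k) == v for k, v in criteria.items())]


def to_spreadsheet_csv(samples: Iterable[Sample], columns: List[str]) -> str:
    """Write the selected columns as CSV text, which spreadsheet programs can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(samples)
    return buffer.getvalue()


# Hypothetical personal collection (placeholder identifiers and field names).
my_samples = [
    {"igsn": "XYZ000003", "name": "D3-2", "type": "Rock", "notes": "thin section made"},
    {"igsn": "XYZ000001", "name": "D3", "type": "Dredge", "notes": ""},
    {"igsn": "XYZ000002", "name": "D3-1", "type": "Rock", "notes": ""},
]

rocks = sorted(search(my_samples, type="Rock"), key=lambda s: s["name"])
print(to_spreadsheet_csv(rocks, ["igsn", "name"]))
```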

Page 23

SESAR Service “MyGeoSamples”

Potential Services:
- Modules to manage administrative metadata (customizable)
- Modules for creating & operating web interfaces to collections

Advantages:
- No IT infrastructure required (except a computer and an internet connection)
- No maintenance, and no risk & contingency management
- Access from anywhere by authorized individuals
- Platform independent

Extended Services for Sample Curation?

Page 24

The SESAR Global Sample Catalog

SESAR integrates the world’s sample collections.

Allows users to find/discover existing samples.

Provides access to “sample profiles”:
- View sample information in SESAR as provided
- Link to the specimen’s ‘home’ (archive)
- Link to data (publications, databases)

Page 25

The Challenges

Diversity of collections
- Repositories, museums, individual investigators, structured science & field programs
- Metadata requirements
- Sample types & relations
- Vocabularies

Global scope
- Data generated by international collaborations (IODP, ICDP, InterMARGINS, InterRidge)
- Data are shared globally (scientific literature, web-based repositories)
- Samples are shared globally

Multiple systems and catalogs
- Data management systems for science programs (Ridge2000 - MGDS, MARGINS - MGDS, IODP)
- Domain-specific catalogs (NGDC - IMLGS)
- National catalogs (Canadian National Sample Management System, SESAR)

Issues
- Redundancies
- Unacceptable demands on investigators
- Inconsistencies
- Fragmentation
- Competition rather than collaboration

Adoption
- Sample curation
- Data publication

Page 26

IGSN Implementation Strategies

Work with investigators, curators and repositories to define & integrate the registration process and IGSN into existing sample and data management workflows
- Joint Workshop of SESAR & NGDC, February 26 & 27, 2007, Boulder, CO
- Registration of repository and museum collections ongoing

Advance adoption of IGSN
- Work with editors to make IGSN a requirement for data publication (e.g. Editors’ Round Table, Societies)
- Work with funding agencies, large science programs (e.g. IODP, MARGINS, ANDRILL), CI projects (e.g. GEON, CHRONOS), and repositories on sample and data archiving policies

Work with CI partners on system design & interoperability
- Interoperability Workshop, January 2005 at SDSC
- Working with GEON on authentication scheme
- Working with IODP and KU/EarthChem on web services

Page 27

Editors’ Breakout*

- Reporting Data:
  - Published paper is point of record. All data should be reported. No “representative data”, no “data can be obtained from author”, no data available at personal websites.
  - Submission to databases should be strongly encouraged.
- Unique sample identifier (IGSN):
  - This may solve the problem of poor sample metadata.
  - This system is being implemented.
  - Essential component of a successful database: contains sample metadata, allows samples to be followed through their analytical history.
  - Tracks samples and subsamples.
  - We should start using it now.

*at the GERM Meeting, May 2006, recommendations of Editors’ Breakout presented by Steve Goldstein

Page 28

Support by Funding Agencies

“We have also funded an effort (SESAR) to uniquely identify all samples so that various analyses on the same samples can be cross referenced and listed. I would also like you to indicate in your dissemination plan that your suite of samples will be registered with SESAR.”

Letter of NSF Program Manager (OCE/MG&G) to a PI, processing paperwork for a grant (January 2007)

Page 29

Kerstin Lehnert: The Digital Specimen

“Government, educational, and private sector organizations, individually as well as collectively, are encouraged to aggressively address the following Geoscience data-preservation challenges:

identifying, organizing, documenting, and cataloging existing data collections, preferably in a digital format;

constructing logical linkages and search engines that facilitate access to organizations and their geoscience sample and data collections;

dedicating adequate space — physical and digital — for storing and efficient accessing of existing and future samples and data sets;”

Page 30

Joint Workshop of SESAR & NGDC IMLGS, Boulder, CO, February 26 & 27, 2007

- Define procedures & best practices for creating & assigning IGSNs and for submitting metadata for GeoObjects to SESAR
- Work towards an integrated system of sample catalogs
- Recommend ways to define & implement standards for metadata and vocabularies
- Identify possibilities for streamlining procedures for submission of sample metadata to catalogs

Page 31

Workshop Recommendations

Streamlined Registration Process
- Registration process should be simple
- Options to integrate easily into existing sample and data management workflows
- Ability to adopt required metadata from existing forms in use, to avoid redundant metadata submission to multiple systems
- Support automated registration from other systems via web services to avoid manual/redundant metadata submission

Best Practices
- Objects should receive an IGSN at the time of labeling
- Objects should have an IGSN before being distributed among multiple investigators and users
- Parent objects should be registered before child objects
- Metadata should include geospatial info (coordinates preferred)
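Within one batch of new objects, "parent objects should be registered before child objects" is a topological-ordering problem. A minimal Python sketch, assuming a hypothetical batch that maps each local sample name to its parent's name (or to None, or to a parent registered earlier outside the batch), returns a registration order in which every parent precedes its children and rejects circular relations:

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional


def registration_order(parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Breadth-first order from the top-level objects, so parents always come first."""
    children: Dict[str, List[str]] = defaultdict(list)
    ready = deque()
    for name, parent in parent_of.items():
        if parent in parent_of:
            children[parent].append(name)  # parent is part of this batch
        else:
            ready.append(name)             # top-level, or parent registered earlier
    order: List[str] = []
    while ready:
        name = ready.popleft()
        order.append(name)
        ready.extend(children[name])
    if len(order) != len(parent_of):
        raise ValueError("circular parent-child relation in batch")
    return order


# Hypothetical batch: a dredge, two rocks from it, and a split of one rock.
batch = {"D3-1a": "D3-1", "D3-1": "D3", "D3": None, "D3-2": "D3"}
print(registration_order(batch))  # ['D3', 'D3-1', 'D3-2', 'D3-1a']
```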

Page 32

Workshop Recommendations

Batch Registration Forms

It is preferred that the forms for MGDS, IMLGS, and SESAR have the same column headers, with the metadata listed under each header clearly defined. The order of the headers can vary.
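Matching columns by header name rather than by position is what lets the order of headers vary between forms. A minimal Python sketch, assuming hypothetical header names, made-up values, and CSV exports of the forms, reads two forms with differently ordered (and extra) columns into identical records:

```python
import csv
import io
from typing import Dict, List

# Hypothetical shared header names; the actual agreed headers are not given here.
REQUIRED_COLUMNS = ("igsn", "sample_name", "latitude", "longitude")


def read_batch_form(text: str) -> List[Dict[str, str]]:
    """Read rows by header name, so differently ordered forms give the same result."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"form lacks required columns: {missing}")
    # Extra columns are ignored; short rows yield empty strings instead of None.
    return [{c: (row[c] or "").strip() for c in REQUIRED_COLUMNS} for row in reader]


form_a = "igsn,sample_name,latitude,longitude\nXYZ000001,D3,0.0,0.0\n"
form_b = "sample_name,longitude,igsn,latitude,extra_notes\nD3,0.0,XYZ000001,0.0,kept locally\n"
print(read_batch_form(form_a) == read_batch_form(form_b))  # True
```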

An XML schema for sample metadata should be developed to which the metadata in any spreadsheet can be exported.

SESAR Batch Registration Forms should be customizable, e.g. buttons beneath the header should allow users to hide unnecessary columns. Columns for metadata that are identified as ‘recommended’ should always be visible.

SESAR should develop a manual for filling out the forms. The manual should include instructions regarding definition of parent – child relations. It needs to be decided if a site should get an IGSN. It is possible to link multiple stations taken at one site by including the site name as metadata.

Vocabularies and Classification Schemes
- Adopt from existing standards as much as possible and work with repositories and other systems to use common schemes
- It is preferable for different systems (MGDS, IMLGS, SESAR) to allow multiple vocabularies
- List allowed vocabularies on the Marine Metadata Initiative (MMI) web site

Page 33

Registration Procedures to Support Integration with Existing Workflows: Under Implementation

Trusted Agents

A registrant can apply to become a Trusted Agent. Trusted Agents are authorized to generate unique IGSNs within their registered name space (user code). They can use tools, e.g. Excel, on the ship or in the field to generate IGSNs within their given name space, have the samples labeled with IGSNs, and submit the IGSNs along with metadata via web services within a short time frame. Trusted Agents must sign an MOU outlining the policies and procedures for handling IGSNs.

Example IODP: Name Space “DR0”, “DR1”,…

Diagram (Trusted Agent operation among Ship/Field, Data System, and SESAR):
1. Generate label with IGSN
2. Ingest IGSN & metadata
3. Submit metadata & IGSN to SESAR (web services)
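The Trusted Agent step of generating IGSNs offline within a registered name space can be sketched in Python as follows. The name space "DR0" comes from the IODP example above; the class, the sample names, and the random 6-character suffix (following the "string of random characters" on the IGSN slide) are assumptions. The issuer only avoids collisions with IGSNs it knows about, so its record of already-issued IGSNs would need to be kept in step with SESAR.

```python
import re
import secrets
import string
from typing import Dict, Iterable, Set

ALPHABET = string.ascii_uppercase + string.digits  # [A-Z0-9]


class TrustedAgentIssuer:
    """Hypothetical offline IGSN issuer confined to one registered 3-character name space."""

    def __init__(self, namespace: str, already_issued: Iterable[str] = ()):
        if re.fullmatch(r"[A-Z0-9]{3}", namespace) is None:
            raise ValueError(f"bad name space: {namespace!r}")
        self.namespace = namespace
        self.issued: Set[str] = set(already_issued)  # should mirror what SESAR already holds

    def next_igsn(self) -> str:
        """Draw random 6-character suffixes until one is unused in this name space."""
        while True:
            candidate = self.namespace + "".join(secrets.choice(ALPHABET) for _ in range(6))
            if candidate not in self.issued:
                self.issued.add(candidate)
                return candidate

    def label_batch(self, sample_names: Iterable[str]) -> Dict[str, str]:
        """Assign an IGSN to each sample name, e.g. for printing labels on the ship."""
        return {name: self.next_igsn() for name in sample_names}


issuer = TrustedAgentIssuer("DR0")
labels = issuer.label_batch(["core 1 section 1", "core 1 section 2"])
# The resulting IGSNs and their metadata would then be submitted to SESAR via web services.
```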

Page 34

Registration Procedures to Support Integration with Existing Workflows: Under Implementation

Pre-Assigned IGSNs

Upon request, SESAR provides forms (spreadsheets) with pre-assigned IGSNs to chief scientists/investigators/repositories to take on ship or into the field. Forms filled with metadata should be submitted to SESAR post-collection. E.g.: SCRIPPS.

Other systems or repositories pre-populate their existing forms with IGSNs obtained from SESAR and provide them to chief scientists. E.g.: MGDS provides forms with IGSNs to PIs in advance of R2K and MARGINS cruises. Post-cruise, MGDS will submit the sample metadata to SESAR.

Diagram (pre-assigned IGSN workflows):
Directly with SESAR (Ship/Field): 1. Get forms with IGSN; 2. Enter metadata with IGSN; 3. Submit forms with metadata and IGSN.
Via a data system (Data System, Ship/Field): 1. Get IGSN; 2. Forms with IGSN; 3. Enter metadata with IGSN; 4. Forms with metadata and IGSN; 5. Submit metadata & IGSN to SESAR (web services).
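After collection, the IGSNs on returned forms have to be checked against those that were pre-assigned. A small Python sketch of that reconciliation, with hypothetical IGSNs under the placeholder user code "XYZ", flags IGSNs that were never issued, IGSNs used on more than one row, and pre-assigned IGSNs left unused:

```python
from collections import Counter
from typing import Dict, Iterable, List


def reconcile(pre_assigned: Iterable[str], returned_rows: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Compare IGSNs handed out before collection with the filled-in forms returned afterwards."""
    issued = set(pre_assigned)
    counts = Counter(row["igsn"] for row in returned_rows)
    return {
        "unknown": sorted(i for i in counts if i not in issued),      # never pre-assigned
        "duplicated": sorted(i for i, n in counts.items() if n > 1),  # same IGSN on two rows
        "unused": sorted(issued - set(counts)),                       # still available
    }


pre = ["XYZ000001", "XYZ000002", "XYZ000003", "XYZ000004"]
rows = [{"igsn": "XYZ000001"}, {"igsn": "XYZ000002"}, {"igsn": "XYZ000002"}, {"igsn": "XYZ000009"}]
print(reconcile(pre, rows))
# prints: {'unknown': ['XYZ000009'], 'duplicated': ['XYZ000002'], 'unused': ['XYZ000003', 'XYZ000004']}
```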

Page 35

Collaboration with Repositories & Systems: Ongoing

IODP
- Registered DSDP/ODP holes, cores, core sections, core samples
- “Trusted Agent” arrangement in progress

MGDS
- Registered existing dredges, cores, and core samples
- Incorporating IGSN into existing MGDS forms

LDEO (Lamont)
- Registered existing dredge and core collections

WHOI
- Registering existing dredge and core collections
- Future arrangements like “Trusted Agent” to be discussed

SIO (SCRIPPS)
- Used SESAR forms with pre-assigned IGSNs on cruise for dredge collections
- Metadata need to be updated

Page 36

Collaboration with Repositories & Systems: Ongoing

Antarctic Research Facility (ARF)
- Registering existing dredge and core collections

US Polar Rock Repository
- Registered existing rocks and minerals
- Need pre-assigned IGSNs and web service registration

Harvard Museum
- Registered existing mineral specimens
- Project for adding simple sample curation module in progress

OSU
- Start with IGSN for historic samples
- Then become trusted agent and issue IGSNs to new samples, including those given to PIs

NGDC
- May register some orphaned historical samples
- Work with curators/repository and SESAR to streamline and standardize metadata fields and entry forms

Page 37

Collaboration with Repositories & Systems: Ongoing

Canadian National Marine Geoscience Collections
- Likely to register existing collections
- May become “Trusted Agent” in future

Limnological Research Center (LRC/LacCore)
- Likely to register via batch registration forms
- May use pre-assigned IGSNs or become “Trusted Agent” in future

USGS
- Discussions are ongoing with USGS to make them aware of the SESAR effort
- Plan to contact state geological surveys

Other Repositories
- Efforts are under way to reach out and propose a suitable process
- OSU model may be most applicable (first register legacy samples, then become trusted agent or use pre-assigned IGSNs)
- Could offer sample curation module for small operations