MoonDB: Restoration & Synthesis of Planetary Geochemical Data
www.iedadata.org
LPSC 2015 Workshop: Restoration and Synthesis of Planetary Geochemical Data
Agenda
• Introduction
• Overview of the MoonDB Project
• Overview of relevant EarthChem Systems & Services
  • Data Publication & the EarthChem Library
  • Data Rescue
  • Data Synthesis - PetDB
• Discussion, questions
  • What data do you have?
  • What help do you need?
  • How should we stay in touch?
MoonDB’s Goals
• advance preservation, access and utility of lunar sample data
• help investigators ‘rescue’ (restore) and share unpublished data
• compile data from the literature into a PetDB-type synthesis
• provide a platform for future data to be made openly accessible, while being seamlessly integrated with the historical data and equivalent data for terrestrial samples.
MoonDB’s Deliverables
• Development of MoonDB
• compile lunar sample data from the literature into an online searchable synthesis database
• Rescue of Lunar Sample Data at Risk
• rescue unpublished legacy data and metadata that are in danger of being lost to complement the published datasets
• MoonDB Reference Catalog
• consolidate the various reference databases
PetDB
IEDA Data Rescue
EarthChem Library
AACO Databases
Timeline
Quarters: Q1 to Q8 (10/1/2015 to 9/30/2017)
Milestones:
• Reference Catalog prototype released
• MoonDB User Interface Beta version
• Ingest first submitted datasets
• Release of MoonDB Reference Catalog
• Release of MoonDB full version
Two Steps to Make Data Useful
1. Restore the data
2. Synthesize the data
Restoration of “Data at Risk”
• “Data at Risk” are scientific data that are not in formats that permit full electronic access to the information they contain.
• Data at Risk may be
  • non-digital (e.g., handwritten or photographic),
  • on near-obsolete digital media (such as floppy disks),
  • or insufficiently described (lacking metadata).
• Some born-digital data are considered "at risk" if they cannot be ingested into managed databases because they lack adequate formatting or metadata.
Definition from the ICSU CODATA Data at Risk Task Group (DARTG)
Data Rescue Initiatives
• ‘Data at Risk’ Task Group @ CODATA
  • Phase I: Build inventory of data that are at risk
  • Phase II: Design missions to rescue that information
• ‘Heritage Data’ Interest Group @ Research Data Alliance
• International Data Rescue Award in the Geosciences
  • Joint initiative of IEDA (Integrated Earth Data Applications) and Elsevier
  • First award in 2013
  • 2015 award to be announced at EGU 2015
• IEDA Data Rescue Mini-Awards
IEDA Data Rescue
Lessons Learned
• Investigator Lessons
  • Take ownership of your own legacy
  • Data curation by others may not be complete or correct
  • Data rescue of an entire career does not need to be overwhelming
    • Start with small steps
    • Disciplinary repositories will help and guide you to what is needed
  • Despite the time investment, data rescue is worth it
    • Others will now be able to re-use the data
    • Notes taken years ago actually explain anomalies
• Repository Lessons
  • For Long Tail Data, every project is different
  • A small incentive will motivate investigators
  • Data Rescue missions help the repository determine next steps for development of tools and services
EarthChem Data Systems
[Diagram: EarthChem Library, Synthesis DB’s, and EarthChem Portal, with data and metadata flows (.xls, XML), search interfaces, and the two stages Data Restoration and Data Synthesis]
EarthChem: Making Data Useful for Science
• Map of basalt samples from mid-ocean ridges
• Color scaled to the 87Sr/86Sr ratio measured on these samples
• Data from >300 references compiled within 2 minutes (Visualization with GeoMapApp: add another 2 minutes)
Synthesis Database: PetDB
as of 3/15/2015
PetDB: Impact in Science
Meyzen et al. (2007): “Isotopic portrayal of the Earth's upper mantle flow field.”
Gale, A; Dalton, C A; Langmuir, C H; Su, Y; Schilling, J-G (2013): “The mean composition of ocean ridge basalts”
As of 3/2015, PetDB has been cited in >550 published articles.
Data Search: Geospatial
Data Search: Geo-Feature
Data Search: Lithology
Data Search: Expedition/cruise
Data Search: Data Availability
Filter by Data Quality
EarthChem Library
• Data repository for geochemical data and related data types
• Operated as part of IEDA (Interdisciplinary Earth Data Alliance)
  • Sustainable funding through a Cooperative Agreement with NSF
  • Community governance & guidance
• Follows Leading Practices for data publication
  • Persistent identification of data & samples (DOI, IGSN)
  • Agreements with data centers for long-term archiving
• Easy data submission
• Release dates set by contributors (up to 2 years moratorium)
• Links between different versions of a dataset
• Cross-referencing with publishers, data citation index, etc.
• Links to awards (compliance!)
Data Provenance & Quality
Data Summary of a Sample
more data
Getting the data for the synthesis
• Many challenges
  • Data are dispersed throughout the literature.
  • Many data were never published.
  • Data are not sufficiently documented (inconsistent or missing metadata).
• Solutions
  • Data rescue
  • Data managers
  • Students/interns
Data Restoration Needs
• Digitization – transcribe from analog media into spreadsheets; help from students?
• Documentation – samples, provenance (lab, instrument, etc.), data quality; EarthChem data templates
• Standardization – MoonDB vocabularies & data templates
• Accessibility – data publication (ECL), links between systems
• Citability – DOIs, example citations
• Guidance/Training – calls and emails with disciplinary repository staff, regular communication (meetings, webinars)
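The documentation and standardization needs listed above can be illustrated with a simple completeness check on a transcribed spreadsheet row. This is only a minimal sketch: the column names (`sample_name`, `lab`, `instrument`, `method`, `reference`) and the example values are hypothetical and are not taken from the actual EarthChem or MoonDB data templates.

```python
# Hypothetical required metadata columns; the real EarthChem/MoonDB
# templates define their own fields.
REQUIRED_FIELDS = ("sample_name", "lab", "instrument", "method", "reference")


def missing_metadata(row: dict) -> list:
    """Return the required fields that are absent or blank in a row."""
    return [
        field for field in REQUIRED_FIELDS
        if not str(row.get(field) or "").strip()
    ]


# Example row with made-up values, for illustration only.
row = {
    "sample_name": "EXAMPLE-001",
    "lab": "Example Lab",
    "instrument": "",
    "method": "XRF",
}
print(missing_metadata(row))  # ['instrument', 'reference']
```

A check of this kind could be run by a student or data manager right after digitization, so that gaps in provenance are caught while the original notes are still at hand.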
Data Templates
IEDA Data Rescue Initiative
• preserve valuable legacy data sets that are in danger because of impending retirement or degradation
• augment data collections maintained by IEDA
• improve procedures and tools for user contributions
• 2013 & 2015 International Data Rescue Award in the Geosciences
• Town Hall at EGU General Assembly 2015
• IEDA Data Rescue Mini-Awards
• Data Rescue Process Study (collaboration with Elsevier Research Data Services)
IEDA Data Rescue Mini-awards
• $7,000 awards to support investigators for properly compiling, documenting, and transferring data that are in danger
• Open competition, announced across the Earth Sciences
  • Proposals evaluated by IEDA User Committee
  • Criteria: highest impact on future research based on quality, size, rarity, unique location or data type
  • Requirement: Data need to be made accessible to the community for re-use by inclusion in IEDA data collections (EarthChem, MGDS, SESAR)
EarthChem Library
• Mechanism for easy data submission
• Review of contributed data by data managers – QA/QC
• Data become citable – credit for contributors!
• Mechanism for making data persistently discoverable & accessible
• Long-term archiving at PDS
ECL Data Submission
Data Discovery & Access
Data Journals
Data Publication Example
The Future of Data in Scientific Articles
http://www.copdess.org
Commitment of Publishers
“Earth and space science data should, to the greatest extent possible, be stored in appropriate domain repositories that are widely recognized and used by the community, follow leading practices, and can provide additional data services.”
COPDESS Signatories
Publishers
• American Astronomical Society
• American Geophysical Union
• American Meteorological Society
• Center for Open Science
• Elsevier
• European Geosciences Union
• Geochemical Society
• ICSU World Data System
• John Wiley and Sons
• Meteoritical Society
• Mineralogical Society of America
• Nature Publishing Group
• Paleontological Society
• Proceedings of the National Academy of Sciences
• Science
Data Facilities
• BCO-DMO
• CLIVAR and Carbon Hydrographic Data Office (CCHDO)
• CINERGI
• CUAHSI
• Continental Scientific Drilling Coordination Office (CSDCO)
• Council of Data Facilities
• Geological Data Center of Scripps Institution of
• IRIS
• IEDA
• LacCore: National Lacustrine Core Facility
• Magnetics Information Consortium (MagIC)
• Neotoma Paleoecology Database
• National Snow and Ice Data Center
• OpenTopography
• Rolling Deck to Repository (R2R) Program
• UNAVCO
MoonDB: The Data Restoration Team
• Richard Carlson, Carnegie Institution of Washington
• Erik Hauri, Carnegie Institution of Washington
• Bradley L. Jolliff, Washington University in St. Louis
• Clive Neal, University of Notre Dame
• Marc Norman, Australian National University
• Larry Nyquist, NASA JSC
• Charles Shearer, University of New Mexico
• Chi-Yu Shih, NASA JSC
• Lawrence A. Taylor, University of Tennessee in Knoxville
• G. Jeffrey Taylor, University of Hawaii
• Paul Warren, University of California Los Angeles
Join!
Links to Literature
International Geo Sample Number
• A globally unique and persistent identifier for physical objects in the Earth Sciences that is guaranteed to be unique via a centralized control mechanism.
• Resolves to virtual sample representations (sample metadata profiles) managed at federated IGSN Allocating Agents.
The EarthChem Portal shows 75 publications with geochemical data referenced to a sample with the name M1 (or M-1). The map shows the locations of M1 samples. (www.earthchem.org)
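The M1 example shows why sample names alone cannot link data reliably. As a rough sketch, grouping publications by name merges physically different samples, while grouping by a unique identifier keeps them apart. All records and identifier values below are made up for illustration.

```python
from collections import defaultdict

# Made-up records for illustration: two physically different samples
# that both carry the field name "M1" (one written as "M-1").
records = [
    {"name": "M1", "igsn": "XXX000001", "reference": "Ref A"},
    {"name": "M-1", "igsn": "XXX000002", "reference": "Ref B"},
    {"name": "M1", "igsn": "XXX000001", "reference": "Ref C"},
]


def normalize(name: str) -> str:
    """Collapse spelling variants such as 'M-1' and 'M1'."""
    return name.replace("-", "").upper()


def group_by(records, key):
    """Collect the references of all records that share the same key."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec["reference"])
    return dict(groups)


by_name = group_by(records, lambda r: normalize(r["name"]))
by_igsn = group_by(records, lambda r: r["igsn"])

print(by_name)  # {'M1': ['Ref A', 'Ref B', 'Ref C']}
print(by_igsn)  # {'XXX000001': ['Ref A', 'Ref C'], 'XXX000002': ['Ref B']}
```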
IGSN
Unique, persistent and resolvable identifier for sampling features
Governed by an international non-profit organization IGSN eV
Objective: ensure proper citation of samples and persistent access to sample metadata
IGSN Metadata
• Identification
  • Sample name(s), registrant
• Description
  • Material, classification, age, size, comments
• Geospatial information
  • Geographical names, coordinates
• Collection
  • Expedition/cruise, platform, date, collector, technique
• Archiving/access
  • Physical location of sample (repository), contact
• Relationship to other (sub-)samples
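The metadata groups above could be represented in code roughly as follows. This is an illustrative sketch, not the actual IGSN or SESAR schema: the class name, the field names, and the identifier values are assumptions chosen to mirror the slide, and only a subset of the listed items is included.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SampleMetadata:
    """Illustrative sample profile; field names are assumptions."""
    # Identification
    sample_name: str
    registrant: str
    # Description
    material: Optional[str] = None
    classification: Optional[str] = None
    comments: Optional[str] = None
    # Geospatial information
    geographic_name: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    # Collection
    expedition: Optional[str] = None
    collector: Optional[str] = None
    collection_date: Optional[str] = None
    # Archiving/access
    repository: Optional[str] = None
    contact: Optional[str] = None
    # Relationship to other (sub-)samples
    parent_igsn: Optional[str] = None
    child_igsns: List[str] = field(default_factory=list)


parent = SampleMetadata(sample_name="EXAMPLE-CORE", registrant="A. Registrant")
split = SampleMetadata(
    sample_name="EXAMPLE-CORE-01",
    registrant="A. Registrant",
    parent_igsn="XXX000001",  # made-up identifier of the parent sample
)
parent.child_igsns.append("XXX000002")  # made-up identifier of the split
print(split.parent_igsn, parent.child_igsns)
```

Keeping the parent and child identifiers on each record is one simple way to express the relationship between a sample and its sub-samples.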
Sample Genealogy
SESAR Services: Links to Relevant Resources
• Images
• Documents (.pdf, .xls, .doc)
• Publications
• Data
Linking Samples, Data, & Publications
SESAR: System for Earth Sample Registration
• Registry for the International Geo Sample Number (IGSN)
• Catalog of sample metadata
  • Search for samples
  • Access to object metadata profiles
• Tools for sample registration and metadata management: MySESAR
  • User interface to submit sample metadata (registration)
  • User interface for metadata management (e.g., edit metadata, transfer ownership, etc.)
IGSN: Registration
[Diagram labels: Lab or field; Allocating Agent; IGSN e.V. Registry; Sample Label (IGSN:XYZ08H7JG); metadata: Sample Name, Location, Sample type, ….]
1. Submit metadata
2. Create IGSN, store metadata
3. Register IGSN
4. Confirm uniqueness
5. Send to user
6. Use IGSN
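The six registration steps can be sketched in code as follows. This is a toy model under stated assumptions, not the actual IGSN service: the classes `Registry` and `AllocatingAgent`, the assignment of steps to each party, and the counter-based identifier format are hypothetical and chosen only to make the sequence concrete.

```python
import itertools


class Registry:
    """Stand-in for the central registry: only tracks issued identifiers."""

    def __init__(self):
        self._registered = set()

    def register(self, igsn: str) -> bool:
        # Step 4: confirm uniqueness; refuse an identifier seen before.
        if igsn in self._registered:
            return False
        self._registered.add(igsn)
        return True


class AllocatingAgent:
    """Stand-in for an allocating agent that holds the sample metadata."""

    def __init__(self, prefix: str, registry: Registry):
        self.prefix = prefix
        self.registry = registry
        self.metadata = {}
        self._counter = itertools.count(1)

    def submit(self, metadata: dict) -> str:
        # Step 1: the user submits metadata (the `metadata` argument).
        # Step 2: create an identifier and store the metadata.
        igsn = f"{self.prefix}{next(self._counter):06d}"
        self.metadata[igsn] = metadata
        # Step 3: register the identifier with the central registry.
        if not self.registry.register(igsn):
            del self.metadata[igsn]
            raise ValueError(f"{igsn} is already registered")
        # Step 5: send the identifier back to the user.
        return igsn


registry = Registry()
agent = AllocatingAgent("XYZ", registry)
igsn = agent.submit({"name": "EXAMPLE-001", "sample_type": "rock"})
# Step 6: the user puts the identifier on the sample label.
print(f"IGSN:{igsn}")  # IGSN:XYZ000001
```

The point of the central check in step 4 is that several independent allocating agents can issue identifiers without ever producing the same one twice.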
Governance: IGSN e.V.
• Non-profit organization registered in Germany (“eingetragener Verein”) to operate an IGSN registration service with a distributed infrastructure for use by and benefit of its members
• Currently 14 members from US, Germany, Australia
• By-laws modeled after the DataCite Consortium
• Membership required for organizations that want to set up their own Allocating Agent.
• Membership NOT required to use IGSNs.