Rescue of Long-Tail Data from the Ocean Bottom to the Moon


Uploaded by hsuleslie on 11-Jul-2015


Page 1: Rescue of Long-Tail Data from the Ocean Bottom to the Moon

IEDA iedadata.org


IN12A. Data Curation, Credibility, Preservation Implementation, and Data Rescue to Enable Multi-source Science
Fall AGU 2013

Rescue of Long-Tail Data from the Ocean Bottom to the Moon

Leslie Hsu, Kerstin Lehnert, Suzanne Carbotte, Vicki Ferrini, John Delano¹, James B. Gill², Maurice Tivey³

Lamont-Doherty Earth Observatory, Columbia University; ¹University at Albany; ²University of California, Santa Cruz; ³Woods Hole Oceanographic Institution

Page 2

Data at Risk

- "Data at Risk" are scientific data that are not in formats that permit full electronic access to the information they contain.
- Data at Risk may be:
  - non-digital (e.g., handwritten or photographic),
  - on near-obsolete digital media (such as floppy disks), or
  - insufficiently described (lacking metadata).
- Some born-digital data are considered "at risk" if they cannot be ingested into managed databases because they lack adequate formatting or metadata.

Definition from the ICSU CODATA Data at Risk Task Group (DARTG)
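As a rough illustration, the DARTG criteria above can be read as a triage checklist for a repository's incoming holdings. The Python sketch below encodes them that way; the `DatasetRecord` fields and the set of near-obsolete media are hypothetical choices made for this example, not part of the CODATA definition or any IEDA tool.

```python
from dataclasses import dataclass

# Hypothetical triage record; these field names are illustrative only and are
# not part of any CODATA or IEDA schema.
@dataclass
class DatasetRecord:
    name: str
    is_digital: bool
    media: str          # storage medium, e.g. "floppy disk" or "network storage"
    has_metadata: bool  # is the dataset adequately described?
    ingestible: bool    # can a managed database accept it as-is?


# Assumed examples of near-obsolete media; adjust to local holdings.
NEAR_OBSOLETE_MEDIA = {"floppy disk", "9-track tape", "zip disk"}


def risk_reasons(rec: DatasetRecord) -> list[str]:
    """Return the reasons a dataset counts as 'at risk' (empty if none apply)."""
    reasons = []
    if not rec.is_digital:
        reasons.append("non-digital")
    elif rec.media in NEAR_OBSOLETE_MEDIA:
        reasons.append("near-obsolete media")
    if not rec.has_metadata:
        reasons.append("insufficiently described")
    # Born-digital data can still be at risk if no managed database can ingest them.
    if rec.is_digital and not rec.ingestible:
        reasons.append("not ingestible into a managed database")
    return reasons


notes = DatasetRecord("handwritten notes", False, "paper", False, False)
print(risk_reasons(notes))
```

For handwritten notes with no metadata, this returns `['non-digital', 'insufficiently described']`; an empty list means none of the sketched criteria apply.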

Page 3

Data Rescue

- A “Data Rescue Mission” is any effort to preserve data at risk. Rescue missions can take the form of digitization, format migration, treatment of damaged materials (e.g., water or mold damage), addition of metadata, or any other action taken to make data accessible in the long term.

Definition from the ICSU CODATA Data at Risk Task Group (DARTG)

M. Tivey

Page 4

Long Tail Data are often Data at Risk

Long Tail Characteristics
- More specialised
- Low volume
- On C drives
- Hard to find
- Heterogeneous
- Collected by many people
- Citizen science
- Etc.

Long Tail: Environmental and Earth sciences

The Head: Astronomy, Climate, High Energy Physics, Genomics

L. Wyborn http://juliegood.wordpress.com/tag/long-tail/

Page 5

IEDA Data Rescue Mini-Awards

- Established to preserve valuable legacy data sets endangered by impending retirement or media degradation
- Entries evaluated for their potential impact on future research, based on quality, size, rarity, and unique location or data type
- Rescued data made accessible to the community for reuse through inclusion in the IEDA data collections (EarthChem, MGDS, SESAR)
- $7,000 award to support proper compilation, documentation, and transfer of the data
- Three awardees chosen from 11 entries spanning a wide range of geochemical and geophysical data

Page 6

1: Geologic samples and geochemistry

- WHAT: Compilation of sample metadata and geochemical analyses from three areas: Fiji, the Izu Arc, and the Endeavour Segment (James B. Gill)
- WHY: study of intra-oceanic arcs and spreading centers
- HOW: check and complete incomplete data, digitize data, add persistent identifiers, and link related resources
- Major challenge: physical sample management

Maps made with GeoMapApp

Page 7

The importance of sample identification

- Individual samples can play a large role in scientific conclusions, so accurate documentation of sample metadata is critical.
- The key measurement was the one backarc basalt called "PPTUW"... Subsequent efforts to confirm the observation ran into problems. The apparently-same sample was variously called PPTU, PPTUW/5, PPTUW-1, and TVZ19 in four other papers. None of those papers gave its latitude and longitude… (J. Gill and E. Todd)
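The PPTUW example shows why each sample needs a single persistent identifier that all its published names resolve to. The Python sketch below groups published names under one identifier so that analyses reported under different names can be compared. It assumes curators have already confirmed that these names refer to the same physical sample; the alias table and the identifier `HYPOTHETICAL-IGSN-0001` are placeholders, not a registered IGSN or an actual SESAR lookup.

```python
from collections import defaultdict

# Hypothetical alias table. It assumes curators have already confirmed that
# these published names all refer to the same physical sample;
# "HYPOTHETICAL-IGSN-0001" is a placeholder, not a registered IGSN.
ALIASES = {
    "PPTUW": "HYPOTHETICAL-IGSN-0001",
    "PPTU": "HYPOTHETICAL-IGSN-0001",
    "PPTUW/5": "HYPOTHETICAL-IGSN-0001",
    "PPTUW-1": "HYPOTHETICAL-IGSN-0001",
    "TVZ19": "HYPOTHETICAL-IGSN-0001",
}


def normalize(name: str) -> str:
    """Trim whitespace and upper-case so trivial spelling variants still match."""
    return name.strip().upper()


LOOKUP = {normalize(alias): igsn for alias, igsn in ALIASES.items()}


def group_by_identifier(names: list[str]) -> dict:
    """Group published sample names under their persistent identifier.

    Names with no known identifier are grouped under None so they can be
    flagged for manual curation.
    """
    groups = defaultdict(list)
    for name in names:
        groups[LOOKUP.get(normalize(name))].append(name)
    return dict(groups)


print(group_by_identifier(["PPTUW", "pptuw-1", "TVZ19", "Sample X"]))
```

Here the three recognized names collapse under the single placeholder identifier, while `Sample X` lands under `None` as a candidate for manual review. The normalization is deliberately light: it will not equate genuinely different strings such as `PPTUW/5` and `PPTUW-5`, which is a curation decision rather than a string-matching one.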

Page 8

2: Near-bottom magnetics

- WHAT: Compilation of near-bottom magnetometer data, including raw, merged, and processed data, plus navigation metadata (Maurice Tivey)
- WHY: study of magnetic reversals and the effect of tectonics on the magnetic field
- HOW: gather data from different formats; add complete metadata and documentation of the processing workflow
- Challenge: more than three decades of changing technology and file formats

Page 9

Evolution of equipment: 1985, 1992, 2004, 2011

Page 10

Evolution of storage media

M. Tivey

Page 11

Addition of “sufficient” metadata

Page 12

3: Lunar sample geochemistry

- WHAT: Compilation of lunar sample geochemistry (John W. Delano et al.)
- WHY: study of the composition of the Moon
- HOW: digitize photos, label specific grains, and compile geochemical data in data templates
- Challenge: nothing was digital

LPI

Page 13

Use of IEDA EarthChem templates

Page 14

Common needs addressed

- Accessibility: web access, links between systems
- Documentation: README files, additional descriptions
- Standardization: IEDA EarthChem geochemical templates
- Persistent links: DOIs and IGSNs
- Citability: DOIs, example citations
- Guidance/Training: calls and emails with disciplinary repository staff
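The persistent-link and citability needs listed above come together in an example data citation that resolves through a DOI. The Python sketch below assembles one from a dataset's metadata. The field names, the placeholder DOI `10.XXXX/example`, and the layout (creators, year, title, publisher, DOI link, loosely following the element order DataCite recommends for data citations) are assumptions for illustration, not IEDA's actual citation format.

```python
from dataclasses import dataclass

# Hypothetical metadata fields for a rescued dataset; the DOI used below is a
# placeholder, not a registered identifier.
@dataclass
class DatasetMetadata:
    creators: list[str]  # "Family, Initials." in citation order
    year: int
    title: str
    publisher: str       # e.g. the repository that holds the data
    doi: str             # bare DOI without the resolver prefix


def format_citation(md: DatasetMetadata) -> str:
    """Build a plain-text example citation that links through doi.org."""
    authors = "; ".join(md.creators)
    return f"{authors} ({md.year}): {md.title}. {md.publisher}. https://doi.org/{md.doi}"


example = DatasetMetadata(
    creators=["Doe, J.", "Roe, R."],
    year=2013,
    title="Example rescued dataset",
    publisher="Example Repository",
    doi="10.XXXX/example",
)
print(format_citation(example))
```

This prints `Doe, J.; Roe, R. (2013): Example rescued dataset. Example Repository. https://doi.org/10.XXXX/example`. Generating the citation from the same metadata record that carries the DOI keeps the two from drifting apart.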

Page 15

Page 16

Lessons learned: investigator

- Take ownership of your own legacy
  - Data curation by others may not be complete or correct
- Data rescue of an entire career does not need to be overwhelming
  - Start with small steps
  - Disciplinary repositories will help and guide you to what is needed
- Despite the time investment, data rescue is worth it
  - Others will now be able to reuse the data
  - Notes taken years ago actually explain anomalies!

Page 17

Lessons learned: repository

- For long-tail data, every project is different
  - There is no established workflow, only past experience
  - The time commitment from staff is nontrivial
- Disciplinary training helps a great deal
  - Investigators need help determining the best products
- A small incentive will motivate investigators
- Data rescue missions help the repository determine next steps in developing tools and services

Page 18

Summary of Long-tail Data Rescue

- Three data rescue efforts by IEDA this past year have taken data that were at risk and made them:
  - digitized from analog data and near-obsolete media
  - sufficiently described for reuse
  - available in formats that permit full electronic access
  - citable, with persistent identifiers, and ready for reuse
- The projects also helped IEDA identify improvements to its data rescue workflow, as well as future tools and services

Page 19

More Data Rescue Activities

- Elsevier-IEDA Data Rescue Process Study
  - A data entry tool for lunar geochemistry: MoonDB
- Elsevier-IEDA International Data Rescue Award
  - Winner announced at the reception tonight, Monday, Dec 9th, 2013
  - Intercontinental Hotel, Twin Peaks Room, 7:00-8:30 pm