Curation-Friendly Tools for the Scientific Researcher
Brian Westra, University of Oregon
Data services needs assessment: 2009-2010
Interviewed 25 faculty:
Biology, Center for Advanced Materials Characterization at Oregon, Chemistry, Computer & Information Science, Geological Sciences, Human Physiology, Institute for a Sustainable Environment, Museum of Natural and Cultural History, Physics, Psychology
Some background
o Connecting data sources to data viewing and usage
o Data organization
o Metadata/annotation of files
o Recording workflow, procedures, provenance
Preservation, archiving and publishing data were farther down the list
Primary issues/needs
Clearly articulated need and opportunity; also tie-in to data management plan implementations
Logical extension of the role for libraries beyond traditional services
Support for e-Science is a goal
Working in the data lifecycle/ecosystem is more robust than ‘just’ archiving/preservation
Why we’re involved…
Maintaining, preserving and adding value to digital research data throughout its lifecycle.
http://www.dcc.ac.uk/digital-curation/what-digital-curation
What is digital curation?
File management tools: e.g., SharePoint
Best practices: naming conventions, version control software
Are there other solutions or services?
What might meet these needs?
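As a small illustration of the naming-convention best practice listed above, here is a minimal Python sketch of a file-name checker. The convention itself (`YYYYMMDD_project_sample_vNN.ext`) and the function name `parse_name` are hypothetical; each lab would define its own pattern.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical lab convention (an assumption, not a standard):
#   YYYYMMDD_project_sample_vNN.ext   e.g. 20120209_zebrafish_s014_v02.tif
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<project>[a-z0-9]+)_(?P<sample>[a-z0-9]+)"
    r"_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def parse_name(filename: str) -> Optional[dict]:
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    try:
        # The regex only checks for eight digits; reject impossible dates.
        datetime.strptime(parts["date"], "%Y%m%d")
    except ValueError:
        return None
    parts["version"] = int(parts["version"])
    return parts


if __name__ == "__main__":
    for name in ["20120209_zebrafish_s014_v02.tif", "final_FINAL_2.tif"]:
        print(name, "->", parse_name(name))
```

A check like this could run before files are copied to shared storage. Zero-padded dates and version numbers also make the files sort in a meaningful order in any file browser.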
Going beyond file management systems to embedded, more holistic tools/systems:
o Electronic Lab Notebooks
o Content/format-specific data management software
What might meet these needs?
“…how a laboratory tracks and manages its information resources, particularly the data that represents the laboratory’s product.” (Avery, McGee, & Falk, 2000)
“a data and sample management system that is designed to improve the management of laboratory workflow” (“Clinical LIMS,” 2011)
Most basic function: sample handling and reporting.
LIMS – Lab Info Mgmt System
Data (create, store, share, organize, analyze) + information (notes)
May include: sample handling, storeroom inventory, signatures, collaboration, protocols and SOPs, embedded workflows, data analysis and visualization
LIMS and ELN functions and features often overlap
ELN – Electronic Lab Notebook
Many of them! UW-Madison RFI responses included these vendors:
o Accelrys
o Agilent
o Amphora
o Axiope
o Contur
o IDBS
o Kinematik
o Labtrack
o Notebookmaker
o Rescentris
o Waters
ELN options
Continuously changing field of vendors and products
o Nature article (Giles, 2012)
o Other options: open source, or a mix of basic tools, often used in open science
ELN options
Some UO considerations:
o Academic audience (vs. FDA compliance)
o Cost – S/W, hardware, sys-admin, training
o Interface and ease of use
o Account management
o Platform
o Research domain integration*
o Metadata support*
o Data file management*
*curation characteristics
Narrowing the field
o Research domain
   o Workflow integration with analytical tools, methods
   o Data capture from typical hardware/sources
   o Ontologies
o Metadata
   o Capture/extraction
   o Representation, standards
   o Export with files
o Data file management
   o File format standards, transformations
   o Export options
   o Metadata
   o Provenance, version control
   o Archiving raw and derivatives
Curation-friendly: compatible with or supporting:
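One way to picture the "export with files" and provenance criteria listed above is a metadata "sidecar" file written next to each data file. The Python sketch below is only an illustration: the `.meta.json` suffix and the field names are a hypothetical schema (not OME XML or any other standard), and SHA-256 checksums stand in for whatever fixity check a real system would use.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 fixity checksum, reading the file in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar(data_file: Path, description: str,
                  derived_from: Optional[Path] = None) -> Path:
    """Write <name>.meta.json beside data_file using a hypothetical schema."""
    record = {
        "file": data_file.name,
        "bytes": data_file.stat().st_size,
        "sha256": sha256_of(data_file),
        "description": description,
        # Provenance: the raw file this derivative was made from, if any.
        "derived_from": None,
    }
    if derived_from is not None:
        record["derived_from"] = {
            "file": derived_from.name,
            "sha256": sha256_of(derived_from),
        }
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "scan_raw.dat"
        raw.write_bytes(b"raw instrument output")
        derived = Path(tmp) / "scan_processed.csv"
        derived.write_text("x,y\n1,2\n")
        print(write_sidecar(derived, "Background-subtracted scan", raw).read_text())
```

Because the sidecar travels with the file and records the checksum of the raw original, a derivative can be traced back to its source, and corruption can be detected, even after the files leave the ELN.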
Wisconsin-Madison RFI
o Some highlights from an excellent list of considerations
o Good process
o Plan to field test with 60 participants
Narrowing the field
What might be your “make or break” issues?
How would you assign weights or ranking to the metrics?
1. Costs
2. Platform
3. Product lock-in
4. etc.
Narrowing the field
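The weighting question above can be made concrete with a simple weighted-sum scoring matrix. This Python sketch is illustrative only: the criteria names, the weights, and the 1-5 scores for "Product A" and "Product B" are made up, not results from any evaluation.

```python
from typing import Dict, List, Tuple

# Illustrative weights only; an evaluation team would set its own.
# Every criterion is scored 1-5 where higher is better, so "cost" 5 means
# cheapest and "lock_in" 5 means least product lock-in.
WEIGHTS: Dict[str, float] = {
    "cost": 0.4,
    "platform": 0.2,
    "lock_in": 0.2,
    "metadata_support": 0.2,
}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of criterion scores (weights need not sum to 1)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


def rank(candidates: Dict[str, Dict[str, float]],
         weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs, highest score first."""
    scored = [(name, weighted_score(s, weights)) for name, s in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical products and made-up scores, for illustration only.
    candidates = {
        "Product A": {"cost": 5, "platform": 3, "lock_in": 2, "metadata_support": 4},
        "Product B": {"cost": 2, "platform": 4, "lock_in": 5, "metadata_support": 5},
    }
    for name, score in rank(candidates, WEIGHTS):
        print(f"{name}: {score:.2f}")
```

With these weights the cheaper Product A comes out ahead, but moving weight from cost to lock-in (cost 0.2, lock-in 0.4) reverses the order. That sensitivity is one reason to "ground truth" the weights during a pilot.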
‘Ground truth’ the metrics and values/comparators
Satellite or high-altitude views (pre-pilot) might not match what you find on the ground (during the pilot)
Thoughts on evaluation
http://www.seawead.org/index.php?option=com_content&view=article&id=29:ground-truthing&catid=9&Itemid=9
Have realistic team workload and timeline expectations
It’s progress! It may be difficult to apply measures of curation capacity to an ELN
o Archiving and preservation capacity
o Exportable relational (semantic) representation
o Publication of data
Thoughts & observations
It may be more realistic to ask:
o Will this help you (the PI) find and understand the data and notes this week/ next year/after the student is gone?
o Can this improve your ability to do data management (and write a better plan for the next grant proposal)?
o Is it simple enough to become part of the routine, so that it captures "folklore" (information everyone knows but no one records)?
Thoughts & observations
Example: publish directly to ChemSpider
ChemSpider record
ELN data exchange project: Dial-a-Molecule
Thoughts & observations
A compelling reason for faculty to participate
Collaboration and coordination with stakeholders (Office of Research, IT, Libraries, research faculty, Tech Transfer)
Champion(s): these systems are usually neither easy nor inexpensive to implement, whether in the lab or on a limited budget
Characteristics of a good pilot study
What is the scope of a “pilot case”?
o Duration
o Number of participants
o Hardware capacity
o Level of training and support
o Evaluation criteria and roles
o Exit strategy – and dealing with success
Who’s going to pay for this (right now)?
You might also anticipate who is going to pay for this (if it works well and goes into production)
Set expectations and build consensus
“Data you enter in the ELN software will be stored in a secure location, however; at the end of the pilot period, the data will be removed and we cannot guarantee that it can be recovered fully from the ELN. Therefore, we very strongly encourage you to keep an additional copy of all data and notebook entries in electronic and/or hard copy format during the pilot as a backup measure and as a means of keeping a complete and continuous record of your work during the pilot period.”
https://academictech.doit.wisc.edu/informed-consent-electronic-lab-notebook-pilot
Expectations
Many biology labs produce a lot of still images and video
Image management
Cresko lab - UO
A system for image file management developed by the Open Microscopy Environment (OME)
OMERO
Embeds/supports curation:
o Uses a metadata standard for description (OME XML)
o Employs file format standards (import to TIFF)
o Can archive raw and derivative files
o Provides an intuitive organizational scheme
o Supports annotation and description on multiple levels
o Exports files with metadata
OMERO strengths
video
Annotation
It’s open source – what is the level of support/installation base? Longevity/stability?
How well does it fit into the workflow of the lab?
Can it support the proprietary formats generated in the labs?
What are the IT/systems requirements?
Primary evaluation questions
Finding a host and participants
Establishing realistic expectations
o Host obligations
o Project scope
Barriers to the pilot study
DCXL: Digital Curation for Excel
Discussion: what other options are you exploring?
Other projects and ideas:
Thank you!
References
Avery, G., McGee, C., & Falk, S. (2000). Product Review: Implementing LIMS: A “how-to” guide. Analytical Chemistry, 72(1), 57A-62A. American Chemical Society. doi:10.1021/ac0027082
CIO Office, University of Wisconsin-Madison. (n.d.). Charter 6.7: eLab Notebooks | CIO Office | UW-Madison. Retrieved February 9, 2012, from http://www.cio.wisc.edu/plan-docs-Charter6-7.aspx
Clinical LIMS. (2011). Retrieved from http://www.scientificcomputing.com/product-IN-Clinical-LIMS-072811.aspx?terms=LIMS
Giles, J. (2012). Going paperless: The digital lab. Nature, 481(7382), 430-431. doi:10.1038/481430a
PerkinElmer. (n.d.). PerkinElmer Informatics. Retrieved February 9, 2012, from http://www.cambridgesoft.com/?l=en
Rescentris. (n.d.). Rescentris | CERF Software. Retrieved February 9, 2012, from http://rescentris.com/cerf-software/
University of Dundee & Open Microscopy Environment. (n.d.). About OMERO — OME. Retrieved February 9, 2012, from http://www.openmicroscopy.org/site/products/omero
University of Wisconsin-Madison. (2012). Informed Consent for Electronic Lab Notebook Pilot | Technology Solutions for Teaching and Research. Retrieved February 9, 2012, from https://academictech.doit.wisc.edu/informed-consent-electronic-lab-notebook-pilot
University of Wisconsin-Madison. (n.d.-a). Electronic Lab Notebooks | Technology Solutions for Teaching and Research. Retrieved February 9, 2012, from http://academictech.doit.wisc.edu/ideas/electronic-lab-notebooks
University of Wisconsin-Madison. (n.d.-b). Electronic Lab Notebook Request for Information - University of Wisconsin-Madison. Retrieved February 9, 2012, from https://academictech.doit.wisc.edu/files/115349rfi.pdf