Data Management and Data Management Planning

Boston College OSP Briefing -- Nov. 21, 2017

Enid Karr, Sr. Bibliographer for Biology, Earth & Environmental Sciences, Environmental Studies [email protected]

Barbara Mento, Data/GIS Librarian, Sr. Bibliographer for Computer Science, Economics, Mathematics [email protected]

Sally Wyman, Sr. Bibliographer for Chemistry, Physics, Environmental Studies [email protected]

Why Have a Data Management Plan (“DMP”)?

Fits into “responsible conduct of research” compliance

Reduces the risk of data loss to researcher and to University

Facilitates fulfillment of requests from others to see individual researcher data

Preserves understanding of details for later

Shared data (“open access”) has a higher citation rate!

Most federal grants now require this:

NSF – all grants since Jan. 18, 2011 require a 2-page DMP

Directorate for Mathematical and Physical Sciences (MPS), Division of Chemistry (CHE)

NIH – since Oct. 1, 2003, applications requesting $500,000 or more in direct costs in any single year must address data sharing

DOE – all grants since Oct. 1, 2014 require a DMP that describes how sharing/preservation will enable validation of results (or a statement that describes how these goals will be met without sharing/preservation)

Many more agencies now require some level of sharing or a DMP -- since the Feb. 22, 2013 OSTP memorandum

Many grants now require this, but it’s bigger than that:

More scholarly publishers do, too, now:

American Economic Association journals, Nature, Science, PNAS, PLoS, etc. require or encourage that data be:

clearly documented

available for sharing

detailed enough to permit replication of analysis

“Data” journals are now published: Nature Publishing Group’s Scientific Data provides a place for curated descriptions of scientifically valuable datasets to foster sharing and re-use of data.

There are indexes that focus on data … Data Citation Index

What is “data”?

From the NSF FAQ on Data Management Plans: a “DMP” covers recorded factual material commonly accepted in the [specific] scientific community as necessary to validate research findings. May include, but is not limited to:

Data

Publications

Samples

Physical collections

Software and models

But not: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. (Office of Management and Budget (OMB) Circular A-110)

Elements of a “Typical” Data Management Plan (DMP)

1-2 pages describing the project and how data will be:

Collected (including formats, size, etc.) … Secured … Analyzed … Shared … Preserved

Details about access/sharing

Potential audience(s) for the data

How access will be provided and how others will find it: “Access” (freely available) vs. “Sharing” (by request)

Stipulations for privacy, confidentiality, IP or other rights

Allowed re-use of the data, derivative products

Metadata standards to be used

How long data will be retained -- archiving, long-term preservation and format migration

Boston College Libraries Data Management Plan

Research Guide

Guidance on content

Templates/examples

Additional resources

DMPTool

Dataverse

(More later in session)

To arrange a consultation with a subject specialist

https://libguides.bc.edu/dataplan

Key DMP Concepts

1. A “Readme” file or Code Book

2. Use of “open” (non-proprietary) file formats

3. Consistent naming practices for all files

4. Metadata

5. Backup plan

6. Long-term storage strategy

7. Data sharing

8. Plan for true “archiving”

Key DMP Concept #1

A “Readme” file or “Code Book”

This file (or document) describes the research process for collecting the data, how the data are stored and backed up, the file formats chosen … and more, as described below.

Key DMP Concept #2

Use of “open” file formats, avoiding proprietary formats

Whenever possible, researchers should save data using open standards. Some examples:

TXT, PDF or PDF/A (archival), not Word (doc, docx)

ASCII, not Excel (xls, xlsx)

MPEG-4, not QuickTime (qtff)

TIFF or JPEG2000, not GIF or JPG

XML or RDF, not RDBMS

Ideally, files are saved in both original format AND one of the preferred ones listed above.
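As a rough illustration of the list above, the following sketch flags files that would benefit from an open-format copy. The mapping comes from the examples on this slide; the specific file extensions (for instance `.mov` and `.qt` for QuickTime) and the function names are assumptions made for the example, not part of any BC tool.

```python
from pathlib import Path
from typing import Optional

# Preferred formats follow the slide's examples. The extension spellings
# (e.g. ".mov"/".qt" for QuickTime) are assumptions made for this sketch.
PREFERRED_FORMATS = {
    ".doc": "TXT or PDF/A",
    ".docx": "TXT or PDF/A",
    ".xls": "ASCII text",
    ".xlsx": "ASCII text",
    ".mov": "MPEG-4",
    ".qt": "MPEG-4",
    ".gif": "TIFF or JPEG2000",
    ".jpg": "TIFF or JPEG2000",
}


def suggest_open_format(filename: str) -> Optional[str]:
    """Return the preferred open format for a proprietary file, else None."""
    return PREFERRED_FORMATS.get(Path(filename).suffix.lower())


def audit(filenames):
    """List (filename, suggestion) pairs for files that need an open-format copy."""
    results = []
    for name in filenames:
        suggestion = suggest_open_format(name)
        if suggestion is not None:
            results.append((name, suggestion))
    return results
```

Running `audit` over a project folder listing gives a quick to-do list of files to save a second time in a preferred format, alongside the originals.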

Key DMP Concept #3

Consistent naming practices for all files

File names should be brief and unique, and might contain:

Project acronyms, researcher initials, file type information, version, date, file status, like this one:

Internet Usage Study version 2, Sept. 2011, final draft, in csv format: IUS_v02_092011_final.csv

The plan should show that both archival (unmodified) and updated “versioned” files (clearly labelled) are maintained
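A naming convention is easiest to keep when file names are generated rather than typed. The sketch below reproduces the IUS example from this slide; the function names, the two-digit version and the MMYYYY date layout are inferred from that single example and are assumptions, so adapt them to whatever convention your group agrees on.

```python
import re
from datetime import date


def build_filename(acronym, version, when, status, extension):
    """Compose ACRONYM_vNN_MMYYYY_status.ext, following the slide's example."""
    return f"{acronym}_v{version:02d}_{when:%m%Y}_{status}.{extension}"


# Same convention as a pattern: acronym, 2-digit version, MMYYYY, status, extension.
NAME_PATTERN = re.compile(
    r"^[A-Za-z0-9]+_v\d{2}_(0[1-9]|1[0-2])\d{4}_[a-z]+\.[a-z0-9]+$"
)


def follows_convention(filename):
    """Return True if the file name matches the convention above."""
    return NAME_PATTERN.match(filename) is not None
```

For example, `build_filename("IUS", 2, date(2011, 9, 1), "final", "csv")` yields `IUS_v02_092011_final.csv`, and `follows_convention` can be run over a folder to spot files that drifted from the pattern.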

Organization of Files

Directory Structure

Use folders!

Possible ways to organize:

By types of data (fMRI, interview, video)

By experiment/study

By collection method

Choose the option that works best for your research group … it should be understandable to others
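One way to make a folder layout repeatable is to create it from a short script at the start of a project. This is a minimal sketch of the "by types of data" option; the `data` parent folder and the function name are hypothetical choices for illustration.

```python
from pathlib import Path


def create_project_folders(root, data_types=("fMRI", "interview", "video")):
    """Create one sub-folder per data type under root/data; return the paths."""
    created = []
    for name in data_types:
        folder = Path(root) / "data" / name
        # exist_ok=True makes the script safe to re-run on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

The same function works for the other options (by experiment/study or by collection method) by passing a different list of names.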


Version Control of Files

Keep an archival (unmodified) version, and updated versions (clearly labelled)

Use whole numbers (V1, V2, V3) for major changes and decimals for minor changes (V1.1, V1.2 …)

Version control software can help, and some software has this built-in… especially instrument software
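Where no version control software is in use, the numbering rule above can still be applied consistently with a small helper. This is only a sketch: the `V` prefix follows the slide's examples, and the function name and error handling are assumptions.

```python
import re


def bump_version(label, change):
    """Return the next label: 'major' -> next whole number, 'minor' -> next decimal."""
    match = re.fullmatch(r"V(\d+)(?:\.(\d+))?", label)
    if match is None:
        raise ValueError(f"unrecognized version label: {label!r}")
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    if change == "major":
        return f"V{major + 1}"
    if change == "minor":
        return f"V{major}.{minor + 1}"
    raise ValueError("change must be 'major' or 'minor'")
```

So `bump_version("V1", "minor")` gives `V1.1`, and `bump_version("V1.2", "major")` gives `V2`. The archival (unmodified) copy keeps its original label and is never overwritten.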

Key DMP Concept #4

Metadata … It’s “data about the research data”

This “data” (subject-based terminology) helps others discover your data (more about this, shortly), helps YOU remember, and may be needed for later journal publication/data deposit…

Metadata standards exist

per discipline

per type of data (.cif for crystallographic data, for example)

per individual repository (ICPSR, GenBank, etc.)

Metadata is recorded in the “readme” file or code book

Data Documentation (“Metadata”)

Metadata captures the most critical information about a particular project. Best when captured early on… helps jog memories later …

It helps others discover the research being shared.

Metadata may be required for journal publication/data deposit.

For help, contact your Subject Librarian

ISO suggested Minimum Data Elements

• Title
• Creator (Principal Investigators)
• Date Created (also versions)
• Instrument and model
• Format (and software required)
• Subject
• Unique Identifier
• Description of the specific data resource
• Coverage of the data (spatial or temporal)
• Publishing Organization
• Type of Resource
• Rights
• Funding or Grant

Data Documentation – What do you do with it once you have it?

Record it in a readme.txt file

In some fields, “codebooks” are used to record methodology and other data management notes (e.g. IRB compliance statements, etc.)

Consider including a “data dictionary”

Deposited along with the data, these files facilitate “discovery” of your data on the Web
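A readme.txt can be started from a template so that none of the minimum data elements listed earlier is forgotten. The sketch below assumes a simple "Element: value" layout, which is one plausible format and not a required one; the function names and the `TODO` marker are hypothetical.

```python
# Element names are taken from the "ISO suggested Minimum Data Elements" slide.
ELEMENTS = [
    "Title", "Creator", "Date Created", "Instrument and model", "Format",
    "Subject", "Unique Identifier", "Description", "Coverage",
    "Publishing Organization", "Type of Resource", "Rights", "Funding or Grant",
]


def missing_elements(metadata):
    """Return the element names that have no value yet."""
    return [name for name in ELEMENTS if not metadata.get(name)]


def render_readme(metadata):
    """Render the elements as 'Name: value' lines, marking gaps with TODO."""
    lines = [f"{name}: {metadata.get(name) or 'TODO'}" for name in ELEMENTS]
    return "\n".join(lines) + "\n"
```

Writing the result of `render_readme` to readme.txt early in the project, then checking `missing_elements` before deposit, captures details while they are still fresh.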

Key DMP Concept #5

Backup Strategy

Regular, scheduled backup protects against loss

Backup strategy will depend on your needs, collection volume, update frequency, etc.:

Back up all versions of the files or certain ones?

How often to back up files?

List at least two backup locations (so, 3 copies)

Internal (researcher computer)

External (e.g. the BC Research Data Archive or departmental servers)

Assign responsibility for backing up
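The "two backup locations, three copies" rule can be scripted and scheduled. The sketch below copies a file to each backup folder and compares SHA-256 checksums afterwards; the checksum step is an addition for illustration and is not something the slide prescribes, and the function names and folder arguments are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, backup_dirs):
    """Copy source into each backup directory and verify every copy by checksum."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for directory in backup_dirs:
        target_dir = Path(directory)
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps, which helps when auditing backups.
        target = Path(shutil.copy2(source, target_dir / source.name))
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

In practice the backup directories would point at an external or departmental location rather than the same disk, and whoever is assigned responsibility for backups would run or schedule the script.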

Key DMP Concept #6

Long-term Storage Strategy

The plan should describe how data will be stored … in the safest long-term locations (not a laptop or flash drive!)

Local – lab computer, flash drive: convenient, but much less secure

Centralized – ITS Servers, departmental servers, BC ELN server

Remote – BC Dataverse, disciplinary servers (GenBank, ICPSR, etc.): most tailored to disciplinary needs, but may be open (which may be problematic for some researchers)

Grants can sometimes cover cost of long-term storage.

Key DMP Concept #7

Data Sharing – … the ultimate goal of DMPs

Options include: personal website … but researchers can do better:

Journal “supplementary materials” (ACS, etc.) … now in figshare

Institutional repository, e.g. eScholarship@BC, BC Dataverse

Disciplinary (or multidisciplinary) repository, e.g. Cambridge Structural Database

Or, a combination: journal-designated repository (Nature or Review of Economics and Statistics Dataverse, to name two examples)

Examples of Subject Repositories

Biomedicine:

GenBank* -- sequence data

RCSB Protein Data Bank* -- biomolecule crystal structure coordinates, etc.

Chemistry:

Cambridge Structural Database (CSD)*

PubChem (part of NCBI Entrez, covering biological activities of small molecules)

Social Sciences:

ICPSR (Inter-university Consortium for Political and Social Research)

IQSS (The Institute for Quantitative Social Science)

Multidisciplinary: figshare.com (Open, Free)

Finding Aid: https://www.re3data.org/ -- search for repositories in all disciplines

*A few of the data repositories that fulfill Science magazine requirements for data deposition

Key DMP Concept #8

Archiving Plans

Archiving data means preserving the data not just in the original format but also in a platform-independent format, using a standard that ensures that the data can be re-used in the future.

Additional Essential Elements in the DMP:

Ethics/privacy

Data Ownership

Intellectual Property/Technology Transfer

Ethics and Privacy

Sensitive data should be redacted before being deposited in a public archive or repository.

Access to data may be embargoed (access limited for a time) for confidentiality, legal, patentability or other reasons.

Dark archives ensure permanent protection of confidentiality.

Where human subjects/privacy is involved, BC’s Institutional Review Board (IRB) must approve. https://www.bc.edu/research/oric/human.html


Good DMP, then What Happens? Data is Shared, then … Cited

Data Citation

Why is proper data citation important?

Ensures that original producers of the data are credited in citation indexes*

Allows researchers to locate research data used in an article

May be required by the archive that stores the data to be repurposed

*Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308

Citing Data Sets

Essential citation elements; style will vary:

• author or creator

• title or description

• year of publication

• publisher and/or the database/archive from which it was retrieved

• the URL or DOI if the data set is online

Hitchcock, Colleen; Manning, Deirdre; Keegan, Kevin; Utsun, Ekin, 2016, "Boston College tree inventory data archive", https://doi.org/10.7910/DVN/IBSB2R, Harvard Dataverse, V1

Pitt-Catsouphes, Marcie, and Steven Sweet. Talent Management Study: U.S. Workplaces In Today's Business Environment, 2009. ICPSR34836-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2013-09-09. http://doi.org/10.3886/ICPSR34836.v1

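Because citation style varies, it can help to store the essential elements separately and assemble the citation on demand. The sketch below joins them in an order similar to the Dataverse example above; the function name, the element order and the placeholder values in the usage note are assumptions, and a journal or repository may require a different style.

```python
def format_data_citation(authors, year, title, publisher, identifier, version=None):
    """Join the essential citation elements into one string (one possible style)."""
    required = {
        "authors": authors, "year": year, "title": title,
        "publisher": publisher, "identifier": identifier,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError("missing citation elements: " + ", ".join(missing))
    parts = ["; ".join(authors), str(year), f'"{title}"', identifier, publisher]
    if version:
        parts.append(version)
    return ", ".join(parts)
```

For instance, calling it with the made-up values `["Doe, Jane", "Roe, Richard"]`, `2017`, `"Example survey data"`, `"Example Repository"`, `"https://doi.org/10.0000/EXAMPLE"` and `"V1"` produces a single citation line, and leaving out the identifier raises an error instead of producing an incomplete citation.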

Questions?

Library Tools and Resources to Help

Barbara, Enid and Sally! We are here to help.

Research Guide for Data Management Plans https://libguides.bc.edu/dataplan

The BC DMP Tool http://www.bc.edu/sites/libraries/dmptool/

Dataverse https://libguides.bc.edu/dataverse

eScholarship@BC https://dlib.bc.edu/

ORCID at BC – unique researcher identifier https://bc.edu/orcid

Other Library Services

Digital Humanities Team

Data Visualization and Analysis Team

DMPTool Template Example:

Dataverse

Map created with data deposited in Dataverse.

Visualization of inventories, species and health of trees to determine BC carbon footprint in 2010.

Data is being repurposed in a BC Ecology and Evolution class

From the presentation at the iRODS Meeting, “Dataverse DataTags: Sharing Data You Can’t Share.” Merce Crosas, Ph.D., Director of Data Science, IQSS, Harvard University, 2014

Dataverse version 5 will include DataTags for confidential data. Coming December 2017

eScholarship@BC

eScholarship@BC is our institutional repository

• Faculty can deposit scholarly work including
– Working papers, published articles, teaching materials, conference presentations, posters
• Reasons to deposit:
– Compliance with funder mandates for open access
– Global visibility and readership
– Search engine harvesting
– Eliminates economic barriers to knowledge
– Increase citation counts
– Get a permanent URL for the CV
– Long-term preservation

• Link your data to accompanying publications

Benefits
✓ Improves visibility in field
✓ Self-updating CV
✓ Eliminates name ambiguity

What is ORCID
✓ Unique, persistent identifier for researchers & scholars
✓ Follows you wherever you go
✓ Constant through life events and name changes

Less than 60 seconds to sign up: bc.edu/orcid

[Diagram labels: Faculty Annual Report; eScholarship@BC; Faculty Profiles; Manuscript submission; Grant applications; Professional Orgs]

bc.edu/orcid

ORCID at BC
✓ Import citations to Faculty Annual Report
✓ Display on work in eScholarship@BC
✓ Display on faculty profiles
✓ Use in grant and manuscript submissions

More Ways We Can Help: Librarians are expert searchers

Need information on the broader impacts or applications of the proposed work?

May call for a more comprehensive literature search -- particularly into literature outside of the immediate field -- ask your subject specialist librarian.

Who else is doing similar research?

The Scopus database can make this more visible. See next slide.

Who is funding similar work?

Scopus and Web of Science both report funder sources and both allow searching by funder to find funded work.

Where to publish/how to best disseminate the work?

JCR Database – use it to find journal Impact Factors (IF)

Scopus – use it to find CiteScore, SJR, SNIP – alternatives to IF

Scopus Data Visualization: key journals for the topic

Scopus Data Visualization: key authors for the topic

Scopus Data Visualization: key institutions for the topic

Any More Questions?