Page 1: Introduction to Data Management

Introduction to Data Management

Cunera Buys, Pam Shaw

May 7, 2015

https://www.flickr.com/photos/hellocatfood/7957989238/ (CC BY-NC-SA 2.0)

Page 2: Introduction to Data Management

Data Snafu

Data Sharing and Management Snafu in 3 Short Acts
https://www.youtube.com/watch?v=N2zK3sAtr-4

Page 3: Introduction to Data Management

What are data?

https://www.flickr.com/photos/rh2ox/9990024683/ (CC BY-SA 2.0)

Page 4: Introduction to Data Management

Data: Some Definitions

Digital Curation Centre (UK): “Data, any information in binary digital form, is at the centre of the Curation Lifecycle.”

Office of Management and Budget: “Research data means the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.”

The Oxford English Dictionary (OED) defines “data” as: Related items of (chiefly numerical) information considered collectively, typically obtained by scientific work and used for reference, analysis, or calculation.

Data can be both analogue and digital materials.

Page 5: Introduction to Data Management

Data in the Sciences and Humanities

BICEP2 (South Pole telescope): BICEP2 Collaboration, 2014

Performativity, Place, Space: Burgess and Hamming, 2011

Page 6: Introduction to Data Management

Every discipline has data!

Types of data include:
• Observational data
• Laboratory experimental data
• Computer simulation
• Textual analysis
• Physical artifacts or relics

Examples of data include:
• Audio and video files
• Code or scripts
• Digital text
• Lab notebooks
• Geospatial images
• Instrumental data
• Photographs
• Rock samples
• Survey results
• Scanned documents
• Spreadsheets
• Video games

https://www.flickr.com/photos/23165290@N00/9338136777/ (CC BY-SA 2.0)

Page 7: Introduction to Data Management

Federal Funding Agency Requirements

https://www.flickr.com/photos/pdenker/2556591663/ (CC BY 2.0)

Page 8: Introduction to Data Management

Brief History of Data Sharing Requirements

• February 26, 2003: NIH issues its Data Sharing Policy, which requires a data sharing plan for projects above $500K.

• January 18, 2011: NSF requires Data Management Plans (DMPs) to be submitted with all new grant proposals.

• February 22, 2013: Memo issued by the White House Office of Science and Technology Policy (OSTP). http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf

• March 24, 2014: Follow-up memo issued by OSTP. http://www.whitehouse.gov/sites/default/files/microsites/ostp/OpenAccess_March-2014.pdf

• July 24, 2014: DOE releases its Public Access Plan for article and data sharing.

• November 13, 2014: OSTP issues a progress update on policies to increase public access to the results of federally funded scientific research. http://www.whitehouse.gov/sites/default/files/microsites/ostp/public_access_report_to_congress_ostp_11.13.14.pdf

• 2015: 16 agencies/departments have released their responses.

Page 9: Introduction to Data Management

Responding Agencies to OSTP Memo

• Agency for Healthcare Research and Quality (AHRQ)
• HHS Office of the Assistant Secretary for Preparedness and Response (ASPR)
• Centers for Disease Control and Prevention (CDC)
• Department of Commerce (DOC)
• Department of Defense (DOD)
• Department of Energy (DOE)
• Department of the Interior (DOI)
• Department of Health and Human Services (HHS)
• Department of Homeland Security (DHS)
• Department of Transportation (DOT)
• Department of Education (ED)
• Environmental Protection Agency (EPA)
• Food and Drug Administration (FDA)
• National Aeronautics and Space Administration (NASA)
• National Institutes of Health (NIH)
• National Institute of Standards and Technology (NIST)
• National Oceanic and Atmospheric Administration (NOAA)
• National Science Foundation (NSF)
• Office of the Director of National Intelligence (ODNI)
• Smithsonian Institution (SI)
• United States Agency for International Development (USAID)
• United States Department of Agriculture (USDA)
• United States Department of Veterans Affairs (VA)

Page 10: Introduction to Data Management

Agency Responses Summary: Articles

AGENCIES USING PUBMED CENTRAL
• Agency for Healthcare Research and Quality (AHRQ)
• HHS Office of the Assistant Secretary for Preparedness and Response (ASPR)
• Centers for Disease Control and Prevention (CDC)
• Food and Drug Administration (FDA)
• National Aeronautics and Space Administration (NASA)
• National Institutes of Health (NIH)
• National Institute of Standards and Technology (NIST)
• United States Department of Veterans Affairs (VA)

AGENCIES USING DOE’S PAGES (Public Access Gateway for Energy & Science)
• Department of Energy (DOE)
• National Science Foundation (NSF)

AGENCIES WITH OWN REPOSITORIES
• Department of Defense (DOD): Defense Technical Info Center
• National Oceanic and Atmospheric Administration (NOAA)
• United States Department of Agriculture (USDA): USDA public access archive system

OTHER (TBD)
• Department of Transportation (DOT)
• United States Agency for International Development (USAID)
• United States Geological Survey (USGS)

Page 11: Introduction to Data Management

Agency Responses Summary: Time Frame for Depositing Data in a Publicly Accessible Repository

At time of article publication
• Agency for Healthcare Research and Quality (AHRQ)
• Department of Energy (DOE)
• Food and Drug Administration (FDA)
• National Institutes of Health (NIH)
• National Institute of Standards and Technology (NIST)
• National Science Foundation (NSF) (exploring this option)
• United States Agency for International Development (USAID)

With article publication or within 30 months of collection
• HHS Office of the Assistant Secretary for Preparedness and Response (ASPR)
• Centers for Disease Control and Prevention (CDC)

With article publication or within 1 year of collection
• National Oceanic and Atmospheric Administration (NOAA)

At time of publication or within a reasonable time period after publication
• National Aeronautics and Space Administration (NASA)

Within a reasonable time
• Department of Defense (DOD): Defense Technical Info Center

Doesn’t specify
• United States Department of Veterans Affairs (VA)
• United States Department of Agriculture (USDA)
• Department of Transportation (DOT)
• United States Geological Survey (USGS)

Page 12: Introduction to Data Management

Journal Requirements

PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception.

Page 13: Introduction to Data Management

Why do funders and the broader science community want to share and preserve data?

https://www.flickr.com/photos/joyvanb/11111295964/ (CC BY-NC-ND 2.0)

Page 14: Introduction to Data Management

Prevent Data Loss

Page 15: Introduction to Data Management

Scientific Reproducibility

Page 18: Introduction to Data Management

Benefits of Sharing Data

• Clearly documents and provides evidence for research in conjunction with published results.

• Meets copyright and ethical compliance requirements (e.g., HIPAA).

• Increases the impact of research through data citation.

• Preserves data for long-term access and prevents loss of data.

• Describes and shares data with others to further new discoveries and research.

• Prevents duplication of research.

• Accelerates the pace of research.

• Promotes reproducibility of research.

Page 19: Introduction to Data Management

Recognition

Chapter II.C.2.f(i)(c), Biographical Sketch(es), has been revised to rename the “Publications” section to “Products” and amend terminology and instructions accordingly. This change makes clear that products may include, but are not limited to, publications, data sets, software, patents, and copyrights.

Page 20: Introduction to Data Management

Data Management

• Managing data effectively across the data lifecycle is critical for the success of a research project
  – Make a data management plan

• Data management refers to all aspects of creating, housing, delivering, maintaining, archiving, and preserving data

• It is one of the essential areas of responsible conduct of research

• All subject areas (humanities, social science, and hard sciences) engage with data in many formats.

• Absence of data documentation and management will limit the potential use of that data.

Page 21: Introduction to Data Management

From: Fary, Michael and Owen, Kim, Developing an Institutional Research Data Management Plan Service, Educause ACTI white paper, January 2013, http://net.educause.edu/ir/library/pdf/ACTI1301.pdf

Common Data Lifecycle Stages

Page 22: Introduction to Data Management

Aspects of Research Data Management

• DMPs/Planning
• Storage & backup
• File organization & naming
• Documentation & metadata
• Legal/ethical considerations
• Sharing & reuse
• Preservation & archiving

Page 23: Introduction to Data Management

Start with a plan…

Page 24: Introduction to Data Management

Points to address in your Data Management Plan (DMP)

• Types of data to be produced.
• Standards or descriptions that would be used with the data (metadata).
• How these data will be accessed and shared.
• Policies and provisions for data sharing and reuse.
• Provisions for archiving and preservation.

https://flickr.com/photos/inl/5097547405 (CC BY 2.0)

Page 29: Introduction to Data Management

Aspects of Research Data Management

• DMPs/Planning
• Storage & backup
• File organization & naming
• Documentation & metadata
• Legal/ethical considerations
• Sharing & reuse
• Preservation & archiving

Page 30: Introduction to Data Management

Metadata

• Commonly defined as “data about data”
• It is information that describes the data
• When talking to faculty, don’t use library jargon like metadata. It is confusing to researchers.

https://www.flickr.com/photos/musebrarian/3289649684/ (CC BY-NC-SA 2.0)
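
One way to make "information that describes the data" concrete is a small description file saved next to each data file. The sketch below is only an illustration in Python: the function name, the field names, and the ".metadata.json" suffix are assumptions made for this example, not a metadata standard and not something the slides prescribe.

```python
import json
from datetime import date
from pathlib import Path


def write_metadata_sidecar(data_file: Path, description: str, creator: str,
                           variables: dict[str, str]) -> Path:
    """Write a JSON file describing data_file next to it and return its path.

    The field names below are illustrative assumptions, not a formal schema.
    """
    record = {
        "file_name": data_file.name,
        "description": description,
        "creator": creator,
        "date_documented": date.today().isoformat(),
        # Maps each column or variable name to a plain-language explanation.
        "variables": variables,
    }
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

A researcher could call it as `write_metadata_sidecar(Path("survey_150507_v01.csv"), "Clinic intake survey", "A. Researcher", {"age": "age in years"})`; the file name and values here are hypothetical.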

Page 31: Introduction to Data Management

Some good data practices: File organization and naming

• Label and define the content of your data files in a systematic way
• Use descriptive file names
  – For example, not FIAGC (Fluffy is a great cat) but age, blood pressure, etc.
• Use consistent date formatting (e.g. YYMMDD)
• Keep file names short (no more than 25 characters)
• Don’t use special characters
• Use underscores instead of blank spaces
• Keep track of versions
• Don’t use confusing labels (e.g. Pete’s data, final, final2, really final, really really final)
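
The naming rules above can be sketched as a small Python helper. This is a minimal illustration, not a prescribed tool: the function name, the "_vNN" version suffix, the choice to count the 25 characters without the extension, and the choice to treat anything other than letters, digits, and underscores as a special character are all assumptions made for this example.

```python
import re
from datetime import date

MAX_LENGTH = 25  # limit suggested on the slide; applied here to the name without its extension (assumption)
ALLOWED = re.compile(r"^[A-Za-z0-9_]+$")  # assumption: everything else counts as a special character


def build_file_name(description: str, collected: date, version: int, extension: str) -> str:
    """Build a name of the form description_YYMMDD_vNN.extension."""
    # Replace blank spaces with underscores.
    stem = re.sub(r"\s+", "_", description.strip())
    # Consistent YYMMDD date and a two-digit version number.
    name = f"{stem}_{collected:%y%m%d}_v{version:02d}"
    if not ALLOWED.match(name):
        raise ValueError(f"special characters are not allowed: {name!r}")
    if len(name) > MAX_LENGTH:
        raise ValueError(f"name is longer than {MAX_LENGTH} characters: {name!r}")
    return f"{name}.{extension.lstrip('.')}"


print(build_file_name("blood pressure", date(2015, 5, 7), 1, "csv"))
# prints blood_pressure_150507_v01.csv
```

With these assumptions, a label such as "Pete's data" is rejected because of the apostrophe, and the version number replaces labels like "final2".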

Page 32: Introduction to Data Management

Data nightmares

Page 33: Introduction to Data Management

Data nightmares

Tweeted in 2012 by Gail Steinhart, Head of Research Services, Mann Library, Cornell University

Page 35: Introduction to Data Management

Toy Story 2

How Toy Story 2 Almost Got Deleted: Stories From Pixar Animation: ENTV
https://www.youtube.com/watch?v=8dhp_20j0Ys

Page 36: Introduction to Data Management

Storage, backup, and securing data

• Have at least 3 copies of your data
• Don’t use your personal computer, data sticks or CDs if you can avoid it
  – They break, get lost, lose data over time
• Use a hard drive if you can
• Use cloud storage if you can (but be aware of sensitive data)
• Northwestern has a subscription to Box.net for faculty, staff and graduate students
  – See http://www.it.northwestern.edu/file-sharing/box.html

flickr.com/photos/s_w_ellis/3877534599 (CC BY 2.0)
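
Keeping several copies only helps if the copies are intact. As a rough sketch, assuming the backup locations are ordinary folders (for example an external drive and a synced cloud folder), the Python below copies a file to each location and compares SHA-256 checksums. The function names are hypothetical, and this is not a substitute for institutional backup services.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, backup_dirs: list[Path]) -> list[Path]:
    """Copy source into every backup directory and confirm each copy matches."""
    expected = sha256_of(source)
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copy2 also preserves file timestamps
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Passing two backup folders gives three copies in total, counting the original.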

Page 37: Introduction to Data Management

Preserving and sharing data

• Some options for preserving and sharing data
  – Self-archive
  – Institutional repository
  – Open data repository
  – National or international data archive or repository

By Florian Hirzinger - www.fh-ap.com (Own work (Florian Hirzinger)) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons

Page 38: Introduction to Data Management

Northwestern Libraries

• Stewardship, institutional memory
• Long tradition of broad subject expertise, liaisons to and in every discipline
• Potential data services:
  – Finding data
  – Licensing data
  – Depositing data
  – Software for working with data
  – Assistance/support with DMPs
  – Training
  – Metadata assistance
  – Outreach

Page 42: Introduction to Data Management

Considerations for the medical campus

• All human subjects data is subject to IRB approval
  – Implications for knowledge of data management plans
  – Researchers need exposure to and awareness of new NIH Sharing Plan

Page 44: Introduction to Data Management

Resources at the CDSI: REDCap secure survey platform

• REDCap
  – http://www.nucats.northwestern.edu/resources-services/data-informatics-services/software-tools/redcap

• REDCap (Research Electronic Data Capture) is a secure, web-based application for building and managing online data capture for research studies

Page 45: Introduction to Data Management

Precision medicine

• Precision medicine is the #1 priority for DJ Patil, Chief Data Scientist and Deputy Chief Technology Officer for Data Policy at the White House in the Office of Science and Technology Policy
  – Source: NSF Data Science webinar with DJ Patil, May 1, 2015

Page 46: Introduction to Data Management

Resources at the CDSI: i2b2

Informatics for Integrating Biology & the Bedside

i2b2 at NUCATS

Page 47: Introduction to Data Management

Finding partners

• Get to know who your department’s grant officers are in the OSR: http://osr.northwestern.edu/?src=or-hdr

Page 48: Introduction to Data Management

Finding partners

• NUIT Research Computing
  – http://www.it.northwestern.edu/research/
  – Seminars & events
  – Visualization and consultation services

• Sometimes knowing the resources means knowing where to refer the user

Page 49: Introduction to Data Management

Preparing to meet a researcher

• Know their work
  – Read their papers, or at least scan them
  – This helps you to ask meaningful questions about their data
  – It also helps warm them up to you
• Go to their seminars or department meetings
• Already mentioned: avoid library jargon
  – Ask the user to explain or describe their data

Page 50: Introduction to Data Management

RESOURCES:

Northwestern University Library Data Management LibGuide: http://libguides.northwestern.edu/datamanagement

DMPTool: https://dmp.org/

Northwestern University's Research Data: Ownership, Retention and Access Policy: http://www.research.northwestern.edu/policies/documents/research_data.pdf

Cunera Buys, e-science librarian: [email protected]

Page 51: Introduction to Data Management

Additional Resources or Training?