![Page 1: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/1.jpg)
Introduction to Data Management
Cunera Buys and Pam Shaw
May 7, 2015
https://www.flickr.com/photos/hellocatfood/7957989238/ (CC BY-NC-SA 2.0)
![Page 2: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/2.jpg)
Data Snafu
Data Sharing and Management Snafu in 3 Short Acts
https://www.youtube.com/watch?v=N2zK3sAtr-4
![Page 3: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/3.jpg)
What are data?
https://www.flickr.com/photos/rh2ox/9990024683/ (CC BY-SA 2.0)
![Page 4: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/4.jpg)
Data: Some Definitions
• Digital Curation Centre (UK): “Data, any information in binary digital form, is at the centre of the Curation Lifecycle.”
• Office of Management and Budget: “Research data means the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.”
• The Oxford English Dictionary (OED) defines “data” as: Related items of (chiefly numerical) information considered collectively, typically obtained by scientific work and used for reference, analysis, or calculation.
• Data can be both analogue and digital materials.
![Page 5: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/5.jpg)
Data in the Sciences and Humanities
• BICEP2 (South Pole telescope): BICEP2 Collaboration, 2014
• Performativity, Place, Space: Burgess and Hamming, 2011
![Page 6: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/6.jpg)
Every discipline has data!
Types of data include:
• Observational data
• Laboratory experimental data
• Computer simulation
• Textual analysis
• Physical artifacts or relics
Examples of data include:
• Audio and video files
• Code or scripts
• Digital text
• Lab notebooks
• Geospatial images
• Instrumental data
• Photographs
• Rock samples
• Survey results
• Scanned documents
• Spreadsheets
• Video games
https://www.flickr.com/photos/23165290@N00/9338136777/ (CC BY-SA 2.0)
![Page 7: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/7.jpg)
Federal Funding Agency Requirements
https://www.flickr.com/photos/pdenker/2556591663/ (CC BY 2.0)
![Page 8: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/8.jpg)
Brief History of Data Sharing Requirements
• February 26, 2003: NIH requires a Data Sharing Policy for projects above $500K.
• January 18, 2011: NSF requires Data Management Plans (DMPs) to be submitted with all new grant proposals.
• February 22, 2013: Memo issued by the White House Office of Science and Technology Policy (OSTP). http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
• March 24, 2014: Follow-up memo issued by OSTP. http://www.whitehouse.gov/sites/default/files/microsites/ostp/OpenAccess_March-2014.pdf
• July 24, 2014: DOE releases its Public Access Plan for article and data sharing.
• November 13, 2014: OSTP issues a progress update on policies to increase public access to the results of federally funded scientific research. http://www.whitehouse.gov/sites/default/files/microsites/ostp/public_access_report_to_congress_ostp_11.13.14.pdf
• 2015: 16 agencies/departments have released their responses.
![Page 9: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/9.jpg)
Responding Agencies to OSTP Memo
• Agency for Healthcare Research and Quality (AHRQ)
• HHS Office of the Assistant Secretary for Preparedness and Response (ASPR)
• Centers for Disease Control and Prevention (CDC)
• Department of Commerce (DOC)
• Department of Defense (DOD)
• Department of Energy (DOE)
• Department of the Interior (DOI)
• Department of Health and Human Services (HHS)
• Department of Homeland Security (DHS)
• Department of Transportation (DOT)
• Department of Education (ED)
• Environmental Protection Agency (EPA)
• Food and Drug Administration (FDA)
• National Aeronautics and Space Administration (NASA)
• National Institutes of Health (NIH)
• National Institute of Standards and Technology (NIST)
• National Oceanic and Atmospheric Administration (NOAA)
• National Science Foundation (NSF)
• Office of the Director of National Intelligence (ODNI)
• Smithsonian Institution (SI)
• United States Agency for International Development (USAID)
• United States Department of Agriculture (USDA)
• United States Department of Veterans Affairs (VA)
![Page 10: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/10.jpg)
Agency Responses Summary: Articles
AGENCIES USING PUBMED CENTRAL
• Agency for Healthcare Research and Quality (AHRQ)
• HHS Office of the Assistant Secretary for Preparedness and Response (ASPR)
• Centers for Disease Control and Prevention (CDC)
• Food and Drug Administration (FDA)
• National Aeronautics and Space Administration (NASA)
• National Institutes of Health (NIH)
• National Institute of Standards and Technology (NIST)
• United States Department of Veterans Affairs (VA)
AGENCIES USING DOE’S PAGES (Public Access Gateway for Energy & Science)
• Department of Energy (DOE)
• National Science Foundation (NSF)
AGENCIES WITH OWN REPOSITORIES
• Department of Defense (DOD): Defense Technical Information Center
• National Oceanic and Atmospheric Administration (NOAA)
• United States Department of Agriculture (USDA): USDA public access archive system
OTHER (TBD)
• Department of Transportation (DOT)
• United States Agency for International Development (USAID)
• United States Geological Survey (USGS)
![Page 11: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/11.jpg)
Agency Responses Summary: Time Frame for Depositing Data in a Publicly Accessible Repository
At time of article publication
• Agency for Healthcare Research and Quality (AHRQ)
• Department of Energy (DOE)
• Food and Drug Administration (FDA)
• National Institutes of Health (NIH)
• National Institute of Standards and Technology (NIST)
• National Science Foundation (NSF) (exploring this option)
• United States Agency for International Development (USAID)
With article publication or within 30 months of collection
• HHS Office of the Assistant Secretary for Preparedness and Response (ASPR)
• Centers for Disease Control and Prevention (CDC)
With article publication or within 1 year of collection
• National Oceanic and Atmospheric Administration (NOAA)
At time of publication or within a reasonable time period after publication
• National Aeronautics and Space Administration (NASA)
Within a reasonable time
• Department of Defense (DOD): Defense Technical Information Center
Doesn’t specify
• United States Department of Veterans Affairs (VA)
• United States Department of Agriculture (USDA)
• Department of Transportation (DOT)
• United States Geological Survey (USGS)
![Page 12: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/12.jpg)
Journal Requirements
PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception.
![Page 13: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/13.jpg)
Why do funders and the broader science community want to share and preserve data?
https://www.flickr.com/photos/joyvanb/11111295964/ (CC BY-NC-ND 2.0)
![Page 14: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/14.jpg)
Prevent Data Loss
![Page 15: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/15.jpg)
Scientific Reproducibility
![Page 16: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/16.jpg)
![Page 17: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/17.jpg)
![Page 18: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/18.jpg)
Benefits of Sharing Data
• Clearly documents and provides evidence for research in conjunction with published results.
• Meets copyright and ethical compliance requirements (e.g., HIPAA).
• Increases the impact of research through data citation.
• Preserves data for long-term access and prevents loss of data.
• Describes and shares data with others to further new discoveries and research.
• Prevents duplication of research.
• Accelerates the pace of research.
• Promotes reproducibility of research.
![Page 19: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/19.jpg)
Recognition
Chapter II.C.2.f(i)(c), Biographical Sketch(es), has been revised to rename the “Publications” section to “Products” and amend terminology and instructions accordingly. This change makes clear that products may include, but are not limited to, publications, data sets, software, patents, and copyrights.
![Page 20: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/20.jpg)
Data Management
• Managing data effectively across the data lifecycle is critical for the success of a research project
– Make a data management plan
• Data management refers to all aspects of creating, housing, delivering, maintaining, archiving, and preserving data
• It is one of the essential areas of responsible conduct of research
• All subject areas (humanities, social science, and hard sciences) engage with data in many formats.
• Absence of data documentation and management will limit the potential use of that data.
![Page 21: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/21.jpg)
Common Data Lifecycle Stages
From: Fary, Michael and Owen, Kim, Developing an Institutional Research Data Management Plan Service, Educause ACTI white paper, January 2013, http://net.educause.edu/ir/library/pdf/ACTI1301.pdf
![Page 22: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/22.jpg)
Aspects of Research Data Management
• DMPs/Planning
• Storage & backup
• File organization & naming
• Documentation & metadata
• Legal/ethical considerations
• Sharing & reuse
• Preservation & archiving
![Page 23: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/23.jpg)
Start with a plan…
![Page 24: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/24.jpg)
Points to address in your Data Management Plan (DMP)
• Types of data to be produced.
• Standards or descriptions that would be used with the data (metadata).
• How these data will be accessed and shared.
• Policies and provisions for data sharing and reuse.
• Provisions for archiving and preservation.
https://flickr.com/photos/inl/5097547405 (CC BY 2.0)
![Page 25: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/25.jpg)
![Page 26: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/26.jpg)
![Page 27: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/27.jpg)
![Page 28: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/28.jpg)
![Page 29: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/29.jpg)
![Page 30: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/30.jpg)
Metadata
• Commonly defined as “data about data”
• It is information that describes the data
• When talking to faculty, don’t use library jargon like “metadata.” It is confusing to researchers.
https://www.flickr.com/photos/musebrarian/3289649684/ (CC BY-NC-SA 2.0)
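To make “information that describes the data” concrete, here is a minimal Python sketch of a descriptive record that could sit next to a data file. The field names (title, creator, date_collected, variables) and the sample values are illustrative assumptions, not a formal metadata standard and not part of the original slides.

```python
import json


def describe_dataset(title, creator, date_collected, variables):
    """Return a small JSON description of a dataset.

    `variables` is a list of (name, description, units) tuples.
    All field names here are hypothetical, chosen only for illustration.
    """
    record = {
        "title": title,
        "creator": creator,
        "date_collected": date_collected,
        "variables": [
            {"name": name, "description": description, "units": units}
            for name, description, units in variables
        ],
    }
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    # Hypothetical example values.
    print(
        describe_dataset(
            "Blood pressure survey",
            "Example Researcher",
            "2015-05-07",
            [("bp_systolic", "Systolic blood pressure", "mmHg")],
        )
    )
```

Even a short record like this answers the questions a later user is likely to ask: what the file is, who made it, when, and what each column means.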
![Page 31: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/31.jpg)
Some good data practices: File organization and naming
• Label and define the content of your data files in a systematic way
• Use descriptive file names
– For example, use meaningful labels such as age or blood pressure rather than a cryptic abbreviation like FIAGC (“Fluffy is a great cat”)
• Use consistent date formatting (e.g., YYMMDD)
• Keep file names short (no more than 25 characters)
• Don’t use special characters
• Use underscores instead of blank spaces
• Keep track of versions
• Don’t use confusing labels (e.g., Pete’s data, final, final2, really final, really really final)
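The naming practices above can be sketched as a small Python helper. This is only one possible reading of the guidelines: the name layout (description, YYMMDD date, two-digit version number), the choice to count the extension toward the 25-character limit, and the function names are all assumptions made for illustration.

```python
import re
from datetime import date

MAX_LENGTH = 25  # suggested upper bound; assumed here to include the extension


def make_file_name(description, day, version, extension):
    """Build a name such as 'bp_survey_150507_v02.csv'."""
    # Use underscores instead of blank spaces, then drop special characters.
    stem = re.sub(r"\s+", "_", description.strip().lower())
    stem = re.sub(r"[^a-z0-9_]", "", stem)
    # Consistent YYMMDD date and an explicit version number.
    name = f"{stem}_{day:%y%m%d}_v{version:02d}.{extension.lstrip('.')}"
    if len(name) > MAX_LENGTH:
        raise ValueError(f"{name!r} is longer than {MAX_LENGTH} characters")
    return name


def is_well_formed(name):
    """Check an existing name against the same (assumed) layout."""
    pattern = r"[a-z0-9]+(?:_[a-z0-9]+)*_\d{6}_v\d{2}\.[a-z0-9]+"
    return len(name) <= MAX_LENGTH and re.fullmatch(pattern, name) is not None


if __name__ == "__main__":
    print(make_file_name("BP survey", date(2015, 5, 7), 2, "csv"))
    print(is_well_formed("really final.xlsx"))
```

An explicit version number in the name replaces labels like “final2,” and the check function can be run over a folder to flag files that drift from the convention.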
![Page 32: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/32.jpg)
Data nightmares
![Page 33: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/33.jpg)
Data nightmares
Tweeted in 2012 by Gail Steinhart, Head of Research Services, Mann Library, Cornell University
![Page 35: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/35.jpg)
Toy Story 2
How Toy Story 2 Almost Got Deleted: Stories From Pixar Animation: ENTV
https://www.youtube.com/watch?v=8dhp_20j0Ys
![Page 36: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/36.jpg)
Storage, back up and securing data
• Have at least 3 copies of your data
• Don’t use your personal computer, data sticks or CDs if you can avoid it
– They break, get lost, lose data over time
• Use a hard drive if you can
• Use cloud storage if you can (but be aware of sensitive data)
• Northwestern has a subscription to Box.net for faculty, staff and graduate students
– See http://www.it.northwestern.edu/file-sharing/box.html
flickr.com/photos/s_w_ellis/3877534599 (CC BY 2.0)
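Keeping several copies only helps if the copies are still identical to the original. The following Python sketch shows one common way to check this, by comparing SHA-256 checksums. The function names and file layout are assumptions for illustration; the slides do not prescribe a particular verification method.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, copies: Iterable[Path]) -> bool:
    """True only if every copy exists and has the same checksum as the original."""
    expected = sha256_of(original)
    return all(copy.exists() and sha256_of(copy) == expected for copy in copies)
```

For example, `copies_match(Path("data.csv"), [Path("/backup/data.csv")])` (hypothetical paths) returns False if the backup is missing or has silently changed, which is the kind of loss over time the slide warns about.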
![Page 37: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/37.jpg)
Preservation and Sharing data
• Some options for preserving and sharing data
– Self-archive
– Institutional repository
– Open data repository
– National or international data archive or repository
By Florian Hirzinger - www.fh-ap.com (Own work (Florian Hirzinger)) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons
![Page 38: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/38.jpg)
Northwestern Libraries
• Stewardship, institutional memory
• Long tradition of broad subject expertise, liaisons to and in every discipline
• Potential data services:
– Finding data
– Licensing data
– Depositing data
– Software for working with data
– Assistance/support with DMPs
– Training
– Metadata assistance
– Outreach
![Page 39: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/39.jpg)
![Page 40: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/40.jpg)
![Page 41: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/41.jpg)
![Page 42: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/42.jpg)
Considerations for the medical campus
• All human subjects data is subject to IRB approval
– Implications for knowledge of data management plans
– Researchers need exposure to and awareness of the new NIH Sharing Plan
![Page 43: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/43.jpg)
Resources at the CDSI
http://www.nucats.northwestern.edu/centers-programs/cdsi
![Page 44: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/44.jpg)
Resources at the CDSI: REDCap secure survey platform
• REDCap
– http://www.nucats.northwestern.edu/resources-services/data-informatics-services/software-tools/redcap
• REDCap (Research Electronic Data Capture) is a secure, web-based application for building and managing online data capture for research studies
![Page 45: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/45.jpg)
Precision medicine
• Precision medicine is the #1 priority for DJ Patil, Chief Data Scientist and Deputy Chief Technology Officer for Data Policy at the White House in the Office of Science and Technology Policy
– Source: NSF Data Science webinar with DJ Patil, May 1, 2015
![Page 46: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/46.jpg)
Resources at the CDSI: i2b2
Informatics for Integrating Biology & the Bedside
i2b2 at NUCATS
![Page 47: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/47.jpg)
Finding partners
• Get to know who your department’s Grant Officers are in the OSR: http://osr.northwestern.edu/?src=or-hdr
![Page 48: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/48.jpg)
Finding partners
• NUIT Research Computing
– http://www.it.northwestern.edu/research/
– Seminars & events
– Visualization and consultation services
• Sometimes knowing the resources means knowing where to refer the user
![Page 49: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/49.jpg)
Preparing to meet a researcher
• Know their work
– Read their papers, or at least scan them
– This helps you to ask meaningful questions about their data
– It also helps warm them up to you
• Go to their seminars or department meetings
• Already mentioned: avoid library jargon
– Ask the user to explain or describe their data
![Page 50: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/50.jpg)
RESOURCES:
Northwestern University Library Data Management LibGuide: http://libguides.northwestern.edu/datamanagement
DMPTool: https://dmp.org/
Northwestern University's Research Data: Ownership, Retention and Access Policy: http://www.research.northwestern.edu/policies/documents/research_data.pdf
Cunera Buys, e-science librarian: [email protected]
![Page 51: Introduction to Data Management](https://reader033.vdocuments.site/reader033/viewer/2022042821/55d22207bb61eb804d8b4774/html5/thumbnails/51.jpg)
Additional Resources or Training?