data management for graduate students
Data Management for Graduate Students
Marriott Library Graduate Student Workshop Series
Rebekah Cummings, Research Data Management Librarian
J. Willard Marriott Library, University of Utah
September 27, 2016
In the next hour…
•Introductions
•What are data?
•Why manage data?
•Data Management Plans
•Data Organization
•Metadata
•Storage and Archiving
•Questions
Name
Major
Research Project
What is data management?
Activities and practices that support long-term preservation,
access, and use of data
What are data?
“The recorded factual material commonly accepted in the research community as necessary to validate research findings.”
- U.S. OMB Circular A-110
Data are diverse
Data are messy
We manage data first and foremost for
ourselves
Why else manage data?
•Meet grant and journal requirements
•Promote reproducible research
•Enable new discoveries from your data
We are trying to avoid this scenario…
Two bears data management problems
1. Didn’t know where he stored the data
2. Saved one copy of the data on a USB drive
3. Data were in a format that could only be read by outdated, proprietary software
4. No codebook to explain the variable names
5. Variable names were not descriptive
6. No contact information for the co-author Sam Lee
Data Management Plans
•What data are generated by your research?
•What is your plan for managing the data?
•How will your data be shared?
Elements of a DMP
•Types of data, including file formats
•Data description
•Data storage
•Data sharing, including confidentiality or security restrictions
•Data archiving and responsibility
•Data management costs
DMPTool – CDL
Data organization
File naming
MyData.xls
MeetingNotes.doc
Presentation.ppt
Assignment1.pdf
File naming best practices
1. Be descriptive not generic
2. Appropriate length (about 25 chars or less)
3. Be consistent
4. Think critically about your file names
File naming best practices
•Files should include only letters, numbers, and underscores/dashes.
•No special characters.
•No spaces; use dashes, underscores, or camel case (likeThis).
•Avoid case dependency. Assume this, THIS, and tHiS are the same.
•Have a strategy for version control.
•Don’t overwrite file extensions.
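The rules above can be checked automatically. The following is a minimal sketch, not a standard tool: the function name `check_file_name` is hypothetical, and the 25-character limit is an assumption taken from the "about 25 chars" guideline.

```python
import re

# Letters, numbers, underscores, and dashes only (no spaces, no special characters).
ALLOWED = re.compile(r"^[A-Za-z0-9_-]+$")

def check_file_name(file_name, max_length=25):
    """Return a list of problems found in file_name (empty list if none)."""
    stem, dot, extension = file_name.rpartition(".")
    if not dot:
        # No "." at all, so the whole name is the stem and there is no extension.
        stem, extension = file_name, ""
    problems = []
    if not extension:
        problems.append("missing file extension")
    if not ALLOWED.match(stem):
        problems.append("use only letters, numbers, underscores, and dashes")
    if len(stem) > max_length:
        problems.append(f"longer than {max_length} characters")
    return problems

for name in ["20150618_SoilSamples_v01.csv", "July 24 2014_SoilSamples%_v6"]:
    print(name, check_file_name(name))
```

A check like this only catches mechanical problems. A generic name such as MyData.xls still passes, so being descriptive and consistent remains a judgment call.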
One potential strategy
Version Control - Numbering
Use leading zeros for scalability
Without leading zeros, files sort as: 1, 10, 2, 3, 9, 99
With leading zeros, files sort as: 001, 002, 003, 009, 010, 099
Bonus Tip: Use whole numbers (v1, v2, v3) for major version changes and decimals for minor changes (v1.1, v2.6)
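To see why leading zeros matter, here is a small sketch of how a file browser sorts names as text. The three-digit width is an assumption; pick a width that fits the largest number you expect.

```python
# Version numbers as they might appear in file names.
versions = [1, 2, 3, 9, 10, 99]

# File names are sorted as text, so compare the string forms.
unpadded = sorted(str(v) for v in versions)
padded = sorted(f"{v:03d}" for v in versions)

print(unpadded)  # ['1', '10', '2', '3', '9', '99']
print(padded)    # ['001', '002', '003', '009', '010', '099']
```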
Version Control - Dates
If using dates use YYYYMMDD
June2015 = BAD!
06-18-2015 = BAD!
20150618 = GREAT!
2015-06-18 = This is fine too
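A date stamp in this form can be generated rather than typed by hand. This is a minimal sketch using Python's standard `datetime` module; the helper name `date_stamp` is hypothetical.

```python
from datetime import date

def date_stamp(day, separator=""):
    """Format a date as YYYYMMDD, or YYYY-MM-DD when separator is '-'."""
    return day.strftime(f"%Y{separator}%m{separator}%d")

print(date_stamp(date(2015, 6, 18)))       # 20150618
print(date_stamp(date(2015, 6, 18), "-"))  # 2015-06-18
```

Because the year comes first, sorting these names alphabetically also sorts them chronologically.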
From a DMP…
“Each file name, for all types of data, will contain the project acronym PUCCUK; a reference to the file content (survey, interview, media) and the date of an event (such as the date of an interview).”
Who filed better?
•PLPP_EvaluationData_Workshop2_2014.xlsx
•MyData.xlsx
•publiclibrarypartnershipsprojectevaluationdataworkshop22014CummingsHelenaMontana.xlsx
Who filed better?
•July 24 2014_SoilSamples%_v6
•20140724_NSF_SoilSamples_Cummings
•SoilSamples_FINAL
Structuring folders and files
• Consider all the types of files you will handle during the course of your project.
• Develop a nested folder structure that makes sense for your project and your team’s retrieval needs.
• Name folders clearly, without special characters.
• Use a standard folder structure for each project or subproject (including making folders for files not yet created).
• Create a reference document (README file) that notes the purpose of each folder.
University of Massachusetts Medical School Library http://libraryguides.umassmed.edu/file_management
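One way to keep the same structure across projects is to create it with a short script. This is only a sketch: the folder names, their descriptions, and the function name `create_project` are hypothetical and should be adapted to your own project.

```python
from pathlib import Path

# Hypothetical standard folders and the purpose of each one.
FOLDERS = {
    "data_raw": "Original, unmodified data files",
    "data_processed": "Cleaned or derived data",
    "docs": "Protocols, consent forms, codebooks",
    "scripts": "Code used to process or analyze the data",
}

def create_project(root):
    """Create the standard folders under root and a README describing them."""
    root = Path(root)
    for name in FOLDERS:
        # Folders are created up front, even if no files exist for them yet.
        (root / name).mkdir(parents=True, exist_ok=True)
    lines = [f"{name}: {purpose}" for name, purpose in FOLDERS.items()]
    readme = root / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Calling `create_project("PLPP")` would create a PLPP folder with the four subfolders and a README.txt listing what each is for.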
README files
File organization exercise
Describing data
Research Documentation
•Grant proposals and related reports
•Applications and approvals (e.g. IRB)
•Codebooks, data dictionaries
•Consent forms
•Surveys, questionnaires, interview protocols
•Transcripts, hard copies of audio and video files
•Any software or code you used (no matter how insignificant or buggy)
IJ?
XVAR?
FNAME?
What goes in a codebook?
•Variable name
•Variable meaning
•Variable data types
•Precision of data
•Units
•Known issues with the data
•Relationships to other variables
•Null values
•Anything else someone needs to better understand the data
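A codebook can be as simple as a CSV file with one row per variable. The sketch below uses Python's standard `csv` module; the variables, their descriptions, and the function name `write_codebook` are invented for illustration.

```python
import csv
import io

# Hypothetical variables; every value here is made up for illustration.
CODEBOOK = [
    {"variable": "soil_ph", "meaning": "Soil acidity", "type": "float",
     "units": "pH", "null_value": "NA", "known_issues": ""},
    {"variable": "sample_date", "meaning": "Date the sample was collected",
     "type": "date (YYYY-MM-DD)", "units": "", "null_value": "NA",
     "known_issues": "Some dates were entered from memory"},
]

def write_codebook(rows, handle):
    """Write one row per variable, with column names taken from the first row."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)

# Write to an in-memory buffer here; pass an open file to save to disk.
buffer = io.StringIO()
write_codebook(CODEBOOK, buffer)
print(buffer.getvalue())
```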
Metadata
Unstructured Data
There was a study put out by Dr. Gary Bradshaw from the University of Nebraska Medical Center in 1982 called “Growth of Rodent Kidney Cells in Serum Media and the Effect of Viral Transformation On Growth”. It concerns the cytology of kidney cells.
Structured Data
Title: Growth of rodent kidney cells in serum media and the effect of viral transformations on growth.
Author: Gary Bradshaw
Date: 1982
Publisher: University of Nebraska Medical Center
Subject: Kidney -- Cytology
At the very least…
• Title
• Creator
• Description
• Date
• Type
• Publisher
• Format
• Identifier (DOI)
• Rights
• Any other critical information to understand or cite the data.
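Keeping these elements as a structured record makes it easy to spot what is missing. The following is a rough sketch: the field names are lowercase versions of the list above, the function name `missing_elements` is hypothetical, and the record reuses only the details given in the structured-data example.

```python
# Minimal elements from the list above.
REQUIRED = ["title", "creator", "description", "date", "type",
            "publisher", "format", "identifier", "rights"]

def missing_elements(record):
    """Return the required elements that are absent or empty in record."""
    return [name for name in REQUIRED if not record.get(name)]

record = {
    "title": "Growth of rodent kidney cells in serum media and the effect "
             "of viral transformations on growth",
    "creator": "Gary Bradshaw",
    "date": "1982",
    "publisher": "University of Nebraska Medical Center",
}

print(missing_elements(record))
# ['description', 'type', 'format', 'identifier', 'rights']
```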
Data ownership
Data Storage
LOCKSS (Lots of Copies Keeps Stuff Safe)
Options for data storage
•Personal computers or laptops
•Networked drives
•External storage devices
3-2-1 Backup Rule
Have 3 copies of your data
On 2 different media
In more than 1 physical location
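The rule is simple enough to write down as a check. This is an illustrative sketch only; the `Copy` class, the function name `meets_3_2_1`, and the example media and locations are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Copy:
    medium: str    # e.g. "laptop", "external drive", "cloud"
    location: str  # e.g. "office", "home", "offsite"

def meets_3_2_1(copies):
    """True if there are 3+ copies, on 2+ media, in more than 1 location."""
    media = {c.medium for c in copies}
    locations = {c.location for c in copies}
    return len(copies) >= 3 and len(media) >= 2 and len(locations) > 1

copies = [
    Copy("laptop", "office"),
    Copy("external drive", "office"),
    Copy("cloud", "offsite"),
]
print(meets_3_2_1(copies))      # True
print(meets_3_2_1(copies[:2]))  # False: only two copies, one location
```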
Ubox – box.utah.edu
Language from a DMP
“All data files will be stored on the University server that is backed up nightly. The University's computing network is protected from viruses by a firewall and anti-virus software. Digital recordings will be copied to the server each day after interviews.
Signed consent forms will be stored in a locked cabinet in the office. Interview recordings and transcripts, which may contain personal information, will be password protected at file-level and stored on the server.
Original versions of the files will always be kept on the server. If copies of files are held on a laptop and edits made, their file names will be changed.”
Thinking long-term
Archiving options
•Domain-specific repository
•General-purpose data repository
•Institutional repository
When you archive…
• Save the data in both its proprietary and non-proprietary format (e.g. Excel and CSV; Microsoft Word and ASCII)
• Consider any restrictions on your data (copyright, patent, privacy, etc.)
• When possible/mandated/desired, share your data online with a persistent identifier (DOI or ARK)
• Include a data citation and state how you want to get credit for your data
• Link your data to your publications as often as possible
Your data librarians
Daureen Nesdill
Research Data Management
Librarian, Sciences
Darell Schmick
Research Librarian,
Health Sciences
Rebekah Cummings
Research Data Management
Librarian, Social Sciences & Humanities
Major takeaways
•Data management starts at the beginning of a project
•Document your data so that someone else could understand it
•Have more than one copy of your data
•Consider archiving options when you are done with your project
Questions?
Rebekah
[email protected]
(801) 581-7701
Marriott Library, 1705Y
…or ask now!