How to Manage Your Data
Jean Aroom, Lisa Spiro & Kathy Weimer
October 2015
This workshop draws heavily on materials from the University of Minnesota Libraries, New England Collaborative Data Management Curriculum and DataOne.
Quick Poll: Raise Your Hand If You Have Ever...
● Forgotten what you called a file and/or where you put it
● Discovered unnecessary duplicates, then struggled over which to keep
● Lost data due to hardware failure, lost devices, etc.
Objectives for This Session
● Understand the importance of managing data.
● Create a basic data inventory.
● Name and organize your files effectively.
● Document your data to promote future use (including by you).
● Know options for storing, backing up and archiving your data.
Overview of Data Management Process
1. plan
2. organize
3. document
4. store, backup & archive
http://www.data-archive.ac.uk/create-manage/life-cycle
Why Managing Your Data Matters
What Is Research Data?
“the information recorded or produced in any form or media during the course of a research investigation” (Rice Research Data Management Policy)
Examples:
● experimental results
● lab notebooks
● protocols
● charts
● publications
Types of Research Data
● Raw data
  ○ Data generated during research project
● Existing data
  ○ E.g. Census data
● Processed data
  ○ Cleaned & formatted data
● Analyzed data
  ○ Summarized and analyzed data
● Finalized/published data
  ○ Data used to support a publication
NEDMC Module 2
"Data visualization process v1" by Farcaster, Wikipedia
Formats of Research Data
● text files
● spreadsheets
● lab notebooks
● survey responses
● audio/video
● maps/spatial data
● government stats
● images
● databases
● models
● software code
● ...
Why Is Managing Your Data Important?
● Keep track of your data, working more efficiently.
● Prevent data loss.
● Uphold standards of research integrity.
● Make it easier to share and re-use data.
● Meet funder, university & increasingly journal requirements.
New Transparency and Openness Promotion (TOP) Standards
Nosek et al, “Promoting an open research culture,” Science, 26 June 2015
1. PLAN
Typical Elements of a Data Management Plan: DMP Tool
https://dmptool.org/
Want to learn more about writing a data management plan?
co-facilitated by Fondren & the Office of Proposal Development
Utilities for Keeping Track of Your Data
● List files in a directory using the command line, copy & paste, or free tools
  From the command line:
  ○ Mac: ls > filenames.txt
  ○ PC: dir > filenames.txt
● Use tools such as dupeGuru to identify and address duplicates.
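For those who prefer a script to the command line or a separate tool, the two tasks above (listing files and spotting duplicates) can be sketched in Python. This is a minimal sketch, not how dupeGuru works: the function names `list_files` and `find_duplicates` are hypothetical, and it assumes that "duplicate" means byte-for-byte identical content.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def list_files(root):
    """Return every file under root as a sorted list of relative paths."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def find_duplicates(root):
    """Group files by the SHA-256 hash of their contents.

    Returns a list of groups; each group holds two or more relative paths
    whose contents are identical. Files are read fully into memory, so this
    sketch suits modest file sizes only.
    """
    root = Path(root)
    by_hash = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path.relative_to(root).as_posix())
    return sorted(sorted(paths) for paths in by_hash.values() if len(paths) > 1)
```

Calling `find_duplicates("my_project")` on a hypothetical folder only reports groups of identical files; deciding which copy to keep remains a manual step.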
Use a Data Inventory to Understand, Track & Share Your Data
Plan for, monitor & prepare to share your data by recording:
● what the dataset is
● who owns it
● where it is
● how important it is
● who can access it
● where it is stored and preserved
Data inventory template courtesy U of Minnesota Libraries
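A data inventory can live in a simple spreadsheet. As a rough illustration, the sketch below writes one as a CSV file. The column names in `INVENTORY_FIELDS` are assumptions that mirror the questions listed above; they are not the columns of the University of Minnesota template.

```python
import csv

# Hypothetical column names, one per question on the slide.
INVENTORY_FIELDS = ["dataset", "owner", "location", "importance", "access", "preservation"]


def write_inventory(rows, stream):
    """Write inventory rows (dicts keyed by INVENTORY_FIELDS) as CSV to an open text stream.

    Missing fields are left blank; an unexpected field raises ValueError,
    which helps catch misspelled column names early.
    """
    writer = csv.DictWriter(stream, fieldnames=INVENTORY_FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

To save to disk, open the file with `open("inventory.csv", "w", newline="")` and pass the resulting file object as `stream`.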
Exercise 1: Jot Down What Might Belong in Your Data Inventory
Detailed data inventory
U of Minnesota Libraries
2. ORGANIZE
File Naming
Juliet: "What's in a name? That which we call a rose by any other name would smell as sweet."
Which file naming scheme works the best?
A. bridgedata1
   bridgedata2
   bridgedata3
B. bridge1_sensor2_02142013
   bridge1_sensor2_02152013
   bridge1_sensor2_02162013
C. madisonavebridge_sensor2_20130214
   madisonavebridge_sensor2_20130215
   madisonavebridge_sensor2_20130216
D. madisonavebridge_sensor2_feb142013
   madisonavebridge_sensor2_02152013
   madbridge_s2_feb162013
University of Minnesota Libraries
Common Ways to Organize Your Files: Structuring the Directory
Your directory structure should be predictable, make the contents of each folder easy to identify, and be organized by:
• Object type - e.g. photos, music, research
• Topic - e.g. transportation, buildings, demographic groups
• Organizational structure - e.g. department, sub-unit, or individual
Common Ways to Organize Your Files, Continued
• Project name or type
• Location/Geography - e.g. Country/State then subdivision of place
• Combination: structure your directory by a combination of any of the above (be consistent in the order!)
File Naming Best Practices
• Versions: list versions alphanumerically, e.g. v1, v2, v3 (rather than textually, e.g. last, final, finalfinal, useTHISone)
• Special characters: some computers will not understand file names with UPPERCase letters, weird characters (/ , . # ?), or spaces between words
• Date/time: yyyymmdd is preferred over formats such as Dec09
University of Minnesota Libraries
File Naming Best Practices, Continued
● Be Descriptive: 75092238.txt is not helpful. Instead use: 20120814_instrument8_rainyday_raw.txt (up to 255 characters)
● Don’t rely on nesting in folders: 2012/august/instrument8/day14/raw.txt
● Use consistent structure that falls into a useful order (for sorting) and decide on shared terminology
University of Minnesota Libraries
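The naming practices above can be applied mechanically. The sketch below is one possible helper, assuming a pattern of date, descriptive parts, and version joined by underscores; `clean_part` and `build_filename` are hypothetical names, and the pattern itself is an illustration rather than a prescribed standard.

```python
import re
from datetime import date


def clean_part(text):
    """Lowercase a name component and drop spaces and special characters."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def build_filename(day, parts, version, extension):
    """Build a name such as 20120814_instrument8_rainyday_raw_v1.txt.

    day       -- a datetime.date, written as yyyymmdd so names sort chronologically
    parts     -- descriptive components (instrument, condition, processing stage, ...)
    version   -- an integer, written as v1, v2, v3
    extension -- file extension, with or without the leading dot
    """
    cleaned = [clean_part(p) for p in parts]
    pieces = [day.strftime("%Y%m%d")] + [c for c in cleaned if c] + [f"v{version}"]
    return "_".join(pieces) + "." + extension.lower().lstrip(".")
```

For example, `build_filename(date(2012, 8, 14), ["Instrument 8", "rainy day", "raw"], 1, "txt")` returns `20120814_instrument8_rainyday_raw_v1.txt`. Because the date comes first in yyyymmdd form, a plain alphabetical sort of such names is also a chronological sort.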
Good or Bad?
● Test_data_2013
● Project_Data
● Design for project.doc
● Lab_work_Eric
● Second_test
● Meeting Notes Oct 23
Example from Purdue Libraries, Data Management LibGuide
Good or Bad?
● 20130503_DOEProject_DesignDocument_Smith_v2-01.docx
● 20130709_DOEProject_MasterData_Jones_v1-00.xlsx
● 20130825_DOEProject_Ex1Test1_Data_Gonzalez_v3-03.xlsx
● 20130825_DOEProject_Ex1Test1_Doc_Gonzalez_v3-03.xlsx
● 20131002_DOEProject_Ex1Test2_Data_Gonzalez_v1-01.xlsx
● 20141023_DOEProject_ProjectMeetingNotes_Kramer_v1-00.docx
Exercise
Instructions: Review the handout, then partner with 2-3 people to decide on a file naming system in order to archive all files in one folder and sort by interviewee name.
3 minutes to discuss
University of Minnesota Libraries
Organizing and Naming Your Data
Instructions: Discuss a directory structure and file naming convention for your shared data.
University of Minnesota Libraries
Batch Renaming Tools (for free!)
• All Platforms
  ○ PSRenamer
• Mac
  ○ Name Changer
• Windows
  ○ Bulk Rename Utility
University of Minnesota Libraries
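Batch renaming can also be scripted. The sketch below is unrelated to the tools listed above; it shows one assumed task, converting mmddyyyy dates in file names (as in scheme B earlier) to yyyymmdd. The function names are hypothetical. It first produces a plan that changes nothing on disk, so the proposed names can be reviewed before anything is renamed.

```python
import re
from pathlib import Path

# Eight digits that are not part of a longer run of digits.
MMDDYYYY = re.compile(r"(?<!\d)(\d{2})(\d{2})(\d{4})(?!\d)")


def to_iso_date(name):
    """Rewrite an mmddyyyy date inside a name as yyyymmdd.

    Digit runs that do not look like a valid month and day are left alone,
    which keeps names already in yyyymmdd form (e.g. 20130214) unchanged.
    """
    def swap(match):
        month, day, year = match.groups()
        if 1 <= int(month) <= 12 and 1 <= int(day) <= 31:
            return year + month + day
        return match.group(0)

    return MMDDYYYY.sub(swap, name)


def plan_renames(folder):
    """Return (old_name, new_name) pairs for files in folder; nothing is renamed."""
    plan = []
    for path in sorted(Path(folder).iterdir()):
        new_name = to_iso_date(path.name)
        if path.is_file() and new_name != path.name:
            plan.append((path.name, new_name))
    return plan


def apply_renames(folder, plan):
    """Carry out a reviewed plan, refusing to overwrite existing files."""
    folder = Path(folder)
    for old, new in plan:
        target = folder / new
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        (folder / old).rename(target)
```

Running `plan_renames` and reading its output before calling `apply_renames` is the safer order, ideally on a copy of the data.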
3. DOCUMENT
Why Document Data?● Makes it easier for you to interpret your own data
● Facilitates collaboration, sharing, and reuse
● Ensures successful long-term preservation of findings
New England Collaborative Data Management Curriculum
Data Documentation
● Who collected this data? Who/what were the subjects under study?
● What was collected, and for what purpose? What is the content/structure of the data?
● Where was this data collected? What were the experimental conditions?
● When was the data collected? Is it part of a series or ongoing experiment?
● Why was this experiment performed?
New England Collaborative Data Management Curriculum
Files to replicate Sean Bolks and Richard J. Stoll, “The Arms Acquisition Process: The Effect of Internal and External Constraints on Arms Race Dynamics,” The Journal of Conflict Resolution 44, no. 5 (October 1, 2000): 580–603.
File         Content
table1.dta   Stata data file with data for Table 1
table1.do    Stata .do file with commands to replicate Table 1
table2.dta   Stata data file with data for Table 2
table2.do    Stata .do file with commands to replicate Table
Study-level documentation
Data-level documentation
Variable-level documentation
Data Documentation Handout
4. STORE, BACKUP & ARCHIVE
Data Storage Definition
● The media (optical or magnetic) to which you save your data files and software.
● All storage media are vulnerable to risk and obsolescence.
● Storage media should be evaluated and updated every 2-5 years.
New England Collaborative Data Management Curriculum
Data Storage Considerations
● Location (Internal/External HD, Network, Remote)
● Disk size or storage quota
● Computing performance
● Accessibility
Data Backup Definition
● Allows you to restore your data if original data is lost or damaged due to:
  ○ Hardware or software malfunction
  ○ Environmental disaster (fire, flood)
  ○ Theft
  ○ Unauthorized access
New England Collaborative Data Management Curriculum
Data Backup Considerations
● Location (On-site, off-site)
● Procedure (Full, differential, incremental, mirror)
● Frequency (Hourly, daily, weekly, monthly)
● Retention (Months, years)
● Performance
TEST YOUR BACKUP PLAN!
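One way to test a backup is to restore it to a scratch folder and compare it with the original. The sketch below assumes both the original and the restored copy are reachable as ordinary folders; `checksums` and `compare_backup` are hypothetical names, and the approach (comparing SHA-256 hashes) is one option among several.

```python
import hashlib
from pathlib import Path


def checksums(root):
    """Map each file's relative path under root to the SHA-256 hash of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_backup(original, restored):
    """Report files that are missing from, or differ in, the restored copy."""
    src, dst = checksums(original), checksums(restored)
    missing = sorted(set(src) - set(dst))
    changed = sorted(name for name in src.keys() & dst.keys() if src[name] != dst[name])
    return {"missing": missing, "changed": changed}
```

An empty result for both `missing` and `changed` suggests the restore reproduced every original file. Files edited since the backup ran will legitimately show up as changed.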
Data Backup Summary
Backup type     Backed up                  Backup time   Restore time   Storage space
Full/snapshot   All data                   Slowest       Fast           High
Differential    All data since last full   Moderate      Moderate       Moderate
Incremental     Only new/modified files    Fast          Slowest        Lowest
Mirror          Only new/modified files    Fastest       Fastest        Highest
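The difference between full, differential, and incremental backups comes down to which files each one selects. The sketch below illustrates that selection only, under the assumption that file modification times are a reliable signal and that an incremental backup means "changed since the most recent backup of any type". It copies nothing and is not how any particular backup product works; the function names and the `last_full` / `last_any` timestamps are hypothetical.

```python
from pathlib import Path


def changed_since(root, timestamp):
    """List files under root whose modification time is later than timestamp (seconds)."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime > timestamp
    )


def files_to_back_up(root, kind, last_full, last_any):
    """Select files for a backup run.

    kind      -- "full", "differential", or "incremental"
    last_full -- time of the last full backup
    last_any  -- time of the most recent backup of any type
    """
    if kind == "full":
        return changed_since(root, float("-inf"))  # everything
    if kind == "differential":
        return changed_since(root, last_full)  # all changes since the last full backup
    if kind == "incremental":
        return changed_since(root, last_any)  # only changes since the previous backup
    raise ValueError(f"unknown backup type: {kind}")
```

This also shows why restore times differ: a differential restore needs the last full backup plus one differential set, while an incremental restore needs the last full backup plus every incremental set taken since.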
Data Archiving Definition
● Provides a final version of your data
● Stored for the long-term
Data Archiving Considerations
● Location
● File formats
● Responsibility
● Accessibility
Storage: storage.rice.edu
● Location: Networked
● Storage quotas
  ○ Undergraduates: 2 GB
  ○ Graduates, Staff, Faculty: 5 GB
  ○ Colleges, Depts, Centers, Institutes: 40 GB
● Performance - Subject to network
● Accessibility
  ○ NetID folder: Private, not shared
  ○ Groups: Any Rice NetID holder by request
\\storage.rice.edu
Backup: storage.rice.edu
● Location: On-site
● Procedure: Full replication
● Frequency: Daily
● Retention
  ○ Personal access: 2 weeks
  ○ Request IT restoration: 6 months
\\storage.rice.edu\?-home\~snapshot
Backup: CrashPlan
● Availability: Rice-owned computers
● Cost: $82.56/year/person (up to 4 devices)
● Location: Off-site cloud storage
● Procedure: Incremental
● Frequency: Adjustable up to every minute
● Retention: Adjustable up to forever
CrashPlan PROe or crashplan.rice.edu
Sharing: Rice Dropbox/Dropoff
● Size: 1 GB per file, 2 GB per dropoff
● Retention: 10 days
● Accessibility
  ○ Inside users can share with anyone
  ○ Outside users can only share with inside users
dropbox.rice.edu
Product        Use   Location         Quota          Performance   Accessibility
Storage        S/B   Networked        2-5-40 GB      Network       NetID
Rice Box       S/C   US Cloud         Unlimited      Internet      NetID & External
Google Drive   S/C   Global Cloud     Unlimited      Internet      NetID & External
Crate          S     Networked        500 GB/award   Network       NetID
Archive        A     On-site tape     500 GB/award   Network       NetID
CrashPlan      B     Off-site cloud   Unlimited      Internet      Your NetID
Data Security
● Confidential (SSN, CC#, DL#)
  ○ Financial records
  ○ Health records
  ○ Education records
● Sensitive (Birth date, address, emergency contact, EID/SID)
Level   Security Classification   Rice On-Site (Most Secure)   Rice Cloud Contract (Semi-Secure)
0 Not secure Google Drive
1 Research Crate
2 Sensitive/ Confidential Rice Box
3 Financial/ HR Rice Dropbox CrashPlan
4 PII (SSN) Storage, Archive
Resources
● DataOne Primer on Data Management, https://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf
● Dataverse, Data Management Plans, http://best-practices.dataverse.org/data-management/
● ICPSR Guide to Social Science Data Preparation and Archiving, http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/
● Svend Juul et al, “Take good care of your data,” http://www.epidata.dk/downloads/takecare.pdf
● UK Data Archive, Managing and Sharing Data: Best Practices for Researchers, http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
Thanks!
Please contact researchdata@rice.edu with any questions.
Visit us online at http://researchdata.rice.edu/.
Help us shape future workshops! Please complete this evaluation: http://goo.gl/2luM63.