how to manage your data · 2015. 10. 2. · how to manage your data jean aroom, lisa spiro &...

How to Manage Your Data

Jean Aroom, Lisa Spiro & Kathy Weimer

October 2015

This workshop draws heavily on materials from the University of Minnesota Libraries, New England Collaborative Data Management Curriculum and DataOne.

Quick Poll: Raise Your Hand If You Have Ever...

● Forgotten what you called a file and/or where you put it

● Discovered unnecessary duplicates, then struggled over which to keep

● Lost data due to hardware failure, lost devices, etc.

Objectives for This Session

● Understand the importance of managing data.
● Create a basic data inventory.
● Name and organize your files effectively.
● Document your data to promote future use (including by you).
● Know options for storing, backing up and archiving your data.

Overview of Data Management Process

1. plan
2. organize
3. document
4. store, backup & archive

http://www.data-archive.ac.uk/create-manage/life-cycle

Why Managing Your Data Matters

What Is Research Data?

“the information recorded or produced in any form or media during the course of a research investigation” (Rice Research Data Management Policy)

Examples:
● experimental results
● lab notebooks
● protocols
● charts
● publications

Types of Research Data

● Raw data
  ○ Data generated during research project
● Existing data
  ○ E.g. Census data
● Processed data
  ○ Cleaned & formatted data
● Analyzed data
  ○ Summarized and analyzed data
● Finalized/published data
  ○ Data used to support a publication

NEDMC Module 2

"Data visualization process v1" by Farcaster, Wikipedia

Formats of Research Data

● text files
● spreadsheets
● lab notebooks
● survey responses
● audio/video
● maps/spatial data
● government stats
● images
● databases
● models
● software code
● ...

Why Is Managing Your Data Important?

● Keep track of your data, working more efficiently.
● Prevent data loss.
● Uphold standards of research integrity.
● Make it easier to share and re-use data.
● Meet funder, university & increasingly journal requirements.

New Transparency and Openness Promotion (TOP) Standards

Nosek et al, “Promoting an open research culture,” Science, 26 June 2015

1. PLAN

Typical Elements of a Data Management Plan: DMP Tool

https://dmptool.org/

Utilities for Keeping Track of Your Data

● List files in a directory using the command line, copy & paste, or free tools. From the command line:
  ○ Mac: ls > filenames.txt
  ○ PC: dir > filenames.txt

● Use tools such as dupeGuru to identify and address duplicates.
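The two utilities above can also be approximated in a short script. The following is a minimal Python sketch, not a replacement for dupeGuru: it lists every file under a folder (recursively, unlike plain `ls` or `dir`) and groups files whose contents are byte-for-byte identical. The function names `list_files`, `file_digest` and `find_duplicates` are illustrative, and matching on a SHA-256 hash is an assumption of this sketch rather than a description of how any particular tool works.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def list_files(root):
    """Return every file under root as a sorted list of relative paths."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


def file_digest(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files with identical contents; return only groups of two or more."""
    root = Path(root)
    by_digest = defaultdict(list)
    for name in list_files(root):
        by_digest[file_digest(root / name)].append(name)
    return [names for names in by_digest.values() if len(names) > 1]


if __name__ == "__main__":
    # Print the listing for the current folder, then any duplicate groups.
    for name in list_files("."):
        print(name)
    for group in find_duplicates("."):
        print("identical contents:", ", ".join(group))
```

Identical contents do not tell you which copy to keep; that decision still depends on names, locations and dates.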

Use a Data Inventory to Understand, Track & Share Your Data

Plan for, monitor & prepare to share your data by recording:
● what the dataset is
● who owns it
● where it is
● how important it is
● who can access it
● where it is stored and preserved

Data inventory template courtesy U of Minnesota Libraries
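For anyone who prefers to keep the inventory as a spreadsheet-readable file, here is a minimal Python sketch that records the six items listed above as CSV columns. It is not the University of Minnesota template; the class name `InventoryEntry`, the column names and the sample values are all hypothetical and chosen only to mirror the list.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryEntry:
    dataset: str       # what the dataset is
    owner: str         # who owns it
    location: str      # where it is
    importance: str    # how important it is
    access: str        # who can access it
    preservation: str  # where it is stored and preserved


def inventory_to_csv(entries):
    """Render inventory entries as CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(InventoryEntry)],
        lineterminator="\n",
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical example row, for illustration only.
    example = InventoryEntry(
        dataset="Bridge sensor readings",
        owner="J. Smith",
        location="lab workstation",
        importance="high",
        access="project team",
        preservation="network storage",
    )
    print(inventory_to_csv([example]))
```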

2. ORGANIZE

File Naming

Juliet: "What's in a name? That which we call a rose by any other name would smell as sweet."

Which file naming scheme works the best?

A. bridgedata1
   bridgedata2
   bridgedata3

B. bridge1_sensor2_02142013
   bridge1_sensor2_02152013
   bridge1_sensor2_02162013

C. madisonavebridge_sensor2_20130214
   madisonavebridge_sensor2_20130215
   madisonavebridge_sensor2_20130216

D. madisonavebridge_sensor2_feb142013
   madisonavebridge_sensor2_02152013
   madbridge_s2_feb162013

University of Minnesota Libraries

Common Ways to Organize Your Files: Structuring the Directory

Your directory structure should be predictable, make the contents of each folder easy to identify, and be organized by one of the following:

• Object type - e.g. photos, music, research

• Topical - e.g. transportation, buildings, demographic groups

• Organizational structure - e.g. Department, sub-unit, or by individual

Common Ways to Organize Your Files, Continued

• Project name or type

• Location/Geography - e.g. Country/State then subdivision of place

• Combination: structure your directory by a combination of any of the above (be consistent in the order!)

File Naming Best Practices

• Versions: list them alphanumerically, e.g. v1, v2, v3 (rather than textually, e.g. last, final, finalfinal, useTHISone)

• Special characters: some computers will not understand file names with UPPERCase letters, unusual characters (/ , . # ?), or spaces between words

• Date/time: yyyymmdd is preferred over forms such as Dec09

University of Minnesota Libraries

File Naming Best Practices, Continued

● Be Descriptive: 75092238.txt is not helpful. Instead use: 20120814_instrument8_rainyday_raw.txt (up to 255 characters)

● Don’t rely on nesting in folders: 2012/august/instrument8/day14/raw.txt

● Use consistent structure that falls into a useful order (for sorting) and decide on shared terminology

University of Minnesota Libraries
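The naming practices above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an official convention: the helper names `build_filename` and `is_safe_name` are hypothetical, the order of parts (date, description, version) is one possible choice, and the "safe" pattern simply encodes the advice to avoid uppercase letters, spaces and special characters.

```python
import re
from datetime import date

# Lowercase letters, digits, hyphens and underscores, then a single extension.
SAFE_NAME = re.compile(r"^[a-z0-9_-]+\.[a-z0-9]+$")


def build_filename(day, parts, version, extension):
    """Build a name like 20120814_instrument8_rainyday_raw_v1.txt."""
    stamp = day.strftime("%Y%m%d")  # yyyymmdd sorts chronologically
    # Lowercase each part and drop spaces and special characters.
    cleaned = [re.sub(r"[^a-z0-9]+", "", part.lower()) for part in parts]
    return "_".join([stamp, *cleaned, f"v{version}"]) + "." + extension.lower()


def is_safe_name(name):
    """Check a name against the pattern above and the 255-character limit."""
    return len(name) <= 255 and SAFE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    name = build_filename(date(2012, 8, 14), ["instrument8", "rainy day", "raw"], 1, "txt")
    print(name, is_safe_name(name))
    print("Meeting Notes Oct 23", is_safe_name("Meeting Notes Oct 23"))
```

Because the date comes first in yyyymmdd form, an ordinary alphabetical sort of these names is also a chronological sort.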

Good or Bad?

● Test_data_2013
● Project_Data
● Design for project.doc
● Lab_work_Eric
● Second_test
● Meeting Notes Oct 23

Example from Purdue Libraries, Data Management LibGuide

Good or Bad?

● 20130503_DOEProject_DesignDocument_Smith_v2-01.docx
● 20130709_DOEProject_MasterData_Jones_v1-00.xlsx
● 20130825_DOEProject_Ex1Test1_Data_Gonzalez_v3-03.xlsx
● 20130825_DOEProject_Ex1Test1_Doc_Gonzalez_v3-03.xlsx
● 20131002_DOEProject_Ex1Test2_Data_Gonzalez_v1-01.xlsx
● 20141023_DOEProject_ProjectMeetingNotes_Kramer_v1-00.docx

Exercise

Instructions: Review the handout, then partner with 2-3 people to decide on a file naming system in order to archive all files in one folder and sort by interviewee name.

3 minutes to discuss

University of Minnesota Libraries

Organizing and Naming Your Data

Instructions: Discuss a directory structure and file naming convention for your shared data.

University of Minnesota Libraries

3. DOCUMENT

Why Document Data?

● Makes it easier for you to interpret your own data

● Facilitates collaboration, sharing, and reuse

● Ensures successful long-term preservation of findings

New England Collaborative Data Management Curriculum

Data Documentation

● Who collected this data? Who/what were the subjects under study?
● What was collected, and for what purpose? What is the content/structure of the data?
● Where was this data collected? What were the experimental conditions?
● When was the data collected? Is it part of a series or ongoing experiment?
● Why was this experiment performed?

New England Collaborative Data Management Curriculum
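One lightweight way to answer these questions is a plain-text README stored next to the data. The sketch below is a hypothetical Python helper, not part of the curriculum: `render_readme` and the README layout are assumptions made for illustration. It refuses to produce a README until all five questions have an answer.

```python
QUESTIONS = ("who", "what", "where", "when", "why")


def render_readme(title, answers):
    """Return README text answering who, what, where, when and why.

    Raises ValueError if any of the five questions has no answer.
    """
    missing = [q for q in QUESTIONS if not answers.get(q, "").strip()]
    if missing:
        raise ValueError("missing documentation for: " + ", ".join(missing))
    lines = [title, "=" * len(title), ""]
    for question in QUESTIONS:
        lines.append(f"{question.upper()}: {answers[question].strip()}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder answers, for illustration only.
    print(render_readme("Example dataset", {
        "who": "collected by the project team",
        "what": "sensor readings, one row per measurement",
        "where": "field site",
        "when": "daily, part of an ongoing series",
        "why": "to monitor changes over time",
    }))
```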

Files to replicate Sean Bolks and Richard J. Stoll, “The Arms Acquisition Process: The Effect of Internal and External Constraints on Arms Race Dynamics,” The Journal of Conflict Resolution 44, no. 5 (October 1, 2000): 580–603.

File          Content
table1.dta    Stata data file with data for Table 1
table1.do     Stata .do file with commands to replicate Table 1
table2.dta    Stata data file with data for Table 2
table2.do     Stata .do file with commands to replicate Table

Study-level documentation

Data-level documentation

Variable-level documentation

Data Documentation Handout

4. STORE, BACKUP & ARCHIVE

Data Storage Definition

● The media (optical or magnetic) to which you save your data files and software.

● All storage media are vulnerable to risk and obsolescence.

● Storage media should be evaluated and updated every 2-5 years.

New England Collaborative Data Management Curriculum

Data Storage Considerations

● Location (Internal/External HD, Network, Remote)
● Disk size or storage quota
● Computing performance
● Accessibility

Data Backup Definition

● Allows you to restore your data if original data is lost or damaged due to:
  ○ Hardware or software malfunction
  ○ Environmental disaster (fire, flood)
  ○ Theft
  ○ Unauthorized access

New England Collaborative Data Management Curriculum

Data Backup Considerations

● Location (On-site, off-site)
● Procedure (Full, differential, incremental, mirror)
● Frequency (Hourly, daily, weekly, monthly)
● Retention (Months, years)
● Performance

TEST YOUR BACKUP PLAN!
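One simple way to test a backup is to compare it against the originals. The following Python sketch assumes the backup is an ordinary folder that mirrors the source folder's layout, which will not be true of every backup product; `sha256_of` and `verify_backup` are hypothetical helper names. It reports every source file that is missing from the backup or whose contents differ.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return relative paths missing from the backup or differing from the source."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        copy = backup_dir / relative
        if not copy.is_file() or sha256_of(copy) != sha256_of(source_file):
            problems.append(relative.as_posix())
    return problems
```

An empty list means every source file has an identical copy in the backup. A fuller test would also restore the files to a separate location and open them.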

Data Backup Summary

Backup type     Backed up                  Backup time   Restore time   Storage space
Full/snapshot   All data                   Slowest       Fast           High
Differential    All data since last full   Moderate      Moderate       Moderate
Incremental     Only new/modified files    Fast          Slowest        Lowest
Mirror          Only new/modified files    Fastest       Fastest        Highest
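The difference between full, differential and incremental backups comes down to which files each one copies. The Python sketch below illustrates that selection using modification times; it is a simplified model with hypothetical names (`files_to_back_up`, `last_full`, `last_backup`), it does not cover mirror backups, and real backup software may detect changes differently.

```python
def files_to_back_up(mtimes, kind, last_full, last_backup):
    """Choose which files a backup of the given kind would copy.

    mtimes      -- dict mapping file name to modification timestamp
    last_full   -- timestamp of the most recent full backup
    last_backup -- timestamp of the most recent backup of any kind
    """
    if kind == "full":
        return sorted(mtimes)  # everything, regardless of age
    if kind == "differential":
        cutoff = last_full     # changed since the last full backup
    elif kind == "incremental":
        cutoff = last_backup   # changed since the last backup of any kind
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, mtime in mtimes.items() if mtime > cutoff)


if __name__ == "__main__":
    # Hypothetical timeline: full backup at t=100, incremental backup at t=200.
    mtimes = {"a.csv": 50, "b.csv": 150, "c.csv": 250}
    for kind in ("full", "differential", "incremental"):
        print(kind, files_to_back_up(mtimes, kind, last_full=100, last_backup=200))
```

In this model an incremental backup copies the fewest files, which is why it is fast to create, but a restore needs the last full backup plus every incremental backup made since, which is why the table lists its restore time as slowest.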

Data Archiving Definition

● Provides a final version of your data
● Stored for the long-term

Data Archiving Considerations

● Location
● File formats
● Responsibility
● Accessibility

Storage: storage.rice.edu

● Location: Networked
● Storage quotas
  ○ Undergraduates: 2 GB
  ○ Graduates, Staff, Faculty: 5 GB
  ○ Colleges, Depts, Centers, Institutes: 40 GB
● Performance - Subject to network
● Accessibility
  ○ NetID folder: Private, not shared
  ○ Groups: Any Rice NetID holder by request

\\storage.rice.edu

Backup: storage.rice.edu

● Location: On-site
● Procedure: Full replication
● Frequency: Daily
● Retention
  ○ Personal access: 2 weeks
  ○ Request IT restoration: 6 months

\\storage.rice.edu\?-home\~snapshot

Backup: CrashPlan

● Availability: Rice-owned computers
● Cost: $82.56/year/person (up to 4 devices)
● Location: Off-site cloud storage
● Procedure: Incremental
● Frequency: Adjustable up to every minute
● Retention: Adjustable up to forever

CrashPlan PROe or crashplan.rice.edu

Sharing: Rice Dropbox/Dropoff

● Size: 1 GB per file, 2 GB per dropoff
● Retention: 10 days
● Accessibility
  ○ Inside users can share with anyone
  ○ Outside users can only share with inside users

dropbox.rice.edu

Product        Use   Location         Quota          Performance   Accessibility
Storage        S/B   Networked        2-5-40 GB      Network       NetID
Rice Box       S/C   US Cloud         Unlimited      Internet      NetID & External
Google Drive   S/C   Global Cloud     Unlimited      Internet      NetID & External
Crate          S     Networked        500 GB/award   Network       NetID
Archive        A     On-site tape     500 GB/award   Network       NetID
CrashPlan      B     Off-site cloud   Unlimited      Internet      Your NetID

Data Security

● Confidential (SSN, CC#, DL#)
  ○ Financial records
  ○ Health records
  ○ Education records
● Sensitive (Birth date, address, emergency contact, EID/SID)

Service categories: Rice On-Site (Most Secure); Rice Cloud Contract (Semi-Secure)

Level   Security Classification   Services
0       Not secure                Google Drive
1       Research                  Crate
2       Sensitive/Confidential    Rice Box
3       Financial/HR              Rice Dropbox, CrashPlan
4       PII (SSN)                 Storage, Archive

Resources

● DataOne Primer on Data Management, https://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf
● Dataverse, Data Management Plans, http://best-practices.dataverse.org/data-management/
● ICPSR Guide to Social Science Data Preparation and Archiving, http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/
● Svend Juul et al, “Take good care of your data,” http://www.epidata.dk/downloads/takecare.pdf
● UK Data Archive, Managing and Sharing Data: Best Practices for Researchers, http://www.data-archive.ac.uk/media/2894/managingsharing.pdf

Thanks!

Please contact researchdata@rice.edu with any questions.
Visit us online at http://researchdata.rice.edu/.
Help us shape future workshops! Please complete this evaluation: http://goo.gl/2luM63.
