data management for librarians: an introduction

Data Management for Librarians: An Introduction. February 19th 2013. Gareth Knight, Manager, RDM Support Service.

Upload: garethknight

Post on 24-Jan-2015


DESCRIPTION

Slides from a training session given to librarians on data management. The session was intended to help librarians consider the challenges associated with maintaining research data and the steps that may be taken to address them. It was also used to discuss their role in supporting data management activities within LSHTM.

TRANSCRIPT

Page 1: Data Management for Librarians: An Introduction

Data Management for Librarians: An Introduction

February 19th 2013

Gareth Knight, Manager, RDM Support Service

Page 2: Data Management for Librarians: An Introduction

What is Data?

What type of data do you have at home?

May originate from various sources: Primary and/or secondary
May contain different content: Quantitative and/or qualitative
May be expressed in different forms: Datasets, still images, audio-video, audio recordings, interactive resources
May be held in a number of variations: Raw, cleaned, anonymised/pseudonymised, analysed
May be encoded in different formats: MS Excel, TIFF, MPEG2, STATA, FoxPro

“Data are facts, observations or experiences on which an argument, theory or test is based. Data may be numerical, descriptive or visual. Data may be raw or analysed, experimental or observational.” http://research.unimelb.edu.au/integrity/conduct/data/review

Page 3: Data Management for Librarians: An Introduction

Data in the Research Lifecycle

Brainstorm

Develop Proposal

Plan Project

Perform Research

Write‐up Results

Finalise & submit

Page 4: Data Management for Librarians: An Introduction

Data in the Research Lifecycle

Brainstorm

Develop Proposal

Plan Project

Perform Research

Write‐up Results

Finalise & submit

Develop Proposal

Produce Data Management Plan

Page 5: Data Management for Librarians: An Introduction

Data in the Research Lifecycle

Brainstorm

Develop Proposal

Plan Project

Perform Research

Write‐up Results

Finalise & submit

Perform Research

Create / Reuse

Analyse

Store

Describe

Share

Page 6: Data Management for Librarians: An Introduction

Data in the Research Lifecycle

Brainstorm

Develop Proposal

Plan Project

Perform Research

Write‐up Results

Finalise & submit

Perform Research

Create / Reuse

Analyse

Store

Describe

Share

Share

Finalise & submit

Archive

Page 7: Data Management for Librarians: An Introduction

What is Data Management?

1. Plan
• Determine requirements
• Identify risks & opportunities
• Decide approach

2. Implement

3. Monitor
• Evaluate approach
• Change approach/perform corrective action

4. Evaluate
• Is it fit for purpose?
• What additional action is needed?

‘Benign neglect’ and poorly made decisions in the short term will have long-term implications

Page 8: Data Management for Librarians: An Introduction

Short-term decisions with long-term implications

• Software products
• File formats & standards
• Data organisation & labelling
• Quality controls

Page 9: Data Management for Librarians: An Introduction

Why does data need to be managed?

• Ensure data can be located
• Enable analysis
• Ability to understand for current and future need
• Enable sharing & validation

“Interesting paper. Where’s the data?”

Page 10: Data Management for Librarians: An Introduction

Why does data need to be managed?

• Ensure data can be located
• Enable analysis
• Ability to understand for current and future need
• Enable sharing & validation
• Comply with Funder & School requirements

“Interesting paper. Where’s the data?”

Page 11: Data Management for Librarians: An Introduction

Researcher Challenges

Issues/challenges encountered when creating, managing, and sharing research data (web survey results)

Response type: Multiple choice checkbox + free text for other challenges

Other challenges
• Database creation & management
• Storage of physical questionnaires
• Lack of time
• Software instability (particularly NVivo)
• Ability to enter & access data at different locations

Page 12: Data Management for Librarians: An Introduction

Training Needs

Interest in training on topics related to data management (web survey results)

Note: Graph omits percentages for other responses (None, slight, moderate, no opinion)

Page 13: Data Management for Librarians: An Introduction

RDM Support Service

Location of Library staff

Page 14: Data Management for Librarians: An Introduction

RDM Support Service

Location of Library staff

Role of Library staff

Provide first point of contact

Help researchers to express requirements & needs

Direct to potential solution (staff, website)

Contribute to training activities

Incorporate data considerations into teaching

Page 15: Data Management for Librarians: An Introduction

Data Access Over Time: digital vs. analogue

[Diagram: information content = data + computer + OS + application]

“traditionally, preserving things meant keeping them unchanged; however … if we hold on to digital information without modifications, accessing the information will become increasingly more difficult, if not impossible.” Su-Shing Chen, 2001

Page 16: Data Management for Librarians: An Introduction

Change in Process over Time

[Diagram labels: hardware, operating system, software application, information content. Examples shown: Intel PC, 2000; Mac laptop, 2006; X64 Ubuntu laptop, 2010]

Page 18: Data Management for Librarians: An Introduction

Task

• Select two of the following problems when managing digital data:
1. Difficulty locating data
2. Difficulty accessing media
3. Difficulty rendering data in an understandable form
4. Difficulty recreating data as originally intended
5. Difficulty understanding information content
6. Uncertain provenance

Consider the following questions:
a. In what circumstances will the chosen problem occur?
b. What consequences may occur if the problem occurs (e.g. financial implications)?
c. How could you ensure that the problem doesn’t occur?
d. What could you do to resolve the problem after it has occurred? (Can direct to someone for help)

Page 19: Data Management for Librarians: An Introduction

1. Difficulty Locating Data

Problem
“I created some data 5 years ago. Where is it?”
“I’ve lost my original disk. Do I have the data elsewhere?”

Scenarios & Reasons
• Loss of storage media
• Lots of data stored in many locations
• Vague filenames make it difficult to locate

(Potential) Solutions
Preventative:
• Copy data to several storage devices to increase the likelihood of finding it
Post event:
• Find better discovery software?
• Attempt to recreate content?

Page 20: Data Management for Librarians: An Introduction

2. Difficulty Accessing Media

Problem
“How do I access this old media?”
“Why can’t I read this disk?”

Scenarios & Reasons
• Media obsolescence
• Physical deterioration & failure

(Potential) Solutions
Preventative:
• Copy data to several storage devices
• Transfer data to new storage media on obsolescence / every 3 years
• Deposit data into a data archive and/or copy to server
Post event:
• Data recovery software
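The preventative step of copying data to several storage devices can be scripted. The following is a minimal Python sketch, not part of the original slides: the function names and folder layout are hypothetical, and the checksum comparison is an added assumption about how a copy might be verified.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_to_locations(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `source` into each destination folder and verify each copy.

    Returns a mapping from the path of each copy to True when its
    checksum matches the original file.
    """
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy_path = folder / source.name
        shutil.copy2(source, copy_path)  # copy2 also keeps file timestamps
        results[copy_path] = sha256_of(copy_path) == expected
    return results
```

In practice the destination folders would sit on different devices (for example an external drive and a network share); a copy whose checksum does not match should be treated as failed and repeated.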

Page 21: Data Management for Librarians: An Introduction

Potential Storage Locations

Local machine & Storage
Pros: Cheap, high capacity storage, fast access
Cons: Lack of support; potential for theft, loss, or damage

Academic Storage Systems
Pros: Automatic monitoring & backup, multiple redundancy, remote access, secure (if required)
Cons: Limited space allocation, not always accessible overseas

Third party service providers
Pros: Automated backup, accessible in different countries (usually)
Cons: Security concerns, ownership concerns, services can close account at any time

http://www.flickr.com/photos/m0n0/4479450696/

Recommended

Page 22: Data Management for Librarians: An Introduction

3. Difficulty Rendering Data

Problem
“How can I view data?”
“Where do I find software to access my data?”

Scenarios & Reasons
• Software obsolescence
• New software uses a different decoding method

(Potential) Solutions
Preventative:
• Transform data to new formats (format conversion strategy)
• Maintain original machine and software to access content (computer museum)
Post event:
• Track down original software product
• Emulate original environment (emulation/virtualisation)

Page 23: Data Management for Librarians: An Introduction

Choosing File Formats

Creation, Preservation, Dissemination

When working with multiple copies, decide which is the master copy

Content Type    Preferred Format                   Acceptable Alternatives
Documents       Rich Text Format                   Microsoft DocX, Open Document Format
Still Images    TIFF, JPEG 2000 (uncompressed)     PNG, RAW
Audio           WAV, AIFF, FLAC                    MP3
Audio-Video     MPEG2, MPEG4
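A format table like the one above can be turned into a simple check on a folder of files. The Python sketch below is illustrative only: the mapping from format names to file extensions is an assumption, as is the reading of which formats sit in the preferred and acceptable columns, and `classify_format` is a hypothetical helper.

```python
from pathlib import Path

# Assumed extension lists derived from the slide's format table.
# Both the extensions and the column assignment are assumptions.
PREFERRED = {".rtf", ".tif", ".tiff", ".jp2", ".wav", ".aif", ".aiff",
             ".flac", ".mpg", ".mpeg", ".mp4"}
ACCEPTABLE = {".docx", ".odt", ".png", ".raw", ".mp3"}


def classify_format(filename: str) -> str:
    """Classify a file as 'preferred', 'acceptable' or 'review' by extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in PREFERRED:
        return "preferred"
    if suffix in ACCEPTABLE:
        return "acceptable"
    # Anything else is flagged for a person to look at.
    return "review"
```

A file extension is only a hint about the real encoding, so a result of "preferred" should not be taken as proof that the file is valid.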

Page 24: Data Management for Librarians: An Introduction

4. Difficulty Maintaining Authenticity

Problem
“Why does my data look different?”

Scenarios & Reasons
• New version of software application uses a different decoding method
• Different software application in use

(Potential) Solutions
Preventative:
• Determine significant properties that should be maintained
• Maintain original machine and software to access content (computer museum)
Post event:
• Emulate original environment (emulation/virtualisation)

Page 25: Data Management for Librarians: An Introduction

5. Difficulty Understanding Content

Problem
“Where was this information created?”
“Why did the creator make this decision?”
“What does this value mean?”
“How does this data relate to other content?”

Scenarios & Reasons
• Memory fails: cannot remember decisions made
• Disorganised and poorly labelled data
• Lack of documentation

Does a Rosetta stone exist for your data?

(Potential) Solutions
• Organise data (chronology, experiment type, location, content type)
• Adopt labelling conventions
• Documentation

Page 26: Data Management for Librarians: An Introduction

Filename conventions

• Consider the elements that will help you to organise and locate content
  – E.g. Participant ID, site of data collection, date of data collection
• Consider how data files and directories may be organised & sorted
  – 001, 002, 003, 004, can be used for sequential files
  – YYYY-MM-DD (2012-12-04) useful for organising by date (use year first)
• Identify different versions of content in filename (and in content)
  – Creation date (YY-MM-DD)
  – Version/draft number
• Consider how your filenames will look to others
  – Avoid spaces: ‘My file.pdf’ becomes ‘My%20file.pdf’ on the web
  – Avoid capitalisation: alters file sorting & CAUSES HEADACHES!

Golden Rule: Be Consistent
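One way to stay consistent is to generate filenames from their elements rather than typing them by hand. The following Python sketch applies the conventions above (date first, zero-padded numbers, no spaces, lower case). The element order, the `p`/`v` prefixes and the function name are assumptions chosen for illustration.

```python
import re
from datetime import date


def build_filename(participant_id: int, site: str, collected: date,
                   version: int, extension: str) -> str:
    """Build a sortable filename: date_site_participant_version.ext."""
    # Lower-case the site and replace spaces/punctuation with hyphens.
    site_part = re.sub(r"[^a-z0-9]+", "-", site.lower()).strip("-")
    ext = extension.lower().lstrip(".")
    # isoformat() gives YYYY-MM-DD, so files sort chronologically by name.
    return (f"{collected.isoformat()}_{site_part}"
            f"_p{participant_id:03d}_v{version:02d}.{ext}")


print(build_filename(7, "North Clinic", date(2012, 12, 4), 2, "CSV"))
# prints: 2012-12-04_north-clinic_p007_v02.csv
```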

Page 27: Data Management for Librarians: An Introduction

Data Documentation

1. What is the context of creation?
• Why did you create it? For what purpose?
• What methodology did you use? What assumptions were made?
• Who is the target audience?

2. Collection and set of files:
• What information does each file contain?
• When was it created?
• By whom?
• What actions were performed?
• How does the data contained in the collection relate to each other?

3. Individual components
• What is the meaning of this word/column/row, etc.?
• How are these items measured?
• What are the boundaries of the measurement?

What would someone want to know if they were looking at your data for the first time?
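The three levels of documentation can be captured in a short plain-text README kept alongside the data. The Python sketch below is one possible layout, not a standard: the field names and the `render_readme` function are hypothetical.

```python
def render_readme(title: str, purpose: str, methodology: str,
                  files: dict[str, str], variables: dict[str, str]) -> str:
    """Render context, file-level and variable-level notes as plain text."""
    lines = [
        f"Title: {title}",
        f"Purpose: {purpose}",          # 1. context of creation
        f"Methodology: {methodology}",
        "",
        "Files:",                       # 2. collection and set of files
    ]
    lines += [f"  {name}: {description}" for name, description in files.items()]
    lines += ["", "Variables:"]         # 3. individual components
    lines += [f"  {name}: {meaning}" for name, meaning in variables.items()]
    return "\n".join(lines) + "\n"


readme = render_readme(
    title="Example clinic survey",
    purpose="Illustrative record only",
    methodology="Structured questionnaire",
    files={"survey_v01.csv": "One row per participant"},
    variables={"age_years": "Age at interview, whole years (18 to 99)"},
)
print(readme)
```

The values in the usage example are invented placeholders; a real README would also record who created each file and when.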

Page 28: Data Management for Librarians: An Introduction

6. Uncertain Provenance

Problem
1. “When was the data created and/or modified?”
2. “Who created/modified the data?”
3. “Why was it created and/or modified?”

Scenarios & Reasons
• Lack/Loss of trust in information content
• Reluctance to use information content

(Potential) Solutions
Preventative:
• Limit update to authorised users only
• Store change history
• Keep each version
Post event:
• Locate data creator & editor?
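Storing a change history and keeping each version can be done with a small script when no version control system is available. This Python sketch is an illustration under stated assumptions: the snapshot naming scheme, the `change_log.csv` file and the `record_version` function are all hypothetical. Each log row answers the when, who and why questions above.

```python
import csv
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def record_version(data_file: Path, archive_dir: Path,
                   editor: str, reason: str) -> Path:
    """Save a numbered snapshot of `data_file` and append a log entry."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Count existing snapshots of this file to pick the next version number.
    existing = list(archive_dir.glob(f"{data_file.stem}_v*{data_file.suffix}"))
    version = len(existing) + 1
    snapshot = archive_dir / f"{data_file.stem}_v{version:03d}{data_file.suffix}"
    shutil.copy2(data_file, snapshot)

    checksum = hashlib.sha256(snapshot.read_bytes()).hexdigest()
    log_path = archive_dir / "change_log.csv"
    is_new_log = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new_log:
            writer.writerow(["timestamp_utc", "file", "editor", "reason", "sha256"])
        writer.writerow([datetime.now(timezone.utc).isoformat(),
                         snapshot.name, editor, reason, checksum])
    return snapshot
```

The checksum column lets a later reader confirm that a snapshot has not changed since it was logged. Restricting updates to authorised users still depends on file permissions, which this sketch does not set.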

Page 29: Data Management for Librarians: An Introduction

Things to Recommend

Advise researchers to:

1. Choose an appropriate storage location and create backups

2. Organise data in a consistent and logical manner

3. Document the data and information content (as well as structure)

4. Consider how you will ensure that information can be accessed in the long‐term

5. Consider potential for data sharing and ensure it is performed with consideration of ethics  

Page 30: Data Management for Librarians: An Introduction

A Few Good References

• Digital Curation Centre
http://www.dcc.ac.uk/resources
• MANTRA: Data Management training for PhD students
http://datalib.edina.ac.uk/mantra/
• UK Data Archive: Managing and Sharing Data
http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
• Cambridge University: RDM Guidance
http://www.lib.cam.ac.uk/dataman/index.html
• Australian National Data Service
http://ands.org.au/resource/data-management-planning.html
• LSHTM Research Data Management Support Service
http://blogs.lshtm.ac.uk/rdmss/