What is data and why should you care?

What is data and why should you care? Dr. Kalpana Shankar School of Information and Library Studies, UCD 5 November 2012

TRANSCRIPT

Page 1: What is data and why should you care?

What is data and why should you care?

Dr. Kalpana Shankar, School of Information and Library Studies, UCD

5 November 2012

Page 2: What is data and why should you care?

What do Apollo 11, the Domesday Project, and award-winning scientists from the US National Science Foundation have in common?

Page 3: What is data and why should you care?

What is research data?

“The data, records, files or other evidence, irrespective of their content or form (e.g. in print, digital, physical or other forms), that comprise research observations, findings or outcomes, including primary materials and analysed data.” – Australian National Data Service

Examples:

• Statistics and measurements

• Results of experiments or simulations

• Observations, e.g. fieldwork

• Survey results – print or online

• Interview recordings and transcripts

• Images, from cameras and scientific equipment

Page 4: What is data and why should you care?

What is ‘data’?

Any information you use in your research

Page 5: What is data and why should you care?

“PhD students lose material all the time…and they are exactly the people who want to be backing up. These are people who are creating data which are life and death important to them”

Why are we talking about data management?

“The whole thing is incredibly dull.”

Page 6: What is data and why should you care?

Rising volume and complexity of research data

• According to the European Bioinformatics Institute, the volume of new biological data is doubling every 5 months

• For example, in genomics:

– We can now analyse the equivalent of a human genome every 14 minutes at a cost of $5,000, which is 400 times quicker than when the draft human genome was first published in 2000.

– 1,000 Genomes Project: 200 terabytes, the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs
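The figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes the doubling time stays constant at 5 months, uses decimal units (1 TB = 1000 GB), and takes 4.7 GB as the nominal capacity of a single-layer DVD. The function names are made up for this example.

```python
def growth_factor(months, doubling_months=5.0):
    """Factor by which a volume grows after `months`, assuming a constant doubling time."""
    return 2 ** (months / doubling_months)


def dvds_needed(terabytes, dvd_gb=4.7):
    """Number of DVDs needed to hold `terabytes` of data (assumes 1 TB = 1000 GB)."""
    return terabytes * 1000 / dvd_gb


if __name__ == "__main__":
    print(f"Growth over 1 year:  x{growth_factor(12):.1f}")
    print(f"Growth over 5 years: x{growth_factor(60):,.0f}")
    print(f"DVDs for 200 TB:     {dvds_needed(200):,.0f}")
```

Under these assumptions, a volume that doubles every 5 months grows a little more than fivefold in a year, and 200 terabytes comfortably exceeds 30,000 DVDs.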

Page 7: What is data and why should you care?

A hard drive after 6 years’ research

113 GB, 42,699 files, 3,466 folders

Image by Lindsay Lloyd-Smith
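If you want the same three numbers for your own drive or project folder, a minimal sketch along these lines would do it. The function name `inventory` is hypothetical, symbolic links are skipped, and files that cannot be read are ignored rather than reported.

```python
import os
from pathlib import Path


def inventory(root):
    """Return (file_count, folder_count, total_bytes) for everything under root."""
    files = folders = total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        folders += len(dirnames)
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # do not count links as separate files
            try:
                size = path.stat().st_size
            except OSError:
                continue  # unreadable file: skip it
            files += 1
            total += size
    return files, folders, total
```

For example, `files, folders, size = inventory("/path/to/project")` followed by `print(files, folders, size / 1e9, "GB")` gives a quick picture of how much material has accumulated; the path here is a placeholder.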

Page 8: What is data and why should you care?

So, why is data management important for research?

• It is increasingly integral to all areas of research

• It is a rapidly escalating issue

• It is important to research funders, and increased follow-up is likely in the future

• It has major resource implications – which need to be planned for carefully

• In short, it creates major challenges which aren’t going to go away!

Page 9: What is data and why should you care?

“Fire” by andrewmalone via Flickr: http://www.flickr.com/photos/andrewmalone/2032844649/

What would happen to your data if there was a fire or theft in your office, department or home?

Why data management is important to YOU (II)

Page 10: What is data and why should you care?

Writing a Data Management Plan

1. Formalises the definition of your research data

2. Documents the contextual and technical details of your data

3. Checks your file structure and naming

4. Plans for data sharing, access, and archiving

Page 11: What is data and why should you care?

• Your Data Management Plan won’t be perfect

• It is not a static document

– Change and update it as your research progresses and you understand more about your data

• Think about key issues that might affect your data…

o …while you work on them

o …in the future

• It’s better to have a plan that covers some aspects than no plan at all

• Ask for advice if you’re uncertain

Getting started

Page 12: What is data and why should you care?

Questions to ask yourself

• Platform: Windows, Macintosh and/or Unix?

• Objective: Store? Manage? Share? Publish?

• Extent of collaboration
– Your research group/lab only
– Your group + externals
– Cast of thousands?

• Nature of data?
– Level of security?
– Human records (de-identified)?
– Intellectual Property?

• Amount of data? MB? GB? TB?
– Rate of accumulation of data?
– How much needed online to do useful work?
– Period of preservation?

Page 13: What is data and why should you care?

Give your data a structure…

By Anne (Flickr ID: I like): “Voltaire & Rousseau” http://www.flickr.com/photos/ilike/2616342739/ CC BY-NC-ND 2.0

By twechy (Flickr ID): “Library Bookshelf” http://www.flickr.com/photos/twechy/6829994084/ CC BY 2.0

…it makes it easier to find things

Page 14: What is data and why should you care?

Something to try:

Use post-it notes to create a map of your file structure

• Write each existing file and folder name onto a post-it

• Arrange folders on your desk in a sensible hierarchy

• Put your ‘files’ into ‘folders’

• Do you need new folders? Do you have too many?
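If you would rather start the post-it exercise from a printout of what you already have, something like the following sketch could list your current structure. It is only an illustration: the function name `tree_lines` is made up, folders are listed before files, and symbolic links to folders are not followed.

```python
from pathlib import Path


def tree_lines(root, indent=""):
    """Return an indented listing of root: folders first, then files, alphabetically."""
    lines = []
    entries = sorted(Path(root).iterdir(), key=lambda p: (p.is_file(), p.name.lower()))
    for entry in entries:
        if entry.is_dir() and not entry.is_symlink():
            lines.append(f"{indent}{entry.name}/")
            # Recurse into the folder with one extra level of indentation.
            lines.extend(tree_lines(entry, indent + "    "))
        else:
            lines.append(f"{indent}{entry.name}")
    return lines
```

Calling `print("\n".join(tree_lines("my_project")))` (with your own folder name in place of the placeholder) prints one line per file or folder, which you can then copy onto post-its and rearrange.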

Page 15: What is data and why should you care?

What’s in a name?

• Names tell us what a file is (contextual information)

• Use a combination of different types of information to make context and content clear, e.g.
– Author (or initials)
– Date
– Data source
– Theme
– Experiment
– Sample

• …But try not to let file names get too long
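One way to apply a naming convention consistently is to generate names rather than type them. The sketch below is one possible convention, not a standard: the ordering (date, initials, source, theme), the underscore separator, and the 60-character limit are all assumptions chosen for illustration. Writing the date as YYYYMMDD has the practical advantage that files sort chronologically.

```python
import re
from datetime import date


def make_filename(initials, when, source, theme, extension, max_length=60):
    """Build a name of the form DATE_INITIALS_SOURCE_THEME.ext (hypothetical convention)."""

    def clean(text):
        # Keep letters and digits; replace any other run of characters with a hyphen.
        return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")

    parts = [when.strftime("%Y%m%d"), clean(initials), clean(source), clean(theme)]
    name = "_".join(part for part in parts if part) + "." + extension.lstrip(".")
    if len(name) > max_length:
        raise ValueError(f"File name is {len(name)} characters; the limit is {max_length}")
    return name


if __name__ == "__main__":
    # Prints: 20121105_KS_field-survey_soil-pH.csv
    print(make_filename("KS", date(2012, 11, 5), "field survey", "soil pH", "csv"))
```

The length check reflects the last point above: the function refuses to produce a name that has grown too long instead of silently truncating it.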

Page 16: What is data and why should you care?

Why create documentation?

• Creating documentation might seem like a waste of time

• Good documentation will include a lot of information that might seem obvious

www.flickr.com/photos/smutjespickles/2434418686/

Page 17: What is data and why should you care?

Document your data as you go

If you don’t, it may become impossible for you – or someone else – to understand and re-use data later on

Question Mark Sign by Colin_K on Flickr: http://www.flickr.com/photos/colinkinner/2200500024/

Page 18: What is data and why should you care?

What’s obvious now might not be in a few months, years, decades…

Image: http://www.flickr.com/photos/archer10/5692813531/

MAKE SURE YOU CAN UNDERSTAND IT LATER

Make research material understandable

Page 19: What is data and why should you care?

Make research reproducible

• Detailing your methodology helps people understand your research better

• Explaining your algorithms, search methods etc. makes your work reproducible

• Conclusions can be verified

Image by woodleywonderworks on flickr: http://www.flickr.com/photos/wwworks/4588700881/

Page 20: What is data and why should you care?

• Material may be re-used by someone in a different discipline

• Provide context to minimise the risk of it being misunderstood or misused

Make material reusable

Page 21: What is data and why should you care?

Backing up

• Lots Of Copies Keep Stuff Safe (LOCKSS): make multiple back-ups

• Keep back-ups in a separate place from the original

• Use different types of storage media, e.g. CDs, pen drives, networked storage, external hard drive
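The advice above can be partly automated. The following is a minimal sketch, not a complete backup system: it copies one file to several destination folders (which you would point at different media or locations) and compares SHA-256 checksums to confirm each copy matches the original. The function names `sha256_of` and `back_up` are made up for this example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy source into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps where possible
        if sha256_of(target) != expected:
            raise IOError(f"Copy at {target} does not match the original")
        copies.append(target)
    return copies
```

A call such as `back_up("thesis.docx", ["/media/usb/backup", "/mnt/network/backup"])` uses placeholder paths; substitute wherever your external drive and networked storage are actually mounted.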

From: “Copy Copy Copy” by David Goehring (CarbonNYC) via flickr

Page 22: What is data and why should you care?

For everything you keep….

Make sure you can:

• find it again later

• understand it later

Page 23: What is data and why should you care?

Where to get help

• Your supervisor

• Library

• Funding agencies

• Earth Institute will be putting up links on its website

Page 24: What is data and why should you care?

Oh yes…what do Apollo, the Domesday Project, and award-winning scientists from the US National Science Foundation have in common?

Page 25: What is data and why should you care?

Questions?

• My contact information:
– Kalpana Shankar ([email protected])