RDM LIASA webinar
Presentation given by Sarah Jones and Joy Davidson to a group of South African librarians at a webinar organised by LIASA HELIG. http://www.liasa.org.za/node/977
Digital curation: why managing and sharing data matters to universities
Sarah Jones and Joy Davidson
Digital Curation Centre
a LIASA HELIG webinar, 30th April 2013, www.liasa.org.za/node/977
Digital Curation Centre
Jisc-funded consortium comprising units from the Universities of:
– Bath (UKOLN)
– Edinburgh (DCC Centre)
– Glasgow (HATII)
Launched 1st March 2004 as a national centre for solving challenges in digital curation that could not be tackled by any single institution or discipline
Overview of session: four brief modules
1. Introduction to digital curation – how does research data management fit into the curation lifecycle?
2. Benefits and drivers for research data management
3. Review of current research data management activity in UK Universities
4. What role does the library have to play in research data management?
Please feel free to ask questions at any time!
• During the session you can ask questions. Simply type these into the chat box.
• Questions will be gathered and speakers will respond to selected questions at the end of each module.
• There will be a chance for additional questions at the end of the session.
DIGITAL CURATION, PRESERVATION AND RESEARCH DATA MANAGEMENT – AN INTRODUCTION
An introduction to digital curation
• What is digital curation?
• What is the difference between curation, preservation and data management?
• What sort of activities are involved in digital curation?
• Who should be involved in digital curation?
“the active management and appraisal of data over the lifecycle of scholarly and scientific interest”
Data have importance as the evidential base of scholarly conclusions
Curation is part of good research practice
What is data curation?
Are data curation, preservation and management different?
• Lots of different terms being used - are they the same or different?
• Essentially, they are all part of the curation lifecycle
Curation Lifecycle Model
Key questions to consider:
• what data will be created?
• how much storage is needed?
• where will data be stored in the short and longer term?
• are there ethical issues that require consent?
Many funders expect data management & sharing plans at the grant application stage!
Data Management Planning
Key questions to consider:
What information do users need to understand the data?
- descriptions of all variables / fields and their values
- code labels, classification schema, abbreviations list
- information about the project and data creators
- tips on usage e.g. exceptions, quirks, questionable results
How will this be captured, and who will capture/record it?
Are there standards that need to be followed?
Metadata & documentation
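The documentation items listed on this slide can be captured in a simple machine-readable data dictionary. The following is a minimal sketch only: the dataset, variable names, codes and helper functions are all hypothetical and are not taken from the presentation or from any particular metadata standard.

```python
# Hypothetical data dictionary for an illustrative survey dataset.
# Every name and value here is invented for the example.
data_dictionary = {
    "project": {
        "title": "Example household survey",
        "creators": ["A. Researcher"],
    },
    "variables": {
        "age_group": {
            "description": "Respondent age band",
            "codes": {1: "18-29", 2: "30-49", 3: "50+"},
        },
        "region": {
            "description": "Region of residence",
            "codes": {"N": "North", "S": "South"},
        },
    },
    "abbreviations": {"HH": "household"},
    "usage_notes": ["Wave 1 responses used a shorter questionnaire."],
}


def undocumented_columns(columns, dictionary):
    """Return the column names that have no entry in the data dictionary."""
    documented = dictionary["variables"]
    return [name for name in columns if name not in documented]


def decode(variable, value, dictionary):
    """Translate a stored code into its human-readable label."""
    return dictionary["variables"][variable]["codes"][value]
```

A check such as `undocumented_columns(["age_group", "region", "income"], data_dictionary)` flags fields that still need describing before the data are shared.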
Key questions to consider:
• What data must be kept? (for validation, etc)
• What must not be kept? (e.g. personal data)
• Is it worth keeping the data? – cost/benefits
• Where will the data be kept?
Selecting what to keep
Storing data
Key questions to consider:
What amount of storage is available for the active phase?
What facilities are needed in the active phase?
- remote access to work from home
- file sharing with others
- high levels of security for sensitive data
How will the data be backed up?
Where will data be stored for the longer-term?
Institutional data repositories
Not intended to replace national, subject or other established data collections
Acknowledge hybrid environment
http://datashare.is.ed.ac.uk
www.dspace.cam.ac.uk/
https://databank.ora.ox.ac.uk
Essex-RDR and DataPool at Southampton
External data centres
Research funders’ data centres…
List of data centres: http://databib.org
Structured databases
Disciplinary & community initiatives
Finding and reusing data
Key questions to consider:
How can researchers make their data visible and citable?
Data catalogues
Develop a research data extension to the CERIF standard
JISC & DCC planning national coordination
http://cerif4datasets.wordpress.com
Who should be involved in curation?
Research Organisations
Funders
Data centres
Advisory bodies
Support services
Researchers
Publishers
BENEFITS AND DRIVERS – THE UK POLICY LANDSCAPE
“Data sets are becoming the new instruments of science”
Dan Atkins, University of Michigan
Digital data as the new special collections?
Sayeed Choudhury, Johns Hopkins
Research data: institutional crown jewels?
http://www.flickr.com/photos/lifes__too_short__to__drink__cheap__wine/4754234186/
Expectations of public access
“Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.”
RCUK Common Principles on Data Policy
http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
http://www.bis.gov.uk/innovatingforgrowth
…open data
...personal data
Benefits of data sharing (1)
www.nytimes.com/2010/08/13/health/research/13alzheimer.html?pagewanted=all&_r=0
“It was unbelievable. It's not science the way most of us have practiced in our careers. But we all realised that we would never get biomarkers unless all of us parked our egos and intellectual property noses outside the door and agreed that all of our data would be public immediately.”
Dr John Trojanowski, University of Pennsylvania
... scientific breakthroughs
Benefits of data sharing (2)
www.guardian.co.uk/politics/2013/apr/18/uncovered-error-george-osborne-austerity
... validation of results
“It was a mistake in a spreadsheet that could have been easily overlooked: a few rows left out of an equation to average the values in a column.
The spreadsheet was used to draw the conclusion of an influential 2010 economics paper: that public debt of more than 90% of GDP slows down growth. This conclusion was later cited by the International Monetary Fund and the UK Treasury to justify programmes of austerity that have arguably led to riots, poverty and lost jobs.”
Benefits of data sharing (3)
“There is evidence that studies that make their data available do indeed receive more citations than similar studies that do not.”
Piwowar H. and Vision T.J. (2013) “Data reuse and the open data citation advantage” https://peerj.com/preprints/1.pdf
9% - 30% increase
... more citations
Why YOU need a Data Management Plan
http://blogs.ch.cam.ac.uk/pmr/2011/08/01/why-you-need-a-data-management-plan
Direct benefits to individuals
“Research organisations will ensure that effective data curation is provided throughout the full data lifecycle, with ‘data curation’ and ‘data lifecycle’ being as defined by the Digital Curation Centre. The full range of responsibilities associated with data curation over the data lifecycle will be clearly allocated...”
www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx
...institutional responsibility
Research funder data policies
www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
Ultimately funders expect:
• timely release of data – once patents are filed or on (acceptance for) publication
• open data sharing – minimal or no restrictions if possible
• preservation of data – typically 5-10+ years if of long-term value
See the RCUK Common Principles on Data Policy: www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
Jisc MRD programmes
Managing Research Data programmes funded by Jisc:
• MRD 01: October 2009 – July 2011
– £4.3 million investment
– www.jisc.ac.uk/whatwedo/programmes/mrd.aspx
• MRD 02: October 2011 – July 2013
– £4.6 million investment
– www.jisc.ac.uk/whatwedo/programmes/di_researchmanagement/managingresearchdata.aspx
Programme Manager: Simon Hodson
Twitter: #jiscmrd
The DCC Mission
“Helping to build capacity, capability and skills in data management and curation across the UK’s higher education research community”
Phase 3 Business Plan
www.dcc.ac.uk
DCC Institutional Engagements
With funding from HEFCE we’re:
• Working intensively with 21 HEIs to increase RDM capability
– 60 days of effort per HEI drawn from a mix of DCC staff
– Deploy DCC & external tools, new approaches & best practice
• Support varies based on what each institution wants/needs
• Lessons & examples will be shared with the community
www.dcc.ac.uk/community/institutional-engagements
Some unis we are working with
Common DCC IE activities
• Establishing steering groups
• Making the case for RDM
• Assessing needs
• Developing policy and strategy
• Piloting tools
• Offering DMP consultations
• Delivering training
• Setting up guidance websites
• ...
CURRENT RDM INITIATIVES IN UK UNIVERSITIES
How to develop RDM services
Guide and case studies: www.dcc.ac.uk/resources/developing-rdm-services
Components of a research data service
Institutional RDM policies
www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies
Early research data policies
“Statement of commitment”
Infrastructure policy
“10 commandments”
mutual promises
aspirational
Baseline of RCUK Code + procedures & support
legal tone / language
a section in uni DM policy
useful guide as appendix
Based on Edin. with a few additions
RDM strategies and roadmaps
A series of blog posts
www.dcc.ac.uk/news
Links to example roadmaps
http://tiny.cc/EPSRCroadmaps
University of Bath RDM roadmap
• Based on Monash University RDM strategy
• Identifies the current position and proposes activity
• Defines roles and responsibilities and timeframes
http://www.bath.ac.uk/rdso/University-of-Bath-Roadmap-for-EPSRC.pdf
Guidance webpages
www.gla.ac.uk/datamanagement
www.bath.ac.uk/research/data
Disciplinary RDM training
www.dcc.ac.uk/training/train-trainer/disciplinary-rdm-training
Online training for PhD students
http://datalib.edina.ac.uk/mantra
Data Management Planning support
• Guidelines / templates on what to include in plans
• Example answers, guidance and links to local support
• A library of successful DMPs to reuse
• Tailored consultancy services
• Online tools (e.g. customised DMPonline)
• Links / flags embedded in grant systems
• ...
Research data storage
Blue Peta at Bristol
1st 5TB free per Data Steward, then £400 per TB p.a. for disk storage; tape backup £40 per TB
http://data.bris.ac.uk
• £2m funding to date
• Petascale facility – expandable
• 3 machine rooms – resilience (tape archive 2012)
• Available to all researchers for research data
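As a rough worked example of the Bristol pricing quoted on this slide, the sketch below estimates the annual disk-storage charge for one Data Steward. It assumes the £400 rate applies only to storage beyond the free 5TB and is charged pro rata. The function name is hypothetical, and tape backup is left out because the slide does not say how the £40 per TB is billed.

```python
FREE_ALLOWANCE_TB = 5        # first 5TB free per Data Steward (from the slide)
DISK_RATE_GBP_PER_TB = 400   # per TB per annum for disk storage (from the slide)


def annual_disk_cost_gbp(total_tb: float) -> float:
    """Estimate the yearly disk charge for one Data Steward.

    Assumption: only storage above the free allowance is charged,
    and the charge scales linearly with the excess.
    """
    if total_tb < 0:
        raise ValueError("storage size cannot be negative")
    chargeable_tb = max(0.0, total_tb - FREE_ALLOWANCE_TB)
    return chargeable_tb * DISK_RATE_GBP_PER_TB


if __name__ == "__main__":
    for size_tb in (3, 5, 8):
        print(f"{size_tb}TB -> £{annual_disk_cost_gbp(size_tb):.0f} per year")
```

Under these assumptions a Data Steward holding 8TB would pay for 3TB, i.e. £1,200 per year, while anything up to 5TB costs nothing.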
Institutional data repositories
Not intended to replace national, subject or other established data repositories
Acknowledge hybrid environment
http://datashare.is.ed.ac.uk
www.dspace.cam.ac.uk
https://databank.ora.ox.ac.uk
Research Data at Essex and DataPool at Southampton
Data catalogues
Develop a research data extension to the CERIF standard
JISC & DCC planning national coordination
http://cerif4datasets.wordpress.com
Bringing it all together into a service
Diagram courtesy of Sally Rumsey, University of Oxford
THE ROLE OF THE LIBRARY – RE-SKILLING FOR DATA CURATION
How are libraries engaging in RDM?
Library
IT
Research Office
The library is leading on most DCC institutional engagements
www.dcc.ac.uk/community/institutional-engagements
They are involved in:
• defining the institutional strategy
• developing RDM policy
• delivering training courses
• helping researchers to write DMPs
• advising on data sharing and citation
• setting up data repositories
• ...
Why should libraries support RDM?
• existing data and open access leadership roles
• often run publication repositories
• have good relationships with researchers
• proven liaison and negotiation skills
• knowledge of information management, metadata...
• highly relevant skill set
Possible Library RDM roles
• Leading on local (institutional) data policy
• Bringing data into undergraduate research-based learning
• Teaching data literacy to postgraduate students
• Developing researcher data awareness
• Providing advice, e.g. on writing DMPs or advice on RDM within a project
• Explaining the impact of sharing data, and how to cite data
• Signposting who in the Uni to consult in relation to a particular question
• Auditing to identify data sets for archiving or RDM needs
• Developing and managing access to data collections
• Documenting what datasets an institution has
• Developing local data curation capacity
• Promoting data reuse by making known what is available
RDMRose Lite
Training for librarians
• RDM for librarians, DCC
http://www.dcc.ac.uk/training/rdm-librarians
• RDMRose, University of Sheffield
http://rdmrose.group.shef.ac.uk
• Data Intelligence for librarians, 3TU, Netherlands
http://dataintelligence.3tu.nl/en/about-the-course
• DIY Training Kit for Librarians, University of Edinburgh
http://datalib.edina.ac.uk/mantra/libtraining.html
• SupportDM modules, University of East London
http://www.uel.ac.uk/trad/outputs/resources
RDM for Librarians
• 3-hour course by the DCC covering:
– Research data and RDM
– Data management planning
– Data sharing
– Skills
– RDM at [INSERT YOUR UNI]
• Slides and accompanying handbook
• Used UKDA guide as pre-reading
• http://www.dcc.ac.uk/training/rdm-librarians
RDMRose
• Taught and CPD learning materials in RDM tailored for information professionals, by the Uni of Sheffield
• 8 sessions, each of which is a half day of study
• Strong emphasis on practical hands-on activities
• Also offer a short (2hr) course called RDMRose Lite
• http://rdmrose.group.shef.ac.uk
Data Intelligence for Librarians
• A course produced by 3TU, a consortium of technical universities in the Netherlands
• Combination of online and face-to-face education
• Four meetings to learn and share knowledge
• Theory (on website) and assignments are conducted between sessions
• http://dataintelligence.3tu.nl/en/home
DIY Training Kit for Librarians
• By EDINA and Data Library at University of Edinburgh
• Self-directed course, intended to be used by a group of librarians to build confidence in supporting researchers
• MANTRA modules as pre-reading, short presentation, reflective questions and exercises to guide discussion
• Five face-to-face sessions
– Data Management Planning
– Organising and documenting data
– Data security and storage
– Ethics and copyright
– Data sharing
• http://datalib.edina.ac.uk/mantra/libtraining.html
SupportDM
• By the TraD project at the University of East London
• SupportDM comprises five sessions
– About research data management
– Providing guidance and support for researchers
– Data Management Planning
– Selecting which data to keep
– Cataloguing and sharing data
• Each topic is introduced in a face-to-face session and explored via exercises and discussion
• Learning is reinforced via an online tutorial and practical exercises to do before the next session
• http://www.uel.ac.uk/trad/outputs/resource
Thanks – any questions?
DCC guidance, tools and case studies: www.dcc.ac.uk/resources
Follow us on twitter: @digitalcuration and #ukdcc