“Filling the digital preservation gap” an update from the Jisc Research Data Spring project at York and Hull Jenny Mitcham Digital Archivist Borthwick Institute for Archives University of York 13 August 2015


Page 1:

“Filling the digital preservation gap” an update from the Jisc Research Data Spring project at York and Hull

Jenny Mitcham
Digital Archivist
Borthwick Institute for Archives
University of York

13 August 2015

Page 2:

Project aim

“…to investigate Archivematica and explore how it might be used to provide digital preservation functionality within a wider infrastructure for Research Data Management.”

Page 3:

What about Hydra?
• Hydra is not mentioned much in our project report ...this is deliberate!
• We wanted to keep our findings generic to make them most useful to a wide range of institutions who may be interested in digital preservation...
• ...this means we are more likely to get further funding
• However...

Page 4:

Project team

University of Hull:
• Chris Awre – Head of Information Services, Library and Learning Innovation
• Richard Green – Independent Consultant
• Simon Wilson – University Archivist

University of York:
• Julie Allinson – Manager, Digital York
• Jen Mitcham – Digital Archivist

Page 5:

About the project

• Funded as part of Jisc Research Data Spring
• Started 30th March 2015
• Phase 1 is complete
• Phase 2 has just started and will run until November
• …and we hope phase 3 will be funded

Page 6:

Project structure
• Phase 1 – explore: testing, research, thinking - produce a report (3 months)
• Phase 2 – develop: make Archivematica better for RDM, plan implementation (4 months)
• Phase 3 – implement: set up proof of concepts at York and Hull (6 months)

Page 7:

Phase 1 - The key questions
• Why? Why are we bothering to 'preserve' research data? What are the drivers here and what are the risks if we don't? Why are we looking at Archivematica?

• What? What are the characteristics of research data and how might it differ from other born digital data that memory institutions are establishing digital archives to manage and preserve? What types of files are our researchers producing and how would Archivematica handle these? What does Archivematica offer us and what benefits does it bring?

• How? How would we incorporate Archivematica into a wider technical infrastructure for research data management and what workflows would we put in place? Where would it sit and what other systems would it need to talk to? How can we improve Archivematica for RDM?

• Who? Who else is using Archivematica (or other digital preservation systems) to do similar things and what can we learn from them? What staff resource is needed to preserve research data with Archivematica?

Page 8:

http://digital-archiving.blogspot.co.uk/

Page 9:

Why Archivematica?
“The goal of the Archivematica project is to give archivists and librarians with limited technical and financial capacity the tools, methodology and confidence to begin preserving digital information today.”

Page 10:

Why Archivematica?
• Standards-based
• Open Source
• Flexible and customisable
• Compatible with hundreds of file formats
• Advanced search and storage management
• Integrated with third-party systems

From https://www.archivematica.org/en/

Page 11:

What does research data look like?

York RDM questionnaire 2013: Please select the main types of electronic research data you generate

Page 12:

Top research data applications at York

Page 13:

The importance of identification
How well are these formats identified by digital preservation tools?
• Better than expected!
• Sometimes partial
• Sometimes quite generic (without a version number)

Application             Identified?
MATLAB                  N
SPSS                    Partial
Stata                   N
R                       N
EndNote                 Partial
NVivo                   N
LaTeX                   Partial
Python                  N
Wolfram Mathematica     Partial
Gaussian                N
ChemDraw                Partial
SAS                     Partial
ArcGIS                  Partial
GraphPad Prism          Partial
Adobe Photoshop         Partial
ATLAS.ti                N
C++                     N
Eclipse                 NA? No native file formats
MS Excel                Y
RSB - ImageJ            Partial

Page 14:

What does research data look like?
• Potentially quite big
• Wide range of file formats (some well understood but a long tail of more specialist/obscure formats)
• Sometimes sensitive and/or confidential
• Ever changing (new software and techniques are used for dynamic and cutting edge research)
• May be different versions of the data (as new publications are released)
• Value not well understood at the point of deposit

Page 15:

What does Archivematica do?
The short answer:

“It packages data up in a standards compliant way and prepares it to be stored for the long term”

Page 16:

What does Archivematica do?
The longer answer:

• Assigns unique identifiers
• Creates a checksum for each object
• Creates a text file with a directory tree of the transfer
• Option to quarantine data for a specified period
• Runs virus checks
• Cleans up file and directory names (removing characters that may cause problems)
• Runs identification tools so you can find out what file formats you have
• Extracts data from zip files (or not if you would rather not)
• Extracts metadata embedded in the files (if you want)
• Normalises files (if a migration path exists)
• ...
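To make a few of the steps above concrete, here is a minimal Python sketch of what "assign an identifier, create a checksum, clean up the file name" can look like for a folder of files. It is not Archivematica's own code and does not call Archivematica. The function names (`sha256_of`, `clean_name`, `describe_transfer`), the choice of SHA-256, and the set of "safe" characters are all assumptions made for illustration.

```python
import hashlib
import re
import uuid
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def clean_name(name: str) -> str:
    """Replace characters outside a conservative safe set with underscores.

    The safe set here is an assumption, not Archivematica's actual rule.
    """
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


def describe_transfer(root: Path) -> list[dict]:
    """Build one record per file: identifier, relative path, cleaned name, checksum."""
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        records.append({
            "uuid": str(uuid.uuid4()),          # random unique identifier
            "path": path.relative_to(root).as_posix(),
            "clean_name": clean_name(path.name),
            "sha256": sha256_of(path),
        })
    return records
```

Calling `describe_transfer(Path("some_folder"))` returns a list of dictionaries that could be written out as a simple manifest. Re-running `sha256_of` on a file later and comparing the result with the stored value is the basic fixity check that a checksum makes possible.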

Page 17:

What does Archivematica do?
The really really long answer (if you have time):

• Read the manual

https://www.archivematica.org/en/docs/archivematica-1.4/

Page 18:

What does Archivematica do?
One final answer (honest):

It gives us a greater level of confidence that we will be able to continue to provide access to usable copies of research data over the longer term

Page 19:

What are the downsides?
• It isn’t a magic bullet
• There is no guarantee your data will be readable in the future
• It can only be as good as current digital preservation practice
• It can be fiddly to install correctly
• The GUI isn’t that intuitive
• You need staff who understand it

Page 20:

Phase 2: ‘develop’
1. Enable better workflows for RDM (producing a DIP on request)
2. Allowing the DIP (access copy of data) to be usable by different repository systems
3. Helping reduce bottlenecks for big data
4. Workflows for unidentified files
5. Enabling easier querying of data within Archivematica by third party applications
6. Better documentation

Page 21:

Phase 2: RDM Workflows at York
• We get a copy of data from researcher
• We transfer it to Archivematica
• Archivematica packages it up for storage and creates the Archival Information Package (AIP)
• Archivematica sends the AIP to archival storage
• Metadata is published in data catalogue
• If someone requests the data Archivematica will create a Dissemination Information Package (DIP)
• DIP will be uploaded to Digital Library for access
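The workflow above can be modelled as an ordered set of ingest stages followed by an access step that only happens on request. The Python sketch below is purely illustrative: the stage names, the `Dataset` class and its methods are hypothetical labels for the bullet points, and nothing here talks to Archivematica or any repository. It also assumes that a DIP is never produced before the AIP is stored and the metadata is published, which is how the slide orders the steps.

```python
from dataclasses import dataclass, field

# Illustrative labels for the ingest steps listed on the slide.
STAGES = [
    "received_from_researcher",
    "transferred_to_archivematica",
    "aip_created",
    "aip_in_archival_storage",
    "metadata_published",
]


@dataclass
class Dataset:
    title: str
    completed: list[str] = field(default_factory=list)
    dip_uploaded: bool = False

    def advance(self) -> str:
        """Complete the next ingest stage in order and return its name."""
        if len(self.completed) == len(STAGES):
            raise RuntimeError("ingest already complete")
        stage = STAGES[len(self.completed)]
        self.completed.append(stage)
        return stage

    def request_access(self) -> None:
        """Mark a DIP as created and uploaded, but only after ingest is complete."""
        if len(self.completed) < len(STAGES):
            raise RuntimeError("cannot create a DIP before ingest is complete")
        self.dip_uploaded = True
```

The point of the design is that `request_access` is separate from the ingest stages: the access copy is only generated when somebody asks for the data, so datasets that are never requested never incur that cost.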

Page 22:

How do York plan to use Archivematica?

Page 23:

How do York plan to use Archivematica?

Page 24:

Where to find out more

http://www.york.ac.uk/borthwick/

Page 25:

Where to find out more

http://digital-archiving.blogspot.co.uk/2015/07/archivematica-fills-digital.html

Page 26:

Where to find out more

Page 27:

Thanks for listening
• You can contact me on:

[email protected]