Post on 08-Apr-2017


Project update: A collaborative approach to “filling the digital preservation gap” for Research Data Management

Julie Allinson
Technology Development Manager
Library & Archives
University of York

6 November 2015

Filling the digital preservation gap: Project aim

“…to investigate Archivematica and explore how it might be used to provide digital preservation functionality within a wider infrastructure for Research Data Management.”

This is a collaboration

University of Hull:
• Chris Awre – Head of Information Services, Library and Learning Innovation
• Richard Green – Independent Consultant
• Simon Wilson – University Archivist

University of York:
• Julie Allinson – Technology Development Manager
• Jen Mitcham – Digital Archivist

Artefactual Systems

Jisc

Project structure

• Phase 1 – explore: testing, research, thinking; produce a report (3 months)
• Phase 2 – develop: make Archivematica better for RDM, plan implementation (4 months)
• Phase 3 – implement: set up proof of concepts at York and Hull (6 months)

Phase 1: Read all about it!

http://digital-archiving.blogspot.co.uk/

Why do we need digital preservation for research data?

• There is a digital preservation gap in current RDM infrastructures

• We can’t ignore digital preservation – moving targets for data retention mean we need to take this seriously

• Funder requirements around retention

University of York RDM questionnaire 2013

• Which data management issues have you come across in your research over the last five years?
  – “Inability to read files in old software formats on old media or because of expired software licences”
  – 24% of 181 researchers who answered this question admitted this had been a problem for them

Why Archivematica?

“The goal of the Archivematica project is to give archivists and librarians with limited technical and financial capacity the tools, methodology and confidence to begin preserving digital information today.”

Why Archivematica?

• Standards-based
• Open Source
• Flexible and customisable
• Compatible with hundreds of file formats
• Advanced search and storage management
• Integrated with third-party systems

From https://www.archivematica.org/en/

Archivematica for RDM?

• Flexible - can support different institutional needs and workflows
• Automates many digital preservation tasks
• Can be integrated with other systems
• Good for those with limited resources
• Enhancements driven by and for the digital preservation community

Archivematica for RDM?

It gives institutions greater confidence that they will be able to continue to provide access to usable copies of research data over time

Phase 2: Improving Archivematica

1. Deliverable 1: Automated DIP regeneration
2. Deliverable 2: METS parsing tools
3. Deliverable 3: Generic search REST API (proof-of-concept)
4. Deliverable 4: Support multiple checksum algorithms
5. Deliverable 5: Enhance PRONOM integration
6. Deliverable 6: Automation tools documentation

Deliverable One

✓ Research Data needs to be kept, but we don’t know if anyone will ever want it and it might be *massive*

The Solution: enable the DIP to be generated ‘on request’ and not as part of the initial ingest
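
The "on request" idea can be illustrated with a minimal sketch. This is not Archivematica code: the class `OnDemandDipStore`, its methods and the `build_dip` callable are all hypothetical names chosen for illustration. The point is only that ingest records the AIP, while the (possibly expensive) DIP is built the first time somebody asks for it.

```python
from typing import Callable, Dict, Set


class OnDemandDipStore:
    """Hypothetical store: DIPs are built on first request, not at ingest."""

    def __init__(self, build_dip: Callable[[str], str]) -> None:
        # build_dip is an assumed callable that turns an AIP id into a DIP location.
        self._build_dip = build_dip
        self._aips: Set[str] = set()       # AIP ids that have been ingested
        self._dips: Dict[str, str] = {}    # AIP id -> DIP location already built
        self.builds = 0                    # how many DIPs were actually generated

    def ingest(self, aip_id: str) -> None:
        # Ingest only registers the AIP; no DIP is generated here.
        self._aips.add(aip_id)

    def get_dip(self, aip_id: str) -> str:
        if aip_id not in self._aips:
            raise KeyError(f"unknown AIP: {aip_id}")
        if aip_id not in self._dips:
            # First request for this AIP: generate the DIP now and remember it.
            self._dips[aip_id] = self._build_dip(aip_id)
            self.builds += 1
        return self._dips[aip_id]


store = OnDemandDipStore(lambda aip_id: "dip-for-" + aip_id)
store.ingest("aip-001")
store.ingest("aip-002")
first = store.get_dip("aip-001")
second = store.get_dip("aip-001")
```

In this sketch a dataset that nobody ever requests never incurs the cost of building a DIP.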

Deliverable Two

✓ We want to be able to grab the DIP, and metadata about it, for pulling into our repository

The Solution: a library to help with parsing and creating METS files
https://github.com/artefactual-labs/mets-reader-writer
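
To give a feel for what parsing a METS file involves, here is a small sketch using only the Python standard library. It does not use the mets-reader-writer library itself. The sample document, the function name `list_files` and the file paths are assumptions made up for illustration; only the METS and XLink namespace URIs are standard.

```python
import xml.etree.ElementTree as ET

# Standard METS and XLink namespace URIs.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical, heavily simplified METS document for illustration only.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="original">
      <mets:file ID="file-1">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"
                     xlink:href="objects/data.csv"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="metadata">
      <mets:file ID="file-2">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"
                     xlink:href="objects/metadata/readme.txt"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def list_files(mets_xml):
    """Return one dict per file entry in the METS fileSec."""
    root = ET.fromstring(mets_xml)
    files = []
    for group in root.findall("mets:fileSec/mets:fileGrp", NS):
        for entry in group.findall("mets:file", NS):
            location = entry.find("mets:FLocat", NS)
            # Attributes in a namespace are addressed as {namespace-uri}name.
            path = None
            if location is not None:
                path = location.get("{" + NS["xlink"] + "}href")
            files.append({"id": entry.get("ID"), "use": group.get("USE"), "path": path})
    return files


files = list_files(SAMPLE_METS)
```

A repository could use a listing like this to decide which files and metadata to pull in alongside the DIP.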

Deliverable Three

✓ We want to be able to report on what we have

The Solution: a search API to answer basic questions about the number of files in storage, their formats, date of ingest, etc.

** we’re working with DMAOnline @lancaster
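
The kind of "basic questions" listed above can be sketched as a simple aggregation over file records. The record shape below (`path`, `format`, `ingested`) and the function `summarise` are hypothetical; they stand in for whatever the search API actually returns and are not taken from Archivematica.

```python
from collections import Counter

# Hypothetical records, standing in for search API results.
SAMPLE_RECORDS = [
    {"path": "survey.csv", "format": "CSV", "ingested": "2015-09-14"},
    {"path": "results.csv", "format": "CSV", "ingested": "2015-10-02"},
    {"path": "paper.pdf", "format": "PDF", "ingested": "2015-10-20"},
]


def summarise(records):
    """Count files overall and per format, and find the ingest date range."""
    by_format = Counter(record["format"] for record in records)
    # ISO 8601 dates (YYYY-MM-DD) sort correctly as plain strings.
    dates = sorted(record["ingested"] for record in records)
    return {
        "total_files": len(records),
        "by_format": dict(by_format),
        "first_ingest": dates[0] if dates else None,
        "last_ingest": dates[-1] if dates else None,
    }


report = summarise(SAMPLE_RECORDS)
```

A summary in this shape is the sort of thing that could be passed on to a reporting dashboard.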

Deliverable Four

✓ With large datasets, the current checksum mechanism in Archivematica could be a bottleneck

The Solution: support for multiple checksum algorithms
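
As a rough illustration of what a configurable checksum algorithm looks like, the sketch below uses Python's standard `hashlib`. It is not Archivematica's implementation; the function name `checksum` and its parameters are assumptions. The data is read in chunks so that a very large file never has to fit in memory.

```python
import hashlib
import io


def checksum(stream, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a binary stream with the named algorithm, one chunk at a time."""
    # hashlib.new raises ValueError if the algorithm name is not supported.
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# The same data hashed with three different algorithms.
data = b"abc"
md5_hex = checksum(io.BytesIO(data), "md5")
sha1_hex = checksum(io.BytesIO(data), "sha1")
sha256_hex = checksum(io.BytesIO(data), "sha256")
```

Making the algorithm a parameter lets an institution trade speed against digest strength for large datasets, which is the design choice this deliverable is about.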

Deliverable Five

✓ What about all those file formats that Archivematica can’t identify?

The Solution: mechanism for running file identification with multiple tools and a report of unidentified formats, working with PRONOM to improve their coverage
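
The "multiple tools plus a report" mechanism might look roughly like the sketch below. The two identifier functions are toy stand-ins written for this example, not real identification tools, and the format labels are plain names rather than PRONOM identifiers. All function names here are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A "tool" takes a file name and its bytes and returns a format label or None.
Identifier = Callable[[str, bytes], Optional[str]]


def by_signature(name: str, data: bytes) -> Optional[str]:
    """Toy signature check on the first bytes of the file."""
    if data.startswith(b"%PDF-"):
        return "PDF"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    return None


def by_extension(name: str, data: bytes) -> Optional[str]:
    """Toy fallback that looks only at the file extension."""
    known = {".csv": "CSV", ".txt": "Plain text"}
    for extension, label in known.items():
        if name.lower().endswith(extension):
            return label
    return None


def identify_all(
    files: Dict[str, bytes], tools: List[Identifier]
) -> Tuple[Dict[str, str], List[str]]:
    """Try each tool in order; collect the files that no tool recognises."""
    identified: Dict[str, str] = {}
    unidentified: List[str] = []
    for name, data in files.items():
        for tool in tools:
            label = tool(name, data)
            if label is not None:
                identified[name] = label
                break
        else:
            # No tool returned a result: this file goes in the report.
            unidentified.append(name)
    return identified, unidentified


sample_files = {
    "paper.pdf": b"%PDF-1.4 example",
    "table.csv": b"a,b\n1,2\n",
    "instrument.xyz": b"\x00\x01\x02",
}
identified, unidentified = identify_all(sample_files, [by_signature, by_extension])
```

The `unidentified` list is the part that matters for this deliverable: it is the evidence that can be taken back to a format registry to improve coverage.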

Deliverable Six

✓ We want to make it easier for institutions to adopt Archivematica

The Solution: documentation and screencasts for Archivematica automation tools, e.g.
https://wiki.archivematica.org/Getting_started#Installation

All of these new features will become part of the core Archivematica code in 2016

Phase 3

• The plan is to run a third phase of the project to:
  ✓ implement prototype RDM workflows with preservation using the new Archivematica features at York and Hull
  ✓ use the search API to populate DMAOnline with stats about datasets
  ✓ do more community outreach
• We will be pitching to Jisc in December for phase three #fingerscrossed

How do York plan to use Archivematica?

[Diagram: planned York workflow. Components shown: Pure, RDMonitor, Archivematica, AIP, AIP Store, PURE Web Services, Archivematica REST API, DIP, Repository, Data Catalogue. Key: human to human, machine to machine, human to machine.]

Where to find out more

http://www.york.ac.uk/borthwick/

The Bigger Picture

• Jisc are looking at building shared services for RDM
• Our project is inputting into the specification and discussion
• One area we’d be interested to find out more about is the appetite for ‘above campus’ options - discussion planned for later.

How could you use Archivematica?

• Host it in-house and link it to an existing repository/access system (for example DSpace, CONTENTdm, Fedora/Hydra ...or a CRIS)

• Host it in-house and use as a standalone system (you would need to have a storage system in place and establish a way of facilitating access to the data)

• Sign up for a hosted instance of Archivematica with archivesDIRECT (combines Archivematica with DuraCloud storage)

• Sign up for a hosted instance of Archivematica with Arkivum (combines Archivematica with Arkivum storage)

Thanks!

julie.allinson@york.ac.uk

Useful links:
Borthwick website: http://www.york.ac.uk/borthwick/
Digital archiving blog: http://digital-archiving.blogspot.co.uk/
Archivematica: https://www.archivematica.org/en/
Report: http://dx.doi.org/10.6084/m9.figshare.1481170
