remap web services for records management and digital preservation technology watch

Upload: christopher-awre

Post on 09-Apr-2018


  • 8/8/2019 REMAP web services for records management and digital preservation technology watch


    REMAP Project _______________________________________________

    D3 Records management and digital preservation Web Services: a technology watch report, version 2

    Chris Awre


    Records management and digital preservation Web Services - 2 -

    April 2009

    The REMAP Project

    Project Director: Richard Heseltine, Director of Academic Services and Librarian, University of Hull ([email protected])
    Project Manager: Richard Green ([email protected])
    Repository Domain Specialist: Chris Awre ([email protected])
    Technical lead (Hull): Robert Sherratt ([email protected])
    Software developer (Hull): Simon Lamb ([email protected])
    Archivist: Judy Burg ([email protected])
    Records Manager: Vicky Mays ([email protected])
    Project Director for GCU: David Donald ([email protected])
    Project Manager for GCU: Iain Wallace ([email protected])
    Technical leads for GCU: Graeme West ([email protected]), Caroline Noakes ([email protected])


    The REMAP Project is being undertaken by the e-Services Integration Group at the University of Hull and Spoken Word Services at Glasgow Caledonian University. It is funded by the JISC Repositories and Preservation Programme.

    1. Introduction

    The JISC-funded RepoMMan project (2005-7) investigated the feasibility of embedding the use of an institutional digital repository within the day-to-day activity of staff within a University, whether working within the domains of research, teaching or administration. The repository system deployed for the project, and also as the repository for the University of Hull, was the open source Fedora system. In order to facilitate this regular usage and interaction, a tool was developed to permit the easy upload and download of digital objects to and from the repository. This tool, which was delivered through a Flex interface either standalone or through the University portal, made use of Fedora's Web Service interfaces to communicate with the repository. These Web Service interactions were orchestrated using WSBPEL (Web Services Business Process Execution Language), an open OASIS standard.

    The project built on previous experience in the use of WSBPEL to coordinate Web Service interactions in the field of online assessment. It demonstrated the validity of using Web Services as the basis for repository interaction and how these could be orchestrated in a flexible way according to varied requirements.

    A second strand of the RepoMMan project investigated the automated generation of metadata, again to ease interaction with the repository and avoid the presentation of blank metadata forms to users with limited time and knowledge with which to complete them. The iVia tool was identified as having good potential in this area, and was successfully included within the overall tool. This was achieved through wrapping a locally hosted version of the iVia tool as a Web Service. By doing so, WSBPEL could be used to include it within an overall workflow, adding functionality and value to the repository and its interface for users. The JHOVE object validation service was also investigated for its potential to be included via a similar route and was found to be equally open to this adaptation.

    The REMAP project (2007-9) is examining the application of the work carried out within RepoMMan to records management and digital preservation (RMDP) activity, both within an institution (at the University of Hull) and a digital library service (Spoken Word Services). In considering user needs in these areas amongst a number of stakeholders, workflows for a variety of processes have emerged. The RepoMMan architecture provides us with a mechanism for adapting WSBPEL processes to meet these workflows and so deliver what users need. This requires that appropriate functionality is available as a Web Service. The architecture has been used to date for internal, local Web Services, but the technology and standards employed are equally capable of incorporating externally hosted Web Services and orchestrating them as required.
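
    The idea of orchestration described above can be illustrated with a minimal sketch. It is not WSBPEL and not the REMAP implementation: it simply shows a workflow expressed as an ordered list of step names, each resolved to a service and invoked in turn. The service functions (identify_format, generate_metadata) and the record fields are hypothetical stand-ins; in practice each would be a call to a local or externally hosted Web Service.

```python
from typing import Callable, Dict, List

# A "service" here is any callable that receives the record built up so far
# and returns new fields to merge into it. In a real deployment each one
# would be a call to a local or remote Web Service.
Service = Callable[[dict], dict]


def run_workflow(steps: List[str], services: Dict[str, Service], record: dict) -> dict:
    """Invoke the named services in order, merging each result into the record."""
    result = dict(record)
    for name in steps:
        if name not in services:
            raise KeyError(f"no service registered for step '{name}'")
        result.update(services[name](result))
    return result


# Hypothetical stand-ins for format identification and metadata generation.
def identify_format(record: dict) -> dict:
    is_pdf = record["filename"].lower().endswith(".pdf")
    return {"format": "PDF" if is_pdf else "unknown"}


def generate_metadata(record: dict) -> dict:
    return {"title": record["filename"].rsplit(".", 1)[0]}


SERVICES = {"identify": identify_format, "describe": generate_metadata}

# A workflow is just data, so it can be changed without touching the services.
workflow = ["identify", "describe"]
outcome = run_workflow(workflow, SERVICES, {"filename": "minutes.pdf"})
```

    Because the workflow is held as data rather than code, a different sequence of steps can be run against the same set of services, which is the flexibility that an orchestration language is intended to provide.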

    This technology watch report provides information on those Web Services and initiatives that have been identified as able to provide relevant Web Services for use within REMAP: findings are summarised in section 3. It updates the original version produced in April 2008 and reports on how far the development of web services to support records management and digital preservation has progressed.

    The work described in this report is not provided with any sense of comprehensiveness, but is based on a review of available information through literature searching, web crawling and conference proceedings. Where particular services have been missed, the author would be grateful for further information to inform subsequent iterations of the document. Please send details to Chris Awre at [email protected] . Many thanks.


    2. Tools for records management and digital preservation

    2.1 Contents

    PANIC (Preservation Web Services Architecture for New Media, Interactive Collections and Scientific Data)
    SHERPA DP2 / SOAPI
    Planets
    PRONOM / DROID
    GDFR (Global Digital Format Registry)
    TOM (Typed Object Model) & FRED
    JHOVE (JSTOR/Harvard Object Validation Environment)
    AONS (Automated Obsolescence Notification System)
    CRiB
    Xena
    DocMorph / CDS Convert
    SWORD


    Current status

    Following an initial period of activity in 2003-4, the PANIC project now appears to be in abeyance, though it is still indicated as active in the projects list on the School of Information Technology & Electrical Engineering website. However, a number of outputs from the project remain available at http://www.itee.uq.edu.au/~eresearch/projects/panic/results.html .

    By way of informing step 2 of the PANIC architecture, outputs are available for preservation metadata generation (including the PREMINT tool, the metadata schema used and a number of test cases). There are also a number of ontology files available that were generated to inform the development of Web Services for step 3 of the PANIC architecture. A demonstration of the preservation services developed is available: this requires a login, details of which are available from Professor Jane Hunter, the principal investigator for the PANIC project.

    Available tools

    As a research project, PANIC has not developed any tools that are made available on a service basis, trial or otherwise. The PREMINT preservation metadata generation tool is available for use, though only via the PANIC website. The ontology files developed can inform the structure of Web Services to be used for preservation, but are not encapsulated in any service directly.

    The work of the PANIC project has informed the development of AONS, the Automated Obsolescence Notification System, details of which are available elsewhere in this document. The system as created was centred on the invocation component, which was then able to call other Web Services wherever they were made available.

    Contact

    Professor Jane Hunter, School of Information Technology & Electrical Engineering, University of Queensland, Brisbane, Australia
    http://www.itee.uq.edu.au/~jane/

    Further information

    The project website remains available at http://www.itee.uq.edu.au/~eresearch/projects/panic/index.html . This lists a number of publications and presentations related to the project and its aims as well as providing a description of the project and its outputs.

    Update April 2009

    There has been no further development of the PANIC work in the past year.


    2.3 SHERPA DP2 / SOAPI

    Description

    SHERPA DP2 and SOAPI are JISC-funded projects based at what was the Arts & Humanities Data Service (now King's College London Centre for e-Research). They were funded under the same funding stream as REMAP, and are treated together in this document, as they are largely complementary in their aims and objectives.

    SHERPA DP2 is a follow-on project to SHERPA DP (SHERPA Digital Preservation), itself an off-shoot of the SHERPA project ( http://www.sherpa.ac.uk ). SHERPA DP investigated a disaggregated service model for enabling digital preservation of e-prints compliant with the OAIS Reference Model. The project also identified where rights and responsibilities lay in providing such a service. SHERPA DP2 is applying the same model to a wider range of digital content types and range of repositories.

    SOAPI (Service-Oriented Architecture for Preservation and Ingest of digital objects) is developing an architecture and toolkit for (partially) automating preservation and ingest workflows, based on a set of atomic Web Services. It is intended to complement and inform the delivery of the SHERPA DP preservation service through the provision of a framework for the inclusion and orchestration of distributed preservation functions where these are available as Web Services. The project is also investigating the use of semantic annotation of Web Services to allow preservation Web Services to be dynamically discovered and executed.

    Current status

    The SHERPA DP2 project is ongoing to November 2008. The University of Hull is a partner in the project and will contribute content as part of the application of the SHERPA DP preservation system to other content types. The mechanism for exchange of content/metadata is to be confirmed.

    The SOAPI project is ongoing to September 2008. The project has gathered extensive experience with the use of jBPM as a workflow technology for orchestrating available preservation Web Services.

    Available tools

    SHERPA DP implemented a system based on Fedora that can accept content gathered from institutional repositories. This content is then characterised using the PRONOM-DROID and JHOVE services via custom workflow management tools. The enhanced records are delivered back to the institutional repositories, but can also be stored by the preservation service. The service was trialled as a demonstrator with a number of e-print repository partners, but has not been made available more widely as yet. SHERPA DP2 is also investigating the viability of such a preservation service for the academic community.

    As indicated above, the SOAPI toolkit is based on the jBPM (Java Business Processing Modelling) approach to web service workflow and orchestration. This is complementary to the investigation and use of WSBPEL within REMAP. No software or service is yet available for use outside of the project, though it is expected that outputs will become available as open source software.


    Contact

    For SHERPA DP2 the project manager is Katrin Weidemann and the preservation officer is Gareth Knight. Contact details are available at http://www.sherpadp.org.uk/contacts.html .

    For SOAPI the project manager is Mark Hedges. Contact details are available via the project website (see below).

    Further information

    The SHERPA DP2 project website is available at http://www.sherpadp.org.uk/sherpadp2.html . The SOAPI project website is available at http://ahds.ac.uk/about/projects/soapi/index.htm .

    Update April 2009

    The two projects have come together closely, with SHERPA DP2 making use of the SOAPI toolkit produced. The University of Hull has had a successful experience as a test site for the SHERPA DP2 toolkit, which has made use of the SWORD deposit API as well as harvesting via OAI, and the project has been exploring the potential for delivery of SHERPA DP2 as a service that other institutional repositories might use. The outcome of this assessment is awaited.

    In terms of what SHERPA DP2 is seeking to achieve, there is a strong parallel with the Planets development (see next section). A fuller assessment of the capability of each offering will only be possible once (a) SHERPA DP2 reports on its service offering and (b) Planets releases its own web services toolkit in 2010. Whilst there remains uncertainty about the exact nature of what preservation services will be made available, and under what terms, it is helpful that there are multiple potential offerings being considered.

    As indicated, the SOAPI toolkit has been used as part of the SHERPA DP2 work. As of the date of this report it is not available publicly. The experience of using jBPM, whilst useful, was not ultimately successful and orchestration of the web services within the toolkit is carried out using a custom lightweight workflow language designed for the project. Adoption of a different framework to facilitate orchestration is under consideration and may be incorporated in the future.


    2.4 Planets

    Description

    The EU-funded Planets (Preservation and Long-term Access through NETworked Services) project is a European-wide collaboration involving national libraries and archives, university libraries, and a range of technology partners. As described on the project's website, "The primary goal for Planets is to build practical services and tools to help ensure long-term access to our digital cultural and scientific assets." The project will deliver preservation-planning services; methodologies, tools and services for the characterisation of digital objects; preservation action tools; and a testbed to demonstrate these outputs in action.

    Underpinning the various outputs will be an interoperability framework to seamlessly integrate the tools and services from within the project and outside it (where these are available elsewhere).

    It is anticipated that Planets will have to support the specification and execution of complex workflows, and is working with BPEL to structure and orchestrate these based on provision of the services involved as Web Services.

    The Planets software will seek to enable scenarios where a variety of preservation actions may be called for, including content migration, provision of plug-ins to enable access to materials, and emulation.


    Current status

    The Planets project is currently halfway through its four-year period, completing in mid-2010. The interoperability framework has been fully scoped and work on its development is underway.

    There are clear parallels with work being undertaken within the SHERPA DP2 and SOAPI projects, with both following a similar architectural approach and recognising that preservation services are likely to require both flexibility and the orchestration of distributed services and repositories.

    Available tools

    There are no available tools at the time of writing. The clear synergy between Planets and REMAP in the use of BPEL is noted and collaboration to explore this further is being investigated.

    Contact

    The first point of contact for the Planets project is via [email protected] .

    Further information

    Further information on the Planets project can be found at http://www.planets-project.eu/ .

    Update April 2009

    The Planets project is now nearly three-quarters of the way through its lifespan, and appears to be making steady progress. The initial work with BPEL using the Eclipse plugin has been dropped as it was found to be too difficult to work with. Instead a lightweight custom workflow language is being used (much as for SOAPI), which is proving very workable. Whilst it is good to hear that the workflow components of this and the SOAPI toolkits have been enabled in this way, there must remain a concern that custom languages used for this might limit community development in the future.

    It is planned to release the outputs of Planets in 2010, when they will be made available for testing.


    2.5 PRONOM/DROID

    Description

    PRONOM is an online registry of impartial and definitive information about the file formats, software products and other technical components required to support long-term access to electronic records and other digital objects. It is maintained and provided by The National Archives and is freely available for use. Its primary means of access is through a user-facing web interface. This can be used as a reference tool for manual look-up of information about file formats etc. for use in preservation activities. The registry was compiled primarily as a resource to inform The National Archives' own preservation, and is now shared for the community to use as well. Community recommendations for expansion of the registry are welcome to ensure it has the widest possible coverage.

    DROID is a related software tool that can be downloaded for local implementation and use. It allows automatic batch checks of digital objects against the PRONOM registry and returns information on the file formats of the objects checked, allowing preservation actions to be subsequently taken on an informed basis.
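
    The general principle of checking a batch of objects against a set of format signatures can be sketched as follows. This is an illustration only and not DROID's algorithm: real signature files describe byte sequences at various offsets, whereas this sketch matches leading bytes only. The identifiers in the table are invented for the example and are not PRONOM identifiers.

```python
from typing import Dict, Optional

# Illustrative signature table: (identifier, leading bytes).
# The identifiers are made up for this sketch.
SIGNATURES = [
    ("example/png", b"\x89PNG\r\n\x1a\n"),
    ("example/pdf", b"%PDF-"),
    ("example/gif", b"GIF8"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the identifier of the first signature the data starts with."""
    for ident, magic in SIGNATURES:
        if data.startswith(magic):
            return ident
    return None


def identify_batch(objects: Dict[str, bytes]) -> Dict[str, Optional[str]]:
    """Identify every object in a batch, keyed by object name."""
    return {name: identify(content) for name, content in objects.items()}


batch = {
    "report.pdf": b"%PDF-1.4\n...",
    "logo.gif": b"GIF89a...",
    "notes.txt": b"plain text",
}
report = identify_batch(batch)
```

    Objects that match no signature are reported as None rather than guessed at, so that they can be flagged for manual inspection.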

    Current status

    Both PRONOM and DROID are currently provided as services to the community and are also being actively developed further to enhance their functionality. Aspects of these developments are being undertaken within the Planets and PRESERV2 projects.

    Available tools

    As indicated above, the DROID tool, currently at version 1.1, is available for local implementation under a BSD licence. The tool is a platform-independent Java-based application. The information about the file formats is generated as XML signature files. Local DROID implementations can gain access to updated signatures from PRONOM via Web Services. However, DROID itself is not available online as a web service. Access is via a Swing GUI or command line interface, and there is documentation describing how the command line API can be used to integrate DROID within other systems (e.g., local repositories).

    Contact

    General queries can be sent to [email protected] . DROID has its own, albeit not heavily used, mailing list at http://sourceforge.net/mail/?group_id=160809 . The PRONOM and DROID tools have originated from the Digital Preservation Department at The National Archives ( [email protected] ), which is run by Adrian Brown.

    Further information

    PRONOM is available at http://www.nationalarchives.gov.uk/pronom . DROID is available via its own SourceForge site at http://droid.sourceforge.net/wiki/index.php/Introduction .


    Update April 2009

    The DROID and PRONOM services continue to be available to the community, and offer the most viable tools that can be actively used in the context of repositories. The REMAP project has made use of a locally implemented copy of DROID by wrapping a Web Service around it to incorporate it within a BPEL-orchestrated workflow. The DROID tool has subsequently been used to capture information from PRONOM using the Web Services link between them. In this way, DROID and PRONOM information about the files of an object can be captured for subsequent management of the object's format(s).
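
    Wrapping a locally installed command-line tool as a service can be sketched along the following lines. This is not the REMAP wrapper: for brevity it uses plain HTTP and JSON from the Python standard library rather than the WSDL-described interfaces that a BPEL engine would normally consume. TOOL_COMMAND is a placeholder that simply echoes the path it is given; a real deployment would substitute the documented command line of the tool being wrapped, and would restrict which paths may be requested.

```python
import json
import subprocess
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Placeholder command: echoes the path it is given. Substitute the real
# tool's documented command line here.
TOOL_COMMAND = [sys.executable, "-c", "import sys; print(sys.argv[1])"]


def run_tool(path: str, command=None) -> dict:
    """Run the wrapped tool on one file path and capture its output."""
    cmd = list(command or TOOL_COMMAND) + [path]
    done = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    return {"path": path, "exit_code": done.returncode, "output": done.stdout.strip()}


class ToolHandler(BaseHTTPRequestHandler):
    """Answers GET /?path=... with the tool's output as JSON."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        if "path" not in query:
            self.send_error(400, "missing 'path' parameter")
            return
        body = json.dumps(run_tool(query["path"][0])).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Start the wrapper service (blocks until interrupted)."""
    HTTPServer((host, port), ToolHandler).serve_forever()
```

    Calling serve() starts the service on the local machine; a workflow engine can then treat the tool like any other network service, which is the property the wrapping approach relies on.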

    See also the update on the GDFR project for further information about a consolidation of format registry services and initiatives.


    2.6 GDFR (Global Digital Format Registry)

    Description

    The GDFR is a collaborative project between Harvard University and OCLC, supported by the Mellon Foundation, to develop a registry that will provide sustainable distributed services to store, discover, and deliver representation information about digital formats. The work recognises the need to have full information about the format of digital files in order to be able to preserve them effectively. The information it is intended to provide is complementary to that held within PRONOM, though there has also been active work on the interfaces that would be available onto the information, including SRU and OAI-PMH, allowing for integration of the registry in other repository workflows.
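
    To indicate what integration through SRU and OAI-PMH interfaces looks like from a client's side, the sketch below builds request URLs using the standard parameters of the two protocols. The base URLs are hypothetical, since no GDFR endpoints are given in this report, and the CQL query is an arbitrary example.

```python
from urllib.parse import urlencode

# Hypothetical endpoints, for illustration only.
OAI_BASE = "http://registry.example.org/oai"
SRU_BASE = "http://registry.example.org/sru"


def oai_list_records(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request for harvesting records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def sru_search(base_url: str, cql_query: str, max_records: int = 10) -> str:
    """Build an SRU searchRetrieve request carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": max_records,
    }
    return base_url + "?" + urlencode(params)


harvest_url = oai_list_records(OAI_BASE)
search_url = sru_search(SRU_BASE, "pdf")
```

    OAI-PMH suits bulk harvesting of registry records into a local store, while SRU suits searching the registry on demand from within a workflow.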

    Current status

    The workplan for the project indicates a completion date of January 2008. However, the website provides no current information on the status of the work and further announcements are awaited. A team at the University of Maryland implemented the concepts of GDFR in their own system, FOCUS (FOrmat CUration Service), as part of a Library of Congress NDIIPP project in 2005, but this demonstrator does not appear to have been taken forward (the demos are no longer available). This project implemented a web service agent to manage interactions between the user and registry.

    Available tools

    The availability of the outputs from the GDFR project is awaited and no current tools are available. The FOCUS system also appears to be no longer available.

    Contact

    No contact details are provided for GDFR, though Stephen L Abrams at Harvard University was the author of the proposal ( [email protected] ). The FOCUS work was carried out by Joseph F JaJa (see http://www.umiacs.umd.edu/~joseph/ ) at the University of Maryland.

    Further information

    Information on the GDFR work is available at http://hul.harvard.edu/formatregistry/ and at https://collaborate.oclc.org/wiki/gdfr/about.html . These appear identical, though the latter is considered the official website.

    Information on the FOCUS system and project can be found at http://www.umiacs.umd.edu/research/adapt/focus/ . The work is also described in a paper and presentation available at http://www.umiacs.umd.edu/research/adapt/focus/publications.html , given at the IS&T Archiving 2006 Conference in Ottawa, Canada.


    Update April 2009

    Information on this initiative is now accessible via a new website at http://www.gdfr.info (which the first, Harvard-based, link above re-directs to: the OCLC link is no longer available). The original project delivered its software in August 2008. Following this, discussions on the continuing role of format registries highlighted that the community could not support more than one. Discussions in spring 2009 between The National Archives, Harvard University and other interested parties concluded that bringing the work of GDFR and PRONOM together would be beneficial.

    An announcement on this page in April 2009 stated the following:

    In April 2009 the GDFR initiative joined forces with the UK National Archives' PRONOM registry initiative under a new name - the Unified Digital Formats Registry (UDFR). The UDFR will support the requirements and use cases compiled for GDFR and will be seeded with PRONOM's software and formats database.

    Information on the new initiative will be available at http://www.udfr.info . In the interim period further information can be found at http://www.gdfr.info/udfr.html . The UDFR has a 16-month roadmap, and can be contacted through Pam Armstrong ( [email protected] ), who has agreed to chair an ad hoc governing body pending the formation of a permanent governing body by November 2009, or Andrea Goethals ( [email protected] ).

    The GDFR registry can be searched via a web interface at http://www.formatregistry.org/registry , though it will not be maintained. Documentation from the GDFR project is also available at http://www.gdfr.info/docs.html .

    There is no mention of Web Services in the work of the GDFR initiative, though the development of interoperable Web systems is referenced. In taking forward the development of UDFR there is an indication that it will be technically based on the existing PRONOM system, and so continued use of Web Services to support its work is anticipated.

    The UDFR service is scheduled for delivery by July 2010.


    2.7 TOM (Typed Object Model) & FRED

    Description

    The Typed Object Model was the basis of a PhD thesis at Carnegie Mellon University in 2004. TOM, as it has become known, is a model for identifying and describing data formats in well-defined, machine-processable ways. The system built to demonstrate the model can work with all data formats as long as they are byte sequences. The model envisages a series of type brokers that record and interpret the information about types (a type being the information contained within an object: a format is a type plus its associated encodings), and call on various actions as required. The demonstrated implementation included a web-based file conversion service.

    The project also implemented FRED (a Format Registry Demonstrator) to show how TOM would interact with a format registry.

    Current status

    The project appears to have ceased since the conclusion of the PhD. The project website is referenced in a number of other documents, but does not respond at the time of writing.

    Available tools

    The availability of the demonstrator is unclear. The project also produced software that could be downloaded for running local TOM brokers, though this is not currently available via the website.

    Contact

    The contact for the TOM work is John Ockerbloom at the University of Pennsylvania (see http://www.cs.cmu.edu/~spok/vita.html for details). The TOM website is also located at the University of Pennsylvania.

    Further information

    The website for TOM is http://tom.library.upenn.edu .

    Update April 2009

    Contact with the TOM project team was made in April 2008 and an indication was given that further work would be undertaken. However, no subsequent work appears to have taken place at this time.


    2.8 JHOVE (JSTOR/Harvard Object Validation Environment)

    Description

    Briefly, JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects. The purpose of creating a tool to carry out these functions is an acknowledgement that all aspects of managing repositories and the digital objects within them require policy and processing decisions. JHOVE is intended to inform these decisions by automating the generation of information about the formats of the digital objects. The metadata produced by JHOVE can then be stored alongside the digital objects in a repository and used to inform preservation and other curation functions.

    The three areas of functionality can be described as follows. Identification allows a repository to find out what an object's file format is. Validation allows the checking of objects to ensure that they are what they purport to be and consistency checking across a range of objects. Characterisation provides information on the properties of the file format for greater understanding of what is being held.
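
    The distinction between the three functions can be made concrete with a small sketch for a single format. PNG is chosen here only because its header layout is simple and well documented. This is not JHOVE code and checks far less than JHOVE would: validation here covers only the first (IHDR) chunk.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def identify(data: bytes) -> bool:
    """Identification: does the object claim to be a PNG?"""
    return data.startswith(PNG_SIGNATURE)


def validate(data: bytes) -> bool:
    """Validation: is the first chunk a 13-byte IHDR with a correct CRC?"""
    if not identify(data) or len(data) < 33:
        return False
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if length != 13 or chunk_type != b"IHDR":
        return False
    (stored_crc,) = struct.unpack(">I", data[29:33])
    # The CRC covers the chunk type and the chunk data.
    return zlib.crc32(data[12:29]) == stored_crc


def characterise(data: bytes) -> dict:
    """Characterisation: report technical properties from the IHDR chunk."""
    width, height, bit_depth, colour_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "colour_type": colour_type,
    }


# Build a minimal PNG header in memory to exercise the three functions.
ihdr = b"IHDR" + struct.pack(">IIBBBBB", 640, 480, 8, 2, 0, 0, 0)
sample = PNG_SIGNATURE + struct.pack(">I", 13) + ihdr + struct.pack(">I", zlib.crc32(ihdr))
properties = characterise(sample) if validate(sample) else {}
```

    The output of the characterisation step is the kind of technical metadata that could be stored alongside an object in a repository.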

    Current status

    JHOVE is currently available for use (see details below). The range of formats for which information is available covers many widely used formats, but is not comprehensive: there is no registry plugged in, hence the interest and participation by Harvard in the GDFR project. Additional development of JHOVE will take place as funding becomes available.

    Available tools

    The latest version of JHOVE, version 1.1, is implemented as a Java application that can be called either via a command-line interface or a Swing-based GUI. The command-line API is designed to allow for JHOVE to be integrated into other applications and workflows. The software can be downloaded from the JHOVE website (with additional details at http://hul.harvard.edu/jhove/distribution.html ) and is made available under an LGPL licence. The software can be wrapped up as a web service, a feature implemented as part of the FOCUS work at the University of Maryland. A discussion list for JHOVE is available: see http://hul.harvard.edu/jhove/community.html .

    Contact

    All enquiries should be directed to [email protected] .

    Further information

    Further information on the services provided through JHOVE is available at http://hul.harvard.edu/jhove/ .

    Update April 2009

    The JHOVE service continues to be made available for use by the community. The REMAP project has made use of a locally implemented version, wrapping it as a Web Service for incorporation within a BPEL-orchestrated workflow that stores technical metadata about images and other materials within the repository as part of the upload process.
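
    The wrapping approach can be sketched in Python as follows. This is an illustration under stated assumptions rather than the REMAP implementation: it exposes a plain HTTP endpoint through WSGI instead of the SOAP interface a BPEL engine would call, it assumes a `jhove` launcher on the local path, and the function names and the `path` query parameter are hypothetical. The `-h` (output handler) and `-m` (module) options follow the JHOVE command-line usage.

```python
import subprocess
from typing import Callable, List, Optional
from urllib.parse import parse_qs


def build_jhove_command(path: str, module: Optional[str] = None,
                        handler: str = "XML", executable: str = "jhove") -> List[str]:
    """Assemble the command line for a locally installed JHOVE (assumed on PATH)."""
    command = [executable, "-h", handler]
    if module:
        command += ["-m", module]   # e.g. "TIFF-hul"; omit to let JHOVE try its modules
    command.append(path)
    return command


def run_jhove(path: str, module: Optional[str] = None) -> str:
    """Run JHOVE on one file and return its report as text."""
    result = subprocess.run(build_jhove_command(path, module),
                            capture_output=True, text=True, check=True)
    return result.stdout


def make_app(characterise: Callable[[str], str] = run_jhove):
    """Build a WSGI application that returns the JHOVE report for ?path=<file>."""
    def app(environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""))
        paths = query.get("path")
        if not paths:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"missing 'path' parameter"]
        # A production service would restrict 'path' to a known staging area.
        report = characterise(paths[0])
        start_response("200 OK", [("Content-Type", "application/xml")])
        return [report.encode("utf-8")]
    return app
```

    The application returned by `make_app()` can be served for local testing with `wsgiref.simple_server.make_server`; the report it returns is what a workflow step would store alongside the uploaded object.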

    In April 2008 the JHOVE2 project was initiated. This is a two-year collaboration between the California Digital Library, Portico and Stanford University to develop and deploy a next-generation architecture providing enhanced performance, streamlined APIs, and significant new features. The JHOVE2 project generalizes the concept of format characterization to include identification, validation, feature extraction, and policy-based assessment. The target of this characterization is not a simple digital file, but a (potentially) complex digital object that may be instantiated in multiple files. Additional information can be found at http://confluence.ucop.edu/display/JHOVE2Info

    There is no explicit mention of the use of Web Services in the development of JHOVE2, though, of course, the original JHOVE has not made explicit use of this approach either. However, the intention to provide streamlined APIs offers scope for more flexibly incorporating the JHOVE service within local preservation workflows.

    2.9 AONS (Automated Obsolescence Notification System)

    Description

    The original AONS prototype was developed in 2006 as part of the Australian Partnership for Sustainable Repositories project in collaboration with the National Library of Australia. AONS is intended to automatically provide information from authoritative international registries to support decisions on preservation action. The second phase of the project, AONS II, is being designed to allow users to be informed when file formats used in repositories are obsolete or at risk of becoming obsolete, using information sourced from appropriate format registries.

    It is intended that AONS II can be deployed in two ways, either as a local implementation or via REST interfaces for inclusion within a SOA-based environment. The software implements abstraction layers for repositories and registries, allowing multiple types of both to be connected to AONS via adaptors. Fedora and generic REST adaptors are planned, as well as adaptors to PRONOM and GDFR.
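
    The adaptor idea can be sketched in Python as two small interfaces, one for repositories and one for registries, with the notification logic written only against those interfaces. All class, method and status names here are hypothetical and the in-memory adaptors stand in for real Fedora or PRONOM connections; nothing below is taken from the AONS code base.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol

# Illustrative severity ordering for registry answers.
SEVERITY = {"unknown": 0, "ok": 1, "at-risk": 2, "obsolete": 3}


class RepositoryAdapter(Protocol):
    def formats_in_use(self) -> Dict[str, int]:
        """Return a mapping of format identifier to number of objects held."""


class RegistryAdapter(Protocol):
    def risk_status(self, format_id: str) -> str:
        """Return one of the SEVERITY keys for the given format."""


@dataclass(frozen=True)
class Notification:
    format_id: str
    status: str
    object_count: int


class InMemoryRepository:
    def __init__(self, counts: Dict[str, int]) -> None:
        self._counts = dict(counts)

    def formats_in_use(self) -> Dict[str, int]:
        return dict(self._counts)


class InMemoryRegistry:
    def __init__(self, statuses: Dict[str, str]) -> None:
        self._statuses = dict(statuses)

    def risk_status(self, format_id: str) -> str:
        return self._statuses.get(format_id, "unknown")


def obsolescence_report(repository: RepositoryAdapter,
                        registries: Iterable[RegistryAdapter]) -> List[Notification]:
    """Flag every format in the repository that any registry rates at-risk or worse."""
    registries = list(registries)
    notifications = []
    for format_id, count in sorted(repository.formats_in_use().items()):
        statuses = [registry.risk_status(format_id) for registry in registries]
        worst = max(statuses, key=lambda s: SEVERITY.get(s, 0), default="unknown")
        if SEVERITY.get(worst, 0) >= SEVERITY["at-risk"]:
            notifications.append(Notification(format_id, worst, count))
    return notifications
```

    Supporting another repository or registry then means writing one more adaptor class, with no change to `obsolescence_report`.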

    Current status

    AONS II is currently ongoing and outputs from this project are awaited. The project plan indicates an original completion date of the end of 2007.

    Available tools

    The software from the first development phase of AONS is available for download through SourceForge at http://sourceforge.net/projects/aons/ .

    Contact

    Contact details for the project are available via the project website/wiki below. The project manager is David Pearson at the National Library of Australia.

    Further information

    Information on the development of AONS is available at http://pilot.apsr.edu.au/wiki/index.php/AONS .

    Update April 2009

    There appear to have been no further updates to the original work in the past year.

    2.10 CRiB

    Description

    CRiB is a preservation recommendation system developed at the University of Minho in Portugal. It has been built using a service-oriented architecture and is Web Services based. It provides the following functionality (taken from the website):

    Recommendation of optimal migration alternatives that take into consideration the preservation requirements of the client institution;

    Conversion of digital objects to up-to-date encodings that most users will be capable of interpreting;

    Evaluation of migration's outcome by comparing the original digital object with its converted counterparts and identifying the significant properties that have not been correctly preserved; and

    Generation of migration reports in appropriate forms for inclusion in the preservation metadata of migrated objects.
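
    The recommendation and evaluation steps listed above can be illustrated with a short Python sketch. It is a simplified reading of the functionality described, not CRiB's algorithm: the weighted-sum scoring, the function names and the property names are all assumptions made for the example.

```python
from typing import Any, Dict, List


def recommend(alternatives: Dict[str, Dict[str, float]],
              weights: Dict[str, float]) -> str:
    """Pick the migration alternative with the highest weighted score.

    'alternatives' maps a migration name to its scores per criterion;
    'weights' expresses the client institution's preservation requirements.
    """
    def score(name: str) -> float:
        return sum(weights.get(criterion, 0.0) * value
                   for criterion, value in alternatives[name].items())
    return max(sorted(alternatives), key=score)


def evaluate_migration(original: Dict[str, Any], converted: Dict[str, Any],
                       significant: List[str]) -> Dict[str, Any]:
    """Compare significant properties before and after conversion."""
    lost = [name for name in significant
            if original.get(name) != converted.get(name)]
    preserved = [name for name in significant if name not in lost]
    ratio = len(preserved) / len(significant) if significant else 1.0
    # The returned dictionary plays the role of a minimal migration report.
    return {"preserved": preserved, "not_preserved": lost, "preservation_ratio": ratio}
```

    The report returned by `evaluate_migration` is the kind of record that could be attached to the preservation metadata of the migrated object.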

    The architecture of the system is as displayed below.

    Current status

    The CRiB work is ongoing. A demonstrator is available for testing, and the project team encourage participation in the project by others. In particular they welcome contributions to the development of conversion services (beyond the current migrations available; see http://digitarq.di.uminho.pt/MigrationWorkbench/SupportedMigrations.aspx ), evaluation taxonomies (which determine the criteria for evaluation by CRiB for each object), property extractors, object comparators, and the Format Knowledge Base. In this light, CRiB can be considered as an architecture that is steadily being implemented but is not yet fully available.

    Available tools

    A Migration Workbench demonstrator is available for use via a web-based interface. This allows the conversion of files according to the format migrations available, and produces an evaluation report that can be used to inform further action. WSDL files for the format migrations are also available for the development of local services.

    Contact

    Details of the CRiB team are available via the project website (see below). General enquiries can also be sent to [email protected] .

    Further information

    Further information on the CRiB system is available at http://crib.dsi.uminho.pt/ .

    Update April 2009

    The CRiB architecture and software have been used in tandem with the RODA project to develop an OAIS-conformant repository of authentic digital objects for the Portuguese National Archives. RODA also uses a SOA-based approach in its design, and is built on top of an underlying Fedora repository. Details of this project and how CRiB relates to it can be found at http://repositorium.sdum.uminho.pt/handle/1822/8226 , and the work was presented at the iPres 2008 conference.

    The presentation makes it clear that the CRiB software is free and available for use as a development toolkit for preservation services. A consultancy, KEEP Solutions ( http://www.keep.pt ), has been established to provide assistance with its use for a variety of purposes. However, no further development of the CRiB services per se appears to have taken place in the past year.

    2.11 Xena

    Description

    Xena (Xml Electronic Normalising for Archives) is free and open source software available from the National Archives of Australia to aid in the long term preservation of digital records. Xena enables the detection of file formats of digital objects, and facilitates their conversion into open formats for preservation. This follows the preservation strategy of migrating objects to formats that have a higher probability of being usable in the future. The Xena software makes use of OpenOffice 2.x for this purpose, and generally advocates the use of XML-based formats for preservation.

    Current status

    The software is available for use under a GPL2 licence. It is currently at version 4.1 and is being actively maintained and developed by the National Archives of Australia.

    Available tools

    The Xena software is available for download at http://xena.sourceforge.net/download.php . It is self-contained software that can be run on either a Windows or Linux platform. A Java Runtime Environment and OpenOffice 2.x must be installed before Xena can be used.

    Contact

    Contact details for Xena are available at http://xena.sourceforge.net/contactus.php .

    Further information

    Information about Xena can be found at http://xena.sourceforge.net/ .

    Update April 2009

    Version 4.2.1 of the software was released in January 2009. No major changes have been made to the software and it continues to play a role in supporting the normalisation of files to facilitate the preservation of their content. Xena is in active use at the National Archives of Australia for their own purposes in connection with the Digital Preservation Recorder (DPR), a workflow tool through which digital preservation tasks are carried out. This software is also available for download: further information can be found at http://sourceforge.net/projects/dpr . Although Xena can be used individually through local installation, the DPR can also call it as part of a wider workflow.

    The Integrated Content Environment (ICE) project at the University of Southern Queensland also seeks to provide a normalisation capability through the integration of templates for the creation of content within Microsoft Office or OpenOffice. A variety of file format conversions can take place through the tool to help achieve a desired level of file structure standardisation, a feature that can be of value to digital preservation. See http://ice.usq.edu.au/default.htm for more information.

    2.12 DocMorph / CDS Convert

    Description

    The two tools described here are examples of online tools that enable the conversion of files from one format to another. Whilst this capability is quite often a feature of digital preservation (as evidenced by other tools described in this document), it can also be used as a means for moving objects to formats that are easier to work with for other purposes, which may then also feed into preservation activity and workflow where this is required. The tools described here were developed mainly for the purposes of conversion for use, without any specific preservation role in mind. Nevertheless, they are considered as alternatives where file migration is a preferred preservation strategy.

    DocMorph is provided by the National Library of Medicine in the United States and enables the conversion of over 50 different file formats (see http://docmorph.nlm.nih.gov/docmorph/files.htm for details) into PDF, TIFF or text according to need.

    CDS Convert is made available by CERN in Geneva, Switzerland. It enables conversion between a variety of different commonly used formats: PostScript, DVI, GIF, Word, PowerPoint, Outlook, PDF, TIFF and OpenOffice Impress files.

    Current status

    Both tools are available for use on a free basis and are well established. Whilst both were originally created to serve the needs of employees at the two organisations, there are no restrictions on who can use them now. Registration is required to use DocMorph, but this is for information only.

    Available tools

    DocMorph is made available as an online browser-based tool, allowing the conversion of one file at a time. A separate, downloadable tool, MyMorph, can also be used for batch conversion. It is a Windows-based tool, currently at version 2, and makes use of the online DocMorph service behind the scenes. Access to DocMorph is available at http://docmorph.nlm.nih.gov/docmorph/docmorph.htm , whilst the MyMorph tool can be downloaded from http://docmorph.nlm.nih.gov/docmorph/mymorph.htm .

    CDS Convert is delivered as an online tool only. It can be used for single file and batch file conversion, the latter through the use of packaged zip and other archive file formats. CDS Convert is available at http://cdsconv.cern.ch/ .

    Neither tool is available for embedding in other automated workflows.

    Contact

    For further information see the websites for the two services. Support for using CDS Convert is available from [email protected] . Queries relating to DocMorph and MyMorph can be sent to Frank Walker; details are available at http://docmorph.nlm.nih.gov/docmorph/contact.htm .

    Further information

    Further information on DocMorph and MyMorph can be found at http://docmorph.nlm.nih.gov/docmorph/default.htm .

    Further information on CDS Convert can be found at http://cdsconv.cern.ch/ .

    Update April 2009

    The DocMorph and MyMorph tools continue to be made available, though no further development has occurred over the past year. CDS Convert is also still available as a service, again as before.

    For PDF conversion OpenOffice can also be used (as utilised within the REMAP project). The iText suite offers similar capability, and is designed for inclusion within other web applications as part of a wider, automated workflow (see http://www.lowagie.com/iText/ ).

    2.13 SWORD (Simple Web-service Offering Repository Deposit)

    Description

    The SWORD project was originally funded in 2007 by the JISC as a result of an earlier working group that had convened to investigate how a common deposit API for repositories might be developed. The SWORD II project was funded in 2008.

    From the project website:

    SWORD is a lightweight protocol for depositing content from one location to another. It is a profile of the Atom Publishing Protocol (known as APP or ATOMPUB).

    Whilst not specifically a Web Service that supports digital preservation, or records management, SWORD does make use of Web Services to support the deposit, or movement as the description states, of content into repositories, and between them. It can thus be used as a tool in the preservation of materials where that preservation requires the content to be either deposited into a repository (where it can be processed using the other tools described in this document) or moved from one repository to another (for the same purposes and dependent on the role of the repository). For example, it has been used as part of the ICE project described in section 2.11 to enable documents created using the templates to be transferred into a local repository for their further management.
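
    Because SWORD is a profile of the Atom Publishing Protocol, a deposit is in essence an HTTP POST of a package to a collection URL, accompanied by a few extra headers. The Python sketch below builds such a request without sending it. The function names, the collection URL and the packaging identifier passed in are assumptions for illustration; the header names follow the SWORD 1.3 profile as understood here, and the checksum encoding expected by a given server should be confirmed against that server's documentation.

```python
import hashlib
import urllib.request
from typing import Dict, Optional


def sword_deposit_headers(payload: bytes, filename: str, packaging: str,
                          content_type: str = "application/zip",
                          on_behalf_of: Optional[str] = None,
                          no_op: bool = False) -> Dict[str, str]:
    """Build the HTTP headers for a SWORD-style deposit of one package."""
    headers = {
        "Content-Type": content_type,
        "Content-Disposition": f"filename={filename}",
        # Hex digest used here; check the encoding the target server expects.
        "Content-MD5": hashlib.md5(payload).hexdigest(),
        "X-Packaging": packaging,                    # identifies the package format
        "X-No-Op": "true" if no_op else "false",     # "true" asks for a dry run
    }
    if on_behalf_of:
        headers["X-On-Behalf-Of"] = on_behalf_of     # mediated deposit
    return headers


def build_deposit_request(collection_url: str, payload: bytes,
                          **header_options) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for a deposit."""
    headers = sword_deposit_headers(payload, **header_options)
    return urllib.request.Request(collection_url, data=payload,
                                  headers=headers, method="POST")
```

    Passing the prepared request to `urllib.request.urlopen` would perform the deposit; authentication is omitted from the sketch.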

    Current status

    The SWORD II project has recently concluded. The outputs from the projects are freely available for use. A community of interest around the SWORD developments has developed, which suggests future development may be carried out. Work being undertaken by other projects and initiatives can be found on the project website. A direct follow-on to the SWORD projects is work within the YODL-ING project at the University of York ( http://www.york.ac.uk/library/electroniclibrary/yorkdigitallibraryyodl/ ).

    Available tools

    Version 1.3 of the SWORD APP profile has been released. Implementations of the previous version 1.2 for use with Fedora, DSpace, and EPrints repository software are available, as well as Java and PHP libraries for use in development, and version 1.3 implementations will be available in the near future. Additionally a 1.3 validator will be available to check that the SWORD messages are correct.

    A specific instance of using SWORD to support regular deposit into repositories is the test Microsoft Office SWORD plugin (see http://www.codeplex.com/OfficeSWORD ), though it is unclear whether this will be taken any further as yet.

    Contact

    The current project manager and coordinator of work related to SWORD is Adrian Stevenson at UKOLN ( [email protected] ).

    Further information

    Further information is available through the project website at http://www.swordapp.org/ . There is also a mailing list for developers at https://lists.sourceforge.net/lists/listinfo/sword-app-tech .

    3. Web Services for records management and digital preservation

    In reviewing the available projects, services and initiatives related to digital preservation in this document, the underlying aim is to identify those areas of functionality that are available as Web Services, and which are potential candidates for inclusion within workflows identified by REMAP to support the use cases surrounding records management and digital preservation needs at the University of Hull. The current review has indicated that the landscape offers a mixture of possibilities in this respect, though with some further potential in the future.

    The initiatives included in the document can be split up into broad categories. These are not totally discrete, but indicate the level at which the work is taking place and the original purpose in pursuing the work described.

    Frameworks / architectures for enabling digital preservation

    o PANIC, SHERPA DP2 / SOAPI, Planets

    Registries and tools related to information on file formats

    o PRONOM / DROID, GDFR, TOM / FRED

    Tools that provide functionality specifically related to digital preservation

    o JHOVE, AONS, CRiB, Xena

    Tools that could be used for digital preservation, but which were not specifically designed for this purpose

    o DocMorph, CDS Convert

    There has clearly been much developmental work in this area (e.g., PANIC, TOM) that is now informing the further development of services. This work, along with the work of the JISC-funded PRESERV and PRESERV2 projects, has concluded that digital preservation is a complex environment, and that it is unlikely that one service can provide everything that is needed for all cases. Hence, the role of distributed services appears to be key to effective digital preservation. Who manages the digital preservation processes is less clear, with models being investigated of centralised hubs (e.g., Planets, SHERPA DP2) coupled with the provision of downloadable tools that can be used to carry out local preservation activity, albeit in some cases relying on a centralised source of information (e.g., DROID, MyMorph, JHOVE).

    The distributed model approach fits closely with the use of Web Service technologies and the service-oriented architectural approach. Nevertheless, whilst this approach may be considered a final aim, development of Web Services so far has not been common, and many initiatives have relied on the tried and trusted approach of providing downloadable tools for local installation (understandable given the relative lack of ability of institutions to work with Web Services using current systems). As such, at the time of writing there are few examples of substantial stable Web Services available for use in local institutional workflows.

    In considering options for enabling digital preservation within RMDP workflows as part of the REMAP project there are two approaches that can be taken currently:

    To use available Web Services for the tasks for which they are available, e.g., the file conversion WSDLs from CRiB and the REST interfaces onto AONS.

    To wrap locally installed tools as Web Services to enable them to be called as part of a wider workflow, e.g., using DROID, JHOVE, possibly AONS as an option.

    It is anticipated that the Planets and SHERPA DP2 / SOAPI projects will provide additional services that can be considered for inclusion in local workflows later in the project's lifetime.

    It should also be noted that available Web Services specifically for records management were not forthcoming. A number of commercial records management systems suppliers use Web Services within their systems, but not in a distributed fashion for inclusion in flexible workflows.

    Update April 2009

    The approach outlined above remains the situation, with no further development of Web Services for wide use for the purposes of records management. However, as digital preservation tools start to become used, it is argued that this will lead to an increased awareness of the value of records management in guiding the management of files so they can be best preserved.

    Normalisation is a feature of many systems that use it to harmonise variety and allow effective management of files. Hence, a preservation policy might state that only certain formats will be preserved. Normalising tools can assist in migrating files to these formats. It is becoming increasingly clear, though, that preservation is not a task that occurs late in the lifecycle of digital content files, but an activity that can usefully inform all aspects of that lifecycle, starting at creation. If the preservation policy can influence creation and use, then the management of that record will have made subsequent preservation easier.
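
    A policy of this kind can be expressed as data and applied at the point of creation or upload. The Python sketch below is purely illustrative: the formats listed, the migration targets and the function name are hypothetical and do not represent any institution's actual policy.

```python
from typing import Dict, Set

# Hypothetical policy: formats accepted for preservation, and the
# normalisation target for formats that are not accepted as they stand.
PRESERVED_FORMATS: Set[str] = {"PDF", "TIFF", "XML"}
MIGRATION_TARGETS: Dict[str, str] = {"DOC": "PDF", "GIF": "TIFF"}


def preservation_action(file_format: str,
                        preserved: Set[str] = PRESERVED_FORMATS,
                        migrations: Dict[str, str] = MIGRATION_TARGETS) -> str:
    """Decide what the policy requires for a file of the given format."""
    if file_format in preserved:
        return "keep"
    target = migrations.get(file_format)
    if target is not None:
        return f"migrate to {target}"
    # No rule applies: the decision is referred to a records manager.
    return "refer for review"
```

    Running such a check when a file is first deposited, rather than years later, is one way in which the preservation policy can influence creation and use.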

    The ICE project described in section 2.11 takes this a step further by creating templates for the creation of content so that there is an increased standardisation of the structure of the files. Having increased confidence in the knowledge one has about the structure of a file allows for preservation tasks to be undertaken based on that structure. Hence, managing the record from creation again offers the opportunity to do more with the file at a later point in its lifecycle.

    A balance needs to be struck between services that make such requirements and the flexibility that tools provide for creation that can be used to good effect for the immediate purpose of the file. The extent to which Web Services can be used to support this balance remains uncertain, though certainly many of the services described in this document have the potential to contribute to future solutions.