US GPO AIP Independence Test CS 496A – Senior Design Team members: Antonio Castillo, Johnny Ng, Aram Weintraub, Tin-Shuk Wong Faculty advisor: Dr. Russ Abbott GPO contact: Kate Zwaard


Post on 12-Jan-2016


Page 1: US GPO AIP Independence Test

US GPO AIP Independence Test

CS 496A – Senior Design

Team members: Antonio Castillo, Johnny Ng, Aram Weintraub, Tin-Shuk Wong

Faculty advisor: Dr. Russ Abbott

GPO contact: Kate Zwaard

Page 2: US GPO AIP Independence Test

Overview

Background: OAIS, FDsys

Project Objectives

AIP: METS, MODS, and PREMIS

Solution Strategy: XML parsing, a note on deliverables, repositories, testing

Conclusion

Page 3: US GPO AIP Independence Test

OAIS: Open Archival Information System

“An OAIS is an archive consisting of an organization of people and systems that has accepted the responsibility to preserve information and make it available for a Designated Community”

Developed by the Consultative Committee for Space Data Systems (ISO 14721:2003)

Page 4: US GPO AIP Independence Test

FDsys: Federal Digital System

FDsys – An OAIS maintained by the U.S. Government Printing Office to provide public access to information submitted by Congress and Federal agencies.

Page 5: US GPO AIP Independence Test

OAIS Primary Functions

Ingest – Turn SIPs into AIPs

Archival Storage – Storage and retrieval of AIPs

Data Management – Populating, maintaining and accessing the varieties of information

Administration – Controls day-to-day operations

Preservation Planning – Maintaining archive accessibility

Access – Functions for access of archive

Page 6: US GPO AIP Independence Test

Information Package: a critical component of OAIS

The information package is a conceptual linking of content information with its preservation description and packaging information.

Three kinds of information packages:

SIP – Submission Information Package

AIP – Archival Information Package

DIP – Dissemination Information Package

Page 7: US GPO AIP Independence Test

AIP: Archival Information Package

Defines how digital objects and their associated metadata are packaged using XML-based files:

METS (binding file)

MODS

PREMIS

Page 8: US GPO AIP Independence Test

Project Objective: Prove AIP Independence

An AIP is independent if, in the event of catastrophic and irretrievable loss or damage of the content management system, a knowledgeable user can still make sense of the data.

Page 9: US GPO AIP Independence Test

Project Objectives

This project simulates FDsys breaking down due to some catastrophic attack or error.

We are attempting to categorize and reconstruct a set of sample data from FDsys outside the context of the actual CMS.

The only references we have available, other than the actual files in the archive, are publicly defined standards.

It is our hope that this project will help GPO improve the robustness of their file system.

Page 10: US GPO AIP Independence Test

AIP: METS Schema

XML file format

Seven major sections

Page 11: US GPO AIP Independence Test

AIP: METS Schema

5 Major Sections (of the seven; the remaining two are Structural Links and Behavior)

1) METS Header

2) Descriptive Metadata

3) Administrative Metadata

4) File Section

5) Structural Map
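As a rough illustration of how the sections listed above could be located in a METS file, the sketch below counts the corresponding schema elements (`metsHdr`, `dmdSec`, `amdSec`, `fileSec`, `structMap`) with a JAXP DOM parser. The class name `MetsSectionCheck` and the inline sample document are hypothetical; real FDsys METS files may be organized differently.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the project's actual scripts.
class MetsSectionCheck {
    static final String METS_NS = "http://www.loc.gov/METS/";

    // METS schema element names for the five sections listed on the slide.
    static final String[] SECTIONS = {"metsHdr", "dmdSec", "amdSec", "fileSec", "structMap"};

    /** Counts how many times each major section element occurs in a METS document. */
    static Map<String, Integer> countSections(String metsXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for namespace-based lookups
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(metsXml)));

        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : SECTIONS) {
            counts.put(name, doc.getElementsByTagNameNS(METS_NS, name).getLength());
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: two descriptive metadata sections, no administrative metadata.
        String sample = "<mets:mets xmlns:mets=\"http://www.loc.gov/METS/\">"
            + "<mets:metsHdr/><mets:dmdSec ID=\"D1\"/><mets:dmdSec ID=\"D2\"/>"
            + "<mets:fileSec/><mets:structMap/></mets:mets>";
        System.out.println(countSections(sample));
    }
}
```

A section with a count of zero is a cue to check whether the AIP really lacks it or stores it under a different structure.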

Page 12: US GPO AIP Independence Test

AIP: MODS

Descriptive metadata

Extension to METS

Top-level elements: mandatory, recommended, optional

Page 13: US GPO AIP Independence Test

AIP: MODS

Page 14: US GPO AIP Independence Test

AIP: PREMIS

Preservation metadata

Extension to METS

PREMIS Data Model

Intellectual Entity

Object Entity

Event Entity

Agent Entity

Rights Entity*

Page 15: US GPO AIP Independence Test

AIP: PREMIS

Page 16: US GPO AIP Independence Test

Solution Strategy

Data submitted to us are AIPs, not SIPs. Repository software cannot ingest AIPs, only SIPs. We must write scripts that parse the AIPs so that we can construct SIPs from the arbitrary file structure, then ingest those SIPs with repository software to create new AIPs.
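One plausible first step in that strategy is to inventory the content files an AIP's METS file points to, since those are the files a reconstructed SIP would need to carry. The sketch below assumes the files are referenced through `FLocat` elements with `xlink:href` attributes, as the METS schema allows; the class name `AipFileLister` and the sample paths are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the project's actual scripts.
class AipFileLister {
    static final String METS_NS = "http://www.loc.gov/METS/";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    /** Returns the xlink:href of every FLocat element, in document order. */
    static List<String> listContentFiles(String metsXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(metsXml)));

        NodeList locations = doc.getElementsByTagNameNS(METS_NS, "FLocat");
        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < locations.getLength(); i++) {
            Element location = (Element) locations.item(i);
            // getAttributeNS returns an empty string when the attribute is absent.
            String href = location.getAttributeNS(XLINK_NS, "href");
            if (!href.isEmpty()) {
                hrefs.add(href);
            }
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample with two files in one file group.
        String sample = "<mets:mets xmlns:mets=\"http://www.loc.gov/METS/\""
            + " xmlns:xlink=\"http://www.w3.org/1999/xlink\">"
            + "<mets:fileSec><mets:fileGrp>"
            + "<mets:file ID=\"F1\"><mets:FLocat LOCTYPE=\"URL\" xlink:href=\"content/doc1.pdf\"/></mets:file>"
            + "<mets:file ID=\"F2\"><mets:FLocat LOCTYPE=\"URL\" xlink:href=\"content/doc1.xml\"/></mets:file>"
            + "</mets:fileGrp></mets:fileSec></mets:mets>";
        System.out.println(listContentFiles(sample));
    }
}
```

The resulting list could then be compared with the files actually present in the archive directory before any SIP is assembled.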

Page 17: US GPO AIP Independence Test

XML Parsing

As described above, all metadata is in the form of XML files. Hence, using code to read XML files is integral to the project.

We plan to use the Java programming language for our scripting needs.

Java API for XML Processing (JAXP): the standard Java library for handling XML

It provides several different possible representations for XML, including tree-based (DOM) and streaming (SAX, StAX) models.
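To give a feel for one of the JAXP representations, here is a minimal sketch that uses the streaming StAX reader to pull `title` elements out of a MODS record. It is only an illustration: the class name `ModsTitleReader` and the sample title are hypothetical, and a tree-based DOM parse would work equally well for small files.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the project's actual scripts.
class ModsTitleReader {
    static final String MODS_NS = "http://www.loc.gov/mods/v3";

    /** Streams through the XML and collects the text of every MODS title element. */
    static List<String> readTitles(String modsXml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(modsXml));
        List<String> titles = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && MODS_NS.equals(reader.getNamespaceURI())
                        && "title".equals(reader.getLocalName())) {
                    // getElementText consumes the text up to the matching end tag.
                    titles.add(reader.getElementText().trim());
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // Made-up MODS fragment for demonstration only.
        String sample = "<mods:mods xmlns:mods=\"http://www.loc.gov/mods/v3\">"
            + "<mods:titleInfo><mods:title>Congressional Record</mods:title></mods:titleInfo>"
            + "</mods:mods>";
        System.out.println(readTitles(sample));
    }
}
```

Streaming avoids loading a whole document into memory, which may matter if the sample archive contains very large metadata files; DOM is simpler when random access to the tree is needed.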

Page 18: US GPO AIP Independence Test

A Note on Deliverables

This is not a typical computer science design project because our aim is not to design software. Instead, we will be conducting scripted tests on real data and forming conclusions based on the results.

Deliverables will most likely include:

a written report of our findings and recommendations

a reorganized version of the input data

Page 19: US GPO AIP Independence Test

Testing

After parsing and organizing the data, it will be important to perform checks to ensure that the reconstruction is accurate.

We may send a preliminary report to GPO for verification.

The exact testing procedure is still undefined, as we haven’t had a chance to investigate the data in depth yet.

Our goals should be clearer once we understand exactly what type of data we are dealing with.
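Since the testing procedure is still open, the following is only one possible kind of accuracy check: comparing a file's computed digest with a checksum recorded in the metadata. It assumes the AIPs record fixity values (METS allows `CHECKSUM` and `CHECKSUMTYPE` attributes on `file` elements), which has not been confirmed for the FDsys sample data. The class name `ChecksumCheck` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical helper, not part of the project's actual scripts.
class ChecksumCheck {

    /** Computes the digest of the data and returns it as lowercase hex. */
    static String hexDigest(byte[] data, String algorithm) throws Exception {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] digest = md.digest(data);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns true when the computed digest equals the recorded one (case-insensitive). */
    static boolean matches(byte[] data, String algorithm, String recordedHex) throws Exception {
        return hexDigest(data, algorithm).equalsIgnoreCase(recordedHex.trim());
    }

    public static void main(String[] args) throws Exception {
        // In practice the bytes would come from a content file and the
        // recorded value from the metadata; both are stand-ins here.
        byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
        String recorded = "900150983cd24fb0d6963f7d28e17f72"; // MD5 of "abc"
        System.out.println(matches(content, "MD5", recorded));
    }
}
```

A mismatch would not say what went wrong, only that the reconstructed file differs from what the metadata describes, so it would complement rather than replace verification by GPO.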

Page 20: US GPO AIP Independence Test

Repositories

Third-party repository software to ingest created SIPs.

DSpace, Fedora Commons (DuraSpace)

Based on simple technologies: Java, MySQL, Apache Tomcat, JavaScript Server

Page 21: US GPO AIP Independence Test

Conclusion

Our thanks to Kate, Dr. Abbott, and Dr. Pamula for their support.