Infrastructure Training Session
An Infrastructure for Preservation
Claudio Prandoni
Marlis Valentini
MetaWare SpA & CASPAR
Programme
• Digital preservation threats and requisites
• Summary of OAIS model
• From OAIS to CASPAR
• CASPAR key components
• Ex. 1: Preservation step by step
• Demo: A simple web application
• Ex. 2: CASPAR answers to preservation threats
• A preservable architecture
• Interviews: Two case studies
Introduction
• How can digital data still be used and understood in the future when systems, software, and everyday knowledge continue to change? This is the CASPAR challenge.
Preservation Issue 1
• Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved
– How to guarantee digital information may be accessed and understood in the future?
– How to guarantee retrieval of Archival Information?
– How to guarantee intelligibility of digital information within heterogeneous Designated Communities?
Preservation Issue 2
• Non-maintainability of essential hardware, software or support environment may make the information inaccessible
– How to guarantee preservation actors are informed about change events?
– How to guarantee appropriate actions are undertaken to preserve Archival Information against change events?
Preservation Issue 3
• The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity
– How to guarantee an adequate integrity and identity for any Archival Information?
Preservation Issue 4
• Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future
– How to guarantee an adequate security access with the proper rights to any resource and functionality within an Archive?
Preservation Issue 5
• The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future
– How to guarantee proper information package management within an Archive?
– How to guarantee long-term preservation maintenance of any information package?
The CASPAR Project
• The CASPAR project is mainly based on the OAIS standard, ISO 14721:2003
• In this perspective, its Architecture is defined for
– Managing key concepts of the OAIS reference model
– Supporting main functionality identified in the OAIS functional model
• Moreover, the CASPAR project aims to define and implement interfaces and functionally independent components
OAIS Information Model
[Diagram: OAIS Information Model]
• Information Package
– Content Information: primary focus of archival preservation
– Preservation Description Information: needed for long-term preservation
– Descriptive Information: needed for discovery
• Content Information comprises a Data Object and Representation Information
• "Interpreted using" links connect the Data Object, the Representation Information and the Designated Community Knowledge Base
CASPAR Implementation
[Diagram: OAIS functions covered by the CASPAR implementation]
• Monitoring OAIS Environment; Detect Changes/Impacts in DCKB; Mapping out Preservation Strategy; Provide Recommendations
• STORAGE: AIP Storage; AIP Maintenance; AIP Retrieval
• DATA MANAGEMENT: Populate Descriptive Info; Maintain Descriptive Info; Access Descriptive Info
• INGEST: Receive SIP; Q-check on SIP; Generate AIP; Extract DescInfo; Coordinate updates
• ACCESS: Query Processing; Retrieval; Delivery; Perform Transformation
• Security; Access Control
CASPAR key components
Creation, maintenance and reuse of OAIS Representation Information
Allow search of an object using either a related measurable parameter or a linkage to remote values
Construction and unpackaging of OAIS Information Packages
Centralised and persistent storage and retrieval of OAIS Representation Information, including PDI
OAIS-based Preservation Aware Storage, providing built-in support for bit and logical preservation
CASPAR key components
Information discovery services
Definition and enforcement of access control policies
Registration of provenance information on digital works and retrieval of right holding information
Maintenance and verification of authenticity in terms of identity and integrity of the digital objects
Reception of notifications from Publishers for a specific “topic” and sending of alerts to Subscribers
Definition of Designated Communities, identification of missing Representation Information
The CASPAR Workflow
Preservation step by step
1) The digital content object has to be “prepared” and “packed” properly so that it can be “ingested” into the digital archive system that will manage and maintain it for a long time.
2) The digital content object has to be “retrieved” from the digital archive through its descriptive information, and “checked” against any access rights policy that restricts its use.
3) The digital content object within the digital archive needs to be maintained so that it can still be accessed, used and understood despite whatever changes occur during its long-term lifecycle.
Ingestion Phase
Information Packaging Components
1. Ingest Content Information
2. Create Information Package
• Representation Info
• Descriptive Info
• Preservation Description Info
3. Check Information Package
4. Store Information Package for long term
[Diagram: OAIS functional entities (Ingest, Data Management, Archival Storage, Preservation Planning, Administration, Access)]
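The four ingestion steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CASPAR packaging API: the `InformationPackage` class, the `create_package`, `check_package` and `ingest` functions, the SHA-256 fixity value and the dictionary standing in for Archival Storage are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class InformationPackage:
    """Hypothetical, simplified stand-in for an OAIS Information Package."""
    content: bytes                      # the Data Object being preserved
    representation_info: str            # how to interpret the bytes
    descriptive_info: dict              # metadata used for discovery
    preservation_description_info: dict = field(default_factory=dict)


def create_package(content, representation_info, descriptive_info):
    """Step 2: wrap the content with the three kinds of information."""
    pdi = {"fixity_sha256": hashlib.sha256(content).hexdigest()}
    return InformationPackage(content, representation_info, descriptive_info, pdi)


def check_package(package):
    """Step 3: return a list of problems; an empty list means the package passes."""
    problems = []
    if not package.representation_info:
        problems.append("missing Representation Info")
    if not package.descriptive_info:
        problems.append("missing Descriptive Info")
    digest = hashlib.sha256(package.content).hexdigest()
    if package.preservation_description_info.get("fixity_sha256") != digest:
        problems.append("fixity mismatch")
    return problems


def ingest(archive, package_id, content, representation_info, descriptive_info):
    """Steps 1-4: ingest content, package it, check it, store it."""
    package = create_package(content, representation_info, descriptive_info)
    problems = check_package(package)
    if problems:
        raise ValueError("package rejected: " + ", ".join(problems))
    archive[package_id] = package       # Step 4: a dict stands in for Archival Storage
    return package


archive = {}
ingest(archive, "aip-001", b"temperature,12.5\n",
       "CSV, UTF-8, columns: name,value", {"title": "Sample measurement"})
```

A package without Representation Info is rejected before it reaches storage, which mirrors the idea that content must be "prepared" and "packed" properly before ingestion.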
Access Phase
Information Access Components
1. Search Content Information
2. Obtain Information Packages and related Contents and Descriptions
3. Check Content Access Permissions
[Diagram: OAIS functional entities (Ingest, Data Management, Archival Storage, Preservation Planning, Administration, Access)]
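As a rough sketch of the access steps above, the example below searches a catalogue by descriptive information and checks permissions before handing anything back. The `search` and `obtain` functions, the role names and the catalogue layout are assumptions made for illustration only; they do not reflect the actual CASPAR finding aids or security components.

```python
def search(catalogue, keyword):
    """Step 1: find package ids whose descriptive information mentions the keyword."""
    keyword = keyword.lower()
    return sorted(
        package_id
        for package_id, entry in catalogue.items()
        if keyword in entry["title"].lower()
    )


def obtain(catalogue, policies, package_id, user_role):
    """Steps 2-3: return the entry only if the role is allowed to read it."""
    allowed_roles = policies.get(package_id, set())   # no policy means no access
    if user_role not in allowed_roles:
        raise PermissionError(f"{user_role!r} may not access {package_id!r}")
    return catalogue[package_id]


# Hypothetical catalogue and access policies.
catalogue = {
    "aip-001": {"title": "Ocean temperature series", "content": b"12.5,12.7"},
    "aip-002": {"title": "Concert recording", "content": b"audio bytes"},
}
policies = {"aip-001": {"researcher", "curator"}, "aip-002": {"curator"}}

hits = search(catalogue, "temperature")
entry = obtain(catalogue, policies, hits[0], "researcher")
```

Denying access when no policy exists is a design choice of this sketch: it keeps an unlisted package closed by default rather than open by default.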
Preservation Phase
Communication Components
1. Notify and Alert for Change Event impacting long term preservation
2. Trigger Preservation Process
[Diagram: OAIS functional entities (Ingest, Data Management, Archival Storage, Preservation Planning, Administration, Access)]
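The notify-then-trigger pattern above resembles a publish/subscribe mechanism, which can be sketched as follows. The `NotificationBroker` class, the topic names and the messages are hypothetical and only illustrate the idea; the real CASPAR communication components are not shown in these slides.

```python
from collections import defaultdict


class NotificationBroker:
    """Hypothetical in-memory broker: publishers post to a topic, subscribers get alerts."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver the message to every subscriber of the topic; return the count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, message)
        return len(callbacks)


triggered = []


def start_preservation_process(topic, message):
    # Step 2: a real system would plan a migration or update Representation Info here.
    triggered.append((topic, message))


broker = NotificationBroker()
broker.subscribe("format-obsolescence", start_preservation_process)
delivered = broker.publish("format-obsolescence",
                           "Viewer for format X is no longer maintained")
ignored = broker.publish("hardware-change", "Tape drive model Y retired")
```

Only subscribers of the matching topic are alerted, so a change event with no interested party triggers nothing.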
CASPAR innovations
• CASPAR aims at preserving not only the bits of digital objects but also the information and knowledge that is encoded in digital objects
• CASPAR aims at preserving digital rights on contents and at identifying mechanisms to ensure maintenance and verification of the authenticity of digital objects along the whole preservation process
Phaistos disk (1700 BC)
We still cannot understand it
(the meaning has not been preserved)
We can only understand it’s a “sequence of symbols”…
Rosetta Stone (196 BC)
…just a “sequence of symbols”… but…
Ancient Hieroglyphic Egyptian
Demotic Egyptian
Greek
Additional components
Designated Community & Knowledge Management
1. Deal with Designated Community Profile and its own Knowledge Base
2. Identify and Provide Knowledge Gap for understanding a Content Information
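One way to picture a knowledge gap is as the set of Representation Information items that a community would need but does not already have in its Knowledge Base. The sketch below assumes a simple dependency map between items; the `knowledge_gap` function, the item names and the two community profiles are illustrative assumptions, not the CASPAR gap manager.

```python
def knowledge_gap(required, dependencies, knowledge_base):
    """Return the Representation Information a community still needs.

    required: names the content depends on directly
    dependencies: name -> names that are needed to understand it
    knowledge_base: names the Designated Community already understands
    """
    gap = set()
    pending = list(required)
    seen = set()
    while pending:
        item = pending.pop()
        if item in seen:
            continue
        seen.add(item)
        if item in knowledge_base:
            continue                    # already understood, no need to look deeper
        gap.add(item)
        pending.extend(dependencies.get(item, ()))
    return gap


# Hypothetical dependency map and community profiles.
dependencies = {
    "FITS file format": ["FITS standard", "FITS dictionary"],
    "FITS dictionary": ["English"],
}
astronomers = {"FITS standard", "FITS dictionary", "English"}
general_public = {"English"}

gap_for_astronomers = knowledge_gap(["FITS file format"], dependencies, astronomers)
gap_for_public = knowledge_gap(["FITS file format"], dependencies, general_public)
```

The same content yields a smaller gap for a community with a richer Knowledge Base, which is why the profile of each Designated Community has to be managed alongside the data.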
Provenance Management
1. Deal with Digital Rights
2. Guarantee Authenticity
Web Application
CASPAR answers
• So…
Is the CASPAR solution able to provide an answer to the digital preservation issues identified at the beginning?
Preservation Issue 1
• Users may be unable to understand or use the data e.g. the semantics, format, processes or algorithms involved
– You need the ability to create and maintain adequate Representation Information
Preservation Issue 1
• To guarantee that digital information may be accessed and understood in the future, you need adequate OAIS Representation Information
• To guarantee retrieval of Archival Information, you need OAIS Finding Aids
• To guarantee intelligibility of digital information within heterogeneous Designated Communities, you need to manage DC Profiles and their Knowledge Base
Preservation Issue 2
• Non-maintainability of essential hardware, software or support environment may make the information inaccessible
– You need the ability to share information about the availability of hardware and software and their replacements/substitutes
Preservation Issue 2
• To guarantee preservation actors are informed about change events, you need adequate management of message exchange
• To guarantee appropriate actions are undertaken to preserve Archival Information against change events, you need to identify the information to be added/modified
Preservation Issue 3
• The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity
– You need the ability to bring together evidence from diverse sources about the Authenticity of a digital object
Preservation Issue 3
• To guarantee an adequate integrity and identity for any Archival Information, you need an Authenticity Tool
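A minimal sketch of what checking identity and integrity could look like is given below. It assumes that integrity evidence is a SHA-256 digest recorded at ingest and that identity evidence is a recorded creator name; the `register` and `verify` functions are hypothetical and are not the CASPAR Authenticity Tool.

```python
import hashlib


def register(content, creator):
    """Record identity and integrity evidence at ingest time."""
    return {"creator": creator, "sha256": hashlib.sha256(content).hexdigest()}


def verify(content, claimed_creator, record):
    """Return (identity_ok, integrity_ok) for the object as it exists today."""
    identity_ok = claimed_creator == record["creator"]
    integrity_ok = hashlib.sha256(content).hexdigest() == record["sha256"]
    return identity_ok, integrity_ok


original = b"score of a musical work"
record = register(original, "Composer A")

unchanged = verify(original, "Composer A", record)
tampered = verify(original + b"!", "Composer A", record)
```

Reporting the two checks separately keeps the evidence distinct: an object can have the right attribution and still fail the integrity check.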
Preservation Issue 4
• Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future
– You need the ability to deal with Digital Rights correctly in a changing and evolving environment
Preservation Issue 4
• To guarantee adequate, secure access with the proper rights to any resource and functionality within an OAIS Archive, you need Security and DRM Management
Preservation Issue 5
• The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future
– You need brokering of organisations to hold data and the ability to package together the information needed to transfer information between organisations ready for long term preservation
Preservation Issue 5
– To guarantee proper information package management within an OAIS Archive, you need to create an adequate OAIS Information Package
– To guarantee long-term preservation maintenance of any information package, you need an implementation of OAIS Archival Storage
Conclusion
[Diagram: The CASPAR Foundation]
• Platform
– Operating System: Linux, Unix, Windows, Mac
– Java Platform
– DBMS: H2, Postgres
• Framework
– Development Framework: JAX-WS, GWT, Ant
– Application Server: Tomcat, Glassfish, WASCE
• Key Components: GapManager, Orchestration, DataAccess&Security, RepInfoToolbox, Registry, Packaging, DataStores, Virtualisation, Authenticity, SemanticWeb, DigitalRights, FindingAids
• CASPAR Service Factory
• Development Management: Hudson and JTrac
Preservable Equation
Self-Contained + Well Described + Adaptable + Replaceable = Preservable
• Self-Contained: a pure service-oriented design guarantees that each component can provide its functionality without requiring the cooperation of other components
– No Dependencies
– Loosely coupled
– Distributed
• Well Described: the component analysis, design and development process is strongly based on complete, shared and open documentation at every level
– Sharing know-how
– Open Specification
– Open Source
– Open Documentation
• Adaptable: design choices and implementation allow each component to be adapted and configured so that it always provides at least a minimal set of functionality, independently of the deployment framework and conditions
– Flexibility
– Scalability
• Replaceable: design choices and implementation allow any component in the framework to be replaced with a compliant one
– Interoperability
– Maintainability
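The "Replaceable" property can be illustrated with a small sketch in which two storage components satisfy the same contract and can be swapped without changing the calling code. The `Storage` protocol and both implementations are hypothetical and written in Python for brevity; the CASPAR components themselves run on the Java platform and expose their own interfaces.

```python
import os
import tempfile
from typing import Protocol


class Storage(Protocol):
    """Hypothetical contract that any storage component must satisfy."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class MemoryStorage:
    """Keeps everything in a dictionary."""

    def __init__(self):
        self._items = {}

    def put(self, key, data):
        self._items[key] = data

    def get(self, key):
        return self._items[key]


class FileStorage:
    """Keeps each item as a file in a directory."""

    def __init__(self, directory):
        self._directory = directory

    def put(self, key, data):
        with open(os.path.join(self._directory, key), "wb") as handle:
            handle.write(data)

    def get(self, key):
        with open(os.path.join(self._directory, key), "rb") as handle:
            return handle.read()


def round_trip(storage: Storage, key: str, data: bytes) -> bytes:
    """Caller code depends only on the contract, not on a concrete component."""
    storage.put(key, data)
    return storage.get(key)


from_memory = round_trip(MemoryStorage(), "aip-001", b"payload")
with tempfile.TemporaryDirectory() as directory:
    from_disk = round_trip(FileStorage(directory), "aip-001", b"payload")
```

Because `round_trip` knows only the contract, replacing one compliant component with another needs no change on the caller's side.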
The Developer Community
http://developers.casparpreserves.eu:8080
• Shared and cooperative development community based on
– CASPAR Best Practices
• Development Management based on a detailed
– D1302 Overall Master Plan
– Refinement Specifications
• Development Control based on a Continuous Integration Engine
– Hudson + JTrac
• Specification, Software and Documentation available for developers & practitioners
CASPAR Preservation Nodes
This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.