digital archive policies and trusted digital repositories

Page 1: Digital Archive Policies and Trusted Digital Repositories

DCC Conference, Glasgow November, 2006 1

Digital Archive Policies and Trusted Digital

Repositories

MacKenzie Smith, MIT Libraries

Reagan Moore, San Diego Supercomputer Center

Page 2: Digital Archive Policies and Trusted Digital Repositories

What is the Problem?

Need to extract local collection management policies from software so that they are more discoverable and configurable

Need to standardize information lifecycle management (ILM) policies for sharing across systems within a preservation environment

Need to define metadata to audit ILM operations and achieve trust in a scalable, automated way

Page 3: Digital Archive Policies and Trusted Digital Repositories


Page 4: Digital Archive Policies and Trusted Digital Repositories

Preservation Environment

[Figure: layers of the preservation environment. Each layer below lists its entries under three columns: Preservation Properties, Preservation Control, Preservation Operations]

Management Functions: Assessment Criteria, Management Policies, Capabilities

Preservation Environment: Persistent State, Rules, Services

Physical Infrastructure: Database, Rule Engine, Storage System

Page 5: Digital Archive Policies and Trusted Digital Repositories

Local Repository Policy/Rule Types

Enterprise: specification of assertions

Archive: aperiodic, deferred consistency rules

Collection: periodic rules

Item: periodic or atomic rules

Page 6: Digital Archive Policies and Trusted Digital Repositories

Policy Framework

Based on the NARA/RLG TDR checklist categories:

Organization, environment and legal policies

Community and usability policies

Process and Procedure policies

Technology and Infrastructure policies

Page 7: Digital Archive Policies and Trusted Digital Repositories

Policy Framework

Abstract policy (high-level)

Example:

The repository stipulates the number and location of copies of all digital objects: the number of copies to be made, the specific location(s), the business rules, and the order of preference among replicas. The repository has mechanisms in place to ensure that multiple copies of digital objects are synchronized.

Page 8: Digital Archive Policies and Trusted Digital Repositories

Policy Framework

Concrete policy (local policy and metadata)

Example:

Specific number of copies of digital objects

Locations of copies of digital objects

Order of preference for digital object copies

Location of business rules for copies (e.g. contract with 3rd party archives for remote copies)
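The concrete policy fields above can be pictured as a small record plus a check against an object's actual replicas. This is only an illustrative sketch: the class name, field names, and site labels are hypothetical and do not come from DSpace or iRODS.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReplicationPolicy:
    """Concrete (local) form of the abstract replication policy.

    All field names are hypothetical, chosen for illustration only.
    """
    copies: int                   # specific number of copies
    locations: list[str]          # approved locations, in order of preference
    business_rules_uri: str = ""  # where the business rules (e.g. a contract) live


def policy_violations(policy: ReplicationPolicy, replica_sites: list[str]) -> list[str]:
    """Return human-readable problems with one object's replicas."""
    problems = []
    if len(replica_sites) < policy.copies:
        problems.append(f"expected {policy.copies} copies, found {len(replica_sites)}")
    for site in replica_sites:
        if site not in policy.locations:
            problems.append(f"copy held at unapproved location {site!r}")
    return problems


def preferred_copy(policy: ReplicationPolicy, replica_sites: list[str]) -> str | None:
    """Pick the replica whose site ranks first in the policy's preference order."""
    for site in policy.locations:
        if site in replica_sites:
            return site
    return None


# Hypothetical policy: three copies, preferred in the order listed.
policy = ReplicationPolicy(copies=3, locations=["mit", "sdsc", "third-party"])
```

Keeping the preference order inside the same record as the locations means a single list answers both "where may copies live" and "which copy should be read first".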

Page 9: Digital Archive Policies and Trusted Digital Repositories

Policy Encoding

Looked at lots of schemas and approaches

XACML, RuleML, and BPEL are too limited: single purpose (access control, rights management, workflow, etc.)

Ponder and KAoS are too risky: research projects that are no longer active

Using Rei (N3) RDF ontology

Page 10: Digital Archive Policies and Trusted Digital Repositories

Policy Exchange

DSpace DIPs: based on METS (also looked at XFDU, IMS CP, others); encapsulate content files, metadata, provenance, and policies

iRODS: enforces policies based on local rules; produces state information (metadata) that can be audited by the DSpace repository over time

Page 11: Digital Archive Policies and Trusted Digital Repositories

Example Functional Requirements

The ERA list defines 854 key capabilities (functional requirements) needed for preservation. These can be loosely organized into categories related to:

Management of disposition agreements describing record retention and disposition actions

Accession, the formal acceptance of records into the data management system

Arrangement, the organization of the records to preserve a required structure (implemented as a collection/sub-collection hierarchy)

Description, the management of descriptive metadata as well as text indexing

Preservation, the generation of Archival Information Packages

Access, the generation of Dissemination Information Packages

Subscription, the specification of services that a user picks for execution

Notification, the delivery of notices on service execution results

Queuing of large scale tasks through interaction with workflow systems

System performance and failure reports. Of particular interest is the identification of all failures within the data management system and the recovery procedures that were invoked.

Transformative migration, the ability to convert specified data formats to new standards. In this case, each new encoding format is managed as a version of the original record.

Display transformation, the ability to reformat a file for presentation.

Automated client specification, the ability to pick the appropriate client for each user.

Page 12: Digital Archive Policies and Trusted Digital Repositories

Rule Definition

Based on assessment criteria / preservation policies / preservation functional capabilities

Implemented as rules controlling micro-services with associated persistent state information

Page 13: Digital Archive Policies and Trusted Digital Repositories

Case Study

SRB/iRODS virtualized storage environment:

Provides 3rd party preservation services

Rules derived from local policy, preservation requirements

Provides metadata to allow monitoring for trust

DSpace@MIT institutional repository:

Defines local collection management policies

Consumes 3rd party preservation services (e.g. iRODS)

Provides provenance/audit (History) to monitor trust

Page 14: Digital Archive Policies and Trusted Digital Repositories

DSpace Event System

Archivist defines TDR-level abstract policies

System curator defines ILM events of interest, based on policies, e.g. ingest, modification, preservation migration, new edition, change in access rules, etc.

System detects and acts on events (e.g. iRODS deposit) and records them in the local History (provenance audit)

History/provenance uses ABC Harmony ontology for ILM (RDF)

System curator monitors iRODS state information and the DSpace History subsystem (via standard RDF browsing tools)
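The detect, act, and record cycle described for the event system can be sketched in a few lines. This is a simplified illustration, not DSpace code: the class and method names are hypothetical, the event names are adapted from the slide's examples, and a plain in-memory list stands in for the RDF-based History store.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone

# ILM events the curator has marked as "of interest" (adapted from the slide's examples).
EVENTS_OF_INTEREST = {"ingest", "modification", "preservation_migration",
                      "new_edition", "access_rule_change"}


@dataclass(frozen=True)
class HistoryEntry:
    """One provenance record; a stand-in for an RDF statement in the History store."""
    item_id: str
    event: str
    action: str
    timestamp: datetime


class EventSystem:
    """Hypothetical event handler: acts on events of interest and logs them."""

    def __init__(self) -> None:
        self.history: list[HistoryEntry] = []

    def handle(self, item_id: str, event: str) -> bool:
        """Act on an event of interest and record it; ignore everything else."""
        if event not in EVENTS_OF_INTEREST:
            return False
        # Hypothetical action: hand the item to the preservation service.
        action = "deposit_to_preservation_service"
        self.history.append(
            HistoryEntry(item_id, event, action, datetime.now(timezone.utc)))
        return True

    def audit(self, item_id: str) -> list[str]:
        """List the recorded events for one item, oldest first."""
        return [entry.event for entry in self.history if entry.item_id == item_id]


events = EventSystem()
events.handle("item-1", "ingest")        # recorded
events.handle("item-1", "view")          # not an event of interest, ignored
events.handle("item-1", "modification")  # recorded
```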

Page 15: Digital Archive Policies and Trusted Digital Repositories

iRODS Rule-based System

Quantify the management policies

Automate the application of the policies

Track the outcomes from application of the policies

First release of the software is this month

Page 16: Digital Archive Policies and Trusted Digital Repositories

iRODS - infrastructure independence

Six logical name spaces required to manage preservation properties:

Records

Persons

Storage resources

Rules

Micro-services

Persistent state information

Page 17: Digital Archive Policies and Trusted Digital Repositories

Example Archivist Policies

Authenticity:

Are required provenance metadata provided with record? - Submission requirement

Is the chain of custody properly documented? - Management requirement

Integrity:

Are the bits protected against natural disasters? - Management requirement for replication and distribution

Are the bits preserved without corruption? - Future assertion

Page 18: Digital Archive Policies and Trusted Digital Repositories

Example Archivist Policies

Infrastructure independence:

Management of preservation properties independently of choice of hardware and software infrastructure

Management policies are needed for assertions about the properties of the records (authenticity and integrity) and the properties of the preservation environment (infrastructure independence)

Page 19: Digital Archive Policies and Trusted Digital Repositories

Example of Complete Process of Rule Derivation from Preservation Criteria

Assessment criteria: integrity of records is preserved

Management policy: integrity will be verified every 6 months

Preservation capabilities:

Replication of records

Checksum on each record

Synchronization between replicas

Federation between archives
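To make the step from management policy to something a machine can evaluate more concrete, the six-month verification policy can be expressed as a check over the last-verification dates held in persistent state. This is a sketch under stated assumptions: six months is approximated as 182 days, and the function name, record identifiers, and state layout are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta

# Management policy: "integrity will be verified every 6 months".
# Assumption: six months is approximated as 182 days.
VERIFICATION_INTERVAL = timedelta(days=182)


def records_due_for_verification(last_verified: dict[str, date | None],
                                 today: date) -> list[str]:
    """Return ids of records never verified, or last verified too long ago."""
    due = []
    for record_id, verified_on in sorted(last_verified.items()):
        if verified_on is None or today - verified_on >= VERIFICATION_INTERVAL:
            due.append(record_id)
    return due


# Hypothetical persistent state: record id -> date of last checksum verification.
state = {
    "rec-a": date(2006, 1, 1),   # verified more than six months ago
    "rec-b": date(2006, 9, 1),   # verified recently
    "rec-c": None,               # never verified
}
due_today = records_due_for_verification(state, date(2006, 11, 21))
```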

Page 20: Digital Archive Policies and Trusted Digital Repositories

Rule-based Preservation Policies

Generated rules have the form event, condition, action set, where the action set is a set of micro-services or other rules

Each micro-service corresponds to operations on a record at a remote storage location

Each micro-service has a recovery procedure to handle remote system failure or unavailability

Persistent state information is saved to track the outcome from applying the rule
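The event, condition, action structure with per-micro-service recovery can be sketched as follows. This is not the iRODS rule engine or its rule syntax: the class names are hypothetical, and the choice to run the failing service's recovery first and then undo completed services in reverse order is an assumption made for the illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MicroService:
    name: str
    run: Callable[[dict], None]      # operation on a record
    recover: Callable[[dict], None]  # recovery procedure for this operation


@dataclass
class Rule:
    event: str
    condition: Callable[[dict], bool]
    actions: list[MicroService]
    state: list[tuple[str, str]] = field(default_factory=list)  # outcome log

    def fire(self, event: str, record: dict) -> bool:
        """Apply the rule; return True only if every micro-service succeeded."""
        if event != self.event or not self.condition(record):
            return False
        completed: list[MicroService] = []
        for service in self.actions:
            try:
                service.run(record)
            except Exception:
                self.state.append((service.name, "failed"))
                # Assumed order: recover the failed service, then undo earlier ones.
                for undo in [service, *reversed(completed)]:
                    undo.recover(record)
                    self.state.append((undo.name, "recovered"))
                return False
            completed.append(service)
            self.state.append((service.name, "ok"))
        return True


def replicate(record: dict) -> None:
    record["replicas"] += 1


def drop_replica(record: dict) -> None:
    record["replicas"] -= 1


def register_remote(record: dict) -> None:
    raise ConnectionError("remote site unavailable")  # simulated failure


def no_op(record: dict) -> None:
    pass


rule = Rule(
    event="ingest",
    condition=lambda rec: rec["replicas"] < 2,
    actions=[MicroService("replicate", replicate, drop_replica),
             MicroService("register_remote", register_remote, no_op)],
)
record = {"id": "rec-1", "replicas": 1}
fired = rule.fire("ingest", record)
```

In this run the second micro-service fails, so the replica created by the first is removed again and the outcome log keeps a trace of every step, which is the information an auditor would later inspect.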

Page 21: Digital Archive Policies and Trusted Digital Repositories

Rule - validate record integrity

Check permissions (requires archivist or proxy)

Operations on specified record:

Access remote site

Compute the checksum and compare with archived value

If checksum is not correct: access a replica, compute its checksum, and verify that it is correct; replace the bad replica with a good replica; update the audit list to track the replacement

Update persistent state to record date of checksum verification
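The steps of this rule can be followed in a minimal Python sketch. It assumes SHA-256 as the checksum (the slides do not name an algorithm), keeps replicas as in-memory bytes rather than files at remote sites, and uses hypothetical names for the function, the roles, and the record layout.

```python
from __future__ import annotations

import hashlib
from datetime import date

ALLOWED_ROLES = {"archivist", "proxy"}


def checksum(data: bytes) -> str:
    # Assumption: SHA-256; the slides do not specify the checksum algorithm.
    return hashlib.sha256(data).hexdigest()


def validate_record_integrity(role: str, record: dict, site: str, today: date) -> bool:
    """Verify the replica at `site`; repair it from a good replica if needed.

    Returns True if a repair was performed, False if the replica was intact.
    """
    # Check permissions (requires archivist or proxy).
    if role not in ALLOWED_ROLES:
        raise PermissionError("integrity validation requires archivist or proxy")

    replicas = record["replicas"]
    repaired = False
    # Compute the checksum and compare with the archived value.
    if checksum(replicas[site]) != record["checksum"]:
        # Access another replica and verify that its checksum is correct.
        good_site = next(
            (s for s, data in replicas.items()
             if s != site and checksum(data) == record["checksum"]),
            None,
        )
        if good_site is None:
            raise RuntimeError("no valid replica available for repair")
        # Replace the bad replica and track the replacement in the audit list.
        replicas[site] = replicas[good_site]
        record["audit"].append((today, site, good_site))
        repaired = True
    # Update persistent state with the date of checksum verification.
    record["last_verified"] = today
    return repaired


original = b"archival record"
record_state = {
    "checksum": checksum(original),
    "replicas": {"site-a": b"archival rec0rd", "site-b": original},  # site-a is corrupt
    "audit": [],
    "last_verified": None,
}
was_repaired = validate_record_integrity(
    "archivist", record_state, "site-a", date(2006, 11, 21))
```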

Page 22: Digital Archive Policies and Trusted Digital Repositories

Additional implied Assessment Criteria

Are there any orphaned records present in the archive with no preservation metadata?

Are the replicas distributed across independent administrative domains on different types of storage systems?

Is the observed error rate a factor of four lower than the validation rate?

Have all records been validated within the required time period?
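Two of these criteria, orphaned records and replica distribution, lend themselves to simple automated checks over a catalog of records. The sketch below is illustrative only: the catalog layout and function names are hypothetical, and "distributed" is assumed to mean at least two administrative domains and at least two storage types.

```python
from __future__ import annotations


def orphaned_records(catalog: dict[str, dict]) -> list[str]:
    """Records present in the archive with no preservation metadata."""
    return sorted(rid for rid, rec in catalog.items() if not rec.get("metadata"))


def poorly_distributed(catalog: dict[str, dict]) -> list[str]:
    """Records whose replicas do not span two domains and two storage types.

    Assumption: a threshold of two for each; the slides do not give numbers.
    """
    flagged = []
    for rid, rec in sorted(catalog.items()):
        domains = {replica["domain"] for replica in rec["replicas"]}
        storage_types = {replica["storage_type"] for replica in rec["replicas"]}
        if len(domains) < 2 or len(storage_types) < 2:
            flagged.append(rid)
    return flagged


# Hypothetical catalog with one healthy record and one problem record.
catalog = {
    "rec-1": {
        "metadata": {"creator": "MIT Libraries"},
        "replicas": [{"domain": "mit.edu", "storage_type": "disk"},
                     {"domain": "sdsc.edu", "storage_type": "tape"}],
    },
    "rec-2": {
        "metadata": {},
        "replicas": [{"domain": "mit.edu", "storage_type": "disk"},
                     {"domain": "mit.edu", "storage_type": "tape"}],
    },
}
```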

Page 23: Digital Archive Policies and Trusted Digital Repositories

Self-consistency and Closure

For every required preservation attribute (authenticity and integrity), are there assessment criteria?

For every assessment criterion, does there exist preservation metadata?

Are the properties of the preservation environment also preserved?
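The first two closure questions can be read as coverage checks between three sets: preservation attributes, assessment criteria, and preservation metadata. A minimal sketch follows; the function name, the example criteria, and the metadata field names are hypothetical.

```python
from __future__ import annotations


def closure_gaps(attributes: list[str],
                 criteria: dict[str, str],
                 metadata: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Find attributes with no criterion, and criteria with no metadata.

    criteria maps each assessment criterion to the attribute it assesses;
    metadata maps each criterion to the metadata fields that support it.
    """
    covered = set(criteria.values())
    attributes_without_criteria = [a for a in attributes if a not in covered]
    criteria_without_metadata = [c for c in criteria if not metadata.get(c)]
    return attributes_without_criteria, criteria_without_metadata


# Hypothetical inventory, loosely based on the example archivist policies.
attributes = ["authenticity", "integrity", "infrastructure independence"]
criteria = {
    "chain of custody documented": "authenticity",
    "bits preserved without corruption": "integrity",
}
metadata = {"bits preserved without corruption": ["checksum", "last_verified"]}

gaps = closure_gaps(attributes, criteria, metadata)
```

An empty result for both lists would indicate that the policy set is closed in the sense of the first two questions; the third question, about the environment itself, is not covered by this sketch.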