Preservation Metadata and PREMIS Vilas Wuwongse Asian Institute of Technology 1

Upload: kelley-oneal

Post on 29-Dec-2015


Preservation Metadata and PREMIS

Vilas Wuwongse
Asian Institute of Technology

Outline

• Introduction to Metadata
• What is preservation metadata?
• Why is preservation metadata needed?
• How to create preservation metadata
• PREMIS
• Conclusion
• Acknowledgement

INTRODUCTION TO METADATA

Metadata

• Metadata is often defined as “structured data about data”.
• It defines information about one or more characteristics of the data:
– The data’s name, description, purpose, creation date and time, creator, and other basic information.
• For example:
– Library catalogues

Metadata Categories (1)

• Descriptive
– Describes the identification and information of a resource: title, author, abstract, and keywords.
• Structural
– Describes relationships within and among resource objects: a web page containing HTML files, image files, CSS files, and JavaScript files, and linking to other files.
• Technical (for physical files)
– Includes technical information that applies to any file type: software/hardware environment, checksums, digital signatures, image width, elapsed time.

Metadata Categories (2)

• Administrative
– Provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it
– Two important subsets:
   • Rights management metadata, dealing with intellectual property rights
   • Preservation metadata, containing information needed to archive and preserve a resource

WHAT IS PRESERVATION METADATA?

Preservation Metadata (1)

• Information that is essential to ensure long-term accessibility of digital resources
• A verifier of the past
• A communication to the future
• A best guess on the future: no prescriptive list of metadata elements available
• Must be able to exist independently from the systems which were used to create them

Preservation Metadata (2)

• Sometimes considered a subset of
– administrative metadata, assisting in the management of information
– technical metadata, assisting access to the digital content and ensuring that the digital resources can be rendered as they originally were
• Basic functional objectives: [OCLC]
– Providing knowledge about the actions needed to maintain digital resources over the long term
– Ensuring that the digital resources can be rendered as they originally were

Information Included in Preservation Metadata

• Provenance
– Describes the history of creation, ownership, access, and change
• Authenticity
– Ensures trustworthiness (does the digital resource render as it originally did?)
• Preservation activities
– Records the processes supporting preservation, such as migration
• Technical environment
– Provides the name and version of the hardware, platform, OS, and software required to render digital resources
• Rights management
– States the intellectual property rights and agreements that must be observed when executing a preservation process, e.g. does the creator allow his/her work to be copied or not?

Example

16 preservation metadata elements (recommended by oclc.org, May 1998)

WHY IS PRESERVATION METADATA NEEDED?

Why Preservation Metadata?

Preservation metadata helps the implementation of preservation policies

Preservation Policies (1)

• Define how to manage digital assets in a repository to avert the risk of content loss in terms of, e.g.,
– data storage requirements
– preservation actions
– responsibilities

Preservation Policies (2)

• Specify preservation goals to ensure that:
– digital content is within the physical control of the repository
– digital content can be uniquely and persistently identified and retrieved in the future
– all information is available so that digital content can be understood by its designated user community
– significant characteristics of the digital assets are preserved even as data carriers or physical representations change

Preservation Policies (3)

• Specify preservation goals to ensure that:
– physical media are cared for
– digital objects remain renderable or executable
– digital objects remain whole and unimpaired and that it is clear how all the parts relate to each other
– digital objects are what they purport to be
• All of these preservation functions depend on the availability of preservation metadata

HOW TO CREATE PRESERVATION METADATA

I want to have my own restaurant.

What should I do?

To Begin

• What you should know
• What you should plan
• How you should run

I won’t give you a blueprint or concrete model for running a restaurant.

But I’ll guide you on WHAT and HOW you have to consider when planning to run a restaurant business.

To Begin

I want to build an archival information system.

What should I do?

OAIS: Introduction

Understand the OAIS reference model

OAIS Background

• Reference Model for an Open Archival Information System (OAIS)
– Development led by the Consultative Committee for Space Data Systems (CCSDS)
– Issued as CCSDS Recommendation (Blue Book) 650.0-B-1 (January 2002)
– Also adopted as ISO 14721:2003

OAIS Model (1)

• Conventional categories
– Administrative, Descriptive (e.g. MARC, Dublin Core), Structural
• OAIS model categories
– Preservation Description Information
   • Reference Information: to enumerate and describe identifiers
   • Provenance Information: to document the history of the content information (creation, modification, custody)
   • Context Information: to document the relationship of the content to its environment
   • Fixity Information: to document authentication mechanisms
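The four parts of Preservation Description Information can be pictured as a small record kept alongside the content. The following is a minimal sketch only: the class name, the field layout, and the choice of SHA-256 for fixity are assumptions made for illustration and are not prescribed by OAIS.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PreservationDescriptionInformation:
    """Hypothetical container mirroring the four OAIS PDI categories."""
    reference: dict    # identifiers assigned to the content
    provenance: list   # history of creation, modification, custody
    context: dict      # relationships of the content to its environment
    fixity: dict       # data used to detect undocumented change


def describe(content: bytes, identifier: str) -> PreservationDescriptionInformation:
    # Fixity is illustrated with a SHA-256 digest (an assumption; any
    # agreed digest algorithm could be recorded instead).
    digest = hashlib.sha256(content).hexdigest()
    return PreservationDescriptionInformation(
        reference={"local-id": identifier},
        provenance=[{"action": "creation", "note": "recorded at deposit"}],
        context={"isPartOf": None},
        fixity={"algorithm": "SHA-256", "digest": digest},
    )


pdi = describe(b"hello", "example-0001")
print(pdi.fixity["algorithm"], pdi.fixity["digest"])
```

Keeping the four categories as separate fields makes it easy to see which part of the record answers which preservation question.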


OAIS Model (2)

• OAIS model categories (continued)
– Content Information
   • Content Data Object
   • Representation Information: the information needed for proper rendering, understanding, and interpretation of a digital object's content
– Packaging Information
– Descriptive Information: the information used to aid searching, ordering, and retrieval of the objects

OAIS Model (3): Metadata

• Packaging Information: binds the digital object and its associated metadata into an identifiable unit or package (i.e., an Archival Information Package)
• Descriptive Information: helps users of the archive to locate and access information of potential interest
• Content Information
– Content Data Object: the original target of preservation
– Representation Information: needed to make the data object understandable to the designated user community
   • Structure Information: interprets the bits by organizing them into specific data types, groups of data types, and other higher-level meanings
   • Semantic Information: provides additional meaning for the interpretation of the content. For example, structural information may identify a bit stream as ASCII text characters, while semantic information might indicate that the text is in English.
• Preservation Description Information: necessary to manage the preservation of the Content Information
– Reference Information: enumerates and describes identifiers assigned to the Content Information such that it can be referred to unambiguously, both internally and externally to the archive
– Provenance Information: documents the history of the Content Information
– Context Information: documents the relationships of the Content Information to its environment (e.g., why it was created, relationships to other Content Information)
– Fixity Information: information validating the authenticity of the content information

Examples noted in the diagram: three numbers interpreted as a date; ISBN, URN (identifiers)

Descriptions shown in the diagram:

• Content Data Object Description: details the characteristics and features of the Content Data Object itself that are necessary to render and understand its content
• Environment Description: describes a hardware/software environment capable of rendering or displaying the Content Data Object in the form in which it currently exists in the archival store

Example elements shown in the diagram:

• Dublin Core: DC.Title, DC.Creator, DC.Subject, DC.Description, DC.Publisher, DC.Contributor, DC.Date, DC.Type, DC.Format, DC.Identifier, DC.Source, DC.Language, DC.Coverage
• Context and relationships: Reason for Creation, Is Version Of, Has Version, Is Replaced By, Replaces (migration), Is Required By, Requires, Is Part Of, Has Part, Is Referenced By, References, Is Format Of, Has Format, Same Intellectual Content As
• Ingest Process History: Institution, Event Date/Time, Event Type, Event Description
• Preservation History: Institution, Action Date/Time, Action Type, Action Description, Technical Device
• Authentication: Digital Signature / Watermark / Time Stamp, Checksum, Encryption, Documentation of Authentication Mechanism
• Rights
– Access Status
– Rights Information: Copyright Statement, Patent Statement, Archiving Permission
– Use Conditions: Actors, Actions, Permitted by statute, Permitted by license, Encryption details
– Contacts / Rights Holders
• Elements listed alongside the descriptions: Directory structure and file naming conventions, Content type, Component types and their relationships, File description, Installation requirements, Size, Access inhibitors, Access facilitators, Significant properties, Functionality, Description of rendered content, Quirks, Documentation
• Software Environment: Rendering Programs, Operating System
• Hardware Environment: Computational Resources, Storage, Peripherals
• Other labels: Archival System Identification, Global Identification, Resource Description, Rights Information, Full-Text Description (for normalising full-text XML)

OAIS Functional Entities

There are three types of information package:
• the Submission Information Package (SIP), which conveys the information provided to the archive by the user and deposit system
• the Archival Information Package (AIP), which is the stored archival version of the information
• the Dissemination Information Package (DIP), which is the version of the information available to users
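The flow between the three package types can be illustrated with plain dictionaries. This is a rough sketch under stated assumptions: the function names `ingest` and `disseminate`, the dictionary keys, and the decision to expose only the title in the DIP are all hypothetical and are not taken from the OAIS specification.

```python
def ingest(sip: dict, archive_id: str) -> dict:
    """Turn a SIP into an AIP by adding archive-side metadata (illustrative)."""
    return {
        "package_type": "AIP",
        "archive_id": archive_id,
        "content": sip["content"],
        # Keep what the depositor supplied and record the ingest itself.
        "metadata": {**sip["metadata"], "events": ["ingestion"]},
    }


def disseminate(aip: dict) -> dict:
    """Derive a DIP that exposes only what users need (illustrative)."""
    return {
        "package_type": "DIP",
        "content": aip["content"],
        "metadata": {"title": aip["metadata"].get("title")},
    }


sip = {
    "package_type": "SIP",
    "content": b"Thailand Map",
    "metadata": {"title": "Thailand Map", "depositor": "example depositor"},
}
aip = ingest(sip, "aip-0001")
dip = disseminate(aip)
print(aip["package_type"], dip["package_type"])
```

The point of the sketch is that the AIP carries more metadata than either the SIP or the DIP, because the archive adds its own preservation records on top of the deposit.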

http://breastfeedinglib.saiyairak.com

PREMIS

PREMIS Overview

What?

• PREservation Metadata: Implementation Strategies
• Sponsored by Library of Congress (LOC)
• People usually use “PREMIS” to mean the “Data Dictionary”
• Represented in XML format

PREMIS Data Dictionary

• Set of Semantic Units (called Metadata Elements once they are implemented)
• Metadata for digital objects so that they
– Can be read from media
– Can be rendered
– Are stored securely
– Can be tracked across format changes
• Metadata Scope
– Format-specific: e.g. audio, video, image, …
– Implementation-specific: how to access it (by application)
– Descriptive metadata: data properties, like MARC, DC
– Detailed info (for media or hardware)
– Agents info: e.g. people, organizations, or software
– Rights info: e.g. license, permission

Where is PREMIS?

PREMIS acts as a coordinator among several types of metadata in order to perform the preservation function on all digital resources.

Thus, PREMIS is a small core at the heart of preservation metadata.

PREMIS Data Model

PREMIS Data Model

Entities: Intellectual Entities, Objects, Rights Statements, Agents, Events

Intellectual Entities

Examples:
• Rabbit Run by John Updike (a book)
• “Maggie at the beach” (a photograph)
• The Library of Congress Website (a website)
• The Library of Congress: American Memory Home page (a web page)

• Set of content that is considered a single intellectual unit for purposes of management and description (e.g., a book, a photograph, a map, a database)
• May include other Intellectual Entities (e.g. a website that includes a web page)
• **Has one or more digital representations**
• Not fully described in PREMIS DD, but can be linked to in metadata describing digital representation

Objects

Examples:
• chapter1.pdf (a file)
• chapter1.pdf + chapter2.pdf + chapter3.pdf (representation of a book with 3 chapters)
• TIFF file containing header and 2 images (2 bitstreams (images), each with its own set of properties (semantic units): e.g., identifiers, technical metadata, inhibitors, …)

• Discrete unit of information in digital form
• **Objects are what repository actually preserves**
• Three types of Object:
– FILE: named and ordered sequence of bytes that is known by an operating system
– REPRESENTATION: set of files, including structural metadata, that, taken together, constitute a complete rendering of an Intellectual Entity
– BITSTREAM: data within a file with properties relevant for preservation purposes (but needs additional structure or reformatting to be a stand-alone file)
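The three Object types nest naturally: a representation holds files, and a file may hold bitstreams. Below is a minimal sketch of that containment using the slide's own examples. The class layout and the file name `images.tiff` are assumptions for illustration; PREMIS itself defines semantic units, not Python classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitstream:
    identifier: str     # hypothetical local identifier
    description: str


@dataclass
class File:
    name: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Representation:
    intellectual_entity: str
    files: List[File] = field(default_factory=list)


# Representation of a book with three chapters (example from the slide).
book = Representation(
    "A book with 3 chapters",
    [File("chapter1.pdf"), File("chapter2.pdf"), File("chapter3.pdf")],
)

# One TIFF file holding two image bitstreams (file name is made up).
tiff = File(
    "images.tiff",
    [Bitstream("images.tiff#1", "first image"),
     Bitstream("images.tiff#2", "second image")],
)

print(len(book.files), len(tiff.bitstreams))
```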

Object Examples: Thailand Map

Intellectual Entity: Thailand Map

• Object 1 (Representation): it can be a web page that contains 3 files
– HTML
– CSS
– JPEG
• Object 2 (File): 1 JPEG file
• Object 3 (File): 1 TIFF file that includes 3 bitstreams of images of map layers
– Province
– Mountain
– River

Example types of object for the preservation of Thailand Map

Object Example: book in two versions

Intellectual Entity: Da Vinci Code by Dan Brown

• Representation 1: page image version
– File 1: page1.tiff
– File 2: page2.tiff
– File N: pageN.tiff
– File N+1: METS.xml
• Representation 2: ebook version
– File 1: book.lit

Events

Examples:
• Validation Event: use some tools to verify that chapter1.pdf is a valid PDF file
• Ingest Event: transform an OAIS SIP into an AIP
• Migration Event: create a new version of an Object in an up-to-date format

• An action that involves or impacts at least one Object or Agent associated with or known by the preservation repository
• Helps document digital provenance. Can track the history of an Object through the chain of Events that occur during the Object's lifecycle
• Determining which Events are in scope is up to the repository (e.g., Events which occur before ingest, or after de-accession)

eventType

Event Type: Description
• capture: the process whereby a repository actively obtains an object
• compression: the process of coding data to save storage space or transmission time
• creation: the act of creating a new object
• deaccession: the process of removing an object from the inventory of a repository
• decompression: the process of reversing the effects of compression
• decryption: the process of converting encrypted data to plaintext
• deletion: the process of removing an object from repository storage
• digital signature validation: the process of determining that a decrypted digital signature matches an expected value
• dissemination: the process of retrieving an object from repository storage and making it available to users
• fixity check: the process of verifying that an object has not been changed in a given period
• ingestion: the process of adding objects to a preservation repository
• message digest calculation: the process by which a message digest (“hash”) is created
• migration: a transformation of an object creating a version in a more contemporary format
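Two of the event types above, message digest calculation and fixity check, work together: a digest is stored once, then recomputed later and compared. The following is a minimal sketch assuming SHA-256 as the digest algorithm and "pass"/"fail" as outcome values; the function name and the dictionary shape are hypothetical, although the keys reuse PREMIS semantic unit names.

```python
import hashlib
from datetime import datetime, timezone


def fixity_check(data: bytes, expected_sha256: str, object_id: str) -> dict:
    """Recompute the digest and record the comparison as an event (illustrative)."""
    actual = hashlib.sha256(data).hexdigest()
    return {
        "eventType": "fixity check",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        # "pass" / "fail" are assumed outcome labels, not mandated values.
        "eventOutcome": "pass" if actual == expected_sha256 else "fail",
        "linkingObjectIdentifier": object_id,
    }


# Message digest calculation at ingest time.
stored = hashlib.sha256(b"chapter one").hexdigest()

# Later checks: one against unchanged bytes, one against altered bytes.
ok_event = fixity_check(b"chapter one", stored, "chapter1.pdf")
bad_event = fixity_check(b"chapter 0ne", stored, "chapter1.pdf")
print(ok_event["eventOutcome"], bad_event["eventOutcome"])
```

Recording the check as an event, rather than just returning a boolean, is what lets the repository later show that the object was verified and when.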

Agents

Examples:
• Rathachai Chawuthai (a person)
• Asian Institute of Technology (an organization)
• Dark Archive in the Sunshine State implementation (a system)
• JHOVE version 1.0 (a software program)

• Person, organization, or software program/system associated with an Event or a Right (permission statement)
• Agents are associated only indirectly to Objects through Events or Rights
• Not defined in detail in PREMIS DD; not considered core preservation metadata beyond identification

Rights Statements

Example:
• Rathachai Chawuthai grants AIT digital repository permission to make three copies of metadata_fundamentals.pdf for preservation purposes.

• An agreement with a rights holder that grants permission for the repository to undertake an action(s) associated with an Object(s) in the repository.
• Not a full rights expression language; focuses exclusively on permissions that take the form:
– Agent X grants Permission Y to the repository in regard to Object Z.
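The "Agent X grants Permission Y in regard to Object Z" pattern maps onto a three-field record plus an optional restriction. Here is a minimal sketch using the slide's example; the class, the `is_permitted` helper, and the act label "replicate" are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RightsStatement:
    agent: str            # Agent X, the rights holder
    act: str              # Permission Y, the permitted action
    object_id: str        # Object Z
    restriction: str = ""


def is_permitted(statements: Iterable[RightsStatement], act: str, object_id: str) -> bool:
    """Return True if any statement grants this act on this object."""
    return any(s.act == act and s.object_id == object_id for s in statements)


grants = [
    RightsStatement(
        agent="Rathachai Chawuthai",
        act="replicate",  # assumed label for "make copies"
        object_id="metadata_fundamentals.pdf",
        restriction="no more than three copies, for preservation purposes",
    )
]

print(is_permitted(grants, "replicate", "metadata_fundamentals.pdf"))
print(is_permitted(grants, "disseminate", "metadata_fundamentals.pdf"))
```

The helper deliberately ignores the restriction text; enforcing limits such as a copy count would need a structured restriction rather than free text.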

Semantic units pertaining to Objects: technical metadata

• objectIdentifier
• preservationLevel
• significantProperties
• objectCategory
• objectCharacteristics
– fixity
– size
– format
– creatingApplication
– inhibitors
– objectCharacteristicsExtension
• originalName
• storage
• environment
• signatureInformation
• relationship
• linkingEventIdentifier
• linkingIntellectualEntityIdentifier
• linkingRightsStatementIdentifier

Semantic units pertaining to Events: provenance and preservation activity

• eventIdentifier
• eventType
• eventDateTime
• eventDetail
• eventOutcome
• eventOutcomeDetail
• linkingAgentIdentifier
• linkingObjectIdentifier
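Since PREMIS is represented in XML, the event semantic units listed above can be serialized with a standard XML library. The sketch below uses Python's `xml.etree.ElementTree`. It assumes the PREMIS version 2 namespace `info:lc/xmlns/premis-v2`; the element nesting (identifier type/value pairs, `eventOutcomeInformation` wrapping `eventOutcome`) reflects one reading of the version 2 schema and should be checked against the schema itself. The identifier values and timestamp are made up.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumed: PREMIS version 2 namespace
ET.register_namespace("premis", PREMIS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def linked(parent, name, id_type, id_value):
    """Add an identifier container with its Type and Value children."""
    wrapper = ET.SubElement(parent, q(name))
    ET.SubElement(wrapper, q(name + "Type")).text = id_type
    ET.SubElement(wrapper, q(name + "Value")).text = id_value
    return wrapper


def build_event(event_id, event_type, when, outcome, agent_id, object_id):
    event = ET.Element(q("event"))
    linked(event, "eventIdentifier", "local", event_id)
    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = when
    info = ET.SubElement(event, q("eventOutcomeInformation"))
    ET.SubElement(info, q("eventOutcome")).text = outcome
    linked(event, "linkingAgentIdentifier", "local", agent_id)
    linked(event, "linkingObjectIdentifier", "local", object_id)
    return event


# Hypothetical validation event for the chapter1.pdf example.
event = build_event(
    "event-0001", "validation", "2015-12-29T00:00:00Z",
    "valid", "JHOVE version 1.0", "chapter1.pdf",
)
xml_text = ET.tostring(event, encoding="unicode")
print(xml_text)
```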

Semantic units pertaining to Rights

• rightsStatement
– rightsStatementIdentifier
– rightsBasis
– copyrightInformation
– licenseInformation
– statuteInformation
– rightsGranted
   • act
   • restriction
   • termOfGrant
   • rightsGrantedNote
– linkingObjectIdentifier
– linkingAgentIdentifier
• rightsExtension

Semantic units pertaining to Agents

• agentIdentifier
• agentName
• agentType

PREMIS with METS

METS Background

• XML based
• Describes the structure of digital objects and associates various kinds of metadata with their components
• Uses the XML Schema facility for combining vocabularies from different Namespaces for extensibility
• Metadata is categorized into separate sections (embedded or linked)
• Records the names and locations of the files that comprise those objects (embedded or linked)
• Records a map of hyperlinks between components
• Associates executable behaviour with the components

The Structure of a METS file

• METS header
• dmdSec: descriptive metadata
• amdSec: administrative metadata
• fileSec: file inventory
• structMap: structural map
• behaviorSec: behaviour metadata

<fileSec>

fileSec > fileGrp > file > FLocat

<fileGrp ID="munahi010-aaa-fgrp-0001">
  <file GROUPID="0" ID="munahi010-aaa-0001-0" MIMETYPE="image/tiff" ADMID="munahi010-aaa-tmd-0001-0">
    <FLocat LOCTYPE="URL" xlink:href="file://hfs.ox.ac.uk/data/odl/munahi010/digObjects/aaa/0/munahi010-aaa-0001.tiff"/>
  </file>
  <file GROUPID="6" ID="munahi010-aaa-0001-6" MIMETYPE="image/jpeg" ADMID="munahi010-aaa-tmd-0001-6">
    <FLocat LOCTYPE="URL" xlink:href="http:odl/munahi010/digObjects/aaa/6/munahi010-aaa-0001-6.jpg"/>
  </file>
  <file GROUPID="3" ID="munahi010-aaa-0001-3" MIMETYPE="image/jpeg" ADMID="munahi010-aaa-tmd-0001-3">
    <FLocat LOCTYPE="URL" xlink:href="http:odl/munahi010/digObjects/aaa/3/munahi010-aaa-0001-3.jpg"/>
  </file>
</fileGrp>
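A file inventory like the one above can be read back with a few lines of Python. This sketch parses a trimmed copy of the fragment with `xml.etree.ElementTree` and lists each file's ID, MIME type, and location. It assumes the `xlink` prefix is bound to the standard XLink namespace, which the fragment on the slide does not declare, and it leaves the elements un-namespaced as they appear on the slide; the shortened `http:odl/...` location is copied as-is.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Trimmed copy of the slide's fragment; the xmlns:xlink declaration is
# added here (an assumption) so that the fragment parses on its own.
SAMPLE = """
<fileGrp xmlns:xlink="http://www.w3.org/1999/xlink" ID="munahi010-aaa-fgrp-0001">
  <file GROUPID="0" ID="munahi010-aaa-0001-0" MIMETYPE="image/tiff" ADMID="munahi010-aaa-tmd-0001-0">
    <FLocat LOCTYPE="URL" xlink:href="file://hfs.ox.ac.uk/data/odl/munahi010/digObjects/aaa/0/munahi010-aaa-0001.tiff"/>
  </file>
  <file GROUPID="6" ID="munahi010-aaa-0001-6" MIMETYPE="image/jpeg" ADMID="munahi010-aaa-tmd-0001-6">
    <FLocat LOCTYPE="URL" xlink:href="http:odl/munahi010/digObjects/aaa/6/munahi010-aaa-0001-6.jpg"/>
  </file>
</fileGrp>
"""


def list_files(xml_text: str):
    """Return (ID, MIME type, location) for every file element."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for file_el in root.iter("file"):
        location = file_el.find("FLocat")
        href = location.get(f"{{{XLINK}}}href") if location is not None else None
        rows.append((file_el.get("ID"), file_el.get("MIMETYPE"), href))
    return rows


files = list_files(SAMPLE)
for file_id, mime, href in files:
    print(file_id, mime, href)
```

In a complete METS document the elements would live in the METS namespace, so the tag lookups would need to be namespace-qualified.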

The inside of a METS file

<mdWrap MIMETYPE="text/xml" MDTYPE="MODS" LABEL="MODS Metadata">
  <xmlData>
    <mods:mods>
      <mods:titleInfo>
        <mods:title>Cobbett's parliamentary history of England, from the Norman Conquest, in 1066 to the year, 1803 : from which last-mentioned epoch it is continued downwards in the work entitled, &quot;The parliamentary debates&quot;</mods:title>
      </mods:titleInfo>
      <mods:titleInfo type="alternative">
        <mods:title>Cobbett's Parliamentary History - volume 2</mods:title>
      </mods:titleInfo>
      <mods:name>
        <mods:namePart>$aGreat Britain. Parliament.</mods:namePart>
        <mods:role>
          <mods:roleTerm type="code" authority="marcrelator">spn</mods:roleTerm>
        </mods:role>
      </mods:name>
    </mods:mods>
  </xmlData>
</mdWrap>

METS with PREMIS as OAIS Information Package

• OAIS repository functions for which METS is often used are submission or exchange (SIP), archiving (AIP), dissemination (DIP)
• A METS package is a good candidate for realization of an information object in an OAIS repository
• PREMIS satisfies need for Preservation Description Information: provenance, context, reference and fixity
• PREMIS is an elaboration and translation of OAIS information model into implementable semantic units
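One common way to combine the two standards is to embed a PREMIS object inside a METS administrative metadata section. The sketch below builds such a wrapper with `xml.etree.ElementTree`. It assumes the METS namespace `http://www.loc.gov/METS/`, the PREMIS version 2 namespace, and the `MDTYPE` value `PREMIS:OBJECT`; the identifiers are made up, and the result is deliberately incomplete (a schema-valid PREMIS object needs further required units), so treat it as an outline rather than a valid package.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumed: PREMIS version 2
ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)


def m(tag: str) -> str:
    return f"{{{METS_NS}}}{tag}"


def p(tag: str) -> str:
    return f"{{{PREMIS_NS}}}{tag}"


def wrap_premis_object(object_id: str, tech_id: str = "tmd-0001"):
    """Embed a bare-bones PREMIS object in mets/amdSec/techMD (illustrative)."""
    mets = ET.Element(m("mets"))
    amd_sec = ET.SubElement(mets, m("amdSec"))
    tech_md = ET.SubElement(amd_sec, m("techMD"), ID=tech_id)
    md_wrap = ET.SubElement(tech_md, m("mdWrap"), MDTYPE="PREMIS:OBJECT")
    xml_data = ET.SubElement(md_wrap, m("xmlData"))
    # Only the identifier is filled in; other required units are omitted.
    obj = ET.SubElement(xml_data, p("object"))
    ident = ET.SubElement(obj, p("objectIdentifier"))
    ET.SubElement(ident, p("objectIdentifierType")).text = "local"
    ET.SubElement(ident, p("objectIdentifierValue")).text = object_id
    return mets


package = wrap_premis_object("chapter1.pdf")
package_xml = ET.tostring(package, encoding="unicode")
print(package_xml)
```

A `file` element in the METS `fileSec` would then point at the `techMD` section through its `ADMID` attribute, as in the `fileSec` example earlier in the deck.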


Why do we need guidelines for using PREMIS with METS?

• Contents of each information package may vary depending on its function within a repository
• Need to determine how to include representation metadata and associate it with package components
• PREMIS data entities (objects, events, rights, agents) do not map perfectly to METS categories for representation metadata (techMD, digiprovMD, rightsMD, sourceMD)
• There are redundant elements between the two standards
• Both have extensibility mechanisms
• Flexibility of both standards requires implementation choices
• Predictability will enhance the ability for exchange with minimal human intervention

Guidelines for Using PREMIS with METS for Exchange

http://www.loc.gov/standards/premis/guidelines-premismets.pdf

Benefits of using PREMIS in METS

• Packages together metadata necessary for digital preservation in a predictable format
• PREMIS provides technical and event metadata
• METS provides structural metadata
• Both standards are
– Openly available
– Flexible
– Extensible
– Maintained by an open process
• Provides an exchange standard between repositories

Conclusions

• Information preservation supports an organization’s identity preservation
• An organization must have a preservation policy
• A preservation policy is realized by means of preservation metadata
• PREMIS Data Dictionary provides a critical piece of reliable digital preservation infrastructure comprising technology, standards, and best practice
• PREMIS Data Dictionary is a building block with which effective, sustainable digital preservation strategies can be implemented for various domains
• PREMIS is being widely implemented and experience using it needs to be shared

URLs

• PREMIS Maintenance Activity:
http://www.loc.gov/standards/premis/
• PREMIS Data Dictionary for Preservation Metadata, version 2.1:
http://www.loc.gov/standards/premis/v2/premis-dd-2-1.pdf

Acknowledgement

• Some of the slides are based on slides by
– Priscilla Caplan, Florida Center for Library Automation
– Rathachai Chawuthai, Asian Institute of Technology
– Angela Dappert, the British Library
– Rebecca Guenther, Library of Congress
– Brian Lavoie, OCLC