Day 2, workshop 4, Inge Van Nieuwerburgh

Post on 11-May-2015


(Meta)data standards for digital archiving

DISH 2009 @ Rotterdam

Universiteitsbibliotheek Gent – MMLab UGent

Summary

• Introduction
• Defining the problem
• State of the art:
  • OAIS
  • Data formats
  • Metadata schemas
  • Declarative containers
• Layered Metadata Model
• Best practices

Introduction

BOM Vl: Preservation and disclosure of multimedia data in Flanders

Flemish project, 1.5 years
Cross-sectoral: broadcasters, archival institutions, the cultural sector and libraries.
Studies:
• Needs for preservation
• Selection
• Metadata standards & exchange formats
• Digital rights
• Supply and distribution models

Defining the problem

Problems when archiving digital information

Problem 1.
• Analogue formats are disappearing and have to be replaced by digital alternatives.
• Rapid growth of data.
• Discrepancy between the short life span of digital technology and the need for long-term archiving.

Problems when archiving digital information

Problem 2.
• In digital form, information is abstract and independent of the storage medium. The abstract information has to be preserved, not the medium.

Problems when archiving digital information

But also consider…

Growth of the storage capacity of desktop computers (HanKwang 2008)

Evolution of used file formats (PRONOM)

[Timeline figure, 1980 to 2000, image formats by year of introduction:]
• '84: TGA, GEM Raster
• '85: BMP
• '86: TIFF 3
• '87 and '88: TIFF 4 & 5
• '87: GIF87
• '89: GIF89
• '92: TIFF 6, JPEG, MrSID
• '96: PNG 1.0
• '99: PNG 1.2
• '00: JPEG2000
• '03: SVG

Evolution format derivatives

MIME type image/tiff:
• TIFF (all versions)
• TIFF/IT
• TIFF G4/LZW/UNC
• Digital Negative Format (DNG)
• GeoTIFF
• Pyramid TIFF
• …

Source: PRONOM Technical Registry [http://www.nationalarchives.gov.uk/pronom/]

Risks in the long term

[Figure: risks plotted against time, 1980 to 2000:]
• Bit errors/bugs
• File format changes
• Changing technology
• Organizational changes
• Interpretation of the format

Study: state of the art (meta)data standards

What is a digital archive?

• What a digital archive is NOT:
  • mass storage for active applications and data
  • a networked backup solution
• What a digital archive is:
  • Storage of digital information with historical, scientific, financial or legal value in the long term.
  • Platform-independent access to digital information for 50, 100 years or longer.

OAIS

Open Archival Information System (OAIS)

• Reference model for the description of digital archives.
• Developed by the Consultative Committee for Space Data Systems (CCSDS, founded in 1982), whose members include:
  • NASA (US)
  • ESA (EU)
  • RSA (USSR)
  • NASDA (Japan)
  • …
• ISO Standard 14721 since 2002

OAIS model

• Consists of 3 parts:
  1. Description of an archival system: responsibilities, procedures and a common terminology.
  2. Functional model: all processes needed for the long-term preservation of digital information.
  3. Information model: describes the stored digital information.

OAIS functional model

Standards

• Need to explore the necessary, recommended and generally used standards:
  • technical schemas
  • descriptive schemas
  • preservation schemas
  • structural schemas
• What are the different metadata schemas (if any) used in the different cultural sectors?

Data formats

What

• Raw data consumes more and more storage
• Need to compress: compression standards
  • video: MPEG-2, H.264/MPEG-4 AVC, Motion JPEG2000
  • audio: MP3, AAC
  • images: JPEG, TIFF
• Need for container formats for the exchange of A/V material
  • MXF, AVI, WMA, MP4

Metadata schemas

What

• Descriptive metadata
• Administrative metadata
• Preservation metadata
• Technical metadata
• Usage data

Standards

• Especially for descriptive metadata: differences between sectors => preferred standard per sector?
  • Differences in detail
  • Differences in structure
  • Differences in relations
• Preservation metadata: PREMIS
• Conceptual models

Declarative containers

What

• Compound information objects, combining descriptive, administrative and/or structural metadata
• Advantage: they are easy to exchange and reuse
• Some examples:
  • METS
  • MPEG-21 DIDL: describes complex digital objects
  • LOM: learning objects
  • ORE: model to describe aggregations

Layered Metadata Model

How to proceed?

• Need for a layered metadata model to manage a digital archive
• Why? Too many differences between data models
• Need for common ground

Solution: layered metadata model

• Model in different layers:
  • A generic top-level descriptive metadata schema (DC)
  • A refined standard per sector, to preserve the metadata in detail
  • + Preservation metadata, technical metadata and rights metadata
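The layering above can be pictured as a record with three parts. The sketch below is only an illustration under stated assumptions: the class name `LayeredRecord`, its field names, the `search` helper and all sample values are hypothetical and do not come from the project deliverable.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredRecord:
    """One archived object described in three layers (illustrative only)."""
    # Layer 1: generic descriptive layer (Dublin Core element -> list of values).
    dublin_core: dict = field(default_factory=dict)
    # Layer 2: the sector-specific record, stored as-is so no detail is lost.
    sector_schema: str = ""
    sector_record: str = ""
    # Layer 3: preservation, technical and rights metadata.
    preservation: dict = field(default_factory=dict)


def search(records, element, text):
    """Return the records whose Dublin Core `element` contains `text`."""
    needle = text.lower()
    return [
        record for record in records
        if any(needle in value.lower()
               for value in record.dublin_core.get(element, []))
    ]


# Made-up sample records from two different sectors.
book = LayeredRecord(
    dublin_core={"title": ["Example book"], "creator": ["Example, Author"]},
    sector_schema="MARC21",
    sector_record="<record>...</record>",
    preservation={"format": "image/tiff"},
)
clip = LayeredRecord(
    dublin_core={"title": ["Example news clip"]},
    sector_schema="P/Meta",
    sector_record="<programme>...</programme>",
    preservation={"format": "video/mp4"},
)

hits = search([book, clip], "title", "news")
print([hit.sector_schema for hit in hits])
```

The query only touches the Dublin Core layer, so it works across sectors, while the detailed sector record travels along untouched.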

Layered metadata model

[Diagram: example objects (MARCXML, TIFF, PSD) described in layers:]
• Descriptive metadata: Dublin Core
• Preservation metadata: PREMIS
• Rights metadata: PREMIS, MPEG-21/REL, INDECS, ODRL, XrML
• Technical metadata: PREMIS, MPEG-7, Z39.87, AudioMD, VideoMD, TextMD
• Underlying standards: MARC standard, TIFF standard

Layered Metadata Model

Descriptive Model: Dublin Core

• Most interoperable, cross-sectoral.
• The common denominator of all metadata models.
• All fields are repeatable and optional.

Mapping from an institution's own metadata model.

• Dublin Core as pidgin:
  • DC as a common layer above the institution's own metadata.
  • DC as model for querying.
  • Discovery and identification of digital objects.

Layered Metadata Model

Descriptive Model: Dublin Core

How to disseminate as DC?
• A crosswalk to DC is made for the most important metadata models used in the sectors:
  • Libraries: MARC21
  • A/V sector: P/Meta
  • Arts sector and museums: CDWA and SPECTRUM
  • Archiving sector: ISAD(G) and EAD
• Crosswalks can be used to disseminate the DC records via OAI-PMH, GRDDL (XSLT), a mapping API (D2RQ), or ontology linking.
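A crosswalk is essentially a lookup table from fields in the source model to Dublin Core elements. The sketch below assumes a heavily simplified MARC21 record given as (tag, value) pairs; the table `MARC_TO_DC` covers only a handful of well-known tags, and the function name and sample values are hypothetical, not the project's actual crosswalk.

```python
# Simplified, illustrative crosswalk: MARC21 tag -> Dublin Core element.
MARC_TO_DC = {
    "100": "creator",      # main entry, personal name
    "245": "title",        # title statement
    "260": "publisher",    # publication information
    "520": "description",  # summary
    "650": "subject",      # topical subject heading
}


def marc_to_dc(marc_fields):
    """Map a list of (tag, value) pairs to a Dublin Core dictionary.

    Every DC element is optional and repeatable, so values are collected
    in lists and tags without a mapping are left out of the DC layer.
    """
    dc = {}
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc


# Made-up sample record.
record = [
    ("245", "Example title"),
    ("100", "Example, Author"),
    ("650", "Digital preservation"),
    ("650", "Metadata"),
    ("300", "120 p."),  # not mapped in this sketch
]
dc_record = marc_to_dc(record)
print(dc_record)
```

The crosswalk is lossy by design: detail that has no place in DC stays in the sector-specific record, which is why the layered model keeps both.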

Layered Metadata Model

Preservation Model: PREMIS

• Administrative metadata + rights metadata: assisting in the management of the digital objects.
• Technical metadata: assisting access (conversions or emulation).
• Preservation metadata: tracking the provenance, i.e. the history of all actions on an object.

Layered Metadata Model

Preservation Model: PREMIS

Layered Metadata Model

Preservation Model: PREMIS

Objects:
• Describes the objects to be preserved in a technical manner.
• 3 subclasses:
  • Bitstream
  • File
  • Representation
• Facilitates the conversion or emulation process.

Layered Metadata Model

Preservation Model: PREMIS

Agents:
• Aggregates information about agents (persons, organisations, software) associated with rights management and preservation events in the life of a data object.
• No direct relation between Agent and Object. An agent:
  • May hold or grant one or more rights
  • May carry out, authorize, or compel one or more events.
• Identify agents uniquely.

Layered Metadata Model

Preservation Model: PREMIS

Events:
• Actions that modify objects should always be recorded. Other actions, such as copying an object for backup purposes, may be recorded in an Event entity.
• Stored separately from the digital object.
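A minimal sketch of how events could be kept apart from the objects they describe and linked to them by identifier. The classes `Event` and `EventLog` and their attribute names are hypothetical and only loosely inspired by the PREMIS entities; they are not the PREMIS schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    event_id: str
    event_type: str      # e.g. "ingest", "migration", "fixity check"
    timestamp: datetime
    object_id: str       # link to the object, not a copy of it
    agent_id: str        # who or what carried out the action
    outcome: str = "success"


class EventLog:
    """Append-only store of events, separate from the archived objects."""

    def __init__(self):
        self._events = []

    def record(self, event_type, object_id, agent_id, outcome="success"):
        event = Event(
            event_id=f"event-{len(self._events) + 1}",
            event_type=event_type,
            timestamp=datetime.now(timezone.utc),
            object_id=object_id,
            agent_id=agent_id,
            outcome=outcome,
        )
        self._events.append(event)
        return event

    def history(self, object_id):
        """Return all recorded events for one object, oldest first."""
        return [e for e in self._events if e.object_id == object_id]


log = EventLog()
log.record("ingest", "obj-1", "agent-archivist")
log.record("ingest", "obj-2", "agent-archivist")
log.record("migration", "obj-1", "agent-converter")
print([e.event_type for e in log.history("obj-1")])  # ['ingest', 'migration']
```

Because the agent is referenced only from the event, there is no direct link between agent and object, which matches the relation described for Agents.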

Layered Metadata Model

Preservation Model: PREMIS

Rights:
• The minimum core rights information that a preservation repository must know is what rights or permissions it has to carry out actions related to objects within the repository.
• These may be granted by copyright law, by statute, or by a license agreement with the rightsholder.

Layered Metadata Model

Preservation Model: PREMIS

Intellectual Entity:
• Descriptive metadata: out of scope for PREMIS.
• Dublin Core

Layered Metadata Model

Preservation Model: PREMIS

PREMIS OWL:
• Semantic (OWL) ontology following the data dictionary of PREMIS 2.0.
• Published online: http://multimedialab.elis.ugent.be/users/samcoppe/ontologies/Premis/premis.owl
• Documentation online: http://multimedialab.elis.ugent.be/users/samcoppe/ontologies/Premis/index.html

Best practices

Or: how to minimize risks

Best Practice # 1: Store technical metadata

Source: Adrian Brown, National Archives UK; “Developing Practical Approaches to Active Preservation”

Bitrot/Software errors

• No storage device is perfect and eternal.
• David Rosenthal, Stanford University: “Bit Preservation: A Solved Problem?”
• A bit half-life of 8 x 10^17 years gives a 50% chance that 1 petabyte survives a century without errors.
• Comparable studies by Carnegie Mellon University, Google and CERN
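The half-life figure can be checked with a few lines of arithmetic. The sketch assumes the simplest possible model, in which every bit decays independently with the same half-life, and takes 1 petabyte as 10^15 bytes; the function name is hypothetical.

```python
BITS_PER_PETABYTE = 8 * 10**15   # 10**15 bytes of 8 bits each
BIT_HALF_LIFE_YEARS = 8e17       # figure quoted above


def survival_probability(n_bits, years, half_life_years):
    """Probability that none of n_bits independent bits flips within `years`."""
    # A single bit survives with probability 2 ** (-years / half_life);
    # n independent bits all survive with that probability raised to n.
    return 2.0 ** (-(n_bits * years) / half_life_years)


p = survival_probability(BITS_PER_PETABYTE, 100, BIT_HALF_LIFE_YEARS)
print(p)  # 0.5
```

Under this model the number of bits times the number of years equals the bit half-life, so the exponent is exactly -1 and the chance of an error-free century is one half.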

Bitrot/Software errors

• Volker Heydegger, University of Cologne
• “Analysing the Impact of File Formats on Digital Integrity”

Best Practice # 2: Preserve preservation metadata

• Checksums
• Digital signatures
• Provenance
• …
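As an illustration of the checksum item, a fixity check can be sketched with Python's standard `hashlib` module. The choice of SHA-256 and the function names `sha256_of` and `verify` are assumptions made for this example, not something prescribed by the presentation.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archive files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file still matches the checksum stored at ingest."""
    return sha256_of(path) == expected_hex
```

The digest is computed once at ingest and stored as preservation metadata; every later `verify` call that returns False signals bit rot or tampering.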

Interpretation risks

One of the coolest and oldest dwarf stars ever found.

Best Practice # 3: Representation metadata

• Time
• Place
• Wavelengths/calibration data
• Provenance

Technology Changes

Raw data (hex dump):
4b50 0403 0014 0000 0008 0cdb 282e 7d22
ddaa 0243 0001 ab00 0002 000f 0000 6341
5f65 666f

Raw data (assembly listing):
INC $D020
DEC $D020
JMP $2000
LDX $D020
INX
STX $D020
JMP $2000
LDA $5000

[Diagram: data + documentation = information; labels: syntax, semantics]
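To make the point concrete: the first words of the hex dump only become information once a format specification is applied. The sketch below rests on an assumption about the slide, namely that the dump prints 16-bit little-endian words, in which case the bytes start with the signature of a ZIP local file header. The variable names are hypothetical.

```python
import struct

# First five 16-bit words of the hex dump.
WORDS = [0x4B50, 0x0403, 0x0014, 0x0000, 0x0008]

# Assumption: the words were printed little-endian, so repack them that way.
raw = struct.pack("<5H", *WORDS)

# Syntax: the ZIP local file header starts with a 4-byte signature followed
# by three 16-bit little-endian fields.
signature, version_needed, flags, method = struct.unpack("<4sHHH", raw)

# Semantics: what the field values mean according to the ZIP documentation.
METHODS = {0: "stored", 8: "deflate"}
print(signature, version_needed, flags, METHODS.get(method, "unknown"))
```

Without the documentation the ten bytes are just numbers; with it they read as a ZIP entry that needs version 2.0 to extract and is compressed with deflate.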

Best Practice # 4: Do not trust software

• It is an illusion to think that software will always offer access to the archived data.

• Computer software is an active component in the archive and it knows only two possible states:

1. It works and is maintained.

2. It does not work and is not maintained.

Best Practice # 4: Do not trust software (cont.)

• Case 2: Software does not work and is not maintained:
  – Documentation metadata has to contain the source code of the original software.
  – Emulation has to be provided for; metadata has to contain all the emulation parameters.
• Case 1: Software works and is maintained:
  – The archive has the software.
  – The user has the software.
  – Both cases need a dynamic metadata layer with all the software aspects needed to access the data.

Descriptive metadata

• Are descriptive metadata (or other access tools like thumbnails and previews) data or metadata?
• Non-discussion: ‘metadata’ is a relative term.
• As data:
  • Advantage: descriptive metadata are ‘core business’, too valuable not to be archived.
  • Disadvantage: this type of data is very dynamic.
• As metadata:
  • Advantage: metadata are dynamic; they can be adapted to the needs of the archive.
  • Disadvantage: which descriptive model has to be used: MARC, EAD, P/Meta, …?

Best Practice # 5: Store descriptive metadata as data

Provide a broadly accepted descriptive model like Dublin Core

• Dublin Core describes the ‘Who’, ‘What’, ‘Where’, ‘When’ and ‘How’.

• Sector specific descriptive metadata models have finer granularity.

• Use international standards (MARC, EAD, P/Meta).

Want to know more?

Book (in Dutch): Bastijns, Paul; Coppens, Sam; Corneillie, Siska; Hochstenbach, Patrick et al. (2009), “(Meta)datastandaarden voor digitale archieven”. Full text available: http://hdl.handle.net/1854/LU-480734

Deliverable Layered metadata model: http://hdl.handle.net/1854/LU-764194

Q & A
