Day 2, workshop 4, Inge Van Nieuwerburgh
(Meta)data standards for digital archiving
DISH 2009 @ Rotterdam
Universiteitsbibliotheek Gent – MMLab UGent
Summary
• Introduction
• Defining the problem
• State of the art:
  – OAIS
  – Data formats
  – Metadata schemas
  – Declarative containers
• Layered Metadata Model
• Best practices
Introduction
BOM Vl: Preservation and disclosure of multimedia data in Flanders
Flemish project, 1.5 years
Cross-sectoral: broadcasters, archival institutions, the cultural sector and libraries.
Studies:
• Needs for preservation
• Selection
• Metadata standards & exchange formats
• Digital rights
• Supply and distribution models
Defining the problem
Problems when archiving digital information
Problem 1.
• Analogue formats are disappearing and have to be replaced by digital alternatives.
• Rapid growth of data.
• Discrepancy between the short life span of digital technology and the need for long-term archiving.
Problem 2.
• In digital form, information is abstract, independent of the storage medium. The abstract information has to be preserved, not the medium.
But also consider…
Growth of storage capacity of desktop computers (HanKwang 2008)
Evolution of file formats in use (PRONOM)
Timeline, 1980 to 2000s, by format:
• TIFF: ’86 TIFF 3; ’87 and ’88 TIFF 4 & 5; ’92 TIFF 6
• PNG: ’96 PNG 1.0; ’99 PNG 1.2
• JPEG: ’92 JPEG; ’00 JPEG2000
• GIF: ’87 GIF87; ’87 GIF89
• Other: ’84 TGA; ’84 GEM Raster; ’85 BMP; ’92 MrSID; ’03 SVG
Evolution format derivatives
MIME type image/tiff:
• TIFF (all versions)
• TIFF/IT
• TIFF G4/LZW/UNC
• Digital Negative Format (DNG)
• GeoTIFF
• Pyramid TIFF
• …
Source: PRONOM Technical Registry [http://www.nationalarchives.gov.uk/pronom/]
Long-term risks
• Bit errors / bugs
• File format changes
• Changing technology
• Organizational changes
• Interpretation of the format
Time axis: 1980, 1990, 2000
Study: state of the art (meta)data standards
What is a digital archive?
• What a digital archive is NOT:
  – mass storage for active applications and data
  – a networked backup solution
• What a digital archive is:
  – Long-term storage of digital information with historical, scientific, financial or legal value.
  – Platform-independent access to digital information for 50, 100 years or longer.
OAIS
Open Archival Information System (OAIS)
• Reference model for the description of digital archives.
• Developed by the CCSDS, the committee founded in 1982 by space agencies including:
  – NASA (US)
  – ESA (EU)
  – RSA (USSR)
  – NASDA (Japan)
  – …
• Since 2002 ISO Standard 14721
OAIS model
• Consists of 3 parts:
  1. Description of an archival system: responsibilities, procedures and a common terminology.
  2. Functional model: all processes needed for the long-term preservation of digital information.
  3. Information model: describes the stored digital information.
OAIS functional model
• Need to explore the necessary, recommended and generally used standards
  – technical schemas
  – descriptive schemas
  – preservation schemas
  – structural schemas
• What are the different metadata schemas (if any) used in the different cultural sectors?
Standards
Data formats
What
• Raw data consumes ever more storage
• Need to compress: compression standards
  – video: MPEG-2, H.264/MPEG-4 AVC, Motion JPEG2000
  – audio: MP3, AAC
  – images: JPEG, TIFF
• Need for container formats for exchange of A/V material
  – MXF, AVI, WMA, MP4
Metadata schemas
What
• Descriptive metadata
• Administrative metadata
• Preservation metadata
• Technical metadata
• Usage data
Standards
• Especially for descriptive metadata: differences between sectors => preferred standard per sector?
  – Differences in detail
  – Differences in structure
  – Differences in relations
• Preservation metadata: PREMIS
• Conceptual models
Declarative containers
What
• Compound information objects, combining descriptive, administrative and/or structural metadata
• Advantage: they are easy to exchange and reuse
• Some examples:
  – METS
  – MPEG-21 DIDL: describes complex digital objects
  – LOM: learning objects
  – ORE: model to describe aggregations
Layered Metadata Model
How to proceed?
• Need for a layered metadata model to manage a digital archive
• Why? Too many differences between data models
• Need a common ground
Solution: layered metadata model
• Model in different layers:
  – A generic top-level descriptive metadata schema (DC)
  – A refined standard per sector, to preserve the metadata in detail
  – + Preservation metadata, technical metadata and rights metadata
Layered metadata model
Layers shown in the diagram:
• Descriptive metadata: Dublin Core
• Preservation metadata: PREMIS
• Rights metadata: PREMIS, MPEG-21/REL, INDECS, ODRL, XrML
• Technical metadata: PREMIS, MPEG-7, Z39.87, AudioMD, VideoMD, TextMD
• Sector and format standards: MARC Standard, TIFF Standard
• Example inputs shown: MARCXML, TIFF, PSD
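As an illustration of the layering, here is a minimal Python sketch of one archived object carrying each layer side by side. The class name, field names and sample values are hypothetical and are not taken from the BOM Vl deliverable; the point is only that the generic Dublin Core layer can be queried without touching the detailed sector record.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivedObject:
    """One archived item described in layers (all names are illustrative)."""
    # Generic top layer: Dublin Core element -> list of values
    # (every element is optional and repeatable).
    dublin_core: dict = field(default_factory=dict)
    # Sector layer: the original record kept in full detail.
    sector_schema: str = ""
    sector_record: str = ""
    # Supporting layers.
    preservation: dict = field(default_factory=dict)
    technical: dict = field(default_factory=dict)
    rights: dict = field(default_factory=dict)

    def matches(self, element, term):
        """Query the common Dublin Core layer only."""
        values = self.dublin_core.get(element, [])
        return any(term.lower() in value.lower() for value in values)


item = ArchivedObject(
    dublin_core={"title": ["Example photograph"], "format": ["image/tiff"]},
    sector_schema="MARCXML",
    sector_record="<record>...</record>",  # placeholder, not a real record
    technical={"byte_order": "little-endian"},
    rights={"basis": "license"},
)
found = item.matches("title", "photograph")
```

Searching across sectors then only needs the `dublin_core` layer, while the sector record stays available for anyone who needs the full detail.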
Layered Metadata Model
Descriptive Model: Dublin Core
• Most interoperable, cross-sectoral.
• Greatest common denominator of all metadata models.
• All fields are repeatable and optional.
Mapping from the institution's own metadata model:
• Dublin Core as pidgin:
  – DC as common layer above the own metadata.
  – DC as model for querying.
  – Discovery and identification of digital objects.
How to disseminate as DC?
• A crosswalk to DC is made for the most important metadata models used in the sectors:
  – Libraries: MARC21
  – A/V sector: P/Meta
  – Arts sector and museums: CDWA and SPECTRUM
  – Archiving sector: ISAD(G) and EAD
• Crosswalks can be used to disseminate the DC records via OAI-PMH, GRDDL (XSLT), a mapping API (D2RQ), or ontology linking.
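A crosswalk of this kind can be sketched as a lookup table from source fields to Dublin Core elements. The Python below is a minimal sketch for MARC21 only: the table is a small illustrative subset, the function name and the sample record are hypothetical, and a real crosswalk covers far more fields and subfields.

```python
# Illustrative subset of a MARC21 -> Dublin Core crosswalk:
# (MARC tag, subfield code) -> DC element.
MARC_TO_DC = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("650", "a"): "subject",
}


def marc_to_dc(marc_fields):
    """Map (tag, subfield, value) triples to a Dublin Core record.

    DC elements are repeatable, so each element holds a list of values.
    Fields without a mapping are dropped, which is where detail is lost.
    """
    dc = {}
    for tag, code, value in marc_fields:
        element = MARC_TO_DC.get((tag, code))
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc


# Made-up sample record.
record = [
    ("100", "a", "Example, Author"),
    ("245", "a", "An example title"),
    ("260", "b", "Example Press"),
    ("260", "c", "2009"),
    ("650", "a", "Digital preservation"),
    ("650", "a", "Metadata"),
    ("300", "a", "120 p."),  # no mapping in this subset
]
dc_record = marc_to_dc(record)
```

The dropped `300` field shows why the layered model keeps the sector record next to the DC record: the crosswalk is lossy by design.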
Layered Metadata Model
Preservation Model: PREMIS
• Administrative metadata + rights metadata: assist in the management of the digital objects.
• Technical metadata: assists access (conversions or emulation).
• Preservation metadata: tracks the provenance, the history of all actions on an object.
Objects:
• Describes the objects to be preserved in a technical manner.
• 3 subclasses:
  – Bitstream
  – File
  – Representation
• Facilitates the conversion or emulation process.
Agents:
• Aggregates information about agents (persons, organisations, software) associated with rights management and preservation events in the life of a data object.
• No direct relation between Agent and Object; an agent:
  – may hold or grant one or more rights
  – may carry out, authorize, or compel one or more events.
• Identify agents uniquely.
Events:
• Actions that modify objects should always be recorded. Other actions, such as copying an object for backup purposes, may be recorded in an Event entity.
• Stored separately from the digital object.
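The Event entity can be sketched as an append-only log that lives apart from the objects it describes. In the Python below the field names follow PREMIS semantic units (eventIdentifier, eventType, eventDateTime, eventDetail, linkingObjectIdentifier, linkingAgentIdentifier), but the classes, the identifiers and the sample events are hypothetical and much simpler than the data dictionary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from uuid import uuid4


@dataclass(frozen=True)
class Event:
    # Simplified to plain strings; PREMIS itself uses structured units.
    event_identifier: str
    event_type: str
    event_date_time: str
    event_detail: str
    linking_object_identifier: str
    linking_agent_identifier: str


class EventLog:
    """Append-only log kept apart from the archived objects themselves."""

    def __init__(self):
        self._events = []

    def record(self, event_type, detail, object_id, agent_id):
        event = Event(
            event_identifier=str(uuid4()),
            event_type=event_type,
            event_date_time=datetime.now(timezone.utc).isoformat(),
            event_detail=detail,
            linking_object_identifier=object_id,
            linking_agent_identifier=agent_id,
        )
        self._events.append(event)
        return event

    def history(self, object_id):
        """Provenance of one object: its events in the order recorded."""
        return [e for e in self._events
                if e.linking_object_identifier == object_id]


log = EventLog()
log.record("ingestion", "Object received into the archive", "obj-1", "agent-archivist")
log.record("migration", "TIFF converted to JPEG2000", "obj-1", "agent-converter")
log.record("ingestion", "Object received into the archive", "obj-2", "agent-archivist")
event_types = [e.event_type for e in log.history("obj-1")]
```

Events point at objects and agents by identifier only, so the log can be stored and backed up independently of the objects.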
Rights:
• The minimum core rights information that a preservation repository must know is what rights or permissions it has to carry out actions related to objects within the repository.
• These may be granted by copyright law, by statute, or by a license agreement with the rightsholder.
Intellectual Entity:
• Descriptive metadata: out of scope for PREMIS.
• Dublin Core
PREMIS OWL:
• Semantic (OWL) ontology following the data dictionary of PREMIS 2.0.
• Published online: http://multimedialab.elis.ugent.be/users/samcoppe/ontologies/Premis/premis.owl
• Documentation online: http://multimedialab.elis.ugent.be/users/samcoppe/ontologies/Premis/index.html
Best practices, or how to minimize risks
Best Practice # 1: Store technical metadata
Source: Adrian Brown, National Archives UK; “Developing Practical Approaches to Active Preservation”
Bitrot/Software errors
• No storage device is perfect and eternal.
• David Rosenthal, Stanford University: “Bit Preservation: A Solved Problem?”
  – A bit half-life of 8 x 10^17 years gives a 50% chance that 1 petabyte survives a century without errors.
• Comparable studies by Carnegie Mellon University, Google and CERN
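The half-life figure can be reproduced with a short calculation. The Python sketch below assumes that bits decay independently with an exponential half-life and that 1 petabyte is 10^15 bytes (8 x 10^15 bits). Both are modelling assumptions made here for illustration; the function name is hypothetical.

```python
import math


def required_bit_half_life(n_bits, years, survival_probability=0.5):
    """Half-life (in years) each bit needs so that ALL bits survive.

    One bit survives `years` with probability 2 ** (-years / H).
    All n_bits survive with probability 2 ** (-years * n_bits / H).
    Setting that equal to the target probability and solving for H gives
    H = years * n_bits / -log2(target).
    """
    return years * n_bits / -math.log2(survival_probability)


PETABYTE_BITS = 8 * 10**15  # assuming 1 PB = 10**15 bytes
half_life_years = required_bit_half_life(PETABYTE_BITS, years=100)
```

With a 50% target the denominator is 1, so the result is simply 100 x 8 x 10^15 = 8 x 10^17 years, which matches the figure quoted above.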
• Volker Heydegger, University of Cologne: “Analysing the Impact of File Formats on Digital Integrity”
Best Practice # 2: Preserve preservation metadata
• Checksums
• Digital signatures
• Provenance
• …
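As a sketch of the checksum item, the Python below computes a SHA-256 digest when an object is ingested and compares it again later (a fixity check). The choice of SHA-256 and the function names are assumptions made for illustration; the slides do not prescribe an algorithm.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Checksum of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_checksum):
    """True if the file still matches the checksum stored at ingest."""
    return sha256_of_file(path) == recorded_checksum
```

The recorded checksum belongs in the preservation metadata, stored apart from the file, so that a single storage fault cannot damage both at once.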
Interpretation risks
One of the coolest and oldest dwarf stars ever found.
Best Practice # 3: Representation metadata
• Time
• Place
• Wavelengths / calibration data
• Provenance
Technology Changes
4b50 0403 0014 0000 0008 0cdb 282e 7d22
ddaa 0243 0001 ab00 0002 000f 0000 6341
5f65 666f

INC $D020
DEC $D020
JMP $2000
LDX $D020
INX
STX $D020
JMP $2000
LDA $5000
Data + Documentation (syntax, semantics) = Information
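To make the point concrete, the hexadecimal dump on this slide can be decoded once its documentation is known. The Python sketch below assumes the dump is printed as 16-bit little-endian words (the default style of the `hexdump` tool) and that the two fused groups split as `7d22 ddaa` and `6341 5f65`. Under those assumptions the bytes begin with the ZIP local file header signature, and the header layout from the ZIP specification supplies the semantics.

```python
import struct

# The dump from the slide, regrouped into 16-bit words (assumption).
WORDS = ("4b50 0403 0014 0000 0008 0cdb 282e 7d22 ddaa "
         "0243 0001 ab00 0002 000f 0000 6341 5f65 666f").split()

# Syntax: each printed word is stored low byte first.
raw = b"".join(int(word, 16).to_bytes(2, "little") for word in WORDS)

# Semantics: ZIP local file header, 30 fixed bytes, little-endian:
# signature, version needed, flags, compression method, mod time,
# mod date, CRC-32, compressed size, uncompressed size,
# file name length, extra field length.
(signature, version, flags, method, mod_time, mod_date,
 crc32, compressed_size, uncompressed_size,
 name_length, extra_length) = struct.unpack("<4sHHHHHIIIHH", raw[:30])

# Only the first bytes of the file name made it onto the slide.
name_start = raw[30:].decode("ascii")
```

Without the format documentation the same 36 bytes are just numbers; with it they read as a deflate-compressed ZIP entry whose 15-character file name starts with `Ace_of`.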
Best Practice # 4: Do not trust software
• It is an illusion to think that software will always offer access to the archived data.
• Computer software is an active component in the archive and it knows only two possible states:
1. It works and is maintained.
2. It does not work and is not maintained.
Best Practice # 4: Do not trust software (cont.)
• Case 2: Software does not work, is not maintained:
  – Documentation metadata has to contain the source code of the original software.
  – Emulation has to be foreseen; metadata has to contain all the emulation parameters.
• Case 1: Software works, is maintained:
  – The archive has the software.
  – The user has the software.
  – Both cases have a dynamic metadata layer with all the software aspects needed to access the data.
Descriptive metadata
• Are descriptive metadata (or other access tools like thumbnails, previews) data or metadata?
• Non-discussion: ‘metadata’ is a relative term.
• As data:
  – Advantage: descriptive metadata are ‘core business’, too valuable not to be archived.
  – Disadvantage: this type of data is very dynamic.
• As metadata:
  – Advantage: metadata are dynamic; can be adapted to the needs of the archive.
  – Disadvantage: which descriptive model has to be used: MARC, EAD, P/META, …?
Best Practice # 5: Store descriptive metadata as data
Provide a broadly accepted descriptive model like Dublin Core
• Dublin Core describes the ‘Who’, ‘What’, ‘Where’, ‘When’ and ‘How’.
• Sector specific descriptive metadata models have finer granularity.
• Use international standards (MARC, EAD, P/Meta).
Want to know more?
Book (in Dutch): “(Meta)datastandaarden voor digitale archieven”, Bastijns, Paul; Coppens, Sam; Corneillie, Siska; Hochstenbach, Patrick et al. (2009); full text available at http://hdl.handle.net/1854/LU-480734
Deliverable Layered metadata model: http://hdl.handle.net/1854/LU-764194
Q & A