Page 1: PERICLES  Information Packaging Techniques

GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3 Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation]

“This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 601138.”

Information Packaging Techniques
An overview of methods and standards
Anna-Grit Eggers (University of Goettingen)

Contents

a) Simple Archive Container Formats
b) Structured Packaging for Archiving
c) Metadata Schemes


Information Packaging

Simple Archive Container Formats

Simple Archive Container Formats

● Sole purpose: packing files together.

● File containers are often combined with a compression option to reduce the disk space needed for storage.

● All files in a container are stored equally as payload, and the container has to be unpacked to fully access the packed files.

ZIP

● The ZIP format was introduced by PKWARE in 1989: https://www.pkware.com/support/zip-app-note
◦ Public domain format
◦ Supports various compression algorithms
◦ Each file in the archive is compressed individually, so single files can be extracted without unpacking the whole container
◦ Files can also be aggregated without compression
◦ Preserves the original file paths and offers optional encryption
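The per-file compression and single-file extraction described above can be illustrated with Python's standard `zipfile` module. This is a minimal sketch: the member names and contents are hypothetical, and the archive is built in memory so the example needs no files on disk.

```python
import io
import zipfile


def build_zip() -> bytes:
    """Pack two hypothetical files into an in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Each member carries its own compression method and its full path.
        zf.writestr("docs/readme.txt", "hello " * 100,
                    compress_type=zipfile.ZIP_DEFLATED)
        # ZIP_STORED aggregates the file without compressing it.
        zf.writestr("data/raw.bin", b"\x00\x01\x02",
                    compress_type=zipfile.ZIP_STORED)
    return buf.getvalue()


def read_single(archive: bytes, name: str) -> bytes:
    """Decompress only the requested member; the others are not touched."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name)


archive = build_zip()
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    for info in zf.infolist():
        # file_size = original size, compress_size = size inside the archive
        print(info.filename, info.file_size, info.compress_size)
print(read_single(archive, "data/raw.bin"))
```

Because every member is compressed on its own, `read_single` only has to decompress the one entry it asks for.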

ZIP (cont.)

◦ Size limits: archive containers can be divided into several parts (split archives)

◦ Flexibility: single files can be added to or extracted from a ZIP archive without touching the other stored files

• Advantage: packages can be changed frequently

• Disadvantage: an additional file list (the central directory) has to be stored together with the content, which causes overhead

◦ Loss prevention: ZIP stores a cyclic redundancy check (CRC-32) for each file. If one file becomes corrupt, the other files remain fully accessible.
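Appending a file and the effect of per-file CRC checks can be sketched with Python's `zipfile` module. The member names, contents and the simulated corruption are hypothetical; corruption is faked by overwriting a few payload bytes of one stored (uncompressed) member, and `ZipFile.testzip()` then reports the first member whose CRC-32 does not match.

```python
import io
import zipfile


def make_archive() -> io.BytesIO:
    """Create an in-memory archive with two stored (uncompressed) members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("one.txt", b"first file content")
        zf.writestr("two.txt", b"second file content")
    return buf


def append_member(buf: io.BytesIO, name: str, data: bytes) -> None:
    """Add a member in append mode: existing members stay in place and
    only the trailing file list (central directory) is rewritten."""
    with zipfile.ZipFile(buf, "a") as zf:
        zf.writestr(name, data)


buf = make_archive()
append_member(buf, "three.txt", b"third file content")

# Simulate corruption of the second member: same length, different bytes.
damaged = buf.getvalue().replace(b"second", b"SECOND", 1)

with zipfile.ZipFile(io.BytesIO(damaged)) as zf:
    print(zf.namelist())
    print(zf.testzip())          # name of the first member whose CRC-32 fails
    print(zf.read("one.txt"))    # undamaged members remain readable
    print(zf.read("three.txt"))
```

Reading the damaged member itself with `zf.read("two.txt")` raises `zipfile.BadZipFile`, while the neighbouring members are unaffected.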

TAR

● A widespread container format in UNIX (ustar, pax), Linux (GNU tar), and BSD (bsdtar) environments

● Can be used on Windows operating systems, for example via software libraries such as libarchive

● Writes files sequentially into one file, called a ‘tarball’

● Originally designed for tape drives (the name stands for “tape archive”)

● TAR itself does not compress; it is usually combined with a compression algorithm such as gzip or bzip2

● In contrast to ZIP, TAR has no central file index: to extract a single file, the archive has to be read sequentially up to that file, and a compressed tarball has to be decompressed as one stream up to that point.
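The sequential layout of a compressed tarball can be sketched with Python's standard `tarfile` module. The member names and contents are hypothetical; gzip is used as the compression step, as in a typical `.tar.gz` file.

```python
import io
import tarfile


def build_tar_gz(files: dict[str, bytes]) -> bytes:
    """Write members one after another into a gzip-compressed tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_member(archive: bytes, name: str) -> bytes:
    """Extract one member. There is no central index, so tarfile walks the
    member headers in sequence (decompressing the gzip stream as it goes)."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tf:
        member = tf.getmember(name)
        return tf.extractfile(member).read()


archive = build_tar_gz({"a.txt": b"alpha", "dir/b.txt": b"beta"})
print(read_member(archive, "dir/b.txt"))
```

Compared with the ZIP sketches, the whole gzip stream is the unit of compression here, which is why single-file access is more expensive.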


Information Packaging

Structured Packaging for Archiving

Structured Packaging for Archiving

Creation of an information container in which the packed information can be stored in a well-defined and structured way.

BagIt

● A standard for storing files and their metadata in a well-defined directory structure

● Developed by the California Digital Library Digital Preservation Group and the Library of Congress

● Often used for preservation purposes, e.g. by Tate (UK)

● Data files are stored in a data directory

● Their checksums are saved in a manifest file

● The tag files, which hold the metadata, are listed together with their checksums in a tag-manifest file

● A further file (bagit.txt) stores the BagIt version used and the character encoding of the tag files

● BagIt is often combined with a simple archiving format, such as TAR or ZIP, to serialise the bag directory, or it is used only as a directory structure technique for sensitive content

● See: http://www.cdlib.org/cdlinfo/2008/07/02/bagit-transferring-digital-content/
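A minimal bag following the layout above (payload in `data/`, a payload manifest, `bagit.txt`, a metadata file and a tag manifest) can be sketched in Python. This is an illustration only, loosely following the BagIt specification (RFC 8493); the choice of SHA-256, the `bag-info.txt` label and the payload are assumptions, and a production bag would normally be created and validated with an existing BagIt tool.

```python
import hashlib
import pathlib
import tempfile


def sha256_of(path: pathlib.Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_manifest(bag_dir: pathlib.Path, name: str, paths: list[pathlib.Path]) -> None:
    # One "<checksum>  <path relative to the bag root>" line per file.
    lines = [f"{sha256_of(p)}  {p.relative_to(bag_dir).as_posix()}" for p in paths]
    (bag_dir / name).write_text("\n".join(lines) + "\n", encoding="utf-8")


def make_bag(bag_dir: pathlib.Path, payload: dict[str, bytes], info: dict[str, str]) -> None:
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    for rel, content in payload.items():
        target = data_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)

    # Payload manifest: checksums of everything under data/.
    payload_files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    write_manifest(bag_dir, "manifest-sha256.txt", payload_files)

    # Bag declaration: BagIt version and tag file character encoding.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")

    # Metadata ("tags") as "Label: value" lines.
    (bag_dir / "bag-info.txt").write_text(
        "".join(f"{k}: {v}\n" for k, v in info.items()), encoding="utf-8")

    # Tag manifest: checksums of the tag files themselves.
    tag_files = [bag_dir / n for n in ("bagit.txt", "bag-info.txt", "manifest-sha256.txt")]
    write_manifest(bag_dir, "tagmanifest-sha256.txt", tag_files)


with tempfile.TemporaryDirectory() as tmp:
    bag = pathlib.Path(tmp) / "example-bag"
    make_bag(bag, {"images/scan.tif": b"placeholder bytes"},
             {"Source-Organization": "Example Org"})  # hypothetical metadata
    for p in sorted(bag.rglob("*")):
        print(p.relative_to(bag).as_posix())
```

The resulting directory can then be serialised with TAR or ZIP, as noted above.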

Compound Documents

● Container files which aggregate files serving a specific purpose

● Often used to store all components belonging to a video (e.g. video, audio and subtitle streams) and to group them into a single self-describing video file

● Popular examples of video containers are AVI and Ogg Media.

Structured Source Code Packages

● The source code of a computer program is often stored together with other project-related resources, such as images, in a package with a well-defined directory structure.

● Structured source code packages are often executable, i.e. the program can be run directly from the package.

● Examples: Java’s JARs, Ruby Gems and Python Eggs.
◦ The JAR format is derived from the ZIP format.
◦ A JAR can be seen as a compound document, similar to the video containers, because the Java program it represents can be executed by running the JAR.
◦ It has a well-defined path structure and an optional manifest file (META-INF/MANIFEST.MF), which can be regarded as a metadata file.
◦ This makes the transition to the next category, metadata schemes, fluid.
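Since a JAR is a ZIP file with a manifest at `META-INF/MANIFEST.MF`, its structure can be sketched with Python's `zipfile` module. The class name `com.example.Main` is hypothetical, the `.class` entry is an empty placeholder rather than real bytecode, and the parser only handles the manifest's main section (including wrapped continuation lines).

```python
import io
import zipfile

MANIFEST_PATH = "META-INF/MANIFEST.MF"


def build_jar_like(main_class: str) -> bytes:
    """Build a ZIP with a JAR-style manifest and a placeholder class entry."""
    manifest = f"Manifest-Version: 1.0\r\nMain-Class: {main_class}\r\n\r\n"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(MANIFEST_PATH, manifest)
        zf.writestr(main_class.replace(".", "/") + ".class", b"")  # not real bytecode
    return buf.getvalue()


def read_main_attributes(jar: bytes) -> dict[str, str]:
    """Parse the manifest's main section ("Name: value" lines) as metadata."""
    with zipfile.ZipFile(io.BytesIO(jar)) as zf:
        text = zf.read(MANIFEST_PATH).decode("utf-8")
    attrs: dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and last_key is not None:
            attrs[last_key] += line[1:]  # continuation of a wrapped long value
            continue
        key, _, value = line.partition(": ")
        attrs[key] = value
        last_key = key
    return attrs


jar = build_jar_like("com.example.Main")
print(read_main_attributes(jar))
```

The `Main-Class` attribute is what makes the package directly runnable, while the other manifest entries act as descriptive metadata.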


Information Packaging

Metadata schemes

Metadata schemes

● Mostly used in combination with packaging
● But can also be kept beside the described content and linked to it
● Or embedded in the content
● Most commonly, the XML format is used to define a scheme for a particular domain.

METS

● METS stands for Metadata Encoding and Transmission Standard; it is maintained by the METS Editorial Board

● It provides an XML schema for encoding different types of metadata

● It simplifies the administration and exchange of digital objects between data collections

● A METS file serves as a hub that links the digital object (DO) with all its files and metadata to form one digital entity

● A METS XML file consists of:
◦ Header (metsHdr): Metadata about the METS file itself, such as the creation date and the authors
◦ Descriptive metadata (dmdSec): Links to external metadata documents or embeds the metadata directly
◦ Administrative metadata (amdSec): Stores data concerning storage, rights and creation
◦ File section (fileSec): A list of all files belonging to the DO
◦ Structural map (structMap): Describes the inner structure of the DO and links data and metadata; it is the only mandatory section
◦ Structural links (structLink): Records hyperlinks between parts of the structure, which is useful for archiving websites
◦ Behaviour (behaviorSec): Associates executable behaviours with the content of the DO

● See: http://www.loc.gov/standards/mets/
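How the structural map ties files and metadata together can be sketched by building a tiny METS document with Python's `xml.etree.ElementTree`. This is a reduced illustration, not a schema-validated METS profile: the URLs, IDs, date and the single-page structure are hypothetical, and only the header, descriptive metadata, file section and structural map are shown.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)
NS = {"mets": METS_NS}
HREF = f"{{{XLINK_NS}}}href"


def mets(tag: str) -> str:
    return f"{{{METS_NS}}}{tag}"


def build_mets(file_url: str, dc_url: str) -> ET.Element:
    root = ET.Element(mets("mets"))
    # Header: metadata about the METS document itself.
    ET.SubElement(root, mets("metsHdr"), CREATEDATE="2015-01-01T00:00:00")
    # Descriptive metadata kept outside and referenced by URL.
    dmd = ET.SubElement(root, mets("dmdSec"), ID="DMD1")
    ET.SubElement(dmd, mets("mdRef"), {"LOCTYPE": "URL", "MDTYPE": "DC", HREF: dc_url})
    # File section: inventory of the files that make up the digital object.
    grp = ET.SubElement(ET.SubElement(root, mets("fileSec")), mets("fileGrp"), USE="master")
    f = ET.SubElement(grp, mets("file"), ID="FILE1", MIMETYPE="image/tiff")
    ET.SubElement(f, mets("FLocat"), {"LOCTYPE": "URL", HREF: file_url})
    # Structural map: ties the structure to files (FILEID) and metadata (DMDID).
    smap = ET.SubElement(root, mets("structMap"))
    div = ET.SubElement(smap, mets("div"), TYPE="page", DMDID="DMD1")
    ET.SubElement(div, mets("fptr"), FILEID="FILE1")
    return root


def resolve_fptr(root: ET.Element) -> str:
    """Follow structMap -> fileSec to find the location of the referenced file."""
    file_id = root.find(".//mets:structMap//mets:fptr", NS).get("FILEID")
    f = root.find(f".//mets:fileSec//mets:file[@ID='{file_id}']", NS)
    return f.find("mets:FLocat", NS).get(HREF)


doc = build_mets("https://example.org/scan1.tif", "https://example.org/scan1-dc.xml")
print(ET.tostring(doc, encoding="unicode"))
print(resolve_fptr(doc))
```

The ID references (`FILEID`, `DMDID`) are what make the METS file act as a hub: the structure is described once and points to files and metadata by identifier.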

OAI-ORE

● ORE is a standard for Object Reuse and Exchange by the Open Archives Initiative (OAI).

● It introduces two new types of resources: Aggregations and Resource Maps.

● An Aggregation represents a set of associated web resources.
◦ Like a Semantic Web resource, it has no representation of its own.

● A Resource Map describes exactly one Aggregation.
◦ It holds a machine-readable description of the Aggregation and a list of the aggregated resources. In addition, it describes relationships and properties relevant to these resources and carries metadata about itself.

● Both resources are identified by HTTP URIs.
◦ Applications can use Aggregations to display all associated resources and process them as a collection.
◦ This simplifies the exchange and archiving of resource sets.
◦ Resource Maps can be serialised in various formats: Atom XML, RDF/XML and RDFa.

● See: http://www.openarchives.org/ore/
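The relationship between a Resource Map and its Aggregation can be sketched as an RDF/XML document built with Python's `xml.etree.ElementTree`, using the ORE terms `ore:describes`, `ore:isDescribedBy` and `ore:aggregates`. This is a minimal illustration: the URIs are hypothetical, the `#aggregation` suffix is only an assumed naming convention, and further metadata that the ORE specification asks for is omitted.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE_NS = "http://www.openarchives.org/ore/terms/"
DCTERMS_NS = "http://purl.org/dc/terms/"
for prefix, uri in (("rdf", RDF_NS), ("ore", ORE_NS), ("dcterms", DCTERMS_NS)):
    ET.register_namespace(prefix, uri)


def q(ns: str, local: str) -> str:
    return f"{{{ns}}}{local}"


def build_resource_map(rem_uri: str, resources: list[str], modified: str) -> ET.Element:
    agg_uri = rem_uri + "#aggregation"  # assumed URI convention for the Aggregation
    root = ET.Element(q(RDF_NS, "RDF"))
    # The Resource Map: describes the Aggregation and carries metadata about itself.
    rem = ET.SubElement(root, q(ORE_NS, "ResourceMap"), {q(RDF_NS, "about"): rem_uri})
    ET.SubElement(rem, q(ORE_NS, "describes"), {q(RDF_NS, "resource"): agg_uri})
    ET.SubElement(rem, q(DCTERMS_NS, "modified")).text = modified
    # The Aggregation: points back to its Resource Map and lists the aggregated resources.
    agg = ET.SubElement(root, q(ORE_NS, "Aggregation"), {q(RDF_NS, "about"): agg_uri})
    ET.SubElement(agg, q(ORE_NS, "isDescribedBy"), {q(RDF_NS, "resource"): rem_uri})
    for uri in resources:
        ET.SubElement(agg, q(ORE_NS, "aggregates"), {q(RDF_NS, "resource"): uri})
    return root


rdf_doc = build_resource_map(
    "https://example.org/rem/article-1",
    ["https://example.org/article-1.pdf", "https://example.org/dataset-1.csv"],
    "2015-01-01T00:00:00Z",
)
print(ET.tostring(rdf_doc, encoding="unicode"))
```

The same graph could equally be serialised as Atom XML or RDFa; RDF/XML is used here only because it can be written with the standard library.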

PREMIS Data Dictionary

● Developed by the PREservation Metadata: Implementation Strategies (PREMIS) group of the Library of Congress

● It supports the preservation and long-term usability of digital objects and their metadata

● The Data Dictionary is a specification for metadata handling in digital archiving systems.

● The data model defines five entities: Intellectual Entities, Objects, Events, Agents and Rights.

● See: https://www.era.lib.ed.ac.uk/bitstream/handle/1842/3339/Higgins PREMIS_V-2-1-2009-03.pdf?sequence=1&isAllowed=y
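How the entities relate can be illustrated with a simplified Python data model in which an Event links an Object and the Agent that acted on it, here for a checksum (fixity) check. This is a hypothetical sketch loosely modelled on the PREMIS entities, not the PREMIS XML schema: the class and field names are illustrative, Intellectual Entities and Rights are left out, and SHA-256 is an assumed digest algorithm.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PremisObject:
    identifier: str
    content: bytes
    message_digest: str  # SHA-256 fixity value recorded at ingest


@dataclass
class PremisAgent:
    identifier: str
    name: str


@dataclass
class PremisEvent:
    event_type: str
    date_time: str
    outcome: str
    linking_object_id: str  # the Object the Event acted on
    linking_agent_id: str   # the Agent that performed the Event


def fixity_check(obj: PremisObject, agent: PremisAgent) -> PremisEvent:
    """Recompute the digest and record the result as an Event linked to Object and Agent."""
    ok = hashlib.sha256(obj.content).hexdigest() == obj.message_digest
    return PremisEvent(
        event_type="fixity check",
        date_time=datetime.now(timezone.utc).isoformat(),
        outcome="success" if ok else "failure",
        linking_object_id=obj.identifier,
        linking_agent_id=agent.identifier,
    )


data = b"archived content"
obj = PremisObject("obj-001", data, hashlib.sha256(data).hexdigest())
agent = PremisAgent("agent-001", "hypothetical checksum service")
print(fixity_check(obj, agent))
```

Recording such checks as separate Events, rather than overwriting a status field on the Object, keeps the preservation history of the object auditable.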

Research Object (RO)

● Used to describe and bundle research data in a way that supports citation and sharing in a machine-readable fashion.

● The initiative includes a number of techniques that share a common set of principles:
◦ Identity
◦ Aggregation
◦ Annotation

● The metadata is described in the RO ontology.

● Bundling can be done using different techniques, including RO Bundle and BagIt.

● See: http://www.researchobject.org/
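A ZIP-based bundle in the style of RO Bundle can be sketched with Python's `zipfile` and `json` modules: an uncompressed `mimetype` entry first, a JSON manifest under `.ro/`, and the aggregated files. The layout and manifest keys below are a reduced subset based on our reading of the RO Bundle specification and should be checked against it; the payload file names are hypothetical.

```python
import io
import json
import zipfile


def build_ro_bundle(files: dict[str, bytes]) -> bytes:
    # Reduced manifest: identifies the bundle root and lists the aggregated files.
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "manifest": ["manifest.json"],
        "aggregates": [{"uri": "/" + name} for name in files],
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Container convention: an uncompressed "mimetype" entry comes first.
        zf.writestr("mimetype", "application/vnd.wf4ever.robundle+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr(".ro/manifest.json", json.dumps(manifest, indent=2),
                    compress_type=zipfile.ZIP_DEFLATED)
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


bundle = build_ro_bundle({"data/results.csv": b"a,b\n1,2\n", "workflow.txt": b"step 1"})
with zipfile.ZipFile(io.BytesIO(bundle)) as zf:
    print(zf.namelist())
    print(json.loads(zf.read(".ro/manifest.json"))["aggregates"])
```

As with JARs, the package is plain ZIP underneath; the manifest is what turns it into a self-describing research object.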

LMER

● The Long-term preservation Metadata for Electronic Resources (LMER) project provides an XML schema specifically for long-term preservation purposes, based on the preservation metadata implementation schema of the National Library of New Zealand.

● The schema was developed by the DNB (Deutsche Nationalbibliothek) as a schema for technical metadata.

● In combination with METS, it is used to define the packaging format UOF (Universal Object Format).

● It is designed to work together with standard exchange formats and can be integrated into METS.

● The LMER schema consists of the following sections:
◦ Object: The object, with a URN as persistent identifier.
◦ Process: Log of technical changes.
◦ Metadata: Metadata for each file that belongs to the digital object.
◦ Metadata modifications: Log of changes to the metadata.

● See: http://www.dnb.de/DE/Standardisierung/LMER/lmer_node.html

The Data Conservancy Package Tool

● Timothy DiLauro and Jonathan Petters introduced the Data Conservancy Package Tool at the International Digital Curation Conference (IDCC) 2015 (http://www.dcc.ac.uk/sites/default/files/documents/IDCC15/196.pdf).

● The tool facilitates the creation of packages for research data objects in the conservation domain.

● It provides a user interface for defining packages.

● It focuses on curation activities.

● See: http://dataconservancy.org/wp-content/uploads/2014/10/DCSDOCPKG-PackageToolsDocumentationHome-Full.pdf

