
Uploaded by periclesfp7 on 15-Apr-2017


Page 1: PERICLES  Information Packaging Techniques

GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3 Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation]

“This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no 601138”.

Information Packaging Techniques
An overview of methods and standards
Anna-Grit Eggers (University of Goettingen)

Page 2: Contents

Information Packaging Techniques

a) Simple Archive Container Formats
b) Structured Packaging for Archiving
c) Metadata Schemes

Page 3: Information Packaging

Simple Archive Container Formats

Page 4: Simple Archive Container Formats

● Sole purpose: packing files together.
● File containers are often combined with a compression option to reduce the disk space needed for storage.
● All files in the container are stored equally as payload; the container has to be unpacked to fully access the packed files.

Page 5: ZIP

● The ZIP format was introduced by PKWARE in 1989: https://www.pkware.com/support/zip-app-note
◦ Public domain
◦ Supports various compression algorithms
◦ Each file in the archive container is compressed individually, so single files can be unzipped from the container
◦ Option to aggregate files without using compression
◦ Preserves the original file paths and offers optional encryption
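A minimal sketch of the per-file behaviour described above, using Python's standard `zipfile` module. The archive paths and contents are hypothetical. It packs the same files once without compression (`ZIP_STORED`, aggregation only) and once with Deflate (`ZIP_DEFLATED`), then reads a single member while leaving the others packed.

```python
import io
import zipfile


def build_zip(files, compression):
    """Pack a {archive path: bytes} mapping into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=compression) as zf:
        for name, data in files.items():
            zf.writestr(name, data)  # every member is stored/compressed on its own
    return buffer.getvalue()


# Hypothetical example payload with highly repetitive (compressible) content.
files = {
    "report/text.txt": b"preservation " * 1000,
    "report/data.csv": b"id,value\n" + b"1,42\n" * 500,
}

stored = build_zip(files, zipfile.ZIP_STORED)      # aggregation only
deflated = build_zip(files, zipfile.ZIP_DEFLATED)  # Deflate compression per file

# Read one member without unpacking the rest of the archive.
with zipfile.ZipFile(io.BytesIO(deflated)) as zf:
    names = zf.namelist()                  # original paths are kept
    csv_bytes = zf.read("report/data.csv")

print(f"stored: {len(stored)} bytes, deflated: {len(deflated)} bytes")
```

Because each member is compressed independently, `zf.read()` only has to decompress the requested entry.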

Page 6: ZIP (cont.)

◦ Splitting: archive containers can be divided into several parts, e.g. to fit onto size-limited storage media.
◦ Flexibility: single files can be added to or extracted from a ZIP archive without touching the other stored files.
• + advantage: packages can be changed frequently.
• - disadvantage: this causes overhead in the form of an additional file list (the central directory), which is stored together with the content.
◦ Loss prevention: ZIP stores a cyclic redundancy check (CRC-32) for each file. If one file becomes corrupt, the error can be detected and the other files remain accessible.
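The following sketch, again using Python's `zipfile` with hypothetical file names, illustrates two of the points above: appending a file to an existing archive (mode `"a"`), and detecting a corrupted member through its CRC-32 while the other members stay readable. The corruption is simulated by flipping one byte of uncompressed payload.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", b"first file")
    zf.writestr("b.txt", b"second file")

# Mode "a" appends a new member after the existing ones and rewrites the
# central directory (the file list) instead of rebuilding the whole archive.
with zipfile.ZipFile(buffer, "a") as zf:
    zf.writestr("c.txt", b"added later")

# Simulate damage: flip one byte inside the stored (uncompressed) data of b.txt.
raw = bytearray(buffer.getvalue())
raw[raw.find(b"second file")] ^= 0xFF

with zipfile.ZipFile(io.BytesIO(bytes(raw))) as zf:
    names = zf.namelist()
    first_bad = zf.testzip()   # name of the first member whose CRC-32 fails
    intact = zf.read("a.txt")  # other members are still readable
    later = zf.read("c.txt")

print(names, first_bad, intact)
```

`testzip()` reports only the damaged entry; the remaining members can still be extracted normally.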

Page 7: TAR

● A widespread container format in UNIX (ustar, pax), Linux (GNU tar) and BSD (bsdtar) environments
● Can be used on Windows operating systems, for example via software libraries such as libarchive
● Writes files sequentially into one file, called a ‘tarball’
● Originally designed for tape drives (the name is short for tape archive)
● TAR does not compress by itself; it is usually combined with a compression algorithm like gzip or bzip2
● In contrast to ZIP, TAR has no central file index: to extract a single file, the archive has to be read sequentially up to that file, and a compressed tarball (e.g. .tar.gz) has to be decompressed as one continuous stream.
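A small sketch with Python's standard `tarfile` module (member names are hypothetical) shows the sequential layout: the tar stream is gzip-compressed as a whole, and a single member is located by scanning the archive from the start, since there is no central index as in ZIP.

```python
import io
import tarfile


def add_bytes(tar, name, data):
    """Append an in-memory payload to the tarball under the given path."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


buffer = io.BytesIO()
# "w:gz" writes a plain tar stream and gzip-compresses it as a whole (.tar.gz).
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    add_bytes(tar, "project/readme.txt", b"read me")
    add_bytes(tar, "project/data.bin", bytes(range(256)))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    # Without a central index, tarfile finds members by walking the
    # decompressed stream header by header from the beginning.
    names = tar.getnames()
    payload = tar.extractfile("project/data.bin").read()

print(names, len(payload))
```

For a large `.tar.gz`, fetching the last member therefore means decompressing everything before it.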

Page 8: Information Packaging

Structured Packaging for Archiving

Page 9: Structured Packaging for Archiving

Creation of an information container in which the packed information can be stored in a well-defined and structured way.

Page 10: BagIt

● A standard for storing files and their metadata in a well-defined directory structure
● Developed by the California Digital Library Digital Preservation Group and the Library of Congress
● Often used for preservation purposes, e.g. by Tate (UK)
● Data files are stored in a data directory
● Their checksums are saved in a manifest file
● The tag files, which hold the metadata, are listed together with their checksums in a tag-manifest file
● A further file, bagit.txt, stores the BagIt version used and the character encoding of the tag files
● BagIt is often combined with a simple archiving format, such as TAR or ZIP, to serialise the bag directory, or it is used only as a directory-structuring technique for sensitive content
● See: http://www.cdlib.org/cdlinfo/2008/07/02/bagit-transferring-digital-content/
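The bag layout described above can be sketched with the Python standard library alone. This is a minimal illustration assuming BagIt version 1.0 and SHA-256 checksums. The payload files and the `Source-Organization` value are hypothetical. It omits validation and optional tag files, and for real use a maintained BagIt tool is preferable.

```python
import hashlib
import pathlib
import tempfile


def sha256_of(path):
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_manifest(bag, manifest_name, paths):
    """Write '<checksum>  <relative path>' lines, one per file."""
    lines = [f"{sha256_of(p)}  {p.relative_to(bag).as_posix()}" for p in paths]
    (bag / manifest_name).write_text("\n".join(lines) + "\n", encoding="utf-8")


def make_bag(bag_dir, payload, info):
    """Create a minimal BagIt-style bag from a {relative path: bytes} payload."""
    bag = pathlib.Path(bag_dir)
    data = bag / "data"
    for rel_path, content in payload.items():
        target = data / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)

    # Bag declaration: BagIt version and character encoding of the tag files.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    # Optional descriptive metadata as "Label: value" lines.
    (bag / "bag-info.txt").write_text(
        "".join(f"{k}: {v}\n" for k, v in info.items()), encoding="utf-8")

    # Payload manifest: checksums of every file below data/.
    payload_files = sorted(p for p in data.rglob("*") if p.is_file())
    write_manifest(bag, "manifest-sha256.txt", payload_files)
    # Tag manifest: checksums of the tag files themselves.
    tag_files = [bag / n for n in ("bagit.txt", "bag-info.txt", "manifest-sha256.txt")]
    write_manifest(bag, "tagmanifest-sha256.txt", tag_files)
    return bag


with tempfile.TemporaryDirectory() as tmp:
    bag = make_bag(tmp,
                   {"images/scan.tif": b"II*\x00", "notes.txt": b"hello"},
                   {"Source-Organization": "Example Archive"})  # hypothetical values
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    tag_manifest = (bag / "tagmanifest-sha256.txt").read_text(encoding="utf-8")
    files = sorted(p.relative_to(bag).as_posix() for p in bag.rglob("*") if p.is_file())

print(files)
```

The resulting directory could then be serialised with ZIP or TAR as described above.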

Page 11: Compound Documents

● Container files which aggregate files serving a specific purpose
● Often used to store all files belonging to a video and to group them as a single, self-describing video file
● Popular examples of video containers are AVI and Ogg Media.

Page 12: Structured Source Code Packages

● The source code of a computer program is often stored together with other project-related resources, such as images, in a package with a well-defined directory structure.
● Structured source code packages are often executable, i.e. running the package runs the computer program.
● Examples: Java’s JARs, Ruby Gems and Python Eggs.
◦ The JAR format is derived from the ZIP format.
◦ A JAR can be seen as a compound document, similar to the video containers, because the Java program represented by the JAR can be executed by running the JAR.
◦ It contains a well-defined path structure and an optional manifest file, which can be regarded as a metadata file.
◦ Therefore the boundary to the next category, metadata schemes, becomes blurred.
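Since a JAR is a ZIP archive with an optional `META-INF/MANIFEST.MF` entry, the manifest-as-metadata idea can be sketched with Python's `zipfile`. The class name `org.example.App` and the other entries are hypothetical, and the class bytes are a placeholder, so this JAR-like file is not actually runnable by Java. The parser is simplified: it reads only the main section and ignores the manifest format's continuation lines.

```python
import io
import zipfile

# Hypothetical manifest: "Name: value" lines; a blank line ends the main section.
MANIFEST = (
    "Manifest-Version: 1.0\r\n"
    "Main-Class: org.example.App\r\n"
    "\r\n"
)


def build_jar():
    """Create a tiny JAR-like ZIP in memory (class bytes are a placeholder only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as jar:
        jar.writestr("META-INF/MANIFEST.MF", MANIFEST)
        jar.writestr("org/example/App.class", b"\xca\xfe\xba\xbe")  # not real bytecode
        jar.writestr("images/logo.png", b"")  # project resource packed with the code
    return buffer.getvalue()


def read_manifest(jar_bytes):
    """Return the main-section attributes of META-INF/MANIFEST.MF as a dict.

    Simplification: continuation lines (starting with a space) are not handled.
    """
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    attributes = {}
    for line in text.splitlines():
        if not line:
            break  # blank line: end of the main section
        key, _, value = line.partition(": ")
        attributes[key] = value
    return attributes


jar_bytes = build_jar()
attrs = read_manifest(jar_bytes)
print(attrs)
```

The `Main-Class` attribute is what lets a real JAR be started directly, which is why the JAR sits between plain containers and metadata schemes.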

Page 13: Information Packaging

Metadata Schemes

Page 14: Metadata Schemes

● Mostly used in combination with packaging
● But they can also be kept beside the described content and linked to it
● Or embedded within the content
● Most commonly, the XML format is used to define a scheme for a particular domain of use.

Page 15: METS

● METS stands for Metadata Encoding and Transmission Standard; it is maintained by the METS Editorial Board.
● It provides an XML schema for encoding different types of metadata.
● It simplifies the administration and exchange of digital objects between data collections.
● A METS file serves as a hub that links the digital object (DO) with all its associated files and metadata to create a digital entity.
● A METS XML file consists of:
◦ Header: metadata about the METS file itself, such as the creation date and the authors.
◦ Descriptive metadata: embeds descriptive metadata or links to external metadata documents.
◦ Administrative metadata: data concerning storage, rights and creation.
◦ File section: a list of all files belonging to the DO.
◦ Structural map: describes the inner structure of the DO and links data and metadata.
◦ Structural links: hyperlinks between parts of the structure, useful for archiving websites.
◦ Behaviour: associates executable behaviours with the content.
● See: http://www.loc.gov/standards/mets/
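To illustrate the hub role of a METS file, the sketch below builds a partial METS document with Python's `xml.etree.ElementTree`. It contains a header, one descriptive metadata section, a file section and a structural map. It then follows the links from the structural map to the file location. The agent name, creation date, URL and file path are hypothetical. Administrative metadata, structural links and behaviour sections are omitted, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)
HREF = f"{{{XLINK}}}href"


def m(tag):
    """Qualified METS element name in ElementTree's {namespace}tag form."""
    return f"{{{METS}}}{tag}"


mets = ET.Element(m("mets"))
# Header: metadata about the METS document itself (hypothetical values).
hdr = ET.SubElement(mets, m("metsHdr"), CREATEDATE="2020-01-01T00:00:00")
agent = ET.SubElement(hdr, m("agent"), ROLE="CREATOR")
ET.SubElement(agent, m("name")).text = "Example Archive"

# Descriptive metadata kept in an external Dublin Core record (hypothetical URL).
dmd = ET.SubElement(mets, m("dmdSec"), ID="DMD1")
ET.SubElement(dmd, m("mdRef"), {"LOCTYPE": "URL", "MDTYPE": "DC",
                                HREF: "https://example.org/dc.xml"})

# File section: every file belonging to the digital object.
grp = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"), USE="master")
file_el = ET.SubElement(grp, m("file"), ID="FILE1", MIMETYPE="image/tiff")
ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", HREF: "data/scan.tif"})

# Structural map: inner structure, linking content (FILEID) and metadata (DMDID).
smap = ET.SubElement(mets, m("structMap"), TYPE="physical")
div = ET.SubElement(smap, m("div"), TYPE="page", DMDID="DMD1")
ET.SubElement(div, m("fptr"), FILEID="FILE1")

xml_text = ET.tostring(mets, encoding="unicode")

# Follow the hub links: structMap -> fptr -> file -> FLocat.
ns = {"mets": METS}
root = ET.fromstring(xml_text)
file_id = root.find(".//mets:structMap//mets:fptr", ns).get("FILEID")
location = root.find(f".//mets:file[@ID='{file_id}']/mets:FLocat", ns).get(HREF)
print(file_id, location)
```

The ID/IDREF-style attributes (`DMDID`, `FILEID`) are what tie the sections of a METS file together.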

Page 16: OAI-ORE

● ORE is the Object Reuse and Exchange standard of the Open Archives Initiative (OAI).
● It introduces two new types of resources: Aggregations and Resource Maps.
● An Aggregation represents a set of associated web resources.
◦ Like a conceptual resource in the Semantic Web, it has no representation of its own.
● A Resource Map belongs to an Aggregation.
◦ It holds a machine-readable description of the Aggregation and a list of the aggregated resources. In addition, it describes relationships and properties relevant to these resources and carries some metadata about itself.
● Both types of resources are identified by HTTP URIs on the Web.
◦ Applications can use Aggregations to present all associated resources and to process them as one collection.
◦ This simplifies the exchange and archiving of resource sets.
◦ Resource Maps can be serialised in various formats: Atom XML, RDF/XML and RDFa.
● See: http://www.openarchives.org/ore/
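A minimal sketch of a Resource Map in one of the listed serialisations (RDF/XML), generated with Python's `xml.etree.ElementTree`. It uses the ORE terms `ore:ResourceMap`, `ore:Aggregation`, `ore:describes` and `ore:aggregates`. All `example.org` URIs are hypothetical, and the map leaves out the metadata that the ORE specification expects a Resource Map to carry about itself, so it is an illustration rather than a complete, conformant document.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("ore", ORE)


def resource_map(rem_uri, aggregation_uri, resources):
    """Return an RDF/XML Resource Map describing one Aggregation (sketch)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # The Resource Map: a document that describes the Aggregation.
    rem = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": rem_uri})
    ET.SubElement(rem, f"{{{RDF}}}type", {f"{{{RDF}}}resource": ORE + "ResourceMap"})
    ET.SubElement(rem, f"{{{ORE}}}describes", {f"{{{RDF}}}resource": aggregation_uri})
    # The Aggregation: a conceptual resource listing its aggregated resources.
    agg = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": aggregation_uri})
    ET.SubElement(agg, f"{{{RDF}}}type", {f"{{{RDF}}}resource": ORE + "Aggregation"})
    for uri in resources:
        ET.SubElement(agg, f"{{{ORE}}}aggregates", {f"{{{RDF}}}resource": uri})
    return ET.tostring(root, encoding="unicode")


rem_xml = resource_map(
    "https://example.org/rem/1",
    "https://example.org/aggregation/1",
    ["https://example.org/article.pdf", "https://example.org/dataset.csv"],
)
parsed = ET.fromstring(rem_xml)
aggregated = [el.get(f"{{{RDF}}}resource") for el in parsed.iter(f"{{{ORE}}}aggregates")]
print(aggregated)
```

The Resource Map and the Aggregation receive different URIs, reflecting that the map is a document about the Aggregation rather than the Aggregation itself.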

Page 17: PREMIS Data Dictionary

● Developed by the PREservation Metadata: Implementation Strategies (PREMIS) group of the Library of Congress
● Supports the preservation and long-term usability of digital objects and their metadata
● The Data Dictionary is a specification for metadata handling in digital archiving systems.
● The data model provides five entities: Intellectual Entity, Object, Event, Agent and Rights.
● See: https://www.era.lib.ed.ac.uk/bitstream/handle/1842/3339/Higgins PREMIS_V-2-1-2009-03.pdf?sequence=1&isAllowed=y
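The five entities of the PREMIS data model and the links between them can be sketched as plain Python dataclasses. The field names are simplified, hypothetical stand-ins and not the actual PREMIS semantic units. The example records a "fixity check" Event that links an Object to the Agent that performed the check.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class IntellectualEntity:          # the intellectual unit, e.g. a manuscript
    identifier: str
    title: str


@dataclass
class PremisObject:                # a concrete file/bitstream in the repository
    identifier: str
    format_name: str
    sha256: str
    entity_id: str                 # link to the IntellectualEntity it realises


@dataclass
class Agent:                       # person, organisation or software
    identifier: str
    name: str


@dataclass
class Event:                       # an action affecting one or more objects
    identifier: str
    event_type: str
    date_time: str
    outcome: str
    object_ids: list = field(default_factory=list)
    agent_ids: list = field(default_factory=list)


@dataclass
class Rights:                      # rights statement attached to objects
    identifier: str
    basis: str
    object_ids: list = field(default_factory=list)


def fixity_check(event_id, obj, data, agent):
    """Record a 'fixity check' Event comparing data against the stored digest."""
    ok = hashlib.sha256(data).hexdigest() == obj.sha256
    return Event(event_id, "fixity check",
                 datetime.now(timezone.utc).isoformat(),
                 "success" if ok else "failure",
                 [obj.identifier], [agent.identifier])


content = b"scanned page"  # hypothetical file content
entity = IntellectualEntity("ie-1", "Example manuscript")
obj = PremisObject("obj-1", "image/tiff", hashlib.sha256(content).hexdigest(), entity.identifier)
agent = Agent("agent-1", "checksum tool")
rights = Rights("rights-1", "copyright", [obj.identifier])

passed = fixity_check("event-1", obj, content, agent)
failed = fixity_check("event-2", obj, b"tampered", agent)
print(passed.outcome, failed.outcome)
```

Keeping Events as separate records, rather than overwriting Object fields, preserves the processing history that long-term preservation relies on.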

Page 18: Research Object (RO)

● Research Objects are used to describe and bundle research data in a way that supports citation and sharing in a machine-readable fashion.
● The initiative includes a number of techniques that share a common set of principles:
◦ Identity
◦ Aggregation
◦ Annotation
● The metadata is described in the RO ontology.
● Bundling can be done using different techniques, including RO bundling and BagIt.
● See: http://www.researchobject.org/

Page 20: LMER

● The Long-term preservation Metadata for Electronic Resources (LMER) project provides an XML schema specifically for long-term preservation, based on the preservation metadata implementation schema of the National Library of New Zealand.
● The schema was developed by the DNB (Deutsche Nationalbibliothek) as a schema for technical metadata.
● In combination with METS, it is used to define the packaging format UOF.
● It is designed to cooperate with standard exchange formats and can be integrated into METS.
● The LMER schema consists of the following sections:
◦ Object: the object, with a URN as persistent identifier.
◦ Process: a log of technical changes.
◦ Metadata: metadata for each file that belongs to the digital object.
◦ Metadata modifications: a log of changes to the metadata.
● See: http://www.dnb.de/DE/Standardisierung/LMER/lmer_node.html

Page 21: The Data Conservancy Package Tool

● Timothy DiLauro and Jonathan Petters introduced the Data Conservancy Package Tool at the International Digital Curation Conference (IDCC) 2015 (http://www.dcc.ac.uk/sites/default/files/documents/IDCC15/196.pdf).
● The tool facilitates the creation of packages for research data objects in the conservation domain.
● It provides a user interface for defining packages.
● It focuses on curation activities.
● See: http://dataconservancy.org/wp-content/uploads/2014/10/DCSDOCPKG-PackageToolsDocumentationHome-Full.pdf