PERICLES Information Embedding Techniques


Upload: periclesfp7

Post on 15-Apr-2017

Page 1: PERICLES Information Embedding Techniques

GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3
Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation]

“This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no 601138”.

Information Embedding Techniques
An overview of methods and standards
Anna-Grit Eggers (University of Goettingen)

Page 2: PERICLES Information Embedding Techniques

Introduction
● Information embedding techniques distinguish between:
◦ the information that serves as carrier and
◦ the payload information which is embedded into the carrier.
● Techniques discussed in the following are:
1. Steganography, focusing on the hiding of information: for these techniques the payload is more important than the carrier.
2. Digital watermarking: the carrier is the object of interest.
● Not all techniques ensure the restorability of carrier and payload information:
◦ some of them embed the payload information visible to humans,
◦ others let it become undetectable.
● The selection of a specific technique can be challenging due to this diversity.

Page 3: PERICLES Information Embedding Techniques
Page 4: PERICLES Information Embedding Techniques

Information Embedding

Metadata Embedding Standards

Page 5: PERICLES Information Embedding Techniques

Metadata Embedding Standards
● Metadata standards enable a well-defined usage of available metadata fields.
● Popular are standards for designated domains, like the IPTC-IIM standard and Exif for image metadata, which define vocabularies as well as methods for embedding them into image files.
● The XMP standard also provides a defined way of embedding metadata.
● It provides many different metadata vocabularies for customisable usage, for example the aforementioned vocabularies of IPTC-IIM and Exif, as well as the popular Dublin Core vocabulary for the description of web resources.
● The metadata standards have in common that the information can easily be removed from the digital object (DO), in the worst case inadvertently by processing software that isn't aware of the correct handling of the standards. This increases the chance of information loss.

Page 6: PERICLES Information Embedding Techniques

XMP
● The Extensible Metadata Platform (XMP) is a standard (ISO 16684-1:2012), created by Adobe Systems Inc., for embedding metadata into a file.
● Many data formats are supported.
● XMP uses the specific properties of each data format to embed the information in such a way that the file can still be handled normally.
● In a first step the metadata are serialised according to a special data model in RDF/XML syntax and packed into an XMP packet.
● Different namespaces, like one for the Dublin Core metadata element set, are available and can be used to fulfil special needs.
● The data model can be extended, but several constraints apply to the contents of the RDF that can be serialised to obtain valid XMP RDF.
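The serialisation step described on this slide can be illustrated with a minimal sketch that uses only the Python standard library. It builds a small RDF/XML description with two Dublin Core properties and wraps it in the XMP packet markers. The helper name `build_xmp_packet` and the sample values are assumptions made for illustration; real packets usually also contain padding, and a toolkit such as Adobe's would normally do this work.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP, RDF and the Dublin Core element set.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)


def qname(prefix, local):
    """Return the ElementTree-style qualified name, e.g. {uri}title."""
    return "{%s}%s" % (NS[prefix], local)


def build_xmp_packet(title, creators):
    """Serialise a title and a creator list as an XMP packet string (hypothetical helper)."""
    xmpmeta = ET.Element(qname("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, qname("rdf", "RDF"))
    desc = ET.SubElement(rdf, qname("rdf", "Description"), {qname("rdf", "about"): ""})

    # dc:title is a language alternative: an rdf:Alt array of rdf:li items.
    alt = ET.SubElement(ET.SubElement(desc, qname("dc", "title")), qname("rdf", "Alt"))
    ET.SubElement(alt, qname("rdf", "li"), {XML_LANG: "x-default"}).text = title

    # dc:creator is an ordered array: an rdf:Seq of rdf:li items.
    seq = ET.SubElement(ET.SubElement(desc, qname("dc", "creator")), qname("rdf", "Seq"))
    for name in creators:
        ET.SubElement(seq, qname("rdf", "li")).text = name

    body = ET.tostring(xmpmeta, encoding="unicode")
    # The packet wrapper lets scanners locate the metadata inside the carrier file.
    return ('<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
            + body + '\n<?xpacket end="w"?>')


packet = build_xmp_packet("Scanned book page", ["Jane Doe"])
```

The sketch only uses scalars and arrays, which matches the restricted RDF profile of XMP; embedding the packet into a concrete file format is a separate, format-specific step that is not shown here.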

Page 7: PERICLES Information Embedding Techniques

XMP (cont.)
● The supported RDF profile recognises only certain constructs (scalars, arrays, structures) and thus does not constitute a means for embedding arbitrary RDF/XML structures.
● After serialisation the XMP packet can be embedded in the digital object without damaging the file or other existing metadata.
● Adobe provides an open source toolkit under a BSD license with two C++ libraries:
◦ XMPFiles offers support for embedding the serialised metadata in different file formats and for retrieving and updating them.
◦ XMPCore contains functions for the creation and manipulation of metadata following the XMP data model.
● A Java version of XMPCore is also available from Adobe.

Page 8: PERICLES Information Embedding Techniques

XMP (cont.)
● XMP offers support for IPTC Core and for Exif metadata.
● Besides the apparent limits on RDF embedding, a disadvantage of XMP is that the capacity of the carrier files can be insufficient for big metadata payloads.
● An advantage, on the other hand, is the standardisation and the support by many applications.
● See: https://www.adobe.com/products/xmp/

Page 9: PERICLES Information Embedding Techniques

Exif
● The Exchangeable Image File Format (Exif) is a standard for storing metadata in digital images, published by the Japan Electronics and Information Technology Industries Association (JEITA).
● Exif is embedded directly into the headers of image files in the JPEG or TIFF format.
● Many digital cameras store Exif metadata about the images at the moment the picture is taken.
◦ Examples of stored data are the date and time of the shot, the type of camera, and settings like focal length and exposure time. Afterwards, additional IPTC metadata can be added to the image.
◦ For some options a number is stored as the tag instead of the real information. Applications that support the standard do the translation by providing a mapping table. For example, the name of the manufacturer is stored this way.
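The mapping-table idea can be sketched as follows. The tag numbers and the exposure-program codes are taken from the Exif tag tables, but only a tiny excerpt is shown; the function name `translate_exif` and the orientation labels are assumptions made for this illustration, not part of any particular library.

```python
# Small excerpt of the Exif tag table: numeric tag ID -> tag name.
TAG_NAMES = {
    0x010F: "Make",
    0x0110: "Model",
    0x0112: "Orientation",
    0x829A: "ExposureTime",
    0x8822: "ExposureProgram",
    0x9003: "DateTimeOriginal",
    0x920A: "FocalLength",
}

# Some tags hold a code number instead of readable text.
# The orientation labels are informal descriptions chosen for this sketch.
VALUE_LABELS = {
    "ExposureProgram": {
        1: "Manual",
        2: "Normal program",
        3: "Aperture priority",
        4: "Shutter priority",
    },
    "Orientation": {
        1: "normal",
        3: "rotated 180 degrees",
    },
}


def translate_exif(raw_tags):
    """Turn {tag_id: raw_value} into {tag_name: readable_value} (hypothetical helper)."""
    readable = {}
    for tag_id, value in raw_tags.items():
        name = TAG_NAMES.get(tag_id, "Unknown tag 0x%04X" % tag_id)
        # Replace coded values by their label when the table knows the code.
        readable[name] = VALUE_LABELS.get(name, {}).get(value, value)
    return readable


example = translate_exif({0x8822: 3, 0x0112: 1, 0x9003: "2017:04:15 10:30:00"})
```

An application that lacks an entry in its table can still show the raw number, which is why unknown tags and codes are passed through unchanged in this sketch.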

Page 10: PERICLES Information Embedding Techniques

Exif (cont.)
● A disadvantage of Exif is that it is integrated on such a deep level that users often do not notice the embedded information.
◦ Risk: they inadvertently pass information to someone else, for example a stored serial number, a username or GPS coordinates.
● Saving Exif information with digital cameras while taking photos is an example of the sheer curation of metadata in the creation environment of the digital object.
● See: http://en.wikipedia.org/wiki/Exchangeable_image_file_format

Page 11: PERICLES Information Embedding Techniques

IPTC-IIM
• The International Press Telecommunications Council (IPTC) and the Newspaper Association of America (NAA) have developed the IPTC-IIM standard.
• It is a standard for embedding metadata about an image into the image file. It is also suitable for video, audio and text formats.
• IPTC-IIM determines where in the file the information is stored.
• It provides a list of metadata fields and their meanings, together with a technical format for storing values in these fields.
• The successor of IPTC-IIM is IPTC Core, implemented as a component of the XMP standard (partially incompatible with the original IPTC fields).
• See: https://iptc.org/standards/iim/

Page 12: PERICLES Information Embedding Techniques

IPTC fields
• The IPTC metadata fields originate from the Information Interchange Model (IIM), which describes an exchange format for multimedia news.
• A synchronisation between the IPTC metadata fields and the XMP format is supported and described by the Metadata Working Group (see the next slide).
• IPTC does not support as many metadata fields as Exif. In particular, the camera settings are not stored by default, but a section for the creation of user-defined tags is provided.

Page 13: PERICLES Information Embedding Techniques

Metadata Working Group
• The Metadata Working Group is a consortium of companies in the digital media industry.
• Their goal is to improve the preservation, interoperability and availability of the metadata of digital images.
• They have published technical specifications that describe the handling and storage of image metadata using the common standards Exif, IPTC-NAA and XMP.
• They also provide a set of tools and test files for testing the correct implementation of these standards.
• See: http://www.metadataworkinggroup.com/

Page 14: PERICLES Information Embedding Techniques

Information Embedding

Steganography

Page 15: PERICLES Information Embedding Techniques

Steganography
• Steganography is the storage and transmission of hidden information within a digital object.
• It is mostly used for confidential communication.
• Since obscurity is not security, encryption is necessary for actually secret communication.
◦ Encrypted steganographic messages look like background noise.
◦ Unencrypted messages can be detected by analysing the carrier.
• Digital objects originating from a digitisation process often carry a high level of background noise.
◦ E.g. scanning a book page twice with a professional scanner, without moving the book or changing the technical environment, produces images that differ bitwise and, at a sufficiently high zoom level, even visually.

Page 16: PERICLES Information Embedding Techniques

Steganography (cont.)
• If a steganographic technique embeds data into this digitisation background noise, the significant properties of the carrier object won’t be damaged.
• Compared to other encapsulation methods, robust steganographic messages have the potential to survive several conversions and processing steps of the carrier object.
◦ The selection of a suitable algorithm is crucial for this feature.
• The security of the hidden message can depend just on the secrecy of the algorithm, or on the ignorance of the viewer.
• A safer method is to encrypt the message and to use secret key-based steganography with a public/private key pair.

Page 17: PERICLES Information Embedding Techniques

Steganography algorithms
• LSB: changes the least significant bits of a bit plane in an image file (Encyptic, Stegotif, Hide).
• Hide: this algorithm increments or decrements pixel values.
• Hash-based embedding: the embedding of hash values can minimize the necessary changes.
• There are different methods to apply an LSB algorithm:
◦ One is to change the least significant bits in a random walk.
◦ Another is to change just the bits of a subset of pixels.
◦ Instead of changing the bit plane, the least significant bits of the color index in the palette can be changed. Without further sorting calculations, this could result in image artifacts.
• A good reference on LSB steganography can be found at: http://aaronmiller.in/thesis/
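The LSB idea and its random-walk variant can be sketched as follows. This is a minimal illustration that works on a flat sequence of 8-bit sample values (for example greyscale pixels) and is not the implementation of any of the tools named above; the function names and the use of a seeded pseudo-random generator as the shared key are assumptions.

```python
import random


def _positions(n_samples, n_bits, key):
    """Choose which samples carry payload bits: sequential, or a keyed random walk."""
    if n_bits > n_samples:
        raise ValueError("payload too large for this carrier")
    if key is None:
        return list(range(n_bits))
    # The same key reproduces the same walk, so the receiver can find the bits.
    return random.Random(key).sample(range(n_samples), n_bits)


def embed_lsb(samples, payload, key=None):
    """Return a copy of samples with the payload bits written into the LSBs."""
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    out = bytearray(samples)
    for pos, bit in zip(_positions(len(out), len(bits), key), bits):
        out[pos] = (out[pos] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(out)


def extract_lsb(samples, n_bytes, key=None):
    """Read n_bytes of payload back from the LSBs, using the same key."""
    bits = [samples[p] & 1 for p in _positions(len(samples), n_bytes * 8, key)]
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits), 8)
    )


carrier = bytes(range(256)) * 4            # stand-in for pixel data
stego = embed_lsb(carrier, b"PERICLES", key="shared secret")
recovered = extract_lsb(stego, 8, key="shared secret")
```

Each sample changes by at most one, which is why the modification can hide in the background noise of digitised objects. Note that the receiver needs to know the payload length and the key, which illustrates the risk of losing the knowledge required for extraction.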

Page 18: PERICLES Information Embedding Techniques

Risks
• Risk of losing the embedded information if the knowledge of its existence gets lost.
• Risk of losing the knowledge about how to extract the embedded information.

Risk prevention:
• The use of steganographic techniques for long-term preservation purposes requires special attention to the choice of method.
• A possible solution could be to combine the embedding method with a visible digital watermark, which informs the user about the presence of the embedded information.

Page 19: PERICLES Information Embedding Techniques

Tools
• There are different open source steganography tools that demonstrate the diversity of this information embedding (IE) domain:
• OpenStego:
◦ An open-source steganography tool for image embedding, written in Java and released under the GNU General Public License v2.0. See: http://www.openstego.info/
◦ Password-based encryption support.
◦ Offers a plugin-based architecture to extend the tool with additional algorithms.
◦ Currently available are a plugin for the LSB algorithm, which uses the least significant bit of image pixels, and a Random LSB plugin.
◦ Further plugins could be developed for the DCT and the FFT algorithms.

Page 20: PERICLES Information Embedding Techniques

Tools (cont.)
• OutGuess:
◦ A universal steganographic tool for the insertion of hidden information into the redundant bits of data sources.
◦ Open source under BSD license.
◦ Works with data-specific handlers that extract the redundant bits for modification. The modified bits are written back to their original positions.
◦ Supports PNM and JPEG images.
◦ See: http://www.outguess.org/

Page 21: PERICLES Information Embedding Techniques

Tools (cont.)
• Steghide:
◦ An open source tool under GPL v3 for hiding data in various kinds of image and audio files.
◦ Offers compression and encryption of the embedded data and embeds a checksum to verify the integrity of the embedded data after extraction.
◦ Supports JPEG, BMP, WAV and AU files. With pySteg, a GNOME GUI wrapper for Steghide is available.
◦ See: http://steghide.sourceforge.net/
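The combination of compression and an embedded checksum mentioned for Steghide can be illustrated with a short sketch. This is not Steghide's actual payload format; the 8-byte header layout and the function names are assumptions chosen for the example, and encryption is left out.

```python
import struct
import zlib


def pack_payload(data):
    """Compress data and prefix it with its length and a CRC-32 of the original bytes."""
    compressed = zlib.compress(data)
    header = struct.pack(">II", len(compressed), zlib.crc32(data))
    return header + compressed


def unpack_payload(blob):
    """Decompress a packed payload and verify its checksum after extraction."""
    length, expected_crc = struct.unpack(">II", blob[:8])
    data = zlib.decompress(blob[8:8 + length])
    if zlib.crc32(data) != expected_crc:
        raise ValueError("checksum mismatch: the embedded data is damaged")
    return data


blob = pack_payload(b"scanner=hypothetical-model; resolution=600dpi")
restored = unpack_payload(blob)
```

The packed bytes would then be handed to the embedding algorithm. Storing the length in the header also tells the extractor where the payload ends inside the carrier.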

Page 22: PERICLES Information Embedding Techniques

Tools (cont.)
• SNOW:
◦ Steganographic Nature Of Whitespace (SNOW) is an open source tool under GPL for whitespace steganography.
◦ It conceals messages in ASCII text by appending whitespace to the end of lines and offers built-in encryption.
◦ See: http://www.darkside.com.au/snow/
• Stepic:
◦ A Python image steganography tool under the GNU GPL 2 license.
◦ See: http://stegstudio.sourceforge.net/ and http://domnit.org/stepic/doc/
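The whitespace technique described for SNOW can be sketched in a simplified form. The encoding below (a trailing space for a 0 bit, a trailing tab for a 1 bit) is an assumption made for illustration and is not SNOW's actual scheme; the function names are hypothetical as well.

```python
def hide_in_whitespace(text, payload):
    """Append spaces (0) and tabs (1) to the line ends of text to carry the payload."""
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    lines = text.split("\n")
    if any(line != line.rstrip(" \t") for line in lines):
        raise ValueError("cover text already has trailing whitespace")
    per_line = -(-len(bits) // len(lines))  # ceiling division
    stego_lines = []
    for n, line in enumerate(lines):
        chunk = bits[n * per_line:(n + 1) * per_line]
        stego_lines.append(line + "".join("\t" if bit else " " for bit in chunk))
    return "\n".join(stego_lines)


def reveal_from_whitespace(text):
    """Collect the trailing whitespace of every line and decode it into bytes."""
    bits = []
    for line in text.split("\n"):
        tail = line[len(line.rstrip(" \t")):]
        bits.extend(1 if ch == "\t" else 0 for ch in tail)
    usable = len(bits) - len(bits) % 8
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[k:k + 8]))
        for k in range(0, usable, 8)
    )


def strip_whitespace_payload(text):
    """Restore the cover text by removing the trailing whitespace again."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


cover = "first line\nsecond line\nthird line"
stego = hide_in_whitespace(cover, b"Hi")
```

Because the cover text must not contain trailing whitespace of its own, the carrier can be restored exactly by stripping the line ends, which is unusual among steganographic methods.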

Page 23: PERICLES Information Embedding Techniques

Relevance for long-term digital preservation (LTDP)
• Integrity of steganographically embedded information:
◦ As the carrier is usually not important in steganography, the common tools offer integrity checks for the embedded data only.
◦ A restoration of the digital object is typically not provided.
◦ The reversibility of the algorithms and an integrity check for the digital object would need to be developed for each algorithm, if such reversibility is possible at all without access to the original object.
• Embedding of environment information into digitised objects:
◦ Steganographic methods can be practical for long-term preservation purposes, since the given background noise can be exploited.
• Embedding of environment information into born-digital objects:
◦ The use of steganographic methods is not advisable for long-term preservation purposes, as they would damage the digital object.

Page 24: PERICLES Information Embedding Techniques

Information Embedding

Digital Watermarking

Page 25: PERICLES Information Embedding Techniques

Background
• Digital watermarking is closely related to steganography.
• They differ in their emphasis:
◦ steganography is primarily concerned with the transmission of information, with the aim that it remains undetected except by the intended recipient;
◦ digital watermarking is concerned with the digital object itself, with additional information embedded into that object (such as a logo or serial number).
• Digital watermarking can provide the means for incorporating the steganographic message within the object.
• A key goal of digital watermarking is tracking the use or origin of an object, e.g. to:
◦ monitor or manage media usage;
◦ better personalise the user experience of media;
◦ ensure that copyright is not breached (e.g. a logo included in copyrighted images or videos);
◦ authenticate content and objects, such as government-issued documents (passports or identity cards).
• Integrity watermarking can be used to detect manipulation or modification of an object, since such changes can destroy or damage an embedded watermark.

Page 26: PERICLES Information Embedding Techniques

Basic characteristics
• In the majority of cases, a digital watermark remains imperceptible to the user, who will be able to use, copy, and manipulate the watermarked object like any other.
• In some cases the digital watermark may be visible to the user.
• The lifecycle of digital watermarking falls into three main phases:
1. The embedding phase: the signal to be transmitted is embedded into the host and the algorithm then produces a watermarked signal.
2. The attack phase: the watermarked digital signal is transmitted or stored (usually by another person) and eventually perhaps modified. Such modifications may include editing the file, cropping an image, or using ‘lossy’ compression, but they need not be malicious to count as an ‘attack’.
3. The detection or extraction phase: an algorithm is applied to the attacked signal in order to check for the watermark and, if possible, extract it. If the watermark applied was ‘robust’, it should still be accessible even if the object was modified; in the case of ‘fragile’ digital watermarking, extraction should fail if any modifications were made to the object.

Page 27: PERICLES Information Embedding Techniques

Implementation
• The choice of watermarking method depends on:
◦ the context and requirements relating to the object in question;
◦ whether or not an active adversary who wants to remove the watermark will be present (in which case the ability to withstand attack is critical);
◦ the number of bits or amount of information that needs to be transmitted in the watermark (larger amounts may not be possible or may make the watermark perceptible).

Page 28: PERICLES Information Embedding Techniques

Key features
• Robustness: the ability of the digital watermark to survive processing of the carrier object.
◦ A robust watermark is often hard to remove. An attempt to remove the mark damages the carrier and makes it useless for the adversary.
◦ Robust watermarks can be visible or invisible, and are often used to display the owner of the digital object.
◦ They can survive file format changes and processing of the carrier object.
◦ The degree of robustness depends on the algorithm used.
• Fragility: processing of the digital object damages a fragile watermark.
◦ An intact watermark is a warranty that no modifications were performed.
◦ It is a method used to ensure authenticity and integrity.

Page 29: PERICLES Information Embedding Techniques

Key features
• Semi-fragility: used to detect intentional attacks rather than to validate the originality of an image or object.
◦ In some cases, passive digital watermarking is used to retrace the origin of a digital object (digital watermark as a digital fingerprint).
• Reversibility: digital watermarking techniques can be reversible.
◦ Reversible watermarking techniques store the information needed to recover the digital object within the watermark.
• Perceptibility: a digital watermark is considered ‘imperceptible’ if the watermarked signal and the cover signal cannot be distinguished; when this is not the case, the digital watermark is ‘perceptible’.
◦ Watermarks can be used to point to additional embedded information that is invisible, and so prevent the loss of the knowledge about this information.
◦ Or they could display methods to extract this embedded information.
◦ Or they may be encrypted so that access to the embedded information is restricted to individuals who have the access key.

Page 30: PERICLES Information Embedding Techniques

Relevance for LTDP
• Two applications of digital watermarking have great potential for preservation purposes:
◦ embedding metadata within an object;
◦ integrity check of digital objects.
• Robustness:
◦ the mark should be as easily removable as possible without damage to the digital object;
◦ the mark needs to be robust enough to withstand accidental distortions.
• Reversibility:
◦ Useful for restoration of the digital object and its metadata.
• Perceptibility:
◦ With imperceptible marks, it should be ensured that the knowledge about the embedded data and the recovery method does not get lost. Therefore, the combination with a visible watermark pointing to the imperceptible data can be helpful.

Page 31: PERICLES Information Embedding Techniques

Integrity check
• Integrity checking of digital objects can be supported by digital watermarking, e.g. by embedding a checksum.
• Future users could then validate both the checksum and the integrity of the object.
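One way to embed a checksum as a fragile mark is sketched below. It assumes a carrier given as a flat sequence of 8-bit samples: a SHA-256 digest is computed over the samples with their least significant bits cleared and is then written into the least significant bits of the first 256 samples. The function names and this particular layout are assumptions for illustration, not a method prescribed by the slides.

```python
import hashlib

DIGEST_BITS = 256  # SHA-256 produces 32 bytes


def _digest_bits(samples):
    """Hash the samples with their LSB plane cleared and return the digest as bits."""
    cleared = bytes(s & 0xFE for s in samples)  # the mark itself must not affect the hash
    digest = hashlib.sha256(cleared).digest()
    return [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]


def embed_checksum(samples):
    """Write the checksum into the LSBs of the first 256 samples."""
    if len(samples) < DIGEST_BITS:
        raise ValueError("carrier too small to hold the checksum")
    out = bytearray(samples)
    for pos, bit in enumerate(_digest_bits(samples)):
        out[pos] = (out[pos] & 0xFE) | bit
    return bytes(out)


def verify_checksum(samples):
    """Return True if the embedded checksum still matches the carrier content."""
    if len(samples) < DIGEST_BITS:
        return False
    stored = [s & 1 for s in samples[:DIGEST_BITS]]
    return stored == _digest_bits(samples)


marked = embed_checksum(bytes(range(256)) * 2)
```

In this sketch any change to the upper seven bits of a sample breaks the check, whereas changes confined to least significant bits outside the checksum area go unnoticed. That trade-off is a property of this simple layout, not of integrity watermarking in general.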

Page 32: PERICLES Information Embedding Techniques

Risks and risk mitigation
• Embedded information:
◦ Using digital watermarking techniques for embedding information in an object risks losing access to that embedded information due to a lack of awareness or knowledge of the algorithm used.
◦ This risk can be reduced by using well understood and commonly available algorithms for embedding and restoration of the information. This also reduces the risk of potential users not having access to the algorithm, and thus not being able to access the embedded information.
• Use of QR codes:
◦ QR codes are a special type of digital watermarking; their major advantage is that, along with related technologies, they are commonly understood.
◦ Such a mark can point to hidden data and prevent it from getting lost.
◦ QR codes can be decoded by nearly every smartphone and can embed a URL or a note. However, the use of URLs bears the risk of broken links.
◦ All QR codes come with the danger of attack, such as being altered to lead to another homepage or to encapsulate code that could run unwanted functionality.

Page 33: PERICLES Information Embedding Techniques

Information Embedding

Use of File Format Features

Page 34: PERICLES Information Embedding Techniques

Overview
• Many data formats offer more features than the standard metadata fields for native embedding of information.
◦ E.g. container formats like AVI or PDF are aggregations of different files.
◦ In contrast to the packaging methods, container formats fit the exact needs of a specific media format.
◦ They can be processed by standard applications.
• These features provided by the file formats can often be used for embedding customised information in a well-defined way.
• Many formats offer special sections just for the embedding of additional information.
• Risk: the metadata may be lost or corrupted when data is edited or migrated to a new format.

Page 35: PERICLES Information Embedding Techniques

Use of available metadata fields
• Nearly every data format provides a set of metadata fields by default.
• This is commonly complemented by file systems providing additional sets of metadata.
• Often not all of these fields are in use, or the fields are not completely used.
• Tools for visualising and editing these standard metadata fields for different file types are common and well supported, e.g. most music players support editing of the ID3 metadata tags of MP3s.
• This method uses the unused space to store other kinds of information.
• The major difficulty of this method is that the embedding has to comply with the file format specifications, which are often quite complex.

Page 36: PERICLES Information Embedding Techniques

Use of available metadata fields
• Exploiting a file in a way that was originally not intended and is not standardised. Example:
◦ It is possible to add additional information bytes at the end of JPEG files, and most image processing tools will ignore these additional bytes.
• This method is not as stable as the use of intended information space and might result in unexpected errors.
• These errors can be corrected if the encapsulated information is removed from the file to restore the original file.
• This method of unstandardised embedding could be an option if it is not required that the carrier file can be processed normally.
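The JPEG example can be sketched as follows. A JPEG stream ends with the end-of-image marker FF D9, so bytes appended after it are typically ignored by viewers. The trailer layout (a marker string, the payload, and a 4-byte length) and the function names are assumptions invented for this illustration.

```python
MAGIC = b"PAYLOAD1"  # hypothetical marker that identifies an appended payload


def append_payload(jpeg_bytes, payload):
    """Append the payload after the JPEG end-of-image marker (FF D9)."""
    if not jpeg_bytes.endswith(b"\xff\xd9"):
        raise ValueError("carrier does not end with the JPEG end-of-image marker")
    # The length goes last so that it can be read from the end of the file.
    return jpeg_bytes + MAGIC + payload + len(payload).to_bytes(4, "big")


def split_payload(data):
    """Return (original carrier bytes, payload) for a file made by append_payload."""
    length = int.from_bytes(data[-4:], "big")
    start = len(data) - 4 - length - len(MAGIC)
    if start < 0 or data[start:start + len(MAGIC)] != MAGIC:
        raise ValueError("no appended payload found")
    return data[:start], data[start + len(MAGIC):-4]


# A stand-in for real JPEG data: start-of-image marker, content, end-of-image marker.
fake_jpeg = b"\xff\xd8" + b"compressed image data" + b"\xff\xd9"
enriched = append_payload(fake_jpeg, b"creation environment notes")
carrier, payload = split_payload(enriched)
```

Removing the trailer restores the original file bit for bit, which corresponds to the correction step mentioned on this slide. Tools that rewrite the file are likely to drop the appended bytes silently.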

Page 37: PERICLES Information Embedding Techniques

Information Embedding

Use of Information Frames

Page 38: PERICLES Information Embedding Techniques

Overview
• Information frames consist of the same medium as the carrier digital object and are attached to the carrier without modifying it. Example: closing credits of films.
• A frame can consist of additional pixels for images, an additional soundtrack for audio files, or an additional sequence in a film.
• It is also necessary to store, in the frame itself, the information about how to remove the frame from the carrier object.
• This encapsulation method makes it possible to restore the carrier and all payload information bit for bit.

Page 39: PERICLES Information Embedding Techniques

Overview (cont.)
• The information frame needs:
◦ a visible header for users;
◦ a detectable header for the processing tools.
• The advice for the user is needed:
◦ to describe the purpose of the extension;
◦ to inform about the extraction of the embedded data and the removal of the information frame.
• The tools that process files and information frames need a mark to identify enriched files and to know how to handle them.
• The size of the embedded information is theoretically unlimited, because the information frame can be expanded to the required size.
• The attached information strikes the eye immediately and can be annoying to users.
• This method can be combined with all other embedding methods described.

Page 40: PERICLES Information Embedding Techniques

This example uses a steganography algorithm to embed the restoration metadata into the blue pixel frame. This metadata is used for the correct decapsulation and restoration of carrier and payload.
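The information-frame approach can be sketched in a simplified form. The sketch assumes a greyscale image held as a list of equally wide pixel rows and appends extra rows that carry a detectable header, the original height (the restoration metadata) and the payload. The marker string, the layout and the function names are assumptions; unlike the example above, the restoration metadata is stored in plain form rather than steganographically, and the visible header for users is omitted.

```python
import struct

FRAME_MAGIC = b"IFRAME01"  # hypothetical header that lets tools detect the frame


def attach_frame(rows, payload):
    """Append frame rows below the image without changing any carrier pixel."""
    width = len(rows[0])
    # The header stores the carrier height so that the frame can be removed later.
    record = FRAME_MAGIC + struct.pack(">II", len(rows), len(payload)) + payload
    record += bytes((-len(record)) % width)  # pad to a whole number of rows
    frame_rows = [record[i:i + width] for i in range(0, len(record), width)]
    return list(rows) + frame_rows


def detach_frame(rows):
    """Return (carrier rows, payload) by locating the frame header."""
    header_len = len(FRAME_MAGIC) + 8
    for index in range(len(rows)):
        flat = b"".join(rows[index:])
        if flat.startswith(FRAME_MAGIC):
            height, size = struct.unpack(">II", flat[len(FRAME_MAGIC):header_len])
            if height == index:  # guards against carrier pixels that mimic the header
                return rows[:height], flat[header_len:header_len + size]
    raise ValueError("no information frame found")


image = [bytes([shade] * 12) for shade in range(5)]   # 5 rows of 12 pixels
framed = attach_frame(image, b"scanner settings etc")
carrier, payload = detach_frame(framed)
```

Because the carrier rows are never modified, both the carrier and the payload come back unchanged, and the frame can grow by further rows when more payload is needed.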