comparison of video copy detection techniques: the robustness

Comparison of Video Copy Detection Techniques:

The Robustness against Distortion and Attacks

Wei-Lun Chao

Graduate Institute of Communication Engineering, NTU

Abstract

This report introduces two copy detection techniques: the well-known “watermarking” and the more recent “video fingerprinting”. Research on copyright management and information security is driven by cases of distortion and attack: whenever a new attack is devised, researchers look for pertinent and robust methods to counter it. I’ll focus on the common cases in which these two techniques are used for copy detection, compare them, and finally show why video fingerprinting has drawn so much attention recently.

1. Introduction

With the rapid development of technology, the Internet has become a fast and dominant medium for propagating information, and more and more information is now distributed in digital form, such as MP3 audio, JPEG 2000 photos, MPEG-4 videos, and e-books. These changes make life easier and richer, but they also bring a troublesome issue: the management of intellectual property rights (IPRs). Digital watermarking is one of the most popular techniques for this issue, while video fingerprinting is gradually becoming important because of one attractive property: the media itself is the watermark. The main concept of video fingerprinting was presented in the report “Introduction to Video Fingerprinting”. In this report I’ll briefly introduce digital watermarking (especially as applied to images) and compare the two techniques.

Section 2 introduces digital watermarking, Section 3 compares the performance of digital watermarking and video fingerprinting, and Section 4 gives a brief conclusion.

2. Introduction to digital watermarking

In this section I’ll focus on watermarking as applied to images. What is watermarking? An everyday example is the paper banknote. The 1000-NT-dollar banknote contains a chrysanthemum pattern that cannot be noticed with the naked eye under normal conditions, but when the note is held up to the light the pattern appears on the back side [Fig 2.1]. The goal of this watermark is to prevent illegal copying of the banknote, so the technique used to insert the pattern must be kept strictly secret.

Figure 2.1: The watermark of the 1000-NT-dollar paper banknote.

Digital watermarking means embedding a digital pattern or a sequence of information into digital data, and it is an application of research on information hiding. In 1999, Fabien A. P. Petitcolas et al. published a classification of information hiding techniques [Fig 2.2], in which watermarking is the most representative technique in the branch of copyright marking. Fingerprinting also appears in this classification.

Figure 2.2: The classification of information hiding techniques published by Fabien A. P. Petitcolas et al. in 1999.

The framework of watermarking can be separated into two procedures in order of execution: embedding and extraction. Fig 2.3 and Fig 2.4 show the basic block diagrams of these two procedures; the diagrams may differ for different watermarking techniques.

2.1 Embedding

Assume we have an image S and want to embed a watermark pattern W in it. We usually do not insert W directly into S. Instead, a seed K1 and a pseudo-random number generator (PRNG) produce a sequence of random numbers, and an embedding algorithm uses this sequence to insert the watermark. The algorithm should be kept secret, and the information generated during this procedure is stored in the secret key K2. Some image file formats include header fields in which a watermark can be placed directly to declare ownership, but such a mark is easy to remove. A better watermarking technique inserts the watermark information into the pixel values themselves and can even make it invisible in the watermarked image.

2.2 Extraction

To check the watermark inside a watermarked image, we need K1, K2, sometimes the original image S, and the extraction algorithm (sometimes simply the reverse of the embedding algorithm).
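The two procedures can be sketched in Python as follows. This is only a minimal illustration, not the scheme of any particular paper: the least-significant-bit (LSB) substitution, the function names `embed` and `extract`, and the choice to store the selected pixel positions as K2 are all assumptions made for the example.

```python
import random

def embed(image, watermark_bits, k1):
    """Hide watermark bits in the LSBs of pixels chosen by a PRNG seeded with k1.

    image is a list of rows of 8-bit gray levels.
    Returns the watermarked image Sw and the side information K2.
    """
    h, w = len(image), len(image[0])
    rng = random.Random(k1)                       # PRNG seeded with K1
    positions = rng.sample(range(h * w), len(watermark_bits))
    marked = [row[:] for row in image]            # leave the original S untouched
    for bit, pos in zip(watermark_bits, positions):
        i, j = divmod(pos, w)
        marked[i][j] = (marked[i][j] & ~1) | bit  # overwrite the least significant bit
    k2 = positions                                # assumed role of K2: the chosen positions
    return marked, k2

def extract(marked, k2):
    """Read the watermark bits back from the positions stored in K2."""
    w = len(marked[0])
    return [marked[pos // w][pos % w] & 1 for pos in k2]

# Toy 8x8 gray-level image S and an 8-bit watermark W (made-up values).
S = [[(37 * i + 11 * j) % 256 for j in range(8)] for i in range(8)]
W = [1, 0, 1, 1, 0, 0, 1, 0]
Sw, K2 = embed(S, W, k1=2024)
recovered = extract(Sw, K2)
max_change = max(abs(a - b) for ra, rb in zip(S, Sw) for a, b in zip(ra, rb))
```

Because K2 in this sketch already records the positions, extraction does not need K1 again; a scheme that stores less in K2 would regenerate the positions from K1. Each pixel changes by at most one gray level, which is why such a mark is invisible but also fragile.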

[Block diagram: the digital signal S, the watermark W, and the seed K1 enter the “Transform & Embedding” block, which outputs the watermarked signal Sw and the secret key K2.]

Figure 2.3: The embedding procedure of watermarking.

[Block diagram: the watermarked signal Sw, the secret key K2, and the seed K1 enter the “Transform & Extraction” block, which outputs the watermark W.]

Figure 2.4: The extraction procedure of watermarking.

2.3 Classification of digital watermarking technique

Watermarks can be classified in four general ways, each based on a different feature.

2.3.1 Based on extraction algorithm:

Public watermark: Also called a blind watermark. Extracting the watermark requires only the seed key.

Private watermark: Extraction requires both the seed key and the original image.

Semi-private watermark: Also called a semi-blind watermark. Extraction does not require the original image, but it does require the original watermark and the seed key.

2.3.2 Based on human vision:

Visible: A visible watermark can easily be seen in the image (no transformation or hiding is applied), and it is widely used on paper documents [Fig 2.5]. Its purpose is to claim ownership directly, and because no embedding algorithm is needed, it is easier to apply than an invisible watermark. The disadvantages are that a visible watermark reduces the quality of the image and is easy to erase or destroy, and that an illegal logo can even be inserted in its place.

Invisible: An invisible watermarking technique uses the random numbers or sequence produced by the seed and the PRNG, together with an embedding algorithm, to hide the watermark inside the original image. The pixel values of the original image are slightly changed, but the change is hardly noticeable to human vision. A critical difference between visible and invisible watermarks concerns the integrity of the watermark pattern: in an invisible watermark the pattern is transformed or distributed throughout the original image, while in a visible watermark the pattern is usually inserted directly.

Figure 2.5: Two examples of watermarks. The left-hand side is the watermark used in master’s theses at NTU. The right-hand side is a picture from the Major League Baseball website (www.mlb.com).

2.3.3 Based on the way of embedding:

Space-domain embedding: The technique directly adjusts the pixel values to embed the watermark. Its advantages are easy execution (no space-frequency transformation is needed) and a larger capacity for hidden data. Its disadvantage is that the watermark is less robust.

Frequency-domain embedding: This technique first transforms the original image from the space domain into the 2-D frequency domain (a time-consuming step) using the discrete Fourier transform (DFT), the discrete cosine transform (DCT), or the discrete wavelet transform (DWT). The embedding algorithm then inserts the watermark into the frequency-domain coefficients, and finally the result is transformed back to the space domain to give the watermarked image.

The advantages and disadvantages of frequency-domain embedding mirror those of space-domain embedding. In the space domain every pixel carries equal weight for human vision. In the frequency domain the coefficients carry different weights, because human eyes are more sensitive to low frequencies than to high frequencies and because the DFT and DCT concentrate most of the energy at low frequencies.

The frequency domain can hold less hidden information because a change to any single coefficient in the 2-D frequency domain affects all the pixel values in the space domain. On the other hand, this means the effect of the watermark is spread over the whole space-domain image, which makes the watermark robust against attacks and distortion.
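To make this last point concrete, the following Python sketch changes a single DCT coefficient of a small block and counts how many space-domain pixels are affected. It is an illustration only: the orthonormal 2-D DCT is written out from its textbook definition, and the block size of 4, the pixel values, the modified coefficient (1, 2), and the helper names `dct2` and `idct2` are assumptions chosen for the example.

```python
import math

N = 4  # block size (assumed for the example)

def alpha(k):
    # Normalization that makes the DCT-II basis orthonormal.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def basis(k, n):
    # Value of the k-th 1-D DCT basis function at sample n.
    return alpha(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))

def dct2(block):
    # Forward 2-D DCT: coef[u][v] from pixels block[x][y].
    return [[sum(block[x][y] * basis(u, x) * basis(v, y)
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coef):
    # Inverse 2-D DCT: pixels[x][y] from coefficients coef[u][v].
    return [[sum(coef[u][v] * basis(u, x) * basis(v, y)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

block = [[52, 55, 61, 66],
         [70, 61, 64, 73],
         [63, 59, 55, 90],
         [67, 61, 68, 104]]

coef = dct2(block)
coef[1][2] += 8.0          # "embed" by nudging one mid-frequency coefficient
marked = idct2(coef)

# Count the space-domain pixels that differ from the original block.
changed = sum(abs(marked[x][y] - block[x][y]) > 1e-9
              for x in range(N) for y in range(N))
```

With these values every one of the 16 pixels changes, although only one coefficient was touched. This is the spreading effect described above: it costs capacity, but an attacker cannot remove the mark by editing a few pixels.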

2.3.4 Based on robustness:

This classification is based not only on the robustness of the watermarking technique but also on its purpose.

Robust: The watermark pattern can still be identified after various kinds of image processing (such as blurring, sharpening, zooming, rotation, cropping, and file format conversion). Robust watermarks are used for ownership identification.

Fragile: The watermark is sensitive to image processing, so it can be used to recognize whether the original image has been changed or attacked. It is usually applied to data integrity.

Semi-fragile: Also used for data integrity. The watermark is sensitive to attacks but robust to general image processing such as zooming and file format conversion, and some schemes can even correct the changes made to the image.

The last two groups of techniques are called image authentication because of their application.

2.4 Standard of performance estimation

Besides subjective human vision, we need objective standards to evaluate the robustness of watermarks. Four standards have been proposed to measure the performance of watermarking techniques.

2.4.1 Peak signal-to-noise ratio (PSNR): This is the most frequently used standard today, especially for gray-level images. The formula is shown in (2-1).

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \tag{2-1}$$

$$\mathrm{MSE} = \frac{1}{h \times w} \sum_{i=1}^{h} \sum_{j=1}^{w} \left( x_{ij} - \hat{x}_{ij} \right)^2 \tag{2-2}$$

The MSE is the mean square error between two patterns of size h × w: $\hat{x}_{ij}$ is the pixel value at (i, j) of the extracted watermark, while $x_{ij}$ is that of the original watermark. Over 80% of related studies use this standard, although PSNR sometimes fails to describe accurately how human vision perceives the difference between images.
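Equations (2-1) and (2-2) translate into a few lines of Python. The sketch below assumes 8-bit gray-level patterns stored as lists of rows; the function names and the toy pixel values are made up for illustration.

```python
import math

def mse(original, extracted):
    """Mean square error between two equally sized gray-level patterns (2-2)."""
    h, w = len(original), len(original[0])
    total = sum((original[i][j] - extracted[i][j]) ** 2
                for i in range(h) for j in range(w))
    return total / (h * w)

def psnr(original, extracted, peak=255):
    """Peak signal-to-noise ratio in dB (2-1); infinite for identical patterns."""
    err = mse(original, extracted)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)

a = [[100, 110], [120, 130]]
b = [[101, 109], [121, 129]]   # every pixel is off by one gray level, so MSE = 1
value = psnr(a, b)             # about 48.13 dB
```

An error of one gray level on every pixel already gives a finite PSNR of roughly 48 dB, while identical patterns give an infinite value, so in practice PSNR is reported only for patterns that actually differ.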

2.4.2 Normalized correlation (NC)

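NC is widely used when the embedded digital watermark is a binary image (black means 1 and white means 0). The formula is shown in (2-3); a higher NC means better performance (NC ∈ [0, 1]).

$$\mathrm{NC} = \frac{\sum_{i=1}^{h} \sum_{j=1}^{w} \left( w_{ij} \times \hat{w}_{ij} \right)}{\sum_{i=1}^{h} \sum_{j=1}^{w} \left( w_{ij} \times w_{ij} \right)} \tag{2-3}$$

Here $w_{ij}$ is a pixel of the original watermark and $\hat{w}_{ij}$ is the corresponding pixel of the extracted watermark. This standard is misleading when the black points of the original watermark are preserved but additional black points appear in the extracted watermark, since NC does not penalize them; even NC = 0.8 can correspond to a strongly blurred watermark.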
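The weakness just described is easy to reproduce. The following Python sketch implements (2-3) for binary watermarks stored as lists of rows of 0/1 values; the function name and the tiny 2×2 watermarks are hypothetical, and the original watermark is assumed to contain at least one black pixel so that the denominator is not zero.

```python
def normalized_correlation(original, extracted):
    """NC of (2-3) for binary watermarks (1 = black, 0 = white)."""
    num = sum(o * e
              for row_o, row_e in zip(original, extracted)
              for o, e in zip(row_o, row_e))
    den = sum(o * o for row in original for o in row)
    return num / den

w = [[1, 0],
     [1, 1]]
extra_black = [[1, 1],      # one white pixel has turned black
               [1, 1]]
missing_black = [[1, 0],    # one black pixel has turned white
                 [0, 1]]

nc_same = normalized_correlation(w, w)
nc_extra = normalized_correlation(w, extra_black)
nc_missing = normalized_correlation(w, missing_black)
```

Here `nc_extra` equals 1.0 even though the extracted pattern differs from the original, whereas `nc_missing` drops to 2/3: NC only notices black points that disappear, not black points that are added.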

2.4.3 Tamper assessment function (TAF)

This method computes the exclusive-OR of the pixels at the same location in the original and extracted patterns (it is also used on binary watermarks). The formula is shown in (2-4); a lower TAF means better performance (TAF ∈ [0, 1]).

$$\mathrm{TAF} = \frac{1}{h \times w} \sum_{i=1}^{h} \sum_{j=1}^{w} \left( w_{ij} \oplus \hat{w}_{ij} \right) \tag{2-4}$$

This standard is misleading when every pixel value of the watermark is inverted (1 to 0, and 0 to 1): the TAF is then 1, yet people can easily recognize that the two watermarks show the same pattern.
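A short Python sketch of (2-4) shows the inverted-watermark case. The function name `taf` and the 2×3 example watermark are assumptions for illustration; watermarks are lists of rows of 0/1 values.

```python
def taf(original, extracted):
    """Tamper assessment function (2-4): fraction of pixels that differ."""
    h, w = len(original), len(original[0])
    differing = sum(o ^ e
                    for row_o, row_e in zip(original, extracted)
                    for o, e in zip(row_o, row_e))
    return differing / (h * w)

w0 = [[1, 0, 1],
      [0, 1, 0]]
inverted = [[1 - p for p in row] for row in w0]   # every pixel flipped

taf_same = taf(w0, w0)
taf_inverted = taf(w0, inverted)
```

The identical pattern scores 0.0 and the fully inverted one scores 1.0, the worst possible value, even though a human would immediately see the same shape in both.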

2.4.4 Similarity measure (Sim)

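The formula is shown in (2-5).

$$\mathrm{Sim} = \frac{\sum_{i=1}^{h} \sum_{j=1}^{w} w_{ij} \times \hat{w}_{ij}}{\sqrt{\sum_{i=1}^{h} \sum_{j=1}^{w} \hat{w}_{ij} \times \hat{w}_{ij}}} \tag{2-5}$$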

3. Comparison of watermarking & video fingerprinting

This section turns to the main topic of the report: the comparison of performance between digital watermarking and video fingerprinting. I’ll briefly introduce the kinds of attacks and image processing and then explain the core difference between these two copy detection methods.

You may ask why image watermarking is compared with video fingerprinting. The first reason is that the book I read covers image watermarking. The second is that almost all video fingerprinting techniques are based on image features (of the frames of a video), so starting from image-based properties gives a basic and compact explanation.

I’ll focus on robust watermarks, because the goal of this group is to detect copies or illegal ownership.

3.1 Types of attacking & image processing

A video or image may suffer from noise or data loss that degrades its quality. This kind of distortion can be treated as a type of attack, so here we focus only on the classification of attacks. Fig 3.1 and Fig 3.2 give examples of attacks on images.

3.1.1 Robustness attack: Various image processing techniques are used to change the original videos or images.

Noise attack: Gaussian noise, salt-and-pepper noise.

Filtering attack: blurring, sharpening.

Geometric attack: rotation, shifting, zooming (with or without a change of image size), cropping.

Lossy compression attack or file format conversion: JPEG, JPEG 2000.

Palette attack: gamma correction; changes of hue, saturation, or brightness; or even conversion of a color image into a gray-level or binary image.

Videos face one additional special case, frame drop, in which several frames of a video are lost.
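A few of these attacks are simple enough to sketch in Python, which can be handy for testing a detector. The code below is a rough illustration under stated assumptions: images are lists of rows of 8-bit gray levels, a video is a list of frames, gamma correction is taken to be the power law out = 255 · (in / 255)^gamma, and all function names and parameter values are made up for the example.

```python
import random

def salt_and_pepper(image, density, seed=0):
    """Noise attack: set a random fraction of pixels to pure black or white."""
    rng = random.Random(seed)
    noisy = [row[:] for row in image]
    for row in noisy:
        for j in range(len(row)):
            if rng.random() < density:
                row[j] = rng.choice((0, 255))
    return noisy

def gamma_correct(image, gamma):
    """Palette attack: power-law mapping of the gray levels (assumed convention)."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in image]

def drop_frames(frames, every):
    """Frame drop: remove every `every`-th frame of a video."""
    return [f for k, f in enumerate(frames, start=1) if k % every != 0]

image = [[100] * 4 for _ in range(3)]
noisy = salt_and_pepper(image, density=0.25, seed=1)
darker = gamma_correct(image, gamma=1.2)
kept = drop_frames(list(range(10)), every=5)   # frames are stand-in integers here
```

Geometric attacks and lossy compression need an image library and are left out of this sketch.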

3.1.2 Specific attack: The attack targets the particular algorithm used to embed the watermark or to compute the fingerprint, and the damage is usually severe.

3.1.3 Background changing: This is a severe form of image and video processing intended to make detection fail. The attacker keeps only the important part of the image or video and changes the rest, for example by replacing the background.

(a) Original image

(b) Blurring (5x5 window)

(c) AWGN (SNR = 24 dB)

(d) Gamma correction (+20%)

(e) JPEG (quality factor of 30)

(f) Rotation 180°

Figure 3.1: Examples of attacks.

3.2 The core difference between watermarking and video fingerprinting

The core difference is whether the original images and videos are changed. Most watermarking techniques change the pixel values of the image or of the frames in the video. Only a small number of them leave the pixel values unchanged and instead combine information from the image and the watermark to generate a secret key, as VQ-based (vector quantization) watermarking does. Video fingerprinting techniques, in contrast, never change the pixel values and never generate a secret key.

(a) Original image

(b) 1x1 dithering

(c) 4x4 dithering

(d) Error diffusion

Figure 3.2: Examples of attacks: an 8-bit gray-level image has been transformed into a binary image.

Watermarks must be inserted before the videos or images are propagated or sold, whereas “the video fingerprint itself is the watermark”, so nothing needs to be inserted. A watermark can be seen as a kind of noise added to the original image, and the objective of an attack is to destroy the watermark while keeping the quality of the image as high as possible, so the embedding algorithm is very important. If a watermarking method randomly chooses points inside the image and changes their pixel values, so that the watermark has little relation to the original image, the watermark is easily destroyed.

A better watermarking technique should have a higher correlation between the original image and the watermark pattern. The VQ-based method generates the secret key by changing one bit of the index of each randomly chosen block according to the pixel values of the watermark, while the frequency-based method inserts the pattern into the low-frequency part, which makes it robust against high-frequency noise and hard to destroy by space-domain attacks alone. To resist JPEG block artifacts or the zooming effect, watermarks that use the ordinal or average properties of image blocks have been created. Generally speaking, new watermarking techniques are created to resist various kinds of distortion and attack, and the more kinds a technique can resist, the better it is. Additional memory is needed to store the seed, the secret key, and the watermark, and visible watermarking must give legal customers or users the secret keys and algorithms needed to restore the watermarked image.

Video fingerprinting, on the other hand, uses properties of the video itself, such as the color histogram, block motion, or trajectories of points of interest, to build the signatures, which is why we say the video itself is the watermark. Because the signatures are based on the content of the video, a robust video fingerprinting algorithm is able to detect copies as long as the video is still acceptable to users (meaning that the quality is still adequate and the important information in the video is still present).

Among the kinds of attacks, noise, filtering, lossy compression or file format conversion, and palette attacks are the easier ones for both copy detection techniques to overcome. Using ordinal measures, using block averages, or using only the brightness image after color transformation are good strategies.
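As an illustration of why ordinal and block-average features tolerate such attacks, the following Python sketch computes an ordinal signature of a frame: the frame is split into blocks, each block is averaged, and only the rank order of the averages is kept. This is a simplified sketch, not the algorithm of a specific paper; the 2×2 block grid, the toy frame, and the function names are assumptions.

```python
def block_means(frame, rows, cols):
    """Average gray level of each block in a rows x cols grid (row-major order)."""
    h, w = len(frame), len(frame[0])
    bh, bw = h // rows, w // cols
    means = []
    for r in range(rows):
        for c in range(cols):
            pixels = [frame[i][j]
                      for i in range(r * bh, (r + 1) * bh)
                      for j in range(c * bw, (c + 1) * bw)]
            means.append(sum(pixels) / len(pixels))
    return means

def ordinal_signature(frame, rows=2, cols=2):
    """Rank of each block's mean (0 = darkest block)."""
    means = block_means(frame, rows, cols)
    order = sorted(range(len(means)), key=lambda k: means[k])
    ranks = [0] * len(means)
    for rank, k in enumerate(order):
        ranks[k] = rank
    return ranks

frame = [[10, 12, 200, 210],
         [14, 16, 220, 230],
         [90, 95, 50, 55],
         [99, 97, 60, 58]]

# A brightness and contrast change applied to every pixel (a palette-style attack).
altered = [[round(0.8 * p + 20) for p in row] for row in frame]

sig_original = ordinal_signature(frame)
sig_altered = ordinal_signature(altered)
```

Both signatures are `[0, 3, 2, 1]`: every pixel value changed, but the ordering of the block averages did not, so the altered frame is still recognized.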

Background changing and cropping to the important pattern (keeping only the important part, such as a person or a building) pose a strong challenge to video copy detection. Local descriptors in video fingerprinting can deal with this problem effectively by extracting signatures only from points of interest, but they require a good point-of-interest extraction algorithm. Watermarks can also be inserted at these key points, but without any random processing attackers can find the same key points and do serious damage to them.

Rotation, shifting, and zooming are also very damaging. Global descriptors in video fingerprinting have difficulty because the features are stored in the signature according to their location inside a frame. Complicated signature-extraction and voting algorithms are needed to handle these problems, and local descriptors are good solutions. For watermarks, the locations for insertion are chosen randomly; after any of these three distortions the same coordinates will be selected at extraction, but the pixels there now represent different information in the image. An embedding algorithm that resists rotation, shifting, and zooming still needs to be found.

Three basic advantages, in terms of robustness against attacks and of practical execution, make video fingerprinting dominant in video copy detection.

3.2.1 Robustness: The basic concept of video fingerprinting is to find the characteristic features of videos (or frames). Researchers try to find pertinent and compact descriptions that can determine whether two videos (one in the owner’s database and one from the Internet) are the same even if one of them has undergone noise or an attack. Watermarking, in contrast, needs a delicate embedding algorithm to generate strong correlations between the watermark and the original data.

3.2.2 Signature extraction after processing: Consider how video fingerprinting techniques are executed. The owners’ original data are collected in a database, and the video fingerprint of each video is calculated and also stored in a database (a fingerprint database for a specific algorithm; nothing is stored in the videos themselves). To test videos from the Internet or from a suspected illegal owner, we simply use the corresponding algorithm to extract their signatures and then apply a delicate voting algorithm to determine whether they are copies. Watermarking, in contrast, inserts a pattern into the original data and stores the seeds, keys, and specific watermarks in the database. When testing, the corresponding algorithm extracts the watermarks from the suspect data, and we check whether they are the same as those the legal owners put inside.
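The fingerprint workflow can be sketched in Python as a lookup against a signature database. This is a deliberately simplified stand-in: instead of a voting algorithm it uses a nearest-neighbour search with an L1 distance and a fixed threshold, and the video ids, signature values, and threshold are all hypothetical.

```python
def distance(sig_a, sig_b):
    """L1 distance between two signatures of equal length."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))

def find_copy(query_sig, database, threshold):
    """Return the id of the closest stored video, or None if none is close enough."""
    best_id, best_dist = None, None
    for video_id, sig in database.items():
        d = distance(query_sig, sig)
        if best_dist is None or d < best_dist:
            best_id, best_dist = video_id, d
    if best_dist is not None and best_dist <= threshold:
        return best_id
    return None

# Fingerprints computed in advance from the owners' videos (made-up values).
database = {
    "video_a": [0, 3, 2, 1, 0, 3, 2, 1],
    "video_b": [3, 0, 1, 2, 3, 1, 0, 2],
}

suspect = [0, 3, 2, 1, 0, 2, 3, 1]     # video_a after a mild distortion
unrelated = [1, 0, 3, 2, 2, 0, 1, 3]   # a different video

match = find_copy(suspect, database, threshold=4)
no_match = find_copy(unrelated, database, threshold=4)
```

Nothing in this procedure touches the videos themselves: the database can be rebuilt with a different signature algorithm at any time, including after the videos have been distributed.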

Video fingerprinting has a strong advantage: several algorithms can be used at the same time. Several watermarks can be put into an image by different algorithms to enhance robustness, but this work must be done before the images are propagated; video fingerprinting has no such restriction. The tester can use any algorithm he or she wants on the test video, as long as the corresponding video fingerprints are generated for all the videos in the database, and this work can be done after the video has been propagated. A tester can use different algorithms with different characteristics (some are fast but inaccurate, and some are robust but slow) to perform a progressive test, and use the best one for the specific distortion case.

Finally, an attacker can target a specific embedding algorithm and cause serious damage to the watermarks, whereas video fingerprinting does not have this worry because nothing has been inserted.

3.2.3 Fast searching: The last advantage concerns searching for copies on the Internet. There are tons of videos and images on the Internet, and we cannot check one by one whether each is likely to be a copy. The same pattern, the same seed, and the same algorithm may be used on many different data, so extracting the pattern does not directly tell us whether the detected item is the same as (perhaps after some processing) a particular one in the database.

Video fingerprinting, in contrast, can be used to extract signatures from data online, and a detected match has a higher probability of being a copy, because the signature is a characteristic representation of a video and it is extremely unlikely for different videos to have the same signature.

4. Conclusion

Although good embedding and extraction algorithms may be found to improve the performance of watermarking, such algorithms are rather complicated, because they have to consider suitable locations inside the original image, ways of combining the watermark with the original image, and techniques from cryptography and coding theory. Video fingerprinting is only concerned with how to give compact and pertinent descriptions of videos, and parts of the work can be done by computer vision and machine learning methods, which makes it easier.

In many respects, video fingerprinting seems to outperform watermarking in video copy detection. But with different purposes, different kinds of attacks, different data types, and even different intellectual property laws, there are many uncertainties and variations in research on copyright and security management. So I cannot assert that video fingerprinting is the best technique for video copy detection, but it is well worth researching and still has the potential to improve.

References

[1] Papers listed in the report “Introduction to Video Fingerprinting”.

[2] Z.X. Pan, Z.C. Zhang, Y.Z. Lin. A Challenge to Image Processing - Digital Watermarking Techniques. McGraw-Hill, 2007.
