Comparison of Video Copy Detection Techniques:
Robustness against Distortion and Attacks
Wei-Lun Chao
Graduate Institute of Communication Engineering, NTU
Abstract
This report introduces two copy detection techniques: the well-known
“watermarking” and the newer “video fingerprinting”. Research on copyright
management and information security is driven by distortions and attacks: whenever a
new attack is devised, researchers look for pertinent and robust methods to counter
it. I focus on common cases where these two techniques are used for copy detection,
compare them, and finally explain why video fingerprinting has drawn so much
attention recently.
1. Introduction
With the tremendous development of technology, the Internet has become a fast and
dominant medium for propagating information, and more and more information is now
available in digital form, such as MP3s, JPEG-2000 photos, MPEG-4 videos, and
e-books. These changes make life easier and richer, but they also bring a
troublesome issue: the management of intellectual property rights (IPRs). Digital
watermarking is one of the most popular techniques for this issue, while video
fingerprinting is gradually becoming important because of one attractive property:
the media itself is the watermark. The main concept of video fingerprinting has been
presented in the report “Introduction to Video Fingerprinting”. In this report I
briefly introduce digital watermarking (especially as applied to images) and compare
the two techniques.
Section 2 introduces digital watermarking, Section 3 compares the performance of
digital watermarking and video fingerprinting, and Section 4 gives a brief
conclusion.
2. Introduction to digital watermarking
In this section I focus on the application of watermarking to images. What is
watermarking? An everyday example is the paper banknote. The 1000-NT-dollar
banknote contains a chrysanthemum pattern that cannot be seen with the naked eye
under normal conditions, but when the note is held up to light the pattern appears
on the back side [Fig 2.1]. The goal of this watermark is to prevent illegal copying
of the banknote, and the technique for inserting the pattern must be kept strictly
secret.
Figure 2.1: The watermark of 1000-NT-dollar paper banknote.
Digital watermarking means embedding a digital pattern or a sequence of information
into digital data; it is an application of research on information hiding. In 1999,
Fabien A. P. Petitcolas et al. published a classification of information hiding
techniques [Fig 2.2], in which watermarking is the most representative technique in
the copyright marking branch. Fingerprinting also appears in this classification.
Figure 2.2: The classification of information hiding techniques published by Fabien A. P. Petitcolas et al. in 1999.
The watermarking framework can be separated into two procedures in order of
execution: embedding and extraction. Fig 2.3 and Fig 2.4 show the basic block
diagrams of these two procedures; the diagrams may differ for different
watermarking techniques.
2.1 Embedding
Assume we have an image S and want to embed a watermark pattern W in it. We usually
do not insert W directly into S; instead we use a seed K1 and a pseudo-random
number generator (PRNG) to generate random numbers or a random sequence, and then
insert the watermark through an embedding algorithm. The algorithm should be kept
secret, and the information generated in this procedure is stored in the secret
key K2. Some image file formats include header fields in which a watermark can be
placed directly to declare ownership, but such a mark is easy to remove. A better
watermarking technique inserts the watermark information into the pixel values and
can even make it invisible in the watermarked image.
2.2 Extraction
When we want to check the watermark inside a watermarked image, we need K1, K2, in
some schemes the original image S, and the extraction algorithm (sometimes just the
reverse of the embedding algorithm).
Figure 2.3: The embedding procedure of watermarking. (Block diagram: the digital signal S, the watermark W and the seed K1 enter the “Transform & Embedding” block, which outputs the watermarked signal Sw and the secret key K2.)
Figure 2.4: The extraction procedure of watermarking. (Block diagram: the watermarked signal Sw, the secret key K2 and the seed K1 enter the “Transform & Extraction” block, which outputs the watermark W.)
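The embedding and extraction procedures of Sections 2.1 and 2.2 can be sketched in a
few lines of Python. This is only a minimal illustration under stated assumptions,
not a method taken from the literature cited here: it assumes a simple space-domain
scheme that overwrites the least significant bit (LSB) of pixels chosen by a PRNG
seeded with K1, and the names embed_lsb and extract_lsb are hypothetical. No secret
key K2 or original image is needed in this toy scheme.

```python
import random

def embed_lsb(pixels, bits, seed_k1):
    """Return a copy of `pixels` with `bits` written into PRNG-chosen LSBs."""
    rng = random.Random(seed_k1)                  # PRNG seeded with K1
    positions = rng.sample(range(len(pixels)), len(bits))
    marked = list(pixels)
    for pos, bit in zip(positions, bits):
        marked[pos] = (marked[pos] & ~1) | bit    # overwrite only the LSB
    return marked

def extract_lsb(marked, n_bits, seed_k1):
    """Read the watermark bits back from the same PRNG-chosen positions."""
    rng = random.Random(seed_k1)                  # same seed gives same positions
    positions = rng.sample(range(len(marked)), n_bits)
    return [marked[pos] & 1 for pos in positions]

# Stand-in for a flattened 8x8 gray-level image S (values are arbitrary).
image = [(37 * i) % 256 for i in range(64)]
watermark = [1, 0, 1, 1, 0, 0, 1, 0]              # watermark W as a bit sequence

marked = embed_lsb(image, watermark, seed_k1=2024)
recovered = extract_lsb(marked, len(watermark), seed_k1=2024)
```

Each pixel changes by at most one gray level, so the mark is invisible, but any
processing that disturbs the LSBs destroys it, which is why such a scheme is not
robust.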
2.3 Classification of digital watermarking technique
Watermarks are generally classified in four ways, based on different features.
2.3.1 Based on extraction algorithm:
Public watermark: Also called blind watermark. Extracting the watermark
requires only the seed key.
Private watermark: Extraction requires not only the seed key but also the
original image.
Semi-private watermark: Also called semi-blind watermark. Extraction does not
require the original image, but it does require the original watermark and
the seed key.
2.3.2 Based on human vision:
Visible: A visible watermark can easily be seen in the image (it is neither
transformed nor hidden) and is widely used on paper documents [Fig 2.5]. Its
purpose is to claim ownership directly, and because no embedding algorithm is
necessary, it is easier to apply than an invisible watermark. The disadvantages
are that a visible watermark reduces the quality of the image and is easy to
erase or destroy; an attacker can even insert an illegal logo.
Invisible: An invisible watermarking technique uses the random numbers or
sequence produced by the seed and the PRNG, together with an embedding
algorithm, to hide the watermark inside the original image. The pixel values of
the original image are in fact slightly changed, but the change is hardly
noticeable to human vision. A critical difference between visible and invisible
watermarks concerns the integrity of the watermark pattern: in an invisible
watermark the pattern is transformed or distributed throughout the original
image, while in a visible watermark the pattern is usually inserted directly.
Figure 2.5: Two examples of watermarks. The left-hand side is the watermark used in master’s theses at NTU. The right-hand
side is a picture from the Major League Baseball website (www.mlb.com).
2.3.3 Based on the way of embedding:
Space-domain embedding: This technique simply adjusts the pixel values to embed
the watermark. Its advantages are easy execution (no space-frequency
transformation) and a larger capacity for hidden data. Its disadvantage is that
the watermark is less robust.
Frequency-domain embedding: This technique first transforms the original
(space-domain) image into the 2-D frequency domain, which is time consuming,
using the discrete Fourier transform (DFT), the discrete cosine transform (DCT)
or the discrete wavelet transform (DWT). The embedding algorithm then inserts
the watermark into the frequency-domain coefficients, and finally the result is
transformed back to the space domain to obtain the watermarked image.
The advantages and disadvantages of frequency-domain embedding mirror those of
space-domain embedding. In the space domain, every pixel carries equal weight
for human vision, while in the frequency domain the coefficients carry
different weights, because human eyes are more sensitive to low frequencies
than to high frequencies and because the DFT and DCT concentrate energy at low
frequencies.
Frequency-domain embedding can hide less information because a change to a
single coefficient in the 2-D frequency domain affects all pixel values in the
space domain. On the other hand, this means the effect of the watermark is
spread over the whole space-domain image, which makes the watermark robust
against attacks and distortion.
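The claim that one frequency-domain coefficient touches every pixel can be checked
numerically. The sketch below is an illustration only: it assumes an 8x8 block, an
orthonormal 2-D DCT written out naively in pure Python, and an arbitrary embedding
strength of 4.0 added to a single low-frequency coefficient. The helper names and
the choice of coefficient (1, 2) are hypothetical, not taken from any particular
watermarking scheme.

```python
import math

N = 8  # block size

def _scale(k):
    # Normalisation factors of the orthonormal DCT-II.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def _basis(x, u):
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))

def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block (naive O(N^4) version)."""
    return [[_scale(u) * _scale(v) *
             sum(block[x][y] * _basis(x, u) * _basis(y, v)
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse of dct2."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] *
                 _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

# Arbitrary test block standing in for part of a gray-level image.
block = [[16 * x + 3 * y for y in range(N)] for x in range(N)]

coeffs = dct2(block)
coeffs[1][2] += 4.0        # "embed" by nudging one low-frequency coefficient
marked = idct2(coeffs)

diffs = [abs(marked[x][y] - block[x][y]) for x in range(N) for y in range(N)]
changed_pixels = sum(d > 1e-9 for d in diffs)   # how many pixels moved at all
largest_change = max(diffs)                     # stays below one gray level here
```

All 64 pixels change, each by less than one gray level. A real scheme would have to
choose the strength so that the mark survives rounding back to integer pixel values.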
2.3.4 Based on robustness:
This classification is based not only on the robustness of the watermarking
techniques but also on their purpose.
Robust: After various kinds of image processing (such as blurring, sharpening,
zooming, rotation, cropping, and file format conversion), the watermark pattern can
still be identified. Robust watermarks are used for ownership identification.
Fragile: The watermark is sensitive to image processing and is used to recognize
whether the original image has been changed or attacked. It is usually applied to
data integrity verification.
Semi-fragile: Also used for data integrity. It is sensitive to attacks but robust
to general image processing such as zooming and file format conversion, and some
schemes can even correct the changes made to the image.
The last two groups of techniques are called image authentication because of their
application.
2.4 Standard of performance estimation
Besides subjective human vision, we need objective standards to estimate the
robustness of watermarks. Four standards have been proposed to measure the
performance of watermarking techniques.
2.4.1 Peak signal to noise ratio (PSNR): It is the most frequently used standard
today, especially for gray-level images. The formula is shown in (2-1), with the
MSE defined in (2-2).
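$$ \mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \tag{2-1} $$
$$ \mathrm{MSE} = \frac{1}{h \cdot w} \sum_{i=1}^{h} \sum_{j=1}^{w} \left( x_{ij} - \hat{x}_{ij} \right)^2 \tag{2-2} $$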
The MSE is the mean square error between two patterns of height h and width w:
$\hat{x}_{ij}$ is the pixel value at (i, j) of the extracted watermark and
$x_{ij}$ is that of the original watermark. Over 80% of related research uses
this standard, although PSNR sometimes fails to describe accurately how human
vision perceives the difference between images.
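As a worked instance of (2-1) and (2-2), the following sketch computes the MSE and
PSNR of two small gray-level patterns stored as nested lists. The function names
and the sample values are illustrative assumptions; the peak value of 255
corresponds to 8-bit images.

```python
import math

def mse(original, extracted):
    """Mean square error between two equally sized gray-level patterns."""
    h, w = len(original), len(original[0])
    total = sum((original[i][j] - extracted[i][j]) ** 2
                for i in range(h) for j in range(w))
    return total / (h * w)

def psnr(original, extracted, peak=255):
    """Peak signal to noise ratio in dB; infinite for identical patterns."""
    error = mse(original, extracted)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)

original = [[100, 120], [140, 160]]
extracted = [[101, 119], [141, 159]]    # every pixel is off by one gray level

value = psnr(original, extracted)       # MSE = 1, so PSNR = 10*log10(255^2)
```

With an MSE of 1 the PSNR is about 48.13 dB; identical patterns give an infinite
PSNR because the MSE is zero.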
2.4.2 Normalized correlation (NC)
NC is widely used when the embedded digital watermark is a binary image (black
means 1 and white means 0). The formula is shown in (2-3); a higher NC means
better performance (NC ∈ [0, 1]).
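$$ \mathrm{NC} = \frac{\sum_{i=1}^{h} \sum_{j=1}^{w} \left( w_{ij} \cdot \hat{w}_{ij} \right)}{\sum_{i=1}^{h} \sum_{j=1}^{w} \left( w_{ij} \cdot w_{ij} \right)} \tag{2-3} $$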
This standard is misleading when all pixels of the original watermark are
preserved but additional black points enter the extracted watermark: NC stays
high although the extracted pattern is corrupted. Even NC = 0.8 can correspond
to a strongly blurred watermark.
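The weakness described above is easy to reproduce. The sketch below implements
(2-3) for binary watermarks stored as nested lists of 0 and 1; the function name
and the 3x3 patterns are illustrative assumptions, and the original watermark is
assumed to contain at least one black pixel so that the denominator is not zero.

```python
def normalized_correlation(original, extracted):
    """NC of (2-3) for binary watermarks (1 = black, 0 = white)."""
    numerator = sum(w * e
                    for row_w, row_e in zip(original, extracted)
                    for w, e in zip(row_w, row_e))
    denominator = sum(w * w for row in original for w in row)
    return numerator / denominator

original = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]

# All three original black pixels survive, but four spurious ones are added.
noisy = [[1, 1, 0],
         [1, 1, 1],
         [0, 1, 1]]

nc = normalized_correlation(original, noisy)
```

Here nc is 1.0 even though more than half of the black pixels in the extracted
pattern are spurious, because NC only counts the positions where the original
watermark is black.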
2.4.3 Tamper assessment function (TAF)
This method computes the exclusive-OR of the pixels at the same location in
the original and extracted patterns (it is also used on binary watermarks).
The formula is shown in (2-4); a lower TAF means better performance (TAF ∈ [0, 1]).
$$ \mathrm{TAF} = \frac{\sum_{i=1}^{h} \sum_{j=1}^{w} \left( w_{ij} \oplus \hat{w}_{ij} \right)}{h \cdot w} \tag{2-4} $$
This standard fails when every pixel value of the watermark is inverted (1 to
0 and 0 to 1): the TAF is 1, yet people can easily recognize that the two
watermarks show the same pattern.
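A short sketch of (2-4) makes the inversion case concrete. As before, the
watermarks are binary nested lists, and the function name and sample pattern are
illustrative assumptions.

```python
def tamper_assessment(original, extracted):
    """TAF of (2-4): fraction of positions where the two binary patterns differ."""
    h, w = len(original), len(original[0])
    differing = sum(original[i][j] ^ extracted[i][j]
                    for i in range(h) for j in range(w))
    return differing / (h * w)

original = [[1, 0, 1],
            [0, 1, 0]]
inverted = [[1 - p for p in row] for row in original]   # swap black and white

taf_same = tamper_assessment(original, original)
taf_inverted = tamper_assessment(original, inverted)
```

The identical pattern gives a TAF of 0.0 and the inverted pattern gives the worst
possible value of 1.0, although a human viewer would still recognize the same
shape.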
2.4.4 Similarity measure (Sim)
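The formula is shown in (2-5).
$$ \mathrm{Sim} = \frac{\sum_{i=1}^{h} \sum_{j=1}^{w} w_{ij} \cdot \hat{w}_{ij}}{\sqrt{\sum_{i=1}^{h} \sum_{j=1}^{w} \hat{w}_{ij} \cdot \hat{w}_{ij}}} \tag{2-5} $$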
3. Comparison of watermarking & video fingerprinting
This section turns to the main topic of the report: comparing the performance of
digital watermarking and video fingerprinting. I briefly introduce the kinds of
attacks and image processing, then explain the core difference between these two
copy detection methods.
You may ask why image watermarking is compared with video fingerprinting. The first
reason is that the book I read is about image watermarking, and the second reason is
that almost all video fingerprinting techniques are based on image features (of
video frames), so starting with image-based properties gives a basic and compact
explanation.
I focus on robust watermarks because the goal of this group is to detect copies
or illegal ownership.
3.1 Types of attacking & image processing
Videos and images may also suffer from noise or data loss that degrades their
quality; this kind of distortion can be treated as a type of attack, so here we
focus only on the classification of attacks. Fig 3.1 and Fig 3.2 give examples of
attacks on images.
3.1.1 Robustness attack: Using various image processing techniques to change the
original videos or images.
Noising attack: Gaussian noise, salt-and-pepper noise.
Filtering attack: blurring, sharpening.
Geometric attack: rotation, shifting, zooming (with or without a change of image
size), cropping.
Lossy compression attack or file format conversion: JPEG, JPEG2000.
Palette attack: gamma correction; changes of hue, saturation or brightness; or
even converting a color image into a gray-level or binary image.
Videos face one special case, the frame drop, in which several frames of a video
are lost.
3.1.2 Specific attack: An attack aimed at the particular algorithm used for
embedding watermarks or calculating the fingerprint; the damage is usually severe.
3.1.3 Background changing: This is severe image or video processing intended to
cause a negative (missed) detection. The attacker keeps only the important part of
the image or video and changes the rest, for example by replacing the background.
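Several of the robustness attacks listed above are simple enough to simulate when
testing a detector. The sketch below shows three of them on gray-level frames stored
as nested lists: salt-and-pepper noise, gamma correction, and frame drop. The
function names, the parameter conventions (for example, a gamma below 1 brightens
the image) and the sample data are assumptions made for illustration.

```python
import random

def salt_and_pepper(image, density, seed=0):
    """Noising attack: set roughly a fraction `density` of pixels to 0 or 255."""
    rng = random.Random(seed)
    return [[rng.choice((0, 255)) if rng.random() < density else p
             for p in row] for row in image]

def gamma_correct(image, gamma):
    """Palette attack: apply out = 255 * (in / 255) ** gamma to each pixel."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in image]

def drop_frames(frames, dropped_indices):
    """Frame drop: remove the frames whose indices are in `dropped_indices`."""
    return [f for i, f in enumerate(frames) if i not in dropped_indices]

image = [[0, 64, 128],
         [192, 255, 32]]

noisy = salt_and_pepper(image, density=0.3, seed=1)
brighter = gamma_correct(image, gamma=0.8)
shorter = drop_frames(["f0", "f1", "f2", "f3", "f4"], {1, 3})
```

Running a copy detector on the outputs of such functions gives a quick, repeatable
way to compare robustness across attack types.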
Figure 3.1: Examples of attacks: (a) original image, (b) blurring (5x5 window), (c) AWGN (SNR = 24 dB), (d) gamma correction (+20%), (e) JPEG (quality factor of 30), (f) rotation by 180°.
3.2 The core difference between watermarking and video fingerprinting
The core difference is whether the original images and videos are changed. Most
watermarking techniques change the pixel values of the image or of the video frames.
Only a small number leave the pixel values unchanged and instead combine information
from the image and the watermark to generate a secret key, as in VQ-based
watermarking. Video fingerprinting techniques, by contrast, never change the pixel
values and never generate a secret key.
Figure 3.2: Examples of attacks in which an 8-bit gray-level image has been transformed into a binary image: (a) original image, (b) 1x1 dithering, (c) 4x4 dithering, (d) error diffusion.
Watermarks must be inserted before the videos or images are propagated or sold,
whereas “the video fingerprint itself is the watermark”, so nothing needs to be
inserted. A watermark can be seen as a kind of noise added to the original image,
and the objective of an attack is to destroy the watermark while keeping the quality
of the image as high as possible, so the embedding algorithm is very important. If
a watermarking method randomly chooses points inside the image and changes their
pixel values, so that the watermark has little relation to the original image, the
watermark is easily destroyed.
A better watermarking technique should have a higher correlation between the
original image and the watermark pattern. The VQ-based method changes one bit of
the index of a randomly chosen block according to the pixel values of the watermark,
and the result serves as the secret key; the frequency-based method inserts the
pattern into the low-frequency part, which makes it robust against high-frequency
noise and hard to destroy by space-domain attacks alone. To resist JPEG block
artifacts or the zooming effect, watermarks that use the ordinal or average
properties of image blocks have been created. Generally speaking, new watermarking
techniques are created to resist various kinds of distortion and attack, and the
more kinds a technique can resist, the better it is. Additional storage is needed
for the seed, the secret key and the watermark, and a visible watermarking scheme
has to give legal customers or users the secret keys and algorithms needed to
restore the watermarked image.
Video fingerprinting, on the other hand, uses properties of the video itself to
build the signatures, such as color histograms, block motion, and trajectories of
points of interest; that is why we say the video itself is the watermark. Because
the signatures are based on the content of the video, as long as the video is still
acceptable to users (the quality is still adequate and the important information in
the video still exists), a robust video fingerprinting algorithm will be able to
detect copies.
Among the kinds of attacks, noise, filtering, lossy compression or file format
conversion, and palette attacks are the easier ones for both copy detection
techniques to overcome. Using ordinal measures, block averaging, or only the
brightness (luminance) image after color transformation are good strategies.
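To see why ordinal measures and block averaging help, consider a signature made of
the rank order of block mean intensities. The sketch below is a simplified
illustration, not a specific published fingerprinting algorithm: it assumes a
gray-level frame stored as nested lists, a 2x2 block grid, and a brightness attack
that scales every pixel by 1.2. All names and values are hypothetical.

```python
def block_means(frame, rows, cols):
    """Average intensity of each block, listed in raster order."""
    h, w = len(frame), len(frame[0])
    bh, bw = h // rows, w // cols
    means = []
    for r in range(rows):
        for c in range(cols):
            block = [frame[y][x]
                     for y in range(r * bh, (r + 1) * bh)
                     for x in range(c * bw, (c + 1) * bw)]
            means.append(sum(block) / len(block))
    return means

def ordinal_signature(frame, rows=2, cols=2):
    """Rank of each block mean (0 = darkest block)."""
    means = block_means(frame, rows, cols)
    order = sorted(range(len(means)), key=lambda i: means[i])
    ranks = [0] * len(means)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks

frame = [[10, 10, 200, 200],
         [10, 10, 200, 200],
         [90, 90, 50, 50],
         [90, 90, 50, 50]]

# Palette attack: brighten every pixel by 20%, clipped to the 8-bit range.
brighter = [[min(255, int(p * 1.2)) for p in row] for row in frame]

sig_original = ordinal_signature(frame)
sig_brighter = ordinal_signature(brighter)
```

Both signatures are [0, 3, 2, 1]: the brightness change alters every pixel value but
not the ordering of the block means, so the ordinal signature is unaffected.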
Background changing and cropping to the important pattern (keeping only the
important part, such as a person or a building) pose a strong challenge to video
copy detection. Local descriptors in video fingerprinting can deal with this
problem effectively by extracting signatures only from points of interest, but they
still need a good point-of-interest extraction algorithm. Watermarks can also be
inserted at these key points, but without any random processing, attackers can find
the same key points and do severe damage to them.
Rotation, shifting and zooming are also very damaging. Global descriptors in video
fingerprinting have difficulties because the features are stored in the signature
according to their location inside a frame. Complicated signature-extraction and
voting algorithms are needed to solve these problems, and local descriptors are a
good solution. For watermarks, the insertion locations are chosen randomly; after
these three distortions the same coordinates will be chosen, but the pixels there
now represent different information in the image. An embedding algorithm that
resists rotation, shifting and zooming still needs to be found.
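The location dependence of global descriptors can be demonstrated with a small
experiment. In the sketch below, a block-mean signature stored in raster order
changes when the frame is rotated by 180°, while an intensity histogram, which
ignores pixel positions, does not. This is only an illustration of location
dependence under assumed toy data; a histogram is not one of the local descriptors
recommended above, and it discards spatial information that a practical fingerprint
would need. All function names are hypothetical.

```python
def rotate_180(frame):
    """Geometric attack: rotate the frame by 180 degrees."""
    return [row[::-1] for row in frame[::-1]]

def block_mean_signature(frame, rows=2, cols=2):
    """Global descriptor: block means stored by location (raster order)."""
    h, w = len(frame), len(frame[0])
    bh, bw = h // rows, w // cols
    signature = []
    for r in range(rows):
        for c in range(cols):
            block = [frame[y][x]
                     for y in range(r * bh, (r + 1) * bh)
                     for x in range(c * bw, (c + 1) * bw)]
            signature.append(sum(block) / len(block))
    return signature

def histogram_signature(frame, bins=4):
    """Location-free descriptor: pixel counts per intensity bin (8-bit input)."""
    counts = [0] * bins
    for row in frame:
        for p in row:
            counts[p * bins // 256] += 1
    return counts

frame = [[10, 10, 200, 200],
         [10, 10, 200, 200],
         [90, 90, 50, 50],
         [90, 90, 50, 50]]
rotated = rotate_180(frame)

global_before = block_mean_signature(frame)
global_after = block_mean_signature(rotated)
hist_before = histogram_signature(frame)
hist_after = histogram_signature(rotated)
```

The block-mean signature comes out in reversed order after the rotation, so a
position-by-position comparison fails, while the histogram is identical for both
frames.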
Three basic advantages make video fingerprinting dominant in video copy detection,
in terms of both robustness against attacks and practical execution.
3.2.1 Robustness: The basic concept of video fingerprinting is to find the
characteristic features of videos (or frames). Researchers try to find
pertinent and compact descriptions that can determine whether two videos (one
in the owner’s database and one from the Internet) are the same even if one of
them has undergone noise or attack. Watermarking, in contrast, needs a
delicate embedding algorithm to generate strong correlations between the
watermarks and the original data.
3.2.2 Signature extraction after processing: Consider how video fingerprinting
techniques are executed. The owners’ original data are collected in a
database, and the video fingerprint of each video is calculated and stored in
the database as well (a fingerprint database for a specific algorithm; the
fingerprints are not stored in the videos). To test videos from the Internet
or from a suspected illegal owner, we use the corresponding algorithm to
extract their signatures and apply a delicate voting algorithm to determine
whether they are copies. Watermarking, in contrast, inserts a pattern into
the original data and stores the seeds, keys and specific watermarks in the
database. When testing, the corresponding algorithm is used to extract
watermarks from the suspect data and check whether they are the same as those
the legal owners inserted.
Video fingerprinting has a strong advantage: several algorithms can be used
at the same time. Although several watermarks can be put into an image by
different algorithms to enhance robustness, this work must be done before the
images are propagated; video fingerprinting does not have this restriction.
The tester can use any algorithm on the test video as long as the
corresponding fingerprints are generated for all videos in the database, and
this can be done after the video has been propagated. A tester can use
different algorithms with different properties (some are faster but less
accurate, some are robust but slower) to run a progressive test, and use the
best one for the specific distortion case.
Finally, a person who wants to attack the data can target a specific embedding
algorithm and do severe damage to the watermarks, while video fingerprinting
does not face this problem because nothing has been inserted.
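The database lookup and voting step described in this subsection can be sketched as
follows. This is a deliberately simple stand-in for the voting algorithms used in
practice: it assumes each video is represented by a list of per-frame signatures
(here, four block means per frame), that two frames match when their L1 distance
is at most max_dist, and that a video is reported as a copy when at least
min_score of the query frames find a match. The function names, thresholds and
sample signatures are all hypothetical.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two frame signatures."""
    return sum(abs(x - y) for x, y in zip(a, b))

def vote_score(query, reference, max_dist):
    """Fraction of query frames that have a close frame in the reference."""
    hits = sum(1 for q in query
               if any(l1_distance(q, r) <= max_dist for r in reference))
    return hits / len(query)

def find_copies(query, database, max_dist=6, min_score=0.8):
    """Return the ids of database videos that the query appears to copy."""
    return [video_id for video_id, reference in database.items()
            if vote_score(query, reference, max_dist) >= min_score]

# Fingerprint database of the owner (one signature per frame).
database = {
    "video_a": [[10, 200, 90, 50], [12, 190, 95, 55],
                [30, 180, 100, 60], [40, 170, 110, 70]],
    "video_b": [[200, 20, 30, 120], [210, 25, 35, 110],
                [220, 30, 40, 100], [230, 35, 45, 90]],
}

# Suspect clip: video_a with its second frame dropped and slight noise added.
query = [[11, 199, 91, 50], [31, 181, 99, 61], [40, 171, 110, 69]]

matches = find_copies(query, database)
```

Because each query frame votes independently, the dropped frame does not prevent
the match, and matches contains only "video_a". Nothing had to be embedded in the
video beforehand, so a different signature function could be swapped in later as
long as the database fingerprints are recomputed with it.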
3.2.3 Fast searching: The last advantage concerns searching for copies on the
Internet. There are tons of videos and images on the Internet, and we cannot
check one by one whether each is likely to be a copy. The same pattern, seed
and algorithm can be used on many different works, so extracting the pattern
alone does not directly tell us whether the detected item is the same as
(perhaps a processed version of) the one in the database.
Video fingerprinting, in contrast, can be used to extract signatures of data
online, and a detected match is more likely to be a copy because the signature
is a characteristic representation of a video, and different videos are
extremely unlikely to have the same signature.
4. Conclusion
Although it may be possible to find excellent embedding and extraction algorithms
that improve the performance of watermarking, such algorithms have to consider
suitable locations inside the original image, ways of combining watermarks with
original images, and techniques from cryptography and coding theory, which makes
them rather complicated. Video fingerprinting only has to produce compact and
pertinent descriptions of videos, and part of the work can be done by computer
vision and machine learning methods, which makes it easier.
In many respects, video fingerprinting seems to outperform watermarking in video
copy detection. But with different purposes, different kinds of attack, different
data types, and even different intellectual property laws, there are still several
uncertainties and variations in research on copyright and security management. So I
cannot assert that video fingerprinting is the best technique for video copy
detection, but it is well worth researching and still has the potential to improve.
References
[1] Papers listed in the report “Introduction to Video Fingerprinting”.
[2] Z.X. Pan, Z.C. Zhang, Y.Z. Lin. A Challenge to Image Processing: Digital
Watermarking Techniques. McGraw-Hill, 2007.