Fundamental Techniques for Blind Digital Watermarking

CS 591: Forensics & Security (Fall 2011)

Daniel C. Cannon

For: Professor Fernando Perez-Gonzales

Dept. of Electrical & Computer Engineering, University of New Mexico

December 9, 2011

Introduction

Blind digital watermarking encompasses a variety of techniques for embedding information in a digital

signal that only a receiver with knowledge of a secret key, and no knowledge of the original host signal, can

detect [3, 1]. While different applications of blind digital watermarking impose different design constraints,

most techniques attempt to optimize the trade-offs between three opposing objectives: [5]

1. Bandwidth. The size of the message that can be embedded must be sufficiently large.

2. Robustness. The watermark must survive deliberate and unintentional attacks, such as compression

and rescaling of an image.

3. Imperceptibility. It must not be apparent to a casual viewer that the signal has been altered, nor

should the presence of a watermark be otherwise detectable without knowledge of a secret key.

In this report, I will present and analyze two popular classes of techniques for embedding and decoding

watermarks in digital images: spread-spectrum watermarking and quantization-based watermarking. In

Section 1, I will define spread-spectrum watermarking and discuss a simple scheme for applying spread-

spectrum watermarking to embed information in the pixel domain of an image. In Section 2, I will present an

alternative spread-spectrum scheme that embeds information in the discrete cosine transform (DCT) domain

of an image. In Section 3, I will describe dither modulation [2], a form of quantization index modulation

(QIM), and show that this approach can achieve far greater bandwidths, and with greater robustness to

interference, than spread-spectrum techniques. Finally, in Section 4, I will apply dither modulation in the

DCT domain and characterize its performance.

1 Additive Spread-Spectrum Watermarking in the Pixel Domain

1.1 Overview

While some of the earliest techniques for data hiding, such as quantize-and-replace schemes [2], were capable of high message bandwidths, practical applications of digital watermarking, such as identifying the copyright holder of an image, generally require a minimal amount of embedded data [1], but with high robustness and low perceptibility. Additive spread-spectrum techniques [6] attempt to address this challenge by redundantly embedding information about the value of a single bit in multiple locations in the signal, thus sacrificing bandwidth for robustness.

Conceptually, additive spread-spectrum embedding involves adding a small amount of noise to each

pixel (or other coefficient) of an image. While the added noise appears to be uncorrelated white noise, it

can be partitioned by a pseudorandom interleaver into sets corresponding to each bit of a message, and each

noise term in the set constitutes, effectively, a vote about the true value of the bit.

More concretely, we assume that the sender and the receiver of the message are privy to a secret key k.

Using this secret key, we pseudorandomly partition the coefficients of the image Z into non-overlapping sets

Si, with each set corresponding to a bit bi ∈ {−1,+1} of our message, and we generate a pseudorandom

matrix B of the same dimensionality as our image where ∀x ∈ B : x ∈ {−1,+1}. We can then use this

matrix B and a perceptual mask A (see Section 1.2) to define a modulation matrix,

\[
\Phi_i[m,n] =
\begin{cases}
A[m,n]\,B[m,n], & (m,n) \in S_i \\
0, & \text{otherwise.}
\end{cases}
\tag{1}
\]

Once the modulation matrices for each bit bi of the message are computed, the watermark can be

computed as

\[
W[m,n] = \sum_{i=1}^{l} b_i \Phi_i[m,n] \tag{2}
\]

and our new, watermarked image is simply the sum of the original image and the watermark,

Z′ = Z + W. (3)

While the embedding of a spread spectrum watermark is relatively straightforward, the method for

recovering the embedded message from a watermarked, and possibly attacked, image Z′ is far less obvious

under the assumption that the receiver has no knowledge of the original signal.


Consider first the inner product of the image Z′ with a modulation matrix Φi,

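\begin{align}
\rho_i &= \langle Z', \Phi_i \rangle \tag{4} \\
&= \sum_m \sum_n \left( Z[m,n] + b_i \Phi_i[m,n] \right) \Phi_i[m,n] \tag{5} \\
&= b_i \sum_m \sum_n \left( \Phi_i[m,n] \right)^2 + \sum_m \sum_n Z[m,n]\, \Phi_i[m,n] \tag{6} \\
&= b_i \sum_m \sum_n \left( \Phi_i[m,n] \right)^2 + \langle Z, \Phi_i \rangle. \tag{7}
\end{align}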

We recognize that Eqn. 7 is the sum of an information term and a noise term, 〈Z,Φ_i〉. The receiver cannot compute Φ_i exactly, because the perceptual mask A must be estimated from Z′ rather than from Z, but it can closely estimate it, and the information term will be positive if b_i = +1 and negative if b_i = −1. More importantly, although the receiver has no knowledge of Z, it does know that E(Φ_i) ≈ 0, so the noise term should be comparatively small. Thus, we expect that

\[
\rho_i \approx b_i \sum_m \sum_n \left( \Phi_i[m,n] \right)^2. \tag{8}
\]

We therefore begin by computing the modulation matrices Φ′i for the image Z′. We can then compute

the mean value of the inner product of Z′ with each Φ′i,

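\begin{align}
\mu &= E\left( \langle Z', \Phi'_i \rangle \right) \tag{9} \\
&= \frac{1}{l} \sum_{i=1}^{l} \langle Z', \Phi'_i \rangle \tag{10}
\end{align}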

yielding an approximate measure of the expected value of the noise term. If we then compute the sum of the

squared values of each Φ′i,

\[
\nu_i = \sum_m \sum_n \left( \Phi'_i[m,n] \right)^2, \tag{11}
\]

we can then obtain an estimated reconstruction of the original message, given by

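\[
r_i =
\begin{cases}
+1, & |\langle Z', \Phi'_i \rangle - \mu - \nu_i| < |\langle Z', \Phi'_i \rangle - \mu + \nu_i| \\
-1, & \text{otherwise.}
\end{cases}
\tag{12}
\]

That is, r_i = +1 when the mean-corrected correlation lies closer to +ν_i than to −ν_i, in agreement with Eqn. 8.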
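The embedding and correlation-based decoding described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used for the experiments: the function names are hypothetical, the perceptual mask is passed in explicitly (a truly blind receiver would re-estimate it from Z′), and the flat mask value of 16 in the demonstration is exaggerated so that the small example decodes reliably. The decision step picks the sign whose expected correlation, +ν_i or −ν_i, lies closer to ρ_i − μ, as Eqn. 8 implies.

```python
import random

def modulation(shape, n_bits, key, mask):
    """Build the index sets S_i and the modulation values A*B (Eqn. 1)."""
    rows, cols = shape
    rng = random.Random(key)                 # keyed PRNG shared by sender and receiver
    coords = [(m, n) for m in range(rows) for n in range(cols)]
    rng.shuffle(coords)                      # pseudorandom interleaver
    sets = [coords[i::n_bits] for i in range(n_bits)]   # non-overlapping S_i
    phi = {(m, n): mask[m][n] * rng.choice((-1, 1)) for (m, n) in coords}
    return sets, phi

def embed(image, bits, key, mask):
    """Z' = Z + sum_i b_i * Phi_i (Eqns. 2 and 3)."""
    sets, phi = modulation((len(image), len(image[0])), len(bits), key, mask)
    marked = [row[:] for row in image]
    for b, s in zip(bits, sets):
        for (m, n) in s:
            marked[m][n] += b * phi[(m, n)]
    return marked

def decode(marked, n_bits, key, mask):
    """Correlate Z' with each Phi_i and decide each bit by its sign."""
    sets, phi = modulation((len(marked), len(marked[0])), n_bits, key, mask)
    rho = [sum(marked[m][n] * phi[(m, n)] for (m, n) in s) for s in sets]  # Eqn. 4
    mu = sum(rho) / n_bits                                                 # Eqn. 10
    nu = [sum(phi[(m, n)] ** 2 for (m, n) in s) for s in sets]             # Eqn. 11
    # +1 when rho - mu is closer to +nu than to -nu, otherwise -1
    return [1 if abs(r - mu - v) < abs(r - mu + v) else -1
            for r, v in zip(rho, nu)]

# Demonstration on a synthetic 256 x 256 host (assumed data, not the "man" image).
host_rng = random.Random(0)
host = [[host_rng.randrange(256) for _ in range(256)] for _ in range(256)]
mask = [[16] * 256 for _ in range(256)]      # flat, deliberately strong mask
message = [1, -1, -1, 1, 1, -1, 1, -1]
marked = embed(host, message, key=42, mask=mask)
recovered = decode(marked, len(message), key=42, mask=mask)
```

Because μ is the average of the correlations over all bits, the estimate is only a fair measure of the noise term when the message contains roughly as many +1 bits as −1 bits, which is why the demonstration uses a balanced message.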

1.2 Computation of the Perceptual Mask

(a) Original image. (b) Perceptual mask. (c) Watermark. (d) Watermarked image.

Figure 1: The embedding of a 128-bit message in a 1024×1024 image using spread-spectrum watermarking in the pixel domain.

Because the human visual system is most sensitive to disruptions in smooth regions of images, an ideal perceptual mask should concentrate alterations in regions of the image that are already noisy. One simple method for identifying these regions is to compute the instantaneous gradient magnitude at each pixel in the image. That is,

\[
|\nabla Z| = \sqrt{ \left( \frac{\partial Z}{\partial x} \right)^2 + \left( \frac{\partial Z}{\partial y} \right)^2 }. \tag{13}
\]

While Eqn. 13 takes a concise analytical form, it is only well-defined if Z is a continuous quantity. In

the case of a discrete image, we must resort to approximating the gradient by applying a Sobel filter. That

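is, we first convolve the image with two 3 × 3 kernels,

\[
G_x =
\begin{bmatrix}
-1 & 0 & +1 \\
-2 & 0 & +2 \\
-1 & 0 & +1
\end{bmatrix}
* Z \tag{14}
\]

and

\[
G_y =
\begin{bmatrix}
-1 & -2 & -1 \\
0 & 0 & 0 \\
+1 & +2 & +1
\end{bmatrix}
* Z. \tag{15}
\]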

Then, the gradient magnitude (our perceptual mask) can be computed element-wise as

\[
A = \sqrt{G_x^2 + G_y^2}. \tag{16}
\]

The resulting perceptual masks, an example of which is depicted in Figure 1(b), have large values around

the edges of objects in an image and in other noisy regions, as was desired.
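A minimal sketch of the Sobel-based mask of Eqns. 14 to 16 follows. The function names are hypothetical, and two details are assumptions not stated in the text: border pixels are handled by replicating the nearest edge pixel, and the kernels are applied as a correlation rather than a flipped convolution, which changes only the sign of G_x and G_y and therefore leaves the magnitude A unchanged.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def filter3x3(image, kernel):
    """Apply a 3x3 kernel at every pixel, replicating edge pixels at the border."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            acc = 0.0
            for dm in (-1, 0, 1):
                for dn in (-1, 0, 1):
                    mm = min(max(m + dm, 0), rows - 1)   # clamp row index
                    nn = min(max(n + dn, 0), cols - 1)   # clamp column index
                    acc += kernel[dm + 1][dn + 1] * image[mm][nn]
            out[m][n] = acc
    return out

def perceptual_mask(image):
    """Gradient-magnitude mask A = sqrt(Gx^2 + Gy^2), computed per pixel."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return [[math.hypot(a, b) for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]

# A vertical step edge: the mask is large along the edge and zero elsewhere.
step = [[0, 0, 10, 10] for _ in range(4)]
mask = perceptual_mask(step)
```

In practice the mask would usually be scaled by a global strength factor before embedding; the text does not specify one, so none is applied here.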


(a) JPEG25 (b) SCALE25 (c) AWGN1000

Figure 2: Three distortions applied to the “man” image. (a) The image is compressed using the JPEG

algorithm with a quality factor of 25%. (b) The image is rescaled to 25% of its original size. (c) The image

is distorted by an AWGN attack with σ = 1000.

1.3 Experimental Design

A successful watermarking scheme must be robust to a variety of distortions and attacks, including additive noise, filtering, cropping, compression, rotation and scaling, statistical averaging, and multiple watermarking [5]. Spread-spectrum watermarks (and, in fact, watermarks generated by all of the techniques explored in this report) can be easily removed with a rotation or cropping attack unless markers are included to reorient the image. It is therefore more interesting to consider the following three classes of common distortions:

1. Additive White Gaussian Noise (AWGN) Attack. The pixels of the image are distorted by AWGN with mean 0 and variance σ.

2. JPEG Compression Attack. The image is compressed using the lossy JPEG algorithm with a variable quality factor (ranging from 0% to 100%) and then converted back to its original format.

3. Scaling Attack. The image is downsampled to a fraction of its original size and then upsampled back to its original size.

For each attack, a random message with a specific bit rate (in bits per pixel) was embedded in the “man”

image. The image was then attacked, and the message was decoded and compared to the original. This

process was repeated for five trials at each bit rate.
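The experimental loop described above (embed, attack, decode, compare, repeat) might be organized as in the following sketch. All names are hypothetical; `embed`, `decode`, and `attack` stand for whichever scheme and distortion are under test, and the sketch assumes that σ denotes the variance of the noise, so the standard deviation passed to the generator is its square root.

```python
import math
import random

def awgn_attack(image, variance, seed=None):
    """Add zero-mean Gaussian noise of the given variance to every pixel."""
    rng = random.Random(seed)
    std = math.sqrt(variance)
    return [[pixel + rng.gauss(0.0, std) for pixel in row] for row in image]

def bit_error_rate(sent, received):
    """Fraction of message bits that were decoded incorrectly."""
    errors = sum(1 for s, r in zip(sent, received) if s != r)
    return errors / len(sent)

def average_ber(embed, decode, attack, image, n_bits, trials=5, seed=0):
    """Embed a fresh random message per trial, attack, decode, and average the BER.

    embed(image, message) -> marked image
    attack(marked)        -> attacked image
    decode(attacked, n)   -> list of n recovered bits
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        message = [rng.choice((-1, 1)) for _ in range(n_bits)]
        attacked = attack(embed(image, message))
        total += bit_error_rate(message, decode(attacked, n_bits))
    return total / trials
```

A bit error rate near 0.5 corresponds to random guessing, which is the reference point for judging the more destructive attacks.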

Figure 3: The average error performance of spread-spectrum watermarking in the pixel domain for various attacks and embedding rates. Each panel plots average BER against bits of information per pixel (10^-6 to 10^0). (a) Resilience to AWGN attacks (no attack, σ = 10^2, σ = 10^3, σ = 10^4). (b) Resilience to JPEG attacks (no attack, JPEG99, JPEG50, JPEG25). (c) Resilience to scaling attacks (no attack, rescale by 99%, rescale by 50%, rescale by 25%).

1.4 Results

Unfortunately, systematically quantifying the amount of perceptible disruption caused by modifying an

image is difficult without making broad assumptions about the human visual system or convening a focus

group. However, as the reader will hopefully agree upon inspecting Figure 1(d), the visual disruption caused

by the embedding is subjectively minimal to nonexistent.

Figure 3 summarizes the outcomes that can be quantitatively measured. As expected, spread-spectrum

exhibits a relatively high level of robustness to the AWGN attack (Figure 3(a)) because, when summing over

the respective subsets, the introduced noise tends to cancel itself. Spread-spectrum also survives each of the

JPEG attacks (Figure 3(b)) surprisingly well, again because of the inherent redundancy in the embedding

method. However, rescaling attacks (Figure 3(c)) present a far greater challenge; with a rescaling ratio of 25%, spread-spectrum barely outperforms random guessing, even for embedding rates as low as 10^-5 bits per pixel.

More critically, host interference alone (with no additional attacks) is sufficient to limit the embedding rate to less than 10^-3 bits per pixel (in the case of a 1024 × 1024 image, as presented here, a little over a kilobit of information) if perfect reconstruction is desired. This result makes clear that spread-spectrum should only be used when bandwidth is not a priority.

2 Spread-Spectrum Watermarking in the DCT Domain

2.1 Overview

While spread-spectrum was presented as a technique for embedding a watermark in the spatial (pixel) domain of an image, it can be readily adapted, with minor modification, to operate on image coefficients in many transformed domains. One that is commonly used in image processing applications, and particularly in compression algorithms like JPEG, is the discrete cosine transform (DCT) domain.

The DCT is closely related to the discrete Fourier transform (DFT), which represents a signal as a weighted sum of sines and cosines, except that the DCT instead represents a signal as a weighted sum of cosines alone. This difference obviates the need for complex coefficients, which is one reason the DCT is a popular choice for many applications.

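The two-dimensional DCT of an M × N image Z is defined by the unsightly function

\[
C\{Z\}[p,q] = \alpha_p \alpha_q \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} Z[m,n] \cos\frac{\pi(2m+1)p}{2M} \cos\frac{\pi(2n+1)q}{2N} \tag{17}
\]

where

\[
\alpha_p =
\begin{cases}
1/\sqrt{M}, & p = 0 \\
\sqrt{2/M}, & 1 \le p \le M-1
\end{cases}
\tag{18}
\]

and

\[
\alpha_q =
\begin{cases}
1/\sqrt{N}, & q = 0 \\
\sqrt{2/N}, & 1 \le q \le N-1
\end{cases}
\tag{19}
\]

are normalization constants.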

Once an image Z has been transformed by the DCT into a matrix of coefficients Y, this operation can

be reversed by applying the inverse DCT,

\[
C^{-1}\{Y\}[m,n] = \sum_{p=0}^{M-1} \sum_{q=0}^{N-1} \alpha_p \alpha_q Y[p,q] \cos\frac{\pi(2m+1)p}{2M} \cos\frac{\pi(2n+1)q}{2N}. \tag{20}
\]

While one could transform the entire image into the DCT domain, it is generally more practical to apply

the DCT block-wise. That is, we will divide the image Z into 8 × 8 blocks and then apply the DCT to

each block. The transformed blocks are then recombined to form a new matrix Y. Next, we will embed

our message b in the matrix Y using the spread-spectrum technique described in Section 1; however, we

must take care not to embed any information in the DC (0,0) coefficient of any block, as this would create

substantial distortion. After embedding the message in Y to produce a new matrix Y′, we can apply the

inverse DCT in the same block-wise fashion to yield our watermarked image Z′.

To decode the message embedded in a watermarked image Z′, we need only apply the DCT block-wise

to obtain Y′ and then demodulate the message using the same approach as described in Section 1.
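The forward and inverse transforms of Eqns. 17 and 20, together with the block-wise application described above, can be sketched directly from their definitions. The function names are hypothetical, the direct summation is written for clarity rather than speed, and the `blockwise` helper assumes that both image dimensions are multiples of the block size.

```python
import math

def alpha(k, size):
    """Normalization constant of Eqns. 18 and 19."""
    return math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)

def basis(x, k, size):
    """Cosine basis function cos(pi * (2x + 1) * k / (2 * size))."""
    return math.cos(math.pi * (2 * x + 1) * k / (2 * size))

def dct2(block):
    """Two-dimensional DCT of a rectangular block (Eqn. 17)."""
    M, N = len(block), len(block[0])
    return [[alpha(p, M) * alpha(q, N)
             * sum(block[m][n] * basis(m, p, M) * basis(n, q, N)
                   for m in range(M) for n in range(N))
             for q in range(N)] for p in range(M)]

def idct2(coeffs):
    """Inverse two-dimensional DCT (Eqn. 20)."""
    M, N = len(coeffs), len(coeffs[0])
    return [[sum(alpha(p, M) * alpha(q, N) * coeffs[p][q]
                 * basis(m, p, M) * basis(n, q, N)
                 for p in range(M) for q in range(N))
             for n in range(N)] for m in range(M)]

def blockwise(image, transform, size=8):
    """Apply `transform` to each size x size block and reassemble the result."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            block = [row[c0:c0 + size] for row in image[r0:r0 + size]]
            for i, row in enumerate(transform(block)):
                out[r0 + i][c0:c0 + size] = row
    return out

# A flat mid-gray 8 x 8 block has all of its energy in the DC (0,0) coefficient.
flat_coeffs = dct2([[128] * 8 for _ in range(8)])
```

With this normalization the DC coefficient of an 8 × 8 block equals eight times the block's mean pixel value, so a uniform block of value 128 yields a DC coefficient of 1024, which illustrates why altering the DC term would shift the brightness of the whole block.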


2.2 Generation of the Perceptual Mask

Generating a perceptual mask for use in the DCT domain is markedly different from generating a perceptual

mask for use in the spatial domain.

Following the approach taken by the authors in [4], we will define a visibility threshold T for the coefficient (i, j) by

\[
\log T(i,j) = \log\left( \frac{T_{\min} \left( f_{i,0}^2 + f_{0,j}^2 \right)^2}{\left( f_{i,0}^2 + f_{0,j}^2 \right)^2 - 4(1-r) f_{i,0}^2 f_{0,j}^2} \right) + K \left( \log\sqrt{f_{i,0}^2 + f_{0,j}^2} - \log f_{\min} \right)^2 \tag{21}
\]

where f_{i,0} and f_{0,j} are the vertical and horizontal frequencies of the DCT basis functions, T_min = 1.1548 is the minimum value of T(i, j), attained at the spatial frequency f_min = 3.68 cycles/degree, r = 0.7, and K = 1.728.

The block-corrected visibility threshold is defined by

\[
T'(i,j) = T(i,j) \left( \frac{X_{0,0}(i,j)}{\bar{X}_{0,0}} \right)^{a_r} \tag{22}
\]

where X_{0,0}(i, j) denotes the value of the DC (0, 0) coefficient corresponding to the block in which the coefficient (i, j) resides, X̄_{0,0} = 1024 is the average luminance of an 8 × 8 block on the screen, and a_r = 0.649.

Finally, the perceptual mask A is given by

\[
A[k_1,k_2] = 4 \cdot \left( 1 + (\sqrt{2}-1)\,\delta(l_1) \right) \cdot \left( 1 + (\sqrt{2}-1)\,\delta(l_2) \right) \cdot \gamma \cdot T'(l_1,l_2) \tag{23}
\]

where l_i = k_i mod 8, δ(·) is the Kronecker delta, and γ < 1 is an intensity scaling factor.
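The three steps of the DCT-domain mask (threshold, block correction, and final scaling) can be sketched as below. Several points are assumptions rather than statements from the text: the function names are hypothetical, the logarithms are taken to base 10, the spatial frequencies of the basis functions are supplied by the caller because their computation depends on viewing conditions not given here, and the ratio inside the first logarithm is taken in its dimensionally consistent form, with the sum of squared frequencies itself squared in both numerator and denominator.

```python
import math

T_MIN = 1.1548      # minimum threshold
F_MIN = 3.68        # frequency (cycles/degree) at which the minimum occurs
R = 0.7
K = 1.728
A_R = 0.649         # exponent of the block (luminance) correction
MEAN_DC = 1024.0    # average DC coefficient of an 8 x 8 block

def visibility_threshold(f_v, f_h):
    """Threshold T for a coefficient with vertical/horizontal frequencies f_v, f_h.

    The DC coefficient (f_v = f_h = 0) is excluded; it is never used for embedding.
    """
    s = f_v ** 2 + f_h ** 2
    ratio = T_MIN * s ** 2 / (s ** 2 - 4 * (1 - R) * f_v ** 2 * f_h ** 2)
    log_t = math.log10(ratio) + K * (math.log10(math.sqrt(s)) - math.log10(F_MIN)) ** 2
    return 10 ** log_t

def block_corrected(threshold, block_dc):
    """Scale the threshold by the relative brightness of the block."""
    return threshold * (block_dc / MEAN_DC) ** A_R

def mask_value(l1, l2, t_prime, gamma):
    """Mask entry for in-block position (l1, l2), with intensity factor gamma < 1."""
    d1 = 1 if l1 == 0 else 0     # Kronecker delta of l1
    d2 = 1 if l2 == 0 else 0     # Kronecker delta of l2
    root2 = math.sqrt(2)
    return 4 * (1 + (root2 - 1) * d1) * (1 + (root2 - 1) * d2) * gamma * t_prime
```

As a sanity check on the constants, a purely vertical frequency equal to f_min returns the minimum threshold T_min, and a block whose DC coefficient equals 1024 leaves the threshold unchanged.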

2.3 Experimental Design

2.4 Results

Again, the watermarked image (Figure 4(d)) appears unmodified from the original. But, as can be seen

in Figure 4(c), the spatial watermark produced by embedding in the DCT domain contains far less of the

original structure of the image and appears more uniformly distributed than that produced by embedding in

the spatial domain directly (see Figure 1(c)).

In terms of performance, embedding in the DCT domain proved more robust to host interference than

in the spatial domain; however, the difference was relatively small, and may be more a product of the

perceptual mask and parameter selection than the domain choice. Embedding in the DCT domain gained

little noticeable robustness to the attacks surveyed.


(a) Original image. (b) Perceptual mask. (c) Watermark. (d) Watermarked image.

Figure 4: The embedding of a 128-bit message in a 1024×1024 image using spread-spectrum watermarking

in the DCT domain.

Figure 5: The average error performance of spread-spectrum watermarking in the DCT domain for various attacks and embedding rates. Three panels plot average BER; panel (a) shows resilience to noise attacks (no attack, σ = 10^2, σ = 10^3, σ = 10^4).

3 Quantization Index Modulation in the Pixel Domain

3.1 Overview

Quantization Index Modulation (QIM) [2, 3] is a surprisingly simple, high-bandwidth watermarking technique. Unfortunately, the descriptions of the technique available in the open literature are geared toward an audience well-versed in the terminology and methodology of image processing. I will instead define QIM in terms that should be familiar to anyone with a basic background in discrete math.

The fundamental idea underlying QIM is to replace each pixel with a value drawn from one of two sets, Λ_{−1} or Λ_{+1}, each corresponding to a different bit value, −1 or +1. That is, we will embed one bit per pixel, and the embedding will take the form of quantizing the pixel (replacing it with the most similar value in a set) using one of two quantizers.

In selecting our codeword sets (the sets of values to which the pixels of our image can be transformed), it is important to consider the trade-offs between the potential visual disruption of the image and the robustness of the message to interference. In particular, our codeword sets should span the range of possible pixel values in the image, and there should exist a codeword that is reasonably close to each pixel value, but not so close as to render codewords of each set indistinguishable from each other. A sensible candidate to address these challenges is a lattice, a set of equally spaced points.

For each possible bit value, −1 and +1, we will define an infinite lattice parameterized by a step size Δ and an offset β_i. More concretely, we will define Λ_i as

\[
\Lambda_i = \{ n\Delta + \beta_i \mid n \in \mathbb{Z} \} \tag{24}
\]

where Δ is common to both lattices and β_i is specific to each.

To determine the offset parameters for each lattice, we will begin by using our secret key k to seed a pseudorandom number generator, which we will then use to select β_{−1} such that

\[
\beta_{-1} \sim \mathrm{Uniform}(-\Delta/2, \Delta/2). \tag{25}
\]

The offset parameter for the lattice Λ_{+1}, β_{+1}, is then a simple shift of β_{−1} by half the step size. That is,

\[
\beta_{+1} = \beta_{-1} + \Delta/2. \tag{26}
\]

This parameterization guarantees a few important qualities. First, our codeword sets are non-overlapping (assuming Δ ≠ 0). Second, each lattice is simply a shifted version of the other. And finally, the union of both lattices is itself a lattice; thus, each lattice is a sub-lattice of the lattice with step size Δ/2 and offset β_{−1}.

With our codeword sets determined, we can then define our two quantizers Q_i by

\[
Q_i(v) = \arg\min_{\lambda \in \Lambda_i} |\lambda - v|. \tag{27}
\]

That is, the quantization of a value v with the i-quantizer is the value λ in the set Λ_i with the minimal distance from v. (Should v be equidistant from two values in Λ_i, we can simply choose one of these arbitrarily.) Then, to embed our message b in the image x, we simply quantize each pixel with the quantizer corresponding to the bit to be placed at that location,

\[
y_n = Q_{b_n}(x_n). \tag{28}
\]

To implement this scheme, one could, of course, construct both codeword sets (or rather subsets whose values lie within the range of possible pixel values) and then search these sets each time a pixel is to be quantized; however, this approach is quite inefficient. Because both codeword sets are lattices, we instead recognize that, for a given v, Q_i(v) will be either

\[
h_1(v,i) = \left\lfloor \frac{v - \beta_i}{\Delta} \right\rfloor \cdot \Delta + \beta_i \tag{29}
\]

or

\[
h_2(v,i) = \left\lceil \frac{v - \beta_i}{\Delta} \right\rceil \cdot \Delta + \beta_i. \tag{30}
\]

We can therefore redefine the quantizer Q_i(·) by an equivalent, more efficiently computable piecewise function,

\[
Q_i(v) =
\begin{cases}
h_1(v,i), & |h_1(v,i) - v| < |h_2(v,i) - v| \\
h_2(v,i), & \text{otherwise.}
\end{cases}
\tag{31}
\]
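The floor-and-ceiling form of the quantizer translates almost line for line into code. In this minimal sketch the function name `quantize` is hypothetical, and the lattice is described directly by its step size and offset rather than by a bit index.

```python
import math

def quantize(v, delta, beta):
    """Nearest point to v on the lattice {n * delta + beta}, via Eqns. 29 to 31."""
    lower = math.floor((v - beta) / delta) * delta + beta   # h1: lattice point at or below v
    upper = math.ceil((v - beta) / delta) * delta + beta    # h2: lattice point at or above v
    # Ties go to the upper candidate, an arbitrary but consistent choice.
    return lower if abs(lower - v) < abs(upper - v) else upper

# With step 4 and offset 1 the lattice is {..., 5, 9, 13, 17, ...}.
examples = [quantize(10.2, 4.0, 1.0), quantize(12.0, 4.0, 1.0), quantize(9.0, 4.0, 1.0)]
```

Only two candidates are examined per value, so the cost of quantizing a pixel is constant regardless of the range of pixel values or the step size.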

Finally, to reconstruct our original message from the watermarked, and potentially attacked, image y, we need simply compare the value of each pixel with its quantized values. That is, if y_n is closer to Q_{−1}(y_n) than it is to Q_{+1}(y_n), we assume the embedded value was −1. More precisely, our reconstructed message r is given by

\[
r_n =
\begin{cases}
-1, & |Q_{-1}(y_n) - y_n| < |Q_{+1}(y_n) - y_n| \\
+1, & \text{otherwise.}
\end{cases}
\tag{32}
\]
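Putting the lattices, the embedding rule, and the minimum-distance decoder together gives the following sketch for a flat list of pixel values. The function names are hypothetical, the nearest lattice point is found by rounding (equivalent to comparing the floor and ceiling candidates, apart from how exact ties are broken), and pixel values are left as unclipped floating-point numbers for simplicity.

```python
import random

def nearest(v, delta, beta):
    """Nearest point to v on the lattice {n * delta + beta}."""
    return round((v - beta) / delta) * delta + beta

def offsets(key, delta):
    """Key-dependent lattice offsets for bit values -1 and +1 (Eqns. 25 and 26)."""
    beta_minus = random.Random(key).uniform(-delta / 2, delta / 2)
    return {-1: beta_minus, +1: beta_minus + delta / 2}

def qim_embed(pixels, bits, key, delta):
    """Quantize each pixel with the quantizer selected by its bit (Eqn. 28)."""
    beta = offsets(key, delta)
    return [nearest(x, delta, beta[b]) for x, b in zip(pixels, bits)]

def qim_decode(pixels, key, delta):
    """Choose, per pixel, the bit whose lattice lies closer (Eqn. 32)."""
    beta = offsets(key, delta)
    decoded = []
    for y in pixels:
        d_minus = abs(nearest(y, delta, beta[-1]) - y)
        d_plus = abs(nearest(y, delta, beta[+1]) - y)
        decoded.append(-1 if d_minus < d_plus else 1)
    return decoded

pixels = [0, 37, 128, 200, 255]
bits = [1, -1, -1, 1, 1]
marked = qim_embed(pixels, bits, key=7, delta=8.0)
noisy = [y + 1.5 for y in marked]          # perturbation smaller than delta / 4
```

Each pixel moves by at most Δ/2, and decoding remains correct as long as a perturbation stays below Δ/4, since beyond that point the value lies closer to the other lattice. This sharp boundary is one way to see why the degradation under stronger attacks is abrupt.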

3.2 Repetition coding

While the scheme described above requires the length of the embedded message to equal the number of

pixels in the image, it is often desirable to embed a shorter message. Rather than simply pad the shorter

message to the requisite length, it seems more sensible to utilize these junk bits for additional redundancy.

Therefore, when embedding a message with fewer bits than there are pixels in the image, I expand the

message using a simple repetition code. That is, the message is repeated multiple times (and possibly a

fractional number of times) until the total number of bits is equal to the number of pixels in the image. The

repetition-coded message is then shuffled by a pseudorandom interleaver and embedded in the image.

When a message is decoded, it is first de-interleaved using the same pseudorandom sequence, and then each bit corresponding to a position in the original message is counted as a vote for the value of that position, with the majority deciding the decoded bit.
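The repetition code, interleaver, and majority vote described above can be sketched as follows. The function names are hypothetical, the interleaver is modeled as a keyed permutation of slot indices, and a tied vote is resolved as +1, which is an arbitrary choice.

```python
import random

def rep_encode(bits, n_slots, key):
    """Repeat the message (possibly a fractional number of times) and interleave it."""
    expanded = [bits[i % len(bits)] for i in range(n_slots)]
    order = list(range(n_slots))
    random.Random(key).shuffle(order)            # keyed pseudorandom interleaver
    return [expanded[j] for j in order]          # slot p carries expanded[order[p]]

def rep_decode(slots, n_bits, key):
    """De-interleave and take a majority vote for each message position."""
    order = list(range(len(slots)))
    random.Random(key).shuffle(order)            # regenerate the same permutation
    votes = [0] * n_bits
    for p, j in enumerate(order):
        votes[j % n_bits] += slots[p]            # each copy votes +1 or -1
    return [1 if v >= 0 else -1 for v in votes]

message = [1, -1, -1]
coded = rep_encode(message, 20, key=5)           # 20 slots: 7, 7 and 6 copies per bit
corrupted = coded[:]
corrupted[0] = -corrupted[0]                     # flip two received bits
corrupted[1] = -corrupted[1]
```

Here each message bit has at least six copies, so two flipped copies cannot overturn any vote; in general a bit survives as long as fewer than half of its copies are corrupted.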


(a) Original image. (b) Watermark. (c) Watermarked image.

Figure 6: The embedding of a 1024 × 1024 bit message in a 1024 × 1024 image using QIM in the pixel

domain.

Figure 7: The average error performance of QIM embedding in the pixel domain for various attacks and embedding rates. Three panels plot average BER: (a) resilience to noise attacks (no attack, σ = 10^2, σ = 10^3, σ = 10^4), (b) JPEG attacks, and (c) scaling attacks.


3.3 Results

When subjected to only host interference, the QIM scheme achieved a bit error rate of 0% with an embedding rate of 1 bit per pixel, far exceeding the performance of spread-spectrum embedding. However, as shown in Figure 8, QIM embedding is still quite sensitive to non-host interference, and its performance does not degrade gracefully.

Nonetheless, QIM embedding, when coupled with repetition coding, outperforms spread spectrum when

considering bit error rate versus embedding rate for simple attacks, such as the addition of small amounts

of AWGN (Figure 7(a)), high quality JPEG compression (Figure 7(b)), and minor rescaling (Figure 7(c)).

Meanwhile it performs worse than spread spectrum for more disruptive variants of these attacks.

Figure 8: The error performance of demodulating a QIM-embedded 1024×1024 bit message after attacking the image with AWGN to yield the given watermark-to-noise ratio (WNR). The plot shows average BER against WNR in dB (−10 to 35 dB).

4 Quantization Index Modulation in the DCT Domain

Finally, as with spread-spectrum, we can easily apply QIM to image coefficients of many transformed

domains. To use QIM in the DCT domain, we need only prohibit embedding in the DC coefficient of

any block, which consequently scales our message bandwidth by a factor of 63/64 (assuming a block size of

eight).

The error performance of QIM embedding in the DCT domain for variable-strength AWGN attacks, as summarized in Figure 10, roughly parallels that of QIM embedding in the pixel domain. However, QIM embedding in the DCT domain, even when subjected to only host interference, resulted in a small, but nonzero, bit error rate of 5 × 10^-3. While not a significantly high bit error rate, given the channel capacity, it is still an unexpected result that warrants further investigation.

References

[1] BENDER, W., GRUHL, D., MORIMOTO, N., AND LU, A. Techniques for data hiding. IBM Systems Journal 35, 3&4 (1996), 313–336.

[2] CHEN, B., AND WORNELL, G. W. Quantization index modulation: A class of provably good methods for digital watermarking and information embedding. IEEE Transactions on Information Theory 47, 4 (May 2001), 1423–1443.

[3] EGGERS, J. J., SU, J. K., AND GIROD, B. A blind watermarking scheme based on structured codebooks. In IEE Seminar on Secure Images and Image Authentication (London, U.K., Apr. 2000).

[4] HERNANDEZ, J. R., AND GONZALEZ, F. P. DCT-domain watermarking techniques for still images: Detector performance analysis and a new structure. IEEE Transactions on Image Processing 9, 1 (2000), 55–68.

[5] PEREZ-GONZALEZ, F., AND HERNANDEZ, J. R. A tutorial on digital watermarking. In Proc. IEEE 33rd International Carnahan Conference on Security Technology (Madrid, Spain, Oct. 1999).

[6] SMITH, J. R., AND COMISKEY, B. O. Modulation and information hiding in images. In Proc. First Information Hiding Workshop (Cambridge, U.K., 1996).


(a) Original image. (b) Watermark. (c) Watermarked image.

Figure 9: The embedding of a 1024 × 1024 bit message in a 1024 × 1024 image using QIM in the DCT

domain.

Figure 10: The error performance of demodulating a 1024 × 1024 bit message, embedded using QIM in the DCT domain, after attacking the image with AWGN to yield the given watermark-to-noise ratio (WNR). The plot shows average BER against WNR in dB (−10 to 35 dB).
