2011 International Conference on Wireless Communications and Signal Processing (WCSP 2011), Nanjing, China, November 9-11, 2011
An Audio Watermarking of Wavelet Domain Based on BP Neural Network

Liang Chen, Huan Hao, Guohong Zheng Department of Electronic Information Engineering ICE, PLAUST

Nanjing, China

Abstract—This paper proposes a wavelet-domain audio watermarking algorithm based on a BP neural network. We introduce the BP neural network into the audio watermarking system and embed the watermark in the important low-frequency components of the wavelet coefficients. To improve the efficiency and accuracy of BP network training, we design a highly efficient, high-precision fast training algorithm for the BP neural network, which effectively improves the performance of the watermarking system. Experimental results show that the proposed scheme achieves high transparency and robustness, and that the improved fast training algorithm converges quickly and reaches high accuracy.

Keywords-audio watermarking; wavelet coefficients; neural network

I. INTRODUCTION

Digital audio watermarking is considered an effective way to resolve the copyright protection of digital audio products. It embeds a logo into the original audio on the premise of not affecting the use of the audio signal. The copyright and integrity of an audio file can thus be confirmed through watermark detection [1].

Existing digital audio watermarking methods can be divided into time-domain methods and transform-domain methods. The former embed the watermark by modifying the host's time-domain samples [2]; typical examples are the least significant bit (LSB) method [3] and the echo masking method. Such methods generally have low embedding capacity and poor resistance to attack. Transform-domain methods embed the watermark by modifying the host's transform-domain coefficients. Commonly used transforms are the discrete Fourier transform (DFT) [4], the discrete cosine transform (DCT) [5], and the discrete wavelet transform (DWT) [6].

In this paper an audio watermarking method based on a BP neural network is proposed. The digital watermark is embedded into the host's wavelet coefficients, and a network architecture suited to the watermarking system is designed by exploiting the neural network's powerful nonlinear and adaptive approximation ability. Using the neural network, the receiver can extract the watermark without the original audio.

The remainder of the paper is organized as follows. Section 2 describes how the BP neural network is used to embed and extract the watermark. Section 3 introduces a new fast training algorithm for the BP neural network. Experimental results and conclusions can be found in Sections 4 and 5.

II. WATERMARKING SCHEME BASED ON NEURAL NETWORK

As the wavelet transform has good time-frequency resolution characteristics, we insert the watermark into the important low-frequency components of the wavelet coefficients and use a BP neural network to establish the relationship between the wavelet coefficients before and after embedding. The watermarked audio is then transmitted in the channel, and the neural network weights are sent as side information. In this way the receiver can extract the watermark without the original signal, which gives high security and flexibility.

Watermarking scheme consists of three phases: watermark embedding scheme, BP neural network design and watermark extraction scheme.

A. Watermark embedding scheme

The low-frequency wavelet coefficients of an audio signal contain most of the energy. Psycho-acoustic and linguistic-acoustic tests of the human ear's characteristics show that, although most of the signal energy is contained in the low-frequency components, their contribution to intelligibility is not great. Taking channel attacks such as compression and filtering into account, the high-frequency components are easily destroyed or lost, which is harmful to the digital watermark. Therefore, we embed the digital watermark into the low-frequency wavelet coefficients, which both enhances the watermark's robustness and minimizes its effect on sound quality.

First, conduct an $L$-level Mallat wavelet decomposition of the original audio to obtain the wavelet coefficients $\mathbf{C}$:

$$\mathbf{C} = \{\, C_{A_L},\ C_{D_L},\ C_{D_{L-1}},\ \ldots,\ C_{D_1} \,\}$$

978-1-4577-1010-0/11/$26.00 ©2011 IEEE


where $C_{A_L}$ is the vector of low-frequency (approximation) coefficients at level $L$, and $C_{D_m}$ ($1 \le m \le L$) is the vector of detail coefficients at level $m$.

In order to effectively improve the convergence speed of neural network training, the embedded wavelet coefficients are chosen as follows: conduct an $L$-level wavelet decomposition of the signal $S$, then arrange the low-frequency coefficients so that their absolute values are in descending order. We obtain $\mathbf{C}^1$:

$$\mathbf{C}^1 = \{\, C^1_i = |C_{A_L}(I_i)|,\ 1 \le i \le M_L \,\} \qquad (1)$$

where $\mathbf{I} = \{I_i,\ 1 \le i \le M_L\}$ is the index of the coefficients ordered by absolute value in descending order. If we adopt a Daubechies-$N$ wavelet, we get $M_L = \frac{M-1}{2^L} + N - 1$. If we embed $K$ bits per frame, $K$ must satisfy $K \le M_L$. We modify $K$ coefficients from $\mathbf{C}^1$ to embed the watermark and leave the remaining coefficients unchanged. The modified coefficients $C'_{A_L}$ may be expressed as:

$$C'_{A_L}(\mathbf{I}) = \operatorname{sgn}(C_{A_L}(\mathbf{I})) \cdot \mathbf{C}^1 \cdot (1 + \alpha \mathbf{W}) \qquad (2)$$

where $\alpha$ is the embedding strength, which controls the trade-off between robustness and transparency. Reconstruct the wavelet coefficients with the modified $C'_{A_L}$:

$$\mathbf{C}' = \{\, C'_{A_L},\ C_{D_L},\ C_{D_{L-1}},\ \ldots,\ C_{D_1} \,\} \qquad (3)$$

The inverse wavelet transform then restores the watermarked signal $S'$.
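The embedding steps above can be sketched in Python. This is a minimal illustration under simplifying assumptions, not the paper's implementation: a hand-rolled one-level Haar transform stands in for the Daubechies-N Mallat decomposition, and all function and variable names (`haar_dwt`, `embed`, `half`, etc.) are invented for the example.

```python
import math

def haar_dwt(x):
    # One analysis level: approximation (a) and detail (d) coefficients.
    a = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return a, d

def haar_idwt(a, d):
    # Exact inverse of haar_dwt.
    x = []
    for ai, di in zip(a, d):
        x.append((ai + di) / math.sqrt(2))
        x.append((ai - di) / math.sqrt(2))
    return x

def embed(signal, watermark, L=2, alpha=0.05):
    # L-level decomposition: keep each detail band, descend into the
    # approximation band (Mallat structure of the coefficient set C).
    details = []
    a = list(signal)
    for _ in range(L):
        a, d = haar_dwt(a)
        details.append(d)
    # Eq. (1): index the approximation coefficients by |value|, descending.
    order = sorted(range(len(a)), key=lambda i: -abs(a[i]))
    # Eq. (2): C' = sgn(C) * |C| * (1 + alpha * w), with w in {-1, +1}.
    for k, w in enumerate(watermark):
        i = order[k]
        a[i] = math.copysign(abs(a[i]) * (1 + alpha * w), a[i])
    # Eq. (3): reconstruct with the modified approximation band.
    for d in reversed(details):
        a = haar_idwt(a, d)
    return a, order
```

The index list `order` plays the role of the side information $\mathbf{I}$ that the receiver needs for extraction.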

B. Neural network design

The BP neural network has good nonlinear approximation ability. By adjusting the network weights and biases, it can establish the relationship between the original and watermarked wavelet coefficients and reflect the changes in audio features before and after embedding. Owing to the use of the neural network, we can extract the watermark without the original signal, which reduces the limitations in practical applications.

Using a neural network based on the BP algorithm raises the problem of determining the optimal network structure, namely how to choose the number of layers and neurons. Without an appropriate network, it is difficult to significantly improve network performance even if many improvements are made to the training algorithm; and there is no general theory to solve this problem so far.

In 1987, Hecht-Nielsen proved that any continuous function on a closed interval can be approximated by a network with one hidden layer in which each node has a different threshold [7]. Therefore, a three-layer BP neural network can approximate an arbitrary mapping from $n$ dimensions to $m$ dimensions. This paper thus adopts a three-layer BP network: input layer, hidden layer, and output layer. In order to enhance resistance to interference in the channel, the input layer adopts 9 neurons; the input training set is:

$$\mathbf{T}^{in} = \{\, \mathbf{T}^{in}(i),\ 5 \le i \le M_L - 4 \,\}, \qquad K = M_L - 8 \qquad (4)$$

where:

$$\mathbf{T}^{in}(i) = \{\, C^1_{i-4},\ C^1_{i-3},\ C^1_{i-2},\ C^1_{i-1},\ C^1_i,\ C^1_{i+1},\ C^1_{i+2},\ C^1_{i+3},\ C^1_{i+4} \,\} \qquad (5)$$

The output layer adopts one neuron; the output training set is:

$$\mathbf{T}^{out} = \{\, \mathbf{T}^{out}(i),\ 5 \le i \le M_L - 4 \,\} \qquad (6)$$

where:

$$\mathbf{T}^{out}(i) = \{\, C^1_i \,\} \qquad (7)$$
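As a sketch of how the training pairs of Eqs. (4)-(7) line up (hypothetical Python; the function name is invented, and the half-window size of 4 follows the 9-neuron input layer described above):

```python
def build_training_set(c1, half=4):
    """Build (T_in, T_out) pairs from the sorted coefficient list c1.

    Eqs. (4)-(7): each input is a 9-sample window C^1_{i-4}..C^1_{i+4},
    and the target output is the window's centre coefficient C^1_i.
    """
    t_in, t_out = [], []
    for i in range(half, len(c1) - half):   # 5 <= i <= M_L - 4 (1-based)
        t_in.append(c1[i - half:i + half + 1])
        t_out.append(c1[i])
    return t_in, t_out
```

Note that the windows drop 4 coefficients at each end, which is why $K = M_L - 8$ pairs result.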

C. Watermark extraction scheme

After the neural network weight vector and $\mathbf{I}$ are sent to the receiver as side information, the watermark can be extracted with the trained BP network. First, conduct an $L$-level wavelet decomposition of the received signal $S''$ to obtain the wavelet coefficients $\mathbf{C}''$:

$$\mathbf{C}'' = \{\, C''_{A_L},\ C''_{D_L},\ C''_{D_{L-1}},\ \ldots,\ C''_{D_1} \,\} \qquad (8)$$

We use the low-frequency wavelet coefficients $C''_{A_L}$ and $\mathbf{I}$ to form $\mathbf{C}^2$:

$$\mathbf{C}^2 = \{\, C^2_i = |C''_{A_L}(I_i)|,\ 1 \le i \le M_L \,\} \qquad (9)$$

The neural network input vectors are:

$$\mathbf{D}^{in} = \{\, \mathbf{D}^{in}(i),\ 5 \le i \le M_L - 4 \,\}, \qquad K = M_L - 8 \qquad (10)$$

$$\mathbf{D}^{in}(i) = \{\, C^2_{i-4},\ C^2_{i-3},\ C^2_{i-2},\ C^2_{i-1},\ C^2_i,\ C^2_{i+1},\ C^2_{i+2},\ C^2_{i+3},\ C^2_{i+4} \,\} \qquad (11)$$

The output is:

$$\mathbf{D}^{out} = \{\, \mathbf{D}^{out}(i),\ 5 \le i \le M_L - 4 \,\} \qquad (12)$$

$$\mathbf{D}^{out}(i) = \{\, \hat{C}^1_i \,\} \qquad (13)$$

We obtain $\mathbf{p} = \{p_i\}_{1}^{M_L-8}$, where:

$$p_i = \frac{C^2_{i+4} - \hat{C}^1_{i+4}}{\alpha \cdot \hat{C}^1_{i+4}}, \qquad 1 \le i \le M_L - 8 \qquad (14)$$

Thus the extracted watermark $\mathbf{W}'$ is:

$$\mathbf{W}' = \operatorname{sgn}(\mathbf{p}) \qquad (15)$$

After decoding the watermark $\mathbf{W}'$ at a very low rate, the hidden secret audio can be restored.
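Eqs. (14)-(15) can be sketched as follows. This is a minimal illustration with assumed names: `c2` holds the received magnitudes $C^2$, `c1_hat` the network's estimates $\hat{C}^1$, and the $\pm 4$ boundary offset of Eq. (14) is omitted for clarity.

```python
def extract_watermark(c2, c1_hat, alpha):
    # Eq. (14): relative deviation of the received magnitude from the
    # network's estimate of the original magnitude.
    p = [(r - e) / (alpha * e) for r, e in zip(c2, c1_hat)]
    # Eq. (15): each watermark bit is the sign of the deviation.
    return [1 if v >= 0 else -1 for v in p]
```

If the network estimate is exact and the channel is clean, $p_i = w_i$ exactly, since the embedding multiplied each magnitude by $(1 + \alpha w_i)$.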

III. THE NEW FAST-CONVERGENCE TRAINING ALGORITHM

A. Basic BP neural network training problems

A good watermarking system should ensure both high precision of the network's convergence and high training speed. However, basic BP network training has some problems that make the training of the network weights inefficient:

(1) Training time is long, mainly due to the small learning rate.


(2) Training may almost stall. In the training process, if the weights are too large, the net input $n^m$ falls into the activation function's saturated zone, which makes its derivative very small and brings the weight-adjustment process almost to a halt.

(3) Local minima. The BP algorithm can make the network weights converge to a solution, but it does not guarantee a global minimum on the error surface; the result may be only a local minimum.

For the first problem, a varying or adaptive learning rate can be adopted. To avoid the second, smaller initial weights or a smaller learning rate can be selected, which again increases training time. For the third, better results may be obtained by using more layers or more neurons, but this also increases network complexity and training time.

Currently there are two major heuristic BP network training methods: additional momentum method and adaptive learning rate method.

B. Improved BP neural network

The additional momentum method and the adaptive learning rate method have improved the BP algorithm considerably and achieved good results. However, our watermarking system needs an even more efficient training algorithm that converges fast and reaches the minimum error quickly, so we adopt a fast-convergence algorithm based on further research and improvement. For multi-layer network training, the performance index $F(\mathbf{x})$ can be expressed as the sum of squared errors over $Q$ sample pairs:

$$F(\mathbf{x}) = \sum_{q=1}^{Q} (\mathbf{t}_q - \mathbf{a}_q)^T(\mathbf{t}_q - \mathbf{a}_q) = \sum_{q=1}^{Q} \sum_{j=1}^{S^M} e_{j,q}^2 = \sum_{i=1}^{N} v_i^2(\mathbf{x}) = \mathbf{v}^T(\mathbf{x})\,\mathbf{v}(\mathbf{x}) \qquad (16)$$

$$\mathbf{x} = [\,x_1\ x_2\ \ldots\ x_n\,]^T = [\,w^1_{1,1}\ w^1_{1,2}\ \ldots\ w^1_{S^1,R}\ b^1_1\ \ldots\ b^1_{S^1}\ w^2_{1,1}\ \ldots\ b^M_{S^M}\,]^T \qquad (17)$$

$$n = S^1(R+1) + S^2(S^1+1) + \cdots + S^M(S^{M-1}+1) \qquad (18)$$

$$\mathbf{v}(\mathbf{x}) = [\,v_1\ v_2\ \ldots\ v_N\,]^T = [\,e_{1,1}\ e_{2,1}\ \ldots\ e_{S^M,1}\ e_{1,2}\ \ldots\ e_{S^M,Q}\,]^T \qquad (19a)$$

$$N = Q \times S^M \qquad (19b)$$

where $e_{j,q}$ is the error of the $j$th element of the $q$th sample. According to the Newton algorithm, the iterative relation is:

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{A}_k^{-1}\mathbf{g}_k \qquad (20)$$

where $\mathbf{A}_k \equiv \nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}$ and $\mathbf{g}_k \equiv \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}$. From formula (16), the gradient is:

$$\nabla F(\mathbf{x}) = 2\mathbf{J}^T(\mathbf{x})\,\mathbf{v}(\mathbf{x}) \qquad (21)$$

where $\mathbf{J}(\mathbf{x})$ is the Jacobian matrix:

$$\mathbf{J}(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial v_1}{\partial x_1} & \dfrac{\partial v_1}{\partial x_2} & \cdots & \dfrac{\partial v_1}{\partial x_n} \\ \dfrac{\partial v_2}{\partial x_1} & \dfrac{\partial v_2}{\partial x_2} & \cdots & \dfrac{\partial v_2}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial v_N}{\partial x_1} & \dfrac{\partial v_N}{\partial x_2} & \cdots & \dfrac{\partial v_N}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial e_{1,1}}{\partial w^1_{1,1}} & \dfrac{\partial e_{1,1}}{\partial w^1_{1,2}} & \cdots & \dfrac{\partial e_{1,1}}{\partial w^1_{S^1,R}} & \dfrac{\partial e_{1,1}}{\partial b^1_1} & \cdots \\ \vdots & \vdots & & \vdots & \vdots & \\ \dfrac{\partial e_{S^M,1}}{\partial w^1_{1,1}} & \dfrac{\partial e_{S^M,1}}{\partial w^1_{1,2}} & \cdots & \dfrac{\partial e_{S^M,1}}{\partial w^1_{S^1,R}} & \dfrac{\partial e_{S^M,1}}{\partial b^1_1} & \cdots \\ \dfrac{\partial e_{1,2}}{\partial w^1_{1,1}} & \dfrac{\partial e_{1,2}}{\partial w^1_{1,2}} & \cdots & \dfrac{\partial e_{1,2}}{\partial w^1_{S^1,R}} & \dfrac{\partial e_{1,2}}{\partial b^1_1} & \cdots \\ \vdots & \vdots & & \vdots & \vdots & \end{bmatrix} \qquad (22)$$

From formula (21), the Hessian matrix can be expressed as:

$$\nabla^2 F(\mathbf{x}) = 2\mathbf{J}^T(\mathbf{x})\mathbf{J}(\mathbf{x}) + 2\mathbf{S}(\mathbf{x}) \qquad (23)$$

where:

$$\mathbf{S}(\mathbf{x}) = \sum_{i=1}^{N} v_i(\mathbf{x})\,\nabla^2 v_i(\mathbf{x}) \qquad (24)$$

$\mathbf{S}(\mathbf{x})$ is generally very small and can be neglected. Substituting $\nabla F(\mathbf{x})$ and $\nabla^2 F(\mathbf{x})$ into Newton's equation, we get:

$$\mathbf{x}_{k+1} = \mathbf{x}_k - [\mathbf{J}^T(\mathbf{x}_k)\mathbf{J}(\mathbf{x}_k)]^{-1}\mathbf{J}^T(\mathbf{x}_k)\,\mathbf{v}(\mathbf{x}_k) \qquad (25)$$

This is the Gauss-Newton method for the sum-of-squared-errors criterion, which avoids the difficulty of calculating second derivatives. The remaining problem is that $\mathbf{H} = \mathbf{J}^T\mathbf{J}$ may be singular, i.e., its eigenvalues may not satisfy $\{\lambda_i\}_1^n > 0$. We therefore improve the equations above: add a correction factor $\mu$ to the eigenvalues of $\mathbf{H}$ so that $\{\lambda_i + \mu\}_1^n > 0$. The matrix is then invertible, and the eigenvectors of $\mathbf{H}$ remain the same before and after the correction. Substituting into the Gauss-Newton formula, we get:

$$\mathbf{x}_{k+1} = \mathbf{x}_k - [\mathbf{J}^T(\mathbf{x}_k)\mathbf{J}(\mathbf{x}_k) + \mu_k \mathbf{I}]^{-1}\mathbf{J}^T(\mathbf{x}_k)\,\mathbf{v}(\mathbf{x}_k) \qquad (26)$$

The iterative algorithm of formula (26) is clearly a generalization of the Gauss-Newton method: when $\mu_k = 0$, it reduces to the Gauss-Newton method. $\mu_k$ takes a small value at the start of iteration. If an iteration fails to reduce $F(\mathbf{x})$, multiply $\mu_k$ by a factor greater than 1 and iterate again; at this point the method is equivalent to the steepest descent method. If $F(\mathbf{x})$ decreases, divide $\mu_k$ by $\xi$ and iterate again; at this point the algorithm is equivalent to the Gauss-Newton method. Our algorithm is therefore a compromise between the steepest descent method and the Gauss-Newton method.

Thus the remaining question is how to calculate $\mathbf{J}$. The elements of the matrix are:

$$[\mathbf{J}]_{h,l} = \frac{\partial v_h}{\partial x_l} = \frac{\partial e_{k,q}}{\partial w^m_{i,j}} = \frac{\partial e_{k,q}}{\partial n^m_{i,q}} \cdot \frac{\partial n^m_{i,q}}{\partial w^m_{i,j}} = \tilde{s}^m_{i,h} \times a^{m-1}_{j,q}, \qquad h = (q-1)S^M + k \qquad (27)$$

We define

$$\tilde{s}^m_{i,h} = \frac{\partial e_{k,q}}{\partial n^m_{i,q}}$$

as a new sensitivity, different from that of the basic BP algorithm. When $x_l$ is a bias value, we get:

$$[\mathbf{J}]_{h,l} = \frac{\partial e_{k,q}}{\partial b^m_i} = \frac{\partial e_{k,q}}{\partial n^m_{i,q}} \cdot \frac{\partial n^m_{i,q}}{\partial b^m_i} = \tilde{s}^m_{i,h} \qquad (28)$$

As in the basic BP algorithm, the sensitivity is back-propagated through the network. For this new sensitivity, in the last layer we get:

$$\tilde{s}^M_{i,h} = \frac{\partial e_{k,q}}{\partial n^M_{i,q}} = \frac{\partial (t_{k,q} - a_{k,q})}{\partial n^M_{i,q}} = -\frac{\partial a_{k,q}}{\partial n^M_{i,q}} = \begin{cases} -\dot{f}^M(n^M_{i,q}), & i = k \\ 0, & i \neq k \end{cases} \qquad (29)$$

Thus, when a vector is input to the network and the output obtained, the back-propagated sensitivity is initialized to:

$$\tilde{\mathbf{S}}^M_q = -\dot{\mathbf{F}}^M(\mathbf{n}^M_q) \qquad (30)$$

Its back-propagation formula is:

$$\tilde{\mathbf{S}}^m_q = \dot{\mathbf{F}}^m(\mathbf{n}^m_q)\,(\mathbf{W}^{m+1})^T\,\tilde{\mathbf{S}}^{m+1}_q \qquad (31)$$

Thus we get the whole sensitivity matrix of each layer:

$$\tilde{\mathbf{S}}^m = [\,\tilde{\mathbf{S}}^m_1\ |\ \tilde{\mathbf{S}}^m_2\ |\ \cdots\ |\ \tilde{\mathbf{S}}^m_Q\,] \qquad (32)$$

We can calculate the elements of $\mathbf{J}$ with $\tilde{\mathbf{S}}^m$ and consequently solve the key problem of the improved algorithm.

The iterative steps of the improved algorithm can be summarized as:

(1) Input the sample set; obtain the output $\mathbf{a}^m_q$ of every layer of the network and the error $\mathbf{e}_q = \mathbf{t}_q - \mathbf{a}^M_q$, and calculate the sum of squared errors over all samples, $F(\mathbf{x})$;

(2) Obtain the whole sensitivity matrix of all layers recursively and calculate the matrix $\mathbf{J}$;

(3) Calculate $\Delta\mathbf{x}_k = \mathbf{x}_{k+1} - \mathbf{x}_k$ with formula (26);

(4) Recompute the sum of squared errors with $\mathbf{x}_k + \Delta\mathbf{x}_k$. If the new sum is smaller than the one calculated in step (1), divide $\mu$ by $\theta$ ($\theta > 1$), set $\mathbf{x}_{k+1} = \mathbf{x}_k + \Delta\mathbf{x}_k$, and return to step (1); otherwise, multiply $\mu$ by $\theta$ and return to step (3).
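Steps (1)-(4) are the damped update of formula (26), i.e., a Levenberg-Marquardt loop. A minimal sketch in Python using NumPy follows; the generic residual/Jacobian callables and the toy exponential curve-fit are assumptions for illustration, not the paper's audio training setup.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x0, mu=1e-3, theta=10.0,
                        iters=50, tol=1e-10):
    """Generic loop following steps (1)-(4) above."""
    x = np.asarray(x0, dtype=float)
    v = residual(x)
    F = v @ v                       # step (1): sum of squared errors
    for _ in range(iters):
        J = jacobian(x)             # step (2): Jacobian of the residuals
        while True:
            # step (3): formula (26) with damping factor mu
            dx = np.linalg.solve(J.T @ J + mu * np.eye(len(x)), -J.T @ v)
            v_new = residual(x + dx)
            F_new = v_new @ v_new
            if F_new < F:           # step (4): accept step, shrink mu
                x, v, F = x + dx, v_new, F_new
                mu /= theta
                break
            mu *= theta             # reject: grow mu (toward steepest descent)
            if mu > 1e12:
                return x
        if F < tol:
            break
    return x

# Toy usage: recover a = 0.7 in y = exp(a*t) from 20 noiseless samples.
t = np.linspace(0.0, 2.0, 20)
y = np.exp(0.7 * t)
res = lambda x: np.exp(x[0] * t) - y
jac = lambda x: (t * np.exp(x[0] * t)).reshape(-1, 1)
a_hat = levenberg_marquardt(res, jac, [0.0])
```

Large `mu` makes the step a short steepest-descent move; small `mu` recovers the Gauss-Newton step, matching the compromise described above.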

IV. PERFORMANCE ANALYSIS

A. Performance analysis of neural network training

Set the training goal to a sum of squared errors $SSE \le 10^{-8}$. The size of the training audio is 739, with 512 points per frame. From each frame we get a training set, a collection of inputs and outputs containing 32 samples. To compare our improved algorithm with the basic BP algorithm, we use a 9-12-1 BP network trained on the same training set. The training steps and time that the various algorithms require for convergence are shown in Table 1, from which we can see that our improved algorithm achieves good performance in both training steps and convergence time.

For a public audio clip, the convergence of the basic BP algorithm and of our improved algorithm, together with the relationship between the network output and the output training set $\mathbf{T}^{out}$, are shown in Figures 1 and 2. It is clear that our improved algorithm converges very fast and with very high precision.

Figure 1. Basic BP algorithm

Figure 2. Our improved algorithm

B. Performance Analysis of Audio Watermarking

The quality of the watermarked audio reflects the algorithm's transparency and performance. A frame of audio was chosen for embedding and extracting the watermark with our improved algorithm. The waveforms of the original frame $S$ and the watermarked frame $S'$ are shown in Figure 3. We measure the similarity of the original and extracted watermarks by calculating the normalized correlation coefficient $\rho$ with formula (33). The waveforms of the original audio $S$ and the watermarked audio $S'$ are shown in Figure 4.

$$\rho(\mathbf{W}, \mathbf{W}') = \frac{\mathbf{W} \cdot \mathbf{W}'^{\,T}}{\sqrt{(\mathbf{W} \cdot \mathbf{W}^T)(\mathbf{W}' \cdot \mathbf{W}'^{\,T})}} \qquad (33)$$
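For ±1 watermark sequences, formula (33) is an inner product normalized by the two sequences' norms; a minimal sketch (the helper name is invented):

```python
import math

def normalized_correlation(w, w_prime):
    # Eq. (33): inner product of the two watermarks, normalized by
    # the product of their Euclidean norms.
    num = sum(a * b for a, b in zip(w, w_prime))
    den = math.sqrt(sum(a * a for a in w) * sum(b * b for b in w_prime))
    return num / den
```

Identical ±1 watermarks give $\rho = 1$; each flipped bit in a length-$n$ watermark lowers $\rho$ by $2/n$.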


Figure 3. Waveform comparison before and after embedding the watermark in a frame

Figure 4. Waveform comparison before and after embedding the watermark for an audio clip

Comparisons of training steps, convergence time, error detection bits and SNR are summarized in Table 1 below:

TABLE 1. PERFORMANCE COMPARISON OF DIFFERENT ALGORITHMS

Algorithm                                    | Training steps | Convergence time (s) | Error detection bits | SNR (dB)
Basic BP                                     | 28864          | 40.53                | 0                    | 36.8
Additional momentum method                   | 15988          | 23.45                | 0                    | 36.8
Adaptive learning rate method                | 5558           | 8.35                 | 1                    | 36.8
Additional momentum + adaptive learning rate | 1744           | 2.64                 | 1                    | 36.8
Our improved algorithm                       | 7              | 0.55                 | 0                    | 34

As the signal may suffer interference and even attacks during transmission in the channel, the watermarking scheme must be robust to various effects. We tested the normalized correlation coefficient $\rho$ under a variety of channel attacks; the results are shown in Tables 2 to 4. The larger $\rho$ is, the better the performance and the stronger the robustness. From the tables, we can see that our watermarking scheme achieves good performance under a variety of attacks.

TABLE 2. RESISTANCE TO COMPRESSION

Compression   | ρ
24 kb/s ADPCM | 0.9808
32 kb/s ADPCM | 0.9952
40 kb/s ADPCM | 0.9957

TABLE 3. RESISTANCE TO MULTIPLE ATTACKS

Channel condition    | ρ
No attack            | 1
Median filter        | 0.9788
Resampling           | 0.9753
Gaussian white noise | 0.9576

TABLE 4. RESISTANCE TO LOW-PASS FILTERING

Low-pass filter cutoff | ρ
≤ 3000 Hz              | 1
≤ 2500 Hz              | 0.9942
≤ 2000 Hz              | 0.9909
≤ 1500 Hz              | 0.9861

V. CONCLUSION

A novel audio watermarking scheme based on the wavelet transform and a BP neural network is proposed. The technique takes advantage of the time-frequency characteristics of the wavelet transform and the adaptive nature of the BP neural network. Simulation results show that the proposed method achieves good performance on the tested audio and is robust to various attacks. In addition, the structure and weights of the neural network can be transmitted separately as side information through an independent hidden channel; this is the key that controls watermark extraction and also ensures the security of the system.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (Grant No. 61072042).

VI. REFERENCES

[1] Xiaohong Ma, Linlin Zhao, "A robust audio watermarking method based on QR decomposition and lifting wavelet transform," Journal of Dalian University of Technology, vol. 50, no. 2, pp. 278-282, March 2010.

[2] Wen-Nung Lie, Li-Chun Chang, "Robust and high-quality time-domain audio watermarking based on low-frequency amplitude modification," IEEE Transactions on Multimedia, pp. 46-59, August 2006.

[3] F. A. P. Petitcolas, R. J. Anderson, M. G. Kuhn, "Information hiding: a survey," Proceedings of the IEEE, pp. 1062-1077, 1999.

[4] Mehdi Fallahpour, David Megias, "High capacity audio watermarking using FFT amplitude interpolation," IEICE Electronics Express, pp. 1057-1063, June 2009.

[5] Yiping Ma, Jigin Han, "Audio watermarking in DCT: embedding strategy and algorithm," Chinese Journal of Electronics, pp. 1260-1264, 2006.

[6] Qin He, Huaxing Zou, Jian Bai, "An audio hiding algorithm based on DWT," Application Research of Computers, pp. 118-119, 2005.

[7] R. Hecht-Nielsen, "Counterpropagation networks," Applied Optics, pp. 4976-4984, 1987.