978-1-4577-1207-4/11/$26.00 ©2011 IEEE


2011 International Conference on Advanced Technologies for Communications (ATC 2011)

Assignment Problem-based Approach for Solving Permutation Ambiguity in Frequency Domain Convolutive Source Separation

Vuong Hoang Nam 1), Nguyen Quoc Trung 1), Tran Hoai Linh 2)

1) School of Electronics and Telecommunications 2) School of Electrical Engineering

Hanoi University of Science and Technology

No.1, Dai Co Viet Street, Hanoi, Vietnam

E-mail: [email protected]

Abstract: In this paper, we propose an effective method for blind source separation of convolutive mixtures in the frequency domain. The main difficulty in a frequency-domain approach is the permutation problem. In the proposed method, we use the Assignment Problem (AP) approach to solve the permutation ambiguity in the frequency domain. We apply different algorithms to solve the AP and find the optimal solution. Furthermore, by using a simple check procedure, the algorithm only has to solve the permutation problem in a few frequency bins instead of all of them. The method is shown to be computationally more efficient than previous methods using the AP approach. Computer simulation experiments with speech data are presented to illustrate the proposed method.

I. INTRODUCTION

Blind Source Separation (BSS) is an approach to estimating the original source signals using only the mixed signals observed at a sensor array. If the source signals are mutually independent and non-Gaussian, we can apply Independent Component Analysis (ICA) to solve a BSS problem [1]. Let us formulate the BSS model of convolutive mixtures. Suppose N original sources are blindly mixed and observed at N sensors. In our case, as in most ICA applications, the channel transfer function is modeled by a causal FIR filter, and the relation between the observations and the sources in the time domain is as follows:

$$x_i(n) = \sum_{j=1}^{N} \sum_{k \ge 0} h_{ij}(k)\, s_j(n-k), \quad \forall i = 1, \dots, N \tag{1}$$

where $x_i(n)$ is the observation at the ith sensor, $s_j(n)$ is the jth source, and $h_{ij}(k)$ is the channel transfer function between the jth source and the ith sensor. In our model, the noise is assumed to be negligible and the sources are assumed to be stationary.
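As an illustration of model (1), the following minimal Python sketch mixes sources through causal FIR filters. The function name `convolutive_mix`, the nested-list layout of the filters and the toy signals are assumptions made for this example only; they do not come from the experiments reported later.

```python
def convolutive_mix(sources, filters):
    """Mix sources with causal FIR filters, as in (1).

    sources: list of N equal-length lists of samples s_j(n).
    filters: filters[i][j] is the list of taps h_ij(0), h_ij(1), ...
             (hypothetical layout chosen for this sketch).
    Returns the list of sensor signals x_i(n).
    """
    n_samples = len(sources[0])
    mixed = []
    for row in filters:                      # one row of filters per sensor i
        x_i = [0.0] * n_samples
        for s_j, taps in zip(sources, row):  # sum over sources j
            for k, tap in enumerate(taps):   # sum over filter taps k
                for n in range(k, n_samples):
                    x_i[n] += tap * s_j[n - k]
        mixed.append(x_i)
    return mixed


# Toy example: two impulses mixed by short filters.
s = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 2.0, 0.0, 0.0]]
h = [[[1.0, 0.5], [0.0, 0.25]],
     [[0.3], [1.0, -1.0]]]
x = convolutive_mix(s, h)
```

Each sensor signal is a sum of filtered sources, which is why a single unmixing matrix is not sufficient in the time domain.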

There are two major approaches to separating convolutive mixtures using ICA, each of which has advantages and disadvantages [2]. The first is Time Domain (TD) ICA, which is applied directly to the convolutive mixtures. We can separate the mixtures by estimating a set of unmixing FIR filters, as expressed by the following equation:

$$y_i(n) = \sum_{j=1}^{N} \sum_{k} w_{ij}(k)\, x_j(n-k), \quad \forall i = 1, \dots, N \tag{2}$$

The FIR filter used in separation can be either noncausal [3] or causal [4], depending on the method. This approach achieves a good result once the algorithm converges, but it is computationally expensive because it deals with convolution operations.

The second method is Frequency Domain (FD) ICA, where the convolutive mixtures are first converted to the frequency domain and ICA is then applied to each frequency bin. Each bin can be treated as an instantaneous mixture, since convolution in the time domain corresponds to multiplication in the frequency domain. This method is quite simple, but the permutation and scaling problem is a big challenge, since different frequency bands may have different permutations and scalings [5]-[7].

In this paper, the Assignment Problem (AP) approach to solving the permutation ambiguity in the frequency domain is exploited, and efficient algorithms for it are proposed. The paper is organized as follows. After the introduction in Section I, Section II presents the problem of frequency domain separation. Section III presents the proposed approach. Section IV shows experimental results and discussion, and the last section gives the conclusions.

II. FREQUENCY DOMAIN ICA

A. Frequency Domain ICA

Smaragdis [8] proposed applying the Short Time Fourier Transform (STFT) to convolutive mixtures. Using the STFT, the time-domain observed signals are transformed into frequency-domain signals. Let $x(n)$ be a digital signal and $x(m, \tau) = \phi(m)\, x(m + \tau R)$ the windowed and time-shifted version of the signal. The STFT of $x(n)$ is given by

$$X(\omega_k, \tau) = \sum_{m=0}^{K-1} x(m, \tau)\, e^{-j \omega_k m} \tag{3}$$

where $\omega_k = 2\pi k / K$ is the discrete normalized frequency, $k = 0, \dots, K-1$ is the bin index, $\tau = 0, \dots, L-1$ is the frame index, $\phi(m)$ is a window of length K (a Hanning window in our case), and R is the frame shift of the transform. The convolutive BSS problem in the time domain is thereby converted into multiple instantaneous problems with complex-valued data, one per frequency bin. For a fixed frequency $\omega_k$, the instantaneous linear mixture can be expressed by the equation


$$X_i(\omega_k, \tau) = \sum_{j=1}^{N} H_{ij}(\omega_k)\, S_j(\omega_k, \tau) \tag{4}$$

where $X_i(\omega_k, \tau)$, $S_j(\omega_k, \tau)$ and $H_{ij}(\omega_k)$ are the STFTs of $x_i(n)$, $s_j(n)$ and $h_{ij}(k)$, respectively, at the kth frequency bin. The BSS model is then converted into the frequency domain:

$$X(\omega_k, \tau) = H(\omega_k)\, S(\omega_k, \tau) \tag{5}$$

where $H(\omega_k)$ is the N×N mixing matrix in the kth frequency bin, and $X(\omega_k, \tau)$, $S(\omega_k, \tau)$ are the time-frequency representations of the observed and source signals, respectively. The estimated signals become

$$Y(\omega_k, \tau) = W(\omega_k)\, X(\omega_k, \tau) \tag{6}$$

where $Y = [Y_1, \dots, Y_N]^T$ and $W(\omega_k)$ is the N×N unmixing matrix in the kth frequency bin.
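The transform in (3) can be sketched directly in Python. This is a plain, unoptimized evaluation of the sum rather than an FFT, and the periodic form of the Hann window is an assumption, since the text only states that a Hanning window of length K is used. The names `hann` and `stft` are chosen for this example.

```python
import cmath
import math


def hann(K):
    # Periodic Hann window (assumed form; the text only says "Hanning").
    return [0.5 - 0.5 * math.cos(2 * math.pi * m / K) for m in range(K)]


def stft(x, K, R):
    """Evaluate (3): X[k][tau] = sum_m phi(m) x(m + tau*R) exp(-j*w_k*m)."""
    phi = hann(K)
    n_frames = (len(x) - K) // R + 1          # only complete frames are used
    X = [[0j] * n_frames for _ in range(K)]
    for tau in range(n_frames):
        frame = [phi[m] * x[m + tau * R] for m in range(K)]
        for k in range(K):
            w_k = 2 * math.pi * k / K         # discrete normalized frequency
            X[k][tau] = sum(frame[m] * cmath.exp(-1j * w_k * m)
                            for m in range(K))
    return X


# A cosine that falls exactly on bin 2 of an 8-point transform.
signal = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
X = stft(signal, K=8, R=4)
```

Each row `X[k]` is the complex time series of one frequency bin, which is the input to the per-bin ICA step described next.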

B. The Complex ICA algorithm

To estimate the unmixing matrix for each frequency bin, we use the complex version of FastICA for complex signals, under the instantaneous ICA model [9]. The algorithm uses a deflation scheme to search for the extrema of $E\{G(|w^H x|^2)\}$, where G is a contrast function. Different choices of G have been suggested in [9]. A good contrast function is one for which the resulting estimator is robust to outliers in the sample values. In our work, we choose $G(t) = \log(0.1 + t)$ as the contrast function. The robustness of the estimator is captured in the slow growth of G as its argument increases [9].
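To make the role of the contrast function concrete, the sketch below implements G(t) = log(0.1 + t), its first two derivatives, and a single fixed-point step for one unmixing vector, following the update rule of [9] as we understand it. It assumes the observations have already been whitened and omits the deflation (orthogonalization) between components. The names `one_unit_update` and `samples` are hypothetical.

```python
import math

A = 0.1  # constant in G(t) = log(0.1 + t), as chosen in the text


def G(t):
    return math.log(A + t)


def g(t):
    return 1.0 / (A + t)            # first derivative of G


def dg(t):
    return -1.0 / (A + t) ** 2      # second derivative of G


def one_unit_update(w, samples):
    """One fixed-point step for a single complex unmixing vector w.

    samples: list of observation vectors (lists of complex numbers),
             assumed to be whitened beforehand.
    """
    n = len(w)
    acc = [0j] * n
    scale = 0.0
    for x in samples:
        y = sum(wi.conjugate() * xi for wi, xi in zip(w, x))  # y = w^H x
        t = abs(y) ** 2
        for i in range(n):
            acc[i] += x[i] * y.conjugate() * g(t)
        scale += g(t) + t * dg(t)
    count = len(samples)
    w_new = [acc[i] / count - (scale / count) * w[i] for i in range(n)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in w_new))
    return [c / norm for c in w_new]                          # renormalize


samples = [[1 + 0j, 0j], [0j, 2j], [1j, 1 + 0j], [-1 + 0j, 0.5j]]
w = one_unit_update([1 + 0j, 0j], samples)
```

Because g decays as 1/(0.1 + t), samples with very large |y| contribute little to the update, which is the robustness property mentioned above.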

III. THE PROPOSED METHOD

A. Permutation check procedure

In a single frequency bin k, the separation matrix $W_k$ obtained by the ICA process is first checked for a possible permutation and then corrected if necessary. If the resolution in the frequency domain is high enough, the separation matrices in consecutive frequency bins tend to converge to the same permutation order, which means the permutation-solving step can be skipped [10]. In this way, we only have to solve the permutation problem in a few frequency bins instead of all of them. The final separation matrix, which is free of permutation, is denoted by $\hat{W}_k$. In the next frequency bin, the ICA step is initialized with $\hat{W}_k$. The iteration over bins can start from the highest frequency, which runs slightly faster than starting from the lowest. We employ the distance between matrices, $D = \sum_{i,j} |W_k(i,j) - \hat{W}_{k-1}(i,j)|$, as the criterion to determine whether a possible permutation exists:

$$D \le \varepsilon \Rightarrow \text{not permuted}, \qquad D > \varepsilon \Rightarrow \text{possibly permuted}$$

where $\varepsilon$ is a suitable threshold. If the distance is at or below the threshold, no permutation change is made, and the separated signals in this frequency bin are sent to the rescaling stage, where the scaling problem is solved. If it is above the threshold, a permutation correction may be needed. In this case, after the rescaling stage, we use the AP approach to correct the permutation.
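The check above amounts to an entrywise distance followed by a threshold test. A minimal Python sketch is given below; the function names and the example matrices are assumptions for illustration, and the default threshold of 0.2 is the value reported in the experiments section.

```python
def matrix_distance(W_k, W_prev):
    """Sum of absolute entrywise differences between two complex matrices."""
    return sum(abs(a - b)
               for row_a, row_b in zip(W_k, W_prev)
               for a, b in zip(row_a, row_b))


def possibly_permuted(W_k, W_prev, eps=0.2):
    """True when the distance exceeds the threshold eps."""
    return matrix_distance(W_k, W_prev) > eps


W_prev = [[1 + 0j, 0.1j], [0.1 + 0j, 1 + 0j]]
W_close = [[1.02 + 0j, 0.1j], [0.1 + 0j, 0.98 + 0j]]   # small drift only
W_swapped = [W_prev[1], W_prev[0]]                      # rows exchanged

needs_ap_close = possibly_permuted(W_close, W_prev)
needs_ap_swapped = possibly_permuted(W_swapped, W_prev)
```

Only bins for which the test returns True are passed to the assignment step, which is where the computational saving comes from.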

B. Scaling problem

For the scaling problem (in the rescaling stage), the method presented in [11] is applied.

After the complex ICA algorithm is applied, we have an estimated time sequence whose components are mutually independent for each frequency $\omega$:

$$Y(\omega, \tau) = B(\omega)\, X(\omega, \tau) \tag{8}$$

To solve the scaling problem, we decompose the spectrograms using the independent components at each frequency channel. Let us define the split spectrograms by

$$U(\omega, \tau; i) = B(\omega)^{-1} \left[0, \dots, 0, Y_i(\omega, \tau), 0, \dots, 0\right]^T \tag{9}$$

where the index i denotes the dependence of the spectrograms at $\omega$ on the ith independent component of $Y(\omega, \tau)$. Note that i is implicitly a function of the frequency $\omega$, i.e. $i = i(\omega)$. To obtain the outputs of the rescaling stage $U(\omega, \tau; i)$, we apply both $B(\omega)$ and $B(\omega)^{-1}$, and therefore $U(\omega, \tau; i)$ has no scaling ambiguity.
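The reason (9) removes the scaling ambiguity can be checked numerically: multiplying row i of B by any nonzero constant scales the ith output by that constant and column i of the inverse by its reciprocal, so the product is unchanged. The sketch below does this for N = 2 at a single time-frequency point; the helper names and the numerical values are assumptions for this example.

```python
def inverse_2x2(B):
    (a, b), (c, d) = B
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def split_spectrogram(B, x, i):
    """Evaluate (9) for N = 2: B^{-1} applied to a vector holding only Y_i."""
    y = [B[r][0] * x[0] + B[r][1] * x[1] for r in range(2)]   # Y = B X
    B_inv = inverse_2x2(B)
    return [B_inv[r][i] * y[i] for r in range(2)]


B = [[1 + 1j, 0.5 + 0j], [0.2j, 2 + 0j]]
x = [1 - 1j, 0.3 + 2j]

# Same separation, but with row 0 arbitrarily rescaled by 3j.
B_scaled = [[3j * B[0][0], 3j * B[0][1]], B[1]]

u0 = split_spectrogram(B, x, 0)
u0_scaled = split_spectrogram(B_scaled, x, 0)
u1 = split_spectrogram(B, x, 1)
```

Here `u0` and `u0_scaled` agree, and `u0` plus `u1` reproduces the observation `x`, since the split spectrograms sum to B^{-1} Y = X.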

C. Permutation problem

The key issue in FD-ICA is how to solve the permutation problem. Ciaramella et al. [12] proposed applying the Assignment Problem approach to the permutation problem. To solve the AP, they used the Hungarian algorithm [13]. We note that, since the AP is a special case of the Transportation Problem, the same algorithms can be used to solve the AP. In the proposed method, we use the Jonker-Volgenant (JV) algorithm [14]. In our experiments in the next section, this algorithm proved to be uniformly faster and better than the Hungarian algorithm.

We consider the outputs of two adjacent bins as the sources in the source set A and the destinations in the destination set B of an AP. The task is then to find a perfect (one-to-one) matching between A and B such that the sum of the weights of the matching is minimized. The weights correspond to the distance between two elements. Let $x_{ij}$ be a variable which is 1 if $a_i$ in A maps to $b_j$ in B and 0 otherwise, and let $d_{ij}$ be the distance between the two elements. The mathematical model for the AP is given by the following:

$$\text{Minimize } f(A, B) = \sum_{i=1}^{N} \sum_{j=1}^{N} d_{ij}\, x_{ij} \tag{10}$$

subject to

$$\sum_{j=1}^{N} x_{ij} = 1, \quad i = 1, \dots, N; \qquad \sum_{i=1}^{N} x_{ij} = 1, \quad j = 1, \dots, N; \qquad x_{ij} \in \{0, 1\} \quad \forall i, j \tag{11}$$
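For small N, the model (10)-(11) can be illustrated by exhaustive search over all one-to-one matchings. The sketch below is only such an illustration: it is neither the Hungarian algorithm [13] nor the JV algorithm [14], and its cost grows as N!, so it is not a substitute for them. The function name `solve_assignment` and the cost values are assumptions.

```python
from itertools import permutations


def solve_assignment(cost):
    """Return (assignment, total) minimizing sum_i cost[i][assignment[i]].

    assignment[i] = j means a_i in A is matched to b_j in B, i.e. x_ij = 1.
    Exhaustive search, suitable only for very small N.
    """
    n = len(cost)

    def total(p):
        return sum(cost[i][p[i]] for i in range(n))

    best = min(permutations(range(n)), key=total)
    return list(best), total(best)


cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
assignment, total_cost = solve_assignment(cost)
```

Every permutation satisfies the constraints in (11) by construction, so the search only has to compare the objective (10).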

In the AP, the symmetric Kullback-Leibler (KL) divergence is used to evaluate the distance between bins [15]. The KL divergence between two discrete M-dimensional probability density functions $p = [p_1, \dots, p_M]$ and $q = [q_1, \dots, q_M]$ is defined as

$$KL(p, q) = \sum_{i=1}^{M} p_i \log \frac{p_i}{q_i} \tag{12}$$

[Figure: outputs Y1 and Y2 of two adjacent bands and the resulting 2×2 cost matrix with entries KL11 to KL22.]

Fig. 1. Illustration of the adjacent bands KL divergence method (N=2).

In this case, the cost matrix in the AP is an N×N matrix whose elements are defined as

$$KL_{ij} = KL\left(p^{i}_{k-1},\, p^{j}_{k}\right), \quad \forall i, j = 1, \dots, N \tag{13}$$

where the lth component of the discrete distribution $p^{i}_{k} = [p^{i}_{k,1}, \dots, p^{i}_{k,M}]$ was defined as

(14)

where $Y_i(\omega_k, l)$ is the STFT of the ith rescaled extracted source at the lth frame of the kth frequency bin and M is the total number of frames.
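A cost matrix of the form (13) can be sketched as follows. Since the expression for the distribution components is not legible here, the sketch assumes that each distribution is the sequence of per-frame magnitudes of one output, normalized to sum to one; this normalization is an assumption of the example, not a statement of the original definition. The symmetrized variant shown, KL(p, q) + KL(q, p), is likewise one common choice and may differ from the form used in [15].

```python
import math


def to_distribution(magnitudes):
    # Assumed normalization: per-frame magnitudes scaled to sum to one.
    total = sum(magnitudes)
    return [m / total for m in magnitudes]


def kl(p, q):
    """KL divergence (12); entries of q are assumed strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    # One common symmetrization (assumption).
    return kl(p, q) + kl(q, p)


def cost_matrix(prev_bin, cur_bin, divergence=kl):
    """Entry [i][j] compares output i of bin k-1 with output j of bin k."""
    P = [to_distribution(row) for row in prev_bin]
    Q = [to_distribution(row) for row in cur_bin]
    return [[divergence(p, q) for q in Q] for p in P]


# Toy envelopes over M = 4 frames; the current bin has its outputs swapped.
prev_bin = [[1.0, 4.0, 1.0, 4.0], [3.0, 1.0, 3.0, 1.0]]
cur_bin = [[3.1, 0.9, 3.0, 1.2], [1.1, 4.0, 0.9, 3.8]]
costs = cost_matrix(prev_bin, cur_bin)
```

In this toy case the off-diagonal costs are the small ones, so a minimum-cost assignment would exchange the two outputs of the current bin.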

IV. SIMULATION RESULTS AND DISCUSSIONS

In this section, some computer simulation experiments are presented to illustrate the proposed method. Experiments are performed on a PC with an Intel Core 2 Duo CPU @ 2.40GHz and 1 GB of RAM. In our experiments, we have chosen $\varepsilon = 0.2$, $K = 512$, $R = 256$. We use two methods: the first uses the JV algorithm combined with the permutation check procedure (the proposed method, or Method 1), and the second uses the JV algorithm without the permutation check procedure (Method 2, used for comparison). The two methods are compared with Ciaramella's method (Method 3) [12]. To evaluate the performance of the methods, we use the Source-to-Interference Ratio (SIR) for time-invariant filter allowed distortions [16].

In our initial experiment, we use signals from Dr. Te-Won Lee's home page [17]. These data were recorded in a real environment (all files are in 16 kHz wav format).

In the first set of mixed signals, a speaker was recorded with two distant-talking microphones in a normal office room with loud music in the background. The distance between the speaker, the cassette player and the microphones is about 60 cm in a square ordering. The two mixed signals are shown in Fig. 2-a. The results obtained after separation using Method 1 are shown in Fig. 2-b.

In the second set of mixed signals, two speakers were recorded speaking simultaneously. Speaker 1 says the digits from one to ten in English (one two three ...) and speaker 2 counts the digits at the same time in Spanish (uno dos tres ...). The recording was made in a normal office room. The distance between the speakers and the microphones is about 60 cm in a square ordering. The two mixed signals are shown in Fig. 2-c. The results obtained after separation using Method 1 are shown in Fig. 2-d.

[Figure: waveform panels (a), (b), (c) and (d).]

Fig. 2. Results obtained from the first experiment.

In this experiment, the method only has to solve the permutation problem in 376 and 387 frequency bins, instead of all 512 bins, for the first and second sets of mixed signals, respectively. A bigger threshold $\varepsilon$ could make the method run even faster, but it could also cause some permuted frequency bins to be missed.

In the second experiment, the average over 15 sets of speech utterances is used to evaluate the performance. For each set, a combination of two Vietnamese speech signals of about 5 seconds (85,000 samples) is used. The sampling rate is 16 kHz. To create convolutive mixtures of the speech signals, simulated room impulse responses are used. For the simulation of the room impulse responses, we used the "shoebox" room simulation toolbox available in [18]. The source-microphone configuration for the room impulse response simulation is shown in Fig. 3.

[Figure: layout of Source 1, Source 2, Mic. 1 and Mic. 2, with a 1.5 m distance marked. Microphones and sources are at 1.2 m height. Room size = 6.25 m x 3.75 m x 2.5 m.]

Fig. 3. Source-microphone configuration.

In this experiment, we compared the average computation time between the methods. The comparison is shown in Table 1.

TABLE 1. COMPARISON OF AVERAGE COMPUTATION TIME BETWEEN THE METHODS

Method      Time (s)
Method 1    6.90
Method 2    12.75
Method 3    13.03

TABLE 2. COMPARISON OF OBTAINED RESULTS BETWEEN THE METHODS

Method      Average SIR (dB)
Method 1    23.28
Method 2    24.09
Method 3    23.46

We also compared the average SIR between the methods. The comparison is shown in Table 2. It shows almost no difference between the results obtained by the methods. In this case, Method 2 yields a slightly higher SIR, by less than 1 dB (24.09 dB against 23.28 dB and 23.46 dB).

From the experimental results, we find that the proposed method, by concentrating on the few frequency bins that could possibly have permutations, considerably reduces the computation time (6.90 s against 12.75 s and 13.03 s) while achieving nearly the same average SIR as the conventional methods using the AP approach (Method 2 and Method 3).

V. CONCLUSION

In this paper, we have presented an FD-ICA approach, which is the most successful approach up to now, using the Assignment Problem. Furthermore, by using a simple check procedure, the algorithm only has to solve the permutation problem in a few frequency bins instead of all of them. In experiments with artificial and real speech data, the proposed method (Method 1) has been shown to be computationally more efficient than the previous algorithm using the AP approach (Method 3) [12]. We also conclude that the proposed method gives very good separation performance and shows good potential for use in real-world BSS applications. Our future work will focus on a DSP implementation of blind signal separation using the proposed method.

REFERENCES

[1] A. Hyvarinen, J. Karhunen and E. Oja, Independent Component Analysis, John Wiley and Sons Ltd, 2001.

[2] J. Benesty, S. Makino and J. Chen, Speech Enhancement. Springer, 2005.

[3] J. Thomas et al., "Time-domain fast fixed-point algorithms for convolutive ICA," IEEE Signal Processing Letters, vol. 13, no. 4, pp. 228-231, April 2006.

[4] S. Douglas et al., "Natural gradient multichannel blind deconvolution and speech separation using causal FIR filters," IEEE Transactions on Speech and Audio Processing, vol. 13, issue 1, pp. 92-104, 2004.

[5] H. Sawada et al., "A robust and precise method for solving the permutation problem of frequency-domain blind source separation," IEEE Trans. Speech Audio Process., vol. 12, no. 5, pp. 530-538, Sep. 2004.

[6] M. Z. Ikram and D. R. Morgan, "Permutation inconsistency in blind speech separation: Investigation and solutions," IEEE Trans. Speech Audio Process., vol. 13, no. 1, pp. 1-13, Jan. 2005.

[7] A. Hiroe, "Solution of permutation problem in frequency domain ICA using multivariate probability density functions," in Proc. ICA '06 (LNCS 3889), March 2006, pp. 601-608.

[8] P. Smaragdis, "Blind separation of convolved mixtures in frequency domain," Neurocomputing, No. 22, pp. 21-34, 1995.

[9] E. Bingham and A. Hyvarinen, "A fast fixed-point algorithm for independent component analysis of complex-valued signals," International Journal of Neural Systems 10, pp. 1-8, 2000.

[10] Peng Xie and Steven L. Grant, "A fast and efficient frequency-domain method for convolutive blind source separation," Region 5 IEEE Conference 2008, Kansas City MO, pp. 1-4, April 2008.

[11] N. Murata, S. Ikeda and A. Ziehe, "An approach to blind source separation based on temporal structure of speech signals," Neurocomputing, 41, 2001, pp. 1-24.

[12] A. Ciaramella and R. Tagliaferri, "Separation of Convolved Mixtures in Frequency Domain ICA," International Mathematical Forum, no. 16, pp. 769-795, 2006.

[13] J. Munkres, "Algorithms for the Assignment and Transportation Problems," Journal of the Society of Industrial and Applied Mathematics, 5(1):32-38, March 1957.

[14] R. Jonker and A. Volgenant, "A Shortest Augmenting Path Algorithm for Dense and Sparse Linear Assignment Problems," Computing 38, pp. 325-340, 1987.

[15] Don H. Johnson and Sinan Sinanovic, "Symmetrizing the Kullback-Leibler Distance," IEEE Transactions on Information Theory, 2001.

[16] Emmanuel Vincent et al., "Performance Measurement in Blind Audio Source Separation," IEEE Trans. on Audio, Speech and Language Processing, vol. 14, no. 4, pp. 1462-1469, July 2006.

[17] http://cnl.salk.edu/~tewon/Blind/blind_audio.html

[18] Douglas R. Campbell et al., "Roomsim, a Matlab simulation of shoebox room acoustics for use in teaching and research," http://media.paisley.ac.uk/~campbell/Roomsim/.
