2011 International Conference on Advanced Technologies for Communications (ATC 2011)
Assignment Problem-based Approach for Solving
Permutation Ambiguity in Frequency Domain
Convolutive Source Separation
Vuong Hoang Nam1), Nguyen Quoc Trung1), Tran Hoai Linh2)
1) School of Electronics and Telecommunications 2) School of Electrical Engineering
Hanoi University of Science and Technology
No.1, Dai Co Viet Street, Hanoi, Vietnam
E-mail: [email protected]
Abstract: In this paper, we propose an effective method for
blind source separation of convolutive mixtures in the
frequency domain. The main difficulty in a frequency approach
is the permutation problem. In the proposed method, we use the
Assignment Problem (AP) approach to solve permutation
ambiguity in the frequency domain. In our work, we apply
different algorithms to solve the AP to find the optimal solution.
Furthermore, by using a simple check procedure, the algorithm
only has to solve the permutation problem in a few frequency
bins instead of all of them. This makes the method more computationally
efficient than previous methods using the AP
approach. Computer simulation experiments with speech data
are presented to illustrate the proposed method.
I. INTRODUCTION
Blind Source Separation (BSS) estimates the original source signals using only the mixed signals observed at a sensor array. If the source signals are mutually independent and non-Gaussian, Independent Component Analysis (ICA) can be applied to solve the BSS problem [1]. Let us formulate the BSS model for convolutive mixtures. Suppose N original sources are blindly mixed and observed at N sensors. In our case, as in most ICA applications, each channel transfer function is modeled by a causal FIR filter, so the observations and the sources are related in the time domain as follows:
$$x_i(n) = \sum_{j=1}^{N} \sum_{k=0}^{\infty} h_{ij}(k)\, s_j(n-k), \quad \forall i = 1, \dots, N \qquad (1)$$
where $x_i(n)$ is the observation at the $i$th sensor, $s_j(n)$ is the $j$th source, and $h_{ij}(k)$ is the channel transfer function between the $j$th source and the $i$th sensor. In our model, the noise is assumed to be negligible and the sources are assumed to be stationary.
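The mixing model of Eq. (1) can be sketched in a few lines of plain Python. This is only a minimal illustration: the function name `convolutive_mix` and all numeric values are hypothetical and do not come from the paper.

```python
def convolutive_mix(sources, filters):
    """Mix sources through FIR channels: x_i(n) = sum_j sum_k h_ij(k) s_j(n-k).

    sources: list of N equal-length sample lists.
    filters: filters[i][j] is the list of FIR taps h_ij (illustrative values).
    """
    n_samples = len(sources[0])
    mixtures = []
    for i in range(len(filters)):
        x_i = [0.0] * n_samples
        for j in range(len(sources)):
            h_ij = filters[i][j]
            for n in range(n_samples):
                # Causal filter: only the current and past source samples contribute.
                for k in range(min(len(h_ij), n + 1)):
                    x_i[n] += h_ij[k] * sources[j][n - k]
        mixtures.append(x_i)
    return mixtures


# Toy example with N = 2 sources and 2 sensors (values are not from the paper).
s = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
h = [[[1.0, 0.5], [0.3]],
     [[0.2], [1.0, -0.5]]]
x = convolutive_mix(s, h)
```

Each sensor signal is a sum of filtered sources, which is why a single unmixing matrix is not enough in the time domain.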
There are two major approaches to separating convolutive mixtures with ICA, each with its own advantages and disadvantages [2]. The first is Time Domain (TD) ICA, which is applied directly to the convolutive mixtures. The mixtures are separated by estimating a set of unmixing FIR filters, as expressed by the following equation:
$$y_i(n) = \sum_{j=1}^{N} \sum_{k} w_{ij}(k)\, x_j(n-k), \quad \forall i = 1, \dots, N \qquad (2)$$
The FIR filter used for separation can be either noncausal [3] or causal [4], depending on the method. This approach achieves good results once the algorithm converges, but it is computationally expensive because of the convolution operations involved.
The second approach is Frequency Domain (FD) ICA, where the convolutive mixtures are first transformed to the frequency domain and ICA is then applied to each frequency bin. Each bin can be treated as an instantaneous mixture, since convolution in the time domain corresponds to multiplication in the frequency domain. This method is quite simple, but the permutation and scaling ambiguities are a major challenge, since different frequency bins may have different permutations and scalings [5]-[7].
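The convolution/multiplication correspondence that FD-ICA relies on can be checked numerically. The sketch below uses a naive DFT and circular convolution, for which the identity is exact; with windowed STFT frames it holds only approximately, when the window is long relative to the mixing filter. The helper names and signal values are illustrative assumptions.

```python
import cmath


def dft(x):
    """Naive K-point DFT: X(k) = sum_m x(m) exp(-j 2 pi k m / K)."""
    K = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / K) for m in range(K))
            for k in range(K)]


def circular_convolve(h, s):
    """Circular convolution of a short filter h with a length-K signal s."""
    K = len(s)
    return [sum(h[k] * s[(n - k) % K] for k in range(len(h))) for n in range(K)]


s = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 0.0, 3.0]   # illustrative source frame
h = [1.0, 0.5, 0.25]                             # illustrative FIR channel
h_padded = h + [0.0] * (len(s) - len(h))

lhs = dft(circular_convolve(h, s))                       # transform of the convolution
rhs = [H * S for H, S in zip(dft(h_padded), dft(s))]     # bin-wise product
max_err = max(abs(a - b) for a, b in zip(lhs, rhs))
```

Here `max_err` is at the level of floating-point rounding, so each bin sees a single complex gain per source rather than a filter.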
In this paper, the Assignment Problem (AP) approach to solving the permutation ambiguity in the frequency domain is exploited, and efficient algorithms for solving the AP are applied. The paper is organized as follows. Section II presents the problem of frequency domain separation. Section III presents the proposed approach. Section IV shows experimental results with discussion, and the last section gives the conclusions.
II. FREQUENCY DOMAIN ICA
A. Frequency Domain ICA
Smaragdis [8] proposed applying the Short Time Fourier Transform (STFT) to convolutive mixtures. Using the STFT, the time-domain observed signals are transformed into frequency-domain signals. Let $x(n)$ be a digital signal and $x(m, \tau) = \phi(m)\, x(m + \tau R)$ its windowed and time-shifted version. The STFT of $x(n)$ is given by
$$X(\omega_k, \tau) = \sum_{m=0}^{K-1} x(m, \tau)\, e^{-j \omega_k m} \qquad (3)$$
where $\omega_k = 2\pi k / K$ is the discrete normalized frequency, $k = 0, \dots, K-1$ is the bin index, and $\tau = 0, \dots, L-1$ is the frame index; $\phi(m)$ is a window of length $K$ (a Hanning window in our case) and $R$ is the frame shift. The convolutive BSS problem in the time domain is thereby converted into multiple instantaneous problems with complex-valued data, one per frequency bin. For a fixed frequency $\omega_k$, the instantaneous linear mixture can be expressed by the equation
978-1-4577-1207-4/11/$26.00 ©2011 IEEE
$$X_i(\omega_k, \tau) = \sum_{j=1}^{N} H_{ij}(\omega_k)\, S_j(\omega_k, \tau) \qquad (4)$$
where $X_i(\omega_k, \tau)$, $S_j(\omega_k, \tau)$ and $H_{ij}(\omega_k)$ are the STFTs of $x_i(n)$, $s_j(n)$ and $h_{ij}(k)$ at the $k$th frequency bin, respectively. The BSS model is then converted into the frequency domain:
$$X(\omega_k, \tau) = H(\omega_k)\, S(\omega_k, \tau) \qquad (5)$$
where $H(\omega_k)$ is the $N \times N$ mixing matrix in the $k$th frequency bin, and $X(\omega_k, \tau)$, $S(\omega_k, \tau)$ are the time-frequency representations of the observed and source signals, respectively. The estimated signals become:
$$Y(\omega_k, \tau) = W(\omega_k)\, X(\omega_k, \tau) \qquad (6)$$
where $Y = [Y_1, \dots, Y_N]^T$ and $W(\omega_k)$ is the $N \times N$ unmixing matrix in the $k$th frequency bin.
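As a rough illustration of Eq. (3) and Eq. (6), the following sketch computes an STFT with a naive DFT and applies an unmixing matrix to one bin of one frame. The periodic form of the Hanning window, the function names `stft` and `unmix_bin`, and the test tone are assumptions made for this example only.

```python
import cmath
import math


def stft(x, K, R):
    """Return frames[tau][k] = X(omega_k, tau) following Eq. (3)."""
    # Periodic Hanning window of length K (the exact window variant is an assumption).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * m / K) for m in range(K)]
    n_frames = (len(x) - K) // R + 1
    frames = []
    for tau in range(n_frames):
        seg = [window[m] * x[m + tau * R] for m in range(K)]
        frames.append([
            sum(seg[m] * cmath.exp(-2j * math.pi * k * m / K) for m in range(K))
            for k in range(K)
        ])
    return frames


def unmix_bin(W_k, X_k):
    """Eq. (6) for one bin and one frame: Y = W X, with X_k holding N sensor values."""
    return [sum(W_k[i][j] * X_k[j] for j in range(len(X_k)))
            for i in range(len(W_k))]


# Illustrative signal: a tone that falls exactly on bin 2 when K = 8.
K, R = 8, 4
x = [math.cos(2 * math.pi * 2 * n / K) for n in range(24)]
X = stft(x, K, R)
peak_bin = max(range(K // 2 + 1), key=lambda k: abs(X[0][k]))
```

In a full system, `unmix_bin` would be called for every bin and frame with the matrix estimated by ICA in that bin; the paper's experiments use K = 512 and R = 256.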
B. The Complex ICA algorithm
To estimate the unmixing matrix for each frequency bin, we use the complex version of FastICA for complex signals, under the instantaneous ICA model [9]. The algorithm uses a deflation scheme to search for the extrema of $E\{G(|w^H x|^2)\}$, where $G$ is a contrast function. Different choices of $G$ have been suggested in [9]. A good contrast function is one for which the resulting estimator is robust to outliers in the sample values. In our work, we choose $G(t) = \log(0.1 + t)$ as the contrast function. The robustness of the estimator is captured in the slow growth of $G$ as its argument increases [9].
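For reference, the chosen contrast function and its derivatives can be written out, together with the one-unit fixed-point update of the complex FastICA algorithm in [9]. The update is stated here under the assumption that the observations $x$ in the bin have been whitened; the symbols $g$, $g'$ and $w^{+}$ are introduced for this illustration.

```latex
% Contrast function used in this work and its first two derivatives
G(t) = \log(0.1 + t), \qquad
g(t) = G'(t) = \frac{1}{0.1 + t}, \qquad
g'(t) = -\frac{1}{(0.1 + t)^2}

% One-unit fixed-point update (whitened x), followed by normalization
w^{+} = E\{ x\, (w^H x)^{*}\, g(|w^H x|^2) \}
      - E\{ g(|w^H x|^2) + |w^H x|^2\, g'(|w^H x|^2) \}\, w,
\qquad
w \leftarrow \frac{w^{+}}{\lVert w^{+} \rVert}
```

Because $g(t)$ decays like $1/t$, large sample values contribute little to the update, which is the slow-growth property mentioned above.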
III. THE PROPOSED METHOD
A. Permutation check procedure
In a single frequency bin $k$, the separation matrix $W_k$ obtained by the ICA process is first checked for a possible permutation and then corrected if necessary. If the resolution in the frequency domain is high enough, the separation matrices in consecutive frequency bins tend to converge to the same permutation order, which means the permutation-solving step can be skipped [10]. In this way, we only have to solve the permutation problem in a few frequency bins instead of all of them. The final separation matrix, which is free of permutation, is denoted by $\widetilde{W}_k$. In the next frequency bin, the ICA step is initialized with $\widetilde{W}_k$. The iteration over bins can start from the highest frequency, which runs slightly faster than starting from the lowest. We use the distance between matrices
$$D = \sum_{i,j} \left| \widetilde{W}_k(i,j) - \widetilde{W}_{k-1}(i,j) \right|$$
as the criterion to determine whether a permutation may exist:
$$D \; \begin{cases} \le \varepsilon & \Rightarrow \text{Not permuted} \\ > \varepsilon & \Rightarrow \text{Possibly permuted} \end{cases} \qquad (7)$$
where $\varepsilon$ is a suitable threshold. If the distance is under the threshold, no permutation change is made and the separated signals in this frequency bin are sent to the rescaling stage, where the scaling problem is solved. If it is above the threshold, a permutation correction may be needed. In this case, after the rescaling stage, we use the AP approach to correct the permutation.
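The check procedure amounts to an element-wise distance followed by a threshold test. A minimal sketch is given below; the function names and the example matrices are hypothetical, and the default threshold of 0.2 is the value reported in the experiments section.

```python
def matrix_distance(W_a, W_b):
    """D = sum over (i, j) of |W_a(i, j) - W_b(i, j)| for two N x N complex matrices."""
    return sum(abs(a - b)
               for row_a, row_b in zip(W_a, W_b)
               for a, b in zip(row_a, row_b))


def possibly_permuted(W_k, W_prev, eps=0.2):
    """True when D exceeds the threshold, i.e. the bin goes on to the AP step."""
    return matrix_distance(W_k, W_prev) > eps


# Illustrative separation matrices for two neighbouring bins.
W_prev = [[1.0 + 0.0j, 0.1j], [0.1 + 0.0j, 1.0 + 0.0j]]
W_close = [[1.02 + 0.0j, 0.1j], [0.1 + 0.0j, 0.97 + 0.0j]]   # small drift only
W_swapped = [W_prev[1], W_prev[0]]                             # rows exchanged

flag_close = possibly_permuted(W_close, W_prev)
flag_swapped = possibly_permuted(W_swapped, W_prev)
```

Only bins where the flag is raised need the more expensive assignment step, which is where the computational saving comes from.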
B. Scaling problem
For the scaling problem (in rescaling stage), the method presented in [11] is applied.
After the complex ICA algorithm is applied, we have an estimated time sequence whose components are mutually independent for each frequency $\omega$:
$$Y(\omega, \tau) = B(\omega)\, X(\omega, \tau) \qquad (8)$$
To solve the scaling problem, we split the spectrograms using the independent components in each frequency channel. Let us define the split spectrograms by
$$U(\omega, \tau; i) = B(\omega)^{-1} \left[ 0, \dots, Y_i(\omega, \tau), \dots, 0 \right]^T \qquad (9)$$
where the index $i$ denotes the dependence of the spectrograms at $\omega$ on the $i$th independent component of $Y(\omega, \tau)$. Note that $i$ is implicitly a function of the frequency $\omega$, i.e. $i = i(\omega)$. To obtain the outputs of the rescaling stage $U(\omega, \tau; i)$, we apply both $B(\omega)$ and $B(\omega)^{-1}$, and therefore $U(\omega, \tau; i)$ has no scaling ambiguity.
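The split-spectrogram construction of Eq. (9) can be illustrated for N = 2 in one bin and one frame. The helper names and the numeric values below are assumptions made for this sketch; it shows that the split components add up to the observation and that rescaling the rows of B leaves them unchanged.

```python
def inverse_2x2(B):
    """Inverse of a 2 x 2 complex matrix (sufficient for the N = 2 illustration)."""
    (a, b), (c, d) = B
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def split_spectrogram(B_inv, Y, i):
    """U(omega, tau; i) = B^{-1} [0, ..., Y_i, ..., 0]^T, i.e. column i of B^{-1} times Y_i."""
    return [B_inv[row][i] * Y[i] for row in range(len(B_inv))]


def apply(B, X):
    """Eq. (8) for one bin and one frame: Y = B X."""
    return [sum(B[r][c] * X[c] for c in range(len(X))) for r in range(len(B))]


# Illustrative unmixing matrix and observation vector.
B = [[2.0 + 0.0j, 1.0j], [0.5 + 0.0j, 1.0 + 0.0j]]
X = [1.0 + 1.0j, -2.0 + 0.5j]
Y = apply(B, X)
B_inv = inverse_2x2(B)
U0 = split_spectrogram(B_inv, Y, 0)
U1 = split_spectrogram(B_inv, Y, 1)

# Rescale each row of B arbitrarily: the split spectrograms should not change.
B_scaled = [[3.0 * v for v in B[0]], [-2.0j * v for v in B[1]]]
U0_scaled = split_spectrogram(inverse_2x2(B_scaled), apply(B_scaled, X), 0)
```

The arbitrary gain picked up by each ICA output cancels between `B` and its inverse, which is the sense in which the scaling ambiguity is removed.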
C. Permutation problem
The key technique in FD-ICA is how to solve the permutation problem. Ciaramella et al. [12] proposed applying the Assignment Problem approach to the permutation problem, and used the Hungarian algorithm [13] to solve the AP. We note that, since the AP is a special case of the Transportation Problem, algorithms for the Transportation Problem can also be used to solve the AP. In the proposed method, we use the Jonker-Volgenant (JV) algorithm [14]. In our experiments in the next section, this algorithm proved to be uniformly faster and better than the Hungarian algorithm.
We consider the outputs of two adjacent bins as the sources in a source set $A$ and the destinations in a destination set $B$ of an AP. The task is then to find a perfect (one-to-one) matching between $A$ and $B$ such that the sum of the weights of the matching is minimized. The weights correspond to the distances between pairs of elements. Let $x_{ij}$ be a variable which is 1 if $a_i$ in $A$ maps to $b_j$ in $B$ and 0 otherwise, and let $d_{ij}$ be the distance between the two elements. The mathematical model for the AP is given by the following:
$$\text{Minimize } f(A, B) = \sum_{i=1}^{N} \sum_{j=1}^{N} d_{ij}\, x_{ij} \qquad (10)$$
subject to
$$\sum_{j=1}^{N} x_{ij} = 1, \quad i = 1, \dots, N$$
$$\sum_{i=1}^{N} x_{ij} = 1, \quad j = 1, \dots, N$$
$$x_{ij} \in \{0, 1\} \quad \forall i, j \qquad (11)$$
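For small N, the assignment model above can be solved by simply enumerating all permutations. The sketch below does exactly that; it is a brute-force stand-in used only to make the objective and constraints concrete, not the Jonker-Volgenant or Hungarian algorithm, which reach the same optimum in polynomial time. The function name and cost values are illustrative.

```python
from itertools import permutations


def solve_assignment(cost):
    """Return (best_perm, best_cost), where best_perm[i] = j means a_i is matched to b_j.

    Each permutation satisfies the one-to-one constraints by construction,
    so only the objective of Eq. (10) has to be compared.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost


# Illustrative 3 x 3 cost matrix (values are not from the paper).
cost = [[4.0, 1.0, 3.0],
        [2.0, 0.0, 5.0],
        [3.0, 2.0, 2.0]]
perm, total = solve_assignment(cost)
```

Enumeration costs N! evaluations, so it is only reasonable for a handful of sources; a dedicated solver is needed beyond that.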
In the AP, the symmetric Kullback-Leibler (KL) divergence is used to evaluate the distance between bins [15]. The KL divergence between two discrete $M$-dimensional probability density functions $p = [p_1, \dots, p_M]$ and $q = [q_1, \dots, q_M]$ is defined as
$$KL(p, q) = \sum_{i=1}^{M} p_i \log \frac{p_i}{q_i} \qquad (12)$$
[Figure: outputs $Y_1$, $Y_2$ of adjacent frequency bins and the resulting cost matrix with entries $KL_{11}$, $KL_{12}$, $KL_{21}$, $KL_{22}$]
Fig. 1. Illustration of the adjacent bands KL divergence method (N=2).
In this case, the cost matrix in the AP is an $N \times N$ matrix whose elements are defined as
$$KL_{ij} = KL\left(p^i_{k-1}, p^j_k\right), \quad \forall i, j = 1, \dots, N \qquad (13)$$
where the $l$th component of the discrete distribution $p^i_k = [p^i_{k,1}, \dots, p^i_{k,M}]$ was defined as
(14)
where $Y_i(\omega_k, l)$ is the STFT of the $i$th rescaled extracted source at the $l$th frame of the $k$th frequency bin and $M$ is the total number of frames.
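A cost matrix of this kind can be sketched as follows. Two points are assumptions of this example and not statements about the paper's method: the distributions are obtained by normalising the magnitudes of each output over the M frames (the definition in Eq. (14) is not legible in the source), and the plain KL divergence of Eq. (12) is used, where a symmetrised variant in the sense of [15] could be substituted. All names and values are illustrative.

```python
import math


def to_distribution(frames, floor=1e-12):
    """Normalise |Y(omega_k, l)| over the M frames so that the entries sum to 1.

    ASSUMPTION: this normalisation is one plausible choice, not the paper's Eq. (14).
    The small floor avoids log(0) for silent frames.
    """
    mags = [abs(v) + floor for v in frames]
    total = sum(mags)
    return [m / total for m in mags]


def kl(p, q):
    """Eq. (12): KL(p, q) = sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def cost_matrix(Y_prev, Y_curr):
    """Entry (i, j) compares output i of the previous bin with output j of the current bin."""
    P_prev = [to_distribution(y) for y in Y_prev]
    P_curr = [to_distribution(y) for y in Y_curr]
    return [[kl(p, q) for q in P_curr] for p in P_prev]


# Illustrative envelopes: the two outputs of the current bin are swapped.
Y_prev = [[4.0, 1.0, 0.5, 3.0], [0.5, 2.0, 3.0, 0.2]]
Y_curr = [[0.6, 2.2, 2.8, 0.3], [3.8, 1.1, 0.4, 3.1]]
C = cost_matrix(Y_prev, Y_curr)
swap_detected = C[0][1] + C[1][0] < C[0][0] + C[1][1]
```

Feeding `C` to an assignment solver returns the permutation that best aligns the outputs of the two bins; here the off-diagonal matching is cheaper, so the outputs would be swapped.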
IV. SIMULATION RESULTS AND DISCUSSIONS
In this section, some computer simulation experiments are presented to illustrate the proposed method. Experiments are performed on a PC with an Intel Core 2 Duo CPU @ 2.40 GHz and 1 GB of RAM. In our experiments, we have chosen $\varepsilon = 0.2$, $K = 512$, $R = 256$. We use two methods: the first uses the JV algorithm combined with the permutation check procedure (the proposed method, or Method 1), and the second uses the JV algorithm without the permutation check procedure (Method 2, used for comparison). The two methods are compared with Ciaramella's method (Method 3) [12]. To evaluate the performance of the methods, we use the Source-to-Interferences Ratio (SIR) for time-invariant filters allowed distortions [16].
In our initial experiment, we use signals from Dr. Te-Won Lee's home page [17]. These data were recorded in a real environment (all files are in 16 kHz wav format).
In the first pair of mixed signals, a speaker was recorded with two distant-talking microphones in a normal office room with loud music in the background. The distance between the speaker, the cassette player and the microphones is about 60 cm in a square ordering. The two mixed signals are shown in Fig. 2-a. The results obtained after separation using Method 1 are shown in Fig. 2-b.
In the second pair of mixed signals, two speakers were recorded speaking simultaneously. Speaker 1 says the digits from one to ten in English (one two three ...) and speaker 2 simultaneously counts the digits in Spanish (uno dos tres ...). The recording was made in a normal office room. The distance between the speakers and the microphones is about 60 cm in a square ordering. The two mixed signals are shown in Fig. 2-c. The results obtained after separation using Method 1 are shown in Fig. 2-d.
[Figure: waveform panels (a), (b), (c) and (d)]
Fig. 2. Results obtained from the first experiment.
In this experiment, the method only has to solve the permutation problem in 376 and 387 frequency bins, instead of all 512 bins, for the first and second pairs of mixed signals, respectively. A larger threshold $\varepsilon$ could make the method run even faster, but it could also cause some permuted frequency bins to be missed.
In the second experiment, the average over 15 sets of speech utterances is used to evaluate the performance. Each set combines two Vietnamese speech signals of about 5 seconds (85,000 samples), sampled at 16 kHz. To create convolutive mixtures of the speech signals, simulated room impulse responses are used. For the simulation of the room impulse responses, we used the "shoebox" room simulation toolbox available in [18]. The source-microphone configuration for the room impulse response simulation is shown in Fig. 3.
[Figure: room layout showing Mic. 1, Mic. 2, Source 1 and Source 2, with a marked distance of 1.5 m. Microphones and sources are at 1.2 m height. Room size = 6.25 m x 3.75 m x 2.5 m.]
Fig. 3. Source-microphone configuration.
In this experiment, we compared the average computation time between the methods. The comparison is shown in Table 1.
TABLE 1. COMPARISON OF AVERAGE COMPUTATION TIME BETWEEN THE METHODS
Method      Time (s)
Method 1    6.90
Method 2    12.75
Method 3    13.03
TABLE 2. COMPARISON OF OBTAINED RESULTS BETWEEN THE METHODS
Method      Average SIR (dB)
Method 1    23.28
Method 2    24.09
Method 3    23.46
We also compared the average SIR between the methods. The comparison is shown in Table 2. There is almost no difference between the results obtained by the methods. Method 2 yields a slightly higher SIR, about 0.8 dB above Method 1 and about 0.6 dB above Method 3.
From the experimental results, we find that the proposed method, by concentrating on the frequency bins that could possibly have permutations, considerably reduces the computation time while achieving nearly the same average SIR as the conventional methods using the AP approach (Method 2 and Method 3).
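The size of these effects can be read directly off Tables 1 and 2. The short computation below only restates the reported numbers as ratios and differences; the variable names are illustrative.

```python
# Values copied from Table 1 (seconds) and Table 2 (dB).
times = {"Method 1": 6.90, "Method 2": 12.75, "Method 3": 13.03}
sir = {"Method 1": 23.28, "Method 2": 24.09, "Method 3": 23.46}

# How many times faster Method 1 is than each reference method.
speedup_vs_m2 = times["Method 2"] / times["Method 1"]
speedup_vs_m3 = times["Method 3"] / times["Method 1"]

# How much SIR Method 1 gives up relative to each reference method.
sir_gap_m2 = sir["Method 2"] - sir["Method 1"]
sir_gap_m3 = sir["Method 3"] - sir["Method 1"]
```

Method 1 is roughly 1.8 to 1.9 times faster than Methods 2 and 3, at a cost of less than 1 dB in average SIR.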
V. CONCLUSION
In this paper, we have presented an FD-ICA approach, which is the most successful approach up to now, using the Assignment Problem. Furthermore, by using a simple check procedure, the algorithm only has to solve the permutation problem in a few frequency bins instead of all of them. In experiments with artificial and real speech data, the proposed method (Method 1) proved more computationally efficient than the previous algorithm using the AP approach (Method 3) [12]. We also conclude that the proposed method achieves very good separation performance and shows good potential for BSS applications in real-world situations. Our future work will focus on a DSP implementation of blind signal separation using the proposed method.
REFERENCES
[1] Aapo Hyvarinen, Juha Karhunen and Erkki Oja, Independent Component Analysis, John Wiley and Sons Ltd, 2001.
[2] J. Benesty, S. Makino and J. Chen, Speech Enhancement. Springer, 2005.
[3] Johan Thomas et al., "Time-domain fast fixed-point algorithms for convolutive ICA," IEEE Signal Processing Letters, vol. 13, no. 4, pp. 228-231, April 2006.
[4] S. Douglas et al., "Natural gradient multichannel blind deconvolution and speech separation using causal FIR filters," IEEE Transactions on Speech and Audio Processing, vol. 13, issue 1, pp. 92-104, 2004.
[5] H. Sawada et al., "A robust and precise method for solving the permutation problem of frequency-domain blind source separation," IEEE Trans. Speech Audio Process., vol. 12, no. 5, pp. 530-538, Sep. 2004.
[6] M. Z. Ikram and D. R. Morgan, "Permutation inconsistency in blind speech separation: Investigation and solutions," IEEE Trans. Speech Audio Process., vol. 13, no. 1, pp. 1-13, Jan. 2005.
[7] A. Hiroe, "Solution of permutation problem in frequency domain ICA using multivariate probability density functions," in Proc. ICA '06 (LNCS 3889), March 2006, pp. 601-608.
[8] P. Smaragdis, "Blind separation of convolved mixtures in frequency domain," Neurocomputing, No. 22, pp. 21-34, 1995.
[9] E. Bingham and A. Hyvarinen, "A fast fixed-point algorithm for independent component analysis of complex-valued signals," International Journal of Neural Systems 10, pp. 1-8, 2000.
[10] Peng Xie and Steven L. Grant, "A fast and efficient frequency-domain method for convolutive blind source separation," Region 5 IEEE Conference 2008 Kansas City MO, pp. 1-4, April 2008.
[11] N. Murata, S. Ikeda and A. Ziehe, "An approach to blind source separation based on temporal structure of speech signals," Neurocomputing, 41, 2001, pp. 1-24.
[12] A. Ciaramella and R. Tagliaferri, "Separation of Convolved Mixtures in Frequency Domain ICA," International Mathematical Forum, no. 16, pp. 769-795, 2006.
[13] J. Munkres, "Algorithms for the Assignment and Transportation Problems," Journal of the Society of Industrial and Applied Mathematics, 5(1):32-38, March 1957.
[14] R. Jonker and A. Volgenant, "A Shortest Augmenting Path Algorithm for Dense and Sparse Linear Assignment Problems," Computing 38, pp. 325-340, 1987.
[15] Don H. Johnson and Sinan Sinanovic, "Symmetrizing the Kullback-Leibler Distance," IEEE Transactions on Information Theory, 2001.
[16] Emmanuel Vincent et al., "Performance Measurement in Blind Audio Source Separation," IEEE Trans. on Audio, Speech and Language Processing, Vol. 14, No. 4, pp. 1462-1469, July 2006.
[17] http://cnl.salk.edu/~tewon/Blind/blind_audio.html
[18] Douglas R. Campbell et al., "Roomsim, a Matlab simulation of shoebox room acoustics for use in teaching and research," http://media.paisley.ac.uk/~campbell/Roomsim/