Page 1

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS

Jain-De Lee

Emad M. Grais, Hakan Erdogan

17th International Conference on Digital Signal Processing, 2011

Page 2

Outline

INTRODUCTION

NON-NEGATIVE MATRIX FACTORIZATION

SIGNAL SEPARATION AND MASKING

EXPERIMENTS AND DISCUSSION

CONCLUSION

Page 3

Introduction

There are two main stages in this work:

– Training stage
– Separation stage

Using NMF with different types of masks improves the separation process:

– The separation process is faster
– NMF needs fewer iterations

Page 4

Introduction

Problem formulation

– We observe a signal x(t), which is the mixture of two sources s(t) and m(t)

– Assume the sources have the same phase angle as the mixed signal

$$X(t,f) = S(t,f) + M(t,f)$$

$$|X(t,f)|\, e^{j\phi_X(t,f)} = |S(t,f)|\, e^{j\phi_S(t,f)} + |M(t,f)|\, e^{j\phi_M(t,f)}$$

$$|X(t,f)| = |S(t,f)| + |M(t,f)|$$

where X(t, f) is the STFT of x(t), and likewise S(t, f) and M(t, f) for the two sources. In matrix form, for the magnitude spectrograms:

X = S + M
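The same-phase assumption is what lets the magnitudes add. A minimal sketch for a single time-frequency bin follows; the magnitudes and phases are made-up illustrative values, and `mixture_magnitude` is a hypothetical helper, not something from the slides.

```python
import cmath

def mixture_magnitude(s_mag, s_phase, m_mag, m_phase):
    """Return |S + M| for one time-frequency bin given polar components."""
    s = cmath.rect(s_mag, s_phase)
    m = cmath.rect(m_mag, m_phase)
    return abs(s + m)

# Same phase: the magnitudes add, |X| = |S| + |M|.
same = mixture_magnitude(3.0, 0.7, 1.0, 0.7)

# Different phases: |X| is smaller than |S| + |M| (triangle inequality).
different = mixture_magnitude(3.0, 0.7, 1.0, 2.5)

print(same, different)
```

When the phases differ, X = S + M holds only approximately for the magnitude spectrograms, which is why it is stated as an assumption.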

Page 5

Non-negative Matrix Factorization

Non-negative matrix factorization algorithm

$$V \approx BW$$

Minimization problem

$$\min_{B,W} C(V, BW)$$

subject to elements of B, W ≥ 0

Different cost functions C of NMF

– Euclidean distance
– KL divergence
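The slides name the cost functions but not the update rule. The sketch below assumes the standard multiplicative updates for the KL divergence cost; the function names `nmf_kl` and `kl_divergence`, the rank, the iteration count and the small constant `eps` are all illustrative choices rather than values from the slides.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factorize nonnegative V (F x T) as B (F x rank) @ W (rank x T).

    Uses multiplicative updates for the generalized KL divergence;
    B and W stay nonnegative because every factor is nonnegative.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    B = rng.random((F, rank)) + eps
    W = rng.random((rank, T)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Update the basis vectors (columns of B).
        B *= ((V / (B @ W + eps)) @ W.T) / (ones @ W.T + eps)
        # Update the gains (rows of W).
        W *= (B.T @ (V / (B @ W + eps))) / (B.T @ ones + eps)
    return B, W

def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence D(V || V_hat), summed over all entries."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))

# Toy example: a random nonnegative matrix of exact rank 3.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30))
B, W = nmf_kl(V, rank=3)
print(kl_divergence(V, B @ W))
```

Swapping in the Euclidean cost would only change the two update lines; the nonnegativity constraint is handled the same way.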

Page 6

Non-negative Matrix Factorization

The magnitude spectrograms $S_{train}$ and $M_{train}$ of the training data are factorized by NMF

$$S_{train} \approx B_{speech} W_{speech}$$

$$M_{train} \approx B_{music} W_{music}$$

Larger number of basis vectors

– Lower approximation error
– Redundant set of basis vectors
– Requires more computation time

Page 7

Signal Separation and Masking

NMF is used to decompose the magnitude spectrogram matrix X of the mixed signal

$$X \approx [B_{speech} \; B_{music}]\, W$$

The initial spectrogram estimates for the speech and music signals are respectively calculated as follows

$$\tilde{S} = B_{speech} W_S$$

$$\tilde{M} = B_{music} W_M$$

where $W_S$ and $W_M$ are the submatrices of W that correspond to the speech and music bases
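A minimal sketch of this separation step is given below. It assumes the trained bases are held fixed and only the gain matrix W is updated with the KL multiplicative rule; the slides do not state this explicitly. The name `separate_initial`, the random toy bases and the iteration count are illustrative.

```python
import numpy as np

def separate_initial(X, B_speech, B_music, n_iter=100, eps=1e-12, seed=0):
    """Fit X ~= [B_speech B_music] @ W with the bases fixed, then split W
    into W_S and W_M to form the initial source estimates."""
    B = np.hstack([B_speech, B_music])
    n_speech = B_speech.shape[1]
    rng = np.random.default_rng(seed)
    W = rng.random((B.shape[1], X.shape[1])) + eps
    ones = np.ones_like(X)
    for _ in range(n_iter):
        # KL multiplicative update for W only; B is left unchanged.
        W *= (B.T @ (X / (B @ W + eps))) / (B.T @ ones + eps)
    W_S, W_M = W[:n_speech], W[n_speech:]
    S_tilde = B_speech @ W_S
    M_tilde = B_music @ W_M
    return S_tilde, M_tilde

# Toy data: random stand-ins for trained bases and a mixture built from them.
rng = np.random.default_rng(0)
B_speech, B_music = rng.random((16, 4)), rng.random((16, 3))
S_true = B_speech @ rng.random((4, 10))
M_true = B_music @ rng.random((3, 10))
X = S_true + M_true
S_tilde, M_tilde = separate_initial(X, B_speech, B_music)
print(np.abs(S_tilde + M_tilde - X).max())
```

The initial estimates generally do not sum exactly to X, because the factorization is only approximate.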

Page 8

Signal Separation and Masking

Use the initial estimated spectrograms $\tilde{S}$ and $\tilde{M}$ to build a mask as follows

$$H = \frac{\tilde{S}^p}{\tilde{S}^p + \tilde{M}^p}$$

Source signals reconstruction

$$\hat{S} = H \otimes X$$

$$\hat{M} = (1 - H) \otimes X$$

where 1 is a matrix of ones, $\otimes$ is element-wise multiplication, and the powers and the division in H are also element-wise
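The mask and the reconstruction can be sketched in a few lines of NumPy. The function name `apply_mask`, the small constant `eps` that guards against division by zero, and the toy matrices are assumptions made for illustration.

```python
import numpy as np

def apply_mask(X, S_tilde, M_tilde, p=2.0, eps=1e-12):
    """Build H = S~^p / (S~^p + M~^p) element-wise and mask the mixture X."""
    Sp = S_tilde ** p
    Mp = M_tilde ** p
    H = Sp / (Sp + Mp + eps)
    S_hat = H * X            # speech estimate
    M_hat = (1.0 - H) * X    # music estimate
    return H, S_hat, M_hat

X = np.array([[4.0, 2.0], [1.0, 5.0]])
S_tilde = np.array([[3.0, 0.5], [0.2, 4.0]])
M_tilde = np.array([[1.0, 1.5], [0.9, 1.0]])
H, S_hat, M_hat = apply_mask(X, S_tilde, M_tilde, p=2.0)
print(H)
```

Because the two masks are H and 1 - H, the two estimates always sum exactly to the mixture X, whatever the value of p.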

Page 9

Signal Separation and Masking

Two specific values of p correspond to special masks

– Wiener filter (soft mask)

$$H_{Wiener} = \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2}$$

– Hard mask

$$H_{hard} = \operatorname{round}\left(\frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2}\right)$$
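As a small worked example, take a single time-frequency bin with initial estimates $\tilde{S} = 3$ and $\tilde{M} = 1$ (values chosen only for illustration):

```latex
$$
H_{Wiener} = \frac{3^2}{3^2 + 1^2} = 0.9,
\qquad
H_{hard} = \operatorname{round}(0.9) = 1
$$
```

The Wiener mask is the p = 2 case of the general mask. The hard mask is what the general mask approaches as p grows, since $\tilde{S}^p / (\tilde{S}^p + \tilde{M}^p)$ tends to 1 where $\tilde{S} > \tilde{M}$ and to 0 where $\tilde{S} < \tilde{M}$.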

Page 10

Signal Separation and Masking

The value of the mask versus the linear ratio for different values of p
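The curve on this slide can be tabulated numerically. The sketch below assumes that "linear ratio" means $r = \tilde{S} / (\tilde{S} + \tilde{M})$, which is the mask for p = 1; dividing the numerator and denominator of the general mask by $(\tilde{S} + \tilde{M})^p$ then gives $H = r^p / (r^p + (1 - r)^p)$. The function name and the chosen values of p are illustrative.

```python
import numpy as np

def mask_from_linear_ratio(r, p):
    """Mask value H as a function of r = S~ / (S~ + M~) for exponent p."""
    r = np.asarray(r, dtype=float)
    return r ** p / (r ** p + (1.0 - r) ** p)

ratios = np.linspace(0.05, 0.95, 7)
for p in (1, 2, 4, 10):
    values = mask_from_linear_ratio(ratios, p)
    print(f"p={p:>2}:", np.round(values, 3))
```

With p = 1 the mask equals the linear ratio; larger p pushes the mask toward 0 or 1 on either side of r = 0.5.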

Page 11

Experiments and Discussion

Simulation

– 16 kHz sampling rate
– Speech
  • Training speech data: 540 short utterances
  • Testing speech data: 20 utterances
– Music
  • 38 pieces for training
  • One piece for testing
– Hamming window: 512 points
– FFT size: 512 points
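A magnitude spectrogram with these settings could be computed as sketched below. The window length, FFT size and sampling rate come from the slide; the hop size of 256 samples, the function name `magnitude_spectrogram` and the 1 kHz test tone are assumptions, since the slide does not give them.

```python
import numpy as np

def magnitude_spectrogram(x, n_fft=512, hop=256):
    """Magnitude STFT with a 512-point Hamming window (hop size is assumed)."""
    window = np.hamming(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)], axis=1)
    # rfft keeps the n_fft // 2 + 1 nonnegative-frequency bins.
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=0))

fs = 16000                         # sampling rate from the slide
t = np.arange(fs) / fs             # one second of signal
x = np.sin(2 * np.pi * 1000 * t)   # 1 kHz test tone (illustrative only)
X = magnitude_spectrogram(x)
print(X.shape)
```

Each column of X is one frame and each row one frequency bin, which is the layout assumed for the matrices V and X in the factorization.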

Pages 12 to 15

Experiments and Discussion (slide content not preserved in this transcript)

Page 16

Conclusion

The family of masks has a parameter p that controls the saturation level

The proposed algorithm gives better results and helps speed up the separation process