Blind source separation based on statistical independence and low-rank matrix decomposition theory...
Blind source separation based on statistical independence and low-rank matrix decomposition: independent low-rank matrix analysis
September 26, 2016
Outline: BSS; ILRMA (FDICA, IVA, ISNMF, ILRMA); ILRMA and multichannel NMF; rank-1 multichannel NMF and ILRMA
Audio source separation, etc.
Keywords: FDICA, IVA, ILRMA, NMF, TF NMF
[Figure: a 2-ch (L-ch/R-ch) CD recording separated into 1-ch signals of the individual sources]
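Nonnegative matrix factorization (NMF) [Lee, 1999], [Lee, 2000], etc.
[Figure: an amplitude spectrogram (frequency × time) decomposed into spectral bases and their time-varying activations]
NMF approximates a nonnegative matrix $\mathbf{X}$ by the product of a nonnegative basis matrix $\mathbf{T}$ and a nonnegative activation matrix $\mathbf{V}$: $\mathbf{X} \approx \mathbf{T}\mathbf{V}$.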
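As a minimal sketch of the factorization $\mathbf{X} \approx \mathbf{T}\mathbf{V}$ of an $I \times J$ nonnegative matrix, the following NumPy code uses the multiplicative updates of [Lee, 2000] for the squared Euclidean distance. The function name `nmf_euclidean`, the random initialization, and the iteration count are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def nmf_euclidean(X, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Approximate nonnegative X (I x J) by T @ V with T (I x L) >= 0, V (L x J) >= 0.

    Multiplicative updates minimizing the squared Euclidean distance ||X - TV||_F^2.
    """
    rng = np.random.default_rng(seed)
    I, J = X.shape
    T = rng.random((I, n_bases)) + eps  # basis matrix (e.g., spectral patterns)
    V = rng.random((n_bases, J)) + eps  # activation matrix (time-varying gains)
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so T and V stay nonnegative
        T *= (X @ V.T) / (T @ V @ V.T + eps)
        V *= (T.T @ X) / (T.T @ T @ V + eps)
    return T, V
```

Because every update is a multiplication by a nonnegative ratio, nonnegativity is preserved without any projection step, which is the main appeal of this update style.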
Blind source separation (BSS)
BSS methods include ICA [Comon, 1994] and IVA [Hiroe, 2006], [Kim, 2006]; IVA is regarded as a state-of-the-art BSS method.
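Frequency-domain ICA (FDICA) [Smaragdis, 1998], [Sawada, 2004], [Saruwatari, 2006], etc.
The observed mixture is transformed into the time-frequency domain, and ICA is applied independently in each frequency bin. Because the order of the separated outputs (1, 2) is arbitrary in each bin, a permutation solver must align them across frequencies before the full-band signals can be reconstructed.
[Figure: per-frequency ICA on the frequency × time representation, followed by a permutation solver]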
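The permutation problem can be illustrated with a simple alignment heuristic. This sketch is not the DOA-based solver of [Saruwatari, 2006]; it swaps in a plain amplitude-envelope correlation criterion, and the function name `align_permutations` is hypothetical. It assumes the per-bin ICA outputs are stacked in an array of shape (frequency bins, frames, sources).

```python
import itertools
import numpy as np

def align_permutations(Y):
    """Reorder the sources in each frequency bin of Y (I, J, N) so that their
    standardized amplitude envelopes best match the running average of the
    envelopes of the already aligned bins (bin 0 is the reference)."""
    I, J, N = Y.shape
    Y = Y.copy()
    env = np.abs(Y)
    env = (env - env.mean(axis=1, keepdims=True)) / (env.std(axis=1, keepdims=True) + 1e-12)
    centroid = env[0].copy()  # (J, N) reference envelopes
    for i in range(1, I):
        # Exhaustive search over N! permutations (fine for small N)
        best = list(max(
            itertools.permutations(range(N)),
            key=lambda p: sum(np.dot(centroid[:, n], env[i][:, p[n]]) for n in range(N)),
        ))
        Y[i] = Y[i][:, best]
        env[i] = env[i][:, best]
        centroid = (centroid * i + env[i]) / (i + 1)
    return Y
```

Envelope correlation works because the same source tends to be active at the same frames in every frequency bin; DOA-based solvers instead exploit the spatial consistency of each source.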
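FDICA with a DOA-based permutation solver (2006) [Saruwatari, 2006]
The direction of arrival (DOA) of each separated output is estimated from the ICA demixing matrix in each frequency bin, and the outputs are aligned so that each one keeps a consistent DOA across frequencies.
[Figure: DOAs of Source 1 and Source 2 estimated in each frequency bin]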
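FDICA and adaptive beamforming (ABF) [Araki, 2003]
FDICA is equivalent to an ABF that forms a null (zero gain) toward the interfering source: in each frequency bin, the demixing filter for output 1 suppresses source 2, and vice versa.
[Figure: directivity patterns of BSS (FDICA) and ABF for reverberation times TR = 0 ms and TR = 300 ms]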
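BSS (2006): independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006], [Kim, 2007]
IVA extends FDICA: instead of applying ICA separately in each frequency bin, it treats all frequency components of one source as a single vector-valued variable and assumes independence between these source vectors, which avoids the permutation problem.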
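Score function (gradient of the negative log source prior): FDICA vs. IVA
In FDICA the score function is defined separately for each frequency bin, whereas in IVA it is defined on the whole frequency vector of a source, for example the spherical multivariate Laplace prior used in [Kim, 2007]. This couples the frequency bins through their higher-order dependencies.
[Figure: joint distributions of two frequency components x1 and x2, illustrating higher-order correlation (higher-order dependency)]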
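The IVA score function and one natural-gradient step can be sketched as follows, assuming a spherical Laplace prior $p(\mathbf{y}_{jn}) \propto \exp(-\|\mathbf{y}_{jn}\|_2)$, whose score is $\mathbf{y}_{jn}/\|\mathbf{y}_{jn}\|_2$, and the update $\mathbf{W}_i \leftarrow \mathbf{W}_i + \mu(\mathbf{I} - E[\varphi(\mathbf{y})\mathbf{y}^{\mathsf H}])\mathbf{W}_i$. The function names, array layout, and step size `mu` are illustrative assumptions.

```python
import numpy as np

def iva_score(Y, eps=1e-12):
    """Multivariate score of IVA under a spherical Laplace prior.

    Y: (I, J, N) = frequency bins x frames x sources. The norm is taken over
    all I frequency components of a source, which couples the bins.
    """
    norms = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0, keepdims=True))  # (1, J, N)
    return Y / np.maximum(norms, eps)

def iva_natural_gradient_step(W, X, mu=0.1):
    """One natural-gradient IVA update; W: (I, N, M) demixing, X: (I, J, M), N == M."""
    Y = np.einsum("inm,ijm->ijn", W, X)  # y_ij = W_i x_ij
    Phi = iva_score(Y)
    n_freq, n_frames, n_src = Y.shape
    W_new = W.copy()
    for i in range(n_freq):
        C = Phi[i].T @ Y[i].conj() / n_frames  # sample estimate of E[phi(y) y^H]
        W_new[i] = W[i] + mu * (np.eye(n_src) - C) @ W[i]
    return W_new
```

In FDICA the same update would use a score computed bin by bin, so nothing would tie the source order together across frequencies; the shared norm is what does that here.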
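BSS (2016) unifying IVA and NMF: independent low-rank matrix analysis (ILRMA) [Kitamura, 2015], [Kitamura, 2016]
ILRMA extends IVA by modeling the time-frequency structure of each estimated source with NMF, i.e., with a small number of spectral bases and their time activations.
[Figure: source models of IVA and ILRMA on the frequency × time plane, with NMF bases and activations]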
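Itakura-Saito NMF (ISNMF) [Févotte, 2009]
ISNMF is NMF whose cost function is the Itakura-Saito (IS) divergence between the observed power spectrogram $p_{ij}$ and the model $r_{ij}$:
$d_{\mathrm{IS}}(p_{ij} \mid r_{ij}) = \frac{p_{ij}}{r_{ij}} - \log\frac{p_{ij}}{r_{ij}} - 1$.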
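A minimal ISNMF sketch, assuming the power spectrogram $\mathbf{P}$ ($p_{ij} = |x_{ij}|^2$) is modeled as $\mathbf{R} = \mathbf{T}\mathbf{V}$ and using multiplicative updates with exponent 1/2, a form that guarantees a non-increasing IS divergence. The function names and initialization are illustrative.

```python
import numpy as np

def is_divergence(P, R, eps=1e-12):
    """Total Itakura-Saito divergence sum_ij d_IS(p_ij | r_ij)."""
    Q = (P + eps) / (R + eps)
    return float(np.sum(Q - np.log(Q) - 1.0))

def isnmf(P, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Factorize power spectrogram P (I x J) as T @ V under the IS divergence."""
    rng = np.random.default_rng(seed)
    I, J = P.shape
    T = rng.random((I, n_bases)) + 0.1
    V = rng.random((n_bases, J)) + 0.1
    for _ in range(n_iter):
        R = T @ V + eps
        T *= np.sqrt(((P / R**2) @ V.T) / ((1.0 / R) @ V.T))
        R = T @ V + eps
        V *= np.sqrt((T.T @ (P / R**2)) / (T.T @ (1.0 / R)))
    return T, V
```

The IS divergence is scale-invariant ($d_{\mathrm{IS}}(cp \mid cr) = d_{\mathrm{IS}}(p \mid r)$), so low-power time-frequency components are fitted as carefully as high-power ones.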
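Generative model of ISNMF
Each STFT coefficient is modeled as a zero-mean circularly symmetric complex Gaussian variable: its real and imaginary parts are i.i.d. zero-mean Gaussian.
[Figure: distribution on the real/imaginary plane]
The variance in each time-frequency slot (frequency bin $i$, time frame $j$) is given by the low-rank NMF model $r_{ij} = \sum_l t_{il} v_{lj}$, while the mean is 0. Maximizing the likelihood of this model is equivalent to minimizing the IS divergence between $|x_{ij}|^2$ and $r_{ij}$.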
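The link between the complex Gaussian model and the IS divergence can be checked numerically. With $-\log \mathcal{N}_c(x \mid 0, r) = \log(\pi r) + |x|^2/r$, subtracting its value at $r = |x|^2$ gives exactly $d_{\mathrm{IS}}(|x|^2 \mid r)$, a term that does not depend on $r$ having been removed. The function names below are illustrative.

```python
import numpy as np

def complex_gauss_nll(x, r):
    """-log N_c(x | 0, r) for a zero-mean circular complex Gaussian with variance r."""
    return np.log(np.pi * r) + np.abs(x) ** 2 / r

def is_divergence(p, r):
    """Elementwise Itakura-Saito divergence d_IS(p | r)."""
    return p / r - np.log(p / r) - 1.0
```

Since the subtracted term depends only on the data, minimizing the summed negative log-likelihood over $r_{ij}$ and minimizing the summed IS divergence give the same NMF parameters.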
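ILRMA vs. IVA
ILRMA replaces the source model of IVA with a low-rank time-frequency variance model given by NMF. Once the NMF variances are given, the demixing matrix can be updated in the same way as in IVA, so ILRMA combines ISNMF and IVA.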
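ILRMA update rules [Kitamura, 2016]
The NMF variables are updated with ISNMF-type multiplicative rules, and the demixing filter of each source is updated by iterative projection, where $\mathbf{e}_n$ denotes the unit vector whose $n$-th element is 1 and whose other elements are 0.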
ILRMA algorithm
ILRMA alternately iterates the IVA-like update of the demixing matrix and the NMF update of the source models.
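A compact NumPy sketch of this alternation, under these assumptions: the determined case ($N = M$), no partitioning function (each source has its own bases), ISNMF updates with exponent 1/2, iterative projection $\mathbf{w}_{in} \leftarrow (\mathbf{W}_i\mathbf{U}_{in})^{-1}\mathbf{e}_n$ followed by normalization with $\mathbf{U}_{in} = \frac{1}{J}\sum_j \mathbf{x}_{ij}\mathbf{x}_{ij}^{\mathsf H}/r_{ijn}$, a per-iteration scale normalization, and back-projection to one reference microphone. The function name, initialization, and array layout are illustrative, not the authors' reference implementation.

```python
import numpy as np

def ilrma(X, n_bases=2, n_iter=50, ref_mic=0, eps=1e-12, seed=0):
    """Sketch of ILRMA without partitioning function (determined case, N = M).

    X: observed STFT, shape (I, J, M) = (frequency bins, frames, microphones).
    Returns back-projected estimates (I, J, N) and W (I, N, M), where row n of
    W[i] is w_in^H, so that y_ij = W[i] @ x_ij.
    """
    rng = np.random.default_rng(seed)
    I, J, M = X.shape
    N = M
    W = np.tile(np.eye(N, dtype=complex), (I, 1, 1))
    T = rng.random((N, I, n_bases)) + 0.1  # bases t_iln of each source n
    V = rng.random((N, n_bases, J)) + 0.1  # activations v_ljn of each source n
    XX = np.einsum("ijm,ijk->ijmk", X, X.conj())  # outer products x_ij x_ij^H
    eye = np.eye(N)
    for _ in range(n_iter):
        Y = np.einsum("inm,ijm->ijn", W, X)
        P = np.abs(Y) ** 2  # y_ijn depends only on row n of W[i]
        for n in range(N):
            # Source model: ISNMF multiplicative updates on |y_ijn|^2
            R = T[n] @ V[n] + eps
            T[n] *= np.sqrt(((P[:, :, n] / R**2) @ V[n].T) / ((1.0 / R) @ V[n].T))
            R = T[n] @ V[n] + eps
            V[n] *= np.sqrt((T[n].T @ (P[:, :, n] / R**2)) / (T[n].T @ (1.0 / R)))
            R = T[n] @ V[n] + eps
            # Spatial model: iterative projection with weighted covariance U_in
            U = np.einsum("ij,ijmk->imk", 1.0 / R, XX) / J
            for i in range(I):
                w = np.linalg.solve(W[i] @ U[i], eye[:, n])  # (W_i U_in)^{-1} e_n
                w /= np.sqrt(np.real(w.conj() @ U[i] @ w))
                W[i, n, :] = w.conj()
        # Scale normalization: leaves the ILRMA cost unchanged, avoids drift
        Y = np.einsum("inm,ijm->ijn", W, X)
        lam = np.sqrt(np.mean(np.abs(Y) ** 2, axis=(0, 1)))
        W /= lam[None, :, None]
        T /= (lam**2)[:, None, None]
    Y = np.einsum("inm,ijm->ijn", W, X)
    A = np.linalg.inv(W)  # estimated mixing matrices A_i = W_i^{-1}
    return A[:, ref_mic, :][:, None, :] * Y, W
```

Back-projection resolves the scale ambiguity of BSS: the returned source estimates sum exactly to the reference-microphone observation.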
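Multichannel NMF
- Ozerov and Févotte, 2010: multichannel NMF, EM algorithm
- Arberet et al., 2010: multichannel NMF, EM algorithm
- Ozerov et al., 2011: multichannel NMF, EM algorithm
- Sawada et al., 2013: multichannel NMF
- Kitamura et al., 2016: rank-1 multichannel NMF
Multichannel NMF models each source with an NMF source model ($\mathbf{T}$, $\mathbf{V}$) and a spatial covariance matrix $\mathbf{R}_s$; if $\mathbf{R}_s$ is restricted to rank 1, the spatial model can be written in terms of the demixing matrix $\mathbf{W}$.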
Multichannel NMF [Sawada, 2013]
[Figure: time-frequency representations in the multichannel NMF model]
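Duong model [Duong, 2010]
Each source image is modeled as a zero-mean multivariate complex Gaussian whose covariance is the product of a time-frequency-varying variance and a full-rank spatial covariance matrix; the source images are then estimated with a multichannel Wiener filter.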
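A minimal sketch of the multichannel Wiener filter at one time-frequency slot, assuming source variances $r_n$ and spatial covariance matrices $\mathbf{R}_n$ are already estimated: $\hat{\mathbf{c}}_n = r_n\mathbf{R}_n\left(\sum_k r_k\mathbf{R}_k\right)^{-1}\mathbf{x}$. The function name is illustrative.

```python
import numpy as np

def multichannel_wiener(x, r, R):
    """Source-image estimates at one time-frequency slot (full-rank spatial model).

    x: observed vector (M,), r: source variances (N,), R: spatial covariances (N, M, M).
    Returns c_hat (N, M) with c_hat[n] = r_n R_n (sum_k r_k R_k)^{-1} x.
    """
    sigma_x = np.einsum("n,nmk->mk", r, R)  # model covariance of the mixture
    z = np.linalg.solve(sigma_x, x)  # (sum_k r_k R_k)^{-1} x
    return np.einsum("n,nmk,k->nm", r, R, z)
```

The filters sum to the identity, so the estimated source images always add up to the observation; this also holds when there are more sources than microphones, because each $\mathbf{R}_n$ is full rank.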
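Rank-1 spatial model
Restricting each spatial covariance matrix to rank 1, $\mathbf{R}_{in} = \mathbf{a}_{in}\mathbf{a}_{in}^{\mathsf{H}}$, where $\mathbf{a}_{in}$ is the $n$-th column (steering vector) of the mixing matrix $\mathbf{A}_i$, gives rank-1 multichannel NMF.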
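In the determined case with $\mathbf{W}_i = \mathbf{A}_i^{-1}$, the rank-1 model covariance has a closed-form inverse, and this is what allows the model to be rewritten with a demixing matrix:
$\left(\sum_n r_{ijn}\mathbf{a}_{in}\mathbf{a}_{in}^{\mathsf H}\right)^{-1} = \mathbf{W}_i^{\mathsf H}\,\mathrm{diag}(1/r_{ij1},\dots,1/r_{ijN})\,\mathbf{W}_i$.
A small NumPy check at a single frequency bin (the function name is illustrative):

```python
import numpy as np

def rank1_model_covariance(A, r):
    """sum_n r_n a_n a_n^H, where a_n is column n of the mixing matrix A (M x N)."""
    return (A * r) @ A.conj().T  # A * r scales column n by r_n
```

With this model the multichannel Wiener estimate of source $n$ reduces to $\mathbf{a}_n y_n$ with $\mathbf{y} = \mathbf{W}\mathbf{x}$, i.e., to back-projection of the demixed signal.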
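Rank-1 multichannel NMF and ILRMA
In the determined case, rank-1 multichannel NMF can be rewritten in terms of the demixing matrix $\mathbf{W}_i = \mathbf{A}_i^{-1}$, and the resulting model is equivalent to ILRMA.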
[Figure: relationship among IVA, NMF, multichannel NMF, and ILRMA]
[Table: comparison of FDICA, ICA, IVA, NMF, multichannel NMF, and ILRMA (rank-1 spatial model)]
Comparison of ILRMA with IVA and (rank-1) multichannel NMF
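Experimental conditions (music)
Sources: SiSEC; impulse responses: RWCP; 2 sources and 2 microphones; FFT length: 512 ms; shift length: 128 ms (1/4 shift); number of bases: 30 per source (ILRMA1), 60 in total (ILRMA2); evaluation: SDR
[Figure: recording conditions. Impulse response E2A (reverberation time: 300 ms) and impulse response JR2 (reverberation time: 470 ms); Source 1 and Source 2 at 2 m from the microphones (spacing 5.66 cm); angle labels 50° (E2A) and 60° (JR2)]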
Results: fort_minor-remember_the_name
[Figure: separation results for Violin synth. and Vocals under E2A (300 ms) and JR2 (470 ms); methods: IVA, Directional clustering, Ozerov's MNMF, Ozerov's MNMF with random initialization, Sawada's MNMF, Sawada's MNMF initialized by proposed method, Proposed method w/o partitioning function, Proposed method with partitioning function]
Results: ultimate_nz_tour
[Figure: separation results for Guitar and Synth. under E2A (300 ms) and JR2 (470 ms); same eight methods as in the fort_minor-remember_the_name experiment]
Results: ultimate_nz_tour
[Figure: results for Guitar and Synth. obtained by IVA, Ozerov's MNMF, Proposed method w/o partitioning function, Proposed method with partitioning function, Sawada's MNMF initialized by proposed method, and Sawada's MNMF]
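Experimental conditions (speech)
Sources: SiSEC; 2 sources and 2 microphones; FFT length: 256 ms; shift length: 128 ms (1/4); number of bases: 2 per source (ILRMA1), 4 in total (ILRMA2); evaluation: SDR
[Figure: results for Speaker 1 and Speaker 2 versus the number of bases for each source]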
Results: female3_liverec_1m (reverberation times 130 ms and 250 ms)
[Figure: results for Speaker 1 and Speaker 2; same eight methods as in the music experiments]
Results: male3_liverec_1m (reverberation times 130 ms and 250 ms)
[Figure: (a) results for Speaker 1 and Speaker 2; same eight methods as in the music experiments]
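Computational time
Music: SiSEC bearlin-roads__snip_85_99 (14 s, 16 kHz); 3 sources: acoustic_guit_main, bass, vocals; environment: MATLAB 8.3, Intel Core i7-4790 (3.6 GHz); iterations: 200
Computational times (200 iterations):
IVA: 91.6 s
MNMF: 4498.4 s
ILRMA (...): 121.0 s
ILRMA (...): 173.4 s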
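Overdetermined BSS (more microphones than sources, e.g., 2 sources and 4 microphones)
The conventional approach first reduces the number of channels with PCA and then applies BSS; [Kitamura, 2015] instead relaxes the rank-1 spatial constraint of rank-1 multichannel NMF so that all channels can be used.
[Figure: mixing model]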
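The conventional PCA front end can be sketched per frequency bin: project the $M$-channel observation onto the $N$ principal eigenvectors of the spatial covariance, then run a determined method such as IVA or ILRMA on the result. The function name `pca_reduce` and the array layout are assumptions for illustration.

```python
import numpy as np

def pca_reduce(X, n_out):
    """Per-frequency PCA: project the M-channel STFT X (I, J, M) onto the n_out
    principal eigenvectors of each bin's spatial covariance -> (I, J, n_out)."""
    I, J, M = X.shape
    Z = np.empty((I, J, n_out), dtype=complex)
    for i in range(I):
        C = X[i].T @ X[i].conj() / J  # (M, M) estimate of E[x x^H]
        eigval, eigvec = np.linalg.eigh(C)  # eigenvalues in ascending order
        E = eigvec[:, ::-1][:, :n_out]  # eigenvectors of the n_out largest eigenvalues
        Z[i] = X[i] @ E.conj()  # row j is (E^H x_ij)^T
    return Z
```

PCA keeps the dominant signal subspace but discards the remaining channels, which is the loss that relaxing the rank-1 constraint is meant to avoid.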
Relaxation of the rank-1 spatial constraint in multichannel NMF, with the NMF basis matrix $\mathbf{T}$ shared
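Experimental conditions (overdetermined case)
Sources: SiSEC; impulse responses: RWCP; compared methods: PCA + IVA, PCA + rank-1 multichannel NMF, multichannel NMF, and relaxed rank-1 multichannel NMF; FFT length: 128 ms; shift length: 64 ms (1/2 shift); number of bases: 30 per source; evaluation: SDR
[Figure: recording conditions for impulse response JR2 (reverberation time: 470 ms): 2 sources and 4 microphones (spacing 2.83 cm), sources at 2 m, angle labels 80° and 60°]
Music: ultimate nz tour, guitar and vocal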
[Figure: results of PCA + 2ch IVA, PCA + 2ch proposed method, 4ch proposed method with basis sharing, and 4ch multichannel NMF]
Computational times (200 iterations):
PCA + 2ch IVA: 53.8 s
PCA + 2ch proposed method: 67.6 s
4ch multichannel NMF: 8307.1 s
4ch proposed method with basis sharing: 330.97 s
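Comparison of PCA + IVA, PCA + rank-1 multichannel NMF, multichannel NMF, and the relaxed rank-1 multichannel NMF, which avoids PCA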
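Summary
ILRMA unifies IVA and NMF and continues the line FDICA → IVA → ILRMA. It is equivalent to multichannel NMF with a rank-1 spatial model, and its rank-1 spatial constraint can be relaxed for the overdetermined case.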
[Lee, 1999]: D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, pp. 788-791, 1999.
[Lee, 2000]: D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Proc. Adv. Neural Inform. Process. Syst., 2000, vol. 13, pp. 556-562.
[Smaragdis, 1998]: P. Smaragdis, "Blind separation of convolved mixtures in the frequency domain," Neurocomputing, vol. 22, pp. 21-34, 1998.
[Sawada, 2004]: H. Sawada, R. Mukai, S. Araki, and S. Makino, "Convolutive blind source separation for more than two sources in the frequency domain," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2004, pp. III-885-III-888.
[Saruwatari, 2006]: H. Saruwatari, T. Kawamura, T. Nishikawa, A. Lee, and K. Shikano, "Blind source separation based on a fast-convergence algorithm combining ICA and beamforming," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 2, pp. 666-678, Mar. 2006.
[Araki, 2003]: S. Araki, S. Makino, Y. Hinamoto, R. Mukai, T. Nishikawa, and H. Saruwatari, "Equivalence between frequency-domain blind source separation and frequency-domain adaptive beamforming for convolutive mixtures," EURASIP Journal on Advances in Signal Process., vol. 2003, no. 11, pp. 1-10, 2003.
[Hiroe, 2006]: A. Hiroe, "Solution of permutation problem in frequency domain ICA using multivariate probability density functions," in Proc. Int. Conf. Independent Compon. Anal. Blind Source Separation, 2006, pp. 601-608.
[Kim, 2006]: T. Kim, T. Eltoft, and T.-W. Lee, "Independent vector analysis: An extension of ICA to multivariate components," in Proc. Int. Conf. Independent Compon. Anal. Blind Source Separation, 2006, pp. 165-172.
[Kim, 2007]: T. Kim, H. T. Attias, S.-Y. Lee, and T.-W. Lee, "Blind source separation exploiting higher-order frequency dependencies," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 1, pp. 70-79, 2007.
[Kitamura, 2015]: D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, "Efficient multichannel nonnegative matrix factorization exploiting rank-1 spatial model," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 276-280.
[Kitamura, 2016]: D. Kitamura, H. Saruwatari, H. Kameoka, Y. Takahashi, K. Kondo, and S. Nakamura, "Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 9, pp. 1626-1641, Sep. 2016.
[Févotte, 2009]: C. Févotte, N. Bertin, and J.-L. Durrieu, "Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis," Neural Comput., vol. 21, no. 3, pp. 793-830, 2009.
[Sawada, 2013]: H. Sawada, H. Kameoka, S. Araki, and N. Ueda, "Multichannel extensions of non-negative matrix factorization with complex-valued data," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 5, pp. 971-982, May 2013.
[Duong, 2010]: N. Q. K. Duong, E. Vincent, and R. Gribonval, "Under-determined reverberant audio source separation using a full-rank spatial covariance model," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 7, pp. 1830-1840, Sep. 2010.
[Kitamura, 2015]: D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, "Relaxation of rank-1 spatial constraint in overdetermined blind source separation," in Proc. Eur. Signal Process. Conf., 2015, pp. 1271-1275.