Sparse BSS in the large-scale regime
Christophe Kervazo, Jérôme Bobin, Cécile Chenot
Dictionary learning on manifolds workshop, September 4th, 2017
Blind Source Separation (BSS)
X = AS + N
Applications: astrophysical data, spectroscopic data…
- X: observations (m x t); m observations in rows, t samples in columns
- A: mixing matrix (m x n)
- S: sources (n x t)
- N: noise and model imperfections (m x t)
Goal: estimate A and S from X
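To make the model concrete, here is a minimal NumPy sketch that draws a random instance of X = AS + N (the sizes, sparsity level and noise scale are illustrative choices, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, t = 6, 4, 1000                     # observations, sources, samples (illustrative)

# Sparse sources: a few active samples with standard normal amplitudes
S = rng.standard_normal((n, t)) * (rng.random((n, t)) < 0.1)
A = rng.standard_normal((m, n))          # mixing matrix (m x n)
N = 1e-3 * rng.standard_normal((m, t))   # noise and model imperfections

X = A @ S + N                            # the BSS model: X = AS + N
```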
Examples of sources and observations in spectroscopy: [figure showing example spectra and two observations X1, X2]
State of the art
Ill-posed unsupervised matrix factorization problem [1]
Requires prior information on A and S. Several families:
- sources assumed statistically independent (e.g. ICA [1])
- sources assumed sparse [2] (e.g. GMCA [3])
- A and S non-negative (e.g. HALS [4])
Some applications with a high number of sources:
- spectroscopy
- hyperspectral imaging
- very close to dictionary learning
- …
Goal
- Investigate block-based optimization strategies
- Problem: the performance declines with a high number of sources n
- Provide a thorough explanation of the results
Optimization problem (1/2)
Write the problem as an optimization problem [3]:

$$\min_{A,S} \; \frac{1}{2}\left\|X - AS\right\|_F^2 + \mathcal{J}(A) + \mathcal{G}(S)$$

- Data fidelity term (non-convex in (A, S) jointly)
- $\mathcal{J}$ enforces constraints on A
- $\mathcal{G}$ enforces constraints on S

If $\mathcal{J}$ and $\mathcal{G}$ are convex, the problem is multi-convex => use of blocks [5].
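As a sketch of what this objective looks like in code, assuming the concrete choice $\mathcal{G}(S) = \lambda \|S\|_1$ studied later in the talk, with the constraint on A kept implicit (the function name and lam are illustrative):

```python
import numpy as np

def objective(X, A, S, lam):
    """Sparse BSS objective: 1/2 * ||X - A S||_F**2 + lam * ||S||_1.
    The constraint J on A (e.g. unit-norm columns) is treated as an
    indicator function, equal to 0 on the feasible set, so it is omitted."""
    data_fidelity = 0.5 * np.linalg.norm(X - A @ S, 'fro') ** 2
    sparsity = lam * np.abs(S).sum()
    return data_fidelity + sparsity
```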
Optimization problem (2/2)
[Figure: block decomposition of X = AS into sub-blocks of A and S]

- Full-size blocks: the whole matrices A and S [3]. High-dimensional problem, hence slower.
- Hierarchical or deflation methods: blocks of size 1 [4]. Simpler and faster, but suffers from error propagation.
- bGMCA (here with r = 3), an in-between approach: intermediate block sizes r => a simpler problem without dramatic error propagation.
PALM step
Proximal Alternating Linearized Minimization (PALM) [6] algorithm:
- Fast
- Convergence to a stationary point
While not converged:
. Choose which indices I to update
. Update the corresponding sub-matrix S_I:
$$S_I^{(k)} = \operatorname{prox}_{\frac{\gamma\,\mathcal{G}(\cdot)}{\left\|A_I^{(k-1)T} A_I^{(k-1)}\right\|_2}}\left( S_I^{(k-1)} - \frac{1}{\left\|A_I^{(k-1)T} A_I^{(k-1)}\right\|_2}\, A_I^{(k-1)T}\left(A^{(k-1)} S^{(k-1)} - X\right)\right)$$
. Update the corresponding sub-matrix A_I:
$$A_I^{(k)} = \operatorname{prox}_{\frac{\gamma\,\mathcal{J}(\cdot)}{\left\|S_I^{(k)} S_I^{(k)T}\right\|_2}}\left( A_I^{(k-1)} - \frac{1}{\left\|S_I^{(k)} S_I^{(k)T}\right\|_2}\, \left(A^{(k-1)} S^{(k)} - X\right) S_I^{(k)T}\right)$$

These updates minimize $\min_{A,S} \frac{1}{2}\|X - AS\|_F^2 + \mathcal{J}(A) + \mathcal{G}(S)$ block by block.
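A minimal NumPy sketch of one such block update, assuming $\mathcal{G} = \lambda\|\cdot\|_1$ (prox: soft-thresholding) and the oblique constraint on A (prox: column-wise projection onto the l2 unit ball); the function names are illustrative:

```python
import numpy as np

def soft_threshold(Z, thr):
    """Proximal operator of thr * ||.||_1 (soft-thresholding)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - thr, 0.0)

def palm_block_update(X, A, S, I, lam):
    """One PALM pass on the block I: prox-gradient step on S_I, then on A_I."""
    AI = A[:, I]
    LS = np.linalg.norm(AI.T @ AI, 2)        # Lipschitz constant w.r.t. S_I
    grad_S = AI.T @ (A @ S - X)              # gradient of the data fidelity w.r.t. S_I
    S[I] = soft_threshold(S[I] - grad_S / LS, lam / LS)

    SI = S[I]
    LA = np.linalg.norm(SI @ SI.T, 2)        # Lipschitz constant w.r.t. A_I
    AI = AI - ((A @ S - X) @ SI.T) / LA      # gradient step on A_I
    A[:, I] = AI / np.maximum(np.linalg.norm(AI, axis=0), 1.0)  # project onto the l2 ball
    return A, S
```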
Studied penalizations
Use of the proximal operators of J and G
Studied penalizations:
- For S:
  - l1 sparsity constraint: $\mathcal{G}(S) = \left\|\Lambda_S \odot \left(S\Phi_S^T\right)\right\|_{\ell_1}$
    Proximal operator: soft-thresholding
  - l1 sparsity constraint in a transformed domain and non-negativity: $\mathcal{G}(S) = \left\|\Lambda_S \odot \left(S\Phi_S^T\right)\right\|_{\ell_1} + \iota_{\{\forall j,k;\; S[j,k] \geq 0\}}(S)$
    Proximal operator: non-explicit prox => approximation or GFB [8]
- For A:
  - Oblique constraint: $\mathcal{J}(A) = \iota_{\{\forall i;\; \|A^i\|_2^2 \leq 1\}}(A)$
    Proximal operator: projection onto the l2 unit ball [7]
  - Oblique and non-negativity constraints: $\mathcal{J}(A) = \iota_{\{\forall i;\; \|A^i\|_2^2 \leq 1\}}(A) + \iota_{\{\forall i,j;\; A[i,j] \geq 0\}}(A)$
    Proximal operator: projection onto the positive orthant and onto the l2 unit ball
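These proximal operators are simple enough to write out; a NumPy sketch, taking $\Phi_S$ as the identity so that the l1 prox reduces to plain soft-thresholding:

```python
import numpy as np

def prox_l1(S, thr):
    """Soft-thresholding: prox of thr * ||.||_1 (here Phi_S = identity)."""
    return np.sign(S) * np.maximum(np.abs(S) - thr, 0.0)

def proj_oblique(A):
    """Column-wise projection onto the l2 unit ball."""
    return A / np.maximum(np.linalg.norm(A, axis=0, keepdims=True), 1.0)

def proj_oblique_nonneg(A):
    """Projection onto the positive orthant, then onto the l2 unit ball;
    for this pair of sets the composition gives the exact projection."""
    return proj_oblique(np.maximum(A, 0.0))
```

For the l1 + non-negativity penalization of S, the prox has no closed form, hence the approximation or Generalized Forward-Backward route mentioned above.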
Warm-up stage: GMCA [3]
PALM suffers from a lack of robustness w.r.t.:
- a bad initialization
- the choice of the thresholds
- it often converges to spurious local minima

GMCA:
- is robust
- includes an automatic thresholding strategy
- is not guaranteed to converge

Build a 2-stage minimization procedure:
- Warm-up stage: GMCA
- Refinement stage: PALM
Warm-up stage: GMCA

GMCA with blocks:
. Random initialization for A
For each iteration (k) from 0 to kmax:
. Choose which indices I to update and compute the residual $R_I = X - A_{I^C} S_{I^C}$
. Update the corresponding sub-matrix S_I: $S_I^{(k)} = \operatorname{prox}_{\mathcal{G}(\cdot)}\left(A_I^{(k-1)\dagger} R_I\right)$
. Update the corresponding sub-matrix A_I: $A_I^{(k)} = \operatorname{prox}_{\mathcal{J}(\cdot)}\left(R_I\, S_I^{(k)\dagger}\right)$
Heuristic: at each iteration, use a new, decreased threshold (better unmixing, increased robustness to noise and to spurious local minima [3])
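A sketch of one warm-up iteration on a block, assuming an l1 prox for $\mathcal{G}$ (soft-thresholding with the current threshold) and column normalization for A; the helper name is illustrative:

```python
import numpy as np

def gmca_block_iteration(X, A, S, I, thr):
    """One bGMCA warm-up update of the block I; thr is the current
    (decreasing) threshold from the heuristic above."""
    Ic = np.setdiff1d(np.arange(S.shape[0]), I)
    R_I = X - A[:, Ic] @ S[Ic]                   # residual w.r.t. out-of-block sources
    S_I = np.linalg.pinv(A[:, I]) @ R_I          # least-squares estimate of S_I
    S[I] = np.sign(S_I) * np.maximum(np.abs(S_I) - thr, 0.0)  # prox of G
    A_I = R_I @ np.linalg.pinv(S[I])             # least-squares estimate of A_I
    A[:, I] = A_I / np.maximum(np.linalg.norm(A_I, axis=0), 1e-12)  # normalize columns
    return A, S
```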
Number of iterations
- Fixed number of iterations for the warm-up stage (no convergence guarantee)
- Stopping criterion for the refinement step:
$$\Delta = \frac{1}{n} \sum_{j \in [1,n]} \left| A_j^{(k)T} A_j^{(k-1)} \right|$$

Based on the evolution of the angle of the columns between 2 iterations.

The refinement step stops when $\Delta > \tau$.
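With the reconstruction of $\Delta$ above, the criterion is one line of NumPy, assuming the columns of A are unit-normalized so that the scalar products are cosines of the angles:

```python
import numpy as np

def delta(A_new, A_old):
    """Mean absolute scalar product between matching unit-norm columns of A
    at two successive iterations; it tends to 1 as the columns stop moving."""
    return np.abs(np.sum(A_new * A_old, axis=0)).mean()

# The refinement step stops when delta(A_k, A_km1) > tau, with tau close to 1.
```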
Modeling the behavior of the bGMCA algorithm
Focus on the warm-up stage.
In the ideal case the residual only contains the sources of the current block: $R_I = A_I^* S_I^*$. In practice:

$$R_I = X - A_{I^C} S_{I^C} = A_I^* S_I^* + E + N$$

where E models the estimation error. Using a first-order expansion of $R_I = X - A_{I^C} S_{I^C}$ we get:

$$E = \left(A_{I^C}^* - A_{I^C}\right) S_{I^C}^* - A_{I^C}\, \epsilon_{I^C}$$

- Interference term $\left(A_{I^C}^* - A_{I^C}\right) S_{I^C}^*$: leakage of the true sources that are outside the current block
- Interference + artefact term $A_{I^C}\, \epsilon_{I^C}$: originates from the error $\epsilon_{I^C}$ on the sources outside the block

Tradeoff when using small-size blocks:
- simpler problem
- but implies larger errors E
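The decomposition can be checked numerically: with $\epsilon_{I^C} = S_{I^C} - S_{I^C}^*$, the identity is exact in the noiseless case (all values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, t, r = 8, 6, 200, 3
A_true = rng.standard_normal((m, n))
S_true = rng.standard_normal((n, t))
A_est = A_true + 0.01 * rng.standard_normal((m, n))   # perturbed estimates
S_est = S_true + 0.01 * rng.standard_normal((n, t))

I, Ic = np.arange(r), np.arange(r, n)
X = A_true @ S_true                                   # noiseless observations
R_I = X - A_est[:, Ic] @ S_est[Ic]                    # residual of the block I
eps = S_est[Ic] - S_true[Ic]                          # error on out-of-block sources
E = (A_true[:, Ic] - A_est[:, Ic]) @ S_true[Ic] - A_est[:, Ic] @ eps
print(np.allclose(R_I, A_true[:, I] @ S_true[I] + E))  # True: R_I = A*_I S*_I + E
```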
Experiments on simulated matrices
Experimental setting and metric
Source matrix S, sparse in the direct domain:
- n sources
- t = 1000 samples
- p = 10% of the samples are non-zero; their amplitudes follow a standard normal distribution

Mixing matrix A, drawn from a standard normal distribution:
- m observations; here, m = n
- modified to have a given condition number Cd
Metric $C_A$ [9]: $C_A = \operatorname{median}\left(\left|P A^{\dagger} A^*\right| - I_d\right)$
- A*: true mixing matrix
- A: solution given by the algorithm
- P: correction for the permutation and scale-factor indeterminacies

The experiments are performed 25 times and the median is displayed.
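A NumPy sketch of this setup: sparse sources with activation rate p, and a square mixing matrix whose singular values are reset so as to impose the condition number Cd (one way among others to do it):

```python
import numpy as np

rng = np.random.default_rng(2)
n, t, p, Cd = 20, 1000, 0.10, 10

# Sparse sources: p = 10% of the samples are active, standard normal amplitudes
S = rng.standard_normal((n, t)) * (rng.random((n, t)) < p)

# Mixing matrix with a prescribed condition number Cd (here m = n)
A = rng.standard_normal((n, n))
U, _, Vt = np.linalg.svd(A, full_matrices=False)
A = U @ np.diag(np.linspace(Cd, 1.0, n)) @ Vt
print(np.linalg.cond(A))                  # ~ Cd
```

The metric C_A additionally requires estimating the permutation/scale correction P by matching the columns of the estimated A to those of A*, which is omitted here.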
Study of the impact of r and n
Evolution of the results as a function of the block size r for different numbers of sources n: error propagation dominates for small blocks, the problem becomes more difficult for large blocks; this trade-off yields an optimum at intermediate block sizes.
Study of the impact of Cd
Evolution of the results as a function of the block size r with different condition numbers Cd of A
Study of the impact of p
Evolution of the results as a function of the block size r with different sparsity levels of the sources p
Complexity and number of iterations
Complexity of one iteration: of the order of $r\left[mt + rm + r^2 + tp\,\operatorname{cond}(S)\right]$, with r the block size.

Number of iterations: intuitively, proportional to $\lceil n/r \rceil$ times the number of iterations of GMCA.

=> Gain in computation time
Realistic sources
Experimental setting
Simulated LC-1H NMR (liquid chromatography - proton NMR) experiment

Goal: identify each chemical compound in a mixture

S: 40 chemicals, t = 10 000 samples

Sources taken from the SDBS [10] database and convolved with a Laplacian

A: simulates Gaussian-shaped elution profiles; (m, n) = (320, 40). Constructed in 2 parts, the second one corresponding to more correlated elution times
Monte-Carlo simulations by randomly assigning the sources and the columns of A
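A sketch of how such a mixing matrix can be simulated; the peak centers and width are illustrative, the slides only specify Gaussian-shaped elution profiles with (m, n) = (320, 40):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 320, 40
times = np.arange(m)[:, None]                       # elution (acquisition) grid
centers = np.sort(rng.uniform(0, m, n))[None, :]    # one elution peak per chemical
width = 8.0                                         # illustrative peak width

A = np.exp(-0.5 * ((times - centers) / width) ** 2) # Gaussian-shaped elution profiles
A /= np.linalg.norm(A, axis=0)                      # unit-norm columns
```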
Study of the impact of r

2 main differences with the previous experiment:
- the sources are sparse in the wavelet domain
- non-negativity of A and S (in the direct domain) is enforced
Conclusion
- bGMCA: retrieves a high number of sources
- Block-based optimization strategy
- 2-stage minimization procedure
- Enhances the separation performance (which can even be perfect)
- Significantly decreases the computational cost
References
[1] P. Comon, C. Jutten, Handbook of Blind Source Separation: Independent component analysis and applications, Academic press, 2010.
[2] M. Zibulevsky, B. A. Pearlmutter, Blind source separation by sparse decomposition in a signal dictionary, Neural computation 13 (4) (2001) 863–882.
[3] J. Bobin, J.-L. Starck, Y. Moudden, M. J. Fadili, Blind source separation: The sparsity revolution, Advances in Imaging and Electron Physics 152 (1) (2008) 221–302.
[4] N. Gillis, F. Glineur, Accelerated multiplicative updates and hierarchical als algorithms for nonnegative matrix factorization, Neural computation 24 (4) (2012) 1085–1105.
[5] Y. Xu, W. Yin, A globally convergent algorithm for nonconvex optimization based on block coordinate update, arXiv preprint arXiv:1410.1386.
[6] J. Bolte, S. Sabach, M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Mathematical Programming 146 (1-2) (2014) 459–494.
[7] N. Parikh, S. Boyd, et al., Proximal algorithms, Foundations and Trends in Optimization 1 (3) (2014) 127–239.
[8] H. Raguet, J. Fadili, G. Peyré, A generalized forward-backward splitting, SIAM Journal on Imaging Sciences 6 (3) (2013) 1199–1226.
[9] J. Bobin, J. Rapin, A. Larue, J.-L. Starck, Sparsity and adaptivity for the blind separation of partially correlated sources, IEEE Trans. Signal Processing 63 (5) (2015) 1199–1213.
[10] National Institute of Advanced Industrial Science and Technology (AIST), Spectral Database for Organic Compounds (SDBS).