Sparse BSS in the large-scale regime
Christophe Kervazo, Jérôme Bobin, Cécile Chenot
Dictionary learning on manifolds workshop, September 4th, 2017
Blind Source Separation (BSS)
X = AS + N
Applications: astrophysical data, spectroscopic data…
- X: observations (m x t); m observations in rows, t samples in columns
- A: mixing matrix (m x n)
- S: sources (n x t)
- N: noise and model imperfections (m x t)
Goal: estimate A and S from X
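To make the model concrete, here is a minimal NumPy sketch that draws a random instance of X = AS + N (the sizes, sparsity level and noise scale are illustrative choices, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, t = 6, 4, 1000                     # observations, sources, samples (illustrative)

# Sparse sources: a few active samples with standard normal amplitudes
S = rng.standard_normal((n, t)) * (rng.random((n, t)) < 0.1)
A = rng.standard_normal((m, n))          # mixing matrix (m x n)
N = 1e-3 * rng.standard_normal((m, t))   # noise and model imperfections

X = A @ S + N                            # the BSS model: X = AS + N
```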
Examples of sources and observations in spectroscopy: [figure showing example spectra and two observations X1, X2]
State of the art
Ill-posed unsupervised matrix factorization problem [1]
Requires prior information on A and S. Several families:
- sources assumed statistically independent (e.g. ICA [1])
- sources assumed sparse [2] (e.g. GMCA [3])
- A and S non-negative (e.g. HALS [4])
Some applications with a high number of sources:
- spectroscopy
- hyperspectral imaging
- very close to dictionary learning
- …
Goal
- Investigate block-based optimization strategies
- Problem: the performance declines with a high number of sources n
- Provide a thorough explanation of the results
Optimization problem (1/2)
Write the problem as an optimization problem [3]:

$$\min_{A,S} \; \frac{1}{2}\left\|X - AS\right\|_F^2 + \mathcal{J}(A) + \mathcal{G}(S)$$

- Data fidelity term (non-convex in (A, S) jointly)
- $\mathcal{J}$ enforces constraints on A
- $\mathcal{G}$ enforces constraints on S

If $\mathcal{J}$ and $\mathcal{G}$ are convex, the problem is multi-convex => use of blocks [5].
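As a sketch of what this objective looks like in code, assuming the concrete choice $\mathcal{G}(S) = \lambda \|S\|_1$ studied later in the talk, with the constraint on A kept implicit (the function name and lam are illustrative):

```python
import numpy as np

def objective(X, A, S, lam):
    """Sparse BSS objective: 1/2 * ||X - A S||_F**2 + lam * ||S||_1.
    The constraint J on A (e.g. unit-norm columns) is treated as an
    indicator function, equal to 0 on the feasible set, so it is omitted."""
    data_fidelity = 0.5 * np.linalg.norm(X - A @ S, 'fro') ** 2
    sparsity = lam * np.abs(S).sum()
    return data_fidelity + sparsity
```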
Optimization problem (2/2)
[Figure: block decomposition of X = AS into sub-blocks of A and S]

- Full-size blocks: the whole matrices A and S [3]. High-dimensional problem, hence slower.
- Hierarchical or deflation methods: blocks of size 1 [4]. Simpler and faster, but suffers from error propagation.
- bGMCA (here with r = 3), an in-between approach: intermediate block sizes r => a simpler problem without dramatic error propagation.
PALM step
Proximal Alternating Linearized Minimization (PALM) [6] algorithm:
- Fast
- Convergence to a stationary point
While not converged:
. Choose which indices I to update
. Update the corresponding sub-matrix S_I:
$$S_I^{(k)} = \operatorname{prox}_{\frac{\gamma\,\mathcal{G}(\cdot)}{\left\|A_I^{(k-1)T} A_I^{(k-1)}\right\|_2}}\left( S_I^{(k-1)} - \frac{1}{\left\|A_I^{(k-1)T} A_I^{(k-1)}\right\|_2}\, A_I^{(k-1)T}\left(A^{(k-1)} S^{(k-1)} - X\right)\right)$$
. Update the corresponding sub-matrix A_I:
$$A_I^{(k)} = \operatorname{prox}_{\frac{\gamma\,\mathcal{J}(\cdot)}{\left\|S_I^{(k)} S_I^{(k)T}\right\|_2}}\left( A_I^{(k-1)} - \frac{1}{\left\|S_I^{(k)} S_I^{(k)T}\right\|_2}\, \left(A^{(k-1)} S^{(k)} - X\right) S_I^{(k)T}\right)$$

These updates minimize $\min_{A,S} \frac{1}{2}\|X - AS\|_F^2 + \mathcal{J}(A) + \mathcal{G}(S)$ block by block.
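A minimal NumPy sketch of one such block update, assuming $\mathcal{G} = \lambda\|\cdot\|_1$ (prox: soft-thresholding) and the oblique constraint on A (prox: column-wise projection onto the l2 unit ball); the function names are illustrative:

```python
import numpy as np

def soft_threshold(Z, thr):
    """Proximal operator of thr * ||.||_1 (soft-thresholding)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - thr, 0.0)

def palm_block_update(X, A, S, I, lam):
    """One PALM pass on the block I: prox-gradient step on S_I, then on A_I."""
    AI = A[:, I]
    LS = np.linalg.norm(AI.T @ AI, 2)        # Lipschitz constant w.r.t. S_I
    grad_S = AI.T @ (A @ S - X)              # gradient of the data fidelity w.r.t. S_I
    S[I] = soft_threshold(S[I] - grad_S / LS, lam / LS)

    SI = S[I]
    LA = np.linalg.norm(SI @ SI.T, 2)        # Lipschitz constant w.r.t. A_I
    AI = AI - ((A @ S - X) @ SI.T) / LA      # gradient step on A_I
    A[:, I] = AI / np.maximum(np.linalg.norm(AI, axis=0), 1.0)  # project onto the l2 ball
    return A, S
```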
Studied penalizations
Use of the proximal operators of J and G
Studied penalizations:
- For S:
  - l1 sparsity constraint: $\mathcal{G}(S) = \left\|\Lambda_S \odot \left(S\Phi_S^T\right)\right\|_{\ell_1}$
    Proximal operator: soft-thresholding
  - l1 sparsity constraint in a transformed domain and non-negativity: $\mathcal{G}(S) = \left\|\Lambda_S \odot \left(S\Phi_S^T\right)\right\|_{\ell_1} + \iota_{\{\forall j,k;\; S[j,k] \geq 0\}}(S)$
    Proximal operator: non-explicit prox => approximation or GFB [8]
- For A:
  - Oblique constraint: $\mathcal{J}(A) = \iota_{\{\forall i;\; \|A^i\|_2^2 \leq 1\}}(A)$
    Proximal operator: projection onto the l2 unit ball [7]
  - Oblique and non-negativity constraints: $\mathcal{J}(A) = \iota_{\{\forall i;\; \|A^i\|_2^2 \leq 1\}}(A) + \iota_{\{\forall i,j;\; A[i,j] \geq 0\}}(A)$
    Proximal operator: projection onto the positive orthant and onto the l2 unit ball
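These proximal operators are simple enough to write out; a NumPy sketch, taking $\Phi_S$ as the identity so that the l1 prox reduces to plain soft-thresholding:

```python
import numpy as np

def prox_l1(S, thr):
    """Soft-thresholding: prox of thr * ||.||_1 (here Phi_S = identity)."""
    return np.sign(S) * np.maximum(np.abs(S) - thr, 0.0)

def proj_oblique(A):
    """Column-wise projection onto the l2 unit ball."""
    return A / np.maximum(np.linalg.norm(A, axis=0, keepdims=True), 1.0)

def proj_oblique_nonneg(A):
    """Projection onto the positive orthant, then onto the l2 unit ball;
    for this pair of sets the composition gives the exact projection."""
    return proj_oblique(np.maximum(A, 0.0))
```

For the l1 + non-negativity penalization of S, the prox has no closed form, hence the approximation or Generalized Forward-Backward route mentioned above.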
Warm-up stage: GMCA [3]
PALM suffers from a lack of robustness w.r.t.:
- a bad initialization
- the choice of the thresholds
- it often converges to spurious local minima

GMCA:
- is robust
- includes an automatic thresholding strategy
- is not guaranteed to converge

Build a 2-stage minimization procedure:
- Warm-up stage: GMCA
- Refinement stage: PALM
Warm-up stage: GMCA

GMCA with blocks:
. Random initialization for A
For each iteration (k) from 0 to kmax:
. Choose which indices I to update and compute the residual $R_I = X - A_{I^C} S_{I^C}$
. Update the corresponding sub-matrix S_I: $S_I^{(k)} = \operatorname{prox}_{\mathcal{G}(\cdot)}\left(A_I^{(k-1)\dagger} R_I\right)$
. Update the corresponding sub-matrix A_I: $A_I^{(k)} = \operatorname{prox}_{\mathcal{J}(\cdot)}\left(R_I\, S_I^{(k)\dagger}\right)$
Heuristic: at each iteration, use a new, decreased threshold (better unmixing, increased robustness to noise and to spurious local minima [3])
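A sketch of one warm-up iteration on a block, assuming an l1 prox for $\mathcal{G}$ (soft-thresholding with the current threshold) and column normalization for A; the helper name is illustrative:

```python
import numpy as np

def gmca_block_iteration(X, A, S, I, thr):
    """One bGMCA warm-up update of the block I; thr is the current
    (decreasing) threshold from the heuristic above."""
    Ic = np.setdiff1d(np.arange(S.shape[0]), I)
    R_I = X - A[:, Ic] @ S[Ic]                   # residual w.r.t. out-of-block sources
    S_I = np.linalg.pinv(A[:, I]) @ R_I          # least-squares estimate of S_I
    S[I] = np.sign(S_I) * np.maximum(np.abs(S_I) - thr, 0.0)  # prox of G
    A_I = R_I @ np.linalg.pinv(S[I])             # least-squares estimate of A_I
    A[:, I] = A_I / np.maximum(np.linalg.norm(A_I, axis=0), 1e-12)  # normalize columns
    return A, S
```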
Number of iterations
- Fixed number of iterations for the warm-up stage (no convergence guarantee)
- Stopping criterion for the refinement step:
$$\Delta = \frac{1}{n} \sum_{j \in [1,n]} \left| A_j^{(k)T} A_j^{(k-1)} \right|$$

Based on the evolution of the angle of the columns between 2 iterations.

The refinement step stops when $\Delta > \tau$.
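With the reconstruction of $\Delta$ above, the criterion is one line of NumPy, assuming the columns of A are unit-normalized so that the scalar products are cosines of the angles:

```python
import numpy as np

def delta(A_new, A_old):
    """Mean absolute scalar product between matching unit-norm columns of A
    at two successive iterations; it tends to 1 as the columns stop moving."""
    return np.abs(np.sum(A_new * A_old, axis=0)).mean()

# The refinement step stops when delta(A_k, A_km1) > tau, with tau close to 1.
```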
Modeling the behavior of the bGMCA algorithm
Focus on the warm-up stage.
In the ideal case the residual only contains the sources of the current block: $R_I = A_I^* S_I^*$. In practice:

$$R_I = X - A_{I^C} S_{I^C} = A_I^* S_I^* + E + N$$

where E models the estimation error. Using a first-order expansion of $R_I = X - A_{I^C} S_{I^C}$ we get:

$$E = \left(A_{I^C}^* - A_{I^C}\right) S_{I^C}^* - A_{I^C}\, \epsilon_{I^C}$$

- Interference term $\left(A_{I^C}^* - A_{I^C}\right) S_{I^C}^*$: leakage of the true sources that are outside the current block
- Interference + artefact term $A_{I^C}\, \epsilon_{I^C}$: originates from the error $\epsilon_{I^C}$ on the sources outside the block

Tradeoff when using small-size blocks:
- simpler problem
- but implies larger errors E
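The decomposition can be checked numerically: with $\epsilon_{I^C} = S_{I^C} - S_{I^C}^*$, the identity is exact in the noiseless case (all values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, t, r = 8, 6, 200, 3
A_true = rng.standard_normal((m, n))
S_true = rng.standard_normal((n, t))
A_est = A_true + 0.01 * rng.standard_normal((m, n))   # perturbed estimates
S_est = S_true + 0.01 * rng.standard_normal((n, t))

I, Ic = np.arange(r), np.arange(r, n)
X = A_true @ S_true                                   # noiseless observations
R_I = X - A_est[:, Ic] @ S_est[Ic]                    # residual of the block I
eps = S_est[Ic] - S_true[Ic]                          # error on out-of-block sources
E = (A_true[:, Ic] - A_est[:, Ic]) @ S_true[Ic] - A_est[:, Ic] @ eps
print(np.allclose(R_I, A_true[:, I] @ S_true[I] + E))  # True: R_I = A*_I S*_I + E
```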
Experiments on simulated matrices
Experimental setting and metric
Source matrix S, sparse in the direct domain:
- n sources
- t = 1000 samples
- p = 10% of the samples are non-zero; their amplitudes follow a standard normal distribution

Mixing matrix A, drawn from a standard normal distribution:
- m observations; here, m = n
- modified to have a given condition number Cd
Metric $C_A$ [9]: $C_A = \operatorname{median}\left(\left|P A^{\dagger} A^*\right| - I_d\right)$
- A*: true mixing matrix
- A: solution given by the algorithm
- P: correction for the permutation and scale-factor indeterminacies

The experiments are performed 25 times and the median is displayed.
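A NumPy sketch of this setup: sparse sources with activation rate p, and a square mixing matrix whose singular values are reset so as to impose the condition number Cd (one way among others to do it):

```python
import numpy as np

rng = np.random.default_rng(2)
n, t, p, Cd = 20, 1000, 0.10, 10

# Sparse sources: p = 10% of the samples are active, standard normal amplitudes
S = rng.standard_normal((n, t)) * (rng.random((n, t)) < p)

# Mixing matrix with a prescribed condition number Cd (here m = n)
A = rng.standard_normal((n, n))
U, _, Vt = np.linalg.svd(A, full_matrices=False)
A = U @ np.diag(np.linspace(Cd, 1.0, n)) @ Vt
print(np.linalg.cond(A))                  # ~ Cd
```

The metric C_A additionally requires estimating the permutation/scale correction P by matching the columns of the estimated A to those of A*, which is omitted here.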
Study of the impact of r and n
Evolution of the results as a function of the block size r for different numbers of sources n: error propagation dominates for small blocks, the problem becomes more difficult for large blocks; this trade-off yields an optimum at intermediate block sizes.
Study of the impact of Cd
Evolution of the results as a function of the block size r with different condition numbers Cd of A
Study of the impact of p
Evolution of the results as a function of the block size r with different sparsity levels of the sources p
Complexity and number of iterations
Complexity of one iteration: of the order of $r\left[mt + rm + r^2 + tp\,\operatorname{cond}(S)\right]$, with r the block size.

Number of iterations: intuitively, proportional to $\lceil n/r \rceil$ times the number of iterations of GMCA.

=> Gain in computation time
Realistic sources
Experimental setting
Simulated LC-1H NMR (liquid chromatography - proton NMR) experiment

Goal: identify each chemical compound in a mixture

S: 40 chemicals, t = 10 000 samples

Sources taken from the SDBS [10] database and convolved with a Laplacian

A: simulates Gaussian-shaped elution profiles; (m, n) = (320, 40). Constructed in 2 parts, the second one corresponding to more correlated elution times
Monte-Carlo simulations by randomly assigning the sources and the columns of A
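A sketch of how such a mixing matrix can be simulated; the peak centers and width are illustrative, the slides only specify Gaussian-shaped elution profiles with (m, n) = (320, 40):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 320, 40
times = np.arange(m)[:, None]                       # elution (acquisition) grid
centers = np.sort(rng.uniform(0, m, n))[None, :]    # one elution peak per chemical
width = 8.0                                         # illustrative peak width

A = np.exp(-0.5 * ((times - centers) / width) ** 2) # Gaussian-shaped elution profiles
A /= np.linalg.norm(A, axis=0)                      # unit-norm columns
```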
Study of the impact of r

2 main differences with the previous experiment:
- the sources are sparse in the wavelet domain
- non-negativity of A and S (in the direct domain) is enforced
Conclusion
- bGMCA: retrieves a high number of sources
- Block-based optimization strategy
- 2-stage minimization procedure
- Enhances the separation performance (which can even be perfect)
- Significantly decreases the computational cost
References
[1] P. Comon, C. Jutten, Handbook of Blind Source Separation: Independent component analysis and applications, Academic press, 2010.
[2] M. Zibulevsky, B. A. Pearlmutter, Blind source separation by sparse decomposition in a signal dictionary, Neural computation 13 (4) (2001) 863–882.
[3] J. Bobin, J.-L. Starck, Y. Moudden, M. J. Fadili, Blind source separation: The sparsity revolution, Advances in Imaging and Electron Physics 152 (1) (2008) 221–302.
[4] N. Gillis, F. Glineur, Accelerated multiplicative updates and hierarchical als algorithms for nonnegative matrix factorization, Neural computation 24 (4) (2012) 1085–1105.
[5] Y. Xu, W. Yin, A globally convergent algorithm for nonconvex optimization based on block coordinate update, arXiv preprint arXiv:1410.1386.
[6] J. Bolte, S. Sabach, M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Mathematical Programming 146 (1-2) (2014) 459–494.
[7] N. Parikh, S. Boyd, et al., Proximal algorithms, Foundations and Trends in Optimization 1 (3) (2014) 127–239.
[8] H. Raguet, J. Fadili, G. Peyré, A generalized forward-backward splitting, SIAM Journal on Imaging Sciences 6 (3) (2013) 1199–1226.
[9] J. Bobin, J. Rapin, A. Larue, J.-L. Starck, Sparsity and adaptivity for the blind separation of partially correlated sources, IEEE Trans. Signal Processing 63 (5) (2015) 1199–1213.
[10] National Institute of Advanced Industrial Science and Technology (AIST), Spectral Database for Organic Compounds (SDBS).