

Page 1:

Tuning Pruning in Sparse Non-negative Matrix Factorization

Morten Mørup, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark

Joint work with Lars Kai Hansen, DTU Informatics, Intelligent Signal Processing, Technical University of Denmark

EUSIPCO'09, 27 August 2009

Page 2:

V ≈ WH,   V ≥ 0, W ≥ 0, H ≥ 0

Non-negative Matrix Factorization (NMF)

Daniel D. Lee and Sebastian Seung, Nature 1999

Gives a part-based representation (and as such also promotes sparse representations) (Lee and Seung, 1999)

Also named Positive Matrix Factorization (PMF) (Paatero and Tapper, 1994)

Popularized due to a simple algorithmic procedure based on multiplicative updates (Lee & Seung, 2001)

[Figure: illustration of V ≈ W H as a product of non-negative matrices]
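As a quick, illustrative way to compute such a factorization in practice (not part of the original slides; scikit-learn's NMF is just one of many implementations, and the data here is a random toy matrix):

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy non-negative data matrix V (e.g. pixels x images)
V = np.abs(np.random.default_rng(0).normal(size=(256, 100)))

model = NMF(n_components=5, init='nndsvda', max_iter=500)
W = model.fit_transform(V)   # 256 x 5, non-negative basis images
H = model.components_        # 5 x 100, non-negative encodings
print(np.linalg.norm(V - W @ H, 'fro'))
```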

Page 3:

Roadmap: Some important challenges in NMF

How to efficiently compute the decomposition (NMF is a non-convex problem); a good starting point is not to use multiplicative updates (first part of this talk)

How to resolve the non-uniqueness of the decomposition

How to determine the number of components

[Figure: 3-D scatter plots (axes x, y, z) illustrating the Convex Hull and the Positive Orthant]

We will demonstrate that Automatic Relevance Determination in Bayesian learning can address these challenges by tuning the pruning in sparse NMF (second part of this talk).

NMF is only unique when the data adequately spans the positive orthant (Donoho & Stodden, 2004).

Page 4:

Multiplicative updates (Lee & Seung, 2001)

The multiplicative update can be viewed as a gradient step with a particular choice of step size parameter (Salakhutdinov, Roweis, Ghahramani, 2004).
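A minimal sketch of the least-squares multiplicative updates (the standard Lee & Seung-style rules; variable names, initialization, and stopping criterion are illustrative assumptions, not the talk's exact algorithm):

```python
import numpy as np

def nmf_mu(V, K, n_iter=200, eps=1e-9, seed=0):
    """Least-squares NMF, V ~ W H with W, H >= 0, via multiplicative updates."""
    rng = np.random.default_rng(seed)
    I, J = V.shape
    W, H = rng.random((I, K)), rng.random((K, J))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # H <- H * (W'V) / (W'WH)
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # W <- W * (VH') / (WHH')
    return W, H
```

Because the updates are multiplicative, entries that reach zero stay zero, which gives some intuition for why these updates can stall short of a good solution (cf. Page 5).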

Page 5:

Other common approaches for solving the NMF problem:

Active set procedure (analytic closed-form solution within the active set for the LS error) (Lawson and Hanson, 1974), (R. Bro and S. de Jong, 1997)

Projected gradient (C.-J. Lin, 2007)

Multiplicative updates (MU) do not converge to the optimal solution!
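A sketch of how the active-set alternative looks in practice for the LS error, using the Lawson-Hanson NNLS routine available in SciPy (the column-wise alternating formulation is an illustrative assumption):

```python
import numpy as np
from scipy.optimize import nnls  # Lawson-Hanson active-set NNLS

def update_H_nnls(V, W):
    """Exactly solve min_{H >= 0} ||V - W H||_F^2 for fixed W, one column at a time."""
    H = np.zeros((W.shape[1], V.shape[1]))
    for j in range(V.shape[1]):
        H[:, j], _ = nnls(W, V[:, j])
    return H

# Alternating update_H_nnls(V, W) for H and update_H_nnls(V.T, H.T).T for W
# gives an alternating non-negative least squares scheme.
```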

Page 6:

Sparseness has been imposed to alleviate the non-uniqueness of NMF

(P. Hoyer 2002, 2004), (J. Eggert and E. Körner 2004)

Sparseness is motivated by the principle of parsimony, i.e. forming the simplest account. As such, sparseness is also related to VARIMAX and to ML-ICA based on sparse priors.
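One common way to write such a sparsity-regularized NMF objective (an illustrative form in the spirit of Hoyer and Eggert & Körner, not necessarily the exact formulation on the slide):

$$
\min_{W\ge 0,\,H\ge 0}\ \tfrac{1}{2}\,\lVert V - W H\rVert_F^2 \;+\; \lambda \sum_{k,j} h_{kj},
\qquad \lVert w_k \rVert_2 = 1 \ \text{for each column } w_k,
$$

where normalizing the columns of W prevents the penalty from being circumvented by rescaling W up and H down.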

Page 7:

Open problems for Sparse NMF (SNMF)

What is the adequate degree of sparsity to impose?

What is the adequate number of components K to model the data?

Both issues can be posed as the single problem of tuning the pruning in sparse NMF (SNMF). Hence, by imposing a component-wise sparsity penalty with strength λ_k, the above problems boil down to determining λ_k; a sufficiently large λ_k results in the kth component being turned off (i.e. removed).
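Written out with a component-wise penalty (the λ_k notation is assumed here, matching the regularization strengths discussed on the following slides):

$$
\min_{W\ge 0,\,H\ge 0}\ \tfrac{1}{2}\,\lVert V - W H\rVert_F^2 \;+\; \sum_{k=1}^{K} \lambda_k \Big( \sum_i w_{ik} + \sum_j h_{kj} \Big).
$$

Driving λ_k large forces the kth column of W and the kth row of H to zero, i.e. prunes component k; tuning the pruning amounts to choosing the λ_k.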

Page 8:

Bayesian Learning and the Principle of Parsimony

To get the posterior probability distribution, multiply the prior probability distribution by the likelihood function and then normalize
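In symbols (the standard form of Bayes' rule, added here for reference):

$$
p(\theta \mid D) \;=\; \frac{p(D \mid \theta)\, p(\theta)}{p(D)}.
$$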

"The explanation of any phenomenon should make as few assumptions as possible, eliminating those that make no difference in the observable predictions of the explanatory hypothesis or theory."

Bayesian learning embodies Occam's razor, i.e. complex models are penalized. In the accompanying figure, the horizontal axis represents the space of possible data sets D; Bayes rule rewards models in proportion to how much they predicted the data that occurred, and these predictions are quantified by a normalized probability distribution on D.

[Portraits: David J.C. MacKay, Thomas Bayes, William of Ockham]

Page 9:

SNMF in a Bayesian formulation

Posterior ∝ Likelihood function × Prior

(In the hierarchical Bayesian framework, priors can further be imposed on the hyperparameters, e.g. the λ_k.)
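A concrete instance consistent with this formulation (the exact densities are assumptions for illustration: a Gaussian, i.e. least-squares, likelihood and exponential priors whose rate λ_k is shared by the kth column of W and the kth row of H):

$$
p(V \mid W,H,\sigma^2) = \prod_{i,j} \mathcal{N}\!\big(v_{ij} \,\big|\, (WH)_{ij},\, \sigma^2\big),
\qquad
p(w_{ik} \mid \lambda_k) = \lambda_k e^{-\lambda_k w_{ik}},
\quad
p(h_{kj} \mid \lambda_k) = \lambda_k e^{-\lambda_k h_{kj}}.
$$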

Page 10:

The log posterior for Sparse NMF now follows by combining the log likelihood with the log priors above.
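Under the exponential-prior instance sketched above (again an assumption, and up to constants and the noise term):

$$
\log p(W,H,\lambda \mid V) \;=\; \log p(V \mid W,H) \;+\; \sum_{k}\Big[(I+J)\log\lambda_k \;-\; \lambda_k\Big(\sum_i w_{ik} + \sum_j h_{kj}\Big)\Big] \;+\; \text{const},
$$

where I and J denote the number of rows of W and columns of H, respectively.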

The contribution to the log posterior from the normalization constant of the priors makes it possible to learn the regularization strengths λ_k from the data (this is also known as Automatic Relevance Determination, ARD).

Inserting this value of λ_k into the objective shows that ARD corresponds to a reweighted L0-norm optimization scheme on the component activations.
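Concretely, maximizing the log posterior above with respect to λ_k (assuming the exponential priors and a flat hyperprior) gives a closed-form value, and substituting it back turns the penalty into a logarithm of the component activation:

$$
\hat\lambda_k \;=\; \frac{I+J}{\sum_i w_{ik} + \sum_j h_{kj}},
\qquad
(I+J)\log\hat\lambda_k - \hat\lambda_k\Big(\sum_i w_{ik}+\sum_j h_{kj}\Big) \;=\; -(I+J)\,\log\Big(\sum_i w_{ik}+\sum_j h_{kj}\Big) + \text{const}.
$$

The log penalty on each component's total activation is what behaves like a reweighted L0-type penalty and can drive entire components to zero.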

Page 11:

There is no closed-form solution for the posterior moments of W and H due to the non-negativity constraint and the use of non-conjugate priors. The posterior distribution can be estimated by sampling approaches, cf. the previous talk by Mikkel Schmidt. Point estimates of W and H can be obtained by maximum a posteriori (MAP) estimation, which forms a regular sparse NMF optimization problem.

Tuning Pruning algorithm for sparse NMF
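A minimal sketch of such a tuning-pruning MAP scheme (assumed LS error, L1-type penalties with per-component strength lambda_k updated in closed form, and simple thresholding for pruning; this illustrates the idea rather than reproducing the exact algorithm from the talk):

```python
import numpy as np

def tuning_pruning_snmf(V, K=20, n_iter=500, prune_tol=1e-8, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    I, J = V.shape
    W, H = rng.random((I, K)), rng.random((K, J))
    lam = np.ones(K)
    for _ in range(n_iter):
        # Penalized multiplicative updates: the lambda_k enter the denominators
        H *= (W.T @ V) / (W.T @ W @ H + lam[:, None] + eps)
        W *= (V @ H.T) / (W @ H @ H.T + lam[None, :] + eps)
        # ARD-style closed-form update of the regularization strengths
        activation = W.sum(axis=0) + H.sum(axis=1)
        lam = (I + J) / (activation + eps)
        # Components whose activation vanishes are pruned (turned off)
        keep = activation > prune_tol
        W, H, lam = W[:, keep], H[keep, :], lam[keep]
    return W, H, lam
```

In practice the scale ambiguity between W and H would typically also be fixed (e.g. by normalizing the columns of W), and a noise-variance term would weight the data fit against the penalty; both are omitted here for brevity.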

Page 12:

Data results

Handwritten digits: X is 256 pixels × 7291 digits

CBCL face database: X is 361 pixels × 2429 faces

Wavelet-transformed EEG: X is 64 channels × 122976 time-frequency bins

Page 13:

Analyzing X vs. X^T

Handwritten digits (X): 256 pixels × 7291 digits

Handwritten digits (X^T): 7291 digits × 256 pixels

SNMF has clustering-like properties (as reported in Ding, 2005)

SNMF gives a part-based representation (as reported in Lee & Seung, 1999)

Page 14:

Conclusion

Bayesian learning forms a simple framework for tuning the pruning in sparse NMF, thereby both establishing the model order and resolving the non-uniqueness of the NMF representation.

The likelihood function (i.e. KL (Poisson noise) vs. LS (Gaussian noise)) heavily impacted the extracted number of components. In comparison, a tensor decomposition study in (Mørup et al., Journal of Chemometrics 2009) demonstrated that the choice of prior distribution has only a limited effect on the model order estimation.

There are many other conceivable parameterizations of the prior as well as approaches to parameter estimation. However, Bayesian learning forms a promising framework for model order estimation and for resolving ambiguities in the NMF model through the tuning of the pruning.