computing a nonnegative matrix factorization...

109
Computing a Nonnegative Matrix Factorization – Provably Ankur Moitra, IAS joint work with Sanjeev Arora, Rong Ge and Ravi Kannan June 20, 2012 Ankur Moitra (IAS) Provably June 20, 2012

Upload: others

Post on 03-Mar-2020

37 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Computing a Nonnegative MatrixFactorization – Provably

Ankur Moitra, IAS

joint work with Sanjeev Arora, Rong Ge and Ravi Kannan

June 20, 2012

Ankur Moitra (IAS) Provably June 20, 2012

Page 2: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

m

n

M

Page 3: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

inner−dimension

W

A

n

=m M

Page 4: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

A

n

=minner−dimension

rank

W

M

Page 5: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

A

n

=minner−dimension

non−negative

non−negative

rank

W

M

Page 6: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

A

n

=minner−dimension

non−negative

non−negative

ranknon−negative

W

M

Page 7: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Information Retrieval

documents:

Page 8: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Information Retrieval

documents:

n

m M

Page 9: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Information Retrieval

A

n

=minner−dimension

documents:

W

M

Page 10: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Information Retrieval

"topics"

W

A

n

=minner−dimension

documents:

M

Page 11: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Information Retrieval

"representation"

W

A

n

=minner−dimension

documents:

"topics"

M

Page 12: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Information Retrieval

A

n

=minner−dimension

documents:

"topics" "representation"

W

M

Page 13: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Information Retrieval

A

n

=minner−dimension

documents:

"topics" "representation"

W

M

Page 14: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Applications

Statistics and Machine Learning:extract latent relationships in data

image segmentation, text classification, information retrieval,collaborative filtering, ...[Lee, Seung], [Xu et al], [Hofmann], [Kumar et al], [Kleinberg, Sandler]

Combinatorics:extended formulation, log-rank conjecture[Yannakakis], [Lovasz, Saks]

Physical Modeling:interaction of components is additive

visual recognition, environmetrics

Page 15: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Applications

Statistics and Machine Learning:extract latent relationships in data

image segmentation, text classification, information retrieval,collaborative filtering, ...[Lee, Seung], [Xu et al], [Hofmann], [Kumar et al], [Kleinberg, Sandler]

Combinatorics:extended formulation, log-rank conjecture[Yannakakis], [Lovasz, Saks]

Physical Modeling:interaction of components is additive

visual recognition, environmetrics

Page 16: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Applications

Statistics and Machine Learning:extract latent relationships in data

image segmentation, text classification, information retrieval,collaborative filtering, ...[Lee, Seung], [Xu et al], [Hofmann], [Kumar et al], [Kleinberg, Sandler]

Combinatorics:extended formulation, log-rank conjecture[Yannakakis], [Lovasz, Saks]

Physical Modeling:interaction of components is additive

visual recognition, environmetrics

Page 17: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Local Search: Given A, compute W , compute A, ....

Known to fail on worst-case inputs (stuck in local minima)

Highly sensitive to cost function, regularization, update procedure

Question (theoretical)

Is there an algorithm that (provably) works on all inputs?

Question

Can we give a theoretical explanation for why simple heuristics are soeffective?

(What distinguishes a realistic input from an artificial one?)

Page 18: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Local Search: Given A, compute W , compute A, ....

Known to fail on worst-case inputs (stuck in local minima)

Highly sensitive to cost function, regularization, update procedure

Question (theoretical)

Is there an algorithm that (provably) works on all inputs?

Question

Can we give a theoretical explanation for why simple heuristics are soeffective?

(What distinguishes a realistic input from an artificial one?)

Page 19: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Local Search: Given A, compute W , compute A, ....

Known to fail on worst-case inputs (stuck in local minima)

Highly sensitive to cost function, regularization, update procedure

Question (theoretical)

Is there an algorithm that (provably) works on all inputs?

Question

Can we give a theoretical explanation for why simple heuristics are soeffective?

(What distinguishes a realistic input from an artificial one?)

Page 20: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Local Search: Given A, compute W , compute A, ....

Known to fail on worst-case inputs (stuck in local minima)

Highly sensitive to cost function, regularization, update procedure

Question (theoretical)

Is there an algorithm that (provably) works on all inputs?

Question

Can we give a theoretical explanation for why simple heuristics are soeffective?

(What distinguishes a realistic input from an artificial one?)

Page 21: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Hardness of NMF

Theorem (Vavasis)

NMF is NP-hard to compute

Hence it is unlikely that there is an exact algorithm that runs in timepolynomial in n, m and r

Question

Should we expect r to be large?

What if you gave me a collection of 100 documents, and I told youthere are 75 topics?

How quickly can we solve NMF if r is small?

Page 22: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Hardness of NMF

Theorem (Vavasis)

NMF is NP-hard to compute

Hence it is unlikely that there is an exact algorithm that runs in timepolynomial in n, m and r

Question

Should we expect r to be large?

What if you gave me a collection of 100 documents, and I told youthere are 75 topics?

How quickly can we solve NMF if r is small?

Page 23: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Hardness of NMF

Theorem (Vavasis)

NMF is NP-hard to compute

Hence it is unlikely that there is an exact algorithm that runs in timepolynomial in n, m and r

Question

Should we expect r to be large?

What if you gave me a collection of 100 documents, and I told youthere are 75 topics?

How quickly can we solve NMF if r is small?

Page 24: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Hardness of NMF

Theorem (Vavasis)

NMF is NP-hard to compute

Hence it is unlikely that there is an exact algorithm that runs in timepolynomial in n, m and r

Question

Should we expect r to be large?

What if you gave me a collection of 100 documents, and I told youthere are 75 topics?

How quickly can we solve NMF if r is small?

Page 25: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Hardness of NMF

Theorem (Vavasis)

NMF is NP-hard to compute

Hence it is unlikely that there is an exact algorithm that runs in timepolynomial in n, m and r

Question

Should we expect r to be large?

What if you gave me a collection of 100 documents, and I told youthere are 75 topics?

How quickly can we solve NMF if r is small?

Page 26: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

The Worst-Case Complexity of NMF

Theorem (Arora, Ge, Moitra, Kannan)

There is an (nm)O(r2) time exact algorithm for NMF

Previously, the fastest (provable) algorithm for r = 3 ran in timeexponential in n and m

Can we improve the exponential dependence on r?

Theorem (Arora, Ge, Moitra, Kannan)

An exact algorithm for NMF that runs in time (nm)o(r) would yield asub-exponential time algorithm for 3-SAT

Page 27: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

The Worst-Case Complexity of NMF

Theorem (Arora, Ge, Moitra, Kannan)

There is an (nm)O(r2) time exact algorithm for NMF

Previously, the fastest (provable) algorithm for r = 3 ran in timeexponential in n and m

Can we improve the exponential dependence on r?

Theorem (Arora, Ge, Moitra, Kannan)

An exact algorithm for NMF that runs in time (nm)o(r) would yield asub-exponential time algorithm for 3-SAT

Page 28: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

The Worst-Case Complexity of NMF

Theorem (Arora, Ge, Moitra, Kannan)

There is an (nm)O(r2) time exact algorithm for NMF

Previously, the fastest (provable) algorithm for r = 3 ran in timeexponential in n and m

Can we improve the exponential dependence on r?

Theorem (Arora, Ge, Moitra, Kannan)

An exact algorithm for NMF that runs in time (nm)o(r) would yield asub-exponential time algorithm for 3-SAT

Page 29: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

The Worst-Case Complexity of NMF

Theorem (Arora, Ge, Moitra, Kannan)

There is an (nm)O(r2) time exact algorithm for NMF

Previously, the fastest (provable) algorithm for r = 3 ran in timeexponential in n and m

Can we improve the exponential dependence on r?

Theorem (Arora, Ge, Moitra, Kannan)

An exact algorithm for NMF that runs in time (nm)o(r) would yield asub-exponential time algorithm for 3-SAT

Page 30: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Local Search: Given A, compute W , compute A, ....

Known to fail on worst-case inputs (stuck in local minima)

Highly sensitive to cost function, regularization, update procedure

Question (theoretical)

Is there an algorithm that (provably) works on all inputs?

Question

Can we give a theoretical explanation for why simple heuristics are soeffective?

(What distinguishes a realistic input from an artificial one?)

Page 31: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Local Search: Given A, compute W , compute A, ....

Known to fail on worst-case inputs (stuck in local minima)

Highly sensitive to cost function, regularization, update procedure

Question (theoretical)

Is there an algorithm that (provably) works on all inputs?

Question

Can we give a theoretical explanation for why simple heuristics are soeffective?

(What distinguishes a realistic input from an artificial one?)

Page 32: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Local Search: Given A, compute W , compute A, ....

Known to fail on worst-case inputs (stuck in local minima)

Highly sensitive to cost function, regularization, update procedure

Question (theoretical)

Is there an algorithm that (provably) works on all inputs?

Question

Can we give a theoretical explanation for why simple heuristics are soeffective?

(What distinguishes a realistic input from an artificial one?)

Page 33: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Separability [Donoho, Stodden], Reinterpreted

Each topic has an anchor word, and any document that containsthis word is very likely to be (at least partially) about thecorresponding topic

e.g. personal finance ← 401k , baseball ← outfield, ...

A document can contain no anchor words, but when one occurs itis a strong indicator

Observation (Blei)

This condition is met by topics found on real data, say, by local search

Separability was introduced to understand when NMF is unique – Is itenough to make NMF easy?

Page 34: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Separability [Donoho, Stodden], Reinterpreted

Each topic has an anchor word, and any document that containsthis word is very likely to be (at least partially) about thecorresponding topic

e.g. personal finance ← 401k , baseball ← outfield, ...

A document can contain no anchor words, but when one occurs itis a strong indicator

Observation (Blei)

This condition is met by topics found on real data, say, by local search

Separability was introduced to understand when NMF is unique – Is itenough to make NMF easy?

Page 35: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Separability [Donoho, Stodden], Reinterpreted

Each topic has an anchor word, and any document that containsthis word is very likely to be (at least partially) about thecorresponding topic

e.g. personal finance ← 401k , baseball ← outfield, ...

A document can contain no anchor words, but when one occurs itis a strong indicator

Observation (Blei)

This condition is met by topics found on real data, say, by local search

Separability was introduced to understand when NMF is unique – Is itenough to make NMF easy?

Page 36: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Separability [Donoho, Stodden], Reinterpreted

Each topic has an anchor word, and any document that containsthis word is very likely to be (at least partially) about thecorresponding topic

e.g. personal finance ← 401k , baseball ← outfield, ...

A document can contain no anchor words, but when one occurs itis a strong indicator

Observation (Blei)

This condition is met by topics found on real data, say, by local search

Separability was introduced to understand when NMF is unique – Is itenough to make NMF easy?

Page 37: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Separability [Donoho, Stodden], Reinterpreted

Each topic has an anchor word, and any document that containsthis word is very likely to be (at least partially) about thecorresponding topic

e.g. personal finance ← 401k , baseball ← outfield, ...

A document can contain no anchor words, but when one occurs itis a strong indicator

Observation (Blei)

This condition is met by topics found on real data, say, by local search

Separability was introduced to understand when NMF is unique – Is itenough to make NMF easy?

Page 38: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Beyond Worst-Case Analysis

Theorem (Arora, Ge, Kannan, Moitra)

There is a polynomial time exact algorithm for NMF when the topicmatrix A is separable

What if documents do not contain many words compared to thedictionary? (e.g. we are given samples from M)

In fact, the above algorithm can be made robust to noise:

Theorem (Arora, Ge, Moitra)

There is a polynomial time algorithm for learning a separable topicmatrix A in various probabilistic models - e.g. LDA, CTM

Page 39: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Beyond Worst-Case Analysis

Theorem (Arora, Ge, Kannan, Moitra)

There is a polynomial time exact algorithm for NMF when the topicmatrix A is separable

What if documents do not contain many words compared to thedictionary? (e.g. we are given samples from M)

In fact, the above algorithm can be made robust to noise:

Theorem (Arora, Ge, Moitra)

There is a polynomial time algorithm for learning a separable topicmatrix A in various probabilistic models - e.g. LDA, CTM

Page 40: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Beyond Worst-Case Analysis

Theorem (Arora, Ge, Kannan, Moitra)

There is a polynomial time exact algorithm for NMF when the topicmatrix A is separable

What if documents do not contain many words compared to thedictionary? (e.g. we are given samples from M)

In fact, the above algorithm can be made robust to noise:

Theorem (Arora, Ge, Moitra)

There is a polynomial time algorithm for learning a separable topicmatrix A in various probabilistic models - e.g. LDA, CTM

Page 41: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Local Search: Given A, compute W , compute A, ....

Known to fail on worst-case inputs (stuck in local minima)

Highly sensitive to cost function, regularization, update procedure

Question (theoretical)

Is there an algorithm that (provably) works on all inputs?

Question

Can we give a theoretical explanation for why simple heuristics are soeffective?

(What distinguishes a realistic input from an artificial one?)

Page 42: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Local Search: Given A, compute W , compute A, ....

Known to fail on worst-case inputs (stuck in local minima)

Highly sensitive to cost function, regularization, update procedure

Question (theoretical)

Is there an algorithm that (provably) works on all inputs?

Question

Can we give a theoretical explanation for why simple heuristics are soeffective?

(What distinguishes a realistic input from an artificial one?)

Page 43: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Is NMF Computable?

[Cohen, Rothblum]: Yes (DETOUR)

Variables: entries in A and W (nr + mr total)

Constraints: A,W ≥ 0 and AW = M (degree two)

Running time for a solver is exponential in the number of variables

Question

What is the smallest formulation, measured in the number ofvariables? Can we use only f (r) variables?

Page 44: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Is NMF Computable?

[Cohen, Rothblum]: Yes

(DETOUR)

Variables: entries in A and W (nr + mr total)

Constraints: A,W ≥ 0 and AW = M (degree two)

Running time for a solver is exponential in the number of variables

Question

What is the smallest formulation, measured in the number ofvariables? Can we use only f (r) variables?

Page 45: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Is NMF Computable?

[Cohen, Rothblum]: Yes (DETOUR)

Variables: entries in A and W (nr + mr total)

Constraints: A,W ≥ 0 and AW = M (degree two)

Running time for a solver is exponential in the number of variables

Question

What is the smallest formulation, measured in the number ofvariables? Can we use only f (r) variables?

Page 46: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Semi-algebraic sets: s polynomials, k variables, Boolean function B

S = {x1, x2...xk |B(sgn(f1), sgn(f2), ...sgn(fs)) = ”true” }

Question

How many sign patterns arise (as x1, x2, ...xk range over Rk)?

Naive bound: 3s (all of {−1, 0, 1}s), [Milnor, Warren]: at most (ds)k ,where d is the maximum degree

In fact, best known algorithms (e.g. [Renegar]) for finding a point inS run in (ds)O(k) time

Page 47: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Semi-algebraic sets: s polynomials, k variables, Boolean function B

S = {x1, x2...xk |B(sgn(f1), sgn(f2), ...sgn(fs)) = ”true” }

Question

How many sign patterns arise (as x1, x2, ...xk range over Rk)?

Naive bound: 3s (all of {−1, 0, 1}s), [Milnor, Warren]: at most (ds)k ,where d is the maximum degree

In fact, best known algorithms (e.g. [Renegar]) for finding a point inS run in (ds)O(k) time

Page 48: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Semi-algebraic sets: s polynomials, k variables, Boolean function B

S = {x1, x2...xk |B(sgn(f1), sgn(f2), ...sgn(fs)) = ”true” }

Question

How many sign patterns arise (as x1, x2, ...xk range over Rk)?

Naive bound: 3s (all of {−1, 0, 1}s),

[Milnor, Warren]: at most (ds)k ,where d is the maximum degree

In fact, best known algorithms (e.g. [Renegar]) for finding a point inS run in (ds)O(k) time

Page 49: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Semi-algebraic sets: s polynomials, k variables, Boolean function B

S = {x1, x2...xk |B(sgn(f1), sgn(f2), ...sgn(fs)) = ”true” }

Question

How many sign patterns arise (as x1, x2, ...xk range over Rk)?

Naive bound: 3s (all of {−1, 0, 1}s), [Milnor, Warren]: at most (ds)k ,where d is the maximum degree

In fact, best known algorithms (e.g. [Renegar]) for finding a point inS run in (ds)O(k) time

Page 50: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Semi-algebraic sets: s polynomials, k variables, Boolean function B

S = {x1, x2...xk |B(sgn(f1), sgn(f2), ...sgn(fs)) = ”true” }

Question

How many sign patterns arise (as x1, x2, ...xk range over Rk)?

Naive bound: 3s (all of {−1, 0, 1}s), [Milnor, Warren]: at most (ds)k ,where d is the maximum degree

In fact, best known algorithms (e.g. [Renegar]) for finding a point inS run in (ds)O(k) time

Page 51: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Is NMF Computable?

[Cohen, Rothblum]: Yes (DETOUR)

Variables: entries in A and W (nr + mr total)

Constraints: A,W ≥ 0 and AW = M (degree two)

Running time for a solver is exponential in the number of variables

Question

What is the smallest formulation, measured in the number ofvariables? Can we use only f (r) variables?

Page 52: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Is NMF Computable?

[Cohen, Rothblum]: Yes (DETOUR)

Variables: entries in A and W (nr + mr total)

Constraints: A,W ≥ 0 and AW = M (degree two)

Running time for a solver is exponential in the number of variables

Question

What is the smallest formulation, measured in the number ofvariables? Can we use only f (r) variables?

Page 53: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Is NMF Computable?

[Cohen, Rothblum]: Yes (DETOUR)

Variables: entries in A and W (nr + mr total)

Constraints: A,W ≥ 0 and AW = M (degree two)

Running time for a solver is exponential in the number of variables

Question

What is the smallest formulation, measured in the number ofvariables? Can we use only f (r) variables?

Page 54: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Is NMF Computable?

[Cohen, Rothblum]: Yes (DETOUR)

Variables: entries in A and W (nr + mr total)

Constraints: A,W ≥ 0 and AW = M (degree two)

Running time for a solver is exponential in the number of variables

Question

What is the smallest formulation, measured in the number ofvariables? Can we use only f (r) variables?

Page 55: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Is NMF Computable?

[Cohen, Rothblum]: Yes (DETOUR)

Variables: entries in A and W (nr + mr total)

Constraints: A,W ≥ 0 and AW = M (degree two)

Running time for a solver is exponential in the number of variables

Question

What is the smallest formulation, measured in the number ofvariables? Can we use only f (r) variables?

Page 56: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Reducing the Number of Variables

Easy: If A has full rank, then f (r) = 2r 2

[Arora, Ge, Kannan, Moitra]: In general, f (r) = 2r 22r , which isconstant for r = O(1)

[Moitra]: In general, f (r) = 2r 2 (using a normal form)

Corollary

There is an (nm)O(r2) time algorithm for NMF

In fact, any (nm)o(r) time algorithm would yield a 2o(n) timealgorithm for 3-SAT

Page 57: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Reducing the Number of Variables

Easy: If A has full rank, then f (r) = 2r 2

[Arora, Ge, Kannan, Moitra]: In general, f (r) = 2r 22r , which isconstant for r = O(1)

[Moitra]: In general, f (r) = 2r 2 (using a normal form)

Corollary

There is an (nm)O(r2) time algorithm for NMF

In fact, any (nm)o(r) time algorithm would yield a 2o(n) timealgorithm for 3-SAT

Page 58: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Easy Case: A has Full Column Rank

A

Page 59: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Easy Case: A has Full Column Rank

A

A

pseudo−inverse

+

Page 60: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Easy Case: A has Full Column Rank

I rA =

A

pseudo−inverse

+

Page 61: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Easy Case: A has Full Column Rank

A =A

pseudo−inverse

+

W

W

Page 62: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Easy Case: A has Full Column Rank

A =pseudo−inverse

A

M

+

W

W

Page 63: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Easy Case: A has Full Column Rank

linearly independent

A =pseudo−inverse

A

M

+

W

W

Page 64: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Easy Case: A has Full Column Rank

T

change of basis

W

linearly independent

A =

M’

pseudo−inverse

A

M

+

W

Page 65: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Putting it Together: 2r 2 Variables

M

Page 66: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Putting it Together: 2r 2 Variables

M’’

M

M’

Page 67: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Putting it Together: 2r 2 Variables

S

M

variables

M’

M’’

T

Page 68: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Putting it Together: 2r 2 Variables

T

S

M

variables

A

M’

W

M’’

Page 69: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Putting it Together: 2r 2 Variables

T

S

M

non−negative?

non−negative?

variables

A

M’

W

M’’

Page 70: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Putting it Together: 2r 2 Variables

T

S

M

( ) ( )

non−negative?

non−negative?

variables

A

M’

W

M’’

Page 71: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Putting it Together: 2r 2 Variables

T

S

M

( ) ( ) = M

?

non−negative?

non−negative?

variables

A

M’

W

M’’

Page 72: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Reducing the Number of Variables

Easy: If A has full rank, then f (r) = 2r 2

[Arora, Ge, Kannan, Moitra]: In general, f (r) = 2r 22r , which isconstant for r = O(1)

[Moitra]: In general, f (r) = 2r 2 (using a normal form)

Corollary

There is an (nm)O(r2) time algorithm for NMF

In fact, any (nm)o(r) time algorithm would yield a 2o(n) timealgorithm for 3-SAT

Page 73: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Reducing the Number of Variables

Easy: If A has full rank, then f (r) = 2r 2

[Arora, Ge, Kannan, Moitra]: In general, f (r) = 2r 22r , which isconstant for r = O(1)

[Moitra]: In general, f (r) = 2r 2 (using a normal form)

Corollary

There is an (nm)O(r2) time algorithm for NMF

In fact, any (nm)o(r) time algorithm would yield a 2o(n) timealgorithm for 3-SAT

Page 74: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Reducing the Number of Variables

Easy: If A has full rank, then f (r) = 2r 2

[Arora, Ge, Kannan, Moitra]: In general, f (r) = 2r 22r , which isconstant for r = O(1)

[Moitra]: In general, f (r) = 2r 2 (using a normal form)

Corollary

There is an (nm)O(r2) time algorithm for NMF

In fact, any (nm)o(r) time algorithm would yield a 2o(n) timealgorithm for 3-SAT

Page 75: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Reducing the Number of Variables

Easy: If A has full rank, then f (r) = 2r 2

[Arora, Ge, Kannan, Moitra]: In general, f (r) = 2r 22r , which isconstant for r = O(1)

[Moitra]: In general, f (r) = 2r 2 (using a normal form)

Corollary

There is an (nm)O(r2) time algorithm for NMF

In fact, any (nm)o(r) time algorithm would yield a 2o(n) timealgorithm for 3-SAT

Page 76: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Reducing the Number of Variables

Easy: If A has full rank, then f (r) = 2r 2

[Arora, Ge, Kannan, Moitra]: In general, f (r) = 2r 22r , which isconstant for r = O(1)

[Moitra]: In general, f (r) = 2r 2 (using a normal form)

Corollary

There is an (nm)O(r2) time algorithm for NMF

In fact, any (nm)o(r) time algorithm would yield a 2o(n) timealgorithm for 3-SAT

Page 77: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Local Search: Given A, compute W , compute A, ....

Known to fail on worst-case inputs (stuck in local minima)

Highly sensitive to cost function, regularization, update procedure

Question (theoretical)

Is there an algorithm that (provably) works on all inputs?

Question

Can we give a theoretical explanation for why simple heuristics are soeffective?

(What distinguishes a realistic input from an artificial one?)

Page 78: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Local Search: Given A, compute W , compute A, ....

Known to fail on worst-case inputs (stuck in local minima)

Highly sensitive to cost function, regularization, update procedure

Question (theoretical)

Is there an algorithm that (provably) works on all inputs?

Question

Can we give a theoretical explanation for why simple heuristics are soeffective?

(What distinguishes a realistic input from an artificial one?)

Page 79: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Separable Instances

Recall: For each topic, there is some (anchor) word that only appearsin this topic

Observation

Rows of W appear as (scaled) rows of M

Question

How can we identify anchor words?

Removing a row from M strictly changes the convex hull iff it is ananchor word

Hence we can identify all the anchor words via linear programming(can be made robust to noise)

Page 80: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

A

Page 81: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

A

Page 82: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

=

A W M

Page 83: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

=

A W M

Page 84: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Separable Instances

Recall: For each topic, there is some (anchor) word that only appearsin this topic

Observation

Rows of W appear as (scaled) rows of M

Question

How can we identify anchor words?

Removing a row from M strictly changes the convex hull iff it is ananchor word

Hence we can identify all the anchor words via linear programming(can be made robust to noise)

Page 85: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Separable Instances

Recall: For each topic, there is some (anchor) word that only appearsin this topic

Observation

Rows of W appear as (scaled) rows of M

Question

How can we identify anchor words?

Removing a row from M strictly changes the convex hull iff it is ananchor word

Hence we can identify all the anchor words via linear programming(can be made robust to noise)

Page 86: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Separable Instances

Recall: For each topic, there is some (anchor) word that only appearsin this topic

Observation

Rows of W appear as (scaled) rows of M

Question

How can we identify anchor words?

Removing a row from M strictly changes the convex hull iff it is ananchor word

Hence we can identify all the anchor words via linear programming(can be made robust to noise)

Page 87: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

=

A W M

Page 88: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

nr

=

A W M

brute force:

Page 89: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

nr

=

A W M

brute force:

Page 90: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

brute force: nr

=

A W M

Page 91: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

brute force: nr

=

A W M

Page 92: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

brute force: nr

=

A W M

Page 93: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

brute force: nr

=

A W M

Page 94: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Separable Instances

Recall: For each topic, there is some (anchor) word that only appearsin this topic

Observation

Rows of W appear as (scaled) rows of M

Question

How can we identify anchor words?

Removing a row from M strictly changes the convex hull iff it is ananchor word

Hence we can identify all the anchor words via linear programming(can be made robust to noise)

Page 95: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Separable Instances

Recall: For each topic, there is some (anchor) word that only appearsin this topic

Observation

Rows of W appear as (scaled) rows of M

Question

How can we identify anchor words?

Removing a row from M strictly changes the convex hull iff it is ananchor word

Hence we can identify all the anchor words via linear programming(can be made robust to noise)

Page 96: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Separable Instances

Recall: For each topic, there is some (anchor) word that only appearsin this topic

Observation

Rows of W appear as (scaled) rows of M

Question

How can we identify anchor words?

Removing a row from M strictly changes the convex hull iff it is ananchor word

Hence we can identify all the anchor words via linear programming

(can be made robust to noise)

Page 97: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Separable Instances

Recall: For each topic, there is some (anchor) word that only appearsin this topic

Observation

Rows of W appear as (scaled) rows of M

Question

How can we identify anchor words?

Removing a row from M strictly changes the convex hull iff it is ananchor word

Hence we can identify all the anchor words via linear programming(can be made robust to noise)

Page 98: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Topic Models

Topic matrix A, generate W stochastically

For each document (column in M = AW ) sample N words

e.g. Latent Dirichlet Allocation (LDA), Correlated Topic Model(CTM), Pachinko Allocation Model (PAM), ...

Question

Can we estimate A, given random samples from M?

Yes! [Arora, Ge, Moitra] we give a provable algorithm based on(noise-tolerant) NMF

Advertisement: Sanjeev will talk about this here in July

Page 99: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Topic Models

Topic matrix A, generate W stochastically

For each document (column in M = AW ) sample N words

e.g. Latent Dirichlet Allocation (LDA), Correlated Topic Model(CTM), Pachinko Allocation Model (PAM), ...

Question

Can we estimate A, given random samples from M?

Yes! [Arora, Ge, Moitra] we give a provable algorithm based on(noise-tolerant) NMF

Advertisement: Sanjeev will talk about this here in July

Page 100: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Topic Models

Topic matrix A, generate W stochastically

For each document (column in M = AW ) sample N words

e.g. Latent Dirichlet Allocation (LDA), Correlated Topic Model(CTM), Pachinko Allocation Model (PAM), ...

Question

Can we estimate A, given random samples from M?

Yes! [Arora, Ge, Moitra] we give a provable algorithm based on(noise-tolerant) NMF

Advertisement: Sanjeev will talk about this here in July

Page 101: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Topic Models

Topic matrix A, generate W stochastically

For each document (column in M = AW ) sample N words

e.g. Latent Dirichlet Allocation (LDA), Correlated Topic Model(CTM), Pachinko Allocation Model (PAM), ...

Question

Can we estimate A, given random samples from M?

Yes! [Arora, Ge, Moitra] we give a provable algorithm based on(noise-tolerant) NMF

Advertisement: Sanjeev will talk about this here in July

Page 102: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Topic Models

Topic matrix A, generate W stochastically

For each document (column in M = AW ) sample N words

e.g. Latent Dirichlet Allocation (LDA), Correlated Topic Model(CTM), Pachinko Allocation Model (PAM), ...

Question

Can we estimate A, given random samples from M?

Yes! [Arora, Ge, Moitra] we give a provable algorithm based on(noise-tolerant) NMF

Advertisement: Sanjeev will talk about this here in July

Page 103: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Concluding Remarks

This is just part of a broader agenda:

Question

When is machine learning provably easy?

Often, theoretical models for learning are too hard or focus on mistakebounds (e.g. PAC)

Question

Will this involve a better understanding of real data? A betterunderstanding of popular algorithms in machine learning? Both?

Some interesting problems worth further investigation: Topic Models,Independent Component Analysis, Graphical Models, Deep Learning

Page 104: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Concluding Remarks

This is just part of a broader agenda:

Question

When is machine learning provably easy?

Often, theoretical models for learning are too hard or focus on mistakebounds (e.g. PAC)

Question

Will this involve a better understanding of real data? A betterunderstanding of popular algorithms in machine learning? Both?

Some interesting problems worth further investigation: Topic Models,Independent Component Analysis, Graphical Models, Deep Learning

Page 105: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Concluding Remarks

This is just part of a broader agenda:

Question

When is machine learning provably easy?

Often, theoretical models for learning are too hard or focus on mistakebounds (e.g. PAC)

Question

Will this involve a better understanding of real data?

A betterunderstanding of popular algorithms in machine learning? Both?

Some interesting problems worth further investigation: Topic Models,Independent Component Analysis, Graphical Models, Deep Learning

Page 106: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Concluding Remarks

This is just part of a broader agenda:

Question

When is machine learning provably easy?

Often, theoretical models for learning are too hard or focus on mistakebounds (e.g. PAC)

Question

Will this involve a better understanding of real data? A betterunderstanding of popular algorithms in machine learning?

Both?

Some interesting problems worth further investigation: Topic Models,Independent Component Analysis, Graphical Models, Deep Learning

Page 107: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Concluding Remarks

This is just part of a broader agenda:

Question

When is machine learning provably easy?

Often, theoretical models for learning are too hard or focus on mistakebounds (e.g. PAC)

Question

Will this involve a better understanding of real data? A betterunderstanding of popular algorithms in machine learning? Both?

Some interesting problems worth further investigation: Topic Models,Independent Component Analysis, Graphical Models, Deep Learning

Page 108: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Concluding Remarks

This is just part of a broader agenda:

Question

When is machine learning provably easy?

Often, theoretical models for learning are too hard or focus on mistakebounds (e.g. PAC)

Question

Will this involve a better understanding of real data? A betterunderstanding of popular algorithms in machine learning? Both?

Some interesting problems worth further investigation: Topic Models,Independent Component Analysis, Graphical Models, Deep Learning

Page 109: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with

Thanks!