TRANSCRIPT
PCA & Matrix Factorization for Learning, ICML 2005 Tutorial, Chris Ding 100
Part 3. Nonnegative Matrix Factorization ⇔ K-means and Spectral Clustering
Nonnegative Matrix Factorization (NMF)

Data matrix, n points in p dimensions: X = (x_1, x_2, …, x_n), where each x_i is an image, document, webpage, etc.

Decomposition (low-rank approximation): X ≈ F G^T, with F = (f_1, f_2, …, f_k) and G = (g_1, g_2, …, g_k).

Nonnegative matrices: X_ij ≥ 0, F_ij ≥ 0, G_ij ≥ 0.
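As a quick illustration, the factorization X ≈ F G^T can be computed with scikit-learn's NMF (a minimal sketch; the toy data and parameter choices here are invented, and note scikit-learn stores points as rows, i.e. it factors the transpose of the slides' X):

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy nonnegative data: n = 30 points in p = 8 dimensions (rows are points,
# so this matrix is the transpose of X in the slides' column convention).
rng = np.random.default_rng(0)
X = rng.random((30, 8))

model = NMF(n_components=3, init="random", random_state=0, max_iter=1000)
G = model.fit_transform(X)   # (30, 3) nonnegative coefficients g_i
F = model.components_.T      # (8, 3) nonnegative basis vectors f_1..f_k

# Low-rank reconstruction: X ≈ G F^T here, i.e. X_slides ≈ F G^T.
err = np.linalg.norm(X - G @ F.T)
```

Both factors come out elementwise nonnegative, which is what distinguishes NMF from PCA/SVD.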
Some historical notes

• Earlier work by statistics people
• P. Paatero (1994), Environmetrics
• Lee and Seung (1999, 2000)
  – Parts of whole (no cancellation)
  – A multiplicative update algorithm
[Figure: an image represented as a pixel vector, e.g. (0.71, 0, 0.05, …, 0.80, 0.20, 0)^T]
Parts-based perspective

X = (x_1, x_2, …, x_n),  X ≈ F G^T,  F = (f_1, f_2, …, f_k),  G = (g_1, g_2, …, g_k)
Sparsify F to get a parts-based picture (Li et al., 200; Hoyer, 2003)

X ≈ F G^T,  F = (f_1, f_2, …, f_k)
Theorem: NMF = kernel K-means clustering.
NMF produces a holistic modeling of the data.
Theoretical results and experimental verification (Ding, He, Simon, 2005).
Our Results: NMF = Data Clustering
Theorem: K-means = NMF

• Reformulate K-means and kernel K-means
• Show equivalence of

max_{H^T H = I, H ≥ 0} Tr(H^T W H)

and

min_{H^T H = I, H ≥ 0} ||W − H H^T||²
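The trace reformulation can be sanity-checked numerically: for a normalized cluster-indicator matrix H (column k is 1/√n_k on the members of cluster k, zero elsewhere, so H^T H = I), Tr(H^T W H) equals the sum of within-cluster similarities, each divided by its cluster size. A small NumPy check (the toy data and labels are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.random((6, 2))
W = pts @ pts.T                        # linear-kernel similarity matrix

labels = np.array([0, 0, 0, 1, 1, 1])  # a candidate 2-way clustering
H = np.zeros((6, 2))
for k in range(2):
    idx = labels == k
    H[idx, k] = 1.0 / np.sqrt(idx.sum())   # normalized indicator columns

trace_val = np.trace(H.T @ W @ H)
# Direct within-cluster similarity sum, weighted by 1/n_k:
direct = sum(W[np.ix_(labels == k, labels == k)].sum() / (labels == k).sum()
             for k in range(2))
```

So maximizing Tr(H^T W H) over such indicators is exactly (kernel) K-means.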
NMF = K-means

arg min_{H^T H = I, H ≥ 0} ||W − H H^T||²
  = arg min_{H^T H = I, H ≥ 0} [ ||W||² − 2 Tr(H^T W H) + ||H^T H||² ]
  = arg min_{H^T H = I, H ≥ 0} [ −2 Tr(H^T W H) ]
  = arg max_{H^T H = I, H ≥ 0} Tr(H^T W H)

arg min_{H^T H = I, H ≥ 0} ||W − H H^T||²  ⇒  arg min_{H ≥ 0} ||W − H H^T||²
Spectral Clustering = NMF

Normalized Cut:

J_Ncut = Σ_{k<l} [ s(C_k, C_l)/d_{C_k} + s(C_k, C_l)/d_{C_l} ] = Σ_k s(C_k, G − C_k)/d_{C_k}

(Gu et al., 2001)

Unsigned cluster indicators:

y_k = D^{1/2} h_k / ||D^{1/2} h_k||,   h_k = (0, …, 0, 1, …, 1, 0, …, 0)^T

Re-write:

J_Ncut(y_1, …, y_k) = h_1^T (D − W) h_1 / (h_1^T D h_1) + … + h_k^T (D − W) h_k / (h_k^T D h_k)
  = y_1^T (I − W̃) y_1 + … + y_k^T (I − W̃) y_k
  = Tr( Y^T (I − W̃) Y ),   where W̃ = D^{−1/2} W D^{−1/2}

Optimize:  max Tr(Y^T W̃ Y)  subject to  Y^T Y = I

⇒  min_{H^T H = I, H ≥ 0} ||W̃ − H H^T||²
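The rewriting can be checked numerically: with y_k = D^{1/2} h_k / ||D^{1/2} h_k||, the trace form Tr(Y^T (I − W̃) Y) reproduces the sum of Rayleigh quotients h_k^T(D − W)h_k / (h_k^T D h_k). A toy NumPy check (similarity matrix and labels are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((6, 6))
W = (A + A.T) / 2                      # symmetric similarity matrix
d = W.sum(axis=1)
D = np.diag(d)
W_tilde = np.diag(d ** -0.5) @ W @ np.diag(d ** -0.5)   # D^{-1/2} W D^{-1/2}

labels = np.array([0, 0, 0, 1, 1, 1])
H = np.zeros((6, 2))
H[np.arange(6), labels] = 1.0          # raw 0/1 indicators h_k

Dh = np.sqrt(d)[:, None] * H           # D^{1/2} h_k, column by column
Y = Dh / np.linalg.norm(Dh, axis=0)    # normalized, so Y^T Y = I

ncut_trace = np.trace(Y.T @ (np.eye(6) - W_tilde) @ Y)
ncut_direct = sum(H[:, k] @ (D - W) @ H[:, k] / (H[:, k] @ D @ H[:, k])
                  for k in range(2))
```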
Advantages of NMF over standard K-means

• Soft clustering
Experiments on Internet Newsgroups

NG2: comp.graphics
NG9: rec.motorcycles
NG10: rec.sport.baseball
NG15: sci.space
NG18: talk.politics.mideast

100 articles from each group; 1000 words; tf.idf weighting; cosine similarity.

Accuracy of clustering results (cosine similarity):

W = HH'   K-means
0.711     0.697
0.652     0.632
0.608     0.576
0.590     0.491
0.612     0.531
Summary for Symmetric NMF

• K-means, kernel K-means
• Spectral clustering
• Equivalence of

max_{H^T H = I, H ≥ 0} Tr(H^T W H)   and   min_{H^T H = I, H ≥ 0} ||W − H H^T||²
Nonsymmetric NMF

• K-means, kernel K-means
• Spectral clustering
• Equivalence of

max_{H^T H = I, H ≥ 0} Tr(H^T W H)   and   min_{H^T H = I, H ≥ 0} ||W − H H^T||²
Non-symmetric NMF: Rectangular Data Matrix (Bipartite Graph)

• Information retrieval: word-to-document
• DNA gene expressions
• Image pixels
• Supermarket transaction data
K-means Clustering of Bipartite Graphs

Simultaneous clustering of rows and columns (a bipartite graph cut):

J_Kmeans^{CR} = Σ_{j=1}^k s(R_j, C_j) = Tr(F^T B G)

Row indicators:

F = (f_1, f_2) = [ 1 0
                   1 0
                   0 1
                   0 1 ]

Column indicators:

G = (g_1, g_2) = [ 1 0
                   0 1
                   1 0
                   0 1 ]
NMF = K-means clustering

arg max_{G^T G = I, G ≥ 0; F^T F = I, F ≥ 0} Tr(F^T B G)
  = arg min_{G^T G = I, G ≥ 0; F^T F = I, F ≥ 0} [ −2 Tr(F^T B G) ]
  = arg min_{G^T G = I, G ≥ 0; F^T F = I, F ≥ 0} [ ||B||² − 2 Tr(F^T B G) + Tr(F^T F G^T G) ]
  = arg min_{G^T G = I, G ≥ 0; F^T F = I, F ≥ 0} ||B − F G^T||²
Solving NMF with nonnegative least squares

J = ||X − F G^T||²,  F ≥ 0, G ≥ 0

Fix F, solve for G; fix G, solve for F.
Iterate; converges to a local minimum.

J = Σ_{i=1}^n ||x_i − F g̃_i||²,  where G^T = (g̃_1, …, g̃_n)
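A minimal sketch of this alternating scheme, using SciPy's `nnls` solver for each nonnegative least-squares subproblem (matrix sizes, iteration counts, and the helper name `nmf_anls` are arbitrary choices for illustration):

```python
import numpy as np
from scipy.optimize import nnls

def nmf_anls(X, k, n_iter=20, seed=0):
    """Alternating nonnegative least squares for X ≈ F G^T, X is p x n."""
    p, n = X.shape
    rng = np.random.default_rng(seed)
    F = rng.random((p, k))
    G = rng.random((n, k))
    for _ in range(n_iter):
        # Fix F: column x_i of X gives g̃_i = argmin_{g >= 0} ||x_i - F g||
        G = np.array([nnls(F, X[:, i])[0] for i in range(n)])
        # Fix G: row j of X gives the j-th row of F the same way
        F = np.array([nnls(G, X[j, :])[0] for j in range(p)])
    return F, G

rng = np.random.default_rng(1)
X = rng.random((8, 12))
F, G = nmf_anls(X, 3)
err = np.linalg.norm(X - F @ G.T)
```

Each subproblem is solved exactly, so the objective is nonincreasing, but only a local minimum is reached.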
Solving NMF with multiplicative updates

J = ||X − F G^T||²,  F ≥ 0, G ≥ 0

Fix F, solve for G; fix G, solve for F. Lee & Seung (2000) propose:

G_jk ← G_jk (X^T F)_jk / (G F^T F)_jk

F_ik ← F_ik (X G)_ik / (F G^T G)_ik
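These updates translate directly to NumPy (a minimal sketch; the small `eps` guard against division by zero and the positive initialization are my additions):

```python
import numpy as np

def nmf_mult(X, k, n_iter=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for min ||X - F G^T||^2, F, G >= 0."""
    p, n = X.shape
    rng = np.random.default_rng(seed)
    F = rng.random((p, k)) + 0.1
    G = rng.random((n, k)) + 0.1
    errs = []
    for _ in range(n_iter):
        G *= (X.T @ F) / (G @ F.T @ F + eps)   # G_jk update
        F *= (X @ G) / (F @ G.T @ G + eps)     # F_ik update
        errs.append(np.linalg.norm(X - F @ G.T))
    return F, G, errs

X = np.random.default_rng(2).random((10, 15))
F, G, errs = nmf_mult(X, 4)
```

Because the updates multiply by nonnegative ratios, F and G stay nonnegative automatically, and the objective is nonincreasing.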
Symmetric NMF

J = ||W − H H^T||²,  H ≥ 0

Gradient descent:  H_ik ← H_ik − ε_ik ∂J/∂H_ik

Constrained optimization, KKT first-order condition (complementary slackness):

0 = (∂J/∂H)_ik H_ik = (−4 W H + 4 H H^T H)_ik H_ik

Choosing the step size  ε_ik = β H_ik / (4 (H H^T H)_ik)  gives the multiplicative update

H_ik ← H_ik ( 1 − β + β (W H)_ik / (H H^T H)_ik )
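A minimal NumPy sketch of this multiplicative update (the choice β = 1/2, the toy similarity matrix, and the `eps` guard are my additions):

```python
import numpy as np

def symnmf(W, k, beta=0.5, n_iter=300, seed=0, eps=1e-9):
    """Symmetric NMF via H <- H * (1 - beta + beta * (W H) / (H H^T H))."""
    n = W.shape[0]
    H = np.random.default_rng(seed).random((n, k)) + 0.1
    for _ in range(n_iter):
        H *= 1.0 - beta + beta * (W @ H) / (H @ (H.T @ H) + eps)
    return H

A = np.random.default_rng(3).random((12, 12))
W = (A + A.T) / 2            # symmetric nonnegative similarity matrix
H = symnmf(W, 3)
err = np.linalg.norm(W - H @ H.T)
```

Since the multiplier 1 − β + β(WH)/(HH^TH) is positive for 0 < β < 1, H remains elementwise nonnegative throughout.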
Summary

• NMF is a new low-rank approximation
• The holistic picture (vs. parts-based)
• NMF is equivalent to spectral clustering
• Main advantage: soft clustering