TRANSCRIPT
PCA & Matrix Factorization for Learning, ICML 2005 Tutorial, Chris Ding 100
Part 3. Nonnegative Matrix Factorization ⇔ K-means and Spectral Clustering
Nonnegative Matrix Factorization (NMF)

Data matrix, n points in p dimensions: X = (x_1, x_2, …, x_n), where each x_i is an image, document, webpage, etc.

Decomposition (low-rank approximation): X ≈ F G^T, with F = (f_1, f_2, …, f_k) and G = (g_1, g_2, …, g_k).

Nonnegative matrices: X_ij ≥ 0, F_ij ≥ 0, G_ij ≥ 0.
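As a quick illustration, the factorization X ≈ F G^T can be computed with scikit-learn's NMF (a minimal sketch; the toy data and parameter choices here are invented, and note scikit-learn stores points as rows, i.e. it factors the transpose of the slides' X):

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy nonnegative data: n = 30 points in p = 8 dimensions (rows are points,
# so this matrix is the transpose of X in the slides' column convention).
rng = np.random.default_rng(0)
X = rng.random((30, 8))

model = NMF(n_components=3, init="random", random_state=0, max_iter=1000)
G = model.fit_transform(X)   # (30, 3) nonnegative coefficients g_i
F = model.components_.T      # (8, 3) nonnegative basis vectors f_1..f_k

# Low-rank reconstruction: X ≈ G F^T here, i.e. X_slides ≈ F G^T.
err = np.linalg.norm(X - G @ F.T)
```

Both factors come out elementwise nonnegative, which is what distinguishes NMF from PCA/SVD.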
Some historical notes

• Earlier work by statistics people
• P. Paatero (1994), Environmetrics
• Lee and Seung (1999, 2000)
  – Parts of whole (no cancellation)
  – A multiplicative update algorithm
[Figure: an image represented as a pixel vector, e.g. (0.71, 0, 0.05, …, 0.80, 0.20, 0)^T]
Parts-based perspective

X = (x_1, x_2, …, x_n),  X ≈ F G^T,  F = (f_1, f_2, …, f_k),  G = (g_1, g_2, …, g_k)
Sparsify F to get a parts-based picture (Li et al., 200; Hoyer, 2003)

X ≈ F G^T,  F = (f_1, f_2, …, f_k)
Theorem: NMF = kernel K-means clustering.
NMF produces a holistic modeling of the data.
Theoretical results and experimental verification (Ding, He, Simon, 2005).
Our Results: NMF = Data Clustering
Theorem: K-means = NMF

• Reformulate K-means and kernel K-means
• Show equivalence of

max_{H^T H = I, H ≥ 0} Tr(H^T W H)

and

min_{H^T H = I, H ≥ 0} ||W − H H^T||²
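The trace reformulation can be sanity-checked numerically: for a normalized cluster-indicator matrix H (column k is 1/√n_k on the members of cluster k, zero elsewhere, so H^T H = I), Tr(H^T W H) equals the sum of within-cluster similarities, each divided by its cluster size. A small NumPy check (the toy data and labels are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.random((6, 2))
W = pts @ pts.T                        # linear-kernel similarity matrix

labels = np.array([0, 0, 0, 1, 1, 1])  # a candidate 2-way clustering
H = np.zeros((6, 2))
for k in range(2):
    idx = labels == k
    H[idx, k] = 1.0 / np.sqrt(idx.sum())   # normalized indicator columns

trace_val = np.trace(H.T @ W @ H)
# Direct within-cluster similarity sum, weighted by 1/n_k:
direct = sum(W[np.ix_(labels == k, labels == k)].sum() / (labels == k).sum()
             for k in range(2))
```

So maximizing Tr(H^T W H) over such indicators is exactly (kernel) K-means.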
NMF = K-means

arg min_{H^T H = I, H ≥ 0} ||W − H H^T||²
  = arg min_{H^T H = I, H ≥ 0} [ ||W||² − 2 Tr(H^T W H) + ||H^T H||² ]
  = arg min_{H^T H = I, H ≥ 0} [ −2 Tr(H^T W H) ]
  = arg max_{H^T H = I, H ≥ 0} Tr(H^T W H)

arg min_{H^T H = I, H ≥ 0} ||W − H H^T||²  ⇒  arg min_{H ≥ 0} ||W − H H^T||²
Spectral Clustering = NMF

Normalized Cut:

J_Ncut = Σ_{k<l} [ s(C_k, C_l)/d_{C_k} + s(C_k, C_l)/d_{C_l} ] = Σ_k s(C_k, G − C_k)/d_{C_k}

(Gu et al., 2001)

Unsigned cluster indicators:

y_k = D^{1/2} h_k / ||D^{1/2} h_k||,   h_k = (0, …, 0, 1, …, 1, 0, …, 0)^T

Re-write:

J_Ncut(y_1, …, y_k) = h_1^T (D − W) h_1 / (h_1^T D h_1) + … + h_k^T (D − W) h_k / (h_k^T D h_k)
  = y_1^T (I − W̃) y_1 + … + y_k^T (I − W̃) y_k
  = Tr( Y^T (I − W̃) Y ),   where W̃ = D^{−1/2} W D^{−1/2}

Optimize:  max Tr(Y^T W̃ Y)  subject to  Y^T Y = I

⇒  min_{H^T H = I, H ≥ 0} ||W̃ − H H^T||²
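The rewriting can be checked numerically: with y_k = D^{1/2} h_k / ||D^{1/2} h_k||, the trace form Tr(Y^T (I − W̃) Y) reproduces the sum of Rayleigh quotients h_k^T(D − W)h_k / (h_k^T D h_k). A toy NumPy check (similarity matrix and labels are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((6, 6))
W = (A + A.T) / 2                      # symmetric similarity matrix
d = W.sum(axis=1)
D = np.diag(d)
W_tilde = np.diag(d ** -0.5) @ W @ np.diag(d ** -0.5)   # D^{-1/2} W D^{-1/2}

labels = np.array([0, 0, 0, 1, 1, 1])
H = np.zeros((6, 2))
H[np.arange(6), labels] = 1.0          # raw 0/1 indicators h_k

Dh = np.sqrt(d)[:, None] * H           # D^{1/2} h_k, column by column
Y = Dh / np.linalg.norm(Dh, axis=0)    # normalized, so Y^T Y = I

ncut_trace = np.trace(Y.T @ (np.eye(6) - W_tilde) @ Y)
ncut_direct = sum(H[:, k] @ (D - W) @ H[:, k] / (H[:, k] @ D @ H[:, k])
                  for k in range(2))
```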
Advantages of NMF over standard K-means

• Soft clustering
Experiments on Internet Newsgroups

NG2: comp.graphics
NG9: rec.motorcycles
NG10: rec.sport.baseball
NG15: sci.space
NG18: talk.politics.mideast

100 articles from each group; 1000 words; tf.idf weighting; cosine similarity.

Accuracy of clustering results (cosine similarity):

W = HH'   K-means
0.711     0.697
0.652     0.632
0.608     0.576
0.590     0.491
0.612     0.531
Summary for Symmetric NMF

• K-means, kernel K-means
• Spectral clustering
• Equivalence of

max_{H^T H = I, H ≥ 0} Tr(H^T W H)   and   min_{H^T H = I, H ≥ 0} ||W − H H^T||²
Nonsymmetric NMF

• K-means, kernel K-means
• Spectral clustering
• Equivalence of

max_{H^T H = I, H ≥ 0} Tr(H^T W H)   and   min_{H^T H = I, H ≥ 0} ||W − H H^T||²
Non-symmetric NMF: Rectangular Data Matrix (Bipartite Graph)

• Information retrieval: word-to-document
• DNA gene expressions
• Image pixels
• Supermarket transaction data
K-means Clustering of Bipartite Graphs

Simultaneous clustering of rows and columns (a bipartite graph cut):

J_Kmeans^{CR} = Σ_{j=1}^k s(R_j, C_j) = Tr(F^T B G)

Row indicators:

F = (f_1, f_2) = [ 1 0
                   1 0
                   0 1
                   0 1 ]

Column indicators:

G = (g_1, g_2) = [ 1 0
                   0 1
                   1 0
                   0 1 ]
NMF = K-means clustering

arg max_{G^T G = I, G ≥ 0; F^T F = I, F ≥ 0} Tr(F^T B G)
  = arg min_{G^T G = I, G ≥ 0; F^T F = I, F ≥ 0} [ −2 Tr(F^T B G) ]
  = arg min_{G^T G = I, G ≥ 0; F^T F = I, F ≥ 0} [ ||B||² − 2 Tr(F^T B G) + Tr(F^T F G^T G) ]
  = arg min_{G^T G = I, G ≥ 0; F^T F = I, F ≥ 0} ||B − F G^T||²
Solving NMF with nonnegative least squares

J = ||X − F G^T||²,  F ≥ 0, G ≥ 0

Fix F, solve for G; fix G, solve for F.
Iterate; converges to a local minimum.

J = Σ_{i=1}^n ||x_i − F g̃_i||²,  where G^T = (g̃_1, …, g̃_n)
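A minimal sketch of this alternating scheme, using SciPy's `nnls` solver for each nonnegative least-squares subproblem (matrix sizes, iteration counts, and the helper name `nmf_anls` are arbitrary choices for illustration):

```python
import numpy as np
from scipy.optimize import nnls

def nmf_anls(X, k, n_iter=20, seed=0):
    """Alternating nonnegative least squares for X ≈ F G^T, X is p x n."""
    p, n = X.shape
    rng = np.random.default_rng(seed)
    F = rng.random((p, k))
    G = rng.random((n, k))
    for _ in range(n_iter):
        # Fix F: column x_i of X gives g̃_i = argmin_{g >= 0} ||x_i - F g||
        G = np.array([nnls(F, X[:, i])[0] for i in range(n)])
        # Fix G: row j of X gives the j-th row of F the same way
        F = np.array([nnls(G, X[j, :])[0] for j in range(p)])
    return F, G

rng = np.random.default_rng(1)
X = rng.random((8, 12))
F, G = nmf_anls(X, 3)
err = np.linalg.norm(X - F @ G.T)
```

Each subproblem is solved exactly, so the objective is nonincreasing, but only a local minimum is reached.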
Solving NMF with multiplicative updates

J = ||X − F G^T||²,  F ≥ 0, G ≥ 0

Fix F, solve for G; fix G, solve for F. Lee & Seung (2000) propose:

G_jk ← G_jk (X^T F)_jk / (G F^T F)_jk

F_ik ← F_ik (X G)_ik / (F G^T G)_ik
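These updates translate directly to NumPy (a minimal sketch; the small `eps` guard against division by zero and the positive initialization are my additions):

```python
import numpy as np

def nmf_mult(X, k, n_iter=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for min ||X - F G^T||^2, F, G >= 0."""
    p, n = X.shape
    rng = np.random.default_rng(seed)
    F = rng.random((p, k)) + 0.1
    G = rng.random((n, k)) + 0.1
    errs = []
    for _ in range(n_iter):
        G *= (X.T @ F) / (G @ F.T @ F + eps)   # G_jk update
        F *= (X @ G) / (F @ G.T @ G + eps)     # F_ik update
        errs.append(np.linalg.norm(X - F @ G.T))
    return F, G, errs

X = np.random.default_rng(2).random((10, 15))
F, G, errs = nmf_mult(X, 4)
```

Because the updates multiply by nonnegative ratios, F and G stay nonnegative automatically, and the objective is nonincreasing.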
Symmetric NMF

J = ||W − H H^T||²,  H ≥ 0

Gradient descent:  H_ik ← H_ik − ε_ik ∂J/∂H_ik

Constrained optimization, KKT first-order condition (complementary slackness):

0 = (∂J/∂H)_ik H_ik = (−4 W H + 4 H H^T H)_ik H_ik

Choosing the step size  ε_ik = β H_ik / (4 (H H^T H)_ik)  gives the multiplicative update

H_ik ← H_ik ( 1 − β + β (W H)_ik / (H H^T H)_ik )
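A minimal NumPy sketch of this multiplicative update (the choice β = 1/2, the toy similarity matrix, and the `eps` guard are my additions):

```python
import numpy as np

def symnmf(W, k, beta=0.5, n_iter=300, seed=0, eps=1e-9):
    """Symmetric NMF via H <- H * (1 - beta + beta * (W H) / (H H^T H))."""
    n = W.shape[0]
    H = np.random.default_rng(seed).random((n, k)) + 0.1
    for _ in range(n_iter):
        H *= 1.0 - beta + beta * (W @ H) / (H @ (H.T @ H) + eps)
    return H

A = np.random.default_rng(3).random((12, 12))
W = (A + A.T) / 2            # symmetric nonnegative similarity matrix
H = symnmf(W, 3)
err = np.linalg.norm(W - H @ H.T)
```

Since the multiplier 1 − β + β(WH)/(HH^TH) is positive for 0 < β < 1, H remains elementwise nonnegative throughout.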
Summary

• NMF is a new low-rank approximation
• The holistic picture (vs. parts-based)
• NMF is equivalent to spectral clustering
• Main advantage: soft clustering