08 Dimensionality Reduction (transcript)
5/2/17
Mining of Massive Datasets
Jure Leskovec, Anand Rajaraman, Jeff Ullman, Stanford University
http://www.mmds.org
Note to other teachers and users of these slides: We would be delighted if you found our material useful for giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org
- Assumption: Data lies on or near a low d-dimensional subspace
- The axes of this subspace are an effective representation of the data
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
- Compress / reduce dimensionality:
  - 10^6 rows; 10^3 columns; no updates
  - Random access to any cell(s); small error is OK
The above matrix is really “2-dimensional.” All rows can be reconstructed by scaling [1 1 1 0 0] or [0 0 0 1 1]
- Q: What is the rank of a matrix A?
- A: The number of linearly independent columns of A
- For example, the matrix A = [[1 2 1], [-2 -3 1], [3 5 0]] has rank r = 2
  - Why? The first two rows are linearly independent, so the rank is at least 2, but all three rows are linearly dependent (the first is the sum of the second and third), so the rank must be less than 3.
- Why do we care about low rank?
  - We can write A using two "basis" vectors: [1 2 1] and [-2 -3 1]
  - The new coordinates of the rows are then: [1 0], [0 1], [1 -1]
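The rank claim above can be checked in a few lines. This is a quick sketch using numpy (my tool choice here, not one the slides use):

```python
import numpy as np

# The example matrix from the slides: row 1 is the sum of rows 2 and 3,
# so only two rows are linearly independent.
A = np.array([[ 1,  2, 1],
              [-2, -3, 1],
              [ 3,  5, 0]])

# numpy estimates rank from the number of non-negligible singular values.
r = np.linalg.matrix_rank(A)

# The dependency that forces rank < 3:
dependent = np.array_equal(A[0], A[1] + A[2])
```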
- Cloud of points in 3D space:
  - Think of the point positions as a matrix, with 1 row per point (A, B, C, ...)
- We can rewrite the coordinates more efficiently!
  - Old basis vectors: [1 0 0], [0 1 0], [0 0 1]
  - New basis vectors: [1 2 1], [-2 -3 1]
  - Then A has new coordinates [1 0], B: [0 1], C: [1 -1]
  - Notice: we reduced the number of coordinates!
- The goal of dimensionality reduction is to discover the axes of the data!
Rather than representing every point with 2 coordinates, we represent each point with 1 coordinate (corresponding to the position of the point on the red line).
By doing this we incur a bit of error, as the points do not lie exactly on the line.
Why reduce dimensions?
- Discover hidden correlations / topics
  - e.g., words that commonly occur together
- Remove redundant and noisy features
  - Not all words are useful
- Interpretation and visualization
- Easier storage and processing of the data
A[m×n] = U[m×r] Σ[r×r] (V[n×r])^T
- A: input data matrix
  - m × n matrix (e.g., m documents, n terms)
- U: left singular vectors
  - m × r matrix (m documents, r concepts)
- Σ: singular values
  - r × r diagonal matrix (strength of each 'concept'); r is the rank of the matrix A
- V: right singular vectors
  - n × r matrix (n terms, r concepts)
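The shapes and properties above can be verified directly. A minimal sketch with numpy (numpy's `svd` is my choice of tool here; the small random matrix is an illustrative stand-in for real data):

```python
import numpy as np

# A small m x n matrix standing in for the data (m documents, n terms).
rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))

# Reduced ("economy") SVD: U is m x r, s holds the singular values,
# Vt is r x n, with r = min(m, n).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = min(m, n)

# A is exactly recovered as U * diag(s) * V^T.
reconstruction_ok = np.allclose(A, U @ np.diag(s) @ Vt)
```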
[Figure: A (m×n) ≈ U Σ V^T]

[Figure: A (m×n) ≈ σ1·u1·v1^T + σ2·u2·v2^T + ..., where each σi is a scalar and each ui, vi is a vector]
It is always possible to decompose a real matrix A into A = U Σ V^T, where:
- U, Σ, V: unique
- U, V: column orthonormal
  - U^T U = I; V^T V = I (I: identity matrix)
  - (Columns are orthogonal unit vectors)
- Σ: diagonal
  - Entries (singular values) are positive and sorted in decreasing order (σ1 ≥ σ2 ≥ ... ≥ 0)
Nice proof of uniqueness: http://www.mpi-inf.mpg.de/~bast/ir-seminar-ws04/lecture2.pdf
- A = U Σ V^T, example: Users to Movies

              Matrix  Alien  Serenity  Casablanca  Amelie
    SciFi        1      1        1          0         0
    SciFi        3      3        3          0         0
    SciFi        4      4        4          0         0
    SciFi        5      5        5          0         0
    Romance      0      2        0          4         4
    Romance      0      0        0          5         5
    Romance      0      1        0          2         2

  = U Σ V^T

  'Concepts', aka latent dimensions, aka latent factors
- A = U Σ V^T, example: Users to Movies

  A (users × movies: Matrix, Alien, Serenity, Casablanca, Amelie; SciFi users on top, Romance users below):

    1 1 1 0 0
    3 3 3 0 0
    4 4 4 0 0
    5 5 5 0 0
    0 2 0 4 4
    0 0 0 5 5
    0 1 0 2 2

  = U:

    0.13   0.02  -0.01
    0.41   0.07  -0.03
    0.55   0.09  -0.04
    0.68   0.11  -0.05
    0.15  -0.59   0.65
    0.07  -0.73  -0.67
    0.07  -0.29   0.32

  × Σ:

    12.4   0     0
     0     9.5   0
     0     0     1.3

  × V^T:

    0.56   0.59   0.56   0.09   0.09
    0.12  -0.02   0.12  -0.69  -0.69
    0.40  -0.80   0.40   0.09   0.09
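The decomposition above can be reproduced numerically. A sketch with numpy (numpy is my choice of tool, not one the slides use; singular vectors may come back with flipped signs, so only the singular values are compared against the slide):

```python
import numpy as np

# The 7 users x 5 movies rating matrix from the slides
# (columns: Matrix, Alien, Serenity, Casablanca, Amelie).
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The three leading singular values match the slide (12.4, 9.5, 1.3);
# the remaining ones are numerically zero, since rank(A) = 3.
```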
- A = U Σ V^T, example: Users to Movies (same decomposition as above)
  - The first concept is the SciFi concept; the second is the Romance concept.
- A = U Σ V^T, example:
  - U is the 'user-to-concept' similarity matrix: its first column scores each user against the SciFi concept, its second against the Romance concept.
- A = U Σ V^T, example:
  - The diagonal entries of Σ give the 'strength' of each concept; here σ1 = 12.4 is the strength of the SciFi concept.
- A = U Σ V^T, example:
  - V is the 'movie-to-concept' similarity matrix: its first column (the first row of V^T) scores each movie against the SciFi concept.
In terms of 'movies', 'users', and 'concepts':
- U: user-to-concept similarity matrix
- V: movie-to-concept similarity matrix
- Σ: its diagonal elements give the 'strength' of each concept
[Figure: users plotted by Movie 1 rating vs. Movie 2 rating, with v1, the first right singular vector, drawn through the data]
- Instead of using two coordinates (x, y) to describe point locations, let's use only one coordinate z
- A point's position is its location along the vector v1
- How to choose v1? Minimize the reconstruction error
- Goal: minimize the sum of reconstruction errors

    Σ_{i=1}^{N} Σ_{j=1}^{D} (x_ij − z_ij)²

  where the x_ij are the 'old' and the z_ij are the 'new' coordinates
- SVD gives the 'best' axis to project on:
  - 'best' = minimizing the reconstruction errors
- In other words, SVD yields the minimum reconstruction error
- A = U Σ V^T, example:
  - V: 'movie-to-concept' matrix
  - U: 'user-to-concept' matrix
- A = U Σ V^T, example:
  - The first singular value captures the variance ('spread') of the data along the v1 axis.
- A = U Σ V^T, example:
  - U Σ gives the coordinates of the points (users) on the projection axes:

    1.61   0.19  -0.01
    5.08   0.66  -0.03
    6.82   0.85  -0.05
    8.43   1.04  -0.06
    1.86  -5.60   0.84
    0.86  -6.93  -0.87
    0.86  -2.75   0.41

  - The first column is the projection of the users on the 'SciFi' axis.
More details
- Q: How exactly is dimensionality reduction done?
- A: Set the smallest singular values to zero
More details
- Q: How exactly is dimensionality reduction done?
- A: Set the smallest singular values to zero. Dropping σ3 = 1.3 leaves the rank-2 factors:

  A ≈

    0.13   0.02
    0.41   0.07
    0.55   0.09
    0.68   0.11
    0.15  -0.59
    0.07  -0.73
    0.07  -0.29

  ×

    12.4   0
     0     9.5

  ×

    0.56   0.59   0.56   0.09   0.09
    0.12  -0.02   0.12  -0.69  -0.69
More details
- Multiplying the truncated factors back together gives the rank-2 approximation B of A:

    1 1 1 0 0         0.92   0.95   0.92   0.01   0.01
    3 3 3 0 0         2.91   3.01   2.91  -0.01  -0.01
    4 4 4 0 0         3.90   4.04   3.90   0.01   0.01
    5 5 5 0 0    ≈    4.82   5.00   4.82   0.03   0.03
    0 2 0 4 4         0.70   0.53   0.70   4.11   4.11
    0 0 0 5 5        -0.69   1.34  -0.69   4.78   4.78
    0 1 0 2 2         0.32   0.23   0.32   2.01   2.01

  Frobenius norm: ǁMǁF = √(Σ_ij M_ij²), and here ǁA − BǁF = √(Σ_ij (A_ij − B_ij)²) is 'small'.
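The truncation step above is a one-liner once the SVD is in hand. A sketch with numpy (my tool choice; the slides only name Matlab-style packages):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep k = 2 concepts: zeroing the smallest singular values is the same
# as truncating the factors.
k = 2
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# For the best rank-k approximation, the Frobenius error equals
# sqrt(sigma_{k+1}^2 + ... + sigma_r^2); here that is essentially sigma_3.
err = np.linalg.norm(A - B, ord='fro')
```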
[Figure: A = U Σ V^T and B = U S V^T; B is the best approximation of A]

- Theorem: Let A = U Σ V^T, and let B = U S V^T, where S is the diagonal r×r matrix with s_i = σ_i (i = 1...k) and s_i = 0 otherwise. Then B is a best rank-k approximation to A.
- What do we mean by 'best'?
  - B is a solution to min_B ǁA − BǁF where rank(B) = k
    ǁA − BǁF = √( Σ_ij (A_ij − B_ij)² )
Details!
- Theorem: Let A = U Σ V^T (σ1 ≥ σ2 ≥ ..., rank(A) = r). Then B = U S V^T, where
  - S is the diagonal r×r matrix with s_i = σ_i (i = 1...k) and s_i = 0 otherwise,
  is a best rank-k approximation to A:
  - B is a solution to min_B ǁA − BǁF where rank(B) = k
- We will need two facts:
  - ǁMǁF² = Σ_i (q_ii)², where M = P Q R is the SVD of M (Q diagonal)
  - U Σ V^T − U S V^T = U (Σ − S) V^T
Details!
- The first fact holds because in the SVD M = P Q R, P is column orthonormal, R is row orthonormal, and Q is diagonal.
Details!
- A = U Σ V^T, B = U S V^T (σ1 ≥ σ2 ≥ ... ≥ 0, rank(A) = r)
  - S = diagonal r×r matrix where s_i = σ_i (i = 1...k), else s_i = 0; then B is a solution to min_B ǁA − BǁF with rank(B) = k
- Why?
- By the two facts:

    min_{B, rank(B)=k} ǁA − BǁF² = min_S ǁΣ − SǁF² = min_{s_i} Σ_{i=1}^{r} (σ_i − s_i)²

- We want to choose the s_i (at most k of them nonzero) to minimize Σ_i (σ_i − s_i)²
- The solution is to set s_i = σ_i (i = 1...k) and the other s_i = 0:

    min_{s_i} [ Σ_{i=1}^{k} (σ_i − s_i)² + Σ_{i=k+1}^{r} σ_i² ] = Σ_{i=k+1}^{r} σ_i²

- We used: U Σ V^T − U S V^T = U (Σ − S) V^T
Equivalent: 'spectral decomposition' of the matrix:

[Figure: the rating matrix written as A = [u1 u2] · diag(σ1, σ2) · [v1; v2]]
Equivalent: 'spectral decomposition' of the matrix

    A (m×n) = σ1 u1 v1^T + σ2 u2 v2^T + ...   (k terms; each ui is a length-m column, each vi a length-n column)

Assume σ1 ≥ σ2 ≥ σ3 ≥ ... ≥ 0.

Why is setting small σi to 0 the right thing to do? Vectors ui and vi are unit length, so σi scales them. So, zeroing small σi introduces less error.
Q: How many σs to keep?
A: Rule of thumb: keep enough singular values to retain 80-90% of the 'energy' Σ_i σ_i²
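The energy rule of thumb is easy to mechanize. A minimal sketch in Python (the helper name `choose_k` is mine, not from the slides):

```python
import numpy as np

def choose_k(singular_values, energy=0.9):
    """Smallest k whose leading singular values retain `energy` of sum(sigma_i^2)."""
    sq = np.asarray(singular_values, dtype=float) ** 2
    cum = np.cumsum(sq) / sq.sum()
    # First index where the cumulative energy reaches the threshold.
    return int(np.searchsorted(cum, energy) + 1)

# Singular values from the Users-to-Movies example: keeping two concepts
# already retains well over 90% of the energy.
s = np.array([12.4, 9.5, 1.3])
k = choose_k(s, energy=0.9)
```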
- To compute the SVD:
  - O(nm²) or O(n²m) (whichever is less)
- But:
  - Less work if we just want the singular values
  - or if we want only the first k singular vectors
  - or if the matrix is sparse
- Implemented in linear algebra packages like
  - LINPACK, Matlab, SPlus, Mathematica, ...
- SVD: A = U Σ V^T: unique
  - U: user-to-concept similarities
  - V: movie-to-concept similarities
  - Σ: strength of each concept
- Dimensionality reduction:
  - Keep the few largest singular values (80-90% of the 'energy')
  - SVD picks up linear correlations
- SVD gives us:
  - A = U Σ V^T
- Eigen-decomposition:
  - A = X Λ X^T
  - here A is symmetric
  - U, V, X are orthonormal (U^T U = I)
  - Λ, Σ are diagonal
- Now let's calculate:
  - A A^T = U Σ V^T (U Σ V^T)^T = U Σ V^T (V Σ^T U^T) = U Σ Σ^T U^T
  - A^T A = V Σ^T U^T (U Σ V^T) = V Σ^T Σ V^T
- Viewed as eigen-decompositions: A A^T = X Λ X^T with X = U and Λ = Σ², and A^T A = X Λ X^T with X = V and Λ = Σ²
- This shows how to compute the SVD using an eigenvalue decomposition!
- A A^T = U Σ² U^T
- A^T A = V Σ² V^T
- (A^T A)^k = V Σ^{2k} V^T
  - E.g.: (A^T A)² = V Σ² V^T V Σ² V^T = V Σ⁴ V^T
- (A^T A)^k ≈ v1 σ1^{2k} v1^T for k ≫ 1
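The observation that (A^T A)^k collapses onto v1 is exactly power iteration. A sketch with numpy (the iteration count and seed are illustrative choices of mine, assuming σ1 > σ2 as in this example):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

# Power iteration: repeatedly apply A^T A to a random vector; it converges
# to v1, the top right singular vector, at rate (sigma_2 / sigma_1)^2.
rng = np.random.default_rng(1)
v = rng.standard_normal(A.shape[1])
for _ in range(100):
    v = A.T @ (A @ v)          # one application of A^T A
    v /= np.linalg.norm(v)     # renormalize to avoid overflow

sigma1 = float(np.linalg.norm(A @ v))  # the top singular value
```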
- Q: Find users that like 'Matrix'
- A: Map the query into a 'concept space'. How?
- Q: Find users that like 'Matrix'
- A: Map the query into a 'concept space':

    q = [5 0 0 0 0]   (ratings for Matrix, Alien, Serenity, Casablanca, Amelie)

  Project into concept space: take the inner product of q with each 'concept' vector v_i.
- Projecting q onto v1 gives q·v1, the coordinate of q on the SciFi-concept axis; likewise q·v2 for the Romance concept.
Compactly, we have: q_concept = q V

E.g.:

    q = [5 0 0 0 0]   (Matrix, Alien, Serenity, Casablanca, Amelie)

  movie-to-concept similarities (V):

    0.56   0.12
    0.59  -0.02
    0.56   0.12
    0.09  -0.69
    0.09  -0.69

    q V = [2.8  0.6]   (SciFi-concept, Romance-concept)
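The projection q_concept = q V is a single matrix-vector product. A sketch with numpy, using the rounded V from the slide:

```python
import numpy as np

# Movie-to-concept matrix V (two concepts kept), as printed on the slide.
V = np.array([[0.56,  0.12],
              [0.59, -0.02],
              [0.56,  0.12],
              [0.09, -0.69],
              [0.09, -0.69]])

# A new user q who rated only 'Matrix' with 5 stars.
q = np.array([5, 0, 0, 0, 0])

q_concept = q @ V   # coordinates on the (SciFi, Romance) concept axes
```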
- How would the user d that rated ('Alien', 'Serenity') be handled? The same way: d_concept = d V

E.g.:

    d = [0 4 5 0 0]   (Matrix, Alien, Serenity, Casablanca, Amelie)

  movie-to-concept similarities (V):

    0.56   0.12
    0.59  -0.02
    0.56   0.12
    0.09  -0.69
    0.09  -0.69

    d V = [5.2  0.4]   (SciFi-concept, Romance-concept)
- Observation: user d, who rated ('Alien', 'Serenity'), will be similar to user q, who rated ('Matrix'), although d and q have zero ratings in common!

    d = [0 4 5 0 0]  →  d_concept = [5.2  0.4]
    q = [5 0 0 0 0]  →  q_concept = [2.8  0.6]

  Zero ratings in common, yet similarity ≠ 0 in concept space.
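The observation can be checked with cosine similarity: in raw rating space q and d are orthogonal, but in concept space they are nearly parallel. A sketch with numpy (cosine is my choice of similarity measure; the slide does not name one):

```python
import numpy as np

V = np.array([[0.56,  0.12],
              [0.59, -0.02],
              [0.56,  0.12],
              [0.09, -0.69],
              [0.09, -0.69]])

q = np.array([5, 0, 0, 0, 0])   # rated 'Matrix' only
d = np.array([0, 4, 5, 0, 0])   # rated 'Alien' and 'Serenity' only

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# In rating space the users share no rated movies, so similarity is 0 ...
raw_sim = cos(q, d)
# ... but in concept space both load heavily on the SciFi concept.
concept_sim = cos(q @ V, d @ V)
```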
+ Optimal low-rank approximation in terms of Frobenius norm
- Interpretability problem:
  - A singular vector specifies a linear combination of all input columns or rows
- Lack of sparsity:
  - Singular vectors are dense!
- Goal: express A as a product of matrices C, U, R, making ǁA − C·U·RǁF small
- 'Constraints' on C and R: C consists of actual columns of A and R of actual rows of A

[Figure: A ≈ C U R]

Frobenius norm: ǁXǁF = √(Σ_ij X_ij²)
- Goal: express A as a product of matrices C, U, R, making ǁA − C·U·RǁF small
- U is built from the pseudo-inverse of the intersection of C and R
- Let A_k be the 'best' rank-k approximation to A (that is, A_k is the rank-k SVD of A)
- Theorem [Drineas et al.]: CUR in O(m·n) time achieves
  - ǁA − CURǁF ≤ ǁA − A_kǁF + εǁAǁF
  with probability at least 1 − δ, by picking
  - O(k log(1/δ)/ε²) columns, and
  - O(k² log³(1/δ)/ε⁶) rows
- In practice: pick 4k columns/rows
- Sampling columns (and similarly rows): each column is sampled with probability proportional to its squared norm
- Note: this is a randomized algorithm; the same column can be sampled more than once
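The column-sampling step can be sketched as follows, assuming the standard squared-norm sampling with the usual 1/√(c·P(j)) rescaling of each picked column (the helper name `sample_columns` is mine):

```python
import numpy as np

def sample_columns(A, c, rng):
    """Sample c columns of A with replacement, each with probability
    proportional to its squared norm, rescaling picks by 1/sqrt(c * P(j))."""
    col_sq = (A ** 2).sum(axis=0)
    probs = col_sq / col_sq.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=probs)
    C = A[:, idx] / np.sqrt(c * probs[idx])
    return C, idx, probs

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

C, idx, probs = sample_columns(A, c=4, rng=np.random.default_rng(0))
```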
- Let W be the 'intersection' of the sampled columns C and rows R
  - Let the SVD of W be W = X Z Y^T
- Then: U = Y (Z⁺)² X^T
  - Z⁺: reciprocals of the non-zero singular values: Z⁺_ii = 1/Z_ii

[Figure: A with sampled columns C, sampled rows R, and their intersection W]
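The U = Y (Z⁺)² X^T construction translates directly into code. A sketch with numpy (the diagonal test matrix is mine, chosen so the result is easy to verify by hand; a tolerance stands in for "non-zero"):

```python
import numpy as np

def cur_u(W):
    """U = Y (Z+)^2 X^T, where W = X Z Y^T is the SVD of the intersection W
    and Z+ holds reciprocals of the non-zero singular values."""
    X, z, Yt = np.linalg.svd(W, full_matrices=False)
    z_plus = np.zeros_like(z)
    nz = z > 1e-10           # treat tiny values as zero
    z_plus[nz] = 1.0 / z[nz]
    return Yt.T @ np.diag(z_plus ** 2) @ X.T

# A diagonal intersection makes the formula easy to check by hand:
# singular values are 3 and 2, so U should be diag(1/9, 1/4).
W = np.array([[3.0, 0.0],
              [0.0, 2.0]])
U = cur_u(W)
```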
- For example:
  - Select c = O(k log k / ε²) columns of A using the ColumnSelect algorithm
  - Select r = O(k log k / ε²) rows of A using the ColumnSelect algorithm
  - Set U = W⁺
- Then with probability 98% the CUR error is within a (2 + ε) factor of the SVD error:

    ǁA − CURǁF ≤ (2 + ε) ǁA − A_kǁF

- In practice: pick 4k columns/rows for a 'rank-k' approximation
+ Easy interpretation
  - Since the basis vectors are actual columns and rows
+ Sparse basis
  - Since the basis vectors are actual columns and rows
- Duplicate columns and rows
  - Columns of large norm will be sampled many times

[Figure: a dense singular vector vs. an actual (sparse) column]
- If we want to get rid of the duplicates:
  - Throw them away
  - Scale (multiply) the remaining columns/rows by the square root of the number of duplicates

[Figure: sampled columns C_d and rows R_d with duplicates, deduplicated to C_s and R_s; construct a small U]
SVD: A = U Σ V^T
  - A: huge but sparse; U, V: big and dense; Σ: sparse and small
CUR: A = C U R
  - A: huge but sparse; C, R: big but sparse; U: dense but small
- DBLP bibliographic data
  - Author-to-conference big sparse matrix
  - A_ij: number of papers published by author i at conference j
  - 428K authors (rows), 3659 conferences (columns)
  - Very sparse
- Want to reduce dimensionality
  - How much time does it take?
  - What is the reconstruction error?
  - How much space do we need?
- Accuracy:
  - 1 − relative sum squared error
- Space ratio:
  - #output matrix entries / #input matrix entries
- CPU time

[Figure: accuracy, space ratio, and CPU time compared for SVD, CUR, and CUR with no duplicates]

Sun, Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM '07.
- SVD is limited to linear projections:
  - It finds a lower-dimensional linear projection that preserves Euclidean distances
- Non-linear methods: Isomap
  - Data lies on a nonlinear low-dimensional curve, aka a manifold
  - Use the distance as measured along the manifold
  - How?
    - Build an adjacency graph
    - Geodesic distance is graph distance
    - Run SVD/PCA on the graph pairwise-distance matrix
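The three Isomap steps above can be sketched end-to-end on a toy dataset. Everything below is illustrative and mine (the choice of k, the points, Floyd-Warshall for shortest paths, and classical MDS as the "SVD/PCA on the distance matrix" step), assuming a trivially flat manifold so the result is checkable:

```python
import numpy as np

# Toy 'manifold': 5 points along a line, spaced 1 apart.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
n = len(pts)

# 1) Adjacency graph: connect each point to its k nearest neighbors.
k = 2
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
G = np.full((n, n), np.inf)
np.fill_diagonal(G, 0.0)
for i in range(n):
    for j in np.argsort(D[i])[1:k + 1]:
        G[i, j] = G[j, i] = D[i, j]

# 2) Geodesic distance = graph shortest-path distance (Floyd-Warshall).
for m in range(n):
    G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])

# 3) Classical MDS: double-center the squared geodesic distances and take
# the top eigenvector (this is the PCA/SVD step on the distance matrix).
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (G ** 2) @ J
w, V = np.linalg.eigh(B)
coords = V[:, -1] * np.sqrt(w[-1])   # 1-D embedding of the 5 points
```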
References:
- Drineas et al.: Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition, SIAM Journal on Computing, 2006.
- J. Sun, Y. Xie, H. Zhang, C. Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM 2007.
- P. Paschou, M. W. Mahoney, A. Javed, J. R. Kidd, A. J. Pakstis, S. Gu, K. K. Kidd, P. Drineas: Intra- and interpopulation genotype reconstruction from tagging SNPs, Genome Research, 17(1), 96-107, 2007.
- M. W. Mahoney, M. Maggioni, P. Drineas: Tensor-CUR Decompositions for Tensor-Based Data, Proc. 12th Annual SIGKDD, 327-336, 2006.