Social/Collaborative Filtering
Outline
• Recap: SVD vs PCA
• Collaborative filtering
  – aka social recommendation
  – k-NN CF methods
  – CF as classification
• CF via MF
• MF vs SGD vs …
Dimensionality Reduction and Principal Components Analysis: Recap
More cartoons
PCA as matrices: the optimization problem
PCA as matrices

V ≈ W H, where
  V is the 1000 images × 10,000 pixels data matrix: V[i,j] = pixel j in image i
  W is 1000 × 2: the 2 mixing weights for each image
  H is 2 × 10,000: its rows are the principal components PC1 and PC2
Poll
• True or false: the weights for an example should add up to 1.
• True or false: the weights for a prototype should add up to 1.
Implementing PCA
Also: they are orthogonal to each other!
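A minimal numpy sketch of this recipe (the data and variable names are stand-ins of mine, not from the slides): center the data, form the covariance matrix, take its top eigenvectors, and project.

```python
import numpy as np

def pca(X, k=2):
    """PCA via the covariance matrix: center, eigendecompose, project."""
    Xc = X - X.mean(axis=0)               # mean-center each feature
    C = Xc.T @ Xc / len(Xc)               # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh sorts ascending for symmetric C
    E = eigvecs[:, ::-1][:, :k]           # top-k principal components (columns)
    return Xc @ E                         # n x k mixing weights

X = np.random.randn(1000, 50)             # stand-in data (e.g., images x pixels)
Z = pca(X, k=2)
# the principal components are orthogonal to each other, as the slide notes:
# E.T @ E is (approximately) the identity matrix
```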
PCA FOR MODELING TEXT (SVD = SINGULAR VALUE DECOMPOSITION)
A Scalability Problem with PCA
• Covariance matrix is large in high dimensions
  – With d features the covariance matrix is d × d
• SVD is a closely related method that can be implemented more efficiently in high dimensions
  – Don't explicitly compute the covariance matrix
  – Instead write the doc-term matrix X as X = U S VT
  – S is k × k where k << d
  – S is diagonal and S[i,i] = sqrt(λi), the square root of the i-th eigenvalue
  – Columns of V ≈ principal components
  – Rows of US ≈ embeddings for the examples
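A sketch of the X = U S VT route in numpy; the doc-term matrix here is random stand-in data, and for a real sparse doc-term matrix you would likely use a truncated sparse solver such as scipy.sparse.linalg.svds instead.

```python
import numpy as np

X = np.random.rand(1000, 10000)   # stand-in doc-term matrix: X[i,j] ~ TFIDF of term j in doc i
k = 50                            # k << d
U, s, Vt = np.linalg.svd(X, full_matrices=False)
S = np.diag(s[:k])                # k x k diagonal; S[i,i] = singular value = sqrt(eigenvalue)
embeddings = U[:, :k] @ S         # rows of US ~ embeddings for the documents
components = Vt[:k, :]            # rows of VT = columns of V ~ principal components
X_hat = embeddings @ components   # best rank-k approximation of X
```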
Recovering latent factors in a matrix

V ≈ W H, where
  V is the n documents × m terms doc-term matrix: DocTerm[i,j] = TF-IDF score of term j in doc i
  W is n × 2 (row i = [xi yi], an embedding for document i)
  H is 2 × m (rows [a1 … am] and [b1 … bm], latent factors over terms)
In SVD notation the same factorization is written U S VT, with the diagonal S = diag(s1, s2) between U and VT.
SVD example
• The Neatest Little Guide to Stock Market Investing
• Investing For Dummies, 4th Edition
• The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
• The Little Book of Value Investing
• Value Investing: From Graham to Buffett and Beyond
• Rich Dad's Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
• Investing in Real Estate, 5th Edition
• Stock Investing For Dummies
• Rich Dad's Advisors: The ABC's of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss
https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/
[Figure: the books plotted in the 2-d SVD space; labeled points include "Investing for real estate", "Rich Dad's Advisors: The ABCs of Real Estate Investment…", "The little book of common sense investing: …", and "Neatest Little Guide to Stock Market Investing".]
My recap: SVD vs PCA
• Very closely related methods
• As described here:
  – SVD decomposition doesn't require a square matrix
  – PCA decomposition is always applied to CX, which is square and mean-centered
  – You can implement PCA using SVD as a substep
• People sometimes use the terms interchangeably
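A quick numpy check of the last point (my own illustration, not from the slides): on mean-centered data, the right singular vectors from SVD span the same directions as the eigenvectors of CX.

```python
import numpy as np

X = np.random.randn(150, 4)
Xc = X - X.mean(axis=0)

# PCA route: eigenvectors of the square, mean-centered covariance matrix CX
_, E = np.linalg.eigh(Xc.T @ Xc)
E = E[:, ::-1]                          # sort descending by eigenvalue

# SVD route: works directly on the (non-square) data matrix
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# same components, up to a sign flip per direction
print(np.allclose(np.abs(Vt), np.abs(E.T)))   # True
```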
Outline
• What is CF?
• Nearest-neighbor methods for CF
  – One old-school paper: BellCore's movie recommender
  – Some general discussion
• CF reduced to classification
• CF reduced to matrix factoring
• Other uses of matrix factoring in ML
WHAT IS COLLABORATIVE FILTERING? (aka social filtering, recommendation systems, …)
What is collaborative filtering?
Other examples of social filtering….
Everyday Examples of Collaborative Filtering...
• Bestseller lists
• Top 40 music lists
• The "recent returns" shelf at the library
• Unmarked but well-used paths through the woods
• The printer room at work
• "Read any good books lately?"
• ....
• Common insight: personal tastes are correlated:
  – If Alice and Bob both like X and Alice likes Y, then Bob is more likely to like Y
  – especially (perhaps) if Bob knows Alice
SOCIAL/COLLABORATIVE FILTERING: NEAREST-NEIGHBOR METHODS
BellCore’s MovieRecommender
• Recommending And Evaluating Choices In A Virtual Community Of Use. Will Hill, Larry Stead, Mark Rosenstein and George Furnas, Bellcore; CHI 1995
"By virtual community we mean 'a group of people who share characteristics and interact in essence or effect only'. In other words, people in a Virtual Community influence each other as though they interacted but they do not interact. Thus we ask: 'Is it possible to arrange for people to share some of the personalized informational benefits of community involvement without the associated communications costs?'"
MovieRecommender Goals
Recommendations should:
• simultaneously ease and encourage rather than replace social processes.... should make it easy to participate while leaving in hooks for people to pursue more personal relationships if they wish.
• be for sets of people, not just individuals... multi-person recommending is often important, for example, when two or more people want to choose a video to watch together.
• be from people, not a black box machine or so-called "agent".
• tell how much confidence to place in them; in other words, they should include indications of how accurate they are.
BellCore's MovieRecommender
• Participants sent email to [email protected]
• System replied with a list of 500 movies to rate on a 1-10 scale (250 random, 250 popular)
  – Only a subset need be rated
• New participant P sends in rated movies via email
• System compares ratings for P to ratings of (a random sample of) previous users
• Most similar users are used to predict scores for unrated movies (more later)
• System returns recommendations in an email message.
Suggested Videos for: John A. Jamus.

Your must-see list with predicted ratings:
• 7.0 "Alien (1979)"
• 6.5 "Blade Runner"
• 6.2 "Close Encounters Of The Third Kind (1977)"

Your video categories with average ratings:
• 6.7 "Action/Adventure"
• 6.5 "Science Fiction/Fantasy"
• 6.3 "Children/Family"
• 6.0 "Mystery/Suspense"
• 5.9 "Comedy"
• 5.8 "Drama"

The viewing patterns of 243 viewers were consulted. Patterns of 7 viewers were found to be most similar. Correlation with target viewer:
• 0.59 viewer-130 ([email protected])
• 0.55 bullert, janer ([email protected])
• 0.51 jan_arst ([email protected])
• 0.46 Ken Cross ([email protected])
• 0.42 rskt ([email protected])
• 0.41 kkgg ([email protected])
• 0.41 bnn ([email protected])

By category, their joint ratings recommend:
• Action/Adventure: "Excalibur" 8.0, 4 viewers; "Apocalypse Now" 7.2, 4 viewers; "Platoon" 8.3, 3 viewers
• Science Fiction/Fantasy: "Total Recall" 7.2, 5 viewers
• Children/Family: "Wizard Of Oz, The" 8.5, 4 viewers; "Mary Poppins" 7.7, 3 viewers
• Mystery/Suspense: "Silence Of The Lambs, The" 9.3, 3 viewers
• Comedy: "National Lampoon's Animal House" 7.5, 4 viewers; "Driving Miss Daisy" 7.5, 4 viewers; "Hannah and Her Sisters" 8.0, 3 viewers
• Drama: "It's A Wonderful Life" 8.0, 5 viewers; "Dead Poets Society" 7.0, 5 viewers; "Rain Man" 7.5, 4 viewers

Correlation of predicted ratings with your actual ratings is: 0.64. This number measures ability to evaluate movies accurately for you. 0.15 means low ability. 0.85 means very good ability. 0.50 means fair ability.
BellCore's MovieRecommender
• Evaluation:
  – Withhold 10% of the ratings of each user to use as a test set
  – Measure correlation between predicted ratings and actual ratings for test-set movie/user pairs
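The protocol in code form, as I read it; `predict` here is a hypothetical scoring function, not something from the paper.

```python
import numpy as np
from collections import defaultdict

def evaluate(ratings, predict, holdout_frac=0.10, seed=0):
    """ratings: dict (user, movie) -> score; predict: hypothetical fn(user, movie) -> score."""
    rng = np.random.default_rng(seed)
    by_user = defaultdict(list)
    for pair in ratings:
        by_user[pair[0]].append(pair)
    test = []
    for user, pairs in by_user.items():     # withhold ~10% of each user's ratings
        rng.shuffle(pairs)
        test += pairs[:max(1, int(holdout_frac * len(pairs)))]
    actual = np.array([ratings[p] for p in test])
    predicted = np.array([predict(*p) for p in test])
    return np.corrcoef(predicted, actual)[0, 1]   # correlation on test movie/user pairs
```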
BellCore's MovieRecommender
• Participants sent email to [email protected]
• System replied with a list of 500 movies to rate
• New participant P sends in rated movies via email
• System compares ratings for P to ratings of (a random sample of) previous users
• Most similar users are used to predict scores for unrated movies
  – Empirical Analysis of Predictive Algorithms for Collaborative Filtering. Breese, Heckerman, Kadie, UAI98
• System returns recommendations in an email message.
recap: k-nearest neighbor learning

Given a test example x:
1. Find the k training-set examples (x1,y1), …, (xk,yk) that are closest to x.
2. Predict the most frequent label in that set.
Breaking it down:
• To train:
  – save the data (very fast!)
• To test:
  – For each test example x:
    1. Find the k training-set examples (x1,y1), …, (xk,yk) that are closest to x.
    2. Predict the most frequent label in that set.
  – Prediction is relatively slow…
  – ...but it doesn't depend on the number of classes, only the number of neighbors.
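The recap in code: a deliberately unoptimized sketch where "training" is just saving the data.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Find the k training examples closest to x, then predict the most frequent label."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every saved example
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]
```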
Algorithms for Collaborative Filtering 1: Memory-Based Algorithms (Breese et al, UAI98)
• $v_{i,j}$ = vote of user i on item j
• $I_i$ = items for which user i has voted
• Mean vote for i is $\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}$
• Predicted vote for "active user" a on item j is a weighted sum: $p_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{n} w(a,i)\,(v_{i,j} - \bar{v}_i)$, where $\kappa$ is a normalizer and the $w(a,i)$ are the weights of the n similar users
• The weight $w(a,i)$ is based on the similarity between users a and i
Algorithms for Collaborative Filtering 1: Memory-Based Algorithms (Breese et al, UAI98)
• K-nearest neighbor: $w(a,i) = \begin{cases} 1 & \text{if } i \in \text{neighbors}(a) \\ 0 & \text{else} \end{cases}$
• Pearson correlation coefficient (Resnick '94, GroupLens): $w(a,i) = \frac{\sum_j (v_{a,j} - \bar{v}_a)(v_{i,j} - \bar{v}_i)}{\sqrt{\sum_j (v_{a,j} - \bar{v}_a)^2 \sum_j (v_{i,j} - \bar{v}_i)^2}}$
• Cosine distance, etc, …
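A sketch of this memory-based predictor with Pearson weights, for a small dense ratings matrix with np.nan marking missing votes; the arrangement (dense matrix, nan handling) is my own simplification.

```python
import numpy as np

def predict_vote(V, a, j):
    """Breese-style memory-based prediction: mean vote of active user a plus a
    normalized, Pearson-weighted sum of other users' deviations on item j.
    V is a small dense n_users x n_items array with np.nan for missing votes."""
    means = np.nanmean(V, axis=1)                      # mean vote of each user
    num = denom = 0.0
    for i in range(V.shape[0]):
        if i == a or np.isnan(V[i, j]):
            continue
        both = ~np.isnan(V[a]) & ~np.isnan(V[i])       # items co-rated by a and i
        if both.sum() < 2:
            continue
        w = np.corrcoef(V[a, both], V[i, both])[0, 1]  # Pearson weight w(a, i)
        if np.isnan(w):                                # constant votes -> undefined correlation
            continue
        num += w * (V[i, j] - means[i])
        denom += abs(w)                                # normalizer kappa = 1 / sum_i |w(a,i)|
    return means[a] + (num / denom if denom else 0.0)
```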
SOCIAL/COLLABORATIVE FILTERING: TRADITIONAL CLASSIFICATION
What are other ways to formulate the collaborative filtering problem?
• Treat it like ordinary classification or regression
Collaborative + Content Filtering (Basu et al, AAAI98; Condliff et al, AI-STATS99)

                   Airplane     Matrix        Room w/ View   ...   Hidalgo
                   comedy,$2M   action,$70M   romance,$25M   ...   action,$30M
Joe   27,M,$70k    9            7             2                    7
Carol 53,F,$20k    8            9
...
Kumar 25,M,$22k    9            3                                  6
Ua    48,M,$81k    4            7             ?              ?     ?
Collaborative + Content Filtering As Classification (Basu, Hirsh, Cohen, AAAI98)

                   Airplane   Matrix   Room w/ View   ...   Hidalgo
                   comedy     action   romance        ...   action
Joe   27,M,70k     1          1        0                    1
Carol 53,F,20k     1          1        0
...
Kumar 25,M,22k     1          0        0                    1
Ua    48,M,81k     0          1        ?              ?     ?

Classification task: map a (user,movie) pair into {likes,dislikes}
Training data: known likes/dislikes
Test data: active users
Features: any properties of the user/movie pair
Collaborative + Content Filtering As Classification (Basu et al, AAAI98)

[same table as above]

Features: any properties of the user/movie pair (U,M)
Examples: genre(U,M), age(U,M), income(U,M), ...
• genre(Carol,Matrix) = action
• income(Kumar,Hidalgo) = $22k/year
Collaborative + Content Filtering As Classification (Basu et al, AAAI98)

[same table as above]

Features: any properties of the user/movie pair (U,M)
Examples: usersWhoLikedMovie(U,M):
• usersWhoLikedMovie(Carol,Hidalgo) = {Joe,...,Kumar}
• usersWhoLikedMovie(Ua,Matrix) = {Joe,...}
Collaborative + Content Filtering As Classification (Basu et al, AAAI98)

[same table as above]

Features: any properties of the user/movie pair (U,M)
Examples: moviesLikedByUser(M,U):
• moviesLikedByUser(*,Joe) = {Airplane,Matrix,...,Hidalgo}
• actionMoviesLikedByUser(*,Joe) = {Matrix,Hidalgo}
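A sketch of how these features might be assembled for one (user, movie) pair; the feature names follow the slides, but the data structures and the like-threshold are my own assumptions.

```python
def make_features(user, movie, ratings, user_info, movie_info, like_threshold=7):
    """Build a feature dict for (user, movie), mixing content and collaborative features."""
    liked = {(u, m) for (u, m), v in ratings.items() if v >= like_threshold}
    return {
        "genre": movie_info[movie]["genre"],             # content features
        "age": user_info[user]["age"],
        "income": user_info[user]["income"],
        "usersWhoLikedMovie": {u for (u, m) in liked if m == movie},   # collaborative
        "moviesLikedByUser": {m for (u, m) in liked if u == user},
    }
```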
Collaborative + Content Filtering As Classification (Basu et al, AAAI98)

[same table as above, with Ua's ratings for Airplane and Matrix now both 1]

Features: any properties of the user/movie pair (U,M). One test example for Ua:
genre={romance}, age=48, sex=male, income=$81k, usersWhoLikedMovie={Carol}, moviesLikedByUser={Matrix,Airplane}, ...
Collaborative + Content Filtering As Classification (Basu et al, AAAI98)

[same table as above]

Two test examples for Ua:
• genre={romance}, age=48, sex=male, income=$81k, usersWhoLikedMovie={Carol}, moviesLikedByUser={Matrix,Airplane}, ...
• genre={action}, age=48, sex=male, income=$81k, usersWhoLikedMovie={Joe,Kumar}, moviesLikedByUser={Matrix,Airplane}, ...
Collaborative + Content Filtering As Classification (Basu et al, AAAI98)
• Classification learning algorithm: rule learning (RIPPER)
• If NakedGun33/13 ∈ moviesLikedByUser and Joe ∈ usersWhoLikedMovie and genre=comedy, then predict likes(U,M)
• If age>12 and age<17 and HolyGrail ∈ moviesLikedByUser and director=MelBrooks, then predict likes(U,M)
• If Ishtar ∈ moviesLikedByUser, then predict likes(U,M)
Basu et al 98 - results
• Evaluation:
  – Predict liked(U,M) = "M in top quartile of U's ranking" from features; evaluate recall and precision
  – Features:
    • Collaborative: UsersWhoLikedMovie, UsersWhoDislikedMovie, MoviesLikedByUser
    • Content: Actors, Directors, Genre, MPAA rating, ...
    • Hybrid: ComediesLikedByUser, DramasLikedByUser, UsersWhoLikedFewDramas, ...
• Results: at the same level of recall (about 33%):
  – Ripper with collaborative features only is worse than the original MovieRecommender (by about 5 pts precision – 73 vs 78)
  – Ripper with hybrid features is better than MovieRecommender (by about 5 pts precision)
Matrix Factorization for Collaborative Filtering
Recovering latent factors in a matrix

V is the n users × m movies ratings matrix, with V[i,j] = user i's rating of movie j.
Recovering latent factors in a matrix

V ≈ W H, where
  V is the n users × m movies matrix: V[i,j] = user i's rating of movie j
  W is n × 2 (row i = [xi yi], latent factors for user i)
  H is 2 × m (rows [a1 … am] and [b1 … bm], latent factors over movies)

(talk pilfered from → ….. KDD 2011)
Recovering latent factors in a matrix

More generally, V ≈ W H with W an n users × r matrix, H an r × m movies matrix, and V[i,j] = user i's rating of movie j.
… is like Linear Regression ….

Y ≈ W H, where
  W is the n instances (e.g., 150) × r features (e.g., 4) data matrix, with rows like [pl1 pw1 sl1 sw1]
  H is the r × 1 weight vector [w1 w2 w3 w4]T
  Y is the n × 1 prediction vector: Y[i,1] = instance i's prediction
.. for many outputs at once….

Y ≈ W H, where
  W is the n instances (e.g., 150) × r features (e.g., 4) data matrix
  H is the r × m weight matrix [[w11 w12 …], [w21 ..], …], one column per regression task
  Y is n × m: Y[i,j] = instance i's prediction for regression task j
… where we also have to find the dataset! (In MF, unlike regression, both factors W and H are unknown.)
Matrix factorization as SGD

For each observed entry V[i,j], take a gradient step on the squared error with step size η:
  ε = V[i,j] − W[i,:] H[:,j]
  W[i,:] ← W[i,:] + η ε H[:,j]T
  H[:,j] ← H[:,j] + η ε W[i,:]T
(the factor of 2 from the squared error is folded into the step size η)
Matrix factorization as SGD - why does this work? Here's the key claim: the update for a single entry V[i,j] touches only row W[i,:] and column H[:,j], and in expectation (over a random observed entry) it follows the gradient of the total loss.
Checking the claim
• Think of SGD for logistic regression: the LR loss compares y and ŷ = dot(w,x)
• MF is similar, but now we update both w (the user weights) and x (the movie weights)
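A minimal numpy sketch of MF by SGD under squared-error loss; the factor count, step size, and initialization scale are arbitrary choices of mine.

```python
import numpy as np

def mf_sgd(observed, n, m, r=10, lr=0.01, epochs=20, seed=0):
    """observed: list of (i, j, v) entries of the n x m ratings matrix V.
    Returns W (n x r) and H (r x m) with V[i,j] ~ W[i,:] @ H[:,j]."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n, r))
    H = rng.normal(scale=0.1, size=(r, m))
    for _ in range(epochs):
        rng.shuffle(observed)                 # visit observed cells in random order
        for i, j, v in observed:
            wi, hj = W[i].copy(), H[:, j].copy()
            err = v - wi @ hj                 # like LR: compare v and v-hat = dot(w, x)
            W[i] += lr * err * hj             # update the user weights...
            H[:, j] += lr * err * wi          # ...and the movie weights
    return W, H
```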
What loss functions are possible?
• squared error (as above)
• "generalized" KL-divergence
• regularized versions of these
ALS = alternating least squares: fix H and solve for W as a least-squares problem, then fix W and solve for H, and repeat.
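An ALS sketch for the fully observed case (a simplification I'm making; CF-style ALS solves each row's least-squares problem over the observed entries only).

```python
import numpy as np

def als(V, r=10, iters=20, reg=0.1, seed=0):
    """Factor a fully observed V (n x m) as W H by alternating ridge solves."""
    rng = np.random.default_rng(seed)
    H = rng.normal(scale=0.1, size=(r, V.shape[1]))
    for _ in range(iters):
        # fix H, solve min_W ||V - W H||^2 + reg ||W||^2 in closed form
        W = np.linalg.solve(H @ H.T + reg * np.eye(r), H @ V.T).T
        # fix W, solve the symmetric problem for H
        H = np.linalg.solve(W.T @ W + reg * np.eye(r), W.T @ V)
    return W, H
```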
Matrix Multiplications in Machine Learning: MF vs PCA vs SGD vs ….
Recovering latent factors in a matrix

V ≈ W H, with W an n users × r matrix, H an r × m movies matrix, and V[i,j] = user i's rating of movie j.
….. vs k-means (1)

X ≈ Z M, where
  X is the original data set (n examples × m features)
  Z is an n × r matrix of 0/1 indicators for the r clusters (rows like [0 1] or [1 0])
  M is the r × m matrix of cluster means
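The same picture in code (my own illustration, ignoring the empty-cluster corner case): each Lloyd iteration builds the indicator matrix Z and recomputes the means M, so X ≈ Z M.

```python
import numpy as np

def kmeans_as_mf(X, r=2, iters=10, seed=0):
    """k-means viewed as factoring X ~ Z M: Z holds one-hot cluster indicators,
    M holds the cluster means."""
    rng = np.random.default_rng(seed)
    M = X[rng.choice(len(X), size=r, replace=False)]       # init means from data points
    for _ in range(iters):
        d = ((X[:, None, :] - M[None, :, :]) ** 2).sum(-1) # squared distance to each mean
        Z = np.eye(r)[d.argmin(axis=1)]                    # n x r one-hot indicators
        M = (Z.T @ X) / Z.sum(axis=0)[:, None]             # recompute cluster means
    return Z, M
```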
Matrix multiplication - 1 (Gram matrix)

V = X XT, where
  X is the n instances (e.g., 150) × r features (e.g., 2) data matrix, XT its transpose
  V[i,j] = <xi,xj>, the inner product of instances i and j (the Gram matrix)
Matrix multiplication - 2 (Covariance matrix)

CX = XT X, where X is n instances × r features (e.g., 2), assuming mean(X) = 0:
  $C_X(i,j) = \sum_{t=1}^{n} x_{ti}\, x_{tj} = n \cdot \mathrm{cov}(i,j)$
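Both products in a few lines of numpy, to make the two slides concrete (illustrative stand-in data).

```python
import numpy as np

X = np.random.randn(150, 2)
Xc = X - X.mean(axis=0)      # enforce the slide's assumption: mean(X) = 0
G = Xc @ Xc.T                # n x n Gram matrix: G[i, j] = <x_i, x_j>
C = Xc.T @ Xc                # r x r: C[i, j] = n * cov(feature i, feature j)
print(np.allclose(C / len(Xc), np.cov(Xc.T, bias=True)))   # True
```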
Matrix multiplication - 2 (PCA)

V = CX = XT X   … variances/covariances
E = eigenvectors(V)
X ET = PCA(X) = Z, where Z(i,j) is the similarity of example i to eigenvector j

I think of the eigenvectors as the fixed point of a process where we predict each feature value from the others using CX.
Matrix multiplication - 2 (PCA)

V = CX = XT X   … variances/covariances
E = eigenvectors(V)
X EKT = PCA(X) = ZK   … using EK = E(1:K, :), the top K eigenvectors of CX, instead of E
Matrix multiplication - 3 (SVD)

V = CX = XT X   … variances/covariances
E = eigenvectors(V)
X ET = PCA(X) = Z
X = Z E = Z Σ-1 Σ E = U Σ E   … where U = Z Σ-1, i.e., a factored version of X
Usually written as X = U Σ VT
Matrix multiplication - 3 (SVD)

V = CX = XT X   … variances/covariances
X EKT = PCA(X) = ZK   … EK = E(1:K, :) instead of E
X ≈ ZK EK = ZK Σ-1 Σ EK = UK Σ EK   … where UK = ZK Σ-1, i.e., a rank-K factored version of X
Matrix multiplication - 3 (SVD)

Original matrix X ≈ UK Σ E, where
  UK is n instances × K: K features with zero covariance and unit variances
  Σ = diag(Σ1, Σ2, …) is diagonal
  E holds the eigenvectors of CX
Recovering latent factors in a matrix

Back to CF: V ≈ W H, with W an n users × r matrix, H an r × m movies matrix, and V[i,j] = user i's rating of movie j.