slides lec 11
TRANSCRIPT
Lecture 11: Recommender Systems
Sanjeev Arora, Elad Hazan
COS 402 – Machine Learning and Artificial Intelligence, Fall 2016
(Borrows from slides of D. Jurafsky, Stanford U.) 10/20/16 1
Admin
• Exercise 5 (written), next Tue, in class
• Midterm next Thu
• Survey results, analysis and follow-up
Recap
• Learning from examples
• Movie / philosophy of AI. Dr. Singer on Google ML.
• Language
  • Probabilistic model of language
  • Semantics via word embedding
  • Today: recommender systems
• Knowledge representation
• Reinforcement learning
Recommender Systems
• Customer X
  • Buys Metallica CD
  • Buys Megadeth CD
• Customer Y
  • Does search on Metallica
  • Recommender system suggests Megadeth from data collected about customer X
Recommendations
[Figure: a user reaches items either by search or by recommendations. Examples of items: products, web sites, blogs, news items, …]
From Scarcity to Abundance
• Shelf space is a scarce commodity for traditional retailers
  • Also: TV networks, movie theaters, …
• Web enables near-zero-cost dissemination of information about products
  • From scarcity to abundance
• More choice necessitates better filters
  • Recommendation engines
  • How Into Thin Air made Touching the Void a bestseller: http://www.wired.com/wired/archive/12.10/tail.html
Types of Recommendations
• Editorial and hand curated
  • List of favorites
  • Lists of “essential” items
• Simple aggregates
  • Top 10, Most Popular, Recent Uploads
• Tailored to individual users
  • Amazon, Netflix, …
(Today’s class)

Formal Model
• X = set of Customers
• S = set of Items
• Utility function u: X × S → R
  • R = set of ratings
  • R is a totally ordered set
  • e.g., 0–5 stars, real number in [0,1]
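As a concrete sketch, the utility function can be stored as a sparse map from (customer, item) pairs to ratings, with missing pairs meaning “unknown”. The customer/item names and ratings below are illustrative, not from the lecture:

```python
# The formal model as a sparse map: u: X × S -> R, with most pairs unrated.
customers = ["X", "Y"]                    # X = set of customers
items = ["Metallica CD", "Megadeth CD"]   # S = set of items

ratings = {                               # known values of the utility matrix
    ("X", "Metallica CD"): 5,             # 0-5 star scale
    ("X", "Megadeth CD"): 4,
}

def utility(x, s):
    """Return u(x, s) if known, else None (the value to be extrapolated)."""
    return ratings.get((x, s))
```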
Key Problems
• (1) Gathering “known” ratings for matrix
  • How to collect the data in the utility matrix
• (2) Extrapolate unknown ratings from known ones – MAIN LEARNING PROBLEM
  • Mainly interested in high unknown ratings
• (3) Evaluating extrapolation methods
  • How to measure success/performance of recommendation methods
(1) Gathering Ratings
• Explicit
  • Ask people to rate items
  • Doesn’t work well in practice – people can’t be bothered
  • Crowdsourcing: pay people to label items
• Implicit
  • Learn ratings from user actions
    • E.g., purchase implies high rating
  • What about low ratings?
(2) Extrapolating Utilities
• Key problem: utility matrix U is sparse
  • Most people have not rated most items
  • Cold start:
    • New items have no ratings
    • New users have no history
• Three approaches to recommender systems:
  1. Content-based
  2. Collaborative
  3. Latent factor based
Content-Based Recommendations (Today!)
• Main idea: recommend items to customer x similar to previous items rated highly by x
Example:
• Movie recommendations
  • Recommend movies with same actor(s), director, genre, …
• Web sites, blogs, news
  • Recommend other sites with “similar” content
Item Profiles
• For each item, create an item profile
• Profile is a set (vector) of features
  • Movies: author, genre, director, actors, year, …
  • Text: set of “important” words in document
• How to pick important features?
  • TF-IDF (Term Frequency * Inverse Doc Frequency)

[Example: item profiles for Movie X and Movie Y as 0/1 feature vectors over actors (Johnny Depp, Actor A, Actor B, …) and genres (Pirate, Spy, Comic), plus an average-rating feature:
  Movie X: 0 1 1 0 1 1 0 1 | avg rating 3
  Movie Y: 1 1 0 1 0 1 1 0 | avg rating 4]
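The TF-IDF weighting named above can be sketched as follows. This is a minimal version of the idea (score = tf(w, d) · log(N / df(w))) without the smoothing real libraries apply, and the documents are made up:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Per-document scores: tf(w, d) * log(N / df(w)), no smoothing."""
    n = len(docs)
    df = Counter()                       # document frequency of each word
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)                # term frequency within the document
        scores.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return scores

# Made-up word lists standing in for documents:
docs = [["pirate", "ship", "gold"], ["spy", "gold"], ["pirate", "spy"]]
scores = tf_idf(docs)
# "ship" occurs in only one document, so it is the most "important" feature
# of docs[0]; "gold" and "pirate" are more common and score lower.
```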
User Profiles
• Want a vector with the same components/dimensions as items
  • Could be 1s representing user purchases
  • Or arbitrary numbers from a rating
• User profile is aggregate of items:
  • Average (weighted?) of rated item profiles

[Example: user profile for User U over the same feature dimensions –
  User U: 0.2 (Natalie Portman), .005 (Actor A), 0 (Actor B), 0, …]
Prediction
• User and item vectors have the same components/dimensions; recommend the items whose vectors are most similar to the user vector!
• Given user profile x and item profile i, estimate

  u(x, i) = cos(x, i) = (x · i) / (‖x‖ · ‖i‖)
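A minimal sketch of this prediction rule; the feature values below are hypothetical:

```python
import math

def cosine(x, i):
    """Predicted utility u(x, i) = cos(x, i) = x·i / (|x| |i|)."""
    dot = sum(a * b for a, b in zip(x, i))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in i))
    return dot / norms if norms else 0.0

# Hypothetical profiles over the same feature dimensions,
# e.g. [Actor A, Actor B, Pirate Genre]:
user = [0.9, 0.1, 0.8]    # average of the user's rated item profiles
movie1 = [1, 0, 1]        # shares Actor A and Pirate Genre with the user
movie2 = [0, 1, 0]
# cosine(user, movie1) > cosine(user, movie2), so movie1 is recommended.
```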
Pros
• +: No need for data on other users
  • No cold-start or sparsity problems
• +: Able to recommend to users with unique tastes
• +: Able to recommend new & unpopular items
  • No first-rater problem
• +: Able to provide explanations
  • Can provide explanations of recommended items by listing content features that caused an item to be recommended
Cons
• –: Finding the appropriate features is hard
  • E.g., images, movies, music
• –: Recommendations for new users
  • How to build a user profile?
• –: Overspecialization
  • Never recommends items outside user’s content profile
  • People might have multiple interests
  • Unable to exploit quality judgments of other users
Collaborative Filtering
• Consider user x
• Find set N of other users whose ratings are “similar” to x’s ratings
• Estimate x’s ratings based on ratings of users in N

[Figure: the users × movies utility matrix, with user x and the neighbor set N highlighted]
A methodological learning-based approach

How many factors determine preference?
The low-rank assumption

[Figure: the partially observed m × n preference matrix (0/1 entries) is approximately (≈) the product (×) of an m × k matrix and a k × n matrix]

“Preference is determined by k factors”, usually k = {5, …, 10}
For every entry in the preference matrix:

  M_ij = v_i · u_j = Σ_{t=1..k} v_i(t) · u_j(t)

where v_i, u_j ∈ R^k.
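A tiny illustration of the low-rank assumption, with made-up factor vectors: every entry of M is the inner product of a user vector and an item vector in R^k:

```python
# Made-up rank k = 2 factors: every entry M_ij = v_i · u_j.
k = 2
V = [[1, 0], [0, 1], [1, 1]]   # m = 3 user vectors v_i in R^k
U = [[1, 1], [0, 1]]           # n = 2 item vectors u_j in R^k

def entry(i, j):
    """M_ij = sum over t of v_i(t) * u_j(t)."""
    return sum(V[i][t] * U[j][t] for t in range(k))

# The full m × n preference matrix determined by the k(m+n) factor values:
M = [[entry(i, j) for j in range(len(U))] for i in range(len(V))]
```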
Example – rank 1 and its benefits

[Figure: the m × n matrix as the product of an m × 1 column vector and a 1 × n row vector (k = 1)]

After observing (m+n) entries – can compute the entire matrix!

For every entry in the preference matrix: M_ij = v_i · u_j, where v_i, u_j ∈ R are scalars.

How many unknowns? How many observations are needed to complete the matrix?
(Food for thought: relate to statistical learning theory – sample complexity?)
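In the rank-1 case the counting question has a concrete answer: since M_ij = v_i · u_j, every entry satisfies M_ij = M_i1 · M_1j / M_11, so the first row and first column (about m + n observations, assuming M_11 ≠ 0) determine the whole matrix. A sketch with made-up values:

```python
# Observed: first row and first column of a rank-1 matrix (made-up values).
row0 = [2.0, 4.0, 6.0]   # M[0][j] = v_0 * u_j  (n entries)
col0 = [2.0, 1.0]        # M[i][0] = v_i * u_0  (m entries; col0[0] == row0[0])

# Every other entry follows: M[i][j] = M[i][0] * M[0][j] / M[0][0].
M = [[col0[i] * row0[j] / row0[0] for j in range(len(row0))]
     for i in range(len(col0))]
```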
The matrix completion approach

[Figure: the partially observed m × n matrix approximated by the product of an m × k matrix and a k × n matrix]

Solve for u, v:

  min_{v_i, u_j}  Σ_{(i,j) observed} (M_ij − v_i · u_j)²

where v_i, u_j ∈ R^k. Total of k(m+n) variables.
An algorithm for predicting recommendations

Input: observations of preferences M_ij for {(i_1, j_1), (i_2, j_2), …, (i_m, j_m)} (m numbers in the range [0,1])
Output: a matrix M ∈ R^{users × movies} that has all predicted preferences

Assumption: there exist low-dimensional vectors {v_i, u_j} such that M_ij = v_i · u_j

Algorithm: gradient descent! Objective function:

  f({u, v}) = Σ_{(i,j) observed} (M_ij − v_i · u_j)²

What do we do with the vectors?
Spelling out GD in this case

GD for matrix completion, minimizing f({u, v}) = Σ_{(i,j) observed} (M_ij − v_i · u_j)²:

• Initialize all v_i, u_j randomly
• For iteration = 1, 2, … do:
  • Update v ← v − η ∂f/∂v for all vectors v_i, u_j
  • Spelling it out, for each coordinate t of vector v_a:

      ∂f/∂v_a(t) = −2 Σ_{j: (a,j) observed} (M_aj − v_a · u_j) · u_j(t)

    Thus the update is:

      ∀a, t:  v_a(t) ← v_a(t) + η · 2 Σ_{j: (a,j) observed} (M_aj − v_a · u_j) · u_j(t)

    (and symmetrically for each u_b(t))
  • If needed, normalize each vector: v ← v / max{1, ‖v‖}
• End For
• Return final (or average of last few) vector solutions
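The gradient-descent recipe above can be sketched in code. This is a minimal illustration, not the lecture's reference implementation: for reproducibility it initializes deterministically rather than randomly, updates one observed entry at a time (SGD-style), skips the optional normalization step, and uses an illustrative step size and iteration count:

```python
def matrix_completion_gd(observed, m, n, k, eta=0.02, iters=5000):
    """Minimize f = sum over observed (i, j) of (M_ij - v_i · u_j)^2.
    Takes a gradient step per observed entry (SGD-style sketch)."""
    V = [[0.1] * k for _ in range(m)]    # user vectors v_i in R^k
    U = [[0.1] * k for _ in range(n)]    # item vectors u_j in R^k
    for _ in range(iters):
        for (i, j), m_ij in observed.items():
            err = m_ij - sum(V[i][t] * U[j][t] for t in range(k))
            for t in range(k):
                v, u = V[i][t], U[j][t]
                V[i][t] += 2 * eta * err * u   # v_i(t) minus eta * df/dv_i(t)
                U[j][t] += 2 * eta * err * v   # symmetric step for u_j(t)
    return V, U

# Observe 4 of the 6 entries of the rank-1 matrix [[1, 2], [2, 4], [3, 6]]:
observed = {(0, 0): 1, (0, 1): 2, (1, 0): 2, (2, 1): 6}
V, U = matrix_completion_gd(observed, m=3, n=2, k=1)
missing = V[1][0] * U[1][0]   # predicted M[1][1]; the true value is 4
```

Note that in the rank-1 example any exact fit of the four observed entries forces the missing entry to 4, which is what the recovered factors predict.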
Predicting meta-data from rec. data [Esther Rolf ’15]

[Figure: person vector and movie/song vector in the users × movies preference matrix; from a user’s row of 0/1 preferences one can try to predict meta-data: gender? annual income? will buy product X?]

Implications to user privacy, security, …
Evaluating Predictions
• Compare predictions with known ratings
  • Root-mean-square error (RMSE) over the N rated pairs:

      RMSE = √( (1/N) Σ_{x,i} (r_xi − r*_xi)² )

    where r_xi is the predicted rating and r*_xi is the true rating of x on i
• Narrow focus on accuracy sometimes misses the point
  • Prediction diversity
  • Prediction context
  • Order of predictions
• In practice, we care only to predict high ratings:
  • RMSE might penalize a method that does well for high ratings and badly for others
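A minimal RMSE sketch, with hypothetical predicted/true ratings:

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and true ratings."""
    sq_errs = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(sq_errs) / len(sq_errs))

# Hypothetical predicted vs. true ratings: one prediction is off by 1 star.
error = rmse([4.0, 3.0, 5.0], [4.0, 2.0, 5.0])   # sqrt(1/3) ≈ 0.577
```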
Famous Historical Example: The Netflix Prize
• Training data
  • 100 million ratings, 480,000 users, 17,770 movies
  • 6 years of data: 2000–2005
• Test data
  • Last few ratings of each user (2.8 million)
  • Evaluation criterion: root mean squared error (RMSE)
  • Netflix Cinematch RMSE: 0.9514
• Competition
  • 2,700+ teams
  • $1 million prize for 10% improvement on Cinematch
  • BellKor system won in 2009. Combined many factors:
    • Overall deviations of users/movies
    • Regional effects
    • Local collaborative filtering patterns
    • Temporal biases