slides lec 11

Lecture 11: Recommender Systems

Sanjeev Arora, Elad Hazan

COS 402: Machine Learning and Artificial Intelligence, Fall 2016

(Borrows from slides of D. Jurafsky, Stanford U.) 10/20/16

Admin

• Exercise 5 (written), next Tue, in class
• Midterm next Thu
• Survey results, analysis and follow-up

Recap

• Learning from examples
• Movie / philosophy of AI. Dr. Singer on Google ML.
• Language
  • Probabilistic model of language
  • Semantics via word embedding
• Today: recommender systems
• Knowledge representation
• Reinforcement learning

Recommender Systems

• Customer X
  • Buys Metallica CD
  • Buys Megadeth CD
• Customer Y
  • Does search on Metallica
  • Recommender system suggests Megadeth from data collected about customer X

Recommendations

[Figure: Items (products, web sites, blogs, news items, …) reach users through Search and through Recommendations. Examples shown on the slide.]

From Scarcity to Abundance

• Shelf space is a scarce commodity for traditional retailers
  • Also: TV networks, movie theaters, …
• Web enables near-zero-cost dissemination of information about products
  • From scarcity to abundance
• More choice necessitates better filters
  • Recommendation engines
  • How Into Thin Air made Touching the Void a bestseller: http://www.wired.com/wired/archive/12.10/tail.html

Sidenote: The Long Tail

[Figure: the long tail.] Source: Chris Anderson (2004)

Types of Recommendations

• Editorial and hand curated
  • List of favorites
  • Lists of “essential” items
• Simple aggregates
  • Top 10, Most Popular, Recent Uploads
• Tailored to individual users (today's class)
  • Amazon, Netflix, …

Formal Model

• X = set of Customers
• S = set of Items
• Utility function u: X × S → R
  • R = set of ratings
  • R is a totally ordered set
  • e.g., 0-5 stars, real number in [0,1]

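Utility Matrix

[Figure: utility matrix. Rows (users): Alice, Bob, Carol, David. Columns (movies): Avatar, LOTR, Matrix, Pirates. Known entries, in [0,1]: 0.4, 1, 0.2, 0.3, 0.5, 0.2, 1; all other cells are blank (unknown).]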
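One way to hold such a matrix in code is a dictionary keyed by (customer, item), so that unknown entries simply have no key. The sketch below is only an illustration: the names `utility`, `u`, and `unknown_entries` are made up here, and the placement of each rating in a particular cell is a hypothetical choice, not taken from the slide.

```python
# Sparse utility matrix u: X x S -> R stored as a dict keyed by (customer, item).
# The ratings and their cell positions below are hypothetical placeholders.
customers = ["Alice", "Bob", "Carol", "David"]
items = ["Avatar", "LOTR", "Matrix", "Pirates"]

utility = {
    ("Alice", "Avatar"): 1.0,
    ("Alice", "Matrix"): 0.2,
    ("Bob", "LOTR"): 0.5,
    ("Bob", "Pirates"): 0.3,
    ("Carol", "Avatar"): 0.2,
    ("Carol", "Matrix"): 1.0,
    ("David", "Pirates"): 0.4,
}


def u(customer, item):
    """Return the known rating, or None when the entry is unknown."""
    return utility.get((customer, item))


def unknown_entries():
    """List the (customer, item) pairs a recommender would have to extrapolate."""
    return [(x, s) for x in customers for s in items if (x, s) not in utility]


# Fraction of the matrix that is actually observed.
known_fraction = len(utility) / (len(customers) * len(items))
```

Storing only the known entries keeps memory proportional to the number of ratings rather than to |X| × |S|.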

Key Problems

• (1) Gathering “known” ratings for matrix
  • How to collect the data in the utility matrix
• (2) Extrapolate unknown ratings from known ones: MAIN LEARNING PROBLEM
  • Mainly interested in high unknown ratings
• (3) Evaluating extrapolation methods
  • How to measure success/performance of recommendation methods

(1) Gathering Ratings

• Explicit
  • Ask people to rate items
  • Doesn’t work well in practice: people can’t be bothered
  • Crowdsourcing: Pay people to label items
• Implicit
  • Learn ratings from user actions
    • E.g., purchase implies high rating
  • What about low ratings?

(2) Extrapolating Utilities

• Key problem: Utility matrix U is sparse
  • Most people have not rated most items
  • Cold start:
    • New items have no ratings
    • New users have no history
• Three approaches to recommender systems (today!):
  1. Content-based
  2. Collaborative
  3. Latent factor based

Content-based Recommender Systems

Slides adapted from Jure Leskovec

Content-based Recommendations

• Main idea: Recommend items to customer x similar to previous items rated highly by x

Example:
• Movie recommendations
  • Recommend movies with same actor(s), director, genre, …
• Websites, blogs, news
  • Recommend other sites with “similar” content

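Plan of Action

[Figure: from the items a user likes (e.g., red, circles, triangles), build item profiles; aggregate them into a user profile; match the user profile against item profiles; recommend the matching items.]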

Item Profiles

• For each item, create an item profile
• Profile is a set (vector) of features
  • Movies: author, genre, director, actors, year…
  • Text: Set of “important” words in document
• How to pick important features?
  • TF-IDF (Term frequency * Inverse Doc Frequency)

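[Figure: example item profiles as feature vectors. Features include Actor A, Actor B, …, Johnny Depp, Pirate Genre, Spy Genre, Comic Genre (0/1 indicators) and Avg Rating.
Movie X: 0 1 1 0 1 1 0 1 3
Movie Y: 1 1 0 1 0 1 1 0 4]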
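To make the TF-IDF idea concrete for text items, here is a minimal sketch. It assumes one common variant (term frequency normalized by the most frequent word in the document, and a base-2 log for the inverse document frequency); the slide does not fix a particular formula, and the three toy documents and the function name `tf_idf` are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} profile per document (docs are non-empty word lists)."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each word appear?
    df = Counter(word for doc in docs for word in set(doc))
    profiles = []
    for doc in docs:
        counts = Counter(doc)
        max_count = max(counts.values())
        profiles.append({
            # TF (normalized count) times IDF (rarity across documents).
            word: (count / max_count) * math.log2(n_docs / df[word])
            for word, count in counts.items()
        })
    return profiles


# Hypothetical toy documents.
docs = [
    "pirate ship pirate treasure".split(),
    "spy gadget ship".split(),
    "comic hero spy".split(),
]
profiles = tf_idf(docs)

# The highest-scoring words form the item profile of the first document.
top_words = sorted(profiles[0], key=profiles[0].get, reverse=True)[:2]
```

In this toy corpus "ship" scores lower than "treasure" for the first document because it also appears in another document, which is the effect the IDF factor is meant to capture.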

User Profiles

• Want a vector with the same components/dimensions as items
  • Could be 1s representing user purchases
  • Or arbitrary numbers from a rating
• User profile is aggregate of items:
  • Average (weighted?) of rated item profiles

[Figure: example user profile vector. User U: 0.2 .005 0 0 …; features include Actor A, Actor B, …, Natalie Portman.]
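The "average (weighted?)" aggregation could look like the following sketch, which weights each rated item's profile by the user's rating. The feature names, the two item vectors, and the ratings are hypothetical, and weighting by the raw rating is just one possible choice.

```python
def user_profile(item_profiles, ratings):
    """Rating-weighted average of the profiles of the items a user has rated."""
    total = sum(ratings.values())
    dim = len(next(iter(item_profiles.values())))
    profile = [0.0] * dim
    for item, rating in ratings.items():
        for t, value in enumerate(item_profiles[item]):
            profile[t] += rating * value / total
    return profile


# Hypothetical features and item profiles (0/1 indicators).
features = ["Actor A", "Actor B", "Pirate Genre", "Spy Genre"]
item_profiles = {
    "Movie X": [0, 1, 1, 0],
    "Movie Y": [1, 1, 0, 1],
}
# Hypothetical ratings given by one user.
ratings = {"Movie X": 3, "Movie Y": 1}

profile = user_profile(item_profiles, ratings)
```

The result lives in the same feature space as the item profiles, which is what allows a direct comparison between a user and an item.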

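Prediction

• User and item vectors have the same components/dimensions; recommend the items whose vectors are most similar to the user vector!
• Given user profile x and item profile i, estimate
  $u(\mathbf{x}, \mathbf{i}) = \cos(\mathbf{x}, \mathbf{i}) = \frac{\mathbf{x} \cdot \mathbf{i}}{\|\mathbf{x}\| \cdot \|\mathbf{i}\|}$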
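A small sketch of this prediction rule: compute the cosine between a user profile and each item profile, then rank items by it. The vectors below are hypothetical, and returning 0 for an all-zero vector is a convention chosen here, not something the slide specifies.

```python
import math


def cosine(x, i):
    """cos(x, i) = x . i / (||x|| * ||i||); defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, i))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in i))
    return dot / norm if norm else 0.0


# Hypothetical user profile and item profiles over the same four features.
user = [0.0, 1.0, 1.0, 0.0]
item_profiles = {
    "Movie X": [0, 1, 1, 0],
    "Movie Y": [1, 1, 0, 1],
}

scores = {name: cosine(user, vec) for name, vec in item_profiles.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because cosine ignores vector length, an item with many features switched on is not favored over a closely matching item with few.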

Pros

• +: No need for data on other users
  • No cold-start or sparsity problems
• +: Able to recommend to users with unique tastes
• +: Able to recommend new & unpopular items
  • No first-rater problem
• +: Able to provide explanations
  • Can provide explanations of recommended items by listing content-features that caused an item to be recommended

Cons

• –: Finding the appropriate features is hard
  • E.g., images, movies, music
• –: Recommendations for new users
  • How to build a user profile?
• –: Overspecialization
  • Never recommends items outside user’s content profile
  • People might have multiple interests
  • Unable to exploit quality judgments of other users

Collaborative Filtering

Harnessing quality judgments of other users

Collaborative Filtering

• Consider user x
• Find set N of other users whose ratings are “similar” to x’s ratings
• Estimate x’s ratings based on ratings of users in N

[Figure: user x and the set N of similar users.]
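The three steps above can be sketched as follows. The slide leaves the notion of “similar” open; this sketch assumes cosine similarity over sparse rating dictionaries and a similarity-weighted average over the k most similar users who rated the item. All user names, item names, and ratings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def predict(ratings, x, item, k=2):
    """Estimate x's rating of item from the k most similar users who rated it."""
    # Step 2: find the neighborhood N among users who have rated the item.
    neighbors = sorted(
        ((cosine(ratings[x], r), user) for user, r in ratings.items()
         if user != x and item in r),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # no usable neighbors
    # Step 3: similarity-weighted average of the neighbors' ratings.
    return sum(sim * ratings[user][item] for sim, user in neighbors) / total


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "x": {"A": 5, "B": 4},
    "u1": {"A": 5, "B": 4, "C": 5},
    "u2": {"A": 1, "B": 5, "C": 1},
    "u3": {"B": 2, "C": 3},
}

estimate_k1 = predict(ratings, "x", "C", k=1)
estimate_k2 = predict(ratings, "x", "C", k=2)
```

With k=1 the estimate simply copies the rating of the single most similar user; larger k trades that sharpness for robustness.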

MAIN: a methodological learning-based approach

[Figure: sparse 0/1 preference matrix with axes users and movies; most entries are unobserved.]

How many factors determine preference?

The low-rank assumption

[Figure: the n × m 0/1 preference matrix ≈ (n × k matrix) × (k × m matrix).]

“Preference is determined by k factors”, usually k ∈ {5, …, 10}

For every entry in the preference matrix,

$M_{ij} = v_i \cdot u_j = \sum_{t=1}^{k} v_i(t)\, u_j(t)$

where $v_i, u_j \in \mathbb{R}^k$.
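As an illustration of the formula, the sketch below builds every entry of M from one k-dimensional vector per user and one per movie. The vectors are hypothetical and chosen with k = 2 so the result is easy to check by hand.

```python
def dot(v, u):
    """Inner product: sum over t of v(t) * u(t)."""
    return sum(a * b for a, b in zip(v, u))


def full_matrix(V, U):
    """M[i][j] = v_i . u_j for every user vector v_i and movie vector u_j."""
    return [[dot(v, u) for u in U] for v in V]


# Hypothetical factors with k = 2.
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]              # 3 users
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # 4 movies

M = full_matrix(V, U)
```

Here the first user only likes movies that load on the first factor, the second user only those on the second factor, and the third user likes both kinds.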

Example: rank 1 and its benefits

[Figure: the n × m preference matrix ≈ (n × 1 column) × (1 × m row), i.e. k = 1.]

After observing (m+n) entries, can compute the entire matrix!

For every entry in the preference matrix, $M_{ij} = v_i \cdot u_j$, where $v_i, u_j \in \mathbb{R}$ are scalars.

How many unknowns? How many observations are needed to complete the matrix?

(Food for thought: relate to statistical learning theory: sample complexity?)
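A small sketch of why so few entries can suffice when k = 1. It assumes a specific, convenient observation pattern (the whole first row and first column, with a nonzero corner entry); then $M_{ij} = M_{i1} M_{1j} / M_{11}$, because the scalars cancel. The function name and the numbers are hypothetical.

```python
def complete_rank1(first_row, first_col):
    """Fill in a rank-1 matrix from its first row and first column.

    Uses M[i][j] = M[i][0] * M[0][j] / M[0][0], which follows from
    M[i][j] = v_i * u_j when the corner entry M[0][0] is nonzero.
    """
    pivot = first_row[0]
    if pivot == 0:
        raise ValueError("corner entry must be nonzero")
    return [[c * r / pivot for r in first_row] for c in first_col]


# Hypothetical rank-1 matrix generated by scalars v_i and u_j.
v = [1, 2, 3]
u = [2, 4, 6, 8]
truth = [[vi * uj for uj in u] for vi in v]

# Observe only the first row and the first column (n + m - 1 entries).
first_row = truth[0]
first_col = [row[0] for row in truth]

completed = complete_rank1(first_row, first_col)
```

Other observation patterns can work too, but not every set of m + n entries does; for example, entries that all lie in one row say nothing about the other rows.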

The matrix completion approach

[Figure: the n × m 0/1 preference matrix ≈ (n × k matrix) × (k × m matrix).]

Solve for u, v:

$\min_{v_i, u_j} \sum_{(i,j)\ \text{observed}} (M_{ij} - v_i \cdot u_j)^2$

where $v_i, u_j \in \mathbb{R}^k$.

Total of k(m+n) variables.

An algorithm for predicting recommendations

Input: observations of preferences $M_{ij}$ for $\{(i_1, j_1), (i_2, j_2), \ldots, (i_m, j_m)\}$ (m numbers in the range [0,1])

Output: A matrix $M \in \mathbb{R}^{\text{users} \times \text{movies}}$ that has all predicted preferences

Assumption: there exist low dimensional vectors $\{v_i, u_j\}$ such that $M_{ij} = v_i \cdot u_j$

Algorithm: Gradient descent! Objective function:

$f(\{u, v\}) = \sum_{(i,j)\ \text{observed}} (M_{ij} - v_i \cdot u_j)^2$

What do we do with the vectors?

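Spelling out GD in this case

GD for matrix completion:

• Initialize $v_i, u_j$ randomly
• For iteration = 1, 2, … do:
  • Update $v \leftarrow v - \eta \frac{\partial}{\partial v} f(\{v_i, u_j\})$ for all vectors $v_i, u_j$
    Spelling it out, for each coordinate t of vector $v_a$:
    $\frac{\partial}{\partial v_a(t)} f = -2 \sum_{j:\ (a,j)\ \text{observed}} (M_{aj} - v_a \cdot u_j) \cdot u_j(t)$
    Thus,
    $\forall a, t:\quad v_a(t) \leftarrow v_a(t) + \eta \cdot 2 \sum_{j:\ (a,j)\ \text{observed}} (M_{aj} - v_a \cdot u_j) \cdot u_j(t)$
    The update for each $u_j$ is symmetric, with the roles of v and u exchanged.
  • If needed, normalize each vector, $v \leftarrow \frac{v}{\max\{1, \|v\|\}}$
• End For
• Return final (or average of last few) vector solutions

where the objective is

$f(\{u, v\}) = \sum_{(i,j)\ \text{observed}} (M_{ij} - v_i \cdot u_j)^2$

[Figure: the n × m 0/1 preference matrix ≈ (n × k matrix) × (k × m matrix).]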
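The loop above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the course's reference implementation: the 4 × 4 block matrix, the held-out cells, k = 2, the step size 0.02, the iteration count, and the helper names (`gd_step`, `clip`, `init_vectors`) are all choices made here.

```python
import random


def dot(v, u):
    return sum(a * b for a, b in zip(v, u))


def loss(observed, V, U):
    """f({u, v}): squared error summed over the observed entries only."""
    return sum((m - dot(V[i], U[j])) ** 2 for (i, j), m in observed.items())


def clip(v):
    """Optional normalization step: v <- v / max{1, ||v||}."""
    scale = max(1.0, sum(a * a for a in v) ** 0.5)
    return [a / scale for a in v]


def init_vectors(count, k, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(count)]


def gd_step(observed, V, U, eta):
    """One full-gradient update of all user vectors V and movie vectors U."""
    k = len(V[0])
    grad_V = [[0.0] * k for _ in V]
    grad_U = [[0.0] * k for _ in U]
    for (i, j), m in observed.items():
        err = m - dot(V[i], U[j])
        for t in range(k):
            grad_V[i][t] -= 2.0 * err * U[j][t]  # d f / d v_i(t)
            grad_U[j][t] -= 2.0 * err * V[i][t]  # d f / d u_j(t)
    new_V = [clip([a - eta * g for a, g in zip(v, gv)]) for v, gv in zip(V, grad_V)]
    new_U = [clip([a - eta * g for a, g in zip(u, gu)]) for u, gu in zip(U, grad_U)]
    return new_V, new_U


# Hypothetical 0/1 preferences: two groups of users, two groups of movies.
full = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
held_out = {(0, 1), (1, 2), (2, 3), (3, 0)}  # cells treated as unobserved
observed = {(i, j): float(full[i][j])
            for i in range(4) for j in range(4) if (i, j) not in held_out}

rng = random.Random(0)
V, U = init_vectors(4, 2, rng), init_vectors(4, 2, rng)
loss_before = loss(observed, V, U)
for _ in range(3000):
    V, U = gd_step(observed, V, U, eta=0.02)
loss_after = loss(observed, V, U)

# Predicted preferences for every cell, including the held-out ones.
predicted = [[dot(v, u) for u in U] for v in V]
```

The gradient only touches observed cells; the held-out cells are filled in purely by the learned vectors, which is where the low-rank assumption does its work.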

Predicting meta-data from rec. data

Person vector!

Movie/song vector!

Predicting meta-data from rec. data [Esther Rolf ’15]

[Figure: 0/1 preference matrix with axes users and movies, and a binary vector 1 0 0 1 1 1 0 1.]

Gender? Annual income? Will buy product X?

Implications to user privacy, security, …

Evaluation

[Figure: matrix of known ratings on a 1-5 scale, with axes users and movies.]

Evaluation

[Figure: the same ratings matrix with some known ratings withheld (shown as ?); the withheld entries form the Test Data Set.]

Evaluating Predictions

• Compare predictions with known ratings
  • Root-mean-square error (RMSE)
    • $\sqrt{\frac{1}{N} \sum_{xi} (r_{xi} - r^*_{xi})^2}$ where $r_{xi}$ is predicted, $r^*_{xi}$ is the true rating of x on i, and N is the number of test ratings
• Narrow focus on accuracy sometimes misses the point
  • Prediction Diversity
  • Prediction Context
  • Order of predictions
• In practice, we care only to predict high ratings:
  • RMSE might penalize a method that does well for high ratings and badly for others
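A minimal sketch of the RMSE computation on a held-out test set. The (user, item) keys and the ratings are hypothetical; the only assumption about the data layout is that predictions exist for every test entry.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error over the test entries listed in actual."""
    squared = [(predicted[key] - actual[key]) ** 2 for key in actual]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical true test ratings r*_xi and predictions r_xi.
actual = {("x", "a"): 4, ("x", "b"): 2, ("y", "a"): 5}
predicted = {("x", "a"): 3, ("x", "b"): 2, ("y", "a"): 4}

error = rmse(predicted, actual)
```

Every test entry contributes equally here, which is exactly why RMSE can penalize a method that is accurate on high ratings but poor on low ones.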

Famous Historical Example: The Netflix Prize

• Training data
  • 100 million ratings, 480,000 users, 17,770 movies
  • 6 years of data: 2000-2005
• Test data
  • Last few ratings of each user (2.8 million)
  • Evaluation criterion: root mean squared error (RMSE)
  • Netflix Cinematch RMSE: 0.9514
• Competition
  • 2700+ teams
  • $1 million prize for 10% improvement on Cinematch
  • BellKor system won in 2009. Combined many factors
    • Overall deviations of users/movies
    • Regional effects
    • Local collaborative filtering patterns
    • Temporal biases

Summary: Recommendation Systems

• The Long Tail
• Content-based Systems
• Collaborative Filtering (touched)
• Latent Factors
• Food for thought: sample complexity?
