Dynamic Covering for Recommendation Systems
Ioannis Antonellis, Anish Das Sarma, Shaddin Dughmi
Outline
• Covering & Recommendations
• Succinct Dynamic Covering
• Results:
  o Upper Bounds
  o Lower Bounds
Max k-cover Problem
• Input:
  o integer k
  o items: X = {1,2, ..., n}
  o sets: I = {S1, ..., Sm}, Si subset of X
• Output: Find a subset of I of size at most k that maximizes the number of covered items
[Figure: bipartite graph connecting sets A, B, C to items 1 to 5]
k=1, Solution: A (size=3)
k=2, Solutions: A,C (size=4); A,B (size=4); B,C (size=4)
Max k-cover Problem
• NP-hard
• Greedy Algorithm
  o pick the set that covers the most uncovered items
  o iterate until k sets are chosen
• Greedy achieves a 1 - ((k-1)/k)^k >= 1 - 1/e ≈ 0.63 approximation
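The greedy rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `greedy_max_k_cover` and the memberships of A, B, C are hypothetical, chosen only so that the solution sizes match the ones stated on the slide.

```python
from typing import Dict, Hashable, List, Set, Tuple


def greedy_max_k_cover(
    sets: Dict[Hashable, Set[int]], k: int
) -> Tuple[List[Hashable], Set[int]]:
    """Pick up to k sets, each time taking the set that adds the most uncovered items."""
    chosen: List[Hashable] = []
    covered: Set[int] = set()
    for _ in range(k):
        best_name, best_gain = None, 0
        for name, members in sets.items():
            if name in chosen:
                continue
            gain = len(members - covered)
            if gain > best_gain:  # strict ">" keeps the earliest set on ties
                best_name, best_gain = name, gain
        if best_name is None:  # no remaining set adds coverage
            break
        chosen.append(best_name)
        covered |= sets[best_name]
    return chosen, covered


# Hypothetical memberships (assumption): the slide only gives the solution sizes.
example_sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {2, 5}}
picked, covered = greedy_max_k_cover(example_sets, k=2)
print(picked, len(covered))  # ['A', 'B'] 4
```

With these memberships, greedy takes A first (3 items) and then B, which ties with C at one new item each.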
Max k-cover in Recommendations
• Alice views and rates movies
• Netflix would like to recommend new movies for Alice to watch
• Important problem:
  o Find users "similar" to Alice
  o Find users who cover a large set of Alice's likes and dislikes
Netflix example
• Each user is identified by the subset of movies they like or have viewed
• Alice likes {A, B, C}
• Fred likes {A, D}
• Bob likes {B, E}
• Ben likes {C, F}
• Jim likes {A, B, F}
• James likes {A, B, F}
Ben and Jim in conjunction cover all of Alice's likes
Fred, Bob and Ben in conjunction cover all of Alice's likes
Jim and James add the same value
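The three claims on this slide can be checked directly with set operations. The helper name `covered_part` is hypothetical; the like-sets are the ones listed on the slide.

```python
likes = {
    "Alice": {"A", "B", "C"},
    "Fred": {"A", "D"},
    "Bob": {"B", "E"},
    "Ben": {"C", "F"},
    "Jim": {"A", "B", "F"},
    "James": {"A", "B", "F"},
}


def covered_part(query, users):
    """Return the part of `query` covered by the union of the given users' sets."""
    union = set()
    for user in users:
        union |= likes[user]
    return query & union


alice = likes["Alice"]
print(covered_part(alice, ["Ben", "Jim"]) == alice)          # True
print(covered_part(alice, ["Fred", "Bob", "Ben"]) == alice)  # True
# Jim and James hold the same set, so adding James after Jim gains nothing.
print(covered_part(alice, ["Jim"]) == covered_part(alice, ["Jim", "James"]))  # True
```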
k-covering vs nearest neighbor
• for k=1, equivalent (dot product similarity)
• covering allows for diversifying recommendations
• want to cover all genres liked by a user
  o consider a user that likes 100 thriller movies and 10 comedies
  o want "similar" users to cover as many movies as possible
  o k-nearest neighbor attempts to find many similar users, not to cover as many movies as possible
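A small sketch of the contrast, reusing the like-sets from the Netflix example with Alice's likes as the query. The function names are hypothetical. For 0/1 indicator vectors the dot product equals the intersection size, which is why the two notions coincide at k=1.

```python
users = {
    "Fred": {"A", "D"},
    "Bob": {"B", "E"},
    "Ben": {"C", "F"},
    "Jim": {"A", "B", "F"},
    "James": {"A", "B", "F"},
}
query = {"A", "B", "C"}  # Alice's likes


def dot_similarity(user_set, q):
    """Dot product of 0/1 indicator vectors, i.e. the intersection size."""
    return len(user_set & q)


def top_k_neighbors(k):
    """k most similar users, ignoring overlap between them (stable on ties)."""
    ranked = sorted(users, key=lambda u: -dot_similarity(users[u], query))
    return ranked[:k]


def greedy_cover(k):
    """k users chosen greedily to cover as much of the query as possible."""
    chosen, remaining = [], set(query)
    for _ in range(k):
        best = max(users, key=lambda u: len(users[u] & remaining))
        if not users[best] & remaining:
            break
        chosen.append(best)
        remaining -= users[best]
    return chosen


def coverage(names):
    """Number of query items covered by the named users together."""
    return len(query & set().union(*(users[n] for n in names)))


print(top_k_neighbors(2), coverage(top_k_neighbors(2)))  # ['Jim', 'James'] 2
print(greedy_cover(2), coverage(greedy_cover(2)))        # ['Jim', 'Ben'] 3
```

The two nearest neighbors are redundant (both miss C), while the covering choice trades one of them for Ben and covers all of Alice's likes.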
oDesk example
• Online labor marketplace
• clients post jobs and/or invite contractors
• contractors apply to jobs
• Contractor recommendations for clients
  o Bob invites/interviews/hires contractors
  o find clients "similar" to Bob
• Job recommendations for contractors
  o Alice applies to jobs
  o find contractors "similar" to Alice
Succinct Dynamic Covering (SDC)
• Input:
  o integer k
  o items: X = {1,2, ..., n}
  o sets: I = {S1, ..., Sm}, Si subset of X
  o query Q subset of X
• Output: Find a subset of I of size at most k that maximizes the number of covered items in query Q
• However, we further constrain the problem:
  o space-constrained: statically preprocess (X,I) and store a small sketch, much smaller than O(mn)
  o dynamic: Q is not known a priori during the sketch creation
Notice two twists
• dynamic
  o for each user, the set of movies that need to be covered is different
  o covering is not static
• space-constrained
  o real-time, interactive recommendations
  o the whole Netflix graph is huge: 10 million users, 100k movies, and popular movies have been viewed many times
  o cannot process the entire graph at query time
Ad serving
• online advertisers
  o bid on webpages matching relevancy criteria
  o target certain user demographics
When a user visits a page
• Ad servers:
  o have some (imprecise) idea about the demographic of the user (e.g. from click logs)
  o try to pick a set of ads that cover many user demographics
  o need to solve the SDC problem
Ad serving
• space-constrained:
  o set system consists of users, webpages and clicks
• dynamic:
  o each user view of each page is associated with a different user demographic
[Figure: bipartite graph connecting ads A, B, C to webpages 1 to 5, with the user-visited pages marked]
Coverage Oracle
• Offline stage:
  o Input: integer k; items: X = {1,2, ..., n}; sets: I = {S1, ..., Sm}, Si subset of X
  o Output: Data Structure D
• Dynamic stage:
  o Input: Query Q subset of X
  o Output: use D to find a subset of I of size at most k that maximizes the number of covered items in query Q
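The two stages map naturally onto a constructor and a query method. The class below is only a baseline sketch of that interface: it stores the whole set system, so it uses O(mn) space and is not succinct. The class name and the greedy query rule are assumptions for illustration, not the oracle from the talk.

```python
from typing import Dict, Hashable, Iterable, List, Set


class NaiveCoverageOracle:
    """Baseline oracle: keeps the full set system, so it is NOT succinct."""

    def __init__(self, sets: Dict[Hashable, Set[Hashable]], k: int) -> None:
        # Offline stage: here the data structure D is just a copy of the input.
        self._sets = {name: frozenset(members) for name, members in sets.items()}
        self._k = k

    def query(self, q: Iterable[Hashable]) -> List[Hashable]:
        # Dynamic stage: greedy cover restricted to the items of Q.
        remaining = set(q)
        chosen: List[Hashable] = []
        while remaining and len(chosen) < self._k:
            candidates = [n for n in self._sets if n not in chosen]
            if not candidates:
                break
            best = max(candidates, key=lambda n: len(self._sets[n] & remaining))
            if not self._sets[best] & remaining:
                break  # no unchosen set covers anything still missing
            chosen.append(best)
            remaining -= self._sets[best]
        return chosen


oracle = NaiveCoverageOracle(
    {
        "Fred": {"A", "D"},
        "Bob": {"B", "E"},
        "Ben": {"C", "F"},
        "Jim": {"A", "B", "F"},
        "James": {"A", "B", "F"},
    },
    k=2,
)
print(oracle.query({"A", "B", "C"}))  # ['Jim', 'Ben']
```

A succinct oracle would keep the same two-method shape but replace the stored copy with a much smaller sketch.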
Outline
• Covering & Recommendations
• Succinct Dynamic Covering
• Results:
  o Upper Bounds
  o Lower Bounds
Results
• given space limitations
  o interested in approximate solutions for SDC
• space vs approximation ratio tradeoffs
• ε in [0, 1/2]
• δ1, δ2: non-negative integers, not both zero
Simple Deterministic Algorithm
• For every item, "remember" one set
• break ties arbitrarily
• m/k approximation, linear space
[Figure: two bipartite graphs of sets and items]
k=2: OPT = 16, APPROX = 8, ratio = 16/8 = 2
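A minimal sketch of the "remember one set per item" idea. The slide does not say how a query is answered from the remembered sets, so the rule in `answer_query` (return the k remembered sets hit by the most query items) is an assumption, as are both function names and the toy set system.

```python
from collections import Counter


def build_item_sketch(sets):
    """Remember one containing set per item (the first one seen; an arbitrary tie-break)."""
    sketch = {}
    for name, members in sets.items():
        for item in members:
            sketch.setdefault(item, name)
    return sketch


def answer_query(sketch, query, k):
    """Assumed query rule: the k remembered sets hit by the most items of Q."""
    votes = Counter(sketch[item] for item in query if item in sketch)
    return [name for name, _ in votes.most_common(k)]


# Hypothetical toy set system.
toy_sets = {"S1": {1, 2, 3}, "S2": {3, 4}, "S3": {5}}
sketch = build_item_sketch(toy_sets)
print(len(sketch))                           # 5, one entry per item
print(answer_query(sketch, {1, 2, 4}, k=2))  # ['S1', 'S2']
```

The sketch holds one entry per item, which is where the linear space bound comes from.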
Better Deterministic Algorithm
• Find unchosen set containing the most uncovered items. Iterate.
• similar to previous algorithm, order is fixed
• sqrt(n/k) approximation, linear space
[Figure: two bipartite graphs of sets and items]
k=2: OPT = 16, APPROX = 16, ratio = 16/16 = 1
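One way to read this slide: run greedy over the whole universe once, offline, and let each item remember the set that first covered it in that fixed order. The sketch below does only that preprocessing step; the function name and the toy sets are hypothetical, and how the stored map is used at query time is not specified on the slide.

```python
def build_greedy_sketch(sets):
    """Assign each item to the first set that covers it in greedy order."""
    uncovered = set().union(*sets.values())
    unchosen = dict(sets)
    sketch, order = {}, []
    while uncovered and unchosen:
        # Unchosen set containing the most uncovered items (earliest wins ties).
        name = max(unchosen, key=lambda s: len(unchosen[s] & uncovered))
        newly = unchosen.pop(name) & uncovered
        if not newly:
            break
        order.append(name)
        for item in newly:
            sketch[item] = name
        uncovered -= newly
    return sketch, order


# Hypothetical toy set system.
toy_sets = {"S1": {1, 2}, "S2": {2, 3, 4, 5}, "S3": {5, 6}}
sketch, order = build_greedy_sketch(toy_sets)
print(order)      # ['S2', 'S1', 'S3']
print(sketch[2])  # S2
```

Item 2 belongs to both S1 and S2; the greedy order assigns it to the larger S2, whereas an arbitrary tie-break could have remembered S1.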
Randomized Algorithm
• m^ε/sqrt(k) approximation
• n·m^(1-2ε) space
• Find unchosen set containing at least n/(m^ε·sqrt(k)) uncovered items. Choose and iterate.
• For every remaining unchosen set, choose n/m^(2ε) items uniformly at random from the uncovered items
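A rough sketch of the two preprocessing phases, under stated assumptions: the threshold counts uncovered items, each remaining set is sampled from its own uncovered items, and the sample is capped at the size of that pool. The function name, the `seed` parameter, and the toy instance are hypothetical; the query procedure is not given on the slide and is not shown.

```python
import math
import random


def build_randomized_sketch(sets, n, k, eps, seed=0):
    """Phase 1 keeps 'heavy' sets; phase 2 keeps a random sample per remaining set."""
    rng = random.Random(seed)
    m = len(sets)
    threshold = n / (m ** eps * math.sqrt(k))
    sample_size = math.ceil(n / m ** (2 * eps))
    uncovered = set().union(*sets.values())
    unchosen = dict(sets)
    heavy = []
    # Phase 1: repeatedly choose a set with at least `threshold` uncovered items.
    while unchosen:
        name = max(unchosen, key=lambda s: len(unchosen[s] & uncovered))
        if len(unchosen[name] & uncovered) < threshold:
            break
        heavy.append(name)
        uncovered -= unchosen.pop(name)
    # Phase 2 (assumption): sample from each remaining set's own uncovered items.
    samples = {}
    for name, members in unchosen.items():
        pool = sorted(members & uncovered)
        samples[name] = set(rng.sample(pool, min(sample_size, len(pool))))
    return heavy, samples


# Hypothetical toy instance: n = 16 items, m = 4 sets, k = 4, eps = 1/2,
# so the threshold is 16 / (2 * 2) = 4 and the sample size is 16 / 4 = 4.
toy_sets = {
    "S1": set(range(0, 8)),
    "S2": set(range(8, 12)),
    "S3": {12, 13},
    "S4": {0, 14, 15},
}
heavy, samples = build_randomized_sketch(toy_sets, n=16, k=4, eps=0.5)
print(heavy)  # ['S1', 'S2']
```

Each of the at most m remaining sets stores at most n/m^(2ε) item identifiers, which matches the n·m^(1-2ε) space bound on the slide.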
Lower Bound
• holds for deterministic oracles only
• proof somewhat involved, uses the probabilistic method
• matches randomized upper bound
• Open problem: randomized lower bound
Related work
• distance oracles in graphs, Thorup and Zwick
• set cover in streaming model (sets are streams or items are streams)
• nearest neighbor (NN) search:
  o for k=1, SDC and NN are equivalent using the dot product similarity
  o no locality sensitive hashing for dot product (Charikar). So, no hope for signature schemes for SDC.
Summary
• Introduced Succinct Dynamic Covering problem
• Applications in many real-world recommendation systems
• approximation ratio and space tradeoffs
• Deterministic and Randomized upper bounds
• Deterministic lower bound
Thank you!