
©Sham Kakade 2016 1

Machine Learning for Big Data CSE547/STAT548, University of Washington

Sham Kakade

May 19, 2016

Collaborative Filtering

Matrix Completion

Alternating Least Squares

Case Study 4: Collaborative Filtering


Collaborative Filtering

• Goal: Find movies of interest to a user based on movies watched by the user and others

• Methods: matrix factorization


City of God

Wild Strawberries

The Celebration

La Dolce Vita

Women on the Verge of a Nervous Breakdown

What do I recommend???


Netflix Prize

• Given 100 million ratings on a scale of 1 to 5, predict 3 million ratings to highest accuracy

• 17770 total movies

• 480189 total users

• Over 8 billion possible ratings (17770 movies × 480189 users), so only about 1% are observed

• How to fill in the blanks?

Figures from Ben Recht

Matrix Completion Problem

• Filling missing data?

X = (figure: partially observed ratings matrix; rows index users, columns index movies)

X_ij known for black cells

X_ij unknown for white cells

Interpreting Low-Rank Matrix Completion (aka Matrix Factorization)

X = LR’

Identifiability of Factors

• If r_uv is described by L_u, R_v, what happens if we redefine the “topics” as

• Then,
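The slide leaves the redefinition blank. A minimal numerical sketch of the identifiability issue, assuming the redefinition is an invertible linear map A applied to the user factors with the inverse applied to the movie factors (A, L_new and R_new are illustrative names, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, k = 5, 4, 2

L = rng.normal(size=(n_users, k))   # row u is the user factor L_u
R = rng.normal(size=(n_movies, k))  # row v is the movie factor R_v

# Assumed redefinition of the "topics": any invertible k x k matrix A.
A = np.array([[2.0, 1.0],
              [0.0, 0.5]])

L_new = L @ A                     # L_u -> A^T L_u
R_new = R @ np.linalg.inv(A).T    # R_v -> A^{-1} R_v

# Every prediction L_u . R_v is unchanged, so the data cannot
# distinguish (L, R) from (L_new, R_new).
same_predictions = np.allclose(L @ R.T, L_new @ R_new.T)
```

Under this assumption the factors themselves differ while every predicted rating stays the same.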

X = LR’

Matrix Completion via Rank Minimization

• Given observed values:

• Find matrix

• Such that:

• But…

• Introduce bias:

• Two issues:


Approximate Matrix Completion

• Minimize squared error:

– (Other loss functions are possible)

• Choose rank k:

• Optimization problem:
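A hedged reading of the optimization problem named above, written in the X = LR’ notation of the earlier slides. The index set Ω of observed (u, v) pairs and the sizes n (users) and m (movies) are notation assumed here, not taken from the slide:

```latex
\min_{L \in \mathbb{R}^{n \times k},\; R \in \mathbb{R}^{m \times k}}
\;\sum_{(u,v) \in \Omega} \bigl( L_u \cdot R_v - r_{uv} \bigr)^2
```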


Coordinate Descent for Matrix Factorization

• Fix movie factors, optimize for user factors

• First observation:


Minimizing Over User Factors

• For each user u:

• In matrix form:

• Second observation: Solve by
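With the movie factors held fixed, the subproblem for a single user u is ordinary least squares. The sketch below assumes V_u denotes the movies rated by u, R_{V_u} stacks their factors as rows, and r_u collects the corresponding ratings; these symbols are introduced here for illustration:

```latex
\min_{L_u} \sum_{v \in V_u} \bigl( L_u \cdot R_v - r_{uv} \bigr)^2
\;=\; \min_{L_u} \bigl\| R_{V_u} L_u - r_u \bigr\|_2^2
\quad\Longrightarrow\quad
L_u = \bigl( R_{V_u}^\top R_{V_u} \bigr)^{-1} R_{V_u}^\top r_u
```

The closed form requires R_{V_u}^T R_{V_u} to be invertible, which fails when user u has rated fewer than k movies.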


Coordinate Descent for Matrix Factorization: Alternating Least-Squares

• Fix movie factors, optimize for user factors

– Independent least-squares over users

• Fix user factors, optimize for movie factors

– Independent least-squares over movies

• System may be underdetermined:

• Converges to
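The alternation above can be sketched in NumPy. This is a minimal illustration, not the course's reference implementation: the ridge penalty `lam` is one assumed way of handling underdetermined systems, and the names `als`, `mask` and `observed_rmse` are hypothetical.

```python
import numpy as np

def als(X, mask, k=2, lam=0.1, n_iters=20, seed=0):
    """Alternating least squares on the observed entries of X.

    X    : (n_users, n_movies) ratings; unobserved entries are ignored
    mask : boolean array of the same shape, True where a rating is observed
    lam  : ridge penalty (assumed here; keeps every k x k system solvable)
    """
    rng = np.random.default_rng(seed)
    n_users, n_movies = X.shape
    L = rng.normal(scale=0.1, size=(n_users, k))
    R = rng.normal(scale=0.1, size=(n_movies, k))
    I = np.eye(k)
    for _ in range(n_iters):
        # Fix movie factors R: one independent ridge regression per user.
        for u in range(n_users):
            Rv = R[mask[u]]                  # factors of movies rated by u
            L[u] = np.linalg.solve(Rv.T @ Rv + lam * I, Rv.T @ X[u, mask[u]])
        # Fix user factors L: one independent ridge regression per movie.
        for v in range(n_movies):
            Lu = L[mask[:, v]]               # factors of users who rated v
            R[v] = np.linalg.solve(Lu.T @ Lu + lam * I, Lu.T @ X[mask[:, v], v])
    return L, R

def observed_rmse(X, mask, L, R):
    """Root-mean-squared error on the observed entries only."""
    err = (L @ R.T - X)[mask]
    return float(np.sqrt(np.mean(err ** 2)))
```

Because the per-user and per-movie solves are independent, each inner loop could be run in parallel. With `lam > 0`, a user or movie with no observed ratings simply gets a zero factor.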


Effect of Regularization

X = LR’

What you need to know…

• Matrix completion problem for collaborative filtering

• Over-determined -> low-rank approximation

• Rank minimization is NP-hard

• Minimize least-squares prediction error on known values for a given rank of the matrix

– Must use regularization

• Coordinate descent algorithm = “Alternating Least Squares”


SGD for Matrix Completion, more algorithms

(Matrix-norm Minimization)

Case Study 4: Collaborative Filtering


Stochastic Gradient Descent

• Observe one rating at a time, r_uv

• Gradient observing r_uv:

• Updates:
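A minimal sketch of one such update, assuming squared error on the single observed rating plus an L2 penalty on the two factors involved. The step size `eta`, penalty `lam` and function name `sgd_step` are illustrative choices, not taken from the slide:

```python
import numpy as np

def sgd_step(L, R, u, v, r_uv, eta=0.05, lam=0.01):
    """One in-place SGD update after observing a single rating r_uv."""
    err = L[u] @ R[v] - r_uv             # prediction residual
    # Both gradients use the current (pre-update) factors;
    # the constant factor of 2 is absorbed into eta.
    grad_Lu = err * R[v] + lam * L[u]
    grad_Rv = err * L[u] + lam * R[v]
    L[u] -= eta * grad_Lu
    R[v] -= eta * grad_Rv
    return err
```

Only row u of L and row v of R change per observation, which is what makes the method cheap in a streaming setting.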


Local Optima v. Global Optima

• We are solving:

• We (kind of) wanted to solve:

• Which is NP-hard…

– How do these things relate???


Eigenvalue Decompositions for PSD Matrices

• Given a (square) symmetric positive semidefinite matrix:

– Eigenvalues:

• Thus rank is:

• Approximation:

• Property of trace:

• Thus, approximate rank minimization by:
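A small numerical check of the facts this slide relies on, on an assumed toy matrix: for a symmetric PSD matrix the rank is the number of nonzero eigenvalues, and the trace equals the sum of the eigenvalues. The tolerance 1e-9 is an arbitrary choice for "nonzero":

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 2))
X = B @ B.T                       # symmetric PSD, rank 2 by construction

eigvals = np.linalg.eigvalsh(X)   # real eigenvalues of a symmetric matrix
rank = int(np.sum(eigvals > 1e-9))            # count of nonzero eigenvalues
trace_equals_sum = bool(np.isclose(np.trace(X), eigvals.sum()))
```

Rank counts the nonzero eigenvalues while the trace sums them, which is the sense in which the trace can serve as a continuous surrogate for rank.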


Generalizing the Trace Trick

• Non-square matrices have no trace

• For (square) positive semidefinite matrices, eigendecomposition:

• For rectangular matrices, singular value decomposition:

• Nuclear norm:
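A short sketch of the nuclear norm as the sum of singular values, on an assumed random rectangular matrix. It also checks that for a PSD matrix the nuclear norm coincides with the trace:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                 # rectangular, so no trace

singular_values = np.linalg.svd(X, compute_uv=False)
nuclear_norm = singular_values.sum()        # sum of singular values

# For a PSD matrix the singular values are the eigenvalues,
# so the nuclear norm reduces to the trace.
P = X.T @ X
psd_nuclear = np.linalg.svd(P, compute_uv=False).sum()
```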


Nuclear Norm Minimization

• Optimization problem:

Possible to relax equality constraints:

Both are convex problems! (solved by semidefinite programming)
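A sketch of the two problems referred to above, assuming Ω is the set of observed entries, ‖·‖_* is the nuclear norm, and λ > 0 is a trade-off parameter (all notation assumed here). The first enforces the observations exactly; the second relaxes the equality constraints into a squared-error penalty:

```latex
\min_{X} \; \|X\|_* \quad \text{s.t.} \quad X_{uv} = r_{uv} \;\; \forall (u,v) \in \Omega
\qquad\text{and}\qquad
\min_{X} \; \sum_{(u,v) \in \Omega} \bigl( X_{uv} - r_{uv} \bigr)^2 + \lambda \|X\|_*
```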


Nuclear Norm Minimization vs. Direct (Bilinear) Low Rank Solutions

• Nuclear norm minimization:

– Annoying because:

• Instead:

– Annoying because:

– But

• So

• And

• Under certain conditions [Burer, Monteiro ‘04]
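The blanks on this slide are not filled in here. One standard fact that connects the two formulations, offered as a likely but unconfirmed reading, is that the nuclear norm of X equals the smallest value of ½(‖L‖_F² + ‖R‖_F²) over factorizations X = LR’. The sketch below checks that the balanced SVD factorization attains the nuclear norm and that a rescaled factorization of the same X pays a larger penalty:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
L = U * np.sqrt(s)        # scale columns of U by sqrt(singular values)
R = Vt.T * np.sqrt(s)     # scale columns of V likewise, so X = L R'

nuclear_norm = s.sum()
balanced_penalty = 0.5 * (np.sum(L ** 2) + np.sum(R ** 2))

# Same product X, but unbalanced factors: the Frobenius penalty grows.
L2, R2 = 2.0 * L, 0.5 * R
unbalanced_penalty = 0.5 * (np.sum(L2 ** 2) + np.sum(R2 ** 2))
```

Read this way, an L2 penalty on the factors in the bilinear problem plays the role of the nuclear norm penalty in the convex problem.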


Theory

• Suppose the true matrix is exactly low rank. What might we hope for?

– Exact recovery?

– Statistically?

– Computationally?

• Is this possible?

• Assumptions:


Analysis of Nuclear Norm

• Nuclear norm minimization = convex relaxation of rank minimization:

• Theorem [Candes, Recht ‘08]:

– If there is a true matrix of rank k,

– And, we observe at least

random entries of true matrix

– Then true matrix is recovered exactly with high probability via convex nuclear norm minimization!

• Under certain conditions


Alternating Minimization

• Alt. Min. is used in practice.

• Alt. Min. was widely thought to be a search heuristic. Does it work?


SGD

• SGD also widely used in practice (streaming, fast)

• Does it work?


What you need to know…

• Stochastic gradient descent for matrix factorization

• Norm minimization as convex relaxation of rank minimization

– Trace norm for PSD matrices

– Nuclear norm in general

• Intuitive relationship between nuclear norm minimization and direct (bilinear) minimization


Machine Learning for Big Data CSE547/STAT548, University of Washington

Emily Fox

May 6th, 2015

Nonnegative Matrix Factorization

Projected Gradient

Case Study 4: Collaborative Filtering


Matrix factorization solutions can be unintuitive…

• Many, many, many applications of matrix factorization

• E.g., in text data, can do topic modeling (alternative to LDA):

• Would like:

• But…

X = LR’

Nonnegative Matrix Factorization

• Just like before, but

• Constrained optimization problem

– Many, many, many, many solution methods… we’ll check out a simple one

X = LR’

Recall: Projected Gradient

• Standard optimization:

– Want to minimize:

– Use, e.g., gradient updates:

• Constrained optimization:

– Given convex set C of feasible solutions

– Want to find minima within C:

• Projected gradient:

– Take a gradient step (ignoring constraints):

– Projection into feasible set:
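The two-step recipe above can be sketched generically. The function name, fixed step size `eta` and the toy problem are assumptions for illustration; the toy feasible set is the nonnegative orthant, whose projection is an elementwise clip at zero:

```python
import numpy as np

def projected_gradient(grad, project, x0, eta=0.1, n_iters=200):
    """Alternate an unconstrained gradient step with a projection onto C."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - eta * grad(x)   # gradient step, ignoring constraints
        x = project(x)          # map back into the feasible set C
    return x

# Toy problem: minimize ||x - target||^2 subject to x >= 0.
target = np.array([1.0, -2.0])
x_star = projected_gradient(
    grad=lambda x: 2.0 * (x - target),
    project=lambda x: np.maximum(x, 0.0),
    x0=np.zeros(2),
)
```

The unconstrained minimizer would be `target` itself; the projection holds the second coordinate at the boundary of the feasible set.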


Projected Stochastic Gradient Descent for Nonnegative Matrix Factorization

• Gradient step observing r_uv, ignoring constraints:

• Convex set:

• Projection step:
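A minimal sketch of the projected update, assuming the convex set is the nonnegative orthant for both factors, so that the projection is an elementwise maximum with zero. The names `projected_sgd_step`, `eta` and `lam` are illustrative:

```python
import numpy as np

def projected_sgd_step(L, R, u, v, r_uv, eta=0.05, lam=0.01):
    """SGD step on one rating, then projection onto nonnegative factors."""
    err = L[u] @ R[v] - r_uv
    # Gradient step ignoring the constraints (both use the old factors).
    new_Lu = L[u] - eta * (err * R[v] + lam * L[u])
    new_Rv = R[v] - eta * (err * L[u] + lam * R[v])
    # Projection step: clip negative coordinates to zero.
    L[u] = np.maximum(new_Lu, 0.0)
    R[v] = np.maximum(new_Rv, 0.0)
    return err
```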


What you need to know…

• In many applications, want factors to be nonnegative

• Corresponds to constrained optimization problem

• Many possible approaches to solve, e.g., projected gradient


Cold Start Problem

Case Study 4: Collaborative Filtering


Cold-Start Problem

• Challenge: Cold-start problem (new movie or user)

• Methods: use features of movie/user


Cold-Start Problem More Formally

• Consider a new user u’ and predicting that user’s ratings

– No previous observations

– Objective considered so far:

– Optimal user factor:

– Predicted user ratings:
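A sketch of why the factorization model has nothing to say about a new user, assuming the objective carries a ridge penalty λ‖L_{u'}‖² with λ > 0 (the slide's own objective is left blank). With no observed ratings for u’, only the penalty term involves L_{u'}:

```latex
\hat{L}_{u'} = \arg\min_{L_{u'}} \; \lambda \, \| L_{u'} \|_2^2 = 0
\quad\Longrightarrow\quad
\hat{r}_{u'v} = \hat{L}_{u'} \cdot R_v = 0 \;\; \text{for every movie } v
```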


An Alternative Formulation

• A simpler model for collaborative filtering

– We would not have this issue if we assumed all users were identical

– What about for new movies? What if we had side information?

– What dimension should w be?

– Fit linear model:

– Minimize:
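A minimal sketch of the simpler model, assuming each movie v has a feature vector φ(v) of dimension d, that every user shares one weight vector w of that same dimension, and that w is fit by ridge regression on the observed ratings. The names `phi`, `movie_idx`, `fit_global_weights` and `predict_new_movie` are hypothetical:

```python
import numpy as np

def fit_global_weights(phi, movie_idx, ratings, lam=0.1):
    """Ridge regression of observed ratings on movie features.

    phi       : (n_movies, d) movie feature matrix (side information)
    movie_idx : movie index of each observed rating
    ratings   : observed ratings, same length as movie_idx
    """
    F = phi[movie_idx]                  # one feature row per observed rating
    d = phi.shape[1]
    return np.linalg.solve(F.T @ F + lam * np.eye(d), F.T @ ratings)

def predict_new_movie(w, features):
    """Predicted rating for a movie with no ratings, from features alone."""
    return float(features @ w)
```

Since the prediction depends only on features, a brand-new movie can be scored as soon as its features are known.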


Personalization

• If we don’t have any observations about a user, use wisdom of the crowd

– Address cold-start problem

• Clearly, not all users are the same

• Just as in personalized click prediction, consider model with global and user-specific parameters

• As we gain more information about the user, forget the crowd


User Features…

• In addition to movie features, may have information about the user:

• Combine with features of movie:

• Unified linear model:


Feature-Based Approach vs. Matrix Factorization

• Feature-based approach:

– Feature representation of user and movies fixed

– Can address cold-start problem

• Matrix factorization approach:

– Suffers from cold-start problem

– User & movie features are learned from data

• A unified model:


Unified Collaborative Filtering via SGD

• Gradient step observing r_uv

– For L, R:

– For w and w_u:
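A sketch of one unified update. The slides leave the unified model blank, so this assumes the additive form r_uv ≈ L_u · R_v + (w + w_u) · φ(u, v), with squared error and an optional L2 penalty; `W_user` (one row w_u per user), `phi_uv`, `eta` and `lam` are illustrative names:

```python
import numpy as np

def unified_sgd_step(L, R, w, W_user, phi_uv, u, v, r_uv, eta=0.05, lam=0.0):
    """One in-place SGD update for the assumed model
    r_uv ~ L[u].R[v] + (w + W_user[u]).phi_uv."""
    err = L[u] @ R[v] + (w + W_user[u]) @ phi_uv - r_uv
    # Compute every gradient from the current parameters before updating.
    grad_Lu = err * R[v] + lam * L[u]
    grad_Rv = err * L[u] + lam * R[v]
    grad_w = err * phi_uv + lam * w
    grad_wu = err * phi_uv + lam * W_user[u]
    L[u] -= eta * grad_Lu
    R[v] -= eta * grad_Rv
    w -= eta * grad_w               # global weights, shared by all users
    W_user[u] -= eta * grad_wu      # user-specific deviation
    return err
```

Every rating updates the global `w`, while `W_user[u]` moves only when user u is observed, so a new user starts from the crowd's prediction and drifts away from it as that user's ratings accumulate.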


What you need to know…

• Cold-start problem

• Feature-based methods for collaborative filtering

– Help address cold-start problem

• Unified approach
