Algorithmic Music Recommendations at Spotify

January 13, 2014. Algorithmic Music Discovery at Spotify. Chris Johnson (@MrChrisJohnson)

Upload: chris-johnson

Post on 08-Sep-2014

DESCRIPTION

In this presentation I introduce various machine learning methods that we use for music recommendations and discovery at Spotify. Specifically, I focus on Implicit Matrix Factorization for Collaborative Filtering, how to implement a small-scale version using Python, NumPy, and SciPy, and how to scale up to 24 million users and 20 million songs using Hadoop and Spark.

TRANSCRIPT

Page 1: Algorithmic Music Recommendations at Spotify

January 13, 2014

Algorithmic Music Discovery at Spotify

Chris Johnson (@MrChrisJohnson)

Monday, January 13, 14

Page 2: Algorithmic Music Recommendations at Spotify

Who am I?

• Chris Johnson
  – Machine Learning guy from NYC
  – Focused on music recommendations
  – Formerly a graduate student at UT Austin

Page 3: Algorithmic Music Recommendations at Spotify

What is Spotify?

• On-demand music streaming service
• “iTunes in the cloud”

Page 4: Algorithmic Music Recommendations at Spotify

Page 5: Algorithmic Music Recommendations at Spotify

Data at Spotify

• 20 Million songs
• 24 Million active users
• 6 Million paying users
• 8 Million daily active users
• 1 TB of compressed data generated from users per day
• 700-node Hadoop cluster
• 1 Million years' worth of music streamed
• 1 Billion user-generated playlists

Page 6: Algorithmic Music Recommendations at Spotify

Challenge: 20 Million songs... how do we recommend music to users?

Page 7: Algorithmic Music Recommendations at Spotify

Recommendation Features

• Discover (personalized recommendations)
• Radio
• Related Artists
• Now Playing

Page 8: Algorithmic Music Recommendations at Spotify

How can we find good recommendations?

• Manual Curation

• Manually Tag Attributes

• Audio Content, Metadata, Text Analysis

• Collaborative Filtering

Page 9: Algorithmic Music Recommendations at Spotify

Collaborative Filtering - “The Netflix Prize”

Page 10: Algorithmic Music Recommendations at Spotify

Collaborative Filtering

Hey, I like tracks P, Q, R, S!

Well, I like tracks Q, R, S, T!

Then you should check out track P!

Nice! Btw try track T!

Image via Erik Bernhardsson
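The exchange above can be sketched in a few lines of Python. This is only a toy illustration of the idea for a single pair of listeners, not Spotify's method; the helper name `swap_recommendations` and the variable names are hypothetical.

```python
def swap_recommendations(likes_a, likes_b):
    """Suggest to each listener the tracks only the other one likes.

    Hypothetical helper for illustration; real collaborative filtering
    aggregates over many listeners rather than a single pair.
    """
    shared = likes_a & likes_b            # overlap that makes the pair look similar
    for_a = sorted(likes_b - likes_a)     # tracks B likes that A has not mentioned
    for_b = sorted(likes_a - likes_b)     # tracks A likes that B has not mentioned
    return shared, for_a, for_b


first_listener = {"P", "Q", "R", "S"}
second_listener = {"Q", "R", "S", "T"}

shared, for_first, for_second = swap_recommendations(first_listener, second_listener)
print(sorted(shared), for_first, for_second)
```

The larger the shared set, the more weight one listener's remaining tracks would plausibly carry as suggestions for the other.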

Page 11: Algorithmic Music Recommendations at Spotify

Page 12: Algorithmic Music Recommendations at Spotify

Difference between movie and music recs

• Scale of catalog

60,000 movies vs. 20,000,000 songs

Page 13: Algorithmic Music Recommendations at Spotify

Difference between movie and music recs

• Repeated consumption

Page 14: Algorithmic Music Recommendations at Spotify

Difference between movie and music recs

• Music is more niche

Page 15: Algorithmic Music Recommendations at Spotify

“The Netflix Problem” vs. “The Spotify Problem”

• Netflix: Users explicitly “rate” movies

• Spotify: Feedback is implicit through streaming behavior

Page 16: Algorithmic Music Recommendations at Spotify

Page 17: Algorithmic Music Recommendations at Spotify

Explicit Matrix Factorization

[Figure: ratings matrix with axes labeled Users and Movies; example user Chris, example movie Inception]

• Users explicitly rate a subset of the movie catalog
• Goal: predict how users will rate new movies

Page 18: Algorithmic Music Recommendations at Spotify

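Explicit Matrix Factorization

• Approximate ratings matrix by the product of low-dimensional user and movie matrices
• Minimize RMSE (root mean squared error)

[Figure: ratings matrix (? = unrated) factored into user matrix X and movie matrix Y; example user Chris, example movie Inception]

? 3 5 ?
1 ? ? 1
2 ? 3 2
? ? ? 5
5 2 ? 4

• $r_{ui}$ = user $u$'s rating for movie $i$
• $x_u$ = user $u$'s latent factor vector
• $y_i$ = item $i$'s latent factor vector
• $\beta_u$ = bias for user $u$
• $\beta_i$ = bias for item $i$
• $\lambda$ = regularization parameter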
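The slide's formula did not survive extraction. A common way to write the objective these bullets describe is sketched below; it is the standard biased matrix factorization form and an assumption about what the slide showed. Here $r_{ui}$ is user $u$'s rating for movie $i$, $x_u$ and $y_i$ are the user and item latent factor vectors, $\beta_u$ and $\beta_i$ are the user and item biases, and $\lambda$ is the regularization parameter. The sum runs only over ratings that were actually observed.

```latex
\min_{x,\,y,\,\beta}
  \sum_{(u,i)\ \text{observed}}
    \left( r_{ui} - x_u^{\top} y_i - \beta_u - \beta_i \right)^{2}
  + \lambda \left( \sum_{u} \lVert x_u \rVert^{2} + \sum_{i} \lVert y_i \rVert^{2} \right)
```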

Page 19: Algorithmic Music Recommendations at Spotify

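Implicit Matrix Factorization

• Replace stream counts with binary labels
  – 1 = streamed, 0 = never streamed
• Minimize weighted RMSE (root mean squared error) using a function of stream counts as weights

[Figure: binary matrix factored into user matrix X and track matrix Y]

1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1

• $p_{ui}$ = 1 if user $u$ streamed track $i$, else 0
• $c_{ui}$ = weight for the pair, a function of user $u$'s stream count for track $i$
• $x_u$ = user $u$'s latent factor vector
• $y_i$ = item $i$'s latent factor vector
• $\beta_u$ = bias for user $u$
• $\beta_i$ = bias for item $i$
• $\lambda$ = regularization parameter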
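A minimal NumPy sketch of the weighted objective described on this slide follows. The slide only says the weights are "a function of stream counts"; the specific form `1 + alpha * count` used here, the value of `alpha`, and the function name `implicit_loss` are assumptions for illustration.

```python
import numpy as np


def implicit_loss(counts, X, Y, user_bias, item_bias, lam, alpha=40.0):
    """Weighted squared error over ALL (user, track) pairs plus an L2 penalty."""
    P = (counts > 0).astype(float)        # binary labels: 1 = streamed, 0 = never streamed
    C = 1.0 + alpha * counts              # assumed weighting that grows with stream count
    predictions = X @ Y.T + user_bias[:, None] + item_bias[None, :]
    weighted_error = np.sum(C * (P - predictions) ** 2)
    penalty = lam * (np.sum(X ** 2) + np.sum(Y ** 2))
    return weighted_error + penalty


# Toy stream counts: 3 users x 4 tracks.
counts = np.array([[3, 0, 0, 1],
                   [0, 5, 0, 0],
                   [2, 0, 7, 0]], dtype=float)

rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(3, 2))    # user latent factor vectors
Y = rng.normal(scale=0.1, size=(4, 2))    # track latent factor vectors
loss = implicit_loss(counts, X, Y, np.zeros(3), np.zeros(4), lam=0.1)
print(loss)
```

Unlike the explicit case, every cell of the matrix contributes to the loss: unstreamed pairs are treated as weak evidence of a 0 rather than as missing data.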

Page 20: Algorithmic Music Recommendations at Spotify

Alternating Least Squares

• Initialize user and item vectors to random noise
• Fix item vectors and solve for optimal user vectors
  – Take the derivative of the loss function with respect to the user’s vector, set it equal to 0, and solve
  – Results in a system of linear equations with a closed-form solution!
• Fix user vectors and solve for optimal item vectors
• Repeat until convergence

code: https://github.com/MrChrisJohnson/implicitMF
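The loop above can be sketched with dense NumPy arrays as follows. This is not the code from the linked repository; it is a small illustrative version that omits the bias terms and assumes weights of the form `1 + alpha * count`. All function names are hypothetical.

```python
import numpy as np


def solve_side(fixed, P, C, lam):
    """Solve for every vector on one side while the other side stays fixed.

    fixed: (n_fixed, f) vectors held constant.
    P, C:  (n_solve, n_fixed) binary labels and weights.
    """
    f = fixed.shape[1]
    solved = np.empty((P.shape[0], f))
    for row in range(P.shape[0]):
        weighted = fixed * C[row][:, None]             # each fixed vector scaled by its weight
        A = weighted.T @ fixed + lam * np.eye(f)       # left side of the linear system
        b = weighted.T @ P[row]                        # right side of the linear system
        solved[row] = np.linalg.solve(A, b)            # closed-form minimizer for this row
    return solved


def weighted_loss(P, C, X, Y, lam):
    return np.sum(C * (P - X @ Y.T) ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2))


def als(counts, n_factors=2, lam=0.1, alpha=40.0, n_iters=10, seed=0):
    rng = np.random.default_rng(seed)
    P = (counts > 0).astype(float)
    C = 1.0 + alpha * counts                           # assumed weighting
    X = rng.normal(scale=0.1, size=(counts.shape[0], n_factors))   # random noise init
    Y = rng.normal(scale=0.1, size=(counts.shape[1], n_factors))
    history = []
    for _ in range(n_iters):
        X = solve_side(Y, P, C, lam)                   # fix items, solve users
        Y = solve_side(X, P.T, C.T, lam)               # fix users, solve items
        history.append(weighted_loss(P, C, X, Y, lam))
    return X, Y, history


counts = np.array([[3, 0, 0, 1],
                   [0, 5, 0, 0],
                   [2, 0, 7, 0]], dtype=float)
X, Y, history = als(counts)
print(history)
```

Because each half-step exactly minimizes the loss for one side, the recorded loss should never increase from one iteration to the next; a fixed iteration count stands in for a proper convergence check here.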

Page 21: Algorithmic Music Recommendations at Spotify

Alternating Least Squares

• Note that: $Y^{\top} C^{u} Y = Y^{\top} Y + Y^{\top} (C^{u} - I) Y$
• Then, we can pre-compute $Y^{\top} Y$ once per iteration
  – $C^{u} - I$ and $p(u)$ only contain non-zero elements for tracks that the user streamed
  – Using sparse matrix operations we can then compute each user’s vector efficiently in $O(f^{2} n_u + f^{3})$ time, where $n_u$ is the number of tracks the user streamed and $f$ is the number of latent factors

code: https://github.com/MrChrisJohnson/implicitMF
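The pre-computation trick can be sketched with a SciPy CSR matrix as below. It is an illustrative version rather than the linked repository's code: it omits biases, assumes weights of the form `1 + alpha * count`, and uses the hypothetical function name `user_vector`.

```python
import numpy as np
from scipy import sparse


def user_vector(counts, u, Y, YtY, lam, alpha=40.0):
    """Solve for user u's vector, touching only the tracks that user streamed.

    counts: CSR matrix of stream counts (users x tracks).
    YtY:    Y.T @ Y, computed once per iteration and shared by all users.
    """
    start, end = counts.indptr[u], counts.indptr[u + 1]
    streamed = counts.indices[start:end]          # track ids with non-zero counts
    extra = alpha * counts.data[start:end]        # diagonal of (C_u - I) on those tracks
    Y_u = Y[streamed]                             # (n_u, f) slice of the item vectors
    A = YtY + (Y_u * extra[:, None]).T @ Y_u + lam * np.eye(Y.shape[1])
    b = Y_u.T @ (1.0 + extra)                     # Y^T C_u p(u); p(u) is 1 on streamed tracks
    return np.linalg.solve(A, b)


counts = sparse.csr_matrix(np.array([[3, 0, 0, 1, 0],
                                     [0, 5, 0, 0, 2]], dtype=float))
rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 3))
YtY = Y.T @ Y                                     # shared across every user in this pass
x0 = user_vector(counts, 0, Y, YtY, lam=0.1)
print(x0)
```

The per-user work is the $n_u$-row product plus one small $f \times f$ solve, which is where the stated running time comes from.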

Page 22: Algorithmic Music Recommendations at Spotify

Alternating Least Squares

code: https://github.com/MrChrisJohnson/implicitMF

Page 23: Algorithmic Music Recommendations at Spotify

How do we use the learned vectors?

• User-Item score is the dot product
• Item-Item similarity is the cosine similarity
• Both operations are cheap: their cost is linear in the number of latent factors
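Both uses of the learned vectors can be sketched in NumPy. The vectors below are made-up values chosen so the results are easy to check by hand, and the function names are hypothetical.

```python
import numpy as np


def user_item_scores(user_vector, item_vectors):
    """Score every item for one user with a dot product."""
    return item_vectors @ user_vector


def item_item_similarity(item_vectors):
    """Cosine similarity between every pair of items."""
    norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
    unit = item_vectors / norms                   # scale each item vector to unit length
    return unit @ unit.T


item_vectors = np.array([[1.0, 0.0],
                         [2.0, 0.0],
                         [0.0, 3.0]])
user_vector = np.array([0.5, 1.0])

scores = user_item_scores(user_vector, item_vectors)
ranking = np.argsort(-scores)                     # item ids from best to worst score
similarity = item_item_similarity(item_vectors)
print(scores, ranking)
print(similarity)
```

Items 0 and 1 point in the same direction, so their cosine similarity is 1 even though their lengths differ, while item 2 is orthogonal to both.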

Page 24: Algorithmic Music Recommendations at Spotify

Latent Factor Vectors in 2 dimensions

Page 25: Algorithmic Music Recommendations at Spotify

Page 26: Algorithmic Music Recommendations at Spotify

Scaling up Implicit Matrix Factorization with Hadoop

Page 27: Algorithmic Music Recommendations at Spotify

Hadoop at Spotify 2009

Page 28: Algorithmic Music Recommendations at Spotify

Hadoop at Spotify 2014

700 nodes in our London data center

Page 29: Algorithmic Music Recommendations at Spotify

Implicit Matrix Factorization with Hadoop

[Figure via Erik Bernhardsson, with panels labeled Map step and Reduce step: all log entries are split into a K × L grid of blocks, one per pair (u % K, i % L), running from (u % K = 0, i % L = 0) to (u % K = K-1, i % L = L-1). User vectors are partitioned by u % K (0, 1, ..., K-1) and item vectors by i % L (0, 1, ..., L-1). The block u % K = 1, i % L = 1 is highlighted as an example, and the outputs are grouped by u % K.]

Page 30: Algorithmic Music Recommendations at Spotify

Implicit Matrix Factorization with Hadoop

One map task:

• Distributed cache: all user vectors where u % K = x
• Distributed cache: all item vectors where i % L = y
• Map input: tuples (u, i, count) where u % K = x and i % L = y
• Mapper: emit contributions
• Reducer: new vector!

Figure via Erik Bernhardsson
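The block layout can be simulated in a single Python process to show how mapper contributions add up to a new user vector. This is a hedged sketch, not Spotify's Hadoop job: it only performs the user-vector half of an iteration (so only item vectors are cached), it assumes weights of the form `1 + alpha * count`, and every function name is hypothetical.

```python
from collections import defaultdict

import numpy as np


def map_block(tuples, cached_item_vectors, alpha=40.0):
    """Mapper for one block: emit each log tuple's contribution to its user's
    linear system. The weight 1 + alpha * count is an assumption."""
    for u, i, count in tuples:
        y = cached_item_vectors[i]
        extra = alpha * count
        yield u, (extra * np.outer(y, y), (1.0 + extra) * y)


def reduce_user(contributions, YtY, lam):
    """Reducer: add up one user's contributions and solve for the new vector."""
    f = YtY.shape[0]
    A = YtY + lam * np.eye(f)
    b = np.zeros(f)
    for A_part, b_part in contributions:
        A = A + A_part
        b = b + b_part
    return np.linalg.solve(A, b)


def update_user_vectors(log, item_vectors, K, L, lam=0.1):
    # Partition the log into a K x L grid of blocks keyed by (u % K, i % L).
    blocks = defaultdict(list)
    for u, i, count in log:
        blocks[(u % K, i % L)].append((u, i, count))

    YtY = item_vectors.T @ item_vectors           # shared term, computed once
    grouped = defaultdict(list)                   # stands in for the shuffle phase
    for (user_part, item_part), tuples in blocks.items():
        # Each map task only needs the item vectors with i % L == item_part.
        cache = {i: item_vectors[i] for i in range(item_part, len(item_vectors), L)}
        for u, contribution in map_block(tuples, cache):
            grouped[u].append(contribution)
    return {u: reduce_user(parts, YtY, lam) for u, parts in grouped.items()}


log = [(0, 0, 3), (0, 3, 1), (1, 1, 5), (1, 4, 2), (2, 2, 7)]   # (u, i, count) tuples
rng = np.random.default_rng(0)
item_vectors = rng.normal(size=(5, 3))
new_user_vectors = update_user_vectors(log, item_vectors, K=2, L=2)
print(new_user_vectors[0])
```

Splitting by both u % K and i % L keeps each task's cached vectors small, since no single task needs the full user or item matrix in memory.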

Page 31: Algorithmic Music Recommendations at Spotify

Implicit Matrix Factorization with Spark

[Figure: Spark vs. Hadoop, from http://www.slideshare.net/Hadoop_Summit/spark-and-shark]

Page 32: Algorithmic Music Recommendations at Spotify

Page 33: Algorithmic Music Recommendations at Spotify

Approximate Nearest Neighbors

code: https://github.com/Spotify/annoy
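To give a feel for what "approximate" means here, the sketch below buckets vectors by which side of a few random hyperplanes they fall on and then searches only the query's bucket. It is a generic illustration in NumPy and does not reproduce Annoy's implementation or API; the function names and the parameter `n_planes` are hypothetical.

```python
import numpy as np


def build_buckets(vectors, n_planes=4, seed=0):
    """Hash each vector by the side of n_planes random hyperplanes it falls on."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_planes, vectors.shape[1]))
    signs = (vectors @ planes.T) > 0
    buckets = {}
    for index, key in enumerate(map(tuple, signs.tolist())):
        buckets.setdefault(key, []).append(index)
    return planes, buckets


def approximate_neighbors(query, vectors, planes, buckets, top_k=3):
    """Rank only the candidates that share the query's bucket, by cosine similarity."""
    key = tuple(((planes @ query) > 0).tolist())
    candidates = np.array(buckets.get(key, []), dtype=int)
    if candidates.size == 0:
        return []
    subset = vectors[candidates]
    sims = subset @ query / (np.linalg.norm(subset, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims)[:top_k]             # best matches first
    return candidates[order].tolist()


rng = np.random.default_rng(1)
vectors = rng.normal(size=(200, 16))              # stand-in for learned track vectors
planes, buckets = build_buckets(vectors)
neighbors = approximate_neighbors(vectors[5], vectors, planes, buckets)
print(neighbors)
```

The search skips most of the catalog, at the cost of missing true neighbors that happen to land in a different bucket.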

Page 34: Algorithmic Music Recommendations at Spotify

Ensemble of Latent Factor Models

Figure via Erik Bernhardsson

Page 35: Algorithmic Music Recommendations at Spotify

A/B-Testing Recommendations

Page 36: Algorithmic Music Recommendations at Spotify

Open Problems

• How to go from predictive model to related artists? (learning to rank?)
• How do you learn from user feedback?
• How do you deal with observation bias in the user feedback? (active learning?)
• How to factor in temporal information?
• How much value in content-based recommendations?
• How to best evaluate model performance?
• How to best train an ensemble?

Page 37: Algorithmic Music Recommendations at Spotify

Thank You!
