Music Personalization: Real-time Platforms

Music Personalization: Realtime Platforms. ♫ + ML + You = ❤. CrunchConf, Budapest, October 30, 2015

Uploaded by: esh

Posted on 22-Jan-2018

TRANSCRIPT

Page 1: Music Personalization : Real time Platforms

Music Personalization: Realtime Platforms

♫ + ML + You = ❤

CrunchConf, Budapest, October 30, 2015

Page 2

Esh Kumar, Machine Learning & Data Products @ Spotify NYC, @eshvk

Page 3

Who am I?

• UT Austin Machine Learning

• Building Large Scale Recommendation Systems @ Mozilla, StumbleUpon & Spotify

Page 4

75 M+ Active Users

Page 5

58 Markets

Page 6

1 TB of Logs/Day

Page 7

1200+ Node Hadoop Cluster

Page 8

Products

• Discover: find new albums

• Discover Weekly: a weekly playlist

• Editorial Playlist Recommendations

• Radio

Pages 9-12

Music Personalization

[Diagram: ML, Content, User]

• Understanding People ➡ User Experience, Cultural Variations

• Understanding Content ➡ Genres, Cultural knowledge

• Models ➡ Collaborative Filtering, Content Based

• News, Blogs, NLP

• Manually tag attributes

• Curation

• CF

Page 13

30 Million Songs…

What To Play?

75 Million Users … 1 Person Every 3 Secs…

Page 14

Recommendation Systems

• Predict user response to options.

• Rich field: matrix completion, ranking, text models, latent factor models.

• Several conferences annually: RecSys, NIPS, ICML, etc.

• Industry researchers include NFLX, GOOG, MS and more…

Page 15

Collaborative Filtering

Hey, I like tracks P, Q, R, S!

Well, I like tracks Q, R, S, T!

Then you should check out track P!

Nice! Btw, try track T!

Model you based on songs you played…

Predict your future based on similar users…

Millions of users and billions of streams… so there is someone like you out there.
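The dialogue above is user-based collaborative filtering in miniature. A minimal sketch, assuming Jaccard overlap of played-track sets as the similarity measure (the slide does not name one) and using hypothetical listener IDs:

```python
def jaccard(a, b):
    """Share of tracks two users have in common (1.0 = identical listening)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, history, neighbours=1):
    """Suggest tracks that the most similar other users played and `user` has not."""
    mine = history[user]
    # Rank every other user by how much their listening overlaps with ours.
    others = sorted((u for u in history if u != user),
                    key=lambda u: jaccard(mine, history[u]), reverse=True)
    recs = []
    for other in others[:neighbours]:
        for track in sorted(history[other] - mine):
            if track not in recs:
                recs.append(track)
    return recs


history = {
    "listener_1": {"P", "Q", "R", "S"},  # "Hey, I like tracks P, Q, R, S!"
    "listener_2": {"Q", "R", "S", "T"},  # "Well, I like tracks Q, R, S, T!"
}
print(recommend("listener_2", history))  # ['P']
print(recommend("listener_1", history))  # ['T']
```

Comparing every pair of users like this does not scale to tens of millions of listeners, which is one motivation for the latent vector models on the following slides.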

Page 16

Collaborative Filtering

The Netflix Prize.

A million dollars for beating NFLX’s own algorithm (Cinematch) by 10% in RMSE.

Page 17

Similarity

Our problem is to figure out how similar two items are.

Mathematically, this means modeling a function Similarity(x,y) for all users and items, if possible.

Page 18

How do we do this? Matrix completion: we model the data in the form of a matrix. For example, the play counts of all users (rows) for all songs (columns) could be:

$$
\text{Users}\left\{
\overbrace{
\begin{pmatrix}
s_{1,1} & s_{1,2} & 14 & \cdots & s_{1,n} \\
s_{2,1} & s_{2,2} & 2 & \cdots & s_{2,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
s_{m,1} & s_{m,2} & 1 & \cdots & s_{m,n}
\end{pmatrix}
}^{\text{Song Plays}}
\right.
\approx
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{pmatrix}
\begin{pmatrix} t_1 & t_2 & \cdots & t_n \end{pmatrix}
$$

(Third column: Call Me Maybe. Last row: Esh. Esh listened to Call Me Maybe once…)
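In the factorization each entry is approximated by the dot product of a user vector and a track vector, $s_{i,j} \approx u_i^\top t_j$. The slides do not state the loss; a common choice, assumed here, is squared error over the observed entries plus an L2 penalty with weight $\lambda$:

$$
\min_{U,\,T}\; \sum_{(i,j)\ \text{observed}} \bigl(s_{i,j} - u_i^\top t_j\bigr)^2 \;+\; \lambda \Bigl(\sum_{i=1}^{m} \lVert u_i \rVert^2 + \sum_{j=1}^{n} \lVert t_j \rVert^2\Bigr)
$$

The unobserved entries are then "completed" by the same dot product.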

Page 19

Matrix completion is well studied… Start with random vectors around the origin, then run alternating least squares, gradient descent or stochastic gradient descent… All of this is Hadoopable™.
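A minimal in-memory NumPy sketch of alternating least squares for the regularized squared-error objective, assuming explicit play counts and a boolean mask of observed entries. The function name `als`, its parameters and the toy matrix are hypothetical; a production job would be distributed, and might treat plays as implicit feedback instead:

```python
import numpy as np


def als(R, mask, k=2, reg=0.01, iters=50, seed=0):
    """Fit R ≈ U @ T.T by alternating least squares, using only entries where mask is True."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    # Start with small random vectors around the origin.
    U = 0.1 * rng.standard_normal((m, k))
    T = 0.1 * rng.standard_normal((n, k))
    ridge = reg * np.eye(k)
    for _ in range(iters):
        # Hold track vectors fixed: each user vector is a small ridge regression.
        for i in range(m):
            Ti = T[mask[i]]
            U[i] = np.linalg.solve(Ti.T @ Ti + ridge, Ti.T @ R[i, mask[i]])
        # Hold user vectors fixed: solve for each track vector the same way.
        for j in range(n):
            Uj = U[mask[:, j]]
            T[j] = np.linalg.solve(Uj.T @ Uj + ridge, Uj.T @ R[mask[:, j], j])
    return U, T


# Hypothetical play counts: 3 users x 4 songs, exactly rank 1.
R = np.outer([1.0, 2.0, 3.0], [1.0, 1.0, 2.0, 3.0])
mask = np.ones_like(R, dtype=bool)
mask[2, 3] = False  # hide one entry: user 3's plays of song 4 (true value 9)

U, T = als(R, mask, k=1)
pred = U @ T.T
print(round(float(pred[2, 3]), 2))  # should be close to the hidden value of 9
```

Each half-step is an exact least-squares solve with the other side held fixed, and every user (or track) vector is solved independently, which is what makes the method easy to parallelize.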

Page 20

30 Million Songs…

What To Play?

75 Million People … 1 Person Every 3 Secs…

Page 21

1.5 Billion Playlists

Page 22

Language Models

• Language models work well too. For example, a playlist can be treated as a document, with its tracks as the words, and you can learn latent vectors for the tracks.

• Then represent a user as a linear combination of their tracks' vectors.
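A minimal sketch of building a user vector from track vectors, assuming play counts as the combination weights (the slide says only "linear combination"); the track IDs and vector values are hypothetical:

```python
import numpy as np

# Hypothetical 3-d track vectors standing in for vectors learned from playlists.
track_vectors = {
    "track_a": np.array([0.9, 0.1, 0.0]),
    "track_b": np.array([0.8, 0.2, 0.1]),
    "track_c": np.array([0.0, 0.3, 0.9]),
}


def user_vector(play_counts, vectors):
    """Linear combination of a user's track vectors, weighted by play count."""
    total = sum(play_counts.values())
    return sum(n * vectors[t] for t, n in play_counts.items()) / total


# One play of track_a and three of track_b: the result leans toward track_b.
u = user_vector({"track_a": 1, "track_b": 3}, track_vectors)
print(u)
```

Because the user lands in the same space as the tracks, the same similarity function can compare user to track, track to track or user to user.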

Page 23

word2vec

Words with similar contexts have similar meaning

Page 24

word2vec

Page 25

word2vec

[Diagram: a Target Word and its Context Words]

Page 26

word2vec

Target Words and Corresponding Contexts

Target      shining  bright  trees  dark  green
stars            61      50     10    30      1
sun              71      60      5     2      0
cucumber          2       1     15     3     40
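The table's intuition can be checked directly by treating each row as a vector of context counts and comparing rows with cosine similarity. word2vec learns dense vectors rather than using raw counts, so this standard-library sketch shows only the count-based idea behind it:

```python
import math

# Counts from the table; columns are shining, bright, trees, dark, green.
counts = {
    "stars": [61, 50, 10, 30, 1],
    "sun": [71, 60, 5, 2, 0],
    "cucumber": [2, 1, 15, 3, 40],
}


def cosine(a, b):
    """Cosine of the angle between two context-count vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


for other in ("sun", "cucumber"):
    print(other, round(cosine(counts["stars"], counts[other]), 2))
# sun 0.94
# cucumber 0.12
```

"stars" and "sun" appear next to the same words (shining, bright), so their rows point in nearly the same direction; "cucumber" does not.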

Page 27

word2vec

[Diagram: Playlists → Read → CPU → Get Vectors & Update → Vectors]
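A compact, single-threaded sketch of the read / get vectors / update loop, assuming skip-gram with negative sampling (the slides say word2vec but not which variant) and drawing negatives uniformly rather than from word2vec's smoothed unigram distribution. The function name, all parameter values and the toy playlists are hypothetical:

```python
import numpy as np


def train_track_vectors(playlists, dim=8, window=3, negatives=2,
                        lr=0.05, epochs=100, seed=0):
    """Skip-gram with negative sampling: playlists are documents, tracks are words."""
    rng = np.random.default_rng(seed)
    vocab = sorted({t for p in playlists for t in p})
    idx = {t: i for i, t in enumerate(vocab)}
    W = (rng.random((len(vocab), dim)) - 0.5) / dim  # track vectors (kept)
    C = np.zeros((len(vocab), dim))                  # context vectors (training only)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    for _ in range(epochs):
        for playlist in playlists:                   # "Read": one playlist at a time
            ids = [idx[t] for t in playlist]
            for pos, target in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for ctx in ids[lo:pos] + ids[pos + 1:hi]:
                    # One observed context track plus a few uniformly sampled negatives.
                    noise = rng.integers(len(vocab), size=negatives)
                    samples = [(ctx, 1.0)] + [(int(n), 0.0) for n in noise]
                    grad_w = np.zeros(dim)
                    for c, label in samples:         # "Get Vectors & Update"
                        g = lr * (label - sigmoid(W[target] @ C[c]))
                        grad_w += g * C[c]           # uses C[c] before its update
                        C[c] += g * W[target]
                    W[target] += grad_w
    return {t: W[i] for t, i in idx.items()}


playlists = [
    ["a1", "a2", "a3", "a4"], ["a4", "a2", "a1", "a3"],
    ["b1", "b2", "b3", "b4"], ["b3", "b1", "b4", "b2"],
    ["c1", "c2", "c3", "c4"], ["c2", "c4", "c3", "c1"],
]
vecs = train_track_vectors(playlists)
```

On toy data like this, tracks that share playlists (same letter) should tend to end up with higher cosine similarity than tracks that never co-occur. At 1.5 billion playlists the same loop would be sharded across many workers.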

Page 28

Vectors are awesome!

•A unique fingerprint for every user, track, album, artist and even playlist, all in the same space.

•Similarity is easy to compute: Euclidean distance or cosine similarity.
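The two measures are closely related. Assuming the vectors are first scaled to unit length (the slide does not say whether they are), expanding the squared distance gives

$$
\lVert x - y \rVert^2 = \lVert x \rVert^2 + \lVert y \rVert^2 - 2\,x^\top y = 2\bigl(1 - \cos(x, y)\bigr) \qquad \text{when } \lVert x \rVert = \lVert y \rVert = 1,
$$

so ranking neighbours by smallest Euclidean distance or by largest cosine similarity yields the same order.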

Page 29

Approximate Nearest Neighbors

•Fast approximate nearest neighbor search.

• Locality Sensitive Hashing

• https://github.com/spotify/annoy
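A toy illustration of locality-sensitive hashing, assuming random-hyperplane hashes for cosine similarity; the class and parameter names are hypothetical. Annoy itself builds a forest of random-projection trees rather than hash tables, so its README is the place to look for its actual API:

```python
import numpy as np


class HyperplaneLSH:
    """Random-hyperplane LSH for cosine similarity; a sketch, not Annoy's algorithm."""

    def __init__(self, vectors, n_bits=8, n_tables=4, seed=0):
        rng = np.random.default_rng(seed)
        self.vectors = np.asarray(vectors, dtype=float)
        dim = self.vectors.shape[1]
        # Each table hashes a vector to the sign pattern of n_bits random projections.
        self.planes = [rng.standard_normal((n_bits, dim)) for _ in range(n_tables)]
        self.tables = []
        for planes in self.planes:
            buckets = {}
            for i, v in enumerate(self.vectors):
                buckets.setdefault(self._key(planes, v), []).append(i)
            self.tables.append(buckets)

    @staticmethod
    def _key(planes, v):
        return tuple((planes @ v > 0).tolist())

    def query(self, q, k=5):
        """Gather vectors sharing a bucket with q in any table, then rank them exactly."""
        q = np.asarray(q, dtype=float)
        candidates = set()
        for planes, buckets in zip(self.planes, self.tables):
            candidates.update(buckets.get(self._key(planes, q), []))
        sims = {i: float(self.vectors[i] @ q)
                / float(np.linalg.norm(self.vectors[i]) * np.linalg.norm(q))
                for i in candidates}
        return sorted(sims, key=sims.get, reverse=True)[:k]


rng = np.random.default_rng(1)
vectors = rng.standard_normal((1000, 16))
index = HyperplaneLSH(vectors)
print(index.query(vectors[42], k=3))  # 42 itself comes first
```

Only the handful of vectors that land in the query's buckets are scored exactly, which is where the speedup over a full scan comes from; the price is that some true neighbours may be missed.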

Page 30

Vectors are great for Infrastructure too…

•Machine Learning can be decomposed & abstracted away.

•A Lambda Architecture involving Machine Learning becomes eas(ier).

•Platforms for personalization become possible…

Page 31

The Record Store… The List Maker …

How do you scale this?

Page 32

Tools of the trade

• Build models in Python (NumPy, SciPy).

• Jobs in Scalding + Luigi (https://github.com/spotify/luigi).

• Storm for real time.

• In-house RPC for serving requests.

Page 33

Storm 101

• Realtime Stream Processing.

• Like Hadoop but easier.

• Fault tolerant.

• Java, Clojure (yay!) and more!

Page 34

Storm @ Spotify

• Major users are Ads & Personalization!

• Every team manages its own cluster. For personalization, we have a 12 node cluster.

• Relatively new tech compared to Hadoop™.

Page 35

So why Storm?

• Hadoop is slowwww: the daily user vector job takes ~16 hours to run. Small Data FTW!

• New users are important; they need a friend!

• What moment are you in? Gym, running, etc.?
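A minimal sketch of why streaming helps new users, assuming a user vector is the running mean of the vectors of the tracks they play (an assumption; the slides do not give the update rule). Each play updates the vector in constant time instead of waiting for the ~16 hour batch job; the class name is hypothetical:

```python
import numpy as np


class RealtimeUserVector:
    """User vector kept up to date one play at a time (a running mean of track vectors)."""

    def __init__(self, dim):
        self.vector = np.zeros(dim)
        self.plays = 0

    def on_play(self, track_vector):
        # Incremental mean: mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n.
        # Constant work per play; no need to reread the whole listening history.
        self.plays += 1
        self.vector += (np.asarray(track_vector, dtype=float) - self.vector) / self.plays
        return self.vector.copy()


new_user = RealtimeUserVector(dim=2)
print(new_user.on_play([1.0, 0.0]))  # usable after the very first play
print(new_user.on_play([0.0, 1.0]))  # mean of both plays so far
```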

Page 36

Getting Data Across The Globe

Pages 37-40

Pipeline + Platform

[Architecture diagram, built up over four slides. Components: HDFS; Kafka; UserListens; Playlists; Realtime Listens Spout; User Vector Generation Job; Latent Vector Models; Track, Artist, Album Vectors; Compressed Listening History; Bolts; Cassandra; Backend Systems (Top Albums, Top Tracks, Top Playlists).]
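A single-process Python simulation of the realtime path in the diagram. It assumes "Compressed Listening History" means a bounded window of recent plays and that the user vector is the mean of the track vectors in that window; a generator stands in for the Realtime Listens Spout, a class for a Bolt and a dict for Cassandra. None of this is Storm's API, and all names and values are hypothetical:

```python
from collections import deque

import numpy as np

# Hypothetical vectors standing in for the batch-trained "Track, Artist, Album Vectors".
TRACK_VECTORS = {
    "t1": np.array([1.0, 0.0]),
    "t2": np.array([0.0, 1.0]),
    "t3": np.array([1.0, 1.0]),
}


def listens_spout(events):
    """Stand-in for the Realtime Listens Spout: emits (user, track) tuples in order."""
    yield from events


class UserVectorBolt:
    """Stand-in for a bolt: keeps a bounded recent-play history and a user vector per user."""

    def __init__(self, store, history_size=50):
        self.store = store  # a plain dict standing in for Cassandra
        self.history_size = history_size

    def process(self, user, track):
        if user not in self.store:
            self.store[user] = {"history": deque(maxlen=self.history_size), "vector": None}
        row = self.store[user]
        row["history"].append(track)  # oldest plays fall off once the deque is full
        known = [TRACK_VECTORS[t] for t in row["history"] if t in TRACK_VECTORS]
        if known:
            row["vector"] = np.mean(known, axis=0)


store = {}
bolt = UserVectorBolt(store, history_size=2)
for user, track in listens_spout([("esh", "t1"), ("esh", "t2"), ("esh", "t3")]):
    bolt.process(user, track)
print(list(store["esh"]["history"]), store["esh"]["vector"])
```

Keeping the history bounded keeps each stored row small and each update cheap, while the heavy lifting (learning the track vectors) stays in the batch pipeline.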

Page 41

Discover New User

• Going from two weeks of no recommendations to recommendations as soon as a user plays a track.

• Successful A/B test.

• First team to build a production-ready personalization feature using Storm.

Page 42

Lessons Learnt …

• Boring technology works well. Complicated Storm topology = bad. (Dan McKinley)

• Storm is nice, but we would have preferred reusing our batch Scalding code. Maybe Spark Streaming?

• Grow your API from one use case to another. Don’t solve for everything at one time.

Page 43

Join the band!

• Machine Learning, Data & Backend Gigs.

• Now touring in New York, Boston & Stockholm!

• https://www.spotify.com/jobs/

Page 44

Thanks! Esh Kumar, @eshvk