How Apache Drives Music Recommendations At Spotify
Josh Baer
Note: The views expressed are my own and do not necessarily represent those of Spotify.
Who Am I?
• Technical Product Owner at Spotify
• Working with batch and fast processing infrastructure
@l_phant
Music Discovery in the 90s
What is Spotify?
• Music Streaming Service
• Launched in 2008
• Free and Premium Tiers
• Available in 58 Countries
75+ Million Active Users
30+ Million Songs
1+ Billion Plays/Day
Music Recommendations with Apache
How do we recommend a personalized playlist of new music to 75+ million users?
10.123.133.333 - - [Mon, 3 June 2015 11:31:33 GMT] "GET /api/admin/job/aggregator/status HTTP/1.1" 200 1847 "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
10.123.133.222 - - [Mon, 3 June 2015 11:31:43 GMT] "GET /api/admin/job/aggregator/status HTTP/1.1" 200 1984 "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
10.123.133.222 - - [Mon, 3 June 2015 11:33:02 GMT] "GET /dashboard/courses/1291726 HTTP/1.1" 304 - "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
10.321.145.111 - - [Mon, 3 June 2015 11:33:03 GMT] "GET /api/loggedInUser HTTP/1.1" 304 - "https://my.analytics.app/dashboard/courses/1291726" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
10.112.322.111 - - [Mon, 3 June 2015 11:33:03 GMT] "POST /api/instrumentation/events/new HTTP/1.1" 200 2 "https://my.analytics.app/dashboard/courses/1291726" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
It begins with a log
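The sample lines above resemble an Apache-style access log. As a minimal sketch of that first step, the Python below pulls the fields out of one such line with a regular expression. The pattern and the field names (`ip`, `ts`, `method`, `path`, and so on) are assumptions fitted to these sample lines only, not a format Spotify is known to log in; the timestamp handling assumes the "Weekday, D Month YYYY HH:MM:SS GMT" shape seen here and drops the weekday rather than checking it.

```python
import re
from datetime import datetime

# Fields as they appear in the sample: client IP, two "-" placeholders,
# [timestamp], "request line", status, size ("-" when absent),
# "referrer", "user agent". Assumed layout, fitted to the sample only.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line: str):
    """Return a dict of parsed fields, or None if the line doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["status"] = int(event["status"])
    event["size"] = None if event["size"] == "-" else int(event["size"])
    # "Mon, 3 June 2015 11:33:03 GMT" -> drop the weekday, parse the rest.
    _, date_part = event["ts"].split(", ", 1)
    event["ts"] = datetime.strptime(date_part, "%d %B %Y %H:%M:%S GMT")
    return event

sample = (
    '10.321.145.111 - - [Mon, 3 June 2015 11:33:03 GMT] '
    '"GET /api/loggedInUser HTTP/1.1" 304 - '
    '"https://my.analytics.app/dashboard/courses/1291726" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"'
)
event = parse_line(sample)
print(event["path"], event["status"], event["size"])  # /api/loggedInUser 304 None
```

Returning `None` for non-matching lines lets a batch job count and skip malformed records instead of failing the whole run.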
Apache Kafka at Spotify
• 340 Kafka-related nodes
• 30 TB/day from logs
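To put 30 TB/day in more familiar units, the back-of-envelope sketch below converts it into an average ingest rate. It assumes decimal units (1 TB = 10^12 bytes) and perfectly even traffic over 24 hours; real traffic is bursty, so peak rates will be higher. The per-node figure further assumes the load is split evenly across all 340 nodes, which is a simplification because the slide counts "Kafka-related" nodes rather than brokers specifically.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def avg_rate_mb_per_s(tb_per_day: float, nodes: int = 1) -> float:
    """Average throughput in decimal MB/s, optionally divided evenly across nodes."""
    bytes_per_sec = tb_per_day * 10**12 / SECONDS_PER_DAY
    return bytes_per_sec / nodes / 10**6

print(f"{avg_rate_mb_per_s(30):.0f} MB/s overall")             # 347 MB/s overall
print(f"{avg_rate_mb_per_s(30, nodes=340):.2f} MB/s per node")  # 1.02 MB/s per node
```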
How do we store TBs of new data every day?
Apache Hadoop at Spotify
• 1700 Nodes
• 60 PB of Data
• 70 TB of Memory
• Over 1 Million jobs run in Q3 2015
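The same kind of arithmetic gives a feel for the Hadoop numbers. The sketch divides the slide's totals evenly across the 1700 nodes and spreads the Q3 2015 job count over the 92 days of that quarter. Even division is an assumption (real clusters mix node types), the slide does not say whether 60 PB is raw disk including HDFS replication or logical data, and "over 1 million" makes the job rate a lower bound.

```python
from datetime import date

NODES = 1700
STORAGE_PB = 60
MEMORY_TB = 70
JOBS_Q3_2015 = 1_000_000  # "over 1 million", so the job rate is a lower bound

def storage_tb_per_node() -> float:
    return STORAGE_PB * 1000 / NODES  # decimal units: 1 PB = 1000 TB

def memory_gb_per_node() -> float:
    return MEMORY_TB * 1000 / NODES  # 1 TB = 1000 GB

def jobs_per_day() -> float:
    # July 1 up to (not including) October 1: the 92 days of Q3.
    days_in_q3 = (date(2015, 10, 1) - date(2015, 7, 1)).days
    return JOBS_Q3_2015 / days_in_q3

print(f"{storage_tb_per_node():.1f} TB storage per node")  # 35.3 TB storage per node
print(f"{memory_gb_per_node():.1f} GB memory per node")    # 41.2 GB memory per node
print(f"{jobs_per_day():,.0f} jobs per day")               # 10,870 jobs per day
```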
Hadoop at Spotify
(Chart: processing growth, in %, by quarter from Q4-2013 to Q3-2015)
Processing Toolbox
• Apache Crunch
• Scalding
• Apache Hive
• Apache Spark
• Apache Storm
• Hadoop Streaming
• Apache Pig
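Several batch tools in this list (Hadoop Streaming, Pig, Hive, Scalding, Crunch) can execute jobs as map, shuffle, and reduce phases on Hadoop. The stdlib-only Python below imitates those three phases in memory to count plays per (user, artist) pair, the kind of aggregate a recommender can start from. The play events and function names are hypothetical, and no framework's actual API is shown.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical play events: (user, artist).
PLAYS = [
    ("Anna", "Drake"), ("Anna", "Drake"), ("Anna", "Avicii"),
    ("Mary", "Drake"), ("Mary", "Major Lazer"),
]

def map_phase(record):
    """Emit one ((user, artist), 1) pair per play."""
    user, artist = record
    yield (user, artist), 1

def shuffle_phase(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts for one key."""
    return key, sum(values)

def run_job(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = run_job(PLAYS)
print(counts[("Anna", "Drake")])  # 2
```

On a real cluster the map and reduce functions run on many machines and the shuffle moves data over the network; the logic per key stays the same.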
Storage Formats
• Apache Avro
• Apache Parquet
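Avro stores data row by row, while Parquet lays it out column by column, which lets a query read only the columns it needs. The in-memory Python analogy below pivots hypothetical row-oriented records into per-field lists to show the difference; it does not implement either file format, and the field names are made up for illustration.

```python
# Hypothetical play records in row-oriented form (one dict per record).
rows = [
    {"user": "Anna", "artist": "Drake", "ms_played": 180_000},
    {"user": "Mary", "artist": "Avicii", "ms_played": 240_000},
    {"user": "Anna", "artist": "Major Lazer", "ms_played": 60_000},
]

def to_columns(records):
    """Pivot records into {field: [values...]}, assuming every record has the same fields."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns

columns = to_columns(rows)

# Row layout: every whole record is visited to total one field.
total_from_rows = sum(r["ms_played"] for r in rows)
# Column layout: only the single list for that field is read.
total_from_columns = sum(columns["ms_played"])
print(total_from_rows == total_from_columns, total_from_columns)  # True 480000
```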
How do we personalize the playlists?
Collaborative Filtering
Justin Bieber Drake Avicii Major Lazer
Anna Listened Listened
Gustav Listened Listened Listened
Mary Listened Listened Listened Listened
Michael Listened Listened Suggest
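The idea on this slide is user-based collaborative filtering: find listeners whose history resembles Michael's and suggest artists they played that he has not. The sketch below scores each candidate artist by summing the cosine similarities of the other users who listened to it. The grid is a hypothetical fill-in (the slide's cell positions did not survive extraction, though the per-user listen counts match), and this simple method is not a claim about the model Spotify runs in production.

```python
from math import sqrt

ARTISTS = ["Justin Bieber", "Drake", "Avicii", "Major Lazer"]

# Hypothetical listen grid (1 = listened). Cell positions are invented;
# only the per-user counts (2, 3, 4, 2) follow the slide.
LISTENS = {
    "Anna":    [1, 1, 0, 0],
    "Gustav":  [0, 1, 1, 1],
    "Mary":    [1, 1, 1, 1],
    "Michael": [0, 1, 1, 0],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def suggest(user, listens, artists, k=1):
    """Rank artists `user` hasn't heard by the summed similarity of users who have."""
    target = listens[user]
    scores = {}
    for other, row in listens.items():
        if other == user:
            continue
        sim = cosine(target, row)
        for i, listened in enumerate(row):
            if listened and not target[i]:
                scores[artists[i]] = scores.get(artists[i], 0.0) + sim
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [artist for artist, _ in ranked[:k]]

print(suggest("Michael", LISTENS, ARTISTS, k=2))  # ['Major Lazer', 'Justin Bieber']
```

With this grid, Gustav (similarity about 0.82) and Mary (about 0.71) both listened to Major Lazer, which outweighs Anna (0.5) and Mary backing Justin Bieber, so Major Lazer is suggested first.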
How do we serve new playlists to all our users every week?
Apache Cassandra at Spotify
• Number of Clusters: 113
• Number of Machines: 1155
• Largest Cluster: 60 Nodes
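One way to picture serving precomputed weekly playlists from Cassandra is its token ring: each partition key hashes to a token, and the node whose token range covers that token stores the row. The sketch below builds a toy ring of 60 nodes (the slide's largest cluster size) and writes and reads playlists under a hypothetical `user_id:week` key. It uses MD5 tokens, as Cassandra's older RandomPartitioner did; the current default, Murmur3Partitioner, hashes differently, and replication and virtual nodes are left out entirely.

```python
import bisect
import hashlib

def token(key: str) -> int:
    """128-bit MD5 token for a key (a stand-in for Cassandra's partitioner)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

class TokenRing:
    def __init__(self, nodes):
        # One token per node; real clusters usually assign many (vnodes).
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def owner(self, key: str) -> str:
        """First node whose token is >= the key's token, wrapping past the end."""
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return self._ring[i][1]

# Hypothetical cluster; plain dicts stand in for each node's on-disk storage.
NODES = [f"node-{i:02d}" for i in range(60)]
ring = TokenRing(NODES)
storage = {node: {} for node in NODES}

def write_playlist(user_id: str, week: str, tracks: list) -> None:
    key = f"{user_id}:{week}"
    storage[ring.owner(key)][key] = tracks

def read_playlist(user_id: str, week: str):
    key = f"{user_id}:{week}"
    return storage[ring.owner(key)].get(key)

write_playlist("michael", "2015-W45", ["track-a", "track-b"])
print(read_playlist("michael", "2015-W45"))  # ['track-a', 'track-b']
```

Because the playlists are computed in batch ahead of time, serving one becomes a single keyed lookup on the node that owns it.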
Driven By Data
Driven By Apache
Thank YOU for your contributions to Apache products!
One Last Thing…
Spotify Luigi
• Workflow Manager
• Over 150 contributors
• Used by tens, possibly hundreds, of companies
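Luigi tasks declare their dependencies (`requires()`), their result (`output()`), and the work itself (`run()`), and a task whose output already exists counts as complete and is skipped. The stdlib-only Python below imitates that pattern so it runs without Luigi installed; it is not Luigi's API (with the real library you would subclass `luigi.Task` and return targets such as files), and the task names and the in-memory set of "existing outputs" are hypothetical.

```python
# In-memory stand-in for files on HDFS: outputs that already exist.
EXISTING_OUTPUTS = set()
RUN_LOG = []  # order in which tasks actually ran

class Task:
    def requires(self):
        return []

    def output(self) -> str:
        raise NotImplementedError

    def run(self):
        # Stand-in for real work: just record that this task ran.
        RUN_LOG.append(self.output())

    def complete(self) -> bool:
        return self.output() in EXISTING_OUTPUTS

def build(task: Task) -> None:
    """Depth-first: skip complete tasks, otherwise build dependencies, then run."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep)
    task.run()
    EXISTING_OUTPUTS.add(task.output())

class ParseLogs(Task):
    def __init__(self, day):
        self.day = day

    def output(self):
        return f"parsed/{self.day}"

class CountPlays(Task):
    def __init__(self, day):
        self.day = day

    def requires(self):
        return [ParseLogs(self.day)]

    def output(self):
        return f"counts/{self.day}"

class WeeklyPlaylists(Task):
    def __init__(self, days):
        self.days = days

    def requires(self):
        return [CountPlays(day) for day in self.days]

    def output(self):
        return f"playlists/{self.days[-1]}"

EXISTING_OUTPUTS.add("parsed/2015-06-01")  # pretend one day was already parsed
build(WeeklyPlaylists(["2015-06-01", "2015-06-02"]))
print(RUN_LOG)
# ['counts/2015-06-01', 'parsed/2015-06-02', 'counts/2015-06-02', 'playlists/2015-06-02']
```

Checking outputs rather than remembering past runs is what makes reruns cheap: calling `build` again on the same task does nothing, and a failed pipeline resumes from the first missing output.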
Maybe… Apache Luigi?
Sponsors/mentors/contributors wanted!
Think this stuff is interesting?
We have a great time building it!
spotify.com/jobs
Better Spotify ML Presentations
• Algorithmic Music Recommendations at Spotify (Chris Johnson)
• Interactive Recommender Systems with Netflix and Spotify (Chris Johnson)
• Music recommendations @ MLConf 2014 (Erik Bernhardsson)
• Machine learning @ Spotify (Andy Sloane)
• Recommending music on Spotify with deep learning (Sander Dieleman)
• Scala Data Pipelines @ Spotify (Neville Li)
• Spotify's Music Recommendations Lambda Architecture (Esh Kumar and Emily Samuels)