Playlists at Spotify: A massively scalable storage system

Marcus Better, Software Engineer
[email protected]

Uploaded by datastax-academy on 22-Jan-2018


Spotify by the Numbers

‣ 75 million active users
  – 20 million paying subscribers
‣ 30 million songs
‣ 1.5 billion playlists created
‣ 6 000 servers in 4 data centres
‣ Available in 58 markets

Architecture overview

● 400+ loosely coupled services
● Backend is mostly Java
● Storage options:
  – Cassandra
  – PostgreSQL
  – Sparkey (our own open-source database for static data sets)
● 120+ Cassandra clusters

Playlists

Requirements

● Over 1 billion lists
● > 100k reqs/s
● Collaborative editing
● Concurrent changes
● Offline editing

Version control

● Playlists as versioned objects
● Store all changes
● Changes are immutable!

Revision history of one playlist:

ROOT
→ 1,2bfd16: ADD i=0, items=[A,B,C] (list: A, B, C)
  → 2,f7a9ba: MOV from=1, to=0, len=1 (list: B, A, C)
    → 3,def87a: REM from=0, len=1 (list: A, C) ← head revision
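Replaying the changes in revision order reproduces each list state shown above. The following is a minimal Python sketch, not Spotify's code: the class names Add, Mov and Rem, the field names from_ and length, starting ROOT as an empty list, and the rule that a MOV's `to` index counts positions after the moved items have been taken out are all assumptions chosen to match the slide's example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass(frozen=True)
class Add:
    i: int                  # insert position
    items: Tuple[str, ...]  # items to insert


@dataclass(frozen=True)
class Mov:
    from_: int   # start of the moved range ("from" is a Python keyword)
    to: int      # destination index, counted after the range is taken out (assumption)
    length: int  # "len" on the slide


@dataclass(frozen=True)
class Rem:
    from_: int   # start of the removed range
    length: int  # number of items removed


Op = Union[Add, Mov, Rem]


def apply(items: List[str], op: Op) -> List[str]:
    """Apply one change and return a new list; the input list is left untouched."""
    out = list(items)
    if isinstance(op, Add):
        out[op.i:op.i] = op.items
    elif isinstance(op, Rem):
        del out[op.from_:op.from_ + op.length]
    elif isinstance(op, Mov):
        moved = out[op.from_:op.from_ + op.length]
        del out[op.from_:op.from_ + op.length]
        out[op.to:op.to] = moved
    else:
        raise TypeError(f"unknown change: {op!r}")
    return out


def replay(changes: Sequence[Op]) -> List[str]:
    """Rebuild a list by applying its changes in revision order, starting from an empty ROOT."""
    state: List[str] = []
    for change in changes:
        state = apply(state, change)
    return state


# The history from the slide: revisions 1,2bfd16 -> 2,f7a9ba -> 3,def87a.
history = [Add(0, ("A", "B", "C")), Mov(1, 0, 1), Rem(0, 1)]
for n in range(1, len(history) + 1):
    print(replay(history[:n]))
```

Run as-is, it prints ['A', 'B', 'C'], then ['B', 'A', 'C'], then ['A', 'C']. Because apply never mutates its input, every earlier revision stays reconstructible, in the spirit of "changes are immutable".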

Branches

ROOT
→ 1,2bfd16
  → 2,81ahcd
  → 2,f7a9ba
Two heads!

Concurrent updates lead to branching. These will be automatically merged by the system.
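A branch shows up as more than one head. The minimal sketch below assumes a hypothetical in-memory map from each revision ID to its parent IDs (the IDs are copied from the slide) and finds the heads as the revisions that no other revision names as a parent.

```python
from typing import Dict, Sequence, Set


def heads(parents: Dict[str, Sequence[str]]) -> Set[str]:
    """Return the revisions that no other revision names as a parent."""
    has_child = {p for ps in parents.values() for p in ps}
    return set(parents) - has_child


# Hypothetical graph for the slide: two concurrent revision-2 changes on top of 1,2bfd16.
branched = {
    "1,2bfd16": ["ROOT"],
    "2,81ahcd": ["1,2bfd16"],
    "2,f7a9ba": ["1,2bfd16"],
}
print(sorted(heads(branched)))  # ['2,81ahcd', '2,f7a9ba']: two heads, so a merge is needed
```

Both concurrent revisions carry the number 2 on the slide, so only the hash part tells them apart.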

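Merging

ROOT
→ 1,2bfd16
  → 2,81ahcd and 2,f7a9ba (concurrent branches; changes ADD i=5, [A] and REM i=2, len=3)
    → 3,39acc and 3,8a0ba (merge revisions; changes ADD i=2, [A] and REM i=2, len=3)

Each branch's change is applied on top of the other branch. Once REM i=2, len=3 has removed the items at indices 2 to 4, the concurrent insertion at index 5 becomes ADD i=2; the REM is unaffected by an insertion that lies after its range. Both branches then end up with the same list.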
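The talk does not describe the merge algorithm itself. The sketch below is an operational-transformation-style rebase written for illustration: the Add and Rem types, the transform function and the six-item starting list are hypothetical, and only the non-overlapping ADD-versus-REM case from the slide is handled (anything else raises NotImplementedError). It checks that applying the two branches in either order gives the same list.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Add:
    i: int
    items: Tuple[str, ...]


@dataclass(frozen=True)
class Rem:
    i: int
    length: int


Op = Union[Add, Rem]


def apply(state: List[str], op: Op) -> List[str]:
    out = list(state)
    if isinstance(op, Add):
        out[op.i:op.i] = op.items
    else:
        del out[op.i:op.i + op.length]
    return out


def transform(op: Op, against: Op) -> Op:
    """Rewrite `op`, made concurrently with `against`, so it can be applied after `against`."""
    if isinstance(op, Add) and isinstance(against, Rem):
        if op.i <= against.i:
            return op                                    # insertion before the removed range
        if op.i >= against.i + against.length:
            return Add(op.i - against.length, op.items)  # shift left by the removed count
        return Add(against.i, op.items)                  # insertion point was removed: use range start
    if isinstance(op, Rem) and isinstance(against, Add):
        if against.i >= op.i + op.length:
            return op                                    # insertion after the removed range
        if against.i <= op.i:
            return Rem(op.i + len(against.items), op.length)  # range shifted right
    raise NotImplementedError("only non-overlapping ADD/REM pairs are handled in this sketch")


base = ["a", "b", "c", "d", "e", "f"]  # hypothetical starting list
x = Add(5, ("A",))                      # change on one branch
y = Rem(2, 3)                           # concurrent change on the other branch
left = apply(apply(base, x), transform(y, x))
right = apply(apply(base, y), transform(x, y))
print(transform(x, y), transform(y, x))  # Add(i=2, items=('A',)) Rem(i=2, length=3)
print(left, left == right)               # ['a', 'b', 'A', 'f'] True
```

The transformed operations match the merge changes on the slide (ADD i=2 and an unchanged REM i=2, len=3), which is why both merge revisions converge.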

Playlist data model


Typical requests

● “Give me all changes since rev 2”
● “Give me the latest snapshot of the playlist”

Playlist changes

● Column family playlist_change stores changes
● Row key = playlist ID
● Column name = revision ID

Row key                            | 1,2bfd16         | 2,f7a9ba                | 3,def87a
spotify:user:mbetter:playlist:1234 | ADD i=0, [A,B,C] | MOV from=1, to=0, len=1 | REM from=0, len=1
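With revision IDs as column names, “Give me all changes since rev 2” becomes a read of the later columns within a single row. The sketch below models that with a plain Python dict rather than Cassandra. Parsing each revision ID into a (number, hash) pair, so that revision 10 would sort after revision 2, is an assumption: the talk does not say which comparator the column family uses.

```python
from typing import Dict, List, Tuple

RevisionId = Tuple[int, str]


def parse_revision(rev: str) -> RevisionId:
    """Split an ID such as "3,def87a" into (3, "def87a")."""
    number, digest = rev.split(",", 1)
    return int(number), digest


# Hypothetical in-memory stand-in for playlist_change: row key -> {column name -> value}.
playlist_change: Dict[str, Dict[RevisionId, str]] = {
    "spotify:user:mbetter:playlist:1234": {
        parse_revision("1,2bfd16"): "ADD i=0, [A,B,C]",
        parse_revision("2,f7a9ba"): "MOV from=1, to=0, len=1",
        parse_revision("3,def87a"): "REM from=0, len=1",
    },
}


def changes_since(playlist_id: str, revision_number: int) -> List[Tuple[RevisionId, str]]:
    """All changes newer than `revision_number`, in revision order (like a column slice on one row)."""
    row = playlist_change.get(playlist_id, {})
    return [(rev, change) for rev, change in sorted(row.items()) if rev[0] > revision_number]


print(changes_since("spotify:user:mbetter:playlist:1234", 2))
# [((3, 'def87a'), 'REM from=0, len=1')]
```

A client that already holds revision 2 only needs the returned tail, which is why this layout suits syncing.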

Head pointers

● Column family playlist_head stores head pointers

Row key                            | 3,def87a
spotify:user:mbetter:playlist:1234 | <empty>
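Because each head is stored as a column name, the column value can stay empty. The sketch below assumes a hypothetical update rule in which a new revision replaces its parent as a head, and it uses made-up revision IDs 4,aaaaaa and 4,bbbbbb. It shows how two writers that start from the same head leave two heads behind, the branching case from the Branches slide.

```python
from typing import Dict, Set

# Hypothetical stand-in for playlist_head: row key -> {head revision ID -> empty value}.
playlist_head: Dict[str, Dict[str, bytes]] = {
    "spotify:user:mbetter:playlist:1234": {"3,def87a": b""},
}


def current_heads(playlist_id: str) -> Set[str]:
    """The heads are simply the column names of the row."""
    return set(playlist_head.get(playlist_id, {}))


def advance_head(playlist_id: str, parent: str, new_revision: str) -> None:
    """Assumed rule: a new revision replaces its parent as a head."""
    row = playlist_head.setdefault(playlist_id, {})
    row.pop(parent, None)    # in Cassandra, deleting a column writes a tombstone
    row[new_revision] = b""  # all the information is in the column name


pid = "spotify:user:mbetter:playlist:1234"
advance_head(pid, "3,def87a", "4,aaaaaa")  # writer 1, hypothetical revision ID
advance_head(pid, "3,def87a", "4,bbbbbb")  # writer 2 started from the same head
print(sorted(current_heads(pid)))          # ['4,aaaaaa', '4,bbbbbb']
```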

Snapshot cache

● playlist_change works well for syncing
● Not so well for fetching new playlists, which would mean replaying every change
● Snapshot cache

Row key                            | snapshot
spotify:user:mbetter:playlist:1234 | [A, C]
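A minimal sketch of a cached read path follows. The get_playlist function and the replay callback, which stands in for reading and replaying playlist_change, are hypothetical; the talk does not say when or how the snapshot is refreshed, so invalidation on writes is deliberately left out.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for playlist_snapshot: row key -> cached item list.
playlist_snapshot: Dict[str, List[str]] = {}


def get_playlist(playlist_id: str, replay_changes: Callable[[str], List[str]]) -> List[str]:
    """Serve the cached snapshot if present; otherwise rebuild it from the change log once."""
    cached = playlist_snapshot.get(playlist_id)
    if cached is not None:
        return list(cached)                 # fast path: a single row read
    snapshot = replay_changes(playlist_id)  # slow path: replay every change
    playlist_snapshot[playlist_id] = list(snapshot)
    return list(snapshot)


replays: List[str] = []


def replay_from_log(playlist_id: str) -> List[str]:
    """Hypothetical stand-in for replaying playlist_change; returns the slide's final list."""
    replays.append(playlist_id)
    return ["A", "C"]


pid = "spotify:user:mbetter:playlist:1234"
print(get_playlist(pid, replay_from_log), get_playlist(pid, replay_from_log), len(replays))
# ['A', 'C'] ['A', 'C'] 1
```

The second call is served from the snapshot, so the change log is replayed only once.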

Full data model

playlist_snapshot | snapshot
playlist:1234     | [A, C]

playlist_change   | 1,2bfd16         | 2,f7a9ba                | 3,def87a
playlist:1234     | ADD i=0, [A,B,C] | MOV from=1, to=0, len=1 | REM from=0, len=1

playlist_head     | 3,def87a
playlist:1234     | <empty>

The playlist cluster

‣ 90 Cassandra nodes
‣ 18 service hosts
‣ Uses FusionIO solid-state drives
‣ 30 TB of data
‣ 1.5 billion playlists
‣ 170k reqs/s at peak globally
‣ 50 playlists created every second
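Some rough per-unit figures follow from these numbers. This is back-of-the-envelope arithmetic that assumes data and load are spread evenly; the talk does not say whether the 30 TB includes replicas, and treating 50 creations per second as a constant rate over a whole day is also an assumption.

```latex
\begin{aligned}
\frac{30\ \text{TB}}{90\ \text{nodes}} &\approx 0.33\ \text{TB} \approx 333\ \text{GB per node} \\
\frac{170\,000\ \text{req/s}}{18\ \text{service hosts}} &\approx 9\,400\ \text{req/s per host at peak} \\
50\ \text{playlists/s} \times 86\,400\ \text{s/day} &= 4\,320\,000\ \text{playlists/day}
\end{aligned}
```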

Pain points (ouch!)

‣ Repairs
‣ JVM garbage collection
‣ Tombstones
‣ Bulk ingestion

Open source from Spotify

Get yours on spotify.github.io!
– Cassandra Reaper: automates repairs
– Cassandra Ops Tools
– hdfs2cass: bulk load data into Cassandra
– Heroic: time series database backed by Cassandra

Other contributions:
– Date-tiered compaction strategy (DTCS)

Thank you! Questions?

We're hiring! https://www.spotify.com/jobs
Twitter: @SpotifyEng