(sowemine workshop) "#nowplaying on #spotify: leveraging spotify information on twitter for...


Upload: icwe2015

Post on 15-Aug-2015


Page 1

The University of Innsbruck was founded in 1669 and is one of Austria’s oldest universities. Today, with over 28,000 students and 4,000 staff, it is western Austria’s largest institution of higher education and research. For further information visit: www.uibk.ac.at.

#nowplaying on #Spotify: Leveraging Spotify Information on Twitter for Artist Recommendations

Martin Pichl, Eva Zangerle and Günther Specht

Page 2

Agenda

• Why Music Recommendations?

• Dataset Creation & Recommendation Approach

• Discussion and Future Work

Page 3

Recent Trends

• Rise of the web enabled new distribution channels

• Online Stores

• Music Streaming Platforms

• …

• These new distribution channels

– Exploit a worldwide market

– Have virtually no inventory costs

→ More and more diverse music is available

Page 4

Why (Music) Recommender Systems?

• The user is confronted with more and more diverse music

– on streaming platforms

– in online stores

– on mobile devices

• and has a free choice

• Users often do not know what to listen to

→ Information Overload

• Recommender Systems

– Help users find music they like

→ Increase usability

Page 5

Research on Music Recommender Systems

• Publicly available data is necessary

• Twitter

– People share what they are listening to at the moment

• Get additional information from Spotify

– Additional listening events

– Additional information about the tracks and the artists

– Additional information about the listening context

• This additional information is necessary to build a more specialized recommender system

Page 6

Example Tweets

Page 7

The Dataset

• Dataset generated from tweets, containing

– <UserID, ArtistID, TrackID>-triples

– Boolean preferences (listened/not listened)

• Cleaning

– Removed duplicates

– Removed certain accounts, e.g. @SpotifyNowPlaying

– Removed “Various Artists”
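The cleaning steps above can be sketched as a single pass over the raw events. This is a minimal sketch, not the authors' pipeline: the raw record layout (user handle, artist ID, track ID, artist name), the use of the Twitter handle as the user ID, and an exclusion list containing only the account named on the slide are all assumptions. Each surviving triple stands for a boolean "listened" preference.

```python
from typing import Iterable, List, Set, Tuple

# Assumed raw record: (user_id, artist_id, track_id, artist_name).
# The artist name is kept only so "Various Artists" entries can be filtered.
RawEvent = Tuple[str, str, str, str]
Triple = Tuple[str, str, str]  # <UserID, ArtistID, TrackID>

# Hypothetical exclusion list; only @SpotifyNowPlaying is named on the slide.
EXCLUDED_ACCOUNTS = {"@SpotifyNowPlaying"}


def clean_events(events: Iterable[RawEvent]) -> List[Triple]:
    """Return deduplicated <UserID, ArtistID, TrackID> triples in input order."""
    seen: Set[Triple] = set()
    cleaned: List[Triple] = []
    for user, artist, track, artist_name in events:
        if user in EXCLUDED_ACCOUNTS:
            continue  # drop excluded accounts
        if artist_name == "Various Artists":
            continue  # compilation placeholder, not a real artist
        triple = (user, artist, track)
        if triple in seen:
            continue  # duplicate listening event
        seen.add(triple)
        cleaned.append(triple)
    return cleaned


raw = [
    ("@alice", "a1", "t1", "Daft Punk"),
    ("@alice", "a1", "t1", "Daft Punk"),         # duplicate
    ("@SpotifyNowPlaying", "a2", "t2", "Muse"),  # excluded account
    ("@bob", "a3", "t3", "Various Artists"),     # filtered artist
    ("@bob", "a2", "t2", "Muse"),
]
print(clean_events(raw))  # [('@alice', 'a1', 't1'), ('@bob', 'a2', 't2')]
```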

Page 8

Dataset Snapshot

• Dataset contains

– 513,489 listening events

– by 68,045 unique users

– listening to 97,586 unique tracks

– by 40,593 unique artists

• Distribution

– On average 4.77 tweets per user (SD = 30.02)

– Median of 2
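A mean well above the median together with a large SD suggests a long-tailed distribution: most users tweet only a few times, while a few users tweet very often. The sketch below shows how such per-user statistics can be computed from one user ID per tweet; whether the slide reports the sample or the population SD is not stated, so the use of `statistics.stdev` (sample SD) is an assumption.

```python
from collections import Counter
from statistics import mean, median, stdev
from typing import List, Tuple


def tweets_per_user(user_ids: List[str]) -> Tuple[float, float, float]:
    """Return (mean, sample SD, median) of the number of tweets per user.

    `user_ids` holds one entry per listening event (tweet).
    """
    counts = list(Counter(user_ids).values())
    return mean(counts), stdev(counts), median(counts)


# Toy data: three users with 3, 1 and 2 tweets.
print(tweets_per_user(["u1", "u1", "u1", "u2", "u3", "u3"]))
```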

Page 9

Artist Recommendations using this Dataset

• No content based information

– Recommendations are computed using collaborative filtering

• Collaborative Filtering (CF)

– CF recommends items that a user's most similar users listened to (and that are new to the user)

• CF relies on

– A user similarity measure

– A number of nearest neighbors $k$
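The two ingredients listed above (a similarity measure and a neighbourhood size $k$) are enough for a user-based collaborative filter over boolean preferences. This is a minimal sketch, not the authors' implementation: how the neighbours' artists are aggregated is not stated on the slide, so scoring each candidate by the summed similarity of the neighbours who listened to it is an assumption, and the `overlap` similarity is only a placeholder for the measure defined on the next slide.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

Profiles = Dict[str, Set[str]]  # user -> artists listened to (boolean preferences)


def recommend(
    target: str,
    profiles: Profiles,
    similarity: Callable[[str, str], float],
    k: int,
    n: int,
) -> List[str]:
    """User-based k-nearest-neighbour CF over boolean preferences (sketch)."""
    # Score every other user once and keep the k most similar ones.
    sims = {u: similarity(target, u) for u in profiles if u != target}
    neighbours = sorted(sims, key=lambda u: sims[u], reverse=True)[:k]

    # Assumed aggregation: an artist's score is the summed similarity of the
    # neighbours who listened to it; artists the target already knows are skipped.
    scores: Dict[str, float] = defaultdict(float)
    for u in neighbours:
        for artist in profiles[u] - profiles[target]:
            scores[artist] += sims[u]

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda a: (-scores[a], a))[:n]


profiles = {
    "alice": {"a", "b"},
    "bob": {"a", "b", "c"},
    "carol": {"x", "y"},
    "dave": {"a", "d"},
}


def overlap(i: str, j: str) -> float:
    # Placeholder similarity (Jaccard over artist sets), used only for this demo.
    union = profiles[i] | profiles[j]
    return len(profiles[i] & profiles[j]) / len(union) if union else 0.0


print(recommend("alice", profiles, overlap, k=2, n=2))  # ['c', 'd']
```

Here Bob (similarity 2/3) and Dave (1/3) are Alice's two nearest neighbours, so Bob's unseen artist `c` ranks above Dave's `d`.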

Page 10

User Similarity

• Boolean Preferences

– Jaccard Coefficient is suitable

– $\mathit{Jaccard}_{i,j} = \frac{|S_i \cap S_j|}{|S_i \cup S_j|}$

• Include all the information available

– Compute Jaccard Coefficient using the artist listening history

– Compute Jaccard Coefficient using the track listening history

– Combined using a weighted average

• $\mathit{userSim} = w_a \cdot \mathit{artistSim} + w_t \cdot \mathit{trackSim}$
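The similarity above can be sketched directly from its definition: one Jaccard coefficient on the artist histories, one on the track histories, then the weighted combination. Returning 0.0 for two empty histories is an assumed convention, since the coefficient is undefined in that case; the demo reuses the weights reported in the evaluation.

```python
from typing import Set


def jaccard(s_i: Set[str], s_j: Set[str]) -> float:
    """|S_i ∩ S_j| / |S_i ∪ S_j|; 0.0 for two empty sets (assumed convention)."""
    union = s_i | s_j
    return len(s_i & s_j) / len(union) if union else 0.0


def user_sim(
    artists_i: Set[str], artists_j: Set[str],
    tracks_i: Set[str], tracks_j: Set[str],
    w_a: float, w_t: float,
) -> float:
    """userSim = w_a * artistSim + w_t * trackSim."""
    artist_sim = jaccard(artists_i, artists_j)
    track_sim = jaccard(tracks_i, tracks_j)
    return w_a * artist_sim + w_t * track_sim


# Artist overlap 1/3, track overlap 1/2 -> 0.21/3 + 0.94/2, roughly 0.54.
print(user_sim({"a1", "a2"}, {"a2", "a3"}, {"t1", "t2"}, {"t2"}, w_a=0.21, w_t=0.94))
```

With $w_a = 0.21$ and $w_t = 0.94$ the weights sum to 1.15, so the formula as written is a weighted sum; dividing by $w_a + w_t$ would turn it into a proper weighted average without changing which users are the $k$ nearest neighbours, because it only rescales every similarity by the same positive constant.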

Page 11

Parameter Tuning

• Input Parameters

– $w_a$, $w_t$, $k$

– Optimized using a Genetic Algorithm (GA)

– Fitness = Precision of the recommender system

– On average, a good solution was found after 4.14 iterations (SD = 2.27)

Page 12

Genetic Algorithm

• $w_a$, $w_t$ and $k$ are floating-point genes between 0 and 1 that together form an individual

• Random initial distribution

• The fitness of each individual is measured using the precision

• Crossover and mutation of the best individuals

• Terminate if the precision is 1 or a certain number of generations is reached
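The genetic algorithm described above can be sketched as follows. This is a minimal sketch under several assumptions that the slides do not state: the population size, the number of surviving parents, uniform crossover, random-reset mutation, and the `decode` helper, which is a hypothetical way of mapping the $k$ gene from [0, 1] to an integer neighbourhood size (the evaluation uses $k = 59$). In the real setting `fitness` would run the recommender and return its precision; here a toy function stands in for it.

```python
import random
from typing import Callable, List, Optional, Tuple

Genome = List[float]  # [w_a, w_t, k_gene], each gene a float in [0, 1]


def decode(genome: Genome, k_max: int = 100) -> Tuple[float, float, int]:
    """Hypothetical mapping of the k gene to an integer in [1, k_max]."""
    w_a, w_t, k_gene = genome
    return w_a, w_t, max(1, round(k_gene * k_max))


def evolve(
    fitness: Callable[[Genome], float],  # e.g. precision of the recommender
    pop_size: int = 20,
    max_generations: int = 50,
    n_parents: int = 4,
    mutation_rate: float = 0.1,
    seed: Optional[int] = None,
) -> Tuple[Genome, float]:
    rng = random.Random(seed)
    # Random initial population: every gene drawn uniformly from [0, 1].
    population = [[rng.random() for _ in range(3)] for _ in range(pop_size)]
    best: Genome = population[0]
    best_fit = float("-inf")

    for _ in range(max_generations):
        # Evaluate each individual once and sort by fitness, best first.
        scored = sorted(((fitness(g), g) for g in population), key=lambda p: p[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0]
        if best_fit >= 1.0:
            break  # a precision of 1 cannot be improved: stop early

        # The best individuals survive unchanged and act as parents.
        parents = [g for _, g in scored[:n_parents]]
        next_population = [list(p) for p in parents]
        while len(next_population) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene is taken from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: with probability mutation_rate, redraw a gene from [0, 1].
            child = [rng.random() if rng.random() < mutation_rate else x for x in child]
            next_population.append(child)
        population = next_population

    return best, best_fit


TARGET = [0.21, 0.94, 0.59]  # toy optimum for the demo only


def toy_fitness(genome: Genome) -> float:
    # Stand-in for precision: 1 minus the mean absolute distance to TARGET.
    return 1 - sum(abs(x - t) for x, t in zip(genome, TARGET)) / 3


genome, fit = evolve(toy_fitness, seed=1)
print(decode(genome), round(fit, 3))
```

Keeping the best individuals unchanged (elitism) guarantees that the best fitness found never decreases from one generation to the next.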

Page 13

The Big Picture

Page 14

Evaluation Setup

• Offline evaluation

– From each user, we removed 1/3 of the listening events for testing

– Recommended $p \cdot |\text{test set}|$ items

– Varied $p$ between 0 and 1

– Computed precision and recall for each $p$

• Parameters used for the evaluation

– $w_a = 0.21$

– $w_t = 0.94$

– $k = 59$
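The per-user hold-out and the size of each recommendation list can be sketched as below. The slides do not say whether the held-out third was chosen randomly or chronologically, nor how $p \cdot |\text{test set}|$ is rounded, so the random shuffle and Python's `round` are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_user_events(
    events: Sequence[str], test_fraction: float = 1 / 3, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Hold out a fraction of one user's listening events for testing (random split assumed)."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training, test)


def num_recommendations(p: float, test_size: int) -> int:
    """Number of items to recommend: p times the size of the user's test set, rounded."""
    return round(p * test_size)


train, test = split_user_events(["t1", "t2", "t3", "t4", "t5", "t6"])
print(len(train), len(test))  # 4 2
print(num_recommendations(0.5, 30))  # 15
```

For users with very few listening events the test set holds only one or two items, so small values of $p$ can round down to zero recommendations.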

Page 15

Evaluation Metrics

• Hit: a recommended item that is found in the test set

• $\mathit{precision} = \frac{\mathit{hits}}{\text{number of recommendations}}$

• Relevant items: all items in the test set

• $\mathit{recall} = \frac{\mathit{hits}}{\text{size of the test set}}$
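Both metrics follow directly from the definitions above. In this sketch the recommendations are assumed to be distinct, and returning 0.0 for an empty recommendation list or empty test set is an assumed convention.

```python
from typing import List, Set, Tuple


def precision_recall(recommended: List[str], test_set: Set[str]) -> Tuple[float, float]:
    """precision = hits / #recommendations, recall = hits / |test set|."""
    hits = sum(1 for item in recommended if item in test_set)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(test_set) if test_set else 0.0
    return precision, recall


# 2 of the 4 recommendations are hits; the test set holds 3 items.
print(precision_recall(["a", "b", "c", "d"], {"a", "c", "e"}))  # (0.5, 0.6666666666666666)
```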

Page 16

Performance of the optimized Recommender System

n     Precision   Recall
1     0.4656      0.0228
2     0.3622      0.0547
3     0.3137      0.0782
4     0.2812      0.1003
5     0.2531      0.1195
6     0.2315      0.1286
7     0.2170      0.1396
8     0.2170      0.1396
9     0.1871      0.1583
10    0.1871      0.1583

[Figure: precision and recall (y-axis, 0 to 0.5) plotted against the number of recommendations (% of the test set)]

Page 17

Discussion

• Heading in the right direction

• Performance is limited for a high number of recommendations

– Data sparsity

– Approach is too general

• Performance improvements through

– Reducing data sparsity

– A specialized algorithm better suited to music recommendation

Page 18

Next Steps towards a more specialized RS

• Match Spotify and Twitter Users

– Early experiments show that we can match ~10% of the dataset

– Better matching than using the username and played tracks?

• Extract listening context from playlist names, e.g.

– Christmas

– Workout, training

– Driving

– …
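A naive first step towards extracting listening context from playlist names is whole-word keyword matching. This is only a sketch of one possible approach, not the authors' planned method: the context labels and the keyword map are hypothetical and contain only the examples named above, so variants such as "Xmas" or non-English names would be missed.

```python
import re
from typing import Dict, Set

# Hypothetical keyword map built only from the contexts named on the slide.
CONTEXT_KEYWORDS: Dict[str, Set[str]] = {
    "christmas": {"christmas"},
    "workout": {"workout", "training"},
    "driving": {"driving"},
}


def extract_contexts(playlist_name: str) -> Set[str]:
    """Return every context whose keywords occur as whole words in the playlist name."""
    tokens = set(re.findall(r"[a-z]+", playlist_name.lower()))
    return {context for context, words in CONTEXT_KEYWORDS.items() if tokens & words}


print(sorted(extract_contexts("My Christmas Workout Mix!")))  # ['christmas', 'workout']
```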

Page 19

Next Steps towards a more specialized RS

• The offline evaluation is rather limited

• Create an intuitive web interface

• Conduct a live user experiment

Page 20

Acknowledgments