#nowplaying Music Dataset: Extracting Listening Behavior from Twitter

Upload: evazangerle

Post on 04-Jul-2015

The University of Innsbruck was founded in 1669 and is one of Austria’s oldest universities. Today, with over 28,000 students and 4,000 staff, it is western Austria’s largest institution of higher education and research. For further information visit: www.uibk.ac.at.

#nowplaying Music Dataset:

Extracting Listening Behavior from Twitter

Eva Zangerle, Martin Pichl, Wolfgang Gassler, Günther Specht

Motivation

• Evaluation of music recommender systems and music information retrieval systems
  • User study (qualitative)
  • Automatic evaluation (quantitative)
• Evaluation dataset requirements
  • Up-to-date
  • Large size (sparsity!)
  • Publicly available
• Social media data has hardly been used for such evaluations [Schedl et al., Bertin-Mahieux et al.]

Comparison to other Datasets

Name         Type          Entries      #Artists  #Tracks    #Users     Upd.
Celma 1K     User Streams  19,150,819   174,090   1,084,865  992
Celma 360K   User Streams  17,559,530   292,557   ---        359,349
MMTD         User Streams  1,086,808    25,060    133,968    15,735
MSD          Audio         1,000,000    44,745    1,000,000  ---
MusicMicro   User Streams  594,306      19,529    71,400     136,866
HetRec       Ratings       92,384       17,632    ---        1,892
Yahoo!       Ratings       717,872,016  9,441     136,735    1,800,000

Why not use Twitter?

#nowplaying on Twitter

Crawling Data from Twitter

• Crawl public API for #nowplaying, #np, #listeningto
• Twitter Spritzer
• Crawling since 2011/07/11
• 140 million raw tweets

Quality?

Reference Dataset Link with other sources?
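
The hashtag filter behind the crawl can be sketched as follows. This is a minimal illustration, not the crawler used for the dataset: the function name `is_listening_tweet` and the matching rules (case-insensitive, whole hashtag only) are assumptions, and the Twitter API calls are left out.

```python
import re

# Hashtags named on the crawling slide.
HASHTAGS = ("nowplaying", "np", "listeningto")

# Assumed matching rules: case-insensitive, and the hashtag must end at a
# word boundary so that "#np" does not match "#npr".
HASHTAG_RE = re.compile(
    r"(?<!\w)#(?:" + "|".join(HASHTAGS) + r")\b",
    re.IGNORECASE,
)


def is_listening_tweet(text: str) -> bool:
    """Return True if the tweet text carries one of the crawled hashtags."""
    return HASHTAG_RE.search(text) is not None


if __name__ == "__main__":
    for tweet in (
        "Sheryl Crow – If It Makes You #nowplaying",
        "http://t.co/ifGX8BJ11D #NowPlaying",
        "listening to #npr this morning",
    ):
        print(is_listening_tweet(tweet), tweet)
```

A filter like this only selects candidate tweets. It says nothing about whether the text actually names a track, which is the quality question raised on the slide.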

Cleaning the Dataset

• Examples:

Sheryl Crow – If It Makes You #nowplaying

Sheryl Crow – If it Makes You Happy

http://t.co/qNr8zeoQTj <-- LIKE THE FACEBOOK PAGE!!!

#teamfollowback #follow #instagram #nowplaying

@NickiMinaj #PinkFridayTour #Setlist

NICKI MINAJ - Pink Friday Tour

http://t.co/ifGX8BJ11D #NowPlaying

Solution: match with MusicBrainz
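
The matching idea can be sketched as below, using two of the example tweets from this slide. This is a simplified illustration under stated assumptions, not the pipeline used for the dataset: a small in-memory list of (artist, title) pairs stands in for MusicBrainz, the string similarity comes from Python's `difflib`, and the helper names and the 0.8 threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher

URL_RE = re.compile(r"https?://\S+")
TAG_RE = re.compile(r"[#@]\w+")  # hashtags and @mentions


def clean_tweet(text):
    """Strip URLs, hashtags and mentions, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    return " ".join(text.split())


def best_match(cleaned, candidates, threshold=0.8):
    """Return the (artist, title) pair most similar to the cleaned text,
    or None if no candidate reaches the (assumed) threshold."""
    best, best_score = None, 0.0
    for artist, title in candidates:
        reference = f"{artist} {title}".lower()
        score = SequenceMatcher(None, cleaned.lower(), reference).ratio()
        if score > best_score:
            best, best_score = (artist, title), score
    return best if best_score >= threshold else None


# Stand-in for a MusicBrainz lookup (hypothetical local reference list).
REFERENCE = [
    ("Sheryl Crow", "If It Makes You Happy"),
    ("Nicki Minaj", "Starships"),
]

if __name__ == "__main__":
    truncated = clean_tweet("Sheryl Crow – If It Makes You #nowplaying")
    spam = clean_tweet(
        "http://t.co/qNr8zeoQTj <-- LIKE THE FACEBOOK PAGE!!! "
        "#teamfollowback #follow #instagram #nowplaying"
    )
    print(best_match(truncated, REFERENCE))
    print(best_match(spam, REFERENCE))
```

With this reference list, the truncated Sheryl Crow tweet resolves to the full track, while the spam tweet finds no match and would be discarded.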

Server

• Icinga monitoring
• D2R mapping

#nowplaying-Dataset

Dataset – Extracted Elements

ListeningEvents (Tweets)

Geo-information

Tracks

Artists

MusicBrainz

User (Hash)

Timestamp

Source
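
One way to picture a listening event with the elements listed above is a small record type. This is only a sketch: all field names are hypothetical, and the salted SHA-256 used to pseudonymize the user is an assumption, since the slide only states that users are stored as a hash.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class ListeningEvent:
    """One tweet interpreted as a listening event (field names are hypothetical)."""
    user_hash: str                                # pseudonymized user
    timestamp: datetime                           # when the tweet was posted
    source: str                                   # client that sent the tweet
    artist: str
    track: str
    musicbrainz_id: Optional[str] = None          # set when the track was resolved
    geo: Optional[Tuple[float, float]] = None     # (latitude, longitude), if present


def pseudonymize(user_id: str, salt: str) -> str:
    """Hash a user id so the raw account is not stored (assumed scheme)."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()


# Example values are made up for illustration.
event = ListeningEvent(
    user_hash=pseudonymize("12345", salt="example-salt"),
    timestamp=datetime(2014, 10, 31, 12, 0, tzinfo=timezone.utc),
    source="Spotify",
    artist="Sheryl Crow",
    track="If It Makes You Happy",
)
```

Keeping `musicbrainz_id` and `geo` optional reflects that not every tweet can be resolved or carries location data.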

Dataset Overview (as of 2014/10/31)

Name                               Number
ListeningEvents                    57,963,410
Tracks distinct                    1,429,627
Artists distinct                   149,765
Users distinct                     4,809,337
Avg. LE per user                   12 (SD=680.13, M=1)
Avg. LE per track                  40 (SD=606, M=3)
Avg. LE per artist                 388 (SD=3844, M=8)
Avg. new listeningEvents per day   64,278 (SD=70302, M=15831)
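
Per-user, per-track and per-artist figures of this kind can be derived from a list of events as sketched below. The sketch assumes that M denotes the median and that SD is the population standard deviation; neither is stated on the slide. The toy user list is made up and is not taken from the dataset.

```python
from collections import Counter
from statistics import mean, median, pstdev


def per_key_stats(keys):
    """Count events per key and return (mean, SD, median) of those counts."""
    counts = list(Counter(keys).values())
    return mean(counts), pstdev(counts), median(counts)


# Toy data: one very active user and four users with a single event each.
toy_users = ["u1"] * 20 + ["u2", "u3", "u4", "u5"]

if __name__ == "__main__":
    avg, sd, med = per_key_stats(toy_users)
    print(f"avg={avg:.1f} SD={sd:.1f} M={med}")
```

Even in this toy case the mean sits well above the median and the standard deviation exceeds the mean, the same pattern as in the overview table: a few very active users dominate while most users contribute a single event.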

Longtail Distributions

Longtail Distributions

Artists and Sources

Top-10 Artists

Rihanna

Coldplay

Taylor Swift

Bruno Mars

One Direction

Maroon 5

Adele

Drake

Katy Perry

Eminem

Top-10 Sources

Securenet Systems Radio Playlist Update

Spotify

Web

Twitter for iPhone

SAM Broadcaster Song Info

Twitter for Android

iOS

BigURL

Twitter for Blackberry

Now Playing

Last.fm Tags & Genres

Comparison to other Datasets

Name         Type          Entries      #Artists  #Tracks    #Users     Upd.
#nowplaying  User Streams  57,963,410   149,765   1,429,627  4,809,337
Celma 1K     User Streams  19,150,819   174,090   1,084,865  992
Celma 360K   User Streams  17,559,530   292,557   ---        359,349
MMTD         User Streams  1,086,808    25,060    133,968    15,735
MSD          Audio         1,000,000    44,745    1,000,000  ---
MusicMicro   User Streams  594,306      19,529    71,400     136,866
HetRec       Ratings       92,384       17,632    ---        1,892
Yahoo!       Ratings       717,872,016  9,441     136,735    1,800,000

Accessing the Dataset

Access to the Dataset

• dbis-nowplaying.uibk.ac.at
  • HTML View
  • SPARQL Endpoint
  • RDF Dump
  • RDF Online Browser
  • Online SPARQL Query Interface
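
Querying a SPARQL endpoint over HTTP could look roughly like the sketch below. The slides give only the host name, so the `http://` scheme and the `/sparql` path in `ENDPOINT` are assumptions, as are the helper names. The query is deliberately generic because the dataset's RDF vocabulary is not shown here. The request follows the standard SPARQL protocol (GET with a `query` parameter).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint URL: only the host name appears on the slide.
ENDPOINT = "http://dbis-nowplaying.uibk.ac.at/sparql"

# Generic query that works against any RDF store: fetch ten arbitrary triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_request(endpoint, query):
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(endpoint, query, timeout=30):
    """Send the query and return the list of result bindings."""
    with urlopen(build_request(endpoint, query), timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    for row in run_query(ENDPOINT, QUERY):
        print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```

Inspecting a handful of triples this way is a quick route to discovering which classes and properties the dataset uses before writing more specific queries.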

Access to the Dataset

Conclusion

• dbis-nowplaying.uibk.ac.at
• Steadily growing dataset
• Available freely via API
• Open problems:
  • Only 30% resolvable against MusicBrainz
  • No rating involved
• Which further information do you need?
• Further interfaces?

Interested in working with us?

Questions?

Contact and Social Media

@[email protected]
www.evazangerle.at
http://dbis-informatik.uibk.ac.at
@dbisibk
https://www.facebook.com/dbisibk