Transcript
Page 1: Combining Audio Content and Social Context for Semantic Music Discovery

Combining Audio Content and Social Context for Semantic Music Discovery

José Carlos Delgado Ramos

Universidad Católica San Pablo

Page 2: Combining Audio Content and Social Context for Semantic Music Discovery

I. Introduction

II. Sources of Music Information

III. Combining multiple sources of music information

IV. Experiments

Page 3: Combining Audio Content and Social Context for Semantic Music Discovery

Introduction

• Most music IR systems focus on either content-based analysis of audio signals…

Page 4: Combining Audio Content and Social Context for Semantic Music Discovery

Introduction

• Or content-based analysis of webpages…

Page 5: Combining Audio Content and Social Context for Semantic Music Discovery

Introduction

• …user preference information…

Page 6: Combining Audio Content and Social Context for Semantic Music Discovery

Introduction

• … and social tagging data.

Page 7: Combining Audio Content and Social Context for Semantic Music Discovery

Tags

• Short text-based tokens

• Helpful when describing songs

Page 8: Combining Audio Content and Social Context for Semantic Music Discovery

Tags

• Not always accurate: the strength of the semantic association between each song and each tag may vary.

Page 9: Combining Audio Content and Social Context for Semantic Music Discovery

Sources of semantic information

• Surveys

• Social tagging websites

• Annotation games

Page 10: Combining Audio Content and Social Context for Semantic Music Discovery

Relevance of tags to songs

• May be determined by using content-based audio analysis or by text-mining associated web documents.

Page 11: Combining Audio Content and Social Context for Semantic Music Discovery

Main sources for information retrieval

• Audio content, Social tags and Web documents

• Audio signals are also analyzed using two acoustic feature representations, one related to timbre and one to harmony.

Page 12: Combining Audio Content and Social Context for Semantic Music Discovery

Sources of Music Information

• A relevance score function r(s;t) is derived; evaluates the relevance of a song s to a tag t.

• Song-tag representations are dense if based on audio content, sparse if based on social representations.

Page 13: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Audio Content: Supervised Multiclass Labeling (SML)

• Audio track s represented as a bag of feature vectors X = {x1,x2,…,xT}

• 1: Expectation maximization algorithm.

• 2: Identify the set of example songs with a given tag.

• 3: Mixture-hierarchies expectation maximization algorithm.

Page 14: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Audio Content: Supervised Multiclass Labeling (SML)

• Given a song s, X is extracted and its likelihood is evaluated using each of the tag GMMs.

• Result: a vector of probabilities. Relevance of song s to a tag t may be written as:
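A minimal sketch of how per-tag likelihoods could be turned into such a probability vector. It assumes a uniform prior over tags and works with log-likelihoods for numerical stability; both choices, the function name `tag_posteriors`, and the example values are illustrative assumptions rather than anything stated on the slide.

```python
import math

def tag_posteriors(log_likelihoods):
    """Normalize per-tag log-likelihoods log p(X|t) into a probability vector.

    Assumption: every tag has the same prior probability, so the posterior
    is proportional to the likelihood.
    """
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_likelihoods.values())
    shifted = {t: math.exp(ll - m) for t, ll in log_likelihoods.items()}
    total = sum(shifted.values())
    return {t: v / total for t, v in shifted.items()}

# Hypothetical log-likelihoods of one song under three tag GMMs.
scores = tag_posteriors({"rock": -1200.0, "jazz": -1203.0, "calm": -1210.0})
```

The resulting values sum to one and preserve the ordering of the likelihoods, so they can be used directly as dense relevance scores for ranking.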

Page 15: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Audio Content: Audio feature representations

• Mel Frequency Cepstral Coefficients (MFCC): associated with musical notion of timbre.

• Chroma: represents the harmonic content (keys, chords) by computing spectral energy at frequencies corresponding to the chromatic scale.
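The chroma idea can be sketched as folding spectral energy into the 12 pitch classes of the chromatic scale. The sketch below assumes equal temperament with A4 = 440 Hz and a spectrum given as (frequency, energy) pairs; the function names and the example spectrum are hypothetical, and real chroma extractors add windowing and tuning steps not shown here.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to the index of its nearest pitch class (0 = C)."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))  # MIDI note 69 is A4
    return midi % 12

def chroma(spectrum):
    """Sum energy per pitch class and normalize to a 12-bin vector."""
    bins = [0.0] * 12
    for freq_hz, energy in spectrum:
        if freq_hz > 0:
            bins[pitch_class(freq_hz)] += energy
    total = sum(bins)
    return [b / total for b in bins] if total > 0 else bins

# Hypothetical spectrum peaks for a C major chord (C4, E4, G4, C5).
c_major = chroma([(261.63, 1.0), (329.63, 1.0), (392.00, 1.0), (523.25, 1.0)])
```

Because octaves are folded together, the two C peaks land in the same bin, which is what makes chroma useful for describing keys and chords independently of register.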

Page 16: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Social Context:

• Summarize each song with annotation vector over a vocabulary of tags.

• Methods for retrieving tags: social tags and web-mined tags.

• Missing song-tag pair: the tag is either not relevant, or relevant but not annotated.
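A small sketch of an annotation vector over a fixed tag vocabulary, keeping missing song-tag pairs distinct from a score of zero. The vocabulary, the song data, and the use of `None` as the missing marker are assumptions made for illustration.

```python
VOCAB = ["rock", "jazz", "calm", "guitar"]  # hypothetical tag vocabulary

def annotation_vector(tag_scores, vocabulary=VOCAB):
    """Return one entry per vocabulary tag; None marks a missing pair."""
    return [tag_scores.get(tag) for tag in vocabulary]

def density(vectors):
    """Fraction of song-tag pairs that actually carry a score."""
    entries = [v for vec in vectors for v in vec]
    return sum(v is not None for v in entries) / len(entries)

# Hypothetical social-context scores for two songs.
songs = [
    annotation_vector({"rock": 80, "guitar": 35}),
    annotation_vector({"calm": 12}),
]
sparsity_check = density(songs)
```

Keeping `None` rather than 0 leaves the later combination step free to decide how a missing pair should be treated.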

Page 17: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Social Context: Social Tags

• Last.FM: music discovery website.

• 20 million users a month annotate 3.8 million items over 50 million times, using a universe of 1.2 million tags.

• Last.FM database: 150 million songs by 16 million artists.

Page 18: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Social Context: Social Tags

Page 19: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Social Context: Social Tags

• Two lists of social Last.FM tags for each song: relating song to tags, and relating artist to tags.

• Relevance Tsocial(s,t) = artist-list tag scores + song-list tag scores + tag scores for synonyms or wildcard matches of t on either list.
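The sum described above can be sketched as follows. The tag lists, scores, synonym set, and wildcard pattern are all hypothetical, and shell-style matching via `fnmatch` is only one possible reading of "wildcard matches"; the slide does not specify the matching rules.

```python
from fnmatch import fnmatch

def social_relevance(tag, song_tags, artist_tags, synonyms=(), wildcards=()):
    """Sum the scores of every list entry that matches the query tag.

    An entry matches if it equals the tag or a synonym, or if it fits one of
    the shell-style wildcard patterns. Each entry is counted at most once.
    """
    names = {tag, *synonyms}
    score = 0.0
    for tag_list in (song_tags, artist_tags):
        for name, value in tag_list.items():
            if name in names or any(fnmatch(name, p) for p in wildcards):
                score += value
    return score

# Hypothetical Last.FM-style tag lists for one song and its artist.
song_tags = {"hip hop": 40, "rap": 25, "90s": 10}
artist_tags = {"hip-hop": 60, "hip hop": 30, "east coast": 5}

relevance = social_relevance(
    "hip hop", song_tags, artist_tags,
    synonyms=("rap",), wildcards=("hip*hop",),
)
```

Here the song list contributes 40 + 25 and the artist list contributes 60 + 30, giving a relevance of 155.0.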

Page 20: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Social Context: Web-Mined Tags

• Relevance Scoring (RS) algorithm.

• The relevance function is a function of tag frequency, document frequency, total number of words in the documents, etc.

• Site-specific queries on high-quality websites.

• Steps: collect the document corpus, then tag the songs.

Page 21: Combining Audio Content and Social Context for Semantic Music Discovery

Combining multiple sources of music information

• Given a query tag t, the goal is to find a simple rank ordering of songs based on their relevance to t.

• Tag scores, web-relevance scores and convex optimization are used.

• Three algorithms, all supervised: each uses labeled training data for learning.

Page 22: Combining Audio Content and Social Context for Semantic Music Discovery

Calibrated Score Averaging (CSA)

• Using training data, we can learn a function g() that calibrates scores such that

• To learn g(), we start with a rank-ordered training set of N songs where

• If the data is perfectly ordered, then g is isotonic. Otherwise:

Page 23: Combining Audio Content and Social Context for Semantic Music Discovery

Calibrated Score Averaging (CSA)

• E.g. 7 songs with relevance scores (1,2,4,5,6,7,9) and ground truth labels (0,1,0,1,1,0,1)

• Then g(r) = 0 for r < 2, g(r) = ½ for 2<=r<5, g(r) = 2/3 for 5<=r<9 and g(r) = 1 for 9<=r.

• A missing song-tag score suggests the tag isn't relevant. Instead:
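As an aside on the seven-song example above: the calibrated values 0, ½, 2/3 and 1 can be reproduced with pool-adjacent-violators (PAV), a standard way to fit a non-decreasing function to imperfectly ordered labels. The slide does not name the procedure it uses, so PAV is an assumption here, as are the function names and the choice of a step function between training scores.

```python
from bisect import bisect_right

def pav(labels):
    """Pool-adjacent-violators on labels sorted by increasing score."""
    blocks = []  # each block is [sum of labels, count]
    for y in labels:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def make_calibrator(scores, labels):
    """Return a step function g(r) fitted to the training scores and labels."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    xs = [scores[i] for i in order]
    ys = pav([labels[i] for i in order])

    def g(r):
        # Use the fitted value of the largest training score not exceeding r.
        idx = max(bisect_right(xs, r) - 1, 0)
        return ys[idx]

    return g

g = make_calibrator([1, 2, 4, 5, 6, 7, 9], [0, 1, 0, 1, 1, 0, 1])
fitted = pav([0, 1, 0, 1, 1, 0, 1])
```

PAV pools the out-of-order labels at scores 2 and 4 into a block with mean ½, and those at scores 5, 6 and 7 into a block with mean 2/3, which makes the fitted values non-decreasing.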

Page 24: Combining Audio Content and Social Context for Semantic Music Discovery

RankBoost algorithm

• For a given song, each weak ranking function is an indicator function that outputs 1 if the score for the associated representation is greater than a threshold, or if the score is missing and the default value is set to 1. Otherwise it outputs 0.
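A sketch of such a weak ranker and of the weighted sum that boosting builds from several of them. The representation names, thresholds, default values, and weights below are hypothetical; in RankBoost they would be learned from training data, which this sketch does not attempt.

```python
def weak_ranker(score, threshold, default):
    """Indicator: 1 if score > threshold; the default (0 or 1) if missing."""
    if score is None:
        return default
    return 1 if score > threshold else 0

def combined_rank_score(song_scores, rankers):
    """Weighted sum of weak rankers, one per (representation, threshold,
    default, weight) entry. Higher means ranked as more relevant."""
    return sum(
        weight * weak_ranker(song_scores.get(rep), threshold, default)
        for rep, threshold, default, weight in rankers
    )

# Hypothetical weak rankers: (representation, threshold, default, weight).
rankers = [
    ("mfcc", 0.3, 0, 0.8),
    ("social", 10.0, 1, 0.5),
    ("web", 0.2, 0, 0.4),
]

song_a = {"mfcc": 0.5, "web": 0.1}                 # social score is missing
song_b = {"mfcc": 0.2, "social": 4.0, "web": 0.6}

score_a = combined_rank_score(song_a, rankers)
score_b = combined_rank_score(song_b, rankers)
```

Song A passes the MFCC threshold and receives the default of 1 for its missing social score, so it outranks song B, which only passes the web threshold.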

Page 25: Combining Audio Content and Social Context for Semantic Music Discovery

Kernel Combination SVM (KC-SVM)

• Linear combination of M different kernels that each encode different data features:

• Since each kernel matrix Km is positive semi-definite, their positive-weighted sum K is also a valid positive semi-definite kernel.
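The reasoning is that for any vector z, the quadratic form of the weighted sum is the weighted sum of the individual quadratic forms, each of which is non-negative. The sketch below forms such a combination and spot-checks the quadratic form on random vectors; the matrices and weights are made-up examples, and the random check illustrates the property rather than proving it.

```python
import random

def combine_kernels(kernels, weights):
    """Entry-wise weighted sum of equally sized square kernel matrices."""
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for K, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]

def quad_form(K, z):
    """Compute z^T K z."""
    n = len(z)
    return sum(z[i] * K[i][j] * z[j] for i in range(n) for j in range(n))

# Two hypothetical positive semi-definite kernel matrices over two songs.
K1 = [[1.0, 0.5], [0.5, 1.0]]
K2 = [[2.0, -1.0], [-1.0, 2.0]]
weights = [0.3, 0.7]  # positive weights that sum to one

K = combine_kernels([K1, K2], weights)

rng = random.Random(0)
samples = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(100)]
min_quad = min(quad_form(K, z) for z in samples)
```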

Page 26: Combining Audio Content and Social Context for Semantic Music Discovery

Kernel Combination SVM (KC-SVM)

• Km represents similarities between all songs in the data set. After the vectors X = {x1,x2,…,xT} are obtained from the MFCC and Chroma features, the entries of a probability product kernel (PPK) are computed.

Page 27: Combining Audio Content and Social Context for Semantic Music Discovery

Kernel Combination SVM (KC-SVM)

• For each of the social context features, a radial basis function (RBF) kernel is computed, with entries:

• Where K(i,j) represents the similarity between xi and xj, the annotation vectors for songs i and j.
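A sketch of an RBF kernel over annotation vectors, using the common Gaussian form exp(-||xi - xj||² / (2σ²)). The bandwidth `sigma`, the example vectors, and the assumption that missing annotations have already been replaced by numbers are all illustrative; the exact parameterization used in the slides is not shown in this transcript.

```python
import math

def rbf_kernel(vectors, sigma=1.0):
    """Return the matrix K with K[i][j] = exp(-||xi - xj||^2 / (2 sigma^2))."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [
        [math.exp(-sq_dist(a, b) / (2 * sigma ** 2)) for b in vectors]
        for a in vectors
    ]

# Hypothetical annotation vectors for three songs over a three-tag vocabulary.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K_social = rbf_kernel(X)
```

Songs with identical annotation vectors get similarity 1, and the similarity decays toward 0 as the vectors move apart.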

Page 28: Combining Audio Content and Social Context for Semantic Music Discovery

Kernel Combination SVM (KC-SVM)

• For each tag t and corresponding class-label vector y, the primal problem for a single-kernel SVM is to find the decision boundary with maximum margin separating the two classes.

• The optimal K can be learned by minimizing, with respect to the kernel weights, the function that optimizes the dual (thereby maximizing the margin).

Page 29: Combining Audio Content and Social Context for Semantic Music Discovery

Kernel Combination SVM (KC-SVM)

• Where and e is an n-vector of ones such that constrains the weights to sum to one. C is a hyperparameter that limits violations of the margin.

Page 30: Combining Audio Content and Social Context for Semantic Music Discovery

Kernel Combination SVM (KC-SVM)

• The solution returns a linear decision function that defines the distance of a new song sz from the hyperplane boundary between the positive and negative classes (i.e. the relevance of sz to tag t)

• b: offset of the decision boundary from the origin.
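A sketch of how such a decision value could be computed and used for ranking, using the standard kernel SVM form f(z) = Σ αi yi K(xi, z) + b. The dual coefficients, labels, offset, and kernel values below are made up for illustration; in practice they come from solving the SVM problem with the combined kernel.

```python
def decision_value(kernel_row, alphas, labels, b):
    """Signed decision value of a new song given its kernel values
    against the training songs."""
    return sum(a * y * k for a, y, k in zip(alphas, labels, kernel_row)) + b

# Hypothetical trained model for one tag over three training songs.
alphas = [0.5, 0.5, 0.0]   # dual coefficients; zero for non-support vectors
labels = [1, -1, 1]        # +1 if the training song has the tag, else -1
b = 0.1

# Hypothetical combined-kernel similarities of two new songs to the training set.
new_songs = {
    "song_z1": [0.9, 0.2, 0.5],
    "song_z2": [0.1, 0.8, 0.3],
}

values = {
    name: decision_value(row, alphas, labels, b)
    for name, row in new_songs.items()
}
ranking = sorted(values, key=values.get, reverse=True)
```

Sorting by decision value gives the rank ordering of songs for the tag, with songs furthest on the positive side of the boundary listed first.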

Page 31: Combining Audio Content and Social Context for Semantic Music Discovery

Semantic Music Retrieval Experiments

• 500 songs by 500 unique artists, each annotated by a minimum of 3 individuals using a 174-tag vocabulary.

• A song is annotated with a tag when 80% of the annotators agree that the tag is relevant.

• Experiment: 72 tags, each associated with at least 20 songs.

Page 32: Combining Audio Content and Social Context for Semantic Music Discovery

Thanks!

