
Improving Musical Genre Classification with RBF Networks

Douglas Turnbull
Department of Computer Science and Engineering

University of California, San Diego

June 4, 2003

motivation:

goal:

The goal of this project is to improve automatic classification of music by genre.

previous work:

A method proposed by Tzanetakis and Cook extracts high-level features from a large database of songs and then uses Gaussian Mixture Model (GMM) and K-nearest neighbor (KNN) classifiers to decide the genre of a novel song.

idea:

Use the existing audio feature extraction technology but improve the classification accuracy using Radial Basis Function (RBF) networks.

motivation:

secondary goal:

Find techniques for improving RBF network performance.

previous work:

An RBF network is a commonly used classifier in machine learning. We would like to explore ways to improve its ability to classify novel data.

ideas:

Merge supervised and unsupervised initialization methods for basis function parameters.

Use feature subset selection methods to eliminate unnecessary features.

MARSYAS:

[diagram: feature vector x1 … xi … xD]

Extraction of 30 features from a 30-second audio track

Timbral Texture (19):

• music-speech discrimination

Rhythmic Content (6):

• beat strength, amplitude, tempo analysis

Pitch Content (5):

• frequency of dominant chord, pitch intervals

For this application, the dimension D of our feature vector is 30.

radial basis functions:

[diagram: inputs x1 … xi … xD feeding basis functions Φ1 … Φj … ΦM]

A radial basis function measures how far an input vector (x) is from a prototype vector (μ). We use Gaussians for our M basis functions.
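Concretely, a Gaussian basis function with prototype μj and width σj takes the standard form

$$\Phi_j(\mathbf{x}) = \exp\!\left(-\frac{\lVert\mathbf{x}-\boldsymbol{\mu}_j\rVert^2}{2\sigma_j^2}\right)$$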

We will see three methods for initializing the parameters (μ, σ).

linear discriminant:

[diagram: basis functions Φ1 … Φj … ΦM connected to outputs y1 … yk … yC through weights W = {wkj}]

The output vector is a weighted sum of the basis functions:

$$y_k(\mathbf{x}) = \sum_{j=1}^{M} w_{kj}\,\Phi_j(\mathbf{x})$$

We find the optimal set of weights (W) by minimizing the sum-of-squares error function using a training set of data:

$$E = \frac{1}{2}\sum_{n}\sum_{k}\left(y_k(\mathbf{x}^n) - t_k^n\right)^2$$

where the target value $t_k^n$ is 1 if the nth data point belongs to the kth class, and 0 otherwise.
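As a minimal sketch of this closed-form weight fit (names and shapes are illustrative, not the project's actual code), the least-squares solution can be computed with NumPy:

```python
import numpy as np

def optimal_weights(Phi, T):
    """Solve for the output weights W that minimize the
    sum-of-squares error ||Phi @ W.T - T||^2 in closed form.

    Phi : (N, M) matrix of basis-function activations, one row per song
    T   : (N, C) matrix of one-hot target vectors (1 for the true genre)
    Returns W : (C, M) weight matrix.
    """
    # lstsq computes the least-squares solution via the pseudo-inverse,
    # which is exactly the minimizer of the error function above.
    W_T, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return W_T.T
```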

a radial basis function network:

[diagram: full network, with inputs x1 … xi … xD, basis functions Φ1 … Φj … ΦM, weights W = {wkj}, outputs y1 … yk … yC, and targets t1 … tk … tC]

constructing RBF networks:

1. number of basis functions

• Too few make it hard to separate data

• Too many can cause over-fitting

• Depends on initialization method

2. initializing the basis function parameters (μ, σ); see the sketch after this list

• unsupervised

1. K-means clustering (KM)

• supervised

2. Maximum Likelihood for Gaussian (MLG)

3. In-class K-means clustering (ICKM)

• use above methods together

3. improving the basis function parameters (μ, σ)

• Use gradient descent
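A minimal sketch of the three initialization schemes in item 2, assuming scikit-learn's KMeans; the spherical-σ heuristic and function names are illustrative assumptions, not the author's exact recipe:

```python
import numpy as np
from sklearn.cluster import KMeans

def init_km(X, M):
    """Unsupervised: K-means over all data; one basis function per cluster."""
    km = KMeans(n_clusters=M, n_init=10).fit(X)
    mus = km.cluster_centers_
    # Spherical-width heuristic: one sigma per cluster from its spread.
    sigmas = np.array([max(X[km.labels_ == j].std(), 1e-6) for j in range(M)])
    return mus, sigmas

def init_mlg(X, y):
    """Supervised: one Gaussian per class, fit by maximum likelihood
    (class mean and standard deviation)."""
    classes = np.unique(y)
    mus = np.array([X[y == c].mean(axis=0) for c in classes])
    sigmas = np.array([X[y == c].std() for c in classes])
    return mus, sigmas

def init_ickm(X, y, k):
    """Supervised: run K-means separately within each class (ICKM),
    giving k basis functions per class."""
    mus, sigmas = [], []
    for c in np.unique(y):
        m, s = init_km(X[y == c], k)
        mus.append(m)
        sigmas.append(s)
    return np.vstack(mus), np.concatenate(sigmas)

# The methods can also be merged: stack the centers and widths produced
# by KM, MLG, and ICKM to build one larger set of basis functions.
```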

gradient descent on μ, σ:

We differentiate our error function E with respect to σj and μji. We then update μji and σj by moving down the error surface:

$$\mu_{ji} \leftarrow \mu_{ji} - \eta_1\,\frac{\partial E}{\partial \mu_{ji}}, \qquad \sigma_j \leftarrow \sigma_j - \eta_2\,\frac{\partial E}{\partial \sigma_j}$$

The learning rate scale factors, η1 and η2, decrease each epoch.
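A sketch of one such update epoch for Gaussian basis functions, with the gradients worked out from the error function and basis definition above; the vectorized NumPy layout is an illustrative assumption:

```python
import numpy as np

def rbf_activations(X, mus, sigmas):
    """Phi[n, j] = exp(-||x_n - mu_j||^2 / (2 sigma_j^2))."""
    d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)  # (N, M)
    return np.exp(-d2 / (2 * sigmas[None, :] ** 2)), d2

def gradient_step(X, T, W, mus, sigmas, eta1, eta2):
    """One epoch of gradient descent on (mu, sigma) for the
    sum-of-squares error E = 0.5 * ||Phi @ W.T - T||^2."""
    Phi, d2 = rbf_activations(X, mus, sigmas)   # (N, M) each
    Y = Phi @ W.T                               # (N, C) network outputs
    delta = (Y - T) @ W                         # (N, M): dE/dPhi_j at x_n
    coef = delta * Phi                          # (N, M)
    # dE/dmu_j  = sum_n coef[n, j] * (x_n - mu_j) / sigma_j^2
    grad_mu = (coef[:, :, None] * (X[:, None, :] - mus[None, :, :])
               ).sum(axis=0) / sigmas[:, None] ** 2
    # dE/dsigma_j = sum_n coef[n, j] * ||x_n - mu_j||^2 / sigma_j^3
    grad_sigma = (coef * d2).sum(axis=0) / sigmas ** 3
    return mus - eta1 * grad_mu, sigmas - eta2 * grad_sigma
```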

constructing RBF networks:

1. number of basis functions

2. initializing the basis function parameters (μ, σ)

3. improving the basis function parameters (μ, σ)

4. feature subset selection

• There exist noisy and/or harmful features that hurt network performance. By isolating and removing these features, we can find better networks.

• We may also wish to sacrifice accuracy to create a more robust network requiring less computation during training.

• Three heuristics for ranking features

• Wrapper Methods

• Growing Set (GS) Ranking

• Two-Tuple (TT) Ranking

• Filter Method

• Between-Class Variance (BCV) Ranking

growing set (GS) ranking:

A greedy heuristic that adds the next best feature to a growing set of features.

This method requires training roughly D²/2 RBF networks, where the first D networks use 1 feature, the next D−1 networks use 2 features, and so on.
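A sketch of the growing-set search, assuming a hypothetical evaluate(features) that trains an RBF network on the given feature indices and returns its accuracy:

```python
def growing_set_ranking(n_features, evaluate):
    """Greedy forward selection: at each round, add the single feature
    that gives the best accuracy when appended to the current set.

    evaluate(features) -> accuracy is an assumed helper that trains and
    scores an RBF network restricted to the given feature indices.
    """
    selected, remaining = [], set(range(n_features))
    while remaining:
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected  # features in order of greedy usefulness
```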

two-tuple (TT) ranking:

This greedy heuristic finds the classification accuracy of a network for every combination of two features. We select the pair of features that produces the best classification result. The next feature is the one with the largest minimum accuracy when paired with each feature already selected, and so on.

This method also requires training roughly D²/2 RBF networks, but all networks are trained using only 2 features.
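A sketch of two-tuple ranking under the same hypothetical evaluate(features) helper; the max-min bookkeeping details are assumptions:

```python
from itertools import combinations

def two_tuple_ranking(n_features, evaluate):
    """Rank features from pairwise accuracies only: score every pair of
    features once, then greedily grow the ranking. evaluate([i, j]) is
    assumed to train a 2-feature RBF network and return its accuracy.
    """
    pair_acc = {frozenset(p): evaluate(list(p))
                for p in combinations(range(n_features), 2)}
    # Seed the ranking with the best-scoring pair.
    ranking = list(max(pair_acc, key=pair_acc.get))
    remaining = set(range(n_features)) - set(ranking)
    while remaining:
        # Pick the feature whose worst pairwise accuracy against the
        # already chosen features is highest (max-min criterion).
        best = max(remaining, key=lambda f: min(
            pair_acc[frozenset((f, g))] for g in ranking))
        ranking.append(best)
        remaining.remove(best)
    return ranking
```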

between-class variance (BCV) ranking:

Unlike the previous two methods, it does not require training RBF networks. It can be computed in a matter of seconds, as opposed to a matter of minutes.

[figure: example distributions along a poorly separating feature (fbad) and a well separating feature (fgood)]

This method ranks features by their between-class variance. The assumption is that if the class averages for a particular feature are far from the average across all of the data, that feature will be useful for separating novel data.
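A sketch of BCV ranking; the per-feature variance normalization is an assumed choice, not necessarily the author's:

```python
import numpy as np

def bcv_ranking(X, y):
    """Rank features by between-class variance: for each feature, measure
    how far the class means sit from the global mean, normalized by the
    feature's overall variance."""
    grand = X.mean(axis=0)                  # (D,) global mean per feature
    bcv = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        bcv += len(Xc) * (Xc.mean(axis=0) - grand) ** 2
    bcv /= len(X) * X.var(axis=0)           # normalize per feature
    return np.argsort(bcv)[::-1]            # best features first
```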

music classification with RBF networks:

experimental setup:

• 1000 30-second songs – 100 songs per genre

• 10 genres - classical, country, disco, hip hop, jazz, rock, blues, reggae, pop, metal

• 30 features extracted per song – timbral texture, rhythmic content, pitch content

• 10-fold cross validation
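A sketch of the evaluation loop, assuming scikit-learn's StratifiedKFold and a hypothetical build_network() factory wrapping the RBF pieces above:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(X, y, build_network, n_folds=10):
    """10-fold cross validation: train on 9 folds, test on the held-out
    fold, and report mean accuracy. build_network() is an assumed factory
    returning an object with fit(X, y) and predict(X)."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    accs = []
    for train_idx, test_idx in skf.split(X, y):
        net = build_network()
        net.fit(X[train_idx], y[train_idx])
        accs.append(np.mean(net.predict(X[test_idx]) == y[test_idx]))
    return np.mean(accs), np.std(accs)
```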

results:

a. comparison of initialization methods (KM, MLG, ICKM) with and without gradient descent.

b. comparison of feature ranking methods (GS, TT, BCV).

c. table of best classification results

basis function initialization methods:

MLG does as well as the other methods with fewer basis functions.

feature ranking methods:

Growing Set (GS) ranking outperforms the other methods

results table:

observations:

1. Multiple initialization methods produce better classification than using only one initialization method.

2. Gradient descent boosts performance.

3. Subsets of features produce better results than using all of the features.

comparison with previous results:

RBF networks: 70.9%* (std 0.063)

GMM with 3 Gaussians per class (Tzanetakis & Cook 2001): 61% (std 0.04)

Human classification in a similar experiment (Tzanetakis & Cook 2001): 70%

Support Vector Machine (SVM) (Li & Tzanetakis 2003): 69.1% (std 0.053)

Linear Discriminant Analysis (LDA) (Li & Tzanetakis 2003): 71.1% (std 0.073)

*(Found by constructing a network with MLG using 26 features (Experiment J), with gradient descent for 100 epochs.)

discussion:

1. create more flexible musical labels

It is not our opinion that music classification is limited to ~70%, but rather that the data set used is the limiting factor. The next steps are to find a better system for labeling music and then to create a data set that uses the new labeling system. This involves working with experts such as musicologists. However, two initial ideas are:

1. Non-mutually exclusive genres

2. A rating system based on the strength of the relationship between a song and each genre

These ideas are cognitively plausible in that we naturally classify music into a number of genres, streams, movements, and generations that are neither mutually exclusive nor always agreed upon.

Both of these ideas can easily be handled by an RBF network by altering the target vectors.
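As a small illustration (the genre strengths here are invented), only the target vectors change:

```python
import numpy as np

# Hard, mutually exclusive target for a 5-genre problem (current setup):
t_hard = np.array([0, 0, 1, 0, 0])          # the song is one genre only

# Soft, non-mutually-exclusive target: strength of association with
# each genre (values are made up for illustration).
t_soft = np.array([0.0, 0.2, 0.7, 0.0, 0.4])

# The sum-of-squares training procedure is unchanged; only the target
# matrix T fed to the weight solver differs.
```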

discussion:

2. larger feature sets and feature subset selection

Borrowing from computer vision, one technique that has been successful is to automatically extract tens of thousands of features and then use feature subset selection to find a small set (~30) of good features.

Computer Vision Features:

• select sub-images of different sizes and locations

• alter resolution and scale factors

• apply filters (e.g. Gabor filters)

Computer Audition Analogs:

• select sound samples of different lengths and starting locations

• alter pitches and tempos within the frequency domain

• apply filters (e.g. comb filters)

Future work will involve extracting new features and improving existing feature subset selection algorithms.

The End