Automatic Genre Classification of Music Content

[A survey]

Nicolas Scaringella, Giorgio Zoia, Daniel Mlynek

IEEE Signal Processing Magazine, March 2006

By Yi-Tang Wang

Outline

• Introduction

• Feature extraction techniques

• Genre classification paradigms

• Classification results

• Future directions & Conclusion

Introduction

• EMD (electronic music distribution)
  – Restoration of analog archives
  – New content
  – Music catalogues become huge

• What do you want to listen to?
  – 1 million tracks online
  – Efficient ways to browse & organize are needed

Introduction (cont.)

• Music genres
  – Categories to characterize similarities
  – Boundaries are fuzzy

• Automatic classification
  – Finding a taxonomy
  – Hierarchical set of categories
  – Nontrivial task

Critical issues

• Artists, albums, or titles
  – One song to one genre(?)
  – Albums: heterogeneous material
  – Artists: several albums
  – Same titles?

• No agreement on taxonomies
  – Allmusic, Amazon, Mp3

[2] F. Pachet and D. Cazaly, “A taxonomy of musical genres,” in Proc. Content-Based Multimedia Information Access (RIAO), Paris, France, 2000

Critical issues (cont.)

• Ill-defined genre labels
  – Varied criteria (geographical, temporal, etc.)
  – Dependent on culture

• Scalability of genre taxonomies
  – New genres appear frequently
  – Merging or splitting
  – Automatic system

Feature extraction techniques

• High-level model
  – Event-like format (MIDI)
  – Symbolic format (MusicXML)
  – Rarely available

• Low-level
  – Audio samples
  – Low level and low density of information

• Perform feature extraction
  – Timbre, melody, harmony, rhythm

Timbre

• What makes two sounds with the same pitch and loudness sound different

• Features to characterize timbre
  – Temporal features
  – Energy features
  – Spectral shape features
  – Perceptual features
  – Some have been standardized in MPEG-7

Timbre (cont.)

• Transformations
  – Create new features or increase dimensionality
  – Transforming into a logarithmic decibel scale is suggested

• Texture window
  – Larger window
  – Reduces computation
  – Increases classification accuracy
  – 1 s
  – Variable sizes and positions
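As a rough illustration of the transformations above, the sketch below converts frame energies to a decibel scale and appends first-order frame-to-frame differences as extra features. The function names `to_decibels` and `add_deltas`, the `1e-10` floor, and the sample values are assumptions made for this example, not taken from the survey.

```python
import numpy as np

def to_decibels(power, floor=1e-10):
    """Convert linear power values to decibels (10 * log10), flooring to avoid log(0)."""
    power = np.asarray(power, dtype=float)
    return 10.0 * np.log10(np.maximum(power, floor))

def add_deltas(features):
    """Append first-order frame-to-frame differences, doubling the feature dimension."""
    features = np.asarray(features, dtype=float)
    # The first frame is prepended so the delta of frame 0 is zero.
    deltas = np.diff(features, axis=0, prepend=features[:1])
    return np.hstack([features, deltas])

# Hypothetical frame energies: 4 frames, 2 bands each.
energies = np.array([[1.0, 0.1], [10.0, 0.1], [100.0, 1.0], [100.0, 10.0]])
db = to_decibels(energies)
augmented = add_deltas(db)  # shape (4, 4): dB values followed by their deltas
```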

Timbre (cont.)

• Texture model
  – Model of features over the texture window:
    1) simple modeling with low-order statistics
    2) modeling with an autoregressive model
    3) modeling with distribution estimation algorithms (for example, EM estimation of a GMM of frames)
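The first option, low-order statistics over a texture window, can be sketched as follows. This is a minimal example, assuming frame-level features are already available as a NumPy array; the function name `texture_features`, the window and hop sizes, and the random input are hypothetical.

```python
import numpy as np

def texture_features(frames, window, hop):
    """Summarize frame-level features over sliding texture windows.

    frames: array of shape (n_frames, n_features)
    Returns an array of shape (n_windows, 2 * n_features) holding the
    per-window mean followed by the per-window variance.
    """
    frames = np.asarray(frames, dtype=float)
    rows = []
    for start in range(0, len(frames) - window + 1, hop):
        chunk = frames[start:start + window]
        rows.append(np.concatenate([chunk.mean(axis=0), chunk.var(axis=0)]))
    return np.array(rows)

# Hypothetical input: 100 frames of 13 coefficients each.
# Assuming non-overlapping frames of about 23 ms, 43 frames span roughly 1 s.
rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 13))
textures = texture_features(frames, window=43, hop=20)
```

Each output row replaces a whole window of frames with a single vector, which is where the reduction in computation comes from.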

Melody & Harmony

• Melody
  – Succession of pitched events
  – Horizontal element

• Harmony
  – Pitch simultaneity, chords
  – Vertical element

Melody & Harmony (cont.)

• Pitch function
  – Characterizes the pitch distribution
  – Amplitude, position of main peak, …
  – Unfolded
    • Contains pitch content and information about its range
  – Folded
    • Mapped to a single octave
    • Harmonic content
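The difference between the unfolded and folded versions can be illustrated with a small histogram example. It assumes pitches have already been detected and are given as MIDI note numbers, which is a simplification for this sketch; the function names and the note list are hypothetical.

```python
import numpy as np

def unfolded_histogram(midi_notes, n_bins=128):
    """Count occurrences of each MIDI note number (keeps octave and range information)."""
    return np.bincount(np.asarray(midi_notes, dtype=int), minlength=n_bins)

def folded_histogram(midi_notes):
    """Map every note to its pitch class (note mod 12), discarding the octave."""
    return np.bincount(np.asarray(midi_notes, dtype=int) % 12, minlength=12)

# Hypothetical detected pitches: C4, E4, G4, C5, G3 (a C major chord spread over octaves).
notes = [60, 64, 67, 72, 55]
unfolded = unfolded_histogram(notes)  # 128 bins, one per MIDI note
folded = folded_histogram(notes)      # 12 bins, one per pitch class
```

In the folded histogram the two Cs and the two Gs fall into the same bins, so the chord content shows up regardless of register.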

Rhythm

• No precise definition

• Generically, all of the temporal aspects

• Periodicity function
  – Low-level approach, like the pitch function
    1) tempo: periodicities typically in the range 0.3–1.5 s (i.e., 200–40 bpm)
    2) musical pattern: periodicities between 2 and 6 s (corresponding to the length of one or more measures)
  – Gouyon et al. derive MFCC-like descriptors
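The tempo range follows from bpm = 60 / period: 0.3 s gives 200 bpm and 1.5 s gives 40 bpm. The sketch below uses autocorrelation as one possible periodicity function and picks its strongest lag inside that range. The onset envelope is synthetic, and the function names and the 100 Hz envelope rate are assumptions for this example.

```python
import numpy as np

def period_to_bpm(period_s):
    """Convert a beat period in seconds to beats per minute."""
    return 60.0 / period_s

def estimate_tempo(envelope, rate, min_period=0.3, max_period=1.5):
    """Pick the strongest autocorrelation lag inside the tempo range and return bpm."""
    envelope = np.asarray(envelope, dtype=float)
    envelope = envelope - envelope.mean()
    # Autocorrelation for non-negative lags (the periodicity function of this sketch).
    acf = np.correlate(envelope, envelope, mode="full")[len(envelope) - 1:]
    lo = int(round(min_period * rate))
    hi = int(round(max_period * rate))
    best_lag = lo + int(np.argmax(acf[lo:hi + 1]))
    return period_to_bpm(best_lag / rate)

# Synthetic onset envelope: one pulse every 0.5 s, sampled at 100 Hz for 10 s.
rate = 100
envelope = np.zeros(10 * rate)
envelope[::50] = 1.0
tempo = estimate_tempo(envelope, rate)
```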

Extracting from segments

• Small segment may contain sufficient information

• Reduced required computation

• Typically a 30 s segment
  – Taken 30 s after the beginning

• Artist classification
  – Voice is easier to identify than music only

Local conclusion

• Extracting high-level descriptors from polyphonic audio signals is not yet state of the art

• Focus on timbre modeling

• Timbre may contain sufficient information
  – 250 ms: 53%, 3 s: 72%
  – Among 10 genres

Local conclusion (cont.)

• Another point of view (pessimistic)
  – Timbre similarity measure & 20,000 titles distributed over 18 genres
  – Little correlation
  – May not be scalable
  – Take cultural features into account

Genre classification

• Expert systems

• Unsupervised approach
  – Clustering

• Supervised approach
  – Machine learning algorithms

Expert systems

• A knowledge based system made up of a set of rules

• No model based on it so far

• Expensive to implement and maintain

• May yield unexpected interactions

Expert systems (cont.)

• Pachet and Cazaly’s work
  – Differences between genres stated in a descriptive language, e.g. based on instrumentation

Unsupervised approach

• Clustering with similarity measures

• Similarity measures
  – If time invariant
    • Euclidean distance or cosine distance
  – Otherwise
    • Build a statistical model (Gaussian or GMMs)
      – Kullback-Leibler divergence (relative entropy)
      – Sampling, Earth mover’s distance, asymptotic likelihood approximation

• Shao et al. use HMMs
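The measures listed above can be sketched in a few lines. The Kullback-Leibler example assumes each song is modeled by a single Gaussian with diagonal covariance, for which a closed form exists; this is a simplification chosen for the illustration (GMMs have no closed form, hence the sampling and approximation methods). All function names are hypothetical.

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def cosine_distance(a, b):
    """One minus the cosine of the angle between two feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariances (closed form)."""
    mean_p, var_p = np.asarray(mean_p, float), np.asarray(var_p, float)
    mean_q, var_q = np.asarray(mean_q, float), np.asarray(var_q, float)
    terms = np.log(var_q / var_p) + (var_p + (mean_p - mean_q) ** 2) / var_q - 1.0
    return float(0.5 * np.sum(terms))

def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """KL is not symmetric, so both directions are summed to get a usable dissimilarity."""
    return (kl_diag_gaussians(mean_p, var_p, mean_q, var_q)
            + kl_diag_gaussians(mean_q, var_q, mean_p, var_p))
```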

Unsupervised approach (cont.)

• Clustering algorithms
  – K-means
  – Shao et al.’s work
    • Agglomerative hierarchical clustering
  – SOM (self-organizing map)
    • Artificial neural network
    • Maps high-dimensional data onto a lower-dimensional space
    • GHSOM (growing hierarchical SOM)
      – Rauber et al.
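Of the algorithms above, K-means is the simplest to sketch. The version below is a minimal NumPy illustration using Euclidean distance and random initialization from the data points; the function name `kmeans`, its parameters, and the toy data are assumptions for this example.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain K-means: alternate between assigning points and updating centroids."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points (fancy indexing returns a copy).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(points, k=2)
```

Note that no genre labels are involved: the clusters emerge from the similarity measure alone, and naming them is a separate step.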

Supervised approach

• A taxonomy of genres is given

• Vs. expert systems
  – No rules (or descriptions of genres)

• Supervised machine learning algorithms
  – KNN (K-Nearest Neighbor)
  – GMMs (Gaussian Mixture Models)
  – HMMs (Hidden Markov Models)
  – LDA (Linear Discriminant Analysis)
  – SVMs (Support Vector Machines)
  – ANNs (Artificial Neural Networks)
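As an example of the supervised setting, here is a minimal K-nearest-neighbor sketch: labeled training vectors are stored, and a query is assigned the majority genre among its k closest neighbors. The function name `knn_predict`, the 2-D feature vectors, and the genre labels are made up for illustration.

```python
from collections import Counter

import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    train_x = np.asarray(train_x, dtype=float)
    dists = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors labeled with genres.
train_x = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.0], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
train_y = ["classical", "classical", "classical", "rock", "rock", "rock"]
prediction = knn_predict(train_x, train_y, [0.15, 0.1])
```

Unlike an expert system, nothing here describes what makes a track "classical" or "rock"; the labeled examples carry all the knowledge.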

Classification results

• MIREX genre classification contest
  – 1,005 / 510 songs over ten genres
  – 940 / 447 songs over six genres

Future directions

• Classification into perceptual categories
  – Moods, emotions

• Novelty detection
  – New or unknown data (not belonging to any class)

• Classification with multiple labels
  – Probably closer to human experience

• From taxonomies to folksonomies
  – Does the taxonomy fit its users?

Conclusion

• Definitions of music genres are convoluted

• Features → classification → result

• Research is evolving from purely objective machine calculations to techniques

• Machine learning plays a fundamental role in classification domains

Thank You
