Automatic Genre Classification of Music Content [A survey]. Nicolas Scaringella, Giorgio Zoia, Daniel Mlynek, IEEE Signal Processing Magazine, March 2006. By Yi-Tang Wang. Outline: Introduction, Feature extraction techniques, Genre classification paradigms, Classification results.
Automatic Genre Classification of Music Content
[A survey]
Nicolas Scaringella, Giorgio Zoia, Daniel Mlynek,
IEEE Signal Processing Magazine, March 2006
By Yi-Tang Wang
Outline
• Introduction
• Feature extraction techniques
• Genre classification paradigms
• Classification results
• Future directions & Conclusion
Introduction
• EMD (electronic music distribution)
  – Restoration of analog archives
  – New content
  – Music catalogues become huge
• What do you want to listen to?
  – 1 million tracks online
  – Need efficient ways to browse & organize
Introduction (cont.)
• Music Genres
  – Categories to characterize similarities
  – Boundaries are fuzzy
• Automatic Classification
  – Finding a taxonomy
  – Hierarchical set of categories
  – Nontrivial task
Critical issues
• Artists, Albums, or Titles
  – One song to one genre(?)
  – Albums: heterogeneous material
  – Artists: several albums
  – Same titles?
• No agreement on taxonomies
  – Allmusic, Amazon, Mp3
[2] F. Pachet and D. Cazaly, “A taxonomy of musical genres,” in Proc. Content-Based Multimedia Information Access (RIAO), Paris, France, 2000
Critical issues (cont.)
• Ill-defined genre labels
  – Varied criteria (geographical, temporal, etc.)
  – Dependent on cultural context
• Scalability of genre taxonomies
  – New genres appear frequently
  – Merging or splitting
  – Automatic system
Feature extraction techniques
• High-level model
  – Event-like format (MIDI)
  – Symbolic format (MusicXML)
  – Rarely available
• Low-level
  – Audio samples
  – Low level and low density of info
• Feature extraction is needed
  – Timbre, Melody, Harmony, Rhythm
Timbre
• What makes sounds with the same pitch and loudness sound different
• Features to characterize timbre
  – Temporal features
  – Energy features
  – Spectral shape features
  – Perceptual features
  – Some have been standardized in MPEG-7
Timbre (cont.)
• Transformations
  – Derive new features or increase dimensionality
  – Suggested: transforming into a logarithmic decibel scale
• Texture window
  – Larger window
  – Reduce computation
  – Increase classification accuracy
  – 1 s
  – Variable sizes and positions
Timbre (cont.)
• Texture model
  – Model of features over the texture window:
    • 1) simple modeling with low-order statistics
    • 2) modeling with an autoregressive model
    • 3) modeling with distribution estimation algorithms (for example, EM estimation of a GMM of frames)
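A minimal sketch of option 1 above (low-order statistics over a texture window), combined with the decibel transform suggested earlier. It assumes frame-level features have already been extracted; the function names, the `floor` parameter, and the window length in frames are hypothetical choices for illustration, not taken from the survey.

```python
import math
from statistics import fmean, pvariance

def to_decibels(energy, floor=1e-10):
    """Map a non-negative energy value to a logarithmic decibel scale.

    The floor (an assumption) avoids log(0) on silent frames.
    """
    return 10.0 * math.log10(max(energy, floor))

def texture_statistics(frames, window):
    """Summarise frame-level feature vectors per texture window.

    frames: list of equal-length feature vectors, one per analysis frame.
    window: number of consecutive frames per (non-overlapping) texture window.
    Returns one vector per window: the mean of each feature, then its variance.
    """
    summaries = []
    for start in range(0, len(frames) - window + 1, window):
        block = frames[start:start + window]
        columns = list(zip(*block))  # one tuple of values per feature
        means = [fmean(col) for col in columns]
        variances = [pvariance(col) for col in columns]
        summaries.append(means + variances)
    return summaries

# Toy input: one energy feature per frame, converted to dB first.
frames = [[to_decibels(e)] for e in (1.0, 10.0, 100.0, 1000.0)]
summary = texture_statistics(frames, window=2)
```

Each texture window is reduced to a single fixed-length vector, which is what a downstream classifier would consume instead of the raw frames.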
Melody & Harmony
• Melody
  – Succession of pitched events
  – Horizontal element
• Harmony
  – Pitch simultaneity, chords
  – Vertical element
Melody & Harmony (cont.)
• Pitch function
  – Characterizing pitch distribution
  – Amplitude, position of main peak, …
  – Unfolded
    • Contains pitch content and info of its range
  – Folded
    • Mapped to a single octave
    • Harmonic content
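The difference between the unfolded and folded views can be sketched with plain pitch histograms. This is only an illustration: it assumes pitches have already been estimated and are given as MIDI note numbers, which is not how the survey obtains them from audio, and the function names are hypothetical.

```python
from collections import Counter

def unfolded_histogram(midi_notes):
    """Count occurrences of each absolute pitch (keeps octave and range info)."""
    return Counter(midi_notes)

def folded_histogram(midi_notes):
    """Count occurrences of each pitch class (all octaves mapped onto one)."""
    return Counter(note % 12 for note in midi_notes)

# Toy input: C3, C4, E4, G4, C5, E5 as MIDI note numbers.
notes = [48, 60, 64, 67, 72, 76]
unfolded = unfolded_histogram(notes)
folded = folded_histogram(notes)

# Two simple descriptors of the kind listed above.
pitch_range = max(unfolded) - min(unfolded)       # only available unfolded
main_peak_class, main_peak_count = folded.most_common(1)[0]
```

The unfolded histogram keeps the range of the piece, while the folded one groups all Cs, Es, and Gs together and so reflects harmonic content.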
Rhythm
• No precise definition
• Generically, all of the temporal aspects
• Periodicity function
  – Low-level approach, as with the pitch function
    • 1) tempo: periodicities typically in the range 0.3–1.5 s (i.e., 200–40 bpm)
    • 2) musical pattern: periodicities between 2 and 6 s (corresponding to the length of one or more measures)
  – Gouyon et al. get MFCC-like descriptors
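The two periodicity ranges above translate directly into a small helper. The conversion is just bpm = 60 / period; the function names and the "other" label are hypothetical, and the ranges are the ones quoted on the slide.

```python
def period_to_bpm(period_seconds):
    """Convert a periodicity in seconds to beats per minute."""
    return 60.0 / period_seconds

def periodicity_kind(period_seconds):
    """Label a periodicity using the ranges quoted on the slide."""
    if 0.3 <= period_seconds <= 1.5:
        return "tempo"            # 200 bpm down to 40 bpm
    if 2.0 <= period_seconds <= 6.0:
        return "musical pattern"  # one or more measures
    return "other"                # outside both ranges

fast_bpm = period_to_bpm(0.3)
slow_bpm = period_to_bpm(1.5)
```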
Extracting from segments
• A small segment may contain sufficient information
• Reduces the required computation
• Typically a 30 s segment
  – Starting 30 s after the beginning
• Artist classification
  – Voice is easier to identify than music only
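A minimal sketch of the segment selection described above: a 30 s excerpt starting 30 s into the track. It assumes the audio is already decoded into a sequence of samples; the function names and the clamping behaviour for short tracks are assumptions made for illustration.

```python
def excerpt_bounds(total_samples, sample_rate, offset_s=30.0, length_s=30.0):
    """Return (start, end) sample indices of the excerpt.

    Both indices are clamped to the track length, so short tracks
    yield a shorter (possibly empty) excerpt instead of an error.
    """
    start = min(int(offset_s * sample_rate), total_samples)
    end = min(start + int(length_s * sample_rate), total_samples)
    return start, end

def take_excerpt(samples, sample_rate, offset_s=30.0, length_s=30.0):
    """Slice the excerpt out of a sequence of audio samples."""
    start, end = excerpt_bounds(len(samples), sample_rate, offset_s, length_s)
    return samples[start:end]

# Toy input: a 100 s "track" at 10 samples per second.
track = list(range(1000))
excerpt = take_excerpt(track, sample_rate=10)
```

Features are then computed on the excerpt only, which is where the reduction in computation comes from.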
Local conclusion
• Extracting high-level descriptors from polyphonic audio signals is still beyond the state of the art
• Focus on timbre modeling
• Timbre may contain sufficient info
  – 250 ms: 53%, 3 s: 72%
  – Among 10 genres
Local conclusion (cont.)
• Another point of view (pessimistic)
  – Timbre similarity measure & 20,000 titles distributed over 18 genres
  – Little correlation
  – May not be scalable
  – Take cultural features into account
Genre classification
• Expert systems
• Unsupervised approach
  – Clustering
• Supervised approach
  – Machine learning algorithms
Expert systems
• A knowledge-based system made up of a set of rules
• No genre classifier based on it so far
• Expensive to implement and maintain
• May yield unexpected interactions
Expert systems (cont.)
• Pachet and Cazaly’s work
  – State differences between genres in a descriptive language, e.g. based on instrumentation
Unsupervised approach
• Clustering with similarity measures
• Similarity measures
  – If time invariant
    • Euclidean distance or cosine distance
  – Otherwise
    • Build statistical model (Gaussian or GMMs)
      – Kullback-Leibler divergence, relative entropy
      – Sampling, earth mover's distance, asymptotic likelihood approximation
• Shao et al. use HMMs
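The measures above can be sketched as follows. The Kullback-Leibler part is deliberately simplified: it assumes each track is modeled by a single Gaussian with diagonal covariance, for which a closed form exists. For GMMs there is no closed form, which is why the slide lists sampling and other approximations. All function names are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """KL(P || Q) for two Gaussians with diagonal covariance."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """Symmetrised divergence, usable as a similarity measure for clustering."""
    return (kl_diag_gaussians(mean_p, var_p, mean_q, var_q)
            + kl_diag_gaussians(mean_q, var_q, mean_p, var_p))
```

KL divergence is not symmetric, so a symmetrised form is a common choice when a distance-like quantity is needed.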
Unsupervised approach (cont.)
• Clustering algorithms
  – K-means
  – Shao et al.’s work
    • Agglomerative hierarchical clustering
  – SOM (self-organizing map)
    • Artificial neural network
    • Maps high-dimensional data onto a lower-dimensional space
    • GHSOM (growing hierarchical SOM)
      – Rauber et al.
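As an illustration of the simplest algorithm listed above, here is a bare-bones K-means sketch on feature vectors with Euclidean distance. The deterministic initialisation (first k points) and the fixed iteration count are assumptions made to keep the example short and reproducible; practical implementations usually use random or smarter seeding.

```python
import math

def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (centroids, assignments)."""
    # Assumption: seed the centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

# Toy input: two well-separated groups of 2-D feature vectors.
points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
centroids, assignments = kmeans(points, k=2)
```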
Supervised approach
• A taxonomy of genres is given
• Versus expert systems
  – No rules (or genre descriptions) needed
• Supervised machine learning algorithms
  – KNN (K-Nearest Neighbor)
  – GMMs (Gaussian Mixture Models)
  – HMMs (Hidden Markov Models)
  – LDA (Linear Discriminant Analysis)
  – SVMs (Support Vector Machines)
  – ANNs (Artificial Neural Networks)
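To make the supervised setting concrete, here is a minimal K-nearest-neighbor sketch: labeled feature vectors are given, and a new track takes the majority label of its k closest training examples. The feature vectors, genre labels, and function name are made up for illustration.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Predict a label for query by majority vote among its k nearest neighbors.

    train: list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical training set: 2-D feature vectors with genre labels.
train = [
    ((0.0, 0.0), "classical"), ((0.0, 1.0), "classical"), ((1.0, 0.0), "classical"),
    ((5.0, 5.0), "rock"), ((5.0, 6.0), "rock"), ((6.0, 5.0), "rock"),
]
predicted = knn_classify(train, (0.5, 0.5))
```

No rules about what defines each genre are written down; the classifier relies only on the labeled examples.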
Classification results
• MIREX genre classification contest
  – 1,005 / 510 songs over ten genres
  – 940 / 447 songs over six genres
Future directions
• Classification into perceptual categories
  – Moods, emotions
• Novelty detection
  – New or unknown data (not belonging to any class)
• Classification with multiple labels
  – Probably closer to human experience
• From taxonomies to folksonomies
  – Does the taxonomy fit the users?
Conclusion
• Definitions of music genres are convoluted
• Features → classification → result
• Research is evolving from purely objective machine calculations to techniques
• Machine learning plays a fundamental role in classification domains
Thank You