GCT731 Fall 2014
Topics in Music Technology - Music Information Retrieval
Tonal Harmony and Chord Recognition
Juhan Nam

Outline
- Tonal Harmony
  - Tonality
  - Critical Bands and Consonance
  - Perceptual Distance of Two Tones
  - Chords and Scales
- Chroma and Chroma Features
  - Pitch Helix and Chroma
  - FFT-based approach
  - Filter-bank approach
- Key Estimation
- Chord Recognition

Introduction
- Bach's chorale harmonization
- Jazz Real Book
- Pop music

Tonality
- Tonal music has a tonal center called a key
  - 12 keys (C, C#, D, ..., B)
- The notes of a musical scale play different roles relative to the key note
  - E.g., the C major scale
- A sequence of notes or a chord progression conveys a certain degree of stability or instability
  - E.g., cadence (V-I, ...), tension (sus2, sus4)
- Why does tonality form? In other words, why do we perceive different degrees of stability or tension from notes?

Critical Bands and Consonance
- Critical bands
  - The bandwidth within which two sinusoidal signals interact
  - If one is weaker than the other by more than a certain level, it is masked
  - Otherwise, they create beats or harshness
- Consonance and dissonance
  - If two sinusoidal tones fall within a critical band, they sound dissonant; otherwise, they sound consonant
  - Even a single tone can sound dissonant, e.g., an impulse train
[Figure: Deflection of the basilar membrane for a 200 Hz wave]
[Figure: Tonotopic organization of the cochlea]

Perceptual Distance of Two Tones
- A critical band is a little less than 3 semitones (a minor 3rd) wide
  - Two sinusoidal tones whose F0s are within 3 semitones sound dissonant
  - They are most dissonant when about one quarter of a critical band apart
  - Critical bands become wider below 500 Hz, so two low notes can sound dissonant
- Consonance of two harmonic tones
  - Determined by how many overtones of the two tones lie close to each other within critical bands
[Figure: First eight harmonics of two tones a fifth apart]
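As a rough illustration of the figure on this slide, the sketch below lists the first eight harmonics of two tones and reports which overtones coincide. The function names, the 5-cent tolerance, and the 200 Hz example are assumptions made for this sketch; it only counts coinciding overtones and is not a model of perceived roughness.

```python
import math


def harmonics(f0_hz, count=8):
    """Return the first `count` harmonics of a tone with fundamental f0_hz."""
    return [f0_hz * k for k in range(1, count + 1)]


def cents_between(f1_hz, f2_hz):
    """Absolute distance between two frequencies in cents (100 cents = 1 semitone)."""
    return abs(1200.0 * math.log2(f1_hz / f2_hz))


def shared_overtones(f0_low, f0_high, count=8, tolerance_cents=5.0):
    """Return (i, j) pairs where harmonic i of the low tone and harmonic j
    of the high tone lie within `tolerance_cents` of each other.
    The tolerance is an illustrative assumption."""
    pairs = []
    for i, low in enumerate(harmonics(f0_low, count), start=1):
        for j, high in enumerate(harmonics(f0_high, count), start=1):
            if cents_between(low, high) <= tolerance_cents:
                pairs.append((i, j))
    return pairs


# A just fifth (ratio 3:2) shares overtones; an equal-tempered tritone does not.
fifth = shared_overtones(200.0, 300.0)
tritone = shared_overtones(200.0, 200.0 * 2 ** (6 / 12))
```

For the fifth, the 3rd and 6th harmonics of the lower tone line up with the 2nd and 4th harmonics of the upper tone, which is one way to read why the interval is rated as consonant.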

Consonance Rating of Intervals in Music
- The perceptual distance between two notes differs from the semitone distance between them

Chords
- The basic units of tonal harmony
  - Triads, 7th, 9th, 11th, ...
- Triads are formed by choosing three notes that make the most consonant (most harmonized) sound
  - This amounts to stacking major or minor 3rds
- 7th and 9th chords are obtained by stacking further 3rds
  - The quality of consonance becomes more sophisticated as more notes are added
- Music theory is basically about how to create tension and resolve it with different qualities of consonance

Scales in Tonal Harmony
- Major scale
  - Formed by spreading out the notes of three major chords
- Minor scale
  - Formed by spreading out the notes of three minor chords (natural minor scale)
  - Harmonic and melodic minor scales can be formed by using both minor and major chords
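The construction on this slide can be sketched in a few lines. The sketch assumes that the "three chords" are the triads built on the tonic, the fourth, and the fifth (I, IV, V), which the slide does not state explicitly; the function names are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def triad(root, quality):
    """Stack two 3rds on `root`: major = 4 + 3 semitones, minor = 3 + 4."""
    third = 4 if quality == "major" else 3
    return [root % 12, (root + third) % 12, (root + 7) % 12]


def scale_from_primary_triads(tonic, quality):
    """Collect the notes of the triads on the tonic, fourth, and fifth
    (an assumption about which three chords are meant) and order them
    upward from the tonic."""
    pitch_classes = set()
    for offset in (0, 5, 7):  # roots of I, IV, V in semitones above the tonic
        pitch_classes.update(triad(tonic + offset, quality))
    return sorted(pitch_classes, key=lambda pc: (pc - tonic) % 12)


c_major = [NOTE_NAMES[pc] for pc in scale_from_primary_triads(0, "major")]
a_minor = [NOTE_NAMES[pc] for pc in scale_from_primary_triads(9, "minor")]
```

With C as tonic the three major triads C, F, and G yield the C major scale; with A as tonic the three minor triads Am, Dm, and Em yield the A natural minor scale.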

Chord Recognition
- Identifying the chord progression of tonal music
- It is a very challenging task
  - Chords are not explicit in music
  - Non-chord notes or passing notes
  - Key changes and chromaticism: require in-depth knowledge of music theory
  - In audio, multiple musical instruments are mixed
    - Relevant: harmonically arranged notes
    - Irrelevant: percussive sounds (but they can help detect chord changes)
- What kind of audio features can be extracted to recognize chords robustly?

Pitch Helix
- The basic assumption in tonal harmony is that notes an octave apart belong to the same pitch class
  - There is no dissonance among them
  - As a result, there are 12 pitch classes
- Shepard represented octave equivalence with the pitch helix
  - Chroma: represents the inherent circularity of pitch organization
  - Height: increases steadily, rising by one octave per rotation
[Figure: Pitch Helix and Chroma (Shepard, 2001)]

Chroma
- Chroma is independent of height
  - Shepard tone: a tone whose partials all belong to a single pitch class
  - It seems to rise or fall constantly
- Chroma captures the relative distribution of pitch classes, whereas pitch height acts as a noisy variation in chord recognition
  - Thus, chroma is considered well suited to analyzing harmony
[Figure: Optical illusion stairs]
[Figure: Shepard tone]

Chroma Features
- Chroma features are audio feature vectors that capture chroma characteristics
  - Ideally, they would be obtained by polyphonic note transcription, but that is too expensive
  - In addition, the more harmonized the notes are, the harder it becomes to separate polyphonic notes
- In practice, chroma features are obtained by projecting all time-frequency energy onto the 12 pitch classes
- Used not only for chord recognition but also for key estimation, segmentation, synchronization, and cover-song detection

Chroma Features: FFT-based approach
- Compute the spectrogram
- Compute the mapping matrix
  - Convert each frequency to the musical pitch scale and get its pitch class
  - Set the entry for the corresponding pitch class to one and all others to zero
  - Adjust the non-zero values so that low-frequency content receives more weight
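A minimal sketch of the mapping matrix described on this slide, in plain Python. The A4 = 440 Hz reference, the 55 to 5000 Hz range, and in particular the low-frequency weighting formula are assumptions chosen for illustration; the slide does not specify them.

```python
import math


def frequency_to_pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to a pitch class index (0 = C, ..., 9 = A, 11 = B)."""
    midi_note = 69 + 12 * math.log2(freq_hz / a4_hz)
    return int(round(midi_note)) % 12


def chroma_mapping_matrix(n_fft, sample_rate, f_min=55.0, f_max=5000.0):
    """Build a 12 x (n_fft/2 + 1) matrix that assigns each FFT bin to one
    pitch class. The weight is a hypothetical choice that simply decays
    with the number of octaves above f_min."""
    n_bins = n_fft // 2 + 1
    matrix = [[0.0] * n_bins for _ in range(12)]
    for k in range(1, n_bins):  # skip the DC bin
        freq = k * sample_rate / n_fft
        if f_min <= freq <= f_max:
            weight = 1.0 / (1.0 + math.log2(freq / f_min))
            matrix[frequency_to_pitch_class(freq)][k] = weight
    return matrix


def apply_mapping(matrix, magnitudes):
    """Project one magnitude spectrum (one spectrogram frame) onto 12 pitch classes."""
    return [sum(w * m for w, m in zip(row, magnitudes)) for row in matrix]


# Usage: a spectrum with a single peak near 440 Hz maps to pitch class A.
mapping = chroma_mapping_matrix(n_fft=4096, sample_rate=44100)
spectrum = [0.0] * (4096 // 2 + 1)
spectrum[41] = 1.0  # bin 41 is about 441 Hz at these settings
chroma = apply_mapping(mapping, spectrum)
```

Applying the same matrix to every spectrogram frame gives a chromagram with one 12-dimensional vector per frame.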

Improvements
- Blurring
  - An intrinsic problem of the STFT
  - Solution: find amplitude peaks and use only them
- De-tuning
  - Notes can deviate from the reference tuning
  - Compute 36-bin chroma features: add two neighboring bins to each pitch class
  - Use only the peak value among the three bins of each pitch class
- Normalization
  - Divide each frame's chroma features by the local maximum or mean to compensate for volume changes
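The de-tuning and normalization steps can be sketched as follows. The bin layout (three consecutive bins per pitch class, with the in-tune bin in the middle) and the function names are assumptions for this sketch, and normalization here uses the frame maximum only.

```python
def fold_36_to_12(chroma36):
    """Keep the peak of the three bins that belong to each pitch class.
    Assumed layout: bins 3*pc, 3*pc + 1, 3*pc + 2 cover pitch class pc."""
    if len(chroma36) != 36:
        raise ValueError("expected a 36-bin chroma vector")
    return [max(chroma36[3 * pc: 3 * pc + 3]) for pc in range(12)]


def normalize_by_max(chroma):
    """Divide a chroma frame by its maximum so that loudness changes
    do not affect the feature; silent frames stay all zero."""
    peak = max(chroma)
    if peak <= 0:
        return [0.0] * len(chroma)
    return [value / peak for value in chroma]


# Usage: energy slightly sharp of C and exactly on A.
frame36 = [0.0] * 36
frame36[2] = 1.0   # sharp neighbor bin of pitch class 0 (C)
frame36[28] = 2.0  # center bin of pitch class 9 (A)
frame12 = normalize_by_max(fold_36_to_12(frame36))
```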

Chroma Features: Filter-bank approach
- Alternatively, a filter bank can be used to get a log-scale time-frequency representation
  - Center frequencies are placed at the 88 piano notes
  - Bandwidths are set to be constant-Q and robust to +/- 25 cents of detuning
- The outputs that belong to the same pitch class are wrapped and summed
(Müller, 2011)
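The layout of such a filter bank can be sketched without designing the filters themselves. The sketch assumes MIDI notes 21 to 108 for the 88 piano keys, A4 = 440 Hz, and band edges 50 cents on either side of each center; these values are illustrative and are not taken from the Chroma Toolbox.

```python
def piano_filterbank(a4_hz=440.0, half_width_cents=50.0):
    """Return one band per piano key (MIDI 21..108) with a center
    frequency and band edges a fixed number of cents away, which makes
    the ratio center / bandwidth (the Q factor) the same for every band."""
    edge_ratio = 2.0 ** (half_width_cents / 1200.0)
    bands = []
    for midi in range(21, 109):
        center = a4_hz * 2.0 ** ((midi - 69) / 12)
        bands.append({
            "midi": midi,
            "center": center,
            "low": center / edge_ratio,
            "high": center * edge_ratio,
        })
    return bands


def wrap_to_chroma(band_energies, bands):
    """Sum the energies of all bands that share a pitch class (0 = C)."""
    chroma = [0.0] * 12
    for band, energy in zip(bands, band_energies):
        chroma[band["midi"] % 12] += energy
    return chroma


# Usage: energy in A3 and A4 ends up in the same chroma bin.
bands = piano_filterbank()
energies = [0.0] * len(bands)
energies[57 - 21] = 2.0  # A3
energies[69 - 21] = 1.0  # A4
chroma = wrap_to_chroma(energies, bands)
```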

Beat-Synchronous Chroma Features
- Make chroma features homogeneous within a beat (Bartsch and Wakefield, 2001)
[Figure: beat-synchronous chroma features (from Ellis slides)]
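One simple way to make the features homogeneous within a beat is to average all frames between consecutive beat positions. The sketch assumes the beat positions are already known as frame indices (beat tracking itself is not shown) and uses the mean as the aggregate; both are assumptions.

```python
def beat_synchronous(frames, beat_starts):
    """Average feature frames within each beat.

    frames: list of equal-length feature vectors, one per time frame.
    beat_starts: sorted frame indices where each beat begins; the last
    beat runs to the end of `frames`.
    """
    bounds = list(beat_starts) + [len(frames)]
    synced = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        segment = frames[start:end]
        if not segment:
            continue  # skip empty beats, e.g. duplicate beat indices
        dim = len(segment[0])
        synced.append([sum(f[d] for f in segment) / len(segment) for d in range(dim)])
    return synced


# Usage with 2-dimensional toy features (real chroma frames have 12 values).
toy_frames = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [0.0, 6.0]]
per_beat = beat_synchronous(toy_frames, beat_starts=[0, 2])
```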

Key Estimation
- Overview
  - Estimate the musical key from music data
  - One of 24 keys: 12 pitch classes (C, C#, D, ..., B) x major/minor
- General framework (Gomez, 2006)
[Figure: block diagram with Chroma Features, Average, Key Template, Similarity Measure, and Key Strength; example output: G major]

Key Template
- Probe tone profile (Krumhansl and Kessler, 1982)
  - Relative stability or weight of tones
  - Listeners rated how well each tone completed the first seven notes of a major scale
  - For example, in the key of C major: C, D, E, F, G, A, B, and then what?
[Figure: Probe Tone Profile - Relative Pitch Ranking]

Key Estimation
- Similarity is measured by cross-correlation between the chroma features and the templates
- Find the key that produces the maximum correlation
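A minimal sketch of template-based key estimation. The profile values are the widely quoted Krumhansl-Kessler probe-tone ratings for C major and C minor; using the Pearson correlation as the similarity measure and averaging the chroma over the whole piece beforehand are assumptions of this sketch.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rotate(profile, tonic):
    """Shift a C-based profile so that its first entry lands on `tonic`."""
    return [profile[(pc - tonic) % 12] for pc in range(12)]


def estimate_key(mean_chroma):
    """Return (tonic name, mode, correlation) of the best of the 24 keys."""
    best = None
    for mode, profile in (("major", KK_MAJOR), ("minor", KK_MINOR)):
        for tonic in range(12):
            score = pearson(mean_chroma, rotate(profile, tonic))
            if best is None or score > best[0]:
                best = (score, NOTE_NAMES[tonic], mode)
    return best[1], best[2], best[0]


# Usage: a chroma vector shaped like the major profile on G is labeled G major.
tonic, mode, strength = estimate_key(rotate(KK_MAJOR, 7))
```

The 24 correlation values computed inside `estimate_key` play the role of the key strength in the framework diagram.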

Chord Recognition
- Estimate chords from music data
  - Typically, one of 24 chords: 12 pitch classes x major/minor
  - Often, diminished chords are added (36 chords)
- General framework
[Figure: block diagram with Audio, Transform, Chroma Features, Chord Template or Models, Decision Making (Template Matching, HMM, SVM), and Chords]

Template-Based Approach
- Use chord templates (Fujishima, 1999; Harte and Sandler, 2005) and find the best matches
[Figure: Chord Templates (from Bello's slides)]

Template-Based Approach
- Compute the cross-correlation between the chroma features and the chord templates, and select the chord with the maximum value
(from Bello's slides)
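A minimal sketch of template matching for the 24 major and minor triads. Binary templates follow the slide; the cosine similarity used as the matching score and the chord naming ("C", "Cm", ...) are assumptions of this sketch.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_templates():
    """Binary 12-dimensional templates for the 12 major and 12 minor triads."""
    templates = {}
    for root in range(12):
        for suffix, third in (("", 4), ("m", 3)):
            template = [0.0] * 12
            for interval in (0, third, 7):
                template[(root + interval) % 12] = 1.0
            templates[NOTE_NAMES[root] + suffix] = template
    return templates


def cosine(a, b):
    """Cosine similarity; returns 0 for an all-zero vector."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm


def recognize_chord(chroma, templates):
    """Return the name of the template that best matches one chroma frame."""
    return max(templates, key=lambda name: cosine(chroma, templates[name]))


# Usage: strong C, E, G with a little energy elsewhere is labeled C major.
templates = chord_templates()
frame = [1.0, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1, 0.1]
label = recognize_chord(frame, templates)
```

Because each frame is labeled independently, the output can flicker between chords from frame to frame.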

Problems
- The template approach is too simplistic
  - The binary templates are hard assignments
- Temporal dependency between chords is not considered
  - Most tonal music follows certain types of chord progressions
- The recognized chord sequence is not smooth
  - Some post-processing (smoothing) is necessary

Hidden Markov Model (HMM)
- A probabilistic model of time series
  - Speech, gesture, DNA sequences, financial data, weather data, ...
- Assumes that
  - Time-series data are generated from hidden states
  - The hidden states follow a Markov model
- Learning-based approach
  - Needs training data annotated with labels
  - The labels usually correspond to the hidden states

Markov Model
- A random variable q has N states ($s_1, s_2, \dots, s_N$) and, at each time step, one of the states is chosen
- The probability distribution of the next state is determined only by the current state (the first-order Markov model)
- Thus, the joint probability of a sequence of states simplifies to
  $$P(q_1=s_1, q_2=s_2, \dots, q_N=s_N) = P(q_1=s_1)\,P(q_2=s_2 \mid q_1=s_1)\,P(q_3=s_3 \mid q_2=s_2) \cdots P(q_N=s_N \mid q_{N-1}=s_{N-1})$$
- Example: chord recognition with the states C, F, and G (the figure also shows a start node "St" and an end node "End")

  Transition probabilities $P(q_{t+1} \mid q_t)$:

  From \ To |  C  |  F  |  G
  C         | 0.7 | 0.1 | 0.2
  F         | 0.2 | 0.6 | 0.2
  G         | 0.3 | 0.1 | 0.6

What can we do with a Markov Model?
- Generate a chord sequence
  - E.g., C C C C F F C C G G C C - (beat-wise)
- Evaluate whether one chord progression is more likely than another
  - (C, G, C) is more likely than (C, F, C), assuming $P(q_1=C) = 1$:
    $P(q=C,G,C) = P(q_1=C)\,P(q_2=G \mid q_1=C)\,P(q_3=C \mid q_2=G) = 1 \times 0.2 \times 0.3 = 0.06$
    $P(q=C,F,C) = P(q_1=C)\,P(q_2=F \mid q_1=C)\,P(q_3=C \mid q_2=F) = 1 \times 0.1 \times 0.2 = 0.02$
- Compute the probability that the chord at time T is C (or F or G)
  - Naive method: enumerate all paths that reach the C chord at time T (exponential cost)
  - Clever method: use recursive induction
    $P(q_T=C) = P(q_T=C \mid q_{T-1}=C)\,P(q_{T-1}=C) + P(q_T=C \mid q_{T-1}=F)\,P(q_{T-1}=F) + P(q_T=C \mid q_{T-1}=G)\,P(q_{T-1}=G)$
  - Repeat this for $P(q_i=C)$, $P(q_i=F)$, and $P(q_i=G)$, where i is T-1, T-2, T-3, ...
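The two computations on this slide can be checked with a short script that uses the transition probabilities of the C, F, G example. The function names are hypothetical, and the initial distribution that puts all probability on C is the same assumption the slide makes.

```python
# Transition probabilities P(next | current) from the C, F, G example.
TRANSITION = {
    "C": {"C": 0.7, "F": 0.1, "G": 0.2},
    "F": {"C": 0.2, "F": 0.6, "G": 0.2},
    "G": {"C": 0.3, "F": 0.1, "G": 0.6},
}


def sequence_probability(chords, initial):
    """Joint probability of a chord sequence under the first-order model."""
    prob = initial[chords[0]]
    for prev, nxt in zip(chords, chords[1:]):
        prob *= TRANSITION[prev][nxt]
    return prob


def state_distribution(initial, steps):
    """Propagate the state distribution forward by `steps` time steps
    with the recursion P(q_t = j) = sum_i P(q_t = j | q_{t-1} = i) P(q_{t-1} = i)."""
    dist = dict(initial)
    for _ in range(steps):
        dist = {
            nxt: sum(TRANSITION[prev][nxt] * dist[prev] for prev in dist)
            for nxt in TRANSITION
        }
    return dist


start_on_c = {"C": 1.0, "F": 0.0, "G": 0.0}
p_cgc = sequence_probability(["C", "G", "C"], start_on_c)  # about 0.06
p_cfc = sequence_probability(["C", "F", "C"], start_on_c)  # about 0.02
after_two_steps = state_distribution(start_on_c, steps=2)
```

The recursion touches each pair of states once per time step, so its cost grows linearly with the sequence length instead of exponentially.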

HMM for Chord Recognition
- What we observe are not chords but audio features
  - We treat chords as hidden states
  - Infer chords from audio features (i.e., chroma features)
- Hidden Markov Model
  - Hidden states follow the Markov model
  - Given a state, the corresponding observation distribution is independent of previous states and observations
- Model parameters
  - Initial state probabilities: $P(q_0)$
  - Transition probability matrix: $P(q_j \mid q_i)$ or $a_{ij}$ (first-order Markov)
  - Observation distribution given a state: $P(O \mid q_j)$ or $b_j$ (e.g., Gaussian)
[Figure: HMM graph with hidden states q_{t-1}, q_t, q_{t+1} over the chords F, C, G, each emitting an observation o_{t-1}, ..., o_{t+1} with probability P(O|q_t)]

Training HMM for Chord Recognition
- Model parameters are trained with labeled data
- If every time frame is labeled
  - Training is easy, but such data are expensive to obtain
  - Transition probabilities: count chord-to-chord transitions and normalize them
  - Observation distribution: fit a single Gaussian or a Gaussian mixture model (GMM) to the chroma features of each chord
- If labeled without time information
  - Use the Baum-Welch algorithm, which applies the forward-backward algorithm within expectation-maximization (EM)
[Figure: Chord Transition Probability Matrix (Lee, 2008)]
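For the frame-labeled case, estimating the transition matrix is a matter of counting. The sketch below covers only that step; the optional additive smoothing and the uniform row for chords that never occur as a source state are assumptions added for robustness, not part of the slide.

```python
def estimate_transitions(label_sequences, states, smoothing=0.0):
    """Count chord-to-chord transitions in frame-wise (or beat-wise) label
    sequences and normalize each row into probabilities.

    smoothing: hypothetical additive constant applied to every count.
    """
    counts = {a: {b: smoothing for b in states} for a in states}
    for labels in label_sequences:
        for prev, nxt in zip(labels, labels[1:]):
            counts[prev][nxt] += 1
    matrix = {}
    for a in states:
        total = sum(counts[a].values())
        if total == 0:
            # No transition out of this chord was observed: fall back to uniform.
            matrix[a] = {b: 1.0 / len(states) for b in states}
        else:
            matrix[a] = {b: counts[a][b] / total for b in states}
    return matrix


# Usage with one short toy annotation.
trained = estimate_transitions([["C", "C", "F", "C"]], states=["C", "F", "G"])
```

Fitting the per-chord observation distributions would follow the same pattern: group the chroma frames by their label and estimate a mean and covariance for each group.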

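Evaluating HMM for Chord Recognition
- Find the most likely sequence of hidden states given the observations and the HMM parameters
- Viterbi algorithm
  - Define a probability variable: $\delta_t(j)$, the probability of the most likely state sequence that ends in state j at time t and accounts for the observations $o_1, \dots, o_t$
  - Initialization (from the start state): $\delta_1(j) = a_{\mathrm{St},j}\, b_j(o_1)$
  - Recursion: $\delta_t(j) = \max_i \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(o_t)$
  - Termination (to the end state): $P^* = \max_i \left[ \delta_T(i)\, a_{i,\mathrm{End}} \right]$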
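A minimal sketch of Viterbi decoding for the three-chord example. It uses an initial distribution in place of the explicit start state, omits the end state, and takes the per-frame observation likelihoods as given numbers; those simplifications and the toy likelihood values are assumptions. For long sequences, log probabilities would be used to avoid numerical underflow.

```python
STATES = ["C", "F", "G"]
TRANSITION = {
    "C": {"C": 0.7, "F": 0.1, "G": 0.2},
    "F": {"C": 0.2, "F": 0.6, "G": 0.2},
    "G": {"C": 0.3, "F": 0.1, "G": 0.6},
}


def viterbi(obs_likelihoods, initial, transition, states):
    """Return the most likely state path and its probability.

    obs_likelihoods: one dict per frame mapping state -> P(o_t | state).
    """
    # Initialization: best score of a path that ends in each state at t = 1.
    delta = {s: initial[s] * obs_likelihoods[0][s] for s in states}
    backpointers = []
    # Recursion: extend the best path into each state, frame by frame.
    for likelihood in obs_likelihoods[1:]:
        new_delta, pointers = {}, {}
        for j in states:
            best_prev = max(states, key=lambda i: delta[i] * transition[i][j])
            new_delta[j] = delta[best_prev] * transition[best_prev][j] * likelihood[j]
            pointers[j] = best_prev
        delta = new_delta
        backpointers.append(pointers)
    # Termination and backtracking.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


# Toy likelihoods: the middle frame slightly favors F when judged on its own.
frames = [
    {"C": 0.8, "F": 0.1, "G": 0.1},
    {"C": 0.4, "F": 0.5, "G": 0.1},
    {"C": 0.8, "F": 0.1, "G": 0.1},
]
uniform = {s: 1.0 / 3.0 for s in STATES}
best_path, best_prob = viterbi(frames, uniform, TRANSITION, STATES)
framewise = [max(STATES, key=lambda s: f[s]) for f in frames]
```

In this toy case the frame-wise decision is C, F, C, while the Viterbi path stays on C throughout because a brief excursion to F is unlikely under the transition probabilities.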

The Viterbi Trellis
[Figure: trellis with the states C, F, and G at each time step t = 1, 2, 3, ..., T-1, T, between a start node (St) and an end node (End)]
- Recall dynamic programming!

Chord Recognition Result
- Trained with the Beatles data set (141 songs)
- Viterbi: 71.5%; maximum likelihood (without the Markov model): 44.9%
[Figure: true, Viterbi, and ML chord sequences (from Ellis E4896 practicals)]

Demo
- Yanno: chord recognition for YouTube videos
  - http://yanno.eecs.qmul.ac.uk/

References
- P. R. Cook (Ed.), Music, Cognition, and Computerized Sound: An Introduction to Psychoacoustics, 2001.
- C. Krumhansl, Cognitive Foundations of Musical Pitch, 1990.
- M. A. Bartsch and G. H. Wakefield, To Catch a Chorus: Using Chroma-Based Representations for Audio Thumbnailing, 2001.
- E. Gomez and P. Herrera, Estimating the Tonality of Polyphonic Audio Files: Cognitive Versus Machine Learning Modeling Strategies, 2004.
- M. Müller and S. Ewert, Chroma Toolbox: MATLAB Implementations for Extracting Variants of Chroma-Based Audio Features, 2011.
- T. Fujishima, Real-Time Chord Recognition of Musical Sound: A System Using Common Lisp Music, 1999.
- A. Sheh and D. Ellis, Chord Segmentation and Recognition Using EM-Trained Hidden Markov Models, 2003.
- K. Lee and M. Slaney, Acoustic Chord Transcription and Key Extraction from Audio Using Key-Dependent HMMs Trained on Synthesized Audio, 2008.
