Overview of Real-Time Pitch Tracking Approaches
Music Information Retrieval Seminar
McGill University
Francois Thibault
Presentation Goals
Describe the requirements of a real-time (RT) pitch tracking algorithm for musical applications
Briefly introduce key developments in RT pitch tracking algorithms
Provide insight into which techniques might be more suitable for a given application
Pitch tracking requirements in musical context
Must often function in real-time
Minimal output latency
Accuracy in the presence of noise
Frequency resolution
Flexibility and adaptability to various musical requirements:
  Pitch range
  Dynamic range
  …
Overview of techniques
Time-domain methods
  Autocorrelation Function (Rabiner 77)
  Average Magnitude Difference Function (AMDF)
  Fundamental Period Measurement (Kuhn 90)
Frequency-domain methods
  Cepstrum (Noll 66)
  Harmonic Product Spectrum (Schroeder 68)
  Constant-Q transform (Brown 92)
  Least-Squares fitting (Choi 97)
  Maximum Likelihood (McAulay 86, Puckette 98)
Other approaches…
Autocorrelation method
Based on the fact that a periodic signal correlates strongly with itself when offset by the fundamental period
Measures the extent to which a signal correlates with a time-shifted version of itself
The time shifts at which the ACF displays peaks correspond to likely period estimates
$$\phi(t) = \frac{1}{N} \sum_{n=0}^{N-1} x(n)\, x(n + t)$$
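As a minimal sketch of the autocorrelation sum above, the following Python computes the ACF over a range of lags and picks the lag with the highest value. The function names, the 8 kHz sample rate, the 200 Hz test tone and the 120 to 500 Hz search range are illustrative assumptions, not taken from the slides.

```python
import math


def autocorrelation(x, max_lag):
    """Return phi(t) = (1/N) * sum_{n=0}^{N-1} x(n) * x(n + t) for t = 0..max_lag.

    N is set to len(x) - max_lag so that every lag uses the same number of terms.
    """
    n = len(x) - max_lag
    return [sum(x[i] * x[i + t] for i in range(n)) / n for t in range(max_lag + 1)]


def acf_pitch(x, sample_rate, f_min, f_max):
    """Estimate pitch as the lag with the largest ACF value inside the search range."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    phi = autocorrelation(x, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda t: phi[t])
    return sample_rate / best_lag


# Hypothetical test signal: 200 Hz sine sampled at 8 kHz (period = 40 samples).
sample_rate = 8000
signal = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(1024)]
acf_estimate = acf_pitch(signal, sample_rate, f_min=120, f_max=500)
```

The search range is deliberately narrow: a wider range would also admit the lag of two periods (80 samples), where the ACF peak is just as high, so a practical tracker needs some rule for choosing among candidate peaks.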
Autocorrelation Pros/Cons
Simple implementation (good for hardware)
Can handle poor quality signals (phase insensitive)
Often requires preprocessing (spectral flattening)
Poor resolution for high frequencies
Analysis parameters hard to tune
Uncertainty between peaks generated by formants and peaks generated by the periodicity of the sound can lead to wrong estimates
AMDF
Again based on the idea that a periodic signal will be similar to itself when shifted by the fundamental period
Similar in concept to ACF, but looks at the difference with a time-shifted version of itself
The time shifts which display valleys correspond to likely period estimates
$$\psi(t) = \frac{1}{N} \sum_{n=0}^{N-1} \left| x(n) - x(n + t) \right|$$
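The AMDF can be sketched in the same way as the ACF, replacing the product by a magnitude difference and searching for a valley instead of a peak. The names, sample rate, test tone and search range below are assumptions chosen for illustration only.

```python
import math


def amdf(x, max_lag):
    """Return psi(t) = (1/N) * sum_{n=0}^{N-1} |x(n) - x(n + t)| for t = 0..max_lag."""
    n = len(x) - max_lag
    return [sum(abs(x[i] - x[i + t]) for i in range(n)) / n for t in range(max_lag + 1)]


def amdf_pitch(x, sample_rate, f_min, f_max):
    """Estimate pitch as the lag with the deepest AMDF valley inside the search range."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    psi = amdf(x, lag_max)
    best_lag = min(range(lag_min, lag_max + 1), key=lambda t: psi[t])
    return sample_rate / best_lag


# Hypothetical test signal: 200 Hz sine sampled at 8 kHz (period = 40 samples).
sample_rate = 8000
signal = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(1024)]
amdf_estimate = amdf_pitch(signal, sample_rate, f_min=120, f_max=500)
```

Only subtractions and absolute values are needed, no multiplications, which is why this form is attractive for simple hardware.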
AMDF Pros/Cons
Poor frequency resolution
Even simpler implementation than ACF (good for hardware)
Less computationally expensive than ACF
Combination of AMDF and ACF yields a result more robust to noise (Kobayashi 95)
$$f(t) = \frac{\phi(t)}{\psi(t) + k}$$
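A rough sketch of this combined function follows: the ACF is divided by the AMDF plus a constant, so that a lag which is both an ACF peak and an AMDF valley is emphasized. The value k = 1.0, the function names and the two-partial test signal are assumptions for illustration; the slides do not specify them.

```python
import math


def weighted_autocorrelation(x, max_lag, k=1.0):
    """Return f(t) = phi(t) / (psi(t) + k) for t = 0..max_lag.

    phi is the autocorrelation, psi the average magnitude difference;
    k > 0 (assumed value) keeps the denominator away from zero.
    """
    n = len(x) - max_lag
    result = []
    for t in range(max_lag + 1):
        phi = sum(x[i] * x[i + t] for i in range(n)) / n
        psi = sum(abs(x[i] - x[i + t]) for i in range(n)) / n
        result.append(phi / (psi + k))
    return result


def weighted_acf_pitch(x, sample_rate, f_min, f_max, k=1.0):
    """Estimate pitch as the lag maximizing f(t) inside the search range."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    f = weighted_autocorrelation(x, lag_max, k)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda t: f[t])
    return sample_rate / best_lag


# Hypothetical test signal: 200 Hz fundamental plus a weaker second partial, 8 kHz.
sample_rate = 8000
signal = [
    math.sin(2 * math.pi * 200 * i / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 400 * i / sample_rate)
    for i in range(1024)
]
combined_estimate = weighted_acf_pitch(signal, sample_rate, f_min=120, f_max=500)
```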
Fundamental Period Measurement approach
Signal is first run through a bank of half-octave bandpass filters
If the filters are sharp enough, the output of one filter should display the input waveform freed of its upper partials (nearly sinusoidal)
A decision algorithm determines which filter output corresponds to the fundamental frequency
The time between zero crossings of that filter output determines the period
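The last step, measuring the period from zero crossings, can be sketched as below. This assumes the filter bank and the decision algorithm have already selected a nearly sinusoidal filter output; neither is implemented here, and a plain sine stands in for that output. Function names and signal parameters are hypothetical.

```python
import math


def upward_zero_crossings(y):
    """Fractional sample positions where y crosses zero going upward.

    Each crossing is located by linear interpolation between the two
    samples that straddle zero.
    """
    crossings = []
    for i in range(len(y) - 1):
        if y[i] < 0.0 <= y[i + 1]:
            crossings.append(i + y[i] / (y[i] - y[i + 1]))
    return crossings


def frequency_from_zero_crossings(y, sample_rate):
    """Average the spacing of upward zero crossings and convert it to Hz."""
    crossings = upward_zero_crossings(y)
    if len(crossings) < 2:
        return None  # not enough crossings to measure a period
    mean_period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / mean_period


# Stand-in for the selected filter output: a 220 Hz sine at 8 kHz with a phase offset.
sample_rate = 8000
filter_output = [math.sin(2 * math.pi * 220 * i / sample_rate + 0.3) for i in range(800)]
fpm_estimate = frequency_from_zero_crossings(filter_output, sample_rate)
```

Because the period is read directly from crossing times, a first estimate is available after a single period, which is what makes this family of methods attractive for low latency.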
FPM Pros/Cons
Easy implementation (hardware and software)
Computationally efficient
Decision algorithm highly dependent on thresholds
  But automatic threshold setting is provided for most situations
Cepstrum approach
Tool often used in speech processing
Cepstrum is defined as the power spectrum of the logarithm of the power spectrum
Clearly separates the contributions of vocal tract and excitation
A strong peak is displayed in the excitation part (high cepstral region) at the quefrency corresponding to the fundamental period
Use a peak picker on the cepstrum and translate the quefrency into fundamental frequency
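The definition above can be sketched in plain Python as two transforms with a logarithm in between, followed by peak picking over a quefrency range. The small recursive FFT, the floor added before the logarithm, and the synthetic test signal (a decaying pulse repeated every 32 samples, chosen so the frame holds a whole number of periods) are all assumptions made for this illustration.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def power_spectrum(x):
    return [abs(v) ** 2 for v in fft(x)]


def cepstrum(x, floor=1e-12):
    """Power spectrum of the log power spectrum; `floor` (assumed) avoids log(0)."""
    log_power = [math.log(p + floor) for p in power_spectrum(x)]
    return power_spectrum(log_power)


def cepstrum_pitch(x, sample_rate, f_min, f_max):
    """Pick the strongest cepstral peak in the quefrency range and convert it to Hz."""
    c = cepstrum(x)
    q_min = int(sample_rate / f_max)
    q_max = int(sample_rate / f_min)
    best_q = max(range(q_min, q_max + 1), key=lambda q: c[q])
    return sample_rate / best_q


# Hypothetical source-filter style signal: a decaying pulse repeated every
# 32 samples at 8 kHz, i.e. a 250 Hz fundamental.
sample_rate = 8000
signal = [0.9 ** (i % 32) for i in range(1024)]
cepstrum_estimate = cepstrum_pitch(signal, sample_rate, f_min=160, f_max=500)
```

The quefrency index is a lag in samples, so the conversion to frequency is the same division as for the time-domain methods.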
Cepstrum Pros/Cons
Less confusion between candidates than in ACF
Proven method, especially suitable for signals easily characterized by source-filter models (e.g. voice)
Relatively computationally intensive (2 FFTs)
Harmonic Product Spectrum approach
Measures the maximum coincidence of harmonics for each spectral frame
The resulting periodic correlation array is searched for its maximum, which should correspond to the fundamental frequency
An algorithm is then run for octave correction
$$Y(\omega) = \prod_{r=1}^{R} \left| X(\omega r) \right|, \qquad \hat{Y} = \max_{\omega_i} Y(\omega_i)$$
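A minimal sketch of the harmonic product follows: the magnitude spectrum is multiplied by copies of itself compressed by factors 2 to R, and the bin with the largest product is taken as the fundamental. The recursive FFT, R = 3, and the test signal (five partials placed exactly on FFT bins, with a fundamental weaker than the second partial) are assumptions for illustration; the octave-correction step is not included.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def harmonic_product_spectrum(x, num_harmonics):
    """Y[k] = product over r = 1..R of |X[k * r]|, for bins where k * R stays below Nyquist."""
    magnitude = [abs(v) for v in fft(x)]
    limit = len(x) // (2 * num_harmonics)
    hps = [0.0] * limit
    for k in range(1, limit):
        product = 1.0
        for r in range(1, num_harmonics + 1):
            product *= magnitude[k * r]
        hps[k] = product
    return hps


def hps_pitch(x, sample_rate, num_harmonics=3):
    """Return the frequency of the bin that maximizes the harmonic product."""
    hps = harmonic_product_spectrum(x, num_harmonics)
    best_bin = max(range(1, len(hps)), key=lambda k: hps[k])
    return best_bin * sample_rate / len(x)


# Hypothetical signal: partials 1..5 of 250 Hz (bin 32 of a 1024-point frame at 8 kHz).
sample_rate = 8000
amplitudes = [0.5, 1.0, 0.8, 0.6, 0.4]
signal = [
    sum(a * math.sin(2 * math.pi * 32 * (h + 1) * i / 1024) for h, a in enumerate(amplitudes))
    for i in range(1024)
]
hps_estimate = hps_pitch(signal, sample_rate)
```

In this example the strongest spectral peak is the second partial at 500 Hz, yet the product still peaks at 250 Hz because only the fundamental bin has energy at all of its multiples.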
HPS Pros/Cons
Simple to implement
Does well under a wide variety of conditions
Poor low-frequency resolution
Computational complexity increased by the zero padding required for interpolation of low frequencies
Requires post-processing for error correction
Constant-Q transform approach
First computes the constant-Q transform to obtain a constant pattern in the log-frequency domain (Q = fc/bw)
Computes the cross-correlation with a fixed comb pattern (ideal partial positions for a given fundamental frequency)
Peak-picks the result to obtain the fundamental frequency
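The reason a fixed comb works is that, on a log-frequency axis, partial h always sits the same number of bins above the fundamental, whatever the fundamental is. The sketch below illustrates only the comb cross-correlation and peak picking; it assumes a constant-Q magnitude spectrum is already available (a synthetic one is built by hand), with 24 bins per octave and a lowest bin at 55 Hz as hypothetical parameters. The Q formula assumes each bin's bandwidth equals the spacing to the next bin.

```python
import math


def q_factor(bins_per_octave):
    """Q = fc / bw when bw is the spacing to the next geometrically spaced bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def harmonic_comb_offsets(bins_per_octave, num_harmonics):
    """Bin offsets of partials 1..H above the fundamental on the log-frequency axis."""
    return [round(bins_per_octave * math.log2(h)) for h in range(1, num_harmonics + 1)]


def comb_cross_correlation(cq_magnitudes, offsets):
    """Slide the comb along the spectrum and sum the magnitudes under its teeth."""
    span = len(cq_magnitudes) - max(offsets)
    return [sum(cq_magnitudes[s + o] for o in offsets) for s in range(span)]


def constant_q_pitch(cq_magnitudes, f_lowest, bins_per_octave, num_harmonics=5):
    """Return (bin, frequency) of the comb position with the highest correlation."""
    offsets = harmonic_comb_offsets(bins_per_octave, num_harmonics)
    corr = comb_cross_correlation(cq_magnitudes, offsets)
    best_bin = max(range(len(corr)), key=lambda s: corr[s])
    return best_bin, f_lowest * 2.0 ** (best_bin / bins_per_octave)


# Synthetic constant-Q spectrum (assumed): fundamental in bin 30, five partials.
bins_per_octave = 24
cq_magnitudes = [0.0] * 120
for offset, amplitude in zip(harmonic_comb_offsets(bins_per_octave, 5), [1.0, 0.7, 0.5, 0.4, 0.3]):
    cq_magnitudes[30 + offset] = amplitude
cq_bin, cq_estimate = constant_q_pitch(cq_magnitudes, f_lowest=55.0, bins_per_octave=bins_per_octave)
```

The comb placed one octave below or above the true fundamental still catches some of the partials, which is the source of the secondary peaks that compete with the correct one.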
Constant-Q Pros/Cons
Complexity of constant-Q reduced but still… (Brown and Puckette 91)
Sensitive to octave errors
  Other peaks could be candidates
Least-Squares fitting approach
Performs least-squares spectral analysis: minimizes the error by fitting sinusoids to the signal segment
Strong sinusoidal components are identified as sharp valleys in the least-squares error signal
Relatively few evaluations of the error signal are required to identify a valley
Fundamental frequency is obtained as the average of the partial frequencies divided by their partial numbers
Uses rectangular windowing to provide faster response
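To make the idea of a valley in the least-squares error concrete, the sketch below fits a single sinusoid (a cosine and a sine term) at each candidate frequency and records the residual energy. It uses a brute-force grid of candidates, not the efficient evaluation scheme mentioned above, and the function names, the 10 ms frame and the example partial list are assumptions for illustration.

```python
import math


def sinusoid_fit_error(x, freq, sample_rate):
    """Residual energy after the least-squares fit of a*cos + b*sin at `freq`."""
    w = 2.0 * math.pi * freq / sample_rate
    c = [math.cos(w * i) for i in range(len(x))]
    s = [math.sin(w * i) for i in range(len(x))]
    cc = sum(v * v for v in c)
    ss = sum(v * v for v in s)
    cs = sum(u * v for u, v in zip(c, s))
    xc = sum(u * v for u, v in zip(x, c))
    xs = sum(u * v for u, v in zip(x, s))
    # Solve the 2x2 normal equations [[cc, cs], [cs, ss]] [a, b]^T = [xc, xs]^T.
    det = cc * ss - cs * cs
    a = (xc * ss - xs * cs) / det
    b = (xs * cc - xc * cs) / det
    return sum((xi - a * ci - b * si) ** 2 for xi, ci, si in zip(x, c, s))


def strongest_sinusoid(x, sample_rate, candidates):
    """Return the candidate frequency at the deepest valley of the error."""
    return min(candidates, key=lambda f: sinusoid_fit_error(x, f, sample_rate))


def fundamental_from_partials(partials):
    """Average of partial frequencies divided by their partial numbers.

    `partials` is a list of (partial_number, frequency_hz) pairs.
    """
    return sum(freq / number for number, freq in partials) / len(partials)


# Hypothetical short frame: 80 samples (10 ms at 8 kHz), rectangular window.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * i / sample_rate + 0.7) for i in range(80)]
ls_estimate = strongest_sinusoid(frame, sample_rate, range(100, 401, 5))

# Made-up partial measurements, only to show the averaging step.
f0_average = fundamental_from_partials([(1, 200.0), (2, 401.0), (3, 597.0)])
```

The frame here covers only two periods of the tone and no tapering window is applied, which reflects the short-segment, rectangular-window setting described above.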
LS fitting Pros/Cons
Operates on shorter frame segments
Best option for real-time applications with minimum latency requirements
Efficient evaluation scheme allows reasonable computational complexity
Maximum Likelihood
Maximum likelihood algorithm searches through a set of possible ideal spectra and chooses the closest match (Noll 69)
Was adapted to sinusoidal modeling theory by finding the best fit of harmonic partial sets to the measured model (McAulay 86)
Discrimination is enhanced by suppressing partials of small amplitude
ML Pros/Cons
Inherits high computational requirements from sinusoidal modeling
Very robust estimation
Allows a guess of the fundamental frequency even with several partials missing
Other approaches
Neural Nets (Barnar 91)
Hidden Markov Models (Doval 91)
Parallel processing approaches (Rabiner 69)
Fourier of Fourier transforms (Marchand 2001)
Two-way mismatch model (Cano 98)
Subharmonic to harmonic ratio (Sun 2000)
Conclusions
A lot of research still…
  Motivated by speech telecommunication
Abundant literature since 1950
Complete and objective performance overviews seem to be missing
Combining techniques through parallel processing seems foreseeable with today's fast computers