

Overview of Real-Time Pitch Tracking Approaches

Music information retrieval seminar

McGill University

Francois Thibault

Presentation Goals

Describe the requirements of a RT pitch tracking algorithm for musical applications

Briefly introduce key developments in RT pitch tracking algorithms

Provide insight on which techniques might be more suitable for a given application

Pitch tracking requirements in musical context

Must often function in real time

Minimal output latency

Accuracy in the presence of noise

Frequency resolution

Flexibility and adaptability to various musical requirements:

  Pitch range

  Dynamic range

  …

Overview of techniques

Time-domain methods

  Autocorrelation Function (Rabiner 77)

  Average Magnitude Difference Function (AMDF)

  Fundamental Period Measurement (Kuhn 90)

Frequency-domain methods

  Cepstrum (Noll 66)

  Harmonic Product Spectrum (Schroeder 68)

  Constant-Q transform (Brown 92)

  Least-Squares fitting (Choi 97)

  Maximum Likelihood (McAulay 86, Puckette 98)

Other approaches…

Autocorrelation method

Based on the fact that a periodic signal will correlate strongly with itself when offset by the fundamental period

Measures the extent to which a signal correlates with a time-shifted version of itself

The time shifts which display peaks in the ACF correspond to likely period estimates

$$\phi(t) = \frac{1}{N} \sum_{n=0}^{N-1} x(n)\, x(n+t)$$
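As an illustration of the ACF idea, here is a minimal Python sketch rather than any particular published implementation. The names `autocorrelation` and `acf_pitch`, the 80 to 1000 Hz search range, and the synthetic 200 Hz test tone are assumptions made for this example. The sum is truncated at the end of the frame, so longer lags are slightly penalised.

```python
import math

def autocorrelation(x, lag):
    """phi(t) = (1/N) * sum of x(n) * x(n + t), truncated at the frame end."""
    n_total = len(x)
    return sum(x[n] * x[n + lag] for n in range(n_total - lag)) / n_total

def acf_pitch(x, sample_rate, f_min=80.0, f_max=1000.0):
    """Return the frequency whose lag gives the largest ACF value in range."""
    lag_min = int(sample_rate / f_max)   # shortest period considered
    lag_max = int(sample_rate / f_min)   # longest period considered
    best_lag = max(range(lag_min, lag_max + 1),
                   key=lambda t: autocorrelation(x, t))
    return sample_rate / best_lag

# Hypothetical test tone: 200 Hz fundamental plus two harmonics at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 400 * n / sample_rate)
         + 0.25 * math.sin(2 * math.pi * 600 * n / sample_rate)
         for n in range(1024)]
estimate = acf_pitch(frame, sample_rate)   # expected to land close to 200 Hz
```

Because lags are whole samples, the estimate can only take the values sample_rate / lag, which is one way to see why resolution degrades for high pitches.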

Autocorrelation Pros/Cons

Simple implementation (good for hardware)

Can handle poor quality signals (phase insensitive)

Often requires preprocessing (spectral flattening)

Poor resolution for high frequencies

Analysis parameters hard to tune

Uncertainty between peaks generated by formants and by the periodicity of the sound can lead to wrong estimates

AMDF

Again based on the idea that a periodic signal will be similar to itself when shifted by the fundamental period

Similar in concept to ACF, but looks at the difference with a time-shifted version of itself

The time shifts which display valleys correspond to likely period estimates

$$\psi(t) = \frac{1}{N} \sum_{n=0}^{N-1} \left| x(n) - x(n+t) \right|$$
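The valley search can be sketched in Python as follows. This is an illustrative sketch only: the function names, the `depth` threshold of 0.3, and the rule of taking the first sufficiently deep valley (to avoid choosing a multiple of the period) are assumptions for this example, not part of the original method description. The average is taken over the samples where both terms exist.

```python
import math

def amdf(x, lag):
    """Mean of |x(n) - x(n + lag)| over the samples where both terms exist."""
    n_terms = len(x) - lag
    return sum(abs(x[n] - x[n + lag]) for n in range(n_terms)) / n_terms

def amdf_pitch(x, sample_rate, f_min=80.0, f_max=1000.0, depth=0.3):
    """Return the pitch of the first valley deeper than depth * mean(AMDF)."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    values = {t: amdf(x, t) for t in range(lag_min, lag_max + 1)}
    threshold = depth * sum(values.values()) / len(values)
    for t in range(lag_min, lag_max + 1):
        if values[t] < threshold:
            # Walk down to the bottom of this valley before reporting it.
            while t < lag_max and values[t + 1] < values[t]:
                t += 1
            return sample_rate / t
    # No valley crossed the threshold: fall back to the global minimum.
    return sample_rate / min(values, key=values.get)

# Hypothetical test tone: 200 Hz fundamental plus two harmonics at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 400 * n / sample_rate)
         + 0.25 * math.sin(2 * math.pi * 600 * n / sample_rate)
         for n in range(1024)]
estimate = amdf_pitch(frame, sample_rate)   # expected to land close to 200 Hz
```

Only subtractions and absolute values are needed per lag, which is the reason the method is attractive for hardware.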

AMDF Pros/Cons

Poor frequency resolution

Even simpler implementation than ACF (good for hardware)

Less computationally expensive than ACF

Combination of AMDF and ACF yields a result more robust to noise (Kobayashi 95)

$$f(t) = \frac{\phi(t)}{\psi(t) + k}$$
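A small Python sketch of the combined function f(t) = ACF(t) / (AMDF(t) + k) is given below. The value k = 1.0, the function names, and the test tone are assumptions chosen for illustration; the constant only needs to keep the denominator away from zero where the AMDF has a valley.

```python
import math

def acf(x, lag):
    """Autocorrelation at one lag, sum truncated at the frame end."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag)) / len(x)

def amdf(x, lag):
    """Average magnitude difference at one lag."""
    n_terms = len(x) - lag
    return sum(abs(x[n] - x[n + lag]) for n in range(n_terms)) / n_terms

def weighted_acf(x, lag, k=1.0):
    """ACF divided by (AMDF + k): ACF peaks that coincide with AMDF valleys are boosted."""
    return acf(x, lag) / (amdf(x, lag) + k)

def weighted_pitch(x, sample_rate, f_min=80.0, f_max=1000.0, k=1.0):
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag = max(range(lag_min, lag_max + 1),
                   key=lambda t: weighted_acf(x, t, k))
    return sample_rate / best_lag

# Hypothetical test tone: 200 Hz fundamental plus two harmonics at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 400 * n / sample_rate)
         + 0.25 * math.sin(2 * math.pi * 600 * n / sample_rate)
         for n in range(1024)]
estimate = weighted_pitch(frame, sample_rate)   # expected to land close to 200 Hz
```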

Fundamental Period Measurement approach

Signal is first run through a bank of half-octave bandpass filters

If the filters are sharp enough, the output of one filter should display the input waveform freed of its upper partials (nearly sinusoidal)

It is up to a decision algorithm to decide which filter output corresponds to the fundamental frequency

Time between zero crossings of that filter output determines the period
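The last step, measuring the period from zero crossings, can be sketched as below. The filter bank and the decision algorithm are not reproduced: the sketch assumes a nearly sinusoidal filter output is already available, and the function name, the linear interpolation of crossing times, and the 220 Hz test signal are choices made for this example.

```python
import math

def zero_crossing_period(x, sample_rate):
    """Average time in seconds between successive upward zero crossings."""
    crossings = []
    for n in range(1, len(x)):
        if x[n - 1] < 0.0 <= x[n]:
            # Linear interpolation gives a sub-sample crossing time.
            frac = -x[n - 1] / (x[n] - x[n - 1])
            crossings.append((n - 1 + frac) / sample_rate)
    if len(crossings) < 2:
        return None   # not enough crossings in this frame
    return (crossings[-1] - crossings[0]) / (len(crossings) - 1)

# Stand-in for the selected filter output: a 220 Hz sinusoid at 8 kHz.
sample_rate = 8000
filtered = [math.sin(2 * math.pi * 220 * n / sample_rate + 1.0)
            for n in range(800)]
period = zero_crossing_period(filtered, sample_rate)
frequency = 1.0 / period   # expected to be very close to 220 Hz
```

The measurement itself needs only comparisons and one division per crossing, which is consistent with the efficiency claimed for the approach.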

FPM Pros/Cons

Easy implementation (hardware and software)

Efficiency of computation

Decision algorithm highly dependent on thresholds

But automatic threshold setting is provided for most situations

Cepstrum approach

Tool often used in speech processing

The cepstrum is defined as the power spectrum of the logarithm of the power spectrum

Clearly separates the contributions of the vocal tract and the excitation

A strong peak is displayed in the excitation part (high-quefrency region) at the quefrency corresponding to the fundamental period

Use a peak picker on the cepstrum and translate the peak quefrency into a fundamental frequency
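The cepstrum steps can be sketched in plain Python as follows. Everything named here is an assumption for illustration: the small recursive `fft` helper, the Hann window, the 120 to 1000 Hz search range, and the synthetic pulse train standing in for a voiced excitation. The peak quefrency is in samples, so the frequency is the sample rate divided by it.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def power_cepstrum(frame):
    """Power spectrum of the log power spectrum of a Hann-windowed frame."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    # The small constant avoids log(0) in empty bins.
    log_power = [math.log(abs(c) ** 2 + 1e-12) for c in fft(windowed)]
    return [abs(c) ** 2 for c in fft(log_power)]

def cepstrum_pitch(frame, sample_rate, f_min=120.0, f_max=1000.0):
    """Pick the largest cepstral peak in the allowed quefrency range."""
    cep = power_cepstrum(frame)
    q_min = int(sample_rate / f_max)
    q_max = int(sample_rate / f_min)
    peak_q = max(range(q_min, q_max + 1), key=lambda q: cep[q])
    return sample_rate / peak_q

# Hypothetical excitation: one pulse every 40 samples (200 Hz at 8 kHz).
sample_rate = 8000
frame = [1.0 if n % 40 == 0 else 0.0 for n in range(1024)]
estimate = cepstrum_pitch(frame, sample_rate)   # expected to land close to 200 Hz
```

The two calls to `fft` per frame are the cost referred to when the method is described as relatively intensive.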

Cepstrum Pros/Cons

Less confusion between candidates than in ACF

Proven method, especially suitable for signals easily characterized by source-filter models (e.g. voice)

Relatively computationally intensive (2 FFTs)

Harmonic Product Spectrum approach

Measures the maximum coincidence of harmonics for each spectral frame

The resulting periodic correlation array is searched for its maximum, which should correspond to the fundamental frequency

An algorithm is then run for octave correction

$$Y(\omega) = \prod_{r=1}^{R} \left| X(\omega r) \right|, \qquad \hat{Y} = \max_{\omega_i} Y(\omega_i)$$
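A minimal Python sketch of the harmonic product follows. The helper `fft`, the Hann window, the choice of R = 4 harmonics, and the search range are assumptions for this example, and no octave correction is attempted. The 250 Hz test tone is chosen so that it falls exactly on an FFT bin, which sidesteps the interpolation a real signal would need.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def hps_pitch(frame, sample_rate, num_harmonics=4, f_min=80.0, f_max=1000.0):
    """Return the bin frequency maximising the product of |X| at k, 2k, ..., Rk."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    mags = [abs(c) for c in fft(windowed)]
    k_min = max(1, math.ceil(f_min * n / sample_rate))
    # Keep the highest harmonic at or below the Nyquist bin.
    k_max = min(int(f_max * n / sample_rate), (n // 2) // num_harmonics)

    def product(k):
        value = 1.0
        for r in range(1, num_harmonics + 1):
            value *= mags[k * r]
        return value

    best_bin = max(range(k_min, k_max + 1), key=product)
    return best_bin * sample_rate / n

# Hypothetical test tone: 250 Hz (bin 32 of 1024 at 8 kHz) with five harmonics.
sample_rate = 8000
frame = [sum(math.sin(2 * math.pi * 250 * r * n / sample_rate) / r
             for r in range(1, 6))
         for n in range(1024)]
estimate = hps_pitch(frame, sample_rate)   # expected to be the 250 Hz bin
```

With 1024 samples at 8 kHz the bins are about 7.8 Hz apart, so for low pitches neighbouring candidates differ by a large musical interval.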

HPS Pros/Cons

Simple to implement

Does well under a wide variety of conditions

Poor low-frequency resolution

Computational complexity increased by the zero padding required for interpolation of low frequencies

Requires post-processing for error correction

Constant-Q transform approach

First computes the constant-Q transform to obtain a constant pattern in the log-frequency domain (Q = fc/bw)

Compute the cross-correlation with a fixed comb pattern (ideal partial positions for a given fundamental frequency)

Peak-pick the result to obtain the fundamental frequency
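The three steps can be sketched in Python as below, using a slow direct evaluation of the constant-Q transform rather than any efficient algorithm. The function names, the Hamming window, 24 bins per octave, the 100 Hz lowest bin, and the five-tooth comb are assumptions made for this example. With geometrically spaced bins, harmonic r always sits about bins_per_octave * log2(r) bins above the fundamental, whatever the pitch.

```python
import cmath
import math

def constant_q(x, sample_rate, f_min, bins_per_octave, n_bins):
    """Directly evaluated constant-Q magnitudes, one Hamming window per bin."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)   # Q = fc / bw
    mags = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        n_k = math.ceil(q * sample_rate / f_k)   # window shrinks as f_k grows
        acc = 0j
        for n in range(n_k):
            w = 0.54 - 0.46 * math.cos(2 * math.pi * n / n_k)
            acc += w * x[n] * cmath.exp(-2j * math.pi * q * n / n_k)
        mags.append(abs(acc) / n_k)
    return mags

def comb_pitch(mags, f_min, bins_per_octave, num_harmonics=5):
    """Cross-correlate with a fixed comb of ideal partial positions and peak-pick."""
    offsets = [round(bins_per_octave * math.log2(r))
               for r in range(1, num_harmonics + 1)]
    last_start = len(mags) - 1 - offsets[-1]

    def score(k):
        return sum(mags[k + o] for o in offsets)

    best = max(range(last_start + 1), key=score)
    return f_min * 2.0 ** (best / bins_per_octave)

# Hypothetical test tone: 200 Hz with five harmonics, 3000 samples at 8 kHz.
sample_rate = 8000
frame = [sum(math.sin(2 * math.pi * 200 * r * n / sample_rate) / r
             for r in range(1, 6))
         for n in range(3000)]
mags = constant_q(frame, sample_rate, f_min=100.0, bins_per_octave=24, n_bins=120)
estimate = comb_pitch(mags, f_min=100.0, bins_per_octave=24)   # expected near 200 Hz
```

The lowest bin needs the longest window (about 2731 samples here), which illustrates the latency and cost of the direct transform.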

Constant-Q Pros/Cons

Complexity of the constant-Q transform has been reduced, but still… (Brown and Puckette 91)

Sensitive to octave errors

Other peaks could be candidates

Least-Squares fitting approach

Perform least-squares spectral analysis: minimize the error by fitting sinusoids to the signal segment

Strong sinusoidal components are identified as sharp valleys in the least-squares error signal

Relatively few evaluations of the error signal are required to identify a valley

Fundamental frequency is obtained by averaging the partial frequencies, each divided by its partial number

Uses rectangular windowing to provide faster response
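A brute-force Python sketch of the least-squares idea is shown below. It evaluates the error on a dense 1 Hz grid, so it does not reproduce the efficient evaluation scheme of the cited work. The function names, the grid, the `rel_depth` threshold for accepting a valley, and the rule of numbering partials relative to the lowest accepted valley are all assumptions made for illustration.

```python
import math

def ls_error(x, freq, sample_rate):
    """Residual energy after the best least-squares fit of a*cos + b*sin at freq."""
    w = 2 * math.pi * freq / sample_rate
    cc = cs = ss = xc = xs = 0.0
    for n, sample in enumerate(x):
        c, s = math.cos(w * n), math.sin(w * n)
        cc += c * c
        cs += c * s
        ss += s * s
        xc += sample * c
        xs += sample * s
    det = cc * ss - cs * cs
    a = (xc * ss - xs * cs) / det   # solve the 2x2 normal equations
    b = (xs * cc - xc * cs) / det
    return sum(v * v for v in x) - (a * xc + b * xs)

def ls_pitch(x, sample_rate, f_lo=50, f_hi=1000, rel_depth=0.15):
    """Find valleys of the error and average partial frequency / partial number."""
    freqs = list(range(f_lo, f_hi + 1))
    errors = [ls_error(x, f, sample_rate) for f in freqs]
    total = sum(v * v for v in x)
    explained = [total - e for e in errors]
    floor = rel_depth * max(explained)   # ignore shallow side-lobe valleys
    partials = [freqs[i] for i in range(1, len(freqs) - 1)
                if errors[i] < errors[i - 1] and errors[i] <= errors[i + 1]
                and explained[i] > floor]
    lowest = partials[0]
    return sum(f / round(f / lowest) for f in partials) / len(partials)

# Hypothetical test tone: 220 Hz with two harmonics, 256 samples (32 ms) at 8 kHz,
# used as-is, which amounts to a rectangular window.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 220 * n / sample_rate)
         + 0.7 * math.sin(2 * math.pi * 440 * n / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 660 * n / sample_rate)
         for n in range(256)]
estimate = ls_pitch(frame, sample_rate)   # expected to land close to 220 Hz
```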

LS fitting Pros/Cons

Operates on shorter frame segments

Best option for real-time applications with minimum latency requirements

Efficient evaluation scheme allows reasonable computational complexity

Maximum Likelihood

Maximum likelihood algorithm searches through a set of possible ideal spectra and chooses the closest match (Noll 69)

Was adapted to sinusoidal modeling theory by finding the best fit of harmonic partial sets to the measured model (McAulay 86)

Discrimination is enhanced by suppressing partials of small amplitude

ML Pros/Cons

Inherits high computational requirement from sinusoidal modeling

Very robust estimation

Allows a guess of the fundamental frequency even with several partials missing

Other approaches

Neural Nets (Barnar 91)

Hidden Markov Models (Doval 91)

Parallel processing approaches (Rabiner 69)

Fourier of Fourier transforms (Marchand 2001)

Two-way mismatch model (Cano 98)

Subharmonic to harmonic ratio (Sun 2000)

Conclusions

A lot of research still… Motivated by speech telecommunication

Abundant literature since 1950

Complete and objective performance overviews seem to be missing

Combining techniques through parallel processing seems foreseeable with today’s fast computers