Real-Time Time-Domain Pitch Tracking Using Wavelets
Ross Maddox and Eric Larson
August 3rd, 2005NSF REU ProgramUIUC PhysicsProf. Steve Errede, Advisor
Physics of Music 2005 Projects
Development of a PC/104-based single board computer DAQ systemInvestigation of non-linear properties of second sound in superfluid LHeMeasurement of the mechanical resonances of differently shaped ‘ukulelesReal-time time-domain pitch tracking of monophonic musical signals using wavelets
PC/104-based SBC DAQ system
Using the Diamond Systems “Athena” single-board computer:
PC/104-based SBC DAQ system
Goals:Develop a reliable system for intense, long-term data acquisitionApply system to several different existing experiments to verify proper operation, as well as improve their functionality and reliabilityPave the way for brave new experiments!
PC/104-based SBC DAQ system
Athena DAQ hardware:16 16-bit analog-digital converters4 12-bit digital-analog converters24 TTL (digital) I/O linesNo fan!
PC/104-based SBC DAQ system
Windows XP Embedded allows us toSelect individual OS componentsCut the crapRun existing DAQ programs, with limited adaptations for Athena
Physics of Music 2005 Projects
Development of a PC/104-based single board computer DAQ systemInvestigation of non-linear properties of second sound in superfluid LHeMeasurement of the mechanical resonances of differently shaped ‘ukulelesReal-time time-domain pitch tracking of monophonic musical signals using wavelets
Mechanical Resonances in ‘Ukuleles
Different ‘ukuleles have different soundsThis is a result of varying body
ShapesSizesMaterialsFinishes
Kanile’a Koa Concert ‘Ukulele
Kanile'a Concert Ukelele|V| vs. Frequency
0
5 0
10 0
15 0
2 0 0
2 5 0
3 0 0
3 5 0
4 0 0
0 2 0 0 4 0 0 6 0 0 8 0 0 10 0 0 12 0 0 14 0 0 16 0 0 18 0 0 2 0 0 0 2 2 0 0 2 4 0 0 2 6 0 0 2 8 0 0 3 0 0 0 3 2 0 0 3 4 0 0 3 6 0 0 3 8 0 0 4 0 0 0
Fr e q u e n c y ( Hz )
Ope n S t r ings Mut e d S t r ings
Kanile’a Koa Concert ‘Ukulele
Kanile'a Concert UkeleleIm(V) vs. Re(V)
- 4 0 0
- 3 5 0
- 3 0 0
- 2 5 0
- 2 0 0
- 15 0
- 10 0
- 5 0
0
5 0
10 0
15 0
2 0 0
2 5 0
3 0 0
3 5 0
4 0 0
- 4 0 0 - 3 5 0 - 3 0 0 - 2 5 0 - 2 0 0 - 15 0 - 10 0 - 5 0 0 5 0 10 0 15 0 2 0 0 2 5 0 3 0 0 3 5 0 4 0 0
R e ( V) ( m V)
Open S t r ings Mut ed S t rings X-Axis Y-Axis
Painted Chinese Soprano ‘Ukulele
Soprano Ukelele|V| vs. Frequency
0
5 0
10 0
15 0
2 0 0
2 5 0
3 0 0
3 5 0
4 0 0
0 2 0 0 4 0 0 6 0 0 8 0 0 10 0 0 12 0 0 14 0 0 16 0 0 18 0 0 2 0 0 0 2 2 0 0 2 4 0 0 2 6 0 0 2 8 0 0 3 0 0 0 3 2 0 0 3 4 0 0 3 6 0 0 3 8 0 0 4 0 0 0
Fr e q u e n c y ( Hz )
Ope n S t r ings Mut ed S t r ings
Painted Chinese Soprano ‘Ukulele
Soprano UkeleleIm(V) vs. Re(V)
- 4 0 0
- 3 5 0
- 3 0 0
- 2 5 0
- 2 0 0
- 15 0
- 10 0
- 5 0
0
5 0
10 0
15 0
2 0 0
2 5 0
3 0 0
3 5 0
4 0 0
- 4 0 0 - 3 5 0 - 3 0 0 - 2 5 0 - 2 0 0 - 15 0 - 10 0 - 5 0 0 5 0 10 0 15 0 2 0 0 2 5 0 3 0 0 3 5 0 4 0 0
R e ( V) ( m V)
Open S t r ings Mut ed S t rings X-Axis Y-Axis
Don Tomás Cigar Box ‘Ukulele
Kanile'a Concert Ukelele|V| vs. Frequency
0
5 0
10 0
15 0
2 0 0
2 5 0
3 0 0
3 5 0
4 0 0
0 2 0 0 4 0 0 6 0 0 8 0 0 10 0 0 12 0 0 14 0 0 16 0 0 18 0 0 2 0 0 0 2 2 0 0 2 4 0 0 2 6 0 0 2 8 0 0 3 0 0 0 3 2 0 0 3 4 0 0 3 6 0 0 3 8 0 0 4 0 0 0
Fr e q u e n c y ( Hz )
Don Tomás Cigar Box ‘Ukulele
Kanile'a Concert UkeleleIm(V) vs. Re(V)
- 4 0 0
- 3 5 0
- 3 0 0
- 2 5 0
- 2 0 0
- 15 0
- 10 0
- 5 0
0
5 0
10 0
15 0
2 0 0
2 5 0
3 0 0
3 5 0
4 0 0
- 4 0 0 - 3 5 0 - 3 0 0 - 2 5 0 - 2 0 0 - 15 0 - 10 0 - 5 0 0 5 0 10 0 15 0 2 0 0 2 5 0 3 0 0 3 5 0 4 0 0
R e ( V) ( m V)
X-Axis Y-Axis
Why measure?
Useful for synthetically modeling the soundsGives a deeper understanding of why the instruments sound the way they do and allows for prediction of how an instrument might sound based on its construction
Physics of Music 2005 Projects
Development of a PC/104-based single board computer DAQ systemInvestigation of non-linear properties of second sound in superfluid LHeMeasurement of the mechanical resonances of differently shaped ‘ukulelesReal-time time-domain pitch tracking of monophonic musical signals using wavelets
Why Pitch Tracking?
Pitch information is used in music and speech analysis
Fourier analysis / Spectrogram is vague
Pitch tracking is accurate and easy to read
Time
Freq
uenc
y
S pectrogram
5 10 15 20 250
500
1000
1500
2000
0 5 10 15 20 250
100
200
300
400
Time (s )
Freq
uenc
y (H
z)
Frequency vs . Time
Project GoalsDevelop software to track the pitch of a monophonic (one note at a time) audio signal that is:
Accurate (within a quarter-tone, or ~1.5%)Low Latency (~25 ms)Capable of high frequency and time resolutionAccurate voiced/unvoiced detection with no global amplitude thresholdsRobust to:
NoiseWeak fundamentalsSliding pitches within a window
Our Approach
Our eyes can readily see periodicity (or lack thereof), so should the computer…Implementation of fast, intelligent peak (and valley) finder coupled with simple wavelet signal manipulationUse found peaks to calculate period and frequency
Fast Lifting WaveletsTransform that separates a signal into a tree of approximations and detailsSeparation can be based on several different wavelet shapesOur implementation uses the Haar wavelet
Original Signal
Approximation 2
Approximation 1 (Detail 1)
(Detail 2)
Approximation 3 (Detail 3)
Wavelet Transform Levels
200 400 600 800 1000
-0.4
-0.2
0
0.2
0.4
S amples
Am
plitu
deOrigina l S ignal
100 200 300 400 500
-0.4
-0.2
0
0.2
0.4
S amples
Am
plitu
de
Approximation Level 1
50 100 150 200 250-0.4
-0.2
0
0.2
0.4
S amples
Am
plitu
de
Approximation Level 2
20 40 60 80 100 120-0.4
-0.2
0
0.2
0.4
S amples
Am
plitu
de
Approximation Level 3
Our AlgorithmWindow the signal into N 25-ms segments. ∀n ∈ [0, N):1. At level i of a Haar wavelet transform of the data:
a) Remove DC componentb) Find all zero-crossingsc) Find first local max after each zero-crossing that is
above a certain percentage of the window maxd) Calculate the distances (in samples) between local
maximae) Repeat b), c), d) for local minimaf) Determine the averaged mode distance
2. If the mode distance at level i ≈ the mode distance at level (i – 1), assume the distance ≈ period, yielding frequency;If not, go to the next wavelet level (up to i = 6) and return to step 1
Algorithm
First, the signal is windowed:
100 200 300 400 500 600 700 800 900 1000-1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1Windowed S ignal
S amples
Am
plitu
de
Algorithm
Next, the DC offset is accounted for (used in thresholding)
100 200 300 400 500 600 700 800 900 1000-1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1Windowed S ignal
S amples
Am
plitu
de
Algorithm
The first wavelet transform is applied to generate an approximation
50 100 150 200 250 300 350 400 450 500-1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1Approximation 1
S amples
Am
plitu
de
AlgorithmThe first local maximum (minimum) between zero crossings above (below) a threshold is locatedThe mode distance between extrema is calculated
50 100 150 200 250 300 350 400 450 500-1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1Approximation 1
S amples
Am
plitu
de
Algorithm
The next wavelet approximation is generated
50 100 150 200 250-1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1Approximation 2
S amples
Am
plitu
de
Algorithm
The maxima and minima are located The mode distance between extrema is compared to that of the previous level
50 100 150 200 250-1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1Approximation 2
S amples
Am
plitu
de
AlgorithmSince the mode distances matched, the averaged mode distance of the previous level is taken to be the period, yielding frequency
50 100 150 200 250 300 350 400 450 500-1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1Approximation 2
S amples
Am
plitu
de
Mode Averaging Scheme
On sine wave signals (see right), averaging the mode yields far better accuracy than simply taking the statistical mode
200 400 600 800 1000 1200 1400
200
400
600
800
1000
1200
1400Meas ured vs . Actual Frequency
Mea
sure
d Fr
eque
ncy
(Hz)
200 400 600 800 1000 1200 1400
200
400
600
800
1000
1200
1400Meas ured vs . Actual Frequency (no averaging)
Mea
sure
d Fr
eque
ncy
(Hz)
Actual Frequency (Hz)
Results: Latency
Using a window of 1024 samples at 44100 Hz implies ~22 ms minimum latencyComputation time averaged only 4 ms in MATLAB
Total pitch tracking latency: < 26 msC++ implementation will be even faster, approaching the minimum of 22 ms
Results: Voiced/Unvoiced
A voiced/unvoiced error rate of approximately 0.4% in a female vocal sampleComparable to established methods, but leaves room for improvement
0 5 10 15 20 250
100
200
300
400
Time (s )
Freq
uenc
y (H
z)
Frequency vs . Time
Results: Accuracy
A sinusoid-based test yielded favorable results
Almost constant, but completely inaudible, error in terms of musical intervals
0
0.5
1RMS Error (Hz) vs . Octave
RM
S E
(Hz)
0
1
2RMS Error (ce nts ) vs . Octave
RM
S E
(cen
ts)
0
5
10Maximum Error Magnitude (ce nts ) vs . Octave
E (c
ents
)
90-180 180-360 360-720 720-1440-0.02
0
0.02Me an Error (Hz) vs . Octave
Mea
n E
(Hz)
Fre que ncy Range (Hz)
Conclusions
Our algorithm compares favorably to established methods, and allows for real-time pitch tracking with low latency and high accuracyFurther work could be done to improve voiced/unvoiced detection
Acknowledgments
We would like to thank:Dr. Steve ErredeMatthew WinklerJack BoparaiDr. James Beauchamp and Mert Bay
UIUC Physics Dept.
Physics of Music 2005 Projects
Development of a PC/104-based single board computer DAQ systemInvestigation of non-linear properties of second sound in superfluid LHeMeasurement of the mechanical resonances of differently shaped ‘ukulelesReal-time time-domain pitch tracking of monophonic musical signals using wavelets
Questions?
Development of a PC/104-based single board computer DAQ systemInvestigation of non-linear properties of second sound in superfluid LHeMeasurement of the mechanical resonances of differently shaped ‘ukulelesReal-time Time-Domain Pitch-Tracking Using Wavelets