NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition
![Page 1: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/1.jpg)
Natural Language Processing
Audio Processing, Zero Crossing Rate,
Dynamic Time Warping, Spoken Word
Recognition
Vladimir Kulyukin
www.vkedco.blogspot.com
![Page 2: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/2.jpg)
Outline
Audio Processing
Zero Crossing Rate
Dynamic Time Warping
Spoken Word Recognition
![Page 3: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/3.jpg)
Audio Processing
![Page 4: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/4.jpg)
Samples
Samples are successive snapshots of a
specific signal
Audio files are samples of sound waves
Microphones convert acoustic signals into
analog electrical signals, and then an analog-to-digital
converter transforms the analog signals
into digital samples
![Page 5: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/5.jpg)
Digital Audio Signal
[Figure: waveform of a digital audio signal; horizontal axis: time, vertical axis: sound pressure]
![Page 6: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/6.jpg)
Amplitude
Amplitude (in audio processing) is a
measure of sound pressure
Amplitude is measured at a specific rate
Amplitude measures result in digital
samples
Some samples have positive values
Some samples have negative values
![Page 7: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/7.jpg)
Digital Approximation Accuracy
Any digitization of analog signals carries some
inaccuracy
Approximation accuracy depends on two
factors: 1) sampling rate and 2) resolution
In audio processing, sampling is the reduction of a
continuous signal to a discrete signal
Sampling rate is the number of samples per unit
of time
Resolution is the size of a sample (e.g., the
number of bits)
![Page 8: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/8.jpg)
Sampling Rate & Resolution
Sampling rate is measured in Hertz (Hz)
A sampling rate of 1Hz means one sample per
second
For example, if the audio is sampled at a
rate of 44100 samples per second, then its sampling
rate is 44100Hz
Some typical resolutions are 8 bits, 16 bits,
and 32 bits
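To make sampling rate and resolution concrete, the sketch below samples a sine wave at a chosen rate, quantizes each sample to 16 bits, and computes how many bytes of uncompressed audio a given duration would take. The class and method names, the 441Hz test tone, and the `channels` parameter are illustrative assumptions, not part of the course code.

```java
final class SamplingDemo {
    // Bytes needed to store uncompressed audio of the given duration.
    // "channels" (1 = mono, 2 = stereo) is an assumption added for this sketch.
    static long pcmBytes(int sampleRateHz, int bitsPerSample, int channels, double seconds) {
        long samples = Math.round(sampleRateHz * seconds);
        return samples * channels * (bitsPerSample / 8);
    }

    // Takes numSamples snapshots of a sine wave and quantizes each one to 16 bits.
    static short[] sampleSine16(double freqHz, int sampleRateHz, int numSamples) {
        short[] out = new short[numSamples];
        for (int n = 0; n < numSamples; n++) {
            double t = (double) n / sampleRateHz;           // time of the n-th snapshot
            double v = Math.sin(2 * Math.PI * freqHz * t);  // "analog" value in [-1, 1]
            out[n] = (short) Math.round(v * Short.MAX_VALUE);
        }
        return out;
    }

    public static void main(String[] args) {
        // One second of 16-bit mono audio at 44100Hz.
        System.out.println(pcmBytes(44100, 16, 1, 1.0) + " bytes");
        short[] s = sampleSine16(441, 44100, 100);
        System.out.println("sample 0 = " + s[0] + ", sample 25 = " + s[25]);
    }
}
```

Raising the sampling rate adds more snapshots per second, while raising the resolution makes each snapshot finer; both increase the storage cost computed by `pcmBytes`.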
![Page 9: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/9.jpg)
Nyquist-Shannon Sampling Theorem
This theorem states that perfect reconstruction of
a signal is possible if the sampling frequency is
greater than two times the maximum frequency of
the signal being sampled
For example, if a signal has a maximum frequency
of 50Hz, then it can, theoretically, be
reconstructed without aliasing if sampled at a rate
above 100Hz
Aliasing is the effect of different frequencies
becoming indistinguishable once sampled
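Aliasing can be seen numerically. The sketch below, with hypothetical names and frequencies chosen only for illustration, samples a 70Hz sine and a 30Hz sine at 100Hz. Since 70Hz lies above half the sampling rate, its samples coincide (up to sign) with those of the 30Hz tone.

```java
final class AliasingDemo {
    // Returns numSamples snapshots of sin(2*pi*freqHz*t) taken at sampleRateHz.
    static double[] sampleSine(double freqHz, double sampleRateHz, int numSamples) {
        double[] out = new double[numSamples];
        for (int n = 0; n < numSamples; n++) {
            out[n] = Math.sin(2 * Math.PI * freqHz * n / sampleRateHz);
        }
        return out;
    }

    // Largest absolute difference between a[n] and sign * b[n].
    static double maxDiff(double[] a, double[] b, double sign) {
        double max = 0;
        for (int n = 0; n < a.length; n++) {
            max = Math.max(max, Math.abs(a[n] - sign * b[n]));
        }
        return max;
    }

    public static void main(String[] args) {
        double[] high = sampleSine(70, 100, 50); // above the 50Hz limit of a 100Hz rate
        double[] low = sampleSine(30, 100, 50);  // below the limit
        // A value close to 0 means the two tones cannot be told apart from their samples.
        System.out.println(maxDiff(high, low, -1.0));
    }
}
```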
![Page 10: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/10.jpg)
Audio File Formats
WAVE (WAV) is often associated with Windows but
is now implemented on other platforms
AIFF is common on Mac OS
AU is common on Unix/Linux
These are similar formats that vary in how they
represent data, pack samples (e.g., little-endian vs.
big-endian), etc.
A Java example of how to manipulate WAV files can
be found in WavFileManip.java
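The byte-order difference mentioned above matters as soon as raw sample bytes are turned into numbers. The sketch below decodes 16-bit signed samples from a byte array in either order; WAV data is normally little-endian, while AIFF and AU are normally big-endian. `PcmDecoder` and `decode16` are hypothetical names, and the code is not taken from WavFileManip.java.

```java
final class PcmDecoder {
    // Decodes 16-bit signed PCM bytes into doubles in the range [-1, 1).
    static double[] decode16(byte[] data, boolean bigEndian) {
        double[] out = new double[data.length / 2];
        for (int i = 0; i < out.length; i++) {
            int b0 = data[2 * i];
            int b1 = data[2 * i + 1];
            int hi = bigEndian ? b0 : b1;          // most significant byte keeps its sign
            int lo = (bigEndian ? b1 : b0) & 0xFF; // least significant byte is unsigned
            out[i] = ((hi << 8) | lo) / 32768.0;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] littleEndian = {0x00, 0x40};       // the 16-bit value 0x4000
        byte[] bigEndian = {0x40, 0x00};          // the same value, bytes swapped
        System.out.println(decode16(littleEndian, false)[0]);
        System.out.println(decode16(bigEndian, true)[0]);
    }
}
```

Decoding with the wrong byte order does not fail; it silently produces a different, usually noise-like, signal.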
![Page 11: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/11.jpg)
Zero Crossing Rate
![Page 12: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/12.jpg)
What is Zero Crossing Rate (ZCR)?
Zero Crossing Rate (ZCR) is a measure of the
number of times, in a given segment of samples, that
the amplitude crosses the horizontal line at 0
ZCR can be used to detect silence vs. non-silence,
voiced vs. unvoiced speech, speaker's identity,
etc.
ZCR is essentially the count of successive
samples changing algebraic signs
![Page 13: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/13.jpg)
ZCR Source
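public class ZeroCrossingRate {
    // Counts sign changes between successive samples and divides the count
    // by the caller-supplied normalizer (for example, the number of samples).
    public static double computeZCR01(double[] signals, double normalizer)
    {
        long numZC = 0;
        for (int i = 1; i < signals.length; i++) {
            // A zero crossing: one sample is negative and its neighbor is not.
            if ( (signals[i] >= 0 && signals[i-1] < 0) ||
                 (signals[i] < 0 && signals[i-1] >= 0) ) {
                numZC++;
            }
        }
        return numZC/normalizer;
    }
}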
source code is in ZeroCrossingRate.java
![Page 14: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/14.jpg)
ZCR in Voiced vs. Unvoiced Speech
Voiced speech is produced when the vocal cords vibrate, as when vowels are spoken
Voiced speech is characterized by tones of nearly constant
frequency and some duration
Unvoiced speech is produced without vocal cord vibration, as when consonants such as /s/ and /f/ are
spoken
Unvoiced speech is non-periodic and random-like
because air passes through a narrow constriction of
the vocal tract
![Page 15: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/15.jpg)
ZCR in Voiced vs. Unvoiced Speech
Phonetic theory states that voiced speech
has a smooth air flow through the vocal tract
whereas unvoiced speech has a turbulent air
flow that produces noise
Thus, voiced speech should have a low ZCR
whereas unvoiced speech should have a high
ZCR
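The low-vs-high ZCR contrast can be illustrated with synthetic signals. In the sketch below, a pure 100Hz tone stands in for voiced speech and seeded uniform noise stands in for unvoiced speech; both are rough proxies chosen for illustration, and all names are hypothetical.

```java
import java.util.Random;

final class ZcrDemo {
    // Fraction of successive sample pairs whose algebraic signs differ.
    static double zcr(double[] s) {
        long crossings = 0;
        for (int i = 1; i < s.length; i++) {
            if ((s[i] >= 0) != (s[i - 1] >= 0)) {
                crossings++;
            }
        }
        return (double) crossings / (s.length - 1);
    }

    // Periodic signal: a rough proxy for voiced speech.
    static double[] tone(double freqHz, double rateHz, int len) {
        double[] out = new double[len];
        for (int n = 0; n < len; n++) {
            out[n] = Math.sin(2 * Math.PI * freqHz * n / rateHz);
        }
        return out;
    }

    // Random signal in [-1, 1): a rough proxy for unvoiced speech.
    static double[] noise(long seed, int len) {
        Random r = new Random(seed);
        double[] out = new double[len];
        for (int n = 0; n < len; n++) {
            out[n] = 2 * r.nextDouble() - 1;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("tone ZCR:  " + zcr(tone(100, 8000, 8000)));
        System.out.println("noise ZCR: " + zcr(noise(42L, 8000)));
    }
}
```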
![Page 16: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/16.jpg)
Amplitude of Voiced vs. Unvoiced Speech
Amplitude of unvoiced speech is low
Amplitude of voiced speech is high
Given a digital sample, we can use average
amplitude as a measure of the sample’s
energy
This can be used to classify samples as
vowels and consonants
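One way to read "average amplitude" is the mean of the absolute sample values; a plain mean would be close to 0 for any signal that swings evenly around zero. The sketch below takes that reading, which is an assumption, and uses hypothetical names.

```java
final class AmplitudeDemo {
    // Mean absolute amplitude of a segment of samples; 0 for an empty segment.
    static double averageAmplitude(double[] segment) {
        if (segment.length == 0) {
            return 0.0;
        }
        double sum = 0;
        for (double v : segment) {
            sum += Math.abs(v); // sign is ignored: only the size of the swing counts
        }
        return sum / segment.length;
    }

    public static void main(String[] args) {
        double[] loud = {0.5, -0.5, 1.0, -1.0};
        double[] quiet = {0.01, -0.02, 0.01, -0.02};
        System.out.println(averageAmplitude(loud));
        System.out.println(averageAmplitude(quiet));
    }
}
```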
![Page 17: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/17.jpg)
ZCR & Amplitude of Voiced & Unvoiced Speech
| | ZCR | Amplitude |
|---|---|---|
| Voiced | LOW | HIGH |
| Unvoiced | HIGH | LOW |
![Page 18: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/18.jpg)
Detection of Silence & Non-Silence
silence_buffer = [];
non_silence_buffer = [];
buffer = [];
while ( there are still frames left ) {
    Read a specific number of frames into buffer;
    Compute ZCR and average amplitude of buffer;
    if ( ZCR and average amplitude are below specific thresholds ) {
        add the buffer to silence_buffer;
    }
    else {
        add the buffer to non_silence_buffer;
    }
}
source code is in WavFileManip.detectSilence()
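The pseudocode above can be sketched in Java as follows. This is not the code of WavFileManip.detectSilence(); the class name, the fixed frame size, the use of mean absolute amplitude, and the threshold parameters are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class SilenceDetector {
    // Fraction of sign changes among the samples in [from, to).
    static double zcr(double[] s, int from, int to) {
        int crossings = 0;
        for (int i = from + 1; i < to; i++) {
            if ((s[i] >= 0) != (s[i - 1] >= 0)) {
                crossings++;
            }
        }
        return (double) crossings / (to - from);
    }

    // Mean absolute amplitude of the samples in [from, to).
    static double averageAmplitude(double[] s, int from, int to) {
        double sum = 0;
        for (int i = from; i < to; i++) {
            sum += Math.abs(s[i]);
        }
        return sum / (to - from);
    }

    // Returns the start indices of the buffers classified as silence.
    static List<Integer> silentStarts(double[] signal, int bufferSize,
                                      double zcrThreshold, double ampThreshold) {
        List<Integer> silent = new ArrayList<>();
        for (int start = 0; start < signal.length; start += bufferSize) {
            int end = Math.min(start + bufferSize, signal.length);
            boolean lowZcr = zcr(signal, start, end) < zcrThreshold;
            boolean lowAmp = averageAmplitude(signal, start, end) < ampThreshold;
            if (lowZcr && lowAmp) {
                silent.add(start); // both measures below threshold: silence
            }
        }
        return silent;
    }

    public static void main(String[] args) {
        double[] signal = {0, 0, 0, 0, 0.8, -0.8, 0.8, -0.8};
        System.out.println(silentStarts(signal, 4, 0.3, 0.1));
    }
}
```

Suitable thresholds depend on the recording conditions and would have to be tuned on real data.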
![Page 19: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/19.jpg)
Dynamic Time Warping
source code is in DTW.java
![Page 20: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/20.jpg)
Introduction
Dynamic Time Warping (DTW) is a method to
find an optimal alignment between two time-
dependent sequences
DTW aligns (“warps”) two sequences in a non-
linear way to match each other
DTW has been successfully used in automatic
speech recognition (ASR), bioinformatics
(genetic sequence matching), and video
analysis
![Page 21: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/21.jpg)
Basic Definitions
There are two sequences:
$X = (x_1, \ldots, x_N)$ and $Y = (y_1, \ldots, y_M)$
There is a feature space F such that:
$x_i \in F$ and $y_j \in F$, where $1 \le i \le N$, $1 \le j \le M$
There is a local cost measure mapping 2-tuples
of features to non-negative reals:
$c: F \times F \to \mathbb{R}_{\ge 0}$
![Page 22: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/22.jpg)
Sample Sequences
![Page 23: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/23.jpg)
Sample Alignment
![Page 24: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/24.jpg)
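Cost Matrix DTW(N, M)
[Figure: N x M grid; X is indexed 1, 2, ..., i, ..., N along the horizontal axis and Y is indexed 1, 2, ..., j, ..., M along the vertical axis]
$dtw(i, j)$ is the cost of warping X[1:i] with Y[1:j]
X and Y are sequences X[1:N] and Y[1:M]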
![Page 25: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/25.jpg)
Warping Path
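$P = (p_1, \ldots, p_L)$, where $p_l = (n_l, m_l) \in [1, N] \times [1, M]$ and
$l \in [1, L]$, is a warping path if
1) $p_1 = (1, 1)$ and $p_L = (N, M)$
2) $n_1 \le n_2 \le \ldots \le n_L$ and $m_1 \le m_2 \le \ldots \le m_L$
3) $p_{l+1} - p_l \in \{(1, 0), (0, 1), (1, 1)\}$ for $1 \le l \le L - 1$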
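The three conditions translate directly into a check. The sketch below uses hypothetical names and stores a path as an array of 1-based index pairs; because every allowed step is non-decreasing in both coordinates, checking conditions 1 and 3 is enough to guarantee condition 2.

```java
final class WarpingPathCheck {
    // path[l] = {n_l, m_l}, using 1-based indices into X[1:n] and Y[1:m].
    static boolean isWarpingPath(int[][] path, int n, int m) {
        int len = path.length;
        if (len == 0) {
            return false;
        }
        // Condition 1: the path starts at (1, 1) and ends at (N, M).
        if (path[0][0] != 1 || path[0][1] != 1) {
            return false;
        }
        if (path[len - 1][0] != n || path[len - 1][1] != m) {
            return false;
        }
        // Condition 3: every step is (1, 0), (0, 1), or (1, 1).
        // This also enforces condition 2 (monotonicity).
        for (int l = 1; l < len; l++) {
            int dn = path[l][0] - path[l - 1][0];
            int dm = path[l][1] - path[l - 1][1];
            boolean allowed = (dn == 1 && dm == 0) || (dn == 0 && dm == 1) || (dn == 1 && dm == 1);
            if (!allowed) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] path = {{1, 1}, {1, 2}, {2, 3}, {2, 4}, {3, 5}, {4, 5}};
        System.out.println(isWarpingPath(path, 4, 5));
    }
}
```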
![Page 26: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/26.jpg)
Valid Warping Path
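[Figure: 4 x 5 grid (X: 1..4 on the horizontal axis, Y: 1..5 on the vertical axis) with the path cells $p_1, \ldots, p_6$ marked]
$P = (p_1, p_2, p_3, p_4, p_5, p_6)$, where
$p_1 = (1, 1)$, $p_2 = (1, 2)$, $p_3 = (2, 3)$, $p_4 = (2, 4)$, $p_5 = (3, 5)$, $p_6 = (4, 5)$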
![Page 27: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/27.jpg)
Invalid Warping Path
[Figure: 4 x 5 grid with a candidate path $p_1, \ldots, p_6$ that does not start in cell (1, 1)]
$p_1 \ne (1, 1)$, so constraint 1 is not satisfied
![Page 28: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/28.jpg)
Invalid Warping Path
[Figure: 4 x 5 grid with a candidate path $p_1, \ldots, p_6$ that moves backwards along X]
$p_3 = (3, 3)$, $p_4 = (2, 4)$, and $3 > 2$, so the 2nd constraint is not satisfied
![Page 29: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/29.jpg)
Invalid Warping Path
[Figure: 4 x 5 grid with a candidate path $p_1, \ldots, p_5$ that skips a cell along Y]
$p_2 = (2, 2)$, $p_3 = (3, 4)$, $p_3 - p_2 = (3, 4) - (2, 2) = (1, 2) \notin \{(1, 0), (0, 1), (1, 1)\}$, so the 3rd condition is not satisfied
![Page 30: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/30.jpg)
Total Cost of a Warping Path
If $P = (p_1, \ldots, p_L)$ is a warping path between sequences X
and Y, then its total cost is
$$c_P(X, Y) = \sum_{l=1}^{L} c(x_{n_l}, y_{m_l})$$
![Page 31: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/31.jpg)
Example
[Figure: 4 x 5 grid (X on the horizontal axis, Y on the vertical axis) with the path cells $p_1, \ldots, p_6$ marked]
Assume that $P = (p_1, p_2, p_3, p_4, p_5, p_6)$, where $p_1 = (1, 1)$, $p_2 = (1, 2)$, $p_3 = (2, 3)$, $p_4 = (2, 4)$, $p_5 = (3, 5)$, $p_6 = (4, 5)$,
is a warping path b/w X[1:4] and Y[1:5].
Then the total cost of P is
$c(x_1, y_1) + c(x_1, y_2) + c(x_2, y_3) + c(x_2, y_4) + c(x_3, y_5) + c(x_4, y_5)$.
This notation $c(x_i, y_j)$ can be simplified
to read $c(i, j)$ or $c(X[i], Y[j])$.
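The total cost of a given path is a plain sum over its cells, as the following sketch shows. The 0/1 local cost on characters and the sample strings "abcd" and "aabbd" are assumptions picked for illustration; only the path itself comes from the example above.

```java
final class PathCost {
    // Illustrative local cost: 0 for equal features, 1 otherwise (an assumption).
    static int cost(char x, char y) {
        return x == y ? 0 : 1;
    }

    // Sums c(x_{n_l}, y_{m_l}) over all cells (n_l, m_l) of the path (1-based indices).
    static int totalCost(String x, String y, int[][] path) {
        int total = 0;
        for (int[] p : path) {
            total += cost(x.charAt(p[0] - 1), y.charAt(p[1] - 1));
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] path = {{1, 1}, {1, 2}, {2, 3}, {2, 4}, {3, 5}, {4, 5}};
        // Hypothetical sequences of lengths 4 and 5.
        System.out.println(totalCost("abcd", "aabbd", path));
    }
}
```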
![Page 32: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/32.jpg)
DTW(X, Y) – Cost of an Optimal Warping Path
$DTW(X, Y) = \min \{\, c_P(X, Y) \mid P \text{ is a warping path between } X \text{ and } Y \,\}$
![Page 33: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/33.jpg)
Remarks on DTW(X, Y)
There may be several optimal warping paths with the
same total cost DTW(X, Y)
DTW(X, Y) is symmetric whenever the local
cost measure is symmetric
DTW(X, Y) does not necessarily satisfy the
triangle inequality, i.e., DTW(X, Z) may exceed
DTW(X, Y) + DTW(Y, Z), so it is not a metric in general
![Page 34: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/34.jpg)
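DTW Equations: Base Cases
[Figure: N x M cost matrix; X is indexed 1, 2, ..., i, ..., N along the horizontal axis and Y is indexed 1, 2, ..., j, ..., M along the vertical axis]
Initial condition: $dtw(1, 1) = c(1, 1)$
1st Row: $dtw(i, 1) = dtw(i - 1, 1) + c(i, 1)$ for $1 < i \le N$
1st Column: $dtw(1, j) = dtw(1, j - 1) + c(1, j)$ for $1 < j \le M$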
![Page 35: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/35.jpg)
DTW Equations: Recursion
[Figure: N x M cost matrix; X is indexed 1, 2, ..., i, ..., N along the horizontal axis and Y is indexed 1, 2, ..., j, ..., M along the vertical axis]
Inner Cell: $dtw(i, j) = \min\{dtw(i - 1, j),\ dtw(i - 1, j - 1),\ dtw(i, j - 1)\} + c(i, j)$
Interpretation: Cost of warping X[1:i] with Y[1:j] is the cost of warping X[i] with Y[j], i.e., $c(i, j)$, plus the minimum of the following three costs: 1) the cost of warping X[1:i-1] with Y[1:j]; 2) the cost of warping X[1:i-1] with Y[1:j-1]; 3) the cost of warping X[1:i] with Y[1:j-1]
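The base cases and the recursion can be combined into one matrix-filling loop. The sketch below is a minimal illustration rather than the code in DTW.java: it works on non-empty strings and uses a 0/1 local cost on characters, both of which are assumptions, and all names are hypothetical.

```java
final class DtwMatrix {
    // Illustrative local cost: 0 for equal characters, 1 otherwise (an assumption).
    static int cost(char x, char y) {
        return x == y ? 0 : 1;
    }

    // Returns d where d[i - 1][j - 1] holds dtw(i, j) in the 1-based slide notation.
    // Both strings are assumed to be non-empty.
    static int[][] fill(String x, String y) {
        int n = x.length();
        int m = y.length();
        int[][] d = new int[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                int best;
                if (i == 0 && j == 0) {
                    best = 0;                  // initial condition: dtw(1, 1) = c(1, 1)
                } else if (j == 0) {
                    best = d[i - 1][0];        // 1st row: single predecessor
                } else if (i == 0) {
                    best = d[0][j - 1];        // 1st column: single predecessor
                } else {
                    best = Math.min(d[i - 1][j], Math.min(d[i - 1][j - 1], d[i][j - 1]));
                }
                d[i][j] = best + cost(x.charAt(i), y.charAt(j));
            }
        }
        return d;
    }

    // Cost of an optimal warping path: the value in cell (N, M).
    static int dtw(String x, String y) {
        return fill(x, y)[x.length() - 1][y.length() - 1];
    }

    public static void main(String[] args) {
        System.out.println(dtw("abg", "abbg"));
    }
}
```

The loop visits each of the N x M cells once, so both running time and memory are proportional to N x M.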
![Page 36: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/36.jpg)
Example
Let the sequences be:
$X = (a, b, g)$, $Y = (a, b, b, g)$, $Z = (a, g, g)$
Let the feature space $F = \{a, b, g\}$.
Let the local cost measure be
defined as follows:
$$c(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \ne y \end{cases}$$
Let us compute DTW(X, Y), DTW(Y, Z), and DTW(X, Z).
Work it out on paper.
![Page 37: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/37.jpg)
DTW(X, Y)
![Page 38: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/38.jpg)
Example: DTW(X,Y)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Y (vertical axis, j = 1..4), filled in up to this step]
$dtw(1, 1) = c(a, a) = 0$
![Page 39: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/39.jpg)
Example: DTW(X,Y)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Y (vertical axis, j = 1..4), filled in up to this step]
$dtw(2, 1) = c(2, 1) + dtw(1, 1) = c(b, a) + dtw(1, 1) = 1 + 0 = 1$
![Page 40: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/40.jpg)
Example: DTW(X,Y)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Y (vertical axis, j = 1..4), filled in up to this step]
$dtw(3, 1) = c(3, 1) + dtw(2, 1) = c(g, a) + dtw(2, 1) = 1 + 1 = 2$
![Page 41: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/41.jpg)
Example: DTW(X,Y)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Y (vertical axis, j = 1..4), filled in up to this step]
$dtw(1, 2) = c(1, 2) + dtw(1, 1) = c(a, b) + dtw(1, 1) = 1 + 0 = 1$
![Page 42: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/42.jpg)
Example: DTW(X,Y)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Y (vertical axis, j = 1..4), filled in up to this step]
$dtw(1, 3) = c(1, 3) + dtw(1, 2) = c(a, b) + dtw(1, 2) = 1 + 1 = 2$
![Page 43: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/43.jpg)
Example: DTW(X,Y)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Y (vertical axis, j = 1..4), filled in up to this step]
$dtw(1, 4) = c(1, 4) + dtw(1, 3) = c(a, g) + dtw(1, 3) = 1 + 2 = 3$
![Page 44: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/44.jpg)
Example: DTW(X,Y)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Y (vertical axis, j = 1..4), filled in up to this step]
$dtw(2, 2) = c(2, 2) + \min\{dtw(1, 2), dtw(1, 1), dtw(2, 1)\} = c(b, b) + \min\{1, 0, 1\} = 0 + 0 = 0$
![Page 45: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/45.jpg)
Example: DTW(X,Y)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Y (vertical axis, j = 1..4), filled in up to this step]
$dtw(3, 2) = c(3, 2) + \min\{dtw(2, 2), dtw(2, 1), dtw(3, 1)\} = c(g, b) + \min\{0, 1, 2\} = 1 + 0 = 1$
![Page 46: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/46.jpg)
Example: DTW(X,Y)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Y (vertical axis, j = 1..4), filled in up to this step]
$dtw(2, 3) = c(2, 3) + \min\{dtw(1, 3), dtw(1, 2), dtw(2, 2)\} = c(b, b) + \min\{2, 1, 0\} = 0 + 0 = 0$
![Page 47: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/47.jpg)
Example: DTW(X,Y)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Y (vertical axis, j = 1..4), filled in up to this step]
$dtw(3, 3) = c(3, 3) + \min\{dtw(2, 3), dtw(2, 2), dtw(3, 2)\} = c(g, b) + \min\{0, 0, 1\} = 1 + 0 = 1$
![Page 48: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/48.jpg)
Example: DTW(X,Y)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Y (vertical axis, j = 1..4), filled in up to this step]
$dtw(2, 4) = c(2, 4) + \min\{dtw(1, 4), dtw(1, 3), dtw(2, 3)\} = c(b, g) + \min\{3, 2, 0\} = 1 + 0 = 1$
![Page 49: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/49.jpg)
Example: DTW(X,Y)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Y (vertical axis, j = 1..4), filled in up to this step]
$dtw(3, 4) = c(3, 4) + \min\{dtw(2, 4), dtw(2, 3), dtw(3, 3)\} = c(g, g) + \min\{1, 0, 1\} = 0 + 0 = 0$
So DTW(X, Y) = 0
![Page 50: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/50.jpg)
Example: DTW(X,Y)
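[Figure: completed cost matrix for X (horizontal axis) and Y (vertical axis), with red arrows pointing from each cell to its cheapest predecessor]
| j \ i | 1 (a) | 2 (b) | 3 (g) |
|---|---|---|---|
| 4 (g) | 3 | 1 | 0 |
| 3 (b) | 2 | 0 | 1 |
| 2 (b) | 1 | 0 | 1 |
| 1 (a) | 0 | 1 | 2 |
DTW(X, Y) = 0.
Optimal Warping Path (OWP) P can be found by chasing pointers (red arrows): P = ((1,1), (2,2), (2,3), (3,4)).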
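"Chasing pointers" can be done without storing pointers at all: starting in cell (N, M), repeatedly move to the cheapest of the allowed predecessor cells until (1, 1) is reached. The sketch below assumes the 0/1 character cost of this example; the names and the tie-breaking rule (prefer the diagonal) are illustrative choices, so for inputs with several optimal paths it returns just one of them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class DtwPath {
    static int cost(char x, char y) {
        return x == y ? 0 : 1;
    }

    // d[i - 1][j - 1] holds dtw(i, j); strings are assumed to be non-empty.
    static int[][] fill(String x, String y) {
        int[][] d = new int[x.length()][y.length()];
        for (int i = 0; i < x.length(); i++) {
            for (int j = 0; j < y.length(); j++) {
                int best;
                if (i == 0 && j == 0) best = 0;
                else if (j == 0) best = d[i - 1][0];
                else if (i == 0) best = d[0][j - 1];
                else best = Math.min(d[i - 1][j], Math.min(d[i - 1][j - 1], d[i][j - 1]));
                d[i][j] = best + cost(x.charAt(i), y.charAt(j));
            }
        }
        return d;
    }

    // Walks back from (N, M) to (1, 1); returns 1-based cells in forward order.
    static List<int[]> optimalPath(String x, String y) {
        int[][] d = fill(x, y);
        int i = x.length() - 1;
        int j = y.length() - 1;
        List<int[]> path = new ArrayList<>();
        path.add(new int[] {i + 1, j + 1});
        while (i > 0 || j > 0) {
            if (i == 0) {
                j--;                               // only one way back along the edge
            } else if (j == 0) {
                i--;
            } else {
                int diag = d[i - 1][j - 1];
                int left = d[i - 1][j];
                int down = d[i][j - 1];
                if (diag <= left && diag <= down) { i--; j--; }
                else if (left <= down) { i--; }
                else { j--; }
            }
            path.add(new int[] {i + 1, j + 1});
        }
        Collections.reverse(path);
        return path;
    }

    public static void main(String[] args) {
        for (int[] p : optimalPath("abg", "abbg")) {
            System.out.print("(" + p[0] + "," + p[1] + ") ");
        }
        System.out.println();
    }
}
```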
![Page 51: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/51.jpg)
DTW(Y, Z)
![Page 52: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/52.jpg)
DTW(Y, Z)
[Figure: cost matrix for Y (horizontal axis, i = 1..4) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(1, 1) = c(a, a) = 0$
![Page 53: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/53.jpg)
DTW(Y, Z)
[Figure: cost matrix for Y (horizontal axis, i = 1..4) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(2, 1) = c(b, a) + dtw(1, 1) = 1 + 0 = 1$
![Page 54: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/54.jpg)
DTW(Y, Z)
[Figure: cost matrix for Y (horizontal axis, i = 1..4) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(3, 1) = c(b, a) + dtw(2, 1) = 1 + 1 = 2$
![Page 55: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/55.jpg)
DTW(Y, Z)
[Figure: cost matrix for Y (horizontal axis, i = 1..4) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(4, 1) = c(g, a) + dtw(3, 1) = 1 + 2 = 3$
![Page 56: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/56.jpg)
DTW(Y, Z)
[Figure: cost matrix for Y (horizontal axis, i = 1..4) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(1, 2) = c(a, g) + dtw(1, 1) = 1 + 0 = 1$
![Page 57: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/57.jpg)
DTW(Y, Z)
[Figure: cost matrix for Y (horizontal axis, i = 1..4) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(1, 3) = c(a, g) + dtw(1, 2) = 1 + 1 = 2$
![Page 58: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/58.jpg)
DTW(Y, Z)
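[Figure: cost matrix for Y (horizontal axis, i = 1..4) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(2, 2) = c(b, g) + \min\{dtw(1, 2), dtw(1, 1), dtw(2, 1)\} = 1 + \min\{1, 0, 1\} = 1 + 0 = 1$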
![Page 59: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/59.jpg)
DTW(Y, Z)
[Figure: cost matrix for Y (horizontal axis, i = 1..4) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(3, 2) = c(b, g) + \min\{dtw(2, 2), dtw(2, 1), dtw(3, 1)\} = 1 + \min\{1, 1, 2\} = 1 + 1 = 2$
![Page 60: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/60.jpg)
DTW(Y, Z)
[Figure: cost matrix for Y (horizontal axis, i = 1..4) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(4, 2) = c(g, g) + \min\{dtw(3, 2), dtw(3, 1), dtw(4, 1)\} = 0 + \min\{2, 2, 3\} = 0 + 2 = 2$
![Page 61: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/61.jpg)
DTW(Y, Z)
[Figure: cost matrix for Y (horizontal axis, i = 1..4) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(2, 3) = c(b, g) + \min\{dtw(1, 3), dtw(1, 2), dtw(2, 2)\} = 1 + \min\{2, 1, 1\} = 1 + 1 = 2$
![Page 62: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/62.jpg)
DTW(Y, Z)
[Figure: cost matrix for Y (horizontal axis, i = 1..4) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(3, 3) = c(b, g) + \min\{dtw(2, 3), dtw(2, 2), dtw(3, 2)\} = 1 + \min\{2, 1, 2\} = 1 + 1 = 2$
![Page 63: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/63.jpg)
DTW(Y, Z)
[Figure: cost matrix for Y (horizontal axis, i = 1..4) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(4, 3) = c(g, g) + \min\{dtw(3, 3), dtw(3, 2), dtw(4, 2)\} = 0 + \min\{2, 2, 2\} = 0 + 2 = 2$
![Page 64: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/64.jpg)
DTW(Y, Z)
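[Figure: completed cost matrix for Y (horizontal axis) and Z (vertical axis), with red arrows pointing from each cell to its cheapest predecessor]
| j \ i | 1 (a) | 2 (b) | 3 (b) | 4 (g) |
|---|---|---|---|---|
| 3 (g) | 2 | 2 | 2 | 2 |
| 2 (g) | 1 | 1 | 2 | 2 |
| 1 (a) | 0 | 1 | 2 | 3 |
DTW(Y, Z) = 2.
Optimal Warping Path (OWP) P can be found by chasing pointers (red arrows): P = ((1,1), (2,2), (3,2), (4,3)).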
![Page 65: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/65.jpg)
DTW(X, Z)
![Page 66: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/66.jpg)
DTW(X, Z)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(1, 1) = c(a, a) = 0$
![Page 67: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/67.jpg)
DTW(X, Z)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(2, 1) = c(b, a) + dtw(1, 1) = 1 + 0 = 1$
![Page 68: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/68.jpg)
DTW(X, Z)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(3, 1) = c(g, a) + dtw(2, 1) = 1 + 1 = 2$
![Page 69: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/69.jpg)
DTW(X, Z)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(1, 2) = c(a, g) + dtw(1, 1) = 1 + 0 = 1$
![Page 70: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/70.jpg)
DTW(X, Z)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(1, 3) = c(a, g) + dtw(1, 2) = 1 + 1 = 2$
![Page 71: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/71.jpg)
DTW(X, Z)
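[Figure: cost matrix for X (horizontal axis, i = 1..3) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(2, 2) = c(b, g) + \min\{dtw(1, 2), dtw(1, 1), dtw(2, 1)\} = 1 + \min\{1, 0, 1\} = 1 + 0 = 1$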
![Page 72: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/72.jpg)
DTW(X, Z)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(3, 2) = c(g, g) + \min\{dtw(2, 2), dtw(2, 1), dtw(3, 1)\} = 0 + \min\{1, 1, 2\} = 0 + 1 = 1$
![Page 73: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/73.jpg)
DTW(X, Z)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(2, 3) = c(b, g) + \min\{dtw(1, 3), dtw(1, 2), dtw(2, 2)\} = 1 + \min\{2, 1, 1\} = 1 + 1 = 2$
![Page 74: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/74.jpg)
DTW(X, Z)
[Figure: cost matrix for X (horizontal axis, i = 1..3) and Z (vertical axis, j = 1..3), filled in up to this step]
$dtw(3, 3) = c(g, g) + \min\{dtw(2, 3), dtw(2, 2), dtw(3, 2)\} = 0 + \min\{2, 1, 1\} = 0 + 1 = 1$
![Page 75: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/75.jpg)
DTW(X, Z)
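[Figure: completed cost matrix for X (horizontal axis) and Z (vertical axis), with red arrows pointing from each cell to its cheapest predecessor]
| j \ i | 1 (a) | 2 (b) | 3 (g) |
|---|---|---|---|
| 3 (g) | 2 | 2 | 1 |
| 2 (g) | 1 | 1 | 1 |
| 1 (a) | 0 | 1 | 2 |
DTW(X, Z) = 1.
Optimal Warping Path (OWP) P can be found by chasing pointers (red arrows): P = ((1,1), (2,2), (3,3)).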
![Page 76: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/76.jpg)
Possible Optimizations of DTW
The computation of DTW can be optimized so that only the
cells within a specific window are considered
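One common way to restrict the computation to a window is a band around the diagonal (often called a Sakoe-Chiba band), in which only cells with |i - j| <= w are filled. The slide does not say which window is meant, so the band shape, the names, and the 0/1 character cost below are assumptions. The result is an upper bound on the unrestricted DTW cost, since some paths are excluded.

```java
import java.util.Arrays;

final class WindowedDtw {
    static final int INF = Integer.MAX_VALUE / 2; // "unreachable" marker that cannot overflow

    static int cost(char x, char y) {
        return x == y ? 0 : 1;
    }

    // DTW restricted to cells with |i - j| <= window (1-based i, j).
    static int dtw(String x, String y, int window) {
        int n = x.length();
        int m = y.length();
        // The band must be wide enough for the path to reach cell (N, M).
        int w = Math.max(window, Math.abs(n - m));
        int[][] d = new int[n + 1][m + 1];        // row 0 and column 0 are padding
        for (int[] row : d) {
            Arrays.fill(row, INF);
        }
        d[0][0] = 0;                              // makes dtw(1, 1) = c(1, 1)
        for (int i = 1; i <= n; i++) {
            int lo = Math.max(1, i - w);
            int hi = Math.min(m, i + w);
            for (int j = lo; j <= hi; j++) {
                int best = Math.min(d[i - 1][j], Math.min(d[i - 1][j - 1], d[i][j - 1]));
                d[i][j] = best + cost(x.charAt(i - 1), y.charAt(j - 1));
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(dtw("aab", "abb", 0)); // diagonal only
        System.out.println(dtw("aab", "abb", 1)); // one cell of slack on each side
    }
}
```

A narrow window saves work but may miss the optimal path; a wider window trades speed for accuracy.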
![Page 77: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/77.jpg)
Possible Optimizations of DTW
You may have realized by now that if we care
only about the total cost of warping sequence X
with sequence Y, we do not need to compute
the entire N x M cost matrix – we need only two
columns
The storage savings are huge, but the running
time remains the same – O(N x M)
We can also normalize the DTW cost by N x M
to keep it low
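The two-column idea can be sketched as follows: the column for i - 1 and the column for i are the only state kept, and the two buffers are swapped after each column. Names and the 0/1 character cost are assumptions; the optimal path itself can no longer be recovered, only its cost.

```java
final class TwoColumnDtw {
    static int cost(char x, char y) {
        return x == y ? 0 : 1;
    }

    // Total DTW cost using O(M) memory; both strings are assumed to be non-empty.
    static int dtw(String x, String y) {
        int n = x.length();
        int m = y.length();
        int[] prev = new int[m]; // column for i - 1
        int[] curr = new int[m]; // column for i
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                int best;
                if (i == 0 && j == 0) best = 0;
                else if (i == 0) best = curr[j - 1];
                else if (j == 0) best = prev[0];
                else best = Math.min(prev[j], Math.min(prev[j - 1], curr[j - 1]));
                curr[j] = best + cost(x.charAt(i), y.charAt(j));
            }
            int[] tmp = prev; // reuse the two buffers instead of allocating
            prev = curr;
            curr = tmp;
        }
        return prev[m - 1];   // after the final swap, prev holds column N
    }

    // DTW cost divided by N x M, as suggested above.
    static double normalized(String x, String y) {
        return (double) dtw(x, y) / (x.length() * y.length());
    }

    public static void main(String[] args) {
        System.out.println(dtw("abg", "abbg"));
        System.out.println(dtw("abbg", "agg"));
        System.out.println(dtw("abg", "agg"));
    }
}
```

On the sequences X, Y, Z of the worked example this reproduces the costs 0, 2, and 1, which also illustrates the earlier remark on the triangle inequality: DTW(Y, Z) = 2 exceeds DTW(Y, X) + DTW(X, Z) = 0 + 1.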
![Page 78: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/78.jpg)
Spoken Word Recognition
source code is in WavAudioDictionary.java
![Page 79: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/79.jpg)
General Outline
Given a directory of audio files with spoken words, process
each file into a table that maps specific words (or phrases)
to digital signal vectors
These signal vectors can be pre-processed to eliminate
silences
An input audio file is taken and digitized into a digital
signal vector
The input vector is compared with each digital vector in
the table by computing the DTW score b/w them; the
table entry with the lowest score is taken as the recognized word
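The lookup step of this outline can be sketched as a nearest-template search. This is not the code of WavAudioDictionary.java: the class name, the use of |a - b| as the local cost on samples, and the tiny hand-made vectors are assumptions, and reading or pre-processing audio files is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WordRecognizer {
    // DTW cost between two non-empty signal vectors, with |a - b| as local cost.
    static double dtw(double[] x, double[] y) {
        double[][] d = new double[x.length][y.length];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < y.length; j++) {
                double best;
                if (i == 0 && j == 0) best = 0;
                else if (j == 0) best = d[i - 1][0];
                else if (i == 0) best = d[0][j - 1];
                else best = Math.min(d[i - 1][j], Math.min(d[i - 1][j - 1], d[i][j - 1]));
                d[i][j] = best + Math.abs(x[i] - y[j]);
            }
        }
        return d[x.length - 1][y.length - 1];
    }

    // Returns the word whose stored vector has the lowest DTW score against the input,
    // or null if the table is empty.
    static String recognize(Map<String, double[]> table, double[] input) {
        String bestWord = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : table.entrySet()) {
            double score = dtw(input, entry.getValue());
            if (score < bestScore) {
                bestScore = score;
                bestWord = entry.getKey();
            }
        }
        return bestWord;
    }

    public static void main(String[] args) {
        Map<String, double[]> table = new LinkedHashMap<>();
        table.put("yes", new double[] {0, 1, 2, 1, 0}); // made-up template vectors
        table.put("no", new double[] {2, 2, 0, 0, 2});
        double[] input = {0, 1, 1, 2, 1, 0};            // a stretched version of "yes"
        System.out.println(recognize(table, input));
    }
}
```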
![Page 80: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/80.jpg)
Optimizations
If we use DTW to compute the similarity b/w the
digital audio input vector and the vectors in the table,
it is vital to keep the vectors as short as possible w/o
sacrificing precision
Possible suggestions: decreasing the sampling rate
and merging samples into super-features (e.g., Haar
coefficients)
Parallelizing similarity computations
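A simple way to shorten a vector is to replace each pair of adjacent samples by their mean; up to a constant scaling factor these means are the approximation coefficients of a one-level Haar transform. The sketch below uses hypothetical names and drops a trailing unpaired sample, which is a design choice made here for brevity.

```java
final class VectorShortener {
    // Halves the length by averaging adjacent pairs; a trailing odd sample is dropped.
    static double[] averagePairs(double[] signal) {
        double[] out = new double[signal.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (signal[2 * i] + signal[2 * i + 1]) / 2.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] signal = {1, 3, 5, 7, 9};
        double[] shorter = averagePairs(signal);
        System.out.println(shorter.length + " samples instead of " + signal.length);
    }
}
```

Because DTW fills N x M cells, halving both vectors reduces the number of cells to about a quarter.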
![Page 81: NLP (Fall 2013): Audio Processing, Zero Crossing Rate, Dynamic Time Warping, Spoken Word Recognition](https://reader036.vdocuments.site/reader036/viewer/2022081602/55502227b4c90535638b55d1/html5/thumbnails/81.jpg)
References
M. Müller. Information Retrieval for Music and
Motion, Ch. 4. Springer, ISBN 978-3-540-74047-6
Bachu, R. G., et al. "Separation of Voiced and
Unvoiced using Zero Crossing Rate and Energy of the
Speech Signal." American Society for Engineering
Education (ASEE) Zone Conference Proceedings. 2008.