lecture 2 - northwestern university
TRANSCRIPT
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Topic 7
Convolution, Filters, Correlation,
Representation
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Short time Fourier Transform
•Break signal into windows
•Calculate DFT of each window
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
The Spectrogram
•A series of short term DFTs
•Typically just displays the magnitudes of X from
0 Hz to Nyquist rate
spectrogram(y,1024,512,1024,fs,'yaxis');
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Equal Temperament
• Octave is a relationship by power of 2.
• There are 12 half-steps in an octave
122n
reff f
number of half-steps
from the reference pitch
frequency of the
reference pitch
frequency of
desired pitch
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Spiral Pitch representation
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Chroma: Many to one
• Chroma = log2(freq) – floor(log2(freq))
• Chroma periodic in range 0 to (almost) 1
• Chroma map on to pitch classes
CHROMA
0 50 100 150 200 250 300 3500
100
200
300
400
500
600
700
800
900
time
freq
uen
cy
0.25
0.0
0.5
0.75
800 Hz
400 Hz
200 Hz
100 Hz
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Making a Chromagram
• Decide how to quantize (bin) the chroma range.
– 12 pitch classes? 120 bins? Equal temperment?
• Make a spectrogram
• For each time-step in the spectrogram
– find the chroma for each frequency from 0 to N/2
– Sum the amplitude of all frequencies with the same
chroma bin
• (Some chromagrams also add in the energy from the odd
harmonics)
– Place that value in the chroma bin
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Chromagram of Clarinet
100 200 300 400 500 600 700 800 900
2
4
6
8
10
12
C
C#
D#
E
F
F#
G
G#
A
A#
B
D
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Chromagram of Clarinet
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Convolution
• convolution is a mathematical operator which takes
two functions f and g and produces a third function
that represents the amount of overlap between f and
a reversed and translated version of g.
• Typically, one of the functions is taken to be a fixed
filter impulse response, and is known as a kernel.
( )( ) ( ) ( )b
af g t f g t d
Convolution
operator
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Discrete Convolution
• convolution is a mathematical operator which takes
two functions f and g and produces a third function
that represents the amount of overlap between f and
a reversed and translated version of g.
• Typically, one of the functions is taken to be a fixed
filter impulse response, and is known as a kernel.
( )[ ] [ ] [ ]n
f g m f n g m n
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Convolution (Time Domain)
function C= convolution(A,B)
lengthA= length(A);
lengthB= length(B);
C = zeros(1, lengthA + lengthB - 1);
for m = 1:lengthA
for n = 1:lengthB
C(m+n-1) = C(m+n-1) + A(m)*B(n);
end
end
TIMES! Not Convolution
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Convolution (Frequency Domain)
( ) ( )* ( )
( ) ( ) ( )
( ) ( ) ( )
y t x t h t
y t x h t d
Y X H
Convolution…
is defined in the
time domain as…
and in frequency
domain as…..
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Speeding implementation
• How long does convolution take?
• Think about these factsFFT(x) takes O(nlog(n)) time
• Knowing this, how can we speed convolution?
( ) ( )* ( )
( ) ( ) ( )
y t x t h t
Y X H
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Convolution = Filtering
(in the time domain)
Source Signalx(t)
Filterh(t)
Output Signaly(t)
)()(*)( tythtx Convolution
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
So what?
• So now you can design a quick-n-dirty bandpass filter H(k)!
• Steps
– Decide on the frequency resolution (number of points)
– Set the amplitudes of the different frequencies as you like
– Take the inverse FFT of your filter
– Do convolution with your signal
• Let’s try it!
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Impulse response
• An “impulse” is this signal:
• The impulse response h(t) of a filter is what happens when you do convolution of the impulse signal with the filter.
• The impulse response h(t) of a filter is the same as the IFFT of the transfer function H() of that filter.
1 if 0( )
0 else
tt
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Multiplication = Filtering
(in the frequency domain)
Source Signal
X()Filter
H()
Output SignalY()
( ) ( ) ( )X H Y
Multiplication Transfer function
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Multiplication
(in the frequency domain)
• Multiply kth frequency of X by the kth
frequency of H
• This is COMPLEX multiplication
• Doing X.*H in Matlab takes care of this
( ) ( ) [ ] [ ]X H X k H k
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Estimating H(k)
(in the frequency domain)
[ ] [ ] [ ]
so....
[ ] [ ] / [ ]
...more or less.
X k H k Y k
H k Y k X k
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Noise and Distortion
Source Signal
X()Filter
H()
Output SignalY()
Distortion
D()
Noise
N()
+
Distortion and noise make it a lot harder to estimate H
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Estimating H
• Hope noise and distortion are not
correlated with X.
• Call noise + distortion N.
• Then…
[ ] [ ] [ ] [ ]X k H k N k Y k
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Estimating H
• Estimate H a lot of times, in the hopes
that the noise will wash out in the mix…
1
ˆ [ ] [ ] [ ] / [ ] [ ]
1[ ] [ ] [ ] [ ]
N
n n
n
H k Y k X k X k X k
where
Y k X k Y k X kN
The average value Over N estimates
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio
Caveats
• This really only works on the frequencies
where there was energy in the input signal
X. If there wasn’t energy…well…then
we’re out of luck.
• So it is best to use broadband white noise
for X, if you can help it.