lecture 2 - northwestern university

24
Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio Topic 7 Convolution, Filters, Correlation, Representation

Upload: others

Post on 09-Feb-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Topic 7

Convolution, Filters, Correlation,

Representation

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Short time Fourier Transform

•Break signal into windows

•Calculate DFT of each window

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

The Spectrogram

•A series of short term DFTs

•Typically just displays the magnitudes of X from

0 Hz to Nyquist rate

spectrogram(y,1024,512,1024,fs,'yaxis');

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Equal Temperament

• Octave is a relationship by power of 2.

• There are 12 half-steps in an octave

122n

reff f

number of half-steps

from the reference pitch

frequency of the

reference pitch

frequency of

desired pitch

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Spiral Pitch representation

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Chroma: Many to one

• Chroma = log2(freq) – floor(log2(freq))

• Chroma periodic in range 0 to (almost) 1

• Chroma map on to pitch classes

CHROMA

0 50 100 150 200 250 300 3500

100

200

300

400

500

600

700

800

900

time

freq

uen

cy

0.25

0.0

0.5

0.75

800 Hz

400 Hz

200 Hz

100 Hz

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Making a Chromagram

• Decide how to quantize (bin) the chroma range.

– 12 pitch classes? 120 bins? Equal temperment?

• Make a spectrogram

• For each time-step in the spectrogram

– find the chroma for each frequency from 0 to N/2

– Sum the amplitude of all frequencies with the same

chroma bin

• (Some chromagrams also add in the energy from the odd

harmonics)

– Place that value in the chroma bin

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Chromagram of Clarinet

100 200 300 400 500 600 700 800 900

2

4

6

8

10

12

C

C#

D#

E

F

F#

G

G#

A

A#

B

D

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Chromagram of Clarinet

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Convolution

• convolution is a mathematical operator which takes

two functions f and g and produces a third function

that represents the amount of overlap between f and

a reversed and translated version of g.

• Typically, one of the functions is taken to be a fixed

filter impulse response, and is known as a kernel.

( )( ) ( ) ( )b

af g t f g t d

Convolution

operator

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Discrete Convolution

• convolution is a mathematical operator which takes

two functions f and g and produces a third function

that represents the amount of overlap between f and

a reversed and translated version of g.

• Typically, one of the functions is taken to be a fixed

filter impulse response, and is known as a kernel.

( )[ ] [ ] [ ]n

f g m f n g m n

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Convolution (Time Domain)

function C= convolution(A,B)

lengthA= length(A);

lengthB= length(B);

C = zeros(1, lengthA + lengthB - 1);

for m = 1:lengthA

for n = 1:lengthB

C(m+n-1) = C(m+n-1) + A(m)*B(n);

end

end

TIMES! Not Convolution

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Convolution (Frequency Domain)

( ) ( )* ( )

( ) ( ) ( )

( ) ( ) ( )

y t x t h t

y t x h t d

Y X H

Convolution…

is defined in the

time domain as…

and in frequency

domain as…..

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Speeding implementation

• How long does convolution take?

• Think about these factsFFT(x) takes O(nlog(n)) time

• Knowing this, how can we speed convolution?

( ) ( )* ( )

( ) ( ) ( )

y t x t h t

Y X H

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Convolution = Filtering

(in the time domain)

Source Signalx(t)

Filterh(t)

Output Signaly(t)

)()(*)( tythtx Convolution

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

So what?

• So now you can design a quick-n-dirty bandpass filter H(k)!

• Steps

– Decide on the frequency resolution (number of points)

– Set the amplitudes of the different frequencies as you like

– Take the inverse FFT of your filter

– Do convolution with your signal

• Let’s try it!

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Impulse response

• An “impulse” is this signal:

• The impulse response h(t) of a filter is what happens when you do convolution of the impulse signal with the filter.

• The impulse response h(t) of a filter is the same as the IFFT of the transfer function H() of that filter.

1 if 0( )

0 else

tt

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Multiplication = Filtering

(in the frequency domain)

Source Signal

X()Filter

H()

Output SignalY()

( ) ( ) ( )X H Y

Multiplication Transfer function

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Multiplication

(in the frequency domain)

• Multiply kth frequency of X by the kth

frequency of H

• This is COMPLEX multiplication

• Doing X.*H in Matlab takes care of this

( ) ( ) [ ] [ ]X H X k H k

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Estimating H(k)

(in the frequency domain)

[ ] [ ] [ ]

so....

[ ] [ ] / [ ]

...more or less.

X k H k Y k

H k Y k X k

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Noise and Distortion

Source Signal

X()Filter

H()

Output SignalY()

Distortion

D()

Noise

N()

+

Distortion and noise make it a lot harder to estimate H

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Estimating H

• Hope noise and distortion are not

correlated with X.

• Call noise + distortion N.

• Then…

[ ] [ ] [ ] [ ]X k H k N k Y k

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Estimating H

• Estimate H a lot of times, in the hopes

that the noise will wash out in the mix…

1

ˆ [ ] [ ] [ ] / [ ] [ ]

1[ ] [ ] [ ] [ ]

N

n n

n

H k Y k X k X k X k

where

Y k X k Y k X kN

The average value Over N estimates

Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Caveats

• This really only works on the frequencies

where there was energy in the input signal

X. If there wasn’t energy…well…then

we’re out of luck.

• So it is best to use broadband white noise

for X, if you can help it.