meinard müller, christof weiss, stefan balke€¦ · meinard müller, christof weiss, stefan balke...

20
Meinard Müller, Christof Weiss, Stefan Balke Audio Features International Audio Laboratories Erlangen {meinard.mueller, christof.weiss, stefan.balke}@audiolabs-erlangen.de Tutorial Automatisierte Methoden der Musikverarbeitung 47. Jahrestagung der Gesellschaft für Informatik Book: Fundamentals of Music Processing Meinard Müller Fundamentals of Music Processing Audio, Analysis, Algorithms, Applications 483 p., 249 illus., hardcover ISBN: 978-3-319-21944-8 Springer, 2015 Accompanying website: www.music-processing.de Book: Fundamentals of Music Processing Meinard Müller Fundamentals of Music Processing Audio, Analysis, Algorithms, Applications 483 p., 249 illus., hardcover ISBN: 978-3-319-21944-8 Springer, 2015 Accompanying website: www.music-processing.de Book: Fundamentals of Music Processing Meinard Müller Fundamentals of Music Processing Audio, Analysis, Algorithms, Applications 483 p., 249 illus., hardcover ISBN: 978-3-319-21944-8 Springer, 2015 Accompanying website: www.music-processing.de Chapter 2: Fourier Analysis of Signals 2.1 The Fourier Transform in a Nutshell 2.2 Signals and Signal Spaces 2.3 Fourier Transform 2.4 Discrete Fourier Transform (DFT) 2.5 Short-Time Fourier Transform (STFT) 2.6 Further Notes Important technical terminology is covered in Chapter 2. In particular, we approach the Fourier transform—which is perhaps the most fundamental tool in signal processing—from various perspectives. For the reader who is more interested in the musical aspects of the book, Section 2.1 provides a summary of the most important facts on the Fourier transform. In particular, the notion of a spectrogram, which yields a time–frequency representation of an audio signal, is introduced. The remainder of the chapter treats the Fourier transform in greater mathematical depth and also includes the fast Fourier transform (FFT)—an algorithm of great beauty and high practical relevance. Chapter 3: Music Synchronization 3.1 Audio Features 3.2 Dynamic Time Warping 3.3 Applications 3.4 Further Notes As a first music processing task, we study in Chapter 3 the problem of music synchronization. The objective is to temporally align compatible representations of the same piece of music. Considering this scenario, we explain the need for musically informed audio features. In particular, we introduce the concept of chroma-based music features, which capture properties that are related to harmony and melody. Furthermore, we study an alignment technique known as dynamic time warping (DTW), a concept that is applicable for the analysis of general time series. For its efficient computation, we discuss an algorithm based on dynamic programming—a widely used method for solving a complex problem by breaking it down into a collection of simpler subproblems.

Upload: others

Post on 02-Apr-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Meinard Müller, Christof Weiss, Stefan Balke

Audio Features

International Audio Laboratories Erlangen{meinard.mueller, christof.weiss, stefan.balke}@audiolabs-erlangen.de

TutorialAutomatisierte Methoden der Musikverarbeitung47. Jahrestagung der Gesellschaft für Informatik

Book: Fundamentals of Music Processing

Meinard MüllerFundamentals of Music ProcessingAudio, Analysis, Algorithms, Applications483 p., 249 illus., hardcoverISBN: 978-3-319-21944-8Springer, 2015

Accompanying website: www.music-processing.de

Book: Fundamentals of Music Processing

Meinard MüllerFundamentals of Music ProcessingAudio, Analysis, Algorithms, Applications483 p., 249 illus., hardcoverISBN: 978-3-319-21944-8Springer, 2015

Accompanying website: www.music-processing.de

Book: Fundamentals of Music Processing

Meinard MüllerFundamentals of Music ProcessingAudio, Analysis, Algorithms, Applications483 p., 249 illus., hardcoverISBN: 978-3-319-21944-8Springer, 2015

Accompanying website: www.music-processing.de

Chapter 2: Fourier Analysis of Signals

2.1 The Fourier Transform in a Nutshell2.2 Signals and Signal Spaces2.3 Fourier Transform2.4 Discrete Fourier Transform (DFT)2.5 Short-Time Fourier Transform (STFT)2.6 Further Notes

Important technical terminology is covered in Chapter 2. In particular, weapproach the Fourier transform—which is perhaps the most fundamental toolin signal processing—from various perspectives. For the reader who is moreinterested in the musical aspects of the book, Section 2.1 provides a summaryof the most important facts on the Fourier transform. In particular, the notion ofa spectrogram, which yields a time–frequency representation of an audiosignal, is introduced. The remainder of the chapter treats the Fourier transformin greater mathematical depth and also includes the fast Fourier transform(FFT)—an algorithm of great beauty and high practical relevance.

Chapter 3: Music Synchronization

3.1 Audio Features3.2 Dynamic Time Warping3.3 Applications3.4 Further Notes

As a first music processing task, we study in Chapter 3 the problem of musicsynchronization. The objective is to temporally align compatiblerepresentations of the same piece of music. Considering this scenario, weexplain the need for musically informed audio features. In particular, weintroduce the concept of chroma-based music features, which captureproperties that are related to harmony and melody. Furthermore, we study analignment technique known as dynamic time warping (DTW), a concept that isapplicable for the analysis of general time series. For its efficient computation,we discuss an algorithm based on dynamic programming—a widely usedmethod for solving a complex problem by breaking it down into a collection ofsimpler subproblems.

Fourier Transform

Sinusoids

Time (seconds)

Time (seconds)

Idea: Decompose a given signal into a superpositionof sinusoids (elementary signals).

Signal

Each sinusoid has a physical meaningand can be described by three parameters:

Fourier Transform

Sinusoids

Time (seconds)

Each sinusoid has a physical meaningand can be described by three parameters:

Fourier Transform

Sinusoids

Time (seconds)

Time (seconds)

Signal

Each sinusoid has a physical meaningand can be described by three parameters:

Fourier Transform

Frequency (Hz)

Fourier transform

1 2 3 4 5 6 7 8

1

0.5

˄

Time (seconds)

Signal | |

Fourier Transform

Frequency (Hz)

Time (seconds)

Example: Superposition of two sinusoids

Fourier Transform

Frequency (Hz)

Time (seconds)

Example: C4 played by piano

Fourier Transform

Frequency (Hz)

Time (seconds)

Example: C4 played by trumpet

Fourier Transform

Frequency (Hz)

Time (seconds)

Example: C4 played by violin

Fourier Transform

Frequency (Hz)

Example: C4 played by flute

Time (seconds)

Fourier Transform

Frequency (Hz)

Example: Speech “Bonn”

Time (seconds)

Fourier Transform

Frequency (Hz)

Example: Speech “Zürich”

Time (seconds)

Fourier Transform

Frequency (Hz)

Example: C-major scale (piano)

Time (seconds)

Fourier Transform

Frequency (Hz)

Time (seconds)

Example: Chirp signal

Fourier TransformExample: Piano tone (C4, 261.6 Hz)

Time (seconds)

Time (seconds)

Fourier TransformExample: Piano tone (C4, 261.6 Hz)

Time (seconds)

Time (seconds)

Analysis using sinusoid with 262 Hz→ high correlation→ large Fourier coefficient

Fourier TransformExample: Piano tone (C4, 261.6 Hz)

Time (seconds)

Time (seconds)

Analysis using sinusoid with 400 Hz→ low correlation→ small Fourier coefficient

Fourier TransformExample: Piano tone (C4, 261.6 Hz)

Analysis using sinusoid with 523 Hz→ high correlation→ large Fourier coefficient

Time (seconds)

Time (seconds)

Fourier TransformRole of phase

Time (seconds)

Phase

Mag

nitu

de

0

0.5

025

-0.5

-0.25

0 10.50.25 0.75

Fourier TransformRole of phase

Time (seconds)

Phase

Mag

nitu

de

0

0.5

025

-0.5

-0.25

0 10.50.25 0.75

Analysis with sinusoid having frequency 262 Hz and phase φ = 0.05

Fourier TransformRole of phase

Time (seconds)

Phase

Mag

nitu

de

0

0.5

025

-0.5

-0.25

0 10.50.25 0.75

Analysis with sinusoid having frequency 262 Hz and phase φ = 0.24

Fourier TransformRole of phase

Time (seconds)

Phase

Mag

nitu

de

0

0.5

025

-0.5

-0.25

0 10.50.25 0.75

Analysis with sinusoid having frequency 262 Hz and phase φ = 0.45

Fourier TransformRole of phase

Time (seconds)

Phase

Mag

nitu

de

0

0.5

025

-0.5

-0.25

0 10.50.25 0.75

Analysis with sinusoid having frequency 262 Hz and phase φ = 0.6

Each sinusoid has a physical meaningand can be described by three parameters:

Fourier Transform

Complex formulation of sinusoids:

Polar coordinates:

Re

Im

Fourier Transform

Signal

Fourier representation

Fourier transform

Fourier Transform

Tells which frequencies occur, but does not tell when the frequencies occur.

Frequency information is averaged over the entiretime interval.

Time information is hidden in the phase

Signal

Fourier representation

Fourier transform

Fourier Transform

Frequency (Hz)

Time (seconds) Time (seconds)

Frequency (Hz)

Idea (Dennis Gabor, 1946):

Consider only a small section of the signal for the spectral analysis

→ recovery of time information

Short Time Fourier Transform (STFT)

Section is determined by pointwise multiplication of the signal with a localizing window function

Short Time Fourier Transform Short Time Fourier Transform

Frequency (Hz)

Time (seconds)

Short Time Fourier Transform

Frequency (Hz)

Time (seconds)

Short Time Fourier Transform

Frequency (Hz)

Time (seconds)

Short Time Fourier Transform

Frequency (Hz)

Time (seconds)

Short Time Fourier Transform

Frequency (Hz)

Time (seconds)

Short Time Fourier Transform

Frequency (Hz)

Time (seconds)

Short Time Fourier Transform

Frequency (Hz)

Time (seconds)

Short Time Fourier Transform

Frequency (Hz)

Time (seconds)

Short Time Fourier Transform

Frequency (Hz)Time (seconds)

Window functions

Rectangular window

Triangular window

Hann window

Short Time Fourier Transform

Frequency (Hz)Time (seconds)

Window functions

→ Trade off between smoothing and “ringing”

Definition

Signal

Window function ( , )

STFT

with for

Short Time Fourier Transform

Short Time Fourier TransformIntuition:

Freq

uenc

y(H

z)

Time (seconds)

4

8

12

1 2 3 4 5 60

0

is “musical note” of frequency ω centered at time t Inner product measures the correlation

between the musical note and the signal

Time-Frequency RepresentationFr

eque

ncy

(Hz)

Time (seconds)

Spectrogram

Time-Frequency Representation

Intensity (dB)

Freq

uenc

y(H

z)

Time (seconds)

Spectrogram

Time-Frequency RepresentationChirp signal and STFT with Hann window of length 50 ms

Freq

uenc

y(H

z)

Time (seconds)

Time-Frequency RepresentationChirp signal and STFT with box window of length 50 ms

Freq

uenc

y(H

z)

Time (seconds)

Size of window constitutes a trade-off between time resolution and frequency resolution:

Large window : poor time resolutiongood frequency resolution

Small window : good time resolutionpoor frequency resolution

Heisenberg Uncertainty Principle: there is nowindow function that localizes in time andfrequency with arbitrary precision.

Time-Frequency RepresentationTime-Frequency Localization

Time-Frequency RepresentationSignal and STFT with Hann window of length 20 ms

Freq

uenc

y(H

z)

Time (seconds)

Time-Frequency RepresentationSignal and STFT with Hann window of length 100 ms

Freq

uenc

y(H

z)

Time (seconds)

Audio FeaturesExample: C-major scale (piano)

Time (seconds)

Freq

uenc

y(H

z)

Spectrogram

Audio FeaturesExample: Chromatic scale

Freq

uenc

y (H

z)

Inte

nsity

(dB

)

Inte

nsity

(dB

)

Freq

uenc

y(H

z)

Time (seconds)

C124

C236

C348

C460

C572

C684

C796

C8108

Spectrogram

C124

C236

C348

C460

C572

C684

C796

C8108

Audio FeaturesExample: Chromatic scale

Freq

uenc

y (H

z)

Inte

nsity

(dB

)

Inte

nsity

(dB

)

Freq

uenc

y(H

z)

Time (seconds)

SpectrogramC348

Audio Features

Model assumption: Equal-tempered scale

MIDI pitches:

Piano notes: p = 21 (A0) to p = 108 (C8)

Concert pitch: p = 69 (A4) ≙ 440 Hz

Center frequency:

→ Logarithmic frequency distributionOctave: doubling of frequency

Hz

Audio Features

Idea: Binning of Fourier coefficients

Divide up the fequency axis intologarithmically spaced “pitch regions”and combine spectral coefficientsof each region to a single pitch coefficient.

Audio FeaturesTime-frequency representation

Windowing in the time domain

Window

ing in the frequency domain

Audio Features

C124

C236

C348

C460

C572

C684

C796

C8108

Example: Chromatic scale

Freq

uenc

y (H

z)

Inte

nsity

(dB

)

Inte

nsity

(dB

)

Freq

uenc

y(H

z)

Time (seconds)

Spectrogram

Audio FeaturesExample: Chromatic scale

C4: 262 HzC5: 523 Hz

C6: 1046 Hz

C7: 2093 Hz

C8: 4186 Hz

C3: 131 Hz

Inte

nsity

(dB

)

Time (seconds)

SpectrogramC124

C236

C348

C460

C572

C684

C796

C8108

Audio FeaturesExample: Chromatic scale

C4: 262 Hz

C5: 523 Hz

C6: 1046 Hz

C7: 2093 Hz

C8: 4186 Hz

C3: 131 Hz Inte

nsity

(dB

)

Time (seconds)

Log-frequency spectrogramC124

C236

C348

C460

C572

C684

C796

C8108

Audio Features

Note MIDI pitch

Center [Hz] frequency

Left [Hz] boundary

Right [Hz] boundary

Width [Hz]

A3 57 220.0 213.7 226.4 12.7

A#3 58 233.1 226.4 239.9 13.5

B3 59 246.9 239.9 254.2 14.3

C4 60 261.6 254.2 269.3 15.1

C#4 61 277.2 269.3 285.3 16.0

D4 62 293.7 285.3 302.3 17.0

D#4 63 311.1 302.3 320.2 18.0

E4 64 329.6 320.2 339.3 19.0

F4 65 349.2 339.3 359.5 20.2

F#4 66 370.0 359.5 380.8 21.4

G4 67 392.0 380.8 403.5 22.6

G#4 68 415.3 403.5 427.5 24.0

A4 69 440.0 427.5 452.9 25.4

Frequency ranges for pitch-based log-frequency spectrogram

Audio FeaturesChroma features

Chromatic circle Shepard’s helix of pitch

Audio FeaturesChroma features

Human perception of pitch is periodic in the sense that two pitches are perceived as similar in color if they differ by an octave.

Seperation of pitch into two components: tone height (octave number) and chroma.

Chroma : 12 traditional pitch classes of the equal-tempered scale. For example:Chroma C

Computation: pitch features chroma featuresAdd up all pitches belonging to the same class

Result: 12-dimensional chroma vector.

Audio FeaturesChroma features

Audio FeaturesChroma features

C2 C3 C4

Chroma C

Audio FeaturesChroma features

C#2 C#3 C#4

Chroma C#

Audio FeaturesChroma features

D2 D3 D4

Chroma D

Audio FeaturesExample: Chromatic scale

Pitc

h (M

IDI n

ote

num

ber)

Inte

nsity

(dB

)

Time (seconds)

Log-frequency spectrogramC124

C236

C348

C460

C572

C684

C796

C8108

Audio FeaturesExample: Chromatic scale

Chroma C

Inte

nsity

(dB

)

Pitc

h (M

IDI n

ote

num

ber)

Time (seconds)

Log-frequency spectrogramC124

C236

C348

C460

C572

C684

C796

C8108

Audio FeaturesExample: Chromatic scale

Chroma C#

Inte

nsity

(dB

)

Pitc

h (M

IDI n

ote

num

ber)

Time (seconds)

Log-frequency spectrogramC124

C236

C348

C460

C572

C684

C796

C8108

Audio FeaturesExample: Chromatic scale

Chromagram

Inte

nsity

(dB

)

Time (seconds)

Chr

oma

C124

C236

C348

C460

C572

C684

C796

C8108

Audio FeaturesChroma features

Freq

uenc

y(p

itch)

Time (seconds)

Audio FeaturesChroma features

Freq

uenc

y(p

itch)

Time (seconds)

Audio FeaturesChroma features

Time (seconds)

Chr

oma

Sequence of chroma vectors correlates to theharmonic progression

Normalization → makes features invariantto changes in dynamics

Further denoising and smoothing

Taking logarithm before adding up pitch coefficientsaccounts for logarithmic sensation of intensity

Audio FeaturesChroma features

Audio Features

For a positive constantthe logarithmic compression

is defined by

A value is replacedby a compressed value

Logarithmic compression

Audio Features

γ = 1Identity

γ = 100γ = 10

Com

pres

sed

valu

es

Original values

For a positive constantthe logarithmic compression

is defined by

A value is replacedby a compressed value

The higherthe stronger the compression

Logarithmic compression

Audio Features

γ = 1Identity

γ = 100γ = 10

Com

pres

sed

valu

es

Original values

The higherthe stronger the compression

A value is replacedby a compressed value

Original chromagram

Logarithmic compression

Audio FeaturesNormalization

Replace a vectorby the normalized vector

using a suitable norm

Chroma vectorExample:

Euclidean norm

Audio FeaturesNormalization

Replace a vectorby the normalized vector

using a suitable norm

Chroma vectorExample:

Euclidean norm

Example: C4 played by piano

Chromagram

Normalized chromagram

Audio FeaturesNormalization

Replace a vectorby the normalized vector

using a suitable norm

Chroma vectorExample:

Euclidean norm

Example: C4 played by piano

Log-chromagram

Normalized log-chromagram

Karajan

Audio FeaturesChroma features (normalized)

Scherbakov

Audio FeaturesChroma features

Time (seconds)

Chromagram

Chromagram after logarithmic compression and normalization

Chromagram based on a piano tuned 40 cents upwards

Chromagram after applying a cyclic shift of four semitones upwards

Audio Features

There are many ways to implement chroma features Properties may differ significantly Appropriateness depends on respective application

http://www.mpi-inf.mpg.de/resources/MIR/chromatoolbox/ MATLAB implementations for various chroma variants

Additional Material

Inner Product

Length of a vector Angle betweentwo vectors

Orthogonality oftwo vectors

for

Inner Product

Time (seconds)

Measuring the similarity of two functions

→ Area mostly positive and large→ Integral large→ Similarity high

Inner Product

Time (seconds)

Measuring the similarity of two functions

→ Area positive and negative→ Integral small→ Similarity low

Discretization

Time (seconds)

Am

plitu

de

Discretization

Sampling period Time (seconds)

Am

plitu

de

Sampling

Discretization

Time (seconds)

Am

plitu

de

Sampling period

Sampling

DiscretizationQuantization

Quantizationstep size

Time (seconds)

Am

plitu

de

Sampling period

DiscretizationQuantization

Quantizationstep size

Time (seconds)

Am

plitu

de

Sampling period

DiscretizationSampling

CT-signal

Sampling period

Equidistant sampling,

DT-signal

Sample taken at time

Sampling rate

DiscretizationAliasing

Time (seconds)

Am

plitu

de

Original signal

DiscretizationAliasing

Time (seconds)

Am

plitu

de

Original signal

Sampled signal using a sampling rate of 12 Hz

DiscretizationAliasing

Time (seconds)

Am

plitu

de

Original signal

Reconstructed signal

Sampled signal using a sampling rate of 12 Hz

DiscretizationAliasing

Time (seconds)

Am

plitu

de

Original signal

Reconstructed signal

Sampled signal using a sampling rate of 6 Hz

DiscretizationAliasing

Time (seconds)

Am

plitu

de

Original signal

Reconstructed signal

Sampled signal using a sampling rate of 3 Hz

DiscretizationIntegrals and Riemann sums

Time (seconds)0 1 2 3 4 5 6 7 8 9

CT-signal

DiscretizationIntegrals and Riemann sums

Time (seconds)0 1 2 3 4 5 6 7 8 9

CT-signalIntegral (total area)

DiscretizationIntegrals and Riemann sums

Time (seconds)0 1 2 3 4 5 6 7 8 9

CT-signal

DT-signals (obtained by 1-sampling)

Integral (total area)

DiscretizationIntegrals and Riemann sums

Time (seconds)0 1 2 3 4 5 6 7 8 9

CT-signal

DT-signals (obtained by 1-sampling)

Integral (total area)

Riemann sum (total area) → Approximation of integral

Discretization

Time (seconds)

First CT-signal and DT-signal

Second CT-signal and DT-signal

Product of CT-signals and DT-signals

Integrals and Riemann sums

Discretization

Time (seconds)

First CT-signal and DT-signal

Second CT-signal and DT-signal

Product of CT-signals and DT-signals

Riemann sumIntegral

Integrals and Riemann sums

Exponential FunctionPolar coordinate representation of a complex number

Exponential FunctionReal and imaginary part (Euler’s formula)

Exponential FunctionComplex conjugate number

Exponential FunctionAdditivity property

Fourier TransformChirp signal with λ = 0.003

Frequency (Hz)Time (seconds)

Frequency (Hz)Time (seconds)

CT-signal

DT-signal (1-sampling)

Magnitude Fourier transform

Magnitude Fourier transform

Fourier Transform

Frequency (Hz)Time (seconds)

Frequency (Hz)Time (seconds)

Chirp signal with λ = 0.004

CT-signal

DT-signal (1-sampling)

Magnitude Fourier transform

Magnitude Fourier transform

Fourier TransformDFT approximation of Fourier transform

CT-signal Magnitude Fourier transform

DT-signal (1-sampling) Magnitude Fourier transform

IndexIndex

Frequency (Hz)Time (seconds)

Fourier TransformDFT approximation of Fourier transform

CT-signal Magnitude Fourier transform

DT-signal (1-sampling) Magnitude Fourier transform

IndexIndex

Frequency (Hz)Time (seconds)

Fourier coefficient for frequency index and time frame

Fourier Transform

DT-signal

Window function of length

Discrete STFT

Hop size

Index corresponding to Nyquist frequency

Fourier TransformDiscrete STFT

= Hop size

Physical time position associated with :

Physical frequency associated with :

= Sampling rate(seconds)

(Hertz)

Fourier Transform

Time (seconds)

Time (seconds)

Index (frames)

Index (samples)

Freq

uenc

y(H

z)

Inde

x (fr

eque

ncy)

= 8= 32 Hz

= 64Parameters

Computational world Physical world

Discrete STFT

Log-Frequency SpectrogramPooling procedure for discrete STFT = 2048

= 44100 Hz

= 4096Parameters

p = 67

p = 68

p = 69

p = 70

Frames

Fpitch (69.5) = 452.9

Fpitch (68.5) = 427.5

Fpitch (67.5) = 403.5

Fcoef (42) = 452.2

Fcoef (41) = 441.4

Fcoef (43) = 463.0

Fcoef (40) = 430.7

Fcoef (39) = 419.9

Fcoef (38) = 409.1

Fcoef (37) = 398.4

Frames

Fast Fourier Transform Signal Spaces and Fourier Transforms