2018 mueller mp-audiofeatures
TRANSCRIPT
![Page 1: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/1.jpg)
Music Processing
Meinard Müller
Lecture
Audio Features
International Audio Laboratories [email protected]
![Page 2: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/2.jpg)
Book: Fundamentals of Music Processing
Meinard MüllerFundamentals of Music ProcessingAudio, Analysis, Algorithms, Applications483 p., 249 illus., hardcoverISBN: 978-3-319-21944-8Springer, 2015
Accompanying website: www.music-processing.de
![Page 3: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/3.jpg)
Book: Fundamentals of Music Processing
Meinard MüllerFundamentals of Music ProcessingAudio, Analysis, Algorithms, Applications483 p., 249 illus., hardcoverISBN: 978-3-319-21944-8Springer, 2015
Accompanying website: www.music-processing.de
![Page 4: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/4.jpg)
Book: Fundamentals of Music Processing
Meinard MüllerFundamentals of Music ProcessingAudio, Analysis, Algorithms, Applications483 p., 249 illus., hardcoverISBN: 978-3-319-21944-8Springer, 2015
Accompanying website: www.music-processing.de
![Page 5: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/5.jpg)
Chapter 2: Fourier Analysis of Signals
2.1 The Fourier Transform in a Nutshell2.2 Signals and Signal Spaces2.3 Fourier Transform2.4 Discrete Fourier Transform (DFT)2.5 Short-Time Fourier Transform (STFT)2.6 Further Notes
Important technical terminology is covered in Chapter 2. In particular, weapproach the Fourier transform—which is perhaps the most fundamental toolin signal processing—from various perspectives. For the reader who is moreinterested in the musical aspects of the book, Section 2.1 provides a summaryof the most important facts on the Fourier transform. In particular, the notion ofa spectrogram, which yields a time–frequency representation of an audiosignal, is introduced. The remainder of the chapter treats the Fourier transformin greater mathematical depth and also includes the fast Fourier transform(FFT)—an algorithm of great beauty and high practical relevance.
![Page 6: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/6.jpg)
Chapter 3: Music Synchronization
3.1 Audio Features3.2 Dynamic Time Warping3.3 Applications3.4 Further Notes
As a first music processing task, we study in Chapter 3 the problem of musicsynchronization. The objective is to temporally align compatiblerepresentations of the same piece of music. Considering this scenario, weexplain the need for musically informed audio features. In particular, weintroduce the concept of chroma-based music features, which captureproperties that are related to harmony and melody. Furthermore, we study analignment technique known as dynamic time warping (DTW), a concept that isapplicable for the analysis of general time series. For its efficient computation,we discuss an algorithm based on dynamic programming—a widely usedmethod for solving a complex problem by breaking it down into a collection ofsimpler subproblems.
![Page 7: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/7.jpg)
Fourier Transform
Sinusoids
Time (seconds)
Time (seconds)
Idea: Decompose a given signal into a superpositionof sinusoids (elementary signals).
Signal
![Page 8: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/8.jpg)
Each sinusoid has a physical meaningand can be described by three parameters:
Fourier Transform
Sinusoids
Time (seconds)
![Page 9: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/9.jpg)
Each sinusoid has a physical meaningand can be described by three parameters:
Fourier Transform
Sinusoids
Time (seconds)
Time (seconds)
Signal
![Page 10: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/10.jpg)
Each sinusoid has a physical meaningand can be described by three parameters:
Fourier Transform
Frequency (Hz)
Fourier transform
1 2 3 4 5 6 7 8
1
0.5
Time (seconds)
Signal
![Page 11: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/11.jpg)
Fourier Transform
Frequency (Hz)
Time (seconds)
Example: Superposition of two sinusoids
![Page 12: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/12.jpg)
Fourier Transform
Frequency (Hz)
Time (seconds)
Example: C4 played by piano
![Page 13: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/13.jpg)
Fourier Transform
Frequency (Hz)
Time (seconds)
Example: C4 played by trumpet
![Page 14: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/14.jpg)
Fourier Transform
Frequency (Hz)
Time (seconds)
Example: C4 played by violin
![Page 15: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/15.jpg)
Fourier Transform
Frequency (Hz)
Example: C4 played by flute
Time (seconds)
![Page 16: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/16.jpg)
Fourier Transform
Frequency (Hz)
Example: C-major scale (piano)
Time (seconds)
![Page 17: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/17.jpg)
Fourier Transform
Signal
Fourier representation
Fourier transform
![Page 18: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/18.jpg)
Fourier Transform
Tells which frequencies occur, but does not tell when the frequencies occur.
Frequency information is averaged over the entiretime interval.
Time information is hidden in the phase
Signal
Fourier representation
Fourier transform
![Page 19: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/19.jpg)
Fourier Transform
Frequency (Hz)
Time (seconds) Time (seconds)
Frequency (Hz)
![Page 20: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/20.jpg)
Idea (Dennis Gabor, 1946):
Consider only a small section of the signal for the spectral analysis
→ recovery of time information
Short Time Fourier Transform (STFT)
Section is determined by pointwise multiplication of the signal with a localizing window function
Short Time Fourier Transform
![Page 21: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/21.jpg)
Short Time Fourier Transform
Frequency (Hz)
Time (seconds)
![Page 22: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/22.jpg)
Short Time Fourier Transform
Frequency (Hz)
Time (seconds)
![Page 23: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/23.jpg)
Short Time Fourier Transform
Frequency (Hz)
Time (seconds)
![Page 24: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/24.jpg)
Short Time Fourier Transform
Frequency (Hz)
Time (seconds)
![Page 25: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/25.jpg)
Definition
Signal
Window function ( , )
STFT
with for
Short Time Fourier Transform
![Page 26: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/26.jpg)
Short Time Fourier TransformIntuition:
Freq
uenc
y(H
z)
Time (seconds)
4
8
12
1 2 3 4 5 60
0
is “musical note” of frequency ω centered at time t Inner product measures the correlation
between the musical note and the signal
![Page 27: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/27.jpg)
Time-Frequency Representation
Time (seconds)
SpectrogramFr
eque
ncy
(Hz)
![Page 28: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/28.jpg)
Time-Frequency Representation
Time (seconds)
Freq
uenc
y(H
z)Spectrogram
Frequency (Hz)
![Page 29: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/29.jpg)
Time-Frequency Representation
Time (seconds)
Freq
uenc
y(H
z)
Intensity
Spectrogram
Frequency (Hz)
![Page 30: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/30.jpg)
Time-Frequency Representation
Time (seconds)
SpectrogramFr
eque
ncy
(Hz)
Intensity
![Page 31: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/31.jpg)
Time-Frequency Representation
Time (seconds)
SpectrogramFr
eque
ncy
(Hz)
Intensity (dB)
![Page 32: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/32.jpg)
Time-Frequency RepresentationSignal and STFT with Hann window of length 20 ms
Freq
uenc
y(H
z)
Time (seconds)
![Page 33: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/33.jpg)
Time-Frequency RepresentationSignal and STFT with Hann window of length 100 ms
Freq
uenc
y(H
z)
Time (seconds)
![Page 34: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/34.jpg)
Size of window constitutes a trade-off between time resolution and frequency resolution:
Large window : poor time resolutiongood frequency resolution
Small window : good time resolutionpoor frequency resolution
Heisenberg Uncertainty Principle: there is nowindow function that localizes in time andfrequency with arbitrary position.
Time-Frequency RepresentationTime-Frequency Localization
![Page 35: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/35.jpg)
Audio Features
C124
C236
C348
C460
C572
C684
C796
C8108
Example: Chromatic scaleFr
eque
ncy
(Hz)
Inte
nsity
(dB)
Inte
nsity
(dB)
Freq
uenc
y(H
z)
Time (seconds)
Spectrogram
![Page 36: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/36.jpg)
Audio Features
C124
C236
C348
C460
C572
C684
C796
C8108
Example: Chromatic scaleFr
eque
ncy
(Hz)
Inte
nsity
(dB)
Inte
nsity
(dB)
Freq
uenc
y(H
z)
Time (seconds)
Spectrogram
![Page 37: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/37.jpg)
Audio FeaturesExample: Chromatic scale
C4: 262 HzC5: 523 Hz
C6: 1046 Hz
C7: 2093 Hz
C8: 4186 Hz
C3: 131 Hz
Inte
nsity
(dB)
Time (seconds)
SpectrogramC124
C236
C348
C460
C572
C684
C796
C8108
![Page 38: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/38.jpg)
Audio FeaturesExample: Chromatic scale
C4: 262 Hz
C5: 523 Hz
C6: 1046 Hz
C7: 2093 Hz
C8: 4186 Hz
C3: 131 Hz Inte
nsity
(dB)
Time (seconds)
Log-frequency spectrogramC124
C236
C348
C460
C572
C684
C796
C8108
![Page 39: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/39.jpg)
Audio FeaturesExample: Chromatic scale
Pitc
h (M
IDI n
ote
num
ber)
Inte
nsity
(dB)
Time (seconds)
Log-frequency spectrogramC124
C236
C348
C460
C572
C684
C796
C8108
![Page 40: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/40.jpg)
Audio FeaturesExample: Chromatic scale
Chroma C
Inte
nsity
(dB)
Pitc
h (M
IDI n
ote
num
ber)
Time (seconds)
Log-frequency spectrogramC124
C236
C348
C460
C572
C684
C796
C8108
![Page 41: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/41.jpg)
Audio FeaturesExample: Chromatic scale
Chroma C#
Inte
nsity
(dB)
Pitc
h (M
IDI n
ote
num
ber)
Time (seconds)
Log-frequency spectrogramC124
C236
C348
C460
C572
C684
C796
C8108
![Page 42: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/42.jpg)
Audio FeaturesExample: Chromatic scale
Chromagram
Inte
nsity
(dB)
Time (seconds)
Chr
oma
C124
C236
C348
C460
C572
C684
C796
C8108
![Page 43: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/43.jpg)
Audio FeaturesChroma features
Chromatic circle Shepard’s helix of pitch
![Page 44: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/44.jpg)
Audio FeaturesChroma features
Freq
uenc
y(p
itch)
Time (seconds)
![Page 45: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/45.jpg)
Audio FeaturesChroma features
Freq
uenc
y(p
itch)
Time (seconds)
![Page 46: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/46.jpg)
Audio FeaturesChroma features
Time (seconds)
Chr
oma
![Page 47: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/47.jpg)
Audio FeaturesChroma features
Time (seconds)
Chr
oma
![Page 48: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/48.jpg)
Audio FeaturesChroma features
Chr
oma
Time (seconds)
Chr
oma
Effect of logarithmic compression and normalization
![Page 49: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/49.jpg)
Audio FeaturesChroma features
Chr
oma
Time (seconds)
Chr
oma
Piano tuned 40 cents upwards
![Page 50: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/50.jpg)
Audio FeaturesChroma features
Chr
oma
Time (seconds)
Chr
oma
Cylic shift of four semitones upwards
![Page 51: 2018 Mueller MP-AudioFeatures](https://reader034.vdocuments.site/reader034/viewer/2022042307/625b2761830e366cc9229f23/html5/thumbnails/51.jpg)
Audio Features
There are many ways to implement chroma features
Properties may differ significantly
Appropriateness depends on respective application
Chroma Toolbox (MATLAB)https://www.audiolabs-erlangen.de/resources/MIR/chromatoolbox
LibROSA (Python)https://librosa.github.io/librosa/
Feature learning: “Deep Chroma”[Korzeniowski/Widmer, ISMIR 2016]