![Page 1: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/1.jpg)
Advanced Multimedia
Music Information Retrieval
Tamara Berg
![Page 2: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/2.jpg)
Announcements
• Still missing a few assignment 1’s
• Assignment 2 is online – due March 10
![Page 3: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/3.jpg)
Audio Indexing and Retrieval
• Motivation
• Features for representing audio:
  – Metadata
  – Low level features
  – High level audio features
• Example usage cases:
  – Audio classification
  – Music retrieval
![Page 4: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/4.jpg)
Howard Leung
![Page 5: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/5.jpg)
Content Based Music Retrieval
Extract music descriptions from a database of music documents.
Extract music description from a query music document.
Compute match between query and database descriptions.
Retrieve similar music documents to query.
Casey et al IEEE 2008
![Page 6: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/6.jpg)
MIR tasks
H: high level specificity – match specific instances of audio content.
M: mid-level specificity – match high level audio features like melody, but do not match audio content.
L: low specificity – match global (statistical) properties of the query
Different usage cases require different descriptions and matching schema.
Casey et al IEEE 2008
![Page 7: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/7.jpg)
Metadata
• Most common method of accessing music
• Can be rich and expressive
• When catalogues become very large, difficult to maintain consistent metadata
Useful for low specificity queries
Casey et al IEEE 2008
![Page 8: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/8.jpg)
Metadata
• Pandora.com – Uses metadata to estimate artist similarity and track similarity and creates personalized radio stations. Human-entered metadata of musical-cultural properties (20-30 minutes per track of an expert’s time – 50 person-years for 1 million tracks).
• User-contributed metadata repositories (Gracenote, MusicBrainz). Factual metadata (artist, album, year, title, duration). Cultural metadata (mood, emotion, genre, style).
• Automatic metadata methods – generate descriptions from community metadata automatically. Language analysis to associate noun and verb phrases with musical features (Whitman & Rifkin).
Casey et al IEEE 2008
![Page 9: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/9.jpg)
Content features
• Low level or high level
• Want features to be robust to certain changes in the audio signal (why?)
  – Noise
  – Volume
  – Sampling
• High level features will be more robust to changes, low level features will be less robust.
• Low level features will be easy to compute, high level features difficult.
![Page 10: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/10.jpg)
Low level audio features
• Low level measurements of audio signal that contain information about a musical work.
• Can be computed periodically (10-1000 ms intervals) or beat synchronous.
Casey et al IEEE 2008
In text analysis we had words; here we have to come up with our own set of features to compute from the audio signal!
![Page 11: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/11.jpg)
Example Low-Level Audio Features
Howard Leung
![Page 12: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/12.jpg)
Howard Leung
![Page 13: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/13.jpg)
Howard Leung
Zero-crossing rate: average number of times the signal crosses the zero amplitude value.
Indicator function: 1 if true, 0 otherwise.
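The zero-crossing measure above can be sketched as follows. This is a minimal illustration, assuming the signal is a plain list of samples; the function name, the normalization by the number of adjacent sample pairs, and the choice to treat a sample equal to 0 as non-negative are assumptions, since the exact formula on the slide's figure is not recoverable.

```python
def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(signal) < 2:
        return 0.0
    crossings = 0
    for prev, curr in zip(signal, signal[1:]):
        # Indicator: 1 if the sign changes between the two samples, 0 otherwise.
        # Assumption: a sample equal to 0 counts as non-negative.
        crossings += 1 if (prev >= 0) != (curr >= 0) else 0
    return crossings / (len(signal) - 1)


# Hypothetical toy signals: one that changes sign rarely, one that alternates.
slow = [1, 1, 1, -1, -1, -1, 1, 1]
fast = [1, -1, 1, -1, 1, -1, 1, -1]
print(zero_crossing_rate(slow), zero_crossing_rate(fast))
```

A signal that alternates sign on every sample gets the highest possible rate, while a slowly varying one gets a low rate, which is why this measure is often used as a cheap indicator of high-frequency content.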
![Page 14: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/14.jpg)
Howard Leung
![Page 15: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/15.jpg)
Frequency Domain Reminder
Signals can be decomposed into a weighted sum of sinusoids.
How much of each sinusoid is present describes the frequency spectrum of a signal.
Li & Drew
![Page 16: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/16.jpg)
Frequency domain features
• How do we get to frequency domain?
Time domain → Frequency domain
![Page 17: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/17.jpg)
DFT
Discrete Fourier Transform (DFT) of the audio
– Converts to a frequency representation
– DFT analysis occurs in terms of a number of equally spaced ‘bins’
– Each bin represents a particular frequency range
– DFT analysis gives the amount of energy in the audio signal that is present within the frequency range for each bin
Inverse Discrete Fourier Transform (IDFT)
– Converts from frequency representation back to audio signal.
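To make the bins concrete, here is a small sketch of the DFT and IDFT written directly from their textbook definitions, using only the Python standard library. The function names and the 16-sample test sinusoid are assumptions for illustration; a real system would use an FFT routine rather than this O(n²) loop.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier Transform: one complex coefficient per frequency bin."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT: converts bin coefficients back to a sampled signal."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


# Hypothetical test signal: a sinusoid completing 3 cycles in 16 samples.
n = 16
signal = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]

spectrum = dft(signal)
energy = [abs(c) ** 2 for c in spectrum]          # energy per bin
peak_bin = max(range(n // 2 + 1), key=lambda k: energy[k])
restored = [c.real for c in idft(spectrum)]       # back to the time domain
print("bin with most energy:", peak_bin)
```

With a sample rate of fs, bin k corresponds to frequencies around k * fs / n, so the bins are equally spaced as the slide states.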
![Page 20: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/20.jpg)
Howard Leung
![Page 21: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/21.jpg)
Filtering
Removes frequency components from some part of the spectrum
• Low pass filter – removes high frequency components from input and leaves only low in the output signal.
• High pass filter – removes low frequency components from input and leaves only high in the output signal.
• Band pass filter – keeps only a chosen band of the frequency spectrum and removes the components outside it.
![Page 22: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/22.jpg)
How could you do this using the FT and IFT?
Compute FT spectrum of input.
Zero out the part of the frequency spectrum that you want to filter out.
Compute the IFT of this modified spectrum -> output will be input with some frequency components removed.
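The three steps above can be sketched as a low pass filter. This is only an illustration under stated assumptions: the helper names, the `cutoff_bin` parameter, and the two-sinusoid test signal are hypothetical, and the DFT is the slow direct form rather than an FFT.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def lowpass(signal, cutoff_bin):
    """Keep bins up to cutoff_bin, zero out the rest, and transform back."""
    n = len(signal)
    spectrum = dft(signal)                              # step 1: FT of input
    # Bins k and n - k hold the same frequency (positive and negative half),
    # so the mask must treat them alike to keep the output real-valued.
    mask = [1 if min(k, n - k) <= cutoff_bin else 0 for k in range(n)]
    filtered = [s * m for s, m in zip(spectrum, mask)]  # step 2: FT(f) .* mask
    return [c.real for c in idft(filtered)]             # step 3: IFT


# Hypothetical input: a slow sinusoid (bin 2) plus a fast one (bin 10).
n = 32
low = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
high = [0.5 * math.sin(2 * math.pi * 10 * t / n) for t in range(n)]
mixed = [a + b for a, b in zip(low, high)]

output = lowpass(mixed, cutoff_bin=4)
print(max(abs(o - l) for o, l in zip(output, low)))  # close to zero
```

Flipping the mask condition would give a high pass filter, and keeping only a middle range of bins would give a band pass filter.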
![Page 29: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/29.jpg)
How could you do this using the FT and IFT?
f = input → FT(f)
FT(f) .* mask (values 1 or 0) = spectrum with some freq components zeroed out
IFT → o = frequency limited output
[Figure: spectrum plot with the zeroed-out components crossed out]
What kind of filter is this?
![Page 36: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/36.jpg)
How could you do this using the FT and IFT?
f = input → FT(f)
FT(f) .* mask (values 1 or 0) = spectrum with some freq components zeroed out
IFT → o = frequency limited output
[Figure: spectrum plot with the zeroed-out components crossed out]
What kind of filter is this?
![Page 37: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/37.jpg)
Filtering
Alternatively you can convolve the input signal with a filter to get frequency limited output signal.
Convolution (we’ll see this again for images):
(convolution demo)
f = 1 3 2 5 3 2 4 5 (signal)
g = 1/3 1/3 1/3 (filter)
![Page 43: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/43.jpg)
Filtering
Alternatively you can convolve the input signal with a filter to get frequency limited output signal.
Convolution:
(convolution demo)
f = 1 3 2 5 3 2 4 5
g = 1/3 1/3 1/3
f ★ g = - 2 10/3 10/3 10/3 3 11/3 -
What does this filter do?
![Page 44: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/44.jpg)
Filtering
Alternatively you can convolve the input signal with a filter to get frequency limited output signal.
Convolution:
(convolution demo)
f = 1 3 2 5 3 2 4 5 (signal)
g = 1/4 1/2 1/4 (filter)
![Page 50: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/50.jpg)
Filtering
Alternatively you can convolve the input signal with a filter to get frequency limited output signal.
Convolution:
(convolution demo)
f = 1 3 2 5 3 2 4 5
g = 1/4 1/2 1/4
f ★ g = - 2.25 3 3.75 3.25 2.75 3.75 -
In general filters will have a more complex effect on the output.
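The two convolution demos can be reproduced with a few lines of code. This is a sketch, assuming the output is only computed where the filter fully overlaps the signal (the "-" entries at both ends of the slide's result); the function name `convolve_valid` is hypothetical, and exact fractions are used so the results match the slide's values digit for digit.

```python
from fractions import Fraction


def convolve_valid(signal, kernel):
    """Slide the flipped kernel over the signal; keep only full overlaps."""
    k = len(kernel)
    flipped = kernel[::-1]  # convolution flips the filter (no effect if symmetric)
    return [sum(s * w for s, w in zip(signal[i:i + k], flipped))
            for i in range(len(signal) - k + 1)]


f = [1, 3, 2, 5, 3, 2, 4, 5]

# First demo: each output is the plain average of three neighbouring samples.
box = [Fraction(1, 3)] * 3
averaged = convolve_valid(f, box)

# Second demo: a weighted average that favours the centre sample.
weighted = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
smoothed = convolve_valid(f, weighted)

print([str(v) for v in averaged])
print([float(v) for v in smoothed])
```

Both filters replace each sample with an average of its neighbourhood, which smooths rapid changes in the signal; in that sense they behave like low pass filters.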
![Page 51: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/51.jpg)
What is convolution doing?
![Page 55: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/55.jpg)
Relationship
f = input, F = FT(f)
g = filter, G = FT(g)
Signal space: f → f ★ g
Frequency space: F → F .* G
f ★ g = IFT(F .* G)
F .* G = FT(f ★ g)
Theorem: Convolution in signal space is equivalent to point-wise multiplication in frequency space.
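The theorem can be checked numerically. One caveat that the slide does not spell out: for the discrete transform, point-wise multiplication corresponds to circular convolution (indices wrap around), so in this sketch the filter is zero-padded to the signal length. The helper names are assumptions for illustration.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_convolve(f, g):
    """Convolution in signal space, with indices wrapping around."""
    n = len(f)
    return [sum(f[m] * g[(t - m) % n] for m in range(n)) for t in range(n)]


f = [1, 3, 2, 5, 3, 2, 4, 5]
g = [1 / 3, 1 / 3, 1 / 3, 0, 0, 0, 0, 0]  # averaging filter, zero-padded

direct = circular_convolve(f, g)                        # f ★ g
F, G = dft(f), dft(g)
via_ft = [c.real for c in idft([a * b for a, b in zip(F, G)])]  # IFT(F .* G)

print(max(abs(a - b) for a, b in zip(direct, via_ft)))  # close to zero
```

Entries 2 through 7 of `direct` involve no wrap-around, so they agree with the averaging demo values 2, 10/3, 10/3, 10/3, 3, 11/3.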
![Page 56: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/56.jpg)
Matlab demo
soundFilt/demo.m
![Page 57: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/57.jpg)
Howard Leung
![Page 58: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/58.jpg)
Howard Leung
![Page 60: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/60.jpg)
Pitch-Class Profile (PCP)
• Represent the energy due to each pitch class
• Integrates the energy in all octaves into a single band
• There are 12 equally spaced pitch classes in western tonal music. So, typically 12 bands in the PCP.
How might we calculate this using the DFT?
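One possible answer, sketched under stated assumptions: take the DFT, convert each bin's centre frequency to a number of semitones relative to a reference pitch, fold that number into one octave (modulo 12), and add the bin's energy to the matching band. The function name, the 440 Hz reference (so band 0 is the pitch class of A), and the test tones are hypothetical; practical systems refine this considerably.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def pitch_class_profile(signal, sample_rate, ref_freq=440.0):
    """12-band profile: energy of every DFT bin folded into one octave."""
    n = len(signal)
    spectrum = dft(signal)
    pcp = [0.0] * 12
    for k in range(1, n // 2):              # skip DC, use positive frequencies
        freq = k * sample_rate / n          # centre frequency of bin k
        semitones = 12 * math.log2(freq / ref_freq)
        pitch_class = round(semitones) % 12  # same class in every octave
        pcp[pitch_class] += abs(spectrum[k]) ** 2
    return pcp


# Hypothetical test tones, chosen so each frequency falls exactly on a bin.
fs, n = 3520, 64
tone_a = [math.sin(2 * math.pi * 440 * t / fs) + math.sin(2 * math.pi * 880 * t / fs)
          for t in range(n)]                 # A in two octaves
tone_e = [math.sin(2 * math.pi * 660 * t / fs) for t in range(n)]  # E above A

pcp_a = pitch_class_profile(tone_a, fs)
pcp_e = pitch_class_profile(tone_e, fs)
print(pcp_a.index(max(pcp_a)), pcp_e.index(max(pcp_e)))
```

Both octaves of A land in the same band, which is the "integrates the energy in all octaves" property from the slide.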
![Page 61: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/61.jpg)
Howard Leung
![Page 62: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/62.jpg)
High level music features
High level intuitive information about a piece of music (melody, harmony etc).
“It is melody that enables us to distinguish one work from another. It is melody that human beings are innately able to reproduce by singing, humming, and whistling. It is melody that makes music memorable: we are likely to recall a tune long after we have forgotten its text.”
-Selfridge-Field
Intuitive features, but hard to extract and ongoing areas of research.
Casey et al IEEE 2008
![Page 63: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/63.jpg)
Melody & Bass Estimation
• Melody and bass lines represented as continuous temporal trajectory of fundamental frequency, F0 (a series of musical notes).
• PreFEst (Predominant-F0 Estimation method – Goto 1999)
  – Estimate the F0 trajectory in mid-high freq range of input -> melody.
  – Estimate the F0 trajectory in low freq range -> bass.
Casey et al IEEE 2008
![Page 64: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/64.jpg)
Chord Recognition
Musical performance is assumed to travel through a sequence of states.
A Hidden Markov Model (HMM), a probabilistic model good for modeling sequences of data (here, sequences of chords over time), is used to model these transitions and predict the best chord sequence given a set of observations (PCP).
Transition model – Probability of transitioning from one chord to another
Output model – Probability of a PCP given a chord.
Casey et al IEEE 2008
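A toy sketch of this idea, using the standard Viterbi algorithm to find the most probable chord sequence. Everything specific here is an assumption for illustration and not the model from the cited work: the three chord templates, the output model (share of PCP energy that falls on the chord's tones, which is a score rather than a properly normalized distribution), and the transition model (stay on the same chord with probability 0.8).

```python
import math

# Hypothetical chord templates: pitch classes (0 = C) of each triad.
CHORDS = {"C": (0, 4, 7), "F": (5, 9, 0), "G": (7, 11, 2)}


def output_prob(pcp, chord):
    """Toy output model: share of PCP energy on the chord's tones."""
    eps = 1e-9  # avoids log(0) when no energy matches
    return (sum(pcp[i] for i in CHORDS[chord]) + eps) / (sum(pcp) + eps)


def transition_prob(prev, curr, stay=0.8):
    """Toy transition model: chords tend to persist from frame to frame."""
    return stay if prev == curr else (1 - stay) / (len(CHORDS) - 1)


def best_chord_sequence(pcps):
    """Viterbi decoding in log space over a list of 12-band PCP frames."""
    names = list(CHORDS)
    score = {c: math.log(1 / len(names)) + math.log(output_prob(pcps[0], c))
             for c in names}
    backpointers = []
    for pcp in pcps[1:]:
        new_score, pointers = {}, {}
        for c in names:
            best_prev = max(names,
                            key=lambda p: score[p] + math.log(transition_prob(p, c)))
            pointers[c] = best_prev
            new_score[c] = (score[best_prev]
                            + math.log(transition_prob(best_prev, c))
                            + math.log(output_prob(pcp, c)))
        backpointers.append(pointers)
        score = new_score
    chord = max(names, key=lambda c: score[c])
    path = [chord]
    for pointers in reversed(backpointers):  # trace the best path backwards
        chord = pointers[chord]
        path.append(chord)
    return path[::-1]


def frame(tones):
    """Hypothetical helper: PCP frame with unit energy on the given tones."""
    pcp = [0.0] * 12
    for t in tones:
        pcp[t] = 1.0
    return pcp


# Two clean C frames, one ambiguous frame, then three clean G frames.
observations = [frame((0, 4, 7)), frame((0, 4, 7)), frame((0, 2, 4, 7)),
                frame((2, 7, 11)), frame((2, 7, 11)), frame((2, 7, 11))]
print(best_chord_sequence(observations))
```

The transition model is what lets the decoder smooth over a single ambiguous frame instead of switching chords on every noisy observation.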
![Page 65: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/65.jpg)
Chord Recognition
![Page 66: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/66.jpg)
Chord Recognition
![Page 67: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/67.jpg)
Music Structure
• Segment into temporal regions with some internal consistency
  – Beat segmentation
  – Verse, chorus, bridge
  – Speech vs music
• Uses:
  – Facilitate audio editing
  – Improve similarity measurements by removing irrelevant parts or selecting most representative parts (for recommender systems).
Casey et al IEEE 2008
![Page 68: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/68.jpg)
Music Structure
Detect repeated structures and label them as being the same.
![Page 70: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/70.jpg)
Music as vector of features
• Once again we represent (music) documents as a vector of numbers
  – Each entry (or set of entries) in this vector is a different feature
• To retrieve music documents given a query we can:
  – Find exact matches
  – Find nearest match
  – Find nearby matches
  – Train a classifier to recognize a given category (genre, style etc).
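The first three retrieval options can be sketched directly on feature vectors. This is a minimal illustration: the track names, the three-entry vectors, the use of Euclidean distance, and the `radius` parameter are all assumptions, and the classifier option is not shown.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_matches(database, query):
    return [name for name, vec in database.items() if vec == query]


def nearest_match(database, query):
    return min(database, key=lambda name: euclidean(database[name], query))


def nearby_matches(database, query, radius):
    """All documents within `radius` of the query, closest first."""
    hits = [name for name in database if euclidean(database[name], query) <= radius]
    return sorted(hits, key=lambda name: euclidean(database[name], query))


# Hypothetical database: each music document is a short feature vector.
database = {
    "track_a": [0.10, 0.80, 0.30],
    "track_b": [0.12, 0.75, 0.35],
    "track_c": [0.90, 0.10, 0.50],
}
query = [0.10, 0.80, 0.30]

print(exact_matches(database, query))
print(nearest_match(database, query))
print(nearby_matches(database, query, radius=0.1))
```

Exact matching suits high specificity tasks, while the nearest and nearby variants tolerate differences between the query and the stored description.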
![Page 71: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/71.jpg)
Audio Similarity
We have a description of a music document based on some set of features, now how do we compare two descriptions?
Casey et al IEEE 2008
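One common way to compare two descriptions is cosine similarity, which looks at the angle between the feature vectors and ignores their overall scale. The sketch below is an assumption about a reasonable choice, not the measure used in the cited work; the example vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero description matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical descriptions: `louder` is `original` with every entry doubled.
original = [1.0, 2.0, 3.0]
louder = [2.0, 4.0, 6.0]
different = [3.0, 0.0, -1.0]

print(cosine_similarity(original, louder))
print(cosine_similarity(original, different))
```

Because scaling a vector does not change its direction, a measure like this is unaffected by a uniform change in feature magnitude, which can help when features grow with playback volume.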
![Page 72: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/72.jpg)
Usage examples
![Page 73: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/73.jpg)
Howard Leung
![Page 74: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/74.jpg)
Howard Leung
![Page 75: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/75.jpg)
Howard Leung
![Page 76: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/76.jpg)
Howard Leung
![Page 77: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/77.jpg)
Howard Leung
![Page 78: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/78.jpg)
Howard Leung
![Page 79: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/79.jpg)
Query by humming
• Requires high level features because matches will not be exact
• Extract melody from dataset of songs
• Extract melody from hum
• Match by comparing similarities of melodies (nearby matches)
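The matching step can be sketched as follows, assuming the melodies have already been extracted as sequences of MIDI note numbers. Two design choices here are assumptions for illustration: melodies are compared by their note-to-note intervals, so a hum in a different key still matches, and similarity is measured with edit distance, so a few wrong notes are tolerated. The song names and note sequences are hypothetical.

```python
def intervals(notes):
    """Semitone steps between consecutive notes (independent of key)."""
    return [b - a for a, b in zip(notes, notes[1:])]


def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def rank_by_melody(database, hummed_notes):
    """Database song names ordered from most to least similar melody."""
    query = intervals(hummed_notes)
    return sorted(database,
                  key=lambda name: edit_distance(intervals(database[name]), query))


# Hypothetical melodies as MIDI note numbers.
database = {
    "song_one": [60, 60, 67, 67, 69, 69, 67],
    "song_two": [64, 64, 65, 67, 67, 65, 64],
}
# A hum of song_one, two semitones higher and with one wrong note.
hum = [62, 62, 69, 69, 71, 70, 69]

print(rank_by_melody(database, hum))
```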
![Page 80: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/80.jpg)
Copyright monitoring
• Compute fingerprints from database examples
• Compute fingerprint from query example
• Find exact matches
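A toy sketch of these three steps. The fingerprint used here is a stand-in chosen for illustration, not a production scheme: each frame is assumed to be a list of energies in a few frequency bands, and the fingerprint records only whether each band is louder than its neighbour. The function names and example data are hypothetical.

```python
def fingerprint(frames):
    """One bit per adjacent band pair: 1 if the lower band has more energy."""
    bits = []
    for bands in frames:
        bits.extend(1 if a > b else 0 for a, b in zip(bands, bands[1:]))
    return tuple(bits)


def build_index(database):
    """Map each database fingerprint to its track name for exact lookup."""
    return {fingerprint(frames): name for name, frames in database.items()}


def identify(index, query_frames):
    """Name of the track whose fingerprint matches exactly, or None."""
    return index.get(fingerprint(query_frames))


# Hypothetical database: per-frame band energies for two tracks.
database = {
    "track_a": [[5.0, 2.0, 3.0, 1.0], [1.0, 4.0, 2.0, 6.0]],
    "track_b": [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]],
}
index = build_index(database)

quieter_copy = [[e * 0.5 for e in bands] for bands in database["track_a"]]
unknown = [[1.0, 3.0, 2.0, 4.0], [2.0, 1.0, 4.0, 3.0]]

print(identify(index, quieter_copy), identify(index, unknown))
```

Because the bits depend only on which band is louder, a uniformly quieter copy yields the same fingerprint, so an exact-match lookup still finds it.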
![Page 81: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/81.jpg)
Best performing systems on MIREX 2007
Casey et al IEEE 2008
![Page 82: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/82.jpg)
Music Browsing
Musicream – UI for discovering and managing musical pieces.
User can select a disc and listen to it. By dragging a disc in the flow, the user can easily pick out other similar pieces (attach similar discs). This interaction allows a user to unexpectedly come across various pieces similar to other pieces the user likes.
Link to demo
Casey et al IEEE 2008
![Page 83: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/83.jpg)
Music Browsing
Musicrainbow – UI for discovering unknown artists.
Artists are mapped on a circular rainbow where colors represent different styles of music. Similar artists are mapped near each other.
User rotates rainbow by turning a knob.
Link to demo
Casey et al IEEE 2008
![Page 84: Advanced Multimedia Music Information Retrieval Tamara Berg](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649de45503460f94adab46/html5/thumbnails/84.jpg)
Howard Leung