Music similarity: what for?

Building real applications for real users

[Figure: music perception and the four groups of factors around it]

music content
Examples: rhythm, timbre, melody, harmony, loudness, song lyrics

music context
Examples: semantic labels, performer's reputation, album cover artwork, artist's background, music video clips

user context
Examples: mood, activities, social context, spatio-temporal context, physiological aspects

user properties
Examples: music preferences, musical training, musical experience, demographics, opinion about performer, artist's popularity among friends

Concepts and models of similarity

•  Aim of the day: modeling similarity of musical content
   –  Challenges, goals
   –  Formal models vs. informal expert knowledge

(Schedl et al. 2011)

Outline

•  Music similarity in MIR → audio
•  Challenges
•  2 projects related to music similarity

Music similarity in Music Information Retrieval

(Casey et al. 2008) (Grosche et al. 2011)

“Help people find music”:
•  Specificity
•  Granularity / temporal scope

[Figure labels: 1st theme, 2nd theme, development, 2nd theme and conclusion, exposition, recapitulation, 1st theme pedal, global summary; axis: time scale]

Keyscape (Sapp 2005) (Martorell & Gómez 2011)


Different tasks and applications: (Grosche et al. 2011)

SIMILARITY

Music similarity measures

•  Task dependent:
   –  Content: audio, score, lyrics, etc.
   –  Musical facets: melody, rhythm, tonality, timbre, instrumentation.
   –  Descriptors.
   –  Weights.

Audio music similarity

1.  Low-level spectral descriptors: Aucouturier and Pachet (2004), Pampalk (2006)
   –  High specificity
   –  Global
   –  “Audio quality” (Urbano et al. 2014)
   –  “Timbre” → sound quality

[Figure: three panels, LTSA Flute − C4, LTSA Oboe − C4, LTSA Trumpet − C4; x-axis: Frequency (Hz), 0 to 7000; y-axis: Spectral magnitude (dB), −60 to −20]

(McAdams and Giordano 2008)
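A minimal sketch of the idea behind comparing instruments through their averaged spectra, assuming synthetic harmonic tones instead of real recordings. The sample rate, the harmonic amplitude profiles, the frame sizes, and all function names are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

SR = 16000   # sample rate in Hz (assumed for this sketch)
F0 = 261.63  # approximate fundamental frequency of C4 in Hz


def harmonic_tone(amplitudes, duration=1.0, sr=SR, f0=F0):
    """Synthesize a tone as a sum of harmonics with the given amplitudes."""
    t = np.arange(int(duration * sr)) / sr
    tone = np.zeros_like(t)
    for k, amp in enumerate(amplitudes, start=1):
        tone += amp * np.sin(2 * np.pi * k * f0 * t)
    return tone


def long_term_spectrum_db(signal, frame=2048, hop=1024):
    """Average the magnitude spectra of Hann-windowed frames, in dB."""
    window = np.hanning(frame)
    frames = np.array([signal[i:i + frame] * window
                       for i in range(0, len(signal) - frame + 1, hop)])
    mean_magnitude = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    return 20 * np.log10(mean_magnitude + 1e-12)


def spectral_distance(spec_a, spec_b):
    """Root-mean-square difference between two dB spectra."""
    return float(np.sqrt(np.mean((spec_a - spec_b) ** 2)))


# Hypothetical harmonic profiles: a mellow tone and a brighter one,
# both with the same pitch (C4).
mellow = long_term_spectrum_db(harmonic_tone([1.0, 0.3, 0.1, 0.03]))
bright = long_term_spectrum_db(harmonic_tone([1.0, 0.9, 0.8, 0.7, 0.6, 0.5]))
print(spectral_distance(mellow, bright))
```

Both tones share the same pitch, so any distance between the two spectra comes from how energy is spread over the harmonics, which is the kind of difference low-level spectral descriptors capture.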

Audio music similarity


2.  Incorporate mid-level musical descriptors:
   –  Rhythm: Foote (2002)
   –  Pitch: Müller et al. (2006), Serrà et al. (2007) → cover version identification, audio-score alignment

Cover version identification

(Gómez et al. 2006)
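Covers are often played in a different key, so a pitch-based comparison benefits from being transposition invariant. The sketch below shows one simple way to get that, assuming each track is summarized by a single 12-bin pitch-class profile (index 0 = C). Cover identification systems generally compare descriptor sequences over time; this only illustrates the key-invariance step, and the profile values and function names are hypothetical.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def best_transposition(query, reference):
    """Try the 12 circular shifts of the query profile and return the
    (shift in semitones, similarity) pair with the highest similarity."""
    scores = [cosine(np.roll(query, s), reference) for s in range(12)]
    shift = int(np.argmax(scores))
    return shift, scores[shift]


# Hypothetical weighted pitch-class profile of a piece in C major.
original = np.array([3.0, 0.0, 1.0, 0.0, 1.5, 1.0, 0.0, 2.0, 0.0, 1.0, 0.0, 1.0])
# The same profile transposed up a whole tone (D major).
cover = np.roll(original, 2)

shift, similarity = best_transposition(original, cover)
print(shift, round(similarity, 3))
```

Without the shift search, the two profiles look dissimilar even though they describe the same material in different keys.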

Approaches in audio music similarity


2.  Incorporate mid-level musical descriptors
3.  Combine those with semantic descriptors obtained by automatic classification (e.g. genre, instrument, mood): Bogdanov et al. (2013)

Personalization (Schedl et al. 2012)

1.  Let users control weights
   –  A lot of effort for a large number of descriptors
   –  Users have to make their preferences explicit
2.  Gather ratings of the similarity of pairs of songs → robustness (Urbano et al. 2010)
3.  Collection clustering: ask users to group songs in a 2D plot (Stober 2011)
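The first option, user-controlled weights, can be sketched as a weighted average of per-facet distances. The facet names, distance values, and weights below are made up for illustration; how each facet distance is computed is left open.

```python
def combined_distance(facet_distances, weights):
    """Weighted average of per-facet distances; weights need not sum to 1."""
    total = sum(weights[facet] for facet in facet_distances)
    if total <= 0:
        raise ValueError("at least one facet needs a positive weight")
    return sum(weights[facet] * dist
               for facet, dist in facet_distances.items()) / total


# Hypothetical per-facet distances from a query song to two candidates.
candidate_a = {"timbre": 0.2, "rhythm": 0.8, "tonality": 0.5}
candidate_b = {"timbre": 0.7, "rhythm": 0.1, "tonality": 0.5}

# Two hypothetical listeners who care about different facets.
timbre_listener = {"timbre": 1.0, "rhythm": 0.2, "tonality": 0.2}
rhythm_listener = {"timbre": 0.2, "rhythm": 1.0, "tonality": 0.2}

for name, weights in [("timbre", timbre_listener), ("rhythm", rhythm_listener)]:
    a = combined_distance(candidate_a, weights)
    b = combined_distance(candidate_b, weights)
    print(name, "listener prefers", "A" if a < b else "B")
```

The same pair of candidates is ranked differently by the two listeners, and each extra facet is one more weight to set, which is where the effort grows with the number of descriptors.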

Evaluation

•  Similarity vs categorization: artist, genre, instrument, covers, co-occurrence in personal collections and playlists (Berenzweig et al. 2003)
•  Surveys (Vignoli and Pauws 2005)

But, as “similarity is an ill-defined concept”, we should evaluate each task separately!

Audio Music Similarity Task

•  7000 30-second audio clips drawn from 10 genres: Blues, Jazz, Country/Western, Baroque, Classical, Romantic, Electronica, Hip-Hop, Rock, HardRock/Metal
•  Songs by the same artist are filtered out
•  Evaluation criteria:
   –  User ratings: not similar, somewhat similar, very similar
   –  Objective statistics: similarity in terms of genre, artist and album.
•  More in the talk by A. Flexer.
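One way to read the objective statistics with the artist filter is sketched below: for every query, take the k nearest songs that are not by the same artist and count how many share the query's genre. This is an illustrative toy, not the evaluation code used for the task; the function name, the toy distances, and the labels are assumptions.

```python
import numpy as np


def genre_precision_at_k(dist, genres, artists, k=5):
    """Mean fraction of the k nearest neighbours sharing the query's genre,
    skipping the query itself and songs by the same artist."""
    dist = np.asarray(dist, dtype=float)
    scores = []
    for q in range(len(genres)):
        candidates = [i for i in np.argsort(dist[q])
                      if i != q and artists[i] != artists[q]]
        top = candidates[:k]
        if top:  # queries with no eligible neighbour are skipped
            scores.append(np.mean([genres[i] == genres[q] for i in top]))
    return float(np.mean(scores))


# Toy collection: four songs placed on a line, distance = absolute difference.
positions = np.array([0.0, 0.1, 1.0, 1.1])
dist = np.abs(positions[:, None] - positions[None, :])
genres = ["jazz", "jazz", "rock", "rock"]

# Each song by a different artist: every nearest neighbour shares the genre.
print(genre_precision_at_k(dist, genres, ["A", "B", "C", "D"], k=1))
# The two jazz songs share an artist: the filter removes the easy match.
print(genre_precision_at_k(dist, genres, ["A", "A", "C", "D"], k=1))
```

The second call scores lower because the only same-genre neighbour of each jazz song is by the same artist and is therefore excluded.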

Tasks related to similarity

•  Audio cover identification
•  Audio classification
•  Query by singing (humming)
•  Query by tapping
•  Audio to score alignment
•  Discovery of repeated themes / sections
•  Structural segmentation
•  Audio fingerprinting
•  Symbolic melodic similarity
•  …

Challenges

1.  Music is multimodal, multi-faceted
2.  Similarity depends on
   a.  the user/listener,
   b.  the repertoire, and
   c.  the task

Use cases

Use case 1:
•  Repertoire: symphonic music
•  Modalities: audio, score, video, gestures
•  Task: structural analysis → visualization
•  Personalization: “experts” – listeners exposed to it (me) – naïve listeners (young people?)

Beethoven Symphony No. 3 Eroica http://phenicx.upf.edu/

Modalities
•  Audio: dynamics, timbre, tempo, f0 (Grachten et al. 2013) (Bosch & Gómez 2013)
•  Score: key, pitch-class sets, orchestration (Martorell and Gómez 2014)
•  Video: performers, movement (Bazzica, Liem and Hanjalic 2014)
•  Gestures: movement (Sarasúa and Guaus 2014)
•  Context: manual annotations (Schedl et al. 2014)

Strategies

•  Synchronization
•  Generate different layers of information
•  Personalization:
   –  Understand user needs: naïve listeners, music experts, performers
   –  Let them choose by means of visualization, interaction → HCI

Use case 2:
•  Repertoire: flamenco singing
•  Modalities: audio
•  Task: style and variant characterization
•  Personalization: “experts” – listeners exposed to it (me) – naïve listeners (you?)

http://mtg.upf.edu/research/projects/cofla

Melodic similarity


1.  Each style is characterized by a common melodic skeleton
2.  Spontaneous improvisation: ornamentation, prolongation, rhythmic and melodic modification

Antonio Mairena Chano Lobato

Melodic similarity – style

•  Ground truth: style annotations
•  Specific & standard measures:
   –  High-level expert specific features
   –  Fundamental frequency (Dynamic time warping)
   –  Symbolic-based descriptors
   –  Chroma similarity

(Huson 1998)
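Dynamic time warping is a natural fit when two renditions share a melodic skeleton but differ in how long notes are held. A minimal sketch of the textbook algorithm follows, assuming fundamental-frequency contours already converted to semitone values; the contours and the function name are made up for illustration and this is not the setup of the cited studies.

```python
import numpy as np


def dtw_distance(x, y):
    """Accumulated DTW cost between two 1-D sequences, using the absolute
    difference as local cost and the steps (1,0), (0,1) and (1,1)."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j],      # x advances
                                   acc[i, j - 1],      # y advances
                                   acc[i - 1, j - 1])  # both advance
    return float(acc[n, m])


# Hypothetical pitch contours in semitones (MIDI-like note numbers).
skeleton = [60, 62, 64, 62, 60]
prolonged = [60, 60, 62, 62, 64, 64, 62, 60]  # same notes, some held longer
different = [60, 59, 57, 59, 60]              # a different melodic shape

print(dtw_distance(skeleton, prolonged))
print(dtw_distance(skeleton, different))
```

Prolongation alone costs nothing under this measure, while ornamentation or melodic modification would add cost, which is why it is usually combined with other descriptors.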

Melodic similarity – variants

•  Ground truth:
   –  human judgements
   –  flamenco experts vs naïve listeners
•  Strongest agreement among experts, and different criteria → no consensus / general solution yet!
   –  Large scale user studies

(Gómez et al. 2012) (Kroher et al. 2014)

Conclusions

•  Music is multi-modal, multi-faceted, multi-layer
•  Similarity is not a general concept, but depends on
   –  the task
   –  the repertoire, and
   –  the listener! (and their context…)
