Crowd-sourcing for tempo estimation
DESCRIPTION
Slides for presentation at ISMIR 2011 of the paper "Improving perceptual tempo estimation with crowd-sourced annotations".
TRANSCRIPT
Improving perceptual tempo estimation with crowd-sourced annotations
Mark Levy, 26 October 2011
Tempo Estimation
Terminology: tempo = beats per minute = bpm
Tempo Estimation
Use crowd-sourcing to:
quantify the influence of metrical ambiguity on tempo perception
improve evaluation
improve algorithms
Perceived Tempo
Metrical ambiguity:
listeners don’t agree about bpm
typically in two camps
perceived values differ by a factor of 2 or 3
McKinney and Moelants: 24-40 subjects, released experimental data
Perceived Tempo
Metrical ambiguity:
[figure: histograms of listeners vs tapped bpm for example songs]
McKinney and Moelants, 2004
Machine-Estimated Tempo
Also affected by metrical ambiguity:
makes estimation difficult
natural to see multiple bpm values
estimated values often out by a factor of 2 or 3 (“octave error”)
Crowd Sourcing
Web-based questionnaire:
capture label choices
capture bpm from mean tapping interval
capture comparative judgements
Music:
over 4000 songs, 30-second clips
• rock, country, pop, soul, funk and rnb, jazz, latin, reggae, disco, rap, punk, electronic, trance, industrial, house, folk, ...
• recent releases back to the 60s
Response
First week (reported/released):
4k tracks annotated by 2k listeners
20k labels and bpm estimates
To date:
6k tracks annotated by 27k listeners
200k labels and bpm estimates
Analysis: ambiguity
When people tap to a song at different bpm, do they really disagree about whether it’s slow or fast?
Investigation:
inspect labels from people who tap differently
quantify disagreement for ambiguous songs
Analysis: ambiguity
Subset of slow/fast songs:
labelled by at least five listeners
majority label “slow” or “fast”
Analysis: ambiguity
[figure: bpm vs speed label, all estimates for slow/fast songs]
Analysis: ambiguity
[figure: bpm vs speed label, all estimates for slow/fast songs]
people can tap slowly to fast songs
Analysis: ambiguity
Labels for fast songs from slow-tappers
Analysis: ambiguity
Quantify disagreement over labels:
model conflict and extremity of tempo with a conflict coefficient
Ls, Lf, L: number of slow, fast, all labels for a song
C = (min(Ls, Lf) / max(Ls, Lf)) × (Ls + Lf) / L
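As a small sketch of the conflict coefficient: the slide's formula is partly garbled in this transcript, so the weighting below (label balance times the slow/fast fraction of all labels) is a reconstruction, not a quoted definition.

```python
def conflict_coefficient(n_slow, n_fast, n_total):
    # Balance between the slow and fast camps, weighted by the fraction
    # of all labels that are slow/fast at all. The exact weighting is a
    # reconstruction from the slide, not a quoted formula.
    if n_slow == 0 or n_fast == 0:
        return 0.0  # one camp is empty: no conflict
    balance = min(n_slow, n_fast) / max(n_slow, n_fast)
    return balance * (n_slow + n_fast) / n_total

print(conflict_coefficient(4, 4, 10))  # 0.8: evenly split camps
print(conflict_coefficient(8, 0, 10))  # 0.0: everyone agrees
```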
Analysis: ambiguity
[figure: distribution of conflict coefficient C, all songs with at least five labels]
C > 0 means a song received both slow and fast labels
Analysis: ambiguity
Subset of metrically ambiguous songs:
at least 30% of listeners tap at half/twice the majority estimate
Compared to the rest: no significant difference in C
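The ambiguous-subset criterion can be sketched as follows; taking the majority estimate as the median, and the matching tolerance, are simplifying assumptions not stated in the slides.

```python
from statistics import median

def is_metrically_ambiguous(bpm_estimates, threshold=0.30, tol=0.08):
    # The slide's criterion: at least 30% of listeners tap at half or
    # twice the majority estimate. Using the median as the majority
    # estimate and an 8% tolerance are assumptions for this sketch.
    majority = median(bpm_estimates)

    def near(x, target):
        return abs(x - target) <= tol * target

    off_octave = sum(near(b, majority / 2) or near(b, majority * 2)
                     for b in bpm_estimates)
    return off_octave / len(bpm_estimates) >= threshold

print(is_metrically_ambiguous([120] * 6 + [60] * 3))  # True: a third tap at half
print(is_metrically_ambiguous([120] * 9 + [61]))      # False: only one listener
```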
Evaluation metrics
MIREX metrics:
capture metrical ambiguity
replicate human disagreement
Ambiguity considered unhelpful in applications:
automatic playlisting
DJ tools, production tools
jogging
Evaluation metrics
Application-oriented:
compare with majority* human estimate (*median in most popular bin)
categorise machine estimates:
same as humans
twice as fast
twice as slow
three times as fast
and so on
unrelated to humans
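The categorisation step can be sketched like this; the 4% matching tolerance and the exact set of ratios are assumptions for illustration.

```python
def categorise(machine_bpm, human_bpm, tol=0.04):
    # Bin a machine estimate relative to the majority human estimate.
    # The tolerance and the ratio set are assumed, not from the slides.
    ratios = {"same": 1.0, "x2": 2.0, "/2": 0.5, "x3": 3.0, "/3": 1 / 3}
    for name, r in ratios.items():
        if abs(machine_bpm - r * human_bpm) <= tol * r * human_bpm:
            return name
    return "unrelated"

print(categorise(178, 89))   # x2: a classic octave error
print(categorise(90, 89))    # same
print(categorise(100, 140))  # unrelated
```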
Analysis: evaluation
Sources:
BPM List (DJ kit, human-moderated): Donny Brusca, 7th edition, 2011
EchoNest/MSD (closed-source algorithm): maybe Jehan et al.?
VAMP (open-source algorithm): Davies and Landone, 2007-
Analysis: machine vs human
[bar chart: percentage of machine estimates in each category (x2, same, /2, unrelated, other) for BPM List, VAMP and EchoNest]
Analysis: controlled test
Controlled comparison:
exploit experience from website A/B testing
use this to improve algorithms iteratively
Result is independent of any quality metric
Analysis: controlled test
When a visitor arrives at the page:
choose a source S at random
choose a bpm value at random
choose two songs given that value by S
display them together
Then ask which sounds faster!
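The page logic above can be sketched as follows; the data layout (source name mapped to bpm values mapped to song ids) is assumed for illustration.

```python
import random

def pick_pair(sources):
    # sources: {source name: {bpm value: [song ids]}} -- this layout is
    # an assumption for the sketch, not from the slides.
    name = random.choice(sorted(sources))
    by_bpm = {bpm: songs for bpm, songs in sources[name].items()
              if len(songs) >= 2}            # need two songs to compare
    bpm = random.choice(sorted(by_bpm))
    pair = random.sample(by_bpm[bpm], 2)     # two distinct songs, same claimed bpm
    return name, bpm, pair                   # then ask which sounds faster

example = {"BPM List": {120: ["song_a", "song_b", "song_c"]},
           "VAMP": {120: ["song_d", "song_e"]}}
print(pick_pair(example))
```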
Analysis: controlled test
Null hypothesis:
there will be presentation effects
listeners will attend to subtle differences
but these effects are independent of the source of bpm estimates if the quality of the sources is the same
Analysis: controlled test
[bar chart: percentage of "different" vs "same" judgements for BPM List, VAMP and EchoNest]
Analysis: improving estimates
Adjust bpm based on class:
imagine an accurate slow/fast classifier (Hockman and Fujinaga, 2010)
adjust as follows:
bpm := bpm/2 if slow and bpm > 100
bpm := bpm*2 if fast and bpm < 100
otherwise don’t adjust
simulation: accept majority human label
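The adjustment rule is simple enough to state directly in code; here the slow/fast classifier is simulated by accepting the majority human label, as on the slide.

```python
def adjust_bpm(bpm, label):
    # The slide's rule: halve implausibly high estimates for slow songs,
    # double implausibly low estimates for fast songs, around 100 bpm.
    if label == "slow" and bpm > 100:
        return bpm / 2
    if label == "fast" and bpm < 100:
        return bpm * 2
    return bpm

print(adjust_bpm(160, "slow"))  # 80.0: octave error on a slow song corrected
print(adjust_bpm(70, "fast"))   # 140: doubled for a fast song
print(adjust_bpm(120, "fast"))  # 120: left alone
```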
Analysis: adjusted vs human
[bar chart: percentage of adjusted machine estimates in each category (x2, same, /2, unrelated, other) for BPM List, VAMP and EchoNest]
Conclusions
Crowd sourcing:
gather thousands of data points in a few days, half a million over time
humans agree over slow/fast labels, even when they tap at different bpm
Improving machine estimates:
use controlled testing
exploit a slow/fast classifier
Thanks!
[email protected]
@gamboviol
http://mir-in-action.blogspot.com
http://playground.last.fm/demo/speedo
http://users.last.fm/~mark/speedo.tgz
We are looking for interns/research fellows!