mask: robust local features for audio fingerprinting
DESCRIPTION
Best paper award at ICME 2012TRANSCRIPT
![Page 1: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/1.jpg)
MASK: Robust Local Features for Audio Fingerprinting
Xavier Anguera, Antonio Garzón and Tomasz Adamek
Telefonica Research
![Page 2: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/2.jpg)
Outline
• What is audio fingerprinting• MASK proposal• Experiments• Conclusions
![Page 3: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/3.jpg)
?
watermarking fingerprinting
![Page 4: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/4.jpg)
What makes a good audio fingerprint?
![Page 5: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/5.jpg)
MASK FIngerprint
Robustness
EfficiencyCompactness
Discriminability
MASK == Masked Audio Spectral Keypoints
![Page 6: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/6.jpg)
Considered prior art
• Avery Wang, “An industrial strength audio search algorithm,” in Proc. ISMIR, 2003.
• Jaap Haitsma and Antonius Kalker, “A highly robust audio fingerprinting system,” in Proc. ISMIR, 2002.
• Shumeet Baluja and Michele Covell, “Waveprint: Efficient wavelet-based audio fingerprinting”, Proc. Pattern Recognition 41 (2008)
![Page 7: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/7.jpg)
![Page 8: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/8.jpg)
FFT 1024, bandwidth limited to 300-3KHz
10ms, 100ms Hamming window
18 or 34 MEL-spectrum bands
![Page 9: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/9.jpg)
Selection of salient spectral points
1. Detect all maximas2. Trim to the desired
number
(“beautiful”, Timit-female) time
Mel spectrum
α=0.98 σ=40
![Page 10: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/10.jpg)
Spectral masking around salient points
Si Sj
Spectral Regions• Include one or several time-
frequency values• Overlaps are allowed• The number of comparisons
defines the size of the fingerprint• Designed manually (for now)
![Page 11: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/11.jpg)
Current MASK regions
19 frames = 190ms
5 MEL bands
![Page 12: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/12.jpg)
![Page 13: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/13.jpg)
Fingerprint encoding
• 4-5 bits for the MEL band where the maxima is located
• 22 Bits obtained from spectral regions comparison
![Page 14: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/14.jpg)
Indexing and retrieval
011…00001…01
…100…11
(movie1, 10); (movie6, 4); (movie9, 1) ; (movie7, 34) (movie5, 35); (movie7, 80)…(movie9, 24); (movie3, 5); (movie8, 11)
Reference inverted file index
MASK FP Content ID
Time offset
DDBB
QUERY MASK0->011…001->100…11
…13->111…0114->011…0115->000…10
Exact matching
-Nq 0 Nq
Matching Segment end
Matching segment start
Hamming > 4 -> 0Hamming < 4 -> 1 Final score
![Page 15: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/15.jpg)
Experimental section
![Page 16: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/16.jpg)
database
NIST-TRECVID 2010-2011 data for video-copy detection– 400h reference videos– 1400 audio queries per year (201 unique videos X 7)– 7 audio transformations
• original• MP3 compression• MP3 compression + multiband companding• Bandwidth limit (500-3K) + single-band companding• Mixed with speech• Mixed with speech + multiband companding• Mixed with speech + bandwidth limit + monoband companding
![Page 17: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/17.jpg)
Metric & baseline
• normalized detection cost rate (NDCR) in balanced profile
• Compare results with a similar fingerprint to the Philips fingerprint
Jaap Haitsma and Antonius Kalker, “A highly robust audio fingerprinting system,” in Proc. International Symposium on Music Information Retrieval (ISMIR), 2002.
![Page 18: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/18.jpg)
![Page 19: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/19.jpg)
![Page 20: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/20.jpg)
Comparison per transformation
![Page 21: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/21.jpg)
Scores histogram
![Page 22: MASK: Robust Local Features for Audio Fingerprinting](https://reader033.vdocuments.site/reader033/viewer/2022052900/5561b1f0d8b42a46138b46b2/html5/thumbnails/22.jpg)
Conclusions
• A novel binary fingerprint is proposed to improve on some shortcomings from well reputed prior art.
• We show that we can extract the FP and use it for VCD with excellent results