AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing
Thursday, November 13, 2008
ASA 156: Statistical Approaches for Analysis of Music and Speech Audio Signals
AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing
Michael A. Casey
Digital Musics
Dartmouth College, Hanover, NH
Scalable Similarity
• 8M tracks in a commercial collection
• PByte of multimedia data
• Require passage-level retrieval (~2 bars)
• Require scalable nearest-neighbor methods
Specificity
• Partial track retrieval
• Alternate versions: remix, cover, live, album
• Task is mid-to-high specificity
Example: remixing
[Figure: Original Track, Remix 1, Remix 2, Remix 3]
Audio Shingles
A shingle is defined as $\mathbf{x}_i = [\mathbf{f}_i, \mathbf{f}_{i+1}, \ldots, \mathbf{f}_{i+l-1}] \in \mathbb{R}^{lm}$, with each frame $\mathbf{f}_i \in \mathbb{R}^m$: that is, concatenate l consecutive frames of m-dimensional features.
• Shingles provide contextual information about features
• Originally used for Internet search engines:
  • A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig, “Syntactic Clustering of the Web,” Computer Networks 29(8-13): 1157-1166 (1997)
• Related to N-grams: overlapping sequences of features
• Applied to the audio domain by Casey and Slaney:
  • M. Casey and M. Slaney, “The Importance of Sequences in Musical Similarity,” in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2006
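A minimal sketch of shingle construction, assuming the features arrive as a list of m-dimensional frames and that consecutive shingles advance by one frame (so they overlap by l - 1 frames). The hop size and the function name `make_shingles` are assumptions for illustration, not part of audioDB.

```python
from typing import List, Sequence


def make_shingles(frames: Sequence[Sequence[float]], l: int) -> List[List[float]]:
    """Concatenate every run of l consecutive m-dimensional frames into one
    (l*m)-dimensional shingle. Assumes a hop of one frame between shingles."""
    if l < 1:
        raise ValueError("shingle length l must be at least 1")
    shingles = []
    for i in range(len(frames) - l + 1):
        shingle: List[float] = []
        for frame in frames[i:i + l]:
            shingle.extend(frame)  # append this frame's m values in time order
        shingles.append(shingle)
    return shingles


# Three 2-dimensional frames with l = 2 give two 4-dimensional shingles.
print(make_shingles([[0, 1], [2, 3], [4, 5]], 2))  # [[0, 1, 2, 3], [2, 3, 4, 5]]
```

With m = 12 and l = 30, each shingle has 360 dimensions; a track shorter than l frames yields no shingles.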
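Audio Shingle Similarity
• $\mathbf{x}_i^{(Q)}$: a query shingle drawn from a query track Q
• $\{\mathcal{D}^{(n)}\}$: a database of audio tracks indexed by n
• $\mathbf{x}_j^{(n)}$: a database shingle from track n
Shingles are normalized to unit vectors, therefore:
$\|\mathbf{x}_i^{(Q)} - \mathbf{x}_j^{(n)}\|^2 = 2 - 2\,\mathbf{x}_i^{(Q)} \cdot \mathbf{x}_j^{(n)}$
Shingles have M = l·m dimensions, with m = 12 or 20 and l = 30 or 40.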
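A quick numerical check of the unit-norm identity, using random vectors of dimension M = 12 · 30 = 360 (synthetic data; the helper names are illustrative). One practical consequence: because squared distance is a decreasing function of the inner product for unit vectors, nearest-neighbor search over normalized shingles can equivalently rank by dot product.

```python
import math
import random
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a shingle to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance, computed directly."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sq_dist_unit(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared distance for unit vectors via ||x - y||^2 = 2 - 2 x.y."""
    return 2.0 - 2.0 * sum(a * b for a, b in zip(x, y))


rng = random.Random(1)
M = 12 * 30  # m = 12 features per frame, l = 30 frames per shingle
x = normalize([rng.gauss(0.0, 1.0) for _ in range(M)])
y = normalize([rng.gauss(0.0, 1.0) for _ in range(M)])
print(abs(sq_dist(x, y) - sq_dist_unit(x, y)) < 1e-9)  # True
```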
AudioDB: Shingle Nearest Neighbor Search
• Open source: Google “audioDB”
• Management of tracks, sequences, salience
• Automatic indexing parameters
• OMRAS2, Yahoo!, AWAL, CHARM, more…
• Web-services interface (SOAP / JSON)
• Implementation of LSH for large N ~ 1B
• 1-10 ms whole-track retrieval from 1B vectors
Whole-track similarity
• Often want to know which tracks are similar
• Similarity depends on the specificity of the task:
  • Distortion / filtering / re-encoding (high)
  • Remix with new audio material (mid)
  • Cover song: same song, different artist (mid)
Whole-track resemblance: radius-bounded search
Compute the number of shingle collisions between two tracks.
• Requires a threshold for considering shingles to be related
• Need a way to estimate relatedness (threshold) for the data set
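The collision-count formula itself did not survive in the slide text, so the sketch below adopts one hypothetical reading: a query shingle collides with track n when at least one shingle of track n lies within squared distance xthresh of it. It is brute force (an LSH index would replace the inner loop), and `collision_count` and `rank_tracks` are illustrative names, not audioDB functions.

```python
from typing import Dict, List, Sequence, Tuple

Shingle = Sequence[float]


def sq_dist(x: Shingle, y: Shingle) -> float:
    """Squared Euclidean distance between two shingles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def collision_count(query_track: Sequence[Shingle],
                    db_track: Sequence[Shingle],
                    xthresh: float) -> int:
    """Number of query shingles whose nearest neighbour in db_track lies within
    squared distance xthresh (an assumed definition of a 'collision')."""
    return sum(
        1 for q in query_track
        if any(sq_dist(q, s) <= xthresh for s in db_track)
    )


def rank_tracks(query_track: Sequence[Shingle],
                database: Dict[str, Sequence[Shingle]],
                xthresh: float) -> List[Tuple[str, int]]:
    """Rank database tracks (track id -> shingles) by collision count, highest first."""
    scores = {tid: collision_count(query_track, shingles, xthresh)
              for tid, shingles in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


query = [[0.0, 0.0], [1.0, 1.0]]
database = {"a": [[0.0, 0.1], [5.0, 5.0]], "b": [[0.0, 0.0], [1.0, 1.1]]}
print(rank_tracks(query, database, 0.05))  # [('b', 2), ('a', 1)]
```

The threshold does all the work here: too small and related tracks score zero, too large and unrelated tracks accumulate collisions.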
Statistical approaches to modeling distance distributions
Distribution of minimum distances
Database: 1.4 million shingles. The left bump is the distribution of minimum distances between 1000 randomly selected query shingles and this database. The right bump is a small sampling (1/98,000,000) of the full histogram of all distances.
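To see why the two bumps separate, the following small simulation uses random unit vectors (synthetic data, not the 1.4-million-shingle database; the dimension and set sizes are arbitrary choices). It collects every query-to-database squared distance and, separately, one minimum per query.

```python
import math
import random
from statistics import mean


def random_unit(rng: random.Random, dim: int) -> list:
    """Random direction: normalized vector of i.i.d. Gaussian components."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


rng = random.Random(0)
DIM, DB_SIZE, N_QUERIES = 20, 1000, 30
database = [random_unit(rng, DIM) for _ in range(DB_SIZE)]
queries = [random_unit(rng, DIM) for _ in range(N_QUERIES)]

all_dists = []  # every query-to-database squared distance (the "right bump")
min_dists = []  # one minimum per query (the "left bump")
for q in queries:
    d = [sq_dist(q, s) for s in database]
    all_dists.extend(d)
    min_dists.append(min(d))

print(f"mean of all squared distances:     {mean(all_dists):.3f}")  # near 2 for random unit vectors
print(f"mean of minimum squared distances: {mean(min_dists):.3f}")
```

Even with no related material in the database, the minimum sits well below the bulk of the distances, which is why a threshold must be calibrated against the minimum-value distribution rather than the full one.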
Radius-bounded retrieval performance: cover song (opus task)
• Performance depends critically on xthresh, the collision threshold
• Want to estimate xthresh automatically from unlabelled data
Order Statistics
• Minimum-value distribution is analytic
• Estimate the distribution parameters
• Substitute into the minimum-value distribution
• Define a threshold in terms of the false positive (FP) rate
• This gives an estimate of xthresh
Estimating xthresh from unlabelled data
• Use theoretical statistics
• Null hypothesis H0: shingles are drawn from unrelated tracks
• Assume elements are i.i.d. and normally distributed
• M-dimensional shingles with d effective degrees of freedom
• Squared distance distribution for H0
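The slide's equation did not survive extraction. Under the stated assumptions alone (elements i.i.d. normal with some variance σ², and the M correlated elements treated as d effectively independent ones), the squared distance between two unrelated shingles follows a scaled chi-square distribution, equivalently a gamma distribution with shape d/2 and scale 4σ². This is a reconstruction of the standard result, not necessarily the exact form used in the talk; σ² is a symbol introduced here.

```latex
\begin{aligned}
x_k - y_k &\sim \mathcal{N}(0,\,2\sigma^2)
  \;\Longrightarrow\; \|\mathbf{x}-\mathbf{y}\|^2 \sim 2\sigma^2\,\chi^2_d,\\
p(x) &= \frac{x^{d/2-1}\,e^{-x/(4\sigma^2)}}{(4\sigma^2)^{d/2}\,\Gamma(d/2)}, \qquad x>0,\\
\mathbb{E}\big[\|\mathbf{x}-\mathbf{y}\|^2\big] &= 2\sigma^2 d .
\end{aligned}
```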
ML for background distribution
• Likelihood for N data points (squared distances)
• d = effective degrees of freedom
• M = shingle dimensionality
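The likelihood on the slide is not recoverable, but if the background model is the scaled chi-square implied by the i.i.d. normal null hypothesis, ||x − y||² ~ 2σ²χ²_d (a gamma distribution with shape d/2 and scale 4σ²), then d and σ² can be estimated from N observed squared distances. This sketch assumes that model and swaps in a well-known closed-form approximation to the gamma maximum-likelihood shape estimate (accurate to roughly 1.5%) instead of an iterative solver; `fit_background` is a hypothetical name.

```python
import math
import random
from typing import Sequence, Tuple


def fit_background(sq_dists: Sequence[float]) -> Tuple[float, float]:
    """Approximate ML fit of D^2 ~ 2*sigma2*chi2_d, i.e. Gamma(shape=d/2, scale=4*sigma2).

    Shape uses the closed-form approximation
        k ~= (3 - s + sqrt((s - 3)^2 + 24 s)) / (12 s),  s = ln(mean x) - mean(ln x).
    Returns (d, sigma2). All squared distances must be strictly positive.
    """
    n = len(sq_dists)
    mean_x = sum(sq_dists) / n
    mean_log = sum(math.log(x) for x in sq_dists) / n
    s = math.log(mean_x) - mean_log  # >= 0 by Jensen's inequality
    if s <= 0.0:
        raise ValueError("need at least two distinct positive distances")
    shape = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    scale = mean_x / shape  # ML scale given the shape
    return 2.0 * shape, scale / 4.0  # d = 2 * shape, sigma2 = scale / 4


# Synthetic check: sample from the assumed model with d = 20, sigma2 = 0.05.
rng = random.Random(0)
samples = [rng.gammavariate(20 / 2, 4 * 0.05) for _ in range(20000)]
d_hat, sigma2_hat = fit_background(samples)
print(f"d ~= {d_hat:.1f}, sigma2 ~= {sigma2_hat:.4f}")
```

Because the fit needs only unlabelled distances, it matches the slide's goal of estimating the background from data that has not been annotated.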
Background distribution parameters
Minimum value over N samples
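For reference, the standard order-statistics result for the minimum of N independent draws from a background distribution with CDF F and density p. Treating the N distances from one query shingle as independent is an approximation assumed here.

```latex
\begin{aligned}
F_{\min}(x) &= \Pr\Big[\min_{1\le j\le N} D_j^2 \le x\Big] = 1 - \big(1 - F(x)\big)^{N},\\
p_{\min}(x) &= F_{\min}'(x) = N\,p(x)\,\big(1 - F(x)\big)^{N-1}.
\end{aligned}
```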
Minimum value distribution of unrelated shingles
Estimate of xthresh
• Set by a target false positive rate
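A sketch of the threshold computation under two assumptions not confirmed by the slides: the background squared distances follow a gamma distribution (the scaled chi-square implied by the i.i.d. normal null hypothesis), and the false positive rate is the probability that the minimum over N unrelated shingles falls below xthresh. Setting 1 − (1 − F(x))^N equal to the rate gives F(xthresh) = 1 − (1 − rate)^(1/N), which is solved here by bisection. `gamma_cdf` and `estimate_xthresh` are hypothetical helpers; the CDF uses the power series of the regularized lower incomplete gamma so that only the standard library is needed.

```python
import math


def gamma_cdf(x: float, shape: float, scale: float) -> float:
    """Regularized lower incomplete gamma P(shape, x/scale) via its power series.
    Adequate for the lower tail and the moderate shape values used here."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total


def estimate_xthresh(shape: float, scale: float, n_samples: int, fp_rate: float) -> float:
    """Squared-distance threshold below which the minimum of n_samples unrelated
    background distances falls with probability fp_rate (assumes 0 < fp_rate < 0.5)."""
    # 1 - (1 - F(x))**N = fp_rate  <=>  F(x) = 1 - (1 - fp_rate)**(1/N), computed stably.
    target = -math.expm1(math.log1p(-fp_rate) / n_samples)
    lo, hi = 0.0, shape * scale  # a gamma's median is below its mean, so the root lies below the mean
    for _ in range(100):  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example: d = 20 (shape 10), sigma2 = 0.05 (scale 0.2), 1.4 million background shingles, 1% FP rate.
t = estimate_xthresh(shape=10.0, scale=0.2, n_samples=1_400_000, fp_rate=0.01)
print(f"xthresh ~= {t:.4f}")
```

Larger databases push the threshold down, since more unrelated shingles give more chances for a spuriously close minimum.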
Unlabelled data experiment
• Unlabelled data set known to contain:
  • cover songs (same work, different performer)
  • near-duplicate recordings (misattribution, encoding)
• Estimate background distance distribution
• Estimate minimum value distribution
• Set xthresh so the FP rate is ≤ 1%
• Whole-track retrieval based on shingle collisions
Cover song retrieval
Scaling
• Locality sensitive hashing (LSH)
• Trades exact nearest-neighbor search for approximate NN to reduce time complexity
• 3 to 4 orders of magnitude speed-up
• No noticeable degradation in performance for the optimal radius threshold
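The slides do not detail audioDB's LSH implementation. As a generic illustration of a radius-parameterized scheme, the sketch below implements p-stable (Gaussian) LSH in the style of Datar et al. (2004): each hash is h(v) = ⌊(a·v + b)/w⌋, k hashes form a bucket key, L tables are probed, and the candidates are then checked exactly against the radius bound. The class name, `radius_query`, and the parameters k, L (`n_tables`), and w are illustrative; in such schemes the bucket width w is usually chosen relative to the search radius.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


class PStableLSH:
    """Gaussian (2-stable) LSH: h(v) = floor((a.v + b) / w), a ~ N(0, I), b ~ U[0, w)."""

    def __init__(self, dim: int, k: int, n_tables: int, w: float, seed: int = 0):
        rng = random.Random(seed)
        self.w = w
        self.tables = []  # each entry: (k random projections, bucket dict)
        for _ in range(n_tables):
            projections = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
                           for _ in range(k)]
            self.tables.append((projections, defaultdict(list)))

    def _key(self, projections, v: Vector) -> Tuple[int, ...]:
        # Concatenate k quantized projections into one bucket key.
        return tuple(math.floor((sum(a * x for a, x in zip(av, v)) + b) / self.w)
                     for av, b in projections)

    def add(self, item_id: int, v: Vector) -> None:
        for projections, buckets in self.tables:
            buckets[self._key(projections, v)].append(item_id)

    def candidates(self, v: Vector) -> Set[int]:
        """Union of the query's buckets across all L tables."""
        found: Set[int] = set()
        for projections, buckets in self.tables:
            found.update(buckets.get(self._key(projections, v), ()))
        return found


def radius_query(index: PStableLSH, vectors: Dict[int, Vector], q: Vector, r2: float) -> List[int]:
    """Keep only LSH candidates whose exact squared distance to q is within r2."""
    return sorted(i for i in index.candidates(q)
                  if sum((a - b) ** 2 for a, b in zip(vectors[i], q)) <= r2)


rng = random.Random(3)
vectors = {i: [rng.gauss(0.0, 1.0) for _ in range(8)] for i in range(50)}
index = PStableLSH(dim=8, k=4, n_tables=5, w=4.0)
for item_id, v in vectors.items():
    index.add(item_id, v)
print(radius_query(index, vectors, vectors[7], 0.0))  # [7]
```

Raising k makes buckets more selective (faster, lower recall); raising the number of tables recovers recall at extra cost. Only candidates sharing a bucket are compared exactly, which is where the speed-up over brute-force search comes from.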
LSH
Remix retrieval via LSH
Current deployment
• Large commercial collections:
  • AWAL: ~100,000 tracks
  • Yahoo!: 2M+ tracks, related-song classifier
• AudioDB: open-source, international consortium of developers
• Google: “audioDB”
Conclusions
• Radius-bounded retrieval model for tracks
• Shingles preserve temporal information; high d
• Implements mid-to-high specificity search
• Optimal radius threshold from order statistics, under the null hypothesis that shingles are drawn from unrelated tracks
• LSH requires a radius bound, which is estimated automatically
• Scales to 1B+ shingles using LSH
Thanks
Malcolm Slaney, Yahoo! Research Inc.
Christophe Rhodes, Goldsmiths, U. of London
Michela Magas, Goldsmiths, U. of London
Funding: EPSRC: EP/E02274X/1