Music Information Retrieval With Condor
Scott McCaulay, Joe Rinkovsky
Pervasive Technology Institute, Indiana University
Overview
PFASC is a suite of applications developed at IU to perform automated similarity analysis of audio files
Potential applications include organization of digital libraries, recommender systems, playlist generators, audio processing
PFASC is a project in the MIR field, an extension and adaptation of traditional Text Information Retrieval techniques to sound files
Elements of PFASC, specifically the file-by-file similarity calculation, have proven to be a very good fit with Condor
What We’ll Cover
Condor at Indiana University
Background on Information Retrieval and Music Information Retrieval
The PFASC project
PFASC and Condor, experience to date and results
Summary
Condor at IU
Initiated in 2003
Utilizes 2350 Windows Vista machines from IU’s Student Technology Clusters
Minimum 2GB memory, 100 Mb network
Available to students at 42 locations on the Bloomington campus 24 x 7
Student use is top priority, Condor jobs are suspended immediately on use
Costs to Support Condor at IU
Marginal annual cost to support the Condor Pool at IU is < $15K
Includes system administration, head nodes, file servers
Purchase and support of STC machines are funded from Student Technology Fees
Challenges to Making Good Use of Condor Resources at IU
Windows environment
– Research computing environment at IU is geared to Linux, or to exotic architectures
Ephemeral resources
– Machines are moderately to heavily used at all hours, longer jobs are likely to be preempted
Availability of other computing resources
– Local users are far from starved for cycles, limited motivation to port
Examples of Applications Supported on Condor at IU
Hydra Portal (2003)
– Job submission portal
– Suite of Bio apps, Blast, Meme, FastDNAml
Condor Render Portal (2006)
– Maya, Blender video rendering
PFASC (2008)
– Similarity analysis of audio files
Information Retrieval - Background
Science of organizing documents for search and retrieval
Dates back to 1880s (Hollerith)
Vannevar Bush, first US presidential science advisor, presages hypertext in “As We May Think” (1945)
The concept of automated text document analysis, organization and retrieval was met with a good deal of skepticism until the 1990s. Some critics now grudgingly concede that it might work
Calculating Similarity: The Vector Space Model
Each feature found in a file is assigned a weight based on the frequency of its occurrence in the file and how common that feature is in the collection
Similarity between files is calculated based on common features and their weights. If two files share features not common to the entire collection, their similarity value will be very high
This vector space model (Salton) is the basis of many text search engines, and also works well with audio files
For text files, features are words or character strings. For audio files, features are prominent frequencies within frames of audio or sequences of frequencies across frames.
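The weighting and comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not PFASC's implementation: it assumes a standard TF-IDF weighting and cosine similarity, and the feature names (`f440` and so on) and function names are made up for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn each list of features into a {feature: weight} vector."""
    n_docs = len(docs)
    # Document frequency: in how many files does each feature occur?
    df = Counter()
    for features in docs:
        df.update(set(features))
    vectors = []
    for features in docs:
        tf = Counter(features)
        vectors.append({
            # Frequent in this file, rare in the collection -> high weight.
            f: (count / len(features)) * math.log(n_docs / df[f])
            for f, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine between two sparse vectors; 0.0 to 1.0 for non-negative weights."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy collection (hypothetical): features could be words or quantized
# audio frequencies.
docs = [
    ["f440", "f880", "f440", "f220"],
    ["f440", "f880", "f110", "f220"],
    ["f220", "f55", "f55", "f33"],
]
vecs = tfidf_vectors(docs)
sim_01 = cosine_similarity(vecs[0], vecs[1])
sim_02 = cosine_similarity(vecs[0], vecs[2])
```

In this toy collection, files 0 and 1 share two features that the third file lacks, so they score well above zero. The feature `f220` occurs in every file, receives zero weight, and contributes nothing to any similarity.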
Some Digital Audio History
Uploaded to Compuserve 10/1985
– one of the most popular downloads at the time!
10 seconds of digital audio
Time to download (300 baud): 20 minutes
Time to load: 20 minutes (tape), 2 minutes (disk)
Storage space: 42K
From this to Napster in less than 15 years
Explosion of Digital Audio
[Chart: RIAA Sales Figures (millions), physical vs. digital, 1998-2008]
Digital audio today similar to text 15 years ago
Poised for 2nd phase of the digital audio revolution?
– Ubiquitous, easy to create, access, share
– Lack of tools to analyze, search or organize
How can we organize this enormous and growing volume of digital audio data for discovery and retrieval?
What’s done today
Pandora - Music Genome Project
– expert manual classification of ~ 400 attributes
Allmusic
– manual artist similarity classification by critics
last.fm – Audioscrobbler
– collaborative filtering from user playlists
iTunes Genius
– collaborative filtering from user playlists
What’s NOT done today
Any analysis (outside of research) of similarity or classification based on the actual audio content of song files
Possible Hybrid Solution
[Diagram: Automated Analysis, User Behavior, Manual Metadata]
Classification/Retrieval system could use elements of all three methods to improve performance
Music Information Retrieval
Applying traditional IR techniques for classification, clustering, similarity analysis, pattern matching, etc. to digital audio files
Recent field of study, has accelerated with the inception of the ISMIR conference in 2000 and MIREX evaluation in 2004.
Common Basis of an MIR System
Select very small segment of audio data, 20-40ms
Use fast Fourier transform (FFT) to convert to frequency data
This ‘frame’ of audio becomes the equivalent of a word in a text file for similarity analysis
The output of this ‘feature extraction’ process is input to various analysis or classification processes
PFASC additionally combines prominent frequencies from adjacent frames to create temporal sequences as features
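The steps above (frame the audio, transform to frequency data, keep the prominent frequencies, join adjacent frames) can be sketched as follows. This is an illustration under stated assumptions, not the PFASC algorithm: it uses a naive discrete Fourier transform in place of an FFT so that it needs only the standard library (the resulting magnitudes are the same, only slower to compute), and the sample rate, frame length, function names and test tones are all invented for the example.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude spectrum of one frame (naive DFT, bins 0 .. N/2 - 1)."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def prominent_bins(samples, frame_len, top=1):
    """Per frame, return the indices of the `top` strongest frequency bins."""
    result = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        mags = dft_magnitudes(samples[start:start + frame_len])
        ranked = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)
        result.append(tuple(sorted(ranked[:top])))
    return result


def temporal_features(frame_bins, span=2):
    """Join the prominent bins of `span` adjacent frames into one feature."""
    return [tuple(frame_bins[i:i + span])
            for i in range(len(frame_bins) - span + 1)]


# Synthetic signal (assumed values): 8 kHz sampling, 32 ms frames,
# a 250 Hz tone for two frames followed by a 1000 Hz tone for two frames.
RATE = 8000
FRAME_LEN = 256  # 256 / 8000 s = 32 ms
tones = [250.0, 250.0, 1000.0, 1000.0]
samples = [math.sin(2 * math.pi * freq * (t / RATE))
           for freq in tones for t in range(FRAME_LEN)]

bins = prominent_bins(samples, FRAME_LEN)   # one tuple of bins per frame
features = temporal_features(bins)          # sequences across frame pairs
```

Bin k corresponds to k × RATE / FRAME_LEN Hz, so the 250 Hz and 1000 Hz tones land in bins 8 and 32. The pair `((8,), (32,))` is the temporal feature that records the change of tone between the second and third frames.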
PFASC as an MIR Project
Parallel Framework for Audio Similarity Clustering
Initiated at IU in 2008
Team includes School of Library and Information Science (SLIS), Cognitive Science, School of Music and Pervasive Technology Institute (PTI)
Have developed MPI-based feature extraction algorithm, SVM classification, vector space similarity analysis, some preliminary visualization.
Wish list includes graphical workflow, job submission portal, use in MIR classes
PFASC Philosophy and Methodology
Provide an end-to-end framework for MIR, from workflow to visualization
Recognize temporal context as a critical element of audio and a necessary part of feature extraction
Simple concept, simple implementation, one highly configurable algorithm for feature extraction
Dynamic combination and tuning of results from multiple runs, user controlled weighting
Make good use of available cyberinfrastructure
Support education in MIR
PFASC Feature Extraction Example
[Chart: most prominent frequencies across the spectrum by genre (folk, hip hop, rock), vertical axis 0 to 1]
Summary of 450 files classified by genre, showing most prominent frequencies across spectrum
PFASC Similarity Matrix Example
          Hip Hop   Folk    Rock
Hip Hop   0.115     0.049   0.042
Folk      0.049     0.087   0.024
Rock      0.042     0.024   0.168
Audio file summarized as a vector of feature values, similarity calculated between vectors
Value is between 0.0 and 1.0, 0.0 = no commonality, 1.0 = files are identical
In the above example, same genre files had similarity scores 3.352 times higher than different genre files
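As a rough check on the table, the same-genre (diagonal) and different-genre (off-diagonal) entries can be averaged and compared. This is only a sketch: the slides do not say how the 3.352 figure was computed, and with the rounded table entries a simple average gives a ratio of roughly 3.2, so the quoted value was presumably derived from unrounded per-file scores or a different averaging.

```python
genres = ["Hip Hop", "Folk", "Rock"]
matrix = [
    [0.115, 0.049, 0.042],
    [0.049, 0.087, 0.024],
    [0.042, 0.024, 0.168],
]

# Diagonal entries compare files within one genre.
same = [matrix[i][i] for i in range(len(genres))]
# Upper-triangle entries compare files across genres (matrix is symmetric).
different = [matrix[i][j]
             for i in range(len(genres))
             for j in range(i + 1, len(genres))]

mean_same = sum(same) / len(same)
mean_different = sum(different) / len(different)
ratio = mean_same / mean_different
print(f"same genre: {mean_same:.3f}, different genre: {mean_different:.3f}, "
      f"ratio: {ratio:.2f}")
```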
Classification vs. Clustering
Most work in MIR involves classification, e.g. genre classification, an exercise that may be arbitrary and limited in value
Calculating similarity values among all songs in a library may be more practical for music discovery, playlist generation, grouping by combinations of selected features
Calculating similarity is MUCH more computationally intensive than categorization, comparing all songs in a library of 20,000 files requires ~200 million comparisons
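The ~200 million figure follows from counting unordered pairs: comparing every file with every other file once takes n(n-1)/2 comparisons. A short check, where the function name is illustrative and the 450 and 3,245 file counts are the run sizes mentioned elsewhere in these slides:

```python
def pair_count(n_files):
    """Number of unordered file pairs: n choose 2."""
    return n_files * (n_files - 1) // 2


for n in (450, 3245, 20000):
    print(f"{n:>6} files -> {pair_count(n):>11,} comparisons")
```

The count grows with the square of the library size, so doubling the number of files roughly quadruples the work.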
Using Condor for Similarity Analysis
Good fit for IU Condor resources, a very large number of short duration jobs
Jobs are independent, can be restarted and run in any order
Large number of available machines provides great wall clock performance advantage over IU supercomputers
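One way to see why such jobs tolerate preemption and reordering is to split the pairwise comparisons so that each job owns a fixed set of pairs. The sketch below assumes one job per file, which compares that file against every later file. The slides do not describe how PFASC actually partitions the work, so this split and the function names are hypothetical.

```python
def job_pairs(job_id, n_files):
    """Pairs handled by one job: file `job_id` against every later file."""
    return [(job_id, other) for other in range(job_id + 1, n_files)]


def run_in_any_order(order, n_files):
    """Simulate jobs finishing in an arbitrary order; collect covered pairs."""
    done = set()
    for job_id in order:
        done.update(job_pairs(job_id, n_files))
    return done


N_FILES = 450
forward = run_in_any_order(range(N_FILES), N_FILES)
# Reversed order, with job 7 "preempted" and run a second time.
shuffled = run_in_any_order(list(reversed(range(N_FILES))) + [7], N_FILES)
```

Both orders cover the same set of pairs, and rerunning a job only repeats work it already owned. With this particular split the jobs are uneven in size (job 0 handles 449 pairs, the last job none). A real partitioning might balance them differently.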
PFASC Performance and Resources
A recent run of 450 jobs completed in 16 minutes. Time to run in serial on a desktop machine would have been about 19 hours
Largest run to date contained 3,245 files, over 5 million song-to-song comparisons, completed in less than eight hours, would have been over 11 days on a desktop
Queue wait time for 450 processors on IU’s Big Red is typically several days, for 3000+ processors it would be up to a month
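The wall clock advantage implied by these figures can be worked out directly. The inputs are the rounded times quoted above, so the results are approximate. For the larger run both times are bounds ("over 11 days", "less than eight hours"), which makes the computed ratio a lower estimate.

```python
def speedup(serial_minutes, condor_minutes):
    """Ratio of estimated serial time to observed Condor wall clock time."""
    return serial_minutes / condor_minutes


small_run = speedup(19 * 60, 16)            # about 19 hours vs. 16 minutes
large_run = speedup(11 * 24 * 60, 8 * 60)   # over 11 days vs. under 8 hours

print(f"450 jobs:    about {small_run:.0f}x faster")
print(f"3,245 files: at least {large_run:.0f}x faster")
```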
Porting to Windows
Visualizing Results
PFASC Contributors
Scott McCaulay (Project Lead)
Ray Sheppard (MPI Programming)
Eric Wernert (Visualization)
Joe Rinkovsky (Condor)
Steve Simms (Storage & Workflow)
Kiduk Yang (Information Retrieval)
John Walsh (Digital Libraries)
Eric Isaacson (Music Cognition)
Thank you!