Scott McCaulay, Joe Rinkovsky

Pervasive Technology Institute, Indiana University

Music Information Retrieval With Condor

Overview

PFASC is a suite of applications developed at IU to perform automated similarity analysis of audio files

Potential applications include organization of digital libraries, recommender systems, playlist generators, audio processing

PFASC is a project in the field of Music Information Retrieval (MIR), which extends and adapts traditional text information retrieval techniques to sound files

Elements of PFASC, specifically the file-by-file similarity calculation, have proven to be a very good fit with Condor

What We’ll Cover

Condor at Indiana University

Background on Information Retrieval and Music Information Retrieval

The PFASC project

PFASC and Condor, experience to date and results

Summary

Condor at IU

Initiated in 2003

Utilizes 2,350 Windows Vista machines from IU’s Student Technology Clusters (STC)

Minimum 2 GB memory, 100 Mb/s network

Available to students at 42 locations on the Bloomington campus 24 x 7

Student use is top priority; Condor jobs are suspended immediately when a student starts using the machine

Costs to Support Condor at IU

Marginal annual cost to support the Condor pool at IU is < $15K

Includes system administration, head nodes, file servers

Purchase and support of STC machines are funded from Student Technology Fees

Challenges to Making Good Use of Condor Resources at IU

Windows environment
– Research computing environment at IU is geared to Linux or to exotic architectures

Ephemeral resources
– Machines are moderately to heavily used at all hours, so longer jobs are likely to be preempted

Availability of other computing resources
– Local users are far from starved for cycles, so there is limited motivation to port

Examples of Applications Supported on Condor at IU

Hydra Portal (2003)
– Job submission portal
– Suite of bio apps: BLAST, MEME, fastDNAml

Condor Render Portal (2006)
– Maya, Blender video rendering

PFASC (2008)
– Similarity analysis of audio files

Information Retrieval - Background

Science of organizing documents for search and retrieval

Dates back to the 1880s (Hollerith)

Vannevar Bush, first US presidential science advisor, anticipated hypertext in “As We May Think” (1945)

The concept of automated text document analysis, organization and retrieval was met with a good deal of skepticism until the 1990s. Some critics now grudgingly concede that it might work.

Calculating Similarity: The Vector Space Model

Each feature found in a file is assigned a weight based on the frequency of its occurrence in the file and how common that feature is in the collection

Similarity between files is calculated based on common features and their weights. If two files share features that are not common across the entire collection, their similarity value will be very high

This vector space model (Salton) is the basis of many text search engines, and also works well with audio files

For text files, features are words or character strings. For audio files, features are prominent frequencies within frames of audio or sequences of frequencies across frames.
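The slides do not give PFASC’s exact weighting formula. As a minimal sketch, assuming the classic tf-idf weighting (count in the file times log of inverse document frequency) and the cosine similarity commonly paired with Salton’s vector space model, the two bullets on weighting and similarity look like this. The names `tfidf_vectors` and `cosine` are illustrative, not part of PFASC.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight each feature by its count in a file times log(N / document frequency).

    docs: list of feature lists, one per file (words for text, frequency
    labels for audio). Returns one {feature: weight} dict per file.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for features in docs:
        doc_freq.update(set(features))  # count each feature once per file
    vectors = []
    for features in docs:
        counts = Counter(features)
        vectors.append({f: c * math.log(n_docs / doc_freq[f]) for f, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Files 0 and 1 share the rare feature "r"; every file contains "common".
docs = [["common", "r", "p"], ["common", "r", "q"], ["common", "s", "t"]]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))  # about 0.12, then 0.0
```

A feature found in every file gets weight log(1) = 0, so it adds nothing to any score, while a shared rare feature raises it. Because all weights are non-negative, scores fall between 0.0 and 1.0.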

Some Digital Audio History

Uploaded to CompuServe 10/1985
– One of the most popular downloads at the time!

10 seconds of digital audio

Time to download (300 baud): 20 minutes

Time to load: 20 minutes (tape), 2 minutes (disk)

Storage space: 42K

From this to Napster in less than 15 years

Explosion of Digital Audio

[Chart: RIAA Sales Figures (millions), physical vs. digital, 1998 to 2008; vertical axis 0 to 1,500]

Digital audio today is where text was 15 years ago

Poised for the 2nd phase of the digital audio revolution?
– Ubiquitous, easy to create, access, share
– Lack of tools to analyze, search or organize

How can we organize this enormous and growing volume of digital audio data for discovery and retrieval?

What’s done today

Pandora - Music Genome Project
– Expert manual classification of ~400 attributes

Allmusic
– Manual artist similarity classification by critics

last.fm – Audioscrobbler
– Collaborative filtering from user playlists

iTunes Genius
– Collaborative filtering from user playlists

What’s NOT done today

Any analysis (outside of research) of similarity or classification based on the actual audio content of song files

Possible Hybrid Solution

Automated Analysis

User Behavior

Manual Metadata

A classification/retrieval system could use elements of all three methods to improve performance

Music Information Retrieval

Applying traditional IR techniques for classification, clustering, similarity analysis, pattern matching, etc. to digital audio files

Recent field of study; it has accelerated since the inception of the ISMIR conference in 2000 and the MIREX evaluation in 2004.

Common Basis of an MIR System

Select a very small segment of audio data, 20-40 ms

Use the fast Fourier transform (FFT) to convert it to frequency data

This ‘frame’ of audio becomes the equivalent of a word in a text file for similarity analysis

The output of this ‘feature extraction’ process is input to various analysis or classification processes

PFASC additionally combines prominent frequencies from adjacent frames to create temporal sequences as features

PFASC as an MIR Project

PFASC: Parallel Framework for Audio Similarity Clustering

Initiated at IU in 2008

Team includes School of Library and Information Science (SLIS), Cognitive Science, School of Music and Pervasive Technology Institute (PTI)

Have developed an MPI-based feature extraction algorithm, SVM classification, vector space similarity analysis, and some preliminary visualization.

Wish list includes graphical workflow, job submission portal, use in MIR classes

PFASC Philosophy and Methodology

Provide an end-to-end framework for MIR, from workflow to visualization

Recognize temporal context as a critical element of audio and a necessary part of feature extraction

Simple concept, simple implementation, one highly configurable algorithm for feature extraction

Dynamic combination and tuning of results from multiple runs, user-controlled weighting

Make good use of available cyberinfrastructure

Support education in MIR
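The slides do not say how results from multiple runs are combined. One simple reading of “user-controlled weighting,” offered here only as an assumption, is a weighted average of each run’s pairwise similarity scores. `combine_runs` and the two example runs below are hypothetical.

```python
def combine_runs(runs, weights):
    """Weighted average of similarity scores from several runs.

    runs: list of dicts mapping a (file_a, file_b) pair to a similarity in [0, 1].
    weights: one non-negative, user-chosen weight per run.
    Only pairs scored by every run are combined.
    """
    if len(runs) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one weight per run and a positive total weight")
    total = sum(weights)
    shared = set(runs[0]).intersection(*runs[1:])
    return {pair: sum(w * run[pair] for run, w in zip(runs, weights)) / total
            for pair in shared}


# Hypothetical runs: one on single-frame features, one on temporal sequences.
single_frame = {("a", "b"): 0.2, ("a", "c"): 0.6}
sequences = {("a", "b"): 0.8, ("a", "c"): 0.4}
combined = combine_runs([single_frame, sequences], [1, 3])
print(combined)  # ("a", "b") about 0.65, ("a", "c") about 0.45
```

Keeping each run’s scores separate and combining them afterwards means the weights can be retuned without recomputing any similarities.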

PFASC Feature Extraction Example

[Chart: prominent frequencies by genre; series folk, hip hop, rock; vertical axis 0 to 1]

Summary of 450 files classified by genre, showing the most prominent frequencies across the spectrum

PFASC Similarity Matrix Example

Genre      Hip Hop   Folk    Rock
Hip Hop    0.115     0.049   0.042
Folk       0.049     0.087   0.024
Rock       0.042     0.024   0.168

Each audio file is summarized as a vector of feature values, and similarity is calculated between vectors

Values range from 0.0 to 1.0: 0.0 = no commonality, 1.0 = files are identical

In the above example, same-genre files had similarity scores 3.352 times higher on average than different-genre files
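A genre-by-genre table like the one above can be built by averaging file-to-file scores within each genre pair. The sketch below is an assumption about the procedure, not PFASC’s code: it averages over unordered file pairs, skips self-comparisons, and takes the same-genre/different-genre ratio over individual pairs, so for a real collection its ratio need not equal a ratio of averaged table cells. `genre_summary` and `toy_sim` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations


def genre_summary(genres, sim):
    """Average pairwise similarity per genre pair, plus the same/different-genre ratio.

    genres: genre label for each file, indexed by file number.
    sim: function sim(i, j) returning the similarity of files i and j.
    Self-comparisons are skipped; each unordered pair is counted once.
    Assumes at least one same-genre and one different-genre pair exist.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    same = [0.0, 0]  # running sum and count for same-genre pairs
    diff = [0.0, 0]  # running sum and count for different-genre pairs
    for i, j in combinations(range(len(genres)), 2):
        s = sim(i, j)
        key = tuple(sorted((genres[i], genres[j])))
        totals[key] += s
        counts[key] += 1
        bucket = same if genres[i] == genres[j] else diff
        bucket[0] += s
        bucket[1] += 1
    table = {key: totals[key] / counts[key] for key in totals}
    ratio = (same[0] / same[1]) / (diff[0] / diff[1])
    return table, ratio


genres = ["rock", "rock", "folk", "folk"]


def toy_sim(i, j):
    """Made-up scores: 0.25 within a genre, 0.0625 across genres."""
    return 0.25 if genres[i] == genres[j] else 0.0625


table, ratio = genre_summary(genres, toy_sim)
print(ratio)  # 4.0
```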

Classification vs. Clustering

Most work in MIR involves classification, e.g. genre classification, an exercise that may be arbitrary and limited in value

Calculating similarity values among all songs in a library may be more practical for music discovery, playlist generation, and grouping by combinations of selected features

Calculating similarity is MUCH more computationally intensive than categorization: comparing all songs in a library of 20,000 files requires ~200 million comparisons
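The ~200 million figure follows from counting unordered pairs: each of n files is compared once with every other file, giving n(n - 1)/2 comparisons, which grows quadratically, whereas classifying a library handles each file once. The helper name `pairwise_comparisons` is illustrative.

```python
def pairwise_comparisons(n_files):
    """Number of distinct unordered file pairs: n(n - 1) / 2."""
    return n_files * (n_files - 1) // 2


print(pairwise_comparisons(20_000))  # 199990000, i.e. ~200 million
print(pairwise_comparisons(3_245))   # 5263390, i.e. over 5 million
```

Doubling the library roughly quadruples the work, which is why the all-pairs approach benefits so much from a large pool of machines.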

Using Condor for Similarity Analysis

Good fit for IU Condor resources: a very large number of short-duration jobs

Jobs are independent and can be restarted and run in any order

The large number of available machines provides a great wall-clock performance advantage over IU supercomputers
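The slides do not describe how PFASC divides the comparisons into Condor jobs. As a sketch under that caveat, one way to get many short, independent, equally sized jobs is to give each job one file and have it compare that file with the next (n - 1)/2 files in cyclic order, so that every pair is computed exactly once. A Condor submit file could pass each job its index through the `$(Process)` macro. `job_pairs` is a hypothetical helper.

```python
def job_pairs(job, n_files):
    """File pairs compared by one job (job index = file index).

    Job i compares file i with the next (n_files - 1) // 2 files, wrapping
    around the end of the list. For an even number of files, the file exactly
    opposite is assigned only to the lower-numbered job so no pair repeats.
    """
    pairs = [(job, (job + d) % n_files) for d in range(1, (n_files - 1) // 2 + 1)]
    if n_files % 2 == 0 and job < n_files // 2:
        pairs.append((job, job + n_files // 2))
    return pairs


# Each Condor job would call job_pairs for its own index only.
n = 450
work = [len(job_pairs(j, n)) for j in range(n)]
print(sum(work), min(work), max(work))  # 101025 224 225
```

Because a job’s work depends only on its index, a preempted job can simply be resubmitted, and no job is much longer than any other.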

PFASC Performance and Resources

A recent run of 450 jobs completed in 16 minutes. Running it serially on a desktop machine would have taken about 19 hours

The largest run to date contained 3,245 files (over 5 million song-to-song comparisons) and completed in less than eight hours; it would have taken over 11 days on a desktop

Queue wait time for 450 processors on IU’s Big Red is typically several days; for 3,000+ processors it would be up to a month
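The wall-clock figures above translate into speedups by simple division. Since the 3,245-file run is quoted as “over 11 days” serially versus “less than eight hours” on Condor, its ratio is a lower bound. The `speedup` helper is illustrative only.

```python
def speedup(serial_minutes, parallel_minutes):
    """Wall-clock speedup of a parallel run relative to a serial one."""
    return serial_minutes / parallel_minutes


# 450-job run: about 19 hours serial vs. 16 minutes on Condor.
print(speedup(19 * 60, 16))           # 71.25
# 3,245-file run: over 11 days serial vs. under 8 hours, so at least this much.
print(speedup(11 * 24 * 60, 8 * 60))  # 33.0
```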

Porting to WindowsPorting to Windows

Visualizing ResultsVisualizing Results

PFASC Contributors

Scott McCaulay (Project Lead)

Ray Sheppard (MPI Programming)

Eric Wernert (Visualization)

Joe Rinkovsky (Condor)

Steve Simms (Storage & Workflow)

Kiduk Yang (Information Retrieval)

John Walsh (Digital Libraries)

Eric Isaacson (Music Cognition)

Thank you!
